Module HLA_Arena
HLA-Arena: A Customizable environment for the structural modeling and analysis of peptide-HLA complexes for cancer immunotherapy
Installation
HLA-Arena is made available through Docker Hub (https://hub.docker.com/r/kavrakilab/hla-arena).
Installation instructions are also provided on our GitHub page (https://github.com/KavrakiLab/hla-arena).
Video Tutorials
HLA-Arena video tutorial part 1 (background): https://youtu.be/gIFHmejEulo
HLA-Arena video tutorial part 2 (demo): https://youtu.be/fPhnmYez4QA
Modeller license key
Several of the HLA-Arena workflows rely on Modeller to perform the homology modeling of a given HLA receptor. This modeling task is integrated into a specific HLA-Arena function (more details below). However, using Modeller requires you to register and obtain your own license key, if you do not already have one. First, follow the instructions on the Modeller registration page (https://salilab.org/modeller/registration.html).
Once you have the key, you can permanently update the HLA-Arena container with your key. For that, you should execute the commands below, replacing MODELLER_KEY with the correct key.
docker run -it kavrakilab/hla-arena
sed -i "s/XXXX/MODELLER_KEY/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py
exit
docker commit $(docker ps -a | sed -n 2p | awk '{ print $1 }') kavrakilab/hla-arena
docker container rm $(docker ps -a | sed -n 2p | awk '{ print $1 }')
Note that this modification is permanent, in the sense that it will not be lost when you close the container. However, it will be required again when you update the container (e.g., docker pull kavrakilab/hla-arena).
There is also the option of adding the key temporarily, when running a specific workflow. For that, just add the content below as one of the first cells to be executed in the workflow. Remember to replace MODELLER_KEY with the correct key.
from subprocess import call
call(["sed -i "s/XXXX/" + MODELLER_KEY + "/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py"], shell=True)
Using Jupyter Notebook
Each file with a '.ipynb' extension in this folder is a Jupyter notebook allowing you to run one of the HLA-Arena workflows. Note that Jupyter Notebook is already installed in this Docker image of HLA-Arena. If you are new to Jupyter Notebook, you can check this tutorial on how to interact with its interface: https://www.youtube.com/watch?v=HW29067qVWk&feature=youtu.be&t=274. Numerous other resources are available online.
Available workflows
0_minimal_example.ipynb
This is the most basic workflow, which will walk you through the steps of HLA modeling, peptide docking and pHLA visualization. You can run all the cells at once, to sequentially execute Modeller, APE-Gen, and NGL Viewer. You can also edit cells to change, for example, the HLA allele name or the peptide sequence.
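For reference, the notebook cells map onto the functions documented further down this page. The sketch below shows the three steps; the allele and peptide are placeholders, the import name is assumed from this page, and the exact notebook content may differ.

import HLA_Arena as arena  # assumed import name; the notebooks may import the module differently

hla_pdb = arena.model_hla("HLA-A*24:02", num_models=2)  # homology modeling with Modeller
best_conf = arena.dock("LYNTVATLY", hla_pdb)            # peptide docking with APE-Gen
arena.visualize([best_conf, hla_pdb])                   # NGL Viewer widget in the notebook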
1_geometry_prediction.ipynb
This is a short version of the first workflow, which allows you to carry out geometry prediction with DINC. By default, this workflow performs the self-docking of the pHLA complex with PDB code 1E27. You can edit the cell containing this PDB code, to try and reproduce the crystal structure of another complex. You can also edit the cells containing commands, to choose other types of input (e.g., you can use your own PDB files, rather than using a PDB code) or switch between a self-docking and a cross-docking experiment. Note that lines starting with character '#' are not executed; simply remove this character to make them executable by Jupyter Notebook.
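As a rough sketch of what the self-docking setup looks like with this module's functions (the ligand and receptor file names are placeholders, the notebook prepares these files from the downloaded complex before calling DINC, and the redocking value for job_type is an assumption):

import HLA_Arena as arena  # assumed import name for this module

complex_pdb = arena.download_pHLA_pdb("1E27")  # chains relabeled: A = alpha chain, B = beta-2 microglobulin, C = peptide

# Placeholder files: the peptide (chain C) and the receptor (chains A and B) extracted from complex_pdb
ligand, receptor = "ligand.pdb", "receptor.pdb"

arena.set_dinc_params({"job_type": "redocking"})  # assumed key/value; the DINC defaults use crossdocking
arena.dock_with_DINC(ligand, receptor, "1E27_dinc_results.txt", params=True)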
1b_geometry_prediction_w-rescoring.ipynb
This is a longer version of the first workflow. It additionally performs the rescoring of all output conformations using different scoring functions. In this workflow, input processing for rescoring is handled by MGLTools, through DINC. This process includes steps to check the protonation state of both the ligand and receptor.
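The rescoring step corresponds to this module's rescore_complex function. A small sketch (the complex path is a placeholder; an absolute path is used because rescore_complex changes into a temporary folder):

import os
import HLA_Arena as arena  # assumed import name for this module

complex_pdb = os.path.abspath("0/openmm_min_energy_system.pdb")  # placeholder: output of a previous docking run
for func in ["vina", "vinardo", "AD4"]:  # values listed in the rescore_complex docstring
    print(func, arena.rescore_complex(complex_pdb, func))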
1c_geometry_prediction_paper-demo.ipynb
This is a demo version of the first workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the cross-docking of the pHLA complex with PDB code 2GTW. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
2_binding_prediction.ipynb
This is the regular version of our second workflow, which allows you to perform binding prediction using ensembles of bound conformations generated by APE-Gen. As input, you can provide an HLA allele, a list of peptide sequences, and corresponding binding affinities. This workflow includes steps for a fast rescoring of the results (which is done by Smina) and correlation with experimental data (when available).
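The scoring and correlation steps map onto rescore_complex_simple_smina plus scipy. A sketch, assuming each peptide was docked in its own folder and that the complexes keep the A/B/C chain labeling (all paths and affinity values are placeholders):

import numpy as np
from scipy import stats
import HLA_Arena as arena  # assumed import name for this module

docked = {
    "LLFGYPVYV": "LLFGYPVYV/0/openmm_min_energy_system.pdb",  # outputs of dock(), one folder per peptide
    "GILGFVFTL": "GILGFVFTL/0/openmm_min_energy_system.pdb",
    "NLVPMVATV": "NLVPMVATV/0/openmm_min_energy_system.pdb",
}
exp_nM = {"LLFGYPVYV": 10.0, "GILGFVFTL": 25.0, "NLVPMVATV": 50.0}  # placeholder experimental affinities

peptides = list(docked)
scores = [arena.rescore_complex_simple_smina(docked[p], "vinardo") for p in peptides]
print(stats.pearsonr(scores, np.log10([exp_nM[p] for p in peptides])))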
2b_binding_prediction_w-pdbs.ipynb
This is an alternative version of the second workflow. In addition to the regular input (i.e., HLA allele, list of peptide sequences and binding affinities), this version of the workflow also takes as input the PDB codes of pHLA complexes you want to reproduce (when available). Therefore, this version offers the possibility to calculate RMSDs between modeled complexes and their corresponding crystal structures.
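One way to compute such an RMSD, following the coordinate-difference approach used by filter_by_rmsd in this module (file names are placeholders; both structures must contain the peptide as their third chain with matching atom ordering):

import mdtraj as md
import numpy as np

def peptide_backbone(pdb_file):
    # load only the peptide backbone (third chain, index 2) of a pHLA complex
    top = md.load(pdb_file).top
    return md.load(pdb_file, atom_indices=top.select("chainid == 2 and backbone"))

model = peptide_backbone("modeled_complex.pdb")    # placeholder: a complex modeled by the workflow
xtal = peptide_backbone("reference_complex.pdb")   # placeholder: crystal structure from download_pHLA_pdb()

delta = model.xyz - xtal.xyz                              # mdtraj coordinates are in nm
rmsd = np.sqrt((delta ** 2).sum(axis=-1).mean()) * 10.0   # per-atom RMSD, converted to Angstrom
print(round(float(rmsd), 2), "Angstrom")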
2c_binding_prediction_paper-demo.ipynb
This is a demo version of the second workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the prediction of the binding to HLA-A*02:01 for a small dataset of peptides. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
3_virtual_screening.ipynb
This is the regular version of our third workflow, which allows you to perform a virtual screening of potential HLA binders using structures generated by APE-Gen. As input, you can provide a list of peptide sequences and HLA allele names.
3b_virtual_screening_paper-demo.ipynb
This is a demo version of the third workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the structure-based virtual screening of a large set of potential HLA-binding peptides. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
Source code
#
# Here we wrap up the various docker calls in functions for each stage
# in the arena workflow
#
# python 3
#
# Last update: July 27 2020
"""
HLA-Arena: A Customizable environment for the structural modeling and analysis of peptide-HLA complexes for cancer immunotherapy
## Installation
HLA-Arena is made available through [Docker Hub](https://hub.docker.com/r/kavrakilab/hla-arena).
Installation instructions are also provided on our [GitHub page](https://github.com/KavrakiLab/hla-arena).
## Video Tutorials
* [HLA-Arena video tutorial part 1 (background)](https://youtu.be/gIFHmejEulo)
* [HLA-Arena video tutorial part 2 (demo)](https://youtu.be/fPhnmYez4QA)
## Modeller license key
Several of the HLA-Arena workflows rely on Modeller to perform the homology modeling of a given HLA receptor. This modeling task is integrated into a specific HLA-Arena function (more details below). However, using Modeller requires you to register and obtain your own license key, if you do not already have one. First, follow the instructions on the [Modeller registration page](https://salilab.org/modeller/registration.html).
Once you have the key, you can permanently update the HLA-Arena container with your key. For that, you should execute the commands below, replacing `MODELLER_KEY` with the correct key.
docker run -it kavrakilab/hla-arena
sed -i "s/XXXX/MODELLER_KEY/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py
exit
docker commit $(docker ps -a | sed -n 2p | awk '{ print $1 }') kavrakilab/hla-arena
docker container rm $(docker ps -a | sed -n 2p | awk '{ print $1 }')
Note that this modification is permanent, in the sense that it will not be lost when you close the container. However, it will be required again when you update the container (e.g., docker pull kavrakilab/hla-arena).
There is also the option of adding the key temporarily, when running a specific workflow. For that, just add the content below as one of the first cells to be executed in the workflow. Remember to replace `MODELLER_KEY` with the correct key.
from subprocess import call
call(["sed -i \"s/XXXX/" + MODELLER_KEY + "/g\" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py"], shell=True)
## Using Jupyter Notebook
Each file with a '.ipynb' extension in this folder is a Jupyter notebook allowing you to run one of the HLA-Arena workflows. Note that Jupyter Notebook is already installed in this Docker image of HLA-Arena. If you are new to Jupyter Notebook, you can check this [tutorial on how to interact with its interface](https://www.youtube.com/watch?v=HW29067qVWk&feature=youtu.be&t=274). Numerous other resources are available online.
## Available workflows
**0_minimal_example.ipynb**
This is the most basic workflow, which will walk you through the steps of HLA modeling, peptide docking and pHLA visualization. You can run all the cells at once, to sequentially execute Modeller, APE-Gen, and NGL Viewer. You can also edit cells to change, for example, the HLA allele name or the peptide sequence.
**1_geometry_prediction.ipynb**
This is a short version of the first workflow, which allows you to carry out geometry prediction with DINC. By default, this workflow performs the self-docking of the pHLA complex with PDB code 1E27. You can edit the cell containing this PDB code, to try and reproduce the crystal structure of another complex. You can also edit the cells containing commands, to choose other types of input (e.g., you can use your own PDB files, rather than using a PDB code) or switch between a self-docking and a cross-docking experiment. Note that lines starting with character '#' are not executed; simply remove this character to make them executable by Jupyter Notebook.
**1b_geometry_prediction_w-rescoring.ipynb**
This is a longer version of the first workflow. It additionally performs the rescoring of all output conformations using different scoring functions. In this workflow, input processing for rescoring is handled by MGLTools, through DINC. This process includes steps to check the protonation state of both the ligand and receptor.
**1c_geometry_prediction_paper-demo.ipynb**
This is a demo version of the first workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the cross-docking of the pHLA complex with PDB code 2GTW.
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
**2_binding_prediction.ipynb**
This is the regular version of our second workflow, which allows you to perform binding prediction using ensembles of bound conformations generated by APE-Gen.
As input, you can provide an HLA allele, a list of peptide sequences, and corresponding binding affinities.
This workflow includes steps for a fast rescoring of the results (which is done by Smina) and correlation with experimental data (when available).
**2b_binding_prediction_w-pdbs.ipynb**
This is an alternative version of the second workflow.
In addition to the regular input (i.e., HLA allele, list of peptide sequences and binding affinities), this version of the workflow also takes as input the PDB codes of pHLA complexes you want to reproduce (when available).
Therefore, this version offers the possibility to calculate RMSDs between modeled complexes and their corresponding crystal structures.
**2c_binding_prediction_paper-demo.ipynb**
This is a demo version of the second workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the prediction of the binding to HLA-A*02:01 for a small dataset of peptides.
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
**3_virtual_screening.ipynb**
This is the regular version of our third workflow, which allows you to perform a virtual screening of potential HLA binders using structures generated by APE-Gen.
As input, you can provide a list of peptide sequences and HLA allele names.
**3b_virtual_screening_paper-demo.ipynb**
This is a demo version of the third workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the structure-based virtual screening of a large set of potential HLA-binding peptides.
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.
"""
# import all necessary libraries outside the functions
from subprocess import call, check_output
from sys import stdout
from os import remove, path, listdir
from shutil import copyfile
import fileinput
from shlex import quote
import subprocess
import sys
sys.path.insert(0, "/APE-Gen")
import APE_Gen
import get_pMHC_pdb
#import model_receptor
import os
import copy
#apegen_env = os.environ
#dinc_env = copy.deepcopy(apegen_env)
dinc_env = os.environ.copy()
dinc_env["PATH"] = "/dinc/Utilities24:" + dinc_env["PATH"]
dinc_env["PATH"] = "/dinc/bin:" + dinc_env["PATH"]
dinc_env["BABEL_LIBDIR"] = "/dinc/lib/openbabel/2.3.2"
dinc_env["BABEL_DATADIR"] = "/dinc/share/openbabel/2.3.2"
import numpy as np
import nglview
import glob
import mdtraj as md
from matplotlib.colors import to_hex
import matplotlib as mpl
import pandas as pd
import seaborn as sns
from scipy import stats, spatial
import matplotlib.pyplot as plt
import pickle
#def run_command(command):
# process = subprocess.Popen(command.split(), env=dinc_env, stdout=subprocess.PIPE)
# while True:
# output = process.stdout.readline()
# if output == '' and process.poll() is not None:
# break
# if output:
# print(output.strip())
# rc = process.poll()
# return rc
##############
#
# STEP 1: Input Processing
#
def model_hla(allele_name, num_models=2):
"""
Run MODELLER to create an HLA structure
Runs MODELLER to create an HLA structure of the input allele. The sequence of the alpha chain of the allele is downloaded.
Then a template PDB of a different HLA structure from the same supertype is downloaded.
Parameters:
allele_name (str or (str,str)): Either allele name with format like "HLA-A*24:02" or a pair of strings, where the first is the location
of the fasta file with the alpha chain sequence and the second is the template pdb file (ex. allele_name=(alpha.fasta, template.pdb))
num_models (int): Number of models to generate with MODELLER
Returns:
str: Name of the HLA structure in PDB format
"""
import model_receptor
if allele_name[:4] != "HLA-": # assume giving fasta and pdb template
alpha_chain_seq = allele_name[0]
pdb_template = allele_name[1]
alpha_chain_seq_reformat = alpha_chain_seq
pdb_template_reformat = pdb_template
if alpha_chain_seq[-6:] != ".fasta": # trying to use allele name to pull sequence
if "*" not in alpha_chain_seq: alpha_chain_seq_reformat = alpha_chain_seq_reformat[:5] + "*" + alpha_chain_seq_reformat[5:]
if ":" not in alpha_chain_seq: allele_name_reformatted = alpha_chain_seq_reformat[:-2] + ":" + alpha_chain_seq_reformat[-2:]
if pdb_template[-4:] != ".pdb": # trying to use allele name to imply supertype
if "*" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:5] + "*" + pdb_template_reformat[5:]
if ":" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:-2] + ":" + pdb_template_reformat[-2:]
model_receptor.main([alpha_chain_seq_reformat, pdb_template_reformat, "-n", str(num_models)])
return "best_model.pdb"
allele_name_reformatted = allele_name
if "*" not in allele_name: allele_name_reformatted = allele_name_reformatted[:5] + "*" + allele_name_reformatted[5:]
if ":" not in allele_name: allele_name_reformatted = allele_name_reformatted[:-2] + ":" + allele_name_reformatted[-2:]
hla_allele_name = allele_name_reformatted.replace("*", "")
hla_allele_name = hla_allele_name.replace(":", "")
hla_allele = hla_allele_name + ".pdb"
if os.path.exists(hla_allele):
print("Already found " + hla_allele)
return hla_allele
model_receptor.main([allele_name_reformatted, allele_name_reformatted, "-n", str(num_models)])
call(["mv best_model.pdb " + hla_allele], shell=True)
return hla_allele
def download_pHLA_pdb(pdbid):
"""
Download PDB of a pHLA
Downloads PDB of a pHLA where the chains are relabeled such that the alpha chain is labeled "A",
the beta-2 microglobulin chain is labeled "B",
and the peptide is labeled "C"
Parameters:
pdbid (str): 4-letter PDB code
Returns:
str: Name of the pHLA structure in PDB format
"""
if not os.path.exists(pdbid + ".pdb"): get_pMHC_pdb.main([pdbid])
else: print("Found " + pdbid)
return pdbid + ".pdb"
## this function doesn't work yet
#def split_pdb(filename, prefix, local_folder):
# # split a pdb file of a ligand-receptor complex into two files
# ligand = open(path.join(local_folder, prefix+"_ligand.pdb"), "w+")
# receptor = open(path.join(local_folder, prefix+"_receptor.pdb"), "w+")
#
# curr_file = receptor
# for line in fileinput.input(path.join(local_folder, filename)):
# if line.startswith("TER"):
# curr_file.write(line,)
# curr_file = ligand
# else:
# curr_file.write(line,)
# fileinput.close()
###############
#
# STEP 2: Peptide docking
#
def dock(peptide, hla, useOpenMM=False, useGPU=False):
"""
Run APE-Gen
Runs APE-Gen for a given peptide sequence and HLA structure.
Parameters:
peptide (str): Sequence of peptide to dock
hla (str): Location of HLA structure to dock peptide
useOpenMM (bool): Whether to do an extra minimization step using OpenMM on the whole ensemble
useGPU (bool): Whether to use the GPU when performing the extra minimization step. Currently not working within the Docker image.
Returns:
str: Location of best scoring structure that has been minimized with OpenMM
"""
best_scoring_conf = os.getcwd() + "/0/openmm_min_energy_system.pdb"
if os.path.exists(best_scoring_conf): return best_scoring_conf
args = [peptide, hla]
if useOpenMM: args.append("-o")
if useGPU: args.append("--use_gpu")
APE_Gen.main(args)
if not useOpenMM:
print("Minimizing best scoring conf")
call(["pdbfixer 0/min_energy_system.pdb --output=0/min_energy_system.pdb"], shell=True)
APE_Gen.minimizeConf("0/min_energy_system.pdb", "0/openmm_min_energy_system.pdb")
return best_scoring_conf
def set_dinc_params(params):
"""
params: dict where keys are params and values are the new values to set the param to
only contains params that should be changed
"""
# copy params.yml file
call(["cp /dinc/example/defaults.yml ./params.yml"], shell=True)
call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason
# change params
for key in params.keys():
command = "sed -i '/"+key+":/ c\\"+key+": "+params[key]+"' params.yml"
call([command], shell=True)
return
# THIS PRODUCES AN ERROR FOR SOME REASON
replaced = False
for line in fileinput.input("params.yml", inplace=True):
replaced = False
for key in params.keys():
if line.startswith(key):
print(key+": "+params[key],)
replaced = True
break
if not replaced:
print(line,)
fileinput.close()
def dock_with_DINC(ligand, receptor, output_file, params=False):
"""
Run DINC
Runs DINC for a given ligand and receptor
Parameters:
ligand (str): ligand to dock
receptor (str): Location of HLA structure to dock ligand
output_file (str): Name of the file to which the DINC results are copied
params (bool): Set to True if using your own params.yml file
Returns:
None: the DINC results are copied to output_file
"""
#call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason
#call(["sed -i \"s/job_type: crossdocking/job_type: redocking/g\" params.yml"], shell=True)
# dock peptide using DINC
if params:
command = "/dinc/dinc -l " + ligand + " -r " + receptor + " -p params.yml"
else:
command = "/dinc/dinc -l " + ligand + " -r " + receptor
p = subprocess.Popen(command.split(), env=dinc_env)
p.wait()
call(["cp dincresults.txt " + output_file], shell=True)
###############
#
# STEP 3: Data analysis
#
def rescore_complex(path_to_complex, func):
"""
Rescore using DINC wrapper
Rescore using DINC wrapper. Calls MGLTools to prepare ligand and receptor
Parameters:
path_to_complex (str): path to the pdb file containing the complex
func (str): smina function flag, can take values: vina, AD4, vinardo
Returns:
float: Scoring according to "func"
"""
local_folder = "temp"
call(["mkdir -p " + local_folder], shell=True)
os.chdir(local_folder)
call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
call(["sed -i \"s/rescoring: Vina/rescoring: " + func + "/g\" params.yml"], shell=True)
call(["sed -i \"s/job_type: crossdocking/job_type: scoring/g\" params.yml"], shell=True)
call(["cp " + path_to_complex + " complex.pdb"], shell=True)
call(["grep \"[A-Z] C \" complex.pdb > ligand.pdb"], shell=True)
call(["cp complex.pdb receptor.pdb"], shell=True)
call(["sed -i \"/[A-Z] C/d\" receptor.pdb"], shell=True)
ligand = "ligand.pdb"
receptor = "receptor.pdb"
command = "/dinc/dinc -l ligand.pdb -r receptor.pdb -p params.yml"
# dinc needs this file in the directory for some reason
call(["cp /dinc/example/defaults.yml ."], shell=True)
p = subprocess.Popen(command.split(), env=dinc_env)
p.wait()
for line in open("dincresults.txt").readlines():
if line.startswith("Binding Affinity:"):
score = float(line.split()[2])
break
os.chdir("..")
call(["rm -r " + local_folder], shell=True)
return score
def rescore_complex_simple_smina(path_to_complex, func):
"""
Rescore using SMINA
Rescore using SMINA. This version is faster than arena.rescore_complex since SMINA handles I/O more efficiently.
Note: pdb must have A,B corresponding to alpha and beta chain of receptor and C to ligand
Parameters:
path_to_complex (str): path to the pdb file containing the complex
func (str): smina function flag, can take values: vina, vinardo, ad4_scoring
Returns:
float: Scoring according to "func"
"""
#check if extra space is needed - APE-Gen files, not OPENMM minimized
with open(path_to_complex) as f:
if 'OPENMM' in f.read():
fixAPEG = False
else:
fixAPEG = True
call (["awk \'{if ($5==\"A\") print $0}\' "+path_to_complex+" > tmp_receptor1.pdb"], shell=True)
call (["awk \'{if ($5==\"B\") print $0}\' "+path_to_complex+" >> tmp_receptor1.pdb"], shell=True)
call (["awk \'{if ($5==\"C\") print $0}\' "+path_to_complex+" > tmp_ligand1.pdb"], shell=True)
#fix APEGen file by adding extra space
if fixAPEG:
call (["sed -i \"s/ / /g\" tmp_ligand1.pdb"], shell=True)
call (["sed -i \"s/ / /g\" tmp_receptor1.pdb"], shell=True)
call (["smina --receptor tmp_receptor1.pdb --ligand tmp_ligand1.pdb --scoring "+func+" --score_only > tmp_out1.pdb"], shell=True)
val = 0
with open("tmp_out1.pdb") as f1:
for line in f1:
if "Affinity:" in line:
val = line.split()[1]
break
call (["rm tmp_receptor1.pdb"], shell=True)
call (["rm tmp_ligand1.pdb"], shell=True)
call (["rm tmp_out1.pdb"], shell=True)
return float(val)
def rescore(ligands, receptor, func):
"""
ligands: list of all files with ligand conformations to rescore
receptor: single file with receptor for rescoring
func: rescoring function
"""
set_dinc_params({"rescoring":func, "job_type":"scoring"})
for l in ligands:
dock_with_DINC(l, receptor, l+"results", params=True)
call(["cat %sresults >> rescoreresults.txt" % l], shell=True)
def filter_by_terminal_positions(filenames, receptor, cutoff_N=12, cutoff_C=12):
"""
Filter the conformations by the distance between the N terminus/C terminus
and specific residues of the alpha helices
This filters out conformations that are flipped in the binding site,
filtering out conformations that are above either cutoff
filenames: (.pdb) of ligand conformations
receptor: (.pdb) file of receptor conformation
cutoff_N: distance cutoff between CA of res 63 of the receptor to
the CA of N-terminus of the ligand
cutoff_C: distance cutoff between res 146 and C-term
return list of filenames that fit the criteria
"""
final_filenames = []
# load receptor atoms, residues are numbered starting at 0
N_ref_atom = md.load(receptor).top.select("resid 62 and name CA")
N_ref = md.load(receptor, atom_indices=N_ref_atom)
C_ref_atom = md.load(receptor).top.select("resid 145 and name CA")
C_ref = md.load(receptor, atom_indices=C_ref_atom)
# get length of peptide (load without H atoms)
peptide_length = md.load(filenames[0], atom_indices = md.load(filenames[0]).top.select("not name H")).top.n_residues
select_str = "resid {} and name CA".format(peptide_length - 1)
for fname in filenames:
N_term = md.load(fname, atom_indices = md.load(fname).top.select("resid 0 and name CA"))
C_term = md.load(fname, atom_indices = md.load(fname).top.select(select_str))
# distance calculations
delta_N = N_ref.xyz - N_term.xyz
delta_C = C_ref.xyz - C_term.xyz
rmsd_N = np.sqrt(sum(sum((delta_N * delta_N).T)))
rmsd_C = np.sqrt(sum(sum((delta_C * delta_C).T)))
if rmsd_N*10 <= cutoff_N and rmsd_C*10 <= cutoff_C:
final_filenames.append(fname)
return final_filenames
def filter_by_rmsd(filenames, reference, cutoff=4):
"""
Filter the conformations by backbone rmsd
given filenames (.pdb) of ligand conformations and filename (.pdb) of reference ligand
and rmsd cutoff (default 4 angstrom)
return list of filenames with rmsd <= cutoff
"""
final_filenames = []
# load backbone only of reference ligand
ref_pep_atoms = md.load(reference).top.select("backbone")
ref_pdb = md.load(reference, atom_indices = ref_pep_atoms)
# filter conformations
for fname in filenames:
conf = md.load(fname, atom_indices = md.load(fname).top.select("backbone"))
# rmsd calculation
delta = conf.xyz - ref_pdb.xyz
rmsd = np.sqrt(sum(sum((delta * delta).T)) / len(delta))
if rmsd <= cutoff:
final_filenames.append(fname)
return final_filenames
def combine_flex_and_rigid_receptor(flex_filename, receptor_filename, output):
"""
Combine the flexible residues with the rest of the receptor in one file
given filenames for the entire receptor (.pdb or .pdbqt) and just the flexible residues (.pdb),
and the name of the combined file (.pdb)
"""
call(["cp "+receptor_filename+" "+output], shell=True)
with open(flex_filename) as flex:
for flex_line in flex.readlines():
for line in fileinput.input(output, inplace=True):
if flex_line[12:27] == line[12:27]:
print(flex_line,)
else:
print(line,)
fileinput.close()
def visualize(structures):
"""
Visualize set of structures
Runs NGLView to visualize structures
Parameters:
structures (str or list of str): Location of PDB files to visualize
Returns:
NGLView Obj: Widget for Jupyter notebook visualization
"""
widget = nglview.NGLWidget()
widget.clear_representations()
if isinstance(structures, str):
s = md.load(structures)
widget.add_trajectory(s)
else:
for s in structures:
widget.add_trajectory(md.load(s))
return widget
def visualize_ensemble(path_to_ensemble="./"):
"""
Visualize ensemble produced by APE-Gen
Runs NGLView to visualize the ensemble of conformations output by APE-Gen
Parameters:
path_to_ensemble (str): Location of PDB files of the ensemble
Returns:
NGLView Obj: Widget for Jupyter notebook visualization
"""
widget = nglview.NGLWidget()
widget.clear_representations()
ensemble = glob.glob(path_to_ensemble + "0/full_system_confs/*.pdb")
i = 0
for conf_str in ensemble:
struct = md.load(conf_str, atom_indices=md.load(conf_str).top.select("chainid == 2"))
widget.add_trajectory(struct)
widget.add_licorice(component=i)
widget.remove_cartoon(component=i)
i = i + 1
return widget
def visualize_separate(peptides, receptor = None, colors = None):
"""
Visualize ligand and receptor separately
Runs NGLView to visualize peptide conformations as well as the receptor (optional)
Parameters:
peptides (list of str): Location of PDB files of the peptides to visualize
receptor (str): Location of the receptor to plot using surface mode
colors (list of str): list of colors to use for the peptide conformations
Returns:
NGLView Obj: Widget for Jupyter notebook visualization
"""
call(["jupyter-nbextension enable --py --sys-prefix widgetsnbextension"], shell=True)
call(["jupyter-nbextension enable nglview --py --sys-prefix"], shell=True)
import nglview
import glob
import mdtraj as md
from matplotlib.colors import to_hex
import matplotlib as mpl
from matplotlib.colors import to_hex
widget = nglview.NGLWidget()
widget.clear_representations()
i = 0
for p in peptides:
struct = md.load(quote(p))
widget.add_trajectory(struct)
widget.add_licorice(component=i)
widget.remove_cartoon(component=i)
if colors:
widget.update_licorice(color=colors[i], component = i)
i = i + 1
if receptor:
struct = md.load(receptor)
widget.add_trajectory(struct)
widget.add_surface(component=i)
widget.remove_cartoon(component=i)
return widget
Functions
def combine_flex_and_rigid_receptor(flex_filename, receptor_filename, output)
-
Combine the flexible residues with the rest of the receptor in one file given filenames for the entire receptor (.pdb or .pdbqt) and just the flexible residues (.pdb), and the name of the combined file (.pdb)
def dock(peptide, hla, useOpenMM=False, useGPU=False)
-
Run APE-Gen
Runs APE-Gen for a given peptide sequence and HLA structure.
Parameters:
    peptide (str): Sequence of peptide to dock
    hla (str): Location of HLA structure to dock peptide
    useOpenMM (bool): Whether to do an extra minimization step using OpenMM on the whole ensemble
    useGPU (bool): Whether to use the GPU when performing the extra minimization step. Currently not working within the Docker image.
Returns:
    str: Location of best scoring structure that has been minimized with OpenMM
def dock_with_DINC(ligand, receptor, output_file, params=False)
-
Run DINC
Runs DINC for a given ligand and receptor
Parameters:
    ligand (str): Ligand to dock
    receptor (str): Location of HLA structure to dock ligand
    output_file (str): Name of the file to which the DINC results are copied
    params (bool): Set to True if using your own params.yml file
Returns:
    None: the DINC results are copied to output_file
def download_pHLA_pdb(pdbid)
-
Download PDB of a pHLA
Downloads PDB of a pHLA where the chains are relabeled such that the alpha chain is labeled "A", the beta-2 microglobulin chain is labeled "B", and the peptide is labeled "C"
Parameters: pdbid (str): 4-letter PDB code
Returns: str: Name of the pHLA structure in PDB format
def filter_by_rmsd(filenames, reference, cutoff=4)
-
Filter the conformations by backbone RMSD, given filenames (.pdb) of ligand conformations, a filename (.pdb) of the reference ligand, and an RMSD cutoff (default 4 Angstrom). Returns a list of filenames with RMSD <= cutoff.
def filter_by_terminal_positions(filenames, receptor, cutoff_N=12, cutoff_C=12)
-
Filter the conformations by the distance between the N-terminus/C-terminus and specific residues of the alpha helices. This filters out conformations that are flipped in the binding site, removing conformations that exceed either cutoff.
filenames: (.pdb) files of ligand conformations
receptor: (.pdb) file of the receptor conformation
cutoff_N: distance cutoff between the CA of residue 63 of the receptor and the CA of the N-terminus of the ligand
cutoff_C: distance cutoff between residue 146 and the C-terminus
Returns a list of filenames that fit the criteria.
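A possible way to chain this filter with filter_by_rmsd over a set of peptide conformations (file and folder names are placeholders; both filters expect ligand-only PDB files):

import glob
import HLA_Arena as arena  # assumed import name for this module

confs = glob.glob("peptide_confs/*.pdb")  # placeholder folder with one PDB per peptide conformation
kept = arena.filter_by_terminal_positions(confs, "receptor.pdb", cutoff_N=12, cutoff_C=12)
kept = arena.filter_by_rmsd(kept, "reference_peptide.pdb", cutoff=4)
print(len(kept), "of", len(confs), "conformations kept")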
def model_hla(allele_name, num_models=2)
-
Run MODELLER to create an HLA structure
Runs MODELLER to create an HLA structure of the input allele. The sequence of the alpha chain of the allele is downloaded. Then a template PDB of a different HLA structure from the same supertype is downloaded.
Parameters:
    allele_name (str or (str, str)): Either an allele name in a format like "HLA-A*24:02", or a pair of strings where the first is the location of the FASTA file with the alpha-chain sequence and the second is the template PDB file (e.g., allele_name=("alpha.fasta", "template.pdb"))
    num_models (int): Number of models to generate with MODELLER
Returns: str: Name of the HLA structure in PDB format
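For illustration, the two accepted input forms (the FASTA and template file names are placeholders):

import HLA_Arena as arena  # assumed import name for this module

pdb1 = arena.model_hla("HLA-A*24:02", num_models=2)               # allele name: sequence and template are retrieved automatically
pdb2 = arena.model_hla(("my_alpha_chain.fasta", "template.pdb"))  # your own alpha-chain FASTA plus a template PDB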
def rescore(ligands, receptor, func)
-
ligands: list of all files with ligand conformations to rescore
receptor: single file with the receptor for rescoring
func: rescoring function
def rescore_complex(path_to_complex, func)
-
Rescore using DINC wrapper
Rescore using DINC wrapper. Calls MGLTools to prepare ligand and receptor
Parameters:
    path_to_complex (str): path to the pdb file containing the complex
    func (str): smina scoring function flag; can take the values vina, AD4, vinardo
Returns: float: Scoring according to "func"
def rescore_complex_simple_smina(path_to_complex, func)
-
Rescore using SMINA
Rescore using SMINA. This version is faster than arena.rescore_complex since SMINA handles I/O more efficiently. Note: the PDB must have chains A and B corresponding to the alpha and beta chains of the receptor, and chain C to the ligand.
Parameters:
    path_to_complex (str): path to the pdb file containing the complex
    func (str): smina scoring function flag; can take the values vina, vinardo, ad4_scoring
Returns: float: Scoring according to "func"
def set_dinc_params(params)
-
params: dict where keys are parameter names and values are the new values to set; it should contain only the parameters that should be changed
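For example, to switch DINC to scoring-only mode with the Vinardo function before calling dock_with_DINC (the key names follow those used by rescore() in this module; the file names are placeholders):

import HLA_Arena as arena  # assumed import name for this module

arena.set_dinc_params({"rescoring": "vinardo", "job_type": "scoring"})
arena.dock_with_DINC("ligand.pdb", "receptor.pdb", "results.txt", params=True)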
def visualize(structures)
-
Visualize set of structures
Runs NGLView to visualize structures
Parameters: structures (str or list of str): Location of PDB files to visualize
Returns: NGLView Obj: Widget for Jupyter notebook visualization
def visualize_ensemble(path_to_ensemble='./')
-
Visualize ensemble produced by APE-Gen
Runs NGLView to visualize the ensemble of conformations output by APE-Gen
Parameters: path_to_ensemble (str): Location of PDB files of the ensemble
Returns: NGLView Obj: Widget for Jupyter notebook visualization
def visualize_separate(peptides, receptor=None, colors=None)
-
Visualize ligand and receptor separately
Runs NGLView to visualize peptide conformations as well as the receptor (optional)
Parameters:
    peptides (list of str): Location of PDB files of the peptides to visualize
    receptor (str): Location of the receptor to plot using surface mode
    colors (list of str): List of colors to use for the peptide conformations
Returns: NGLView Obj: Widget for Jupyter notebook visualization