Module HLA_Arena

HLA-Arena: A Customizable environment for the structural modeling and analysis of peptide-HLA complexes for cancer immunotherapy

Installation

HLA-Arena is made available through Docker Hub.
Installation instructions are also provided in our github page.

Video Tutorials

Modeller license key

Several of HLA-Arena workflows rely on Modeller to perform the homology modeling of a given HLA receptor. This modeling task is integrated into a specific HLA-Arena function (more details below). However, using Modeller requires you to register and obtain your own license key, if you do not already have one. First, follow instructions on the Modeller registration page.

Once you have the key, you can permanently update the HLA-Arena container with your key. For that, you should execute the commands below, replacing MODELLER_KEY with the correct key.

docker run -it kavrakilab/hla-arena
sed -i "s/XXXX/MODELLER_KEY/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py
exit
docker commit $(docker ps -a | sed -n 2p | awk '{ print $1 }') kavrakilab/hla-arena
docker container rm $(docker ps -a | sed -n 2p | awk '{ print $1 }')

Note that this modification is permanent, in the sense that will not be lost when you close the container. However, it will be required again when you update the container (e.g., docker pull kavrakilab/hla-arena).

There is also the option of adding the key temporarily, when running a specific workflow. For that, just add the content below as one of the first cells to be executed in the workflow. Remember to replace MODELLER_KEY with the correct key.

from subprocess import call
call(["sed -i "s/XXXX/" + MODELLER_KEY + "/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py"], shell=True)

Using Jupyter Notebook

Each file with a '.ipynb' extension in this folder is a Jupyter notebook allowing you to run one of HLA-Arena workflows. Note that Jupyter Notebook is already installed in this docker image of HLA-Arena. If you are new to Jupyter Notebook, you can check this tutorial on how to interact with its interface. Numerous other resources are available online.

Available workflows

0_minimal_example.ipynb

This is the most basic workflow, which will walk you through the steps of HLA modeling, peptide docking and pHLA visualization. You can run all the cells at once, to sequentially execute Modeller, APE-Gen, and NGL Viewer. You can also edit cells to change, for example, the HLA allele name or the peptide sequence.

1_geometry_prediction.ipynb

This is a short version of the first workflow, which allows you to carry out geometry prediction with DINC. By default, this workflow performs the self-docking of the pHLA complex with PDB code 1E27. You can edit the cell containing this PDB code, to try and reproduce the crystal structure of another complex. You can also edit the cells containing commands, to choose other types of input (e.g., you can use your own PDB files, rather than using a PDB code) or switch between a self-docking and a cross-docking experiment. Note that lines starting with character '#' are not executed; simply remove this character to make them executable by Jupyter Notebook.

1b_geometry_prediction_w-rescoring.ipynb

This is a longer version of the first workflow. It additionally performs the rescoring of all output conformations using different scoring functions. In this workflow, input processing for rescoring is handled by MGLTools, through DINC. This process includes steps to check the protonation state of both the ligand and receptor.

1c_geometry_prediction_paper-demo.ipynb

This is a demo version of the first workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the cross-docking of the pHLA complex with PDB code 2GTW. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.

2_binding_prediction.ipynb

This is the regular version of our second workflow, which allows you to perform binding prediction using ensembles of bound conformations generated by APE-Gen. As input, you can provide an HLA allele, a list of peptide sequences, and corresponding binding affinities. This workflow includes steps for a fast rescoring of the results (which is done by Smina) and correlation with experimental data (when available).

2b_binding_prediction_w-pdbs.ipynb

This is an alternative version of the second workflow. In addition to the regular input (i.e., HLA allele, list of peptide sequences and binding affinities), this version of the workflow also takes as input the PDB codes of pHLA complexes you want to reproduce (when available). Therefore, this version offers the possibility to calculate RMSDs between modeled complexes and their corresponding crystal structures.

2c_binding_prediction_paper-demo.ipynb

This is a demo version of the second workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the prediction of the binding to HLA-A*02:01 for a small dataset of peptides. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.

3_virtual_screening.ipynb

This is the regular version of our third workflow, which allows you to perform a virtual screening of potential HLA binders using structures generated by APE-Gen. As input, you can provide a list of peptide sequences and HLA allele names.

3b_virtual_screening_paper-demo.ipynb

This is a demo version of the third workflow. It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the structure-based virtual screening of a large set of potential HLA-binding peptides. Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization. You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.

Expand source code
#
#  Here we wrap up the various docker calls in functions for each stage
#   in the arena workflow
#
#  python 3
#
# Last update: July 27 2020
"""
HLA-Arena: A Customizable environment for the structural modeling and analysis of peptide-HLA complexes for cancer immunotherapy

## Installation

HLA-Arena is made available through [Docker Hub](https://hub.docker.com/r/kavrakilab/hla-arena).  
Installation instructions are also provided in our [github page](https://github.com/KavrakiLab/hla-arena).

## Video Tutorials

* [HLA-Arena video tutorial part 1 (background)](https://youtu.be/gIFHmejEulo)
* [HLA-Arena video tutorial part 2 (demo)](https://youtu.be/fPhnmYez4QA)

## Modeller license key

Several of HLA-Arena workflows rely on Modeller to perform the homology modeling of a given HLA receptor. This modeling task is integrated into a specific HLA-Arena function (more details below). However, using Modeller requires you to register and obtain your own license key, if you do not already have one. First, follow instructions on the [Modeller registration page](https://salilab.org/modeller/registration.html). 

Once you have the key, you can permanently update the HLA-Arena container with your key. For that, you should execute the commands below, replacing `MODELLER_KEY` with the correct key. 

    docker run -it kavrakilab/hla-arena
    sed -i "s/XXXX/MODELLER_KEY/g" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py
    exit
    docker commit $(docker ps -a | sed -n 2p | awk '{ print $1 }') kavrakilab/hla-arena
    docker container rm $(docker ps -a | sed -n 2p | awk '{ print $1 }')

Note that this modification is permanent, in the sense that will not be lost when you close the container. However, it will be required again when you update the container (e.g., docker pull kavrakilab/hla-arena).

There is also the option of adding the key temporarily, when running a specific workflow. For that, just add the content below as one of the first cells to be executed in the workflow. Remember to replace `MODELLER_KEY` with the correct key.

    from subprocess import call
    call(["sed -i \"s/XXXX/" + MODELLER_KEY + "/g\" /conda/envs/apegen/lib/modeller-9.20/modlib/modeller/config.py"], shell=True)


## Using Jupyter Notebook

Each file with a '.ipynb' extension in this folder is a Jupyter notebook allowing you to run one of HLA-Arena workflows. Note that Jupyter Notebook is already installed in this docker image of HLA-Arena. If you are new to Jupyter Notebook, you can check this [tutorial on how to interact with its interface](https://www.youtube.com/watch?v=HW29067qVWk&feature=youtu.be&t=274). Numerous other resources are available online.


## Available workflows

**0_minimal_example.ipynb**

This is the most basic workflow, which will walk you through the steps of HLA modeling, peptide docking and pHLA visualization. You can run all the cells at once, to sequentially execute Modeller, APE-Gen, and NGL Viewer. You can also edit cells to change, for example, the HLA allele name or the peptide sequence.


**1_geometry_prediction.ipynb**

This is a short version of the first workflow, which allows you to carry out geometry prediction with DINC. By default, this workflow performs the self-docking of the pHLA complex with PDB code 1E27. You can edit the cell containing this PDB code, to try and reproduce the crystal structure of another complex. You can also edit the cells containing commands, to choose other types of input (e.g., you can use your own PDB files, rather than using a PDB code) or switch between a self-docking and a cross-docking experiment. Note that lines starting with character '#' are not executed; simply remove this character to make them executable by Jupyter Notebook.


**1b_geometry_prediction_w-rescoring.ipynb**

This is a longer version of the first workflow. It additionally performs the rescoring of all output conformations using different scoring functions. In this workflow, input processing for rescoring is handled by MGLTools, through DINC. This process includes steps to check the protonation state of both the ligand and receptor.


**1c_geometry_prediction_paper-demo.ipynb**

This is a demo version of the first workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the cross-docking of the pHLA complex with PDB code 2GTW. 
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.


**2_binding_prediction.ipynb**

This is the regular version of our second workflow, which allows you to perform binding prediction using ensembles of bound conformations generated by APE-Gen. 
As input, you can provide an HLA allele, a list of peptide sequences, and corresponding binding affinities.
This workflow includes steps for a fast rescoring of the results (which is done by Smina) and correlation with experimental data (when available).

**2b_binding_prediction_w-pdbs.ipynb**

This is an alternative version of the second workflow. 
In addition to the regular input (i.e., HLA allele, list of peptide sequences and binding affinities), this version of the workflow also takes as input the PDB codes of pHLA complexes you want to reproduce (when available).
Therefore, this version offers the possibility to calculate RMSDs between modeled complexes and their corresponding crystal structures.

**2c_binding_prediction_paper-demo.ipynb**

This is a demo version of the second workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the prediction of the binding to HLA-A*02:01 for a small dataset of peptides. 
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.


**3_virtual_screening.ipynb**

This is the regular version of our third workflow, which allows you to perform a virtual screening of potential HLA binders using structures generated by APE-Gen. 
As input, you can provide a list of peptide sequences and HLA allele names.


**3b_virtual_screening_paper-demo.ipynb**

This is a demo version of the third workflow.
It allows you to reproduce the results reported in our article on HLA-Arena (Antunes et al., 2020) about the structure-based virtual screening of a large set of potential HLA-binding peptides. 
Intermediate results have been pre-computed and will be loaded in the notebook for quick visualization.
You can click the "forward" button in the Jupyter Notebook toolbar to execute the entire workflow and visualize the final results.

"""

# import all necessary libraries outside the functions
from subprocess import call, check_output
from sys import stdout
from os import remove, path, listdir
from shutil import copyfile
import fileinput
from shlex import quote
import subprocess

import sys
sys.path.insert(0, "/APE-Gen")
import APE_Gen
import get_pMHC_pdb
#import model_receptor

import os
import copy
#apegen_env = os.environ
#dinc_env = copy.deepcopy(apegen_env)
dinc_env = os.environ.copy()
dinc_env["PATH"] = "/dinc/Utilities24:" + dinc_env["PATH"]
dinc_env["PATH"] = "/dinc/bin:" + dinc_env["PATH"]
dinc_env["BABEL_LIBDIR"] = "/dinc/lib/openbabel/2.3.2"
dinc_env["BABEL_DATADIR"] = "/dinc/share/openbabel/2.3.2"

import numpy as np
import nglview
import glob
import mdtraj as md
from matplotlib.colors import to_hex
import matplotlib as mpl

import pandas as pd
import seaborn as sns
from scipy import stats, spatial
import matplotlib.pyplot as plt
import pickle

#def run_command(command):
#    process = subprocess.Popen(command.split(), env=dinc_env, stdout=subprocess.PIPE)
#    while True:
#        output = process.stdout.readline()
#        if output == '' and process.poll() is not None:
#            break
#        if output:
#            print(output.strip())
#    rc = process.poll()
#    return rc
 
##############
#
#  STEP 1: Input Processing
#

def model_hla(allele_name, num_models=2):
    """
    Run MODELLER to create an HLA structure

    Runs MODELLER to create an HLA structure of the input allele. The sequence of the alpha chain of the allele is downloaded.
    Then a template PDB of a different HLA structure from the same supertype is downloaded.

    Parameters:
    allele_name (str or (str,str)): Either allele name with format like "HLA-A*24:02" or a pair of strings, where the first is the location
    of the fasta file with the alpha chain sequence and the second is the template pdb file (ex. allele_name=(alpha.fasta, template.pdb))
    num_models (int): Number of models to generate with MODELLER

    Returns:
    str: Name of the HLA structure in PDB format

    """

    import model_receptor

    if allele_name[:4] != "HLA-": # assume giving fasta and pdb template

        alpha_chain_seq = allele_name[0]
        pdb_template = allele_name[1]
        alpha_chain_seq_reformat = alpha_chain_seq
        pdb_template_reformat = pdb_template

        if alpha_chain_seq[-6:] != ".fasta": # trying to use allele name to pull sequence
            if "*" not in alpha_chain_seq: alpha_chain_seq_reformat = alpha_chain_seq_reformat[:5] + "*" + alpha_chain_seq_reformat[5:]
            if ":" not in alpha_chain_seq: allele_name_reformatted = alpha_chain_seq_reformat[:-2] + ":" + alpha_chain_seq_reformat[-2:]
        if pdb_template[-4:] != ".pdb": # trying to use allele name to imply supertype
            if "*" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:5] + "*" + pdb_template_reformat[5:]
            if ":" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:-2] + ":" + pdb_template_reformat[-2:]

        model_receptor.main([alpha_chain_seq_reformat, pdb_template_reformat, "-n", str(num_models)])
        return "best_model.pdb"

    allele_name_reformatted = allele_name
    if "*" not in allele_name: allele_name_reformatted = allele_name_reformatted[:5] + "*" + allele_name_reformatted[5:]
    if ":" not in allele_name: allele_name_reformatted = allele_name_reformatted[:-2] + ":" + allele_name_reformatted[-2:]
    hla_allele_name = allele_name_reformatted.replace("*", "")
    hla_allele_name = hla_allele_name.replace(":", "")
    hla_allele = hla_allele_name + ".pdb"

    if os.path.exists(hla_allele):
        print("Already found " + hla_allele)
        return hla_allele
    
    model_receptor.main([allele_name_reformatted, allele_name_reformatted, "-n", str(num_models)])

    call(["mv best_model.pdb " + hla_allele], shell=True)
    return hla_allele

def download_pHLA_pdb(pdbid):
    """
    Download PDB of a pHLA

    Downloads PDB of a pHLA where the chains are relabeled such that the alpha chain is labeled "A",
    the beta-2 microglobulin chain is labeled "B",
    and the peptide is labeled "C"

    Parameters:
    pdbid (str): 4-letter PDB code

    Returns:
    str: Name of the pHLA structure in PDB format

    """

    if not os.path.exists(pdbid + ".pdb"): get_pMHC_pdb.main([pdbid])
    else: print("Found " + pdbid)
    return pdbid + ".pdb"


## this function doesn't work yet
#def split_pdb(filename, prefix, local_folder):
#    # split a pdb file of a ligand-receptor complex into two files
#    ligand = open(path.join(local_folder, prefix+"_ligand.pdb"), "w+")
#    receptor = open(path.join(local_folder, prefix+"_receptor.pdb"), "w+")
#
#    curr_file = receptor
#    for line in fileinput.input(path.join(local_folder, filename)):
#        if line.startswith("TER"):
#            curr_file.write(line,)
#            curr_file = ligand
#        else:
#            curr_file.write(line,)
#    fileinput.close()


###############
#
#  STEP 2: Peptide docking
#

def dock(peptide, hla, useOpenMM=False, useGPU=False):
    """
    Run APE-Gen

    Runs APE-Gen for a given peptide sequence and HLA structure. 

    Parameters:
    peptide (str): Sequence of peptide to dock
    hla (str): Location of HLA structure to dock peptide
    useOpenMM (bool): Whether to use do an extra minimization step using OpenMM on the whole ensemble
    useGPU (bool): Whether to use the GPU when performing extra minimization step. Currently not working within Docker image.

    Returns:
    str: Location of best scoring structure that has been minimized with OpenMM

    """
   
    best_scoring_conf = os.getcwd() + "/0/openmm_min_energy_system.pdb"
    if os.path.exists(best_scoring_conf): return best_scoring_conf
 
    args = [peptide, hla]
    if useOpenMM: args.append("-o")
    if useGPU: args.append("--use_gpu")
    APE_Gen.main(args)

    if not useOpenMM:
        print("Minimizing best scoring conf")
        call(["pdbfixer 0/min_energy_system.pdb --output=0/min_energy_system.pdb"], shell=True)
        APE_Gen.minimizeConf("0/min_energy_system.pdb", "0/openmm_min_energy_system.pdb")

    return best_scoring_conf


def set_dinc_params(params):
    """
     params: dict where keys are params and values are the new values to set the param to
             only contains params that should be changed
    """

    # copy params.yml file
    call(["cp /dinc/example/defaults.yml ./params.yml"], shell=True)
    call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason

    # change params
    for key in params.keys():
        command = "sed -i '/"+key+":/ c\\"+key+": "+params[key]+"' params.yml"
        call([command], shell=True)

    return
    # THIS PRODUCES AN ERROR FOR SOME REASON
    replaced = False
    for line in fileinput.input("params.yml", inplace=True):
        replaced = False
        for key in params.keys():
            if line.startswith(key):
                print(key+": "+params[key],)
                replaced = True
                break
        if not replaced:
            print(line,)
    fileinput.close()

def dock_with_DINC(ligand, receptor, output_file, params=False):
    """
    Run DINC

    Runs DINC for a given ligand and receptor 

    Parameters:
    ligand (str): ligand to dock
    receptor (str): Location of HLA structure to dock ligand
    output_file (bool): Name of output file
    params (bool): Set to true if using own params.yml file

    Returns:
    str: Location of best scoring structure that has been minimized with OpenMM

    """

    #call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
    call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason

    #call(["sed -i \"s/job_type: crossdocking/job_type: redocking/g\" params.yml"], shell=True)

    # dock peptide using DINC
    if params:
        command = "/dinc/dinc -l " + ligand + " -r " + receptor + " -p params.yml"
    else:
        command = "/dinc/dinc -l " + ligand + " -r " + receptor

    p = subprocess.Popen(command.split(), env=dinc_env)
    p.wait()

    call(["cp dincresults.txt " + output_file], shell=True)
        

###############
#
#  STEP 3: Data analysis
#

def rescore_complex(path_to_complex, func):
    """
    Rescore using DINC wrapper

    Rescore using DINC wrapper. Calls MGLTools to prepare ligand and receptor

    Parameters:
    path_to_comlex (str): path to the pdb file containing complex
    func (str): smina function flag, can take values: vina, AD4, vinardo

    Returns:
    float: Scoring according to "func"

    """

    local_folder = "temp"
    call(["mkdir -p " + local_folder], shell=True)
    os.chdir(local_folder)
    call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
    call(["sed -i \"s/rescoring: Vina/rescoring: " + func + "/g\" params.yml"], shell=True)
    call(["sed -i \"s/job_type: crossdocking/job_type: scoring/g\" params.yml"], shell=True)

    call(["cp " + path_to_complex + " complex.pdb"], shell=True)
    call(["grep \"[A-Z] C  \" complex.pdb > ligand.pdb"], shell=True)
    call(["cp complex.pdb receptor.pdb"], shell=True)
    call(["sed -i \"/[A-Z] C/d\" receptor.pdb"], shell=True)
    ligand = "ligand.pdb"
    receptor = "receptor.pdb"
    command = "/dinc/dinc -l ligand.pdb -r receptor.pdb -p params.yml"
    
    # dinc needs this file in the directory for some reason
    call(["cp /dinc/example/defaults.yml ."], shell=True)
    p = subprocess.Popen(command.split(), env=dinc_env)
    p.wait()

    for line in open("dincresults.txt").readlines():
        if line.startswith("Binding Affinity:"):
            score = float(line.split()[2])
            break
    
    os.chdir("..")
    call(["rm -r " + local_folder], shell=True)
    return score

def rescore_complex_simple_smina(path_to_complex, func):
    """
    Rescore using SMINA

    Rescore using SMINA. This version is faster than arena.rescore_complex since SMINA handles I/O more efficiently.
    Note: pdb must have A,B corresponding to alpha and beta chain of receptor and C to ligand

    Parameters:
    path_to_comlex (str): path to the pdb file containing complex
    func (str): smina function flag, can take values: vina, vinardo, ad4_scoring

    Returns:
    float: Scoring according to "func"

    """
    #check if extra space is needed - APE-Gen files, not OPENMM minimized
    with open(path_to_complex) as f:
        if 'OPENMM' in f.read():
            fixAPEG = False
        else:
            fixAPEG = True  
  
    call (["awk \'{if ($5==\"A\") print $0}\' "+path_to_complex+" > tmp_receptor1.pdb"], shell=True)

    call (["awk \'{if ($5==\"B\") print $0}\' "+path_to_complex+" >> tmp_receptor1.pdb"], shell=True)
    call (["awk \'{if ($5==\"C\") print $0}\' "+path_to_complex+" > tmp_ligand1.pdb"], shell=True)
    
    #fix APEGen file by adding extra space
    if fixAPEG:
        call (["sed -i \"s/          /           /g\" tmp_ligand1.pdb"], shell=True)
        call (["sed -i \"s/          /           /g\" tmp_receptor1.pdb"], shell=True)

    call (["smina --receptor tmp_receptor1.pdb --ligand tmp_ligand1.pdb --scoring "+func+" --score_only > tmp_out1.pdb"], shell=True)
    
    val = 0
    with open("tmp_out1.pdb") as f1:
        for line in f1:
            if "Affinity:" in line:
                val = line.split()[1]
                break

    call (["rm tmp_receptor1.pdb"], shell=True)
    call (["rm tmp_ligand1.pdb"], shell=True)
    call (["rm tmp_out1.pdb"], shell=True)
    return float(val)
   
    


def rescore(ligands, receptor, func):
    """
     ligands: list of all files with ligand conformations to rescore
     receptor: single file with receptor for rescoring
     func: rescoring function
    """

    set_dinc_params({"rescoring":func, "job_type":"scoring"})
    
    for l in ligands:
        dock_with_DINC(l, receptor, l+"results", params=True)
        call(["cat %sresults >> rescoreresults.txt" % l], shell=True)

def filter_by_terminal_positions(filenames, receptor, cutoff_N=12, cutoff_C=12):
        """
        Filter the conformations by the distance between the N terminus/C terminus
        and specific residues of the alpha helices
        This filters out conformations that are flipped in the binding site, 
        filtering out conformations that are above either cutoff

        filenames: (.pdb) of ligand conformations
        receptor: (.pdb) file of receptor conformation
        cutoff_N: distance cutoff between CA of res 63 of the receptor to
                          the CA of N-terminus of the ligand
        cutoff_C: distance cutoff between res 146 and C-term

        return list of filenames that fit the criteria
        """

        final_filenames = []

        # load receptor atoms, residues are numbered starting at 0
        N_ref_atom = md.load(receptor).top.select("resid 62 and name CA")
        N_ref = md.load(receptor, atom_indices=N_ref_atom)

        C_ref_atom = md.load(receptor).top.select("resid 145 and name CA")
        C_ref = md.load(receptor, atom_indices=C_ref_atom)

        # get length of peptide (load without H atoms)
        peptide_length = md.load(filenames[0], atom_indices = md.load(filenames[0]).top.select("not name H")).top.n_residues
        select_str = "resid {} and name CA".format(peptide_length - 1)

        for fname in filenames:
                N_term = md.load(fname, atom_indices = md.load(fname).top.select("resid 0 and name CA"))
                C_term = md.load(fname, atom_indices = md.load(fname).top.select(select_str))

                # distance calculations
                delta_N = N_ref.xyz - N_term.xyz
                delta_C = C_ref.xyz - C_term.xyz

                rmsd_N = np.sqrt(sum(sum((delta_N * delta_N).T)))
                rmsd_C = np.sqrt(sum(sum((delta_C * delta_C).T)))

                if rmsd_N*10 <= cutoff_N and rmsd_C*10 <= cutoff_C:
                        final_filenames.append(fname)

        return final_filenames


def filter_by_rmsd(filenames, reference, cutoff=4):
    """
     Filter the conformations by backbone rmsd
     given filenames (.pdb) of ligand conformations and filename (.pdb) of reference ligand
     and rmsd cutoff (default 4 angstrom)
     return list of filenames with rmsd <= cutoff
    """
    final_filenames = []
    
    # load backbone only of reference ligand
    ref_pep_atoms = md.load(reference).top.select("backbone")
    ref_pdb = md.load(reference, atom_indices = ref_pep_atoms)

    # filter conformations
    for fname in filenames:
        conf = md.load(fname, atom_indices = md.load(fname).top.select("backbone"))

        # rmsd calculation
        delta = conf.xyz - ref_pdb.xyz
        rmsd = np.sqrt(sum(sum((delta * delta).T)) / len(delta))

        if rmsd <= cutoff:
            final_filenames.append(fname)

    return final_filenames

def combine_flex_and_rigid_receptor(flex_filename, receptor_filename, output):
    """
    Combine the flexible residues with the rest of the receptor in one file
    given filenames for the entire receptor (.pdb or .pdbqt) and just the flexible residues (.pdb), 
    and the name of the combined file (.pdb)
    """
    
    call(["cp "+receptor_filename+" "+output], shell=True)
    with open(flex_filename) as flex:
        for flex_line in flex.readlines():
            for line in fileinput.input(output, inplace=True):
                if flex_line[12:27] == line[12:27]:
                    print(flex_line,) 
                else:
                    print(line,)
    fileinput.close()


def visualize(structures):
    """
    Visualize set of structures

    Runs NGLView to visualize structures 

    Parameters:
    structures (str or list of str): Location of PDB files to visualize

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """

    widget = nglview.NGLWidget()
    widget.clear_representations()

    if isinstance(structures, str):

        s = md.load(structures)
        widget.add_trajectory(s)

    else:
        for s in structures:
            widget.add_trajectory(md.load(s))

    return widget

def visualize_ensemble(path_to_ensemble="./"):
    """
    Visualize ensemble produced by APE-Gen

    Runs NGLView to visualize the ensemble of conformations outputed by APE-Gen

    Parameters:
    path_to_ensemble (str): Location of PDB files of the ensemble

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """

    widget = nglview.NGLWidget()
    widget.clear_representations()

    ensemble = glob.glob(path_to_ensemble + "0/full_system_confs/*.pdb")
    
    i = 0
    for conf_str in ensemble:
        struct = md.load(conf_str, atom_indices=md.load(conf_str).top.select("chainid == 2"))
        widget.add_trajectory(struct)
        widget.add_licorice(component=i)
        widget.remove_cartoon(component=i)
        i = i + 1

    return widget

def visualize_separate(peptides, receptor = None, colors = None):
    """
    Visualize ligand and receptor separately

    Runs NGLView to visualize peptide conformations as well as the receptor (optional)

    Parameters:
    peptides (list of str): Location of PDB files of the peptides to visualize
    receptor (str): Location of the receptor to plot using surface mode
    colors (list of str): list of colors to use for the peptide conformations

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """
    call(["jupyter-nbextension enable --py --sys-prefix widgetsnbextension"], shell=True)
    call(["jupyter-nbextension enable nglview --py --sys-prefix"], shell=True)

    import nglview
    import glob
    import mdtraj as md
    from matplotlib.colors import to_hex
    import matplotlib as mpl
    from matplotlib.colors import to_hex
    
    widget = nglview.NGLWidget()
    widget.clear_representations()

    i = 0
    for p in peptides:
        struct = md.load(quote(p))
        widget.add_trajectory(struct)
        widget.add_licorice(component=i)
        widget.remove_cartoon(component=i)
        if colors:
            widget.update_licorice(color=colors[i], component = i)
        i = i + 1

        
    if receptor:
        struct = md.load(receptor)
        widget.add_trajectory(struct)
        widget.add_surface(component=i)
        widget.remove_cartoon(component=i)

        
    return widget

Functions

def combine_flex_and_rigid_receptor(flex_filename, receptor_filename, output)

Combine the flexible residues with the rest of the receptor in one file given filenames for the entire receptor (.pdb or .pdbqt) and just the flexible residues (.pdb), and the name of the combined file (.pdb)

Expand source code
def combine_flex_and_rigid_receptor(flex_filename, receptor_filename, output):
    """
    Combine the flexible residues with the rest of the receptor in one file
    given filenames for the entire receptor (.pdb or .pdbqt) and just the flexible residues (.pdb), 
    and the name of the combined file (.pdb)
    """
    
    call(["cp "+receptor_filename+" "+output], shell=True)
    with open(flex_filename) as flex:
        for flex_line in flex.readlines():
            for line in fileinput.input(output, inplace=True):
                if flex_line[12:27] == line[12:27]:
                    print(flex_line,) 
                else:
                    print(line,)
    fileinput.close()
def dock(peptide, hla, useOpenMM=False, useGPU=False)

Run APE-Gen

Runs APE-Gen for a given peptide sequence and HLA structure.

Parameters: peptide (str): Sequence of peptide to dock hla (str): Location of HLA structure to dock peptide useOpenMM (bool): Whether to use do an extra minimization step using OpenMM on the whole ensemble useGPU (bool): Whether to use the GPU when performing extra minimization step. Currently not working within Docker image.

Returns: str: Location of best scoring structure that has been minimized with OpenMM

Expand source code
def dock(peptide, hla, useOpenMM=False, useGPU=False):
    """
    Run APE-Gen

    Runs APE-Gen for a given peptide sequence and HLA structure. 

    Parameters:
    peptide (str): Sequence of peptide to dock
    hla (str): Location of HLA structure to dock peptide
    useOpenMM (bool): Whether to use do an extra minimization step using OpenMM on the whole ensemble
    useGPU (bool): Whether to use the GPU when performing extra minimization step. Currently not working within Docker image.

    Returns:
    str: Location of best scoring structure that has been minimized with OpenMM

    """
   
    best_scoring_conf = os.getcwd() + "/0/openmm_min_energy_system.pdb"
    if os.path.exists(best_scoring_conf): return best_scoring_conf
 
    args = [peptide, hla]
    if useOpenMM: args.append("-o")
    if useGPU: args.append("--use_gpu")
    APE_Gen.main(args)

    if not useOpenMM:
        print("Minimizing best scoring conf")
        call(["pdbfixer 0/min_energy_system.pdb --output=0/min_energy_system.pdb"], shell=True)
        APE_Gen.minimizeConf("0/min_energy_system.pdb", "0/openmm_min_energy_system.pdb")

    return best_scoring_conf
def dock_with_DINC(ligand, receptor, output_file, params=False)

Run DINC

Runs DINC for a given ligand and receptor

Parameters: ligand (str): ligand to dock receptor (str): Location of HLA structure to dock ligand output_file (bool): Name of output file params (bool): Set to true if using own params.yml file

Returns: str: Location of best scoring structure that has been minimized with OpenMM

Expand source code
def dock_with_DINC(ligand, receptor, output_file, params=False):
    """
    Run DINC

    Runs DINC for a given ligand and receptor 

    Parameters:
    ligand (str): ligand to dock
    receptor (str): Location of HLA structure to dock ligand
    output_file (bool): Name of output file
    params (bool): Set to true if using own params.yml file

    Returns:
    str: Location of best scoring structure that has been minimized with OpenMM

    """

    #call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
    call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason

    #call(["sed -i \"s/job_type: crossdocking/job_type: redocking/g\" params.yml"], shell=True)

    # dock peptide using DINC
    if params:
        command = "/dinc/dinc -l " + ligand + " -r " + receptor + " -p params.yml"
    else:
        command = "/dinc/dinc -l " + ligand + " -r " + receptor

    p = subprocess.Popen(command.split(), env=dinc_env)
    p.wait()

    call(["cp dincresults.txt " + output_file], shell=True)
def download_pHLA_pdb(pdbid)

Download PDB of a pHLA

Downloads PDB of a pHLA where the chains are relabeled such that the alpha chain is labeled "A", the beta-2 microglobulin chain is labeled "B", and the peptide is labeled "C"

Parameters: pdbid (str): 4-letter PDB code

Returns: str: Name of the pHLA structure in PDB format

Expand source code
def download_pHLA_pdb(pdbid):
    """
    Download PDB of a pHLA

    Downloads PDB of a pHLA where the chains are relabeled such that the alpha chain is labeled "A",
    the beta-2 microglobulin chain is labeled "B",
    and the peptide is labeled "C"

    Parameters:
    pdbid (str): 4-letter PDB code

    Returns:
    str: Name of the pHLA structure in PDB format

    """

    if not os.path.exists(pdbid + ".pdb"): get_pMHC_pdb.main([pdbid])
    else: print("Found " + pdbid)
    return pdbid + ".pdb"
def filter_by_rmsd(filenames, reference, cutoff=4)

Filter the conformations by backbone rmsd given filenames (.pdb) of ligand conformations and filename (.pdb) of reference ligand and rmsd cutoff (default 4 angstrom) return list of filenames with rmsd <= cutoff

Expand source code
def filter_by_rmsd(filenames, reference, cutoff=4):
    """
     Filter the conformations by backbone rmsd
     given filenames (.pdb) of ligand conformations and filename (.pdb) of reference ligand
     and rmsd cutoff (default 4 angstrom)
     return list of filenames with rmsd <= cutoff
    """
    final_filenames = []
    
    # load backbone only of reference ligand
    ref_pep_atoms = md.load(reference).top.select("backbone")
    ref_pdb = md.load(reference, atom_indices = ref_pep_atoms)

    # filter conformations
    for fname in filenames:
        conf = md.load(fname, atom_indices = md.load(fname).top.select("backbone"))

        # rmsd calculation
        delta = conf.xyz - ref_pdb.xyz
        rmsd = np.sqrt(sum(sum((delta * delta).T)) / len(delta))

        if rmsd <= cutoff:
            final_filenames.append(fname)

    return final_filenames
def filter_by_terminal_positions(filenames, receptor, cutoff_N=12, cutoff_C=12)

Filter the conformations by the distance between the N terminus/C terminus and specific residues of the alpha helices This filters out conformations that are flipped in the binding site, filtering out conformations that are above either cutoff

filenames: (.pdb) of ligand conformations receptor: (.pdb) file of receptor conformation cutoff_N: distance cutoff between CA of res 63 of the receptor to the CA of N-terminus of the ligand cutoff_C: distance cutoff between res 146 and C-term

return list of filenames that fit the criteria

Expand source code
def filter_by_terminal_positions(filenames, receptor, cutoff_N=12, cutoff_C=12):
        """
        Filter the conformations by the distance between the N terminus/C terminus
        and specific residues of the alpha helices
        This filters out conformations that are flipped in the binding site, 
        filtering out conformations that are above either cutoff

        filenames: (.pdb) of ligand conformations
        receptor: (.pdb) file of receptor conformation
        cutoff_N: distance cutoff between CA of res 63 of the receptor to
                          the CA of N-terminus of the ligand
        cutoff_C: distance cutoff between res 146 and C-term

        return list of filenames that fit the criteria
        """

        final_filenames = []

        # load receptor atoms, residues are numbered starting at 0
        N_ref_atom = md.load(receptor).top.select("resid 62 and name CA")
        N_ref = md.load(receptor, atom_indices=N_ref_atom)

        C_ref_atom = md.load(receptor).top.select("resid 145 and name CA")
        C_ref = md.load(receptor, atom_indices=C_ref_atom)

        # get length of peptide (load without H atoms)
        peptide_length = md.load(filenames[0], atom_indices = md.load(filenames[0]).top.select("not name H")).top.n_residues
        select_str = "resid {} and name CA".format(peptide_length - 1)

        for fname in filenames:
                N_term = md.load(fname, atom_indices = md.load(fname).top.select("resid 0 and name CA"))
                C_term = md.load(fname, atom_indices = md.load(fname).top.select(select_str))

                # distance calculations
                delta_N = N_ref.xyz - N_term.xyz
                delta_C = C_ref.xyz - C_term.xyz

                rmsd_N = np.sqrt(sum(sum((delta_N * delta_N).T)))
                rmsd_C = np.sqrt(sum(sum((delta_C * delta_C).T)))

                if rmsd_N*10 <= cutoff_N and rmsd_C*10 <= cutoff_C:
                        final_filenames.append(fname)

        return final_filenames
def model_hla(allele_name, num_models=2)

Run MODELLER to create an HLA structure

Runs MODELLER to create an HLA structure of the input allele. The sequence of the alpha chain of the allele is downloaded. Then a template PDB of a different HLA structure from the same supertype is downloaded.

Parameters: allele_name (str or (str,str)): Either allele name with format like "HLA-A*24:02" or a pair of strings, where the first is the location of the fasta file with the alpha chain sequence and the second is the template pdb file (ex. allele_name=(alpha.fasta, template.pdb)) num_models (int): Number of models to generate with MODELLER

Returns: str: Name of the HLA structure in PDB format

Expand source code
def model_hla(allele_name, num_models=2):
    """
    Run MODELLER to create an HLA structure

    Runs MODELLER to create an HLA structure of the input allele. The sequence of the alpha chain of the allele is downloaded.
    Then a template PDB of a different HLA structure from the same supertype is downloaded.

    Parameters:
    allele_name (str or (str,str)): Either allele name with format like "HLA-A*24:02" or a pair of strings, where the first is the location
    of the fasta file with the alpha chain sequence and the second is the template pdb file (ex. allele_name=(alpha.fasta, template.pdb))
    num_models (int): Number of models to generate with MODELLER

    Returns:
    str: Name of the HLA structure in PDB format

    """

    import model_receptor

    if allele_name[:4] != "HLA-": # assume giving fasta and pdb template

        alpha_chain_seq = allele_name[0]
        pdb_template = allele_name[1]
        alpha_chain_seq_reformat = alpha_chain_seq
        pdb_template_reformat = pdb_template

        if alpha_chain_seq[-6:] != ".fasta": # trying to use allele name to pull sequence
            if "*" not in alpha_chain_seq: alpha_chain_seq_reformat = alpha_chain_seq_reformat[:5] + "*" + alpha_chain_seq_reformat[5:]
            if ":" not in alpha_chain_seq: allele_name_reformatted = alpha_chain_seq_reformat[:-2] + ":" + alpha_chain_seq_reformat[-2:]
        if pdb_template[-4:] != ".pdb": # trying to use allele name to imply supertype
            if "*" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:5] + "*" + pdb_template_reformat[5:]
            if ":" not in pdb_template: pdb_template_reformat = pdb_template_reformat[:-2] + ":" + pdb_template_reformat[-2:]

        model_receptor.main([alpha_chain_seq_reformat, pdb_template_reformat, "-n", str(num_models)])
        return "best_model.pdb"

    allele_name_reformatted = allele_name
    if "*" not in allele_name: allele_name_reformatted = allele_name_reformatted[:5] + "*" + allele_name_reformatted[5:]
    if ":" not in allele_name: allele_name_reformatted = allele_name_reformatted[:-2] + ":" + allele_name_reformatted[-2:]
    hla_allele_name = allele_name_reformatted.replace("*", "")
    hla_allele_name = hla_allele_name.replace(":", "")
    hla_allele = hla_allele_name + ".pdb"

    if os.path.exists(hla_allele):
        print("Already found " + hla_allele)
        return hla_allele
    
    model_receptor.main([allele_name_reformatted, allele_name_reformatted, "-n", str(num_models)])

    call(["mv best_model.pdb " + hla_allele], shell=True)
    return hla_allele
def rescore(ligands, receptor, func)

ligands: list of all files with ligand conformations to rescore receptor: single file with receptor for rescoring func: rescoring function

Expand source code
def rescore(ligands, receptor, func):
    """
     ligands: list of all files with ligand conformations to rescore
     receptor: single file with receptor for rescoring
     func: rescoring function
    """

    set_dinc_params({"rescoring":func, "job_type":"scoring"})
    
    for l in ligands:
        dock_with_DINC(l, receptor, l+"results", params=True)
        call(["cat %sresults >> rescoreresults.txt" % l], shell=True)
def rescore_complex(path_to_complex, func)

Rescore using DINC wrapper

Rescore using DINC wrapper. Calls MGLTools to prepare ligand and receptor

Parameters: path_to_comlex (str): path to the pdb file containing complex func (str): smina function flag, can take values: vina, AD4, vinardo

Returns: float: Scoring according to "func"

Expand source code
def rescore_complex(path_to_complex, func):
    """
    Rescore using DINC wrapper

    Rescore using DINC wrapper. Calls MGLTools to prepare ligand and receptor

    Parameters:
    path_to_comlex (str): path to the pdb file containing complex
    func (str): smina function flag, can take values: vina, AD4, vinardo

    Returns:
    float: Scoring according to "func"

    """

    local_folder = "temp"
    call(["mkdir -p " + local_folder], shell=True)
    os.chdir(local_folder)
    call(["cp /dinc/example/defaults.yml params.yml"], shell=True)
    call(["sed -i \"s/rescoring: Vina/rescoring: " + func + "/g\" params.yml"], shell=True)
    call(["sed -i \"s/job_type: crossdocking/job_type: scoring/g\" params.yml"], shell=True)

    call(["cp " + path_to_complex + " complex.pdb"], shell=True)
    call(["grep \"[A-Z] C  \" complex.pdb > ligand.pdb"], shell=True)
    call(["cp complex.pdb receptor.pdb"], shell=True)
    call(["sed -i \"/[A-Z] C/d\" receptor.pdb"], shell=True)
    ligand = "ligand.pdb"
    receptor = "receptor.pdb"
    command = "/dinc/dinc -l ligand.pdb -r receptor.pdb -p params.yml"
    
    # dinc needs this file in the directory for some reason
    call(["cp /dinc/example/defaults.yml ."], shell=True)
    p = subprocess.Popen(command.split(), env=dinc_env)
    p.wait()

    for line in open("dincresults.txt").readlines():
        if line.startswith("Binding Affinity:"):
            score = float(line.split()[2])
            break
    
    os.chdir("..")
    call(["rm -r " + local_folder], shell=True)
    return score
def rescore_complex_simple_smina(path_to_complex, func)

Rescore using SMINA

Rescore using SMINA. This version is faster than arena.rescore_complex since SMINA handles I/O more efficiently. Note: pdb must have A,B corresponding to alpha and beta chain of receptor and C to ligand

Parameters: path_to_comlex (str): path to the pdb file containing complex func (str): smina function flag, can take values: vina, vinardo, ad4_scoring

Returns: float: Scoring according to "func"

Expand source code
def rescore_complex_simple_smina(path_to_complex, func):
    """
    Rescore using SMINA

    Rescore using SMINA. This version is faster than arena.rescore_complex since SMINA handles I/O more efficiently.
    Note: pdb must have A,B corresponding to alpha and beta chain of receptor and C to ligand

    Parameters:
    path_to_comlex (str): path to the pdb file containing complex
    func (str): smina function flag, can take values: vina, vinardo, ad4_scoring

    Returns:
    float: Scoring according to "func"

    """
    #check if extra space is needed - APE-Gen files, not OPENMM minimized
    with open(path_to_complex) as f:
        if 'OPENMM' in f.read():
            fixAPEG = False
        else:
            fixAPEG = True  
  
    call (["awk \'{if ($5==\"A\") print $0}\' "+path_to_complex+" > tmp_receptor1.pdb"], shell=True)

    call (["awk \'{if ($5==\"B\") print $0}\' "+path_to_complex+" >> tmp_receptor1.pdb"], shell=True)
    call (["awk \'{if ($5==\"C\") print $0}\' "+path_to_complex+" > tmp_ligand1.pdb"], shell=True)
    
    #fix APEGen file by adding extra space
    if fixAPEG:
        call (["sed -i \"s/          /           /g\" tmp_ligand1.pdb"], shell=True)
        call (["sed -i \"s/          /           /g\" tmp_receptor1.pdb"], shell=True)

    call (["smina --receptor tmp_receptor1.pdb --ligand tmp_ligand1.pdb --scoring "+func+" --score_only > tmp_out1.pdb"], shell=True)
    
    val = 0
    with open("tmp_out1.pdb") as f1:
        for line in f1:
            if "Affinity:" in line:
                val = line.split()[1]
                break

    call (["rm tmp_receptor1.pdb"], shell=True)
    call (["rm tmp_ligand1.pdb"], shell=True)
    call (["rm tmp_out1.pdb"], shell=True)
    return float(val)
def set_dinc_params(params)

params: dict where keys are params and values are the new values to set the param to only contains params that should be changed

Expand source code
def set_dinc_params(params):
    """
     params: dict where keys are params and values are the new values to set the param to
             only contains params that should be changed
    """

    # copy params.yml file
    call(["cp /dinc/example/defaults.yml ./params.yml"], shell=True)
    call(["cp /dinc/example/defaults.yml ."], shell=True) # dinc binary needs this for some reason

    # change params
    for key in params.keys():
        command = "sed -i '/"+key+":/ c\\"+key+": "+params[key]+"' params.yml"
        call([command], shell=True)

    return
    # THIS PRODUCES AN ERROR FOR SOME REASON
    replaced = False
    for line in fileinput.input("params.yml", inplace=True):
        replaced = False
        for key in params.keys():
            if line.startswith(key):
                print(key+": "+params[key],)
                replaced = True
                break
        if not replaced:
            print(line,)
    fileinput.close()
def visualize(structures)

Visualize set of structures

Runs NGLView to visualize structures

Parameters: structures (str or list of str): Location of PDB files to visualize

Returns: NGLView Obj: Widget for Jupyter notebook visualization

Expand source code
def visualize(structures):
    """
    Visualize set of structures

    Runs NGLView to visualize structures 

    Parameters:
    structures (str or list of str): Location of PDB files to visualize

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """

    widget = nglview.NGLWidget()
    widget.clear_representations()

    if isinstance(structures, str):

        s = md.load(structures)
        widget.add_trajectory(s)

    else:
        for s in structures:
            widget.add_trajectory(md.load(s))

    return widget
def visualize_ensemble(path_to_ensemble='./')

Visualize ensemble produced by APE-Gen

Runs NGLView to visualize the ensemble of conformations outputed by APE-Gen

Parameters: path_to_ensemble (str): Location of PDB files of the ensemble

Returns: NGLView Obj: Widget for Jupyter notebook visualization

Expand source code
def visualize_ensemble(path_to_ensemble="./"):
    """
    Visualize ensemble produced by APE-Gen

    Runs NGLView to visualize the ensemble of conformations outputed by APE-Gen

    Parameters:
    path_to_ensemble (str): Location of PDB files of the ensemble

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """

    widget = nglview.NGLWidget()
    widget.clear_representations()

    ensemble = glob.glob(path_to_ensemble + "0/full_system_confs/*.pdb")
    
    i = 0
    for conf_str in ensemble:
        struct = md.load(conf_str, atom_indices=md.load(conf_str).top.select("chainid == 2"))
        widget.add_trajectory(struct)
        widget.add_licorice(component=i)
        widget.remove_cartoon(component=i)
        i = i + 1

    return widget
def visualize_separate(peptides, receptor=None, colors=None)

Visualize ligand and receptor separately

Runs NGLView to visualize peptide conformations as well as the receptor (optional)

Parameters: peptides (list of str): Location of PDB files of the peptides to visualize receptor (str): Location of the receptor to plot using surface mode colors (list of str): list of colors to use for the peptide conformations

Returns: NGLView Obj: Widget for Jupyter notebook visualization

Expand source code
def visualize_separate(peptides, receptor = None, colors = None):
    """
    Visualize ligand and receptor separately

    Runs NGLView to visualize peptide conformations as well as the receptor (optional)

    Parameters:
    peptides (list of str): Location of PDB files of the peptides to visualize
    receptor (str): Location of the receptor to plot using surface mode
    colors (list of str): list of colors to use for the peptide conformations

    Returns:
    NGLView Obj: Widget for Jupyter notebook visualization

    """
    call(["jupyter-nbextension enable --py --sys-prefix widgetsnbextension"], shell=True)
    call(["jupyter-nbextension enable nglview --py --sys-prefix"], shell=True)

    import nglview
    import glob
    import mdtraj as md
    from matplotlib.colors import to_hex
    import matplotlib as mpl
    from matplotlib.colors import to_hex
    
    widget = nglview.NGLWidget()
    widget.clear_representations()

    i = 0
    for p in peptides:
        struct = md.load(quote(p))
        widget.add_trajectory(struct)
        widget.add_licorice(component=i)
        widget.remove_cartoon(component=i)
        if colors:
            widget.update_licorice(color=colors[i], component = i)
        i = i + 1

        
    if receptor:
        struct = md.load(receptor)
        widget.add_trajectory(struct)
        widget.add_surface(component=i)
        widget.remove_cartoon(component=i)

        
    return widget