Welcome to PBxplore’s documentation!¶
PBxplore is a suite of tools dedicated to Protein Block analysis.
Protein Blocks (PBs) are structural prototypes defined by de Brevern et al [1]. The 3-dimensional local structure of a protein backbone can be modelized as an 1-dimensional sequence of PBs. In principle, any conformation of any amino acid could be represented by one of the sixteen available Protein Blocks.

Schematic representation of the sixteen protein blocks, labeled from a to p (Creative commons 4.0 CC-BY).
PBxplore provides both a Python library and command-line tools (Utilization). For some features (reading trajectory, plots), PBxplore requires some optional dependencies (Installation). Basically, PBxplore can:
- assign PBs from either a PDB or either a molecular dynamics trajectory (PB assignation).
- use analysis tools to perform statistical analysis on PBs (Statistics).
- use analysis tools to study protein flexibility and deformability (Analyzis).
Protein Blocks¶
Protein Blocks (PBs) are structural prototypes defined by de Brevern et al [1]. The 3-dimensional local structure of a protein backbone can be modelized as an 1-dimensional sequence of PBs. In principle, any conformation of any amino acid could be represented by one of the sixteen available Protein Blocks.
PBs are labeled from a to p (see Figure 1). PBs m and d can be roughly described as prototypes for alpha-helix and central beta-strand, respectively. PBs a to c primarily represent beta-strand N-caps and PBs e and f, beta-strand C-caps; PBs a to j are specific to coils, PBs k and l to alpha-helix N-caps, and PBs n to p to alpha-helix C-caps.

Figure 1. Schematic representation of the sixteen protein blocks, labeled from a to p (Creative commons CC-BY).

Figure 2. 3D representation of the barstar protein (PDB ID 1AY7, chain B) (Creative commons CC-BY).
For instance, the 3D-structure of the barstar protein represented in Figure 2 can be translated into a 1D-sequence of PBs:
ZZdddfklpcbfklmmmmmmmmnopafklgoiaklmmmmmmmmpacddddddehkllmmmmnnommmmmmmmmmmmmmnopacddddZZ
The conformations of the 89 residues of the barstar are translated into a sequence of 89 PBs. Note that “Z” corresponds to amino acids for which a PB cannot be assigned. As a matter of fact, the assignment of a given residue n requires is based on the conformations of residues n-2, n-1, n, n+1 and n+2. Therefore, no PB can be assigned to the two first (N-termini) and two last (C-termini) residues of a polypeptide chain.
Neq¶
Neq is calculated as follows:

Where fx is the probability of PB x. Neq quantifies the average number of PBs at a given position in the protein sequence. A Neq value of 1 indicates that only one type of PB is observed. On the opposite, a Neq value of 16 is equivalent to a fully random distribution of PBs.
[1] |
|
Installation¶
Although it’s not possible to cover all possible ways to install PBxplore on any operating system, we try in this document to provide few guidelines regarding PBxplore setup.
Supported Platforms¶
Currently, PBxplore run with Python >=3.6 on Linux and Mac OS X.
Dependencies¶
PBxplore requires the following libraries (which will be installed through ` pip`).
- NumPy >= 1.6.0
- NumPy is the base package for numerical computing in python.
- matplotlib [1] >= 1.4.0
- All ploting functions use the matplotlib package.
- MDAnalysis [2] >= 0.11
- We use MDAnalysis to read Gromacs molecular dynamics trajectories. Many other trajectory files are also supported, see list here.
Optionally, PBxplore can use the following packages:
Installing PBxplore¶
The most straightforward way is to use pip. It will also install the required dependencies:
$ pip install --user pbxplore
If none of the dependencies were previously installed, the installation process can take up to several minutes.
Note
The former command will install the PBxplore command-line scripts in:
$HOME/.local/bin
on Linux.~/Library/Python/X.Y/bin
on Mac OSX, whereX.Y
stands for the Python version (for instance3.8
for Python 3.8).
Do not forget to update your $PATH
environment variable to make all PBxplore tools accessible. As an example, with the Bash shell, this gives:
$ # for Linux
$ export PATH=$PATH:$HOME/.local/bin
$ # for Mac OSX
$ export PATH=$PATH:~/Library/Python/X.Y/bin
See the documentation of pip for more information. You may also want to look at virtualenv.
Upgrading PBxplore¶
Be sure to always have the latest version of PBxplore with:
$ pip install --user --upgrade pbxplore
Testing PBxplore¶
PBxplore comes with unit tests and regression tests. It requires the package pytest. You can run the tests within Python:
import pbxplore
pbxplore.test()
PBxplore for advanced users¶
You can clone PBxplore from GitHub:
$ git clone --depth 1 https://github.com/pierrepo/PBxplore.git
Once in the PBxplore
directory, we advise you to create a virtual environment:
$ pip3 install --user virtualenv
$ virtualenv -p python3 venv
$ source venv/bin/activate
You can then install the latest version of PBxplore as a Python module:
$ pip install -e .
You can also run unit tests and regression tests:
$ pip install pytest
$ pytest -v pbxplore/tests
or
$ pip install pytest
$ python setup.py test
[1] | J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science and Engineering 9 (2007), 90-95. doi:10.1109/MCSE.2007.55 |
[2] | N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011), 2319–2327. doi:10.1002/jcc.21787 |
[3] | G. E. Crooks, G. Hon, J.-M. Chandonia, and S. E. Brenner. WebLogo: A Sequence Logo Generator. Genome Research 14: 1188–90 (2004) doi:10.1101/gr.849004 |
Utilization¶
- There is 2 ways to use PBxplore:
- command-line scripts
- API
Command-line scripts¶
PBxplore is bundle with a suite of command-line tools.
Once installed, they should be available in your $PATH
.
Here the list:
- PBassign assign Protein Blocks (PBs) from a set of protein structures.
- PBcount computes the frequency of PBs at each position along the amino acid sequence.
- PBstat generates frequency and logo plots, and estimates the Neq.
PBassign¶
PBassign
assigns a PB sequence to a protein structure.
Example¶
First download data:
$ wget https://files.rcsb.org/view/3ICH.pdb
Then perform PBs assignment:
$ PBassign -p 3ICH.pdb -o 3ICH
1 PDB file(s) to process
Read 1 chain(s) in 3ICH.pdb
wrote 3ICH.PB.fasta
Content of 3ICH.PB.fasta :
>3ICH.pdb | chain A
ZZccdfbdcdddddehjbdebjcdddddfklmmmlmmmmmmmmnopnopajeopacfbdc
ehibacehiamnonopgocdfkbjbdcdfblmbccfbghiacdddebehiafkbccddfb
dcfklgokaccfbdcfbhklmmmmmmmpccdfkopafbacddfbgcddddfbacddddZZ
Note that Protein Blocs assignment is only possible for proteins (as its name suggests). As a consequence, processed PDB files must contain protein structures only (please remove any other molecule). In addition, the PDB parser implemented here is pretty straightforward. Be sure your PDB files complies with the ATOM field of the PDB format.
Usage¶
Here’s the PBassign
help text.
Usage: PBassign [options] -p file.pdb|dir [-p file2.pdb] -o output_root_name -g gro_file -x xtc_file
optional arguments:
-h, --help show this help message and exit
-p P name of a pdb file or name of a directory containing pdb
files
-o O name for results
-v, --version show program's version number and exit
other options to handle molecular dynamics trajectories:
-x X name of the topology file
-g G name of the trajectory file
-p
option¶
can be used several times. For instance:
$ wget https://files.rcsb.org/view/3ICH.pdb
$ wget https://files.rcsb.org/view/1BTA.pdb
$ wget https://files.rcsb.org/view/1AY7.pdb
$ PBassign -p 3ICH.pdb -p 1BTA.pdb -p 1AY7.pdb -o test1
3 PDB file(s) to process
Read 1 chain(s) in 3ICH.pdb
Read 1 chain(s) in 1BTA.pdb
Read 2 chain(s) in 1AY7.pdb
wrote test1.PB.fasta
All PB assignments are written in the same output file. If a PDB file contains several chains
and/or models, PBs assignments are also written in a single output file.
From the previous example, the ouput of test1.PB.fasta
is:
>3ICH.pdb | chain A
ZZccdfbdcdddddehjbdebjcdddddfklmmmlmmmmmmmmnopnopajeopacfbdc
ehibacehiamnonopgocdfkbjbdcdfblmbccfbghiacdddebehiafkbccddfb
dcfklgokaccfbdcfbhklmmmmmmmpccdfkopafbacddfbgcddddfbacddddZZ
>1BTA.pdb | chain A
ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmm
mngoilmmmmmmmmmmmmnopacdcddZZ
>1AY7.pdb | chain A
ZZbjadfklmcfklmmmmmmmmnnpaafbfkgopacehlnomaccddehjaccdddddeh
klpnbjadcdddfbehiacddfegolaccdddfkZZ
>1AY7.pdb | chain B
ZZcddfklpcbfklmmmmmmmmnopafklgoiaklmmmmmmmmpacddddddehkllmmm
mnnommmmmmmmmmmmmmnopacddddZZ
One can also use the -p
option to provide a directory containing PDB files as an input.
PBassign
will process all PDB files located in the PBdata directory:
$ wget https://files.rcsb.org/view/1AY7.pdb -P demo
$ wget https://files.rcsb.org/view/2LFU.pdb -P demo
$ wget https://files.rcsb.org/view/3ICH.pdb -P demo
$ wget https://files.rcsb.org/view/1BTA.pdb -P demo
$ PBassign -p demo/ -o test2
4 PDB file(s) to process
Read 1 chain(s) in demo/3ICH.pdb
Read 2 chain(s) in demo/1AY7.pdb
Read 1 chain(s) in demo/1BTA.pdb
Read 10 chain(s) in demo/2LFU.pdb
wrote test2.PB.fasta
-x
and -g
options¶
Warning
These options use the MDAnalysis library which is installed by PBxplore.
Instead using the -p
option, protein structures could come from a molecular dynamics simulation trajectory file.
For this, you have to specify a trajectory file with the -x
option and a topology file with the -g
option.
It will accept any trajectory file format handled by the MDAnalysis library. See their [table of supported formats](https://pythonhosted.org/MDAnalysis/documentation_pages/coordinates/init.html#id1) for the full list.
Here an example with GROMACS files.
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.gro
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.xtc
$ PBassign -x psi_md_traj.xtc -g psi_md_traj.gro -o psi_md_traj
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
wrote psi_md_traj.PB.fasta
If needed, you can download psi_md_traj.PB.fasta
[here](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.PB.fasta).
Tips’n tricks¶
To flatten the PB sequences obtained in FASTA format, i.e. get PB sequences in a single line each, one solution could be:
$ wget https://files.rcsb.org/view/1AY7.pdb
$ PBassign -p 1AY7.pdb -o 1AY7
$ cat 1AY7.PB.fasta | sed "s/^>.*/\t/" | tr -d "\n" | tr "\t" "\n" > 1AY7.PB.flat
Content of 1AY7.PB.flat :
ZZbjadfklmcfklmmmmmmmmnnpaafbfkgopacehlnomaccddehjaccdddddehklpnbjadcdddfbehiacddfegolaccdddfkZZ
ZZcddfklpcbfklmmmmmmmmnopafklgoiaklmmmmmmmmpacddddddehkllmmmmnnommmmmmmmmmmmmmnopacddddZZ
PBcount¶
PBcount
computes the frequency of PBs at each position along the amino acid sequence.
Note
- The following examples require
psi_md_traj.PB.fasta
and psi_md_traj2.PB.fasta
obtained withPBassign
:
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.gro
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.xtc
$ PBassign -x psi_md_traj_1.xtc -g psi_md_traj_1.gro -o psi_md_traj_1
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
wrote psi_md_traj_1.PB.fasta
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_2.gro
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_2.xtc
$ PBassign -x psi_md_traj_2.xtc -g psi_md_traj_2.gro -o psi_md_traj_2
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
wrote psi_md_traj_2.PB.fasta
If needed, you can download [psi_md_traj_1.PB.fasta](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.PB.fasta) and [psi_md_traj_2.PB.fasta](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_2.PB.fasta).
Example¶
$ PBcount -f psi_md_traj_1.PB.fasta -o psi_md_traj_1
read 225 sequences in psi_md_traj_1.PB.fasta
wrote psi_md_traj_1.PB.count
Content of psi_md_traj_1.PB.count:
a b c d e f g h i j k l m n o p
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 4 0 221 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 5 0 0 0 0 220 0 0 0 0 0
[snip]
51 0 0 0 0 0 56 0 98 0 0 71 0 0 0 0 0
52 0 56 0 0 0 0 0 0 94 3 3 69 0 0 0 0
53 144 0 60 2 0 0 0 0 0 0 0 0 1 0 0 18
54 0 0 225 0 0 0 0 0 0 0 0 0 0 0 0 0
55 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Note that residues 1, 2, 55 and 56 have a null count of all PBs. These residues are the first and last residues of the structure and no PB can be assigned to them.
If needed, you can download [psi_md_traj_1.PB.count](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.PB.count).
Usage¶
Here’s the PBcount
help text.
usage: PBcount [-h] -f F -o O [--first-residue FIRST_RESIDUE]
Compute PB frequency along protein sequence.
optional arguments:
-h, --help show this help message and exit
-f F name(s) of the PBs file (in fasta format)
-o O name for results
--first-residue FIRST_RESIDUE
define first residue number (1 by default)
-v, --version show program's version number and exit
-f option¶
can be used several times:
$ PBcount -f psi_md_traj_1.PB.fasta -f psi_md_traj_2.PB.fasta -o psi_md_traj_all
read 225 sequences in psi_md_traj_1.PB.fasta
read 225 sequences in psi_md_traj_2.PB.fasta
wrote psi_md_traj_all.PB.count
If needed, you can download [psi_md_traj_all.PB.count](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_all.PB.count).
–first-residue option¶
By default, the number of the first residue is 1, this option allows to adjust the number associated to the first residue (and to the followings automaticaly).
$ PBcount --first-residue 5 -f psi_md_traj_1.PB.fasta -o psi_md_traj_1_shifted
read 225 sequences in psi_md_traj_1.PB.fasta
wrote psi_md_traj_1_shifted.PB.count
Content of psi_md_traj_1_shifted.PB.count:
a b c d e f g h i j k l m n o p
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 4 0 221 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 5 0 0 0 0 220 0 0 0 0 0
9 0 222 0 0 0 0 0 0 0 0 0 3 0 0 0 0
10 6 0 201 0 0 5 0 0 0 0 0 0 12 0 1 0
[snip]
If needed, you can download [psi_md_traj_1_shifted.PB.count](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1_shifted.PB.count).
PBstat¶
PBstat
generates frequency and logo plots, and estimates something similar
to entropy called the equivalent number of PBs (Neq).
Note
The following examples require psi_md_traj_all.PB.count
obtained with PBassign and then PBcount:
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.gro
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_1.xtc
$ PBassign -x psi_md_traj_1.xtc -g psi_md_traj_1.gro -o psi_md_traj_1
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
wrote psi_md_traj_1.PB.fasta
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_2.gro
$ wget https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_2.xtc
$ PBassign -x psi_md_traj_2.xtc -g psi_md_traj_2.gro -o psi_md_traj_2
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
wrote psi_md_traj_2.PB.fasta
$ PBcount -f psi_md_traj_1.PB.fasta -f psi_md_traj_2.PB.fasta -o psi_md_traj_all
read 225 sequences in psi_md_traj_1.PB.fasta
read 225 sequences in psi_md_traj_2.PB.fasta
wrote psi_md_traj_all.PB.count
Warning
To generate Weblogo-like representations, Weblogo3 is required.
If needed, you can download [psi_md_traj_all.PB.count](https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj_all.PB.count).
Example¶
$ PBstat -f psi_md_traj_all.PB.count --map --neq --logo -o psi_md_traj_all
Index of first residue in psi_md_traj_all.PB.count is 1
First residue in the output file(s) is 1
wrote psi_md_traj_all.PB.map.png
wrote psi_md_traj_all.PB.Neq
wrote psi_md_traj_all.PB.Neq.png
wrote psi_md_traj_all.PB.logo.png
Usage¶
Here’s the PBstat
help text.
usage: PBstat [-h] -f F -o O [--map] [--neq] [--logo]
[--image-format {pdf,png,jpg}] [--residue-min RESIDUE_MIN]
[--residue-max RESIDUE_MAX]
Statistical analysis and graphical representations of PBs.
optional arguments:
-h, --help show this help message and exit
-f F name of file that contains PBs frequency (count)
-o O name for results
--map generate map of the distribution of PBs along protein
sequence
--neq compute Neq and generate Neq plot along protein
sequence
--logo generate logo representation of PBs frequency along
protein sequence
--image-format {pdf,png,jpg}
File format for all image output.
--residue-min RESIDUE_MIN
defines lower bound of residue frame
--residue-max RESIDUE_MAX
defines upper bound of residue frame
-v, --version show program's version number and exit
–map option¶
generates map of the distribution of PBs along protein sequence.
$ PBstat -f psi_md_traj_all.PB.count --map -o psi_md_traj_all
Index of first residue in psi_md_traj_all.PB.count is 1
First residue in the output file(s) is 1
wrote psi_md_traj_all.PB.map.png

Distribution of PBs
The color range goes from red to blue. For a given position in the protein sequence, blue corresponds to a null frequency (meaning the particular PB is never met a this position) and red corresponds to a frequency of 1 (meaning the particular PB is always found at this position).
–neq option¶
computes Neq and generates Neq plot along protein sequence.
$ PBstat -f psi_md_traj_all.PB.count --neq -o psi_md_traj_all
Index of first residue in psi_md_traj_all.PB.count is 1
First residue in the output file(s) is 1
wrote psi_md_traj_all.PB.Neq
wrote psi_md_traj_all.PB.Neq.png
Content of psi_md_traj_all.PB.Neq:
resid Neq
1 1.00
2 1.00
3 1.90
4 1.91
5 2.87
6 2.30
[snip]

Neq versus residue number
–logo option¶
generates WebLogo-like representation of PBs frequency along protein sequence.
Warning
This option requires Weblogo3.
$ PBstat -f psi_md_traj_all.PB.count --logo -o psi_md_traj_all
Index of first residue is: 1
wrote psi_md_traj_all.PB.logo.png

Logo representation of PBs frequency
–residue-min and –residue-max options¶
These options define the lower and upper bound of residue frame.
$ PBstat -f psi_md_traj_all.PB.count --map --neq --logo -o psi_md_traj_all_frame --residue-min 15 --residue-max 42
Index of first residue in psi_md_traj_all.PB.count is 1
First residue in the output file(s) is 15
wrote psi_md_traj_all_frame.PB.map.15-42.png
wrote psi_md_traj_all_frame.PB.Neq.15-42
wrote psi_md_traj_all_frame.PB.Neq.15-42.png
wrote psi_md_traj_all_frame.PB.logo.15-42.png

PBs distribution with residue frame

Neq versus residue number with residue frame

Logo representation of PBs frequency
–image-format option¶
All figure can be produced in either PNG, PDF, or JPEG format. The –image-format option allows to control the file format of the image outputs.
API Cookbook¶
This page provides some examples, tutorials as notebooks to help you use the API of PBxplore
.
Basically, PBxplore is structured around 3 modules:
PBxplore.structure¶
This module handles the reading of PDB files and trajectory files. Its 2 major functions chain_from_files() and chain_from_trajectory are direclty exposed at the package level.
Look at the notebook PB assignation for further information.
PBxplore.io¶
This module is about the I/O of all files other than PDB. It handles the reading/writing of fasta files and the writing of text-like analysis files (Neq files).
Look at the notebook Writing PB in file for further information.
PBxplore.analysis¶
This module handle all analysis functions and ploting functions. You can:
- generate map of the distribution of PBs along protein sequence with plot_map().
- compute Neq with compute_neq() and generate the plot with plot_neq().
- generate WebLogo-like representation of PBs frequency along protein sequence with generate_weblogo().
Look at the notebook Visualize protein deformability for further information.
Note
generate_weblogo() requires Weblogo3.
PB assignation¶
Table of Contents
We hereby demonstrate how to use the API to assign PB sequences.
[1]:
from pprint import pprint
import urllib.request
import os
# print date & versions
import datetime
print("Date & time:",datetime.datetime.now())
import sys
print("Python version:", sys.version)
Date & time: 2021-01-27 13:55:14.789953
Python version: 3.7.9 (default, Oct 19 2020, 15:13:17)
[GCC 7.5.0]
[2]:
import pbxplore as pbx
print("PBxplore version:", pbx.__version__)
PBxplore version: 1.4.0
Use the built-in structure parser¶
Assign PB for a single structure¶
The pbxplore.chains_from_files()
function is the prefered way to read PDB and PDBx/mmCIF files using PBxplore. This function takes a list of file path as argument, and yield each chain it can read from these files. It provides a single interface to read PDB and PDBx/mmCIF files, to read single model and multimodel files, and to read a single file of a collection of files.
Here we want to read a single file with a single model and a single chain. Therefore, we need the first and only record that is yield by pbxplore.chains_from_files()
. This record contains a name for the chain, and the chain itself as a pbxplore.structure.structure.Chain
object. Note that, even if we want to read a single file, we need to provide it as a list to pbxplore.chains_from_files()
.
[3]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/1BTA.pdb', '1BTA.pdb')
structure_reader = pbx.chains_from_files([pdb_name])
chain_name, chain = next(structure_reader)
print(chain_name)
print(chain)
1BTA.pdb | chain A
Chain A / model : 1434 atoms
Protein Blocks are assigned based on the dihedral angles of the backbone. So we need to calculate them. The pbxplore.structure.structure.Chain.get_phi_psi_angles()
methods calculate these angles and return them in a form that can be directly provided to the assignement function.
The dihedral angles are returned as a dictionnary. Each key of this dictionary is a residue number, and each value is a dictionary with the phi and psi angles.
[4]:
dihedrals = chain.get_phi_psi_angles()
pprint(dihedrals)
{1: {'phi': None, 'psi': -171.65563134448553},
2: {'phi': -133.80467711845586, 'psi': 153.74322760775027},
3: {'phi': -134.6617568892695, 'psi': 157.30476083095581},
4: {'phi': -144.49159910635183, 'psi': 118.59706956501033},
5: {'phi': -100.12866913978127, 'psi': 92.98634825528089},
6: {'phi': -83.48980457968895, 'psi': 104.23730726195485},
7: {'phi': -64.77163869310709, 'psi': -43.25159835828049},
8: {'phi': -44.47885842536954, 'psi': -25.89184262616925},
9: {'phi': -94.90790101955957, 'psi': -47.182577907117775},
10: {'phi': -41.312671692330014, 'psi': 133.73743399231304},
11: {'phi': -119.15122785547305, 'psi': -11.82789586402356},
12: {'phi': -174.2119655293399, 'psi': 175.87239770676175},
13: {'phi': -56.61341695443224, 'psi': -45.74767617535588},
14: {'phi': -50.78226415072095, 'psi': -45.3742585970337},
15: {'phi': -57.93584481869442, 'psi': -43.329444361460844},
16: {'phi': -55.20960354113049, 'psi': -56.47559202715399},
17: {'phi': -64.51979885245254, 'psi': -18.577118068149446},
18: {'phi': -70.24273354141474, 'psi': -55.153744337676926},
19: {'phi': -65.20648546633555, 'psi': -41.28370221159946},
20: {'phi': -58.98821952110768, 'psi': -35.78957701447905},
21: {'phi': -66.8659714296852, 'psi': -42.14634696303375},
22: {'phi': -67.34201665142825, 'psi': -57.40438549689628},
23: {'phi': -52.297936091413874, 'psi': -66.09120830346023},
24: {'phi': -61.19010445362886, 'psi': -14.807316930892625},
25: {'phi': 54.95158694420636, 'psi': 47.59528477656777},
26: {'phi': -69.51531755580697, 'psi': 161.10806531443862},
27: {'phi': -57.36300935545188, 'psi': -179.66365615297644},
28: {'phi': -79.91369005407893, 'psi': -18.494472196394668},
29: {'phi': -93.51717329199727, 'psi': 13.802530546559774},
30: {'phi': -38.40653214238887, 'psi': 105.85297788366393},
31: {'phi': -64.01559307951965, 'psi': -5.507357757886723},
32: {'phi': 50.06519606710965, 'psi': 3.6047302867544038},
33: {'phi': -84.83560576923662, 'psi': -176.048770127013},
34: {'phi': -76.65985981150652, 'psi': -36.89428882663367},
35: {'phi': -66.20745817863622, 'psi': -36.19018119951471},
36: {'phi': -80.76844188891471, 'psi': -55.88509876949212},
37: {'phi': -45.0995601497454, 'psi': -50.82304368319501},
38: {'phi': -58.512419169182465, 'psi': -56.4318511704347},
39: {'phi': -44.00775783983471, 'psi': -26.06209153795419},
40: {'phi': -79.6799641005731, 'psi': -51.3827703817916},
41: {'phi': -58.80532943671335, 'psi': -49.46425322450557},
42: {'phi': -75.73059711071141, 'psi': 3.9162670655637033},
43: {'phi': -177.14613562249534, 'psi': 60.46495675947553},
44: {'phi': 177.1265816932884, 'psi': -66.62887199130637},
45: {'phi': -58.436100708193806, 'psi': 149.59997847317612},
46: {'phi': -102.66050573267097, 'psi': 132.43212727859543},
47: {'phi': -114.5213275524662, 'psi': 169.33012343233455},
48: {'phi': -61.39150617820462, 'psi': 136.7035538929314},
49: {'phi': -113.17589693608565, 'psi': 156.54195412530404},
50: {'phi': -117.26440335376822, 'psi': 138.51305036902693},
51: {'phi': -120.03410170277817, 'psi': 81.75707989178757},
52: {'phi': -77.60981590398825, 'psi': 83.18451037698443},
53: {'phi': -79.65858964180552, 'psi': 111.40143302647459},
54: {'phi': -100.37011629225776, 'psi': 150.03395825502497},
55: {'phi': 49.87330458406236, 'psi': 68.7419980340502},
56: {'phi': -73.87938409722335, 'psi': -66.7355521840301},
57: {'phi': -56.20534388077749, 'psi': -35.207843043514686},
58: {'phi': -66.38284564180043, 'psi': -32.21866387324769},
59: {'phi': -94.6778115344365, 'psi': 17.6861402216656},
60: {'phi': -111.48538994784963, 'psi': -38.097764578613976},
61: {'phi': -70.64502750557983, 'psi': -62.8582975880629},
62: {'phi': -33.50588994671665, 'psi': -32.02546270762565},
63: {'phi': -128.57384077349852, 'psi': 62.57927537310066},
64: {'phi': -12.365761900396421, 'psi': 106.99327496259977},
65: {'phi': 73.68588813063126, 'psi': 32.13155886020173},
66: {'phi': -89.05260862755028, 'psi': -69.16778908477181},
67: {'phi': -77.83088301001709, 'psi': -21.564910924673597},
68: {'phi': -71.32122280651765, 'psi': -21.859413182600065},
69: {'phi': -81.4118653034867, 'psi': -55.2935117883826},
70: {'phi': -52.047970110313145, 'psi': -43.22593946145588},
71: {'phi': -59.215594114973726, 'psi': -45.283196644537554},
72: {'phi': -52.67186926130671, 'psi': -38.127901315075064},
73: {'phi': -75.00963018964649, 'psi': -30.83999517691734},
74: {'phi': -74.69878930178584, 'psi': -35.042954979175136},
75: {'phi': -80.22740138668189, 'psi': -37.2721834868002},
76: {'phi': -63.3253002341084, 'psi': -46.736848174955014},
77: {'phi': -62.577975558265166, 'psi': -38.836376804396195},
78: {'phi': -58.4371262613883, 'psi': -30.932534133630554},
79: {'phi': -77.25603045197096, 'psi': -28.810984281581455},
80: {'phi': -65.77402807318447, 'psi': -6.587861693755372},
81: {'phi': 113.27162201541088, 'psi': -14.067924223417435},
82: {'phi': -63.856071155072016, 'psi': 160.46313493362328},
83: {'phi': -109.29442965951228, 'psi': 65.33016925110071},
84: {'phi': -94.29902268445335, 'psi': 87.93029438989075},
85: {'phi': -52.91938395571083, 'psi': 98.897475962567},
86: {'phi': -73.44769372512917, 'psi': 114.64881254410926},
87: {'phi': -114.16119204550665, 'psi': 101.24805765454327},
88: {'phi': -96.78933556699712, 'psi': 106.74340425527281},
89: {'phi': -109.02775603395978, 'psi': None}}
The dihedral angles can be provided to the pbxplore.assign()
function that assigns a Protein Block to each residue, and that returns the PB sequence as a string. Note that the first and last two residues are assigned to the Z
jocker block as some dihedral angles cannot be calculated.
[5]:
pb_seq = pbx.assign(dihedrals)
print(pb_seq)
ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ
Assign PB for several models of a single file¶
A single PDB file can contain several models. Then, we do not want to read only the first chain. Instead, we want to iterate over all the chains.
[6]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
for chain_name, chain in pbx.chains_from_files([pdb_name]):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
print('* {}'.format(chain_name))
print(' {}'.format(pb_seq))
* 2LFU.pdb | model 1 | chain A
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
* 2LFU.pdb | model 2 | chain A
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
* 2LFU.pdb | model 3 | chain A
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
* 2LFU.pdb | model 4 | chain A
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
* 2LFU.pdb | model 5 | chain A
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
* 2LFU.pdb | model 6 | chain A
ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadddddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddfklmaccddddehiaehbgcdehiadddddddfehjlpcZZ
* 2LFU.pdb | model 7 | chain A
ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapadddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddfklpaccdddddfklmaacdfehpadddddehjblckknZZ
* 2LFU.pdb | model 8 | chain A
ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblbacddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddefklpaccdddddfklmbfbehehiaddddddffkgoiehZZ
* 2LFU.pdb | model 9 | chain A
ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdcdddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddfklmpccddddehiacbcbdfehpadddddehjklmklmZZ
* 2LFU.pdb | model 10 | chain A
ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapaccdddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddfklpaccddddddfbcfbacfehpadddddddekpghiaZZ
Read 10 chain(s) in 2LFU.pdb
Assign PB for a set of structures¶
The pbxplore.chains_from_files()
function can also handle several chains from several files.
[7]:
import glob
files = ['1BTA.pdb', '2LFU.pdb', '3ICH.pdb']
for pdb_name in files:
urllib.request.urlretrieve('https://files.rcsb.org/view/{0}'.format(pdb_name), pdb_name)
print('The following files will be used:')
pprint(files)
for chain_name, chain in pbx.chains_from_files(files):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
print('* {}'.format(chain_name))
print(' {}'.format(pb_seq))
The following files will be used:
['1BTA.pdb', '2LFU.pdb', '3ICH.pdb']
* 1BTA.pdb | chain A
ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ
Read 1 chain(s) in 1BTA.pdb
* 2LFU.pdb | model 1 | chain A
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
* 2LFU.pdb | model 2 | chain A
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
* 2LFU.pdb | model 3 | chain A
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
* 2LFU.pdb | model 4 | chain A
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
* 2LFU.pdb | model 5 | chain A
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
* 2LFU.pdb | model 6 | chain A
ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadddddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddfklmaccddddehiaehbgcdehiadddddddfehjlpcZZ
* 2LFU.pdb | model 7 | chain A
ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapadddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddfklpaccdddddfklmaacdfehpadddddehjblckknZZ
* 2LFU.pdb | model 8 | chain A
ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblbacddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddefklpaccdddddfklmbfbehehiaddddddffkgoiehZZ
* 2LFU.pdb | model 9 | chain A
ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdcdddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddfklmpccddddehiacbcbdfehpadddddehjklmklmZZ
* 2LFU.pdb | model 10 | chain A
ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapaccdddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddfklpaccddddddfbcfbacfehpadddddddekpghiaZZ
* 3ICH.pdb | chain A
ZZccdfbdcdddddehjbdebjcdddddfklmmmlmmmmmmmmnopnopajeopacfbdcehibacehiamnonopgocdfkbjbdcdfblmbccfbghiacdddebehiafkbccddfbdcfklgokaccfbdcfbhklmmmmmmmpccdfkopafbacddfbgcddddfbacddddZZ
Read 10 chain(s) in 2LFU.pdb
Read 1 chain(s) in 3ICH.pdb
Assign PB for frames in a trajectory¶
PB sequences can be assigned from a trajectory. To do so, we use the pbxplore.chains_from_trajectory()
function that takes the path to a trajectory and the path to the corresponding topology as argument. Any file formats readable by MDAnalysis can be used. Except for its arguments, pbxplore.chains_from_trajectory()
works the same as pbxplore.chains_from_files()
.
[8]:
topology, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.gro',
'psi_md_traj.gro')
trajectory, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.xtc',
'psi_md_traj.xtc')
for chain_name, chain in pbx.chains_from_trajectory(trajectory, topology):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
print('* {}'.format(chain_name))
print(' {}'.format(pb_seq))
Frame 1/225.
* psi_md_traj.xtc | frame 0
ZZfkbcnopabfklmmmmpckbccdfbfkbcghiaghidfklmmmmmpcfklccZZ
* psi_md_traj.xtc | frame 1
ZZfkbcnhpabfklmmmmpckbccdfbfkbcchiachidfklmmmmmbdfklccZZ
* psi_md_traj.xtc | frame 2
ZZfkbcnhpabfklmmmmcckbccdfbfkbcchiacdddfklmmmmmbdfklpcZZ
* psi_md_traj.xtc | frame 3
ZZfkbcnhpacfklmmmmpckbccdfbfkbcehiaedddfklmmmpccdfklpcZZ
* psi_md_traj.xtc | frame 4
ZZfkbmnopabfklmmmmmmmbccddbfkbcghiaehidfklmmmmmcdfklpcZZ
* psi_md_traj.xtc | frame 5
ZZfkbcnlpabfklmmmmpmkbccdfbfkbcchiacdddfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 6
ZZfkbcnopabfklmmmnofkbccdfbfkbcchiacdddfklmmoccbdfklpcZZ
* psi_md_traj.xtc | frame 7
ZZfkbmnopabfklmmmmcfkbccdddfkbcchiaghidfklmmmccbdfklpcZZ
* psi_md_traj.xtc | frame 8
ZZfkbcnhpabfklmmmmockbccdfbfkbcehiaehiafklmmmmcbcfklccZZ
* psi_md_traj.xtc | frame 9
ZZfkbcnhpabfklmmmmcckbccdfbfkbcghiaehidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 10
ZZfkbcnopabfklmmmmpmkbccdfbfklcghiaehidfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 11
ZZfkbcghpabfklmmmnockbccdfbfcbcehiaedddfklmmmpcbdfklccZZ
* psi_md_traj.xtc | frame 12
ZZfkbanopabfklmmmmockbccdfbfkbcchiaehidfklmmmmmbdfklccZZ
* psi_md_traj.xtc | frame 13
ZZfkbcnhpabfklmmmmockbccdfbfkbcchiaghidfklmmmpccdfklccZZ
* psi_md_traj.xtc | frame 14
ZZfkbcnbpabfklmmmmockbccdfbfkbcchiaghidfklmmopfbdfklccZZ
* psi_md_traj.xtc | frame 15
ZZfkbcnlpabfklmmmmockbccdfbfklcchiacdddfklmmmpccdfklccZZ
* psi_md_traj.xtc | frame 16
ZZfkbcghpabfklmmmmmmkbccdfbfkbcehiachidfklmmmpcbdfklccZZ
* psi_md_traj.xtc | frame 17
ZZfkbanlpabfklmmmmpckbccdfbfkbcchiaedddfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 18
ZZfkbcnhpabfklmmmmomkbccdfbfkbcehiaehidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 19
ZZfkbcnopabfklmmmmpmkbccdfbfkbcehiaehidfklmmmpmbdfklccZZ
* psi_md_traj.xtc | frame 20
ZZfkbcnhpabfklmmmmockbccdfbfkbaghiaedddfklmmmgfbafklccZZ
* psi_md_traj.xtc | frame 21
ZZfkbcnbpabfklmmmmcckbccdfbfkbcehiaghiafklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 22
ZZfkbcnlpabfklmmmmofkbccdfbfkbcchiaghiafklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 23
ZZfkbcnopabfklmmmmockbccdfbfkbcchiaehiafklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 24
ZZfkbcnbpabfklmmmmockbccddbfkbcchiaehidfklmmmpmbdfklccZZ
* psi_md_traj.xtc | frame 25
ZZfkbcgbpabfklmmmmockbccdfbfkbcchiachidfklmmmccbdfklpcZZ
* psi_md_traj.xtc | frame 26
ZZfkbcnbpabfklmmmmbgkbccdfbfkbcchiacdidfklmmmpmbdfklccZZ
* psi_md_traj.xtc | frame 27
ZZfkbcnopabfklmmmmpmkbccdfbfkbaghiaghiafklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 28
ZZfkbcnhpabfklmmmmockbccdfbfkbcghiaedidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 29
ZZfkbcnhpabfklmmmmmmmbccdfbfkbcghiaehidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 30
ZZfkbcnlpabfklmmmmmmmbccdfbfklcchiaehidfklmmmpfbdfklccZZ
* psi_md_traj.xtc | frame 31
ZZfkbonhpabfklmmmmmmmbccdfbfkbcghiaghidfklmmmmmbdfklccZZ
* psi_md_traj.xtc | frame 32
ZZfkbcfbpabfklmmmmmmmbccdfbdcbcehiaghidfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 33
ZZfkbcnhpabfklmmmmmmkbccdfbfkbcchiaghidfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 34
ZZfklcnopabfklmmmmbmmbccdfbfklcghiaghidfklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 35
ZZfklcnopabfklmmmmpfkbccdfbfkbcehiaehiafklmmmccbdfklccZZ
* psi_md_traj.xtc | frame 36
ZZfkbcnopabfklmmmnofkbccdfbfcbcehiacdddfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 37
ZZfklcnopabfklmmmmmmmbccdfbfklcghiaghidfklmmmcfbdfklccZZ
* psi_md_traj.xtc | frame 38
ZZfkbcnopabfklmmmmockbccdfbfkbcehiaehiafklmmmccbdfklpcZZ
* psi_md_traj.xtc | frame 39
ZZfkbcnopabfklmmmmmmmbccdfbfkbcghiaehiafklmmmpfbdfklccZZ
* psi_md_traj.xtc | frame 40
ZZfkbcnlpabfklmmmmockbccdddfklcchiaghiafklmmmpfbafklccZZ
* psi_md_traj.xtc | frame 41
ZZfkbcnopacfklmmmmmmkbccdddfkbcehiaghiafklmmmcfbdfklmcZZ
* psi_md_traj.xtc | frame 42
ZZfkbcnopabfklmmmnockbccdddfkbcchiaehidfklmmncfbdfklpcZZ
* psi_md_traj.xtc | frame 43
ZZfkbcnopabfklmmmmockbccdddfkbcchiaehidfklmmmgmbdfklpcZZ
* psi_md_traj.xtc | frame 44
ZZfkbcnhpabfklmmmmcfkbccdfbfkbcehiaghiafklmmmcfbdfklpcZZ
* psi_md_traj.xtc | frame 45
ZZfkbcnhpabfklmmmmmgcbccdfbfkbcchiaehidfklmmmccbdfkbccZZ
* psi_md_traj.xtc | frame 46
ZZfkbcnopabfklmmmmocbbccdfbdfbcehiaehidfklmmmgfbacklccZZ
* psi_md_traj.xtc | frame 47
ZZfkbcnbpabfklmmmmockbccdfbfkbcchiaehidfklmmoccbdfklccZZ
* psi_md_traj.xtc | frame 48
ZZfkbmnbpabfklmmmmockbccdfbfkbcchiaehiafklmmommbcfklccZZ
* psi_md_traj.xtc | frame 49
ZZfkbcnlpabfklmmmmcfkbccdfbfklcghiaehiafklmmmmmbcfklccZZ
* psi_md_traj.xtc | frame 50
ZZfkbcnopabfklmmmmockbccdfbfcbcehiacdidfklmmmmmpcfklpcZZ
* psi_md_traj.xtc | frame 51
ZZfkbcnopabfklmmmmockbccdfbfkbcchiaehidfklmmmmmbcfklccZZ
* psi_md_traj.xtc | frame 52
ZZfkbcnbpabfklmmmmomkbccdfbfklcghiaghidfklmmmmmmcfklccZZ
* psi_md_traj.xtc | frame 53
ZZfkbmnopcbfklmmmmpckbccdfbfklcghiaghiafklmmmmmccfklpcZZ
* psi_md_traj.xtc | frame 54
ZZfkbcnbpfbfklmmmmpckbccdddfkbcchiaghiafklmmmmmpcfklpcZZ
* psi_md_traj.xtc | frame 55
ZZfkbcnbpfbfklmmmmockbccdfbfkbcchiaehidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 56
ZZfkbmnopabfklmmmmcckbccdfbfkbcchiaehiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 57
ZZfkbcnbpabfklmmmmcfkbccdfbfkbcehiaehiafklmmmmmacdfbacZZ
* psi_md_traj.xtc | frame 58
ZZfkbcnhpabfklmmmmockbccdfbdfbachiaehiafklmmmmmmcfklccZZ
* psi_md_traj.xtc | frame 59
ZZfkbcnopabfklmmmmockbccdfbfkbcchiaghidfklmmmmmmccfbacZZ
* psi_md_traj.xtc | frame 60
ZZfkbcnbpabfklmmmmofkbccdfbfkbcehiaghidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 61
ZZfkbanbpfbfklmmmmockbccdfbfkbcehiaghidfklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 62
ZZfkbcnbpabfklmmmmomkbccdfbfkbachiaehiafklmmnomacdfbacZZ
* psi_md_traj.xtc | frame 63
ZZfkbcnbpabfklmmmmpmkbccdfbfkbcchiaehiafklmmmmnocchiacZZ
* psi_md_traj.xtc | frame 64
ZZfkbcnbpabfklmmmmpckbccdfbfkbcehiaehiafklmmmmmpccfkacZZ
* psi_md_traj.xtc | frame 65
ZZfkbcnbpcbfklmmmmpckbccdfbfkbcehiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 66
ZZfkbcnopabfklmmmmockbccdfbfkbcchiaehiafklmmmmmbccfbccZZ
* psi_md_traj.xtc | frame 67
ZZfkbcnhpabfklmmmmcfkbccdfbfkbcehiaehiafklmmmmmbccfbccZZ
* psi_md_traj.xtc | frame 68
ZZfkbcnbpabfklmmmmpckbccdfbfkbcehiaehidfklmmmmmbccfbacZZ
* psi_md_traj.xtc | frame 69
ZZfkbcnbpabfklmmmmockbccdfbfkbcghiaehiafklmmmmmocfklccZZ
* psi_md_traj.xtc | frame 70
ZZfkbcnopabfklmmmmmmkbccdfbfkbcchiaehidfklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 71
ZZfkbcnbpabfklmmmmomkbccdfbfkbcehiaghidfklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 72
ZZfkbcfbpabfklmmmmofkbccdfbfcbcehiaehidfklmmmmmmccfbacZZ
* psi_md_traj.xtc | frame 73
ZZfkbcnbpabfklmmmmockbccdfbffbcchiaehiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 74
ZZfkbcnbpabfklmmmmcckbccdfbfklcghiaehiafklmmmmmcccfbdcZZ
* psi_md_traj.xtc | frame 75
ZZfkbcfhpabfklmmmmpmkbccdfbfklcchiaehiafklmmmmmccfklccZZ
* psi_md_traj.xtc | frame 76
ZZfkbcghpabfklmmmmmmmbccdfbfkbcchiaehiafklmmmmmpacfbacZZ
* psi_md_traj.xtc | frame 77
ZZfkbcchiabfklmmmmbfkbccdfbfkbcehiaehiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 78
ZZfkbcchiacfklmmmmmmkbccdfbfkbcehiaehiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 79
ZZfkbcghpabfklmmmmmmkbccdfbfkbcchiaehidfklmmmmmacfklpcZZ
* psi_md_traj.xtc | frame 80
ZZfkbcghpabfklmmmmmmkbccdfbfkbcghiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 81
ZZfkbcnhpabfklmmmmockbccdfbfkbcehiaghidfklmmmmmbcchiacZZ
* psi_md_traj.xtc | frame 82
ZZfkbcghpabfklmmmmmmcbccdfbfklcghiaghiafklmmmmmpcfklccZZ
* psi_md_traj.xtc | frame 83
ZZfkbcfbpabfklmmmmockbccdfbfklcghiaghidfklmmnomacdfbacZZ
* psi_md_traj.xtc | frame 84
ZZfkbcnbpabfklmmmmockbccdfbdfbgghiaehiafklmmmmmbccfbccZZ
* psi_md_traj.xtc | frame 85
ZZfkbcgbpacfklmmmmomkbccdfbfkbcehiaghiafklmmmmmpcfklccZZ
* psi_md_traj.xtc | frame 86
ZZfkbcghpacfklmmmmmmmbccdfbfkbcehiaehiafklmmmmmccfklccZZ
* psi_md_traj.xtc | frame 87
ZZfkbcnopabfklmmmmomkbccdfbdcbcehiaghiafklmmmmmacfklpcZZ
* psi_md_traj.xtc | frame 88
ZZfkbcfbpacfklmmmmmmmbccdfbfkbcghiaehiafklmmmmmacfklacZZ
* psi_md_traj.xtc | frame 89
ZZfkbcnopabfklmmmmomkbccdfbfklcghiaghiafklmmmmmmcfklpcZZ
* psi_md_traj.xtc | frame 90
ZZfkbcnhpacfklmmmmockbccdfbfkbcghiaghiafklmmmmmccfklccZZ
* psi_md_traj.xtc | frame 91
ZZffbcnopacfklmmmmockbccdfbfklaghiaehiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 92
ZZfkbcnlpacfklmmmmcckbccdfbfkbcghiaehidfklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 93
ZZdfbanhpacfklmmmmcfkbccdfbfkbcehiaghiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 94
ZZfkbcnhpacfklmmmmockbccdfbfkbcehiaehiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 95
ZZfkbcnopabfklmmmmmmmbccdfbfkbcehiaehiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 96
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 97
ZZfkbcnbpabfklmmmmmmkbccdfbfklcghiaehiafklmmmmmmccfbccZZ
* psi_md_traj.xtc | frame 98
ZZdfbfnbpfbfklmmmmmmkbccdfbfkbcchiaehidfklmmmmmpccklccZZ
* psi_md_traj.xtc | frame 99
ZZfkbcnbpabfklmmmncckbccdfbafbcchiaghidfklmmmmmbcehiacZZ
* psi_md_traj.xtc | frame 100
ZZfkbcnbpabfklmmmmmmkbccdfbfkbachiaghidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 101
ZZfkbcnbjabfklmmmmmmmbccdfbfklpghiaghidfklmmmmmpcfkbccZZ
* psi_md_traj.xtc | frame 102
ZZfkbcnbpcbfklmmmmockbccdfbfklcghiaehidfklmmmmmpccklccZZ
* psi_md_traj.xtc | frame 103
ZZfkbcnbpabfklmmmmockbccdfbfkbcchiaehidfklmmmmmccdhiacZZ
* psi_md_traj.xtc | frame 104
ZZfkbcnopacfklmmmmpckbccdfbfkbcehiaehiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 105
ZZfkbcnopabfklmmmmockbccdfbfklcchiaehidfklmmmmmbccfbacZZ
* psi_md_traj.xtc | frame 106
ZZfkbcnbpcbfklmmmmoccbccdfbfkbcghiaehidfklmmmmmbccfbacZZ
* psi_md_traj.xtc | frame 107
ZZfkbcnbjfbfklmmmmockbccdfbfkbcghiaehidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 108
ZZfkbckbpcbfklmmmmpmkbccdfbfkbcehiaghidfklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 109
ZZfkbcnbpabfklmmmmpmkbccdfbfklcghiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 110
ZZfkbcnbpabfklmmmnockbccdfbfklcehiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 111
ZZfkbcnbpabfklmmmmockbccdfbfklcghiaehiafklmmmnmpacfbacZZ
* psi_md_traj.xtc | frame 112
ZZfkbcfbpcbfklmmmmockbccdfbfklcghiaghiafklmmmmmpccfbacZZ
* psi_md_traj.xtc | frame 113
ZZfkbcnhpabfklmmmmpckbccdfbfklcchiaehiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 114
ZZfkbcehpabfklmmmmpmkbccdfbfkbcghiaghiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 115
ZZfkbcnbpabfklmmmmomkbccdfbfkbcehiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 116
ZZfkbcnbpabfklmmmmomkbccdfbfklcghiaghiafklmmmmmcccfbccZZ
* psi_md_traj.xtc | frame 117
ZZfkbcfbpabfklmmmmomkbccdfbfkbcghiaghiafklmmmmmccdfbccZZ
* psi_md_traj.xtc | frame 118
ZZfkbcnbpcbfklmmmmockbccdfbfkbcghiaghiafklmmmmmpccfbpcZZ
* psi_md_traj.xtc | frame 119
ZZfkbcnopabfklmmmmockbccdfbfklcghiaehiafklmmmmmbccfbdcZZ
* psi_md_traj.xtc | frame 120
ZZfkbcnopabfklmmmmockbccdfbfkbcehiaehiafklmmmmmccdfbacZZ
* psi_md_traj.xtc | frame 121
ZZfkbcnbpabfklmmmnockbccdfbfklcehiaghiafklmmmmmpccfbccZZ
* psi_md_traj.xtc | frame 122
ZZfkbcnopabfklmmmmmckbccdfbfklcghiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 123
ZZfkbcnopabfklmmmmbckbccdfbfkbcchiaghiafklmmnombcfklccZZ
* psi_md_traj.xtc | frame 124
ZZfkbcehpabfklmmmmcckbccdfbfkbcehiaehiafklmmmmmbcfklpcZZ
* psi_md_traj.xtc | frame 125
ZZfkbcnbpabfklmmmmockbccdfbffbcchiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 126
ZZfkbcnopabfklmmmmockbccdfbfklcghiaehiafklmmmmmpccfbccZZ
* psi_md_traj.xtc | frame 127
ZZfkbcnbpabfklmmmmmmkbccdfbfkbcehiachiafklmmmmmmccfbacZZ
* psi_md_traj.xtc | frame 128
ZZfkbcnopabfklmmmmockbccdfbfbbcehiaghiafklmmmmmpccfbacZZ
Frame 100/225.
* psi_md_traj.xtc | frame 129
ZZfkbcnopabfklmmmmmmkbccdfbfbdcehiaehiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 130
ZZfkbcnopabfklmmmmockbccdfbfkbcghiaghiafklmmmmmcccfbacZZ
* psi_md_traj.xtc | frame 131
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiaghiafklmmmmmmgcfkacZZ
* psi_md_traj.xtc | frame 132
ZZfkbmnlpabfklmmmmockbccdfbfkbcghiaehiafklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 133
ZZfkbcnopabfklmmmmbckbccdfbfcbcehiaehiafklmmmmmacehiacZZ
* psi_md_traj.xtc | frame 134
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiaehidfklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 135
ZZfkbcnopabfklmmmmockbccdfbfkbcehiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 136
ZZfkbcnbpcbfklmmmmmmkbccdfbfkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 137
ZZfkbmnopabfklmmmmcckbccdfbfkbcghiaghiafklmmmmmpgehiacZZ
* psi_md_traj.xtc | frame 138
ZZfkbcnbpabfklmmmmmmmbccdfblkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 139
ZZfkbcnbpcbfklmmmmbmkbccdfbfkbcehiaehiafklmmmmmmcehjacZZ
* psi_md_traj.xtc | frame 140
ZZfkbfnopcbfklmmmmpmkbccdfbfkbcehiaghiafklmmmmmmgehiacZZ
* psi_md_traj.xtc | frame 141
ZZfkbcnbpcbfklmmmmmmmbccdfbfkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 142
ZZfkbcnopabfklmmmmpmkbccdfbfkbcehiaehiafklmmmmmacehiacZZ
* psi_md_traj.xtc | frame 143
ZZfkbcnhpabfklmmmmomkbccdfbfkbcehiaghiafklmmmmmacehiacZZ
* psi_md_traj.xtc | frame 144
ZZfkbcnbpabfklmmmmmmkbccdfbfkbcehiachiafklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 145
ZZfkbcfbpabfklmmmmbckbccdfbdfbaehiaghiafklmmmmmacehiacZZ
* psi_md_traj.xtc | frame 146
ZZfkbcnbpcbfklmmmmockbccdfbdfbaehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 147
ZZfkbcnbpabfklmmmmockbccdfbdhiaghiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 148
ZZfkbcnbpcbfklmmmmomkbccdfbdcbdehiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 149
ZZfkbcnbpabfklmmmmcfkbccdfbfkbcehiaghiafklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 150
ZZfkbcnbpcbfklmmmmomkbccdfbfkbcehiaehiafklmmnombcghiacZZ
* psi_md_traj.xtc | frame 151
ZZfkbcnbpabfklmmmmockbccdfbfblcehiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 152
ZZfkbcnbpabfklmmmmockbccdfbfklcehiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 153
ZZfkbcnbpabfklmmmmpfkbccdfbfblcehiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 154
ZZfkbcnbpabfklmmmmpmkbccdfbfklcghiaehiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 155
ZZfkbcnbpabfklmmmmpmkbccdfbfklcghiaghiafklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 156
ZZfkbcnbpabfklmmmmpckbccdfbfklcghiaehiafklmmmmmbcehiacZZ
* psi_md_traj.xtc | frame 157
ZZfkbcnbpabfklmmmmofkbccdfbfkbcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 158
ZZfkbcnbpabfklmmmnockbccdfbfkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 159
ZZfkbcnbpcbfklmmmmockbccdfbdfbaehiaehiafklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 160
ZZfkbcnbpabfklmmmmmmkbccdfbdfbachiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 161
ZZfkbcnbpabfklmmmmockbccdfbacbdehiaehiafklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 162
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 163
ZZfkbcfbpcbfklmmmmpckbccdfbfkbcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 164
ZZfkbcnhpabfklmmmmomkbccdfbdfbaehiaehiafklmmmmmpcehkacZZ
* psi_md_traj.xtc | frame 165
ZZfkbcnopabfklmmmmockbccdfbdcbcehiaehiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 166
ZZfkbcnbpabfklmmmmcfkbccdfbfkbcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 167
ZZfkbcfbpfbfklmmmmockbccdfbfkbcghiaghiafklmmommpcehiacZZ
* psi_md_traj.xtc | frame 168
ZZfkbcfbpabfklmmmmockbccdfbfkbcghiaehiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 169
ZZfkbcfbpabfklmmmmockbccdfbfkbcghiaghiafklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 170
ZZfkbcfbpabfklmmmmockbccdfbfkbcghiachiafklmmmmmacehiacZZ
* psi_md_traj.xtc | frame 171
ZZfkbcnopabfklmmmmockbccdfbfklcehiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 172
ZZfkbcnopabfklmmmmockbccdfbfklcghiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 173
ZZfkbcnopabfklmmmmpckbccdfbfkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 174
ZZfkbmnopabfklmmmmockbccdfbfklcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 175
ZZfkbcnbpabfklmmmmockbccdfbfklcchiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 176
ZZfkbcnbpabfklmmmncfkbccdfbfkbcehiaehiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 177
ZZfkbcnopabfklmmmnockbccdfbfkbcghiaehiafklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 178
ZZfkbcnbpabfklmmmmcckbccdfbfkbcehiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 179
ZZfkbcnbpabfklmmmncfkbccdfbfkbcghiaghiafklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 180
ZZfkbcnopabfklmmmmpckbccdfbfklcehiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 181
ZZfkbfnbpcbfklmmmmockbccdfbdcbcehiacdddfklmmmmnpcehiacZZ
* psi_md_traj.xtc | frame 182
ZZfkbcnbpcbfklmmmmpmkbccdfbfkbcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 183
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 184
ZZfkbcnbpabfklmmmmcckbccdfbfklcghiaehiafklmmommccehiacZZ
* psi_md_traj.xtc | frame 185
ZZfkbcnbpabfklmmmmpfkbccdfbfkbcehiaehiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 186
ZZfkbcnopabfklmmmmmmkbccdfbfklcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 187
ZZfkbcnopabfklmmmmockbccdfbfklcghiaghiafklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 188
ZZfkbcnbpabfklmmmmpmkbccdfbfkbcchiafbdcfklmmmmmpcehjacZZ
* psi_md_traj.xtc | frame 189
ZZfkbcnbpcbfklmmmmockbccdfbfcbcehiafbdcfklmmmmmbcehiacZZ
* psi_md_traj.xtc | frame 190
ZZfkbcnbpcbfklmmmmockbccdfbfklcghiafbdcfklmmmmmbcehiacZZ
* psi_md_traj.xtc | frame 191
ZZfkbmnbpcbfklmmmnockbccdfblbacehiafbdcfklmmnmmbcehiacZZ
* psi_md_traj.xtc | frame 192
ZZfkbcnbpfbfklmmmmpmkbccdfbfbdcehiafbdcfklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 193
ZZfkbcnbpcbfklmmmnomkbccdfbfklcghiacbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 194
ZZfkbmnbpabfklmmmmpmkbccdfbfkbcghiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 195
ZZfkbfnbpabfklmmmmpmkbccdfbfkbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 196
ZZfkbfnbpabfklmmmmpmkbccdfbfkbcehiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 197
ZZfkbmnbpabfklmmmmockbccdfbfkbcehiafbdcfklmmmmmpcchiacZZ
* psi_md_traj.xtc | frame 198
ZZfkbmnbpcbfklmmmmpmkbccdfbdfbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 199
ZZfkbcnbpabfklmmmnockbccdfbfkbcghiafbdcfklmmmmmpaehiacZZ
* psi_md_traj.xtc | frame 200
ZZfkbcnbpcbfklmmmmockbccdfbdcbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 201
ZZfkbcnopabfklmmmmomkbccdfbfbdcehiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 202
ZZfkbcnopabfklmmmmomkbccdfbfkbcehiafbdcfklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 203
ZZfkbcnopabfklmmmmmmkbccdfbfbbcehiafbdcfklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 204
ZZfkbcnopabfklmmmmpmkbccdfbfbbcehiafbdcfklmmmmmpcghiacZZ
* psi_md_traj.xtc | frame 205
ZZfkbcnopabfklmmmmomkbccdfbfbbcehiafbdcfklmmmmmbcghiacZZ
* psi_md_traj.xtc | frame 206
ZZfkbcnopabfklmmmmpckbccdfbfkbgghiafbdcfklmmmmmpcehjacZZ
* psi_md_traj.xtc | frame 207
ZZfkbcnopacfklmmmmomkbccdfblkbcghiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 208
ZZfkbcfbpabfklmmmmockbccdfbfkbcghiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 209
ZZfkbcnbpabfklmmmmockbccdfbfkbcghiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 210
ZZfkbcnbpabfklmmmmockbccdfbfkbcghiafbdcfklmmmmmbcehiacZZ
* psi_md_traj.xtc | frame 211
ZZdfbanbpabfklmmmmockbccdfbfkbcehiafbdcfklmmmnmccehiacZZ
* psi_md_traj.xtc | frame 212
ZZfkbcnopabfklmmmnofkbccdfbfkbcehiafbdcfklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 213
ZZfkbcnbpabfklmmmmockbccdfbfkbcehiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 214
ZZfkbcnopabfklmmmmomkbccdfbfcbcehiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 215
ZZfkbcnopabfklmmmmpfkbccdfbfkbcchiacbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 216
ZZfkbcnopabfklmmmmockbccdfbfkbcehiacbdcfklmmmnmpaghiacZZ
* psi_md_traj.xtc | frame 217
ZZfkbcnbpabfklmmmmcckbccdfbfkbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 218
ZZfkbcnhpabfklmmmmpckbccdfbfkbcchiafbdcfklmmmmmccehiacZZ
* psi_md_traj.xtc | frame 219
ZZfkbcnbpabfklmmmmpckbccdfbfkbachiafbdcfklmmmmmpcehiacZZ
* psi_md_traj.xtc | frame 220
ZZfkbcnbpabfklmmmnockbccdfbfkbcehiafbdcfklmmmnmpaghiacZZ
* psi_md_traj.xtc | frame 221
ZZdfbanbpabfklmmmnockbccdfblkbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 222
ZZfkbcnopabfklmmmmcfkbccdfbfkbcehiafbdcfklmmmmmmcehiacZZ
* psi_md_traj.xtc | frame 223
ZZfkbcnopabfklmmmmcfkbccdfbfkbcehiafbdcfklmmmmmmcghiacZZ
* psi_md_traj.xtc | frame 224
ZZfkbcnbpabfklmmmmofkbccdfbfkbcghiafbdcfklmmmmmpcehiacZZ
Frame 200/225.
Frame 225/225.
Use a different structure parser¶
Providing the dihedral angles can be formated as expected by pbxplore.assign()
, the source of these angles does not matter. For instance, other PDB parser can be used with PBxplore.
BioPython¶
[9]:
import Bio
import Bio.PDB
import math
print("BioPython version:", Bio.__version__)
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
for model in Bio.PDB.PDBParser().get_structure("2LFU", pdb_name):
for chain in model:
polypeptides = Bio.PDB.PPBuilder().build_peptides(chain)
for poly_index, poly in enumerate(polypeptides):
dihedral_list = poly.get_phi_psi_list()
dihedrals = {}
for resid, (phi, psi) in enumerate(dihedral_list, start=1):
if not phi is None:
phi = 180 * phi / math.pi
if not psi is None:
psi = 180 * psi / math.pi
dihedrals[resid] = {'phi': phi, 'psi': psi}
print(model, chain)
pb_seq = pbx.assign(dihedrals)
print(pb_seq)
BioPython version: 1.78
<Model id=0> <Chain id=A>
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
<Model id=1> <Chain id=A>
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
<Model id=2> <Chain id=A>
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
<Model id=3> <Chain id=A>
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ
<Model id=4> <Chain id=A>
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ
<Model id=5> <Chain id=A>
ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadddddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddfklmaccddddehiaehbgcdehiadddddddfehjlpcZZ
<Model id=6> <Chain id=A>
ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapadddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddfklpaccdddddfklmaacdfehpadddddehjblckknZZ
<Model id=7> <Chain id=A>
ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblbacddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddefklpaccdddddfklmbfbehehiaddddddffkgoiehZZ
<Model id=8> <Chain id=A>
ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdcdddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddfklmpccddddehiacbcbdfehpadddddehjklmklmZZ
<Model id=9> <Chain id=A>
ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapaccdddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddfklpaccddddddfbcfbacfehpadddddddekpghiaZZ
Writing PB in file¶
Table of Contents
The API allows to write all the files the command line tools can. This includes the outputs of PBassign. The functions to handle several file formats are available in the pbxplore.io
module.
[1]:
from pprint import pprint
import urllib.request
import os
# print date & versions
import datetime
print("Date & time:",datetime.datetime.now())
import sys
print("Python version:", sys.version)
Date & time: 2021-01-27 13:55:39.313685
Python version: 3.7.9 (default, Oct 19 2020, 15:13:17)
[GCC 7.5.0]
[2]:
import pbxplore as pbx
print("PBxplore version:", pbx.__version__)
PBxplore version: 1.4.0
Fasta files¶
The most common way to save PB sequences is to write them in a fasta file.
PBxplore allows two ways to write fasta files. The sequences can be written either all at once or one at a time. To write a batch of sequences at once, we need a list of sequences and a list of the corresponding sequence names. The writing function here is pbxplore.io.write_fasta()
.
[3]:
names = []
pb_sequences = []
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
for chain_name, chain in pbx.chains_from_files([pdb_name]):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
names.append(chain_name)
pb_sequences.append(pb_seq)
pprint(names)
pprint(pb_sequences)
with open('output.fasta', 'w') as outfile:
pbx.io.write_fasta(outfile, pb_sequences, names)
['2LFU.pdb | model 1 | chain A',
'2LFU.pdb | model 2 | chain A',
'2LFU.pdb | model 3 | chain A',
'2LFU.pdb | model 4 | chain A',
'2LFU.pdb | model 5 | chain A',
'2LFU.pdb | model 6 | chain A',
'2LFU.pdb | model 7 | chain A',
'2LFU.pdb | model 8 | chain A',
'2LFU.pdb | model 9 | chain A',
'2LFU.pdb | model 10 | chain A']
['ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ',
'ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ',
'ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ',
'ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ',
'ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ',
'ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadddddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddfklmaccddddehiaehbgcdehiadddddddfehjlpcZZ',
'ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapadddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddfklpaccdddddfklmaacdfehpadddddehjblckknZZ',
'ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblbacddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddefklpaccdddddfklmbfbehehiaddddddffkgoiehZZ',
'ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdcdddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddfklmpccddddehiacbcbdfehpadddddehjklmklmZZ',
'ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapaccdddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddfklpaccddddddfbcfbacfehpadddddddekpghiaZZ']
Read 10 chain(s) in 2LFU.pdb
[4]:
!cat output.fasta
!rm output.fasta
>2LFU.pdb | model 1 | chain A
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcd
dddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddf
klmaccddddddfbgniaghiapaddddddfklnoambZZ
>2LFU.pdb | model 2 | chain A
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghia
dddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddf
klpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
>2LFU.pdb | model 3 | chain A
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpa
cddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddef
klpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
>2LFU.pdb | model 4 | chain A
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddd
dddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddf
klmpccdddddddehiabghehiaddddddfklpccfkZZ
>2LFU.pdb | model 5 | chain A
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadd
dddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddf
klmaccddddddfbgghiafehiadddddddfklpacfZZ
>2LFU.pdb | model 6 | chain A
ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadd
dddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddf
klmaccddddehiaehbgcdehiadddddddfehjlpcZZ
>2LFU.pdb | model 7 | chain A
ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapa
dddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddf
klpaccdddddfklmaacdfehpadddddehjblckknZZ
>2LFU.pdb | model 8 | chain A
ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblba
cddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddef
klpaccdddddfklmbfbehehiaddddddffkgoiehZZ
>2LFU.pdb | model 9 | chain A
ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdc
dddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddf
klmpccddddehiacbcbdfehpadddddehjklmklmZZ
>2LFU.pdb | model 10 | chain A
ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapacc
dddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddf
klpaccddddddfbcfbacfehpadddddddekpghiaZZ
Sequences can be written once at a time using the pbxplore.io.write_fasta_entry()
function.
[5]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
with open('output.fasta', 'w') as outfile:
for chain_name, chain in pbx.chains_from_files([pdb_name]):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
pbx.io.write_fasta_entry(outfile, pb_seq, chain_name)
Read 10 chain(s) in 2LFU.pdb
[6]:
!cat output.fasta
!rm output.fasta
>2LFU.pdb | model 1 | chain A
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcd
dddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddf
klmaccddddddfbgniaghiapaddddddfklnoambZZ
>2LFU.pdb | model 2 | chain A
ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghia
dddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddf
klpaccdddddfklmlmgcdehiaddddddfklmmgopZZ
>2LFU.pdb | model 3 | chain A
ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpa
cddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddef
klpaccddddddfkgojbdfehpaddddddfkbccfbgZZ
>2LFU.pdb | model 4 | chain A
ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddd
dddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddf
klmpccdddddddehiabghehiaddddddfklpccfkZZ
>2LFU.pdb | model 5 | chain A
ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadd
dddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddf
klmaccddddddfbgghiafehiadddddddfklpacfZZ
>2LFU.pdb | model 6 | chain A
ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadd
dddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddf
klmaccddddehiaehbgcdehiadddddddfehjlpcZZ
>2LFU.pdb | model 7 | chain A
ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapa
dddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddf
klpaccdddddfklmaacdfehpadddddehjblckknZZ
>2LFU.pdb | model 8 | chain A
ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblba
cddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddef
klpaccdddddfklmbfbehehiaddddddffkgoiehZZ
>2LFU.pdb | model 9 | chain A
ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdc
dddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddf
klmpccddddehiacbcbdfehpadddddehjklmklmZZ
>2LFU.pdb | model 10 | chain A
ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapacc
dddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddf
klpaccddddddfbcfbacfehpadddddddekpghiaZZ
By default, the lines in fasta files are wrapped at 60 caracters as defined in pbxplore.io.fasta.FASTA_WIDTH
. Both pbxplore.io.write_fasta()
and pbxplore.io.write_fasta_entry()
have a width
optionnal argument that allows to control the wrapping.
[7]:
print(pb_sequences[0])
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
[8]:
with open('output.fasta', 'w') as outfile:
for width in (60, 70, 80):
pbx.io.write_fasta_entry(outfile, pb_sequences[0],
'width={} blocks'.format(width),
width=width)
[9]:
!cat output.fasta
!rm output.fasta
>width=60 blocks
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcd
dddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddf
klmaccddddddfbgniaghiapaddddddfklnoambZZ
>width=70 blocks
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddf
klopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniagh
iapaddddddfklnoambZZ
>width=80 blocks
ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopaddddd
fhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ
Dihedral angles¶
One needs the phi and psi dihedral angles to assign protein block sequences. Having these angles, it is sometime convenient to store them in a file. This can be done easily.
[10]:
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
with open('output.phipsi', 'w') as outfile:
for chain_name, chain in pbx.chains_from_files([pdb_name]):
dihedral = chain.get_phi_psi_angles()
for res in sorted(dihedral):
phi = "{:8.2f}".format(dihedral[res]["phi"]) if dihedral[res]["phi"] else " None"
psi = "{:8.2f}".format(dihedral[res]["psi"]) if dihedral[res]["psi"] else " None"
print("{} {:6d} {} {} ".format(chain_name, res, phi, psi), file=outfile)
Read 10 chain(s) in 2LFU.pdb
Note it’s better to write the dihedral for each PDB/frame due to the high memory cost to store all of them in a list.
The output is formated with one line per residue. The first columns repeat the name given for the chain, then is the residue id followed by the phi and the psi angle. If an angle is not defined, ‘None’ is written instead.
[11]:
!head output.phipsi
!tail output.phipsi
!rm output.phipsi
2LFU.pdb | model 1 | chain A 276 None 44.77
2LFU.pdb | model 1 | chain A 277 -81.67 15.27
2LFU.pdb | model 1 | chain A 278 -79.03 -21.52
2LFU.pdb | model 1 | chain A 279 -144.43 64.08
2LFU.pdb | model 1 | chain A 280 -84.23 144.90
2LFU.pdb | model 1 | chain A 281 70.83 -64.01
2LFU.pdb | model 1 | chain A 282 -107.77 93.35
2LFU.pdb | model 1 | chain A 283 -71.42 108.03
2LFU.pdb | model 1 | chain A 284 -72.99 99.31
2LFU.pdb | model 1 | chain A 285 -92.93 2.32
2LFU.pdb | model 10 | chain A 426 -88.16 127.99
2LFU.pdb | model 10 | chain A 427 -107.40 -161.76
2LFU.pdb | model 10 | chain A 428 -72.12 -14.94
2LFU.pdb | model 10 | chain A 429 66.62 -81.17
2LFU.pdb | model 10 | chain A 430 -141.67 111.76
2LFU.pdb | model 10 | chain A 431 -69.37 141.19
2LFU.pdb | model 10 | chain A 432 76.08 -75.91
2LFU.pdb | model 10 | chain A 433 -149.23 167.34
2LFU.pdb | model 10 | chain A 434 -63.43 -27.06
2LFU.pdb | model 10 | chain A 435 -166.70 None
Read fasta files¶
We want to read sequences that we wrote in files. PBxplore provides a function to read fasta files: the pbxplore.io.read_fasta()
function.
[12]:
def pdb_to_fasta_pb(pdb_path, fasta_path):
"""
Write a fasta file with all the PB sequences from a PDB
"""
with open(fasta_path, 'w') as outfile:
for chain_name, chain in pbx.chains_from_files([pdb_path]):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
pbx.io.write_fasta_entry(outfile, pb_seq, chain_name)
# Write a fasta file
pdb_name, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
pdb_to_fasta_pb(pdb_name, 'output.fasta')
# Read a list of headers and a list of sequences from a fasta file
names, sequences = pbx.io.read_fasta('output.fasta')
print('names:')
pprint(names)
print('sequences:')
pprint(sequences)
!rm output.fasta
Read 10 chain(s) in 2LFU.pdb
read 10 sequences in output.fasta
names:
['2LFU.pdb | model 1 | chain A',
'2LFU.pdb | model 2 | chain A',
'2LFU.pdb | model 3 | chain A',
'2LFU.pdb | model 4 | chain A',
'2LFU.pdb | model 5 | chain A',
'2LFU.pdb | model 6 | chain A',
'2LFU.pdb | model 7 | chain A',
'2LFU.pdb | model 8 | chain A',
'2LFU.pdb | model 9 | chain A',
'2LFU.pdb | model 10 | chain A']
sequences:
['ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ',
'ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ',
'ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ',
'ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ',
'ZZpaehiehkaccdddddehjbccdddddddddfklggcddddddddddddddfbhpadddddddddddfklopadddddehiamlmmpccdddddeehiadddddfbacddcddddddfklmaccddddddfbgghiafehiadddddddfklpacfZZ',
'ZZmghbacfkbccdddddeehpacdddddddddfklggcdddddddddddddehiacadddddddddddfklopadddddehiaklnopcddddddeehiadddehjlnopacddddddfklmaccddddehiaehbgcdehiadddddddfehjlpcZZ',
'ZZcchbacfkbccdddddfehpacdddddddddfklggcdddddddddddddddehjapadddddddddfknopadddddfklmmmnopcddddddehjiddddddfknopacddddddfklpaccdddddfklmaacdfehpadddddehjblckknZZ',
'ZZcehjdeehiacdjdddedjbdcdddddddddfklggcdddddddddddddddbfblbacddddddddfklopacddddehiamlnopaddddddehjacddddfehpaaccdddddefklpaccdddddfklmbfbehehiaddddddffkgoiehZZ',
'ZZpccdjdfkbccdddddehhpacdddddddddfklggcdddddddddddddehiacbdcdddddddddfklopadddddehiammnopcddddddeejiadddehjlgobacddddddfklmpccddddehiacbcbdfehpadddddehjklmklmZZ',
'ZZccfklcfkbccdddddehjbdcdddddddddfklggcdddddddddddddehiapaccdddddddddfklopadddddehjamlnopaddddddehjddcdddfbfghpacddddddfklpaccddddddfbcfbacfehpadddddddekpghiaZZ']
If the sequences we want to read are spread amongst several fasta files, then we can use the pbxplore.io.read_several_fasta()
function that takes a list of fasta file path as argument instead of a single path.
[13]:
# Write several fasta files
pdbname, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/1BTA.pdb', '1BTA.pdb')
pdb_to_fasta_pb(pdbname, '1BTA.fasta')
pdbname, _ = urllib.request.urlretrieve('https://files.rcsb.org/view/2LFU.pdb', '2LFU.pdb')
pdb_to_fasta_pb(pdb_name, '2FLU.fasta')
# Read the fasta files
names, sequences = pbx.io.read_several_fasta(['1BTA.fasta', '2FLU.fasta'])
# Print the first entries
print('names:')
pprint(names[:5])
print('sequences:')
pprint(sequences[:5])
!rm 1BTA.fasta 2FLU.fasta
Read 1 chain(s) in 1BTA.pdb
Read 10 chain(s) in 2LFU.pdb
read 1 sequences in 1BTA.fasta
read 10 sequences in 2FLU.fasta
names:
['1BTA.pdb | chain A',
'2LFU.pdb | model 1 | chain A',
'2LFU.pdb | model 2 | chain A',
'2LFU.pdb | model 3 | chain A',
'2LFU.pdb | model 4 | chain A']
sequences:
['ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ',
'ZZbghiacfkbccdddddehiadddddddddddfklggcdddddddddddddehifbdcddddddddddfklopadddddfhpamlnopcddddddehjadddddehjacbddddddddfklmaccddddddfbgniaghiapaddddddfklnoambZZ',
'ZZpcfblcffbccdddddeehjacdddddddddfklggcddddddddddddddfblghiadddddddddfklopadddddehpmmmnopcddddddeehiacdddfblopadcddddddfklpaccdddddfklmlmgcdehiaddddddfklmmgopZZ',
'ZZmgghiafbbccdddddehjbdcdddddddddfklggcddddddddddddddfbfghpacddddddddfklopadddddehiaklmmmgcdddddeehiaddddfkbgciacdddddefklpaccddddddfkgojbdfehpaddddddfkbccfbgZZ',
'ZZcghiacfkbacdddddfbhpacdddddddddfklmcfdddddddddddddehiacddddddddddddfknopadddddfkpamlnopaddddddehjaccdddfklnopacddddddfklmpccdddddddehiabghehiaddddddfklpccfkZZ']
Visualize protein deformability¶
Table of Contents
Protein Blocks are great tools to study protein deformability. Indeed, if the block assigned to a residue changes between two frames of a trajectory, it represents a local deformation of the protein rather than the displacement of the residue.
The API allows to visualize Protein Block variability throughout a molecular dynamics simulation trajectory.
[1]:
from pprint import pprint
from IPython.display import Image, display
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import urllib.request
import os
import numpy as np
# print date & versions
import datetime
print("Date & time:",datetime.datetime.now())
import sys
print("Python version:", sys.version)
print("Matplotlib version:", matplotlib.__version__)
Date & time: 2021-01-27 13:55:22.924405
Python version: 3.7.9 (default, Oct 19 2020, 15:13:17)
[GCC 7.5.0]
Matplotlib version: 3.3.2
[2]:
import pbxplore as pbx
print("PBxplore version:", pbx.__version__)
PBxplore version: 1.4.0
Here we will look at a molecular dynamics simulation of the barstar. As we will analyse Protein Block sequences, we first need to assign these sequences for each frame of the trajectory.
[3]:
# Assign PB sequences for all frames of a trajectory
topology, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.gro',
'psi_md_traj.gro')
trajectory, _ = urllib.request.urlretrieve('https://raw.githubusercontent.com/pierrepo/PBxplore/master/demo_doc/psi_md_traj.xtc',
'psi_md_traj.xtc')
sequences = []
for chain_name, chain in pbx.chains_from_trajectory(trajectory, topology):
dihedrals = chain.get_phi_psi_angles()
pb_seq = pbx.assign(dihedrals)
sequences.append(pb_seq)
/home/docs/checkouts/readthedocs.org/user_builds/pbxplore/envs/latest/lib/python3.7/site-packages/MDAnalysis-1.0.0-py3.7-linux-x86_64.egg/MDAnalysis/coordinates/XDR.py:216: UserWarning: Reload offsets from trajectory
ctime or size or n_atoms did not match
warnings.warn("Reload offsets from trajectory\n "
Frame 1/225.
Frame 100/225.
Frame 200/225.
Frame 225/225.
Block occurences per position¶
The basic information we need to analyse protein deformability is the count of occurences of each PB for each position throughout the trajectory. This occurence matrix can be calculated with the pbxplore.analysis.count_matrix()
function.
[4]:
count_matrix = pbx.analysis.count_matrix(sequences)
count_matrix
is a numpy array with one row per PB and one column per position. In each cell is the number of time a position was assigned to a PB.
We can visualize count_matrix
using Matplotlib as any 2D numpy array.
[5]:
im = plt.imshow(count_matrix, interpolation='none', aspect='auto')
plt.colorbar(im)
plt.xlabel('Position')
plt.ylabel('Block')
[5]:
Text(0, 0.5, 'Block')

PBxplore provides the pbxplore.analysis.plot_map()
function to ease the visualization of the occurence matrix.
[6]:
pbx.analysis.plot_map('map.png', count_matrix)
!rm map.png

The pbxplore.analysis.plot_map()
helper has a residue_min
and a residue_max
optional arguments to display only part of the matrix. These two arguments can be pass to all PBxplore functions that produce a figure.
[7]:
pbx.analysis.plot_map('map.png', count_matrix,
residue_min=20, residue_max=30)
!rm map.png

Note that matrix in the the figure produced by pbxplore.analysis.plot_map()
is normalized so as the sum of each column is 1. The matrix can be normalized with the pbxplore.analysis.compute_freq_matrix()
.
[8]:
freq_matrix = pbx.analysis.compute_freq_matrix(count_matrix)
[9]:
im = plt.imshow(freq_matrix, interpolation='none', aspect='auto')
plt.colorbar(im)
plt.xlabel('Position')
plt.ylabel('Block')
[9]:
Text(0, 0.5, 'Block')

Protein Block entropy¶
The \(N_{eq}\) is a measure of variability based on the count matrix calculated above. It can be computed with the pbxplore.analysis.compute_neq()
function.
[10]:
neq_by_position = pbx.analysis.compute_neq(count_matrix)
neq_by_position
is a 1D numpy array with the \(N_{eq}\) for each residue.
[11]:
#Residus start by default at 1.
resids = np.arange(1,len(neq_by_position)+1)
plt.plot(resids, neq_by_position)
plt.xlabel('Position')
plt.ylabel('$N_{eq}$')
[11]:
Text(0, 0.5, '$N_{eq}$')

The pbxplore.analysis.plot_neq()
helper ease the plotting of the \(N_{eq}\).
[12]:
pbx.analysis.plot_neq('neq.png', neq_by_position)
!rm neq.png

The residue_min
and residue_max
arguments are available.
[13]:
pbx.analysis.plot_neq('neq.png', neq_by_position,
residue_min=20, residue_max=30)
!rm neq.png

Neq with RMSF¶
The \(N_{eq}\) and RMSF (Root Mean Square Fluctuation) can be plot together to highlight differences between flexible and rigid residues : the \(N_{eq}\) is a metric of deformability and flexibility whereas RMSF quantifies mobility.
Here an example of a plot with both metrics (You can adapt this code to your own need):
[14]:
# Let's assume you computed the RMSF (file rmsf.xvg)
# For this example, the rmsf was computed on the C-alpha and grouped by residue :
# g_rmsf -s psi_md_traj.gro -f psi_md_traj.xtc -res -o rmsf.xvg (Gromacs 4.6.7)
# Read rmsf file (ignore lines which start by '#@' and assume the data are in 2-column,
# first one the number of residue, 2nd one the rmsf
rmsf = np.array([line.split() for line in open("../../../demo_doc/rmsf.xvg") if not line[0] in '#@'], dtype=float)
#Generate 2 y-axes who share a same x-axis
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
#Left Axis
ax1.plot(rmsf[:,0], neq_by_position, color='#1f77b4', lw=2)
ax1.set_xlabel('Position')
ax1.set_ylabel('$N_{eq}$', color='#1f77b4')
ax1.set_ylim([0.0, 5])
#Right Axis
ax2.plot(rmsf[:,0], rmsf[:,1],color='#ff7f0e', lw=2)
ax2.set_ylim([0.0, 0.4])
ax2.set_ylabel('RMSF (nm)', color='#ff7f0e')
[14]:
Text(0, 0.5, 'RMSF (nm)')

We observe that the region 33-35 is rigid. The high values of RMSF we observed were due to flexible residues in the vicinity of the region 33-35, probably acting as hinges (residues 32 and 36–37). Those hinges, due to their flexibility, induced the mobility of the whole loop : the region 33-35 fluctuated but did not deform.
Display PB variability as a logo¶
[15]:
pbx.analysis.generate_weblogo('logo.png', count_matrix)
display(Image('logo.png'))
!rm logo.png

[16]:
pbx.analysis.generate_weblogo('logo.png', count_matrix,
residue_min=20, residue_max=30)
display(Image('logo.png'))
!rm logo.png

[ ]:
API Reference¶
This page provides the documentation generated from the source files.
Protein Blocks related data — pbxplore.PB
¶
Protein blocks definition¶
REFERENCES
The definition of each block as a dictionary; each key is a block name (as a lower case letter), and each value is a list of the dihedral angle values that define the block.NAMES
The names of all the protein blocks.
Substitution matrix¶
SUBSTITUTION_MATRIX_NAME
The absolute path to the file that contains the subtitution matrix
-
pbxplore.PB.
load_substitution_matrix
(name)[source]¶ Load PB substitution matrix.
The matrix must be 16x16.
Parameters: name (str) – Name of the file containing the PBs susbtitution matrix. Returns: mat – Array of floats. Return type: numpy array Raises: InvalidBlockError – encountered an unexpected PB
Protein structure manipulation — pbxplore.structure
¶
Constants¶
PDB_EXTENSIONS
- list of file extensions corresponding to PDB files
PDBx_EXTENSIONS
- list of file extensions corresponding to PDBx/mmCIF files
Functions¶
Objects¶
-
class
pbxplore.structure.structure.
Chain
[source]¶ Class to handle PDB chain
-
get_phi_psi_angles
()[source]¶ Compute phi and psi angles.
Returns: phi_psi_angles – Dict with residue number (int) as keys and a {'phi' : (float), 'psi' : (float)}
dictionnary as values.Return type: dict Raises: FloatingPointError – If the computation of angles produces NaN. Generally, it means there is some problem with the residue coordinates. Examples
>>> lines = ("ATOM 840 C ARG B 11 22.955 23.561 -4.012 1.00 28.07 C ", ... "ATOM 849 N SER B 12 22.623 24.218 -2.883 1.00 24.77 N ", ... "ATOM 850 CA SER B 12 22.385 23.396 -1.637 1.00 21.99 C ", ... "ATOM 851 C SER B 12 21.150 24.066 -0.947 1.00 32.67 C ", ... "ATOM 855 N ILE B 13 20.421 23.341 -0.088 1.00 30.25 N ") >>> >>> import pbxplore as pbx >>> ch = pbx.structure.structure.Chain() >>> for line in lines: ... at = pbx.structure.structure.Atom() ... at.read_from_PDB(line) ... ch.add_atom(at) ... >>> print(ch.get_phi_psi_angles()) {11: {'phi': None, 'psi': None}, 12: {'phi': -139.77684605036447, 'psi': 157.94348570201197}, 13: {'phi': None, 'psi': None}}
-
Protrein Block assignation — pbxplore.assignment
¶
-
pbxplore.assignment.
assign
(dihedrals, pb_ref={'a': [41.14, 75.53, 13.92, -99.8, 131.88, -96.27, 122.08, -99.68], 'b': [108.24, -90.12, 119.54, -92.21, -18.06, -128.93, 147.04, -99.9], 'c': [-11.61, -105.66, 94.81, -106.09, 133.56, -106.93, 135.97, -100.63], 'd': [141.98, -112.79, 132.2, -114.79, 140.11, -111.05, 139.54, -103.16], 'e': [133.25, -112.37, 137.64, -108.13, 133.0, -87.3, 120.54, 77.4], 'f': [116.4, -105.53, 129.32, -96.68, 140.72, -74.19, -26.65, -94.51], 'g': [0.4, -81.83, 4.91, -100.59, 85.5, -71.65, 130.78, 84.98], 'h': [119.14, -102.58, 130.83, -67.91, 121.55, 76.25, -2.95, -90.88], 'i': [130.68, -56.92, 119.26, 77.85, 10.42, -99.43, 141.4, -98.01], 'j': [114.32, -121.47, 118.14, 82.88, -150.05, -83.81, 23.35, -85.82], 'k': [117.16, -95.41, 140.4, -59.35, -29.23, -72.39, -25.08, -76.16], 'l': [139.2, -55.96, -32.7, -68.51, -26.09, -74.44, -22.6, -71.74], 'm': [-39.62, -64.73, -39.52, -65.54, -38.88, -66.89, -37.76, -70.19], 'n': [-35.34, -65.03, -38.12, -66.34, -29.51, -89.1, -2.91, 77.9], 'o': [-45.29, -67.44, -27.72, -87.27, 5.13, 77.49, 30.71, -93.23], 'p': [-27.09, -86.14, 0.3, 59.85, 21.51, -96.3, 132.67, -92.91]})[source]¶ Assign Protein Blocks.
Dihedral angles are provided as a dictionnary with one item per residue. The key is the residue number, and the value is a dictionnary with phi and psi as keys, and the dihedral angles as values.
The protein block definitions are provided as a dictionnary. Each key is a block name, the values are lists of dihedral angles.
Parameters: - dihedrals (dict) – Phi and psi dihedral angles for each residue.
- pb_ref (dict) – The definition of the protein blocks.
Input/Output in files — pbxplore.io
¶
Deals with writing and reading of files in various formats.
Fasta¶
-
pbxplore.io.
read_fasta
(name)[source]¶ Read fasta file and output sequences in a list.
Parameters: name (str) – Name of file containing sequences in fasta format. Returns: - header_lst (list) – List of headers (str)
- sequence_lst (list) – List of sequences (str)
-
pbxplore.io.
read_several_fasta
(input_files)[source]¶ Read several fasta files
Note that each fasta file may contain several sequences.
Parameters: input_files (a list of fasta file paths.) – Returns: - pb_name (a list of the headers)
- pb_seq (a list of the sequences)
-
pbxplore.io.
write_fasta
(outfile, sequences, comments)[source]¶ Write fasta entries (header + sequence) in an open file
Parameters: - name (file descriptor) – The file descriptor to write in. It must allow writing.
- header_lst (list) – List of headers (str)
- sequence_lst (list) – List of sequences (str)
-
pbxplore.io.
write_fasta_entry
(outfile, sequence, comment, width=60)[source]¶ Write a fasta entry (header + sequence) in an open file
Parameters: - name (file descriptor) – The file descriptor to write in. It must allow writing.
- sequence (str) – Sequence to format.
- comment (str) – Comment to make header of sequence.
- width (int) – The width of a line. FASTA_WIDTH by default.
Results af analyses¶
-
pbxplore.io.
write_count_matrix
(pb_count, outfile, first=1)[source]¶ Write a PB occurence matrix in a file.
Parameters: - pb_count (an occurence matrix as a 2D numpy array.) –
- outfile (an open file where to write the matrix.) –
- first (the residue number of the first position.) –
-
pbxplore.io.
write_neq
(outfile, neq_array, idx_first_residue=1, residue_min=1, residue_max=None)[source]¶ Write the Neq matrix in an open file
Parameters: - outfile (file descriptor) – The file descriptor to write in. It must allow writing.
- neq_array (numpy array) – a 1D array containing the neq values.
- idx_first_residue (int) – the index of the first residue in the array
- residue_min (int) – the lower bound of residue frame
- residue_max (int) – the upper bound of residue frame
Analyses – pbxplore.analysis
¶
Build occurence matrix¶
-
pbxplore.analysis.
count_matrix
(pb_seq)[source]¶ Count the occurences of each block at each position.
The occurence matrix has one row per sequence, and one column per block. The columns are ordered in as
pbxplore.PB.NAMES
.Parameters: pb_seq – a list of PB sequences. Returns: pb_count – The occurence matrix. Return type: numpy array Raises: pbxplore.PB.InvalidBlockError – encountered an unexpected PB
-
pbxplore.analysis.
read_occurence_file
(name)[source]¶ Read an occurence matrix from a file. It will return the matrix as a numpy array and the indexes of residues.
Parameters: name (str) – Name of the file. Returns: - count_mat (numpy array) – the occurence matrix without the residue number
- residues (list) – the list of residues indexes
Raises: ValueError – when something is wrong about the file
-
pbxplore.analysis.
plot_map
(fname, count_mat, idx_first_residue=1, residue_min=1, residue_max=None)[source]¶ Generate a map of the distribution of PBs along protein sequence from an occurence matrix.
Parameters: - fname (str) – The path to the file to write in
- count_mat (numpy array) – an occurence matrix returned by count_matrix.
- idx_first_residue (int) – the index of the first residue in the matrix
- residue_min (int) – the lower bound of the protein sequence
- residue_max (int) – the upper bound of the protein sequence
Compare protein block sequences¶
-
pbxplore.analysis.
compare
(header_lst, seq_lst, substitution_mat, fname)[source]¶ Command line wrapper for the comparison of all sequences with the first one
When the –compare option is given to the command line, the program compares all the sequences to the first one and writes these comparison as sequences of digits. These digits represent the distance between the PB in the target and the one in the reference at the same position. The digits are normalized in the [0; 9] range.
This function run the comparison, write the result in a fasta file, and display on screen informations about the process.
Parameters: - header_lst (list of strings) – The list of sequence headers ordered as the sequences
- seq_lst (list of strings) – The list of sequences ordered as the headers
- substitution_mat (numpy.array) – A substitution matrix expressed as similarity scores
- fname (str) – The output file name
Visualize deformability¶
-
pbxplore.analysis.
compute_neq
(count_mat)[source]¶ Compute the Neq for each residue from an occurence matrix.
Parameters: count_mat (numpy array) – an occurence matrix returned by count_matrix. Returns: neq_array – a 1D array containing the neq values Return type: numpy array
-
pbxplore.analysis.
plot_neq
(fname, neq_array, idx_first_residue=1, residue_min=1, residue_max=None)[source]¶ Generate the Neq plot along the protein sequence
Parameters: - fname (str) – The path to the file to write in
- neq_array (numpy array) – an array containing the neq value associated to the residue number
- idx_first_residue (int) – the index of the first residue in the array
- residue_min (int) – the lower bound of the protein sequence
- residue_max (int) – the upper bound of the protein sequence
-
pbxplore.analysis.
generate_weblogo
(fname, count_mat, idx_first_residue=1, residue_min=1, residue_max=None, title='')[source]¶ Generates logo representation of PBs frequency along protein sequence through the weblogo library.
The weblogo reference: G. E. Crooks, G. Hon, J.-M. Chandonia, and S. E. Brenner. ‘WebLogo: A Sequence Logo Generator.’ Genome Research 14:1188–90 (2004) doi:10.1101/gr.849004. http://weblogo.threeplusone.com/
Parameters: - fname (str) – The path to the file to write in
- count_mat (numpy array) – an occurence matrix returned by count_matrix.
- idx_first_residue (int) – the index of the first residue in the matrix
- residue_min (int) – the lower bound of residue frame
- residue_max (int) – the upper bound of residue frame
- title (str) – the title of the weblogo. Default is empty.
Utils¶
-
pbxplore.analysis.
substitution_score
(substitution_matrix, seqA, seqB)[source]¶ Compute the substitution score to go from
seqA
toseqB
Both sequences must have the same length.
The score is either expressed as a similarity or a distance depending on the substitution matrix.
-
pbxplore.analysis.
compute_freq_matrix
(count_mat)[source]¶ Compute a PB frequency matrix from an occurence matrix.
The frequency matrix has one row per sequence, and one column per block. The columns are ordered in as
pbxplore.PB.NAMES
.Parameters: count_mat (numpy array) – an occurence matrix returned by count_matrix
.Returns: freq – The frequency matrix Return type: numpy array
-
pbxplore.analysis.
compute_score_by_position
(score_mat, seq1, seq2)[source]¶ Computes substitution score between two sequences position per position
The substitution score can represent a similarity or a distance depending on the score matrix provided. The score matrix should be provided as a 2D numpy array with
score[i, j]
the score to swich the PB at the i-th position inpbxplore.PB.NAMES
to the PB at the j-th position inpbxplore.PB.NAMES
.The function returns the result as a list of substitution scores to go from
seq1
toseq2
for each position. Both sequences must have the same length.Note
The score to move from or to a Z block (dummy block) is always 0.
Raises: pbxplore.PB.InvalidBlockError – encountered an unexpected PB
Citation¶
If you use PBxplore, please cite this tool as:
Contact & Support¶
PBxplore is a research software and has been developped by:
- Pierre Poulain, “Mitochondria, Metals and Oxidative Stress” group, Institut Jacques Monod, UMR 7592, Univ. Paris Diderot, CNRS, Sorbonne Paris Cité, France.
- Jonathan Barnoud, University of Groningen, Groningen, The Netherlands.
- Hubert Santuz, Laboratoire de Biochimie Théorique, CNRS UPR 9080, Institut de Biologie Physico-Chimique, Paris, France.
- Alexandre G. de Brevern, DSIMB, INSERM, U 1134, INTS, Paris, France.
If you want to report a bug or request a feature, please use the GitHub issue system.
Licence¶
PBxplore is licensed under The MIT License.
[1] |
|