This library is the code and a pre-trained model to predict NMR chemical shifts from protein structures and organic molecules. It relies on the nmrdata package which includes embeddings and NMR parameters.
To install from this github repo directly:
pip install nmrgnn@git+git://github.com/ur-whitelab/nmrgnn.git
Available commands are
nmrgnn eval-struct
to predict chemical shifts of structure via MDAnalysis library as coordinate readernmrgnn train
to train a modelnmrgnn hper
to tune hyperparametersnmrgnn eval-tfrecords
to evaluate model on records in format fromnmrdata
package
Note: This model is trained on models with no solvent, so remove that before use. For small molecules, the model was trained mostly on water solutions. You should only expect agreement in relative chemical shifts between atoms depending on your solvent and reference.
To predict NMR chemical shifts via the MDAnalysis library as a reader:
nmrgnn eval-struct [struct-file] [output-csv]
where struct-file
could be a pdb file or equivalent. Example:
nmrgnn eval-struct 108M.pdb 108M-predicted.csv
Available functions are
load_model
to load the included pre-trained model or specify a path to a trained modeluniverse2graph
to convert an MDAnalysis universe into a tuple of atoms, neighbor list, edges, inverse_degree.check_peaks
to estimate validity of predicted peaks
The example below predicts peaks and estimates (True/False
) if the peaks are valid. Examples of why peaks are
not valid are that the elements are not inlcuded in training data (e.g., oxygen shifts) or unusual chemistries or
you forgot to remove solvent.
import MDAnalysis as md
import nmrgnn
model = nmrgnn.load_model()
u = md.Universe('108M.pdb')
g = nmrgnn.universe2graph(u)
peaks = model(g)
# check_peaks only uses first element of tuple (atom identities)
confident = nmrgnn.check_peaks(g[0], peaks)
You should not trust peaks coming from model without checking
Please cite Predicting Chemical Shifts with Graph Neural Networks
@article{yang2020predicting,
title = {Predicting Chemical Shifts with Graph Neural Networks},
author = {Yang, Ziyue and Chakraborty, Maghesree and White, Andrew D},
journal = {bioRxiv},
year = {2020},
publisher = {Cold Spring Harbor Laboratory}
}
Here is the included model performance on proteins (P
prefix) and organic molecules (Mol
prefix). r
is correlation coefficient and rmsd
is root mean square deviation. These results vary from paper values because they are evaluated on whole proteins instead of 256 atom fragments.
N | baseline | |
---|---|---|
Mol-H-r | 307 | 0.9591749434360993 |
Mol-H-rmsd | 307 | 0.39710393617916234 |
P-C-r | 6701 | 0.864163 |
P-H-r | 7747 | 0.72265 |
P-N-r | 7640 | 0.890842 |
P-CA-r | 8305 | 0.97374 |
P-CB-r | 6827 | 0.990706 |
P-CD-r | 739 | 0.996123 |
P-CD1-r | 961 | 0.999515 |
P-CD2-r | 609 | 0.999223 |
P-CE-r | 340 | 0.991736 |
P-CE1-r | 261 | 0.958121 |
P-CE2-r | 173 | 0.943739 |
P-CE3-r | 37 | -0.215088 |
P-CG-r | 1674 | 0.998763 |
P-CG1-r | 589 | 0.93124 |
P-CG2-r | 839 | 0.829016 |
P-CH2-r | 43 | 0.158363 |
P-CZ-r | 125 | 0.984575 |
P-CZ2-r | 45 | 0.311805 |
P-CZ3-r | 37 | 0.164961 |
P-HA-r | 5565 | 0.839377 |
P-HA2-r | 462 | 0.495514 |
P-HA3-r | 449 | 0.262298 |
P-HB-r | 960 | 0.958713 |
P-HB2-r | 3427 | 0.901358 |
P-HB3-r | 3255 | 0.901234 |
P-HD1-r | 383 | 0.44733 |
P-HD11-r | 753 | 0.615756 |
P-HD12-r | 753 | 0.585852 |
P-HD13-r | 753 | 0.609181 |
P-HD2-r | 1043 | 0.988991 |
P-HD21-r | 428 | 0.617599 |
P-HD22-r | 428 | 0.651927 |
P-HD23-r | 428 | 0.605888 |
P-HD3-r | 637 | 0.95089 |
P-HE-r | 93 | 0.396258 |
P-HE1-r | 413 | 0.879142 |
P-HE2-r | 561 | 0.98963 |
P-HE3-r | 293 | 0.985685 |
P-HG-r | 389 | 0.810401 |
P-HG1-r | 11 | 0.0653286 |
P-HG11-r | 350 | 0.572609 |
P-HG12-r | 350 | 0.498696 |
P-HG13-r | 350 | 0.558426 |
P-HG2-r | 1317 | 0.867619 |
P-HG21-r | 936 | 0.689592 |
P-HG22-r | 936 | 0.674086 |
P-HG23-r | 936 | 0.662057 |
P-HG3-r | 1200 | 0.856177 |
P-HH-r | 1 | nan |
P-HH2-r | 51 | 0.217372 |
P-HZ-r | 134 | 0.407285 |
P-HZ2-r | 54 | 0.419415 |
P-HZ3-r | 45 | 0.318577 |
P-ND1-r | 9 | 0.184443 |
P-ND2-r | 173 | 0.320299 |
P-NE-r | 88 | 0.0135033 |
P-NE1-r | 64 | 0.0998792 |
P-NE2-r | 149 | 0.972614 |
P-NH1-r | 3 | -0.914066 |
P-NH2-r | 3 | -0.276087 |
P-NZ-r | 1 | nan |
P-C-rmsd | 6701 | 1.22819 |
P-H-rmsd | 7747 | 0.279766 |
P-N-rmsd | 7640 | 6.65505 |
P-CA-rmsd | 8305 | 1.3298 |
P-CB-rmsd | 6827 | 3.10571 |
P-CD-rmsd | 739 | 10.3192 |
P-CD1-rmsd | 961 | 2.74597 |
P-CD2-rmsd | 609 | 4.35399 |
P-CE-rmsd | 340 | 1.14623 |
P-CE1-rmsd | 261 | 4.69154 |
P-CE2-rmsd | 173 | 4.82229 |
P-CE3-rmsd | 37 | 3.0327 |
P-CG-rmsd | 1674 | 1.63828 |
P-CG1-rmsd | 589 | 1.558 |
P-CG2-rmsd | 839 | 1.87753 |
P-CH2-rmsd | 43 | 1.95861 |
P-CZ-rmsd | 125 | 4.32496 |
P-CZ2-rmsd | 45 | 1.22984 |
P-CZ3-rmsd | 37 | 1.99567 |
P-HA-rmsd | 5565 | 0.0903255 |
P-HA2-rmsd | 462 | 0.119584 |
P-HA3-rmsd | 449 | 0.234069 |
P-HB-rmsd | 960 | 0.103812 |
P-HB2-rmsd | 3427 | 0.10552 |
P-HB3-rmsd | 3255 | 0.117287 |
P-HD1-rmsd | 383 | 0.114696 |
P-HD11-rmsd | 753 | 0.0699893 |
P-HD12-rmsd | 753 | 0.0744762 |
P-HD13-rmsd | 753 | 0.0711484 |
P-HD2-rmsd | 1043 | 0.105893 |
P-HD21-rmsd | 428 | 0.0737762 |
P-HD22-rmsd | 428 | 0.0689306 |
P-HD23-rmsd | 428 | 0.0764191 |
P-HD3-rmsd | 637 | 0.0869007 |
P-HE-rmsd | 93 | 0.422132 |
P-HE1-rmsd | 413 | 0.376196 |
P-HE2-rmsd | 561 | 0.0861489 |
P-HE3-rmsd | 293 | 0.0855213 |
P-HG-rmsd | 389 | 0.118694 |
P-HG1-rmsd | 11 | 10.3704 |
P-HG11-rmsd | 350 | 0.0504736 |
P-HG12-rmsd | 350 | 0.0552385 |
P-HG13-rmsd | 350 | 0.0516929 |
P-HG2-rmsd | 1317 | 0.0654069 |
P-HG21-rmsd | 936 | 0.0634577 |
P-HG22-rmsd | 936 | 0.0650697 |
P-HG23-rmsd | 936 | 0.0679991 |
P-HG3-rmsd | 1200 | 0.0775636 |
P-HH-rmsd | 1 | 4.07231 |
P-HH2-rmsd | 51 | 0.0862706 |
P-HZ-rmsd | 134 | 0.147387 |
P-HZ2-rmsd | 54 | 0.13507 |
P-HZ3-rmsd | 45 | 0.083249 |
P-ND1-rmsd | 9 | 1576.13 |
P-ND2-rmsd | 173 | 6.56618 |
P-NE-rmsd | 88 | 231.589 |
P-NE1-rmsd | 64 | 4.51713 |
P-NE2-rmsd | 149 | 13.9975 |
P-NH1-rmsd | 3 | 5.76985 |
P-NH2-rmsd | 3 | 0.91028 |
P-NZ-rmsd | 1 | 165.069 |