Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a list of oemols for PointMutationExecutor input instead of files #746

Merged
merged 6 commits into from
Oct 27, 2020

Conversation

glass-w
Copy link
Contributor

@glass-w glass-w commented Oct 26, 2020

This PR adapts the input of PointMutationExecutor to allow for a list of oemol objects to be used instead. Both @schallerdavid and I were having difficulty parsing ligand .sdf files containing a nitrogen that was perceived as needing to be tetrahedral at this line in perses.utils.openeye. This PR allows the user to bypass internal oemol preparation.

Changed:

  • PointMutationExecutor now takes inputs: .sdf, .pdb, [oemol].
  • ligand_file renamed to ligand_input to be more general.

An example script of how to prepare the [oemol] object is below, input names changed from my use case:

from perses.app.relative_point_mutation_setup import PointMutationExecutor
from perses.utils.openeye import generate_unique_atom_names
import oechem
from simtk import unit

ifs = oechem.oemolistream()
ifs.open('./ligand.sdf')
mol_list = [oechem.OEMol(mol) for mol in ifs.GetOEMols()]
mol = mol_list[0]

if len([atom.GetName() for atom in mol.GetAtoms()]) > len(set([atom.GetName() for atom in mol.GetAtoms()])):
        mol = generate_unique_atom_names(mol)
oechem.OEAssignAromaticFlags(mol, oechem.OEAroModelOpenEye)
oechem.OEAssignHybridization(mol)
oechem.OEAddExplicitHydrogens(mol)
oechem.OEPerceiveChiral(mol)

mol_list = [mol]
protein_file = './protein.pdb'

x = PointMutationExecutor(
protein_file,
'1',
'510', # an example residue number
'VAL', # an example mutation
ligand_input=mol_list,
ionic_strength=0.15 * unit.molar,
small_molecule_forcefields='openff-1.2.0'
)

@codecov
Copy link

codecov bot commented Oct 26, 2020

Codecov Report

Merging #746 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Copy link
Contributor

@zhang-ivy zhang-ivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominicrufa can you check my comments below to make sure they are correct? after that, @glass-w can make the changes

ligand_n_atoms = ligand_md_topology.n_atoms

elif isinstance(ligand_input, list): # list of oemol object(s)
ligand_mol = ligand_input[0] # assume only one molecule in the list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're assuming there is only molecule in the list, let's just allow the options for ligand_input to be a .pdb, .sdf or oemol object (passing a list thats always one elements feels unnecessary). You should be able to check if the ligand_input is an instance of oechem.OEMol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know if reading a small molecule as a .pdb is safe, come to think of it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the small molecule should not be passed as a pdb. if ligand_input is a protein, it should be passed as a pdb. this will be noted in function description (it's in one of my suggestions in this PR)

ligand_n_atoms = ligand_md_topology.n_atoms

elif isinstance(ligand_input, list): # list of oemol object(s)
ligand_mol = ligand_input[0] # assume only one molecule in the list
molecules.append(Molecule.from_openeye(ligand_mol, allow_undefined_stereo=False))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ligand_mol = generate_unique_atom_names(ligand_mol) is necessary to include before this line.. did you see any errors when you ran with this version of PointMutationExecutor?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait, i just noticed generate_unique_atom_names is redundant.
it's called here:

molecule = generate_unique_atom_names(molecule)

so it doesn't need to be called again after createMolFromSDF

@@ -116,7 +116,7 @@ def __init__(self,
if phase == vacuum, then the complex will not be solvated with water; else, it will be solvated with tip3p
conduct_endstate_validation : bool, default True
whether to conduct an endstate validation of the hybrid topology factory
ligand_file : str, default None
ligand_input : str or list of oemol objects, default None
path to ligand of interest (i.e. small molecule or protein); .sdf or .pdb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this to:

path to ligand of interest: .pdb for protein and .sdf or oemol for small molecule

if isinstance(ligand_input, str):
if ligand_input.endswith('.sdf'): # small molecule
ligand_mol = createOEMolFromSDF(ligand_input, index=ligand_index)
ligand_mol = generate_unique_atom_names(ligand_mol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is redundant, as `generate_unique_atom_names is called here:

molecule = generate_unique_atom_names(molecule)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominicrufa can we remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. remove in the relative_point_mutation_setup.py, you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes in relative_point_mutation_setup.py

@zhang-ivy
Copy link
Contributor

@glass-w I noticed your script above that generates the oemol does not include the functionality covered by these lines in createOEMolFromSDF() ... Did you get any errors when running the above script? if not, it may be fine to not include the lines I linked (as long as when you inspect the old and new pdbs from the htf, the small molecule geometries look ok). but it is probably best practice to incorporate the lines i linked above in your script for generating the oemol (you'll just have to find a workaround that checks that carbons have tetrahedral stereochemistry and nitrogens are pyramidal)

@jchodera
Copy link
Member

This PR adapts the input of PointMutationExecutor to allow for a list of oemol objects to be used instead.

Instead of OEMols, can we use openforcefield.topology.Molecule objects?
These will be much more portable in the future.

Internally, we can use molecule.to_oemol() whenever we need an OEMol().

@dominicrufa
Copy link
Contributor

This PR adapts the input of PointMutationExecutor to allow for a list of oemol objects to be used instead.

Instead of OEMols, can we use openforcefield.topology.Molecule objects?
These will be much more portable in the future.

Internally, we can use molecule.to_oemol() whenever we need an OEMol().

is there anything lossy between oemols and OFF molecule objects? if not, then this is perfectly fine and can be added here.

there probably should be a test written somewhere here.

@jchodera
Copy link
Member

is there anything lossy between oemols and OFF molecule objects?

I believe some non-chemical information, such as SD tags, would be lost. But atom ordering and geometry should be preserved.

Is there any particular information you're worried about being lost, @dominicrufa? If so, we can check (or you can just ask on the OFF slack).

@glass-w
Copy link
Contributor Author

glass-w commented Oct 27, 2020

@zhang-ivy

Did you get any errors when running the above script?

I didn't get any errors when running the above script and the PointMutationExecutor successfully generated a series of htfs.

as long as when you inspect the old and new pdbs from the htf, the small molecule geometries look ok

Yes the geometries looked fine when visualising.

you'll just have to find a workaround that checks that carbons have tetrahedral stereochemistry and nitrogens are pyramidal

Yes this is a key issue when using OE3DToInternalStereo. It seems that for a subset of these ligands running OE3DToInternalStereo considers a nitrogen to be a stereocenter so then running has_undefined_stereocenters returns True. At the moment there doesn't seem to be a workaround (cc @schallerdavid).

I agree that if a user were to prepare their own OEMol object for parsing to PointMutationExecutor then they should apply both checks.

@glass-w
Copy link
Contributor Author

glass-w commented Oct 27, 2020

@jchodera

Instead of OEMols, can we use openforcefield.topology.Molecule objects?

Yes I think this would be an idea for the future. Perhaps we can raise this as an issue and generate a separate PR? Right now this change is needed for functionality with this set of ligands.

@@ -116,8 +116,8 @@ def __init__(self,
if phase == vacuum, then the complex will not be solvated with water; else, it will be solvated with tip3p
conduct_endstate_validation : bool, default True
whether to conduct an endstate validation of the hybrid topology factory
ligand_file : str, default None
path to ligand of interest (i.e. small molecule or protein); .sdf or .pdb
ligand_input : str or list of oemol objects, default None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glass-w list of oemol objects should be changed to oemol here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes good catch!

@zhang-ivy
Copy link
Contributor

@glass-w have you re-run your example script above after making the changes today? just want to make sure that changing elif isinstance(ligand_input, oechem.OEMol): doesn't break anything

@glass-w
Copy link
Contributor Author

glass-w commented Oct 27, 2020

@zhang-ivy yes I've re-run in a batch job using this functionality:

from perses.app.relative_point_mutation_setup import PointMutationExecutor
from perses.utils.openeye import generate_unique_atom_names
import oechem
from simtk import unit

ifs = oechem.oemolistream()
ifs.open('./ligand.sdf')
mol_list = [oechem.OEMol(mol) for mol in ifs.GetOEMols()]
mol = mol_list[0]

if len([atom.GetName() for atom in mol.GetAtoms()]) > len(set([atom.GetName() for atom in mol.GetAtoms()])):
        mol = generate_unique_atom_names(mol)
oechem.OEAssignAromaticFlags(mol, oechem.OEAroModelOpenEye)
oechem.OEAssignHybridization(mol)
oechem.OEAddExplicitHydrogens(mol)
oechem.OEPerceiveChiral(mol)

protein_file = './protein.pdb'

x = PointMutationExecutor(
protein_file,
'1',
'510', # an example residue number
'VAL', # an example mutation
ligand_input=mol,
ionic_strength=0.15 * unit.molar,
small_molecule_forcefields='openff-1.2.0'
)

it runs and produces all of the relvant output files. Again the above script is for my use case but for general OEMol preparation the user should use the lines you mentioned in your earlier comment.

@dominicrufa
Copy link
Contributor

is there anything lossy between oemols and OFF molecule objects?

I believe some non-chemical information, such as SD tags, would be lost. But atom ordering and geometry should be preserved.

Is there any particular information you're worried about being lost, @dominicrufa? If so, we can check (or you can just ask on the OFF slack).

i think this is sufficient

Copy link
Contributor

@dominicrufa dominicrufa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants