Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter update #67

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open

Filter update #67

wants to merge 23 commits into from

Conversation

frederik-sandfort1
Copy link
Collaborator

@frederik-sandfort1 frederik-sandfort1 commented Aug 20, 2024

Closes #66 and #61 .

Additionally implements a (jsonable) DescriptorsFilter.

Also did some code cleaning and removed unnecessary lines of code without a breaking change

Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my first round reading. Will check again tomorrow in more detail.

molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
@c-w-feldmann c-w-feldmann self-requested a review August 22, 2024 11:43
@frederik-sandfort1 frederik-sandfort1 marked this pull request as draft August 22, 2024 12:55
@frederik-sandfort1 frederik-sandfort1 marked this pull request as ready for review August 22, 2024 15:19
@frederik-sandfort1
Copy link
Collaborator Author

@c-w-feldmann @JochenSiegWork

I have one mypy issue, which I do not understand.

Also three questions:

  • should we combine smarts and smiles filter within a single class? option usesmiles?
  • should we check the input patterns for valid smarts/smiles?
  • should we apply the same logic to ElementFilter?

@c-w-feldmann

This comment was marked as resolved.

Copy link
Collaborator

@c-w-feldmann c-w-feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my head is exploding, but beside the small additions in the doc string, nothing I can complain about. @JochenSiegWork , your turn.

Copy link
Collaborator

@JochenSiegWork JochenSiegWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there :)

molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/utils/value_conversions.py Outdated Show resolved Hide resolved
if not unique_elements.issubset(self.allowed_element_numbers):
forbidden_elements = unique_elements - self.allowed_element_numbers
to_process_value = (
Chem.AddHs(value) if 1 in self.allowed_element_numbers else value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicitly modifying the molecule does not seem right. I understand that this can make sense when you are interested in exact element counts in the molecules. However, the filter should just filter molecules and not implicitly modify them before filtering. I think the most consistent way would be to let the user explicitly add hydrogens before the filter, which would result in an additional pipeline element. However, I think a compromise would also be fine by adding a flag add_hs to the ElementFilter's constructor to let the user choose.

@c-w-feldmann what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I disagree... If the Use adds the element number 1 in allowed_elements, I am pretty sure he would like to get the Hydrogens counted... The returned molecules do NOT have the added hydrogens, because this creates a new mol object

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, and I understand your point. I have 3 aspects I am unhappy with:

  • It's not explicit but implicit, and I advocate the "Explicit is Better Than Implicit" principle. However, this might also be solved with good, explicit docstrings that you can add.
  • The molecules might already have been prepared with Hydrogens added. Calling Chem.AddHs again adds additional computation for every molecule.
  • I am not sure about our use cases, but for docking, for example, the users typically prepare the protonation states of their molecules with some other tool beforehand. In which case readding hydrogen might mess with the prepared molecules. Is something similar also possible in our use cases?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, could we check, whether hydrogens are there already? Or should we alternatively log a warning here if hydrogens are requested but the count is 0 and tell people to e.g. use an AddHs Pipeline module? (not sure whether this exists?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the three of us should talk about it together. This is a more conceptual decision we need to make.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default in most cases is only implicit hydrogens, but I recognize the need that the user might want to keep the molecule as is. I would propose to add the attribute add_hydrogens, default:True, which adds Hs if Hs are counted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the attribute, included this and added a test (if set to false and a count of Hydrogens is included, I print a warning, that this will yield unexpected results (because the hydrogen number is 0 in this case)).

I wanted to test the logging (I already did via debugger), but also wanted to add a capturing of the logs. I tried this: loguru log capturing, but this did not work unfortunately...

@c-w-feldmann please, check this again

molpipeline/mol2mol/filter.py Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@frederik-sandfort1 frederik-sandfort1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved most, and some comments. Please have a look

molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
if not unique_elements.issubset(self.allowed_element_numbers):
forbidden_elements = unique_elements - self.allowed_element_numbers
to_process_value = (
Chem.AddHs(value) if 1 in self.allowed_element_numbers else value
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I disagree... If the Use adds the element number 1 in allowed_elements, I am pretty sure he would like to get the Hydrogens counted... The returned molecules do NOT have the added hydrogens, because this creates a new mol object

molpipeline/utils/value_conversions.py Outdated Show resolved Hide resolved
tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
@frederik-sandfort1
Copy link
Collaborator Author

@c-w-feldmann @JochenSiegWork I added an additional filter (ComplexFilter) which can be initiatied with multiple moltomol elements and uses the keep matches logic

I also had a look on auto2mol - I was a bit puzzled about serializability (there you also just set the elements as attributes, no get_params / set_params functionality given. Please have a look on the implementation especially regarding serializability

Copy link
Collaborator

@JochenSiegWork JochenSiegWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just two small comments

molpipeline/utils/value_conversions.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/utils/value_conversions.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
@JochenSiegWork
Copy link
Collaborator

JochenSiegWork commented Sep 12, 2024

@c-w-feldmann @JochenSiegWork

I have one mypy issue, which I do not understand.

Also three questions:

  • should we combine smarts and smiles filter within a single class? option usesmiles?

Strong yes. I think we should combine SMARTS and SMILES filter within a single class, because every SMILES is a valid SMARTS. Effectively a list of SMILES is a list of SMARTS.

  • should we check the input patterns for valid smarts/smiles?

No. But since you pre-compute the molecules for the SMARTS/SMILES now this is done automatically during the construction of the filter. So you already included this feature now.

  • should we apply the same logic to ElementFilter?

We talked about this. We can in the future because it should fit but we don't have any real use case right now. So postpone.

molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
if not unique_elements.issubset(self.allowed_element_numbers):
forbidden_elements = unique_elements - self.allowed_element_numbers
to_process_value = (
Chem.AddHs(value) if 1 in self.allowed_element_numbers else value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the three of us should talk about it together. This is a more conceptual decision we need to make.

molpipeline/mol2mol/filter.py Show resolved Hide resolved
molpipeline/utils/value_conversions.py Outdated Show resolved Hide resolved
molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
molpipeline/abstract_pipeline_elements/mol2mol/filter.py Outdated Show resolved Hide resolved
if not unique_elements.issubset(self.allowed_element_numbers):
forbidden_elements = unique_elements - self.allowed_element_numbers
to_process_value = (
Chem.AddHs(value) if 1 in self.allowed_element_numbers else value
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default in most cases is only implicit hydrogens, but I recognize the need that the user might want to keep the molecule as is. I would propose to add the attribute add_hydrogens, default:True, which adds Hs if Hs are counted.

molpipeline/mol2mol/filter.py Outdated Show resolved Hide resolved


class ComplexFilter(_BaseKeepMatchesFilter):
"""Filter to keep or remove molecules based on multiple filter elements.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fancy!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check the get and set params of the subfilters.
To ensure the correct passing of sub-parameters I am afraid that we would need to provide them as a tuple in the init:
(name, object): (lower, upper)
name: str
object: filter
lower: float| int
upper: float| int

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should quickly discuss on monday - I am not onehundred percent sure, how this actually helps

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would help during the get/set_params section, as the sklearn schema does not support giving parameters to objects based on their position in a list. I feel this will be needed to save/load the object in a json file. See comment to the test below.

filtered_smiles_2 = pipeline.fit_transform(SMILES_LIST)
self.assertEqual(filtered_smiles_2, [SMILES_BENZENE, SMILES_CHLOROBENZENE])

def test_complex_filter(self) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need a test_get_set_params_complex_filter.

I would advocate for implementing one TestClass for each FilterClass. @JochenSiegWork what do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just filled the existing structure, but would be good with one class per FilterClass...

What would you like the test_get_set_params_complex_filter to test exactly? Also, would like to point here where there is just the object included

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The critical part is if we can write it as a json file and load the the object from the file.
A nice to have would be if the parameters of the subfilters can be accessed via the get/set_params from the filter, à la filter.set_params(subfiltername__allowed_element_numbers=[...])

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the write as json file work (where is the code reference)? I do not see the direct relation of get/set_params and save as json (also, see the link in my previous comment, probably this is the same issue)

tests/test_elements/test_mol2mol/test_mol2mol_filter.py Outdated Show resolved Hide resolved
@c-w-feldmann
Copy link
Collaborator

c-w-feldmann commented Sep 30, 2024

@frederik-sandfort1

  • smiles filter within a single class? option usesmiles?

I would prefer to keep them in separate classes.

  • should we check the input patterns for valid smarts/smiles?

Yes, we should check the input, and raise an Error of the mol-object is None.

  • should we apply the same logic to ElementFilter?

What would be an invalid element? Or do you mean the same filterlogic? I think if it was possible to filer element with the code from the BaseFilter, we should do it.

@c-w-feldmann
Copy link
Collaborator

@c-w-feldmann @JochenSiegWork
I have one mypy issue, which I do not understand.
Also three questions:

  • should we combine smarts and smiles filter within a single class? option usesmiles?

Strong yes. I think we should combine SMARTS and SMILES filter within a single class, because every SMILES is a valid SMARTS. Effectively a list of SMILES is a list of SMARTS.

  • should we check the input patterns for valid smarts/smiles?

No. But since you pre-compute the molecules for the SMARTS/SMILES now this is done automatically during the construction of the filter. So you already included this feature now.

  • should we apply the same logic to ElementFilter?

We talked about this. We can in the future because it should fit but we don't have any real use case right now. So postpone.

I love that I gave completely different answers than Jochen :D
Maybe we need to sit together and have a chat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Smarts filter
3 participants