
Update HybridCompatibilityMixin to handle RESTCapableHybridTopologyFactory and add repex self-consistency tests #992

Merged (14 commits) on Jun 2, 2022

Conversation

zhang-ivy
Contributor

Description

  • Updated HybridCompatibilityMixin to automatically detect which type of factory (HybridTopologyFactory or RESTCapableHybridTopologyFactory) was passed in and to create a sampler appropriate for that factory type (a rough sketch of this dispatch follows the list below).
  • Added self-consistency tests for dipeptide systems (running repex with the new RESTCapableHybridTopologyFactory):
    • Neutral mutation: ALA->THR vs THR->ALA
    • Charge changing: ARG->ALA->ARG vs LYS->ALA->LYS (see docstring of test for explanation on why these particular transformations are necessary)
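
A rough sketch of the dispatch described in the first bullet (illustrative only; the helper names are hypothetical, the actual logic in HybridCompatibilityMixin may differ, and it assumes both factory classes are importable from perses.annihilation.relative):

```python
# Illustrative sketch only -- the helper functions below are hypothetical, not the real perses API.
from perses.annihilation.relative import (
    HybridTopologyFactory,
    RESTCapableHybridTopologyFactory,
)

def _setup_sampler_for_factory(factory, **sampler_kwargs):
    """Dispatch sampler setup on the type of hybrid factory that was passed in."""
    if isinstance(factory, RESTCapableHybridTopologyFactory):
        # REST-capable factories need the REST-aware alchemical states/protocol
        return _setup_rest_capable_sampler(factory, **sampler_kwargs)  # hypothetical helper
    elif isinstance(factory, HybridTopologyFactory):
        return _setup_standard_sampler(factory, **sampler_kwargs)      # hypothetical helper
    raise ValueError(f"Unsupported hybrid factory type: {type(factory).__name__}")
```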

TODO:

  • I put the tests in test_samplers.py -- not sure if this is the right place -- may want to move it somewhere else and/or make it an example?
  • Not sure if the tests in test_samplers.py are even working -- should we delete everything else in here?
  • n_iterations should be bumped up to 1000 and it should be run on a gpu (@mikemhenry : Can you help with this?)

Motivation and context

Resolves #984

How has this been tested?

Change log


@zhang-ivy zhang-ivy changed the title from "Update repex sampler and add tests" to "Update HybridCompatibilityMixin to handle RESTCapableHybridTopologyFactory and add repex self-consistency tests" on May 3, 2022
@mikemhenry
Contributor

Could we run a few iterations on the CPU but skip the validation checks, and run the 1000 iterations on the GPU? An advantage of doing just a few iterations on the CPU is that it will surface API and other runtime errors faster.
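
One shape this split could take (a hypothetical sketch; `run_repex_self_consistency` is an illustrative placeholder, not a function from this PR):

```python
# Hypothetical sketch of the CPU-smoke / GPU-validation split suggested above.
from openmm import Platform

def _gpu_available():
    """Return True if a CUDA platform is available to OpenMM."""
    names = [Platform.getPlatform(i).getName() for i in range(Platform.getNumPlatforms())]
    return "CUDA" in names

def test_repex_self_consistency():
    on_gpu = _gpu_available()
    # A handful of CPU iterations is enough to surface API/runtime errors quickly
    n_iterations = 1000 if on_gpu else 3
    discrepancy = run_repex_self_consistency(n_iterations)  # hypothetical helper
    if on_gpu:
        # Only enforce convergence on the long GPU run
        assert abs(discrepancy) < 1.0
```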

@zhang-ivy
Contributor Author

zhang-ivy commented May 4, 2022 via email

Contributor

@ijpulidos ijpulidos left a comment

Looks good! Just a few comments.

On the other hand, in my opinion these tests should be examples, since we are already testing the examples in test_examples.py. There's probably a bit of tweaking to be done in order to get this running in the new GPU CI, since the examples predate it. These tests are not really testing "units" but rather complete workflows/simulations. I can make the changes to convert them into examples; we also need examples that use repex.

zhang-ivy and others added 2 commits May 6, 2022 11:10
Co-authored-by: Iván Pulido <ivanpulido@protonmail.com>
Co-authored-by: Iván Pulido <ivanpulido@protonmail.com>
@jchodera
Member

jchodera commented May 8, 2022

Can this make it into the 0.10.0 release, or will it have to wait until 0.11.0?

@zhang-ivy
Contributor Author

@jchodera : This can probably make it into 0.10.0, we just need to decide whether we want to keep the tests I wrote as tests or convert them to examples. Was thinking we could discuss this tomorrow at the dev sync.

After we make a decision, we might need @mikemhenry to help specify which tests should be run on cpu vs gpu.

@zhang-ivy
Contributor Author

I should also note that the new tests I added in this PR are the reason the GHA workflow has failures: the tests only run 2 cycles of repex, which is surely not enough to get the free energies to be equal and opposite. We'll need to move this test to the GPU to get the free energies to match.

@zhang-ivy
Contributor Author

Notes from dev sync:

  • We need to determine the smallest number of replicas and iterations necessary to get the repex examples to converge.
  • Make this an example (not a test), mark it as GPU-only, and run it once a week on AWS (or Lilac?) to catch potential regressions.

mikemhenry and others added 2 commits May 23, 2022 12:32
Co-authored-by: Iván Pulido <ivanpulido@protonmail.com>
@ijpulidos
Contributor

I keep getting the following error traceback when running the counterion mutation test using 4 or more iterations (2 work):

Traceback (most recent call last):
  File "/lila/data/chodera/pulidoi/sandbox/perses-992/test_REST_counterion_mutation.py", line 79, in <module>
    f_ij, df_ij = analyzer.get_free_energy()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1932, in get_free_energy
    self._compute_free_energy()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1892, in _compute_free_energy
    (Deltaf_ij, dDeltaf_ij) = self.mbar.getFreeEnergyDifferences()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/mbar.py", line 541, in getFreeEnergyDifferences
    Theta_ij = self._computeAsymptoticCovarianceMatrix(
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/mbar.py", line 1679, in _computeAsymptoticCovarianceMatrix
    check_w_normalized(W, N_k)
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/utils.py", line 358, in check_w_normalized
    raise ParameterError(
pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1.  Actual column sum for state 0 was 0.635215. 12 other columns have similar problems

@zhang-ivy
Contributor Author

zhang-ivy commented May 25, 2022 via email

@zhang-ivy
Contributor Author

@ijpulidos : This is the pymbar issue related to your error: choderalab/pymbar#419

@zhang-ivy
Contributor Author

zhang-ivy commented May 25, 2022

@ijpulidos : Actually, which version are you using? 3.0.6+ should be fine (i.e. you don't necessarily need to install from master), 3.0.5 is broken.
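
For reference, a runtime guard against the broken release could look something like this (a sketch, not something added in this PR):

```python
# Sketch only: fail fast if the broken pymbar 3.0.5 release is installed.
from packaging.version import Version
import pymbar

if Version(pymbar.__version__) == Version("3.0.5"):
    raise ImportError(
        "pymbar 3.0.5 has a known MBAR normalization bug (choderalab/pymbar#419); "
        "please install 3.0.6 or later."
    )
```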

@ijpulidos
Contributor

@zhang-ivy yes, I was using 3.0.5, I can confirm 3.0.6+ works fine. Thanks!

I checked the behavior of the free energy with the number of iterations. I don't see any clear trend toward convergence. The results are as follows:

# neutral mutation
n_iterations, energy_diff
2,      4.18895
4,      0.990343
8,      -0.00456955
16,     -5.50846
32,     -0.54498
64,      -1.93432
128,     0.827766
256,     0.176885
512,     0.0994963  # took ~3 hours

# counterion mutation
n_iterations, energy_diff
2,      4.79188
4,      6.45721
8,      0.626254
16,     1.82293
32,     -1.89355
64,     1.7545
128,    -2.88888   # took ~1.5 hours

None of them converged (they all resulted in an AssertionError as per the test). The walltime for the longest ones are specified.

@zhang-ivy
Contributor Author

zhang-ivy commented May 25, 2022

@ijpulidos : I think the n_iterations is too small, can you try n_iterations=250, 500, 1000, 1500, 2000? You basically already did the first 2 for the neutral mutation, so no need to repeat those.

@ijpulidos
Contributor

@zhang-ivy Sure, I can try. I'm just wondering if we are okay with tests running for more than 6 hours in the GPU CI. I guess it's not much of an issue if these are running overnight or so.

@zhang-ivy
Contributor Author

@ijpulidos : Yes, I'm not sure either. We can discuss more (with Mike, too) once we determine what the minimal n_iterations is.

@mikemhenry
Contributor

Yeah, let's first figure out what the minimal amount of time we can spend is, then we can figure out what we want to do with that. It may be something we check before a release, or once a month, or something. A p2.xl is something like 90 cents an hour, so even if it takes 10 hours, that isn't much per month or week if we decide we need it.

@ijpulidos
Contributor

ijpulidos commented May 26, 2022

The most recent results are as follows (I'm including previous results as well just in case we spot some pattern or something).

# neutral mutation
n_iterations, energy_diff
2,      4.18895
4,      0.990343
8,      -0.00456955
16,     -5.50846
32,     -0.54498
64,      -1.93432
128,     0.827766
256,     0.176885
512,     0.0994963  # took ~3 hours
1024,   -0.199971
1512,   -0.026376  # took ~13 hours in A40

# counterion mutation
n_iterations, energy_diff
2,      4.79188
4,      6.45721
8,      0.626254
16,     1.82293
32,     -1.89355
64,     1.7545
128,    -2.88888   # took ~1.5 hours
512,    1.87339
1024,   -1.04253
1512,   -1.37579
2048,   0.446562
4096,   -0.247799
5000,   -0.436782  # took ~17 hours in A100

These are still not converged in terms of np.isclose, which, as far as I can see, defaults to checking agreement to around 5 significant figures. I wonder if we are expecting too much and whether we should change our convergence criteria. As far as I can see, it also depends on the system: the counterion one is one order of magnitude further apart than the neutral system. Thoughts?
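
For context, np.isclose defaults to rtol=1e-05 and atol=1e-08, so free energies of the magnitude reported in this thread (~50 kT) would need to agree to roughly five significant figures to pass:

```python
import numpy as np

# With the defaults (rtol=1e-05, atol=1e-08), ~50 kT values must agree to within ~5e-4 kT.
np.isclose(-51.8416, -51.8153)             # False under the default tolerances
np.isclose(-51.8416, -51.8153, atol=0.5)   # True with a 0.5 kT absolute tolerance
```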

@ijpulidos
Contributor

ijpulidos commented May 26, 2022

As far as I can see, it also depends on the system, the counterion one is one order of magnitude further apart compared to the neutral system. Thoughts?

Actually, never mind this part: if we check the significant figures of the actual values (not the difference), they are similar for both systems. For example:

# neutral system
ALA-THR is -51.84165478283243. THR-ALA is -51.81527874133261
# counterion system
ARG-ALA-ARG is -277.7522884618811. LYS-ALA-LYS is -277.31550615053516

@zhang-ivy
Contributor Author

zhang-ivy commented May 26, 2022

I wonder if we are expecting too much and whether we should change our convergence criteria.

Yes, I think the convergence criterion is too stringent. Let's just check whether the difference between the forward and reverse free energies is < 1 kT, or perhaps < 0.5 kT if we want to be a bit more stringent.

Seems like for the neutral mutations, 250 iterations should be sufficient. For the charge-changing ones, it seems like 2000 should be good.
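
A minimal sketch of that relaxed check (names are illustrative, and it assumes a sign convention in which a perfectly converged forward/reverse pair cancels exactly):

```python
def assert_forward_reverse_consistent(dg_forward, dg_reverse, tolerance=1.0):
    """Assert that the forward and reverse estimates (in kT) cancel to within `tolerance` kT.

    Sketch only -- assumes signs such that dg_forward == -dg_reverse at perfect convergence.
    """
    discrepancy = abs(dg_forward + dg_reverse)
    assert discrepancy < tolerance, (
        f"forward/reverse discrepancy {discrepancy:.3f} kT exceeds {tolerance} kT"
    )
```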

@ijpulidos
Contributor

Ok, so I think we should leave these as tests (now I think they actually are tests, not examples -- sorry for my confusion before). I'll be pushing the changes soon.

@mikemhenry I'm wondering if we should use a new pytest mark for these kinds of tests that could potentially take hours/days, and maybe have yet another CI workflow that runs these specially marked tests every once in a while? Ideas on how we could accomplish this are appreciated.
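
One possible mechanism (illustrative only; the marker names and selection expressions are not decided in this PR):

```python
# Illustrative only: a dedicated marker lets a scheduled CI job opt in to the
# multi-hour repex tests while ordinary runs deselect them.
import pytest

@pytest.mark.gpu_needed   # hypothetical marker, registered in setup.cfg / pytest.ini
@pytest.mark.slow         # hypothetical marker for multi-hour runs
def test_repex_charge_mutation_self_consistency():
    ...

# Regular CI could run:        pytest -m "not slow"
# A weekly GPU job could run:  pytest -m "slow and gpu_needed"
```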

@codecov

codecov bot commented May 27, 2022

Codecov Report

Merging #992 (af2862c) into main (ef35201) will decrease coverage by 0.24%.
The diff coverage is 15.59%.

@mikemhenry
Contributor

@ijpulidos How about we merge this in and see how long it takes on AWS? How long does it take if we only care about <1kT?

@ijpulidos
Contributor

@ijpulidos How about we merge this in and see how long it takes on AWS? How long does it take if we only care about <1kT?

So there are two tests. The first one might run in a reasonable time; the second one, I suspect, is going to take more than 24 hours to run on the current AWS instance we are using. I guess we can just try and see -- I'm going to make the changes for this to be run (it's currently explicitly skipped with pytest). What's a good limit for this? My assumption is that anything below 24 hours should be okay, since we are running the GPU CI every 24 hours as well, but I don't know if there are other limits (GitHub/AWS limiting it).

@ijpulidos ijpulidos self-requested a review June 1, 2022 00:10
@mikemhenry
Contributor

On a self-hosted runner I think it's like a 30-day timeout 🤣 Let's do it and see what happens.

@zhang-ivy
Contributor Author

zhang-ivy commented Jun 1, 2022

@ijpulidos : I just reviewed your changes and I don't think we should call the file test_topology_factories.py -- we aren't actually testing the factory here; we do that in test_relative.py, which contains the energy validation tests for the factories. I think we should leave it as test_samplers.py, since we are checking to make sure the sampler mixin HybridCompatibilityMixin is working properly.

Or, perhaps an even better name for the test file might be test_repex_convergence.py or test_repex_self_consistency.py

PS: We'll want to put the small molecule repex test in this same file as well, see: #959

@ijpulidos
Contributor

@zhang-ivy Yes, I agree. I think it makes sense to have a test_repex.py module or similar with all the repex tests (these ones at least; maybe we'll migrate/create others with time), and similarly for others like sams, neq-switching, etc. I'll make the changes.

@zhang-ivy
Contributor Author

@ijpulidos : Tests are passing -- as long as you specified the right flags for running the tests on GPU, I think this should be good to merge!

Contributor

@mikemhenry mikemhenry left a comment

LGTM

Labels: priority: medium