Update HybridCompatibilityMixin to handle RESTCapableHybridTopologyFactory and add repex self-consistency tests #992
Conversation
Could we run a few iterations on the CPU but skip the validation checks, and run the 1000 iterations on the GPU? An advantage of just doing a few iterations on the CPU is it will help find API/other runtime errors faster.
Yes that sounds good. Sorry I wasn’t clear, that’s actually what I meant.
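A sketch of what that split could look like in pytest (the `GPU_CI` environment variable and the `run_repex`/`check_self_consistency` helpers are hypothetical placeholders, not perses API):

```python
# Sketch: run a cheap CPU smoke test everywhere to catch API/runtime errors
# fast, and only run the full validated 1000-iteration test on the GPU CI.
# GPU_CI, run_repex, and check_self_consistency are illustrative assumptions.
import os
import pytest

ON_GPU_CI = os.environ.get("GPU_CI", "false").lower() == "true"

@pytest.mark.parametrize(
    "n_iterations, validate",
    [
        pytest.param(2, False, id="cpu-smoke"),
        pytest.param(
            1000,
            True,
            id="gpu-full",
            marks=pytest.mark.skipif(not ON_GPU_CI, reason="needs GPU CI"),
        ),
    ],
)
def test_repex_self_consistency(n_iterations, validate):
    free_energies = run_repex(n_iterations=n_iterations)  # hypothetical helper
    if validate:  # skip the convergence checks on the short CPU run
        check_self_consistency(free_energies)  # hypothetical helper
```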
Looks good! Just a few comments.
On the other hand, in my opinion these tests should be examples, and we are already testing the examples in test_examples.py. There's probably a bit of tweaking to be done in order to get this running in the new GPU CI, since the examples predate it. These tests are not really testing "units" but rather complete workflows/simulations. I can make the changes for these to be converted into examples; we also need examples with repex.
Can this make it into the 0.10.0 release, or will it have to wait until 0.11.0?
@jchodera : This can probably make it into 0.10.0. After we make a decision, we might need @mikemhenry to help specify which tests should be run on CPU vs. GPU.
I should also note that the new tests I added in this PR are the reason why the GHA workflow has failures. This is because the tests are only running 2 cycles of repex, which is surely not enough to get the free energies to be equal and opposite. We'll need to move this test to the GPU to get the free energies to match.
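For context, the shape of that check is roughly the following (a sketch: `MultiStateSamplerAnalyzer.get_free_energy()` is the openmmtools call visible in the traceback further down, but the reporter filenames are illustrative assumptions):

```python
# Sketch of the repex self-consistency idea: the forward and reverse
# transformations should yield equal-and-opposite free energies, which
# only holds once enough repex iterations have been collected.
from openmmtools.multistate import MultiStateReporter, MultiStateSamplerAnalyzer

def end_state_free_energy(storage_path):
    """Free energy (in kT) between the two end states of one leg."""
    reporter = MultiStateReporter(storage_path, open_mode="r")
    f_ij, df_ij = MultiStateSamplerAnalyzer(reporter).get_free_energy()
    return f_ij[0, -1]

# Filenames are illustrative assumptions.
f_forward = end_state_free_energy("forward.nc")
f_reverse = end_state_free_energy("reverse.nc")
print(f"forward + reverse = {f_forward + f_reverse:.3f} kT (should be ~0)")
```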
Notes from dev sync:
I keep getting the following error traceback when running the counterion mutation test using 4 or more iterations (2 work):

```
Traceback (most recent call last):
  File "/lila/data/chodera/pulidoi/sandbox/perses-992/test_REST_counterion_mutation.py", line 79, in <module>
    f_ij, df_ij = analyzer.get_free_energy()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1932, in get_free_energy
    self._compute_free_energy()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/openmmtools/multistate/multistateanalyzer.py", line 1892, in _compute_free_energy
    (Deltaf_ij, dDeltaf_ij) = self.mbar.getFreeEnergyDifferences()
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/mbar.py", line 541, in getFreeEnergyDifferences
    Theta_ij = self._computeAsymptoticCovarianceMatrix(
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/mbar.py", line 1679, in _computeAsymptoticCovarianceMatrix
    check_w_normalized(W, N_k)
  File "/home/pulidoi/miniconda3/envs/perses-dev/lib/python3.9/site-packages/pymbar/utils.py", line 358, in check_w_normalized
    raise ParameterError(
pymbar.utils.ParameterError: Warning: Should have \sum_n W_nk = 1. Actual column sum for state 0 was 0.635215. 12 other columns have similar problems
```
Ah yes, that's an issue with your version of pymbar. Make sure you install from master.
@ijpulidos : This is the pymbar issue related to your error: choderalab/pymbar#419
@ijpulidos : Actually, which version are you using? 3.0.6+ should be fine (i.e., you don't necessarily need to install from master); 3.0.5 is broken.
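If upgrading can't be assumed, a guard along these lines would make the failure mode explicit (a sketch; it assumes `pymbar.__version__` is populated):

```python
# Sketch: skip repex analysis tests on the broken pymbar 3.0.5
# (see choderalab/pymbar#419); 3.0.6+ works.
import pymbar
import pytest
from packaging.version import Version

requires_fixed_pymbar = pytest.mark.skipif(
    Version(pymbar.__version__) < Version("3.0.6"),
    reason="pymbar >= 3.0.6 needed; 3.0.5 fails in check_w_normalized",
)
```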
@zhang-ivy Yes, I was using 3.0.5; I can confirm 3.0.6+ works fine. Thanks! I checked the behavior of the free energy with the number of iterations. I don't see any tendency or convergence. The results are as follows:
None of them converged (they all resulted in an error).
@ijpulidos : I think the
@zhang-ivy Sure, I can try. I'm just wondering if we are okay with tests running for more than 6 hours in the GPU CI. I guess it's not much of an issue if these are running overnight or so.
@ijpulidos : Yes, I'm not sure either. We can discuss more (with Mike, too) once we determine what the minimal n_iterations is.
Yeah, let's first figure out what the minimal amount of time we can spend is; then we can figure out what we want to do with that. It may be something we check before a release, or once a month, or something. A p2.xlarge is something like 90 cents an hour, so even if it takes 10 hours, that's only about $9 per run, which isn't much per month or week if we decide we need it.
The most recent results are as follows (I'm including previous results as well just in case we spot some pattern or something).
These are still not converged in terms of
Actually, never mind this part; if we check the significant figures in the actual values (not the difference), it is similar for both systems. For example:
Yes, I think the convergence criterion is too stringent. Let's just check whether the difference in the forward and reverse free energies is < 1 kT, or perhaps 0.5 kT if we want to be a bit more stringent. It seems like for the neutral mutations, 250 iterations should be sufficient. For the charge-changing mutations, it seems like 2000 should be good.
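As a test, that relaxed criterion is a one-line assertion (a sketch; the function and variable names are hypothetical):

```python
# Sketch of the relaxed convergence criterion discussed above: the
# forward and reverse free energies should cancel to within ~1 kT
# (or 0.5 kT if we want to be stricter). All values in units of kT.
def check_self_consistency(f_forward, f_reverse, tolerance=1.0):
    discrepancy = abs(f_forward + f_reverse)
    assert discrepancy < tolerance, (
        f"forward/reverse free energies differ by {discrepancy:.2f} kT "
        f"(tolerance {tolerance} kT)"
    )
```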
OK, so I think we leave these as tests (now I think they are actually tests, not examples; sorry for my confusion before). I'll be pushing the changes soon. @mikemhenry I'm wondering if we use a new
@ijpulidos How about we merge this in and see how long it takes on AWS? How long does it take if we only care about < 1 kT?
So there are two tests. The first one might run in a reasonable time; the second one, I suspect, is going to take more than 24 hours to run on the current AWS instance we are using. I guess we can just try and see; I'm going to make the changes for this to be run (it's now explicitly skipped with pytest). What's a good limit for this? My assumption is that anything below 24 hours should be okay, since we are running the GPU CI every 24 hours as well, but I don't know if there are other limits (GitHub/AWS limiting it).
On a self-hosted runner I think it's like a 30-day timeout 🤣 Let's do it and see what happens.
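One way to wire up the explicit skip so the slow tests only run on the self-hosted runner (a sketch; the `gpu_needed` marker name, the `-m` selection, and the `run_repex` helper are assumptions about the CI configuration, not the project's actual setup):

```python
# Sketch: mark the hours-long repex tests so the default CPU CI deselects
# them (pytest -m "not gpu_needed") while the self-hosted GPU runner opts
# in (pytest -m gpu_needed). Marker and helper names are assumptions.
import pytest

@pytest.mark.gpu_needed
def test_repex_charge_changing_mutation():
    run_repex(n_iterations=2000)  # hypothetical helper; hours on a GPU
```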
@ijpulidos : I just reviewed your changes and I don't think we should call the file
Or, perhaps an even better name for the test file might be
PS: We'll want to put the small molecule repex test in this same file as well, see: #959
@zhang-ivy Yes, I agree. I think it makes sense to have a
@ijpulidos : Tests are passing -- as long as you specified the right flags for running the tests on GPU, I think this should be good to merge!
LGTM
Description
Updates `HybridCompatibilityMixin` to automatically detect which type of factory (`HybridTopologyFactory` or `RESTCapableHybridTopologyFactory`) was inputted and create a sampler depending on the factory type. Also adds repex self-consistency tests (for `RESTCapableHybridTopologyFactory`).
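The detection described above amounts to an `isinstance` dispatch, roughly like this (a sketch; the function name and import path are illustrative assumptions, though both factory class names come from this PR):

```python
# Sketch of factory-type detection: check the more specific REST-capable
# class first in case it subclasses the plain hybrid factory.
from perses.annihilation.relative import (
    HybridTopologyFactory,
    RESTCapableHybridTopologyFactory,
)

def sampler_kind(factory):
    """Return which sampler setup path to use for a given hybrid factory."""
    if isinstance(factory, RESTCapableHybridTopologyFactory):
        return "rest-capable"
    if isinstance(factory, HybridTopologyFactory):
        return "vanilla"
    raise ValueError(f"Unsupported factory type: {type(factory).__name__}")
```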
TODO:
- The new tests live in `test_samplers.py` -- not sure if this is the right place -- may want to move them somewhere else and/or make them an example?
- Not sure whether the other tests in `test_samplers.py` are even working -- should we delete everything else in here?
- `n_iterations` should be bumped up to 1000 and it should be run on a GPU (@mikemhenry: Can you help with this?)

Motivation and context
Resolves #984
How has this been tested?
Change log