Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialized XML objects in small molecule pipeline have the wrong extension #1157

Closed
ijpulidos opened this issue Feb 10, 2023 · 3 comments · Fixed by #1163
Closed

Serialized XML objects in small molecule pipeline have the wrong extension #1157

ijpulidos opened this issue Feb 10, 2023 · 3 comments · Fixed by #1163
Assignees
Labels
priority: low low priority

Comments

@ijpulidos
Copy link
Contributor

When we serialize objects in

data.serialize(htf[phase].hybrid_system, trajectory_directory / "xml" /f"{phase}-hybrid-system.gz")
data.serialize(htf[phase]._old_system, trajectory_directory / "xml" / f"{phase}-old-system.gz")
data.serialize(htf[phase]._new_system, trajectory_directory / "xml" /f"{phase}-new-system.gz")

They are supposed to be gzipped files, but while inspecting these files one can easily tell they are directly the XML files. As in:

❯ file *
complex-hybrid-system.gz: XML 1.0 document, ASCII text
complex-new-system.gz:    XML 1.0 document, ASCII text
complex-old-system.gz:    XML 1.0 document, ASCII text
solvent-hybrid-system.gz: XML 1.0 document, ASCII text
solvent-new-system.gz:    XML 1.0 document, ASCII text
solvent-old-system.gz:    XML 1.0 document, ASCII text

Instead of the expected "gzip compressed data".

@ijpulidos ijpulidos added the priority: low low priority label Feb 10, 2023
@ijpulidos ijpulidos added this to the 0.10.2 Bugfix release milestone Feb 10, 2023
@ijpulidos ijpulidos self-assigned this Feb 10, 2023
@mikemhenry
Copy link
Contributor

Do we want to have them zipped, or do we want to keep them uncompressed?

@ijpulidos
Copy link
Contributor Author

We do want to compress them.

@mikemhenry
Copy link
Contributor

Okay this was a fun one: https://github.com/choderalab/perses/blob/main/perses/utils/data.py#L114-L127

We do save the xml with gzip, but then since there is an if instead of elif when checking for bz2, we overwrite the file with an uncompressed version when we hit the else block. I've got a PR incoming with some extra debug that will help troubleshoot issues like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: low low priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants