Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EVA-3386: Check Ensembl rapid release for supported assemblies #45

Merged
merged 5 commits into from
Sep 22, 2023

Conversation

apriltuesday
Copy link
Contributor

No description provided.

@apriltuesday apriltuesday marked this pull request as ready for review September 15, 2023 14:59
@apriltuesday apriltuesday self-assigned this Sep 15, 2023
@@ -29,3 +31,39 @@ def get_supported_asm_from_ensembl(tax_id: int) -> str:
if assembly_accession_attribute in response:
return str(response.get(assembly_accession_attribute))
return None


@lru_cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this works for cases where the function has no arguments and Python can't compare the input with a previous input... does Python cache the result unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, it's kind of a lazy shorthand for "just call this method once"... I added maxsize=None as I read somewhere it performs a bit better for this use case.

continue
current_assembly, current_date = results[tax_id]
# Keep the more recent assembly, or the lexicographically first one if release dates are equal
if current_date < release_date or (current_date == release_date and asm_accession < current_assembly):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Guess you have already run an experiment to see if there are cases where release date for an older assembly is more recent than the release date for a newer assembly for some reason (an update/correction to the older assembly perhaps?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not actually checked this, do you mean comparing against dates in NCBI for example? I'm not sure it's worth doing since this is intended as a short-term measure till Ensembl makes the API public, but we could perhaps ask Ensembl if the scenario you proposed is plausible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @tcezard As discussed, this does occasionally occur. Here's an example where the later patch has an earlier release date:

Screenshot from 2023-09-21 14-53-42

And an example where two patch versions have the same release date:

Screenshot from 2023-09-22 10-17-02

I modified the lexicographical ordering to choose the last rather than the first to account for the second case.

Sundar suggested we could also look at the geneset attribute, but I think at this point it would be preferable to ask Ensembl for clarification and make any necessary changes in a subsequent PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense...

if current_date < release_date or (current_date == release_date and asm_accession < current_assembly):
results[tax_id] = (asm_accession, release_date)

return {key: val[0] for key, val in results.items()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I am wondering if there is value in storing the release dates somewhere as well for our reference...

@apriltuesday apriltuesday merged commit ca03de6 into EBIvariation:master Sep 22, 2023
4 checks passed
@apriltuesday apriltuesday deleted the EVA-3386 branch September 22, 2023 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants