-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EVA-3386: Check Ensembl rapid release for supported assemblies #45
Conversation
@@ -29,3 +31,39 @@ def get_supported_asm_from_ensembl(tax_id: int) -> str: | |||
if assembly_accession_attribute in response: | |||
return str(response.get(assembly_accession_attribute)) | |||
return None | |||
|
|||
|
|||
@lru_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this works for cases where the function has no arguments and Python can't compare the input with a previous input... does Python cache the result unconditionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes exactly, it's kind of a lazy shorthand for "just call this method once"... I added maxsize=None
as I read somewhere it performs a bit better for this use case.
continue | ||
current_assembly, current_date = results[tax_id] | ||
# Keep the more recent assembly, or the lexicographically first one if release dates are equal | ||
if current_date < release_date or (current_date == release_date and asm_accession < current_assembly): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: Guess you have already run an experiment to see if there are cases where release date for an older assembly is more recent than the release date for a newer assembly for some reason (an update/correction to the older assembly perhaps?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not actually checked this, do you mean comparing against dates in NCBI for example? I'm not sure it's worth doing since this is intended as a short-term measure till Ensembl makes the API public, but we could perhaps ask Ensembl if the scenario you proposed is plausible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @tcezard As discussed, this does occasionally occur. Here's an example where the later patch has an earlier release date:
And an example where two patch versions have the same release date:
I modified the lexicographical ordering to choose the last rather than the first to account for the second case.
Sundar suggested we could also look at the geneset
attribute, but I think at this point it would be preferable to ask Ensembl for clarification and make any necessary changes in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense...
if current_date < release_date or (current_date == release_date and asm_accession < current_assembly): | ||
results[tax_id] = (asm_accession, release_date) | ||
|
||
return {key: val[0] for key, val in results.items()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: I am wondering if there is value in storing the release dates somewhere as well for our reference...
No description provided.