Investigate use of all trait names vs. preferred trait names in OT evidence generation #384

Open
apriltuesday opened this issue Jun 12, 2023 · 1 comment

@apriltuesday
Contributor

Context of issue:
Trait mapping (both automated and manual) uses only preferred names, but annotation attempts to use all names. Because we retain previous mappings even when their trait name no longer appears among the preferred names in current ClinVar, obsolete mappings are not just retained but can also be used during annotation without ever being updated.

Example - in ClinVar:

    <TraitSet Type="Disease" ID="6307">
      <Trait ID="4675" Type="Disease">
        <Name>
          <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
          <XRef ID="Bladder+cancer/7822" DB="Genetic Alliance"/>
          <XRef ID="399326009" DB="SNOMED CT"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
          <XRef ID="MONDO:0001187" DB="MONDO"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Urinary Bladder Neoplasms</ElementValue>
          <XRef ID="D001749" DB="MeSH"/>
        </Name>
        <Name>
          <ElementValue Type="Alternate">Bladder cancer</ElementValue>
        </Name>
        <AttributeSet>
          <Attribute Type="keyword">Hereditary cancer syndrome</Attribute>
        </AttributeSet>
        <XRef ID="MONDO:0001187" DB="MONDO"/>
        <XRef ID="C0005684" DB="MedGen"/>
        <XRef Type="MIM" ID="109800" DB="OMIM"/>
      </Trait>
    </TraitSet>
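
To make the preferred/alternate distinction concrete, here is a minimal sketch that pulls both sets of names out of a record like the one above using the standard library. The helper `trait_names` and the abbreviated XML string are illustrative assumptions, not the pipeline's actual parsing code.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Trait record shown above (XRefs and attributes omitted).
TRAIT_XML = """
<Trait ID="4675" Type="Disease">
  <Name>
    <ElementValue Type="Preferred">Malignant tumor of urinary bladder</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Urinary bladder cancer</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Urinary Bladder Neoplasms</ElementValue>
  </Name>
  <Name>
    <ElementValue Type="Alternate">Bladder cancer</ElementValue>
  </Name>
</Trait>
"""


def trait_names(trait, preferred_only=False):
    """Return lowercased trait names, optionally restricted to Type="Preferred".

    Hypothetical helper for illustration only.
    """
    names = []
    for value in trait.findall("./Name/ElementValue"):
        if preferred_only and value.get("Type") != "Preferred":
            continue
        names.append(value.text.strip().lower())
    return names


trait = ET.fromstring(TRAIT_XML)
preferred_names = trait_names(trait, preferred_only=True)
all_names = trait_names(trait)
print(preferred_names)
print(all_names)
```

Mapping would only ever see the single name in `preferred_names`, while annotation would try all four names in `all_names`.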

In latest mappings:

# preferred name yields up-to-date mapping
$ grep -i '^Malignant tumor of urinary bladder' latest_mappings.tsv
malignant tumor of urinary bladder	http://purl.obolibrary.org/obo/MONDO_0004986	urinary bladder carcinoma

# alternate name yields obsolete mapping
$ grep -i '^Urinary bladder cancer' latest_mappings.tsv
urinary bladder cancer	http://www.ebi.ac.uk/efo/EFO_0000292	bladder carcinoma
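
The same comparison can be sketched in Python for a whole trait rather than one name at a time. This assumes the three-column, tab-separated layout visible in the output above (name, ontology URI, ontology label) and uses exact matching on the lowercased name rather than the prefix match that `grep '^...'` performs; `load_mappings` and `lookup` are hypothetical helpers.

```python
import csv
import io

# The two rows shown above, as tab-separated text: name, ontology URI, ontology label.
LATEST_MAPPINGS_TSV = (
    "malignant tumor of urinary bladder\t"
    "http://purl.obolibrary.org/obo/MONDO_0004986\turinary bladder carcinoma\n"
    "urinary bladder cancer\t"
    "http://www.ebi.ac.uk/efo/EFO_0000292\tbladder carcinoma\n"
)


def load_mappings(handle):
    """Build a dict from lowercased trait name to a list of (uri, label) pairs."""
    mappings = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip comment lines and rows that do not have the assumed three columns.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        name, uri, label = row[0].strip().lower(), row[1], row[2]
        mappings.setdefault(name, []).append((uri, label))
    return mappings


def lookup(names, mappings):
    """Return the set of ontology URIs reachable from any of the given names."""
    return {uri for name in names for uri, _ in mappings.get(name.lower(), [])}


mappings = load_mappings(io.StringIO(LATEST_MAPPINGS_TSV))
preferred = ["Malignant tumor of urinary bladder"]
alternates = ["Urinary bladder cancer", "Urinary Bladder Neoplasms", "Bladder cancer"]

preferred_uris = lookup(preferred, mappings)
all_uris = lookup(preferred + alternates, mappings)
only_via_alternates = all_uris - preferred_uris
print(sorted(only_via_alternates))
```

For this trait, `only_via_alternates` contains just the EFO_0000292 URI, i.e. the mapping that annotation picks up only because it consults alternate names.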

In #383 we modified annotated XML generation to use only preferred names, observing that this only slightly decreased trait coverage while significantly decreasing the number of obsolete EFO terms used.

The goal of this issue is to assess the impact of making a similar change for OT evidence string generation (which is more complicated due to how it groups and explodes traits) and, if the impact is acceptable, to make the change.
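
One rough way to quantify that impact is to count, over a set of traits, how many get at least one mapping under each strategy and which ontology terms are reached only through alternate names. The sketch below is an assumption-laden illustration: `compare_strategies` is a hypothetical function, the second trait and its `example.org` term are made up, and it does not model how evidence string generation groups and explodes traits or check whether a term is actually obsolete.

```python
def compare_strategies(traits, mappings):
    """Compare preferred-only lookup with all-names lookup.

    traits: list of (preferred_name, [alternate_names]) tuples.
    mappings: dict from lowercased trait name to a set of ontology URIs.
    """
    covered_preferred = 0
    covered_all = 0
    extra_terms = set()  # terms reachable only through alternate names
    for preferred, alternates in traits:
        pref_terms = set(mappings.get(preferred.lower(), ()))
        all_terms = set(pref_terms)
        for name in alternates:
            all_terms |= set(mappings.get(name.lower(), ()))
        covered_preferred += bool(pref_terms)
        covered_all += bool(all_terms)
        extra_terms |= all_terms - pref_terms
    return {
        "traits": len(traits),
        "covered_preferred_only": covered_preferred,
        "covered_all_names": covered_all,
        "terms_only_via_alternates": extra_terms,
    }


# Toy data: the first trait is the example from this issue; the second is made up
# to show a trait that loses coverage under the preferred-only strategy.
mappings = {
    "malignant tumor of urinary bladder": {"http://purl.obolibrary.org/obo/MONDO_0004986"},
    "urinary bladder cancer": {"http://www.ebi.ac.uk/efo/EFO_0000292"},
    "hypothetical alternate name": {"http://example.org/TERM_1"},
}
traits = [
    ("Malignant tumor of urinary bladder", ["Urinary bladder cancer", "Bladder cancer"]),
    ("Hypothetical preferred name", ["Hypothetical alternate name"]),
]
report = compare_strategies(traits, mappings)
print(report)
```

The gap between `covered_all_names` and `covered_preferred_only` is the coverage lost by switching, and `terms_only_via_alternates` is the candidate list to check against the ontology for obsolete terms.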

@apriltuesday
Contributor Author

See also #210
