Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix & test augur export for v2 (unified) JSONs #297

Closed
jameshadfield opened this issue Jun 17, 2019 · 5 comments
Closed

fix & test augur export for v2 (unified) JSONs #297

jameshadfield opened this issue Jun 17, 2019 · 5 comments
Assignees
Labels
v6 Issues relating to augur v6 / auspice v2

Comments

@jameshadfield
Copy link
Member

Currently augur export works well for v1 JSONs (i.e. meta + tree files) but is generally broken for v2 (henceforth referred to as "unified") JSON production. This issue asks to get this functionality working for unified JSONs as currently consumed by auspice on the v2 branch: https://github.com/nextstrain/auspice/tree/v2. This functionality is urgently needed in order to better test auspice v2. Note that some unified JSON functionality in augur export is present, however it doesn't generate JSONs which validate against the schema (see below).

If auspice from the v2 branch is installed (e.g. npm install auspice@alpha -- see nextstrain/auspice#720 for more deatils & background) then auspice convert is available which converts meta+tree JSONs to the unified JSON. This may be helpful to compare augur vs auspice approaches.

schema & config modifications

While there are multiple issues proposing slight revisions to the schema for the unified JSON, this issue asks it to be implemented as per the JSON schema defined here: https://github.com/nextstrain/augur/pull/277/files#diff-c021e1723b2195f3dada2595f3d70ebb. After it is working for this it'll be simple to add slight modifications as needed before the we release a new version.

As far as this issue is concerned, it's simplest to use the same (or similar) auspice config JSON. There are proposals to change this, however this discussion is best held somewhere else, with subsequent modifications added in after this issue has been closed.

User interface for augur export

@emmahodcroft and I talked a lot about this a few weeks ago and decided that it's desirable to have command line arguments available to replace a config file for simple use cases The command line arguments are currently implemented in augur master. For complicated cases it's necessary to use a JSON config file (see above), however the ability to use a config file is not yet implemented in augur.

@emmahodcroft
Copy link
Member

@jameshadfield With regard to the error in augur export V2 legend order:

The new unified JSON schema changes how colouring is specified. In V1 JSONs color_map is a list of lists:
[ ["<1y", "#d7191c"], ["1-5y", "#fdae61"] .... ]
In V2 JSONs, scale is a dict:
{ "1-5y": "#fdae61", "18-64y": "#abd9e9", ... }

So this is where the ordering problem comes from. As we've found before, though dicts technically do not have an order, it looks like through augur export V2 the order is actually maintained - until the JSON is written out.

This is because in utils.py write_json, json.dump has the flag sort_keys set to True. (And it seems to do this alphabetically.) (This doesn't affect V1 legends because it's a list, I guess.)

Setting this to False fixes the legend ordering problem, but messes up the overall JSON order, so tree is above all the colourings, authors, etc (which personally I find much less useful - currently it's easy to check on your metadata stuff without scrolling past a huge tree). However, we could probably fix this in export by just adding things to unified in the order we want, rather than as we process them.

So one solution would be to add an argument to utils.py write_json so that sort_keys can be set False and use this in augur export V2. But, we should think about what else this will impact.

What think you?

@tsibley
Copy link
Member

tsibley commented Jun 20, 2019

@emmahodcroft It's generally unreliable to rely on the order of keys in a dictionary when crossing languages, regardless of what order they appear in when serialized to JSON. Different languages make different guarantees about preserving key ordering. I think the simplest thing would be to keep the scale as a list of lists, which is a standard and reasonable approach to cross-language ordered key-value pairs. This also allows arbitrary user-defined ordering rather than forcing an ASCII-betical sort. @jameshadfield What motivated the change from a list of lists to a dict? Could we keep it as a list of lists?

@jameshadfield
Copy link
Member Author

I don't see a problem with returning to the v1 schema here (list of lists). Can't remember anything important about why I proposed turning it into a dict.

@kairstenfay
Copy link
Contributor

kairstenfay commented Jun 27, 2019

FYI @joverlee521 and I just finished going through the augur export process for the unified JSONs. Here is what my process looked like.
@jameshadfield asked me to note any 'WTF' moments, which are tagged with WTF below:

  • WTF = a moment in the install process that was obscure, unexpected, or not clearly documented where I needed help.

Kairsten's augur export process

With help from Jover

  1. Started with the zika-colombia repo. Ran nextstrain build .

  2. When running augur export and seeing the required parameters in the help message, I wanted to run

    augur export -t auspice/zika-colombia_tree.json --metadata auspice/zika-colombia_meta.json
    
  3. When that didn't work, looked at the Snakefile.

  4. Tried

    augur export \
     --tree results/tree.nwk \
     --metadata results/metadata.tsv \
     --node-data results/branch_lengths.json \
     			results/traits.json \
     			results/nt_muts.json \
     			results/aa_muts.json
    

    and received the message
    ERROR: config file None not found. as well as a TypeError: expected str, bytes or os.PathLike object, not NoneType

  5. Switched to zika-tutorial. Ran nextstrain build . there, too.

  6. Copied and ran the following code from the tutorial:

    augur export \
    --tree results/tree.nwk \
    --metadata data/metadata.tsv \
    --node-data results/branch_lengths.json \
    			results/traits.json \
    			results/nt_muts.json \
    			results/aa_muts.json
    

    Got the same errors as above.
    WTF

  7. Ran the code from step 4 again in zika-colombia but with the flag --new-schema (thanks to Jover's investigation).

    Got the following errors:
    ERROR: file with states does not exist
    TypeError: expected str, bytes or os.PathLike object, not NoneType

  8. Ran the code from step 6 again in zika-tutorial but with the flag --new-schema and got this warning plus error:

    WARNING: Contradictory author information: {'authors': 'Ho et al', 'title': 'Outbreak of Zika virus infection in Singapore: an epidemiological, entomological, virological, and clinical analysis', 'journal': 'Lancet Infect Dis (2017) In press', 'url': 'https://www.ncbi.nlm.nih.gov/nuccore/KY241688'} vs {'authors': 'Ho et al', 'title': 'Outbreak of Zika virus infection in Singapore: an epidemiological, entomological, virological, and clinical analysis', 'journal': 'Lancet Infect Dis (2017) In press', 'url': 'https://www.ncbi.nlm.nih.gov/nuccore/KY241726'}
     ...
    

    TypeError: expected str, bytes or os.PathLike object, not NoneType

    WTF

  9. Back to zika-colombia: Jover says I need an output file. Ran the following code:

    augur export --tree results/tree.nwk --metadata data/metadata.tsv --node-data results/branch_lengths.json results/traits.json results/nt_muts.json results/aa_muts.json --new-schema --output-main out.json
    

    and received the error:
    ERROR: file with states does not exist

    but out.json was still generated.

    WTF

    Jover says it looks good but I'm missing metadata.

  10. Ran augur export -h.

    Tried to match the augur options with what was listed in the config/auspice_config.json.

    Note: maintainers and maintainer-urls is split in augur but not in the auspice config.
    WTF

    geography-traits? WTF Matched with geo in auspice config.

    extra-traits: skipped WTF

    panels: gave nothing b/c of default

    tree-name: skipped WTF

    Ended up running the code in step 8 (in zika-tutorial) plus:

    --title 'Real-time tracking of Zika virus \ evolution'
    --maintainers "Me" \
    --maintainer-urls 'google.com' \
    --geography-traits 'country' 'region'
    
  11. Moved onto the augur validate process.
    First ran augur validate out.json.
    The JSON file is not a positional argument WTF.

    Then, ran augur validate -h.
    WTF is the nextflu schema?

    Ran:
    augur validate --json out.json --new-schema

    Received:

    Validating out.json using the main schema (version new)... SUCCESS
    Validating that out.json is internally consistent...
    SUCCESS
    

@jameshadfield jameshadfield self-assigned this Jul 18, 2019
@jameshadfield jameshadfield added the v6 Issues relating to augur v6 / auspice v2 label Jul 18, 2019
@jameshadfield
Copy link
Member Author

Now working in the v6 branch -- with bug fixes etc to be expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v6 Issues relating to augur v6 / auspice v2
Projects
None yet
Development

No branches or pull requests

4 participants