Merge pull request #1588: docs: Convert to rST, add CI, and fix warnings
victorlin authored Mar 10, 2023
2 parents f06ff45 + 653f440 commit 57c27b3
Showing 30 changed files with 1,070 additions and 1,049 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/ci.yaml
@@ -92,3 +92,8 @@ jobs:
      - run: gh workflow run ci.yml --repo nextstrain/docker-base
        env:
          GITHUB_TOKEN: ${{ secrets.GH_TOKEN_NEXTSTRAIN_BOT_WORKFLOW_DISPATCH }}
  build-docs:
    uses: nextstrain/.github/.github/workflows/docs-ci.yaml@master
    with:
      docs-directory: docs/
      environment-file: docs/environment.yml
4 changes: 1 addition & 3 deletions CHANGELOG.md
@@ -1,6 +1,4 @@
---
title: Changelog
---
# Changelog

## version 2.45.0 - 2023/03/08

1 change: 0 additions & 1 deletion docs/.gitignore
@@ -1,4 +1,3 @@
# read the docs generated files
_templates
_static
_build
117 changes: 0 additions & 117 deletions docs/advanced-functionality/drag-drop-csv-tsv.md

This file was deleted.

110 changes: 110 additions & 0 deletions docs/advanced-functionality/drag-drop-csv-tsv.rst
@@ -0,0 +1,110 @@
Adding extra metadata via CSV/TSV/XLSX
======================================

A common use case is to have additional metadata which you would like to add to the current dataset. If you created the dataset itself, then you may wish to keep certain data out of the dataset, as it may change frequently or be sensitive information which you don't want to share publicly.

Additional metadata (CSV / TSV / XLSX file(s)) can be dragged onto an existing dataset in Auspice. These extra data are processed within the browser, so no information leaves the client, which can be useful for viewing private metadata.

The general format is compatible with other popular tools such as `MicroReact <https://microreact.org/>`__. The first column defines the names of the strains / samples in the tree, while the first row (header row) defines the metadata names. You can add as many columns as you want; each will result in a different colouring of the data being made available. The separator can be either a tab character or a comma, and the file extension should be ``.tsv`` or ``.csv``, respectively. Excel files with file extension ``.xlsx`` are also supported, but the metadata must be in the first sheet of the workbook. Older Excel files with the ``.xls`` extension are not supported.
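
The layout described above can be illustrated with a small parser. This is only a sketch under stated assumptions, not Auspice's actual implementation: the function name ``parseMetadataTable`` is hypothetical, the input is assumed to be non-empty, and the naive split does not handle quoted cells that contain the separator.

```javascript
// Hypothetical helper (not Auspice's parser): turn the text of a dropped
// CSV/TSV file into per-strain metadata, following the layout described above.
function parseMetadataTable(text, filename) {
  // Pick the separator from the file extension: .tsv -> tab, otherwise comma.
  const separator = filename.toLowerCase().endsWith(".tsv") ? "\t" : ",";
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  // The header row names the metadata fields; the name of its first cell is unused.
  const [header, ...rows] = lines.map((line) => line.split(separator));
  const fields = header.slice(1);
  const byStrain = {};
  for (const [strain, ...values] of rows) {
    byStrain[strain] = {};
    fields.forEach((field, i) => {
      // Cells missing from a short row become empty strings.
      byStrain[strain][field] = values[i] ?? "";
    });
  }
  return { fields, byStrain };
}
```

Calling ``parseMetadataTable(text, "extra.tsv")`` returns the list of metadata names together with an object keyed by strain name.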

Example:
--------

A TSV file as follows can be dragged onto `nextstrain.org/zika <https://nextstrain.org/zika>`__ to add a "secret" color-by:

.. code:: text

   strain secret
   USVI/19/2016 A
   USVI/28/2016 B
   USVI/41/2016 C
   USVI/42/2016 C

.. figure:: ../assets/csv-extra-data.png
   :alt: auspice with extra data shown via csv

   auspice with extra data shown via csv

A more complex metadata file may look like the following, which makes use of additional features available. This defines colours for the metadata (e.g. ``A`` is yellow, ``B`` is orange) as well as associating strains with (made up) geographic coordinates.

.. code:: text

   strain secret secret__colour latitude longitude
   USVI/19/2016 A #f4e409 0 -120
   USVI/28/2016 B #f49015 0 -115
   USVI/41/2016 C #710000 0 -100
   USVI/42/2016 C #710000 0 -120

.. figure:: ../assets/csv-extra-data-2.png
   :alt: auspice with extra data shown via csv

   auspice with extra data shown via csv

Adding extra colorings and filters
----------------------------------

Most metadata columns will be added as colourings; once the data has been added they should appear as new entries in the "Color By" dropdown (Left-hand sidebar of Auspice). This means you can also filter by these traits using the "Filter Data" box.

An extra colouring is automatically created to represent the set of samples which were in the CSV/TSV/XLSX file -- this allows you to easily filter the dataset to just those samples which you had in your metadata file.

You can choose the colours associated with values by adding a separate column named after the metadata column plus the suffix ``__colour`` (see the example above); the suffix ``__color`` may also be used. Currently the values in this column must be hex values such as ``#3498db`` (blue). If the same metadata value is associated with multiple distinct colours, then the colours are blended together.
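
As a rough illustration of blending, the sketch below averages the red, green and blue channels of several hex colours. This is an assumption made for the example only: the text above does not say which blending method Auspice uses, and ``blendHexColours`` is a hypothetical name.

```javascript
// Assumption: "blending" is illustrated as a plain per-channel RGB average.
// The documentation does not state which blending method Auspice applies.
function blendHexColours(hexColours) {
  // Convert each "#rrggbb" string into [r, g, b] integers.
  const channels = hexColours.map((hex) =>
    [1, 3, 5].map((start) => parseInt(hex.slice(start, start + 2), 16))
  );
  // Average each channel across all colours and round to the nearest integer.
  const mean = [0, 1, 2].map((c) =>
    Math.round(channels.reduce((sum, rgb) => sum + rgb[c], 0) / channels.length)
  );
  // Convert back to a "#rrggbb" string.
  return "#" + mean.map((v) => v.toString(16).padStart(2, "0")).join("");
}
```

With this approach a value that is always given the same colour keeps that colour unchanged, while conflicting colours end up somewhere between the inputs.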

Adding geographic locations
---------------------------

If the columns ``latitude`` and ``longitude`` exist (or ``__latitude`` and ``__longitude``) then you can see these samples on the map. A new geographic resolution becomes available in the sidebar dropdown, labelled with the name of the metadata file you dropped, which plots on the map those samples in the metadata file for which you provided positions.

Additional metadata of this format defines lat-longs *per-sample*, which is different to Nextstrain's approach (where we associate a location to a metadata trait). To resolve this, we create a new (dummy) trait whose values represent the unique lat/longs provided. In the above example screenshot, note that auspice groups ``USVI/19/2016`` and ``USVI/42/2016`` together on the map as their lat/longs are identical; the other metadata columns (e.g. ``secret``) are irrelevant in this case.
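
The grouping described above can be sketched as follows. This is a hypothetical illustration rather than Auspice's code: ``groupByPosition`` and the shape of the sample objects are assumptions made for the example.

```javascript
// Hypothetical sketch: derive a dummy trait whose values are the unique
// latitude/longitude pairs, so that samples sharing a position are grouped.
function groupByPosition(samples) {
  const groups = new Map();
  for (const { strain, latitude, longitude } of samples) {
    const key = `${latitude},${longitude}`; // value of the dummy trait
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(strain);
  }
  return groups;
}
```

Applied to the four samples in the example file, this yields three groups, with ``USVI/19/2016`` and ``USVI/42/2016`` sharing the key ``0,-120``.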

P.S. If the dataset itself doesn't contain any geographic data, then adding metadata will trigger the map to be displayed.

Privacy
-------

All data added via these additional metadata files remains in-browser, and doesn't leave your computer. This makes it safe for sensitive data.

Schema
------

The following fields are ignored completely. (Some of these may be allowed in the future when we have increased the features available here.)

.. code:: yaml

   name
   div
   vaccine
   labels
   hidden
   mutations
   url
   authors
   accession
   traits
   children
   date
   num_date
   year
   month
   day

Fields which end with certain strings are treated as follows:

- ``__autocolour``: this suffix is dropped, but the column is otherwise parsed as normal
- ``__colour``: see above section on adding colours
- ``__shape``: this column is currently ignored

The following columns are interpreted as geographic locations (see section above) and therefore are not added as a colouring:

.. code:: yaml

   __latitude
   __longitude
   latitude
   longitude

The name of the first column is not used, but the first column is always taken to be the sample (strain) name.
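
The schema rules in this section can be summarised in a small classifier. This is a sketch only: ``classifyColumn`` and the ``kind`` labels are hypothetical, the order of the checks is an assumption, and the ``__color`` suffix is treated the same as ``__colour`` based on the colouring section above.

```javascript
// Field names taken from the lists in this section.
const IGNORED_FIELDS = new Set([
  "name", "div", "vaccine", "labels", "hidden", "mutations", "url", "authors",
  "accession", "traits", "children", "date", "num_date", "year", "month", "day",
]);
const GEO_FIELDS = new Set(["__latitude", "__longitude", "latitude", "longitude"]);

// Hypothetical classifier for a header name (any column except the first).
function classifyColumn(name) {
  if (GEO_FIELDS.has(name)) return { kind: "geographic", name };
  if (IGNORED_FIELDS.has(name) || name.endsWith("__shape")) return { kind: "ignored", name };
  if (name.endsWith("__autocolour")) {
    // The suffix is dropped; the column is otherwise parsed as normal.
    return { kind: "colouring", name: name.slice(0, -"__autocolour".length) };
  }
  for (const suffix of ["__colour", "__color"]) {
    if (name.endsWith(suffix)) {
      // Holds hex colours for the column named by the prefix.
      return { kind: "colour-definition", name: name.slice(0, -suffix.length) };
    }
  }
  return { kind: "colouring", name };
}
```

For example, ``classifyColumn("secret__colour")`` reports a colour definition for the ``secret`` column, while ``classifyColumn("date")`` reports an ignored field.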

Scale types
-----------

The type of the data is currently always categorical. This means that while numeric data will work, it won't be very usable if there are many values.
33 changes: 0 additions & 33 deletions docs/advanced-functionality/misc.md

This file was deleted.

36 changes: 36 additions & 0 deletions docs/advanced-functionality/misc.rst
@@ -0,0 +1,36 @@
Miscellaneous
=============

How auspice handles unknown or missing data
-------------------------------------------

Attributes assigned to nodes in the tree -- such as ``country`` -- may have values which are *missing* or *unknown*. Auspice will ignore values such as these, and they will not be displayed in the legend or the tree info-boxes (e.g. hovering over the tree). Tips & branches across the tree with values such as these will be gray. (Branches with low confidence for an inferred trait may also show as gray, and hovering over the branches will help identify this.) Note that, if a discrete trait is selected, then a proportion of the pie-chart on the map may also be gray to represent the proportion of tips with missing data.

A trait is considered missing if it is not set on a node, or if (after coercion to lower-case) it has one of the following values:

.. code:: js

   ["unknown", "?", "nan", "na", "n/a", "", "unassigned"]

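
The rule above can be written as a short check. This is a minimal sketch, assuming a hypothetical helper name ``isMissing``; it is not necessarily how Auspice implements the test internally.

```javascript
// Values treated as missing, compared after lower-casing (list from above).
const MISSING_VALUES = ["unknown", "?", "nan", "na", "n/a", "", "unassigned"];

// Hypothetical helper: true if a trait value counts as missing.
function isMissing(value) {
  // A trait that is not set on the node at all is considered missing.
  if (value === undefined || value === null) return true;
  return MISSING_VALUES.includes(String(value).toLowerCase());
}
```

Here ``isMissing("N/A")`` and ``isMissing(undefined)`` are both true, whereas ``isMissing("USA")`` is false.
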
GISAID specific changes to behavior
-----------------------------------

GISAID data provenance annotation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the dataset JSON defines a data provenance named ``"GISAID"`` (``JSON → metadata → data_provenance``, see `schema <https://github.com/nextstrain/augur/blob/master/augur/data/schema-export-v2.json>`__), then there are two changes to Auspice behaviour:

1. The GISAID data provenance text (displayed at the top of the page in auspice) will be replaced with the GISAID logo, which is also a link to `gisaid.org <https://gisaid.org>`__.
2. The available metadata for download is different. We now use a “Per-sample acknowledgments table” where each row is a strain in the tree, with the following columns:

- ``strain``
- ``gisaid_epi_isl``
- ``genbank_accession``
- ``originating_lab``
- ``submitting_lab``
- ``author``

Node annotated with ``gisaid_epi_isl``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When hovering or clicking on tips (in the tree), nodes annotated with ``gisaid_epi_isl`` will behave slightly differently. These info-boxes will display “GISAID EPI ISL” and, in the tip-clicked info-panel, the value will also be a link to `gisaid.org <https://gisaid.org>`__.
26 changes: 0 additions & 26 deletions docs/advanced-functionality/second-trees.md

This file was deleted.

22 changes: 22 additions & 0 deletions docs/advanced-functionality/second-trees.rst
@@ -0,0 +1,22 @@
Displaying multiple trees
=========================

Auspice has the ability to display two trees side-by-side, and to draw lines between tips with the same name (aka tanglegrams). This is useful to compare the shape of different trees, especially when they are from the same organism -- for instance comparing phylogenies constructed from different segments of the same influenza virus can tell you a lot about the different histories of the segments which have the capacity to reassort (see image below).

How to load multiple trees
--------------------------

You can compare any two datasets which you have available -- for instance if you had "flu/seasonal/h3n2/ha/2y" and "flu/seasonal/h3n2/na/2y" then loading the URL "flu/seasonal/h3n2/ha/2y:flu/seasonal/h3n2/na/2y" would load them both. A toggle is made available in the sidebar to turn off the lines drawn between tips.
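
As a small illustration, the combined URL is simply the two dataset paths joined with a colon. The helper name ``twoTreeUrl`` and the base URL used in the usage note are assumptions made for this sketch.

```javascript
// Hypothetical helper: build the URL which loads two datasets side-by-side.
function twoTreeUrl(baseUrl, firstDataset, secondDataset) {
  // Strip trailing slashes from the base, then join the two paths with ":".
  return `${baseUrl.replace(/\/+$/, "")}/${firstDataset}:${secondDataset}`;
}
```

For instance, ``twoTreeUrl("https://nextstrain.org/", "flu/seasonal/h3n2/ha/2y", "flu/seasonal/h3n2/na/2y")`` gives ``https://nextstrain.org/flu/seasonal/h3n2/ha/2y:flu/seasonal/h3n2/na/2y``.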

|two-trees| *Comparing epitope mutations between HA and NA (worldwide influenza H3N2).* *Notice how the segments can differ drastically in how many epitope mutations they acquire!* *While the crossing of the lines between the tips doesn't always prove reassortment, it's usually a good indication that reassortment is present.*

Showing potential datasets in the sidebar
-----------------------------------------

Depending on the way you've labelled your datasets, potential second trees are available in a sidebar dropdown. These are defined by the :ref:`getAvailable API request <server-api-charon-getavailable>`. Currently, the logic in ``auspice view`` is to match all datasets which:

- contain the same first "part" of the URL -- interpreted here to represent the same pathogen.
- have the same number of "parts" in the URL (parts are delimited by a ``_`` in the filename or a ``/`` in the URL).
- differ from the currently selected dataset by only 1 part.
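
The three matching rules above can be sketched as a filter over the available dataset paths. This is an illustration only, not the code used by ``auspice view``: ``potentialSecondTrees`` is a hypothetical name, and datasets are assumed to be given as ``/``-delimited URL paths.

```javascript
// Hypothetical sketch of the matching rules listed above.
function potentialSecondTrees(current, available) {
  const currentParts = current.split("/");
  return available.filter((candidate) => {
    const parts = candidate.split("/");
    // Rule 1: same first part (interpreted as the same pathogen).
    if (parts[0] !== currentParts[0]) return false;
    // Rule 2: same number of parts.
    if (parts.length !== currentParts.length) return false;
    // Rule 3: exactly one part differs from the current dataset.
    const differences = parts.filter((part, i) => part !== currentParts[i]).length;
    return differences === 1;
  });
}
```

With ``flu/seasonal/h3n2/ha/2y`` selected, ``flu/seasonal/h3n2/na/2y`` is offered, but ``flu/seasonal/h1n1pdm/na/2y`` is not, because it differs in two parts.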

.. |two-trees| image:: ../assets/tangle.png