git subrepo pull (merge) vendored
subrepo:
  subdir:   "vendored"
  merged:   "7617c39"
upstream:
  origin:   "https://github.com/nextstrain/ingest"
  branch:   "main"
  commit:   "7617c39"
git-subrepo:
  version:  "0.4.6"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "110b9eb"
victorlin committed Oct 17, 2023
1 parent 6c0a9cc commit 21cb18b
Showing 11 changed files with 104 additions and 475 deletions.
10 changes: 6 additions & 4 deletions vendored/.github/workflows/ci.yaml
@@ -1,9 +1,11 @@
name: CI

on:
-   - push
-   - pull_request
-   - workflow_dispatch
+   push:
+     branches:
+       - main
+   pull_request:
+   workflow_dispatch:

jobs:
shellcheck:
@@ -18,4 +20,4 @@ jobs:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
- run: pip install cram
- - run: cram tests/
+ - run: cram tests/
4 changes: 2 additions & 2 deletions vendored/.gitrepo
@@ -6,7 +6,7 @@
[subrepo]
remote = https://github.com/nextstrain/ingest
branch = main
- commit = c97df238518171c2b1574bec0349a55855d1e7a7
- parent = 6ef4dc097df037130845d002e54eb4b7338e3d5b
+ commit = 7617c39fae05e5882c5e6c065c5b47d500c998af
+ parent = 6c0a9cc7a1c3cfc6a055707a0eb661af56befeb6
method = merge
cmdver = 0.4.6
33 changes: 29 additions & 4 deletions vendored/README.md
@@ -25,6 +25,31 @@ Any future updates of ingest scripts can be pulled in with:
git subrepo pull ingest/vendored
```

If you run into merge conflicts and would like to pull in a fresh copy of the
latest ingest scripts, pull with the `--force` flag:

```
git subrepo pull ingest/vendored --force
```

> **Warning**
> Beware of rebasing/dropping the parent commit of a `git subrepo` update
`git subrepo` relies on metadata in the `ingest/vendored/.gitrepo` file,
which includes the hash for the parent commit in the pathogen repos.
If this hash no longer exists in the commit history, there will be errors when
running future `git subrepo pull` commands.

If you run into an error similar to the following:
```
$ git subrepo pull ingest/vendored
git-subrepo: Command failed: 'git branch subrepo/ingest/vendored '.
fatal: not a valid object name: ''
```
Check the parent commit hash in the `ingest/vendored/.gitrepo` file and make
sure the commit exists in the commit history. Update to the appropriate parent
commit hash if needed.
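
The check described above can be sketched in shell. This is an illustrative sketch rather than part of the vendored README: the function name `check_subrepo_parent` is hypothetical, and it assumes the `.gitrepo` file parses as git-config syntax (as its `[subrepo]` section layout suggests) so that `git config --file` can read the `parent` entry.

```shell
#!/usr/bin/env bash
# Sketch: verify that the parent commit recorded in a .gitrepo file is still
# reachable from the current branch. `check_subrepo_parent` is a hypothetical
# helper, not a git-subrepo command.
check_subrepo_parent() {
    local gitrepo_file="${1:-ingest/vendored/.gitrepo}"
    local parent

    # Assumption: .gitrepo is readable as a git-config file.
    parent="$(git config --file "$gitrepo_file" subrepo.parent)" || {
        echo "No subrepo.parent entry found in $gitrepo_file" >&2
        return 2
    }

    # Succeeds only if the parent commit is HEAD or one of its ancestors.
    if git merge-base --is-ancestor "$parent" HEAD 2>/dev/null; then
        echo "OK: parent $parent is in the history of HEAD"
    else
        echo "Missing: parent $parent is not in the history of HEAD" >&2
        return 1
    fi
}
```

Run `check_subrepo_parent ingest/vendored/.gitrepo` from the root of the pathogen repo. A non-zero exit status suggests the `parent` hash was rebased away or dropped and needs to be updated by hand.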

## History

Much of this tooling originated in
@@ -72,10 +97,9 @@ Scripts for supporting ingest workflow automation that don’t really belong in
NCBI interaction scripts that are useful for fetching public metadata and sequences.

- [fetch-from-ncbi-entrez](fetch-from-ncbi-entrez) - Fetch metadata and nucleotide sequences from [NCBI Entrez](https://www.ncbi.nlm.nih.gov/books/NBK25501/) and output to a GenBank file.
-   Useful for pathogens with metadata and annotations in custom fields that are not part of the standard [NCBI Virus](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/) or [NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/) outputs.
- - [fetch-from-ncbi-virus](fetch-from-ncbi-virus) - Fetch metadata and nucleotide sequences from [NCBI Virus](https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/) and output NDJSON records to stdout.
- - [ncbi-virus-url](ncbi-virus-url) - Generates the URL to download metadata and sequences from NCBI Virus as a single CSV file.
- - [csv-to-ndjson](csv-to-ndjson) - Converts CSV file to NDJSON file with a hard-coded 200MiB field size limit to accommodate sequences in the NCBI Virus download.
+   Useful for pathogens with metadata and annotations in custom fields that are not part of the standard [NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/) outputs.
+
+ Historically, some pathogen repos used the undocumented NCBI Virus API through [fetch-from-ncbi-virus](https://github.com/nextstrain/ingest/blob/c97df238518171c2b1574bec0349a55855d1e7a7/fetch-from-ncbi-virus) to fetch data. However we've opted to drop the NCBI Virus scripts due to https://github.com/nextstrain/ingest/issues/18.

Potential Nextstrain CLI scripts

@@ -97,6 +121,7 @@ Potential augur curate scripts
- [transform-authors](transform-authors) - Abbreviates full author lists to '<first author> et al.'
- [transform-field-names](transform-field-names) - Rename fields of NDJSON records
- [transform-genbank-location](transform-genbank-location) - Parses `location` field with the expected pattern `"<country_value>[:<region>][, <locality>]"` based on [GenBank's country field](https://www.ncbi.nlm.nih.gov/genbank/collab/country/)
- [transform-strain-names](transform-strain-names) - Ordered search for strain names across several fields.

## Software requirements

15 changes: 0 additions & 15 deletions vendored/csv-to-ndjson

This file was deleted.

292 changes: 0 additions & 292 deletions vendored/docs/ncbi-virus-all-fields-example.json

This file was deleted.
