Currently, the only options for limiting the size of the database are:
- the `preselection` argument of `db_download()`, which allows selection of a GenBank "division" (e.g., plant, bacterial, invertebrate)
- the `min_length` argument of `db_create()`
- the `max_length` argument of `db_create()`
This does not work well if someone is only interested in a clade (e.g., ferns) within one of the larger divisions (e.g., plants; ca. 800 GB), because the local database ends up much larger than needed and is therefore slow to query.
What I propose is a way to limit the database during creation by ID (i.e., GenBank accession number). This is very similar to the example given for extracting data from the database, but instead of returning data to R, it would reduce the size of the external database that gets created. Future queries would then run against a smaller database and be faster.
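To make the idea concrete, here is a rough, language-agnostic sketch in Python of what "filter by accession while parsing" could look like. It is not code from the package: the function names (`split_records`, `accession_of`, `filter_by_accession`) are hypothetical, and it assumes only the standard GenBank flat-file layout, where each record ends with a `//` line and carries an `ACCESSION` line.

```python
from typing import Iterator, List, Optional, Set


def split_records(flatfile_text: str) -> Iterator[str]:
    """Yield each record of a GenBank flat file (records end with a '//' line)."""
    current: List[str] = []
    for line in flatfile_text.splitlines():
        if line.strip() == "//":
            if current:
                yield "\n".join(current)
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final '//' terminator.
    if any(l.strip() for l in current):
        yield "\n".join(current)


def accession_of(record: str) -> Optional[str]:
    """Return the primary accession from the ACCESSION line, or None if absent."""
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            return parts[1] if len(parts) > 1 else None
    return None


def filter_by_accession(flatfile_text: str, wanted: Set[str]) -> List[str]:
    """Keep only the records whose accession is in `wanted` (hypothetical helper)."""
    return [r for r in split_records(flatfile_text) if accession_of(r) in wanted]
```

Only the records that pass the filter would then be written to the local database, so everything else in the division is skipped at creation time rather than at query time.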
Another idea would be to limit the database by taxonomic level, but I am not sure if that is possible with the information available during parsing of the files downloaded from GenBank.
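On the taxonomic idea: GenBank flat-file records do carry a lineage under the `ORGANISM` field, so a filter of this kind looks feasible in principle. Whether the package's parser exposes that field is something I have not checked. The Python sketch below is an assumption-laden illustration only: `lineage_of` and `in_clade` are hypothetical names, the lineage in flat files is abbreviated and may omit ranks, and a very long organism name that wraps onto a second line would be misread as part of the lineage.

```python
from typing import List


def lineage_of(record: str) -> List[str]:
    """Return the taxon names listed under ORGANISM in one GenBank record."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("ORGANISM"):
            continuation = []
            for nxt in lines[i + 1:]:
                # Lineage lines are indented; the next top-level keyword is not.
                if not nxt.startswith(" "):
                    break
                continuation.append(nxt.strip())
            joined = " ".join(continuation).rstrip(".")
            return [t.strip() for t in joined.split(";") if t.strip()]
    return []


def in_clade(record: str, clade: str) -> bool:
    """True if `clade` appears anywhere in the record's lineage (hypothetical)."""
    return clade in lineage_of(record)
```

With something like this, `in_clade(record, "Polypodiopsida")` could decide during parsing whether a record is kept, which would cover the fern use case without needing a list of accessions up front.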
joelnitta changed the title from "Feature request: allow manipulation of database" to "Feature request: more options for filtering database during creation" on May 19, 2022