Retriever lets you commit a dataset and later install the committed version into the database of your choice. This makes it easy to reproduce earlier outputs and results.
The directory in which committed datasets are saved can be set with the environment variable PROVENANCE_DIR.
You can also save a committed dataset to a directory of your choice by specifying the path while committing the dataset.
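As a minimal sketch, PROVENANCE_DIR can be set from Python before Retriever is used. The helper name set_provenance_dir is hypothetical and not part of Retriever. It is also an assumption that Retriever reads the variable from the process environment when it is loaded, so the safest order is to set the variable first and import retriever afterwards.

```python
import os


def set_provenance_dir(directory):
    """Hypothetical helper (not part of Retriever).

    Creates `directory` if it does not exist and exports its absolute
    path as PROVENANCE_DIR for the current process.
    """
    directory = os.path.abspath(os.path.expanduser(directory))
    os.makedirs(directory, exist_ok=True)
    # Assumption: Retriever picks this value up from the environment.
    os.environ["PROVENANCE_DIR"] = directory
    return directory
```

A value set this way applies only to the current process and its child processes. To make it persistent, export the variable in your shell configuration instead.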
Retriever can commit a dataset into a compressed archive.
def commit(dataset, commit_message='', path=None, quiet=False):
A description of the default parameters mentioned above:
dataset (String): Name of the dataset.
commit_message (String): Message describing the commit.
path (String): Specify the directory path to store the compressed archive file.
quiet (Bool): Setting True minimizes the console output.
Example of committing a dataset:
retriever commit abalone-age -m "Example commit" --path .
Committing dataset abalone-age
Successfully committed.
>>> from retriever import commit
>>> commit('abalone-age', commit_message='Example commit', path='/home/')
If the path is not provided, the committed dataset is saved in the provenance directory.
You can view the commit log of the datasets stored in the provenance directory.
def commit_log(dataset):
A description of the parameter mentioned above:
dataset (String): Name of the dataset.
Example:
retriever log abalone-age
Commit message: Example commit
Hash: 02ee77
Date: 08/16/2019, 16:12:28
>>> from retriever import commit_log
>>> commit_log('abalone-age')
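If you want to look up a hash programmatically, the log text can be parsed. The sketch below is an illustration only: parse_commit_log is a hypothetical helper, not a Retriever function, and it assumes each commit is printed as three lines beginning with "Commit message:", "Hash:" and "Date:", exactly as in the example output above.

```python
from datetime import datetime


def parse_commit_log(text):
    """Hypothetical helper: turn log text into a list of commit dicts.

    Assumes the three-line layout shown in the example output
    ('Commit message:', 'Hash:', 'Date:'); other layouts are ignored.
    """
    commits = []
    current = {}
    for line in text.splitlines():
        # Split at the first colon only, so times such as 16:12:28 survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unrelated line
        key, value = key.strip(), value.strip()
        if key == "Commit message":
            current = {"message": value}
            commits.append(current)
        elif key == "Hash" and commits:
            current["hash"] = value
        elif key == "Date" and commits:
            current["date"] = datetime.strptime(value, "%m/%d/%Y, %H:%M:%S")
    return commits


sample = """Commit message: Example commit
Hash: 02ee77
Date: 08/16/2019, 16:12:28"""
print(parse_commit_log(sample)[0]["hash"])  # prints 02ee77
```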
You can install a committed dataset either by its hash-value or by providing the path to the compressed archive. Installation by hash-value is supported only for datasets stored in the provenance directory.
To install a dataset from a committed archive, provide the path to the archive in place of the dataset name:
retriever install sqlite abalone-age-02ee77.zip
>>> from retriever import install_sqlite
>>> install_sqlite('abalone-age-02ee77.zip')
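The archive in this example is named abalone-age-02ee77.zip, which suggests a <dataset>-<hash>.zip naming pattern. That pattern is inferred from the example and is not confirmed here. Under that assumption, a hypothetical helper (not part of Retriever) could recover the dataset name and hash from an archive path:

```python
import os


def split_archive_name(archive_path):
    """Hypothetical helper: split '<dataset>-<hash>.zip' into (dataset, hash).

    The naming pattern is an assumption based on the example archive name.
    """
    stem, ext = os.path.splitext(os.path.basename(archive_path))
    if ext.lower() != ".zip" or "-" not in stem:
        raise ValueError("unexpected archive name: %r" % archive_path)
    # Split at the last hyphen, since dataset names may contain hyphens.
    dataset, hash_value = stem.rsplit("-", 1)
    return dataset, hash_value


print(split_archive_name("abalone-age-02ee77.zip"))  # ('abalone-age', '02ee77')
```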
You can also install a dataset stored in the provenance directory by its hash-value. To look up the hash-values of your previous commits, run the command retriever log dataset_name.
To install a dataset from the provenance directory, provide the hash-value of the commit:
retriever install sqlite abalone-age --hash-value 02ee77
>>> from retriever import install_sqlite
>>> install_sqlite('abalone-age', hash_value='02ee77')