Run the following command in the root directory
sphinx-autobuild docs/source/ docs/build/html/
Note - if you're modifying the docs: any changes to the rst files in the /docs
directory will be detected and the html refreshed. However, changes to the
docstrings in code will not be picked up, and you'll have to rebuild the docs to refresh.
We recommend you at least read the following sections in the docs before getting started.
- What is a Linea Graph
- Creating Graphs
- Reading Graphs
There are two main ways to set up lineapy locally: using Conda or using Docker.
If you prefer to use venv instead of Conda, then please follow the instructions here.
(Optional) If you want to run the integration tests, first download the submodules:
git submodule update --init --recursive .
# python=3.7: a lower version of Python to make sure that it's compatible
# cmake: needed for building deps of numpy tutorial on mac (can remove if you are not running)
conda create --name lineapy python=3.7 \
postgresql \
graphviz \
cmake
conda activate lineapy
pip install -r requirements.txt # for dev
pip install -e .
# verify everything works as expected
lineapy --help
pytest tests
(We support Python 3.8+ for now, and you can initialize the conda environment with Python 3.8 instead if you prefer.)
Sliced code can be exported to an Airflow DAG using the following command:
lineapy python tests/housing.py --slice "p value" --airflow sliced_housing_dag
This creates a sliced_housing_dag.py file in the current directory. It can be executed with:
airflow db init
airflow dags test sliced_housing_dag_dag $(date '+%Y-%m-%d') -S .
Remove the following folders (if they exist) and create a Docker network
rm -rf build dist
docker network create lineapy
To build the Lineapy container, run make build (you can pass in arguments with args=, e.g. make build args=--no-cache).
To open bash within the container, run make bash. You can either use bash for development or connect to remote runtimes inside the container using the extensions available for your editor of choice.
make tests executes the test suite.
To build the Lineapy container with Airflow, run make build-airflow. make tests-airflow runs the Airflow tests.
make airflow-up is a single command that brings up a standalone local Airflow server on port 8080.
The login and password will be printed in the command line output. Please note that this mode uses a SQLite DB and is not meant for heavy workloads.
We use GitHub flow for commits to lineapy.
To contribute, please follow these steps:
A. Create a new branch and set its upstream
git checkout -b TicketOrBranchName
git branch --set-upstream-to=origin/<branch> TicketOrBranchName
B. Work on code
C. Add your code changes: git add new_code_files
D. pre-commit your changes
# Installs pre commit hook to run linting before commit:
pre-commit install
# To manually run hooks:
pre-commit
# To force a run even when there are no changes
pre-commit run --all-files
Note that the pre-commit hook does not run the tests, since these are time consuming and you might not want to have to wait to run them on every commit.
The pre-commit config also pins the versions of the packages. To update them to
the latest, run pre-commit autoupdate.
E. Finally, run the tests: pytest tests
F. Repeat steps B-E until there are no more errors. If you have added any new
libraries as part of your change, then you need to run the following commands to
reset requirements.txt. For this, you need to be running on Python 3.7 (note that
the earlier installation commands were for Python 3.9).
pip uninstall -y `pip freeze` # don't run on base
pip install -e ".[dev]"
pip freeze --exclude-editable > requirements.txt # overwrite the file; --exclude-editable keeps lineapy out of the list
Be sure to remove any weird versions due to Conda installations. Note that longer term, we want to make this more automated.
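As a minimal sketch of how such entries could be spotted: packages installed through Conda often show up in pip freeze output as `name @ file:///...` direct references to a local build directory instead of `name==version` pins. Treating those lines as the "weird versions" mentioned above is an assumption, and `split_requirements` is a hypothetical helper, not part of lineapy. The sample lines are illustrative only.

```python
from typing import List, Tuple


def split_requirements(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split pip-freeze lines into (regular pins, local direct references)."""
    pinned, local_refs = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Assumption: Conda-built packages appear as "name @ file:///..."
        if " @ file://" in line:
            local_refs.append(line)
        else:
            pinned.append(line)
    return pinned, local_refs


# Illustrative input, not taken from lineapy's real requirements.txt
sample = [
    "black==21.9b0",
    "numpy @ file:///tmp/build/numpy_1634095647912/work",
    "",
    "pytest==6.2.5",
]
pinned, local_refs = split_requirements(sample)
print("keep:", pinned)
print("review by hand:", local_refs)
```

Lines in the second list would then be replaced by hand with an explicit `name==version` pin.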
G. Commit your code
git commit -m "message with detailed description"
git push
H. Create a PR to merge your branch
Note - if you're using Docker as your local environment, please check here for additional details about testing.
When contributing, please add unit tests (listed under the unit folder), and when
appropriate, add end to end tests in the end_to_end directory. We are trying
to maintain a high level of test coverage, especially for parts of the codebase
that are more complex, such as the mutation tracking and analysis for
executor and tracer.
To run all default tests, use the following commands:
pre-commit run --all-files # currently tests for flake8, black, isort, and mypy
pytest tests
Some tests use syrupy for snapshot testing, to make it easier to update generated code and graphs.
If you mean to change the tracing or graph spec, or add a new test that uses it,
then run pytest --snapshot-update to update the saved snapshots.
If using Docker, you can use make test args="--snapshot-update" to update snapshots.
We also generate snapshots for a visualization of the graph as SVG. These are
only used to help in the diffs, not for testing. They will be regenerated by
default only when a snapshot is updated or created. To force them to
regenerate, use the --svg-update CLI option. They are not regenerated by
default, since their source is not deterministic.
We also use pytest's xfail to mark tests that are expected to fail because of a known bug. To have them run anyway, pass --run-xfail.
There are also integration tests that are run against external libraries. These tests
also take a while, and many of them are currently xfailing. To run them, use
pytest tests -m "integration" (in the root dir).
We currently have notebooks that are also evaluated in the tests, and the outputs are compared.
If you want to re-run all the notebooks and update their outputs, run:
make notebooks
# You can also re-run and save a particular notebook with
make tests/notebook/test_visualize.ipynb
Or you can open it in a notebook UI (JupyterLab, Jupyter Notebook, VS Code, etc.) and re-run it manually.
To skip a cell during testing, you can add # NBVAL_SKIP to the top of the cell.
Some tests have been marked "slow". These typically take > 0.5s and can be skipped
by passing -m "not slow" when running pytest, i.e. pytest tests -m "not slow".
We also added some tests which run Airflow to verify that it works on the code we produce.
These take a lot longer: they create their own virtualenv with Airflow in it, and create a new Airflow DB.
By default, those are not run. To run them, use pytest tests -m "airflow".
If using Docker, please add appropriate tests and ensure all tests are working using
make test. Any args to pytest can be passed using args="xxx", e.g. individual
tests can be run using make test args="<path_to_test_file>".
Please ensure linting and typechecks are done before committing your code.
When using Docker, this can be done using make lint and make typecheck
respectively. A pre-commit hook that runs make blackfix lint typecheck build test
will fix any fixable issues and ensure that build and test work.
To help debug when writing a test, or to get a better understanding of the codebase, we have implemented a --tree-log CLI option, which will print
a visual tree, using Rich's tree renderer, of all method calls in our main classes.
To change the appearance of the logs or which classes are logged, look at the lineapy/utils/tree_logger.py file.
.vscode/launch.json has a VS Code debug configuration for lineapy, which executes
lineapy python --slice "p value" tests/housing.py
through the VS Code "Run and Debug" dialog.
Sometimes it's helpful to see a visual representation of the graph
and the tracer's state while debugging a test. Run the tests with --visualize
to have it save a tracer.pdf file whenever it runs an execution.
You can also run tests with pytest --snapshot-update test_name, which will
create snapshots in the __snapshots__/test_name/ folder for the particular test you're running.
Note: This requires graphviz to be installed.
We have logging set up as well, which can be printed while running the tests to help with debugging:
pytest --log-cli-level INFO
If you would like to see the logs pretty printed using Rich's custom log handler, you have to disable pytest's built-in logging handler and its stdout capturing:
pytest -p no:logging -s
By default, linea will rewrite any exceptions raised during the normal
execution of the user's code to attempt to match Python's behavior. To disable our
custom exception handling, set the LINEA_NO_EXCEPTIONS environment variable
to any value.
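If you launch lineapy or the tests from a script, the variable can be set for that one child process. The following is a minimal sketch, not lineapy code: `env_without_exception_rewriting` is a hypothetical helper, and the child command is a stand-in that only reports whether the variable is visible. In practice the command would be something like `["pytest", "tests"]`.

```python
import os
import subprocess
import sys


def env_without_exception_rewriting() -> dict:
    """Copy the current environment and set LINEA_NO_EXCEPTIONS."""
    env = dict(os.environ)
    env["LINEA_NO_EXCEPTIONS"] = "1"  # any value works, per the text above
    return env


# Stand-in child process: it only checks that the variable was passed along.
child_code = "import os; print('LINEA_NO_EXCEPTIONS' in os.environ)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env_without_exception_rewriting(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

Passing a copied environment keeps the setting out of your own shell session, so later runs go back to the default exception handling.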
If you want to inspect the AST of some Python code for debugging, you can run:
./tests/tools/print_ast.py 'hi(a=10)'
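For a quick look without the helper script, the standard library's ast module gives a comparable view. This is only a sketch of the same idea; the output format of print_ast.py itself may differ.

```python
import ast

source = "hi(a=10)"
tree = ast.parse(source)

# Dump the whole tree on one line (on Python 3.9+, indent=2 gives a
# multi-line view).
print(ast.dump(tree))

# Walk down to the call expression to pull out individual pieces.
call = tree.body[0].value
print(type(call).__name__, call.func.id, [kw.arg for kw in call.keywords])
```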
Verifying the specs
./tests/tools/test_validate_annotation_spec.py
Tests are run on GitHub Actions. If you are trying to debug a failure that
happens on GitHub Actions, you can try using act,
which will run it locally through Docker.
brew install act
act
# When it prompts, the "medium" image seems to work out alright.
Please see this for notes on profiling.
Note - on an M1 chip with macOS Monterey and conda's Python 3.8.11, installing the requirements fails due to a failure in building fastparquet==0.7.2. Downgrading to 0.7.0 solves the issue: simply change the fastparquet version in requirements.txt.
Another heuristic is that when installing with pip fails, you can try installing using conda, which sometimes has better M1 support. M1 installation is still a bit of trial and error.
If you have weird issues with pytest, try conda install pytest.
If you prefer using venv, then you need to install the following prerequisites first
- Python
Note - for Apple users, DON'T use the native Python that comes with macOS; instead, install one from brew
brew install python # for the latest version, or python@3.8
- OpenSSL, postgres, and graphviz
brew install openssl
brew install postgresql
brew install graphviz
- [Optional] Rust (this was needed at one time, but we were not able to reproduce this dependency requirement in later tries)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
Now you're ready to create your venv
python3.9 -m venv lineapy # or python3.8
source lineapy/bin/activate
pip install -r requirements.txt
pip install -e .
# verify everything works as expected
lineapy --help
pytest tests
The Docker and make files are a good starting point to see the main components of lineapy.
We provide a benchmarking command to test how much lineapy impacts performance. To use it,
pass the path to a notebook (which does not import lineapy) to lineapy benchmark,
and lineapy will run it multiple times and report some statistics (using the method
described in the paper "Quantifying Performance Changes with Effect Size Confidence
Intervals" by Tomas Kalibera and Richard Jones).
$ cd tests/integration
# First run the test, to create the conda env for the tensorflow notebook (this may take a while)
$ pytest 'test_slice.py::test_slice[tensorflow_preprocessing_layers]' -m integration -s --log-cli-level info
# Then activate the environment
$ conda activate envs/tensorflow-docs
# And run the benchmark
$ lineapy benchmark sources/tensorflow-docs/site/en/tutorials/structured_data/preprocessing_layers.ipynb
──────────────────────────────────── Benchmarking sources/tensorflow-docs/site/en/tutorials/structured_data/preprocessing_layers.ipynb ────────────────────────────────────
───────────────────────────────────────────────────────────────────────── Running without lineapy ─────────────────────────────────────────────────────────────────────────
12.3 seconds (discarding first run)
13.1 seconds
12.1 seconds
12.1 seconds
Executing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Mean: 12.4 seconds
────────────────────────────────────────────────────────────────────────── Running with lineapy ───────────────────────────────────────────────────────────────────────────
15.1 seconds
14.1 seconds
14.1 seconds
Executing... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Mean: 14.4 seconds
──────────────────────────────────────────────────────────────────────────────── Analyzing ────────────────────────────────────────────────────────────────────────────────
Lineapy is between 26.4% slower and 6.6% slower (90.0% CI)
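To make the summary numbers easier to read, the sketch below recomputes the two means from the timings in the sample output and derives a simple point estimate of the slowdown. It does not reproduce the confidence interval: that comes from the Kalibera and Jones method, which is not implemented here, and this is not lineapy's own benchmarking code.

```python
from statistics import mean

# Timings copied from the sample output above
# (the first run without lineapy is discarded, as the output notes).
without_lineapy = [13.1, 12.1, 12.1]
with_lineapy = [15.1, 14.1, 14.1]

mean_without = mean(without_lineapy)
mean_with = mean(with_lineapy)

# Relative slowdown of the mean run time, as a percentage.
slowdown_pct = (mean_with / mean_without - 1) * 100

print(f"Mean without lineapy: {mean_without:.1f} seconds")
print(f"Mean with lineapy:    {mean_with:.1f} seconds")
print(f"Point estimate: {slowdown_pct:.1f}% slower")
```

The point estimate falls inside the reported 6.6% to 26.4% interval, which is what you would expect when the interval is built around the ratio of the two means.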