forked from LineaLabs/lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.


daavoo/lineapy



Supercharge your data science workflow with LineaPy! Just two lines of code capture, analyze, and transform Python code to extract production data pipelines in minutes.

👇 Try It Out! 👇
Open in Colab

Why Use LineaPy?

Going from development to production is full of friction. Data engineering is a manual and time-consuming process. A proliferation of libraries, tools, and technologies means data teams spend countless hours managing infrastructure and repeating tasks. This drastically reduces the team’s ability to deliver actionable insights in real time.

LineaPy creates a frictionless path for taking your data science work from development to production, backed by a decade of research and industry expertise tackling hyperscale data challenges.

Data engineering, simplified.

Your data science artifact works; now comes the cleanup. LineaPy extracts the essential operations from messy development code in minutes, not days, simplifying data engineering with just two lines of code.

You analyze, we productionize.

Productionization is manual and messy, and it requires software engineering expertise to create clean, reproducible code. LineaPy automatically handles lineage and refactoring so you can focus on experimenting, analyzing, and modeling.

Move fast from prototype to pipeline.

LineaPy automates code translations to save you time and help you stay focused. Rapidly create analytics pipelines with a simple API — no refactoring or new tools needed. Go from your Jupyter notebook to an Airflow pipeline in minutes.

Getting Started

Installation

To install LineaPy, run:

$ pip install lineapy

Or, if you want the latest version of LineaPy directly from the source, run:

$ pip install git+https://github.com/LineaLabs/lineapy.git --upgrade

By default, LineaPy uses SQLite as its artifact store, which keeps the package light and simple. However, SQLite has several limitations; one of them is that it does not support multiple concurrent writes to a database (a second concurrent writer runs into a database lock). If you want a more robust database, please follow the instructions for using PostgreSQL.
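To see what "database lock" means in practice, the sketch below uses Python's built-in sqlite3 module directly (not LineaPy) to open two connections to one database file. The first connection takes the write lock and holds an uncommitted insert; the second tries to start its own write and is rejected immediately because its busy timeout is zero. The table name `artifact` and the helper `demonstrate_sqlite_write_lock` are hypothetical and exist only for this illustration.

```python
import os
import sqlite3
import tempfile


def demonstrate_sqlite_write_lock() -> str:
    """Return the error a second writer gets while a first write is uncommitted.

    Illustrative only: talks to SQLite directly, not through LineaPy.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "artifacts.db")
        # isolation_level=None puts both connections in autocommit mode,
        # so transactions start only when we issue BEGIN explicitly.
        writer_a = sqlite3.connect(path, isolation_level=None)
        # timeout=0: fail at once instead of waiting for the lock to clear.
        writer_b = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            writer_a.execute("CREATE TABLE artifact (name TEXT)")
            writer_a.execute("BEGIN IMMEDIATE")  # writer A acquires the write lock
            writer_a.execute("INSERT INTO artifact VALUES ('a')")
            try:
                writer_b.execute("BEGIN IMMEDIATE")  # writer B needs the same lock
                writer_b.execute("INSERT INTO artifact VALUES ('b')")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return "no lock error"
        finally:
            # Closing A without committing rolls back its pending insert.
            writer_a.close()
            writer_b.close()


print(demonstrate_sqlite_write_lock())
```

With a nonzero `timeout`, the second connection would wait and retry before giving up, which hides short contention but not sustained concurrent writes; a client/server database such as PostgreSQL is built for that workload.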

Interfaces

Jupyter and IPython

To use LineaPy in an interactive computing environment such as Jupyter Notebook/Lab or IPython, launch the environment with the lineapy command, like so:

$ lineapy jupyter notebook
$ lineapy jupyter lab
$ lineapy ipython

This will automatically load the LineaPy extension in the corresponding interactive shell application.

Or, if the application is already running without the extension loaded (which can happen when you start the Jupyter server with jupyter notebook or jupyter lab instead of through lineapy), you can load it on the fly with:

%load_ext lineapy

executed at the top of your session. Please note:

  • You will need to run this as the first command in a given session; executing it in the middle of a session will lead to erroneous behavior in LineaPy.

  • This loads the extension to the current session only, i.e., it does not carry over to different sessions; you will need to repeat it for each new session.
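If you need to load the extension from Python code rather than by typing the magic, IPython's `run_line_magic` method performs the same action as `%load_ext lineapy`. The sketch below is a minimal, untested illustration that assumes LineaPy behaves identically when loaded this way; the helper name `load_lineapy_extension` is hypothetical. The same rule applies as above: it must run before any other code in the session.

```python
def load_lineapy_extension() -> bool:
    """Load the LineaPy extension into the running IPython shell, if any.

    Hypothetical helper: equivalent to typing `%load_ext lineapy`.
    Returns False when not running under IPython (e.g., plain `python`).
    """
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so there is no shell to extend.
        return False
    shell = get_ipython()
    if shell is None:
        # IPython is installed, but this code is not running inside a shell.
        return False
    shell.run_line_magic("load_ext", "lineapy")
    return True


print(load_lineapy_extension())
```

Returning a boolean instead of raising keeps the helper safe to call from code that may also run outside IPython.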

CLI

We can also use LineaPy as a CLI command. Run:

$ lineapy python --help

to see available options.

Quick Start

Once you have LineaPy installed, you are ready to start using the package. We can start with a simple example that demonstrates how to use LineaPy to store a variable's history. The lineapy.save() function removes extraneous code to give you the simplest version of a variable's history.

Say we have development code looking as follows:

import lineapy

# Define text to display in page heading
text = "Greetings"

# Some irrelevant operation
num = 1 + 2

# Change heading text
text = "Hello"

# Another irrelevant operation
num_squared = num**2

# Augment heading text
text = text + " World!"

# Try an alternative display
alt_text = text.split()

Now, we have reached the end of our development session and decided that we like what we see when we print(text). As shown above, text has gone through several modifications, and it might not be clear how it reached its final state, especially given the extraneous operations interleaved between these modifications. We can cut through this by running:

# Store the variable's history or "lineage"
lineapy.save(text, "text_for_heading")

# Retrieve the stored "artifact"
artifact = lineapy.get("text_for_heading")

# Obtain the simplest version of a variable's history
print(artifact.get_code())

which will print:

text = "Hello"
text = text + " World!"

Note that these are the minimal essential steps to get to the final state of the variable text. That is, LineaPy has performed code cleanup on our behalf, moving us a step closer to production.
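To build intuition for the cleanup that get_code() performed, here is a toy backward slice over the same straight-line example, written with Python's standard ast module. This is not LineaPy's implementation (LineaPy builds its lineage from the code it captures as it runs); the function `backward_slice` is hypothetical and only handles simple single-name assignments. Walking the statements from last to first, an assignment is kept when it defines a name that is still needed, and the names it reads then become needed in turn.

```python
import ast
from typing import List

# The development code from the Quick Start, minus comments and the import.
DEV_CODE = '''
text = "Greetings"
num = 1 + 2
text = "Hello"
num_squared = num**2
text = text + " World!"
alt_text = text.split()
'''


def backward_slice(source: str, target: str) -> List[str]:
    """Return the statements needed to compute the final value of `target`.

    Toy sketch: only understands straight-line `name = expression` assignments.
    """
    needed = {target}
    kept = []
    for stmt in reversed(ast.parse(source).body):
        if not isinstance(stmt, ast.Assign) or len(stmt.targets) != 1:
            continue
        defined = stmt.targets[0]
        if not isinstance(defined, ast.Name) or defined.id not in needed:
            continue  # this statement does not feed the target; drop it
        # The assignment supplies this name, so earlier definitions are dead...
        needed.discard(defined.id)
        # ...but every name it reads must now be traced further back.
        needed |= {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
        kept.append(ast.unparse(stmt))  # ast.unparse requires Python 3.9+
    return list(reversed(kept))


for line in backward_slice(DEV_CODE, "text"):
    print(line)
```

This prints the same two steps that LineaPy returned, except that ast.unparse normalizes string literals to single quotes. Slicing on "num_squared" instead keeps only the two num lines.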

Usage Reporting

LineaPy collects anonymous usage data that helps our team improve the product. Only LineaPy's API calls and CLI commands are reported. We strip out as much potentially sensitive information as possible, and we will never collect user code, data, variable names, or stack traces.

You can opt out of usage tracking by setting the environment variable:

$ export LINEAPY_DO_NOT_TRACK=true
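If you launch LineaPy from another Python program (a scheduler or test harness, say) and want to opt out only for those runs instead of exporting the variable shell-wide, you can pass LINEAPY_DO_NOT_TRACK through a child process's environment. This is a minimal sketch; `env_with_tracking_disabled` is a hypothetical helper, and it assumes LineaPy reads the variable from its own process environment, as the export above implies.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_tracking_disabled(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with LineaPy tracking disabled.

    Hypothetical helper: the input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["LINEAPY_DO_NOT_TRACK"] = "true"
    return env


print(env_with_tracking_disabled({"PATH": "/usr/bin"}))
```

For example, subprocess.run(["lineapy", "python", "--help"], env=env_with_tracking_disabled()) would run the CLI with reporting disabled for that child process only, leaving the parent's environment untouched.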

What Next?

To learn more about LineaPy, please check out the project documentation, which contains many examples you can follow along with. Some key resources include:

  • Docs: This is our knowledge hub; when in doubt, start here!
  • Concepts: Learn about the key concepts underlying LineaPy!
  • Tutorials: These notebook tutorials will help you better understand the core functionality of LineaPy.
  • Use Cases: These domain examples illustrate how LineaPy can help in real-world applications.
  • API Reference: Need more technical details? This reference may help!
  • Contribute: Want to contribute? These instructions will help you get set up!
  • Slack: Have unresolved questions or issues? Join our community and ask away!

Languages

  • Python 51.4%
  • Jupyter Notebook 48.0%
  • Jinja 0.5%
  • Makefile 0.1%
  • Dockerfile 0.0%
  • Mako 0.0%