Demo of dlt's REST API source

In this repo we collect the code created in our data engineering tutorials on the dlt REST API source – the easiest way to create pipelines loading data from REST API sources into various destination systems.

This verified REST API source lets us develop a source connector using just a Python dictionary. This gives us the full flexibility of Python because we're not restricted to JSON or YAML. At the same time, the code is very easy to read and write.
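As a rough illustration of what "just a Python dictionary" can look like, the sketch below assembles a configuration with ordinary Python code. The key names (`client`, `base_url`, `resource_defaults`, `resources`, `endpoint`, `params`) follow the rest_api source's declarative config as we understand it. The helper `build_config` and the chosen endpoint names are hypothetical, and the call that hands the dictionary to dlt is left as a comment so the snippet runs without dlt installed.

```python
from typing import Any


def build_config(base_url: str, endpoint_names: list[str], page_size: int = 100) -> dict[str, Any]:
    """Assemble a REST API source config as a plain dictionary (hypothetical helper)."""
    return {
        "client": {"base_url": base_url},
        # Defaults applied to every resource listed below (assumed key name).
        "resource_defaults": {"endpoint": {"params": {"limit": page_size}}},
        # Because this is Python, resources can be generated in a loop,
        # which plain JSON or YAML cannot express.
        "resources": [
            {"name": name, "endpoint": {"path": name}} for name in endpoint_names
        ],
    }


config = build_config("https://pokeapi.co/api/v2/", ["pokemon", "berry", "location"])

# With dlt installed, the dictionary would be passed to the source, for example:
#   from rest_api import rest_api_source  # module created by `dlt init rest_api duckdb`
#   source = rest_api_source(config)
print([resource["name"] for resource in config["resources"]])
```

Generating the `resources` list from a Python list is one example of the flexibility mentioned above; the same idea extends to reading endpoint names from a file or an environment variable.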

The video tutorials cover the following topics:

Video 1: Basic Walkthrough

In pokemon_pipeline.py, we cover how to:

Import endpoints
Add custom query parameters
Specify a parent-child relationship between endpoints, such as /berry/{berry_name}
Install dlt and the rest_api source
Inspect the loaded data
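The parent-child relationship from the list above can be sketched as follows. The `berry_details` dictionary uses the `resolve` parameter type as we understand the rest_api config. The function `expand_child_paths` is a hypothetical helper that only mimics the effect (one request path per parent record) and is not how dlt implements it; the parent rows are made up for the demo.

```python
# Child resource: one request per record of the parent resource "berry".
berry_details = {
    "name": "berry_details",
    "endpoint": {
        "path": "berry/{berry_name}",
        "params": {
            # Fill {berry_name} from the "name" field of each "berry" record.
            "berry_name": {"type": "resolve", "resource": "berry", "field": "name"},
        },
    },
}


def expand_child_paths(child: dict, parent_rows: list[dict]) -> list[str]:
    """Illustrative only: list the request paths a resolved child would produce."""
    path = child["endpoint"]["path"]
    # Map each placeholder to the parent field it is resolved from.
    resolved = {
        key: spec["field"]
        for key, spec in child["endpoint"]["params"].items()
        if isinstance(spec, dict) and spec.get("type") == "resolve"
    }
    return [
        path.format(**{key: row[field] for key, field in resolved.items()})
        for row in parent_rows
    ]


# Made-up parent records standing in for the output of the "berry" endpoint.
parent_rows = [{"name": "cheri"}, {"name": "chesto"}]
print(expand_child_paths(berry_details, parent_rows))
# → ['berry/cheri', 'berry/chesto']
```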

Video

Installation

poetry install
dlt init rest_api duckdb

Run

python pokemon_pipeline.py

Inspect Data

Using the Streamlit app with a GUI in the web browser:

dlt pipeline pokemon_pipeline show

Alternatively, using the duckdb CLI:

duckdb pokemon_pipeline.duckdb

use pokemon;
select
  berry_details.name,
  berry_details.size,
  berry_details__flavors.flavor__name,
  berry_details__flavors.potency
from berry_details
join berry_details__flavors on berry_details._dlt_id = berry_details__flavors._dlt_parent_id;
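The query above relies on the nested flavors list living in a child table, berry_details__flavors, whose _dlt_parent_id column points at the _dlt_id of its parent row. To try the join without a loaded pipeline, here is a minimal sketch that swaps duckdb for Python's built-in sqlite3 and uses illustrative rows rather than real load output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE berry_details (_dlt_id TEXT, name TEXT, size INTEGER);
    CREATE TABLE berry_details__flavors (
        _dlt_parent_id TEXT, flavor__name TEXT, potency INTEGER
    );
    -- Illustrative rows only, not real load output.
    INSERT INTO berry_details VALUES ('a1', 'cheri', 20);
    INSERT INTO berry_details__flavors VALUES ('a1', 'spicy', 10), ('a1', 'dry', 0);
    """
)

# Same join as in the duckdb example: child rows find their parent via _dlt_parent_id.
rows = con.execute(
    """
    SELECT b.name, b.size, f.flavor__name, f.potency
    FROM berry_details AS b
    JOIN berry_details__flavors AS f ON b._dlt_id = f._dlt_parent_id
    ORDER BY f.flavor__name
    """
).fetchall()
print(rows)
# → [('cheri', 20, 'dry', 0), ('cheri', 20, 'spicy', 10)]
```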

Video 2: Authentication

We cover:

simple bearer token authentication in github_pipeline.py
more complex HTTP Basic authentication in freshdesk_pipeline.py

We add the secrets to the hidden .dlt/secrets.toml file. For production use cases, we recommend using environment variables provided by a secrets manager.
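A minimal sketch of the environment-variable approach, assuming the `auth` config shapes `bearer` and `http_basic` as we understand them from the rest_api source. The variable names `GITHUB_TOKEN` and `FRESHDESK_API_KEY` and both helper functions are hypothetical. dlt can also resolve secrets from environment variables on its own; the explicit lookup here just keeps the example free of dependencies.

```python
import os
from typing import Mapping


def bearer_auth(env: Mapping[str, str] = os.environ, token_var: str = "GITHUB_TOKEN") -> dict:
    """Build a bearer-token auth config from an environment variable (hypothetical name)."""
    return {"type": "bearer", "token": env[token_var]}


def http_basic_auth(env: Mapping[str, str] = os.environ, key_var: str = "FRESHDESK_API_KEY") -> dict:
    """Build an HTTP Basic auth config.

    Assumption: the API key is sent as the username with a dummy password,
    which is how we understand Freshdesk's API-key authentication.
    """
    return {"type": "http_basic", "username": env[key_var], "password": "X"}


# Placeholder values standing in for what a secrets manager would inject.
demo_env = {"GITHUB_TOKEN": "placeholder-token", "FRESHDESK_API_KEY": "placeholder-key"}

github_client = {"base_url": "https://api.github.com/", "auth": bearer_auth(demo_env)}
freshdesk_auth = http_basic_auth(demo_env)
print(github_client["auth"]["type"], freshdesk_auth["type"])
```

Called without arguments, both helpers read from the real process environment and raise a KeyError if the variable is missing, which makes a misconfigured deployment fail early.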

More information on dlt secrets and configs

Video

Video 3: Incremental Loading

We load GitHub issues incrementally in github_pipeline.py.

As explained in the video, if you want to incrementally load a resource that was previously loaded with write_disposition set to append or replace, you need to reset the pipeline state first. You can use:

dlt pipeline github_pipeline drop issues

This prevents dlt from attempting to apply an ALTER TABLE statement that adds a constraint on the ID field, which duckdb does not support at the moment (v0.9.2).
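For orientation, incremental loading of the issues resource can be sketched as below. The `issues_resource` dictionary uses the `incremental` parameter type and the `merge` write disposition as we understand the rest_api config; the concrete values are assumptions and may differ from github_pipeline.py. The helpers `next_cursor` and `merge_by_key` are hypothetical and only imitate the idea (track the highest cursor value, upsert rows by primary key) on made-up records.

```python
issues_resource = {
    "name": "issues",
    "write_disposition": "merge",  # upsert instead of append or replace
    "primary_key": "id",
    "endpoint": {
        "path": "issues",
        "params": {
            # Request only issues updated after the last seen cursor value.
            "since": {
                "type": "incremental",
                "cursor_path": "updated_at",
                "initial_value": "2024-01-01T00:00:00Z",
            },
        },
    },
}


def next_cursor(records: list[dict], last_value: str, cursor_path: str = "updated_at") -> str:
    """Return the highest cursor value seen so far (ISO 8601 UTC strings sort correctly)."""
    return max([last_value] + [record[cursor_path] for record in records])


def merge_by_key(existing: list[dict], incoming: list[dict], key: str = "id") -> list[dict]:
    """Upsert incoming rows into existing rows by primary key."""
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in incoming})
    return list(merged.values())


# Made-up records for the demo.
existing = [{"id": 1, "updated_at": "2024-01-02T00:00:00Z", "state": "open"}]
incoming = [
    {"id": 1, "updated_at": "2024-01-05T00:00:00Z", "state": "closed"},
    {"id": 2, "updated_at": "2024-01-03T00:00:00Z", "state": "open"},
]

table = merge_by_key(existing, incoming)
cursor = next_cursor(incoming, "2024-01-01T00:00:00Z")
print(cursor, [(row["id"], row["state"]) for row in table])
# → 2024-01-05T00:00:00Z [(1, 'closed'), (2, 'open')]
```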

Video

Tutorial 4: Custom Authentication

In zoom.py, we implement a connector to the Zoom API to load meeting and webinar information. We implement Zoom's specific OAuth 2.0 flow as a custom authentication method. We also implement response actions, such as ignoring certain error messages or HTTP status codes.
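The sketch below shows the two ideas in isolation and is not the code from zoom.py. It assumes Zoom's server-to-server OAuth flow (a POST to https://zoom.us/oauth/token with grant_type=account_credentials and the client credentials in a Basic header); the actual implementation may differ. The function `build_zoom_token_request` and the credential values are hypothetical, and the request is only built, never sent. The `response_actions` list uses the config shape as we understand it from the rest_api source.

```python
import base64
import urllib.parse
import urllib.request


def build_zoom_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a token request for the assumed server-to-server OAuth flow."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "account_credentials", "account_id": account_id}
    ).encode()
    return urllib.request.Request(
        "https://zoom.us/oauth/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder credentials; real ones belong in .dlt/secrets.toml or the environment.
request = build_zoom_token_request("my-account", "my-client", "my-secret")
print(request.get_method(), request.full_url)

# Response actions: skip responses with a given status code or body content
# instead of failing the whole load (assumed config shape).
response_actions = [
    {"status_code": 404, "action": "ignore"},
    {"content": "Not found", "action": "ignore"},
]
```

The access token returned by such a request would then be attached to each API call as a bearer token and refreshed when it expires.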

See the tutorial blog post: How To Create A dlt Source With A Custom Authentication Method
