mlincon

Stars

dataEngineering

15 repositories

Price Crawler - Tracking Price Inflation

Python · 179 stars · 53 forks · Updated Jun 23, 2020

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Jupyter Notebook · 36,679 stars · 5,844 forks · Updated Jul 5, 2024

Data engineering pipeline hosted entirely in the AWS ecosystem, using Amazon DocumentDB as the database.

Python · 14 stars · 6 forks · Updated Oct 26, 2021

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Python · 1,261 stars · 212 forks · Updated Mar 9, 2020

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.

C · 1,755 stars · 171 forks · Updated Mar 8, 2021

A cross-tenant, metadata-driven processing framework for Azure Data Factory and Azure Synapse Analytics, built by coupling orchestration pipelines with a SQL database and a set of Azure Functions.

C# · 180 stars · 113 forks · Updated Feb 13, 2024

Always know what to expect from your data.

Python · 9,702 stars · 1,502 forks · Updated Jul 25, 2024

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,194 stars · 523 forks · Updated Jul 2, 2024

A framework for moving data into a data warehouse.

Jupyter Notebook · 52 stars · 22 forks · Updated Sep 7, 2021

Metadata Driven Development (m3d) is a cloud and platform agnostic framework for the automated creation, management and governance of data lakes.

Python · 29 stars · 8 forks · Updated May 23, 2023

Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.

Python · 42 stars · 14 forks · Updated Jul 6, 2022

Guides and docs to help you get up and running with Apache Airflow.

JavaScript · 796 stars · 99 forks · Updated Oct 13, 2022

My Insight Data Engineering Fellowship project. I implemented a big data processing pipeline based on the lambda architecture that aggregates Twitter and US stock market data for user sentiment anal…

Scala · 465 stars · 123 forks · Updated Aug 24, 2022

Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.

Python · 38 stars · 15 forks · Updated Sep 1, 2022

Example repo for creating end-to-end tests for a data pipeline.

Python · 21 stars · 3 forks · Updated Jun 14, 2024