DataEdge Systems
- India
Build-Data-Warehouse-project-in-Hive Public
Forked from Ajay026/Build-Data-Warehouse-project-in-Hive
Hive Mini Project to Build a Data Warehouse for e-Commerce
Updated Nov 1, 2022
Azure-Databricks-project-on-Yelp-Dataset Public
Forked from Ajay026/Azure-Databricks-project-on-Yelp-Dataset
Analyse Yelp Dataset with Spark & Parquet Format on Azure Databricks
Jupyter Notebook Updated Nov 1, 2022
Hadoop-Project Public
Forked from Ajay026/Hadoop-Project
Hadoop Project analysis with Hive
HiveQL Updated Nov 1, 2022
Azure-Project-on-Movielens-Data Public
Forked from Ajay026/Azure-Project-on-Movielens-Data
Build an Azure Recommendation Engine on Movielens Dataset
Jupyter Notebook Updated Nov 1, 2022
spark-etl Public
Forked from aphp/spark-etl
Better bridge Apache Spark and PostgreSQL
Scala Apache License 2.0 Updated Oct 19, 2022
ApacheSpark Public
Forked from martandsingh/ApacheSpark
This repository will help you to learn about Databricks concepts with the help of examples. It will include all the important topics which we need in our real life experience as a data engineer. We …
Python Updated Oct 18, 2022
Snowflake-Azure-Project Public
Forked from Ajay026/Snowflake-Azure-Project
Snowflake project using Azure with solutions.
Python Updated Oct 17, 2022
SQL-Project-for-Data-Analysis-part-1-7 Public
Forked from Ajay026/SQL-Project-for-Data-Analysis-part-1-7
Complete SQL Project for data analysis with source code.
Updated Oct 11, 2022
Udacity-Data-Engineering-Projects Public
Forked from san089/Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Python Other Updated Aug 26, 2022
cca175 Public
Forked from itversity/cca175
This is the repository for itversity CCA 175 material with YouTube videos.
Jupyter Notebook MIT License Updated Feb 16, 2022
apache-spark-etl-pipeline-example Public
Forked from jamesbyars/apache-spark-etl-pipeline-example
Demonstration of using Apache Spark to build robust ETL pipelines while taking advantage of open source, general purpose cluster computing.
Python Updated Jan 13, 2022
spear-framework Public
Forked from romans-weapon/spear-framework
Rapid ETL/ELT-connectors/pipeline development leveraged on top of Apache Spark
Scala Apache License 2.0 Updated Dec 16, 2021
gcp-etl Public
Forked from camposvinicius/gcp-etl
This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-codes/r/airport-codes_zip.zip, it's about a zipped .json, which…
Smarty Updated Nov 15, 2021
spark2-etl-examples Public
Forked from anish749/spark2-etl-examples
A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
Scala Updated Aug 5, 2021
datalake-etl-pipeline Public
Forked from vim89/datapipelines-essentials-python
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transform…
Python Apache License 2.0 Updated Jul 23, 2021
Batch-ETL-with-AWS-EMR-and-MWAA Public
Forked from anthonywong611/Batch-ETL-with-AWS-EMR-and-MWAA
Create a data pipeline on AWS to execute batch processing in a Spark cluster provisioned by Amazon EMR. ETL using managed airflow: extracts data from S3, transform data using spark, load transforme…
Python Updated Jul 12, 2021
PySpark Public
Forked from hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists ETL process of data modeling
Jupyter Notebook Updated Dec 3, 2020
the-incredible-pytorch Public
Forked from ritchieng/the-incredible-pytorch
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.
MIT License Updated Sep 12, 2020
TensorFlow-Examples Public
Forked from aymericdamien/TensorFlow-Examples
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Jupyter Notebook Other Updated Sep 6, 2020
spark-twitter-streaming Public
Forked from pran4ajith/spark-twitter-streaming
A real-time streaming ETL pipeline for streaming and performing sentiment analysis on Twitter data using Apache Kafka, Apache Spark and Delta Lake.
Python GNU General Public License v3.0 Updated Aug 8, 2020
CCA175-PySpark-Practice-with-solutions Public
Forked from ramapilli16/CCA175-PySpark-Practice-with-solutions
My Solutions to the practice tests provided at http://nn02.itversity.com/cca175/ by ITVersity.
Updated Jul 15, 2020
spark-scala-examples Public
Forked from spark-examples/spark-scala-examples
This project provides Apache Spark SQL, RDD, DataFrame and Dataset examples in Scala language
Scala Updated Jul 6, 2020
pyspark-examples Public
Forked from spark-examples/pyspark-examples
Pyspark RDD, DataFrame and Dataset Examples in Python language
Python Updated Jun 23, 2020
Movalytics-Data-Warehouse Public
Forked from alanchn31/Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Python Updated Jun 16, 2020
Cloudera_Material Public
Forked from san089/Cloudera_Material
Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collaborate.
MIT License Updated Apr 21, 2020
spark-amazon-s3-examples Public
Forked from spark-examples/spark-amazon-s3-examples
Scala Updated Mar 19, 2020
goodreads_etl_pipeline Public
Forked from san089/goodreads_etl_pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Python MIT License Updated Mar 9, 2020