Note: remove any AWS credentials from the notebooks before making this repository public.
Project GitHub address: https://github.com/gilzero/project_data_modeling_cassandra
Data modeling with Apache Cassandra and an ETL pipeline built in Python. The project works with raw event log files and combines them into a single dataset. Driven by a set of pre-given queries, it creates one table per query and validates each with a read. The ETL pipeline transfers data from the files into Apache Cassandra. This project provides both a local solution (remote workspace) and an AWS deployment solution.
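The log-combining step can be sketched as below. A minimal sketch with plain `csv` and `glob`; the `event_data` directory, the `event_datafile_new.csv` output name, the column list, and the raw-row indices are assumptions about the dataset layout, not guaranteed by this repo:

```python
import csv
import glob
import os

# Gather every raw event log CSV under the event data folder
# ('event_data' is an assumed directory name).
file_paths = glob.glob(os.path.join(os.getcwd(), 'event_data', '*.csv'))

full_rows = []
for path in file_paths:
    with open(path, 'r', encoding='utf8', newline='') as f:
        reader = csv.reader(f)
        next(reader)               # skip the per-file header row
        full_rows.extend(reader)

# Write one combined, denormalized CSV for the ETL to load from
# (output file name and columns are assumptions).
with open('event_datafile_new.csv', 'w', encoding='utf8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['artist', 'firstName', 'gender', 'itemInSession',
                     'lastName', 'length', 'level', 'location',
                     'sessionId', 'song', 'userId'])
    for row in full_rows:
        if len(row) > 16 and row[0]:
            # Keep only the fields needed downstream (the indices are
            # assumptions about the raw log layout).
            writer.writerow([row[0], row[2], row[3], row[4], row[5],
                             row[6], row[7], row[8], row[12], row[13], row[16]])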
About Apache Cassandra: Apache Cassandra is an open source NoSQL distributed database trusted by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. (Adopters include Apple, Uber, and Netflix.)
file: Project_1B_Project_local.ipynb
This notebook is the solution for an Apache Cassandra cluster set up on a local machine / workspace.
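The local flow is roughly: connect to the local cluster, create a keyspace, model one table per query, then verify with a SELECT. A minimal sketch using the DataStax `cassandra-driver`; the keyspace name, table name, columns, and sample query values are illustrative assumptions, not the notebook's exact schema:

```python
from cassandra.cluster import Cluster

# Connect to a Cassandra instance running locally
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

# Create and switch to a project keyspace (name is an assumption)
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sparkify
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.set_keyspace('sparkify')

# In Cassandra the table is modeled around the query: the partition key
# (session_id) and clustering column (item_in_session) come straight from
# the WHERE clause the query needs.
session.execute("""
    CREATE TABLE IF NOT EXISTS song_by_session (
        session_id int,
        item_in_session int,
        artist text,
        song text,
        length float,
        PRIMARY KEY (session_id, item_in_session)
    )
""")

# Validate with a read against the same query pattern
rows = session.execute(
    "SELECT artist, song, length FROM song_by_session "
    "WHERE session_id = %s AND item_in_session = %s",
    (338, 4)
)
for row in rows:
    print(row.artist, row.song, row.length)
```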
file: Project_1B_Project_with_AWS.ipynb
I have also created a solution that works on an actual AWS Keyspaces cluster in the Hong Kong region.
Before starting, create an IAM user with Amazon Keyspaces permissions in the IAM settings.
The cluster already exists in the region; just connect to it.
For this case I use the Hong Kong region, so the endpoint is:
- cassandra.ap-east-1.amazonaws.com
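Connecting to Amazon Keyspaces requires TLS on port 9142 plus credentials. A sketch along the lines of the AWS documentation, assuming service-specific credentials for the IAM user and the Starfield root certificate saved locally as `sf-class2-root.crt`:

```python
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# Amazon Keyspaces only accepts TLS connections; the Starfield root
# certificate must be downloaded from AWS first.
ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('sf-class2-root.crt')
ssl_context.verify_mode = CERT_REQUIRED

# Service-specific credentials generated for the IAM user
# (placeholders only; never commit real values).
auth_provider = PlainTextAuthProvider(
    username='YOUR_SERVICE_USERNAME',
    password='YOUR_SERVICE_PASSWORD',
)

cluster = Cluster(
    ['cassandra.ap-east-1.amazonaws.com'],
    ssl_context=ssl_context,
    auth_provider=auth_provider,
    port=9142,
)
session = cluster.connect()
```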
Note that Amazon Keyspaces requires setting a consistency level (LOCAL_QUORUM) when inserting data. See the DataStax Python driver docs: https://docs.datastax.com/en/developer/python-driver/3.25/getting_started/
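A sketch of attaching the consistency level to a write with the DataStax driver, reusing the `session` from the connection sketch above; the table and bind values are the same illustrative assumptions as before:

```python
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Amazon Keyspaces rejects writes below LOCAL_QUORUM, so set the
# consistency level explicitly on the statement.
insert = SimpleStatement(
    "INSERT INTO song_by_session "
    "(session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(insert, (338, 4, 'Faithless', 'Music Matters', 495.3073))
```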
There is no need to shut down the cluster, as it persists in the region (Amazon Keyspaces is serverless).
Make sure to remove the keyspace once it is no longer in use; otherwise extra charges may be incurred!
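Cleanup is a single statement, e.g. (the keyspace name is the same assumption as in the sketches above):

```python
# Drop the whole keyspace when the project is done to stop any charges
session.execute("DROP KEYSPACE IF EXISTS sparkify")
```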
Run the ETL via the Jupyter notebooks.
Developed and tested on a remote workspace and the Amazon Keyspaces service.