Independent Funding received a dataset containing information about the backers who have pledged to its live projects. The task was to perform an ETL (extract, transform, and load) process: extract and clean the backer data, then load it into a PostgreSQL database.
Software used
- pgAdmin 4
- Visual Studio Code
- Jupyter Notebook
- QuickDBD
The first step was extracting the data. Using Jupyter Notebook, I read backer_info.csv into a DataFrame and converted each row into dictionary values with the code shown below. Once the rows were converted, the data could be organized by adding columns and separating the information.
```python
import json
import pandas as pd

# Extract the backer data into a DataFrame.
backers_info_df = pd.read_csv("backer_info.csv")

# Iterate through the backers DataFrame and convert each row to a dictionary.
dict_values = []
for i, row in backers_info_df.iterrows():
    # Parse the JSON string stored in the 'backer_info' column.
    data = row['backer_info']
    converted_data = json.loads(data)
    # Get the values for each row using a list comprehension.
    row_values = [v for k, v in converted_data.items()]
    # Append the list of values for each row to a list.
    dict_values.append(row_values)
```
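With each row parsed into a list of values, the data can be organized into labeled columns. A minimal sketch of that step, assuming the JSON keys are backer_id, cf_id, name, and email:

```python
# A minimal sketch: build a new DataFrame from the parsed row values.
# The column names are assumed to match the keys in the original JSON.
backers_df = pd.DataFrame(dict_values, columns=["backer_id", "cf_id", "name", "email"])
```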
The next step was the cleaning process. I took the DataFrame I had created and split the full name into "first_name" and "last_name" columns using `.str.split(' ', n=1, expand=True)` (a sketch of this step is shown below). I then dropped the combined "name" column and reorganized the columns.
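A short sketch of the split, assuming the combined full name sits in the "name" column of backers_df:

```python
# Split the full name on the first space into two new columns;
# n=1 ensures multi-part last names stay together in 'last_name'.
backers_df[["first_name", "last_name"]] = backers_df["name"].str.split(" ", n=1, expand=True)
```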
```python
# Drop the combined "name" column.
backers_cleaned = backers_df.drop(['name'], axis=1)

# Reorder the columns.
backers_cleaned = backers_cleaned[["backer_id", "cf_id", "first_name", "last_name", "email"]]
backers_cleaned
```
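Before moving to the database, the cleaned DataFrame is written out as a CSV. A one-line sketch, assuming the file name used in the next step:

```python
# Export the cleaned data; index=False keeps the DataFrame index out of the file.
backers_cleaned.to_csv("backers.csv", index=False)
```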
After exporting the cleaned data as "backers.csv", I returned to the ERD I had created earlier in QuickDBD and added a "backers" table. Using the table schema generated by QuickDBD, I ran the table-creation query in pgAdmin 4 and imported the CSV file into the new table.
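As an alternative to pgAdmin's import wizard, the same load step can be scripted. This is a hedged sketch using pandas with SQLAlchemy rather than the pgAdmin tool used in the project; the connection string and database name are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string: substitute your own credentials and database.
engine = create_engine("postgresql://postgres:password@localhost:5432/crowdfunding_db")

# Read the exported CSV and append its rows to the existing "backers" table.
backers = pd.read_csv("backers.csv")
backers.to_sql("backers", engine, if_exists="append", index=False)
```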