
Knowledge Is Power is a data analysis and prediction tool leveraging U.S. Census data to provide insights into societal topics. Utilizing machine learning, specifically MATLAB's classification learner, the project predicts educational attainment based on income and offers interactive visualizations of veterans' data.


mar19a/KnowledgeIsPower


Knowledge Is Power


Refer to our Final Project Presentation.

Overview

The "KnowledgeIsPower" project is an analysis and prediction tool that uses U.S. Census data to provide insights into societal topics. Its features include analysis of veterans' data and predictions of educational attainment based on income. It showcases my ability to apply data science and machine learning techniques to real-world problems.

Features

  • Veterans Data Analysis: Analyze veterans' data by demographics such as sex, and present the results in interactive graphs.
  • Educational Predictions: Predict educational attainment from income using a machine learning model, and compare earnings across educational levels.
  • Interactive Visualizations: Give users dynamic visualizations that help them understand the data and the predictions.

Machine Learning Utilization

How did we use the Machine Learning Toolbox?

We used the Classification Learner tool in MATLAB to train a model that predicts a user’s educational degree from their income. The data came from the census earnings table, and we plotted year against earnings for five degree categories: less than high school, high school graduate, some college, bachelor’s degree, and graduate or professional degree. We trained the model with the linear discriminant option and used it to predict the degree type.
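To illustrate the idea behind a linear discriminant classifier, here is a minimal Python sketch with a single feature (income). It is not the MATLAB model from this project: the function names `fit_lda_1d` and `predict_lda_1d` are hypothetical, and the income figures (in thousands of dollars) are made up for illustration rather than taken from the census table.

```python
import math
from collections import defaultdict

def fit_lda_1d(incomes, labels):
    """Fit a one-feature linear discriminant model.

    Returns per-class means, per-class priors, and the pooled
    within-class variance that all classes are assumed to share.
    """
    groups = defaultdict(list)
    for x, y in zip(incomes, labels):
        groups[y].append(x)
    n = len(incomes)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / n for k, v in groups.items()}
    # Pooled variance: squared deviations from each class mean,
    # divided by (samples - number of classes).
    ss = sum((x - means[y]) ** 2 for x, y in zip(incomes, labels))
    variance = ss / (n - len(groups))
    return means, priors, variance

def predict_lda_1d(model, income):
    """Return the class with the highest linear discriminant score."""
    means, priors, variance = model

    def score(k):
        return (income * means[k] / variance
                - means[k] ** 2 / (2 * variance)
                + math.log(priors[k]))

    return max(means, key=score)

# Made-up incomes in thousands of dollars (illustrative only).
training = {
    "Less than high school": [24, 26, 28],
    "High school graduate": [33, 35, 37],
    "Some college": [40, 42, 44],
    "Bachelor's degree": [58, 60, 62],
    "Graduate or professional degree": [78, 80, 82],
}
incomes = [x for values in training.values() for x in values]
labels = [k for k, values in training.items() for _ in values]

model = fit_lda_1d(incomes, labels)
print(predict_lda_1d(model, 61))  # Bachelor's degree
```

With equal class sizes and a shared variance, the rule reduces to picking the class whose mean income is closest to the input, which is why 61 maps to the bachelor's degree category in this toy data.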

In the confusion matrix plot, the model predicted every degree category correctly (100% accuracy). Because the output is a category (a degree type) rather than a number, this is a classification problem, not a regression problem.
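For readers unfamiliar with confusion matrices, the sketch below shows how one is tallied and how accuracy is read from its diagonal. The helper names and the short label lists are hypothetical examples, not output from the project's model.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]

def accuracy(matrix):
    """Share of samples on the diagonal (true class == predicted class)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical labels: one high school sample is misclassified.
classes = ["High school", "Bachelor's", "Graduate"]
true_labels = ["High school", "High school", "Bachelor's",
               "Bachelor's", "Graduate", "Graduate"]
predicted = ["High school", "Bachelor's", "Bachelor's",
             "Bachelor's", "Graduate", "Graduate"]

matrix = confusion_matrix(true_labels, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
print(accuracy(matrix))
```

A result of 100% accuracy corresponds to a matrix whose off-diagonal entries are all zero.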

Goal Accomplishment

Our main goal for this project was to create a user-friendly program in which a user can pick a topic from the U.S. Census and learn more about it. We also aimed to demonstrate the value of data and to show how different tools can make predictions from it. We believe we achieved these goals by building an engaging, easy-to-use program and by using MATLAB’s Classification Learner to predict a person's degree from their income.

With our program, users can learn more about topics in the U.S. such as education, business, and population. This knowledge can help the U.S. progress, because education is the premise of progress.

What I Learned

  • Data Analysis and Visualization: Gained expertise in analyzing large datasets and presenting insights through visualizations.
  • Machine Learning: Applied machine learning techniques using MATLAB’s classification learner tool to predict educational attainment.
  • User-Friendly Design: Designed an engaging and easy-to-use interface to help users interact with and understand the data.

Importance as a Software Engineer

  • Problem-Solving Skills: Demonstrated ability to tackle complex societal problems using data-driven approaches.
  • Technical Proficiency: Enhanced skills in data analysis, machine learning, and visualization, which are critical in the field of software engineering.
  • Impactful Work: Developed a tool that can educate and inform users, contributing to a more knowledgeable society.

Technologies Used

  • Programming Languages:
    • MATLAB (for data analysis and machine learning)
    • Python (for data processing and visualization)
  • Tools and Libraries:
    • MATLAB Classification Learner
    • Pandas, Matplotlib, Seaborn (Python)
  • Data Source:
    • U.S. Census Data

Project Links

Contact

Feel free to reach out to me via my LinkedIn profile.

License

This project is licensed under the MIT License.

Additional Information

Understanding Data Bias

The article by Prabhakar Krishnamurthy, “Understanding Data Bias,” discusses how biased data leads to biased machine learning systems. Examples include an Amazon machine learning recruiting tool that discriminated against female applicants and an ad ranking system accused of racial and gender profiling. Data bias can stem from several sources, including human-generated content and system drift over time. Identifying and mitigating these biases involves exploratory data analysis, pre-processing the data before training, in-processing methods during training, and post-processing methods applied to the model's outputs.

Krishnamurthy, Prabhakar. “Understanding Data Bias.” Medium, Towards Data Science, 22 Oct. 2019.
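As a small illustration of the exploratory and pre-processing steps mentioned above, the following sketch checks how well each group is represented in a dataset and computes inverse-frequency sample weights to balance the groups. The function names and the group counts are hypothetical, and reweighting is only one of several possible pre-processing techniques; it is not taken from the article or from this project's code.

```python
from collections import Counter

def group_shares(groups):
    """Exploratory check: fraction of samples belonging to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {g: c / total for g, c in counts.items()}

def balancing_weights(groups):
    """Pre-processing: inverse-frequency weights, one per sample.

    Each group ends up with the same total weight, and the weights
    sum to the number of samples.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Hypothetical, deliberately imbalanced sample.
groups = ["male"] * 8 + ["female"] * 2

shares = group_shares(groups)
weights = balancing_weights(groups)
print(shares)
print(weights[0], weights[-1])
```

Under these assumptions, the under-represented group receives a larger per-sample weight, so both groups contribute equally when the weights are passed to a training routine that supports them.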

Conclusion

In this project, we applied our MATLAB skills to solve a societal problem, emphasizing the importance of education for progress. Our tool helps users appreciate the value of education and understand the realities of the country, aligning with our belief that “Knowledge is Power. Education Is the Premise of Progress in Every Society.”

Link to Code and Data

You can find the code and the data table used for this project at the following link: Google Drive Folder
