Superhero_NLP

"Work in Progress"

Superheros come to save the day! Source: Marjan Blan, unsplash.com

Intro

Trying to determine if a superhero is good or bad from text. A Natural Language Processing Model based on a dataset from Kaggle Kernels.

README Outline

Introduction
README Outline
Repo Contents
Libraries and Prerequisites
Feature and Definitions
Models
Conclusions
Future Work
Built With, Contributors, Authors, Acknowledgments

Using 'text' not 'photographs' to determine who is who in the hero world! Source: Marjan Blan, Unsplash.com

Repo Contents

This repo contains the following:

README.md - this is where you are now!
Superhero_NLP_Notebook.ipynb - the Jupyter Notebook containing the finalized code for this project.
LICENSE.md - the required license information.
Data - superheroes_nlp_dataset.csv
CONTRIBUTING.md
Images - contains the fun images to this repo.
gitignore

Libraries & Prerequisites

These are the libraries that I used in this project.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import string, re
import nltk
from nltk import FreqDist, word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

Bar graph of Superheros ranked by total powers. Source: Superhero_NLP_Notebook.ipynb

Features

There are 81 columns in this dataset. Below is a sample of some of the ones I used:

0 name 1448 non-null object
1 real_name 1301 non-null object
2 full_name 956 non-null object
3 overall_score 1450 non-null object
4 history_text 1360 non-null object
5 powers_text 1086 non-null object
6 intelligence_score 1450 non-null int64
7 strength_score 1450 non-null int64
8 speed_score 1450 non-null int64
9 durability_score 1450 non-null int64
10 power_score 1450 non-null int64
11 combat_score 1450 non-null int64
12 superpowers 1450 non-null object
13 alter_egos 1450 non-null object
14 aliases 1450 non-null object
15 place_of_birth 788 non-null object
16 first_appearance 1247 non-null object
17 creator 1311 non-null object
18 alignment 1368 non-null object The majority of the rest of the features are floats, describing the superhero powers as a number.

Models

Naive Bayes and Random Forest

Conclusions

Future Work

There is so much more I want to do! I still have a lot to learn about text pre-processing - including how to put it all into a nice easy function.

Is this one of the good guys! Source: Yulia Matvienko, unsplash.com

Built With:

Jupyter Notebook Python 3.0 scikit.learn

Contributing

Please read CONTRIBUTING.md for details

Authors

Thomas Whipple

License

Please read LICENSE.md for details

Acknowledgments

Kaggle - This dataset comes from the following: https://www.kaggle.com/jonathanbesomi/superheroes-nlp-dataset

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Superhero_NLP

"Work in Progress"

Intro

README Outline

Repo Contents

Libraries & Prerequisites

Features

Models

Conclusions

Future Work

Built With:

Contributing

Authors

License

Acknowledgments

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Data		Data
Images		Images
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Superhero_NLP_Notebook.ipynb		Superhero_NLP_Notebook.ipynb

License

twhipple/Superhero_NLP

Folders and files

Latest commit

History

Repository files navigation

Superhero_NLP

"Work in Progress"

Intro

README Outline

Repo Contents

Libraries & Prerequisites

Features

Models

Conclusions

Future Work

Built With:

Contributing

Authors

License

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages