juandes/infinity-war-spacy

Reliving Avengers: Infinity War with spaCy and Natural Language Processing

Overview

This repo contains the scripts used in my latest experiment, Reliving Avengers: Infinity War with spaCy and Natural Language Processing, which is written up in the article of the same name.

Using spaCy, an open-source Python NLP library designed to help us process and understand large volumes of text, I analyzed the script of the movie to investigate the following:

  • Overall top 10 verbs, nouns, adverbs and adjectives from the film.
  • Top verbs and nouns spoken by a particular character.
  • Top 30 named entities from the film.
  • The similarity between the lines spoken by each character pair, e.g., the similarity between Thor's and Thanos' lines.
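
The analyses listed above can be sketched roughly as follows. This is a minimal illustration, not necessarily how the scripts in this repo do it: the helper names (top_lemmas, top_entities, pairwise_similarity, analyze_script), the choice to count lowercased lemmas while skipping stop words, and the en_core_web_md model are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def top_lemmas(tokens, pos, n=10):
    """Return the n most common lemmas among tokens with the given coarse POS tag.

    `tokens` can be a spaCy Doc or any iterable of objects exposing
    lemma_, pos_, is_alpha and is_stop (hypothetical helper, not from the repo).
    """
    counts = Counter(
        tok.lemma_.lower()
        for tok in tokens
        # Assumption: ignore punctuation, numbers and stop words.
        if tok.pos_ == pos and tok.is_alpha and not tok.is_stop
    )
    return counts.most_common(n)


def top_entities(ents, n=30):
    """Return the n most frequently mentioned named entities by surface text."""
    return Counter(ent.text for ent in ents).most_common(n)


def pairwise_similarity(lines_by_character, nlp):
    """Compare every pair of characters using spaCy's Doc.similarity.

    `lines_by_character` maps a character name to a list of their lines;
    `nlp` is a loaded spaCy pipeline (or any callable returning objects
    that have a similarity() method).
    """
    docs = {name: nlp(" ".join(lines)) for name, lines in lines_by_character.items()}
    return {
        (a, b): docs[a].similarity(docs[b])
        for a, b in combinations(sorted(docs), 2)
    }


def analyze_script(path, model="en_core_web_md"):
    """Run the POS and entity counts over a whole script file."""
    import spacy  # imported lazily so the helpers above work without spaCy

    nlp = spacy.load(model)
    with open(path, encoding="utf-8") as f:
        doc = nlp(f.read())
    report = {pos: top_lemmas(doc, pos) for pos in ("VERB", "NOUN", "ADV", "ADJ")}
    report["ENTITIES"] = top_entities(doc.ents)
    return report
```

With spaCy and a model installed, calling analyze_script("cleaned-script.txt") would return the top verbs, nouns, adverbs, adjectives and entities for the whole film. For the similarity scores, a model that ships with word vectors (such as en_core_web_md) is the safer choice, since the small English models have no real word vectors and tend to give less reliable similarity values.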

Tools used

  • Python
  • spaCy

Repo content

Besides the scripts, the repo contains three versions of the movie script: the full script (raw_script.txt), the script without comments, scene descriptions, and subjects (cleaned-script.txt), and the cleaned script with the subjects kept (cleaned-script-subject.txt). The plots directory contains all the plots showing the top nouns, adverbs, adjectives, verbs and entities per character.

Thanks to Manuel Romero (https://github.com/mrm8488) for writing the Jupyter notebook.
