BopItFreak/Caesar-stylometric-assignment


Caesar Stylometric Assignment

This assignment is a stylometric analysis of Caesar's De Bello Gallico. It has 7 parts; the first 5 are described below.

Part 1: Build the corpus

  • The text is fetched from thelatinlibrary.com using node-fetch.
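A minimal sketch of how the fetch step might look. The URL pattern (`gall1.shtml` to `gall8.shtml`) and the names `bookUrls` and `fetchCorpus` are assumptions for illustration, not taken from this repository. The fetch function is passed in as a parameter, so node-fetch or the built-in `fetch` in newer Node versions can be used.

```javascript
// Assumed URL pattern for the eight books; verify it against the site before relying on it.
function bookUrls(base = "https://www.thelatinlibrary.com/caesar") {
  return Array.from({ length: 8 }, (_, i) => `${base}/gall${i + 1}.shtml`);
}

// Download every URL in order and join the page bodies into one string.
// fetchImpl can be node-fetch's default export or the global fetch.
async function fetchCorpus(urls, fetchImpl = globalThis.fetch) {
  const pages = [];
  for (const url of urls) {
    const res = await fetchImpl(url);
    if (!res.ok) {
      throw new Error(`Request for ${url} failed with status ${res.status}`);
    }
    pages.push(await res.text());
  }
  return pages.join("\n");
}
```

The pages come back as HTML, so the markup still has to be stripped before the text is normalised.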

Part 2: Normalise the text

  • Regex is used to remove newlines and the text in square brackets at the beginning of each chapter.
  • Text is converted to lower case.
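The two normalisation steps could be sketched as follows. The function name `normalise` and the exact patterns are assumptions. Newlines are replaced with a single space rather than deleted outright, so that words on adjacent lines do not run together.

```javascript
// Strip bracketed chapter markers, flatten whitespace, and lower-case the text.
function normalise(raw) {
  return raw
    .replace(/\[[^\]]*\]/g, " ") // drop bracketed markers such as "[1]"
    .replace(/\s+/g, " ")        // turn newlines and runs of whitespace into one space
    .trim()
    .toLowerCase();
}

console.log(normalise("[1] Gallia est omnis\ndivisa in partes tres."));
// prints "gallia est omnis divisa in partes tres."
```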

Part 3: Determine the average length of sentence in characters and words.

  • Text is parsed into sentences using an external library, because abbreviations such as vii. and kal. made it unreliable to split sentences on periods alone.
  • The total character count (or total word count) is divided by the number of sentences to get the average number of characters (or words) per sentence.
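Once the sentences are available, the averaging step might look like the sketch below. It takes an array of sentence strings as input, since the sentence-splitting library is not named here. The function name `averageSentenceLength` is an assumption, and characters are counted including spaces and punctuation.

```javascript
// Average sentence length in characters and in words.
function averageSentenceLength(sentences) {
  if (sentences.length === 0) return { chars: 0, words: 0 };
  let chars = 0;
  let words = 0;
  for (const sentence of sentences) {
    const trimmed = sentence.trim();
    chars += trimmed.length;
    words += trimmed.split(/\s+/).filter(Boolean).length;
  }
  return { chars: chars / sentences.length, words: words / sentences.length };
}

console.log(averageSentenceLength([
  "gallia est omnis divisa in partes tres.",
  "veni vidi vici.",
]));
// prints { chars: 27, words: 5 }
```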

Part 4: Comparative length of sentences in a paragraph.

  • I haven't quite figured this part out yet

Part 5: Stop Words

  • The text is split into words, and the words are then sorted by frequency of occurrence.
  • A list of "stop words", such as prepositions and common forms of sum, was compiled.
  • The most common words, excluding the "stop words", are then displayed.
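A rough sketch of the frequency count with stop-word filtering. The function name `topWords` and the short stop-word set are illustrative assumptions, not the list compiled for this assignment. Ties are broken alphabetically so that the output is deterministic.

```javascript
// Count word frequencies, skip stop words, and return the most common entries.
function topWords(text, stopWords, limit = 10) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    if (stopWords.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // by count, then alphabetically
    .slice(0, limit);
}

// Illustrative stop words only: a few prepositions, a conjunction, and forms of sum.
const stopWords = new Set(["in", "ad", "ex", "et", "est", "sunt", "esse"]);
console.log(topWords("caesar in galliam venit et caesar vicit", stopWords, 2));
// prints [ [ 'caesar', 2 ], [ 'galliam', 1 ] ]
```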

Results and more to come!

About

Caesar Stylometric Assignment for school.
