cmcarmon/pdf-or-html-to-txt-write-to-corpus

pdf-or-html-to-txt-write-to-corpus (Python)

This script shows how to convert the text of PDF books or documents and HTML pages into custom-sized chunks. It also shows how to write the resulting list of text chunks to a corpus folder that can be used to train text models.

The attached file is a Python notebook that I use in JupyterLab.
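
The chunk-and-write workflow described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the function names `html_to_text`, `chunk_text`, and `write_corpus`, the word-based chunk size, and the `chunk_00000.txt` file naming are all assumptions made for the example. PDF text extraction is left out because it needs a third-party library; once you have the PDF text as a string, it can be passed to `chunk_text` in the same way.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, chunk_size=200):
    """Split text into chunks of at most chunk_size words (hypothetical default)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def write_corpus(chunks, corpus_dir, prefix="chunk"):
    """Write each chunk to its own .txt file in corpus_dir and return the paths."""
    out = Path(corpus_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        path = out / f"{prefix}_{i:05d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

A typical call would be `write_corpus(chunk_text(html_to_text(page_html), chunk_size=200), "corpus")`, where `page_html` is a string you have already loaded. Splitting on words rather than characters keeps chunks from cutting a word in half; a character-based or sentence-based split would work equally well if your model expects it.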
