This script shows how to convert the text of PDF books or documents and HTML pages into custom-sized chunks. It also shows how to write the resulting list of text chunks to a corpus folder that can be used to train text models.
The attached file is a Python notebook that I use in JupyterLab.
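
As a rough illustration of the chunking and writing steps, here is a minimal sketch that assumes the text has already been extracted from the PDF or HTML page into a plain string. The function names `chunk_text` and `write_corpus`, the choice to measure chunk size in words, and the `chunk_00000.txt` file naming are all assumptions made for this example; they are not taken from the notebook.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 200) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    Hypothetical helper: measuring chunks in words is an assumption.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def write_corpus(chunks: list[str], folder: str, prefix: str = "chunk") -> list[Path]:
    """Write each chunk to its own numbered .txt file inside `folder`.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the corpus folder if missing
    paths = []
    for index, chunk in enumerate(chunks):
        path = out_dir / f"{prefix}_{index:05d}.txt"
        path.write_text(chunk, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_corpus(chunk_text(text, 200), "corpus")` would then leave one text file per chunk in a `corpus` folder. Splitting on whitespace is the simplest option; a character count or a tokenizer could be swapped in depending on what the text model expects.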