pyCrawler

Threaded Python web crawler with a MongoDB backend.

The file crawlerLinks.py grabs a random link from a given collection, crawls that page for new HTTP links, and inserts each link that is not already in the collection as a new document.
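
A minimal sketch of that step, assuming a local MongoDB instance and documents shaped like {"url": "<http link>"}; the database, collection, and field names here are illustrative, not taken from the repository:

```python
# Sketch only: grab one random link, fetch it, and insert any unseen links.
import re
import urllib.request

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
links = client["crawler"]["links"]  # assumed database/collection names

def crawl_once():
    # Pick one random link document from the collection.
    doc = next(links.aggregate([{"$sample": {"size": 1}}]), None)
    if doc is None:
        return
    try:
        with urllib.request.urlopen(doc["url"], timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="ignore")
    except Exception:
        return
    # Find http(s) links on the page and insert only the ones not seen before.
    for url in set(re.findall(r'href="(https?://[^"]+)"', html)):
        if links.count_documents({"url": url}, limit=1) == 0:
            links.insert_one({"url": url})

if __name__ == "__main__":
    crawl_once()
```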

The other file, crawlerMeta.py, grabs a random link from a collection and extracts some metadata from the page using Beautiful Soup 4. The metadata is saved together with the crawled link. The number of threads used can easily be changed in the scripts.
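
A sketch of the metadata step under the same assumptions as above; the NUM_THREADS constant and the extracted fields are illustrative, showing the kind of page metadata Beautiful Soup 4 makes easy to pull out:

```python
# Sketch only: extract <title> and meta description, store them on the link.
import threading
import urllib.request

from bs4 import BeautifulSoup
from pymongo import MongoClient

NUM_THREADS = 4  # adjust the number of worker threads here

links = MongoClient("mongodb://localhost:27017/")["crawler"]["links"]

def worker():
    # Pick one random link document and fetch the page.
    doc = next(links.aggregate([{"$sample": {"size": 1}}]), None)
    if doc is None:
        return
    try:
        with urllib.request.urlopen(doc["url"], timeout=10) as resp:
            soup = BeautifulSoup(resp.read(), "html.parser")
    except Exception:
        return
    description = soup.find("meta", attrs={"name": "description"})
    meta = {
        "title": soup.title.string if soup.title else None,
        "description": description.get("content") if description else None,
    }
    # Save the extracted metadata on the same document as the crawled link.
    links.update_one({"_id": doc["_id"]}, {"$set": {"meta": meta}})

if __name__ == "__main__":
    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```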
