Web Crawler 🕷️ 🕸️

The software provides the basic functionality of a web crawler. A web crawler is an Internet bot that assists in web indexing: it crawls a website one page at a time until all of its pages have been indexed (a minimal sketch of this loop follows the list below).
After the download of the resources is completed, the user is informed about the success rate of the download and can choose from one of several data analysis methods, such as:

🔱 filtering data by type, dimension and date;
🔱 searching by keywords;
🔱 creating a sitemap;
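
The sketch below illustrates the one-page-at-a-time crawl described above. It is a minimal, illustrative Python example only; the use of requests and BeautifulSoup, and all names in it, are assumptions rather than this project's actual implementation.

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin, urlparse

    def crawl(start_url):
        # Visit one page at a time until every reachable page on the site is seen.
        seen, queue = {start_url}, [start_url]
        domain = urlparse(start_url).netloc
        while queue:
            url = queue.pop(0)
            try:
                page = requests.get(url, timeout=5)
            except requests.RequestException:
                continue  # failed download; would lower the reported success rate
            # ... index/store page.content here ...
            for link in BeautifulSoup(page.text, "html.parser").find_all("a", href=True):
                absolute = urljoin(url, link["href"])
                if urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)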

SRS & SDD

To see details about the Software Requirements Specification, follow the link: SRS.
To see details about the Software Design Description, follow the link: SDD.

How to use the application? 💻 ⌨️

Set crawling configuration file

crawl config.cnf
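
The keys accepted by config.cnf are not documented in this README; the snippet below is only a hypothetical illustration of what such a file might contain (every key name is assumed):

    # config.cnf -- all keys below are hypothetical
    output_directory = downloads/
    max_depth = 3
    delay_ms = 500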

Crawling sites from a given input file

crawl input.txt
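
input.txt lists the sites to crawl, presumably one URL per line (the format is an assumption; the URLs are placeholders):

    https://example.com
    https://example.org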

Crawling sites from a given input file with a specified type

crawl input.txt type png
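
Here png is the resource type: the crawl is presumably restricted to downloading only resources of the given type (in this example, PNG images).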

Filter downloaded sites by type

filter type png

Filter downloaded sites by date

filter date dd/mm/yyyy

Filter downloaded sites by dimension

filter size 150 kb

Create sitemap in an output file

sitemap sitemap.txt
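
The sitemap is written as plain text to the given output file; an indented tree of the crawled URLs is one plausible layout (the actual format is not specified here):

    https://example.com/
        https://example.com/about
        https://example.com/contact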

Search word in all the downloaded files

search word
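
For example, search crawler would presumably report every downloaded file that contains the keyword "crawler".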

Exit

exit

Testing

Before using the application, please make sure you first read the testing documentation: Testing.

Team

Andrei Claudia 👩‍🎓

Chiriță Gabriela 👩‍🎓

Guțu Bogdan 👨‍🎓

Manea Sebastian 👨‍🎓

Mercheș Diana 👩‍🎓
