# crawler-engine

Here are 49 public repositories matching this topic...

nuhmanpk / WebScrapper

Simple and powerful all-in-one Telegram bot to scrape / crawl webpages using Requests, html5lib and Beautiful Soup

Updated Apr 19, 2024
Python

6677-ai / tap4-ai-crawler

The crawler open-sourced by tap4.ai

crawler crawler-engine crawler-python aitoolkit aitools

Updated Jun 21, 2024
Python

namhong1412 / browser-clone-web

Use a browser to clone a web page

python chromedriver selenium-python crawler-engine clone-ui clone-website

Updated Apr 4, 2023
Python

bkeepers / spiderman

your friendly neighborhood web crawler

ruby http crawler spider web-crawler nokogiri web-scraping webcrawler webscraping spider-framework crawler-engine httprb

Updated Jul 26, 2022
Ruby

supernebula / shark

Shark (Plunder): a configurable, pluggable crawler engine and framework for secondary development.

downloader framework pipeline scheduler analyzer crawler-engine remove-duplicate

Updated Feb 10, 2022
C#

Keerthivasan13 / Targeted_Advertising_Google_AdSense

Hybrid E-Marketing using Web Page Mining for Website Monetization

information-retrieval data-mining google google-analytics google-adwords jsoup data-engineering naive-bayes-classifier google-adsense ranking-algorithm google-ads crawler-engine advertisement-management-system website-monetization targeted-advertising

Updated Mar 28, 2020
TSQL

web-extractors / arachnid-seo-js

Web crawler for extracting internal site links info for SEO auditing & optimization purposes

crawler scraper seo seotools seo-optimization crawler-engine

Updated Dec 4, 2023
TypeScript

lichang98 / visualize_spider

Visual crawler configuration and management based on Spring Boot and Scrapy

visualization crawler-engine

Updated May 11, 2019
HTML

plugnsearch / plugnsearch

The only real pluggable crawler / spider / webcrawler to search the web for stuff you need to know.

search-engine crawler scraper crawler-engine webpage-scraper

Updated Apr 23, 2023
JavaScript

ShiqinHuo / wuhan_house_price_crawler

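Crawler for second-hand housing prices in the Optics Valley and Software Park areas of Wuhan's East Lake High-Tech Zone. Data source: Fangtianxia (房天下)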

crawler housing-prices scraping-websites house-price-prediction crawler-engine fangtianxia guanggoo scraping-python wuhan house-prices-crawler crawler-house-prices wuhan-house-prices

Updated Apr 29, 2019
Jupyter Notebook

rrmerugu / trawler

A data gathering/trawling framework to search and get information from web sources like Bing

python search webcrawler crawler-engine

Updated Jul 15, 2022
Python

wefindx / metadrive

Generic Interfaces to Addressable Objects

framework driver proxies iterators sessions filters protocols generators formats controller-manager crawler-engine

Updated Feb 11, 2023
Python

fooock / robots.txt

🤖 robots.txt as a service. Crawls robots.txt files, downloads and parses them to check rules through an API

kotlin java api docker redis crawler spring-boot gradle docker-compose makefile postgresql robots-txt antlr4 spiders robots-parser crawler-engine redis-stream redis-streams

Updated Dec 2, 2020
Java

Sobak / scrawler

Declarative, scriptable web robot (crawler) and scraper

crawler scraper robots-txt scraping-websites crawler-engine

Updated Apr 9, 2020
PHP

paganini2008 / greenfinger

A high-performance distributed web crawling framework based on Spring Boot. It provides rich APIs for customizing business logic and is easy to embed in your system.

java distributed-systems high-performance crawler-engine mircoservice

Updated Oct 8, 2022
Java

nicolasmelo1 / price_miner

Price miner for e-commerce sites that I made for the Price Management class of my Marketing degree and may turn into my TCC on e-commerce price analysis

flask crawler scraper scraping crawling selenium python3 miner celery e-commerce selenium-webdriver crawlers scrapper flask-api scraping-websites crawler-engine

Updated Dec 8, 2022
HTML

wetrycode / tegenaria

Tegenaria is a crawler framework based on golang

go golang crawler framework spider spiders crawler-engine crawler-framework

Updated Dec 23, 2023
Go

hseghetti / simple-crawler

Simple crawler using Apache Nutch and Elasticsearch

docker elasticsearch crawler docker-compose nutch crawling cerebro crawlspider crawler-engine

Updated May 27, 2020
Shell

runjia1987 / crawler-engine

A crawler engine with HTTP, proxy, JS-Java interoperability, MQ task consumption, and dynamic crawler script execution. Supports distributed deployment.

rabbitmq proxy nashorn rhino-js crawler-engine js-java-interoperability mq-task-consumption

Updated Dec 23, 2017
Java

BaseMax / NetPHP

Useful functions for connecting to the network in PHP-based applications.

Updated May 26, 2020
PHP
