etchegom/demo-scrapy

Demo of cool Scrapy features

  • Random user-agent middleware
  • Persistence pipeline with PostgreSQL and SQLAlchemy
  • Middleware that checks whether an entry already exists in the database
  • Custom log formatter
  • Blacklist middleware
  • Extract page metadata with extruct
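
As an illustration of the first feature, here is a minimal sketch of a random user-agent downloader middleware. It is not the repository's actual code: the class name, the `USER_AGENTS` pool, and the `rng` parameter are assumptions made for the example. It relies only on Scrapy's downloader middleware hook `process_request(request, spider)`, where returning `None` lets the request continue through the chain.

```python
import random

# Hypothetical pool of user-agent strings (an assumption for this sketch;
# the list used by the repository is not shown in this README).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class RandomUserAgentMiddleware:
    """Downloader middleware sketch: set a random User-Agent on every request."""

    def __init__(self, user_agents=None, rng=None):
        # Fall back to the module-level pool when no list is supplied.
        self.user_agents = list(user_agents or USER_AGENTS)
        # An injectable random generator keeps the behaviour testable.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen value.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing this request.
        return None
```

To try something like this, the class would be registered under `DOWNLOADER_MIDDLEWARES` in `settings.py`, using whatever module path it lives at in your project.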

Setup locally

  • Set up the PostgreSQL database
docker-compose up -d --build
  • Set up the Python environment
virtualenv -p /usr/bin/python3.7 .venv
source .venv/bin/activate
pip install -r requirements.txt
  • Run the spider
scrapy crawl thomann
  • Check your items in the database using the psql CLI or pgAdmin

Next steps

  • Complete scraping in the thomann spider
  • Add Metabase to the docker stack
  • Add a Scrapyd server to the docker stack
