
doggo_finder_webscraper

License: GPL v3

2020-09-22 - I've found a dog to adopt, therefore I don't think I will be working on this in the near future.

Kind suggestion to anyone trying to adopt a rescue dog: do me and yourself a favor and just find a physical shelter (possibly one you personally trust) in your area that still allows in-person meetings with the dogs. You really want to know the situation of the dog you will adopt before committing.

Exercising a bit with web scraping using Selenium with Python. I started this when I realized that one of my favorite local animal rescue websites doesn't really offer any way of receiving updates when a new dog is added to the list. It's been enough of a frustrating experience so far that I thought I could try to learn a bit about web scraping. If you are looking for rigorous, clean stuff, be advised that this might not be the right place (:

Requirements

I'm using Python 3.8 with Anaconda. Dependencies so far are selenium (plus a matching browser driver, e.g. chromedriver or geckodriver) and, optionally, termcolor for colored output reports.

I'm mainly developing under Linux and Windows.

Usage

For now, only a bare-bones script exists. To run it, make sure you have the dependencies mentioned above installed, then run the script from the root folder:

python doggo_finder_test.py

At the moment, it will open a background Selenium webdriver instance on the adoption page of the DPS Rescue. Upon startup, it will pull the current dog listing and print it. It will then check whether any cached data from a previous run is available and, if so, report any differences between that cache and the current listing.

The script will then enter a loop in which it monitors the page for updates (adoptions and additions). The check interval is 120 seconds by default, but can be changed with the interval argument of the loop call. It also supports colored output reports (if the termcolor module is installed) by specifying the appropriate flag when calling the main loop function. When quitting (currently by pressing CTRL+C), it saves the current dog listing and a timestamp in the cache for the next run.
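
The cache itself is just the last listing plus a timestamp. As a rough illustration of how that could be persisted, here is a minimal sketch using pickle; the file name, format and helper names are assumptions, not necessarily what the project's load_cache actually does:

```python
# Sketch of caching the last dog listing with a timestamp.
# File name, pickle format and helper names are assumptions.
import pickle
import time
from pathlib import Path

CACHE_FILE = Path("cache") / "last_listing.pkl"

def save_cached_listing(dog_names):
    """Store the current set of dog names together with a timestamp."""
    CACHE_FILE.parent.mkdir(exist_ok=True)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump({"timestamp": time.time(), "dogs": set(dog_names)}, f)

def load_cached_listing():
    """Return the previously saved listing, or None if no cache exists yet."""
    if not CACHE_FILE.exists():
        return None
    with open(CACHE_FILE, "rb") as f:
        return pickle.load(f)
```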

Example main loop call:

my_cache = load_cache(path='cache')
my_driver = open_connection('http://dpsrescue.org/adopt/available/', instance_type='chrome')
simple_loop(my_driver, interval=60, cache=my_cache, print_mode='color')
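
Internally, each check in the loop boils down to pulling the current listing, diffing it against the previous one, and reporting additions and adoptions. Below is a minimal, self-contained sketch of that idea; the helper names and the CSS selector are assumptions for illustration, not the actual functions in doggo_finder_test.py:

```python
# Sketch of one check iteration: fetch the current listing and diff it
# against the previous one. Helper names and selector are assumptions.
from selenium.webdriver.common.by import By

def fetch_dog_list(driver):
    """Return the set of dog names currently listed (selector is a placeholder)."""
    driver.refresh()
    elements = driver.find_elements(By.CSS_SELECTOR, ".dog-name")
    return {el.text.strip() for el in elements if el.text.strip()}

def report_changes(previous, current):
    """Print which dogs were added and which were adopted/removed."""
    for name in sorted(current - previous):
        print(f"NEW: {name}")
    for name in sorted(previous - current):
        print(f"ADOPTED/REMOVED: {name}")
```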

It works, but I am not entirely comfortable with the project dependencies: for instance, the script arbitrarily stopped working with Firefox instances on my Linux machine (no updates nor other changes) while still working perfectly on both Windows and Mac, which is why for now I've switched to Google Chrome webdriver instances.
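
For what it's worth, spinning up a background Chrome instance with Selenium typically looks something like the sketch below; the options shown are assumptions, and the actual open_connection implementation may differ:

```python
# Sketch of opening a headless (background) Chrome webdriver.
# Assumed setup, not the project's actual open_connection implementation.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")              # run without a visible browser window
options.add_argument("--window-size=1920,1080")  # give the page a sane viewport
driver = webdriver.Chrome(options=options)
driver.get("http://dpsrescue.org/adopt/available/")
```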

Roadmap

Ideally, I'd want to run this thing in the background, and when it detects a change in the rescue dog list (new dogs added to the list and adopted dogs removed from it) it should notify me reliably. I'd also like to parse the information a bit better to do some basic filtering of candidates. I should probably look into email notifications to make this thing vaguely useful in situations other than staring at the monitor for listing updates...
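
As a rough idea of what the email notification could look like, something along these lines would probably be enough; this is entirely hypothetical, and the addresses, SMTP server and credentials are placeholders:

```python
# Hypothetical email notification for a detected listing change.
# Addresses, SMTP server and credentials are placeholders.
import smtplib
from email.message import EmailMessage

def notify_by_email(added, removed):
    """Send a short summary of the listing change to a personal address."""
    msg = EmailMessage()
    msg["Subject"] = "Doggo finder: listing changed"
    msg["From"] = "doggo-finder@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(
        "New dogs: " + ", ".join(sorted(added)) + "\n"
        "Adopted/removed: " + ", ".join(sorted(removed))
    )
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("me@example.com", "app-password")  # placeholder credentials
        server.send_message(msg)
```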

License

Distributed under the GPLv3 license. See LICENSE for more information.

Contact

Luca Lo Verde - luca.loverde0@gmail.com

Project Link: https://github.com/LucaLoVerde/doggo_finder_webscraper
