This is a multi-threaded web scraper that collects data science job ads for a user-defined location (e.g. Seattle, WA) from indeed.com, to gain insight into the most in-demand data science skills in the job market. When scraping finishes, it reports the total number of job ads scraped, a sorted list of the most in-demand skills (each with the number and percentage of job ads it appears in), and a bar chart for visualization.
- This application is written in Python 3.
- For the Pittsburgh case (100+ jobs), this multi-threaded scraper was 862.79% faster than a single-threaded one.
- For the Seattle case (1,100+ jobs), the speedup was even larger: the running time dropped from about 12 hours to about 20 minutes (roughly 3500% faster).
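The speedups above come from overlapping network I/O across threads. A minimal sketch of that pattern is below; `fetch_page`, `fetch_all`, and the worker count are illustrative assumptions, not the project's actual code.

```python
# Sketch of multi-threaded page fetching (names and worker count are assumptions).
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_page(url):
    # Download one job-ad page; network latency dominates, so threads overlap well.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetcher=fetch_page, max_workers=20):
    # pool.map preserves input order while up to max_workers requests run at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetcher, urls))
```

Because each request mostly waits on the network, even Python's GIL-bound threads give a near-linear speedup up to the point where bandwidth or the remote server becomes the bottleneck.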
| Date       | City              | Jobs Scraped | Run Time (s) |
|------------|-------------------|--------------|--------------|
| 09-05-2016 | Seattle, WA       | 1181         | 1177.0       |
| 09-05-2016 | San Francisco, CA | 1878         | 2133.5       |
| 09-05-2016 | San Jose, CA      | 1567         | 1054.8       |
| 09-05-2016 | Los Angeles, CA   | 524          | 265.4        |
| 09-05-2016 | San Diego, CA     | 476          | 319.9        |
| 09-06-2016 | New York, NY      | 2338         | 1950.1       |
| 09-05-2016 | Washington, DC    | 1753         | 986.4        |
| 09-06-2016 | Boston, MA        | 1678         | 1501.3       |
| 09-05-2016 | Chicago, IL       | 466          | 320.3        |
| 09-05-2016 | Charlotte, NC     | 117          | 119.2        |
| 09-05-2016 | Houston, TX       | 174          | 196.0        |
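Once the ads are scraped, the skill report (counts and percentages, sorted) can be produced with a simple tally. This is a hedged sketch: `SKILLS`, `tally_skills`, and the word-level matching rule are assumptions for illustration, not the project's actual keyword list or logic.

```python
# Sketch of the per-skill tally (keyword list and matching rule are assumptions).
from collections import Counter

SKILLS = ["python", "r", "sql", "hadoop", "spark"]  # illustrative keywords


def tally_skills(ad_texts):
    # Count, per skill, how many ads mention it at least once,
    # then report each skill's share of all ads scraped.
    counts = Counter()
    for text in ad_texts:
        words = set(text.lower().split())
        for skill in SKILLS:
            if skill in words:
                counts[skill] += 1
    total = len(ad_texts)
    return [(skill, n, 100.0 * n / total) for skill, n in counts.most_common()]
```

`most_common()` yields the sorted list described in the introduction, and the percentage column is what the bar chart would plot.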
- Download all files and folders in this repository and save them in the same folder.
- Run run_scraper.py with Python 3, passing a city and state.
- Example, scraping Seattle, WA:

```
python3 run_scraper.py --city Seattle --state WA
```
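The `--city`/`--state` flags above suggest argparse-style option handling. A guess at the parsing inside run_scraper.py (the actual script may differ):

```python
# Sketch of the command-line interface implied by the usage example above
# (run_scraper.py's real internals are not shown in this README).
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Scrape data science job ads for one city from indeed.com."
    )
    parser.add_argument("--city", required=True, help="City name, e.g. Seattle")
    parser.add_argument("--state", required=True, help="State code, e.g. WA")
    return parser.parse_args(argv)
```

With `argv=None`, argparse reads `sys.argv`, so the same function serves both the command line and tests.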