This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Using linear regression to predict NYC apartment rental prices scraped from Craigslist

a-poor/nyc-apt-rental-predictions

Modeling NYC Apartment Rental Pricing on Craigslist

by Austin Poor

I did this analysis as my second project for the Metis Data Science Bootcamp. For this project we chose our own topics but were required to gather our data by web scraping and to use a linear regression model.

As someone who spent most of my life living in New York, I picked a topic near and dear to my heart – I chose to model New York City apartment rental prices, using data scraped from Craigslist.

Data was collected from the NYC-area Craigslist (newyork.craigslist.com), using listings posted between 2019-12-24 and 2020-01-23.


Process

I used two Python scripts, scrape.py and clean.py, to scrape and clean my data. They download apartment listing data to a SQLite database, data/craigslist_apts.db.
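As a rough illustration of the storage step, here is a minimal sketch of writing cleaned listings to SQLite with Python's built-in sqlite3 module. The table name, column names, and the save_listings helper are assumptions made for this example, not the schema or functions used by scrape.py or clean.py. An in-memory database stands in for data/craigslist_apts.db.

```python
import sqlite3


def save_listings(conn, listings):
    """Create a (hypothetical) listings table if needed and upsert rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS listings (
            post_id   TEXT PRIMARY KEY,
            price     INTEGER,
            bedrooms  INTEGER,
            latitude  REAL,
            longitude REAL
        )
        """
    )
    # INSERT OR REPLACE keeps one row per post_id, so re-scraped
    # listings overwrite their earlier copies instead of duplicating.
    conn.executemany(
        "INSERT OR REPLACE INTO listings "
        "VALUES (:post_id, :price, :bedrooms, :latitude, :longitude)",
        listings,
    )
    conn.commit()


# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")

# Made-up sample rows; "a1" appears twice to show the upsert behavior.
sample = [
    {"post_id": "a1", "price": 2500, "bedrooms": 1,
     "latitude": 40.73, "longitude": -73.99},
    {"post_id": "a1", "price": 2600, "bedrooms": 1,
     "latitude": 40.73, "longitude": -73.99},
    {"post_id": "b2", "price": 3400, "bedrooms": 2,
     "latitude": 40.68, "longitude": -73.97},
]
save_listings(conn, sample)

row_count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
a1_price = conn.execute(
    "SELECT price FROM listings WHERE post_id = ?", ("a1",)
).fetchone()[0]
```

Keying the table on the post ID is one way to make repeated scraping runs idempotent; whether the original scripts do this is not stated.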

From there, the notebook craigslist_regression.ipynb loads the data, further cleans it, and then models the data. There's an additional notebook, geometry_conversion.ipynb, which is used to calculate apartment neighborhoods based on the latitude and longitude data from the Craigslist apartment listing.
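To give a sense of how a neighborhood can be derived from coordinates, here is a small sketch using the standard ray-casting point-in-polygon test. It is not taken from geometry_conversion.ipynb, which may well use a geospatial library instead. The function names and the two rectangular "neighborhoods" are invented for illustration; real neighborhood boundaries would come from a boundary dataset.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices in order. A horizontal ray
    is cast from the point toward +longitude; an odd number of edge
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_neighborhood(lon, lat, neighborhoods):
    """Return the first neighborhood whose polygon contains the point."""
    for name, polygon in neighborhoods.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangular boundaries, not real NYC neighborhoods.
NEIGHBORHOODS = {
    "Box A": [(-74.00, 40.70), (-73.98, 40.70),
              (-73.98, 40.72), (-74.00, 40.72)],
    "Box B": [(-73.98, 40.70), (-73.96, 40.70),
              (-73.96, 40.72), (-73.98, 40.72)],
}

listing_neighborhood = find_neighborhood(-73.99, 40.71, NEIGHBORHOODS)
```

Listings that fall outside every polygon return None here, so they can be dropped or labeled separately during cleaning.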

Results

After testing multiple types of linear models (linear regression, degree-2 polynomial regression, degree-3 polynomial regression, LASSO, and Ridge), my final model (degree-3 polynomial regression) achieved an R^2 score of 0.768 on test data.
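For readers unfamiliar with the metric, the sketch below shows how an R^2 score on held-out test data is computed: fit on the training split, predict on the test split, and compare the residual sum of squares to the total sum of squares. It uses a one-feature ordinary least squares fit in plain Python rather than the degree-3 polynomial model from the notebook, and the bedroom counts and rents are made-up numbers, not project data.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed against the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up training data: bedrooms vs. monthly rent.
train_x = [0, 1, 2, 3]
train_y = [2000, 2500, 3000, 3500]

# Made-up held-out data the model never saw during fitting.
test_x = [1, 2]
test_y = [2400, 3100]

intercept, slope = fit_simple_ols(train_x, train_y)
test_pred = [intercept + slope * x for x in test_x]
test_r2 = r2_score(test_y, test_pred)
```

Scoring on data held out from fitting matters most for the higher-degree polynomial models, which can fit training data closely while generalizing poorly.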

Presentation

I've included a PDF of the slide deck used for my presentation, here.
