abulbasar/data

Datalinks

Earthquake data https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.csv
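A minimal sketch of how the earthquake feed above might be filtered once downloaded. It assumes the CSV carries `time`, `mag`, and `place` columns (an assumption about the feed layout, not something stated here); the sample rows and the `strongest_quakes` helper are made up for illustration.

```python
import csv
import io

# Made-up rows, laid out the way the USGS CSV feed is assumed to be.
SAMPLE = """time,latitude,longitude,depth,mag,magType,place
2024-01-01T00:00:00.000Z,10.0,20.0,35.0,4.6,mb,"Example place A"
2024-01-02T00:00:00.000Z,-5.5,120.25,10.0,5.8,mww,"Example place B"
2024-01-03T00:00:00.000Z,40.1,-125.3,12.5,5.1,mb,"Example place C"
"""


def strongest_quakes(csv_text, min_mag=5.0):
    """Return (mag, place, time) tuples with mag >= min_mag, strongest first."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["mag"]:
            # Skip rows with a missing magnitude instead of failing on float("").
            continue
        mag = float(row["mag"])
        if mag >= min_mag:
            hits.append((mag, row["place"], row["time"]))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for mag, place, when in strongest_quakes(SAMPLE):
        print(f"{mag:.1f}  {place}  {when}")
```

To run it against live data, the text of the feed could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed in place of `SAMPLE`.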

http://www.datasciencecentral.com/profiles/blogs/a-plethora-of-data-set-repositories

Wikipedia data dump https://dumps.wikimedia.org/enwiki/

StackExchange data dump https://archive.org/details/stackexchange

US Supreme Court http://scdb.wustl.edu/data.php

US postal codes https://www.unitedstateszipcodes.org

Weather data https://www.ncdc.noaa.gov/data-access

Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong/

Transportation Dataset http://transtats.bts.gov/DL_SelectFields.asp

Catalog of 35 data sources http://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#43139bf16796

Baseball game data: http://www.retrosheet.org

Del.icio.us Dataset http://www.din.uem.br/~gsii/delicious-dataset/

Project Challenges http://www.datasciencecentral.com/group/dsa-projects/forum/topics/data-science-projects-for-dsa-candidates

https://data.world/

https://blog.bigml.com/2013/02/28/data-data-data-thousands-of-public-data-sources/

YouTube dataset: http://netsg.cs.sfu.ca/youtubedata/

Datasets for machine learning

https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research

Company profiles

https://data.crunchbase.com/docs

Microfinance Lending Data

http://build.kiva.org/docs/data/basic_types

World Bank - International Debt Statistics

http://data.worldbank.org/data-catalog/international-debt-statistics

AWS public data set https://aws.amazon.com/datasets/

S3 bucket for public dataset: s3://aws-publicdatasets

Amazon product review data

http://jmcauley.ucsd.edu/data/amazon/

USDA Food database https://ndb.nal.usda.gov/ndb/search/list

US Federal Election Commission Data

http://www.fec.gov/disclosure.shtml

Data published by opendatasoft.com

https://data.opendatasoft.com/explore/?sort=modified

NYC Open Data

https://data.cityofnewyork.us/browse

World Bank

http://datacatalog.worldbank.org/

Open Data Catalog

http://dataportals.org/

US Geological Survey Science Data Catalog

https://data.usgs.gov/datacatalog

Geolocation and IP mapping http://dev.maxmind.com/geoip/geoip2/geolite2/

List of cities

http://openweathermap.org/help/city_list.txt

Graph Data

ICON is a comprehensive index of research-quality network data sets from all domains of network science, including social, web, information, biological, ecological, connectome, transportation, and technological networks.

Each network record in the index is annotated with and searchable or browsable by its graph properties, description, size, etc., and many records include links to multiple networks. The contents of ICON are curated by volunteer experts from Prof. Aaron Clauset's research group at the University of Colorado Boulder.

https://icon.colorado.edu

KONECT is a comprehensive archive that provides not only the data (dozens of networks), but also summary statistics about each dataset.

http://konect.uni-koblenz.de/networks/

http://www-personal.umich.edu/~mejn/netdata/

http://snap.stanford.edu/data/

Social Network Data

http://socialnetworks.mpi-sws.org/data-imc2007.html

Medline Database - a database of academic papers that have been published in journals covering the life sciences and medicine

ftp://ftp.nlm.nih.gov/nlmdata/sample/medline

Web Data Commons Hyperlink Graph - with 3.5 billion nodes and 128 billion edges, described by its publishers as the largest freely available real-world graph dataset.

http://webdatacommons.org/hyperlinkgraph/index.html

Case Studies on Benefits of Open Data

Business case for open data https://project-open-data.cio.gov/business-case/

https://socrata.com/case-studies/

https://www.opendatasoft.com/resources/#casestudies

English Dictionary Database https://wordnet.princeton.edu/wordnet/download/

Awesome Public Datasets https://github.com/awesomedata/awesome-public-datasets

Airbnb http://insideairbnb.com/get-the-data.html

CTR (click-through rate) prediction

Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge

Avazu: https://www.kaggle.com/c/avazu-ctr-prediction

Outbrain: https://www.kaggle.com/c/outbrain-click-prediction

RecSys 2015: http://dl.acm.org/citation.cfm?id=2813511&dl=ACM&coll=DL&CFID=941880276&CFTOKEN=60022934
