mnhacohen/datajournalism-resources

datajournalism-resources

A compilation of links to datajournalism & OSINT tools, guides and resources I find useful to keep at hand. PRs welcome!

Legend:

  • 🌐 = online tool/service/resource
  • 💻 = software
  • 📖 = guide/tutorial
  • 📝 = list of tools/resources
  • 🐍 = Python module
  • 💲 = paid or paid-only tool/service

Contents

APIs

  • Postman 💻 - API development environment offering useful tools for crafting and debugging API requests.
  • ProgrammableWeb 📝 - A good API directory.
  • Public APIs 📝 - A categorized list of APIs.

Archival

Breached Data

  • Breach Data Search Engines Comparison 📝 (IntelTechniques)
  • Dehashed 🌐💲 - Find cleartext & hashed passwords from data breaches (paid, $4/week, $11/mo).
  • GhostProject 🌐 - Check if an email appears in a breach. Shows the first 3 characters of the password for free.
  • h8mail 💻 - Find passwords through different breach and reconnaissance services. Can also search the BreachedCompilation torrent.
  • Have I Been Pwned? 🌐 - Check if an email appears in a breach, set up alerts.

Companies

  • CompaniesHouse Short Guide 📖 (Bellingcat) - A guide to the UK online company registry.
  • ICIJ's Offshore Leaks Database 🌐 - Data on offshore companies, foundations and trusts from the Panama Papers, the Offshore Leaks, the Bahamas Leaks and the Paradise Papers investigations.
  • List of company registers 📝 (Wikipedia) - A list of company registers, by country.
  • OCCRP Data 🌐 - Fantastic search tool & resources made available by OCCRP. Public records, leaks, scraped business registers, and more.
  • OpenCorporates 🌐 - A very comprehensive companies database. Has an API.

Data Analysis & Manipulation

  • csvkit 💻 - A suite of command-line tools for converting to and working with CSV files.
  • OpenRefine 💻 - Clean & transform messy data.
  • pandas 🐍 - Powerful Python data analysis library. Best used in a Jupyter notebook.
  • tabula 💻 - An open-source tool for extracting data tables from PDF documents.
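As a rough illustration of the kind of task pandas is listed for, here is a minimal sketch that loads a small CSV and totals a column per group. The column names (`agency`, `year`, `amount`) and the sample rows are invented for the example and do not come from any real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch runs without any external file.
RAW_CSV = """agency,year,amount
Health,2018,1200
Health,2019,1500
Transport,2018,800
Transport,2019,950
"""


def total_by_agency(csv_text):
    """Return the summed amount per agency, largest total first."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.groupby("agency")["amount"].sum().sort_values(ascending=False)


totals = total_by_agency(RAW_CSV)
print(totals)
```

With a real file, `pd.read_csv("path/to/file.csv")` would replace the `io.StringIO` wrapper; the grouping step stays the same.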

Email

Lists of tools & resources

Location, Maps, Satellite Imagery

Interpretation

Mapping services & software

Tools & techniques

User generated content

  • EchoSec 🌐💲 - Search and analyze social media data based on location. ($499/mo)
  • GeoCreepy 💻 - Geolocation information gathering through social networking platforms (discontinued).
  • OpenStreetMap 🌐 - User-generated locations & maps. Use taginfo and/or overpass-turbo.eu to search for a location by key/value tags (see OSM's Wiki).
  • Social networks (see category)
  • Tourism & review websites: Foursquare, TripAdvisor, Yelp, etc. 🌐
  • Vkontakte 🌐 - Use near:<coordinates> in a search.
  • Wikimapia 🌐 - User-generated locations & descriptions. Has an API. Also lets you switch between satellite imagery from Google, Bing and OSM.
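For the OpenStreetMap entry above, a key/value tag search can be expressed as an Overpass QL query. The sketch below only builds the query text and a request URL; it does not send anything. The helper names, the `amenity=cafe` tag, the bounding box and the choice of the `overpass-api.de` public instance are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: one public Overpass API instance. Others exist.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key, value, bbox, timeout=25):
    """Build an Overpass QL query for nodes tagged key=value inside bbox.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({south},{west},{north},{east});'
        "out body;"
    )


def build_overpass_url(query):
    """Return a GET URL carrying the query in the 'data' parameter."""
    return f"{OVERPASS_ENDPOINT}?{urlencode({'data': query})}"


# Hypothetical example: cafes in a small box over central Paris.
query = build_overpass_query("amenity", "cafe", (48.85, 2.34, 48.86, 2.36))
url = build_overpass_url(query)
print(query)
print(url)
```

The printed query can be pasted into overpass-turbo.eu to preview results on a map before fetching anything programmatically.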

Multi-purpose tools

  • Belati 💻 - Command-line OSINT tool with whois, subdomain enumeration, mail harvesting, and more.
  • Buscador 💻 - A very handy VM with plenty of pre-installed & pre-configured OSINT tools.
  • DataSploit 💻 - A collection of Python scripts which automate open source intelligence searches about domain names, email addresses, IP addresses and usernames.
  • Maltego CE 💻 - Interactive data mining & mapping tool.
  • Spiderfoot 💻 - Open source intelligence automation tool. Gathers intelligence about a given target, which may be an IP address, domain name, hostname, network subnet, ASN, e-mail address or person's name.

Phone numbers

  • NumberWay 🌐 - International directory of white pages and yellow pages phone books.
  • PhoneInfoga 💻 - Information gathering & OSINT reconnaissance tool for phone numbers.

Pictures, Photos, Videos

Metadata

Military/Weapons

Reverse search

Search

Verification & Analysis

Social Networks

General

  • Sherlock 💻 - Search for a username across 135 social media sites.

Facebook

Github

  • gitrob 💻 - Find potentially sensitive files pushed to public repositories on Github. Requires a GitHub access token.
  • Zen 💻 - Find emails of Github users.

Instagram

  • InstaLooter 💻 - Download all pictures & videos from an Instagram profile. No API key needed.

Linkedin

Reddit

Twitter

  • tinfoleak 💻 - Very complete open-source tool for Twitter intelligence analysis. Needs API credentials.
  • twarc 💻🐍 - A command line tool and Python library for archiving Twitter in JSON format.
  • Tweetdeck 🌐
  • Tweetdeck Location Search Tutorial 📖
  • Tweets Analyzer 💻 - Twitter profile analyzer: tweet activity, locations, most used hashtags, etc. Can save tweets to JSON. Requires a Twitter API key.
  • TWINT (Twitter Intelligence Tool) 💻 - Advanced Twitter scraping tool, no API key needed. Can export to text, CSV, JSON, SQLite, Elasticsearch. Can detect emails, phone numbers, profiles.

Text & Documents

Indexing & searching

  • Aleph 💻 - A toolkit for data search, management and analysis in investigative reporting.
  • Blacklight 💻 - Open source Solr user interface discovery platform.
  • Datashare 💻 - Index & search documents on your computer, automatically detect people, organizations and locations with NLP.
  • DumpsterDiver 💻 - Analyze big volumes of various file types in search of secrets, credentials, etc.
  • ICIJ Extract 💻 - A command line tool for parallelized, distributed content-extraction.
  • searchbox 💻 - A simple out-of-the-box web interface to search through thousands of unstructured documents using Solr.

OCR

  • NewOCR.com 🌐 - Recognizes several languages, can resize images, shortcuts to Google & Bing Translate.
  • Tesseract 💻 - Open-source OCR engine.

PDF

Text Processing & Analysis

  • topia 🐍 - Python module to determine important terms within a given piece of content.
  • TXM 💻 - Lexicometry and text statistical analysis for large bodies of text.

Visualization

Graphs

  • DataWrapper 🌐💲 - Easy to use graph & map tool. Free plan available.
  • Google Fusion Tables - Create maps & charts from data. Scheduled to shut down in December 2019.
  • Matplotlib 🐍 - Python 2D plotting library. Best used with pandas in a Jupyter notebook.

Maps

  • ArcGIS 💻💲 - Mapping & analysis software (proprietary, paid, 21-day trial).
  • Folium 🐍 - Python library to create Leaflet.js maps. Can be used in a Jupyter Notebook to map data from pandas.
  • Geopy 🐍 - Python geocoding library. Supports OSM Nominatim, Google, Bing, GeoNames & many more.
  • Google:
  • Humanitarian Data Exchange 🌐 - Useful resources of shapefiles, especially for administrative boundaries.
  • KML Interactive Sampler 🌐 - Lots of KML templates.
  • QGIS 💻 - Free & open-source alternative to ArcGIS.

Mindmaps & Network graphs

Timelines

  • Tik Tok 💻 - JavaScript tool to easily create simple, mobile-friendly, vertical timelines. Open-source.
  • TimelineJS 💻

Weather

Websites

Dark Web & Onion services

Scraping

Searches, info, related entities

Misc

  • awesome-selfhosted 📝 - A list of Free Software network services and web applications which can be hosted locally.
  • grayhatwarfare 🌐 - Search the contents of open Amazon S3 buckets.
  • Shodan 🌐 - Internet of Things search engine.

License

This list is under the Creative Commons Attribution-NonCommercial 4.0 International Public License.
