White House salary data, as disclosed by whitehouse.gov

neerajseth000/white-house-salaries

 
 

White House Salaries

tl;dr: White House salaries for all staffers, from 2009 to the just-released 2017 list, compiled in one file:

data/compiled/white-house-salaries.csv

Warning: File has not been double-checked or even spot-checked for errors (2017-06-30)


On 2017-06-30, the Trump administration published White House staff salary data at the following URL:

https://www.whitehouse.gov/staff-salaries

Currently, the data exists only as PDF:

https://www.whitehouse.gov/sites/whitehouse.gov/files/docs/disclosures/07012017-report-final.pdf

For comparison's sake, President Obama's White House salary data was released as machine-readable CSV, for the years 2009 through 2016:

https://obamawhitehouse.archives.gov/briefing-room/disclosures/annual-records/2016

Inventory

The data/raw directory contains the CSVs from the Obama administration, one per year: data/raw/2009.csv through data/raw/2016.csv.

From the 2017 release:

  • The raw PDF: data/raw/2017.pdf
  • ABBYY's conversion of that PDF to XLSX (one file for each sheet): data/raw/2017-abbyy-to-xlsx/
  • (TODO) Use in2csv and other csvkit tools to compile those Excel spreadsheets into one plaintext CSV.

Fetching

Quick shell commands to download the files and save them to data/raw:

mkdir -p data/raw

curl -Lo data/raw/2017.pdf \
  https://www.whitehouse.gov/sites/whitehouse.gov/files/docs/disclosures/07012017-report-final.pdf

curl -Lo data/raw/2016.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2016_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2015.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2015_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2014.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2014_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2013.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2013_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2012.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2012_Annual_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2011.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2011_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2010.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2010_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2009.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2009_Report_to_Congress_on_White_House_Staff.csv
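The curl commands above follow one URL pattern, with the 2012 file as the only exception (its name contains "Annual_Report" instead of "Report"). As a rough Python equivalent, here is a minimal sketch; the names download_plan and fetch_all are hypothetical and not part of this repo:

```python
from pathlib import Path
from urllib.request import urlretrieve

# URLs copied from the curl commands above.
BASE = "https://open.obamawhitehouse.archives.gov/sites/default/files"
PDF_2017 = ("https://www.whitehouse.gov/sites/whitehouse.gov/files/"
            "docs/disclosures/07012017-report-final.pdf")


def download_plan(raw_dir="data/raw"):
    """Return (url, destination) pairs matching the curl commands above."""
    raw = Path(raw_dir)
    plan = [(PDF_2017, raw / "2017.pdf")]
    for year in range(2016, 2008, -1):
        # The 2012 file is the only one named "..._Annual_Report_...".
        label = "Annual_Report" if year == 2012 else "Report"
        name = f"{year}_{label}_to_Congress_on_White_House_Staff.csv"
        plan.append((f"{BASE}/{name}", raw / f"{year}.csv"))
    return plan


def fetch_all(raw_dir="data/raw"):
    """Download every file in the plan (requires network access)."""
    Path(raw_dir).mkdir(parents=True, exist_ok=True)
    for url, dest in download_plan(raw_dir):
        urlretrieve(url, str(dest))
```

Calling fetch_all() should leave the same nine files in data/raw as the shell commands, assuming the URLs are still live.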

Converting

Note: the PDF-to-XLSX step was done in ABBYY, i.e. it is not programmatic.

The wrangle/xls_to_pdf.py command-line script takes a path to a directory of XLSX files (e.g. data/raw/2017-abbyy-to-xlsx) and compiles them into a single CSV file:

data/converted/2017.csv
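For illustration only, the sheet-merging step might look something like the sketch below. It is not the repo's actual script. It assumes that openpyxl is installed, that each XLSX file holds one page of the table in its first worksheet, and that a header row may repeat on each page. The function names are hypothetical.

```python
import csv
import sys
from pathlib import Path


def merge_sheets(sheets):
    """Concatenate row lists, keeping only the first copy of the header row."""
    merged = []
    header = None
    for rows in sheets:
        for row in rows:
            cells = ["" if c is None else str(c).strip() for c in row]
            if not any(cells):
                continue  # skip blank rows
            if header is None:
                header = cells  # first non-blank row is treated as the header
                merged.append(cells)
            elif cells != header:
                merged.append(cells)  # repeated header rows are dropped
    return merged


def read_xlsx_dir(src_dir):
    """Yield the rows of the first worksheet of each .xlsx file."""
    from openpyxl import load_workbook  # third-party; assumed installed

    # Plain lexicographic order: "10.xlsx" sorts before "2.xlsx".
    for path in sorted(Path(src_dir).glob("*.xlsx")):
        sheet = load_workbook(path, read_only=True).worksheets[0]
        yield sheet.iter_rows(values_only=True)


def main(src_dir, out=sys.stdout):
    """Write the merged rows as CSV to the given stream."""
    rows = merge_sheets(read_xlsx_dir(src_dir))
    csv.writer(out, lineterminator="\n").writerows(rows)
```

Because every cell is converted with str(), numeric salaries may come out as "50000.0" rather than "50000"; a real conversion would need to normalize those values.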

Compiling

Use wrangle/compile.py to read all the CSV files, add a year column, and compile them to the final file:

data/compiled/white-house-salaries.csv

$ python wrangle/compile.py \
  data/raw/*.csv data/converted/2017.csv \
  > data/compiled/white-house-salaries.csv
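The compile step can be sketched roughly as follows. This is an illustration under assumptions, not the contents of wrangle/compile.py: it takes the year from each filename (2016.csv becomes 2016), assumes the files are UTF-8, and uses the union of all column names because the headers may differ between years. The function names are hypothetical.

```python
import csv
import sys
from pathlib import Path


def compile_rows(paths):
    """Read each CSV and tag every row with the year taken from its filename."""
    fieldnames = ["year"]
    rows = []
    for path in paths:
        year = Path(path).stem  # "data/raw/2016.csv" -> "2016"
        with open(path, newline="", encoding="utf-8") as f:
            for record in csv.DictReader(f):
                record["year"] = year  # overwrites any existing "year" column
                for key in record:
                    if key not in fieldnames:
                        fieldnames.append(key)
                rows.append(record)
    return fieldnames, rows


def write_compiled(paths, out=sys.stdout):
    """Write all rows as one CSV; columns missing from a file are left empty."""
    fieldnames, rows = compile_rows(paths)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```

Calling write_compiled(sys.argv[1:]) from a script would mirror the shell invocation above, with the output redirected to data/compiled/white-house-salaries.csv.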
