White House Salaries

tl;dr: White House salaries for all staffers, from 2009 to the just-released 2017 list, compiled in one file:

Warning: File has not been double-checked or even spot-checked for errors (2017-06-30)

Today, the Trump administration published White House Staff Salary data at the following URL:

https://www.whitehouse.gov/staff-salaries

Currently, the data exists only as PDF:

https://www.whitehouse.gov/sites/whitehouse.gov/files/docs/disclosures/07012017-report-final.pdf

For comparison's sake, President Obama's White House salary data was released as machine-readable CSV, for the years 2009 through 2016:

https://obamawhitehouse.archives.gov/briefing-room/disclosures/annual-records/2016

Inventory

The data/raw directory contains CSVs from the Obama administration:

data/raw/2009.csv
data/raw/2010.csv
data/raw/2011.csv
data/raw/2012.csv
data/raw/2013.csv
data/raw/2014.csv
data/raw/2015.csv
data/raw/2016.csv
(TODO) Use csvkit's csvstack to create one large file

From the 2017 release:

The raw PDF: data/raw/2017.pdf
ABBYY's conversion of that PDF to XLSX (one file for each sheet): data/raw/2017-abbyy-to-xlsx/
(TODO) Using in2csv and other csvkit tools to compile those Excel spreadsheets into one plaintext CSV.

Fetching

Quickie shell commands to download the files and save them to data/raw

mkdir -p data/raw

curl -Lo data/raw/2017.pdf \
  https://www.whitehouse.gov/sites/whitehouse.gov/files/docs/disclosures/07012017-report-final.pdf

curl -Lo data/raw/2016.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2016_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2015.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2015_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2014.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2014_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2013.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2013_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2012.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2012_Annual_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2011.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2011_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2010.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2010_Report_to_Congress_on_White_House_Staff.csv

curl -Lo data/raw/2009.csv \
  https://open.obamawhitehouse.archives.gov/sites/default/files/2009_Report_to_Congress_on_White_House_Staff.csv

Converting

Note: the PDF to XLS step was done by ABBYY, i.e. not programmatic.

The wrangle/xls_to_pdf.py command-line script is given a path to a bunch of XLS files (e.g. data/raw/2017-abbyy-to-xlsx) and compiles them into a single CSV file:

data/converted/2017.csv

Compiling

Use wrangle/compile.py to read all the CSV files, add a year column, and compile them to the final file:

data/compiled/white-house-salaries.csv

$ python wrangle/compile.py \
  data/raw/*.csv data/converted/2017.csv \
  > data/compiled/white-house-salaries.csv

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
wrangle		wrangle
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

White House Salaries

Inventory

Fetching

Converting

Compiling

About

Releases

Packages

Languages

neerajseth000/white-house-salaries

Folders and files

Latest commit

History

Repository files navigation

White House Salaries

Inventory

Fetching

Converting

Compiling

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages