The Government Needs to Clean Out It’s Database

I tried to load the government data of all “outlays” into a database and hit a snag. Supposedly, a group of fields is supposed to be unique. From the database guide:

If you plan to use these data in a relational database, you should designate the following fields as

“primary” to uniquely identify each row of data: agency code, bureau code, account code,

subfunction code, BEA category, grant/non-grant, and on-/off-budget field.

Well, as you may have guessed, that turned out not to be true in a few cases. The following lines turned out to have identical agency codes, bureau codes, account codes, subfunction codes, BEA category, grant/non-grant, and on-/off-budget values. This caused me to have to look at each one and figure out what was going on.

Lines 1970 and 1971 was one offending pair of repeats. Each of them has the exact same codes. The only difference is the Account Name. One is “Gifts and Bequests”, the other “Gifts and bequests, Labor, Dept. Management”. My conclusion is that these are the exact same thing, only one of them has a more exact descriptor. In no year, does both of these accounts have outlays in the same year. I think it’s safe to treat them as the same thing.

Lines 3054 and 3055 is another offending pair. They each are accounts in the Maritime Administration Bureau. Like before their account codes are the same, but this time their descriptions are quite different. One is “Miscellaneous Trust Funds, Maritime Administration” and the other “Special studies, services and projects”. Now these sound like 2 separate things, but just like in the other pair, In no year were both pairs charged. I guess these 2 are the same so I’ll treat them as such.

Lines 4699 through 4702. This was interesting. All four of these had the same matching codes. All 4 of them were Interest earned on varies Trust Funds held by the Federal Government. Unlike the previous pairs, there were numerous years where outlays were charged to 1 or more of them. One thing I noticed about these 4 that I hadn’t noticed with the other problem pairs was that they did have different Treasury Agency Codes.

I went back and checked the other duplicates and their Treasury Agency Codes were all the same. So, cleaning up this mess is going to take a 2-pronged approach. One, Treasury Agency Codes is going to be added as part of the Primary Key(The Government should probably update the db_guide so that everyone knows to do that) and second, I’m just going to treat the other 2 pairs as the same thing. I think the Government needs to combine those into a single description. So that’s how I’m cleaning it up.

Going Right to the Source: JACKPOT!

So I’ve been looking at the data over at usgovernmentspending.com. I was trying to figure out where they get their data. At first I found this explanation:

Federal spending between 1962 and 2012 is obtained from a spreadsheet file Table 3.2 – Outlays by Function and Subfunction: 1962-2010 in Budget of the United States Government: Historical Tables Fiscal Year 2008 published by the Executive Office of the President of the United States. It contains actual historical federal government spending from 1962 to 2006, and budgeted and estimated spending from 2007 through 2010.

Eventually I found that data it was talking about. You goto the GPO budget page, then I click on my favorite budget year. I’m then taken to a page with a log of document links. I scroll to the bottom and click the link: Historical Tables.

At that point I can download speadsheet 3.2 (Outlays by Function and Subfunction: 1962–2015) Voila! I have a list of all government spending by billing code.

This info was pretty good. I could now look at the info without relying on a third party website that can apply their own spin. For example, USSpending has a “welfare” item. However, the federal budget doesn’t list that. The categories are spread between Health, Housing, Education, training, employment, and social services.

Unfortunately, the information is still at a pretty high level. I noticed that the info on usgovernmentspending.com had more detail. That’s when I found an additional explanation. Federal Spending at the Agency Code Level:

IF YOU DRILL down below the federal subfunction level you can see federal spending at the agency code level. This spending information is obtained from Outlays, an Excel spreadsheet (4.5MB) that contains federal spending numbers classified by department, bureau, and agency code for FY 1962 through FY 2008. There are about 4,000 line items at the agency code level. The file can be downloaded from Budget of the United States Government: Public Budget Database Fiscal Year 2008 in xls, csv, or zip format. Only spending line items in excess of $0.05 billion are displayed at usgovernmentspending.com.

Holy Crap now that sounds exactly like what I’m looking for. I went back to the GPO budget page, and clicked on fiscal year 2011. This time instead of clicking Historical Tables, I clicked Public Budget Database. What I found was a 4700 line excel sheet with every budget code that the U.S. has used since 1962. Every dollar that’s been spent has been charged to one of these line items.

Jackpot!

This is exactly what I’ve been looking for. The data is pretty raw, but that’s ok. Instead of being sorted by type of spending it’s sorted by Agency then “Bureau” then account name then subfunction. It even has a column saying whether it’s mandatory or discretionary(guess that search is done).

Well it’s taken me more than a week to get to this point, but I can finally start unraveling the mysteries of the U.S. budget.