The purpose of this document is to provide example code for interacting with USDA's Economic Research Service data via its REST API. Learn more about the API here.
Below is a simple class implementing an API to the ERS data.
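require 'uri'
require 'net/http'
require 'json'

module USDA
  class API
    attr_accessor :base_uri

    # DEMO_KEY is api.data.gov's shared demo key; replace it with your own key for regular use.
    API_KEY = { api_key: "DEMO_KEY" }

    def initialize(base_uri = "http://api.data.gov/USDA/ERS/data/")
      @base_uri = base_uri
    end

    # Sends a GET request to the endpoint and parses the JSON body into a Hash.
    def response(endpoint, params = {})
      uri = build_uri(endpoint, params)
      body = Net::HTTP.get(uri)
      JSON.parse(body)
    end

    # Appends the endpoint to the base URI and encodes params (plus the API key)
    # as the query string. merge (not merge!) leaves the caller's hash unchanged.
    def build_uri(endpoint, params)
      uri = URI(@base_uri + endpoint)
      uri.query = URI.encode_www_form(params.merge(API_KEY))
      uri
    end
  end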
...
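Because build_uri is a separate step, you can inspect the exact request the class would send without touching the network. The sketch below repeats that logic as a standalone function so it runs on its own; the name build_request_uri is made up for this example, and the base URI and key are the same ones used in the class above. It uses the non-mutating merge, so the params hash you pass in is left unchanged.

```ruby
require 'uri'

# Same base URI and demo key as the USDA::API class above.
BASE_URI = "http://api.data.gov/USDA/ERS/data/"
API_KEY  = { api_key: "DEMO_KEY" }

# Hypothetical helper: builds the request URI the way API#build_uri does,
# without sending anything over the network.
def build_request_uri(endpoint, params = {})
  uri = URI(BASE_URI + endpoint)
  # merge returns a new hash, so the caller's params are not modified
  uri.query = URI.encode_www_form(params.merge(API_KEY))
  uri
end

params = { survey: "CROP" }
puts build_request_uri("Arms/Reports", params)
# prints http://api.data.gov/USDA/ERS/data/Arms/Reports?survey=CROP&api_key=DEMO_KEY
```

Printing the URI this way is a quick check that your endpoint and parameter names are spelled the way the API expects before you spend a request on them.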
You can use this class to build an API object to interact with the USDA API. For example, you could get a list of reports for a given survey like this:
api = USDA::API.new
response = api.response('Arms/Reports', survey: "CROP")
puts response
would return the following (shown pretty-printed; puts itself prints the hash on a single line):
{
  "infoTable" => [
    [0] {
      "recordCount" => 19,
      "message" => "NO ERROR"
    }
  ],
  "dataTable" => [
    [ 0] {
      "report_num" => 1,
      "report_header" => "Pesticide Use"
    },
    [ 1] {
      "report_num" => 2,
      "report_header" => "Crop Residue Management Practices"
    },
    [ 2] {
      "report_num" => 3,
      "report_header" => "Irrigation Technology and Water Use"
    },
    #...
    [18] {
      "report_num" => 19,
      "report_header" => "Precision Agriculture"
    }
  ]
}
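The parsed response is an ordinary Ruby hash with two keys: "infoTable", which carries a record count and a status message, and "dataTable", which holds the records themselves. Here is a minimal sketch of reading it, using a trimmed copy of the output above as sample input. Treating any message other than "NO ERROR" as a failure is an assumption based on this sample, not documented API behavior.

```ruby
# Trimmed copy of the Arms/Reports response shown above (3 of the 19 reports).
response = {
  "infoTable" => [{ "recordCount" => 19, "message" => "NO ERROR" }],
  "dataTable" => [
    { "report_num" => 1, "report_header" => "Pesticide Use" },
    { "report_num" => 2, "report_header" => "Crop Residue Management Practices" },
    { "report_num" => 3, "report_header" => "Irrigation Technology and Water Use" }
  ]
}

# Assumption: "NO ERROR" signals success; anything else is treated as a failure.
info = response["infoTable"].first
raise "ERS API returned: #{info["message"]}" unless info["message"] == "NO ERROR"

# Map report numbers to headers (to_h with a block needs Ruby 2.6+).
reports = response["dataTable"].to_h { |r| [r["report_num"], r["report_header"]] }
puts reports[1] # prints Pesticide Use
```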
To get specific data from the API, you need to build requests using the REST paths described in the documentation. The following ARMS (Agricultural Resource Management Survey) class provides some pre-built methods for these paths.
...
  class ARMS
    # Returns a list of surveys available in ARMS.
    def self.surveys
      api = API.new
      api.response("Arms/Surveys")
    end

    # Returns a list of reports available in a given survey.
    def self.reports(survey)
      api = API.new
      api.response("Arms/Reports", survey: survey)
    end

    # Returns a list of subjects available in a given report within a given survey.
    def self.subjects(survey, report)
      api = API.new
      api.response("Arms/Subjects", survey: survey, report: report)
    end

    # Queries the Crop survey for a given data series within a report,
    # including any optional parameters.
    def self.crops(report, series, options = {})
      options_hash = { report: report, series1: series }.merge(options)
      api = API.new
      api.response("Arms/Crop", options_hash)
    end
  end
...
You can use these class methods to call specific routes with the required and optional parameters for each request.
For example, to get a list of surveys available from ARMS, use the .surveys method:
puts USDA::ARMS.surveys
which returns:
{
  "infoTable" => [
    [0] {
      "recordCount" => 2,
      "message" => "NO ERROR"
    }
  ],
  "dataTable" => [
    [0] {
      "survey_abb" => "CROP",
      "surveyDesc" => "Crop production practices"
    },
    [1] {
      "survey_abb" => "FINANCE",
      "surveyDesc" => "Farm finances"
    }
  ]
}
For a list of available reports for a given survey, use the .reports method:
puts USDA::ARMS.reports("CROP")
returns:
{
  "infoTable" => [
    [0] {
      "recordCount" => 19,
      "message" => "NO ERROR"
    }
  ],
  "dataTable" => [
    [ 0] {
      "report_num" => 1,
      "report_header" => "Pesticide Use"
    },
    [ 1] {
      "report_num" => 2,
      "report_header" => "Crop Residue Management Practices"
    },
    [ 2] {
      "report_num" => 3,
      "report_header" => "Irrigation Technology and Water Use"
    },
    # ...
    [18] {
      "report_num" => 19,
      "report_header" => "Precision Agriculture"
    }
  ]
}
The .subjects and .series methods work similarly, returning lists of subject and series values to use in the next step (the ARMS class above defines .subjects but not .series).
The main use of the ERS API is to query the Crop or Finance survey databases through the Arms/Crop and Arms/Finance endpoints, respectively. These endpoints accept GET requests with a few required and optional parameters.
The required parameters are:
- report: An integer value specifying a report number
- series1: A string value specifying a valid series code
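The ARMS class above only wraps the Crop endpoint. Since Arms/Finance takes the same required parameters, a finance method could plausibly mirror crops. The sketch below is an assumption, not part of the original class: the finance method and its optional api argument are hypothetical additions, and the default relies on the USDA::API class defined earlier. The api argument lets any object with a response(endpoint, params) method, such as a test stub, stand in for the real network call.

```ruby
module USDA
  class ARMS
    # Hypothetical method (not in the original class): queries the Finance
    # survey the same way .crops queries the Crop survey, sending the
    # required report and series1 parameters plus any optional ones.
    # `api` defaults to a USDA::API client (evaluated only when omitted);
    # pass any object that responds to #response(endpoint, params) instead.
    def self.finance(report, series, options = {}, api = API.new)
      api.response("Arms/Finance", { report: report, series1: series }.merge(options))
    end
  end
end
```

Usage would follow the crops example, with report and series codes that are valid for the Finance survey.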
The USDA::ARMS.crops method shown above builds requests to the Crop survey endpoint from the required parameters, along with any optional parameters you pass it. For example, to retrieve the data points from the Crop survey in Report #1 (Pesticide Use) for all farms (series1 = "FARM") in the great state of Minnesota (FIPS state code 27), you would use the method like this:
USDA::ARMS.crops(1, "FARM", { fipsStateCode: "27" })
The method returns a Ruby hash parsed from the API's JSON response. The interesting part of the response is the set of datapoints matching your query (under "dataTable"), which look like this:
# ...
[766] {
  "survey_abb" => "CROP",
  "report_num" => 1,
  "topic_group" => nil,
  "topic_abb" => "FUNACT",
  "topic_seq" => 10,
  "topic_header" => "Acres treated with fungicide",
  "topic_level" => nil,
  "unit_desc" => "percent of planted acres",
  "footnote" => nil,
  "subject_num" => 10,
  "fips_st" => "27",
  "state" => "Minnesota",
  "stat_year" => "1996",
  "series" => "FARM",
  "series_element" => 0,
  "series_header" => "All Farms",
  "element_name" => "TOTAL",
  "series2" => "FARM",
  "series2_element" => 0,
  "series2_header" => "All Farms",
  "element2_name" => "TOTAL",
  "estimate" => 9.372,
  "rse" => 29.625,
  "unreliable_est" => true,
  "decimal_disp" => 3
},
[767] {
  "survey_abb" => "CROP",
  "report_num" => 1,
  "topic_group" => nil,
  "topic_abb" => "FUNACT",
  "topic_seq" => 10,
  # ...
As you can see, the response is a series of datapoints (up to 1,000 per request), each with attributes that describe what it measures.
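Before filtering, it can help to see which topics a response actually contains. A quick sketch that counts datapoints per topic with Enumerable#tally (Ruby 2.7+). The records below stand in for results["dataTable"]: the topic values come from examples in this document, but the records are cut down to two fields and are illustrative, not real query output.

```ruby
# Stand-in for results["dataTable"], reduced to the two fields used below.
# Illustrative only; a real response has many more records and fields.
data_table = [
  { "topic_seq" => 1,  "topic_header" => "Acres treated with any pesticide" },
  { "topic_seq" => 1,  "topic_header" => "Acres treated with any pesticide" },
  { "topic_seq" => 10, "topic_header" => "Acres treated with fungicide" }
]

# Count datapoints per [topic_seq, topic_header] pair.
counts = data_table.map { |d| [d["topic_seq"], d["topic_header"]] }.tally
counts.each { |(seq, header), n| puts "#{seq}: #{header} (#{n})" }
# prints:
# 1: Acres treated with any pesticide (2)
# 10: Acres treated with fungicide (1)
```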
To make sense of the data, you will usually want to filter the datapoints down to the ones you care about. One way to do this is shown below:
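# ...
  class Selector
    # Returns the records in results["dataTable"] whose fields equal every
    # selector value. Keys must be strings, matching the parsed JSON.
    def self.select_from_results(selectors = {}, results)
      selected = results["dataTable"]
      selectors.each do |selector, value|
        selected = selected.select { |e| e[selector] == value }
      end
      selected
    end
  end
# ...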
The USDA::Selector.select_from_results method takes a parsed response and returns only the datapoints that match every filter you provide (one or more) as "field" => value pairs.
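Because JSON.parse produces string keys by default, the selector keys must be strings too; a symbol such as :topic_seq never matches, and the filter silently returns nothing. The sketch below repeats the Selector logic as a standalone method and runs it on two cut-down records (illustrative values, not real ERS output) to show the difference.

```ruby
require 'json'

# Same filtering logic as USDA::Selector.select_from_results above.
def select_from_results(selectors = {}, results)
  selected = results["dataTable"]
  selectors.each do |selector, value|
    selected = selected.select { |e| e[selector] == value }
  end
  selected
end

# JSON.parse returns string keys, just like a parsed API response.
results = JSON.parse('{"dataTable": [
  {"topic_seq": 1, "subject_num": 1},
  {"topic_seq": 1, "subject_num": 2}
]}')

puts select_from_results({ "topic_seq" => 1, "subject_num" => 1 }, results).size # prints 1
puts select_from_results({ topic_seq: 1, subject_num: 1 }, results).size         # prints 0
```

If you parse with JSON.parse(body, symbolize_names: true) instead, the situation reverses and the selectors need symbol keys.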
Using our previous example, if we wanted to find only the datapoints in the response related to topic_seq = 1 ("Acres treated with any pesticide") for subject_num = 1 (corn), we could use the method like this:
#... assume we already have a results object from the previous query (i.e. using the USDA::ARMS.crops method)
results["dataTable"].count # => 800 (lots of data points)
filtered_results = USDA::Selector.select_from_results({ "topic_seq" => 1,
                                                        "subject_num" => 1 }, results)
filtered_results.count # => 16 (only the corn datapoints for acres treated with any pesticide)
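select_from_results only tests for equality. For other conditions, such as dropping estimates the API flags as unreliable (the unreliable_est field) or keeping only those with a low relative standard error (rse), a plain select with predicate lambdas works on the same records. A minimal sketch: select_where is a hypothetical helper, the first record copies fields from the Minnesota datapoint shown earlier, the second record's values are made up for illustration, and the rse threshold of 25 is arbitrary.

```ruby
# Hypothetical helper: keep the datapoints for which every predicate returns true.
def select_where(data_table, *predicates)
  data_table.select { |point| predicates.all? { |pred| pred.call(point) } }
end

data_table = [
  # Fields copied from the Minnesota fungicide datapoint shown earlier.
  { "stat_year" => "1996", "estimate" => 9.372, "rse" => 29.625, "unreliable_est" => true },
  # Illustrative values, not real ERS data.
  { "stat_year" => "2010", "estimate" => 95.0, "rse" => 1.5, "unreliable_est" => false }
]

reliable = ->(point) { !point["unreliable_est"] }
low_rse  = ->(point) { point["rse"] < 25 } # arbitrary threshold for illustration

puts select_where(data_table, reliable, low_rse).map { |point| point["stat_year"] }.inspect
# prints ["2010"]
```

To combine both kinds of filter, pass the array returned by USDA::Selector.select_from_results as data_table.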
Let's say you wanted to get time series data for a specific data point. You could use a method like the one below:
# ...
  class DataSeries
    # Filters results to one topic, subject, and series-2 element, then maps
    # each matching datapoint to a { stat_year => estimate } hash.
    def self.data_by_year(topic_seq, subject_num, element2_name, results)
      selected = Selector.select_from_results({ "topic_seq" => topic_seq,
                                                "subject_num" => subject_num,
                                                "element2_name" => element2_name },
                                              results)
      selected.map { |n| { n["stat_year"] => n["estimate"] } }
    end
  end
#...
This method takes a set of results, filters them for a specific topic_seq, subject_num, and element2_name, and returns an array of one-entry hashes, each mapping a year to its estimate.
Again using the above results for pesticide use in Minnesota, to get the time series of total (element2_name = "TOTAL") corn acreage (subject_num = 1) treated with any pesticide (topic_seq = 1), we could use the method like this:
#... assume we already have a 'results' object that was the result of a previous query
time_series = USDA::DataSeries.data_by_year(1, 1, "TOTAL", results)
# => [{"2010"=>95.461}, {"2005"=>98.916}, {"2001"=>97.477}, {"2000"=>97.374}, {"1999"=>97.967}, {"1998"=>96.116}, {"1997"=>92.93}, {"1996"=>96.651}]
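The result is an array of one-entry hashes, newest year first. If you would rather have a single hash in chronological order (for plotting, say), you can merge the entries and sort by year. A short sketch using the output above:

```ruby
# Output of USDA::DataSeries.data_by_year from the example above.
time_series = [{ "2010" => 95.461 }, { "2005" => 98.916 }, { "2001" => 97.477 },
               { "2000" => 97.374 }, { "1999" => 97.967 }, { "1998" => 96.116 },
               { "1997" => 92.93 }, { "1996" => 96.651 }]

# Merge the one-entry hashes into one, then sort by year (compared as integers).
by_year = time_series.reduce({}, :merge).sort_by { |year, _| year.to_i }.to_h

puts by_year.keys.first # prints 1996
puts by_year["2010"]    # prints 95.461
```

Merging keeps only the last value for a repeated year, which is one reason to filter on element2_name first: without it, several series elements for the same year would collide.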