Introduction
Over the past few years there has been an undeniable surge of interest in using open source tools for data analysis and modeling. Not only do open source tools provide immediate cost savings, but they can also be customized to better suit a user's particular needs and, in some cases, may outperform their proprietary counterparts. Professionals in the fields of GIS and remote sensing have recognized these benefits and are leveraging open source tools more and more frequently in their day-to-day workflows. Additionally, some of these tools overlap with those employed by data scientists and provide the means to perform exploratory analysis and predictive modeling with machine learning algorithms.

In an effort to shed some light on machine learning and raise awareness about open source tools, I am presenting a short series of blog posts demonstrating how some of these tools (listed below) can be used to explore land cover change information available from the NOAA Office for Coastal Management’s Coastal Change Analysis Program (C-CAP) Land Cover Atlas (referred to as the Atlas). In case you are not familiar, the C-CAP Atlas has been one of the flagship tools on NOAA’s Digital Coast website, providing massive amounts of land cover change information for the coastal U.S. to a non-technical audience. The tool synthesizes data about land cover gains and losses into tables, maps, and reports that are ready for use. But did you know you can access that data directly and perform your own analysis through an API? Wait… Do you know what an API is? No, it is not on tap at your local brewery.
What is an API?
API stands for Application Programming Interface and, in the case of the C-CAP Atlas, it is the part of the web service that accepts requests for data and sends responses. The C-CAP Atlas API accepts requests for land cover change information for coastal counties and watersheds and returns the change statistics in JSON, a format commonly used to transfer data through an API. This allows users to ingest the data directly to create visualizations, perform analysis, or add it to an application.
Tools used in this blog
Python – high-level programming language which uses highly readable syntax
SciPy and Scikit-learn – Python libraries which contain machine learning algorithms for data exploration and modeling
Jupyter Notebooks – web application for sharing code, visualizations and narrative text
Pandas – Python library providing data structures and analysis tools
GeoPandas – Python library providing tools for working with geospatial data
Matplotlib – Library for plotting data and creating visualizations
Data Science and Geospatial
The tools listed above are widely popular among practitioners of data science, and it is worth noting that setting up a data science project is not all that different from how one would approach a geospatial one. Typically, before we can produce a map or perform spatial analysis, we have to gather our data sets, review them for consistency, accuracy, and appropriateness, and then transform them into a common coordinate system or format. Data scientists follow a similar process, which, for the purposes of this blog, I am distilling into the following steps:
- Retrieve the data
- Clean the data
- Explore the data
- Build a model with the data
The objective of this blog entry is to focus on Step 1 – Retrieving the Data. Nowadays, it is quite common for analysts to acquire geospatial data from the web using download sites or by ingesting web services. Data scientists frequently rely on APIs for acquiring data. If an API does not exist, they use a less desirable process called web scraping, which could be an entire blog post of its own. For this demonstration, we will request data from the C-CAP Atlas API and load it into a Pandas dataframe using a few lines of Python code. Does that sound like fun to you? Great, but before we dive in, let’s get some background on the C-CAP Atlas API.
Using the C-CAP Atlas API
The C-CAP Atlas API was built to allow users to stream live data over an internet connection. The API uses OpenData (OData) protocols to construct queries against the data. Now let’s walk through accessing the API, identifying the lookup fields we will use to refine the data, and constructing a URL that our script will use to pull in the data to be analyzed.
We start by accessing the root url to view the full list of available API function names, also known as resources:
https://coast.noaa.gov/opendata/Landcover/api/v1/
Once we identify a resource of interest, we can access the API metadata for additional information, including the names of each field and the field types:
https://coast.noaa.gov/opendata/Landcover/api/v1/$metadata
For this demonstration, we will be using the resource named “distributionOfChangeGainsLossesByLandcovers”. This resource returns areas of gains and losses for a group of land cover classes for each geography for different year pairs, as seen in the metadata below.
You’ll notice each of the land cover class names is abbreviated. While we tried to make the naming conventions self-explanatory, a key for the abbreviations can be found in the table below. Full definitions of the land cover classes can be found here:
https://coast.noaa.gov/data/digitalcoast/pdf/ccap-class-scheme-regional.pdf
Abbreviation | Full Name | Abbreviation | Full Name |
GrsAreaGain | Grass Area Gained | GrsAreaLoss | Grass Area Lost |
SscbAreaGain | Scrub/Shrub Area Gained | SscbAreaLoss | Scrub/Shrub Area Lost |
BarAreaGain | Barren Area Gained | BarAreaLoss | Barren Area Lost |
WtrAreaGain | Water Area Gained | WtrAreaLoss | Water Area Lost |
AgrAreaGain | Agricultural Area Gained | AgrAreaLoss | Agricultural Area Lost |
ForAreaGain | Forest Area Gained | ForAreaLoss | Forest Area Lost |
WdwAreaGain | Woody Wetland Area Gained | WdwAreaLoss | Woody Wetland Area Lost |
EmwAreaGain | Emergent Wetland Area Gained | EmwAreaLoss | Emergent Wetland Area Lost |
HIDAreaGain | High Intensity Development Area Gained | HIDAreaLoss | High Intensity Development Area Lost |
LIDAreaGain | Low Intensity Development Area Gained | LIDAreaLoss | Low Intensity Development Area Lost |
OSDAreaGain | Open Space Development Area Gained | OSDAreaLoss | Open Space Development Area Lost |
If we access the base URL for the resource, it will return all of the available data:
https://coast.noaa.gov/opendata/Landcover/api/v1/distributionOfChangeGainsLossesByLandcovers
To limit the data returned to just the items we are interested in, we will query three key lookup fields:
geoid = A geographic identifier, either a 5-digit county FIPS code or an 8-digit watershed code
earlyyear = A 4-digit year defining the 1st date of the change period (1996, 2001, 2006 or 2010)
lateyear = A 4-digit year defining the 2nd date of the change period (1996, 2001, 2006 or 2010)
To construct our query, we will use OpenData URL conventions. Below is a list of the conventions we will use to build a query that limits the data returned to counties in the state of Florida (state FIPS code = 12) with an early year of 1996 and a late year of 2010.
API URL Conventions
$filter – allows a subset of URI conventions to be applied to the URL
startswith – requests data that matches the starting characters provided. Used here to limit the request to the 2-digit state FIPS code of interest
length – restricts the query to responses matching the character length of a field. Used here to limit responses to 5 characters, the length of a full county FIPS code. It excludes watershed codes, which are 8 digits.
eq – returns only data that exactly matches the requested field value
Example API call to retrieve data:
https://coast.noaa.gov/opendata/LandCover/api/v1/distributionOfChangeGainsLossesByLandcovers?$filter=startswith(geoId,'12') and length(geoId) eq 5 and earlyYear eq '1996' and lateYear eq '2010'
If viewing the URL in a browser, make sure reserved characters such as spaces and single quotes are percent-encoded (%20 and %27, respectively).
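As a rough sketch of what that encoding looks like, Python's standard library `urllib.parse.quote` can do the work. The filter string below mirrors the Florida example above; the exact set of characters left unencoded is an assumption on my part, since the server also tolerates some unencoded punctuation:

```python
from urllib.parse import quote

base = ("https://coast.noaa.gov/opendata/LandCover/api/v1/"
        "distributionOfChangeGainsLossesByLandcovers")
filter_expr = ("startswith(geoId,'12') and length(geoId) eq 5 "
               "and earlyYear eq '1996' and lateYear eq '2010'")

# Percent-encode the filter (spaces become %20), leaving OData punctuation intact
encoded = quote(filter_expr, safe="(),'")
url = base + "?$filter=" + encoded
```

Note that the Requests library used later in this post performs this encoding for you, so the caveat mainly applies when pasting the URL into a browser's address bar.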
The constructed URL will return the data in JSON format. In the example below, you may notice how similar the JSON structure looks to a Python dictionary.
{"@odata.context":"https://csc-s-ims-93-p.coast.noaa.gov/Landcover/api/v1/$metadata#distributionOfChangeGainsLossesByLandcovers","value":[{"geoId":"12001","earlyYear":"1996","lateYear":"2010","GrsAreaGain":29.900000,"SscbAreaGain":63.600000,"BarAreaGain":1.230000,"WtrAreaGain":0.990000,"AgrAreaGain":4.220000,"ForAreaGain":58.500000,"WdwAreaGain":7.610000,"EmwAreaGain":15.320000,"GrsAreaLoss":-44.820000,"SscbAreaLoss":-54.850000,"BarAreaLoss":-0.470000,"WtrAreaLoss":-6.600000,"AgrAreaLoss":-10.640000,"ForAreaLoss":-62.280000,"WdwAreaLoss":-8.820000,"EmwAreaLoss":-4.410000,"HIDAreaGain":2.740000,"LIDAreaGain":5.080000,"OSDAreaGain":4.310000,"HIDAreaLoss":-0.030000,"LIDAreaLoss":-0.090000,"OSDAreaLoss":-0.490000},
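To make that resemblance concrete, here is a minimal sketch that parses a trimmed-down version of the response above using Python's built-in json module (the same decoding that Requests will perform for us later):

```python
import json

# A trimmed version of the JSON response shown above
raw = ('{"value": [{"geoId": "12001", "earlyYear": "1996", '
       '"lateYear": "2010", "GrsAreaGain": 29.9, "GrsAreaLoss": -44.82}]}')

data = json.loads(raw)   # JSON text becomes a Python dict
records = data["value"]  # "value" holds a list with one dict per county

print(records[0]["geoId"])  # → 12001
```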
Retrieve the Data from the API
Now that we understand how the API request is constructed, let’s start working with our Python script. First, we need to load all of the necessary libraries.
Note: The code from each blog in this series will be available as a Jupyter Notebook that can be downloaded here
import os
import pandas as pd
import requests
%matplotlib inline
from matplotlib import pyplot as plt
To communicate with the API and dynamically request data for different combinations of state FIPS codes, early years, and late years, we will employ the Requests library. Requests is a library designed to let users send HTTP requests with Python. In the function below, we use the .get method to request data from the API and load the response into a Response object, r. The Requests library includes a built-in JSON decoder that parses the returned JSON, which we then load into a Pandas dataframe.
def create_dataframe(fips, early_year, late_year):
    """
    Submits a request for data to the API and creates a formatted pandas dataframe for use with clustering algorithms
    fips = string defining the 2-digit FIPS code for the state of interest
    early_year = string defining the 1st date of the change period (1996, 2001, 2006 or 2010)
    late_year = string defining the 2nd date of the change period (1996, 2001, 2006 or 2010)
    """
    # Format inputs for the url
    fips = "'{}'".format(fips)
    early_year = "'{}'".format(early_year)
    late_year = "'{}'".format(late_year)
    # API request for Land Cover data in JSON format
    url = "https://coast.noaa.gov/opendata/LandCover/api/v1/distributionOfChangeGainsLossesByLandcovers?$filter=startswith(geoId, {0}) and length(geoId) eq 5 and earlyYear eq {1} and lateYear eq {2}".format(fips, early_year, late_year)
    r = requests.get(url, headers={'Accept': 'application/json'})
    data = r.json()
    lca_df = pd.DataFrame(data['value'])
    # Set index to the FIPS code and drop the Early and Late Year fields
    lca_df_index = lca_df.set_index('geoId').drop(['earlyYear', 'lateYear'], axis=1)
    return lca_df_index
Now we will execute the function. The C-CAP Atlas has county-level land cover change statistics organized by FIPS code and dates of change. C-CAP has mapped land cover for the years 1996, 2001, 2006 and 2010 using imagery acquired by the Landsat suite of satellites. This means we can look at changes in land cover over multiple time intervals, from coarse spans like 1996-2010 down to finer 5-year periods such as 2001-2006. For this example, we will select the state of Connecticut (FIPS code: 09) and request land cover changes occurring from 2001 to 2010.
CT_2001_2010 = create_dataframe('09',2001,2010)
Let’s confirm our new object contains the correct data using the method .head():
CT_2001_2010.head()
We can also get a list of the column names using the following code:
list(CT_2001_2010.columns.values)
['AgrAreaGain', 'AgrAreaLoss', 'BarAreaGain', 'BarAreaLoss', 'EmwAreaGain', 'EmwAreaLoss', 'ForAreaGain', 'ForAreaLoss', 'GrsAreaGain', 'GrsAreaLoss', 'HIDAreaGain', 'HIDAreaLoss', 'LIDAreaGain', 'LIDAreaLoss', 'OSDAreaGain', 'OSDAreaLoss', 'SscbAreaGain', 'SscbAreaLoss', 'WdwAreaGain', 'WdwAreaLoss', 'WtrAreaGain', 'WtrAreaLoss']
As you can see, the list of column names is consistent with the table above. Now we can start playing with our data!
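As a small first taste, the sketch below computes the net forest change for each county. The values here are made up purely for illustration; with the real data you would swap the sample dataframe for CT_2001_2010.

```python
import pandas as pd

# Hypothetical stand-in for CT_2001_2010 (real values come from the API)
sample = pd.DataFrame(
    {"ForAreaGain": [12.5, 8.1, 3.4],
     "ForAreaLoss": [-20.3, -5.0, -9.9]},
    index=pd.Index(["09001", "09003", "09005"], name="geoId"),
)

# Losses are stored as negative numbers, so gain + loss yields the net change
sample["ForAreaNet"] = sample["ForAreaGain"] + sample["ForAreaLoss"]

print(sample["ForAreaNet"].idxmin())  # county with the largest net forest loss
```

Because the loss columns are already negative, a simple column sum is all it takes, and the same pattern works for any of the gain/loss pairs in the table above.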
Conclusion
Congratulations! You have just made an API call to the C-CAP Atlas using Python. Was it everything you imagined it would be? Hopefully, you learned something new and your interest in using APIs has been piqued. You may be surprised by how many actually exist and the types of data they can provide. Imagine the possibilities……
Up Next
In the next blog, we will get into Step 2 – Cleaning the data. This will involve diving into the Pandas library and exploring some of the functionality it provides for formatting and manipulating our data. All in preparation for using it with some unsupervised machine learning algorithms.
Until next time!
Chris
Acknowledgements
Many thanks to my colleague Gabe Sataloff for his help with this blog post. If you have any questions about the C-CAP Atlas API, feel free to contact him at gabe dot sataloff at noaa.gov.