Exploring the C-CAP Land Cover Atlas using Machine Learning and Python Part 1: Retrieving Data from an API


Introduction

Over the past few years there has been an undeniable surge of interest in using open source tools for data analysis and modeling. Not only do open source tools provide users with immediate cost savings, but they can also be customized to better suit your particular needs and, in some cases, may outperform their proprietary counterparts. Professionals in the fields of GIS and remote sensing have recognized these benefits and are leveraging open source tools more and more frequently in their day-to-day workflows. Additionally, some of these tools overlap with those employed by data scientists and provide the means to perform exploratory analysis and predictive modeling with machine learning algorithms. In an effort to shed some light on machine learning and raise awareness about open source tools, I am presenting a short series of blogs demonstrating how some of these tools (listed below) can be used to explore land cover change information available from the NOAA Office for Coastal Management’s Coastal Change Analysis Program (C-CAP) Land Cover Atlas (referred to as the Atlas). In case you are not familiar, the C-CAP Atlas has been one of the flagship tools on NOAA’s Digital Coast website, providing massive amounts of land cover change information for the coastal U.S. to a non-technical audience. The tool synthesizes data about land cover gains and losses into tables, maps, and reports that are ready for use. But did you know you can access that data directly and perform your own analysis through an API? Wait…do you know what an API is? No, it is not on tap at your local brewery.

What is an API?

API stands for Application Programming Interface, and in the case of the C-CAP Atlas, it is the part of the web service that accepts requests for data and sends responses. The C-CAP Atlas API accepts requests for land cover change information for coastal counties and watersheds and returns the change statistics in JSON, a format commonly used to transfer data through an API. This allows you to ingest the data directly to create visualizations, perform analysis, or feed an application.
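To make that concrete, here is a minimal sketch of the request/response cycle in Python (the Requests library used here is covered in more detail later in this post):

import requests

# Send an HTTP request to the C-CAP Atlas API...
r = requests.get("https://coast.noaa.gov/opendata/Landcover/api/v1/")

# ...and receive a response we can work with directly
print(r.status_code)  # 200 indicates a successful request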

Tools used in this blog

Python – high-level programming language with highly readable syntax

SciPy and Scikit-learn – Python libraries providing scientific computing tools and machine learning algorithms for data exploration and modeling

Jupyter Notebooks – web application for sharing code, visualizations, and narrative text

Pandas – Python library providing data structures and analysis tools

GeoPandas – Python library providing tools for working with geospatial data

Matplotlib – Python library for plotting data and creating visualizations

Data Science and Geospatial

The tools listed above are widely popular among practitioners of data science, and it is worth noting that setting up a data science project is not all that different from how one would approach a geospatial one. Typically, before we can produce a map or perform spatial analysis, we have to gather our data sets, review them for consistency, accuracy, and appropriateness, and then transform them into a common coordinate system or format. Data scientists follow a similar process, which, for the purposes of this blog, I am distilling into the following steps:

  1. Retrieve the data
  2. Clean the data
  3. Explore the data
  4. Build a model with the data

The objective of this blog entry is to focus on Step 1 – Retrieving the Data. Nowadays, it is quite common for analysts to acquire geospatial data from the web by using download sites or ingesting web services. Data scientists frequently rely on APIs for acquiring data. If an API does not exist, they turn to a less desirable process called web scraping, which could be another entire blog post. For this demonstration, we will request data from the C-CAP Atlas API and load it into a Pandas dataframe using a few lines of Python code. Does that sound like fun to you? Great, but before we dive in, let’s get some background on the C-CAP Atlas API.

Using the C-CAP Atlas API

The C-CAP Atlas API was built to allow users to stream live data over an internet connection. The API uses OpenData protocols to construct queries against the data. Now let’s walk through accessing the API, identifying the lookup fields we will use to refine the data, and constructing a URL that our script will use to pull in the data to be analyzed.

We start by accessing the root URL to view the full list of available API function names, also known as resources:

https://coast.noaa.gov/opendata/Landcover/api/v1/

Once we identify a resource of interest, we can access the API metadata for additional information, including the names of each field and the field types:

https://coast.noaa.gov/opendata/Landcover/api/v1/$metadata
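These endpoints can also be explored from Python rather than a browser. A quick sketch, assuming the service follows standard OpenData conventions (service document available as JSON, metadata as XML):

import requests

root = "https://coast.noaa.gov/opendata/Landcover/api/v1/"

# The service document lists the available resources
resources = requests.get(root, headers={'Accept': 'application/json'}).json()

# The $metadata endpoint describes each resource's fields and types (XML)
metadata = requests.get(root + "$metadata").text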

For this demonstration, we will be using the resource named “distributionOfChangeGainsLossesByLandcovers”. This resource returns areas of gains and losses for a group of land cover classes for each geography and year pair, as described in the metadata.

You’ll notice each of the land cover class names is abbreviated. While we tried to make the naming conventions self-explanatory, a key to the abbreviations can be found in the table below. A full definition of the land cover classes can be found here:

https://coast.noaa.gov/data/digitalcoast/pdf/ccap-class-scheme-regional.pdf

Abbreviation | Full Name | Abbreviation | Full Name
GrsAreaGain | Grass Area Gained | GrsAreaLoss | Grass Area Lost
SscbAreaGain | Scrub/Shrub Area Gained | SscbAreaLoss | Scrub/Shrub Area Lost
BarAreaGain | Barren Area Gained | BarAreaLoss | Barren Area Lost
WtrAreaGain | Water Area Gained | WtrAreaLoss | Water Area Lost
AgrAreaGain | Agricultural Area Gained | AgrAreaLoss | Agricultural Area Lost
ForAreaGain | Forest Area Gained | ForAreaLoss | Forest Area Lost
WdwAreaGain | Woody Wetland Area Gained | WdwAreaLoss | Woody Wetland Area Lost
EmwAreaGain | Emergent Wetland Area Gained | EmwAreaLoss | Emergent Wetland Area Lost
HIDAreaGain | High Intensity Development Area Gained | HIDAreaLoss | High Intensity Development Area Lost
LIDAreaGain | Low Intensity Development Area Gained | LIDAreaLoss | Low Intensity Development Area Lost
OSDAreaGain | Open Space Development Area Gained | OSDAreaLoss | Open Space Development Area Lost
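Since these abbreviations will appear as column names in our dataframe, it can be handy to keep the key above as a Python dictionary for relabeling columns or plot axes later. This mapping is just a convenience I am adding here; it is not part of the API:

# Map class-name prefixes to full land cover class names
CLASS_NAMES = {
    'Grs': 'Grass', 'Sscb': 'Scrub/Shrub', 'Bar': 'Barren', 'Wtr': 'Water',
    'Agr': 'Agricultural', 'For': 'Forest', 'Wdw': 'Woody Wetland',
    'Emw': 'Emergent Wetland', 'HID': 'High Intensity Development',
    'LID': 'Low Intensity Development', 'OSD': 'Open Space Development',
}

def readable(column):
    """Expand a column name like 'ForAreaLoss' into 'Forest Area Lost'."""
    for abbrev, name in CLASS_NAMES.items():
        if column.startswith(abbrev):
            suffix = 'Gained' if column.endswith('Gain') else 'Lost'
            return '{} Area {}'.format(name, suffix)
    return column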

If we access the base URL for the resource, it will return all of the available data:

https://coast.noaa.gov/opendata/Landcover/api/v1/distributionOfChangeGainsLossesByLandcovers

To limit the data returned to just the items we are interested in, we will query three key lookup fields:

geoId – a geographic identifier, either a 5-digit FIPS code or an 8-digit watershed code

earlyYear – a 4-digit year defining the first date of the change period (1996, 2001, 2006, or 2010)

lateYear – a 4-digit year defining the second date of the change period (1996, 2001, 2006, or 2010)

To construct our query, we will use OpenData URL conventions. Below is a list of the conventions we will use to build a query that limits our data returns to only counties in the state of Florida (state FIPS code 12) with an early year of 1996 and a late year of 2010.

API URL Conventions

$filter – allows a subset of URI conventions to be applied to the URL

startswith – requests data that matches the starting characters provided. Used here to limit the request to the 2-digit state FIPS code of interest

length – restricts the query to responses matching the character length of a field. Used here to limit responses to 5 characters, the length of a full county FIPS code; this excludes watershed codes, which are 8 digits

eq – returns only data that exactly matches the requested field value

Example API call to retrieve data:

https://coast.noaa.gov/opendata/LandCover/api/v1/distributionOfChangeGainsLossesByLandcovers?$filter=startswith(geoId,'12') and length(geoId) eq 5 and earlyYear eq '1996' and lateYear eq '2010'

If viewing the URL in a browser, make sure reserved characters such as spaces and single quotes are percent-encoded, as seen below:

https://coast.noaa.gov/opendata/LandCover/api/v1/distributionOfChangeGainsLossesByLandcovers?$filter=startswith(geoId,%20%2712%27)%20and%20length(geoId)%20eq%205%20and%20earlyYear%20eq%20%271996%27%20and%20lateYear%20eq%20%272010%27
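In Python, you generally do not need to encode the URL by hand; the Requests library takes care of encoding characters like spaces for you, which is why the function later in this post can pass the readable form. If you do want the browser-ready string, the standard library can produce it. A small sketch:

from urllib.parse import quote

base = ("https://coast.noaa.gov/opendata/LandCover/api/v1/"
        "distributionOfChangeGainsLossesByLandcovers")
filter_expr = ("startswith(geoId, '12') and length(geoId) eq 5 "
               "and earlyYear eq '1996' and lateYear eq '2010'")

# Encode spaces (%20) and quotes (%27); keep parentheses and commas readable
url = base + "?$filter=" + quote(filter_expr, safe="(),")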

The constructed URL will return the data in JSON format. In the example below, you may notice how similar the JSON structure looks to a Python dictionary.

{"@odata.context":"https://csc-s-ims-93-p.coast.noaa.gov/Landcover/api/v1/$metadata#distributionOfChangeGainsLossesByLandcovers","value":[{"geoId":"12001","earlyYear":"1996","lateYear":"2010","GrsAreaGain":29.900000,"SscbAreaGain":63.600000,"BarAreaGain":1.230000,"WtrAreaGain":0.990000,"AgrAreaGain":4.220000,"ForAreaGain":58.500000,"WdwAreaGain":7.610000,"EmwAreaGain":15.320000,"GrsAreaLoss":-44.820000,"SscbAreaLoss":-54.850000,"BarAreaLoss":-0.470000,"WtrAreaLoss":-6.600000,"AgrAreaLoss":-10.640000,"ForAreaLoss":-62.280000,"WdwAreaLoss":-8.820000,"EmwAreaLoss":-4.410000,"HIDAreaGain":2.740000,"LIDAreaGain":5.080000,"OSDAreaGain":4.310000,"HIDAreaLoss":-0.030000,"LIDAreaLoss":-0.090000,"OSDAreaLoss":-0.490000},

Retrieve the Data from the API

Now that we understand how the API request is constructed, let’s start working with our Python script. First, we need to load all of the necessary libraries.

Note: The code from each blog in this series will be available as a Jupyter Notebook that can be downloaded here

import os
import pandas as pd
import requests

# Render Matplotlib plots inline in the notebook (Jupyter magic)
%matplotlib inline
from matplotlib import pyplot as plt

To communicate with the API and dynamically request data for different combinations of state FIPS codes, early years, and late years, we will employ the Requests library. Requests is a library designed to let users send HTTP requests with Python. In the function below, we use the .get method to request data from the API and load it into a Response object, r. The Requests library includes a built-in JSON decoder to read the returned JSON data, which we then load into a Pandas dataframe.

def create_dataframe(fips,early_year,late_year):
    """
    Submits request for data to the API and creates formatted pandas dataframe for use with clustering algorithms
    
    fips = string defining the 2-digit FIPS code for the state of interest
    early_year = string defining the 1st date of change period (1996, 2001, 2006 or 2010)
    late_year = string defining the 2nd date of change period (1996, 2001, 2006 or 2010)
    """
    # Format inputs for url
    fips = "'{}'".format(fips)
    early_year = "'{}'".format(early_year)
    late_year = "'{}'".format(late_year)
    
    #API request for Land Cover data in JSON format
    url = "https://coast.noaa.gov/opendata/LandCover/api/v1/distributionOfChangeGainsLossesByLandcovers?$filter=startswith(geoId, {0}) and length(geoId) eq 5 and earlyYear eq {1} and lateYear eq {2}".format(fips, early_year, late_year)
    
    r = requests.get(url, headers={'Accept': 'application/json'})
    data = r.json()
    lca_df = pd.DataFrame(data['value'])
    
    # Set index to the FIPS code and drop Early and Late Year Fields
    lca_df_index = lca_df.set_index('geoId').drop(['earlyYear', 'lateYear'], axis=1)
    
    return lca_df_index

Now we will execute the function. The C-CAP Atlas has county-level land cover change statistics organized by FIPS code and dates of change. C-CAP has mapped land cover for the years 1996, 2001, 2006, and 2010 using imagery acquired by the Landsat suite of satellites. This means we can look at changes in land cover over coarse intervals like 1996-2010, or at finer-scale changes over a 5-year period such as 2001-2006. For this example, we will select the state of Connecticut (FIPS code 09) and request land cover changes occurring from 2001 to 2010.

CT_2001_2010 = create_dataframe('09', 2001, 2010)
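The same function works for any supported year pair, so comparing a coarse interval with a finer one is just another call or two (we will stick with CT_2001_2010 for the rest of this post):

CT_1996_2010 = create_dataframe('09', 1996, 2010)  # coarse, 14-year interval
CT_2001_2006 = create_dataframe('09', 2001, 2006)  # finer, 5-year interval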

Let’s confirm our new object contains the correct data using the .head() method:

CT_2001_2010.head()

We can also get a list of the column names with the following code:

list(CT_2001_2010.columns.values)
['AgrAreaGain',
 'AgrAreaLoss',
 'BarAreaGain',
 'BarAreaLoss',
 'EmwAreaGain',
 'EmwAreaLoss',
 'ForAreaGain',
 'ForAreaLoss',
 'GrsAreaGain',
 'GrsAreaLoss',
 'HIDAreaGain',
 'HIDAreaLoss',
 'LIDAreaGain',
 'LIDAreaLoss',
 'OSDAreaGain',
 'OSDAreaLoss',
 'SscbAreaGain',
 'SscbAreaLoss',
 'WdwAreaGain',
 'WdwAreaLoss',
 'WtrAreaGain',
 'WtrAreaLoss']

As you can see, the list of column names is consistent with the table above. Now we can start playing with our data!
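For a quick first look, and to put that Matplotlib import to work, pandas can plot a column of the dataframe directly. A small sketch using the ForAreaLoss column from the table above (losses are reported as negative values):

# Forest area lost per Connecticut county, 2001 to 2010
CT_2001_2010['ForAreaLoss'].plot(kind='bar', title='Forest Area Lost, 2001-2010')
plt.ylabel('Area lost')
plt.show()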

Conclusion

Congratulations! You have just made an API call to the C-CAP Atlas using Python. Was it everything you imagined it would be? Hopefully, you learned something new and your interest in using APIs has been piqued. You may be surprised by how many actually exist and the types of data they can provide. Imagine the possibilities…

Up Next

In the next blog, we will get into Step 2 – Cleaning the Data. This will involve diving into the Pandas library and exploring some of the functionality it provides for formatting and manipulating our data, all in preparation for using it with some unsupervised machine learning algorithms.

Until next time!

Chris

Acknowledgements

Many thanks to my colleague Gabe Sataloff for his help with this blog post. If you have any questions about the C-CAP Atlas API, feel free to contact him at gabe dot sataloff at noaa.gov.
