“In the beginning… there was Landsat”
This is how Robert Simmon of Planet set the stage for his talk at the first Satellite Data Interoperability Workshop last month, where remote sensing scientists and data providers from around the globe gathered to share their work and discuss industry-wide collaboration on the latest trends: Analysis Ready Data (ARD) and the SpatioTemporal Asset Catalog (STAC). Earth observation (EO) has come a long way since the launch of the first Landsat satellite mission in 1972. At the time, Landsat imagery had to be purchased, and it wasn’t until 2008 that USGS offered Landsat 7 Enhanced Thematic Mapper (ETM) data free of charge. That decision led to the release of the entire Landsat archive as open data, and since 2008 the number of open satellite data downloads has grown exponentially, driven by subsequent Landsat missions and by new missions, such as Sentinel-1A/B and 2A/B, that entered the imaging arena after 2014. We have entered a new frontier in EO, with more sensors than ever before and a tremendous amount of satellite data at our fingertips. Yet making use of satellite data remains a niche skill: many users do not know how to find the right imagery, search for the data they need, or pre-process it properly to turn raw pixels into insight that is timely, reliable, and actionable.
Historically, remote sensing scientists knew which agency provided the satellite imagery they needed, and they learned the intricacies of that dataset; commercial providers also pre-processed and sold raw imagery collected for a user’s specific area or season of interest. Now the consumers of satellite data are changing, and less specialized users are accessing imagery. With many more missions generating satellite data, the complexity of image discovery and processing is increasing, and so is the disconnect between pixels and actionable decisions. We can no longer afford to waste time finding the best cloud-free image, masking clouds, or performing atmospheric correction by hand. We should be spending that time thinking about the user, their interaction with the data, and best practices for developing satellite-derived data products that are readily available and ready to use.
This three-day workshop was one of three ‘sprints’ and was sponsored by Radiant Earth, Planet, and Maxar, along with CosmiQ Works, PCI Geomatics, The Climate Corporation, Astraea, and many more that came on board in the final days before the event. Sprint #1 took place last year in Boulder, CO, where participants worked together to define core fields for satellite data interoperability in an effort to develop a search engine, like Google, that exposes massive amounts of data in a common way. Participants gathered again for Sprint #2 at USGS in Fort Collins, CO, this time joining with the Open Geospatial Consortium (OGC) to develop a STAC API integrated with OGC’s Web Feature Service 3.0. For Sprint #3, USGS hosted again, at their office in Menlo Park, CA. The goals of this sprint were to tighten and validate work from the previous sprints, develop catalog-level metadata, make STAC more discoverable (the group set a milestone challenge of a billion unique STAC records online), and spearhead an effort to develop standards for ARD.
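The core fields hammered out in these sprints are easiest to see in a concrete record. Below is a minimal sketch of what a STAC Item looks like: a GeoJSON Feature carrying the common search fields (spatial extent, acquisition time, and links to the actual assets). The id, version string, cloud-cover value, and asset URL are hypothetical placeholders, not real data.

```python
# A minimal STAC Item sketch: a GeoJSON Feature with the core fields a
# STAC-aware search engine indexes by space and time, regardless of sensor.
# All identifiers and values below are invented for illustration.
import json

item = {
    "type": "Feature",
    "stac_version": "0.6.0",          # illustrative spec version string
    "id": "example-scene-20181001",   # hypothetical scene id
    "bbox": [-122.59, 37.24, -121.69, 38.10],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.59, 37.24], [-121.69, 37.24],
                         [-121.69, 38.10], [-122.59, 38.10],
                         [-122.59, 37.24]]],
    },
    "properties": {
        "datetime": "2018-10-01T18:45:00Z",  # acquisition time: the key temporal field
        "eo:cloud_cover": 3.2,               # common EO extension field
    },
    "assets": {
        "thumbnail": {"href": "https://example.com/thumb.png",  # placeholder URL
                      "type": "image/png"},
    },
}

print(json.dumps(item["properties"], indent=2))
```

Because every provider exposes the same handful of fields, a single query by bounding box and date range can span Landsat, Sentinel, and commercial catalogs alike.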
Photo Credit: Ignacio Zuleta
Purpose-chosen data is the new frontier for remote sensing.
Leo Lymburner from Geoscience Australia kicked off the ARD track of the workshop with a presentation that highlighted the need for ARD and introduced the Australian Geoscience Data Cube, explaining that “people now want to be able to pull imagery from different sources, not just one sensor”. He used a music streaming analogy: if we want to listen to a specific song, we can play it through Spotify rather than buying the whole CD. The same goes for satellite data. We don’t necessarily want to invest in an entire data archive when we need to analyze a handful of images. It is time to stop thinking about individual sensors and start thinking about end products that are timely and asset-specific. By making data more readily available, we can easily monitor coastal and riverine fluctuations, develop tidal models that explore shoreline change over time, detect deforestation in near real time, identify insect infestations to target forestry management practices, provide insights on crop health and distribution to support food security and agricultural monitoring, and track urban growth and new development at higher temporal frequency.
The new economy of place
“The currency of remote sensing has typically been an image, but with so many pixels being collected, selling pixels to people isn’t effective anymore,” said Will Cadell, CEO and Geospatial Developer at Sparkgeo. “If someone asks for information about a place, we should be able to tell them about that place, not hand over an image.” EO data have become the raw material, and the real opportunity is turning satellite data into simple, relevant products that help someone make a decision on the ground. Every pixel is an opportunity to create value and deliver a useful product to a user. This has been done well in government, agriculture, defense, and scientific research, but not nearly as much across the many facets of business.
Furthermore, with so much satellite data available, “the role of space agencies is also evolving, so where should we take it?” This question was posed by Steven Hosford at the European Space Agency (ESA), who spoke about the future direction of the Committee on Earth Observation Satellites (CEOS), a multi-agency organization that provides international coordination of civil space-based EO programs and a founding partner in the Open Data Cube (ODC) initiative. CEOS works on satellite data harmonization at many levels through its CARD4L framework, which specifies processing so that imagery is interoperable and ready for immediate analysis with little effort from the user. According to Steven, private-company involvement and feedback are becoming more important to space agencies, so a synergistic public/private approach to ARD is the desired way forward.
Going beyond the image: Turning pixels into actionable information
There’s no argument that ARD, STAC, and satellite data interoperability can help us operationalize monitoring rather than wrangling thousands of pixels at each time interval, but we still need to define analysis ready data and interoperability. What does ‘analysis ready’ really mean? We can agree that in an ideal world we want images that are cloud- and gap-free, acquired at high spatial and temporal resolution, and consistently measured. However, we still need to define what the analysis is. Analysis ready for what? This is difficult to answer because ARD and interoperability mean different things to different people, and ARD is still in the R&D phase, so agreeing on an industry-wide definition is a key qualifier moving forward. At the start of the workshop, ARD was broadly introduced as the baseline and protocol for interoperability across all datasets, with STAC providing the temporal dimension. Planet defined ARD as “cloud-based, version-controlled, scientific-grade data that come from a single source and are processed at different levels based on information the customer wants”. DigitalGlobe described ARD simply as “pre-processed and ready to use. There is consistency across sensors and pixel predictability”. Harvest, a NASA program for food security and agriculture, caters to a wide variety of users and develops both ARD and ARD-derived products.
It’s also important to note that Google, Planet, NASA, DigitalGlobe, USGS, OGC, ESA, and other satellite data providers already make ARD products available, some for decades, just under other names: “ARD” has usually referred to different data processing levels, with USGS’ Landsat processing levels being the best known. More recently, Google Earth Engine began providing cloud-based geoprocessing of satellite imagery for the non-data-scientist and developed a standardized API for EO even though different data sources and processing sit behind the scenes. Today, researchers are writing Python code to interact with images in a stack, stream pixels to machine learning models, and essentially enable virtual indexing of imagery and tiled access at higher temporal resolution. EO satellite data interoperability is really sensor interoperability, achieved through operational APIs and machine learning that converge road maps and decentralized data. Data cubes take this further by integrating ARD into a single repository that provides a multi-dimensional (space, time, data type) stack of spatially aligned pixels for efficient time-series access and analysis.
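The data-cube idea above can be sketched in a few lines: once pixels are spatially aligned, a scene collection becomes a (time, y, x) array, and per-pixel time series and cloud-free composites fall out of simple array operations. The reflectance values and cloud masks below are invented for illustration.

```python
# A toy "data cube": a (time, y, x) stack of spatially aligned pixels.
# A real cube would hold analysis-ready surface reflectance; these
# values are made up to show the mechanics.
import numpy as np

# Three acquisition dates over a 2x2-pixel window (hypothetical values).
stack = np.array([
    [[0.10, 0.12], [0.30, 0.31]],      # t0
    [[np.nan, 0.11], [0.29, np.nan]],  # t1: NaN marks cloud-masked pixels
    [[0.12, 0.13], [0.31, 0.33]],      # t2
])

# Because the pixels are aligned, per-pixel analysis is a single slice:
composite = np.nanmedian(stack, axis=0)  # cloud-free median composite
series = stack[:, 0, 0]                  # full time series for one pixel

print(composite)
```

The same slicing works whether the time axis holds three scenes or three thousand, which is why aligned cubes make time-series monitoring so much cheaper than handling scenes one at a time.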
However, is ARD for everyone? ARD doesn’t necessarily mean error-free, and the danger of ARD is that lowering the barrier to entry makes it very easy to make a very bad map. Ignacio Zuleta of Planet pointed to the botched restoration of the century-old “Ecce Homo” painting of Jesus Christ as a comical example of when the original is fine the way it is and sometimes we should leave it alone. Quality control and error quantification are a must for ARD in order to reduce data misuse and misapplication.
Geometric and radiometric challenges
Along with defining ARD and interoperability standards, the EO industry needs processing requirements for a scalable global grid of ARD. To achieve true satellite interoperability and more direct access to pixels, we need an industry standard for cross-calibration, but developing standards for ARD products across the industry, multiple sensors, and multiple resolutions, so that data can feed a sensor-agnostic algorithm framework, is difficult. Different methods produce different results (e.g. surface reflectance estimation), so which atmospheric correction models or algorithms do we use? Which preprocessing standards should we develop? Radiometric and geometric calibration are non-contentious within the satellite data community, but standards for the rest are harder to pin down because there are many parameters to choose from (e.g. topographic normalization, BRDF normalization, spatial and temporal resampling) and preprocessing typically depends on the image application and the user’s needs.
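To make the "non-contentious" part concrete: one of the first standardized steps in any ARD pipeline is converting quantized digital numbers (DN) to top-of-atmosphere (TOA) reflectance. The sketch below follows the published Landsat 8 convention (a per-band gain/offset rescaling, then a sun-angle correction); the gain, offset, and sun elevation are example values that would normally be read from scene metadata.

```python
# Illustrative DN -> top-of-atmosphere (TOA) reflectance conversion,
# following the Landsat 8 rescaling convention. Constants below are
# example values; real pipelines read them from the scene's metadata file.
import math

REFLECTANCE_MULT = 0.00002   # per-band multiplicative rescaling gain (example)
REFLECTANCE_ADD = -0.1       # per-band additive rescaling offset (example)
SUN_ELEVATION_DEG = 45.0     # hypothetical scene sun elevation

def dn_to_toa_reflectance(dn: int) -> float:
    """Convert a quantized DN to sun-angle-corrected TOA reflectance."""
    rho_prime = REFLECTANCE_MULT * dn + REFLECTANCE_ADD   # rescale DN
    return rho_prime / math.sin(math.radians(SUN_ELEVATION_DEG))  # sun correction

print(dn_to_toa_reflectance(10000))
```

This step is uncontroversial precisely because the gains and offsets are published per scene; the contentious choices (atmospheric correction, BRDF and topographic normalization) all come after this point.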
We could reduce variability using sensor fusion (harmonization), and ARD would ultimately be more powerful if we could take advantage of multiple sensors across space and time, but consistency between sensors is also a challenge, and leveraging multiple sensors has been limited. Continuity between sensors (Landsat, Sentinel, MODIS, etc.) is sparse because different sensors measure different parts of the spectrum, so their spectral responses differ and are not directly comparable. The relationship between top-of-atmosphere (TOA) and surface reflectance is well understood, and spectral/radiometric calibration between sensors could improve over time as more data for the same locations becomes available, enabling an out-of-the-box co-registration model for quicker analysis. Even so, the challenges of varying spatial resolution and of geolocation errors from small shifts between sensor pixels would remain. Perhaps focusing on the spatial domain and identifying consistent geometry and projections across sensors would be a better place to start.
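The simplest form of the harmonization described above is a per-band linear adjustment fitted from coincident observations: take pixels two sensors both saw over the same location, fit a slope and intercept, and map one sensor's reflectance onto the other's scale. This is a sketch under assumed, invented reflectance pairs, not a published set of coefficients.

```python
# Sketch of per-band cross-sensor harmonization via a linear fit.
# The coincident reflectance pairs below are invented for illustration.

def fit_linear(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Red-band reflectance over shared pixels; "sensor B" reads slightly
# high relative to "sensor A" (hypothetical bias).
sensor_b = [0.05, 0.10, 0.20, 0.30, 0.40]
sensor_a = [0.048, 0.097, 0.195, 0.292, 0.390]

slope, intercept = fit_linear(sensor_b, sensor_a)
# Map sensor B onto sensor A's radiometric scale.
harmonized = [slope * v + intercept for v in sensor_b]
print(round(slope, 3), round(intercept, 4))
```

Real harmonization efforts (NASA's Harmonized Landsat Sentinel-2 product, for instance) also account for spectral bandpass differences and BRDF effects, but a fitted linear adjustment like this is the core idea.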
A growing community of calibrators
As the workshop came to a close and the challenge of delivering on the vision of a multi-mission constellation seemed daunting, hope came from those who insisted that these problems are solvable. First, we must agree on definitions and processing standards (while taking care not to over-standardize). As access to data, tools, and computing power increases, so does the availability of ARD inputs, and current machine learning models crave those inputs, with the potential to catapult EO into new dimensions. Data cubes appear to be the future, and the community of EO collaborators is growing toward a stable core. At the end of each day, we shared our work and our stories over good food and drinks, where real collaboration and innovation can begin. At the pixel level.
Photo Credit: Maria A Capellades
For posts and more photos by participants at the ARD+STAC workshop, go to https://twitter.com/hashtag/ARDSTACworkshop?src=hash