Raster Provisioning with Open Source


In June of this year we had to turn off the raster provisioning software for the Digital Coast Data Access Viewer (DAV). That was the package that let us grab tiles of imagery, mosaic and reproject them, and then deliver the output to users. Unfortunately, it used Java 5 and that was considered a security hole. The company had no plans to upgrade to at least Java 7, so there was no choice but to turn it off. That left us with only the ability to point people to a big ftp site that might have hundreds of files for each dataset. That worked fine for some people, but isn’t very user friendly if you only need six tiles and the hundreds of tile names are very similar.

Data Access Viewer schematic
Data Access Viewer (DAV) system schematic after loss of the raster processing engine.

After we turned the raster parts of the DAV system off and started to get a few disgruntled users letting us know how they felt, we decided we had to do something better than the pile of files on ftp. After thinking about our options, I headed down the open source road using the GDAL package. It turned out that figuring out all the bits related to the databases in the DAV system was the hard part and getting GDAL to mosaic and reproject was the easy part. The gdalwarp program is where the core processing happens. The rest is just setting up the information about your data so that you know what files and options to pass to gdalwarp.

Finding Files

The first step is finding the files you’ll need. For this we use a database to store the location on disk and the bounds (as a shape) for each file. We have over hundreds of datasets and generally don’t want to mix them in the final output, so we set an id for each dataset and include that as a field in our table of files. Then we can easily make an SQL query to find the files needed for a given bounding box request. If I’ve made a table by ingesting the output from gdaltindex (so it has a location and shape) plus added the dataset id, I could use an SQL statement like:
SELECT location FROM filesTable
WHERE datasetid = $id and geometry::STGeomFromText(
'POLYGON(($minx $miny, $minx $maxy, $maxx $maxy, $maxx $miny, $minx $miny))',
4269 ).STIntersects(shape) = 1

where the $minx, $miny, etc., are variables for the bounding box.

Reprojection

GDAL provides a number of options for specifying the output projection. If your input data has georeferencing set, gdalwarp will read it, so that’s nice. For our needs, using the EPSG codes that describe a projection seemed the best choice. By making a database table with the EPSG code for each combination of projection (UTM, State Plane, Geographic, Albers), datum (WGS84, NAD83, NAD27), and zone (where applicable), I can easily find what I need and use the -t_srs EPSG:#### (where #### is the code). You can find EPSG codes in a lot of places, including epsg.io.

Resampling

There isn’t a single answer on how to resample, which is a necessary evil when you reproject. Gdalwarp give you several options, so you’ve got choices. The primary issue is to make sure you always use nearest neighbor if you’re working with thematic data (e.g. data where the values represent types/classes, such as land cover). The last thing you want is find that you’ve resampled a location that is between bare land (20) and wetland (18) and called it unconsolidated shore (19). So, thematic data like C-CAP uses the gdalwarp default -r near while imagery could use the more aesthetically pleasing -r cubic.

Clipping

Data Access Viewer screenshot showing area of interest box.
Screenshot of the Data Access Viewer showing the yellow area of interest that will be used for clipping. The “draw area” tool in the upper left is used to draw area.

One of the features of the DAV is clipping to the area you specify. The area of interest (AOI) you draw on the map (using the draw area tool, see figure) is stored in the job database with the rest of the information as noted above. Fortunately, gdalwarp has a few ways to clip. I choose to do it from a shapefile. I created a shapefile from the four corners of the AOI in geographic coordinates and then used ogr2ogr (part of the GDAL package) to reproject the shapefile to the desired coordinates, again using EPSG codes. Once I had my clipping shapefile in the right coordinates, I just needed to add options -crop_to_cutline -cutline aoi.shp (where aoi.shp is my clipping shapefile) to the gdalwarp command line. As a note for the future, the clipping shape doesn’t need to be a box. Perhaps someday we’ll add the ability to clip to things like watersheds or counties.

Putting It Together

Once you’ve got the input files, the reprojection, and the resampling arguments, about the only thing left is the creation options for different file formats. Most of these are straightforward, but there are a few you have to pay attention to. Here are a few of the ones we had to watch for:

We have both three and four band imagery. The default settings for a GeoTiff from the four band imagery will end up using the fourth band as an alpha channel. Giving the creation option of -co ALPHA=NO solves that problem.

For JPEG output, gdalwarp was not willing to do a mosaic and reprojection. The solution was to output in some other format (I used Imagine or HFA) and then use gdal_translate to change to JPEG.

I initially set GeoTIFF output to always make them BigTIFF format to allow larger than 4 Gb files. However, that format isn’t accepted by all software and isn’t needed if the output is less than 4 Gb. If you can predict ahead of time how big the output will be, it’s easy to decide which format you need. However, if you apply compression to the image, you can’t predict the size in advance. The solution I used was to apply the -co BIGGTIFF=IF_SAFER option in gdalwarp, which will use BigTIFF if the output might be over 4 Gb. I then check the actual output size and if it is less than 4 Gb, I use gdal_translate -co BIGTIFF=NO to make a non-BigTIFF version.

There was only one thing that required changing the gdal code itself. We have some thematic data sets, such as C-CAP land cover, that contain an attribute table that provides the meaning for the values stored in the data. For instance, a two might mean forest and the table provides that crosswalk. Unfortunately, gdalwarp does not copy that attribute table, probably under the assumption that it can’t assure it is the same or still applicable to the new data. In our case, it would be applicable, so I changed the gdalwarp code to copy the attribute table of the first image.

Why Open Source?

There are other packages out there that we could have bought, but we didn’t have the money and it still would have required effort to integrate a package into our system. That integration was the bulk of the work. Gdal provided almost everything we needed. The one thing it didn’t do was the attribute table, but since I had the code, I could easily fix that. Having the source makes it much easier to leverage what’s there and add new functions that might be needed.

New DAV Functionality

If you’re a previous user of the DAV, you may wonder what new functionality was been added and what might have been lost with the switch to GDAL. I don’t believe anything was lost, but there were definitely some additions.

  • 16-bit thematic data – this didn’t work in the old system correctly, so we didn’t offer it. The new system works fine, but we have discovered popular software packages that can’t display the 16-bit thematic geoTIFF correctly. That’s a topic for another post though.
  • Metadata – the old system didn’t change the original metadata to reflect changes in projection and area, it just copied over the original and provided a somewhat cryptic text file that gave the new projection information. The new system modifies the metadata to reflect the changes.
  • World files – to support some software packages, primarily CAD, that don’t read the geoTIFF location information, the new system provides world files using the -co TFW=YES option to gdalwarp.
  • Pyramid layers – reduced resolution or pyramid layers to support faster browsing when zoomed out are now generated using the gdaladdo command.

How’s It Working?

The new system went live in early August, approximately six weeks after the old system was taken down. Since then it has worked relatively smoothly and given fewer problems than the old system. As it doesn’t have any license issues, we can run it on multiple machines and have fewer log jams. Overall, I’m pretty happy with how easy it was to take what’s already in GDAL and glue the pieces together (I used Perl as my glue). Personally, I also find great relief in a system where everything is there if you have to make changes instead of reporting a bug to someone and hoping it gets fixed in a future release.

Leave a Reply. Comments are moderated.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s