Virtual Formats – Using Raster VRTs


We’ve quietly started making our imagery and digital elevation models (DEMs) available via the VRT format. The next step is to let you know about them and how they can help you work with large amounts of raster data without having to download it all. That’s what this post is all about.

What’s a VRT?

I admit that I thought that VRT stood for virtual raster table, but I can’t find proof of that. It’s currently called the GDAL Virtual Format and can be used for more than rasters. VRT is the short name and the file extension. A VRT is essentially a simple XML catalog for some raster files. It has information about the data projection, bands, and each file’s bounds. It could also have information about operations to be done, such as reprojection, but we won’t touch on that much here. VRTs are part of the Geospatial Data Abstraction Library (GDAL), which underlies a great many GIS and remote sensing software packages. You can learn more about the details of a VRT on the GDAL Virtual Format page.
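To make the “simple XML catalog” idea concrete, here is a minimal Python sketch that reads a tiny VRT and lists what it references. The sample VRT is hand-written for illustration: the tile names, sizes, and coordinate system are made up, and real VRTs produced by GDAL usually carry more elements than shown. Only the Python standard library is used, so this is not how GDAL itself reads the file.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; tile names, sizes, and SRS are made up.
SAMPLE_VRT = """<VRTDataset rasterXSize="2000" rasterYSize="1000">
  <SRS>EPSG:26917</SRS>
  <GeoTransform>450000.0, 0.5, 0.0, 3400000.0, 0.0, -0.5</GeoTransform>
  <VRTRasterBand dataType="Byte" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tile_a.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="1000" ySize="1000"/>
      <DstRect xOff="0" yOff="0" xSize="1000" ySize="1000"/>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">tile_b.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="1000" ySize="1000"/>
      <DstRect xOff="1000" yOff="0" xSize="1000" ySize="1000"/>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""


def summarize_vrt(xml_text):
    """Return the mosaic size, SRS, band count, and source files named in a VRT."""
    root = ET.fromstring(xml_text)
    bands = root.findall("VRTRasterBand")
    # The same file can appear once per band, so collect unique names.
    files = sorted({el.text.strip() for el in root.iter("SourceFilename")})
    return {
        "size": (int(root.get("rasterXSize")), int(root.get("rasterYSize"))),
        "srs": root.findtext("SRS"),
        "band_count": len(bands),
        "source_files": files,
    }


summary = summarize_vrt(SAMPLE_VRT)
print(summary)
```

The DstRect offsets are what place each tile inside the larger virtual image: here the second tile starts 1000 pixels to the right of the first.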

For our purposes, a VRT provides a way to reference a bunch of files as if they are one giant file. There are some restrictions on what we can put in a VRT that follow logically if we think of it as one big image. For instance, you can’t put in files with different projections or a differing number of bands, just as you couldn’t do that with a single image. However, we can take thousands of image tiles that make up a dataset, put them into a VRT, and treat it as a single image.
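The restrictions above can be sketched as a simple consistency check. This is only an illustration of the rule, not GDAL’s own logic: the TileInfo class and the tile values are hypothetical, and in practice you would read the projection and band count from each file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileInfo:
    """Hypothetical per-tile metadata; real values would come from the files."""
    name: str
    srs: str
    band_count: int


def mosaic_problems(tiles):
    """List tiles whose projection or band count differs from the first tile."""
    if not tiles:
        return []
    ref = tiles[0]
    problems = []
    for tile in tiles[1:]:
        if tile.srs != ref.srs:
            problems.append(f"{tile.name}: projection {tile.srs} differs from {ref.srs}")
        if tile.band_count != ref.band_count:
            problems.append(f"{tile.name}: {tile.band_count} bands, expected {ref.band_count}")
    return problems


tiles = [
    TileInfo("a.tif", "EPSG:26917", 3),
    TileInfo("b.tif", "EPSG:26917", 3),
    TileInfo("c.tif", "EPSG:4326", 1),  # wrong projection and wrong band count
]
problems = mosaic_problems(tiles)
for message in problems:
    print(message)
```

Here the first two tiles could share a VRT, while the third is flagged twice.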

How do I use a VRT?

Let’s start with a simple example using QGIS (a free open source GIS package) to view some data. If you open this link to the VRT file I’ll use for some 2009 imagery of Kings Bay, GA, you’ll see lots of text referencing the individual image tiles. In QGIS, you select the Raster data type in the Data Source Manager, set the source type to a protocol, and paste in the URI for the VRT.

Loading a VRT in QGIS.

After adding it, it will show up in the map and you can use it like any other image, except you can’t write to it for obvious reasons. This example isn’t a particularly large one as there’s only about two gigabytes of data there. However, downloading a couple of gigabytes would have taken a lot longer than the time it took to show up. Zooming in and out works relatively quickly. It’s half-meter data, so there is only so far you can zoom in. Part of the reason it zooms well is that the underlying data the VRT references is in Cloud Optimized GeoTIFF (COG) format, so it only pulls what it needs over the wire.

QGIS with the VRT loaded.

Where do I find these VRTs?

Both NOAA and USGS have been putting up VRTs lately. I’ll review them separately. You can be sure there are a lot of other sources of VRTs too.

NOAA

I’ll start with the source I have some control over, the data on the NOAA Digital Coast. If you search for data on the Data Access Viewer, you’ll find a link to ‘Bulk Download’ for each data set. For the data in raster format (i.e., not the lidar point clouds), that bulk download page will have a link to the VRT.

Bulk download page for the Kings Bay, GA, data with the VRT section highlighted.

In addition to the Digital Coast, the National Geodetic Survey has created VRT files for their emergency response imagery. These are primarily post-hurricane imagery. The data and the VRTs are under https://noaa-eri-pds.s3.amazonaws.com/index.html. It takes a little bit for the page to populate and then it will show links to the various datasets. Each dataset tends to be separated into days and the flights in a day, with a VRT for each flight. The VRTs are in with the rest of the data, so use the search in the upper right to find them by entering ‘vrt’. You could also search for vrt on the opening page to see all the VRTs. Note that only the more recent datasets (maybe 2020 and later) have VRTs.

The VRT for imagery collected after Hurricane Ida on August 30, 2021, flight a.
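If you would rather script the search than use the web page, the sketch below filters an S3 bucket listing for keys ending in .vrt. It assumes the bucket allows anonymous listing through the standard S3 list query, which I have not verified for every bucket. The sample listing, bucket name, and keys are made up for illustration, and fetch_listing is defined but not called here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listings.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def vrt_keys(listing_xml):
    """Return the object keys in an S3 listing that end in .vrt."""
    root = ET.fromstring(listing_xml)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    return [key for key in keys if key.lower().endswith(".vrt")]


def fetch_listing(bucket_url, prefix):
    """Fetch one page (at most 1000 keys) of a public bucket listing.

    Assumes anonymous listing is permitted; larger prefixes need pagination.
    """
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix})
    with urllib.request.urlopen(f"{bucket_url}/?{query}") as response:
        return response.read()


# Made-up listing for illustration; these keys do not refer to real objects.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>storm/day1/flight_a/tile_001.tif</Key></Contents>
  <Contents><Key>storm/day1/flight_a/flight_a.vrt</Key></Contents>
  <Contents><Key>storm/day1/flight_b/flight_b.vrt</Key></Contents>
</ListBucketResult>"""

found = vrt_keys(SAMPLE_LISTING)
print(found)
```

Against a real bucket, you would pass the bytes returned by fetch_listing to vrt_keys and prepend the bucket URL to each key to get a link you can hand to GDAL or QGIS.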

USGS

The US Geological Survey currently has VRTs for the 1/3 arc-second, 1 arc-second, and 2 arc-second seamless elevation data sets. These datasets cover the nation, so that’s a lot of territory in one file. If there is demand, I wouldn’t be surprised if they add VRTs for more data too.

Beyond Viewing

While it’s great that you can use the VRT in QGIS to view the data, what if you want to do more than viewing? You can certainly do that within QGIS where you can treat it like any other dataset. You can also use the applications that come with GDAL to do things like reproject and subset the data. Suppose I have a vector with the outline of South Carolina and I want to extract data from the USGS 1/3 arc-second elevation for South Carolina and reproject it to State Plane coordinates. I can do something like this:

gdalwarp -cutline sc_outline.shp -crop_to_cutline -t_srs EPSG:2273 /vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/USGS_Seamless_DEM_13.vrt sc_dem13.tif

Notice that I put /vsicurl/ in front of the link to the VRT file. That prefix tells GDAL to read the file over HTTP through its curl-based virtual file system instead of looking for a local path. The -t_srs EPSG:2273 specifies that the target reference system is South Carolina State Plane feet. The vertical units are still in meters though. You could do a two-step process where you make a VRT as output from the gdalwarp and then use gdal_translate to change the vertical units.

gdalwarp -cutline sc_outline.shp -crop_to_cutline -t_srs EPSG:2273 /vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/USGS_Seamless_DEM_13.vrt sc_dem13_vertmeters.vrt
gdal_translate -scale 0 0.3048 0 1 sc_dem13_vertmeters.vrt sc_dem13_vertfeet.tif

The intermediate VRT file is very small and fast to create. Naturally, it’s going to have to download all the raster cells that fall in South Carolina, but it doesn’t have to download the full dataset and I don’t have to figure out which tiles I need for South Carolina from The National Map.
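If the -scale 0 0.3048 0 1 arguments look cryptic, the arithmetic is just a linear map from a source range to a destination range. The Python sketch below reproduces that arithmetic for a single value; the function name is my own, and it illustrates the math rather than GDAL’s code. Mapping 0 to 0 and 0.3048 to 1 is the same as dividing by 0.3048, which converts meters to international feet.

```python
def linear_rescale(value, src_min, src_max, dst_min, dst_max):
    """Map value linearly so src_min -> dst_min and src_max -> dst_max."""
    scale = (dst_max - dst_min) / (src_max - src_min)
    return (value - src_min) * scale + dst_min


elevation_m = 100.0
elevation_ft = linear_rescale(elevation_m, 0.0, 0.3048, 0.0, 1.0)
print(round(elevation_ft, 2))  # 328.08
```

The range values only define the slope and offset of the line, which is why an elevation far above 0.3048 meters still converts correctly.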

What about ArcGIS?

Many people use some form of ArcGIS to work with geospatial data, so how well do VRTs work there? The ArcPro documentation indicates that a VRT will work, as it is listed among the supported raster file formats. However, it has to be a local file as far as I can tell. Trying to give it a URL, with or without the /vsicurl/, does not work. If I missed the trick to using a URL, please tell us in the comments. However, you could make a local VRT that just points to the web VRT. For instance:

gdalbuildvrt noaa.vrt /vsicurl/https://chs.coast.noaa.gov/htdata/raster2/imagery/KingsBayGA_RGB_2009_1107/KingsBayGA_RGB_2009.vrt

or

gdalbuildvrt usgs.vrt /vsicurl/https://prd-tnm.s3.amazonaws.com/StagedProducts/Elevation/13/TIFF/USGS_Seamless_DEM_13.vrt

Oddly, the NOAA one gives some warnings about files that don’t, and have no reason to, exist. The USGS one doesn’t. The warnings don’t matter, but the difference in behavior is puzzling.

Once you have the local VRT, you can add that to a map. Unfortunately, my tests for the Kings Bay, GA, data were much slower for ArcPro than for QGIS, using the same local VRT that references the VRT on the web. QGIS (version 3.24.1) took about 1 minute to load and display. ArcPro (version 2.9.2) took 25 minutes. I don’t know why. It’s possible that ArcPro is ignoring the overviews embedded in the cloud optimized GeoTiffs and downloading all the data.

Conclusion

The primary gain you get from a VRT is an aggregation of potentially lots of files that can be referred to as a single file. It’s not the only way to do something like this. The group of files could be managed as a map or image service, though that requires a lot more management and infrastructure on the part of the data owner. Something like a SpatioTemporal Asset Catalog could also serve this purpose and provide additional flexibility, though it might lack the ability to encapsulate operations like reprojection, as that would be outside its scope.

From the perspective of managing the Digital Coast Data Access Viewer, we have found it very useful to have other people provide VRT files. For example, we can use the VRTs that were created for the emergency response imagery to allow people to do custom processing without having to either download or reference all the individual TIFF files. We simply reference the VRTs and let GDAL pull the data as needed for a job request. That provides greater data access options with less work for us.

One comment

  1. Thanks Kirk, this is a great resource.

    I had a go with adding the VRT and COGs into ArcGIS Pro. I found that the easiest way of doing this was adding the S3 bucket as a cloud connection, then browsing for the relevant files.

    As you said, ArcGIS doesn’t pick up the overviews when using the VRT, so unless you tell it to not generate new image pyramids then it will spend 25 minutes building them.

    It picks up on the individual COG overviews fine though and they load a treat. An option could be to generate overviews on the VRT itself? But at that point, you might as well generate a mosaiced COG (which would be nice too).
