We’ve been offering geospatial data for many years with the ability to customize the output. That means we have a treasure trove of records showing people’s choices of projections, data formats, and a few other things. I thought it would be fun to have a look at the choices people are making and see if we can make any useful observations.
All of our records come from the Digital Coast DAV system. The geospatial data is primarily lidar, imagery, raster DEMs, and land cover, with popularity in approximately that order. For data already in a raster format, users can mainly choose the projection and the file format. The lidar data is stored as points from which products can be derived, so it offers many more choices. That means we can also look at the cell sizes people request for raster products, or the contour intervals. The stats shown are for requests made between Jan 1, 2018 and July 24, 2019.
By the way, there are already some publicly accessible stats for the DAV system that cover things like dataset popularity and internet domains. I won’t be looking at those, since you’re free to explore them yourself.
The DAV system happens to use a different default projection for each of the data types. You’ll see the influence of that default in the stats. Either the examination of people’s choices is biased by having a default, or we’re really good at picking a default for people. I’m inclined to think it’s a bias. So, for people choosing elevation data, their projection selections look like this:
|Projection for elevation|Times Chosen|
|State Plane 27|102|
|State Plane 83|48744|
The projections chosen for imagery look like this:
|Projection for imagery|Times Chosen|
|State Plane 27|18|
|State Plane 83|3811|
The main difference is the switch between State Plane 83 and UTM, although for imagery the choice between the two is much closer. As you might guess, our default projection for each case is the one that was selected most often. As you can see, Albers doesn’t get a lot of interest. It is even lower than State Plane 27, which we hope nobody has to use any longer. However, it is our default for land cover data and we can see the impact in those statistics.
|Projection for land cover|Times Chosen|
|State Plane 27|16|
|State Plane 83|405|
Albers is the most popular for land cover. Given that State Plane 83 is still popular even when it isn’t the default and wildly popular when it is, it may be that we should make State Plane 83 the default for all types.
Points, Rasters, or Contours
For the elevation data that starts as a lidar point cloud, the user can choose whether they want the points or a derived product, such as a raster or contours. Even though we’re starting from points, the default choice is a raster, so we expect that to be the most popular. We’ll also break out the file format they choose for each type. Here’s what the stats say.
|Product and File Format|Times Chosen|
|Raster – ESRI binary grid|345|
|Raster – Imagine|353|
|Raster – ESRI ASCII grid|1091|
|Points – LAZ|1482|
|Contour – DXF|4269|
|Points – ASCII X,Y,Z|5475|
|Contour – Shapefile|10963|
|Points – LAS|12088|
|Raster – GeoTiff 32-bit|21163|
I find some interesting points in the selections here, but again defaults play a large role. For each type of output, the default file format is the most chosen. For the points, the choice of LAS over LAZ is probably a poor one just from the viewpoint of download times. It would be faster to download the data in LAZ format and decompress it to LAS locally with laszip than to download LAS directly. There are also a surprising number of people requesting ASCII point data, and I suspect these are engineers trying to find a way to get the data into CAD programs.
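For anyone who wants to try the LAZ route, here is a minimal sketch of the local decompression step. It assumes a `laszip` executable is installed and on your PATH and that it accepts the `-i` (input) and `-o` (output) options. The helper names `laszip_command` and `decompress` are made up for illustration and are not part of the DAV system.

```python
import subprocess
from pathlib import Path


def laszip_command(laz_path, exe="laszip"):
    """Build the command line that decompresses one LAZ file to LAS.

    Assumption: the laszip executable takes `-i <input>` and `-o <output>`.
    """
    src = Path(laz_path)
    if src.suffix.lower() != ".laz":
        raise ValueError(f"expected a .laz file, got {src.name!r}")
    dst = src.with_suffix(".las")  # same name, .las extension
    return [exe, "-i", str(src), "-o", str(dst)]


def decompress(laz_path, exe="laszip"):
    """Run laszip on one file and return the path of the LAS output."""
    cmd = laszip_command(laz_path, exe)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Calling `decompress("tile.laz")` would write `tile.las` next to the download. The command is built in a separate function so it can be inspected without running anything.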
If you group those format choices by type of product (i.e., raster, point, or contour), you can see that while the default choice, raster, has the most requests, all three are requested a lot.
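The grouping is simple enough to reproduce from the table. This is just a sketch using the counts listed above. The `ROWS` list and the `totals_by_product` helper are names I made up for the example.

```python
from collections import defaultdict

# (product, file format, times chosen), copied from the table above.
ROWS = [
    ("Raster", "ESRI binary grid", 345),
    ("Raster", "Imagine", 353),
    ("Raster", "ESRI ASCII grid", 1091),
    ("Points", "LAZ", 1482),
    ("Contour", "DXF", 4269),
    ("Points", "ASCII X,Y,Z", 5475),
    ("Contour", "Shapefile", 10963),
    ("Points", "LAS", 12088),
    ("Raster", "GeoTiff 32-bit", 21163),
]


def totals_by_product(rows):
    """Sum the request counts for each product type."""
    totals = defaultdict(int)
    for product, _file_format, count in rows:
        totals[product] += count
    return dict(totals)


totals = totals_by_product(ROWS)
# Print the product types from most to least requested.
for product, count in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{product}: {count}")
# Raster: 22952
# Points: 19045
# Contour: 15232
```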
The DAV system lets you change the horizontal and vertical datum of the data. The default datums, if possible, are NAD83 horizontally and NAVD88 (or island equivalent) vertically. Is the effort to support changing datums worthwhile? Here are the numbers:
|Horizontal Datum|Request Count|
For the vertical, there are more options, and you may only get the choice of some vertical datums if you pick the appropriate horizontal datum. For instance, you can’t pick the EGM2008 geoid model unless you picked WGS84 horizontally. The tidal datums (MLLW and MSL) are a bit misleading because only data sets already in those datums have that choice and you can’t change it, so nobody really picked them.
|Vertical Datum (lidar point clouds only)|Request Count|
|Mean Lower Low Water|52|
|WGS84/ITRF Ellipsoid heights|339|
|NAD83 Ellipsoid heights|399|
|Mean Sea Level|496|
|WGS84 with EGM2008|909|
|NAVD88 (NGS GEOID)|54110|
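To put a number on how lopsided these counts are, here is a small sketch that turns the rows above into shares of the total. The dictionary name and the `shares` helper are my own. The counts come straight from the table, tidal datums included, even though those weren't really choices.

```python
# Request counts copied from the vertical datum table above.
VERTICAL_DATUM_REQUESTS = {
    "Mean Lower Low Water": 52,
    "WGS84/ITRF Ellipsoid heights": 339,
    "NAD83 Ellipsoid heights": 399,
    "Mean Sea Level": 496,
    "WGS84 with EGM2008": 909,
    "NAVD88 (NGS GEOID)": 54110,
}


def shares(counts):
    """Return each entry's fraction of the total request count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


datum_shares = shares(VERTICAL_DATUM_REQUESTS)
# Print the datums from most to least requested, as percentages.
for name, share in sorted(datum_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

With these counts, NAVD88 works out to about 96.1% of the 56,305 requests.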
We clearly see that the default of NAVD88 dominates. If someone were to choose the advanced options, they could pick which NGS GEOID model is going to be applied. If they don’t pick the advanced options, they’ll get GEOID12B. Let’s look at that:
|Geoid Model Name|Request Count|
As expected, the default is the dominant choice. However, I’m a little surprised there are people who want some of those older models and worked hard enough to find where to pick them. The choice of Geoid 12A is interesting, as the only difference between Geoid 12B (the default) and Geoid 12A is that 12A had mistakes in a couple of locations.
One of the reasons I wanted to look at what people choose is the upcoming new reference frames for the USA. We’ll clearly need to add those options, but perhaps it’s time to cull some of the others. We may also end up rebuilding the app to operate as a cloud-native service, and reducing the complexity would save money. I’d be happy to hear opinions about the options you’d like to see or what other stats you’d find interesting.