DAV Tips: Size Estimates


If you order data through the Digital Coast Data Access Viewer (DAV), how do you know how big the data will be? That seems like a simple enough question but, like most things, it gets a little complicated. I’m going to start with the raster data because that’s pretty simple. It’s the lidar point clouds that get messy.

Raster

Screengrab for the DAV system showing an area in the Snake River data set.

The raster data includes the imagery, land cover, and digital elevation models (DEM). The image above shows the results after drawing a search box (using the white pencil next to the text geography search box) in the Snake River area. For the DEM, it comes back with an estimate of 7.15 gigabytes (circled in red). That should be a good estimate of the size of your file if you add it to the cart. So, why isn't it? The estimate is derived by intersecting the box you drew with the footprint of the data to get an area in the Web Mercator projection. The pixel size of the original raster data is then used to estimate how many pixels are needed to fill that area. Finally, we account for how many bytes are needed for each pixel, in this case four, since the DEM is a 32-bit floating point raster. So far, so good.
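
To make that concrete, here's a back-of-the-envelope version of the same arithmetic in Python. The area value is a made-up stand-in for the box-footprint intersection the DAV actually computes; the pixel size and bytes per pixel match the DEM described above.

```python
def raster_size_estimate(area_m2, pixel_size_m, bytes_per_pixel):
    """Estimate raw raster size: area (in the projection used, here
    Web Mercator) divided by pixel area, times bytes per pixel."""
    n_pixels = area_m2 / pixel_size_m ** 2
    return n_pixels * bytes_per_pixel

# Hypothetical inputs: a box of roughly 1,920 km^2 in Web Mercator,
# 1-meter pixels, and a 32-bit (4-byte) floating point DEM.
size_bytes = raster_size_estimate(1.92e9, 1.0, 4)
print(f"{size_bytes / 1024**3:.2f} GB")  # ~7.15, like the estimate above
```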

What we didn’t account for is the distortion from using Web Mercator. It’s always too big and it gets worse the farther you are from the equator. In the Great Lakes, the estimate is something like a factor of two too large. In terms of the file you’d download, we also aren’t accounting for any compression we put on the file or the compression zipping the file will provide. The adjustment for compression is highly dependent on the data itself. All we can really say is that it shouldn’t get bigger.
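
If you want to correct for that distortion yourself, the linear scale factor of Web Mercator is roughly 1/cos(latitude), so area (and therefore the size estimate) is inflated by about 1/cos²(latitude). This is my own rough correction, not something the DAV applies for you:

```python
import math

def mercator_area_inflation(latitude_deg):
    """Approximate factor by which Web Mercator inflates areas at a
    given latitude: linear scale is ~1/cos(lat), so area is squared."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2

print(mercator_area_inflation(0.0))    # 1.0 at the equator
print(mercator_area_inflation(45.0))   # ~2.0 around the Great Lakes
```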

Lidar Point Cloud

Everything I said about rasters also applies to the lidar point clouds. However, instead of using a pixel size to get the total pixels, we use an approximate point density to estimate how many points there will be. You can see the point estimate in the above image if you look at the first data set in the results where it says “15,671,389,276 Pts”. Lidar is not nearly as uniform as a raster and the point density will vary across a dataset, so we certainly have some error there, but let's assume that we're relatively correct and only Web Mercator is throwing off our point estimate.
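
The point-count estimate itself is the same kind of arithmetic as the raster case, just with a nominal point density in place of a pixel size. A quick sketch with made-up numbers in the right neighborhood:

```python
area_m2 = 1.92e9        # hypothetical box area in Web Mercator
points_per_m2 = 8.16    # nominal point density for the dataset
n_points = area_m2 * points_per_m2
print(f"{n_points:,.0f} Pts")   # on the order of the 15,671,389,276 above
```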

What makes the final file size so much harder to estimate is that you have so many more choices. I’ll split those out by basic type and see what we get.

Points

If you request points, you’re in a fairly good position to estimate how big the file will be. You can choose LAS, LAZ, or ASCII formats for the data. If you choose LAS, you’re typically looking at 28 bytes per point. This will be higher for data with RGB values for each point, but it’s a reasonable estimate for most data in LAS 1.2. You’ll want to bump that up by a couple bytes for newer LAS 1.4 files. Multiply by the number of points and you’re done (408 GB in our example above, neglecting the Web Mercator issue). If you estimate zipping will compress by a factor of 2 and that Web Mercator is a factor of 2 too high at this latitude, you’re looking at around 100 GB. Note that the little red triangle next to the point estimate lets you know it’s too big for point cloud processing, even though you could get the same area from the pre-made raster.
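
Here's that estimate spelled out; the 28 bytes per point, the 2:1 zip guess, and the factor-of-2 Web Mercator correction are the rules of thumb discussed above, not exact values:

```python
def las_download_estimate(n_points, bytes_per_point=28,
                          zip_ratio=2.0, mercator_factor=2.0):
    """Rough LAS download size: points times bytes per point, reduced by
    an assumed zip compression ratio and the Web Mercator over-count."""
    return n_points * bytes_per_point / zip_ratio / mercator_factor

n_pts = 15_671_389_276                 # point estimate from the DAV results
raw_gb = n_pts * 28 / 1024**3          # roughly 408 GB before corrections
est_gb = las_download_estimate(n_pts) / 1024**3
print(f"{int(raw_gb)} GB raw, {int(est_gb)} GB after corrections")  # 408, 102
```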

If you choose LAZ (compression using laszip), the simplest approach is to do the same estimate as for LAS, but assume a compression ratio of 7:1 instead of the 2:1 for zip. Even though we’ll zip the results, the zip will mostly serve to gather all the pieces and won’t add much compression. So, we’d estimate a file size of 29 GB. You should be able to get laszip at laszip.org, but it looks a little out of date as I write this. The site has good background, but the version there doesn’t handle LAS 1.4 fully. Instead, you can get the latest version as one of the free parts of LAStools.
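
The same arithmetic, swapping the 2:1 zip guess for an assumed 7:1 laszip ratio:

```python
n_pts = 15_671_389_276      # point estimate from the DAV results
bytes_per_point = 28        # typical LAS 1.2 point record
laz_ratio = 7.0             # assumed laszip compression ratio
mercator_factor = 2.0       # rough Web Mercator over-count at this latitude

laz_gb = n_pts * bytes_per_point / laz_ratio / mercator_factor / 1024**3
print(f"{laz_gb:.0f} GB")   # ~29
```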

Choosing ASCII currently throws away a lot of information. It only returns the X,Y,Z values. After compression with zip, it ends up in the same ballpark as the LAS files. In general, you’d be better off getting the LAS files and extracting what you want with the free las2txt from LAStools. Even better than that would be to get the LAZ files and do the same thing.

DEM or Raster

Choosing a raster output will generate a DEM from the point cloud. Exactly what you get will depend on the interpolation method chosen, particularly with regard to how much of the result has holes where there is no data. Some methods do a lot of filling, others don’t. File formats can also influence the result, as they have different compression capabilities. If an Imagine (*.img) file gets too large, it stores the data in an *.ige file that has no compression at all.

Screengrab of the DAV system illustrating where the metadata link is.

Generally, it’s going to be hard to estimate the size unless there is also a pre-made DEM available for the same dataset. In that case you can just look at what it estimated. If you choose a different cell size, you’ll need to account for that. One approach is to consider that the LAZ file is storing each point in approximately four bytes, the same number of bytes used per pixel in the raster. If you have an estimate of how many points there are per pixel of your output, you can make the adjustment. That kind of information should be in the metadata. To find the metadata, get the details on the dataset by clicking its title. There will be a metadata link, as shown above.

The metadata link is circled in red, but the information you’re looking for is also in the “metadata-lite” circled in green. In this particular case, the description also mentions 8 points per square meter. So, if we wanted a 1-meter raster and used the LAZ estimate of 29 GB, we’d get 3.6 GB. Not exactly the same as the DEM data set’s estimate, but in the ballpark.
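
The adjustment itself is just a division, since the LAZ file and the raster are both storing roughly four bytes per point or pixel. With the numbers from this example:

```python
laz_estimate_gb = 29.0      # LAZ estimate from the point-cloud section
points_per_m2 = 8.0         # nominal density from the metadata
cell_size_m = 1.0           # requested raster cell size

points_per_pixel = points_per_m2 * cell_size_m ** 2
dem_estimate_gb = laz_estimate_gb / points_per_pixel
print(f"{dem_estimate_gb:.1f} GB")   # ~3.6
```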

Contours

Finally, we come to estimating contours. Here, all bets are off. I’ve never found a good way to make this estimate because so much depends on the data itself. It’s also easy to make crazy, near-useless contours by contouring with all the points. That will make huge files, sometimes exceeding the 4 GB limit on a shapefile. It’s the contouring up and down trees that does it. On the other hand, you could contour only the ground points and set a contour interval higher than the default, resulting in much smaller files.

Kirk Waters

I’m a physical scientist at the NOAA Office for Coastal Management. In my spare time, when I’m not torturing co-workers, I try to fit in some technical work on lidar processing and distribution. I also try to figure out ways to improve the Digital Coast’s data offerings in general. Somewhere in the back of my head there are still a few brain cells that remember satellite ocean color, oceanographic field work, and something about the ozone hole.
