Getting Lidar Data for a Polygon


The Digital Coast provides a couple of different ways to get lidar data. You can use the Data Access Viewer (DAV) to draw a box around your area, or you can go to the bulk download and grab entire data sets or individual tiles. What’s missing is a way to get just the data that fits an odd shape, like a watershed. I’m going to show you a way to do that. This also works on the raster datasets, such as DEMs and imagery, but I’ll use lidar point clouds as an example.

Tile Index Shapefile

For our example, we’re going to look at a dataset covering Barrow, Clarke, Madison, and Oglethorpe Counties in Georgia, funded by Georgia DNR. You can get to the data through the DAV or the bulk download. On the bulk download page, you’ll see a link for the tile index. Grab that.

If you load the tile index into a GIS program, it should look something like the figure below. I’m using ArcMap for this, but QGIS and other GIS software should work fine too, though the details may differ.

Tile index for four counties of lidar are shown as red rectangles on top of a streets basemap.

URL field

That tile index has an important field called ‘URL’ in the attribute table. If we can get a list of the URLs for just the files we want, we can download them fairly easily.

Attribute table for the tile index.

Clipping polygon

Next we need a polygon for the area where we want data. You’ll probably have a meaningful area, like a watershed, but I’m just going to draw one as an example, roughly following some roads.

Now we want to clip out all the tiles from the tile index that fall within our polygon or polygons. Use your clip tool for this, with the tile index as the input layer and the polygon as the clip feature or overlay layer. In ArcMap, the clip tool is in the ArcToolbox under Analysis Tools -> Extract. In QGIS, it is under the Vector -> Geoprocessing Tools menu. My output was a shapefile called clippedtiles, which you can see below in blue.
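If you would rather do the clip from the command line, GDAL’s ogr2ogr tool can do the same job. Here’s a minimal sketch, assuming the tile index is in tileindex.shp and the drawn polygon is in clip_polygon.shp (both file names are just placeholders for whatever yours are called):

    ogr2ogr -clipsrc clip_polygon.shp clippedtiles.shp tileindex.shp

The -clipsrc option cuts the tile footprints at the polygon boundary, but since all we need is the URL attribute that each tile carries, that makes no difference for the download step.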

Text file of URLs

What I want now is all the URLs in the URL field of the attributes for my new shapefile. If I open the attribute table for the clipped tiles, I see that I only have 70 tiles instead of over 1500, which is much more manageable. The menu in the upper left of the attribute table dialog lets you export to a new table. You can save it as the default dbf type or pick something like csv.
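There is also a command-line route for this step: ogr2ogr can export a shapefile’s attribute table straight to CSV. A minimal sketch, assuming the clipped shapefile from the previous step is named clippedtiles.shp:

    ogr2ogr -f CSV clippedtiles.csv clippedtiles.shp

That writes one CSV row per tile, with the URL column included among the attributes.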

Selecting the table export function from the attribute table dialog.

At this point, we’ve got a file with all our URLs in it. It also has a few other columns we don’t care much about. There are lots of ways to grab just the column you want. One easy approach is to open the file in your favorite spreadsheet program, delete the columns other than URL, and save it as a text file. Unix-savvy folks can simply use the cut command. We won’t need the header, so you can get rid of that too. You could also just copy and paste the column you want into a text editor and save it as text. However you do it, the goal is to end up with a text file containing just the list of URLs you want.
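For the cut route, a minimal sketch might look like the following, assuming the table was exported as clippedtiles.csv and that URL happens to be the fifth column (check your file and adjust the field number to match):

    cut -d, -f5 clippedtiles.csv | tail -n +2 > urllist.txt

The cut command pulls out just that column and tail -n +2 drops the header row, leaving urllist.txt with nothing but URLs.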

Download with wget

Finally, we are ready to actually do the download. There are a few different ways you could do this. You could use the PowerShell Invoke-WebRequest command while looping over all the URLs. You could also do the same thing with curl. I think the easy route is to use wget. Wget comes standard on most Linux machines, but you’ll probably have to install it for Windows. PowerShell appears to have a wget, but it is just an alias for Invoke-WebRequest, so don’t be fooled. Here’s the simplest way to use wget, with urllist.txt as our file with the list of URLs.

 wget --input-file urllist.txt 

You’ll probably also want to put some controls on the level of directories that wget creates. By default, it makes a directory from the hostname and then all the directories in the URL. Use -nH to avoid the host directory and --cut-dirs=number to cut the additional directories. In our example, the default would result in a directory structure like coast.noaa.gov/htdata/lidar1_z/geoid12a/data/2617 and all the files would be in that final directory. To get rid of that part, you might use

wget --input-file urllist.txt -nH --cut-dirs=5
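If you would rather use curl, it doesn’t read a list file the way wget does, but pairing it with xargs gets the same result. A minimal sketch, again assuming your list is in urllist.txt:

    xargs -n 1 curl -O -L < urllist.txt

Here -O saves each file under its remote name in the current directory and -L follows any redirects, so you don’t get the mirrored directory tree that wget builds by default.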

Download with uGet

A similar route, but with a graphical interface, is to use the open source uGet program. The USGS National Map has a set of instructions for using uGet. While written with the National Map viewer in mind, the concepts are all the same once you have your list of URLs.

Download with Chrome Extension

If you use the Chrome browser, there is a simple mass downloader extension that you can use. I haven’t been able to try it out, as it is one of the many things blocked by my organization, but you should be able to copy your list of URLs to the clipboard, paste it into the extension, and download them. Other browsers may have ways to do similar downloads.

That’s all there is to it. Note that you’re just grabbing the full files that intersect your polygon(s). They aren’t being further clipped to only have the data within the polygon. I’m sure I’ve missed lots of ways to download a list of URLs, so feel free to add your favorites to the comments.

8 comments

  1. Very refreshing to have clear and concise instructions on downloading data from a government website.
    Thank you Kirk!

  2. Kirk,
    Thanks, this was very helpful. Linking to this post from the NOAA bulk download page was very much appreciated.

  3. […] That’s it, you’re all done. Whether you used uGet with the new URL list or good old wget or went the custom route, I hope you’ve gotten your lidar data. If you’ve got other suggestions that would make it easier to download the data, please let us know. If you only want the data within a polygon, you might want to check “Getting lidar data with a polygon“. […]

  4. Great, thanks. But after I download a bunch of tiles in the selected area, how could I combine the tifs into a big one in a quick way without dragging them one by one?

    • That’s a good question and might deserve a post of its own. The answer depends a bit on what software you have, but for a free and open source approach, you can use GDAL (https://gdal.org) and its command line tools. For example, given a bunch of tiffs in a directory, you could do:
      gdalbuildvrt combined.vrt *.tif
      which will quickly make an xml file that describes all the tiff files and acts like a single raster for other gdal commands. Then do:
      gdal_translate combined.vrt combined.tif -co BIGTIFF=IF_NEEDED
      which will combine all the files into one. The BIGTIFF option is to make sure you don’t have a problem if the result is over 4 GB. If you know you won’t exceed that limit, you don’t need that option.

  5. With uGet is there a way to prevent overwriting existing files in the download folder? I’ve tried Skip Existing URIs, but that just adds a .0 extension to the downloaded file name. --no-clobber would be nice!
