There are some new ways to access lidar data that should be a boon to many of you. Today I’m going to talk about how we’re supporting that access, how you can find the data, and some caveats. A major issue with typical lidar repositories is that you need to pull down many gigabytes of data before you can start to look at it and determine if it’s what you need. There are now easier ways.
Entwine Point Tiles (EPT)
The main new mechanism to show is called Entwine Point Tiles, and you can read a lot more about it on the main page at entwine.io. For our purposes, the main thing is that the data is organized so that it can reasonably be streamed over the internet, pulling only the points you need. Much like an image service sends back an image at a resolution suited to your view, an EPT dataset can be queried to return a subset of the points that gives you a representation of the area. As you zoom further in, you request higher and higher densities.
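To make the "zoom in for more density" idea a bit more concrete, here is a minimal Python sketch of the octree addressing that EPT uses, as I understand the specification on entwine.io: each node has a key of the form depth-x-y-z, the root is 0-0-0-0, and every step down splits the cube in half along each axis. The function names and the root cube are made up for illustration.

```python
from typing import List, Tuple

Key = Tuple[int, int, int, int]  # (depth, x, y, z)


def child_keys(key: Key) -> List[Key]:
    """Return the eight child keys of an octree node."""
    d, x, y, z = key
    return [
        (d + 1, 2 * x + i, 2 * y + j, 2 * z + k)
        for i in (0, 1)
        for j in (0, 1)
        for k in (0, 1)
    ]


def node_bounds(key: Key, root_bounds: List[float]) -> List[float]:
    """Bounds [xmin, ymin, zmin, xmax, ymax, zmax] of a node inside a cubic root."""
    d, x, y, z = key
    xmin, ymin, zmin, xmax, _, _ = root_bounds
    side = (xmax - xmin) / (2 ** d)  # the cube edge halves at every depth
    return [
        xmin + x * side, ymin + y * side, zmin + z * side,
        xmin + (x + 1) * side, ymin + (y + 1) * side, zmin + (z + 1) * side,
    ]


def key_name(key: Key) -> str:
    """Format a key as depth-x-y-z."""
    return "-".join(str(v) for v in key)


root = (0, 0, 0, 0)
cube = [0.0, 0.0, 0.0, 1024.0, 1024.0, 1024.0]  # made-up root cube

print([key_name(k) for k in child_keys(root)])
print(node_bounds((2, 3, 0, 1), cube))
```

A viewer that is zoomed far out only needs the shallow nodes. Zooming in means fetching deeper nodes, but only the ones whose bounds overlap the view.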
A dataset in EPT will contain a lot of files, just like an imagery tile cache does. We’ll only care about one file though, and that’s the ept.json file that describes all the rest. It’s just a text file and it’s human readable.
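Since the ept.json file is plain JSON, it is easy to inspect with a few lines of Python. The sketch below parses a trimmed, made-up example and pulls out the fields I find most useful. The key names (points, boundsConforming, schema, srs) follow the EPT specification as I understand it, while the values and the summarize_ept helper are purely illustrative.

```python
import json

# A trimmed, made-up ept.json for illustration only. A real file is fetched
# from the dataset's URL and lists many more dimensions in "schema".
SAMPLE_EPT_JSON = """
{
  "bounds": [0, 0, -512, 1024, 1024, 512],
  "boundsConforming": [10, 20, 5, 900, 800, 120],
  "dataType": "laszip",
  "hierarchyType": "json",
  "points": 1234567,
  "schema": [
    {"name": "X", "type": "signed", "size": 4, "scale": 0.01, "offset": 0},
    {"name": "Y", "type": "signed", "size": 4, "scale": 0.01, "offset": 0},
    {"name": "Z", "type": "signed", "size": 4, "scale": 0.01, "offset": 0},
    {"name": "Classification", "type": "unsigned", "size": 1}
  ],
  "span": 128,
  "srs": {"authority": "EPSG", "horizontal": "3857"},
  "version": "1.0.0"
}
"""


def summarize_ept(meta: dict) -> dict:
    """Pick out the point count, tight extent, coordinate system, and dimension names."""
    srs = meta.get("srs", {})
    authority = srs.get("authority")
    code = srs.get("horizontal")
    return {
        "points": meta["points"],
        "extent": meta["boundsConforming"],
        "srs": f"{authority}:{code}" if authority and code else None,
        "dimensions": [dim["name"] for dim in meta["schema"]],
    }


summary = summarize_ept(json.loads(SAMPLE_EPT_JSON))
for field, value in summary.items():
    print(f"{field}: {value}")
```

For a real dataset, you would load the text from the ept.json address instead of the sample string.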
Software for Viewing
Potree can do more than just view the data. It can also query the data, extract profiles, and make measurements.
As an example, the 2016 Connecticut Statewide collection with 65 billion points displays in a few seconds and looks like the image below:
In that image, only 2 million points are streamed and displayed. If I zoom in quite a bit, I can look at the buildings in Hartford and extract a profile (image below). The profile is drawn using the measure tool that looks sort of like an ‘M’. After selecting it, click once to put a red dot at the start of your profile and then drag the dot to the end. At that point, you’ll wonder how to view the profile because nothing clues you in. Open the menu (upper left with three horizontal bars), then open the Scene submenu and click on the Profile listed under Measurements. There will be a “show 2d profile” button in the information that pops up. Click that.
In the menu, you’ll also find a lot of other settings that may be useful to you, including changing how you navigate and how things are colored. Under filters you can turn on and off the different point classifications.
The viewer can also color the points by different attributes. While I’ve shown it colored by elevation, you could choose intensity or classification or something like that. Look under the Scene submenu and click on the Point Cloud you want (there may be more than one), then look for the Attribute section for your choices.
QGIS is a full-blown open source GIS package that is well worth checking out. As of version 3.20, QGIS understands the EPT format. While the ability to do analysis with the data is still under development, the ability to view the points is there. If you open the Data Source Manager (control-L is the shortcut), you can select Point Cloud from the list on the left and then the Protocol selection under Source Type. Enter the address of the ept.json file as the URI under Protocol.
After you hit the add button, the points will come into the map panel. One advantage of QGIS over Potree is that you can put a basemap underneath to more easily see where you are. The screenshot below has the Connecticut points again with OpenStreetMap below them.
I confess that I haven’t worked with QGIS nearly enough at this point to give a proper demonstration of its abilities, but I think this is very promising.
Where’s the Data?
While it’s certainly great that there is software that can stream the points and let you explore without downloading tens to hundreds of gigabytes or more of data, there needs to be data out there to stream. Luckily there is!
Both NOAA and USGS have made a lot of lidar point clouds available. I’ll outline how you can find those ept.json files, along with links to a 3D viewer like Potree with the ept.json embedded.
We’re going to look at NOAA first because I work for NOAA. The main approach we’ve taken is to leverage our Data Access Viewer (DAV) to include the links in the detailed information about each dataset. Once you do a search, you can click on the dataset title and it will open the details pane, as in the example below. Under the related links are both the Potree, listed as 3D Viewer (Potree NOAA), and the EPT link (EPT NOAA).
As you probably noticed, we also have links to the USGS EPT and Potree viewer. In general, we’ve tried not to duplicate datasets, but some are duplicated. Those are often the datasets where USGS has broken them into multiple EPT locations and NOAA made one large one.
Using the DAV isn’t the only way to find them. If you prefer tables instead of maps, there is a listing of all the NOAA Digital Coast lidar data holdings with columns for the EPT and Potree links.
The NOAA EPT datasets are stored as AWS public datasets. More information and the S3 bucket address can be found at https://registry.opendata.aws/noaa-coastal-lidar/. They are stored there under the directory “entwine”. We are also putting the LAZ files in orthometric heights under “laz”. Both sets use an ID for the directory naming to separate the collections. That ID is one of the columns in the lidar data holdings table, so you can figure out the names for each dataset. Since they are in cloud storage, this also means that you could derive your own products from the data without having to move lots of bytes over the wire, but you would need to pay the compute costs. As a public dataset, you don’t have to pay egress costs if you do download the data.
Like the NOAA lidar data, the USGS 3DEP data is also on AWS and was there first. We put the NOAA data on AWS because that’s where the USGS data is and we hoped to make computing easier for anyone needing data from both. There is an easier, map-based, way to find the data though. The folks at Hobu, who created the open source entwine software, also stood up the USGS data as EPT and host a mapping web site for it at usgs.entwine.io. You can simply click on a polygon and get the links to both Potree and Cesium viewers. There is also a table with all the names, links, and point counts below the map.
There are some other ways to get to the USGS 3DEP data that include more information about the datasets, including links to the metadata, such as their Work Extent Spatial Metadata geopackage file (WESM.gpkg on the USGS website or somewhere in the AWS s3 bucket above).
US Interagency Elevation Inventory
The EPT and Potree links for both NOAA and USGS have also been added to the US Interagency Elevation Inventory, which should aid in your one-stop shopping. However, it does not get updated as often as the Data Access Viewer or the usgs.entwine.io site, so there are sure to be some links missing. There are also some issues we need to resolve in matching the USGS work units used for the EPT publication with the USGS projects used in the elevation inventory. Thus, you’ll often see several USGS EPT links for one dataset in the elevation inventory. These can be combined in the Potree link and we’ve tried to do that for you. Those same links are in the Data Access Viewer for coastal datasets. Finding the links in the elevation inventory may seem a bit hit-or-miss because the inventory has a lot more datasets than just the USGS 3DEP or the NOAA holdings, so only about half of the datasets, or fewer, have those links. Clicking on a dataset name in the search results will bring up the details with a row for the EPT and 3D viewer links.
More Than Viewing?
I pointed out some ways to view and explore the data, but what if you need to do some computations, perhaps to create an elevation model or a viewshed? You can use those ept.json files and still only stream the points that you need. You can do that using the Point Data Abstraction Library (PDAL). Like the better-known GDAL for raster work, PDAL has lots of ready-to-use tools already built on top of the library, so you don’t need to code to use it. Look for readers.ept to stream the data. PDAL is also an open source project from Hobu. You don’t have to build it yourself either. It’s installable using Conda and there’s a Docker image too.
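As a rough idea of what that looks like, here is a minimal Python sketch that writes out a PDAL pipeline for making a ground elevation model from a small window of an EPT dataset. The stage names (readers.ept, filters.range, writers.gdal) and their options are from the PDAL documentation. The URL, the bounds, the output names, and the build_dem_pipeline helper are placeholders of my own. The bounds have to be in the dataset's own coordinate system.

```python
import json


def build_dem_pipeline(ept_url, bounds, out_tif, cell_size):
    """Build a PDAL pipeline that streams one window of an EPT dataset into a raster.

    bounds is (xmin, ymin, xmax, ymax) in the dataset's coordinate system.
    """
    xmin, ymin, xmax, ymax = bounds
    return {
        "pipeline": [
            {
                "type": "readers.ept",
                "filename": ept_url,
                # PDAL writes bounds as ([xmin, xmax], [ymin, ymax])
                "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
            },
            # Keep only class 2, which is ground in the ASPRS scheme
            {"type": "filters.range", "limits": "Classification[2:2]"},
            {
                "type": "writers.gdal",
                "filename": out_tif,
                "resolution": cell_size,
                "output_type": "mean",
            },
        ]
    }


# Placeholder URL and window: swap in a real ept.json address and real bounds.
pipeline = build_dem_pipeline(
    "https://example.com/some-dataset/ept.json",
    (0, 0, 100, 200),
    "dem.tif",
    1.0,
)
pipeline_text = json.dumps(pipeline, indent=2)
print(pipeline_text)
```

Saving the printed JSON to a file such as dem.json and running `pdal pipeline dem.json` should then pull only the points inside that window.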
At least a couple of sites already leverage PDAL to work with the data in EPT format. OpenTopography allows people in the education community to use their processing engine to access the USGS 3DEP data in EPT format and create products. The NOAA Data Access Viewer can also do that, although only coastal datasets have been added and far fewer of them. More will be added to reduce storage. USGS’s Lidar Explorer also has some limited ways to view and process the EPT files via PDAL pipelines.
PDAL isn’t the only option either. Safe FME has EPT support, and there is a Python library for manipulating and fetching EPT data at https://pypi.org/project/ept-python/. I expect there will be more options to come.
As an example of what you could do that would be quite hard without EPT, I recently had someone from a major bank wanting to know if there was a way to get the elevations to match his millions of lat lon points around the country. There isn’t a service to do that with the lidar data currently and even if an agency built one, it would probably only return data from their holdings. However, you could take all those ept.json URLs and build a system to do it by making requests to stream the data in a small area around each point. I don’t think it would be too hard, but I’ll leave that for another day.
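To give a flavor of the pieces involved, here is a rough Python sketch of two of them: building a small query window around a point, and estimating an elevation from whatever points come back. Inverse distance weighting is just my choice for the illustration, not something any of the services above use. The sketch assumes the point has already been converted from lat lon into the dataset's coordinate system, and it skips the harder parts, such as working out which ept.json covers each point and actually streaming the data.

```python
import math
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the dataset's coordinate system


def query_box(x: float, y: float, half_width: float) -> str:
    """Bounds string, in the ([xmin, xmax], [ymin, ymax]) form PDAL uses, around (x, y)."""
    return (
        f"([{x - half_width}, {x + half_width}], "
        f"[{y - half_width}, {y + half_width}])"
    )


def idw_elevation(
    x: float, y: float, points: Iterable[Point], power: float = 2.0
) -> Optional[float]:
    """Inverse-distance-weighted elevation at (x, y), or None if there are no points."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, pz in points:
        dist = math.hypot(px - x, py - y)
        if dist == 0.0:
            return pz  # a lidar point sits exactly on the query location
        weight = 1.0 / dist ** power
        weighted_sum += weight * pz
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# Made-up points standing in for what a streamed request might return.
nearby = [(1.0, 0.0, 10.0), (-1.0, 0.0, 20.0)]
print(query_box(100.0, 200.0, 5.0))
print(idw_elevation(0.0, 0.0, nearby))
```

In a real system you would probably also filter to ground points first, so that buildings and trees don't end up in the answer.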
LAS 1.4 Issue
There is one issue you should be aware of when using data stored as EPT. While the conversion to EPT is primarily one of data organization to make the streaming possible, it also converts to LAS 1.2 if the data was originally in LAS 1.4. That might not mean much to you and it usually isn’t a big problem. An aspect that is worth noting is that point classes over 31 are stored in a way that can cause issues. In LAS 1.2, a single byte is used to store both the point class (5 bits or 0-31) and some class flags (3 bits for things like withheld). In LAS 1.4, the point class gets the full byte and can be 0-255. In the EPT conversion of a LAS 1.4 file, the class flags are stored as an extra bytes field and the point class byte is stored as is. That could be okay if the software reading it expected to read 8 bits for the class instead of 5, but since the data is LAS 1.2 now, that’s unlikely to be the case.
Most of the classes you’ll find in the USGS or NOAA data are under 32, so it isn’t much of an issue, but the topobathy domain profile puts the bathymetry related points in classes in the 40s. As an example, 40 is the class number used for bathymetric points if the domain profile is followed. If you only look at the first five bits, a 40 looks like an 8 and that’s what Potree will think the point class is. This can cause two different point classes to be interpreted as the same class.
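The arithmetic is easy to check. This small Python sketch splits a classification byte the way a LAS 1.2 reader would, with the low five bits as the class and the upper three bits as flags. The read_as_las12 name is just for this example.

```python
CLASS_MASK = 0b00011111  # low five bits hold the point class in LAS 1.2


def read_as_las12(class_byte: int):
    """Split a classification byte into (class, flags) the way a LAS 1.2 reader would."""
    point_class = class_byte & CLASS_MASK
    flags = class_byte >> 5  # upper three bits: synthetic, key-point, withheld
    return point_class, flags


for stored in (8, 40, 41, 45):
    seen_class, flags = read_as_las12(stored)
    print(f"stored {stored:3d} -> read as class {seen_class:2d}, flag bits {flags:03b}")
```

A stored 8 and a stored 40 both come back as class 8. The only difference is that the 40 also appears to have a flag bit set.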
Note that there isn’t currently an easy way out of this. One of the reasons the entwine software converts to LAS 1.2 is because that’s the format the various readers could handle. For example, Potree doesn’t understand LAS 1.4.
This does give us an issue in the NOAA front-end to the Potree viewer. For each of our ept.json files, I made a side-car meta.json file that has the dataset title and the correct class scheme for that dataset. The class scheme often doesn’t match the default ASPRS LAS 1.4 class scheme that Potree and other software usually assume. In my opinion, that’s pretty useful to have, but if I marked class 40 as being bathymetry, you may notice that you can’t turn on and off the bathymetry points. That’s because Potree thinks all those points are 8s, not 40s. In fact, if you turn off all the classes, you may still see points and they will have class numbers that aren’t supposed to exist.
Cloud Optimized Point Cloud Format
This isn’t a reality yet, but there is work on a cloud optimized point cloud format (COPC). The idea is to essentially put the organization used by EPT into a single LAZ file. The COPC would use LAS 1.4. Since it would be a LAZ file, many readers would already be able to handle it with no change. The organizational information that allows efficient streaming would be in an extended variable length record that could be used by a reader that knows to look for it or ignored by other readers. There’s no software available to build COPC files yet, but I’m sure there will be. I’m expecting that this is the direction we’ll be moving in as it develops. Since it’s still early days for COPC and they’re working on the specification, this is also a good time to get engaged and add your views on what we need for the future.
Well, this turned out to be pretty long. Much of this has been a long time in development and there’s a lot of interconnected stuff. There are still flaws and glitches to work out. The main message is that we’re starting to move past the point where you had to download a lot of data if you wanted to look at lidar point clouds.