Lidar has been collected for more than 15 years now along the coast. The earliest data sets the NOAA Office for Coastal Management has out there are from 1996 (so yeah, 18 years) and the same areas continue to be collected. There are, of course, reasons for this, the primary ones being coastal storms, beach/dune change, and sediment transport. For these dynamic processes, few techniques can compete with lidar.
So, what about the non-dynamic or very subtly dynamic areas along the coast? I would include uplands, developed areas, and marshes to some degree in this group. We have tons of data about these areas and they should not have changed much.
The question I am asking (and hopefully answering) is: Do we have, in essence, billions of ground control points now, and can we begin to really assess the accuracy of future collections without hitting the field for 2 weeks to collect 150 random, expensive, and extremely widely spaced points? Yeah, each collection has a unique bias and error distribution and could not, by itself, give us a fair one-to-one comparison with another data set at the level of detail we are looking at. But what if we had three or more data sets that we could use to look at the ranges, averages, standard deviations, etc. to compare to? And could lidar collection firms, in turn, use this same data to ‘control’ their data? If so, there is money to be saved here (and is it being done already?).
I am not, however, proposing or advocating (at this point) the use of all points in several data sets, although I suppose that is possible. Rather, I mean the points in those non-dynamic and flat areas (parking lots, playing fields, and reasonably vegetated areas) that we already target for ground control.
As a test of the process, I have examined a small area affected by Hurricane/Super Storm Sandy on Long Island (Figure 1). This area has a long history of data, but I used data from 2012, 2011, 2010, and 2007 to assess the accuracy of a Post-Hurricane Sandy data set (2012). As a matter of kinda knowing the answer before starting, the metadata for the Post-Sandy data states that it was collected to a 12.5 cm RMSE vertical accuracy (no mention of land cover), but was measured at 4 cm RMSE from ground points with no bias. As a side note, we see this a lot: the metadata accuracy assessment reports both the collection specification and the measured value. Which should you use? Well, hopefully this will shed some light on that as well.
The idea is to use the existing data sets (2012, 2011, 2010, and 2007) to find the average elevation, standard deviation, and ultimately a ‘standard error of the mean’ for a series of points on a flat surface (parking lot), a scrubby forested area, and a marsh. This is the ‘truth’ part. The Post-Sandy data can then be compared against it to define the residual, the z-score of the residual, and error estimates (RMSE). So for each land cover/patch there will be 100 to 200 points describing the Post-Sandy data, which is orders of magnitude more than any ground control points (GCPs) used in the area (lucky if there are any in the area). The goal is to use the residuals to define the bias, an average z-score to describe how well the data fit with the other data sets, and an RMSE to define the accuracy (this being the big one).
OK, the pink elephant in the corner is the idea that the three or four data sets accurately capture the ‘true elevation.’ The easy answer is: they don’t, but then neither do standard GCPs. GCPs are generally collected to have accuracies (RMSE) of around 2 to 5 cm, but we accept them as ‘truth.’ In the early days (ca. 2006) GCPs were accepted when their error was at least three times smaller than the required accuracy of the data being tested, and I guess that still holds (e.g., GCPs must have a 3 cm RMSE or better for a lidar data set collected to a 9 cm RMSE accuracy specification).
Let’s use the bare earth (BE) points as an example. There are 159 ground control points in the parking lot (15 m spacing; Figure 1); an elevation was calculated using bilinear interpolation for each BE point from existing 3 m DEMs (a total of 3 DEMs). From those 3 values, an average elevation and standard deviation were computed at each point (Table 1). Because this is being considered a ‘sample’ of the population (although I am having a hard time defining the population of an elevation at a point), the Standard Error of the Mean (SEM) was also computed. The SEM is a nod to the pink elephant: the average is not the real ‘truth’ but an estimate of it, and the SEM describes the range that estimate could fall in.
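The per-point bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny grids and elevations are made up, the `bilinear` helper takes fractional row/column indices (the conversion from map coordinates to grid indices is left out), and the standard deviation is the sample (n - 1) form, which the post does not specify.

```python
import math
import statistics

def bilinear(grid, row, col):
    """Bilinearly interpolate a grid value at fractional (row, col) indices."""
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1 = min(r0 + 1, len(grid) - 1)        # clamp at the grid edge
    c1 = min(c0 + 1, len(grid[0]) - 1)
    fr, fc = row - r0, col - c0            # fractional offsets inside the cell
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr

def point_stats(elevations):
    """Return (mean, sample standard deviation, SEM) for one point."""
    mean = statistics.mean(elevations)
    std = statistics.stdev(elevations)     # assumption: sample (n - 1) form
    sem = std / math.sqrt(len(elevations))
    return mean, std, sem

# Hypothetical 2 x 2 windows (metres) from three existing DEMs around one point.
dems = [
    [[2.30, 2.32], [2.34, 2.36]],
    [[2.33, 2.35], [2.37, 2.39]],
    [[2.27, 2.29], [2.31, 2.33]],
]
values = [bilinear(dem, 0.5, 0.5) for dem in dems]
mean, std, sem = point_stats(values)
print(f"mean={mean:.3f} m  std={std:.3f} m  sem={sem:.3f} m")
```

In practice the sampling step would normally be done with a GIS or raster library; the hand-rolled helper is here only to keep the example self-contained.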
[table id=3 /]
This is an important point, so I will dwell on it for a bit. For a hypothetical example, let’s assume that the data set being tested matches the average value at all 159 points (extremely unlikely). The computed residuals would give an RMSE of 0 (perfect) but, out of respect for the pink elephant, the RMSE_SEM that accounts for the SEM (Equation 1) would, using all 159 points, come out at 2.5 cm. I think this makes sense, since no one would say “yeah, the data are perfect” based on a comparison to the average value. However, and I feel compelled to add this, if every point were exactly the SEM away from the average, it would yield an RMSE_SEM of 0 (but an RMSE of 2.5 cm). Regardless, I am more concerned about the issue of the average not being the real truth, which is the first hypothetical situation.
$$\mathrm{RMSE}_{\mathrm{SEM}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\left|\mathrm{Residual}_i\right| - \mathrm{SEM}_i\right)^{2}} \qquad (1)$$
I hope that makes sense. In all real-world situations it is basically taking the absolute value of the residual and subtracting the SEM to arrive at a corrected residual; call it Residual_SEM. This value is then plugged into the standard formula for determining the RMSE.
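Here is a minimal sketch of Equation 1 next to the ordinary RMSE, replaying the two hypothetical cases described above. The four-point arrays and the 2.5 cm SEM are made-up stand-ins for the 159 real points, and the function names are mine.

```python
import math

def rmse(residuals):
    """Standard root mean square error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def rmse_sem(residuals, sems):
    """Equation 1: RMSE of the SEM-corrected residuals, |residual_i| - SEM_i."""
    corrected = [abs(r) - s for r, s in zip(residuals, sems)]
    return rmse(corrected)

sems = [0.025, 0.025, 0.025, 0.025]            # hypothetical SEM of 2.5 cm everywhere

# Case 1: the test data match the average at every point.
perfect = [0.0, 0.0, 0.0, 0.0]
print(rmse(perfect), rmse_sem(perfect, sems))  # RMSE is 0, RMSE_SEM is about 2.5 cm

# Case 2: every point sits exactly one SEM away from the average.
one_sem_off = [0.025, -0.025, 0.025, -0.025]
print(rmse(one_sem_off), rmse_sem(one_sem_off, sems))  # RMSE about 2.5 cm, RMSE_SEM is 0
```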
One other stat that is not normally calculated is the z-score. This is simply the residual divided by the standard deviation at each point, and it helps to determine how well the data set fits in with the existing data sets. The z-scores of all the points are then averaged to provide a mean z-score for the patch being compared.
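As a rough sketch, the z-score step might look like the following. It assumes the residual is the test elevation minus the control average (so a data set that reads low gives a negative z-score), and the elevations are invented for the example.

```python
import statistics

def z_scores(test_elev, control_mean, control_std):
    """Per-point z-score: (test - control mean) / control standard deviation."""
    # Assumption: every control_std is non-zero; points where the control
    # data sets agree exactly would need separate handling.
    return [(t - m) / s for t, m, s in zip(test_elev, control_mean, control_std)]

# Hypothetical values (metres) for two points in one patch.
test_elev = [2.20, 2.38]
control_mean = [2.30, 2.42]
control_std = [0.05, 0.04]

z = z_scores(test_elev, control_mean, control_std)
mean_z = statistics.mean(z)
print(f"mean z-score: {mean_z:.2f}")
```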
The results are shown in Table 2 and, in general, agree with the metadata’s stated accuracy assessment. The fundamental vertical accuracy (FVA) is in the 5 cm range (quoted at 4 cm) and the consolidated vertical accuracy (CVA, using RMSE_SEM since we have lots of points) is in the 8 cm range. The bad part is that the data are biased, based on bare earth, by close to 5 cm in this area (Figure 2). So, while the data meet the specs quite easily, it would not be good to compare them directly with the other data sets (Figure 2). A z-score of -2.1 would further indicate that the data are statistically different (lower in this case) at the 95% confidence level.
[table id=4 /]
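For reference, the 95% reading of the z-score can be checked with a two-sided threshold from the standard normal distribution. This sketch treats the patch's mean z-score as a single standard normal value, which is the interpretation used in the text rather than a formal test, and the function names are hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def differs_at(z, confidence=0.95):
    """True if |z| exceeds the two-sided critical value."""
    return abs(z) > critical_z(confidence)

print(round(critical_z(), 2))   # about 1.96
print(differs_at(-2.1))         # the bare earth z-score quoted above
```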
Some finer points can also be taken from all this info. It suggests that the marsh was probably denser or taller than when it was flown in the past, as shown by the difference in bias between the marsh and the other two land covers. The large difference between the Post-Sandy and Control data in low vegetation suggests that point classification routines can have large effects on outcomes. The z-score for the bare earth is more worrisome, and I would be careful about using the data in this area to compute a volume (see Beach Volume Blog). If I were using these data just at this study area (micro scale) I would, however, feel confident in adjusting the data upwards with a global shift. The same analysis can also be done on a macro scale, at which point one could create a spatial RMSE_SEM layer to better assess any large-scale patterns.
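The global shift mentioned above amounts to removing the mean residual (the bias) and re-checking the error. A minimal sketch with invented residuals, again assuming a residual is the test elevation minus the control average:

```python
import math
import statistics

def rmse(residuals):
    """Standard root mean square error of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Hypothetical bare earth residuals (metres): the test data read about 5 cm low.
residuals = [-0.06, -0.04, -0.05, -0.05]

bias = statistics.mean(residuals)            # negative means the data read low
shifted = [r - bias for r in residuals]      # same as raising the surface by -bias

print(f"bias: {bias * 100:.1f} cm")
print(f"RMSE before shift: {rmse(residuals) * 100:.1f} cm")
print(f"RMSE after shift:  {rmse(shifted) * 100:.1f} cm")
```

After the shift, what remains in the RMSE is only the scatter around the bias, which is why a consistent offset is much easier to live with than noisy data.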
The negative and positive biases within a couple hundred yards of each other point to another practical issue: lidar data in coastal or other areas can be biased upwards or downwards. So when looking at the uncertainty of flooding, it should be calculated on both sides of the ‘inundation’ line (e.g., the chance of an area being shown as flooded when it is not is as real as the chance of it flooding without being mapped as flooded).
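One way to picture the two-sided uncertainty is to turn the elevation error into a probability that a cell really sits below the water level. The sketch below is an assumption-heavy illustration, not a method from this analysis: it treats the vertical error as normally distributed with a standard deviation equal to the RMSE, and the function name and numbers are made up.

```python
from statistics import NormalDist

def prob_below_water(dem_elev, water_level, sigma, bias=0.0):
    """Probability that the true ground elevation is below the water level.

    Assumes normally distributed vertical error (an assumption, not a result).
    bias is the mean residual (test minus control), so a negative bias
    means the DEM reads low and the true ground is likely higher.
    """
    true_elev = NormalDist(mu=dem_elev - bias, sigma=sigma)
    return true_elev.cdf(water_level)

water = 1.00   # hypothetical water level (metres)
sigma = 0.05   # hypothetical 5 cm vertical error

# A cell mapped 10 cm above the line can still flood, and a cell mapped
# 10 cm below the line can still stay dry, with the same small chance.
print(prob_below_water(1.10, water, sigma))       # mapped dry, chance it floods
print(1 - prob_below_water(0.90, water, sigma))   # mapped flooded, chance it stays dry
```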