Quantifying terrestrial ecosystem biomass is essential for monitoring carbon stocks and fluxes within the global carbon cycle and for optimizing natural resource management. Point cloud data, such as those from lidar and structure from motion, can be effective for quantifying biomass over large areas, but significant challenges remain in developing models that support such predictions. Inference models that estimate biomass from point clouds are established in many environments, yet they are often scale-dependent: they must be fitted and applied at the same spatial scale and grid size at which they were developed. Furthermore, training such models typically requires large in situ datasets that are often prohibitively costly or time-consuming to obtain. Here, we present a novel scale- and sensor-invariant framework for efficiently estimating biomass from point clouds. Central to this framework is a new algorithm, which we term Assign Points To Existing Clusters (APTEC), developed for finding matches between in situ data and clusters in remotely sensed point clouds. This algorithm can be used for assessing canopy segmentation accuracy and for training and validating machine learning models that predict biophysical variables. We demonstrate the algorithm's efficacy by using it to train a random forest model of aboveground biomass in a shrubland environment in southern Arizona. We show that learning a nonlinear function to estimate biomass from segmented canopy features reduces error, especially in the presence of inaccurate clusterings, compared with a traditional, deterministic technique for estimating biomass from remotely measured canopies. Importantly, our model, a random forest trained on cluster features, extends established methods of training random forest regressions to predict the biomass of subplots, but it requires significantly less training data and is scale-invariant. 
Evaluated on all test data in leave-one-out cross-validation, this model reduced mean absolute error on terrestrial lidar data by 41% relative to deterministic mesquite allometry and by 36.2% relative to the inferred ecosystem-state allometric function. Our best-performing model reduced mean absolute error across all data sources by 22.5%. Our framework should allow biomass to be inferred more efficiently than with common subplot methods and more accurately than with individual tree segmentation methods in vegetated environments where accurate segmentation of individual plants is difficult.
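The evaluation protocol described above (leave-one-out cross-validation, mean absolute error, and percent reduction relative to a baseline) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names are hypothetical, and a one-parameter least-squares fit stands in for the random forest so that the example needs only the standard library. It is not the thesis implementation and does not reproduce any reported result.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope through the origin.

    Assumption: this simple fit is only a stand-in for a learned model
    such as a random forest.
    """
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("cannot fit a slope when all features are zero")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


def leave_one_out_predictions(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Predict each sample with a model fitted on all the other samples."""
    xs, ys = list(xs), list(ys)
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # hold out sample i
        train_y = ys[:i] + ys[i + 1:]
        slope = fit_slope(train_x, train_y)
        predictions.append(slope * xs[i])
    return predictions


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percent reduction in error relative to a baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical toy values, invented for illustration only (not field data).
canopy_feature = [1.0, 2.0, 3.0, 4.0]
observed_biomass = [2.0, 4.0, 6.0, 8.0]

# Fixed, deterministic baseline with an assumed coefficient of 1.5.
baseline_predictions = [1.5 * x for x in canopy_feature]
baseline_mae = mean_absolute_error(observed_biomass, baseline_predictions)

# Cross-validated error of the fitted stand-in model.
model_mae = mean_absolute_error(
    observed_biomass,
    leave_one_out_predictions(canopy_feature, observed_biomass),
)
reduction_percent = relative_reduction(baseline_mae, model_mae)
```

Because each prediction comes from a model that never saw the held-out sample, the resulting error is an out-of-sample estimate, which is what makes comparison against a fixed allometric baseline meaningful.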
Hendryx S.M. (2017): Quantifying biomass from point clouds by connecting representations of ecosystem structure. MS Thesis, Natural Resources, University of Arizona, Tucson, Arizona, 58 pp.