A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat

Bibliographic Details
Main Authors: Syed Haleem Shah, Yoseline Angel, Rasmus Houborg, Shawkat Ali, Matthew F. McCabe
Format: Article
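Language: English
Published: MDPI AG 2019-04-01
Series: Remote Sensing
Subjects: chlorophyll; wheat; photosynthetic pigment; linear regression; vegetation indices; hyperspectral; leaf; retrieval; prediction
Online Access: https://www.mdpi.com/2072-4292/11/8/920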
id doaj-713e8c90c6e34a68843852ef6681b30d
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
title A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat
publisher MDPI AG
series Remote Sensing
issn 2072-4292
publishDate 2019-04-01
description Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods, such as direct tissue analysis or handheld meters, cannot capture chlorophyll variability beyond point scales, so they are of limited use for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlₜ), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlₜ was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlₜ and the selected vegetation indices. Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlₜ with an RMSE of 5.49 µg·cm⁻² and an R² of 0.89. Model accuracy improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 µg·cm⁻², depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. This reduction had no significant effect on the RMSE, showing that much the same prediction accuracy could be obtained with a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise of using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlₜ from hyperspectral data.
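The benchmark step mentioned in the description (a univariate regression of measured chlorophyll on a single vegetation index, scored by RMSE and R²) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the choice of a generic normalized-difference index, and all sample values are hypothetical and are not taken from the study. The random forest itself is not reproduced here; in a fuller pipeline it would take the place of the single-index line fit.

```python
import math


def normalized_difference(r_a, r_b):
    """Generic normalized-difference index (r_a - r_b) / (r_a + r_b)."""
    return (r_a - r_b) / (r_a + r_b)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical samples: (reflectance in two bands, chlorophyll in µg·cm⁻²).
# These numbers are invented for illustration and are NOT data from the study.
samples = [
    ((0.45, 0.38), 18.0),
    ((0.47, 0.36), 27.5),
    ((0.50, 0.35), 33.0),
    ((0.52, 0.33), 41.0),
    ((0.55, 0.32), 47.5),
]

index_values = [normalized_difference(a, b) for (a, b), _ in samples]
chl = [c for _, c in samples]

slope, intercept = fit_line(index_values, chl)
predicted = [slope * v + intercept for v in index_values]

print(f"RMSE = {rmse(chl, predicted):.2f} µg·cm⁻²")
print(f"R²   = {r_squared(chl, predicted):.3f}")
```

Scoring every candidate index this way gives the per-index linear regression statistics against which a multivariate model's variable importances could then be compared.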
topic chlorophyll
wheat
photosynthetic pigment
linear regression
vegetation indices
hyperspectral
leaf
retrieval
prediction
url https://www.mdpi.com/2072-4292/11/8/920
_version_ 1716813059784179712
spelling doaj-713e8c90c6e34a68843852ef6681b30d
2020-11-24T20:46:16Z
eng
MDPI AG
Remote Sensing
2072-4292
2019-04-01
Volume 11, Issue 8, Article 920
DOI 10.3390/rs11080920
rs11080920
A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat
Syed Haleem Shah: Hydrology, Agriculture and Land Observation Group, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
Yoseline Angel: Hydrology, Agriculture and Land Observation Group, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
Rasmus Houborg: Planet, San Francisco, CA 94107, USA
Shawkat Ali: Kentville Research and Development Centre, Agriculture and Agri-Food Canada, 32 Main Street Kentville, Kentville, NS B4N 1J5, Canada
Matthew F. McCabe: Hydrology, Agriculture and Land Observation Group, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia