Classification of lidar measurements using supervised and unsupervised machine learning methods

While it is relatively straightforward to automate the processing of lidar signals, it is more difficult to choose periods of “good” measurements to process. Groups use various ad hoc procedures involving either very simple (e.g. signal-to-noise ratio) or more complex procedures ...

Bibliographic Details
Main Authors: G. Farhani, R. J. Sica, M. J. Daley
Format: Article
Language: English
Published: Copernicus Publications 2021-01-01
Series: Atmospheric Measurement Techniques
Online Access: https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf
id doaj-dfc3268180b540d6ae2e04d59c1469c3
record_format Article
spelling doaj-dfc3268180b540d6ae2e04d59c1469c3
2021-01-18T08:55:07Z
eng
Copernicus Publications
Atmospheric Measurement Techniques
1867-1381
1867-8548
2021-01-01
Volume 14, pages 391-402
doi:10.5194/amt-14-391-2021
Classification of lidar measurements using supervised and unsupervised machine learning methods
G. Farhani: Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
R. J. Sica: Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
M. J. Daley: Department of Computer Science, The Vector Institute for Artificial Intelligence, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf
collection DOAJ
language English
format Article
sources DOAJ
author G. Farhani
R. J. Sica
M. J. Daley
title Classification of lidar measurements using supervised and unsupervised machine learning methods
publisher Copernicus Publications
series Atmospheric Measurement Techniques
issn 1867-1381
1867-8548
publishDate 2021-01-01
description While it is relatively straightforward to automate the processing of lidar signals, it is more difficult to choose periods of “good” measurements to process. Groups use various ad hoc procedures, ranging from very simple (e.g. signal-to-noise ratio) to more complex (e.g. Wing et al., 2018), for a task that humans can easily be trained to do but that is time-consuming. Here, we use machine learning techniques to train the machine to sort the measurements before processing. The presented method is generic and can be applied to most lidars. We test the techniques using measurements from the Purple Crow Lidar (PCL) system located in London, Canada. The PCL has over 200 000 raw profiles in Rayleigh and Raman channels available for classification. We classify raw (level-0) lidar measurements as “clear” sky profiles with strong lidar returns, “bad” profiles, and profiles which are significantly influenced by clouds or aerosol loads. We examined different supervised machine learning algorithms, including random forests, support vector machines, and gradient boosting trees, all of which can successfully classify profiles. The algorithms were trained using about 1500 profiles for each PCL channel, selected randomly from different nights of measurements in different years. The success rate of identification for all the channels is above 95 %. We also used the t-distributed stochastic neighbour embedding (t-SNE) method, which is an unsupervised algorithm, to cluster our lidar profiles. Because t-SNE is a data-driven method in which no labelling of the training set is needed, it is an attractive algorithm for finding anomalies in lidar profiles. The method has been tested on several nights of PCL measurements. The t-SNE can successfully cluster the PCL data profiles into meaningful categories. To demonstrate the use of the technique, we have used the algorithm to identify stratospheric aerosol layers due to wildfires.
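
As a rough illustration of the workflow the abstract describes, the sketch below trains the three supervised classifiers it names on toy profiles and then embeds the same profiles with t-SNE. It assumes NumPy and scikit-learn are available. The profile generator, the shapes chosen for the "clear", "cloud", and "bad" classes, the log scaling, and all hyperparameters are assumptions made for illustration; none of them come from the article or from PCL data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_BINS = 60                          # altitude bins per profile (illustrative)
LABELS = ("clear", "cloud", "bad")   # hypothetical class names


def synthetic_profile(kind):
    """Return one toy photon-count profile; this is NOT real lidar data."""
    z = np.arange(N_BINS)
    signal = 1000.0 * np.exp(-z / 12.0)      # smooth clear-sky-like decay
    if kind == "cloud":
        base = rng.integers(10, 30)          # bin where the layer starts
        signal[base:base + 4] *= 3.0         # enhanced return inside the layer
        signal[base + 4:] *= 0.3             # attenuated return above it
    elif kind == "bad":
        signal *= 0.02                       # very weak return
    return rng.poisson(signal + 5.0)         # shot noise plus constant background


def make_dataset(n_per_class=150):
    """Build a labelled feature matrix of log-scaled toy profiles."""
    X = np.array([synthetic_profile(k) for k in LABELS for _ in range(n_per_class)])
    y = np.repeat(np.arange(len(LABELS)), n_per_class)
    return np.log1p(X.astype(float)), y      # log scaling compresses the range


X, y = make_dataset()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised step: fit each classifier on labelled profiles, score on held-out ones.
models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "support vector machine": SVC(kernel="rbf", gamma="scale"),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: held-out accuracy = {scores[name]:.2f}")

# Unsupervised step: t-SNE uses no labels and maps each profile to a 2-D point.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print("t-SNE embedding shape:", embedding.shape)
```

With real measurements, hand-labelled level-0 profiles would take the place of `make_dataset`, and the 2-D embedding could be plotted to look for groups of profiles that sit apart from the rest.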
url https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf