Classification of lidar measurements using supervised and unsupervised machine learning methods
While it is relatively straightforward to automate the processing of lidar signals, it is more difficult to choose periods of “good” measurements to process. Groups use various ad hoc procedures involving either very simple (e.g. signal-to-noise ratio) or more complex procedures ...
Main Authors: | G. Farhani, R. J. Sica, M. J. Daley |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2021-01-01 |
Series: | Atmospheric Measurement Techniques |
Online Access: | https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf |
id |
doaj-dfc3268180b540d6ae2e04d59c1469c3 |
---|---|
record_format |
Article |
spelling |
doaj-dfc3268180b540d6ae2e04d59c1469c3
2021-01-18T08:55:07Z
eng
Copernicus Publications
Atmospheric Measurement Techniques
1867-1381, 1867-8548
2021-01-01
14, 391-402
10.5194/amt-14-391-2021
Classification of lidar measurements using supervised and unsupervised machine learning methods
G. Farhani (0), R. J. Sica (1), M. J. Daley (2)
(0) Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
(1) Department of Physics and Astronomy, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
(2) Department of Computer Science, The Vector Institute for Artificial Intelligence, The University of Western Ontario, 1151 Richmond St., London, ON, N6A 3K7, Canada
While it is relatively straightforward to automate the processing of lidar signals, it is more difficult to choose periods of “good” measurements to process. Groups use various ad hoc procedures involving either very simple (e.g. signal-to-noise ratio) or more complex procedures (e.g. Wing et al., 2018) to perform a task that is easy to train humans to perform but is time-consuming. Here, we use machine learning techniques to train the machine to sort the measurements before processing. The presented method is generic and can be applied to most lidars. We test the techniques using measurements from the Purple Crow Lidar (PCL) system located in London, Canada. The PCL has over 200 000 raw profiles in Rayleigh and Raman channels available for classification. We classify raw (level-0) lidar measurements as “clear” sky profiles with strong lidar returns, “bad” profiles, and profiles which are significantly influenced by clouds or aerosol loads. We examined different supervised machine learning algorithms, including the random forest, the support vector machine, and gradient boosting trees, all of which can successfully classify profiles. The algorithms were trained using about 1500 profiles for each PCL channel, selected randomly from different nights of measurements in different years. The success rate of identification for all the channels is above 95 %. We also used the t-distributed stochastic neighbour embedding (t-SNE) method, which is an unsupervised algorithm, to cluster our lidar profiles. Because t-SNE is a data-driven method in which no labelling of the training set is needed, it is an attractive algorithm for finding anomalies in lidar profiles. The method has been tested on several nights of PCL measurements. The t-SNE can successfully cluster the PCL data profiles into meaningful categories. To demonstrate the use of the technique, we have used the algorithm to identify stratospheric aerosol layers due to wildfires.
https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
G. Farhani, R. J. Sica, M. J. Daley |
spellingShingle |
G. Farhani, R. J. Sica, M. J. Daley; Classification of lidar measurements using supervised and unsupervised machine learning methods; Atmospheric Measurement Techniques |
author_facet |
G. Farhani, R. J. Sica, M. J. Daley |
author_sort |
G. Farhani |
title |
Classification of lidar measurements using supervised and unsupervised machine learning methods |
title_short |
Classification of lidar measurements using supervised and unsupervised machine learning methods |
title_full |
Classification of lidar measurements using supervised and unsupervised machine learning methods |
title_fullStr |
Classification of lidar measurements using supervised and unsupervised machine learning methods |
title_full_unstemmed |
Classification of lidar measurements using supervised and unsupervised machine learning methods |
title_sort |
classification of lidar measurements using supervised and unsupervised machine learning methods |
publisher |
Copernicus Publications |
series |
Atmospheric Measurement Techniques |
issn |
1867-1381 1867-8548 |
publishDate |
2021-01-01 |
description |
While it is relatively straightforward to automate the processing of lidar signals, it is more
difficult to choose periods of “good” measurements to process. Groups use various ad hoc
procedures involving either very simple (e.g. signal-to-noise ratio) or more complex procedures
(e.g. Wing et al., 2018) to perform a task that is easy to train humans to perform but is
time-consuming. Here, we use machine learning techniques to train the machine to sort the measurements
before processing. The presented method is generic and can be applied to most lidars. We test the
techniques using measurements from the Purple Crow Lidar (PCL) system located in London,
Canada. The PCL has over 200 000 raw profiles in Rayleigh and Raman channels available for
classification. We classify raw (level-0) lidar measurements as “clear” sky profiles with strong
lidar returns, “bad” profiles, and profiles which are significantly influenced by clouds or
aerosol loads. We examined different supervised machine learning algorithms, including the random
forest, the support vector machine, and gradient boosting trees, all of which can successfully
classify profiles. The algorithms were trained using about 1500 profiles for each PCL channel,
selected randomly from different nights of measurements in different years. The success rate of
identification for all the channels is above 95 %. We also used the t-distributed stochastic
neighbour embedding (t-SNE) method, which is an unsupervised algorithm, to cluster our lidar
profiles. Because t-SNE is a data-driven method in which no labelling of the training set is
needed, it is an attractive algorithm for finding anomalies in lidar profiles. The method has been
tested on several nights of PCL measurements. The t-SNE can successfully
cluster the PCL data profiles into meaningful categories. To demonstrate the use of the technique,
we have used the algorithm to identify stratospheric aerosol layers due to wildfires. |
url |
https://amt.copernicus.org/articles/14/391/2021/amt-14-391-2021.pdf |
work_keys_str_mv |
AT gfarhani classificationoflidarmeasurementsusingsupervisedandunsupervisedmachinelearningmethods AT rjsica classificationoflidarmeasurementsusingsupervisedandunsupervisedmachinelearningmethods AT mjdaley classificationoflidarmeasurementsusingsupervisedandunsupervisedmachinelearningmethods |
_version_ |
1724333639722860544 |
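
The workflow summarised in the description field (supervised classification of raw profiles into three classes, plus an unsupervised t-SNE embedding) might look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the synthetic photon-count profiles, the integer class encoding, the function names `make_synthetic_profiles`, `compare_classifiers` and `embed_tsne`, and all hyperparameters are hypothetical, and scikit-learn is assumed as the library. No PCL data are used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical encoding of the three classes named in the abstract.
LABELS = ("clear", "bad", "cloud")


def make_synthetic_profiles(n_per_class=100, n_bins=64, seed=0):
    """Generate toy photon-count profiles (assumption: NOT real lidar data)."""
    rng = np.random.default_rng(seed)
    z = np.linspace(0.0, 1.0, n_bins)           # normalised altitude
    base = 1000.0 * np.exp(-5.0 * z) + 5.0      # decaying signal plus background
    profiles, labels = [], []
    for label in range(len(LABELS)):
        for _ in range(n_per_class):
            mean = base.copy()
            if label == 1:
                # "bad": very weak return, mostly background counts
                mean = 0.02 * base + 5.0
            elif label == 2:
                # "cloud": scattering spike, then reduced signal above it
                k = int(rng.integers(10, 30))
                mean[k:k + 3] *= 5.0
                mean[k + 3:] *= 0.3
            profiles.append(rng.poisson(mean))  # photon-counting noise
            labels.append(label)
    # Log-transform the counts to compress the dynamic range.
    X = np.log1p(np.asarray(profiles, dtype=float))
    return X, np.asarray(labels)


def compare_classifiers(seed=0):
    """Train the three classifier families and return held-out accuracy."""
    X, y = make_synthetic_profiles(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


def embed_tsne(X, seed=0):
    """Project profiles to 2-D with t-SNE; no labels are used.

    t-SNE only produces an embedding. Grouping the embedded points into
    clusters is a separate step that this sketch leaves out.
    """
    return TSNE(n_components=2, perplexity=30, init="pca",
                random_state=seed).fit_transform(X)


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: held-out accuracy {accuracy:.3f}")
    X, _ = make_synthetic_profiles(n_per_class=40)
    print("t-SNE embedding shape:", embed_tsne(X).shape)
```

The toy classes are far easier to separate than real measurements, so the accuracies printed here say nothing about the success rates reported for the PCL channels.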