Improving the quantification of peak concentrations for air quality sensors via data weighting

Bibliographic Details
Published in: Atmospheric Measurement Techniques
Main Authors: C. Frischmon, J. Silberstein, A. Guth, E. Mattson, J. Porter, M. Hannigan
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Online Access: https://amt.copernicus.org/articles/18/3147/2025/amt-18-3147-2025.pdf
Description
Summary: Traditional calibration models for low-cost air quality sensors have demonstrated a tendency to underpredict peak concentrations. We assessed the utility of adding data weights to low-cost sensor colocation data to improve the quantification of peak concentrations when the majority of colocation data is at a baseline concentration and varies due to intermittent, transient events. Specifically, we explore the effects of data weighting on three different pollutant colocation datasets: total volatile organic compounds (VOCs), carbon monoxide (CO), and methane (CH₄). Leveraging two different weighting functions, a sigmoidal and a piecewise weighting regime, we explored the impacts of the base model choice (multilinear regression, MLR, vs. random forest, RF, models), the sensitivity of weighting functions, and the ability of data weighting to improve high-concentration pollution measurements. When compared to unweighted colocation data, we demonstrate significant reductions in both error (root mean square error, RMSE) and bias (mean bias error, MBE) for pollutant peaks across all three datasets when data weighting is employed. For the top percentile of data, we observe an average of 23 % reduction in RMSE and a 35 % reduction in MBE when optimal weights are employed. More significant reductions occurred in the 95th–99th percentile of data, where MBE was reduced by an average of 70 %. RMSE in the 95th–99th percentile was reduced by an average of 26 %. However, data weighting can also generate larger errors at baseline pollutant concentrations. Data weighting regimes were sensitive to input parameters, and input weighting functions may be tuned to better predict peak concentration data without significant reductions in the fidelity of baseline pollutant predictions.
ISSN: 1867-1381, 1867-8548
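
The data-weighting idea described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight parameters (`midpoint`, `steepness`, `threshold`, `peak_weight`), the synthetic data, and the saturating sensor response are all assumptions made for illustration, and a single-predictor weighted least-squares line stands in for the multilinear regression base model. Weights are assumed to be computed from the reference concentration during calibration.

```python
import math

def sigmoid_weight(conc, midpoint, steepness, peak_weight):
    """Weight rising smoothly from about 1 (baseline) to peak_weight (peaks)."""
    return 1.0 + (peak_weight - 1.0) / (1.0 + math.exp(-steepness * (conc - midpoint)))

def piecewise_weight(conc, threshold, peak_weight):
    """Weight 1 below the threshold and peak_weight at or above it."""
    return peak_weight if conc >= threshold else 1.0

def fit_weighted_line(x, y, w):
    """Weighted least squares for y ~ b0 + b1*x; returns (b0, b1)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mbe(y_true, y_pred):
    """Mean bias error (negative values mean underprediction)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic colocation data (made up): 95 baseline points plus 5 transient peaks.
reference = [1.0 + 0.01 * i for i in range(95)] + [10.0, 15.0, 20.0, 25.0, 30.0]
# Hypothetical saturating sensor response, so one straight line cannot fit everything.
signal = [math.log1p(c) for c in reference]

THRESHOLD = 5.0  # assumed split between baseline and peak concentrations
unweighted = fit_weighted_line(signal, reference, [1.0] * len(reference))
weights = [piecewise_weight(c, THRESHOLD, peak_weight=50.0) for c in reference]
weighted = fit_weighted_line(signal, reference, weights)

def subset_scores(coeffs, is_peak):
    """RMSE and MBE over either the peak or the baseline subset."""
    b0, b1 = coeffs
    pairs = [(c, b0 + b1 * s) for c, s in zip(reference, signal)
             if (c >= THRESHOLD) == is_peak]
    truth, pred = zip(*pairs)
    return rmse(truth, pred), mbe(truth, pred)

for name, coeffs in [("unweighted", unweighted), ("weighted", weighted)]:
    print(name, "peak:", subset_scores(coeffs, True),
          "baseline:", subset_scores(coeffs, False))
```

With the piecewise weights above, the weighted fit minimises the baseline squared error plus 50 times the peak squared error, so its peak RMSE cannot exceed that of the unweighted fit and its baseline RMSE cannot be lower. This mirrors the trade-off noted in the summary. Swapping `piecewise_weight` for `sigmoid_weight` gives a smooth transition between the two regimes instead of a hard threshold.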