Satellite Image Price Prediction Based on Machine Learning


Bibliographic Details
Container / Database: Remote Sensing
Main Authors: Linhan Yang, Zugang Chen, Guoqing Li
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Subjects:
Online Access: https://www.mdpi.com/2072-4292/17/12/1960
Description
Abstract: This study develops a comprehensive, data-driven framework for predicting satellite imagery prices using four state-of-the-art ensemble learning algorithms: XGBoost, LightGBM, AdaBoost, and CatBoost. Two distinct datasets, one of optical imagery and one of Synthetic Aperture Radar (SAR) imagery, were assembled, each characterized by nine technical and economic features (e.g., imaging mode, spatial resolution, satellite manufacturing cost, and acquisition timeliness). Bayesian optimization is employed to systematically tune hyperparameters, thereby minimizing overfitting and maximizing generalization. Models are evaluated on held-out test sets (20% of data) using Pearson's correlation coefficient (R), mean bias error (MBE), root mean square error (RMSE), unbiased RMSE (ubRMSE), Nash–Sutcliffe Efficiency (NSE), and Kling–Gupta Efficiency (KGE). For optical imagery, the Bayesian-optimized XGBoost model achieves the best performance (R = 0.9870, RMSE = $3.44/km², NSE = 0.9651, KGE = 0.8950), followed closely by CatBoost (R = 0.9826, RMSE = $3.83/km²). For SAR imagery, CatBoost outperforms all others after optimization (R = 0.9278, RMSE = $9.94/km², NSE = 0.8575, KGE = 0.8443), reflecting its robustness to heavy-tailed price distributions. AdaBoost also demonstrates competitive accuracy, while LightGBM and XGBoost exhibit larger errors in high-value regimes. SHapley Additive exPlanations (SHAP) analysis reveals that imaging mode and spatial resolution are the primary drivers of price variance across both domains, followed by satellite manufacturing cost and acquisition recency. These insights demonstrate how ensemble models capture nonlinear, high-dimensional interactions that traditional rule-based pricing schemes overlook. Compared to static, experience-driven price brackets, the machine learning approach provides a scalable, transparent, and economically rational pricing engine that is adaptable to rapidly changing market conditions and capable of supporting fine-grained, application-specific pricing strategies.
ISSN: 2072-4292
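
The six evaluation metrics named in the abstract can be illustrated with a short sketch. The abstract does not give the formulas the authors used, so the code below assumes the common textbook definitions: ubRMSE as the RMSE with the mean bias removed, and KGE in its 2009 form built from correlation, variability ratio, and bias ratio. The function name `evaluation_metrics` and the sample prices are hypothetical and are not taken from the paper.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R, MBE, RMSE, ubRMSE, NSE, and KGE for two equal-length sequences.

    Assumed definitions (not confirmed by the abstract):
      MBE    = mean(predicted - observed)
      ubRMSE = sqrt(RMSE^2 - MBE^2)
      NSE    = 1 - sum((obs - pred)^2) / sum((obs - mean_obs)^2)
      KGE    = 1 - sqrt((R - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
               with alpha = std_pred / std_obs and beta = mean_pred / mean_obs
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]

    # Population variances and covariance (divide by n).
    var_o = sum(d * d for d in dev_o) / n
    var_p = sum(d * d for d in dev_p) / n
    cov = sum(a * b for a, b in zip(dev_o, dev_p)) / n
    std_o, std_p = math.sqrt(var_o), math.sqrt(var_p)
    if std_o == 0 or std_p == 0 or mean_o == 0:
        raise ValueError("metrics undefined for constant series or zero mean")

    r = cov / (std_o * std_p)

    errors = [p - o for o, p in zip(observed, predicted)]
    mbe = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Clamp at zero to guard against tiny negative values from rounding.
    ubrmse = math.sqrt(max(mse - mbe ** 2, 0.0))

    nse = 1.0 - mse / var_o

    alpha = std_p / std_o
    beta = mean_p / mean_o
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"R": r, "MBE": mbe, "RMSE": rmse,
            "ubRMSE": ubrmse, "NSE": nse, "KGE": kge}


# Hypothetical prices in $/km^2, for illustration only.
true_prices = [10.0, 20.0, 30.0, 40.0]
model_prices = [15.0, 25.0, 35.0, 45.0]
for name, value in evaluation_metrics(true_prices, model_prices).items():
    print(f"{name}: {value:.4f}")
```

In this toy case every prediction is offset by the same amount, so the whole error is bias: MBE and RMSE coincide while ubRMSE drops to zero, which shows why the metrics are worth reporting together.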