LI Delun,XIAO Zhixiang,XIE Ningxin,et al.A Study on the Adjusting Spring and Summer Surface Air Temperature of ECMWF Model by a Hybrid Feature Selection Method in Machine Learning of Guangxi[J].Journal of Chengdu University of Information Technology,2023,38(05):602-609.[doi:10.16836/j.cnki.jcuit.2023.05.016]
- Title:
- A Study on the Adjusting Spring and Summer Surface Air Temperature of ECMWF Model by a Hybrid Feature Selection Method in Machine Learning of Guangxi
- Article ID:
- 2096-1618(2023)05-0602-08
- Keywords:
- atmospheric science; temperature forecast; machine learning; hybrid feature selection; 2 m temperature correction
- CLC number:
- P457.3
- Document code:
- A
- Abstract:
- To address the poor performance and unstable results of single feature selection methods in machine learning, a hybrid feature selection method (SpearmanXgb) that combines the Spearman correlation coefficient with XGBoost feature importance is proposed. Together with three machine learning algorithms, random forest (RF), XGBoost and LightGBM, it is used to correct the near-surface 2 m air temperature forecast by the ECMWF model for Guangxi in spring and summer. The results show the following. (1) The hybrid method outperforms the single Spearman correlation coefficient and XGBoost feature importance methods in both training time and root mean square error (RMSE): training time is reduced by 19.7% and 10.3%, and RMSE by 0.94% and 0.64%, respectively. (2) Compared with the ECMWF model, the average RMSE of the temperatures predicted by the three models decreases by 7.04%, 7.47% and 7.37%, respectively. XGBoost performs better at early lead times (24-96 h), while LightGBM performs better at middle and late lead times (120-240 h). (3) Southeastern and northeastern Guangxi are dominated by mountainous and hilly terrain and are readily affected by complex weather processes such as typhoons and the first rainy season in South China, so temperature varies widely there; as a result, the ECMWF model and all three machine learning models have relatively high forecast errors in these two regions. (4) SHAP values are used to analyze how sensitive the model results are to the range of each feature's values. Tests show that more accurate selected features reduce the models' RMSE to varying degrees, which suggests a way to improve ECMWF forecasts.
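The percentage figures in the abstract compare RMSE values. Assuming they are relative reductions against a baseline (the ECMWF forecast, or a single-method feature selection), the arithmetic can be sketched as below, together with a per-lead-time comparison of the kind behind the statement that XGBoost is better at 24-96 h and LightGBM at 120-240 h. The function names are hypothetical, and the numbers in the usage note are toy values, not results from the paper.

```python
"""Verification arithmetic sketch (assumes relative RMSE reductions)."""
from math import sqrt
from typing import Dict, Sequence


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between forecasts and observations."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def relative_reduction(baseline: float, corrected: float) -> float:
    """Percent by which the corrected RMSE improves on the baseline RMSE."""
    return (baseline - corrected) / baseline * 100.0


def best_model_per_lead(rmse_by_model: Dict[str, Dict[int, float]]) -> Dict[int, str]:
    """For each lead time (hours), return the model with the lowest RMSE."""
    leads = sorted({lead for table in rmse_by_model.values() for lead in table})
    return {
        lead: min(
            (m for m in rmse_by_model if lead in rmse_by_model[m]),
            key=lambda m: rmse_by_model[m][lead],
        )
        for lead in leads
    }
```

For example, `relative_reduction(2.0, 1.8)` gives about 10.0, and `best_model_per_lead({"XGBoost": {24: 1.0, 240: 2.2}, "LightGBM": {24: 1.1, 240: 2.0}})` returns `{24: "XGBoost", 240: "LightGBM"}`.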
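Memo
Received: 2022-06-23
Funding: National Natural Science Foundation of China (41905077); Guangxi Key Research and Development Program (Guike AB21196041); Guangxi Meteorological Bureau Research Program (Guiqike 2021ZL05)
Corresponding author: XIAO Zhixiang. E-mail: xiaozx_gx@163.com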