Abstract:
Accurate and timely prediction of crop yields is critical for anticipating market price fluctuations, guiding agricultural planning, and ensuring national food security. Because yield formation has nonlinear spatiotemporal characteristics and crop growth dynamics interact with final yields in complex ways, it is difficult to predict crop yields accurately within a single time period. Determining an appropriate yield estimation period is therefore of paramount importance. This study used Moderate Resolution Imaging Spectroradiometer (MODIS) data, meteorological data, and yield statistics for the maize growing season in Henan Province from 2013 to 2020 to divide the maize growth period into seven period scenarios. Five machine learning algorithms, including Gaussian Process Regression (GPR) and the Light Gradient Boosting Machine (LightGBM), were used to predict maize yield under the different period scenarios, with the aim of identifying the optimal estimation period for maize yield prediction in Henan Province. The results show that: 1) July 4 to September 14 (J5-J14) was the optimal estimation period; 2) GPR was the best prediction model, with an R² of 0.68, an RMSE of 647.55 kg/hm², and a MAPE of 9.39%; and 3) analysis of the spatial distribution of errors in annual yield predictions further revealed that extreme climate was one of the key factors affecting the accuracy of crop yield prediction. This study provides a reference method for maize yield prediction: selecting an appropriate yield estimation period makes it possible to obtain the most accurate yield predictions with the least amount of data.
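The abstract reports model accuracy as R², RMSE, and MAPE but does not state the formulas. The sketch below uses the standard textbook definitions, assuming the study computed them the same way; the function names and the yield values in the usage example are hypothetical and serve only to illustrate the calculation.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the yields (e.g. kg/hm^2)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no observed yield is zero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    # Hypothetical observed and predicted county yields in kg/hm^2 (not study data).
    observed = [6000.0, 7000.0, 8000.0]
    predicted = [6300.0, 6800.0, 7600.0]
    print(round(r2_score(observed, predicted), 3))  # 0.855
    print(round(rmse(observed, predicted), 2))      # 310.91
    print(round(mape(observed, predicted), 2))      # 4.29
```

RMSE is expressed in yield units, so it is directly comparable with the 647.55 kg/hm² figure, while MAPE normalizes each error by the observed yield and is therefore easier to compare across regions with different yield levels.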