Abstract: As the largest carbon (C) pool in terrestrial ecosystems, soil plays a major role in enhancing ecosystem services and regulating climate change. Accurate prediction of soil organic carbon (SOC) content in areas with complex and variable environments helps in assessing soil quality and carbon sink functions at regional scales. In this study, a typical small watershed in a subtropical hilly region was selected as the study area. Four machine learning algorithms were used to predict SOC content in the surface soil layer (0–20 cm): support vector machine regression (SVR), random forest (RF), extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM). Three types of environmental variables (topography, climate and vegetation) were used as predictors. The aims were to evaluate how effectively the different algorithms predict SOC content and to identify the primary environmental factors affecting SOC distribution. Among the four models, RF performed best (R² = 0.540), with higher prediction accuracy than XGBoost (R² = 0.528) and LightGBM (R² = 0.504). In contrast, SVR had relatively low prediction accuracy (R² = 0.427), indicating that it is not suitable for predicting SOC content in subtropical hilly landscapes. Correlation analysis showed that topography, primarily elevation, played the most significant role in model prediction in this subtropical hilly area. The digital SOC maps produced by the four models showed broadly similar spatial patterns: SOC content was higher in the northern, southwestern and southeastern marginal regions and lower in the central region.
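The model comparison above ranks the four algorithms by the coefficient of determination, R² = 1 − SS_res / SS_tot. Below is a minimal Python sketch of that computation and ranking. It assumes R² is evaluated on held-out validation samples, which the abstract does not state. The observed SOC values, predictions and model names are hypothetical toy numbers. This is not the authors' code, and it does not reproduce the study's data or results.

```python
# Minimal sketch: rank regression models by the coefficient of determination (R²).
# All values below are hypothetical toy numbers, not data from this study.

def r_squared(observed, predicted):
    """Return R² = 1 - SS_res / SS_tot for paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error between observations and predictions
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of observations around their mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot

# Hypothetical SOC observations (g/kg) at validation sites
observed = [12.0, 18.5, 9.8, 22.1, 15.4]

# Hypothetical predictions from two placeholder models
predictions = {
    "model_a": [13.1, 17.0, 11.2, 20.5, 15.0],
    "model_b": [15.0, 14.2, 13.5, 17.9, 16.1],
}

# Score each model and print the models from highest to lowest R²
scores = {name: r_squared(observed, pred) for name, pred in predictions.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {score:.3f}")
```

If a model always predicts the mean of the observations, its R² is 0. A perfect model scores 1. This is why an R² of about 0.5 indicates that roughly half of the variance in the validation SOC values is explained.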