Improved prediction of tree species richness and interpretability of environmental drivers using a machine learning approach

Lian Brugere, Youngsang Kwon, Amy E. Frazier, Peter Kedron

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Biodiversity is in decline globally, and predicting species diversity is critically important if current trends are to be reversed. Tree species richness (TSR) has long been a key measure of biodiversity, but considerable uncertainties exist in current models, particularly given the classic statistical assumptions and poor ecological interpretability of machine learning outcomes. Here, we test several ecologically interpretable machine learning approaches to predict TSR and interpret the driving environmental factors in the continental United States. We develop two artificial neural network (ANN) models and one random forest (RF) model to predict TSR using Forest Inventory and Analysis data and 20 environmental covariates, and compare them to a classic generalized linear model (GLM). Models were evaluated on an independent, unseen testing dataset using R², Mean Absolute Error (MAE), and residual spatial autocorrelation analysis. An Interpretable Machine Learning approach, SHapley Additive exPlanations (SHAP), was adopted to explain the major environmental factors driving TSR. Compared to a baseline GLM (R² = 0.7; MAE = 4.7), the ANN and RF models achieved R² greater than 0.9 and MAE < 3.1. Additionally, the ANN and RF models produced less spatially clustered TSR residuals than the GLM. SHAP analysis suggested that TSR is best predicted by Aridity Index, Forest Area, Altitude, Mean Precipitation of the Driest Quarter, and Mean Annual Temperature. SHAP further revealed non-linear relationships between environmental covariates and TSR, as well as complex interactions that were not revealed by the GLM. The study highlights the need for conserving forest areas and for reducing precipitation-related physiological stress on tree species in arid regions with low forest cover. The machine learning approach used here is transferable to studies of biodiversity for other organisms or to prediction of TSR under future climatic scenarios.
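The two accuracy metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (R² as one minus the ratio of residual to total sum of squares, MAE as the mean absolute difference). The abstract does not state which formulas or software the authors used, and the function names and sample values below are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute difference between observed and predicted richness."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out plots (not data from the study):
# observed vs. model-predicted species counts.
observed_tsr = [12, 30, 7, 21, 15]
predicted_tsr = [14, 27, 9, 20, 15]

mae = mean_absolute_error(observed_tsr, predicted_tsr)
r2 = r_squared(observed_tsr, predicted_tsr)
print(f"MAE = {mae:.2f}, R^2 = {r2:.3f}")
```

Computing both metrics only on plots withheld from training, as the abstract describes, is what allows the GLM, ANN, and RF scores to be compared on equal footing.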

Original language: English (US)
Article number: 120972
Journal: Forest Ecology and Management
State: Published - Jul 1 2023


  • Deep learning
  • FIA
  • Generalized linear model
  • Neural networks
  • Random forest
  • Tree species richness modeling

ASJC Scopus subject areas

  • Forestry
  • Nature and Landscape Conservation
  • Management, Monitoring, Policy and Law

