Recently, a collaborative research team led by Associate Professor Ke Deng from the Department of Statistics and Data Science at Tsinghua University and Associate Researcher Chao Chen from the School of Environment published online a research article entitled “Development of a Hybrid Algal Population Prediction (HAPP) Model by Algae Growth Potential Estimation and Time Series Regression and Its Application in One Reservoir in China” in the leading international journal Water Research (impact factor 12.4). The study developed HAPP, a data assimilation strategy that is computationally simple and performs strongly. It integrates observed algal abundance data with algal growth kinetic models, enabling accurate prediction of algal bloom dynamics in eutrophic water bodies and providing technical support for algal bloom prevention and control and for water supply safety in China’s lake and reservoir water sources. Yuxuan Xie, a PhD student (Class of 2022) from the School of Environment at Tsinghua University, is the first author of the paper. Shirui Chen, a PhD student (Class of 2024) from the Deng research group, is the second author. Associate Researcher Chao Chen from the School of Environment at Tsinghua University and Associate Professor Ke Deng are the co-corresponding authors.
1. Research Background
Lakes and reservoirs are important sources of drinking water worldwide, but algal proliferation in eutrophic lakes and reservoirs can degrade water quality and seriously threaten water supply safety. Predicting algal population dynamics and developing accurate, practical early-warning models for algal blooms is therefore of great practical significance for ensuring water supply safety. Existing algal bloom prediction models fall mainly into two groups: process-based mechanistic models and data-driven black-box models. The former are highly interpretable but have complex structures and numerous parameters and are difficult to calibrate, whereas the latter are flexible and efficient but lack mechanistic understanding and generalize poorly. In addition, although remote sensing is widely used to monitor large water bodies, its resolution is insufficient for small and medium-sized reservoirs, which limits its application there. In this context, the study proposes a hybrid algal population prediction (HAPP) model that integrates algae growth potential (AGP) estimation with regression correction based on observed time series of algal abundance. The model combines mechanistic interpretability with the advantages of data-driven approaches and is suitable for small and medium-sized reservoirs and lakes that lack remote sensing data. It was applied and validated on a dataset from a reservoir in northern China covering 2018 to 2023, providing an effective tool for predicting algal dynamics in drinking water sources and offering new insights for algal bloom control under climate change.
2. Methodological Innovation: A Hybrid Modeling Approach Integrating Process Mechanisms and Data-Driven Methods
The core methodological innovation of this study is a novel data assimilation strategy that is computationally simple, highly interpretable, and capable of accurate prediction. Traditional approaches in this field are often limited to either “white-box” mechanistic inference or “black-box” data fitting, both of which have clear limitations. Applying white-box dynamical mechanistic models for inference raises several key challenges: (1) the external driving variables change dynamically and their exact values are difficult to capture, which makes precise simulation of the dynamical system difficult; (2) high-precision numerical solutions of dynamical models are computationally intensive and may suffer from stability issues; and (3) the models usually contain numerous parameters, making parameter optimization difficult. As a result, white-box models are often hard to implement in practice, and their results are not easily integrated with observational data. The study instead proposes that, at each observation time point, the model takes the observed values of the external driving variables as fixed and, rather than explicitly simulating the full evolution of the dynamical system, directly solves for the equilibrium state that the system would ultimately reach under those conditions. In this application, the equilibrium state has a simple analytical solution, referred to as the algae growth potential (AGP), which serves as the key bridge variable linking the dynamical process of algal growth with historical observational data.
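The idea of solving for the equilibrium state instead of simulating the full trajectory can be illustrated with a minimal sketch. The growth equation used here (logistic growth with a constant loss term), the response functions, and every parameter value and function name are assumptions chosen for illustration; they are not the equations or calibrated values of the HAPP paper. The sketch only shows that, for fixed environmental inputs, a closed-form steady state can replace a step-by-step numerical integration.

```python
import math

def growth_rate(temp_c, light, tn, tp,
                mu_max=1.2, t_opt=25.0, t_width=8.0,
                k_light=100.0, k_n=0.3, k_p=0.02):
    """Hypothetical specific growth rate (1/day).

    Assumed form: Gaussian temperature response multiplied by Monod-type
    light limitation and the more limiting of two Monod nutrient terms.
    """
    f_temp = math.exp(-((temp_c - t_opt) / t_width) ** 2)
    f_light = light / (k_light + light)
    f_nutrient = min(tn / (k_n + tn), tp / (k_p + tp))
    return mu_max * f_temp * f_light * f_nutrient

def equilibrium_density(mu, loss=0.1, capacity=1000.0):
    """Closed-form steady state of dN/dt = mu*N*(1 - N/K) - loss*N.

    Setting dN/dt = 0 gives N* = K * (1 - loss/mu) when mu > loss,
    and N* = 0 otherwise (the population cannot sustain itself).
    """
    if mu <= loss:
        return 0.0
    return capacity * (1.0 - loss / mu)

def simulate(mu, n0=1.0, loss=0.1, capacity=1000.0, dt=0.01, days=400):
    """Forward-Euler integration of the same equation, for comparison only."""
    n = n0
    for _ in range(int(days / dt)):
        n += dt * (mu * n * (1.0 - n / capacity) - loss * n)
    return n

# Fixed (illustrative) conditions at one observation time point.
mu = growth_rate(temp_c=25.0, light=200.0, tn=1.0, tp=0.05)
agp = equilibrium_density(mu)   # one arithmetic expression
simulated = simulate(mu)        # tens of thousands of Euler steps
print(f"analytic equilibrium: {agp:.1f}, simulated end state: {simulated:.1f}")
```

In this toy setting the analytic value and the end state of the long simulation agree, while the analytic route avoids the step-size and stability concerns of numerical integration.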
Specifically, AGP is first estimated from mechanistic equations using environmental variables such as temperature, light intensity, and nutrient concentrations. A time-lagged statistical regression model then establishes a dynamic mapping between AGP and the observed algal density, forming a complete modeling framework of “environmental driving mechanisms - growth potential estimation - actual population prediction.” In addition, to address the spatiotemporal heterogeneity in algal growth responses, a time-specific parameter optimization strategy is proposed, in which model parameters are optimized separately by year and by high- and low-growth periods, significantly improving the model’s ability to capture real ecosystem dynamics. Furthermore, the model employs a Bayesian optimization algorithm to jointly optimize the dynamical and statistical parameters, improving both the robustness of parameter identification and the computational efficiency in complex parameter systems. Ultimately, the model relies on only four readily available variables (water temperature, light intensity, total nitrogen, and total phosphorus) to accurately predict future algal growth trends, providing a tool that is both scientifically reliable and readily applicable in practice for algal bloom prevention and control in small and medium-sized lakes and reservoirs.
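The second stage, mapping AGP to observed algal density with a time lag, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's regression: the single-predictor linear form, the function names `fit_lagged_line` and `choose_lag`, and the synthetic data are all hypothetical, and a plain grid search over the lag stands in for the Bayesian optimization that the study uses to tune parameters jointly.

```python
def fit_lagged_line(agp, density, lag):
    """Least-squares fit of density[t] = intercept + slope * agp[t - lag].

    Returns (intercept, slope, sum of squared errors).
    """
    x = agp[:len(agp) - lag]   # AGP values shifted back by `lag` steps
    y = density[lag:]          # densities they are assumed to explain
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse

def choose_lag(agp, density, max_lag=4):
    """Pick the lag with the smallest mean squared error (simple grid search)."""
    fits = {lag: fit_lagged_line(agp, density, lag) for lag in range(max_lag + 1)}
    best = min(fits, key=lambda lag: fits[lag][2] / (len(agp) - lag))
    return best, fits[best]

# Synthetic series: density follows AGP two sampling steps later.
agp_series = [100, 140, 220, 390, 610, 820, 760, 540, 310, 180, 120, 90]
density_series = [85.0, 85.0] + [5.0 + 0.8 * a for a in agp_series[:-2]]

best_lag, (intercept, slope, sse) = choose_lag(agp_series, density_series)
print(f"lag={best_lag}, intercept={intercept:.2f}, slope={slope:.2f}")
```

The time-specific strategy described above would correspond to running such a fit separately on each subset of the record (for example each year, or each high- or low-growth period) instead of once on the full series.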
3. Application Scenarios
In the analysis of real-world data, the hybrid algal population prediction model (HAPP) proposed in this study demonstrated good fitting and predictive performance when applied at different temporal scales, including the full dataset, annual datasets, and seasonal datasets. In particular, through time-specific parameter optimization, the model’s explanatory power at the annual and seasonal scales was significantly improved, enabling it to more accurately capture the intrinsic dynamics of the ecosystem. Figure 2 illustrates the simulation results of algal biomass when the model is implemented with uniform parameters, annual parameters, and seasonal parameters, respectively.
Figure 1. Graphical abstract.
Figure 2. Simulation results of algal biomass using the HAPP model with uniform parameters (a), annual parameters (b), and seasonal parameters (c).