Model Ensemble - Random Forest

Model ensembles combine predictions from multiple models to improve accuracy, robustness, and generalization. By combining diverse models through techniques such as bagging, boosting, or stacking, ensembles reduce the risk of overfitting and capture different patterns in the data. Common ways to combine predictions include averaging for regression and majority voting for classification. A random forest is itself an ensemble: it bags many decision trees and combines their votes. Overall, ensembles often outperform individual models by offsetting each model's weaknesses and errors.
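To make the two combination rules concrete, here is a minimal sketch in plain Python. The three "models" and their outputs are made-up numbers for illustration only; they are not results from this project.

```python
from collections import Counter
from statistics import mean


def average_predictions(predictions):
    """Average per-model regression outputs, sample by sample."""
    return [mean(sample) for sample in zip(*predictions)]


def majority_vote(predictions):
    """Return the most common class label per sample across models."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Three hypothetical models, four samples each (illustrative values only).
regression_outputs = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 5.0, 4.0],
    [3.0, 2.0, 4.0, 1.0],
]
class_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]

averaged = average_predictions(regression_outputs)
voted = majority_vote(class_outputs)
print(averaged)  # [2.0, 2.0, 4.0, 3.0]
print(voted)     # [1, 1, 1, 0]
```

With an odd number of binary classifiers there are no ties; with an even number, a tie-breaking rule would need to be chosen.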

As with the SVM approach in the prior section, we use a random forest to classify our snow binary label and then compare the results.
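A random forest classifier for a binary snow label might be set up roughly as follows with scikit-learn. This is a sketch under stated assumptions: the data is synthetic, the feature names (`avg_temp_c`, `temp_range_c`, `precip_mm`) and the labeling rule are invented for illustration, and the hyperparameters are not the ones used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_days = 2000

# Synthetic daily features (assumed names and units, not the project's data).
avg_temp_c = rng.normal(2.0, 8.0, n_days)
temp_range_c = rng.uniform(2.0, 20.0, n_days)
precip_mm = rng.exponential(3.0, n_days)

# Invented labeling rule: snow when the daily minimum dips below freezing
# and there is measurable precipitation. A warm average day with a large
# intraday swing can therefore still be labeled as a snow day.
daily_min_c = avg_temp_c - temp_range_c / 2.0
snow = ((daily_min_c < 0.0) & (precip_mm > 1.0)).astype(int)

X = np.column_stack([avg_temp_c, temp_range_c, precip_mm])
X_train, X_test, y_train, y_test = train_test_split(
    X, snow, test_size=0.25, random_state=0, stratify=snow
)

# Illustrative hyperparameters only.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Swapping `RandomForestClassifier` for `sklearn.svm.SVC` on the same split would give a like-for-like comparison between the two approaches.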

Results

Using a random forest instead of an SVM gives a more accurate prediction, with an accuracy of 94%. The random forest does a better job of capturing some of the more extreme or uncommon events, such as days when the average temperature is higher but intraday swings in temperature and precipitation can still produce snow. The random forest will be our model of choice for the snow binary label when projecting trends under the RCP scenarios.

After running the random forest model, we will predict using the RCP 4.5 and 8.5 precipitation and temperature data, which will give us snow day predictions to 2100.
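Turning daily predictions into yearly snow day counts could look like the sketch below. The `predict_snow` function is a hypothetical rule-based stand-in for the fitted classifier's predict step, and the four records are made up; only the counting logic is the point.

```python
from collections import Counter
from datetime import date, timedelta


def predict_snow(avg_temp_c, precip_mm):
    """Hypothetical stand-in for a fitted classifier's predict step."""
    return int(avg_temp_c < 1.0 and precip_mm > 0.5)


def snow_days_per_year(records):
    """Count predicted snow days per calendar year.

    records: iterable of (date, avg_temp_c, precip_mm) tuples.
    """
    counts = Counter()
    for day, avg_temp_c, precip_mm in records:
        counts[day.year] += predict_snow(avg_temp_c, precip_mm)
    return dict(counts)


# Tiny made-up scenario series spanning the 2099/2100 year boundary.
start = date(2099, 12, 30)
records = [
    (start + timedelta(days=0), -3.0, 2.0),  # cold and wet: snow
    (start + timedelta(days=1), 4.0, 5.0),   # too warm: no snow
    (start + timedelta(days=2), -1.0, 1.0),  # cold and wet: snow
    (start + timedelta(days=3), -2.0, 0.0),  # cold but dry: no snow
]

yearly = snow_days_per_year(records)
print(yearly)  # {2099: 1, 2100: 1}
```

Running the same count separately on the RCP 4.5 and RCP 8.5 inputs would give one yearly series per scenario to compare.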

Soil moisture is a key component of the final streamflow model. Using a single LSTM neural network, we predict soil moisture and apply the model to our RCP scenarios. There is a slight increasing trend in the RCP 4.5 scenario, suggesting more moisture in our soil. However, under RCP 8.5, drastically increasing temperatures cause a downward shift in soil moisture over time.

Model Ensemble - LSTM

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) that can learn long-term dependencies. LSTMs are well suited to sequential data such as time series, making them popular for tasks like speech recognition, language modeling, and time series forecasting. LSTMs have memory cells that can maintain information over long periods, allowing them to learn complex patterns and relationships in the data.
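For reference, the memory cell described above is commonly written as the following update equations. This is the standard textbook formulation, not necessarily the exact variant used in this project. The symbols are the usual conventions: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets information persist across many time steps: when $f_t$ is close to 1 and $i_t$ is close to 0, $c_t$ carries $c_{t-1}$ forward nearly unchanged.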

For our final modeling, we build an LSTM ensemble that uses RCP 4.5 and RCP 8.5 temperature and precipitation, together with the modeled snow binary label and soil moisture predictions, as features. Training is split into 5 folds, and the results from the folds are averaged.
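The fold-and-average step can be sketched independently of the network itself. In the example below, a plain least-squares model stands in for the LSTM to keep the code short and dependency-light, so this is not the project's model. The contiguous, unshuffled fold construction is also an assumption, since the source does not say how the folds were built. All data is synthetic.

```python
import numpy as np


def fit_linear(X, y):
    """Stand-in model: least-squares fit with an intercept (not an LSTM)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with coefficients returned by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


def fold_ensemble_predict(X_train, y_train, X_new, n_folds=5):
    """Train one model per fold and average the fold models' predictions.

    Each fold model is fit on all rows except one contiguous block, which
    keeps the time ordering inside the remaining rows intact.
    """
    all_rows = np.arange(len(X_train))
    preds = []
    for held_out in np.array_split(all_rows, n_folds):
        keep = np.setdiff1d(all_rows, held_out)
        coef = fit_linear(X_train[keep], y_train[keep])
        preds.append(predict_linear(coef, X_new))
    return np.mean(preds, axis=0)


# Synthetic data: four feature columns standing in for temperature,
# precipitation, the snow label, and soil moisture (assumed layout).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
true_w = np.array([1.5, -2.0, 0.5, 3.0])
y_train = X_train @ true_w + rng.normal(scale=0.1, size=200)
X_new = rng.normal(size=(10, 4))

ensemble_pred = fold_ensemble_predict(X_train, y_train, X_new)
print(ensemble_pred.shape)  # (10,)
```

Replacing `fit_linear` and `predict_linear` with functions that train and apply an LSTM would leave the averaging logic unchanged.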

Conclusions

Using a random forest ensemble for snow, we are able to apply the model's predictions to both RCP 4.5 and RCP 8.5 (moderate and severe climate projections). The results generally make sense: snow days show an overall decreasing trend as temperature is expected to rise in both scenarios, and there are fewer snow days on average under RCP 8.5 than under RCP 4.5.

Next, we apply an LSTM model to soil moisture, a key component of streamflow (our ultimate goal). The model is generally able to pick up how soil moisture changes over time. Applied to the RCP scenario data, it shows a flat to slightly positive trend under RCP 4.5. One possible explanation is that increased temperature causes earlier snowmelt, resulting in a slightly moister and more volatile environment. Under RCP 8.5, we see a clear downward trend that indicates drier soil. These two modeled features feed into our ultimate streamflow prediction seen below.

Using an LSTM ensemble, we are able to capture a fair amount of the volatility in our streamflow testing data. In general, the model does a poor job of predicting the extreme swings in streamflow (both high and low, in cubic feet per second). However, the model may still be useful for uncovering broader average trends rather than predicting extreme events. We therefore aggregate results at the yearly level to look for trends in streamflow over time. With increasing temperatures, fewer snow days, and generally decreasing soil moisture, the effects of climate change play out in our modeled data. The overall trend suggests that mean streamflow will decrease over the next 70-80 years, a potentially serious problem for the downstream states that rely on this water.
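The yearly aggregation and trend check described above can be sketched with NumPy as follows. The daily series here is synthetic, with a built-in decline of 2 units per year plus noise, and the 2025-2100 range and flow magnitudes are assumptions chosen for illustration. The point is only that a trend hidden in noisy daily values becomes visible in yearly means.

```python
import numpy as np


def yearly_means(years, values):
    """Mean of `values` for each distinct year, in ascending year order."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    unique_years = np.unique(years)
    means = np.array([values[years == y].mean() for y in unique_years])
    return unique_years, means


def linear_trend(unique_years, means):
    """Slope of a least-squares line through the yearly means (units per year)."""
    slope, _intercept = np.polyfit(unique_years, means, deg=1)
    return slope


# Synthetic daily streamflow: a known downward drift buried in daily noise.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(2025, 2101), 365)
flow_cfs = 1000.0 - 2.0 * (years - 2025) + rng.normal(scale=150.0, size=years.size)

unique_years, means = yearly_means(years, flow_cfs)
slope = linear_trend(unique_years, means)
print(f"estimated trend: {slope:.2f} cfs per year")
```

A negative slope corresponds to declining mean streamflow. A straight-line fit is only one simple way to summarize the trend.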

Lastly, we analyze the yearly seasonality and how it may shift given projected higher temperatures. Based on the above graphs, one can observe the model's inability to predict the extremes of peak streamflow, especially going into the winter months. However, the model does give some indication that, on average, early to late spring may produce higher streamflows due to earlier snowmelt, summer may be drier than normal, and the winter months may fail to produce historically peak streamflows.