3 resultados para multivariate regression tree

em Digital Commons - Michigan Tech


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Standard procedures for forecasting flood risk (Bulletin 17B) assume annual maximum flood (AMF) series are stationary, meaning the distribution of flood flows is not significantly affected by climatic trends/cycles, or anthropogenic activities within the watershed. Historical flood events are therefore considered representative of future flood occurrences, and the risk associated with a given flood magnitude is modeled as constant over time. However, in light of increasing evidence to the contrary, this assumption should be reconsidered, especially as the existence of nonstationarity in AMF series can have significant impacts on planning and management of water resources and relevant infrastructure. Research presented in this thesis quantifies the degree of nonstationarity evident in AMF series for unimpaired watersheds throughout the contiguous U.S., identifies meteorological, climatic, and anthropogenic causes of this nonstationarity, and proposes an extension of the Bulletin 17B methodology which yields forecasts of flood risk that reflect climatic influences on flood magnitude. To appropriately forecast flood risk, it is necessary to consider the driving causes of nonstationarity in AMF series. Herein, large-scale climate patterns—including El Niño-Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), and Atlantic Multidecadal Oscillation (AMO)—are identified as influencing factors on flood magnitude at numerous stations across the U.S. Strong relationships between flood magnitude and associated precipitation series were also observed for the majority of sites analyzed in the Upper Midwest and Northeastern regions of the U.S. Although relationships between flood magnitude and associated temperature series are not apparent, results do indicate that temperature is highly correlated with the timing of flood peaks. Despite consideration of watersheds classified as unimpaired, analyses also suggest that identified change-points in AMF series are due to dam construction, and other types of regulation and diversion. Although not explored herein, trends in AMF series are also likely to be partially explained by changes in land use and land cover over time. Results obtained herein suggest that improved forecasts of flood risk may be obtained using a simple modification of the Bulletin 17B framework, wherein the mean and standard deviation of the log-transformed flows are modeled as functions of climate indices associated with oceanic-atmospheric patterns (e.g. AMO, ENSO, NAO, and PDO) with lead times between 3 and 9 months. Herein, one-year ahead forecasts of the mean and standard deviation, and subsequently flood risk, are obtained by applying site specific multivariate regression models, which reflect the phase and intensity of a given climate pattern, as well as possible impacts of coupling of the climate cycles. These forecasts of flood risk are compared with forecasts derived using the existing Bulletin 17B model; large differences in the one-year ahead forecasts are observed in some locations. The increased knowledge of the inherent structure of AMF series and an improved understanding of physical and/or climatic causes of nonstationarity gained from this research should serve as insight for the formulation of a physical-casual based statistical model, incorporating both climatic variations and human impacts, for flood risk over longer planning horizons (e.g., 10-, 50, 100-years) necessary for water resources design, planning, and management.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background mortality is an essential component of any forest growth and yield model. Forecasts of mortality contribute largely to the variability and accuracy of model predictions at the tree, stand and forest level. In the present study, I implement and evaluate state-of-the-art techniques to increase the accuracy of individual tree mortality models, similar to those used in many of the current variants of the Forest Vegetation Simulator, using data from North Idaho and Montana. The first technique addresses methods to correct for bias induced by measurement error typically present in competition variables. The second implements survival regression and evaluates its performance against the traditional logistic regression approach. I selected the regression calibration (RC) algorithm as a good candidate for addressing the measurement error problem. Two logistic regression models for each species were fitted, one ignoring the measurement error, which is the “naïve” approach, and the other applying RC. The models fitted with RC outperformed the naïve models in terms of discrimination when the competition variable was found to be statistically significant. The effect of RC was more obvious where measurement error variance was large and for more shade-intolerant species. The process of model fitting and variable selection revealed that past emphasis on DBH as a predictor variable for mortality, while producing models with strong metrics of fit, may make models less generalizable. The evaluation of the error variance estimator developed by Stage and Wykoff (1998), and core to the implementation of RC, in different spatial patterns and diameter distributions, revealed that the Stage and Wykoff estimate notably overestimated the true variance in all simulated stands, but those that are clustered. Results show a systematic bias even when all the assumptions made by the authors are guaranteed. I argue that this is the result of the Poisson-based estimate ignoring the overlapping area of potential plots around a tree. Effects, especially in the application phase, of the variance estimate justify suggested future efforts of improving the accuracy of the variance estimate. The second technique implemented and evaluated is a survival regression model that accounts for the time dependent nature of variables, such as diameter and competition variables, and the interval-censored nature of data collected from remeasured plots. The performance of the model is compared with the traditional logistic regression model as a tool to predict individual tree mortality. Validation of both approaches shows that the survival regression approach discriminates better between dead and alive trees for all species. In conclusion, I showed that the proposed techniques do increase the accuracy of individual tree mortality models, and are a promising first step towards the next generation of background mortality models. I have also identified the next steps to undertake in order to advance mortality models further.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The municipality of San Juan La Laguna, Guatemala is home to approximately 5,200 people and located on the western side of the Lake Atitlán caldera. Steep slopes surround all but the eastern side of San Juan. The Lake Atitlán watershed is susceptible to many natural hazards, but most predictable are the landslides that can occur annually with each rainy season, especially during high-intensity events. Hurricane Stan hit Guatemala in October 2005; the resulting flooding and landslides devastated the Atitlán region. Locations of landslide and non-landslide points were obtained from field observations and orthophotos taken following Hurricane Stan. This study used data from multiple attributes, at every landslide and non-landslide point, and applied different multivariate analyses to optimize a model for landslides prediction during high-intensity precipitation events like Hurricane Stan. The attributes considered in this study are: geology, geomorphology, distance to faults and streams, land use, slope, aspect, curvature, plan curvature, profile curvature and topographic wetness index. The attributes were pre-evaluated for their ability to predict landslides using four different attribute evaluators, all available in the open source data mining software Weka: filtered subset, information gain, gain ratio and chi-squared. Three multivariate algorithms (decision tree J48, logistic regression and BayesNet) were optimized for landslide prediction using different attributes. The following statistical parameters were used to evaluate model accuracy: precision, recall, F measure and area under the receiver operating characteristic (ROC) curve. The algorithm BayesNet yielded the most accurate model and was used to build a probability map of landslide initiation points. The probability map developed in this study was also compared to the results of a bivariate landslide susceptibility analysis conducted for the watershed, encompassing Lake Atitlán and San Juan. Landslides from Tropical Storm Agatha 2010 were used to independently validate this study’s multivariate model and the bivariate model. The ultimate aim of this study is to share the methodology and results with municipal contacts from the author's time as a U.S. Peace Corps volunteer, to facilitate more effective future landslide hazard planning and mitigation.