973 results for "ensemble methods"


Relevance: 30.00%

Abstract:

Advanced forecasting of space weather requires simulation of the whole Sun-to-Earth system, which necessitates driving magnetospheric models with the outputs from solar wind models. This presents a fundamental difficulty, as the magnetosphere is sensitive both to large-scale solar wind structures, which can be captured by solar wind models, and to small-scale solar wind “noise,” which is far below typical solar wind model resolution and results primarily from stochastic processes. Following similar approaches in terrestrial climate modeling, we propose statistical “downscaling” of solar wind model results prior to their use as input to a magnetospheric model. As the magnetospheric response can be highly nonlinear, this is preferable to downscaling the results of magnetospheric modeling. To demonstrate the benefit of this approach, we first approximate solar wind model output by smoothing solar wind observations with an 8 h filter, then add small-scale structure back in through the addition of random noise with the observed spectral characteristics. Here we use a very simple parameterization of noise based upon the observed probability distribution functions of solar wind parameters, but more sophisticated methods will be developed in the future. An ensemble of results from the simple downscaling scheme is tested using a model-independent method and shown to add value to the magnetospheric forecast, both improving the best estimate and quantifying the uncertainty. We suggest a number of features desirable in an operational solar wind downscaling scheme.
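
A minimal sketch of the downscaling idea, using only NumPy and synthetic hourly data. The function name and the residual-resampling noise model are illustrative assumptions: resampling residuals reproduces the one-point probability distribution of the small-scale noise but not its spectral characteristics, which a full scheme would also need to match.

```python
import numpy as np

def downscale_ensemble(obs, dt_hours=1.0, window_hours=8.0, n_members=20, seed=0):
    """Build a pseudo-ensemble: smooth a solar wind time series with an
    8 h boxcar filter (a stand-in for coarse model output), then restore
    small-scale structure by resampling the smoothing residuals."""
    rng = np.random.default_rng(seed)
    w = max(1, int(round(window_hours / dt_hours)))
    smooth = np.convolve(obs, np.ones(w) / w, mode="same")  # "model-like" large scales
    resid = obs - smooth                                    # observed small-scale "noise"
    # Draw noise from the empirical residual distribution for each member.
    members = smooth + rng.choice(resid, size=(n_members, obs.size), replace=True)
    return smooth, members

# Usage with synthetic hourly data (10 days):
t = np.arange(240.0)
obs = 400 + 50 * np.sin(2 * np.pi * t / 72) \
      + 20 * np.random.default_rng(1).standard_normal(t.size)
smooth, members = downscale_ensemble(obs)
```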

Relevance: 30.00%

Abstract:

With movement toward kilometer-scale ensembles, new techniques are needed for their characterization. A new methodology is presented for detailed spatial ensemble characterization using the fractions skill score (FSS). To evaluate spatial forecast differences, the average and standard deviation are taken of the FSS calculated over all ensemble member–member pairs at different scales and lead times. These methods were found to give important information about ensemble behavior, allowing identification of useful spatial scales, spin-up times for the model, and the upscale growth of errors and forecast differences. The ensemble spread was found to be highly dependent on the spatial scales considered and on the threshold applied to the field. High thresholds picked out localized and intense values that gave large temporal variability in ensemble spread: local processes and undersampling dominate for these thresholds. For lower thresholds the ensemble spread increases with time as differences between the ensemble members grow upscale. Two convective cases were investigated using the Met Office Unified Model run at 2.2-km resolution. Different ensemble types were considered: ensembles produced using the Met Office Global and Regional Ensemble Prediction System (MOGREPS) and an ensemble produced using different model physics configurations. Comparison of the MOGREPS and multiphysics ensembles demonstrated the utility of spatial ensemble evaluation techniques for assessing the impact of different perturbation strategies, and the need for assessing spread at different, believable spatial scales.
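
The member–member FSS statistics can be sketched as follows. This is an illustrative implementation, not the authors' code, assuming 2-D fields and a square neighbourhood computed with `scipy.ndimage.uniform_filter`:

```python
import numpy as np
from itertools import combinations
from scipy.ndimage import uniform_filter

def fss(field_a, field_b, threshold, scale):
    """Fractions skill score between two fields at one neighbourhood scale."""
    pa = uniform_filter((field_a >= threshold).astype(float), size=scale)
    pb = uniform_filter((field_b >= threshold).astype(float), size=scale)
    mse = np.mean((pa - pb) ** 2)
    mse_ref = np.mean(pa ** 2) + np.mean(pb ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

def pairwise_fss(members, threshold, scale):
    """Mean and standard deviation of FSS over all member-member pairs."""
    scores = [fss(members[i], members[j], threshold, scale)
              for i, j in combinations(range(len(members)), 2)]
    return np.nanmean(scores), np.nanstd(scores)

# Example: 12-member ensemble of 2-D rain fields, 1 mm/h threshold, 11-pixel scale
rng = np.random.default_rng(0)
ens = rng.gamma(0.5, 2.0, size=(12, 100, 100))
mean_fss, std_fss = pairwise_fss(ens, threshold=1.0, scale=11)
```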

Relevance: 30.00%

Abstract:

The skill of a forecast can be assessed by comparing the relative proximity of both the forecast and a benchmark to the observations. Example benchmarks include climatology or a naïve forecast. Hydrological ensemble prediction systems (HEPS) are currently transforming the hydrological forecasting environment, but in this new field there is little information to guide researchers and operational forecasters on how benchmarks can best be used to evaluate their probabilistic forecasts. This study identifies that the calculated forecast skill can vary depending on the benchmark selected, and that the selection of a benchmark for determining forecasting system skill is sensitive to a number of hydrological and system factors. A benchmark intercomparison experiment is then undertaken using the continuous ranked probability score (CRPS), a reference forecasting system and a suite of 23 different methods to derive benchmarks. The benchmarks are assessed within the operational set-up of the European Flood Awareness System (EFAS) to determine those that are ‘toughest to beat’ and so give the most robust discrimination of forecast skill, particularly for the spatial average fields that EFAS relies upon. Evaluating against an observed discharge proxy, the benchmark found to have the most utility for EFAS, avoiding naïve skill across different hydrological situations, is meteorological persistency. This benchmark uses the latest meteorological observations of precipitation and temperature to drive the hydrological model. Hydrological long-term average benchmarks, which are currently used in EFAS, are very easily beaten by the forecasting system, and their use produces much naïve skill. When decomposed into seasons, the advanced meteorological benchmarks, which make use of meteorological observations from the past 20 years at the same calendar date, have the most skill discrimination. They are also good at discriminating skill in low flows and for all catchment sizes. Simpler meteorological benchmarks are particularly useful for high flows. The recommendation for EFAS is to move to routine use of meteorological persistency, an advanced meteorological benchmark and a simple meteorological benchmark in order to provide a robust evaluation of forecast skill. This work provides the first comprehensive evidence on how benchmarks can be used in the evaluation of skill in probabilistic hydrological forecasts and on which benchmarks are most useful for skill discrimination and avoidance of naïve skill in a large-scale HEPS. It is recommended that all HEPS use the evidence and methodology provided here to evaluate which benchmarks to employ, so that forecasters can trust their skill evaluation and be confident that their forecasts are indeed better.
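
A hedged sketch of how a benchmark enters the skill calculation: the empirical CRPS of the forecasting system is compared against the CRPS of a benchmark via a skill score. The formulas are the standard ensemble-CRPS and CRPSS definitions; the data below are synthetic stand-ins.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of one ensemble forecast against a scalar observation:
    CRPS = E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

def skill_score(crps_forecast, crps_benchmark):
    """CRPSS: 1 is perfect, 0 means no better than the benchmark,
    negative means the benchmark wins."""
    return 1.0 - crps_forecast / crps_benchmark

# Toy example: a forecast ensemble against a crude benchmark ensemble.
# A 'tough to beat' benchmark drives the CRPSS toward zero.
rng = np.random.default_rng(42)
obs = 3.2
fc = rng.normal(3.0, 0.5, size=51)      # forecasting system
bench = rng.normal(2.0, 1.5, size=51)   # benchmark (e.g. persistence-style)
crpss = skill_score(crps_ensemble(fc, obs), crps_ensemble(bench, obs))
```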

Relevance: 30.00%

Abstract:

We systematically compare the performance of ETKF-4DVAR, 4DVAR-BEN and 4DENVAR with respect to two traditional methods (4DVAR and ETKF) and an ensemble transform Kalman smoother (ETKS) on the Lorenz 1963 model. We specifically investigate this performance with increasing nonlinearity, using a quasi-static variational assimilation algorithm as a comparison. Using the analysis root mean square error (RMSE) as a metric, the methods are compared with respect to (1) assimilation window length and observation interval size and (2) ensemble size, in order to investigate the influence of hybrid background error covariance matrices and nonlinearity on their performance. For short assimilation windows with close to linear dynamics, all hybrid methods show an improvement in RMSE compared to the traditional methods. For long assimilation windows, in which nonlinear dynamics are substantial, the variational framework can have difficulties finding the global minimum of the cost function, so we explore a quasi-static variational assimilation (QSVA) framework. Among the hybrid methods, those that do not use a climatological background error covariance matrix are seen, under certain parameters, not to need QSVA to perform accurately. Generally, results show that the ETKS and the hybrid methods that do not use a climatological background error covariance matrix, when combined with QSVA, outperform all other methods, owing to the full flow dependency of the background error covariance matrix, which also allows for the most nonlinearity.
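
For context, a minimal Lorenz (1963) test bed of the kind used in such comparisons: an RK4 integrator and the analysis RMSE metric. This sketch covers only the common scaffolding, not the hybrid assimilation schemes themselves.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def analysis_rmse(truth, analysis):
    """RMSE metric used to compare assimilation schemes."""
    return np.sqrt(np.mean((np.asarray(truth) - np.asarray(analysis)) ** 2))

# Spin up a truth trajectory; an assimilation window is a slice of this.
state, dt, truth = np.array([1.0, 1.0, 1.0]), 0.01, []
for _ in range(1000):
    state = rk4_step(lorenz63, state, dt)
    truth.append(state)
```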

Relevance: 30.00%

Abstract:

A smoother introduced earlier by van Leeuwen and Evensen is applied to a problem in which real observations are used in an area with strongly nonlinear dynamics. The derivation is new, but it resembles an earlier derivation by van Leeuwen and Evensen. Again a Bayesian view is taken, in which the prior probability density of the model and the probability density of the observations are combined to form a posterior density. The mean and the covariance of this density give the variance-minimizing model evolution and its errors. The assumption is made that the prior probability density is a Gaussian, leading to a linear update equation. Critical evaluation shows when the assumption is justified. This also sheds light on why Kalman filters, in which the same approximation is made, work for nonlinear models. By reference to the derivation, the impact of model and observational biases on the equations is discussed, and it is shown that Bayes's formulation can still be used. A practical advantage of the ensemble smoother is that no adjoint equations have to be integrated and that error estimates are easily obtained. The present application shows that for process studies a smoother will give superior results compared to a filter, not only owing to the smooth transitions at observation points, but also because the origin of features can be followed back in time. Its preference over a strong-constraint method is also highlighted. Furthermore, it is argued that the proposed smoother is more efficient than gradient descent methods or the representer method when error estimates are taken into account.
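
The Gaussian assumption on the prior is what makes the update equation linear. In standard notation (a rendering consistent with the abstract, not copied from the paper: the model state, the observations d, and a measurement operator H), the posterior follows from Bayes's formulation and its variance-minimizing mean is

```latex
p(\psi \mid d) \;\propto\; p(d \mid \psi)\, p(\psi), \qquad
\psi^{a} = \psi^{f} + \mathbf{P}^{f}\mathbf{H}^{\mathsf{T}}
\left(\mathbf{H}\mathbf{P}^{f}\mathbf{H}^{\mathsf{T}} + \mathbf{R}\right)^{-1}
\left(d - \mathbf{H}\psi^{f}\right),
```

where P^f is the prior (ensemble) covariance and R the observation-error covariance; with the state taken as a full model trajectory rather than a single-time state, the same update yields the smoother rather than the filter.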

Relevance: 30.00%

Abstract:

The weak-constraint inverse for nonlinear dynamical models is discussed and derived in terms of a probabilistic formulation. The well-known result that for Gaussian error statistics the minimum of the weak-constraint inverse is equal to the maximum-likelihood estimate is rederived. Then several methods based on ensemble statistics that can be used to find the smoother (as opposed to the filter) solution are introduced and compared to traditional methods. A strong point of the new methods is that they avoid the integration of adjoint equations, which is a complex task for real oceanographic or atmospheric applications. They also avoid iterative searches in a Hilbert space, and error estimates can be obtained without much additional computational effort. The feasibility of the new methods is illustrated in a two-layer quasigeostrophic model.
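
A minimal sketch of such an ensemble-statistics update (stochastic form, with all covariances estimated from the ensemble, so no adjoint integrations are required). The function name and interface are assumptions for illustration:

```python
import numpy as np

def ensemble_smoother_update(X, Y, d, R, seed=0):
    """One ensemble smoother analysis step (stochastic form).

    X : (n, N) ensemble of states (stacked over time for a smoother)
    Y : (m, N) ensemble of predicted observations, Y = H(X)
    d : (m,)   observation vector
    R : (m, m) observation-error covariance
    All covariances come from ensemble statistics; no adjoint is needed.
    """
    d = np.asarray(d, dtype=float)
    n, N = X.shape
    rng = np.random.default_rng(seed)
    Xp = X - X.mean(axis=1, keepdims=True)          # state anomalies
    Yp = Y - Y.mean(axis=1, keepdims=True)          # observation anomalies
    C_xy = Xp @ Yp.T / (N - 1)                      # cross-covariance
    C_yy = Yp @ Yp.T / (N - 1) + R                  # innovation covariance
    # Perturbed observations, one realization per member:
    D = d[:, None] + rng.multivariate_normal(np.zeros(d.size), R, size=N).T
    return X + C_xy @ np.linalg.solve(C_yy, D - Y)  # linear (Gaussian) update
```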

Relevance: 30.00%

Abstract:

Accurate knowledge of the location and magnitude of ocean heat content (OHC) variability and change is essential for understanding the processes that govern decadal variations in surface temperature, quantifying changes in the planetary energy budget, and developing constraints on the transient climate response to external forcings. We present an overview of the temporal and spatial characteristics of OHC variability and change as represented by an ensemble of dynamical and statistical ocean reanalyses (ORAs). Spatial maps of the 0–300 m layer show large regions of the Pacific and Indian Oceans where the interannual variability of the ensemble mean exceeds the ensemble spread, indicating that OHC variations are well constrained by the available observations over the period 1993–2009. At deeper levels, the ORAs are less well constrained by observations, with the largest differences across the ensemble mostly associated with areas of high eddy kinetic energy, such as the Southern Ocean and boundary current regions. Spatial patterns of OHC change for the period 1997–2009 show good agreement in the upper 300 m and are characterized by a strong dipole pattern in the Pacific Ocean. There is less agreement in the patterns of change at deeper levels, potentially linked to differences in the representation of ocean dynamics, such as water mass formation processes. However, the Atlantic and Southern Oceans are regions in which many ORAs show widespread warming below 700 m over the period 1997–2009. Annual time series of global and hemispheric OHC change for 0–700 m show the largest spread for the data-sparse Southern Hemisphere, and a number of ORAs seem to be subject to a large initialization ‘shock’ over the first few years. In agreement with previous studies, a number of ORAs exhibit enhanced ocean heat uptake below 300 m and 700 m during the mid-1990s or early 2000s. The ORA ensemble mean (±1 standard deviation) of rolling 5-year trends in full-depth OHC shows a relatively steady heat uptake of approximately 0.9 ± 0.8 W m−2 (expressed relative to Earth's surface area) between 1995 and 2002, which reduces to about 0.2 ± 0.6 W m−2 between 2004 and 2006, in qualitative agreement with recent analyses of Earth's energy imbalance. There is a marked reduction in the ensemble spread of OHC trends below 300 m as the Argo profiling float observations become available in the early 2000s. In general, we suggest that ORAs should be treated with caution when employed to understand past ocean warming trends, especially in the deeper ocean, where there is little in the way of observational constraints. The current work emphasizes the need to better observe the deep ocean, both to provide observational constraints for future ocean state estimation efforts and to develop improved models and data assimilation methods.
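
The rolling 5-year trend diagnostic can be illustrated as follows, with synthetic OHC series standing in for the ORAs. The surface-area constant and the unit conversion are standard; everything else is an assumption for illustration.

```python
import numpy as np

EARTH_SURFACE_M2 = 5.1e14   # Earth's surface area, for W m^-2 normalization
SECONDS_PER_YEAR = 3.156e7

def rolling_trends_wm2(ohc_joules, years, window=5):
    """Rolling linear trends of annual full-depth OHC (J) for one reanalysis,
    expressed as W per square metre of Earth's surface."""
    trends = []
    for i in range(len(years) - window + 1):
        slope = np.polyfit(years[i:i + window],
                           ohc_joules[i:i + window], 1)[0]   # J / yr
        trends.append(slope / SECONDS_PER_YEAR / EARTH_SURFACE_M2)
    return np.array(trends)

# Ensemble mean +/- 1 standard deviation across reanalyses (rows = ORAs):
years = np.arange(1993, 2010)
rng = np.random.default_rng(3)
ensemble = 1e22 * np.cumsum(rng.normal(0.5, 0.2, size=(10, years.size)), axis=1)
trends = np.array([rolling_trends_wm2(m, years) for m in ensemble])
mean, spread = trends.mean(axis=0), trends.std(axis=0)
```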

Relevance: 30.00%

Abstract:

A set of four eddy-permitting global ocean reanalyses produced in the framework of the MyOcean project have been compared over the altimetry period 1993–2011. The main differences among the reanalyses used here come from the data assimilation scheme implemented to control the ocean state by inserting reprocessed observations of sea surface temperature (SST), in situ temperature and salinity profiles, sea level anomaly and sea-ice concentration. A first objective of this work is to assess the interannual variability and trends for a series of parameters usually considered in the community as essential ocean variables: SST, sea surface salinity, temperature and salinity averaged over meaningful layers of the water column, sea level, transports across pre-defined sections, and sea ice parameters. The eddy-permitting nature of the global reanalyses also allows eddy kinetic energy to be estimated. The results show that in general there is good consistency between the different reanalyses. An intercomparison against experiments without data assimilation was carried out during the MyOcean project, and we conclude that data assimilation is crucial for correctly simulating some quantities, such as regional trends of sea level and eddy kinetic energy. A second objective is to show that the ensemble mean of the reanalyses can be evaluated as one single system with regard to its reliability in reproducing climate signals, where both variability and uncertainties are assessed through the ensemble spread and the signal-to-noise ratio. The main advantage of having access to several reanalyses that differ in how data assimilation is performed is that it becomes possible to assess part of the total uncertainty. Given that we use very similar ocean models and atmospheric forcing, we can conclude that the spread of the ensemble of reanalyses is mainly representative of uncertainty in the assimilation methods. This uncertainty varies considerably from one ocean parameter to another, especially in global indices. However, despite several caveats in the design of the multi-system ensemble, the main conclusion from this study is that the eddy-permitting multi-system ensemble approach has become mature, and our results provide a first step towards a systematic comparison of eddy-permitting global ocean reanalyses aimed at providing robust conclusions on the recent evolution of the oceanic state.
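
A short sketch of the reliability diagnostic mentioned above, signal-to-noise as ensemble-mean magnitude over ensemble spread. The interface is an assumption; a ratio well above 1 marks climate signals that are robust to differences among the assimilation methods.

```python
import numpy as np

def signal_to_noise(anomalies):
    """Signal-to-noise ratio of a multi-system ensemble of reanalyses.

    anomalies : (n_systems, n_times) array of a climate index (e.g. a
    regional sea level anomaly) from each reanalysis.
    Signal = |ensemble mean|; noise = spread across systems."""
    mean = anomalies.mean(axis=0)
    spread = anomalies.std(axis=0, ddof=1)
    return np.abs(mean) / np.where(spread > 0, spread, np.nan)
```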

Relevance: 30.00%

Abstract:

In this work we present the idea of how generalized ensembles can be used to simplify the operational study of non-additive physical systems. As an alternative to the usual methods of direct integration or mean-field theory, we show how the solution of the Ising model with infinite-range interactions is obtained using a generalized canonical ensemble. We describe how the thermodynamical properties of this model in the presence of an external magnetic field are obtained from simple parametric equations. Without impairing the usual interpretation, we obtain critical behaviour identical to that observed in traditional approaches.
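
For comparison with the traditional route the abstract refers to, the infinite-range (Curie-Weiss) Ising model's mean-field self-consistency relation m = tanh(beta * (J*m + h)) can be solved by fixed-point iteration. This sketch shows that standard approach, not the generalized-ensemble parametric construction proposed in the paper.

```python
import numpy as np

def magnetization(beta, h, J=1.0, tol=1e-12, max_iter=10_000):
    """Fixed-point solution of the Curie-Weiss self-consistency
    relation m = tanh(beta * (J * m + h))."""
    m = 1.0 if h >= 0 else -1.0          # start from a saturated guess
    for _ in range(max_iter):
        m_new = np.tanh(beta * (J * m + h))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Spontaneous magnetization appears below the critical point beta * J = 1:
print(magnetization(beta=0.9, h=0.0))   # ~ 0    (paramagnetic phase)
print(magnetization(beta=1.2, h=0.0))   # ~ 0.66 (ferromagnetic branch)
```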

Relevance: 30.00%

Abstract:

In condensed matter systems, the interfacial tension plays a central role in a multitude of phenomena. It is the driving force for nucleation processes, determines the shape and structure of crystalline structures, and is important for industrial applications. Despite its importance, the interfacial tension is hard to determine in experiments and also in computer simulations. While sophisticated simulation methods exist to compute liquid-vapor interfacial tensions, current methods for solid-liquid interfaces produce unsatisfactory results.

As a first approach to this topic, the influence of the interfacial tension on nuclei is studied within the three-dimensional Ising model. This model is well suited because, despite its simplicity, one can learn much about the nucleation of crystalline nuclei. Below the so-called roughening temperature, nuclei in the Ising model are no longer spherical but become cubic because of the anisotropy of the interfacial tension. This is similar to crystalline nuclei, which are in general not spherical but more like a convex polyhedron with flat facets on the surface. In this context, the problem of distinguishing between the two bulk phases in the vicinity of the diffuse droplet surface is addressed. A new definition is found which correctly determines the volume of a droplet in a given configuration when compared to the volume predicted by simple macroscopic assumptions.

To compute the interfacial tension of solid-liquid interfaces, a new Monte Carlo method called the "ensemble switch method" is presented, which allows the interfacial tension of liquid-vapor as well as solid-liquid interfaces to be computed with great accuracy. In the past, the dependence of the interfacial tension on the finite size and shape of the simulation box has often been neglected, although there is a nontrivial dependence on the box dimensions. As a consequence, one needs to systematically increase the box size and extrapolate to infinite volume in order to accurately predict the interfacial tension. Therefore, a thorough finite-size scaling analysis is established in this thesis. Logarithmic corrections to the finite-size scaling are motivated and identified; these are of leading order and therefore must not be neglected. The astounding feature of these logarithmic corrections is that they do not depend at all on the model under consideration. Using the ensemble switch method, the validity of a finite-size scaling ansatz containing the aforementioned logarithmic corrections is carefully tested and confirmed. Combining the finite-size scaling theory with the ensemble switch method, the interfacial tension of several model systems, ranging from the Ising model to colloidal systems, is computed with great accuracy.
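
The extrapolation step can be illustrated with a least-squares fit of a scaling ansatz with a leading logarithmic correction, here assumed to be of the form gamma(L) = gamma_inf + (a*ln L + b)/L^(d-1). The exact form of the corrections in the thesis may differ, so treat this as a schematic.

```python
import numpy as np

def extrapolate_gamma(L, gamma_L, d=3):
    """Least-squares fit of the finite-size scaling ansatz
        gamma(L) = gamma_inf + (a * ln L + b) / L**(d - 1)
    and extrapolation to the infinite-volume interfacial tension."""
    L = np.asarray(L, dtype=float)
    A = np.column_stack([
        np.ones_like(L),
        np.log(L) / L ** (d - 1),
        1.0 / L ** (d - 1),
    ])
    coeffs, *_ = np.linalg.lstsq(A, gamma_L, rcond=None)
    return coeffs[0]   # gamma_inf

# Synthetic data following the ansatz; the fit recovers gamma_inf = 0.10:
L = np.array([8, 12, 16, 24, 32, 48], dtype=float)
gamma_L = 0.10 + (0.05 * np.log(L) - 0.02) / L ** 2
print(extrapolate_gamma(L, gamma_L))   # ~ 0.10
```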

Relevance: 30.00%

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained. In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science-derived data analysis methods, predominantly machine learning approaches, was proposed and explored in this study. The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational outcomes based on literature-derived databases, and to compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride and trichloroethylene were used. When compared to regression estimations, results showed better accuracy of decision tree/ensemble techniques for the categorical case, while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when applied to other methodologies that combine 'expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent a starting point for independence from expert judgment.
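
A hedged sketch of the comparison design, using scikit-learn with synthetic data in place of the literature-derived exposure databases; model choices and hyperparameters are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of exposure determinants and two kinds of outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y_rating = rng.integers(0, 4, size=300)            # AIHA-style exposure rating
y_logc = X[:, 0] + rng.normal(0, 0.5, 300)         # log-transformed concentration

# Categorical case: tree ensemble vs neural network, 5-fold cross-validation
rf_acc = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y_rating, cv=5).mean()
nn_acc = cross_val_score(make_pipeline(StandardScaler(),
                                       MLPClassifier(max_iter=2000, random_state=0)),
                         X, y_rating, cv=5).mean()

# Continuous case: scored by R^2
gb_r2 = cross_val_score(GradientBoostingRegressor(random_state=0),
                        X, y_logc, cv=5).mean()
nn_r2 = cross_val_score(make_pipeline(StandardScaler(),
                                      MLPRegressor(max_iter=5000, random_state=0)),
                        X, y_logc, cv=5).mean()
```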

Relevance: 30.00%

Abstract:

Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependencies that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampal subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD.
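
The pipeline of multivariate feature subset selection followed by an ensemble of probabilistic classifiers can be sketched with scikit-learn. Gaussian naive Bayes is used here as a simple stand-in for the Bayesian network classifiers of the paper, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for AD-vs-control transcript profiles:
# few samples, many features, a small informative subset.
X, y = make_classification(n_samples=60, n_features=500,
                           n_informative=20, random_state=0)

# Feature subset selection, then a bagged ensemble of probabilistic classifiers.
model = make_pipeline(
    SelectKBest(mutual_info_classif, k=30),
    BaggingClassifier(GaussianNB(), n_estimators=25, random_state=0),
)
print(cross_val_score(model, X, y, cv=5).mean())
```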

Relevance: 30.00%

Abstract:

Variations in the physical deformation of the plasma membrane play a significant role in the sorting and behavior of the proteins that occupy it. Determining the interplay between membrane curvature and protein behavior required the development and thorough characterization of a model plasma membrane with well-defined and localized regions of curvature. This model system consists of a fluid lipid bilayer supported by a glass substrate patterned with dye-loaded polystyrene nanoparticles. As the physical deformation of the supported lipid bilayer is essential to our understanding of the behavior of the proteins occupying the bilayer, extensive characterization of the structure of the model plasma membrane was conducted. Neither the regions of curvature in the vicinity of the polystyrene nanoparticles nor the interaction between a lipid bilayer and small patches of curved polystyrene is well understood, so the results of experiments to determine these properties are described. To do so, individual fluorescently labeled proteins and lipids are tracked on this model system and in live cells. New methods for analyzing the resulting tracks and ensemble data are presented and discussed. To validate the model system and analytical methods, fluorescence microscopy was used to image a peripheral membrane protein, cholera toxin subunit B (CTB). These results are compared to results obtained from membrane components that were not expected to show a preference for membrane curvature: an individual fluorescently labeled lipid, lissamine rhodamine B DHPE, and another protein, streptavidin associated with biotin-labeled DHPE. The observed tendency of cholera toxin subunit B to avoid regions of curvature, as determined by new and established analytical methods, is presented and discussed.
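
A sketch of one routine track statistic such analyses rely on: the time-averaged mean squared displacement (MSD) of a single-particle track. The interface is an assumption for illustration, not one of the new analytical methods presented in the work.

```python
import numpy as np

def msd(track, max_lag=None):
    """Time-averaged mean squared displacement of one 2-D track.

    track : (T, 2) array of x, y positions from single-particle tracking.
    Returns the MSD at lags 1..max_lag (in frame units)."""
    track = np.asarray(track, dtype=float)
    max_lag = max_lag or track.shape[0] // 4
    return np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in range(1, max_lag + 1)
    ])

# Example: the MSD of a random walk grows linearly with lag time.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=(500, 2)), axis=0)
print(msd(walk)[:5])
```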

Relevance: 30.00%

Abstract:

Species distribution models (SDM) predict species occurrence based on statistical relationships with environmental conditions. The R package biomod2, which includes 10 different SDM techniques and 10 different evaluation methods, was used in this study. Macroalgae are the main biomass producers in Potter Cove, King George Island (Isla 25 de Mayo), Antarctica, and they are sensitive to climate change factors such as suspended particulate matter (SPM). Macroalgae presence and absence data were used to test the suitability of SDMs and, simultaneously, to assess the environmental response of macroalgae as well as to model four scenarios of distribution shifts under varying SPM conditions due to climate change. According to the evaluation scores of the relative operating characteristic (ROC) and the true skill statistic (TSS), averaged over models, the methods based on a multitude of decision trees, such as Random Forest and Classification Tree Analysis, reached the highest predictive power, followed by generalized boosted models (GBM) and maximum-entropy approaches (Maxent). The final ensemble model used 135 of 200 calculated models (TSS > 0.7) and identified hard substrate and SPM as the most influential parameters, followed by distance to glacier, total organic carbon (TOC), bathymetry and slope. The climate change scenarios show an expansion of the macroalgae in the case of lower SPM and a retreat of the macroalgae in the case of higher assumed SPM values.
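
biomod2 is an R package; as a rough Python analogue, the workflow of fitting several SDM techniques, scoring them with the TSS, and keeping only members above a cutoff for the ensemble can be sketched with scikit-learn on synthetic presence/absence data. All names and thresholds here are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def tss(y_true, y_pred):
    """True skill statistic = sensitivity + specificity - 1."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

# Synthetic presence/absence data with environmental predictors
# (stand-ins for SPM, hard substrate, TOC, bathymetry, slope, ...)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit several members, keep only those exceeding the TSS > 0.7 cut,
# and average their predicted probabilities for the ensemble forecast.
members = [RandomForestClassifier(random_state=s).fit(X_tr, y_tr)
           for s in range(5)]
members += [GradientBoostingClassifier(random_state=s).fit(X_tr, y_tr)
            for s in range(5)]
kept = [m for m in members if tss(y_te, m.predict(X_te)) > 0.7]
ensemble_prob = np.mean([m.predict_proba(X_te)[:, 1] for m in kept], axis=0)
```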

Relevance: 30.00%

Abstract:

DNA-binding proteins are crucial for various cellular processes, such as the recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed to date, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve prediction performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification, employing the ensemble learning technique. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with performance improvements in the range of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in terms of ACC and 0.02-0.16 in terms of MCC. This indicates that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot which is freely accessible to the public. © 2014 Ruifeng Xu et al.
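
A hedged sketch of the evaluation set-up: an ensemble classifier trained on an imbalanced benchmark (many more negative than positive samples) and scored with ACC and MCC. The random forest and the synthetic features are stand-ins for illustration, not the enDNA-Prot method itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

# Imbalanced benchmark: most proteins are non-DNA-binding (negative class).
# In practice the features would be sequence-derived descriptors.
X, y = make_classification(n_samples=2000, n_features=50,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# An ensemble classifier; class_weight compensates for the expanded negatives.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"ACC = {accuracy_score(y_te, pred):.3f}, "
      f"MCC = {matthews_corrcoef(y_te, pred):.3f}")
```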