973 results for ensemble methods
Abstract:
Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, because reliable methods based on observational data are difficult to establish, knowledge about the possibilities and limitations of such inference methods in this context remains incomplete.
Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods that allow us to assess the inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of inferred regulatory networks from expression data. Further, as an application, we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets regarding molecular information processing.
Abstract:
One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
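The margin definition in this abstract can be illustrated with a minimal sketch. It assumes an unweighted vote, so every base classifier counts once; the function names `voting_margin` and `normalized_margin` are hypothetical and not taken from the paper, and dividing by the number of voters is an added convention that puts the margin in [-1, 1].

```python
from collections import Counter


def voting_margin(votes, true_label):
    """Votes for the true label minus the largest vote count
    received by any single incorrect label (unweighted votes)."""
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # default=0 covers the case where no classifier voted for a wrong label
    wrong = max(
        (n for label, n in counts.items() if label != true_label),
        default=0,
    )
    return correct - wrong


def normalized_margin(votes, true_label):
    """Margin divided by the number of voters, so it lies in [-1, 1]."""
    return voting_margin(votes, true_label) / len(votes)


if __name__ == "__main__":
    # Five base classifiers vote on one example whose true label is "a".
    print(voting_margin(["a", "a", "b", "c", "a"], "a"))  # 3 - 1 = 2
    print(voting_margin(["a", "b", "b"], "a"))            # 1 - 2 = -1
```

A positive margin means the example is classified correctly by the vote; a larger value means the vote was won more decisively.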
Abstract:
The evidence provided by modelled assessments of future climate impact on flooding is fundamental to water resources and flood risk decision making. Impact models usually rely on climate projections from global and regional climate models (GCM/RCMs). However, challenges in representing precipitation events at catchment-scale resolution mean that decisions must be made on how to appropriately pre-process the meteorological variables from GCM/RCMs. Here the impacts on projected high flows of differing ensemble approaches and application of Model Output Statistics to RCM precipitation are evaluated while assessing climate change impact on flood hazard in the Upper Severn catchment in the UK. Various ensemble projections are used together with the HBV hydrological model with direct forcing and also compared to a response surface technique. We consider an ensemble of single-model RCM projections from the current UK Climate Projections (UKCP09); multi-model ensemble RCM projections from the European Union's FP6 ‘ENSEMBLES’ project; and a joint probability distribution of precipitation and temperature from a GCM-based perturbed physics ensemble. The ensemble distribution of results shows that flood hazard in the Upper Severn is likely to increase compared to present conditions, but the study highlights the differences between the results from different ensemble methods and the strong assumptions made in using Model Output Statistics to produce the estimates of future river discharge. The results underline the challenges in using the current generation of RCMs for local climate impact studies on flooding. Copyright © 2012 Royal Meteorological Society
Abstract:
In this paper ensembles of forecasts (of up to six hours) are studied from a convection-permitting model with a representation of model error due to unresolved processes. The ensemble prediction system (EPS) used is an experimental convection-permitting version of the UK Met Office’s 24-member Global and Regional Ensemble Prediction System (MOGREPS). The method of representing model error variability, which perturbs parameters within the model’s parameterisation schemes, has been modified and we investigate the impact of applying this scheme in different ways. These are: a control ensemble where all ensemble members have the same parameter values; an ensemble where the parameters are different between members, but fixed in time; and ensembles where the parameters are updated randomly every 30 or 60 min. The choice of parameters and their ranges of variability have been determined from expert opinion and parameter sensitivity tests. A case of frontal rain over the southern UK has been chosen, which has a multi-banded rainfall structure. The consequences of including model error variability in the case studied are mixed and are summarised as follows. The multiple banding, evident in the radar, is not captured for any single member. However, the single band is positioned in some members where a secondary band is present in the radar. This is found for all ensembles studied. Adding model error variability with fixed parameters in time does increase the ensemble spread for near-surface variables like wind and temperature, but can actually decrease the spread of the rainfall. Perturbing the parameters periodically throughout the forecast does not further increase the spread and exhibits “jumpiness” in the spread at times when the parameters are perturbed. Adding model error variability gives an improvement in forecast skill after the first 2–3 h of the forecast for near-surface temperature and relative humidity.
For precipitation skill scores, adding model error variability has the effect of improving the skill in the first 1–2 h of the forecast, but then of reducing the skill after that. Complementary experiments were performed where the only difference between members was the set of parameter values (i.e. no initial condition variability). The resulting spread was found to be significantly less than the spread from initial condition variability alone.
Abstract:
Ensemble forecasting is a methodology to deal with uncertainties in numerical wind prediction. In this work we propose to apply ensemble methods to the adaptive wind forecasting model presented in. The wind field forecasting is based on a mass-consistent model and a log-linear wind profile using as input data the resulting forecast wind from Harmonie, a Non-Hydrostatic Dynamic model used experimentally at AEMET with promising results. The mass-consistent model parameters are estimated by using genetic algorithms. The mesh is generated using the meccano method and adapted to the geometry…
Abstract:
One challenge in data assimilation (DA) methods is how the error covariance of the model state is computed. Ensemble methods have been proposed for producing error covariance estimates, as error is propagated in time using the non-linear model. Variational methods, on the other hand, use the concepts of control theory, whereby the state estimate is optimized from both the background and the measurements. Numerical optimization schemes are applied which solve the problems of memory storage and huge matrix inversion faced by classical Kalman filter methods. The Variational Ensemble Kalman filter (VEnKF), a method inspired by the Variational Kalman Filter (VKF), enjoys the benefits of both ensemble methods and variational methods. It avoids the filter inbreeding problems which emerge when the ensemble spread underestimates the true error covariance. In VEnKF this is tackled by resampling the ensemble every time measurements are available. One advantage of VEnKF over VKF is that it needs neither tangent linear code nor adjoint code. In this thesis, VEnKF has been applied to a two-dimensional shallow water model simulating a dam-break experiment. The model is a public code, with water height measurements recorded at seven stations along the mid-line of the 21.2 m long, 1.4 m wide flume. Because the data were too sparse to assimilate into the 30 171-dimensional model state vector, we chose to interpolate the data both in time and in space. The results of the assimilation were compared with those of a pure simulation. We found that the results produced by the VEnKF were more realistic, without the numerical artifacts present in the pure simulation. Creating a wrapper code for a model and DA scheme can be challenging, especially when the two were designed independently or are poorly documented. In this thesis we have presented a non-intrusive approach to coupling the model and a DA scheme. An external program is used to send and receive information between the model and the DA procedure using files.
The advantage of this method is that the model code changes needed are minimal: only a few lines which facilitate input and output. Apart from being simple to couple, the approach can be employed even if the two were written in different programming languages, because the communication is not through code. The non-intrusive approach accommodates parallel computing by simply telling the control program to wait until all the processes have ended before the DA procedure is invoked. It is worth mentioning the overhead increase caused by the approach, as at every assimilation cycle both the model and the DA procedure have to be initialized. Nonetheless, the method can be an ideal approach for a benchmark platform for testing DA methods. The non-intrusive VEnKF has been applied to the multi-purpose hydrodynamic model COHERENS to assimilate Total Suspended Matter (TSM) in lake Säkylän Pyhäjärvi. The lake has an area of 154 km² with an average depth of 5.4 m. Turbidity and chlorophyll-a concentrations from MERIS satellite images for 7 days between May 16 and July 6, 2009 were available. The effect of the organic matter has been computationally eliminated to obtain TSM data. Because of the computational demands of both COHERENS and VEnKF, we chose to use a 1 km grid resolution. The results of the VEnKF have been compared with the measurements recorded at an automatic station located in the north-western part of the lake. However, due to TSM data sparsity in both time and space, they could not be well matched. The use of multiple automatic stations with real-time data is important to avoid the time sparsity problem. With DA, this will help, for instance, in better understanding environmental hazard variables. We have found that using a very large ensemble size does not necessarily improve the results, because there is a limit beyond which additional ensemble members add very little to the performance.
The successful implementation of the non-intrusive VEnKF and the ensemble size limit for performance lead to the emerging area of Reduced Order Modeling (ROM). To save computational resources, ROM avoids running the full-blown model. When ROM is applied with the non-intrusive DA approach, it might result in a cheaper algorithm that eases the computational challenges existing in the field of modelling and DA.
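The idea that an ensemble yields an error covariance estimate can be sketched as follows. This is the standard sample covariance of the ensemble members about their mean, written in plain Python for illustration; it is not the VEnKF algorithm from the thesis, and the function names are hypothetical.

```python
def ensemble_mean(members):
    """Component-wise mean of a list of equal-length state vectors."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def ensemble_covariance(members):
    """Sample covariance P = 1/(N-1) * sum_i (x_i - mean)(x_i - mean)^T,
    returned as a nested list (state dimension x state dimension)."""
    n = len(members)
    if n < 2:
        raise ValueError("need at least two ensemble members")
    mean = ensemble_mean(members)
    # Anomalies: deviation of each member from the ensemble mean
    anomalies = [[x - m for x, m in zip(member, mean)] for member in members]
    dim = len(mean)
    return [
        [sum(a[i] * a[j] for a in anomalies) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two members of a two-component state vector
    print(ensemble_covariance([[1.0, 2.0], [3.0, 6.0]]))
    # [[2.0, 4.0], [4.0, 8.0]]
```

With N members the estimate has rank at most N - 1, which is one reason a small ensemble can underestimate the true error covariance, the situation the abstract refers to as filter inbreeding.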
Abstract:
Data mining is currently one of the most active research areas, with a wide variety of applications in everyday life. It is concerned with finding interesting hidden patterns in a large historical database. As an example, from a sales database one can use data mining to find a pattern such as “people who buy magazines tend to buy newspapers also”. From the sales point of view, the advantage is that these items can be placed together in the shop to increase sales. In this research work, data mining is applied to a domain called placement chance prediction, since making a wise career decision is crucial for everyone. In India, technical manpower analysis is carried out by an organization named the National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises a lead centre in the IAMR, New Delhi, and 21 nodal centres located in different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. The nodal centre collects placement information by sending postal questionnaires to graduates on a regular basis. From the raw data available in the nodal centre, a historical database was prepared. Each record in this database includes entrance rank range, reservation, sector, sex, and a particular engineering branch. For each such combination of attributes in the historical database of student records, the corresponding placement chance is computed and stored in the database. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a new student with a given combination of the above criteria.
A detailed performance comparison of the various data mining models is also carried out. This research work proposes to use a combination of data mining models, namely a hybrid stacking ensemble, for better predictions. A strategy to predict the overall absorption rate for various branches, as well as the time it takes for all the students of a particular branch to get placed, is also proposed. Finally, this research work puts forward a new data mining algorithm for numeric data sets, namely C 4.5 * stat, which has been shown to have competitive accuracy on the standard benchmark data sets known as the UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. In summary, this research work covers all four dimensions of a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Abstract:
The impacts of climate change on crop productivity are often assessed using simulations from a numerical climate model as an input to a crop simulation model. The precision of these predictions reflects the uncertainty in both models. We examined how uncertainty in a climate model (HadAM3) and a crop model (the General Large-Area Model (GLAM) for annual crops) affects the mean and standard deviation of crop yield simulations in present and doubled carbon dioxide (CO2) climates by perturbation of parameters in each model. The climate sensitivity parameter (λ, the equilibrium response of global mean surface temperature to doubled CO2) was used to define the control climate. Observed 1966–1989 mean yields of groundnut (Arachis hypogaea L.) in India were simulated well by the crop model using the control climate and climates with values of λ near the control value. The simulations were used to measure the contribution to uncertainty of key crop and climate model parameters. The standard deviation of yield was more affected by perturbation of climate parameters than crop model parameters in both the present-day and doubled CO2 climates. Climate uncertainty was higher in the doubled CO2 climate than in the present-day climate. Crop transpiration efficiency was key to crop model uncertainty in both present-day and doubled CO2 climates. The response of crop development to mean temperature contributed little uncertainty in the present-day simulations but was among the largest contributors under doubled CO2. The ensemble methods used here to quantify physical and biological uncertainty offer a method to improve model estimates of the impacts of climate change.
Abstract:
The climate belongs to the class of non-equilibrium forced and dissipative systems, for which most results of quasi-equilibrium statistical mechanics, including the fluctuation-dissipation theorem, do not apply. In this paper we show for the first time how the Ruelle linear response theory, developed for studying rigorously the impact of perturbations on general observables of non-equilibrium statistical mechanical systems, can be applied with great success to analyze the climatic response to general forcings. The crucial value of the Ruelle theory lies in the fact that it allows one to compute the response of the system in terms of expectation values of explicit and computable functions of the phase space averaged over the invariant measure of the unperturbed state. We choose as a test bed a classical version of the Lorenz 96 model, which, in spite of its simplicity, has a well-recognized prototypical value as it is a spatially extended one-dimensional model and presents the basic ingredients of the actual atmosphere, such as dissipation, advection and the presence of an external forcing. We recapitulate the main aspects of the general response theory and propose some new general results. We then analyze the frequency dependence of the response of both local and global observables to perturbations having localized as well as global spatial patterns. We derive analytically several properties of the corresponding susceptibilities, such as asymptotic behavior, validity of Kramers-Kronig relations, and sum rules, whose main ingredient is the causality principle. We show that all the coefficients of the leading asymptotic expansions as well as the integral constraints can be written as linear functions of parameters that describe the unperturbed properties of the system, such as its average energy.
Some newly obtained empirical closure equations for such parameters allow one to define such properties as an explicit function of the unperturbed forcing parameter alone for a general class of chaotic Lorenz 96 models. We then verify the theoretical predictions from the outputs of the simulations up to a high degree of precision. The theory is used to explain differences in the response of local and global observables, to define the intensive properties of the system, which do not depend on the spatial resolution of the Lorenz 96 model, and to generalize the concept of climate sensitivity to all time scales. We also show how to reconstruct the linear Green function, which maps perturbations of general time patterns into changes in the expectation value of the considered observable for finite as well as infinite time. Finally, we propose a simple yet general methodology to study general Climate Change problems on virtually any time scale by resorting to only well selected simulations, and by taking full advantage of ensemble methods. The specific case of globally averaged surface temperature response to a general pattern of change of the CO2 concentration is discussed. We believe that the proposed approach may constitute a mathematically rigorous and practically very effective way to approach the problem of climate sensitivity, climate prediction, and climate change from a radically new perspective.
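For readers unfamiliar with the test bed, the following is a minimal sketch of the standard Lorenz 96 system, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices. The abstract does not spell out the equations, the integrator or the parameter values used in the paper, so the fourth-order Runge-Kutta step, the function names and the example values (40 variables, F = 8, dt = 0.01) are assumptions made only for illustration.

```python
def lorenz96_tendency(x, forcing):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, indices cyclic.
    The three terms are advection, dissipation and external forcing."""
    n = len(x)
    return [
        (x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
        for i in range(n)
    ]


def rk4_step(x, forcing, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(state, k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(x, k1, 0.5 * dt), forcing)
    k3 = lorenz96_tendency(shifted(x, k2, 0.5 * dt), forcing)
    k4 = lorenz96_tendency(shifted(x, k3, dt), forcing)
    return [
        xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


if __name__ == "__main__":
    # Assumed illustrative setup: 40 variables, F = 8, small perturbation
    state = [8.0] * 40
    state[0] += 0.01
    for _ in range(1000):
        state = rk4_step(state, 8.0, 0.01)
    print(sum(state) / len(state))  # spatial mean after 10 time units
```

The uniform state x_i = F is a fixed point of the equations, which gives a quick sanity check on the tendency function; an ensemble is obtained by integrating many slightly different initial states.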
Abstract:
Following trends in operational weather forecasting, where ensemble prediction systems (EPS) are now increasingly the norm, flood forecasters are beginning to experiment with using similar ensemble methods. Most of the effort to date has focused on the substantial technical challenges of developing coupled rainfall-runoff systems to represent the full cascade of uncertainties involved in predicting future flooding. As a consequence much less attention has been given to the communication and eventual use of EPS flood forecasts. Drawing on interviews and other research with operational flood forecasters from across Europe, this paper highlights a number of challenges to communicating and using ensemble flood forecasts operationally. It is shown that operational flood forecasters understand the skill, operational limitations, and informational value of EPS products in a variety of different and sometimes contradictory ways. Despite the efforts of forecasting agencies to design effective ways to communicate EPS forecasts to non-experts, operational flood forecasters were often skeptical about the ability of forecast recipients to understand or use them appropriately. It is argued that better training and closer contacts between operational flood forecasters and EPS system designers can help ensure the uncertainty represented by EPS forecasts is represented in ways that are most appropriate and meaningful for their intended consumers, but some fundamental political and institutional challenges to using ensembles, such as differing attitudes to false alarms and to responsibility for management of blame in the event of poor or mistaken forecasts are also highlighted. Copyright © 2010 Royal Meteorological Society.
Abstract:
The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three different cluster ensemble methods, which have been used in many works in the literature: co-association matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms, representing different paradigms: k-means, Expectation-Maximization (EM), and the hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods perform better than the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated with different clustering algorithms.
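Of the three cluster ensemble methods named in this abstract, the co-association matrix is the simplest to illustrate. The sketch below assumes each base partition is given as a list of cluster labels, one per item; the function name `coassociation_matrix` is hypothetical and the code is not taken from the work itself.

```python
def coassociation_matrix(partitions):
    """Entry [i][j] is the fraction of base partitions in which
    items i and j were placed in the same cluster."""
    n_parts = len(partitions)
    n_items = len(partitions[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in partitions:
        if len(labels) != n_items:
            raise ValueError("all partitions must label the same items")
        for i in range(n_items):
            for j in range(n_items):
                # Only label equality within one partition matters, so
                # cluster names need not agree across partitions.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_parts for c in row] for row in counts]


if __name__ == "__main__":
    # Two base partitions of three items
    base = [[0, 0, 1], ["x", "y", "y"]]
    for row in coassociation_matrix(base):
        print(row)
    # [1.0, 0.5, 0.0]
    # [0.5, 1.0, 0.5]
    # [0.0, 0.5, 1.0]
```

The resulting matrix can be treated as a similarity matrix and passed to a further clustering step, for example hierarchical clustering, to obtain the consensus partition.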
Abstract:
Ensemble forecasting [1] is a methodology to deal with uncertainties in numerical wind prediction. In this work we propose to apply ensemble methods to the adaptive wind forecasting model presented in [2]. The wind field forecasting is based on a mass-consistent model and a log-linear wind profile using as input data the resulting forecast wind from Harmonie [3], a Non-Hydrostatic Dynamic model. The mass-consistent model parameters are estimated by using genetic algorithms [4]. The mesh is generated using the meccano method [5] and adapted to the geometry. The main sources of uncertainty in this model are the parameter estimation and the intrinsic uncertainties of the Harmonie model…