973 resultados para ensemble methods


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of the patients surviving more than 5 years after disease diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. In order to evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Following, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Due to a large heterogeneity observed within prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death with gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model which is flexible to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects which decreases computation time for model fitting. Also, our proposed models provides a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares results obtained by Random Forest and Bayesian ensemble methods under the biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many methods exist at the moment for deformable face fitting. A drawback to nearly all these approaches is that they are (i) noisy in terms of landmark positions, and (ii) the noise is biased across frames (i.e. the misalignment is toward common directions across all frames). In this paper we propose a grouped $\mathcal{L}1$-norm anchored method for simultaneously aligning an ensemble of deformable face images stemming from the same subject, given noisy heterogeneous landmark estimates. Impressive alignment performance improvement and refinement is obtained using very weak initialization as "anchors".

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information including Gene Ontology annotations and kernel fusion have respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; another is to develop an ensemble method to improve prediction performance using comprehensive information represented in the form of high dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: Lei dataset, multi-localization dataset, SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on Lei dataset is 75.2% and that for 9 localizations on SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with those existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. Conclusions It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpr​ed_page.php.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a novel framework for the unsupervised alignment of an ensemble of temporal sequences. This approach draws inspiration from the axiom that an ensemble of temporal signals stemming from the same source/class should have lower rank when "aligned" rather than "misaligned". Our approach shares similarities with recent state of the art methods for unsupervised images ensemble alignment (e.g. RASL) which breaks the problem into a set of image alignment problems (which have well known solutions i.e. the Lucas-Kanade algorithm). Similarly, we propose a strategy for decomposing the problem of temporal ensemble alignment into a similar set of independent sequence problems which we claim can be solved reliably through Dynamic Time Warping (DTW). We demonstrate the utility of our method using the Cohn-Kanade+ dataset, to align expression onset across multiple sequences, which allows us to automate the rapid discovery of event annotations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

High-Order Co-Clustering (HOCC) methods have attracted high attention in recent years because of their ability to cluster multiple types of objects simultaneously using all available information. During the clustering process, HOCC methods exploit object co-occurrence information, i.e., inter-type relationships amongst different types of objects as well as object affinity information, i.e., intra-type relationships amongst the same types of objects. However, it is difficult to learn accurate intra-type relationships in the presence of noise and outliers. Existing HOCC methods consider the p nearest neighbours based on Euclidean distance for the intra-type relationships, which leads to incomplete and inaccurate intra-type relationships. In this paper, we propose a novel HOCC method that incorporates multiple subspace learning with a heterogeneous manifold ensemble to learn complete and accurate intra-type relationships. Multiple subspace learning reconstructs the similarity between any pair of objects that belong to the same subspace. The heterogeneous manifold ensemble is created based on two-types of intra-type relationships learnt using p-nearest-neighbour graph and multiple subspaces learning. Moreover, in order to make sure the robustness of clustering process, we introduce a sparse error matrix into matrix decomposition and develop a novel iterative algorithm. Empirical experiments show that the proposed method achieves improved results over the state-of-art HOCC methods for FScore and NMI.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Environmentally benign and economical methods for the preparation of industrially important hydroxy acids and diacids were developed. The carboxylic acids, used in polyesters, alkyd resins, and polyamides, were obtained by the oxidation of the corresponding alcohols with hydrogen peroxide or air catalyzed by sodium tungstate or supported noble metals. These oxidations were carried out using water as a solvent. The alcohols are also a useful alternative to the conventional reactants, hydroxyaldehydes and cycloalkanes. The oxidation of 2,2-disubstituted propane-1,3-diols with hydrogen peroxide catalyzed by sodium tungstate afforded 2,2-disubstituted 3-hydroxypropanoic acids and 1,1-disubstituted ethane-1,2-diols as products. A computational study of the Baeyer-Villiger rearrangement of the intermediate 2,2-disubstituted 3-hydroxypropanals gave in-depth data of the mechanism of the reaction. Linear primary diols having chain length of at least six carbons were easily oxidized with hydrogen peroxide to linear dicarboxylic acids catalyzed by sodium tungstate. The Pt/C catalyzed air oxidation of 2,2-disubstituted propane-1,3-diols and linear primary diols afforded the highest yield of the corresponding hydroxy acids, while the Pt, Bi/C catalyzed oxidation of the diols afforded the highest yield of the corresponding diacids. The mechanism of the promoted oxidation was best described by the ensemble effect, and by the formation of a complex of the hydroxy and the carboxy groups of the hydroxy acids with bismuth atoms. The Pt, Bi/C catalyzed air oxidation of 2-substituted 2-hydroxymethylpropane-1,3-diols gave 2-substituted malonic acids by the decarboxylation of the corresponding triacids. Activated carbon was the best support and bismuth the most efficient promoter in the air oxidation of 2,2-dialkylpropane-1,3-diols to diacids. In oxidations carried out in organic solvents barium sulfate could be a valuable alternative to activated carbon as a non-flammable support. In the Pt/C catalyzed air oxidation of 2,2-disubstituted propane-1,3-diols to 2,2-disubstituted 3-hydroxypropanoic acids the small size of the 2-substituents enhanced the rate of the oxidation. When the potential of platinum of the catalyst was not controlled, the highest yield of the diacids in the Pt, Bi/C catalyzed air oxidation of 2,2-dialkylpropane-1,3-diols was obtained in the regime of mass transfer. The most favorable pH of the reaction mixture of the promoted oxidation was 10. The reaction temperature of 40°C prevented the decarboxylation of the diacids.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The significance of treating rainfall as a chaotic system instead of a stochastic system for a better understanding of the underlying dynamics has been taken up by various studies recently. However, an important limitation of all these approaches is the dependence on a single method for identifying the chaotic nature and the parameters involved. Many of these approaches aim at only analyzing the chaotic nature and not its prediction. In the present study, an attempt is made to identify chaos using various techniques and prediction is also done by generating ensembles in order to quantify the uncertainty involved. Daily rainfall data of three regions with contrasting characteristics (mainly in the spatial area covered), Malaprabha, Mahanadi and All-India for the period 1955-2000 are used for the study. Auto-correlation and mutual information methods are used to determine the delay time for the phase space reconstruction. Optimum embedding dimension is determined using correlation dimension, false nearest neighbour algorithm and also nonlinear prediction methods. The low embedding dimensions obtained from these methods indicate the existence of low dimensional chaos in the three rainfall series. Correlation dimension method is done on th phase randomized and first derivative of the data series to check whether the saturation of the dimension is due to the inherent linear correlation structure or due to low dimensional dynamics. Positive Lyapunov exponents obtained prove the exponential divergence of the trajectories and hence the unpredictability. Surrogate data test is also done to further confirm the nonlinear structure of the rainfall series. A range of plausible parameters is used for generating an ensemble of predictions of rainfall for each year separately for the period 1996-2000 using the data till the preceding year. For analyzing the sensitiveness to initial conditions, predictions are done from two different months in a year viz., from the beginning of January and June. The reasonably good predictions obtained indicate the efficiency of the nonlinear prediction method for predicting the rainfall series. Also, the rank probability skill score and the rank histograms show that the ensembles generated are reliable with a good spread and skill. A comparison of results of the three regions indicates that although they are chaotic in nature, the spatial averaging over a large area can increase the dimension and improve the predictability, thus destroying the chaotic nature. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Perfect or even mediocre weather predictions over a long period are almost impossible because of the ultimate growth of a small initial error into a significant one. Even though the sensitivity of initial conditions limits the predictability in chaotic systems, an ensemble of prediction from different possible initial conditions and also a prediction algorithm capable of resolving the fine structure of the chaotic attractor can reduce the prediction uncertainty to some extent. All of the traditional chaotic prediction methods in hydrology are based on single optimum initial condition local models which can model the sudden divergence of the trajectories with different local functions. Conceptually, global models are ineffective in modeling the highly unstable structure of the chaotic attractor. This paper focuses on an ensemble prediction approach by reconstructing the phase space using different combinations of chaotic parameters, i.e., embedding dimension and delay time to quantify the uncertainty in initial conditions. The ensemble approach is implemented through a local learning wavelet network model with a global feed-forward neural network structure for the phase space prediction of chaotic streamflow series. Quantification of uncertainties in future predictions are done by creating an ensemble of predictions with wavelet network using a range of plausible embedding dimensions and delay times. The ensemble approach is proved to be 50% more efficient than the single prediction for both local approximation and wavelet network approaches. The wavelet network approach has proved to be 30%-50% more superior to the local approximation approach. Compared to the traditional local approximation approach with single initial condition, the total predictive uncertainty in the streamflow is reduced when modeled with ensemble wavelet networks for different lead times. Localization property of wavelets, utilizing different dilation and translation parameters, helps in capturing most of the statistical properties of the observed data. The need for taking into account all plausible initial conditions and also bringing together the characteristics of both local and global approaches to model the unstable yet ordered chaotic attractor of a hydrologic series is clearly demonstrated.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current methods for molecular simulations of Electric Double Layer Capacitors (EDLC) have both the electrodes and the electrolyte region in a single simulation box. This necessitates simulation of the electrode-electrolyte region interface. Typical capacitors have macroscopic dimensions where the fraction of the molecules at the electrode-electrolyte region interface is very low. Hence, large systems sizes are needed to minimize the electrode-electrolyte region interfacial effects. To overcome these problems, a new technique based on the Gibbs Ensemble is proposed for simulation of an EDLC. In the proposed technique, each electrode is simulated in a separate simulation box. Application of periodic boundary conditions eliminates the interfacial effects. This in addition to the use of constant voltage ensemble allows for a more convenient comparison of simulation results with experimental measurements on typical EDLCs. (C) 2014 AIP Publishing LLC.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the first part of this paper we reviewed the fingerprint classification literature from two different perspectives: the feature extraction and the classifier learning. Aiming at answering the question of which among the reviewed methods would perform better in a real implementation we end up in a discussion which showed the difficulty in answering this question. No previous comparison exists in the literature and comparisons among papers are done with different experimental frameworks. Moreover, the difficulty in implementing published methods was stated due to the lack of details in their description, parameters and the fact that no source code is shared. For this reason, in this paper we will go through a deep experimental study following the proposed double perspective. In order to do so, we have carefully implemented some of the most relevant feature extraction methods according to the explanations found in the corresponding papers and we have tested their performance with different classifiers, including those specific proposals made by the authors. Our aim is to develop an objective experimental study in a common framework, which has not been done before and which can serve as a baseline for future works on the topic. This way, we will not only test their quality, but their reusability by other researchers and will be able to indicate which proposals could be considered for future developments. Furthermore, we will show that combining different feature extraction models in an ensemble can lead to a superior performance, significantly increasing the results obtained by individual models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Wind power generation differs from conventional thermal generation due to the stochastic nature of wind. Thus wind power forecasting plays a key role in dealing with the challenges of balancing supply and demand in any electricity system, given the uncertainty associated with the wind farm power output. Accurate wind power forecasting reduces the need for additional balancing energy and reserve power to integrate wind power. Wind power forecasting tools enable better dispatch, scheduling and unit commitment of thermal generators, hydro plant and energy storage plant and more competitive market trading as wind power ramps up and down on the grid. This paper presents an in-depth review of the current methods and advances in wind power forecasting and prediction. Firstly, numerical wind prediction methods from global to local scales, ensemble forecasting, upscaling and downscaling processes are discussed. Next the statistical and machine learning approach methods are detailed. Then the techniques used for benchmarking and uncertainty analysis of forecasts are overviewed, and the performance of various approaches over different forecast time horizons is examined. Finally, current research activities, challenges and potential future developments are appraised.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aim: Ecological niche modelling can provide valuable insight into species' environmental preferences and aid the identification of key habitats for populations of conservation concern. Here, we integrate biologging, satellite remote-sensing and ensemble ecological niche models (EENMs) to identify predictable foraging habitats for a globally important population of the grey-headed albatross (GHA) Thalassarche chrysostoma. Location: Bird Island, South Georgia; Southern Atlantic Ocean. Methods: GPS and geolocation-immersion loggers were used to track at-sea movements and activity patterns of GHA over two breeding seasons (n = 55; brood-guard). Immersion frequency (landings per 10-min interval) was used to define foraging events. EENM combining Generalized Additive Models (GAM), MaxEnt, Random Forest (RF) and Boosted Regression Trees (BRT) identified the biophysical conditions characterizing the locations of foraging events, using time-matched oceanographic predictors (Sea Surface Temperature, SST; chlorophyll a, chl-a; thermal front frequency, TFreq; depth). Model performance was assessed through iterative cross-validation and extrapolative performance through cross-validation among years. Results: Predictable foraging habitats identified by EENM spanned neritic (<500 m), shelf break and oceanic waters, coinciding with a set of persistent biophysical conditions characterized by particular thermal ranges (3–8 °C, 12–13 °C), elevated primary productivity (chl-a > 0.5 mg m−3) and frequent manifestation of mesoscale thermal fronts. Our results confirm previous indications that GHA exploit enhanced foraging opportunities associated with frontal systems and objectively identify the APFZ as a region of high foraging habitat suitability. Moreover, at the spatial and temporal scales investigated here, the performance of multi-model ensembles was superior to that of single-algorithm models, and cross-validation among years indicated reasonable extrapolative performance. Main conclusions: EENM techniques are useful for integrating the predictions of several single-algorithm models, reducing potential bias and increasing confidence in predictions. Our analysis highlights the value of EENM for use with movement data in identifying at-sea habitats of wide-ranging marine predators, with clear implications for conservation and management.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Aim: Ecological niche modelling can provide valuable insight into species' environmental preferences and aid the identification of key habitats for populations of conservation concern. Here, we integrate biologging, satellite remote-sensing and ensemble ecological niche models (EENMs) to identify predictable foraging habitats for a globally important population of the grey-headed albatross (GHA) Thalassarche chrysostoma. Location: Bird Island, South Georgia; Southern Atlantic Ocean. Methods: GPS and geolocation-immersion loggers were used to track at-sea movements and activity patterns of GHA over two breeding seasons (n = 55; brood-guard). Immersion frequency (landings per 10-min interval) was used to define foraging events. EENM combining Generalized Additive Models (GAM), MaxEnt, Random Forest (RF) and Boosted Regression Trees (BRT) identified the biophysical conditions characterizing the locations of foraging events, using time-matched oceanographic predictors (Sea Surface Temperature, SST; chlorophyll a, chl-a; thermal front frequency, TFreq; depth). Model performance was assessed through iterative cross-validation and extrapolative performance through cross-validation among years. Results: Predictable foraging habitats identified by EENM spanned neritic (<500 m), shelf break and oceanic waters, coinciding with a set of persistent biophysical conditions characterized by particular thermal ranges (3–8 °C, 12–13 °C), elevated primary productivity (chl-a > 0.5 mg m−3) and frequent manifestation of mesoscale thermal fronts. Our results confirm previous indications that GHA exploit enhanced foraging opportunities associated with frontal systems and objectively identify the APFZ as a region of high foraging habitat suitability. Moreover, at the spatial and temporal scales investigated here, the performance of multi-model ensembles was superior to that of single-algorithm models, and cross-validation among years indicated reasonable extrapolative performance. Main conclusions: EENM techniques are useful for integrating the predictions of several single-algorithm models, reducing potential bias and increasing confidence in predictions. Our analysis highlights the value of EENM for use with movement data in identifying at-sea habitats of wide-ranging marine predators, with clear implications for conservation and management.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Wind power generation differs from conventional thermal generation due to the stochastic nature of wind. Thus wind power forecasting plays a key role in dealing with the challenges of balancing supply and demand in any electricity system, given the uncertainty associated with the wind farm power output. Accurate wind power forecasting reduces the need for additional balancing energy and reserve power to integrate wind power. Wind power forecasting tools enable better dispatch, scheduling and unit commitment of thermal generators, hydro plant and energy storage plant and more competitive market trading as wind power ramps up and down on the grid. This paper presents an in-depth review of the current methods and advances in wind power forecasting and prediction. Firstly, numerical wind prediction methods from global to local scales, ensemble forecasting, upscaling and downscaling processes are discussed. Next the statistical and machine learning approach methods are detailed. Then the techniques used for benchmarking and uncertainty analysis of forecasts are overviewed, and the performance of various approaches over different forecast time horizons is examined. Finally, current research activities, challenges and potential future developments are appraised. (C) 2011 Elsevier Ltd. All rights reserved.