982 resultados para ensemble methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

A set of four eddy-permitting global ocean reanalyses produced in the framework of the MyOcean project have been compared over the altimetry period 1993–2011. The main differences among the reanalyses used here come from the data assimilation scheme implemented to control the ocean state by inserting reprocessed observations of sea surface temperature (SST), in situ temperature and salinity profiles, sea level anomaly and sea-ice concentration. A first objective of this work includes assessing the interannual variability and trends for a series of parameters, usually considered in the community as essential ocean variables: SST, sea surface salinity, temperature and salinity averaged over meaningful layers of the water column, sea level, transports across pre-defined sections, and sea ice parameters. The eddy-permitting nature of the global reanalyses allows also to estimate eddy kinetic energy. The results show that in general there is a good consistency between the different reanalyses. An intercomparison against experiments without data assimilation was done during the MyOcean project and we conclude that data assimilation is crucial for correctly simulating some quantities such as regional trends of sea level as well as the eddy kinetic energy. A second objective is to show that the ensemble mean of reanalyses can be evaluated as one single system regarding its reliability in reproducing the climate signals, where both variability and uncertainties are assessed through the ensemble spread and signal-to-noise ratio. The main advantage of having access to several reanalyses differing in the way data assimilation is performed is that it becomes possible to assess part of the total uncertainty. Given the fact that we use very similar ocean models and atmospheric forcing, we can conclude that the spread of the ensemble of reanalyses is mainly representative of our ability to gauge uncertainty in the assimilation methods. This uncertainty changes a lot from one ocean parameter to another, especially in global indices. However, despite several caveats in the design of the multi-system ensemble, the main conclusion from this study is that an eddy-permitting multi-system ensemble approach has become mature and our results provide a first step towards a systematic comparison of eddy-permitting global ocean reanalyses aimed at providing robust conclusions on the recent evolution of the oceanic state.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work we present the idea of how generalized ensembles can be used to simplify the operational study of non-additive physical systems. As alternative of the usual methods of direct integration or mean-field theory, we show how the solution of the Ising model with infinite-range interactions is obtained by using a generalized canonical ensemble. We describe how the thermodynamical properties of this model in the presence of an external magnetic field are founded by simple parametric equations. Without impairing the usual interpretation, we obtain an identical critical behaviour as observed in traditional approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In condensed matter systems, the interfacial tension plays a central role for a multitude of phenomena. It is the driving force for nucleation processes, determines the shape and structure of crystalline structures and is important for industrial applications. Despite its importance, the interfacial tension is hard to determine in experiments and also in computer simulations. While for liquid-vapor interfacial tensions there exist sophisticated simulation methods to compute the interfacial tension, current methods for solid-liquid interfaces produce unsatisfactory results.rnrnAs a first approach to this topic, the influence of the interfacial tension on nuclei is studied within the three-dimensional Ising model. This model is well suited because despite its simplicity, one can learn much about nucleation of crystalline nuclei. Below the so-called roughening temperature, nuclei in the Ising model are not spherical anymore but become cubic because of the anisotropy of the interfacial tension. This is similar to crystalline nuclei, which are in general not spherical but more like a convex polyhedron with flat facets on the surface. In this context, the problem of distinguishing between the two bulk phases in the vicinity of the diffuse droplet surface is addressed. A new definition is found which correctly determines the volume of a droplet in a given configuration if compared to the volume predicted by simple macroscopic assumptions.rnrnTo compute the interfacial tension of solid-liquid interfaces, a new Monte Carlo method called ensemble switch method'' is presented which allows to compute the interfacial tension of liquid-vapor interfaces as well as solid-liquid interfaces with great accuracy. In the past, the dependence of the interfacial tension on the finite size and shape of the simulation box has often been neglected although there is a nontrivial dependence on the box dimensions. As a consequence, one needs to systematically increase the box size and extrapolate to infinite volume in order to accurately predict the interfacial tension. Therefore, a thorough finite-size scaling analysis is established in this thesis. Logarithmic corrections to the finite-size scaling are motivated and identified, which are of leading order and therefore must not be neglected. The astounding feature of these logarithmic corrections is that they do not depend at all on the model under consideration. Using the ensemble switch method, the validity of a finite-size scaling ansatz containing the aforementioned logarithmic corrections is carefully tested and confirmed. Combining the finite-size scaling theory with the ensemble switch method, the interfacial tension of several model systems, ranging from the Ising model to colloidal systems, is computed with great accuracy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Systems biology techniques are a topic of recent interest within the neurological field. Computational intelligence (CI) addresses this holistic perspective by means of consensus or ensemble techniques ultimately capable of uncovering new and relevant findings. In this paper, we propose the application of a CI approach based on ensemble Bayesian network classifiers and multivariate feature subset selection to induce probabilistic dependences that could match or unveil biological relationships. The research focuses on the analysis of high-throughput Alzheimer's disease (AD) transcript profiling. The analysis is conducted from two perspectives. First, we compare the expression profiles of hippocampus subregion entorhinal cortex (EC) samples of AD patients and controls. Second, we use the ensemble approach to study four types of samples: EC and dentate gyrus (DG) samples from both patients and controls. Results disclose transcript interaction networks with remarkable structures and genes not directly related to AD by previous studies. The ensemble is able to identify a variety of transcripts that play key roles in other neurological pathologies. Classical statistical assessment by means of non-parametric tests confirms the relevance of the majority of the transcripts. The ensemble approach pinpoints key metabolic mechanisms that could lead to new findings in the pathogenesis and development of AD

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Variations in the physical deformation of the plasma membrane play a significant role in the sorting and behavior of the proteins that occupy it. Determining the interplay between membrane curvature and protein behavior required the development and thorough characterization of a model plasma membrane with well defined and localized regions of curvature. This model system consists of a fluid lipid bilayer that is supported by a dye-loaded polystyrene nanoparticle patterned glass substrate. As the physical deformation of the supported lipid bilayer is essential to our understanding of the behavior of the protein occupying the bilayer, extensive characterization of the structure of the model plasma membrane was conducted. Neither the regions of curvature in the vicinity of the polystyrene nanoparticles or the interaction between a lipid bilayer and small patches of curved polystyrene are well understood, so the results of experiments to determine these properties are described. To do so, individual fluorescently labeled proteins and lipids are tracked on this model system and in live cells. New methods for analyzing the resulting tracks and ensemble data are presented and discussed. To validate the model system and analytical methods, fluorescence microscopy was used to image a peripheral membrane protein, cholera toxin subunit B (CTB). These results are compared to results obtained from membrane components that were not expected to show an preference for membrane curvature: an individual fluorescently-labeled lipid, lissamine rhodamine B DHPE, and another protein, streptavidin associated with biotin-labeled DHPE. The observed tendency for cholera toxin subunit B to avoid curved regions of curvature, as determined by new and established analytical methods, is presented and discussed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Species distribution models (SDM) predict species occurrence based on statistical relationships with environmental conditions. The R-package biomod2 which includes 10 different SDM techniques and 10 different evaluation methods was used in this study. Macroalgae are the main biomass producers in Potter Cove, King George Island (Isla 25 de Mayo), Antarctica, and they are sensitive to climate change factors such as suspended particulate matter (SPM). Macroalgae presence and absence data were used to test SDMs suitability and, simultaneously, to assess the environmental response of macroalgae as well as to model four scenarios of distribution shifts by varying SPM conditions due to climate change. According to the averaged evaluation scores of Relative Operating Characteristics (ROC) and True scale statistics (TSS) by models, those methods based on a multitude of decision trees such as Random Forest and Classification Tree Analysis, reached the highest predictive power followed by generalized boosted models (GBM) and maximum-entropy approaches (Maxent). The final ensemble model used 135 of 200 calculated models (TSS > 0.7) and identified hard substrate and SPM as the most influencing parameters followed by distance to glacier, total organic carbon (TOC), bathymetry and slope. The climate change scenarios show an invasive reaction of the macroalgae in case of less SPM and a retreat of the macroalgae in case of higher assumed SPM values.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotide, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Up to now, many methods have been proposed, but most of them focus on only one classifier and cannot make full use of the large number of negative samples to improve predicting performance. This study proposed a predictor called enDNA-Prot for DNA-binding protein identification by employing the ensemble learning technique. Experiential results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot with performance improvement in the range of 3.97-9.52% in ACC and 0.08-0.19 in MCC. Furthermore, when the benchmark dataset was expanded with negative samples, the performance of enDNA-Prot outperformed the three existing methods by 2.83-16.63% in terms of ACC and 0.02-0.16 in terms of MCC. It indicated that enDNA-Prot is an effective method for DNA-binding protein identification and expanding training dataset with negative samples can improve its performance. For the convenience of the vast majority of experimental scientists, we developed a user-friendly web-server for enDNA-Prot which is freely accessible to the public. © 2014 Ruifeng Xu et al.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as ƒ-test is performed during each node's split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis introduces two related lines of study on classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map, with respect to a predetermined set of classes, toward the real application situations where the image pixels not necessarily fall into a predetermined set of classes to be identified, detected or classified with.

The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives c) graph spectral transformation via locally linear embedding for dimension reduction, and d) statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.

Secondly, the work extends the HSI classification pipeline with a single HSI data cube to multiple HSI data cubes. Each cube, with feature variation, is to be classified of multiple classes. The main challenge is deriving the cube-wise classification from pixel-wise classification. The thesis presents the initial attempt to circumvent it, and discuss the potential for further improvement.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

First-order transitions of system where both lattice site occupancy and lattice spacing fluctuate, such as cluster crystals, cannot be efficiently studied by traditional simulation methods, which necessarily fix one of these two degrees of freedom. The difficulty, however, can be surmounted by the generalized [N]pT ensemble [J. Chem. Phys. 136, 214106 (2012)]. Here we show that histogram reweighting and the [N]pT ensemble can be used to study an isostructural transition between cluster crystals of different occupancy in the generalized exponential model of index 4 (GEM-4). Extending this scheme to finite-size scaling studies also allows us to accurately determine the critical point parameters and to verify that it belongs to the Ising universality class.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Lors du transport du bois de la forêt vers les usines, de nombreux événements imprévus peuvent se produire, événements qui perturbent les trajets prévus (par exemple, en raison des conditions météo, des feux de forêt, de la présence de nouveaux chargements, etc.). Lorsque de tels événements ne sont connus que durant un trajet, le camion qui accomplit ce trajet doit être détourné vers un chemin alternatif. En l’absence d’informations sur un tel chemin, le chauffeur du camion est susceptible de choisir un chemin alternatif inutilement long ou pire, qui est lui-même "fermé" suite à un événement imprévu. Il est donc essentiel de fournir aux chauffeurs des informations en temps réel, en particulier des suggestions de chemins alternatifs lorsqu’une route prévue s’avère impraticable. Les possibilités de recours en cas d’imprévus dépendent des caractéristiques de la chaîne logistique étudiée comme la présence de camions auto-chargeurs et la politique de gestion du transport. Nous présentons trois articles traitant de contextes d’application différents ainsi que des modèles et des méthodes de résolution adaptés à chacun des contextes. Dans le premier article, les chauffeurs de camion disposent de l’ensemble du plan hebdomadaire de la semaine en cours. Dans ce contexte, tous les efforts doivent être faits pour minimiser les changements apportés au plan initial. Bien que la flotte de camions soit homogène, il y a un ordre de priorité des chauffeurs. Les plus prioritaires obtiennent les volumes de travail les plus importants. Minimiser les changements dans leurs plans est également une priorité. Étant donné que les conséquences des événements imprévus sur le plan de transport sont essentiellement des annulations et/ou des retards de certains voyages, l’approche proposée traite d’abord l’annulation et le retard d’un seul voyage, puis elle est généralisée pour traiter des événements plus complexes. Dans cette ap- proche, nous essayons de re-planifier les voyages impactés durant la même semaine de telle sorte qu’une chargeuse soit libre au moment de l’arrivée du camion à la fois au site forestier et à l’usine. De cette façon, les voyages des autres camions ne seront pas mo- difiés. Cette approche fournit aux répartiteurs des plans alternatifs en quelques secondes. De meilleures solutions pourraient être obtenues si le répartiteur était autorisé à apporter plus de modifications au plan initial. Dans le second article, nous considérons un contexte où un seul voyage à la fois est communiqué aux chauffeurs. Le répartiteur attend jusqu’à ce que le chauffeur termine son voyage avant de lui révéler le prochain voyage. Ce contexte est plus souple et offre plus de possibilités de recours en cas d’imprévus. En plus, le problème hebdomadaire peut être divisé en des problèmes quotidiens, puisque la demande est quotidienne et les usines sont ouvertes pendant des périodes limitées durant la journée. Nous utilisons un modèle de programmation mathématique basé sur un réseau espace-temps pour réagir aux perturbations. Bien que ces dernières puissent avoir des effets différents sur le plan de transport initial, une caractéristique clé du modèle proposé est qu’il reste valable pour traiter tous les imprévus, quelle que soit leur nature. En effet, l’impact de ces événements est capturé dans le réseau espace-temps et dans les paramètres d’entrée plutôt que dans le modèle lui-même. Le modèle est résolu pour la journée en cours chaque fois qu’un événement imprévu est révélé. Dans le dernier article, la flotte de camions est hétérogène, comprenant des camions avec des chargeuses à bord. La configuration des routes de ces camions est différente de celle des camions réguliers, car ils ne doivent pas être synchronisés avec les chargeuses. Nous utilisons un modèle mathématique où les colonnes peuvent être facilement et naturellement interprétées comme des itinéraires de camions. Nous résolvons ce modèle en utilisant la génération de colonnes. Dans un premier temps, nous relaxons l’intégralité des variables de décision et nous considérons seulement un sous-ensemble des itinéraires réalisables. Les itinéraires avec un potentiel d’amélioration de la solution courante sont ajoutés au modèle de manière itérative. Un réseau espace-temps est utilisé à la fois pour représenter les impacts des événements imprévus et pour générer ces itinéraires. La solution obtenue est généralement fractionnaire et un algorithme de branch-and-price est utilisé pour trouver des solutions entières. Plusieurs scénarios de perturbation ont été développés pour tester l’approche proposée sur des études de cas provenant de l’industrie forestière canadienne et les résultats numériques sont présentés pour les trois contextes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Thesis (Master's)--University of Washington, 2016-08

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study was designed to investigate professional choral singers’ training, perceptions on the importance of sight-reading skill in their work, and thoughts on effective pedagogy for teaching sight-reading to undergraduate choral ensemble singers. Participants in this study (N=48) included self-selected professional singers and choral conductors from the Summer 2015 Oregon Bach Festival’s Berwick Chorus and conducting Master Class. Data were gathered from questionnaire responses and audio recorded focus group sessions. Focus group data showed that the majority of participants developed proficiency in their sight-reading skills from instrumental study, aural skills classes, and through on-the-job training at a church job or other professional choral singing employment. While participants brought up a number of important job skills, sightreading was listed as perhaps the single most important skill that a professional choral singer could develop. When reading music during the rehearsal process, the data revealed two main strategies that professional singers used to interpret the pitches in their musical line: an intervallic approach and a harmonic approach. Participants marked their scores systematically to identify problem spots and leave reminders to aid with future readings, such as marking intervals, solfege syllables, or rhythmic counts. Participants reported using a variety of skills other than score marking to try to accurately find their pitches, such as looking at other vocal or instrumental lines, looking ahead, and using knowledge about a musical style or time period to make more intuitive “guesses” when sight-reading. Participants described using additional approaches when sight-reading in an audition situation, including scanning for anchors or anomalies and positive self-talk. Singers learned these sight-reading techniques from a variety of sources. Participants had many different ideas about how best to teach sight-reading in the undergraduate choral ensemble rehearsal. The top response was that sight-reading needed to be practiced consistently in order for students to improve. Other responses included developing personal accountability, empowering students, combining different teaching methods, and discussing real-life applications of becoming strong sight-readers. There was discussion about the ultimate purpose of choir at the university level and whether it is to teach musicianship skills or produce excellent performances.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract: Quantitative Methods (QM) is a compulsory course in the Social Science program in CEGEP. Many QM instructors assign a number of homework exercises to give students the opportunity to practice the statistical methods, which enhances their learning. However, traditional written exercises have two significant disadvantages. The first is that the feedback process is often very slow. The second disadvantage is that written exercises can generate a large amount of correcting for the instructor. WeBWorK is an open-source system that allows instructors to write exercises which students answer online. Although originally designed to write exercises for math and science students, WeBWorK programming allows for the creation of a variety of questions which can be used in the Quantitative Methods course. Because many statistical exercises generate objective and quantitative answers, the system is able to instantly assess students’ responses and tell them whether they are right or wrong. This immediate feedback has been shown to be theoretically conducive to positive learning outcomes. In addition, the system can be set up to allow students to re-try the problem if they got it wrong. This has benefits both in terms of student motivation and reinforcing learning. Through the use of a quasi-experiment, this research project measured and analysed the effects of using WeBWorK exercises in the Quantitative Methods course at Vanier College. Three specific research questions were addressed. First, we looked at whether students who did the WeBWorK exercises got better grades than students who did written exercises. Second, we looked at whether students who completed more of the WeBWorK exercises got better grades than students who completed fewer of the WeBWorK exercises. Finally, we used a self-report survey to find out what students’ perceptions and opinions were of the WeBWorK and the written exercises. For the first research question, a crossover design was used in order to compare whether the group that did WeBWorK problems during one unit would score significantly higher on that unit test than the other group that did the written problems. We found no significant difference in grades between students who did the WeBWorK exercises and students who did the written exercises. The second research question looked at whether students who completed more of the WeBWorK exercises would get significantly higher grades than students who completed fewer of the WeBWorK exercises. The straight-line relationship between number of WeBWorK exercises completed and grades was positive in both groups. However, the correlation coefficients for these two variables showed no real pattern. Our third research question was investigated by using a survey to elicit students’ perceptions and opinions regarding the WeBWorK and written exercises. Students reported no difference in the amount of effort put into completing each type of exercise. Students were also asked to rate each type of exercise along six dimensions and a composite score was calculated. Overall, students gave a significantly higher score to the written exercises, and reported that they found the written exercises were better for understanding the basic statistical concepts and for learning the basic statistical methods. However, when presented with the choice of having only written or only WeBWorK exercises, slightly more students preferred or strongly preferred having only WeBWorK exercises. The results of this research suggest that the advantages of using WeBWorK to teach Quantitative Methods are variable. The WeBWorK system offers immediate feedback, which often seems to motivate students to try again if they do not have the correct answer. However, this does not necessarily translate into better performance on the written tests and on the final exam. What has been learned is that the WeBWorK system can be used by interested instructors to enhance student learning in the Quantitative Methods course. Further research may examine more specifically how this system can be used more effectively.