756 resultados para Grid-based clustering approach
Resumo:
Contexte: Les champignons mycorhiziens à arbuscules (AMF) établissent des relations symbiotiques avec la plupart des plantes grâce à leurs réseaux d’hyphes qui s’associent avec les racines de leurs hôtes. De précédentes études ont révélé des niveaux de variation génétique extrêmes pour des loci spécifiques permettant de supposer que les AMF peuvent contenir des milliers de noyaux génétiquement divergents dans un même cytoplasme. Si aucun processus de reproduction sexuée n’a jusqu’ici été observé chez ces mycorhizes, on constate cependant que des niveaux élevés de variation génétique peuvent être maintenus à la fois par l’échange de noyaux entre hyphes et par des processus fréquents de recombinaison entre noyaux. Les AMF se propagent par l’intermédiaire de spores qui contiennent chacune un échantillon d’une population initiale de noyaux hétérogènes, directement hérités du mycélium parent. À notre connaissance les AMF sont les seuls organismes qui ne passent jamais par un stade mononucléaire, ce qui permet aux noyaux de diverger génétiquement dans un même cytoplasme. Ces aspects singuliers de la biologie des AMF rendent l’estimation de leur diversité génétique problématique. Ceci constitue un défi majeur pour les écologistes sur le terrain mais également pour les biologistes moléculaires dans leur laboratoire. Au-delà même des problématiques de diversité spécifique, l’amplitude du polymorphisme entre noyaux mycorhiziens est mal connue. Le travail proposé dans ce manuscrit de thèse explore donc les différents aspects de l’architecture génomique singulière des AMF. Résultats L’ampleur du polymorphisme intra-isolat a été déjà observée pour la grande sous-unité d’ARN ribosomal de l’isolat Glomus irregulare DAOM-197198 (précédemment identifié comme G. intraradices) et pour le gène de la polymerase1-like (PLS) de Glomus etunicatum isolat NPI. Dans un premier temps, nous avons pu confirmer ces résultats et nous avons également pu constater que ces variations étaient transcrites. Nous avons ensuite pu mettre en évidence la présence d’un goulot d’étranglement génétique au moment de la sporulation pour le locus PLS chez l’espèce G. etunicatum illustrant les importants effets d’échantillonnage qui se produisaient entre chaque génération de spore. Enfin, nous avons estimé la différentiation génétique des AMF en utilisant à la fois les réseaux de gènes appliqués aux données de séquençage haut-débit ainsi que cinq nouveaux marqueurs génomiques en copie unique. Ces analyses révèlent que la différenciation génomique est présente de manière systématique dans deux espèces (G. irregulare et G. diaphanum). Conclusions Les résultats de cette thèse fournissent des preuves supplémentaires en faveur du scénario d’une différenciation génomique entre noyaux au sein du même isolat mycorhizien. Ainsi, au moins trois membres du genre Glomus, G. irregulare, G. diaphanum and G. etunicatum, apparaissent comme des organismes dont l’organisation des génomes ne peut pas être décrit d’après un modèle Mendélien strict, ce qui corrobore l’hypothèse que les noyaux mycorhiziens génétiquement différenciés forment un pangenome.
Resumo:
Abstract Background: The amount and structure of genetic diversity in dessert apple germplasm conserved at a European level is mostly unknown, since all diversity studies conducted in Europe until now have been performed on regional or national collections. Here, we applied a common set of 16 SSR markers to genotype more than 2,400 accessions across 14 collections representing three broad European geographic regions (North+East, West and South) with the aim to analyze the extent, distribution and structure of variation in the apple genetic resources in Europe. Results: A Bayesian model-based clustering approach showed that diversity was organized in three groups, although these were only moderately differentiated (FST=0.031). A nested Bayesian clustering approach allowed identification of subgroups which revealed internal patterns of substructure within the groups, allowing a finer delineation of the variation into eight subgroups (FST=0.044). The first level of stratification revealed an asymmetric division of the germplasm among the three groups, and a clear association was found with the geographical regions of origin of the cultivars. The substructure revealed clear partitioning of genetic groups among countries, but also interesting associations between subgroups and breeding purposes of recent cultivars or particular usage such as cider production. Additional parentage analyses allowed us to identify both putative parents of more than 40 old and/or local cultivars giving interesting insights in the pedigree of some emblematic cultivars. Conclusions: The variation found at group and sub-group levels may reflect a combination of historical processes of migration/selection and adaptive factors to diverse agricultural environments that, together with genetic drift, have resulted in extensive genetic variation but limited population structure. The European dessert apple germplasm represents an important source of genetic diversity with a strong historical and patrimonial value. The present work thus constitutes a decisive step in the field of conservation genetics. Moreover, the obtained data can be used for defining a European apple core collection useful for further identification of genomic regions associated with commercially important horticultural traits in apple through genome-wide association studies.
Resumo:
Raccoons are the reservoir for the raccoon rabies virus variant in the United States. To combat this threat, oral rabies vaccination (ORV) programs are conducted in many eastern states. To aid in these efforts, the genetic structure of raccoons (Procyon lotor) was assessed in southwestern Pennsylvania to determine if select geographic features (i.e., ridges and valleys) serve as corridors or hindrances to raccoon gene flow (e.g., movement) and, therefore, rabies virus trafficking in this physiographic region. Raccoon DNA samples (n = 185) were collected from one ridge site and two adjacent valleys in southwestern Pennsylvania (Westmoreland, Cambria, Fayette, and Somerset counties). Raccoon genetic structure within and among these study sites was characterized at nine microsatellite loci. Results indicated that there was little population subdivision among any sites sampled. Furthermore, analyses using a model-based clustering approach indicated one essentially panmictic population was present among all the raccoons sampled over a reasonably broad geographic area (e.g., sites up to 36 km apart). However, a signature of isolation by distance was detected, suggesting that widths of ORV zones are critical for success. Combined, these data indicate that geographic features within this landscape influence raccoon gene flow only to a limited extent, suggesting that ridges of this physiographic system will not provide substantial long-term natural barriers to rabies virus trafficking. These results may be of value for future ORV efforts in Pennsylvania and other eastern states with similar landscapes.
Resumo:
This paper presents the characterization of high voltage (HV) electric power consumers based on a data clustering approach. The typical load profiles (TLP) are obtained selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The choice of the best partition is supported using several cluster validity indices. The proposed data-mining (DM) based methodology, that includes all steps presented in the process of knowledge discovery in databases (KDD), presents an automatic data treatment application in order to preprocess the initial database in an automatic way, allowing time saving and better accuracy during this phase. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ consumption behavior. To validate our approach, a case study with a real database of 185 HV consumers was used.
Resumo:
This study focuses on the implementation of several pair trading strategies across three emerging markets, with the objective of comparing the results obtained from the different strategies and assessing if pair trading benefits from a more volatile environment. The results show that, indeed, there are higher potential profits arising from emerging markets. However, the higher excess return will be partially offset by higher transaction costs, which will be a determinant factor to the profitability of pair trading strategies. Also, a new clustering approach based on the Principal Component Analysis was tested as an alternative to the more standard clustering by Industry Groups. The new clustering approach delivers promising results, consistently reducing volatility to a greater extent than the Industry Group approach, with no significant harm to the excess returns.
Resumo:
2000 Mathematics Subject Classification: 62H30
Resumo:
Garlic is a spice and a medicinal plant; hence, there is an increasing interest in 'developing' new varieties with different culinary properties or with high content of nutraceutical compounds. Phenotypic traits and dominant molecular markers are predominantly used to evaluate the genetic diversity of garlic clones. However, 24 SSR markers (codominant) specific for garlic are available in the literature, fostering germplasm researches. In this study, we genotyped 130 garlic accessions from Brazil and abroad using 17 polymorphic SSR markers to assess the genetic diversity and structure. This is the first attempt to evaluate a large set of accessions maintained by Brazilian institutions. A high level of redundancy was detected in the collection (50 % of the accessions represented eight haplotypes). However, non-redundant accessions presented high genetic diversity. We detected on average five alleles per locus, Shannon index of 1.2, HO of 0.5, and HE of 0.6. A core collection was set with 17 accessions, covering 100 % of the alleles with minimum redundancy. Overall FST and D values indicate a strong genetic structure within accessions. Two major groups identified by both model-based (Bayesian approach) and hierarchical clustering (UPGMA dendrogram) techniques were coherent with the classification of accessions according to maturity time (growth cycle): early-late and midseason accessions. Assessing genetic diversity and structure of garlic collections is the first step towards an efficient management and conservation of accessions in genebanks, as well as to advance future genetic studies and improvement of garlic worldwide.
Resumo:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
Resumo:
The large increase of distributed energy resources, including distributed generation, storage systems and demand response, especially in distribution networks, makes the management of the available resources a more complex and crucial process. With wind based generation gaining relevance, in terms of the generation mix, the fact that wind forecasting accuracy rapidly drops with the increase of the forecast anticipation time requires to undertake short-term and very short-term re-scheduling so the final implemented solution enables the lowest possible operation costs. This paper proposes a methodology for energy resource scheduling in smart grids, considering day ahead, hour ahead and five minutes ahead scheduling. The short-term scheduling, undertaken five minutes ahead, takes advantage of the high accuracy of the very-short term wind forecasting providing the user with more efficient scheduling solutions. The proposed method uses a Genetic Algorithm based approach for optimization that is able to cope with the hard execution time constraint of short-term scheduling. Realistic power system simulation, based on PSCAD , is used to validate the obtained solutions. The paper includes a case study with a 33 bus distribution network with high penetration of distributed energy resources implemented in PSCAD .
Resumo:
Biosignals analysis has become widespread, upstaging their typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring as a diagnosis tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of an ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insights about the data. Preliminary results are promising, for meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Resumo:
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
Resumo:
This paper consists in the characterization of medium voltage (MV) electric power consumers based on a data clustering approach. It is intended to identify typical load profiles by selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1.022 MV consumers was used.
Resumo:
Depth-averaged velocities and unit discharges within a 30 km reach of one of the world's largest rivers, the Rio Parana, Argentina, were simulated using three hydrodynamic models with different process representations: a reduced complexity (RC) model that neglects most of the physics governing fluid flow, a two-dimensional model based on the shallow water equations, and a three-dimensional model based on the Reynolds-averaged Navier-Stokes equations. Row characteristics simulated using all three models were compared with data obtained by acoustic Doppler current profiler surveys at four cross sections within the study reach. This analysis demonstrates that, surprisingly, the performance of the RC model is generally equal to, and in some instances better than, that of the physics based models in terms of the statistical agreement between simulated and measured flow properties. In addition, in contrast to previous applications of RC models, the present study demonstrates that the RC model can successfully predict measured flow velocities. The strong performance of the RC model reflects, in part, the simplicity of the depth-averaged mean flow patterns within the study reach and the dominant role of channel-scale topographic features in controlling the flow dynamics. Moreover, the very low water surface slopes that typify large sand-bed rivers enable flow depths to be estimated reliably in the RC model using a simple fixed-lid planar water surface approximation. This approach overcomes a major problem encountered in the application of RC models in environments characterised by shallow flows and steep bed gradients. The RC model is four orders of magnitude faster than the physics based models when performing steady-state hydrodynamic calculations. However, the iterative nature of the RC model calculations implies a reduction in computational efficiency relative to some other RC models. A further implication of this is that, if used to simulate channel morphodynamics, the present RC model may offer only a marginal advantage in terms of computational efficiency over approaches based on the shallow water equations. These observations illustrate the trade off between model realism and efficiency that is a key consideration in RC modelling. Moreover, this outcome highlights a need to rethink the use of RC morphodynamic models in fluvial geomorphology and to move away from existing grid-based approaches, such as the popular cellular automata (CA) models, that remain essentially reductionist in nature. In the case of the world's largest sand-bed rivers, this might be achieved by implementing the RC model outlined here as one element within a hierarchical modelling framework that would enable computationally efficient simulation of the morphodynamics of large rivers over millennial time scales. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
A spectral angle based feature extraction method, Spectral Clustering Independent Component Analysis (SC-ICA), is proposed in this work to improve the brain tissue classification from Magnetic Resonance Images (MRI). SC-ICA provides equal priority to global and local features; thereby it tries to resolve the inefficiency of conventional approaches in abnormal tissue extraction. First, input multispectral MRI is divided into different clusters by a spectral distance based clustering. Then, Independent Component Analysis (ICA) is applied on the clustered data, in conjunction with Support Vector Machines (SVM) for brain tissue analysis. Normal and abnormal datasets, consisting of real and synthetic T1-weighted, T2-weighted and proton density/fluid-attenuated inversion recovery images, were used to evaluate the performance of the new method. Comparative analysis with ICA based SVM and other conventional classifiers established the stability and efficiency of SC-ICA based classification, especially in reproduction of small abnormalities. Clinical abnormal case analysis demonstrated it through the highest Tanimoto Index/accuracy values, 0.75/98.8%, observed against ICA based SVM results, 0.17/96.1%, for reproduced lesions. Experimental results recommend the proposed method as a promising approach in clinical and pathological studies of brain diseases
A benchmark-driven modelling approach for evaluating deployment choices on a multi-core architecture
Resumo:
The complexity of current and emerging architectures provides users with options about how best to use the available resources, but makes predicting performance challenging. In this work a benchmark-driven model is developed for a simple shallow water code on a Cray XE6 system, to explore how deployment choices such as domain decomposition and core affinity affect performance. The resource sharing present in modern multi-core architectures adds various levels of heterogeneity to the system. Shared resources often includes cache, memory, network controllers and in some cases floating point units (as in the AMD Bulldozer), which mean that the access time depends on the mapping of application tasks, and the core's location within the system. Heterogeneity further increases with the use of hardware-accelerators such as GPUs and the Intel Xeon Phi, where many specialist cores are attached to general-purpose cores. This trend for shared resources and non-uniform cores is expected to continue into the exascale era. The complexity of these systems means that various runtime scenarios are possible, and it has been found that under-populating nodes, altering the domain decomposition and non-standard task to core mappings can dramatically alter performance. To find this out, however, is often a process of trial and error. To better inform this process, a performance model was developed for a simple regular grid-based kernel code, shallow. The code comprises two distinct types of work, loop-based array updates and nearest-neighbour halo-exchanges. Separate performance models were developed for each part, both based on a similar methodology. Application specific benchmarks were run to measure performance for different problem sizes under different execution scenarios. These results were then fed into a performance model that derives resource usage for a given deployment scenario, with interpolation between results as necessary.