959 results for Clustering methods


Relevance:

60.00% 60.00%

Publisher:

Abstract:

Recently divergent species that can hybridize are ideal models for investigating the genetic exchanges that can occur while preserving the species boundaries. Petunia exserta is an endemic species from a very limited and specific area that grows exclusively in rocky shelters. These shaded spots are an inhospitable habitat for all other Petunia species, including the closely related and widely distributed species P. axillaris. Individuals with intermediate morphologic characteristics have been found near the rocky shelters and were believed to be putative hybrids between P. exserta and P. axillaris, suggesting a situation where Petunia exserta is losing its genetic identity. In the current study, we analyzed the plastid intergenic spacers trnS/trnG and trnH/psbA and six nuclear CAPS markers in a large sampling design of both species to understand the evolutionary process occurring in this biological system. Bayesian clustering methods, cpDNA haplotype networks, genetic diversity statistics, and coalescence-based analyses support a scenario where hybridization occurs while two genetic clusters corresponding to two species are maintained. Our results reinforce the importance of coupling differentially inherited markers with an extensive geographic sample to assess the evolutionary dynamics of recently diverged species that can hybridize. (C) 2013 Elsevier Inc. All rights reserved.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Lichens are symbioses between fungi (mycobionts) and photoautotrophic green algae or cyanobacteria (photobionts). Many lichens occupy large distributional ranges covering several climatic zones. So far, little is known about the large-scale phylogeography of lichen photobionts and their role in shaping the distributional ranges of lichens. We studied south polar, temperate and north polar populations of the widely distributed fruticose lichen Cetraria aculeata. Based on the DNA sequences from three loci for each symbiont, we compared the genetic structure of mycobionts and photobionts. Phylogenetic reconstructions and Bayesian clustering methods divided the mycobiont and photobiont data sets into three groups. An AMOVA shows that the genetic variance of the photobiont is best explained by differentiation between temperate and polar regions and that of the mycobiont by an interaction of climatic and geographical factors. By partialling out the relative contribution of climate, geography and codispersal, we found that the most relevant factors shaping the genetic structure of the photobiont are climate and a history of codispersal. Mycobionts in the temperate region are consistently associated with a specific photobiont lineage. We therefore conclude that a photobiont switch in the past enabled C. aculeata to colonize temperate as well as polar habitats. Rare photobiont switches may increase the geographical range and ecological niche of lichen mycobionts by associating them with locally adapted photobionts in climatically different regions and, together with isolation by distance, may lead to genetic isolation between populations and thus drive the evolution of lichens.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The hierarchical properties of potential energy landscapes have been used to gain insight into thermodynamic and kinetic properties of protein ensembles. It also may be possible to use them to direct computational searches for thermodynamically stable macroscopic states, i.e., computational protein folding. To this end, we have developed a top-down search procedure in which conformation space is recursively dissected according to the intrinsic hierarchical structure of a landscape's effective-energy barriers. This procedure generates an inverted tree similar to the disconnectivity graphs generated by local minima-clustering methods, but it fundamentally differs in the manner in which the portion of the tree that is to be computationally explored is selected. A key ingredient is a branch-selection algorithm that takes advantage of statistically predictive properties of the landscape to guide searches down the tree branches that are most likely to lead to the physically relevant macroscopic states. Using the computational folding of a β-hairpin-forming peptide as an example, we show that such predictive properties indeed exist and can be used for structure prediction by free-energy global minimization.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Automated human behaviour analysis has been, and still remains, a challenging problem. It has been approached from different points of view, from primitive actions to human interaction recognition. This paper focuses on trajectory analysis, which allows a simple high-level understanding of complex human behaviour. A novel representation method for trajectory data is proposed, called the Activity Description Vector (ADV), based on the number of times a person is at a specific point of the scenario and the local movements the person performs there. The ADV is calculated for each cell into which the scenario is spatially sampled, providing a cue for different clustering methods. The ADV representation has been tested as the input to several classic classifiers and compared with other approaches using CAVIAR dataset sequences, obtaining high accuracy in recognising the behaviour of people in a shopping centre.
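
The per-cell counting idea behind the ADV can be illustrated with a minimal sketch. It assumes a trajectory is a list of (x, y) positions and that each grid cell stores a visit count plus counts of up, down, left and right movements. The function name, the five-key layout and the `cell_size` parameter are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def activity_description_vectors(trajectory, cell_size):
    """Accumulate per-cell activity counts for one trajectory.

    trajectory: list of (x, y) positions sampled over time.
    cell_size:  side length of the square grid cells (assumed parameter).

    Returns {cell: {"F": visits, "U": up, "D": down, "L": left, "R": right}}.
    The last position is not counted because no movement follows it.
    """
    cells = defaultdict(lambda: {"F": 0, "U": 0, "D": 0, "L": 0, "R": 0})
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        # Cell that contains the starting point of this step.
        cell = (int(x0 // cell_size), int(y0 // cell_size))
        counts = cells[cell]
        counts["F"] += 1
        # Local movement leaving the cell, split by axis.
        dx, dy = x1 - x0, y1 - y0
        if dx > 0:
            counts["R"] += 1
        elif dx < 0:
            counts["L"] += 1
        if dy > 0:
            counts["U"] += 1
        elif dy < 0:
            counts["D"] += 1
    return dict(cells)


if __name__ == "__main__":
    path = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (1.5, 2.5)]
    for cell, counts in sorted(activity_description_vectors(path, 1.0).items()):
        print(cell, counts)
```

Flattening the per-cell dictionaries in a fixed cell order would give one feature vector per trajectory, which could then be passed to a clustering or classification method.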

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Andryala (Asteraceae: Cichorieae) is a little-known Mediterranean-Macaronesian genus whose taxonomy is much in need of revision. The aim of the present biosystematic study was to elucidate species relationships within this genus based on morphological and molecular data. In this study several taxa are recognised: 17 species, 14 subspecies, and 3 hybrids. Among these, 5 species are Macaronesian endemics (A. glandulosa, A. sparsiflora, A. crithmifolia Aiton, A. pinnatifida, and A. perezii), 4 species are Northwest African endemics (A. mogadorensis, A. maroccana, A. chevallieri, and A. nigricans) and one species is endemic to Romania (A. laevitomentosa). The historical background of taxonomic delimitation in the genus is addressed, from Linnaean to present-day concepts, as is the origin of the name Andryala. The origin of Asteraceae and the systematic position of Andryala are briefly summarised. The morphological study was based on a bibliographic review and the revision of 1066 specimens from 13 herbaria, as well as additional material collected during fieldwork. The variability of the morphological characters of the genus, including both vegetative taxonomic characters (root, stem, leaf and indumentum characters) and reproductive ones (inflorescence, floret, fruit and pappus characters), is assessed. Numerical analysis of the morphological data was performed using different similarity or dissimilarity measures and coefficients, as well as ordination and clustering methods. The results support the segregation of the recognised taxa, and the several analyses (using quantitative, binary or multi-state characters) are congruent in separating them. The proposed taxonomy for Andryala includes a new infra-generic classification, new taxa and new combinations and ranks, typifications and diagnostic keys (one for the species and several for subspecies).
For each taxon a list of synonyms, typification comments and a detailed description are provided, as well as comments on taxonomy and nomenclature and a brief discussion of karyology. Additionally, information on ecology, conservation status and distribution, and a list of studied material, are presented. Phylogenetic analyses based on different nuclear and chloroplast DNA markers, using Bayesian and maximum parsimony methods of inference, were performed. The results support three main lineages: separate ones for the relict species A. agardhii and A. laevitomentosa, and a third including the majority of the Andryala species, which underwent a relatively rapid and recent speciation. They also suggest a single colonization event of Madeira and the Canary Islands from the Mediterranean region, followed by insular speciation. Biogeography and speciation within the genus are briefly discussed, including a proposal for the centre of origin of the genus and possible dispersal routes.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Bayesian clustering methods are typically used to identify barriers to gene flow, but they are prone to deduce artificial subdivisions in a study population characterized by an isolation-by-distance pattern (IbD). Here we analysed the landscape genetic structure of a population of wild boars (Sus scrofa) from south-western Germany. Two clustering methods inferred the presence of the same genetic discontinuity. However, the population in question was characterized by a strong IbD pattern. While landscape-resistance modelling failed to identify landscape features that influenced wild boar movement, partial Mantel tests and multiple regression of distance matrices (MRDMs) suggested that the empirically inferred clusters were separated by a genuine barrier. When simulating random lines bisecting the study area, 60% of the unique barriers represented, according to partial Mantel tests and MRDMs, significant obstacles to gene flow. By contrast, the random-lines simulation showed that the boundaries of the inferred empirical clusters corresponded to the most important genetic discontinuity in the study area. Given the degree of habitat fragmentation separating the two empirical partitions, it is likely that the clustering programs correctly identified a barrier to gene flow. The differing results between the work published here and other studies suggest that it will be very difficult to draw general conclusions about habitat permeability in wild boar from individual studies.
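
An isolation-by-distance pattern of the kind described here is commonly assessed with a Mantel test, which correlates a genetic distance matrix with a geographic distance matrix and judges significance by permuting individuals. The following is a minimal sketch of a simple (not partial) Mantel test in plain Python. The function names, default parameters and toy matrices are hypothetical, and this is not the analysis pipeline used in the study.

```python
import math
import random


def _upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mantel(genetic, geographic, permutations=999, seed=0):
    """Return (r, p) for a one-sided simple Mantel test.

    Rows and columns of the genetic matrix are permuted together, which
    relabels individuals while keeping the matrix a valid distance matrix.
    """
    rng = random.Random(seed)
    n = len(genetic)
    geo_flat = _upper_triangle(geographic)
    observed = _pearson(_upper_triangle(genetic), geo_flat)
    hits = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if _pearson(_upper_triangle(permuted), geo_flat) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


if __name__ == "__main__":
    # Toy data: five individuals on a line, genetic distance grows with space.
    geo = [[abs(i - j) for j in range(5)] for i in range(5)]
    gen = [[2.0 * d for d in row] for row in geo]
    print(mantel(gen, geo, permutations=99, seed=1))
```

A partial Mantel test, as used in the study, additionally controls for a third matrix (for example, geographic distance when testing a barrier); that step is not shown here.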

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The structure and infrastructure of the Mexican technical literature was determined. A representative database of technical articles was extracted from the Science Citation Index for the year 2002, with each article containing at least one author with a Mexican address. Many different manual and statistical clustering methods were used to identify the structure of the technical literature (especially the science and technology core competencies). One of the pervasive technical topics identified from the clustering, thin films research, was analyzed further using bibliometrics, in order to identify the infrastructure of this technology. Published by Elsevier Inc.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces different solutions from k-means analysis because objects may belong to more than one cluster. Traditional clustering methods generate extensional descriptions of groups, which show which objects are members of each cluster. Clustering techniques based on rough set theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of different shopping orientations present in the data without the restriction of attempting to fit each object into only one segment. Such descriptions can be an aid to marketers attempting to identify potential segments of consumers.
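
The idea of multiple cluster membership can be sketched with lower and upper approximations. The example below uses a rough k-means-style assignment rule, which is an assumption made for illustration and not necessarily the procedure used in the paper. The function name and the `ratio` threshold are hypothetical.

```python
import math


def rough_assign(points, centroids, ratio=1.25):
    """Assign points to lower and upper approximations of each cluster.

    A point joins the upper approximation of every cluster whose centroid
    lies within ratio * (distance to its nearest centroid). It joins a
    lower approximation only when exactly one cluster qualifies, that is,
    when its membership is unambiguous.

    Returns (lower, upper), each a list of index sets, one per centroid.
    """
    lower = [set() for _ in centroids]
    upper = [set() for _ in centroids]
    for index, point in enumerate(points):
        distances = [math.dist(point, c) for c in centroids]
        nearest = min(distances)
        close = [k for k, d in enumerate(distances) if d <= ratio * nearest]
        for k in close:
            upper[k].add(index)
        if len(close) == 1:
            lower[close[0]].add(index)
    return lower, upper


if __name__ == "__main__":
    points = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    centroids = [(0.0, 0.0), (10.0, 0.0)]
    lower, upper = rough_assign(points, centroids)
    print("lower:", lower)
    print("upper:", upper)
```

In the toy data the third point is equidistant from both centroids, so it appears in both upper approximations and in neither lower approximation. In a segmentation setting each point would instead hold a respondent's scores on the five shopping-orientation measures.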

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Clustering techniques such as k-means and hierarchical clustering are commonly used to analyze DNA microarray derived gene expression data. However, the interactions between processes underlying the cell activity suggest that the complexity of the microarray data structure may not be fully represented with discrete clustering methods.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This thesis describes the development of a complete data visualisation system for large tabular databases, such as those commonly found in a business environment. A state-of-the-art 'cyberspace cell' data visualisation technique was investigated and a powerful visualisation system using it was implemented. Although allowing databases to be explored and conclusions drawn, it had several drawbacks, the majority of which were due to the three-dimensional nature of the visualisation. A novel two-dimensional generic visualisation system, known as MADEN, was then developed and implemented, based upon a 2-D matrix of 'density plots'. MADEN allows an entire high-dimensional database to be visualised in one window, while permitting close analysis in 'enlargement' windows. Selections of records can be made and examined, and dependencies between fields can be investigated in detail. MADEN was used as a tool for investigating and assessing many data processing algorithms, firstly data-reducing (clustering) methods, then dimensionality-reducing techniques. These included a new 'directed' form of principal components analysis, several novel applications of artificial neural networks, and discriminant analysis techniques which illustrated how groups within a database can be separated. To illustrate the power of the system, MADEN was used to explore customer databases from two financial institutions, resulting in a number of discoveries which would be of interest to a marketing manager. Finally, the database of results from the 1992 UK Research Assessment Exercise was analysed. Using MADEN allowed both universities and disciplines to be graphically compared, and supplied some startling revelations, including empirical evidence of the 'Oxbridge factor'.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Objective: Recently, much research has proposed using nature-inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of ACO traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) algorithm and an ant-based association rule mining (Ant-ARM) algorithm are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants, such as cooperation and adaptation, to allow for a flexible, robust search for a good candidate solution. Results: Ant-C has been tested on three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate an optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Regional climate models (RCMs) provide reliable climatic predictions for the next 90 years with high horizontal and temporal resolution. In the 21st century, a northward latitudinal and upward altitudinal shift in the distribution of plant species and phytogeographical units is expected. The paper discusses how the modeling of phytogeographical units can be reduced to modeling plant distributions. The predicted shift of the Moesz line is examined as a case study (with three different modeling approaches) using 36 parameters of the REMO regional climate data set, ArcGIS geographic information software, and the periods 1961-1990 (reference period), 2011-2040, and 2041-2070. The disadvantages of this relatively simple climate envelope modeling (CEM) approach are then discussed and several ways of improving the model are suggested. Some statistical and artificial intelligence (AI) methods (logistic regression, cluster analysis and other clustering methods, decision trees, evolutionary algorithms, artificial neural networks) can be used to develop the model further. Among them, artificial neural networks (ANNs) seem to be the most suitable for this purpose, providing a black-box method for distribution modeling.
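
A climate envelope model in its simplest form can be sketched as a rectangular min-max envelope: the range of each climate parameter is recorded at sites where the species occurs in the reference period, and a site is predicted suitable in a later period if all of its parameters fall inside those ranges. The abstract does not say which envelope variant was used, so this rule, the function names and the parameter names (`temp_july`, `precip_annual`) are assumptions made for illustration.

```python
def fit_envelope(presence_records):
    """Compute the (min, max) range of each climate parameter.

    presence_records: list of dicts mapping parameter name to the value
    observed at a site where the species is present (reference period).
    """
    parameters = presence_records[0].keys()
    return {
        name: (
            min(record[name] for record in presence_records),
            max(record[name] for record in presence_records),
        )
        for name in parameters
    }


def within_envelope(envelope, climate):
    """Return True if every parameter of a site lies inside the envelope."""
    return all(low <= climate[name] <= high
               for name, (low, high) in envelope.items())


if __name__ == "__main__":
    # Hypothetical reference-period values at two presence sites.
    presence = [
        {"temp_july": 19.0, "precip_annual": 550.0},
        {"temp_july": 21.5, "precip_annual": 700.0},
    ]
    envelope = fit_envelope(presence)
    # Hypothetical projected climate of a candidate site in a later period.
    future_site = {"temp_july": 20.2, "precip_annual": 600.0}
    print(envelope, within_envelope(envelope, future_site))
```

The hard in-or-out decision of such an envelope is one of the limitations that motivates replacing it with statistical or machine learning models that return graded suitability.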

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from only a single source.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.