912 results for “hierarchical clustering techniques”


Relevance:

80.00%

Publisher:

Abstract:

The global amino acid compositions deduced from the complete genomic sequences of six thermophilic archaea, two thermophilic bacteria, 17 mesophilic bacteria and two eukaryotic species were analysed by hierarchical clustering and principal components analysis. Both methods showed that several factors influence amino acid composition. Although GC content has a dominant effect, thermophilic species can be identified by their global amino acid compositions alone. This study presents a careful statistical analysis of the factors that affect amino acid composition and also identifies specific features of the average amino acid composition of thermophilic species. Moreover, we introduce the first example of a ‘compositional tree’ of species that takes into account not only homologous proteins but also proteins unique to particular species. We expect this simple yet novel approach to be a useful additional tool for the study of phylogeny at the genome level.
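
The hierarchical-clustering step described above can be sketched as a minimal average-linkage (UPGMA) agglomerative clustering in plain Python. It assumes Euclidean distance between composition vectors; the abstract does not state which distance or linkage the authors used. The species names and four-residue profiles are invented for illustration and are not data from the study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (UPGMA).

    `profiles` maps a species name to its amino acid composition vector.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of species names.
    """
    clusters = [frozenset([name]) for name in profiles]
    # Precompute distances between every pair of species.
    pair_dist = {
        frozenset([a, b]): euclidean(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster species pairs.
        total = sum(pair_dist[frozenset([a, b])] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical fractions of four residues per species (illustrative only).
profiles = {
    "thermo_A": [0.09, 0.08, 0.02, 0.04],
    "thermo_B": [0.10, 0.08, 0.02, 0.04],
    "meso_A": [0.06, 0.05, 0.04, 0.05],
    "meso_B": [0.06, 0.04, 0.04, 0.06],
}

merges = average_linkage(profiles)
for c1, c2, d in merges:
    print(sorted(c1), "+", sorted(c2), f"at {d:.4f}")
```

In a real analysis each vector would hold all 20 amino acid frequencies, and a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would replace this deliberately simple loop.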

Abstract:

Optimal currency area theory suggests that business cycle comovement is a sufficient condition for monetary union, particularly if labour mobility between potential members of the monetary union is low. Previous studies of the comovement of business cycle variables (mainly by Artis and Zhang in the late 1990s) found a core of EU member states that could be grouped together as having similar business cycle comovements, but these studies always used Germany as the reference country. In this study, the analysis of Artis and Zhang is extended and updated by correlating against both German and euro area macroeconomic aggregates and by using more recent cluster analysis techniques, namely model-based clustering.
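
The comparison against two reference aggregates can be illustrated with a short sketch: compute the Pearson correlation of a country's cyclical series with both a German and a euro area reference, giving a small correlation profile per country that a clustering method could then group. The series below are invented numbers, and the model-based clustering step itself is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def comovement_profile(series, references):
    """Correlate one country's cyclical series against each reference aggregate."""
    return {name: pearson(series, ref) for name, ref in references.items()}


# Hypothetical detrended output gaps (illustrative numbers only).
references = {
    "germany": [0.5, 1.2, -0.3, -1.0, 0.4, 0.9],
    "euro_area": [0.4, 1.0, -0.1, -0.8, 0.2, 1.0],
}
country_x = [0.6, 1.1, -0.4, -0.9, 0.3, 0.8]

profile = comovement_profile(country_x, references)
print(profile)
```

Collecting one such profile per member state yields the feature vectors that a clustering method, model-based or otherwise, would group.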

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Abstract:

We have used microarray gene expression profiling and machine learning to predict the presence of BRAF mutations in a panel of 61 melanoma cell lines. The BRAF gene was found to be mutated in 42 samples (69%), and intragenic mutations of the NRAS gene were detected in seven samples (11%). No cell line carried mutations of both genes. Using support vector machines, we have built a classifier that differentiates between melanoma cell lines based on BRAF mutation status. As few as 83 genes are able to discriminate between BRAF mutant and BRAF wild-type samples, with clear separation observed using hierarchical clustering. Multidimensional scaling was used to visualize the relationship between a BRAF mutation signature and that of a generalized mitogen-activated protein kinase (MAPK) activation (either BRAF or NRAS mutation) in the context of the discriminating gene list. We observed that samples carrying NRAS mutations lie somewhere between those with and without BRAF mutations. These observations suggest that, in addition to a common MAPK activation, there are gene-specific mutation signals that result from the pleiotropic effects of either BRAF or NRAS on other signaling pathways, leading to measurably different transcriptional changes.

Abstract:

Chronic alcohol exposure induces lasting behavioral changes, tolerance, and dependence. This results, at least partially, from neural adaptations at the cellular level. Previous genome-wide gene expression studies using pooled human brain samples showed that alcohol abuse causes widespread changes in the pattern of gene expression in the frontal and motor cortices of the human brain. Because these studies used pooled samples, they could not determine variability between individuals. In the present study, we profiled gene expression levels in 14 postmortem human brains (seven controls and seven alcoholic cases) using cDNA microarrays (46,448 clones per array). Both the frontal cortex and the motor cortex were studied. The list of differentially expressed genes confirms and extends previous studies of alcohol-responsive genes. Genes identified as differentially expressed in the two brain regions generally fell into similar functional groups, including metabolism, immune response, cell survival, cell communication, signal transduction and energy production. Importantly, hierarchical clustering of the differentially expressed genes accurately distinguished between control and alcoholic cases, particularly in the frontal cortex.

Abstract:

This paper describes the application of a new technique, rough clustering, to the problem of market segmentation. Rough clustering produces solutions that differ from those of k-means analysis because objects can belong to more than one cluster. Traditional clustering methods generate extensional descriptions of groups, which show the objects that are members of each cluster. Clustering techniques based on rough set theory generate intensional descriptions, which outline the main characteristics of each cluster. In this study, a rough cluster analysis was conducted on a sample of 437 responses from a larger study of the relationship between shopping orientation (the general predisposition of consumers toward the act of shopping) and intention to purchase products via the Internet. The cluster analysis was based on five measures of shopping orientation: enjoyment, personalization, convenience, loyalty, and price. The rough clusters obtained provide interpretations of the different shopping orientations present in the data without the restriction of fitting each object into only one segment. Such descriptions can help marketers identify potential segments of consumers.
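
The multiple-membership idea can be sketched with rough k-means, one rough-set-inspired variant of k-means. The abstract does not say which rough clustering algorithm the paper used, so this is a swapped-in illustration, not the authors' method. Each cluster keeps a lower approximation (objects that certainly belong) and an upper approximation (objects that possibly belong); an object whose distances to two or more centroids are within a ratio `threshold` of each other goes only into the upper approximations. The `threshold` and `w_lower` values and the two-dimensional points are assumptions; the study's data had five shopping-orientation measures per respondent.

```python
import math


def rough_assign(points, centroids, threshold=1.2):
    """Split points into lower/upper approximations of each cluster.

    A point goes into the lower (and upper) approximation of its nearest
    cluster unless some other centroid is within `threshold` times the
    nearest distance; in that case it joins only the upper approximations
    of all those nearby clusters, i.e. it has multiple memberships.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != nearest and d[j] <= threshold * d[nearest]]
        if close:
            for j in [nearest] + close:
                upper[j].append(p)
        else:
            lower[nearest].append(p)
            upper[nearest].append(p)
    return lower, upper


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def rough_kmeans(points, centroids, threshold=1.2, w_lower=0.7, iterations=10):
    """Alternate rough assignment with weighted centroid updates."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        lower, upper = rough_assign(points, centroids, threshold)
        updated = []
        for j, old in enumerate(centroids):
            boundary = [p for p in upper[j] if p not in lower[j]]
            if lower[j] and boundary:
                # Certain members weigh w_lower, boundary members the rest.
                lo, bd = mean(lower[j]), mean(boundary)
                updated.append(tuple(w_lower * a + (1 - w_lower) * b for a, b in zip(lo, bd)))
            elif lower[j]:
                updated.append(mean(lower[j]))
            elif boundary:
                updated.append(mean(boundary))
            else:
                updated.append(old)  # empty cluster: keep previous centroid
        centroids = updated
    return centroids, lower, upper


# Two clear groups plus one ambiguous respondent in the middle (toy data).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]
centroids, lower, upper = rough_kmeans(points, [(1, 1), (9, 9)])
print("centroids:", centroids)
print("lower:", lower)
print("upper:", upper)
```

The ambiguous point (5, 5) ends up in the upper approximation of both clusters and in neither lower approximation, which is the kind of overlapping segment membership that k-means cannot express.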

Abstract:

This thesis introduces a flexible visual data exploration framework that combines advanced projection algorithms from machine learning with visual representation techniques from information visualisation, to help users explore and understand large multi-dimensional datasets effectively. The advantage of such a framework over other techniques currently available to domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows the expert to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms: a guided mixture of local experts algorithm, which provides robust prediction, and a model that estimates feature saliency simultaneously with the training of a projection algorithm. Local models are useful because a single global model cannot capture the full variability of a heterogeneous data space such as chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space through a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm, which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach domain experts are more involved in the model development process, which suits an intuition- and domain-knowledge-driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach that estimates feature saliency simultaneously with the training of a visualisation model.

Abstract:

The aims of the project were twofold: 1) to investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) to investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance and maximum likelihood. In addition, the following algorithms were developed during the course of the research: deviant distance, look-up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. The contextual enhancements investigated were mode filters, small area replacement and Wharton's CONAN algorithm. Additionally, methods for noise- and edge-based declassification and contextual reclassification, non-probabilistic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were also investigated. The conclusions suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints; these were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.
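
Of the classifiers listed, minimum distance is the simplest to show: each class is represented by the mean of its training pixels, and a pixel is assigned to the class with the nearest mean. The sketch below assumes Euclidean distance in spectral space; the band values and class names are hypothetical, and the thesis's own variants (such as its deviant distance classifier) are not reproduced.

```python
import math


def train_class_means(training):
    """Compute one mean spectral vector per class from labelled training pixels."""
    return {
        label: tuple(sum(band) / len(pixels) for band in zip(*pixels))
        for label, pixels in training.items()
    }


def classify_min_distance(pixel, class_means):
    """Return the class whose mean vector is closest to `pixel` (Euclidean)."""
    return min(class_means, key=lambda label: math.dist(pixel, class_means[label]))


# Hypothetical two-band digital numbers for two land-cover classes.
training = {
    "water": [(10, 5), (12, 6), (11, 4)],
    "vegetation": [(40, 80), (42, 85), (38, 78)],
}
means = train_class_means(training)
print(means)
print(classify_min_distance((13, 7), means))   # nearest mean is water
print(classify_min_distance((39, 81), means))  # nearest mean is vegetation
```

Because it ignores class variance, this classifier is fast but cruder than maximum likelihood, which is one reason the two are usually compared side by side.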

Abstract:

Objective: Recently, many studies have proposed using nature-inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm; it is based on swarm intelligence and derived from a model inspired by the collective foraging behavior of ants. Taking advantage of ACO traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering algorithm (Ant-C) and an ant-based association rule mining algorithm (Ant-ARM) are proposed for gene expression data analysis. The proposed algorithms make use of natural ant behaviors such as cooperation and adaptation to allow a flexible, robust search for a good candidate solution. Results: Ant-C has been tested on three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared with other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate the optimal number of clusters without incorporating any other algorithm such as K-means or agglomerative hierarchical clustering. For associative classification, while some well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.
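
For readers unfamiliar with ant-based clustering, the core decision rules of the classic model can be sketched as below: pick/drop probabilities in the style of Deneubourg et al., combined with a Lumer-Faieta-style local similarity density. This is a generic illustration, not the paper's Ant-C algorithm, whose details the abstract does not give. The constants `k1`, `k2`, `alpha` and the neighbourhood size `cells` are illustrative assumptions, and the grid and the ants' random walk are omitted.

```python
import math


def local_density(item, neighbours, alpha=0.5, cells=9):
    """Perceived similarity of `item` to items in the surrounding grid cells.

    Each neighbour contributes 1 - d/alpha (d = feature-space distance);
    the sum is normalised by the number of cells in the neighbourhood
    and clipped at zero.
    """
    total = sum(1 - math.dist(item, other) / alpha for other in neighbours)
    return max(0.0, total / cells)


def pick_probability(f, k1=0.1):
    """An unladen ant is likely to pick up an item that sits among dissimilar items."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """A laden ant is likely to drop its item among similar items."""
    return (f / (k2 + f)) ** 2


# A toy expression profile surrounded by similar profiles vs. a dissimilar one.
item = (0.2, 0.8)
similar = [(0.25, 0.75), (0.2, 0.85)]
dissimilar = [(2.0, -1.0)]
f_sim = local_density(item, similar)
f_dis = local_density(item, dissimilar)
print(f"similar: pick={pick_probability(f_sim):.3f} drop={drop_probability(f_sim):.3f}")
print(f"dissimilar: pick={pick_probability(f_dis):.3f} drop={drop_probability(f_dis):.3f}")
```

Repeating these stochastic pick and drop decisions while ants wander a grid gradually gathers similar items into piles, so the number of clusters emerges from the process rather than being fixed in advance.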

Abstract:

This paper presents an effective decision-making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system were obtained by setting up a fully operational experimental pipeline distribution system. The system is also equipped with data logging for three variables, namely inlet pressure, outlet pressure and outlet flow. The experimental setup is designed so that multiple operational conditions of the distribution system, including multiple pressures and multiple flows, can be obtained. We then tested statistically and showed that pressure and flow variables can be used as a signature of leaks under the designed multi-operational conditions. It is then shown that detecting leaks by training and testing the proposed multi-model decision system with prior data clustering, under multi-operational conditions, produces better recognition rates than training based on the single-model approach. This decision system is then equipped with the estimation of confidence limits, and a method is proposed for using these confidence limits to obtain more robust leak recognition results.

Abstract:

We investigated the diversity pattern of nine Swiss stone pine (Pinus cembra L.) populations along the Carpathian range, including the High Tatras, using six chloroplast DNA microsatellites (cpSSR). Our aim was to detect genetically distinct regions by clustering populations, and to trace possible historical colonization routes. The two most distant populations in the investigated range lie about 500 km apart (straight-line distance). We found that the most diverse populations are situated at the two edges of the investigated area, in the Retezat Mts. (South Carpathians) and the High Tatras, and that diversity decreases towards the populations of the Eastern Carpathians. Hierarchical clustering and NMDS revealed that the populations of the South Carpathians together with the Tatras form a distinct cluster, significantly separated from those of the Eastern Carpathians. Moreover, based on the most variable chloroplast microsatellites, the four populations at the two range edges are not significantly different. Our results, also supported by palynological and late-glacial macrofossil evidence, indicate refugial territories within the Retezat Mts. that conserved a rich haplotype composition. From this refugial territory Pinus cembra might have colonized the Eastern Carpathians, accompanied by a gradual decrease in population diversity. Populations of the High Tatras might have played the same role in the colonization of the Carpathians, as a positive correlation was detected among populations lying 280 km from each other, the maximum distance between neighbouring populations.

Abstract:

Research has found that, as a result of their particularities, different countries have established partly different accounting regulations. Studies with inductive approaches typically cover a wide range of regulatory issues, but on the basis of only a limited number of factors. In the case of cash flow statements, most studies have so far only examined whether rules require the preparation of the statement, without an in-depth analysis of the details. Therefore, these studies found only relatively minor differences in this field. The author’s research shows that many differences exist in the details of national cash flow statement regulations, which makes it possible to classify countries into groups using hierarchical cluster analysis.

Abstract:

This dissertation establishes a novel data-driven method to identify language network activation patterns in pediatric epilepsy by applying principal component analysis (PCA) to functional magnetic resonance imaging (fMRI) data. A total of 122 subjects’ data sets from five different hospitals were included in the study through a web-based repository site designed at FIU. Different classification and clustering techniques were evaluated for identifying hidden activation patterns and their associations with meaningful clinical variables. The results were assessed through agreement analysis with the conventional methods of lateralization index (LI) and visual rating. What is unique in this approach is the new mechanism designed for projecting language network patterns into the PCA-based decisional space. Synthetic activation maps were randomly generated from real data sets to establish nonlinear decision functions (NDF), which are then used to classify any new fMRI activation map as typical or atypical. The best nonlinear classifier was obtained in a 4D space with a complexity (nonlinearity) degree of 7. Based on the significant association of language dominance and intensities with the top eigenvectors of the PCA decisional space, a new algorithm was deployed to delineate primary cluster members without intensity normalization. In this case, three distinct activation patterns (groups) were identified (averaged kappa of 0.65 with visual rating and 0.76 with LI), characterized by the regions of: (1) the left inferior frontal gyrus (IFG) and left superior temporal gyrus (STG), considered typical for the language task; (2) the IFG, left mesial frontal lobe and right cerebellum regions, representing a variant left-dominant pattern with higher activation; and (3) the right homologues of the first pattern in Broca's and Wernicke's language areas.
Interestingly, group 2 was found to reflect a language compensation mechanism different from reorganization. Its high-intensity activation suggests a possible remote effect of a right-hemisphere focus on traditionally left-lateralized functions. Overall, this data-driven method provides new insights into the mechanisms of brain compensation/reorganization and neural plasticity in pediatric epilepsy.

Abstract:

South Florida’s watersheds have endured a century of urban and agricultural development and disruption of their hydrology. Spatial characterization of South Florida’s estuarine and coastal waters is important to Everglades restoration programs. We applied factor analysis and hierarchical clustering of water quality data in tandem to characterize and spatially subdivide South Florida’s coastal and estuarine waters. The segmentation yielded forty-four biogeochemically distinct water bodies whose spatial distribution is closely linked to geomorphology, circulation, benthic community pattern and water management. This segmentation has been adopted, with minor changes, by federal and state environmental agencies to derive numeric nutrient criteria.
