959 results for Clustering methods
Abstract:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
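As an illustration of the cophenetic correlation mentioned in this abstract, the following is a minimal sketch (not the authors' software) of group-average (UPGMA) clustering on a small dissimilarity matrix, followed by the Pearson correlation between the original dissimilarities and the heights at which each pair of samples joins in the tree. The function names and the toy matrix are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(d):
    """Run group-average (UPGMA) agglomerative clustering on a symmetric
    dissimilarity matrix d (list of lists) and return the cophenetic matrix:
    entry [i][j] is the merge height at which samples i and j first join."""
    n = len(d)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member samples
    coph = [[0.0] * n for _ in range(n)]

    def avg_dist(a, b):
        # Group-average linkage: mean of all between-cluster dissimilarities.
        total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    next_id = n
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average dissimilarity.
        a, b = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        height = avg_dist(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(d):
    """Pearson correlation between original and cophenetic dissimilarities,
    taken over all distinct sample pairs."""
    coph = upgma_cophenetic(d)
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical dissimilarities among four samples (illustrative values only).
toy = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
r = cophenetic_correlation(toy)
```

A value of r close to 1 would suggest the dendrogram preserves the original dissimilarities well; lower values are the situation the abstract warns about, where the tree is a poor summary of the resemblance matrix.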
Abstract:
Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, MixKMeans, composes question and answer space similarities in a way that the space on which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
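The idea of letting the better-matching space dominate can be sketched as follows. This is only an illustration under a strong simplifying assumption: the two similarities are combined with a hard maximum, which is not claimed to be the composition used by MixKMeans. The function names and the vector representation of questions and answers are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mixed_similarity(qa1, qa2):
    """Similarity between two QA pairs, each given as a
    (question_vector, answer_vector) tuple. The space with the higher
    match decides the result (assumption: a hard max is used here)."""
    (q1, a1), (q2, a2) = qa1, qa2
    return max(cosine(q1, q2), cosine(a1, a2))


# Two QA pairs whose questions match exactly but whose answers do not.
first = ([1.0, 0.0], [1.0, 0.0])
second = ([1.0, 0.0], [0.0, 1.0])
score = mixed_similarity(first, second)
```

With a plain average of the two similarities, the pair above would score only 0.5; under the max-style composition the strong question-space match carries the comparison, which is the behaviour the abstract motivates.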
Abstract:
Purpose: To evaluate the influence of cross-sectional arc calcification on the diagnostic accuracy of computed tomography (CT) angiography compared with conventional coronary angiography for the detection of obstructive coronary artery disease (CAD). Materials and Methods: Institutional Review Board approval and written informed consent were obtained from all centers and participants for this HIPAA-compliant study. Overall, 4511 segments from 371 symptomatic patients (279 men, 92 women; median age, 61 years [interquartile range, 53-67 years]) with clinical suspicion of CAD from the CORE-64 multi-center study were included in the analysis. Two independent blinded observers evaluated the percentage of diameter stenosis and the circumferential extent of calcium (arc calcium). The accuracy of quantitative multidetector CT angiography to depict substantial (>50%) stenoses was assessed by using quantitative coronary angiography (QCA). Cross-sectional arc calcium was rated on a segment level as follows: noncalcified or mild (<90 degrees), moderate (90 degrees-180 degrees), or severe (>180 degrees) calcification. Univariable and multivariable logistic regression, receiver operating characteristic curve, and clustering methods were used for statistical analyses. Results: A total of 1099 segments had mild calcification, 503 had moderate calcification, 338 had severe calcification, and 2571 segments were noncalcified. Calcified segments were highly associated (P < .001) with disagreement between CTA and QCA in multivariable analysis after controlling for sex, age, heart rate, and image quality. The prevalence of CAD was 5.4% in noncalcified segments, 15.0% in mildly calcified segments, 27.0% in moderately calcified segments, and 43.0% in severely calcified segments. A significant difference was found in area under the receiver operating characteristic curves (noncalcified: 0.86, mildly calcified: 0.85, moderately calcified: 0.82, severely calcified: 0.81; P < .05).
Conclusion: In a symptomatic patient population, segment-based coronary artery calcification significantly decreased agreement between multidetector CT angiography and QCA to detect a coronary stenosis of at least 50%.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon, Portugal.
Abstract:
8th International Conference of Education, Research and Innovation. 18-20 November, 2015, Seville, Spain.
Abstract:
With the growing generation, storage and dissemination of information in recent years, the former problem of a lack of information has become a problem of extracting useful knowledge from the information available. Visual representations of abstract information have been used to aid the interpretation of data and to reveal otherwise hidden patterns. Information visualisation seeks to augment human cognition by exploiting human visual capabilities, making abstract information perceptible and providing the means for a person to absorb growing amounts of information using their perceptual abilities. The goal of data clustering techniques is to divide a dataset into several groups, such that similar data are placed in the same group and dissimilar data in different groups. More specifically, constrained clustering aims to incorporate a priori knowledge into the clustering process, in order to increase clustering quality and, at the same time, find solutions suited to specific tasks and interests. This dissertation studies the Interactive Visual Clustering approach, which allows the user, by interacting with a visual representation of the information, to incorporate prior knowledge about the data domain so as to steer the resulting clustering towards their objectives. This approach combines and extends techniques from interactive information visualisation, force-directed graph drawing and constrained clustering. To evaluate the performance of different user-interaction strategies, comparative studies are carried out using synthetic and real datasets.
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the degree of Master in Geographic Information Science and Systems
Abstract:
Chagas disease is a chronic, tropical, parasitic disease, endemic throughout Latin America. The large-scale migration of populations has increased the geographic distribution of the disease and cases have been observed in many other countries around the world. To strengthen the critical mass of knowledge generated in different countries, it is essential to promote cooperative and translational research initiatives. We analyzed authorship of scientific documents on Chagas disease indexed in the Medline database from 1940 to 2009. Bibliometrics was used to analyze the evolution of collaboration patterns. A Social Network Analysis was carried out to identify the main research groups in the area by applying clustering methods. We then analyzed 13,989 papers produced by 21,350 authors. Collaboration among authors dramatically increased over the study period, reaching an average of 6.2 authors per paper in the last five-year period. Applying a threshold of collaboration of five or more papers signed in co-authorship, we identified 148 consolidated research groups made up of 1,750 authors. The Chagas disease network identified constitutes a "small world," characterized by a high degree of clustering and a notably high number of Brazilian researchers.
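The thresholding step described in this abstract can be sketched as below. The abstract states only that a threshold of five or more co-authored papers was applied; treating each connected component of the thresholded co-authorship graph as one research group is an assumption of this example, as are the function name and the input format.

```python
from collections import Counter
from itertools import combinations


def research_groups(papers, threshold=5):
    """papers: iterable of author-name lists, one list per paper.
    Returns groups of authors linked, directly or indirectly, by pairs
    who co-signed at least `threshold` papers."""
    # Count how many papers each pair of authors signed together.
    pair_counts = Counter()
    for authors in papers:
        for pair in combinations(sorted(set(authors)), 2):
            pair_counts[pair] += 1

    # Keep only co-authorship ties at or above the threshold.
    neighbours = {}
    for (a, b), count in pair_counts.items():
        if count >= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    # Each connected component of the thresholded graph is one group
    # (assumption: the study's exact grouping rule is not given).
    groups, seen = [], set()
    for start in neighbours:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical authorship records (author labels are placeholders).
records = [["A", "B"]] * 5 + [["B", "C"]] * 5 + [["D", "E"]] * 4 + [["A", "F", "G"]]
groups = research_groups(records)
```

Authors who never reach the threshold with anyone, such as the occasional collaborators in the last record, do not appear in any group.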
Abstract:
The use of the Internet now has a specific purpose: to find information. Unfortunately, the amount of data available on the Internet is growing exponentially, creating what can be considered a nearly infinite and ever-evolving network with no discernable structure. This rapid growth has raised the question of how to find the most relevant information. Many different techniques have been introduced to address the information overload, including search engines, Semantic Web, and recommender systems, among others. Recommender systems are computer-based techniques that are used to reduce information overload and recommend products likely to interest a user when given some information about the user's profile. This technique is mainly used in e-Commerce to suggest items that fit a customer's purchasing tendencies. The use of recommender systems for e-Government is a research topic that is intended to improve the interaction among public administrations, citizens, and the private sector through reducing information overload on e-Government services. More specifically, e-Democracy aims to increase citizens' participation in democratic processes through the use of information and communication technologies. In this chapter, an architecture of a recommender system that uses fuzzy clustering methods for e-Elections is introduced. In addition, a comparison with the smartvote system, a Web-based Voting Assistance Application (VAA) used to aid voters in finding the party or candidate that is most in line with their preferences, is presented.
Abstract:
There are several analysis methods that perform a global clustering of a series of microarray samples, such as Self-Organizing Maps, or that perform local clusterings taking into account only a subset of co-expressed genes, such as Biclustering, among others. In this project a web application has been developed: PCOPSamplecl, a tool that belongs to the local clustering methods and that does not look for subsets of co-expressed genes (analysis of linear relationships), but rather for pairs of genes whose expression relationship fluctuates in response to phenotypic changes. The results of PCOPSamplecl are the different final cluster distributions and the pairs of genes involved in these phenotypic changes. These gene pairs can then be studied to find the cause and effect of the phenotypic change. In addition, the tool facilitates the study of the dependencies between the different cluster distributions that the application provides, in order to study the intersection between clusters or the appearance of subclusters (two clusters from the same cluster distribution may be subclusters of other clusters from different cluster distributions). The tool is available on the server: http://revolutionresearch.uab.es/
Abstract:
A fundamental question in developmental biology is how tissues are patterned to give rise to differentiated body structures with distinct morphologies. The Drosophila wing disc offers an accessible model to understand epithelial spatial patterning. It has been studied extensively using genetic and molecular approaches. Bristle patterns on the thorax, which arise from the medial part of the wing disc, are a classical model of pattern formation, dependent on a pre-pattern of trans-activators and -repressors. Despite decades of molecular studies, we still only know a subset of the factors that determine the pre-pattern. We are applying a novel and interdisciplinary approach to predict regulatory interactions in this system. It is based on the description of expression patterns by simple logical relations (addition, subtraction, intersection and union) between simple shapes (graphical primitives). Similarities and relations between primitives have been shown to be predictive of regulatory relationships between the corresponding regulatory factors in other systems, such as the Drosophila egg. Furthermore, they provide the basis for dynamical models of the bristle-patterning network, which enable us to make even more detailed predictions on gene regulation and expression dynamics. We have obtained a dataset of wing disc expression patterns which we are now processing to obtain average expression patterns for each gene. Through triangulation of the images we can transform the expression patterns into vectors which can easily be analysed by standard clustering methods. These analyses will allow us to identify primitives and regulatory interactions. We expect to identify new regulatory interactions and to understand the basic dynamics of the regulatory network responsible for thorax patterning.
These results will provide us with a better understanding of the rules governing gene regulatory networks in general, and provide the basis for future studies of the evolution of the thorax-patterning network in particular.
Abstract:
Tire traces can be observed at many crime scenes, as vehicles are often used by criminals. The tread abrasion on the road, while braking or skidding, leads to the production of small rubber particles which can be collected for comparison purposes. This research focused on the statistical comparison of Py-GC/MS profiles of tire traces and tire treads. The optimisation of the analytical method was carried out using experimental designs. The aim was to determine the best pyrolysis parameters regarding the repeatability of the results. The effect of each pyrolysis factor could thus also be calculated. The pyrolysis temperature was found to be five times more important than time. Finally, a pyrolysis at 650 °C for 15 s was selected. Ten tires of different manufacturers and models were used for this study. Several samples were collected on each tire, and several replicates were carried out to study the variability within each tire (intravariability). More than eighty compounds were integrated for each analysis and the variability study showed that more than 75% presented a relative standard deviation (RSD) below 5% for the ten tires, thus supporting a low intravariability. The variability between the ten tires (intervariability) presented higher values and the ten most variant compounds had an RSD value above 13%, supporting their high potential of discrimination between the tires tested. Principal Component Analysis (PCA) was able to fully discriminate the ten tires with the help of the first three principal components. The ten tires were finally used to perform braking tests on a racetrack with a vehicle equipped with an anti-lock braking system. The resulting tire traces were adequately collected using sheets of white gelatine. As for tires, the intravariability for the traces was found to be lower than the intervariability.
Clustering methods were applied, and Ward's method based on the squared Euclidean distance was able to correctly group all of the tire trace replicates in the same cluster as the replicates of their corresponding tire. Blind tests on traces were performed, and the traces were correctly assigned to their source tire. These results support the hypothesis that the tested tires, of different manufacturers and models, can be discriminated by a statistical comparison of their chemical profiles. The traces were found to be not differentiable from their source but differentiable from all the other tires present in the subset. The results are promising and will be extended to a larger sample set.
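For readers unfamiliar with Ward's method, the following is a minimal sketch of the criterion it uses: at each step the two clusters whose merger gives the smallest increase in within-cluster sum of squares are joined, and that increase equals |A||B| / (|A| + |B|) times the squared Euclidean distance between the two centroids. The profile values below are invented for illustration and are not data from the study; the function name is hypothetical.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerate rows of `points` (lists of floats) with Ward's criterion
    until k clusters remain. Returns clusters as sorted lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        ca, cb = centroid(a), centroid(b)
        squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * squared_distance

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: merge_cost(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Invented two-compound profiles: rows 0, 1 and 4 mimic replicates of one
# tire and its trace, rows 2 and 3 replicates of another (assumed values).
profiles = [
    [0.0, 0.0],
    [0.1, 0.0],
    [5.0, 5.0],
    [5.1, 5.0],
    [0.0, 0.1],
]
result = ward_clusters(profiles, 2)
```

In this toy setting the replicates with similar profiles end up in the same cluster, which mirrors the kind of grouping of traces with their source tire that the abstract reports.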
Abstract:
The use of forensic data from an intelligence perspective by police services and investigating agencies is still fragmentary and to some extent ignored. In order to increase the efficiency of criminal investigations targeting illegal drug trafficking organisations and to provide valuable information about their methods, it is necessary to include and interpret objective drug analysis results as early as the investigation phase. The value of visual, physical and chemical data of seized ecstasy tablets as a support for criminal investigation on a strategic and tactical level has been investigated. In a first phase, different characteristics of ecstasy tablets were studied in order to define their relevance, variation, correlation and discriminating power from an intelligence perspective. Over 5 years, more than 1200 cases of ecstasy seizures (concerning about 150,000 seized tablets) coming from different regions of Switzerland (City and Canton of Zurich, Cantons Ticino, Neuchâtel and Geneva) were systematically recorded. This turned out to be a statistically representative database including large and small cases. During the second phase, various comparison and clustering methods were tested and evaluated on the type and relevance of tablet characteristics, thus increasing knowledge about synthetic drugs, their manufacturing and trafficking. Finally, analytical methodologies were investigated and formalised, applying traditional intelligence methods. In this context, classical tools used in criminal analysis (like the I2 Analyst Notebook, I2 Ibase, ...) were tested and adapted to address the specific needs of forensic drug intelligence. The interpretation of these links provides valuable information about criminal organisations and their trafficking methods. In the final part of this thesis, practical examples illustrate the use and value of such information.