979 resultados para clustering techniques
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society .
Resumo:
In this paper, moving flock patterns are mined from spatio- temporal datasets by incorporating a clustering algorithm. A flock is defined as the set of data that move together for a certain continuous amount of time. Finding out moving flock patterns using clustering algorithms is a potential method to find out frequent patterns of movement in large trajectory datasets. In this approach, SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW) is the clustering algorithm used. The advantage of using SPARROW algorithm is that it can effectively discover clusters of widely varying sizes and shapes from large databases. Variations of the proposed method are addressed and also the experimental results show that the problem of scalability and duplicate pattern formation is addressed. This method also reduces the number of patterns produced
Resumo:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis
Resumo:
In this work a method for building multiple-model structures is presented. A clustering algorithm that uses data from the system is employed to define the architecture of the multiple-model, including the size of the region covered by each model, and the number of models. A heating ventilation and air conditioning system is used as a testbed of the proposed method.
Resumo:
In this work a method for building multiple-model structures is presented. A clustering algorithm that uses data from the system is employed to define the architecture of the multiple-model, including the size of the region covered by each model, and the number of models. A heating ventilation and air conditioning system is used as a testbed of the proposed method.
Resumo:
One objective of the feeder reconfiguration problem in distribution systems is to minimize the power losses for a specific load. For this problem, mathematical modeling is a nonlinear mixed integer problem that is generally hard to solve. This paper proposes an algorithm based on artificial neural network theory. In this context, clustering techniques to determine the best training set for a single neural network with generalization ability are also presented. The proposed methodology was employed for solving two electrical systems and presented good results. Moreover, the methodology can be employed for large-scale systems in real-time environment.
Resumo:
The area of Human-Machine Interface is growing fast due to its high importance in all technological systems. The basic idea behind designing human-machine interfaces is to enrich the communication with the technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must identify properly the action the user wants to perform, so the proper gesture recognition is of the highest importance. However, most of the systems based on gesture recognition use complex methods requiring high-resource devices. In this work, we propose to model gestures capturing their temporal properties, which significantly reduce storage requirements, and use clustering techniques, namely self-organizing maps and unsupervised genetic algorithm, for their classification. We further propose to train a certain number of algorithms with different parameters and combine their decision using majority voting in order to decrease the false positive rate. The main advantage of the approach is its simplicity, which enables the implementation using devices with limited resources, and therefore low cost. The testing results demonstrate its high potential.
Resumo:
The study of the effectiveness of the cognitive rehabilitation processes and the identification of cognitive profiles, in order to define comparable populations, is a controversial area, but concurrently it is strongly needed in order to improve therapies. There is limited evidence about cognitive rehabilitation efficacy. Many of the trials conclude that in spite of an apparent clinical good response, differences do not show statistical significance. The common feature in all these trials is heterogeneity among populations. In this situation, observational studies on very well controlled cohort of studies, together with innovative methods in knowledge extraction, could provide methodological insights for the design of more accurate comparative trials. Some correlation studies between neuropsychological tests and patients capacities have been carried out -1---2- and also correlation between tests and morphological changes in the brain -3-. The procedures efficacy depends on three main factors: the affectation profile, the scheduled tasks and the execution results. The relationship between them makes up the cognitive rehabilitation as a discipline, but its structure is not properly defined. In this work we present a clustering method used in Neuro Personal Trainer (NPT) to group patients into cognitive profiles using data mining techniques. The system uses these clusters to personalize treatments, using the patients assigned cluster to select which tasks are more suitable for its concrete needs, by comparing the results obtained in the past by patients with the same profile.
Resumo:
In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Resumo:
A definition of medium voltage (MV) load diagrams was made, based on the data base knowledge discovery process. Clustering techniques were used as support for the agents of the electric power retail markets to obtain specific knowledge of their customers’ consumption habits. Each customer class resulting from the clustering operation is represented by its load diagram. The Two-step clustering algorithm and the WEACS approach based on evidence accumulation (EAC) were applied to an electricity consumption data from a utility client’s database in order to form the customer’s classes and to find a set of representative consumption patterns. The WEACS approach is a clustering ensemble combination approach that uses subsampling and that weights differently the partitions in the co-association matrix. As a complementary step to the WEACS approach, all the final data partitions produced by the different variations of the method are combined and the Ward Link algorithm is used to obtain the final data partition. Experiment results showed that WEACS approach led to better accuracy than many other clustering approaches. In this paper the WEACS approach separates better the customer’s population than Two-step clustering algorithm.
Resumo:
Dissertação para obtenção do Grau de Mestre em Logica Computicional
Resumo:
When a pregnant woman is guided to a hospital for obstetrics purposes, many outcomes are possible, depending on her current conditions. An improved understanding of these conditions could provide a more direct medical approach by categorizing the different types of patients, enabling a faster response to risk situations, and therefore increasing the quality of services. In this case study, the characteristics of the patients admitted in the maternity care unit of Centro Hospitalar of Porto are acknowledged, allowing categorizing the patient women through clustering techniques. The main goal is to predict the patients’ route through the maternity care, adapting the services according to their conditions, providing the best clinical decisions and a cost-effective treatment to patients. The models developed presented very interesting results, being the best clustering evaluation index: 0.65. The evaluation of the clustering algorithms proved the viability of using clustering based data mining models to characterize pregnant patients, identifying which conditions can be used as an alert to prevent the occurrence of medical complications.
Resumo:
In this project a research both in finding predictors via clustering techniques and in reviewing the Data Mining free software is achieved. The research is based in a case of study, from where additionally to the KDD free software used by the scientific community; a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain as the data from where these predictors have to be inferred are student qualifications from different e-learning environments. Through our case of study not only clustering algorithms are tested but also additional goals are proposed.
Resumo:
Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices