856 resultados para Data Driven Clustering


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The objective of this master’s thesis was to examine technology-based smart home devices and services. Topic was approached through basic theories, transaction cost theory and resource-based view in order to build basis for this thesis. Conceptual framework was discussed by means of networks, value networks and service systems which provide a useful framework for service development. The needs of the elderly living at home were discussed in order to find out which technology-based services could be used to satisfy the needs. Segmentation and need data collected previously during proactive home visits was exploited and additionally a survey targeted to experts and professionals of social and health care sector was done to verify the needs. Finally, the results of the survey were analyzed using quality function deployment method to figure out the most important and suitable service offerings for the elderly. As a conclusion of analysis, social media and monitoring services are the most useful technology-based services. However, traditional home services will still maintain their necessity too.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clustering soil and crop data can be used as a basis for the definition of management zones because the data are grouped into clusters based on the similar interaction of these variables. Therefore, the objective of this study was to identify management zones using fuzzy c-means clustering analysis based on the spatial and temporal variability of soil attributes and corn yield. The study site (18 by 250-m in size) was located in Jaboticabal, São Paulo/Brazil. Corn yield was measured in one hundred 4.5 by 10-m cells along four parallel transects (25 observations per transect) over five growing seasons between 2001 and 2010. Soil chemical and physical attributes were measured. SAS procedure MIXED was used to identify which variable(s) most influenced the spatial variability of corn yield over the five study years. Basis saturation (BS) was the variable that better related to corn yield, thus, semivariograms models were fitted for BS and corn yield and then, data values were krigged. Management Zone Analyst software was used to carry out the fuzzy c-means clustering algorithm. The optimum number of management zones can change over time, as well as the degree of agreement between the BS and corn yield management zone maps. Thus, it is very important take into account the temporal variability of crop yield and soil attributes to delineate management zones accurately.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The pumping processes requiring wide range of flow are often equipped with parallelconnected centrifugal pumps. In parallel pumping systems, the use of variable speed control allows that the required output for the process can be delivered with a varying number of operated pump units and selected rotational speed references. However, the optimization of the parallel-connected rotational speed controlled pump units often requires adaptive modelling of both parallel pump characteristics and the surrounding system in varying operation conditions. The available information required for the system modelling in typical parallel pumping applications such as waste water treatment and various cooling and water delivery pumping tasks can be limited, and the lack of real-time operation point monitoring often sets limits for accurate energy efficiency optimization. Hence, alternatives for easily implementable control strategies which can be adopted with minimum system data are necessary. This doctoral thesis concentrates on the methods that allow the energy efficient use of variable speed controlled parallel pumps in system scenarios in which the parallel pump units consist of a centrifugal pump, an electric motor, and a frequency converter. Firstly, the suitable operation conditions for variable speed controlled parallel pumps are studied. Secondly, methods for determining the output of each parallel pump unit using characteristic curve-based operation point estimation with frequency converter are discussed. Thirdly, the implementation of the control strategy based on real-time pump operation point estimation and sub-optimization of each parallel pump unit is studied. The findings of the thesis support the idea that the energy efficiency of the pumping can be increased without the installation of new, more efficient components in the systems by simply adopting suitable control strategies. An easily implementable and adaptive control strategy for variable speed controlled parallel pumping systems can be created by utilizing the pump operation point estimation available in modern frequency converters. Hence, additional real-time flow metering, start-up measurements, and detailed system model are unnecessary, and the pumping task can be fulfilled by determining a speed reference for each parallel-pump unit which suggests the energy efficient operation of the pumping system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The purpose of this thesis is to find out whether all the peer to peer lenders are unworthy of credit and also if there are single qualities or combinations of qualities that determine the probability of default of a person or group of people. Distinguishing qualities are searched with self-organizing maps (SOM). Qualities and groups of people found by the self-organizing map are then compared to the average. The comparison is carried out by looking how big proportion of borrowers meeting the criteria is two months or more behind with their payments. Research data used is collected by an Estonian peer to peer lending company during the years of 2011-2014. Data consists of peer to peer borrowers and information gathered from them.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Previous reports from our group have demonstrated the association of molecular mimicry between cardiac myosin and the immunodominant Trypanosoma cruzi protein B13 with chronic Chagas' disease cardiomyopathy at both the antibody and heart-infiltrating T cell level. At the peripheral blood level, we observed no difference in primary proliferative responses to T. cruzi B13 protein between chronic Chagas' cardiopathy patients, asymptomatic chagasics and normal individuals. In the present study, we investigated whether T cells sensitized by T. cruzi B13 protein respond to cardiac myosin. T cell clones generated from a B13-stimulated T cell line obtained from peripheral blood of a B13-responsive normal donor were tested for proliferation against B13 protein and human cardiac myosin. The results showed that one clone responded to B13 protein alone and the clone FA46, displaying the highest stimulation index to B13 protein (SI = 25.7), also recognized cardiac myosin. These data show that B13 and cardiac myosin share epitopes at the T cell level and that sensitization of a T cell with B13 protein results in response to cardiac myosin. It can be hypothesized that this also occurs in vivo during T. cruzi infection which results in heart tissue damage in chronic Chagas' disease cardiomyopathy

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixtured urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixtured urban population. Group I included 150 white subjects; group II, 142 mulatto subjects, and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both ACE I/D (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). Different sample sizes are predicted to be determinant of the power to detect a given genotypic association with a particular phenotype when conducting two-locus association studies in admixtured populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Advancements in information technology have made it possible for organizations to gather and store vast amounts of data of their customers. Information stored in databases can be highly valuable for organizations. However, analyzing large databases has proven to be difficult in practice. For companies in the retail industry, customer intelligence can be used to identify profitable customers, their characteristics, and behavior. By clustering customers into homogeneous groups, companies can more effectively manage their customer base and target profitable customer segments. This thesis will study the use of the self-organizing map (SOM) as a method for analyzing large customer datasets, clustering customers, and discovering information about customer behavior. Aim of the thesis is to find out whether the SOM could be a practical tool for retail companies to analyze their customer data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The goal of most clustering algorithms is to find the optimal number of clusters (i.e. fewest number of clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of the Cramérs V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the penta peptide Met-enkephalin (MET) and the 34 amino acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a huge impact on the reproducibility. Analysis of MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps due to the fact that toroidal maps do not have an edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classification for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs) which find similar partitionings to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions. The χ^2 values could relate the different SOM partitionings to each other.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cette thèse a pour but d’améliorer l’automatisation dans l’ingénierie dirigée par les modèles (MDE pour Model Driven Engineering). MDE est un paradigme qui promet de réduire la complexité du logiciel par l’utilisation intensive de modèles et des transformations automatiques entre modèles (TM). D’une façon simplifiée, dans la vision du MDE, les spécialistes utilisent plusieurs modèles pour représenter un logiciel, et ils produisent le code source en transformant automatiquement ces modèles. Conséquemment, l’automatisation est un facteur clé et un principe fondateur de MDE. En plus des TM, d’autres activités ont besoin d’automatisation, e.g. la définition des langages de modélisation et la migration de logiciels. Dans ce contexte, la contribution principale de cette thèse est de proposer une approche générale pour améliorer l’automatisation du MDE. Notre approche est basée sur la recherche méta-heuristique guidée par les exemples. Nous appliquons cette approche sur deux problèmes importants de MDE, (1) la transformation des modèles et (2) la définition précise de langages de modélisation. Pour le premier problème, nous distinguons entre la transformation dans le contexte de la migration et les transformations générales entre modèles. Dans le cas de la migration, nous proposons une méthode de regroupement logiciel (Software Clustering) basée sur une méta-heuristique guidée par des exemples de regroupement. De la même façon, pour les transformations générales, nous apprenons des transformations entre modèles en utilisant un algorithme de programmation génétique qui s’inspire des exemples des transformations passées. Pour la définition précise de langages de modélisation, nous proposons une méthode basée sur une recherche méta-heuristique, qui dérive des règles de bonne formation pour les méta-modèles, avec l’objectif de bien discriminer entre modèles valides et invalides. Les études empiriques que nous avons menées, montrent que les approches proposées obtiennent des bons résultats tant quantitatifs que qualitatifs. Ceux-ci nous permettent de conclure que l’amélioration de l’automatisation du MDE en utilisant des méthodes de recherche méta-heuristique et des exemples peut contribuer à l’adoption plus large de MDE dans l’industrie à là venir.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An Overview of known spatial clustering algorithms The space of interest can be the two-dimensional abstraction of the surface of the earth or a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3d-space representing the arrangement of chains of protein molecules. The data consists of geometric information and can be either discrete or continuous. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations) which are used by spatial data mining algorithms. Therefore, spatial data mining algorithms are required for spatial characterization and spatial trend analysis. Spatial data mining or knowledge discovery in spatial databases differs from regular data mining in analogous with the differences between non-spatial data and spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of the spatial neighbors of that object. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that can be extracted through spatial data mining

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Biclustering is simultaneous clustering of both rows and columns of a data matrix. A measure called Mean Squared Residue (MSR) is used to simultaneously evaluate the coherence of rows and columns within a submatrix. In this paper a novel algorithm is developed for biclustering gene expression data using the newly introduced concept of MSR difference threshold. In the first step high quality bicluster seeds are generated using K-Means clustering algorithm. Then more genes and conditions (node) are added to the bicluster. Before adding a node the MSR X of the bicluster is calculated. After adding the node again the MSR Y is calculated. The added node is deleted if Y minus X is greater than MSR difference threshold or if Y is greater than MSR threshold which depends on the dataset. The MSR difference threshold is different for gene list and condition list and it depends on the dataset also. Proper values should be identified through experimentation in order to obtain biclusters of high quality. The results obtained on bench mark dataset clearly indicate that this algorithm is better than many of the existing biclustering algorithms

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, moving flock patterns are mined from spatio- temporal datasets by incorporating a clustering algorithm. A flock is defined as the set of data that move together for a certain continuous amount of time. Finding out moving flock patterns using clustering algorithms is a potential method to find out frequent patterns of movement in large trajectory datasets. In this approach, SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW) is the clustering algorithm used. The advantage of using SPARROW algorithm is that it can effectively discover clusters of widely varying sizes and shapes from large databases. Variations of the proposed method are addressed and also the experimental results show that the problem of scalability and duplicate pattern formation is addressed. This method also reduces the number of patterns produced

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A spectral angle based feature extraction method, Spectral Clustering Independent Component Analysis (SC-ICA), is proposed in this work to improve the brain tissue classification from Magnetic Resonance Images (MRI). SC-ICA provides equal priority to global and local features; thereby it tries to resolve the inefficiency of conventional approaches in abnormal tissue extraction. First, input multispectral MRI is divided into different clusters by a spectral distance based clustering. Then, Independent Component Analysis (ICA) is applied on the clustered data, in conjunction with Support Vector Machines (SVM) for brain tissue analysis. Normal and abnormal datasets, consisting of real and synthetic T1-weighted, T2-weighted and proton density/fluid-attenuated inversion recovery images, were used to evaluate the performance of the new method. Comparative analysis with ICA based SVM and other conventional classifiers established the stability and efficiency of SC-ICA based classification, especially in reproduction of small abnormalities. Clinical abnormal case analysis demonstrated it through the highest Tanimoto Index/accuracy values, 0.75/98.8%, observed against ICA based SVM results, 0.17/96.1%, for reproduced lesions. Experimental results recommend the proposed method as a promising approach in clinical and pathological studies of brain diseases

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many recent Web 2.0 resource sharing applications can be subsumed under the "folksonomy" moniker. Regardless of the type of resource shared, all of these share a common structure describing the assignment of tags to resources by users. In this report, we generalize the notions of clustering and characteristic path length which play a major role in the current research on networks, where they are used to describe the small-world effects on many observable network datasets. To that end, we show that the notion of clustering has two facets which are not equivalent in the generalized setting. The new measures are evaluated on two large-scale folksonomy datasets from resource sharing systems on the web.