814 resultados para Hierarchical clustering model
Resumo:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis
Resumo:
This paper presents a hierarchical clustering method for semantic Web service discovery. This method aims to improve the accuracy and efficiency of the traditional service discovery using vector space model. The Web service is converted into a standard vector format through the Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to make semantic expansion to meet the user’s service request. The process and algorithm of hierarchical clustering based semantic Web service discovery is discussed. Validation is carried out on the dataset.
Resumo:
Research in construction innovation highlights construction industry as having many barriers and resistance to innovations and suggests that it needs champions. A hierarchical structural model is presented, to assess the impact of the role of the project manager (PM) on the levels of innovation and project performance. The model adopts the structural equation modelling technique and uses the survey data collected from PMs and project team members working for general contractors in Singapore. The model fits well to the observed data, accounting for 24%, 37% and 49% of the variance in championing behaviour, the level of innovation and project performance, respectively. The results of this study show the importance of the championing role of PMs in construction innovation. However, in order to increase their effectiveness, such a role should be complemented by their competency and professionalism, tactical use of influence tactics, and decision authority. Moreover, senior management should provide adequate resources and a sustained support to innovation and create a conducive environment or organizational culture that nurtures and facilitates the PM’s role in the construction project as a champion of innovation.
Resumo:
The mechanisms of force generation and transference via microfilament networks are crucial to the understandings of mechanobiology of cellular processes in living cells. However, there exists an enormous challenge for all-atom physics simulation of real size microfilament networks due to scale limitation of molecular simulation techniques. Following biophysical investigations of constitutive relations between adjacent globular actin monomers on filamentous actin, a hierarchical multiscale model was developed to investigate the biomechanical properties of microfilament networks. This model was validated by previous experimental studies of axial tension and transverse vibration of single F-actin. The biomechanics of microfilament networks can be investigated at the scale of real eukaryotic cell size (10 μm). This multiscale approach provides a powerful modeling tool which can contribute to the understandings of actin-related cellular processes in living cells.
Resumo:
Genetic correlation (rg) analysis determines how much of the correlation between two measures is due to common genetic influences. In an analysis of 4 Tesla diffusion tensor images (DTI) from 531 healthy young adult twins and their siblings, we generalized the concept of genetic correlation to determine common genetic influences on white matter integrity, measured by fractional anisotropy (FA), at all points of the brain, yielding an NxN genetic correlation matrix rg(x,y) between FA values at all pairs of voxels in the brain. With hierarchical clustering, we identified brain regions with relatively homogeneous genetic determinants, to boost the power to identify causal single nucleotide polymorphisms (SNP). We applied genome-wide association (GWA) to assess associations between 529,497 SNPs and FA in clusters defined by hubs of the clustered genetic correlation matrix. We identified a network of genes, with a scale-free topology, that influences white matter integrity over multiple brain regions.
Resumo:
Background Different from other indicators of cardiac function, such as ejection fraction and transmitral early diastolic velocity, myocardial strain is promising to capture subtle alterations that result from early diseases of the myocardium. In order to extract the left ventricle (LV) myocardial strain and strain rate from cardiac cine-MRI, a modified hierarchical transformation model was proposed. Methods A hierarchical transformation model including the global and local LV deformations was employed to analyze the strain and strain rate of the left ventricle by cine-MRI image registration. The endocardial and epicardial contour information was introduced to enhance the registration accuracy by combining the original hierarchical algorithm with an Iterative Closest Points using Invariant Features algorithm. The hierarchical model was validated by a normal volunteer first and then applied to two clinical cases (i.e., the normal volunteer and a diabetic patient) to evaluate their respective function. Results Based on the two clinical cases, by comparing the displacement fields of two selected landmarks in the normal volunteer, the proposed method showed a better performance than the original or unmodified model. Meanwhile, the comparison of the radial strain between the volunteer and patient demonstrated their apparent functional difference. Conclusions The present method could be used to estimate the LV myocardial strain and strain rate during a cardiac cycle and thus to quantify the analysis of the LV motion function.
Resumo:
This paper discusses a method for scaling SVM with Gaussian kernel function to handle large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel induced feature space. These are then used for selective sampling of the training data for SVM to impart scalability to the training process. Empirical studies made on real world data sets show that the proposed strategy performs well on large data sets.
Resumo:
This paper presents hierarchical clustering algorithms for land cover mapping problem using multi-spectral satellite images. In unsupervised techniques, the automatic generation of number of clusters and its centers for a huge database is not exploited to their full potential. Hence, a hierarchical clustering algorithm that uses splitting and merging techniques is proposed. Initially, the splitting method is used to search for the best possible number of clusters and its centers using Mean Shift Clustering (MSC), Niche Particle Swarm Optimization (NPSO) and Glowworm Swarm Optimization (GSO). Using these clusters and its centers, the merging method is used to group the data points based on a parametric method (k-means algorithm). A performance comparison of the proposed hierarchical clustering algorithms (MSC, NPSO and GSO) is presented using two typical multi-spectral satellite images - Landsat 7 thematic mapper and QuickBird. From the results obtained, we conclude that the proposed GSO based hierarchical clustering algorithm is more accurate and robust.
Resumo:
This paper presents an improved hierarchical clustering algorithm for land cover mapping problem using quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with pseudo/quasi-random distribution is used for splitting the data into number of cluster centers by satisfying Bayesian Information Criteria (BIC). Themain objective is to search and locate the best possible number of cluster and its centers. NPSO which highly depends on the initial distribution of particles in search space is not been exploited to its full potential. In this study, we have compared more uniformly distributed quasi-random with pseudo-random distribution with NPSO for splitting data set. Here to generate quasi-random distribution, Faure method has been used. Performance of previously proposed methods namely K-means, Mean Shift Clustering (MSC) and NPSO with pseudo-random is compared with the proposed approach - NPSO with quasi distribution(Faure). These algorithms are used on synthetic data set and multi-spectral satellite image (Landsat 7 thematic mapper). From the result obtained we conclude that use of quasi-random sequence with NPSO for hierarchical clustering algorithm results in a more accurate data classification.
Resumo:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which is available for download from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts which can be used to reproduce the analyses carried out in this paper. These are available from the following URL. https://sites.google.com/site/randomisedbhc/.