935 results for clustering


Relevance: 20.00%

Abstract:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. The term data mining refers to the process that performs exploratory analysis on the data and builds a model from it. To infer patterns from data, data mining involves different approaches such as association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform format. The research study explores various techniques for converting mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied to them. The results of clustering mixed category data after conversion to a numeric data type are demonstrated using a crime data set. The thesis also proposes an extension to the well-known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a breast cancer data set. Another issue with the clustering process is the visualization of the output. Different geometric techniques such as scatter plots or projection plots are available, but none of them displays the result for the whole database; they show only attribute-pairwise analyses.
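To make the idea of a "numerical equivalent" concrete, here is a minimal sketch of one common conversion: min-max scaling for numerical attributes and one-hot encoding for categorical ones. The abstract does not say which conversions the thesis uses, so this particular choice, the function name `encode_mixed` and the toy records are all assumptions for illustration.

```python
from math import sqrt

def encode_mixed(records, numeric_keys, categorical_keys):
    """Convert mixed records to numeric vectors (min-max scaling + one-hot)."""
    # Range of each numeric attribute, used to scale values into [0, 1].
    ranges = {}
    for key in numeric_keys:
        values = [r[key] for r in records]
        ranges[key] = (min(values), max(values))
    # Sorted category levels give a stable column order for the one-hot part.
    levels = {key: sorted({r[key] for r in records}) for key in categorical_keys}

    vectors = []
    for r in records:
        row = []
        for key in numeric_keys:
            lo, hi = ranges[key]
            row.append((r[key] - lo) / (hi - lo) if hi > lo else 0.0)
        for key in categorical_keys:
            row.extend(1.0 if r[key] == level else 0.0 for level in levels[key])
        vectors.append(row)
    return vectors

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy records with two numerical and one categorical attribute.
records = [
    {"age": 20, "income": 1000, "area": "urban"},
    {"age": 40, "income": 3000, "area": "rural"},
    {"age": 60, "income": 5000, "area": "urban"},
]
vectors = encode_mixed(records, ["age", "income"], ["area"])
print(vectors)
print(euclidean(vectors[0], vectors[2]))
```

Once every record is a numeric vector of the same length, any distance-based clustering algorithm can be applied to `vectors`. How much weight the one-hot columns should carry relative to the scaled numeric columns is a design choice this sketch leaves open.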

Relevance: 20.00%

Abstract:

Many recent Web 2.0 resource sharing applications can be subsumed under the "folksonomy" moniker. Regardless of the type of resource shared, all of these share a common structure describing the assignment of tags to resources by users. In this report, we generalize the notions of clustering and characteristic path length which play a major role in the current research on networks, where they are used to describe the small-world effects on many observable network datasets. To that end, we show that the notion of clustering has two facets which are not equivalent in the generalized setting. The new measures are evaluated on two large-scale folksonomy datasets from resource sharing systems on the web.
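For reference, the two quantities being generalized can be sketched for an ordinary undirected graph. This shows only the standard graph definitions (average local clustering coefficient and mean shortest-path length), not the folksonomy generalizations of the report, whose definitions the abstract does not give. The toy graph is hypothetical and assumed to be connected.

```python
from collections import deque
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)

def characteristic_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical undirected graph as an adjacency map: a triangle plus a pendant node.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(average_clustering(graph))
print(characteristic_path_length(graph))
```

High clustering combined with a short characteristic path length is the usual signature of a small-world network.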

Relevance: 20.00%

Abstract:

Recently, research projects such as PADLR and SWAP have developed tools like Edutella or Bibster, which are targeted at establishing peer-to-peer knowledge management (P2PKM) systems. In such a system, it is necessary to obtain brief semantic descriptions of peers, so that routing algorithms or matchmaking processes can make decisions about which communities peers should belong to, or to which peers a given query should be forwarded. This paper proposes the use of graph clustering techniques on knowledge bases for that purpose. Using this clustering, we can show that our strategy requires up to 58% fewer queries than the baselines to yield full recall in a bibliographic P2PKM scenario.

Relevance: 20.00%

Abstract:

Our essay studies suitable statistical methods for the clustering of compositional data in situations where observations are trajectories of compositional data, that is, sequences of composition measurements along a domain. Observed trajectories are known as “functional data”, and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. To extend them to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure that gives homogeneous classes? To address this problem we follow the steps described below. First, we adapt the well-known spline smoothing techniques to cope with the smoothing of compositional data trajectories. An observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory; clustering algorithms are then applied to these smooth curves. The second step consists of building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric that accounts for differences in shape only. A simulation study is performed to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
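As a rough illustration of what a "compositional algebra" changes in practice, the sketch below measures dissimilarity between compositions with the centred log-ratio (clr) transform and the Aitchison distance, and averages it along two trajectories sampled at the same points. The abstract does not name the transform or define its shape and level metrics, so the clr choice, the function names and the root-mean-square aggregation are assumptions, and no smoothing step is shown.

```python
from math import log, sqrt

def clr(composition):
    """Centred log-ratio transform; all parts must be strictly positive."""
    logs = [log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def trajectory_distance(traj_a, traj_b):
    """Root-mean-square Aitchison distance over matching sample points."""
    squared = [aitchison_distance(x, y) ** 2 for x, y in zip(traj_a, traj_b)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical three-part trajectories observed at three points of the domain.
traj_a = [[0.2, 0.3, 0.5], [0.25, 0.30, 0.45], [0.3, 0.3, 0.4]]
traj_b = [[0.5, 0.3, 0.2], [0.45, 0.30, 0.25], [0.4, 0.3, 0.3]]
print(trajectory_distance(traj_a, traj_b))
```

The Aitchison distance depends only on the ratios between parts, so rescaling a composition (for example from proportions to percentages) leaves it unchanged. The resulting pairwise distances could then be passed to a hierarchical or partitional clustering routine.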

Relevance: 20.00%

Abstract:

Study, design and implementation of different fibre clustering techniques, with the aim of integrating several clustering algorithms and fibre-cluster visualization techniques into the DTIWeb platform, so as to make DTI data easier for specialists to interpret.

Relevance: 20.00%

Abstract:

One of the major problems in machine vision is the segmentation of images of natural scenes. This paper presents a new proposal for the image segmentation problem, based on the integration of edge and region information. The main contours of the scene are detected and used to guide the subsequent region growing process. The algorithm places a number of seeds on both sides of a contour, which allows a set of concurrent growing processes to be started. A prior analysis of the seeds makes it possible to adjust the homogeneity criterion to the characteristics of each region. A new homogeneity criterion based on clustering analysis and convex hull construction is proposed.

Relevance: 20.00%

Abstract:

In image segmentation, clustering algorithms are very popular because they are intuitive and, some of them, easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors compare their new proposals favourably with the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions in the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms can be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach.
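The baseline idea mentioned above, k-means with pixel coordinates added as features, can be sketched as follows. This is plain Lloyd's k-means on (intensity, x, y) vectors, not the improved method the paper proposes, which the abstract does not describe. The toy image, the `spatial_weight` parameter and the seed choice are assumptions for illustration.

```python
def pixel_features(image, spatial_weight):
    """One feature vector per pixel: intensity plus weighted (x, y) coordinates."""
    return [
        [value, spatial_weight * x, spatial_weight * y]
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    ]

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means; `centroids` holds the initial seeds."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the seed of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 3x4 grey-level image: dark on the left, bright on the right.
image = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.2, 0.8, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
features = pixel_features(image, spatial_weight=0.1)
# The number of clusters and the seeds must be supplied, as the abstract notes.
labels, centroids = kmeans(features, [features[0], features[-1]])
print(labels)
```

Both limitations named in the abstract are visible here: the number of regions is fixed by the number of seeds, and a different seed choice can lead to a different segmentation. A larger `spatial_weight` favours spatially compact regions over purely intensity-based ones.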

Relevance: 20.00%

Abstract:

At the end of 2009, a new market segmentation model based on conglomerates, or clusters, was launched. It seeks to meet customers' needs by taking into account the life-cycle stage they are in, and to carry out strategies that improve business profitability, measured through KPI management indicators. The segmentation intelligence process was developed through technological analysis and yielded clusters whose members had similar characteristics to one another but differed from those of other clusters in behavioural variables. This is reflected in the development of targeted strategic campaigns that make it possible to build a close relationship of loyalty with the customer, in order to increase profitability in the first instance and to strengthen the relationship in the long term, in line with the business's reason for being.

Relevance: 20.00%

Abstract:

Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data, based essentially on the cardinality of the fuzzy subsets that represent objects and of their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes, which includes a set of fuzzy indexes introduced by Tolias et al., and we analyze under which conditions it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to clarify the whole process.
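To show what a cardinality-based fuzzy similarity index can look like, here is a small sketch using the sigma-count (sum of membership degrees) as cardinality, with min for intersection, max for union and 1 - a for the complement. The two indexes below are familiar textbook forms chosen for illustration. The abstract does not define its family of indexes, so these should not be read as the indexes of the paper or of Tolias et al.

```python
def cardinality(fuzzy_set):
    """Sigma-count: the sum of the membership degrees."""
    return sum(fuzzy_set)

def jaccard_index(a, b):
    """|A intersect B| / |A union B| with min as intersection and max as union."""
    union = cardinality([max(x, y) for x, y in zip(a, b)])
    if union == 0:
        return 1.0  # two empty fuzzy sets are treated as identical
    return cardinality([min(x, y) for x, y in zip(a, b)]) / union

def matching_index(a, b):
    """(|A intersect B| + |A^c intersect B^c|) / n, which also uses the complements."""
    both = cardinality([min(x, y) for x, y in zip(a, b)])
    neither = cardinality([min(1 - x, 1 - y) for x, y in zip(a, b)])
    return (both + neither) / len(a)

# Hypothetical objects described by membership degrees over three features.
obj_a = [1.0, 0.5, 0.0]
obj_b = [0.5, 0.5, 0.0]
print(jaccard_index(obj_a, obj_b))
print(matching_index(obj_a, obj_b))
```

Both functions are symmetric and return 1 when an object is compared with itself, which are the reflexivity and symmetry properties required of a fuzzy proximity relation.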

Relevance: 20.00%

Abstract:

The article examines the structure of the collaboration networks of research groups in which Slovenian and Spanish PhD students are pursuing their doctorates. The units of analysis are student-supervisor dyads. We use duocentred networks, a novel network structure appropriate for networks which are centred around a dyad. A cluster analysis reveals three typical clusters of research groups. Those which are large and belong to several institutions are labelled as bridging social capital. Those which are small and centred in a single institution but have high cohesion are labelled as bonding social capital. Those which are small and have low cohesion are called weak social capital groups. The academic performance of both PhD students and supervisors is highest in bridging groups and lowest in weak groups. Other variables are also found to differ according to the type of research group. Finally, some recommendations regarding academic and research policy are drawn.

Relevance: 20.00%

Abstract:

This thesis studies the problem of motion segmentation. It presents a review of the main motion segmentation algorithms, analyses their main characteristics and proposes a classification of the most recent and important techniques. Segmentation can be understood as a manifold clustering problem. This study tackles some of the most difficult challenges of motion segmentation through manifold clustering. New algorithms have been proposed for estimating the rank of the trajectory matrix, a similarity measure between subspaces has been presented, problems related to the behaviour of canonical angles have been addressed, and a generic tool has been developed to estimate how many motions appear in a sequence. The last part of the study is devoted to correcting the initial estimate of a segmentation. This correction is carried out by combining the problems of motion segmentation and structure from motion.

Relevance: 20.00%

Abstract:

The clustering in time (seriality) of extratropical cyclones is responsible for large cumulative insured losses in western Europe, though surprisingly little scientific attention has been given to this important property. This study investigates and quantifies the seriality of extratropical cyclones in the Northern Hemisphere using a point-process approach. A possible mechanism for serial clustering is the time-varying effect of the large-scale flow on individual cyclone tracks. Another mechanism is the generation by one parent cyclone of one or more offspring through secondary cyclogenesis. A long cyclone-track database was constructed for extended October–March winters from 1950 to 2003 using 6-h analyses of 850-mb relative vorticity derived from the NCEP–NCAR reanalysis. A dispersion statistic based on the variance-to-mean ratio of monthly cyclone counts was used as a measure of clustering. It reveals extensive regions of statistically significant clustering in the European exit region of the North Atlantic storm track and over the central North Pacific. Monthly cyclone counts were regressed on time-varying teleconnection indices with a log-linear Poisson model. Five independent teleconnection patterns were found to be significant factors over Europe: the North Atlantic Oscillation (NAO), the east Atlantic pattern, the Scandinavian pattern, the east Atlantic–western Russian pattern, and the polar–Eurasian pattern. The NAO alone is not sufficient for explaining the variability of cyclone counts in the North Atlantic region and western Europe. Rate dependence on time-varying teleconnection indices accounts for the variability in monthly cyclone counts, and a cluster process did not need to be invoked.
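A dispersion statistic of this kind can be sketched in a few lines. The abstract says only that the statistic is based on the variance-to-mean ratio, so the exact form used here (ratio minus one, so that a Poisson process gives a value near zero) and the toy counts are assumptions.

```python
from statistics import mean, variance

def dispersion_statistic(counts):
    """Variance-to-mean ratio of the counts, minus one.

    Values near 0 are consistent with a Poisson process, positive values
    indicate overdispersion (clustering) and negative values indicate
    underdispersion (regularity). The "minus one" offset is an assumption.
    """
    return variance(counts) / mean(counts) - 1.0

# Hypothetical monthly cyclone counts with the same mean but different spread.
regular = [2, 2, 2, 2]
clustered = [0, 0, 8, 0]
print(dispersion_statistic(regular))
print(dispersion_statistic(clustered))
```

For a homogeneous Poisson process the variance of the counts equals their mean, so the ratio is about one. The abstract's final point is that overdispersion can also arise when the Poisson rate itself varies in time with the teleconnection indices, without any explicit cluster process.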

Relevance: 20.00%

Abstract:

A methodology for discovering the mechanisms and dynamics of protein clustering on solid surfaces is presented. In situ atomic force microscopy images are quantitatively compared to Monte Carlo simulations using cluster statistics to differentiate various models. We study lysozyme adsorption on mica as a model system and find that all surface-supported clusters are mobile, not just the monomers, with diffusion constant inversely related to cluster size. The surface monomer diffusion constant is measured to be D1 ∼ 9×10⁻¹⁶ cm² s⁻¹, such a low value being difficult to measure using other techniques.