869 results for height partition clustering
Abstract:
In this paper, moving flock patterns are mined from spatio-temporal datasets by incorporating a clustering algorithm. A flock is defined as the set of data that move together for a certain continuous amount of time. Discovering moving flock patterns with clustering algorithms is a promising way to find frequent patterns of movement in large trajectory datasets. In this approach, the clustering algorithm used is SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW). The advantage of SPARROW is that it can effectively discover clusters of widely varying sizes and shapes in large databases. Variations of the proposed method are discussed, and the experimental results show that the problems of scalability and duplicate pattern formation are addressed. The method also reduces the number of patterns produced.
Abstract:
A spectral angle based feature extraction method, Spectral Clustering Independent Component Analysis (SC-ICA), is proposed in this work to improve the brain tissue classification from Magnetic Resonance Images (MRI). SC-ICA provides equal priority to global and local features; thereby it tries to resolve the inefficiency of conventional approaches in abnormal tissue extraction. First, input multispectral MRI is divided into different clusters by a spectral distance based clustering. Then, Independent Component Analysis (ICA) is applied on the clustered data, in conjunction with Support Vector Machines (SVM) for brain tissue analysis. Normal and abnormal datasets, consisting of real and synthetic T1-weighted, T2-weighted and proton density/fluid-attenuated inversion recovery images, were used to evaluate the performance of the new method. Comparative analysis with ICA based SVM and other conventional classifiers established the stability and efficiency of SC-ICA based classification, especially in reproduction of small abnormalities. Clinical abnormal case analysis demonstrated it through the highest Tanimoto Index/accuracy values, 0.75/98.8%, observed against ICA based SVM results, 0.17/96.1%, for reproduced lesions. Experimental results recommend the proposed method as a promising approach in clinical and pathological studies of brain diseases
Abstract:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data. The term data mining refers to the process that performs exploratory analysis on the data and builds a model from it. To infer patterns from data, data mining involves different approaches such as association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform format. The research study explores various techniques for converting mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied to them. The results of clustering mixed category data after conversion to a numeric data type are demonstrated using a crime data set. The thesis also proposes an extension to the well-known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a breast cancer data set. Another issue with the clustering process is the visualization of its output. Geometric techniques such as scatter plots or projection plots are available, but none of them displays a result that projects the whole database; they only show attribute-pairwise analyses.
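The conversion of mixed attributes to a uniform numeric format can be illustrated with a minimal sketch. It assumes one-hot encoding for categorical values and min-max scaling for numerical ones; the abstract does not say which conversions the thesis evaluates, and the `encode_mixed` helper and the toy records are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[float, int, str]


def encode_mixed(records: List[Dict[str, Value]]) -> List[List[float]]:
    """Convert records with numerical and categorical fields to numeric vectors.

    Assumption: categorical fields are one-hot encoded and numerical fields
    are min-max scaled to [0, 1]. This is one possible conversion, not
    necessarily one of those studied in the thesis.
    """
    if not records:
        return []
    fields = sorted(records[0])
    categorical = [f for f in fields if isinstance(records[0][f], str)]
    numerical = [f for f in fields if f not in categorical]

    # Collect the category levels and numeric ranges seen in the data.
    levels = {f: sorted({r[f] for r in records}) for f in categorical}
    ranges = {f: (min(r[f] for r in records), max(r[f] for r in records))
              for f in numerical}

    vectors = []
    for r in records:
        vec: List[float] = []
        for f in numerical:
            lo, hi = ranges[f]
            # A constant column carries no information; map it to 0.
            vec.append(0.0 if hi == lo else (r[f] - lo) / (hi - lo))
        for f in categorical:
            vec.extend(1.0 if r[f] == level else 0.0 for level in levels[f])
        vectors.append(vec)
    return vectors


# Hypothetical toy records, loosely in the spirit of a crime data set.
records = [
    {"age": 20, "offence": "theft"},
    {"age": 40, "offence": "fraud"},
    {"age": 30, "offence": "theft"},
]
encoded = encode_mixed(records)
```

After this step every record is a vector of floats, so a distance-based clustering algorithm can be applied directly. Whether one-hot encoding distorts distances for attributes with many levels is a design question the sketch does not settle.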
Abstract:
Many recent Web 2.0 resource sharing applications can be subsumed under the "folksonomy" moniker. Regardless of the type of resource shared, all of these share a common structure describing the assignment of tags to resources by users. In this report, we generalize the notions of clustering and characteristic path length which play a major role in the current research on networks, where they are used to describe the small-world effects on many observable network datasets. To that end, we show that the notion of clustering has two facets which are not equivalent in the generalized setting. The new measures are evaluated on two large-scale folksonomy datasets from resource sharing systems on the web.
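For orientation, the two classical measures that the report generalizes can be sketched for an ordinary undirected graph. This is only the standard graph version, not the folksonomy generalization proposed in the report; the function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def local_clustering(graph: Graph, node: str) -> float:
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(neighbours), 2) if b in graph[a])
    return links / (k * (k - 1) / 2)


def characteristic_path_length(graph: Graph) -> float:
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            current = queue.popleft()
            for nxt in graph[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
mean_clustering = sum(local_clustering(graph, n) for n in graph) / len(graph)
path_length = characteristic_path_length(graph)
```

A small-world network combines a high mean clustering coefficient with a short characteristic path length, relative to a random graph of the same size and density.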
Abstract:
Recently, research projects such as PADLR and SWAP have developed tools like Edutella or Bibster, which are targeted at establishing peer-to-peer knowledge management (P2PKM) systems. In such a system, it is necessary to obtain brief semantic descriptions of peers, so that routing algorithms or matchmaking processes can make decisions about which communities peers should belong to, or to which peers a given query should be forwarded. This paper proposes the use of graph clustering techniques on knowledge bases for that purpose. Using this clustering, we can show that our strategy requires up to 58% fewer queries than the baselines to yield full recall in a bibliographic P2PKM scenario.
Abstract:
Cubicles should provide good resting comfort as well as clean udders. Dairy cows in cubicle houses often face a restrictive environment with regard to resting behaviour, whereas cleanliness may still be impaired. This study aimed to determine reliable behavioural measures regarding resting comfort applicable in on-farm welfare assessments. Furthermore, relationships between cubicle design, cow sizes, management factors and udder cleanliness (namely teats and teat tips) were investigated. Altogether 15 resting measures were examined in terms of feasibility, inter-observer reliability (IOR) and consistency of results per farm over time. They were recorded during three farm visits on farms in Germany and Austria with cubicle, deep litter and tie stall systems. Seven measures occurred too infrequently to allow reliable recording within a limited observation time. IOR was generally acceptable to excellent except for 'collisions during lying down', which only showed good IOR after improvement of the definition. Only three measures were acceptably repeatable over time: 'duration of lying down', 'percentage of collisions during lying down' and 'percentage of cows lying partly or completely outside lying area'. These measures were evaluated as suitable animal-based welfare measures regarding resting behaviour in the framework of an on-farm welfare assessment protocol. The second part of the thesis comprises a cross-sectional study on resting comfort and cow cleanliness including 23 Holstein Friesian dairy herds with very low within-farm variation in cubicle measures. Height at withers, shoulder width and diagonal body length were measured in 79-100 % of the cows (herd size 30 to 115 cows). Based on the 25 % largest animals, compliance with recommendations for cubicle measures was calculated. Cleanliness of different body parts, the udder, teats and teat tips was assessed for each cow in the herd prior to morning milking.
No significant correlation was found between udder soiling and teat or teat tip soiling on herd level. The final model of a stepwise regression regarding the percentage of dirty teats per farm explained 58.5 % of the variance and contained four factors. Teat dipping after milking (which might be associated with an overall clean and accurate management style), deep bedded cubicles, increasing cubicle maintenance times and decreasing compliance concerning total cubicle length predicted lower teat soiling. The final model concerning teat tip soiling explained 46.0 % of the variance and contained three factors. Increasing litter height in the rear part of the cubicle and increased alley soiling, which is difficult to explain, predicted less soiled teat tips, whereas increasing compliance concerning resting length was associated with higher percentages of dirty teat tips. The dependent variable 'duration of lying down' was again analysed using stepwise regression. The final model explained 54.8 % of the total variance. Lying down duration was significantly shorter in deep bedded cubicles. Further explanatory though not significant factors in the model were neck-rail height, deep bedding or comfort mattresses versus concrete floor or rubber mats, and clearance height of side partitions. In an attempt to create a more comprehensive lying down measure, another analysis was carried out with the percentage of 'impaired lying down' (i.e. events exceeding 6.3 seconds, with collisions or being interrupted) as the dependent variable. The explanatory value of this final model was 41.3 %. An increase in partition length, an increase in compliance concerning cubicle width and the presence of straw within bedding predicted a lower proportion of impaired lying down. The effect of partition length is difficult to interpret, but partition length and height were positively correlated on the study farms, possibly leading to a bigger zone of clear space for pelvis freedom.
No associations could be found between impaired lying down and teat or teat tip soiling. Altogether, in agreement with earlier studies it was found that cubicle dimensions in practice are often inadequate with regard to the body dimensions of the cows, leading to high proportions of impaired lying down behaviour, whereas teat cleanliness is still unsatisfactory. Connections between cleanliness and cow comfort are far from simplistic. Especially the relationship between cubicle characteristics and lying down behaviour apparently is very complex, so that it is difficult to identify single influential factors that are valid for all farm situations. However, based on the results of the present study the use of deep bedded cubicles can be recommended as well as improved management with special regard to cubicle and litter maintenance in order to achieve both better resting comfort and teat cleanliness.
Abstract:
Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as "functional data" and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all, we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory; clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
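To make the idea of a dissimilarity between compositional trajectories concrete, here is a minimal sketch. It assumes the Aitchison geometry (via the centred log-ratio transform) as the compositional algebra and a plain pointwise root-mean-square distance between trajectories; the abstract names neither, the spline smoothing step is omitted, and all function names are hypothetical. It does not reproduce the shape-and-level or shape-only metrics proposed in the essay.

```python
import math
from typing import List, Sequence


def clr(composition: Sequence[float]) -> List[float]:
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def trajectory_distance(t1: Sequence[Sequence[float]],
                        t2: Sequence[Sequence[float]]) -> float:
    """Root-mean-square Aitchison distance between two trajectories.

    Assumption: both trajectories are sampled at the same domain points.
    """
    squared = [aitchison_distance(x, y) ** 2 for x, y in zip(t1, t2)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical three-part compositions observed at three domain points.
traj_a = [[0.2, 0.3, 0.5], [0.25, 0.3, 0.45], [0.3, 0.3, 0.4]]
traj_b = [[0.5, 0.3, 0.2], [0.45, 0.3, 0.25], [0.4, 0.3, 0.3]]
d_ab = trajectory_distance(traj_a, traj_b)
```

Because the clr transform removes the overall scale, the distance is unchanged when a composition is multiplied by a constant, which is the property that makes it suitable for data carrying only relative information. A matrix of such pairwise distances could then be passed to a hierarchical or partitional clustering routine.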
Abstract:
Study, design and implementation of several fibre clustering techniques, with the aim of integrating into the DTIWeb platform a range of clustering algorithms and fibre-cluster visualisation techniques that make DTI data easier for specialists to interpret.
A new approach to segmentation based on fusing circumscribed contours, region growing and clustering
Abstract:
One of the major problems in machine vision is the segmentation of images of natural scenes. This paper presents a new proposal for the image segmentation problem, based on the integration of edge and region information. The main contours of the scene are detected and used to guide the subsequent region growing process. The algorithm places a number of seeds on both sides of a contour, so that a set of concurrent growing processes can be started. A prior analysis of the seeds makes it possible to adjust the homogeneity criterion to the region's characteristics. A new homogeneity criterion based on clustering analysis and convex hull construction is proposed.
Abstract:
In image segmentation, clustering algorithms are very popular because they are intuitive and, in some cases, easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors successfully compare their new proposal with the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions in the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms can be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach.
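The baseline idea mentioned in the abstract, adding pixel coordinates to the feature vector before clustering, can be sketched as follows. This is a plain k-means on (intensity, row, column) features and not the improved method the paper proposes, which the abstract does not describe; `pixel_features`, `spatial_weight` and the toy image are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def pixel_features(image: Sequence[Sequence[float]],
                   spatial_weight: float = 1.0) -> List[Vector]:
    """Build one (intensity, weighted row, weighted column) vector per pixel."""
    return [
        (float(value), spatial_weight * r, spatial_weight * c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
    ]


def kmeans(points: List[Vector], centres: List[Vector],
           iterations: int = 20) -> List[int]:
    """Plain Lloyd iterations from the given initial centres; returns a label per point."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(len(centres)),
                key=lambda k: sum((p - q) ** 2 for p, q in zip(point, centres[k])))
            for point in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for k, centre in enumerate(centres):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centres.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centres.append(centre)  # keep an empty cluster's centre unchanged
        if new_centres == centres:
            break
        centres = new_centres
    return labels


# Hypothetical 2x4 grey-level image: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
features = pixel_features(image, spatial_weight=0.5)
labels = kmeans(features, centres=[features[0], features[3]])
```

The sketch also exposes the two weaknesses the abstract names: the number of clusters is fixed by the length of `centres`, and a different choice of initial centres can lead to a different labelling.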
Abstract:
At the end of 2009, a new market segmentation model based on conglomerates, or clusters, was launched. It seeks to meet customers' needs by taking into account the life-cycle stage they are in and by carrying out strategies that improve business profitability, tracked through key performance indicators (KPIs). The segmentation intelligence process was developed through technological analysis and yielded clusters whose members had similar characteristics to one another but differed from the other clusters in behavioural variables. This is reflected in the development of targeted strategic campaigns that build a close relationship of loyalty with the customer, initially to increase profitability and then to strengthen the long-term relationship, in keeping with the business's reason for being.
Abstract:
Objective: To determine the percentile distribution of waist circumference in a school population in Bogotá, Colombia, participating in the FUPRECOL study. Methods: Cross-sectional study of 3,005 children and 2,916 adolescents aged 9 to 17.9 years in Bogotá, Colombia. Weight, height, waist circumference, hip circumference and self-reported sexual maturation status were measured. Percentiles (P3, P10, P25, P50, P75, P90 and P97) and centile curves were calculated by sex and age. Observed waist circumference values were compared with international standards. Results: Of the overall population (n=5,921), 57.0% were girls (mean age 12.7±2.3 years). In most age groups, waist circumference was lower in girls than in boys. The increase in waist circumference between P50 and P97, by age, was at least 15.7 cm in boys aged 9-9.9 years and 16.0 cm in girls aged 11-11.9 years. When the results of this study were compared, by age group and sex, with international studies of children and adolescents, the P50 was lower than that reported in Peru and England, with the exception of the studies from India, Venezuela (Mérida), the United States and Spain. Conclusions: Waist circumference percentiles by age and sex are presented that can be used as a reference in assessing nutritional status and predicting cardiovascular risk from an early age.
Abstract:
Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data, based essentially on the cardinality of the fuzzy subsets that represent objects and of their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes, which includes a set of fuzzy indexes introduced by Tolias et al., and we analyze the conditions under which it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to illustrate the whole process.
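A cardinality-based fuzzy similarity index can be illustrated with a minimal sketch. It assumes the sigma-count cardinality (sum of membership degrees) and a fuzzy Jaccard index with min as intersection and max as union; the abstract does not give the formulas of its family of indexes, so this is a well-known stand-in rather than the paper's definition, and the object names are hypothetical.

```python
from typing import Dict

FuzzySet = Dict[str, float]  # element -> membership degree in [0, 1]


def cardinality(fs: FuzzySet) -> float:
    """Sigma-count cardinality: the sum of membership degrees."""
    return sum(fs.values())


def fuzzy_jaccard(a: FuzzySet, b: FuzzySet) -> float:
    """Cardinality of the intersection divided by cardinality of the union.

    Assumption: intersection uses min and union uses max of the memberships.
    """
    universe = set(a) | set(b)
    intersection = {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in universe}
    union = {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in universe}
    denominator = cardinality(union)
    # Two empty fuzzy sets are treated as identical.
    return cardinality(intersection) / denominator if denominator else 1.0


# Hypothetical objects described by their membership in three features.
obj1 = {"f1": 1.0, "f2": 0.5, "f3": 0.0}
obj2 = {"f1": 0.5, "f2": 0.5, "f3": 0.5}
similarity = fuzzy_jaccard(obj1, obj2)
```

The index is reflexive (an object with at least one non-zero membership has similarity 1 with itself) and symmetric, which are the two properties required of a fuzzy proximity relation.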
Abstract:
The article examines the structure of the collaboration networks of research groups where Slovenian and Spanish PhD students are pursuing their doctorate. The units of analysis are student-supervisor dyads. We use duocentred networks, a novel network structure appropriate for networks which are centred around a dyad. A cluster analysis reveals three typical clusters of research groups. Those which are large and belong to several institutions are labelled as bridging social capital. Those which are small and centred in a single institution but have high cohesion are labelled as bonding social capital. Those which are small and have low cohesion are called weak social capital groups. The academic performance of both PhD students and supervisors is highest in bridging groups and lowest in weak groups. Other variables are also found to differ according to the type of research group. Finally, some recommendations regarding academic and research policy are drawn.
Abstract:
This thesis studies the problem of motion segmentation. It presents a review of the main motion segmentation algorithms, analyses their principal characteristics and proposes a classification of the most recent and important techniques. Segmentation can be understood as a manifold clustering problem. This study addresses some of the hardest challenges of motion segmentation through manifold clustering. New algorithms have been proposed for estimating the rank of the trajectory matrix, a similarity measure between subspaces has been presented, problems related to the behaviour of canonical angles have been addressed, and a generic tool has been developed to estimate how many motions appear in a sequence. The last part of the study is devoted to correcting the initial estimate of a segmentation. This correction is carried out by combining the problems of motion segmentation and structure from motion.