11 results for document clustering
at Universitat de Girona, Spain
Abstract:
Study, design and implementation of different fibre clustering techniques, with the aim of integrating into the DTIWeb platform several clustering algorithms and fibre-cluster visualization techniques that make it easier for specialists to interpret DTI data
A new approach to segmentation based on fusing circumscribed contours, region growing and clustering
Abstract:
One of the major problems in machine vision is the segmentation of images of natural scenes. This paper presents a new proposal for the image segmentation problem based on the integration of edge and region information. The main contours of the scene are detected and used to guide the subsequent region growing process. The algorithm places a number of seeds on both sides of a contour, which allows a set of concurrent growing processes to start. A prior analysis of the seeds makes it possible to adjust the homogeneity criterion to the region's characteristics. A new homogeneity criterion based on clustering analysis and convex hull construction is proposed
Abstract:
In image segmentation, clustering algorithms are very popular because they are intuitive and, in some cases, easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors compare their new proposals favourably against the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions in the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms can be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach
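The baseline idea mentioned in the abstract above, adding pixel coordinates as extra clustering features, can be sketched as follows. This is only a minimal illustration of that baseline, not the improved method the paper proposes. The function names, the `spatial_weight` parameter and the farthest-first initialization are assumptions chosen to keep the example deterministic.

```python
import math

def pixel_features(image, spatial_weight=0.0):
    """Turn a 2-D grid of intensities into (intensity, w*row, w*col) features.

    spatial_weight is a hypothetical knob: 0 ignores position entirely,
    larger values make nearby pixels more likely to share a cluster.
    """
    feats = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            feats.append((float(value), spatial_weight * r, spatial_weight * c))
    return feats

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid, then update."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Toy 4x4 image: dark left half, bright right half.
image = [[0, 0, 10, 10] for _ in range(4)]
labels = kmeans(pixel_features(image, spatial_weight=0.1), k=2)
```

Note that `k` still has to be supplied in advance, which is exactly the limitation the abstract points out.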
Abstract:
Our purpose is to provide a set-theoretical framework for clustering fuzzy relational data, based essentially on the cardinality of the fuzzy subsets that represent objects and of their complements, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes, which includes a set of fuzzy indexes introduced by Tolias et al., and we analyze the conditions under which it defines a fuzzy proximity relation. Following an original idea due to S. Miyamoto, we evaluate the similarity between objects and features by means of the same mathematical procedure. Combining these concepts and methods, we establish an algorithm for clustering fuzzy relational data. Finally, we present an example to illustrate the whole process
Abstract:
This essay studies suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First, we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices
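The two kinds of dissimilarity described above can be illustrated with a rough sketch. It assumes the centred log-ratio (clr) transform as the compositional algebra and assumes that "shape only" means removing each trajectory's mean level in clr space. Neither choice is confirmed by the abstract, the spline smoothing step is omitted, and all names are hypothetical.

```python
import math

def clr(composition):
    """Centred log-ratio transform of one composition (all parts > 0)."""
    logs = [math.log(x) for x in composition]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def center(clr_trajectory):
    """Subtract the trajectory's mean clr vector (assumed level removal)."""
    n = len(clr_trajectory)
    means = [sum(col) / n for col in zip(*clr_trajectory)]
    return [[v - m for v, m in zip(row, means)] for row in clr_trajectory]

def trajectory_distance(t1, t2, shape_only=False):
    """Root-mean-square Euclidean distance between clr-transformed
    trajectories sampled at the same domain points."""
    a = [clr(c) for c in t1]
    b = [clr(c) for c in t2]
    if shape_only:
        a, b = center(a), center(b)
    squared = sum(math.dist(x, y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared / len(a))

def perturb(composition, p):
    """Compositional perturbation: multiply part-wise, then re-close to 1."""
    raw = [x * w for x, w in zip(composition, p)]
    total = sum(raw)
    return [x / total for x in raw]

# t2 is t1 shifted by a constant perturbation: same shape, different level.
t1 = [[0.2, 0.3, 0.5], [0.3, 0.3, 0.4], [0.4, 0.3, 0.3]]
t2 = [perturb(c, [1.0, 2.0, 4.0]) for c in t1]
d_level_and_shape = trajectory_distance(t1, t2)
d_shape_only = trajectory_distance(t1, t2, shape_only=True)
```

In this toy case the first distance is clearly positive while the shape-only distance is numerically zero, because a constant perturbation becomes a constant shift in clr space. Either distance could then be used to build the dissimilarity matrix for a hierarchical or partitional clustering algorithm.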
Abstract:
Image segmentation of natural scenes constitutes a major problem in machine vision. This paper presents a new proposal for the image segmentation problem which has been based on the integration of edge and region information. This approach begins by detecting the main contours of the scene which are later used to guide a concurrent set of growing processes. A previous analysis of the seed pixels permits adjustment of the homogeneity criterion to the region's characteristics during the growing process. Since the high variability of regions representing outdoor scenes makes the classical homogeneity criteria useless, a new homogeneity criterion based on clustering analysis and convex hull construction is proposed. Experimental results have proven the reliability of the proposed approach
Abstract:
In image processing, segmentation algorithms constitute one of the main focuses of research. In this paper, new image segmentation algorithms based on a hard version of the information bottleneck method are presented. The objective of this method is to extract a compact representation of a variable, considered the input, with minimal loss of mutual information with respect to another variable, considered the output. First, we introduce a split-and-merge algorithm based on the definition of an information channel between a set of regions (input) of the image and the intensity histogram bins (output). From this channel, the maximization of the mutual information gain is used to optimize the image partitioning. Then, the merging process of the regions obtained in the previous phase is carried out by minimizing the loss of mutual information. From the inversion of the above channel, we also present a new histogram clustering algorithm based on the minimization of the mutual information loss, where now the input variable represents the histogram bins and the output is given by the set of regions obtained from the above split-and-merge algorithm. Finally, we introduce two new clustering algorithms which show how the information bottleneck method can be applied to the registration channel obtained when two multimodal images are correctly aligned. Different experiments on 2-D and 3-D images show the behavior of the proposed algorithms
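The merging criterion described above, minimizing the loss of mutual information, can be sketched for a small discrete channel. This is a generic illustration of a greedy merge step under the hard information bottleneck idea, not the authors' algorithm. The joint-probability table, the function names and the choice to consider every pair of rows (rather than only adjacent regions or bins) are assumptions.

```python
import math
from itertools import combinations

def mutual_information(joint):
    """I(X;Y) in bits for a joint table joint[x][y] whose entries sum to 1."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

def merge_rows(joint, a, b):
    """Merge input symbols a and b into one cluster by adding their rows."""
    merged = [x + y for x, y in zip(joint[a], joint[b])]
    rest = [row for k, row in enumerate(joint) if k not in (a, b)]
    return rest + [merged]

def best_merge(joint):
    """Return the pair of rows whose merge loses the least mutual information."""
    base = mutual_information(joint)
    losses = {
        (a, b): base - mutual_information(merge_rows(joint, a, b))
        for a, b in combinations(range(len(joint)), 2)
    }
    pair = min(losses, key=losses.get)
    return pair, losses[pair]

# Rows are inputs (for example regions), columns are outputs (for example bins).
# Rows 0 and 1 have identical conditional distributions over the outputs.
joint = [
    [0.2, 0.0],
    [0.2, 0.0],
    [0.0, 0.6],
]
pair, loss = best_merge(joint)
```

Here rows 0 and 1 are selected and the loss is numerically zero, since merging inputs that predict the output identically discards no information. Repeating the step until a stopping criterion is met gives an agglomerative clustering.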
Abstract:
Pantoea agglomerans strains are among the most promising biocontrol agents for a variety of bacterial and fungal plant diseases, particularly fire blight of apple and pear. However, commercial registration of P. agglomerans biocontrol products is hampered because this species is currently listed as a biosafety level 2 (BL2) organism due to clinical reports as an opportunistic human pathogen. This study compares plant-origin and clinical strains in a search for discriminating genotypic/phenotypic markers using multi-locus phylogenetic analysis and fluorescent amplified fragment length polymorphisms (fAFLP) fingerprinting. Results: The majority of the clinical isolates from culture collections were found to be improperly designated as P. agglomerans after sequence analysis. The frequent taxonomic rearrangements undergone by the Enterobacter agglomerans/Erwinia herbicola complex may be a major problem in assessing clinical associations within P. agglomerans. In the P. agglomerans sensu stricto (in the stricter sense) group, there was no discrete clustering of clinical/biocontrol strains and no marker was identified that was uniquely associated with clinical strains. A putative biocontrol-specific fAFLP marker was identified only in biocontrol strains. The partial ORF located in this band corresponded to an ABC transporter that was found in all P. agglomerans strains. Conclusion: Taxonomic mischaracterization was identified as a major problem with P. agglomerans, and current techniques removed a majority of clinical strains from this species. Although clear discrimination between P. agglomerans plant and clinical strains was not obtained with phylogenetic analysis, a single marker characteristic of biocontrol strains was identified which may be of use in strain biosafety determinations. In addition, the lack of Koch's postulate fulfilment, rare retention of clinical strains for subsequent confirmation, and the polymicrobial nature of P. agglomerans clinical reports should be considered in biosafety assessment of beneficial strains in this species
Abstract:
The article examines the structure of the collaboration networks of research groups where Slovenian and Spanish PhD students are pursuing their doctorate. The units of analysis are student-supervisor dyads. We use duocentred networks, a novel network structure appropriate for networks which are centred around a dyad. A cluster analysis reveals three typical clusters of research groups. Those which are large and belong to several institutions are labelled as bridging social capital. Those which are small, centred in a single institution but have high cohesion are labelled as bonding social capital. Those which are small and have low cohesion are called weak social capital groups. Academic performance of both PhD students and supervisors is highest in bridging groups and lowest in weak groups. Other variables are also found to differ according to the type of research group. Finally, some recommendations regarding academic and research policy are drawn
Abstract:
The Agència Valenciana de Turisme has undertaken a project to update the website that promotes the Comunidad Valenciana, the portal http://www.comunitatvalenciana.com. This portal is an international reference for all visitors to the Comunidad Valenciana and is therefore a key project in its tourism promotion. This talk presents the development of an attractive tourism geoportal designed to withstand a high number of visits. Within the project, problems such as displaying large numbers of vector features (points of interest) have been addressed by means of aggregation, or clustering. In addition, this vector information is processed so that visitors to the website get fast response times, thanks to the use of multiresolution techniques in the web viewer. The source information is migrated to a free spatial database and processed to generate files in JSON format. The geoportal also offers a flexible search engine, likewise prepared to withstand a high load of requests through the use of indexing with support for spatial queries. This search engine has also been prepared to act as a server that offers all the portal's information through Layar, an augmented reality service for mobile devices. This service is based entirely on free components such as the Spring framework and the Lucene search support. The talk will therefore present how a complete solution for presenting the information of a tourism portal with high performance requirements was built, focusing on the server components, all of which are based on free software
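The aggregation of many points of interest into clusters, with a resolution that depends on the zoom level, could look roughly like the grid-based sketch below. The abstract does not say which clustering technique the geoportal uses, so the grid approach, the function names, the `base_size` parameter and the JSON layout are all assumptions.

```python
import json
import math
from collections import defaultdict

def cluster_points(points, cell_size):
    """Group (x, y) points by square grid cell and summarize each cell
    as a centroid plus a count."""
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    clusters = []
    for key in sorted(cells):  # sorted for a reproducible output order
        members = cells[key]
        clusters.append({
            "x": sum(p[0] for p in members) / len(members),
            "y": sum(p[1] for p in members) / len(members),
            "count": len(members),
        })
    return clusters

def cell_size_for_zoom(zoom, base_size=16.0):
    """Hypothetical multiresolution rule: halve the cell size per zoom level."""
    return base_size / (2 ** zoom)

# Three points of interest in arbitrary map units.
pois = [(0.1, 0.1), (0.2, 0.2), (5.1, 5.1)]
clusters = cluster_points(pois, cell_size_for_zoom(4))  # cell size 1.0
overview = cluster_points(pois, cell_size_for_zoom(0))  # cell size 16.0
payload = json.dumps(clusters)  # one precomputed JSON document per zoom level
```

Precomputing one such JSON file per zoom level lets the viewer load a few aggregated markers when zoomed out and the individual points only when zoomed in.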
Abstract:
This thesis studies the problem of motion segmentation. It presents a review of the main motion segmentation algorithms, analyses their main characteristics and proposes a classification of the most recent and important techniques. Segmentation can be understood as a manifold clustering problem. This study addresses some of the most difficult challenges of motion segmentation through manifold clustering. New algorithms have been proposed for estimating the rank of the trajectory matrix, a similarity measure between subspaces has been presented, problems related to the behaviour of canonical angles have been addressed, and a generic tool has been developed to estimate how many motions appear in a sequence. The last part of the study is devoted to correcting the initial estimate of a segmentation. This correction is carried out by combining the problems of motion segmentation and structure from motion.
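As a rough illustration of a similarity between subspaces based on canonical (principal) angles, the sketch below computes a commonly used affinity: the root of the mean squared cosine of the principal angles. This is not the measure proposed in the thesis, which the abstract does not specify. Function names are hypothetical. The identity used is that the sum of squared cosines equals the squared Frobenius norm of the product of the two orthonormal bases, so no SVD is needed.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; linearly dependent vectors are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = dot(w, q)
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def subspace_affinity(span_a, span_b):
    """sqrt(sum_i cos^2(theta_i) / min(dim A, dim B)), a value in [0, 1].

    theta_i are the principal angles between the two subspaces; the sum of
    their squared cosines is the sum of squared dot products between the
    two orthonormal bases.
    """
    qa, qb = orthonormal_basis(span_a), orthonormal_basis(span_b)
    if not qa or not qb:
        raise ValueError("both subspaces need at least one nonzero vector")
    total = sum(dot(a, b) ** 2 for a in qa for b in qb)
    return math.sqrt(total / min(len(qa), len(qb)))

xy_plane = [(1, 0, 0), (0, 1, 0)]
xy_plane_other_basis = [(1, 1, 0), (1, -1, 0)]
xz_plane = [(1, 0, 0), (0, 0, 1)]
z_axis = [(0, 0, 1)]

same = subspace_affinity(xy_plane, xy_plane_other_basis)
partial = subspace_affinity(xy_plane, xz_plane)
orthogonal = subspace_affinity(xy_plane, z_axis)
```

The affinity is 1 for identical subspaces regardless of the basis used to describe them, 0 for orthogonal ones, and intermediate when the subspaces share only some directions, as the two planes here do. A matrix of such affinities between locally estimated subspaces is a typical input to a clustering step.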