52 results for label hierarchical clustering

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance:

90.00%

Publisher:

Abstract:

HEMOLIA (a project under the European Community’s 7th Framework Programme) is a new-generation Anti-Money Laundering (AML) intelligent multi-agent alert and investigation system which, in addition to traditional financial data, makes extensive use of modern society’s huge telecom data sources, thereby opening up a new dimension of capabilities to all those fighting money laundering (FIUs, LEAs) and to financial institutions (banks, insurance companies, etc.). This Master's thesis project was carried out at AIA, one of the partners of the HEMOLIA project, in Barcelona. The objective of the thesis is to find the clusters in a network built from financial data. An extensive literature survey has been carried out, and several standard network algorithms have been studied and implemented. Clustering is an NP-hard problem, and algorithms such as K-Means and hierarchical clustering are used to study problems in sociology, evolution, anthropology, etc. However, these algorithms have certain drawbacks that make them very difficult to implement. The thesis suggests (a) a possible improvement to the K-Means algorithm, (b) a novel approach to the clustering problem using Genetic Algorithms and (c) a new algorithm for finding the cluster of a node using a Genetic Algorithm.

Relevance:

80.00%

Publisher:

Abstract:

Hierarchical clustering is a popular method for finding structure in multivariate data, resulting in a binary tree constructed on the particular objects of the study, usually sampling units. The user faces the decision of where to cut the binary tree in order to determine the number of clusters to interpret, and there are various ad hoc rules for arriving at a decision. A simple permutation test is presented that diagnoses whether non-random levels of clustering are present in the set of objects and, if so, indicates the specific level at which the tree can be cut. The test is validated against random matrices to verify the type I error probability, and a power study is performed on data sets with known clusteredness to study the type II error.

Relevance:

30.00%

Publisher:

Abstract:

A parts-based model is a parametrization of an object class using a collection of landmarks following the object structure. The matching of parts-based models is one of the problems where pairwise Conditional Random Fields have been successfully applied. The main reason for their effectiveness is tractable inference and learning due to the simplicity of the graphs involved, usually trees. However, these models do not consider possible patterns of statistics among sets of landmarks, and thus they suffer from using information that is too myopic. To overcome this limitation, we propose a novel structure based on hierarchical Conditional Random Fields, which we explain in the first part of this report. We build a hierarchy of combinations of landmarks, where matching is performed taking into account the whole hierarchy. To preserve tractable inference we effectively sample the label set. We test our method on facial feature selection and human pose estimation on two challenging datasets: Buffy and MultiPIE. In the second part of this report, we present a novel approach to multiple kernel combination that relies on stacked classification. This method can be used to evaluate the landmarks of the parts-based model approach. Our method is based on combining the responses of a set of independent classifiers for each individual kernel. Unlike earlier approaches that linearly combine kernel responses, our approach uses them as inputs to another set of classifiers. We will show that we outperform state-of-the-art methods on most of the standard benchmark datasets.

Relevance:

30.00%

Publisher:

Abstract:

Our essay aims at studying suitable statistical methods for the clustering of compositional data in situations where observations are constituted by trajectories of compositional data, that is, by sequences of composition measurements along a domain. Observed trajectories are known as “functional data” and several methods have been proposed for their analysis. In particular, methods for clustering functional data, known as Functional Cluster Analysis (FCA), have been applied by practitioners and scientists in many fields. To our knowledge, FCA techniques have not been extended to cope with the problem of clustering compositional data trajectories. In order to extend FCA techniques to the analysis of compositional data, FCA clustering techniques have to be adapted by using a suitable compositional algebra. The present work centres on the following question: given a sample of compositional data trajectories, how can we formulate a segmentation procedure giving homogeneous classes? To address this problem we follow the steps described below. First of all we adapt the well-known spline smoothing techniques in order to cope with the smoothing of compositional data trajectories. In fact, an observed curve can be thought of as the sum of a smooth part plus some noise due to measurement errors. Spline smoothing techniques are used to isolate the smooth part of the trajectory: clustering algorithms are then applied to these smooth curves. The second step consists in building suitable metrics for measuring the dissimilarity between trajectories: we propose a metric that accounts for differences in both shape and level, and a metric accounting for differences in shape only. A simulation study is performed in order to evaluate the proposed methodologies, using both hierarchical and partitional clustering algorithms. The quality of the obtained results is assessed by means of several indices.
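The abstract above does not say which compositional algebra or which metrics the authors use. As an illustration only, the sketch below assumes the Aitchison geometry, a standard choice for compositional data, in which the distance between two compositions is the Euclidean distance between their centred log-ratio (clr) transforms. The function `trajectory_distance` is a hypothetical helper, not the metric proposed in the essay: it takes the root mean square of pointwise Aitchison distances between two trajectories sampled at the same points.

```python
import math

def clr(composition):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def trajectory_distance(traj_a, traj_b):
    """Hypothetical dissimilarity between two trajectories (assumption, not the
    essay's metric): root mean square of pointwise Aitchison distances.
    Both trajectories must be sampled at the same points of the domain."""
    squared = [aitchison_distance(a, b) ** 2 for a, b in zip(traj_a, traj_b)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the clr transform removes the overall scale, `aitchison_distance([1, 2, 3], [2, 4, 6])` is zero up to rounding: the two vectors describe the same composition.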

Relevance:

30.00%

Publisher:

Abstract:

The article examines the structure of the collaboration networks of research groups where Slovenian and Spanish PhD students are pursuing their doctorate. The units of analysis are student-supervisor dyads. We use duocentred networks, a novel network structure appropriate for networks which are centred around a dyad. A cluster analysis reveals three typical clusters of research groups. Those which are large and belong to several institutions are labelled as bridging social capital. Those which are small and centred in a single institution but have high cohesion are labelled as bonding social capital. Those which are small and have low cohesion are called weak social capital groups. The academic performance of both PhD students and supervisors is highest in bridging groups and lowest in weak groups. Other variables are also found to differ according to the type of research group. Finally, some recommendations regarding academic and research policy are drawn.

Relevance:

30.00%

Publisher:

Abstract:

We uncover the global organization of clustering in real complex networks. To this end, we ask whether triangles in real networks organize as in maximally random graphs with given degree and clustering distributions, or as in maximally ordered graph models where triangles are forced into modules. The answer comes by way of exploring m-core landscapes, where the m-core is defined, akin to the k-core, as the maximal subgraph with edges participating in at least m triangles. This property defines a set of nested subgraphs that, unlike k-cores, is able to distinguish between hierarchical and modular architectures. We find that the clustering organization in real networks is neither completely random nor ordered although, surprisingly, it is more random than modular. This supports the idea that the structure of real networks may in fact be the outcome of self-organized processes based on local optimization rules, in contrast to global optimization principles.
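The m-core definition quoted above can be turned into a small pruning procedure. The sketch below is one straightforward way to do it, not necessarily the authors' implementation: it repeatedly deletes every edge that lies in fewer than m triangles of the remaining graph until no such edge is left. The function name `m_core` and the edge-list input format are assumptions made for this example.

```python
def m_core(edges, m):
    """Return the edges of the m-core as a set of frozensets {u, v}.

    The m-core is the maximal subgraph in which every edge takes part in at
    least m triangles, counted inside the subgraph itself.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Removing one edge can destroy triangles of other edges, so we sweep
    # until a full pass removes nothing.
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Triangles through edge (u, v) = common neighbours of u and v.
                if len(adj[u] & adj[v]) < m:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}
```

For a complete graph on four nodes joined by a single bridge edge to a separate triangle, the 1-core keeps both dense parts and drops the bridge, while the 2-core keeps only the four-node clique. The nesting of these subgraphs as m grows is the "landscape" the abstract refers to.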

Relevance:

20.00%

Publisher:

Abstract:

Report for the scientific sojourn at the Department of Information Technology (INTEC) at Ghent University, Belgium, from January to June 2007. All-Optical Label Swapping (AOLS) forms a key technology towards the implementation of All-Optical Packet Switching nodes (AOPS) for the future optical Internet. The capital expenditure of deploying AOLS increases with the size of the label spaces (i.e. the number of used labels), since a special optical device is needed for each recognized label on every node. Label space sizes are affected by the way in which demands are routed. For instance, while shortest-path routing leads to the usage of fewer labels but high link utilization, minimum interference routing leads to the opposite. This project studies and proposes All-Optical Label Stacking (AOLStack), which is an extension of the AOLS architecture. AOLStack aims at reducing label spaces while easing the compromise with link utilization. In this project, an Integer Linear Program is proposed with the objective of analyzing the softening of the aforementioned trade-off due to AOLStack. Furthermore, a heuristic aiming at finding good solutions in polynomial time is proposed as well. Simulation results show that AOLStack either a) reduces the label spaces with a low increase in the link utilization or, similarly, b) makes better use of the residual bandwidth to decrease the number of labels even more.

Relevance:

20.00%

Publisher:

Abstract:

Report prepared following a stay at the Proteus project at New York University between April and June 2007. Clustering techniques can help reduce supervision in pattern acquisition processes for Information Extraction. However, algorithms suited to documents are needed, and these algorithms require adequate measures of similarity between patterns. Kernels can offer a solution to these problems, but unsupervised learning requires more astute strategies than supervised learning in order to incorporate a larger amount of information. In this report, the result of my stay from April to June 2007 at the Proteus project at New York University, several kernels over patterns are proposed and evaluated. Initially, kernels with a restricted family of patterns are studied, and then kernels already used in supervised Information Extraction tasks are applied. Because of the performance degradation that clustering suffers when irrelevant information is added, the kernels are simplified and strategies are sought to incorporate semantics selectively. Finally, the effect of applying clustering to the semantic knowledge as a step prior to pattern clustering is studied. The various strategies are evaluated on document and pattern clustering tasks using real data.

Relevance:

20.00%

Publisher:

Abstract:

Vegetation maps are often used as proxies for a habitat stratification in order to generate continuous geographic distributions of organisms from discrete data by means of multivariate models. However, vegetation maps are usually poorly suited to being applied directly for this purpose, because their categories were not conceived with the intention of corresponding to habitat types. In this article we present and apply the Double Criterion Clustering method (Agrupamiento por Doble Criterio) to generalize an extraordinarily detailed vegetation map (350 classes) of the Montseny Natural Park (Catalonia) into categories that remain coherent both structurally (through a spectral dissimilarity matrix computed from a SPOT-5 satellite image) and in terms of vegetation (through a dissimilarity matrix computed from vegetation properties deduced from the hierarchical legend of the map). The method simplifies 67% of the study area from 114 to 18 classes. Adding other, more trivial aggregations based exclusively on land-cover criteria, 73% of the study area goes from 167 to 25 categories. As added value, the method identifies 10% of the original polygons as anomalous (by comparing the spectral properties of each polygon with those of the rest of its class), which implies changes in cover between the dates of the source material used to generate the original map and the satellite image, or errors in the production of the map.

Relevance:

20.00%

Publisher:

Abstract:

Creative industries tend to concentrate mainly around large- and medium-sized cities, forming creative local production systems. The text analyses the forces behind clustering of creative industries to provide the first empirical explanation of the determinants of creative employment clustering following a multidisciplinary approach based on cultural and creative economics, evolutionary geography and urban economics. A comparative analysis has been performed for Italy and Spain. The results show different patterns of creative employment clustering in both countries. The small role of historical and cultural endowments, the size of the place, the average size of creative industries, the productive diversity and the concentration of human capital and creative class have been found as common factors of clustering in both countries.

Relevance:

20.00%

Publisher:

Abstract:

Concerns about the clustering of retail industries and professional services in main streets have traditionally been the public interest rationale for supporting distance regulations. Although many geographic restrictions have been suppressed, deregulation has hinged mostly upon theoretical results on the natural tendency of outlets to differentiate spatially. Empirical evidence has so far offered mixed results. Using the case of the deregulation of pharmacy establishment in a region of Spain, we empirically show how pharmacy locations scatter, and that there is no rationale for distance regulation apart from the underlying private interest of very few incumbents.

Relevance:

20.00%

Publisher:

Abstract:

In image segmentation, clustering algorithms are very popular because they are intuitive and, some of them, easy to implement. For instance, k-means is one of the most widely used in the literature, and many authors successfully compare their new proposals with the results achieved by k-means. However, it is well known that clustering-based image segmentation has many problems. For instance, the number of regions of the image has to be known a priori, and different initial seed placements (initial clusters) can produce different segmentation results. Most of these algorithms can be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitatively evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach.
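The baseline idea mentioned in the abstract, using pixel coordinates as extra clustering features, can be sketched as follows. This is a minimal illustration in plain Python, not the improved method the paper proposes, whose details the abstract does not give. The names `kmeans` and `segment`, the `spatial_weight` parameter, and the deterministic seeding rule are all assumptions made for this example.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length tuples; returns one label per point."""
    n = len(points)
    # Deterministic seeding (assumption): spread the initial centres evenly
    # over the point list, from the first point to the last one.
    centres = [points[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def segment(image, k, spatial_weight=1.0):
    """Cluster pixels on (intensity, weighted x, weighted y); returns a label grid."""
    height, width = len(image), len(image[0])
    points = [
        (float(image[y][x]), spatial_weight * x, spatial_weight * y)
        for y in range(height)
        for x in range(width)
    ]
    labels = kmeans(points, k)
    return [labels[y * width:(y + 1) * width] for y in range(height)]
```

Setting `spatial_weight` to 0 recovers intensity-only clustering. Larger values favour spatially compact regions. The result still depends on the seeding, which is the sensitivity to initial seed placement that the abstract points out.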

Relevance:

20.00%

Publisher:

Abstract:

Most network operators have considered reducing Label Switched Router (LSR) label spaces (i.e. the number of labels that can be used) as a means of simplifying the management of underlying Virtual Private Networks (VPNs) and, hence, reducing operational expenditure (OPEX). This letter discusses the problem of reducing the label spaces in Multiprotocol Label Switched (MPLS) networks using label merging - better known as MultiPoint-to-Point (MP2P) connections. Because of their origins in IP, MP2P connections have been considered to have tree shapes with Label Switched Paths (LSPs) as branches. Due to this fact, previous works by many authors affirm that the problem of minimizing the label space using MP2P in MPLS - the Merging Problem - cannot be solved optimally with a polynomial algorithm (NP-complete), since it involves a hard decision problem. However, in this letter, the Merging Problem is analyzed from the perspective of MPLS, and it is deduced that tree shapes in MP2P connections are irrelevant. By overriding this tree-shape consideration, it is possible to perform label merging in polynomial time. Based on how MPLS signaling works, this letter proposes an algorithm to compute the minimum number of labels using label merging: the Full Label Merging algorithm. In conclusion, we reclassify the Merging Problem as polynomially solvable, instead of NP-complete. In addition, simulation experiments confirm that without the tree-branch selection problem, more labels can be reduced.
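The abstract does not describe the Full Label Merging algorithm itself, so the sketch below is not that algorithm. It only illustrates the counting idea behind label merging, under two stated assumptions: routes are fixed and loop-free, and two LSPs can share an incoming label at a node exactly when they follow the same remaining path from that node to the egress. The function name `count_labels` and the path-list input format are hypothetical.

```python
def count_labels(lsps, merge):
    """Count incoming labels needed at each node for a set of fixed LSP routes.

    lsps:  list of loop-free node paths, ingress first and egress last.
    merge: if False, every LSP uses its own label at every node it enters;
           if True, LSPs whose remaining paths from a node are identical
           share one label at that node (assumed merging rule).
    """
    keys_per_node = {}
    for lsp_id, path in enumerate(lsps):
        # The ingress (index 0) originates the LSP and needs no incoming label.
        for i in range(1, len(path)):
            key = tuple(path[i:]) if merge else lsp_id
            keys_per_node.setdefault(path[i], set()).add(key)
    return {node: len(keys) for node, keys in keys_per_node.items()}
```

With three LSPs A-C-D-E, B-C-D-E and A-C-F-E, the unmerged count is 3 labels at C, 2 at D, 1 at F and 3 at E. With merging it drops to 2 at C and 1 each at D, F and E, because the first two LSPs become indistinguishable from C onwards.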

Relevance:

20.00%

Publisher:

Abstract:

The objective of Traffic Engineering is to optimize network resource utilization. Although several works have been published about minimizing network resource utilization in MPLS networks, few of them have focused on LSR label space reduction. This letter studies Asymmetric Merged Tunneling (AMT) as a new method for reducing the label space in MPLS networks. The proposed method may be regarded as a combination of label merging (proposed in the MPLS architecture) and asymmetric tunneling (proposed recently in our previous works). Finally, simulation results are obtained by comparing AMT with both of its predecessors. They show a great improvement in the label space reduction factor.