31 results for spatial clustering algorithms
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
                                
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
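The BIC baseline mentioned in this abstract works on a set of pre-estimated candidate models, one per candidate number of clusters. A minimal sketch of that selection step follows. It assumes J categorical variables that are independent within each cluster (an assumption made here, not stated in the abstract), and all function names are hypothetical.

```python
import math

def multinomial_mixture_n_params(n_clusters, n_categories):
    """Free parameters of a K-cluster mixture: (K - 1) mixing weights plus,
    for every cluster, (C_j - 1) category probabilities per variable j.
    Assumes variables are independent within a cluster (illustrative assumption)."""
    return (n_clusters - 1) + n_clusters * sum(c - 1 for c in n_categories)

def bic(log_likelihood, n_params, n_obs):
    """Schwarz (1978) criterion in the 'lower is better' sign convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_by_bic(candidates, n_categories, n_obs):
    """candidates maps K to the maximised log-likelihood of a fitted K-cluster model.
    Returns the K with the smallest BIC and the full score table."""
    scores = {
        k: bic(ll, multinomial_mixture_n_params(k, n_categories), n_obs)
        for k, ll in candidates.items()
    }
    return min(scores, key=scores.get), scores

# Hypothetical log-likelihoods for K = 1, 2, 3 on 500 observations
# of three variables with 2, 3 and 4 categories.
best_k, table = select_by_bic({1: -1000.0, 2: -900.0, 3: -895.0}, [2, 3, 4], 500)
```

Each entry of `candidates` requires a separate model fit, which is the cost that a single-run approach avoids.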
                                
Abstract:
In the present paper we focus on the performance of clustering algorithms, using indices of paired agreement to measure the accordance between clusters and an a priori known structure. We specifically propose a method to correct all indices considered for agreement by chance; the adjusted indices are meant to provide a realistic measure of clustering performance. The proposed method enables the correction of virtually any index, overcoming previous limitations known in the literature, and provides very precise results. We use simulated datasets under diverse scenarios and discuss the pertinence of our proposal, which is particularly relevant when poorly separated clusters are considered. Finally, we compare the performance of the EM and K-Means algorithms within each of the simulated scenarios and conclude that EM generally yields the best results.
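As background on what an index of paired agreement and its chance correction look like, the sketch below computes the Rand index and its well-known analytical correction (the adjusted Rand index). This is the classical closed-form case, not the method proposed in the abstract, and the function names are illustrative.

```python
from collections import Counter
from math import comb

def pair_counts(labels_a, labels_b):
    """Return (pairs joined in both partitions, pairs joined in A,
    pairs joined in B, total number of pairs)."""
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return both, in_a, in_b, comb(len(labels_a), 2)

def rand_index(labels_a, labels_b):
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    # Agreements = pairs joined in both + pairs separated in both.
    return (total + 2 * both - in_a - in_b) / total

def adjusted_rand_index(labels_a, labels_b):
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total   # joined-in-both count expected by chance
    maximum = (in_a + in_b) / 2      # upper bound used for normalisation
    if maximum == expected:
        return 1.0                   # both partitions trivial and identical
    return (both - expected) / (maximum - expected)
```

For example, `[0, 0, 1, 1]` against `[0, 1, 0, 1]` has a raw Rand index of 1/3 but an adjusted value of -0.5, showing how much of the raw agreement is attributable to chance.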
                                
Abstract:
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
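The co-association matrix at the heart of the EAC paradigm can be sketched in a few lines. The function below is an illustrative construction only (its name and the list-of-lists representation are assumptions); it does not implement the Bregman-divergence optimization described in the abstract.

```python
def co_association(partitions):
    """partitions: list of label lists, one per base clustering, all over the
    same n objects. Entry (i, j) of the result is the fraction of partitions
    that place objects i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]

# The first two partitions are identical up to a relabelling of the clusters.
ensemble = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
C = co_association(ensemble)
```

Because only within-partition label equality is used, the relabelled second partition contributes exactly like the first, which is why this representation sidesteps the label correspondence problem.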
                                
Abstract:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute to effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (21.6 cases per 100,000 inhabitants in 2012), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, the metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering for PTB incidence rate and risk factors. Identifying clusters of temporal trends can further inform policy makers about municipalities showing faster or slower improvement in TB control.
                                
Abstract:
The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in the same cluster, and it uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix, based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.
                                
Abstract:
In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method - IADJUST - to correct indices of paired agreement, excluding agreement by chance. This new method overcomes previous limitations known in the literature as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the accordance between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets, under a range of scenarios - considering diverse numbers of clusters, cluster overlaps and balances - to discuss the pertinence and the precision of our proposal. Precision is established based on comparisons with the analytical approach for correction; specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed when making a detailed comparison between the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
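One generic way to correct an arbitrary index for chance, when no analytical expectation is available, is to estimate the expected value by simulation and rescale as (index - expected) / (maximum - expected). The sketch below does this with random label permutations. It is a stand-in written for illustration and should not be read as the IADJUST procedure itself, whose details the abstract does not give; the function names and the choice of the Jaccard index are assumptions.

```python
import random
from itertools import combinations

def jaccard_pairs(labels_a, labels_b):
    """Paired-agreement Jaccard index: pairs joined in both partitions
    divided by pairs joined in at least one of them."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0

def chance_corrected(index, labels_a, labels_b, n_perm=500, seed=0, index_max=1.0):
    """Rescale index(labels_a, labels_b) so that its permutation-estimated
    chance level maps to 0 and index_max maps to 1."""
    rng = random.Random(seed)
    observed = index(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps cluster sizes, breaks any real association
        total += index(labels_a, shuffled)
    expected = total / n_perm
    if index_max == expected:
        return 0.0             # no room above chance level to rescale
    return (observed - expected) / (index_max - expected)

truth = [0] * 5 + [1] * 5
unrelated = [0, 1] * 5
raw = jaccard_pairs(truth, unrelated)
corrected = chance_corrected(jaccard_pairs, truth, unrelated)
```

Here the raw Jaccard value of 0.25 looks like partial agreement, while the corrected value falls below it because labellings of this shape reach that level by chance alone.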
                                
Abstract:
A previously developed model is used to numerically simulate real clinical cases of the surgical correction of scoliosis. This model consists of one-dimensional finite elements with spatial deformation in which (i) the column is represented by its axis; (ii) the vertebrae are assumed to be rigid; and (iii) the deformability of the column is concentrated in springs that connect the successive rigid elements. The metallic rods used for the surgical correction are modeled by beam elements with linear elastic behavior. To obtain the forces at the connections between the metallic rods and the vertebrae, geometrically non-linear finite element analyses are performed. The tightening sequence determines the magnitude of the forces applied to the patient's column, and it is desirable to keep those forces as small as possible. In this study, a Genetic Algorithm optimization is applied to this model in order to determine the sequence that minimizes the corrective forces applied during the surgery. This amounts to finding the optimal permutation of the integers 1, ... , n, n being the number of vertebrae involved. As such, we are faced with a combinatorial optimization problem isomorphic to the Traveling Salesman Problem. The fitness evaluation requires one computationally intensive Finite Element Analysis per candidate solution and, thus, a parallel implementation of the Genetic Algorithm is developed.
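The search over permutations of 1, ..., n described above can be sketched with a small genetic algorithm. Everything below is illustrative: the operators (a simplified order crossover, swap mutation, truncation selection) are common choices rather than those reported by the authors, and `toy_cost` is a hypothetical stand-in for the finite element analysis.

```python
import random

def order_crossover(p1, p2, rng):
    """Simplified order crossover: keep a slice of p1 in place and fill the
    remaining positions with the missing genes in the order they appear in p2."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    fill = iter([g for g in p2 if g not in kept])
    return [g if g is not None else next(fill) for g in child]

def swap_mutation(perm, rng, rate):
    """With probability rate, exchange two randomly chosen positions."""
    perm = list(perm)
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def genetic_search(cost, n, pop_size=30, generations=60, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    population = [rng.sample(range(1, n + 1), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        population = parents + children
    return min(population, key=cost)

def toy_cost(sequence):
    # Hypothetical stand-in for the finite element analysis of one sequence.
    return sum(abs(a - b) for a, b in zip(sequence, sequence[1:]))

best = genetic_search(toy_cost, 6)
```

Each candidate's cost is computed independently of the others, which is the property that makes the fitness evaluations natural to distribute across parallel workers when a single evaluation is expensive.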
                                
Abstract:
Amorphous glass/ZnO-Al/p(a-Si:H)/i(a-Si:H)/n(a-Si1-xCx:H)/Al imagers with different n-layer resistivities were produced by the plasma enhanced chemical vapour deposition (PE-CVD) technique. An image is projected onto the sensing element and leads to spatially confined depletion regions that can be read out by scanning the photodiode with a low-power modulated laser beam. The essence of the scheme is the analog readout, and the absence of semiconductor arrays or electrode potential manipulations to transfer the information coming from the transducer. The influence of the intensity of the optical image projected onto the sensor surface on the sensor output characteristics (sensitivity, linearity, blooming, resolution and signal-to-noise ratio) is analysed for different material compositions (0.5 < x < 1). The results show that the responsivity and the spatial resolution are limited by the conductivity of the doped layers. An enhancement of one order of magnitude in the image intensity signal and in the spatial resolution is achieved at a light flux of 0.2 mW cm⁻² by decreasing the n-layer conductivity by the same amount. A physical model supported by electrical simulation gives insight into the image-sensing technique used.
                                
Abstract:
Topology optimization consists in finding the spatial distribution of a given total volume of material for the resulting structure to have some optimal property, for instance, maximization of structural stiffness or maximization of the fundamental eigenfrequency. In this paper a Genetic Algorithm (GA) employing a representation method based on trees is developed to generate initial feasible individuals that remain feasible upon crossover and mutation and as such do not require any repairing operator to ensure feasibility. Several application examples are studied involving the topology optimization of structures where the objective functions are the maximization of the stiffness and the maximization of the first and second eigenfrequencies of a plate, all cases having a prescribed material volume constraint.
                                
Abstract:
Master's degree in Radiation Applied to Health Technologies. Area of specialization: Digital Imaging with X-Radiation.
                                
                                
Abstract:
Fluorescence confocal microscopy (FCM) is now one of the most important tools in biomedical research. In fact, it makes it possible to accurately study the dynamic processes occurring inside the cell and its nucleus by following the motion of fluorescent molecules over time. Due to the small amount of acquired radiation and the huge optical and electronic amplification, FCM images are usually corrupted by a severe type of Poisson noise. This noise may be even more damaging when very low intensity incident radiation is used to avoid phototoxicity. In this paper, a Bayesian algorithm is proposed to remove the intensity-dependent Poisson noise corrupting FCM image sequences. The observations are organized in a 3-D tensor where each plane is one of the images of a cell nucleus acquired over time using the fluorescence loss in photobleaching (FLIP) technique. The method removes the noise by simultaneously considering different spatial and temporal correlations. This is accomplished by using an anisotropic 3-D filter that may be separately tuned in the space and time dimensions. Tests using synthetic and real data are described and presented to illustrate the application of the algorithm. A comparison with several state-of-the-art algorithms is also presented.
                                
Abstract:
Anaemia has a significant impact on child development and mortality and is a severe public health problem in most countries in sub-Saharan Africa. Nutritional and infectious causes of anaemia are geographically variable, and anaemia maps based on information on the major aetiologies of anaemia are important for identifying communities most in need and the relative contribution of major causes. We investigated the consistency between ecological and individual-level approaches to anaemia mapping by building spatial anaemia models for children aged ≤15 years using different modeling approaches. We aimed to a) quantify the role of malnutrition, malaria, Schistosoma haematobium and soil-transmitted helminths (STH) in anaemia endemicity in children aged ≤15 years and b) develop a high resolution predictive risk map of anaemia for the municipality of Dande in Northern Angola. We used parasitological survey data on children aged ≤15 years to build Bayesian geostatistical models of malaria (PfPR≤15), S. haematobium, Ascaris lumbricoides and Trichuris trichiura and predict small-scale spatial variation in these infections. The predictions and their associated uncertainty were used as inputs for a model of anaemia prevalence to predict small-scale spatial variation of anaemia. Stunting, PfPR≤15, and S. haematobium infections were significantly associated with anaemia risk. An estimated 12.5%, 15.6%, and 9.8% of anaemia cases could be averted by treating malnutrition, malaria and S. haematobium, respectively. Spatial clusters of high risk of anaemia (>86%) were identified. Using an individual-level approach to anaemia mapping at a small spatial scale, we found that anaemia in children aged ≤15 years is highly heterogeneous and that malnutrition and parasitic infections are important contributors to the spatial variation in anaemia risk. The results presented in this study can help inform the integration of the current provincial malaria control program with ancillary micronutrient supplementation and control of neglected tropical diseases, such as urogenital schistosomiasis and STH infection.
                                
Abstract:
Anaemia is known to have an impact on child development and mortality and is a severe public health problem in most countries in sub-Saharan Africa. We investigated the consistency between ecological and individual-level approaches to anaemia mapping by building spatial anaemia models for children aged ≤15 years using different modelling approaches. We aimed to (i) quantify the role of malnutrition, malaria, Schistosoma haematobium and soil-transmitted helminths (STHs) in anaemia endemicity; and (ii) develop a high resolution predictive risk map of anaemia for the municipality of Dande in northern Angola. We used parasitological survey data for children aged ≤15 years to build Bayesian geostatistical models of malaria (PfPR≤15), S. haematobium, Ascaris lumbricoides and Trichuris trichiura and predict small-scale spatial variations in these infections. Malnutrition, PfPR≤15, and S. haematobium infections were significantly associated with anaemia risk. An estimated 12.5%, 15.6% and 9.8% of anaemia cases could be averted by treating malnutrition, malaria and S. haematobium, respectively. Spatial clusters of high risk of anaemia (>86%) were identified. Using an individual-level approach to anaemia mapping at a small spatial scale, we found that anaemia in children aged ≤15 years is highly heterogeneous and that malnutrition and parasitic infections are important contributors to the spatial variation in anaemia risk. The results presented in this study can help inform the integration of the current provincial malaria control programme with ancillary micronutrient supplementation and control of neglected tropical diseases such as urogenital schistosomiasis and STH infections.
                                
Abstract:
In team sports, the spatial distribution of players on the field is determined by the interaction behaviour established at both player and team levels. The distribution patterns observed during a game emerge from specific technical and tactical methods adopted by the teams, and from individual, environmental and task constraints that influence players' behaviour. By understanding how specific patterns of spatial interaction are formed, one can characterize the behaviour of the respective teams and players. Thus, in the present work we suggest a novel spatial method for describing teams' spatial interaction behaviour, which results from superimposing the Voronoi diagrams of two competing teams. We considered theoretical patterns of spatial distribution in a well-defined scenario (5 vs 4 + GK played in a field of 20 x 20 m) in order to generate reference values of the variables derived from the superimposed Voronoi diagrams (SVD). These variables were tested in a formal application to empirical data collected from 19 Futsal trials with identical playing settings. Results suggest that it is possible to identify a number of characteristics that can be used to describe players' spatial behaviour at different levels, namely the defensive methods adopted by the players.
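The idea of superimposing two Voronoi diagrams can be illustrated with a raster approximation: every sample point of the field is assigned to its nearest player in each team, and the area shared by each pair of cells is accumulated. The abstract does not list the variables the authors derive from the SVD, so the overlap areas below, the grid resolution and the function names are all assumptions made for illustration.

```python
import math

def nearest(point, players):
    """Index of the player closest to point (Euclidean distance)."""
    return min(range(len(players)), key=lambda k: math.dist(point, players[k]))

def superimposed_voronoi_areas(team_a, team_b, field=20.0, step=0.5):
    """Approximate area (m^2) of each intersection between a Voronoi cell of
    team A and a Voronoi cell of team B, sampled at grid-square centres."""
    cells = int(round(field / step))
    areas = {}
    for ix in range(cells):
        for iy in range(cells):
            p = ((ix + 0.5) * step, (iy + 0.5) * step)
            key = (nearest(p, team_a), nearest(p, team_b))
            areas[key] = areas.get(key, 0.0) + step * step
    return areas

# Hypothetical positions (metres) on a 20 x 20 m field: two players per team.
team_a = [(5.0, 10.0), (15.0, 10.0)]
team_b = [(10.0, 5.0), (10.0, 15.0)]
overlap = superimposed_voronoi_areas(team_a, team_b)
```

In this symmetric toy layout the field splits into four equal overlap regions. With real tracking data, large overlaps between one attacker's cell and one defender's cell would be one possible way to read off which opponents share the same space.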
 
                    