27 results for Clustering methods

in Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevance:

30.00%

Abstract:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.

Relevance:

30.00%

Abstract:

In data clustering, selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal focus on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data from official statistics shows its usefulness.

Relevance:

30.00%

Abstract:

Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
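
The co-association matrix described in the abstract above can be illustrated with a minimal sketch. It only shows the general idea of counting how often two points share a cluster across an ensemble; it is not the authors' implementation, and the function name `co_association` and the toy ensemble are assumptions made for illustration.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base partitions in which each pair of points shares a cluster."""
    if not partitions:
        raise ValueError("need at least one base partition")
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same points")
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, so label names never need
                # to be matched between partitions.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical base clusterings of four points. The first two are the
# same grouping with swapped label names, which the matrix treats as equal.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = co_association(ensemble)
```

Because the matrix depends only on whether two labels are equal within each partition, swapping label names between runs changes nothing, which is why this representation sidesteps the label correspondence problem.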

Relevance:

30.00%

Abstract:

The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in the same cluster, and it uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix. It is based on a probabilistic model for the co-association matrix, parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.

Relevance:

20.00%

Abstract:

In this work, 14 primary schools in the city of Lisbon, Portugal, took part in a questionnaire of the International Study of Asthma and Allergies in Childhood (ISAAC) Program in 2009/2010. The questionnaire contained questions to identify children with respiratory diseases (wheeze, asthma and rhinitis). Total particulate matter (TPM) was passively collected inside two classrooms in each of the 14 primary schools. Two types of filter matrices were used to collect TPM: Millipore (Isopore™) polycarbonate and quartz. Three campaigns were selected for the measurement of TPM: Spring, Autumn and Winter. The main difference between the two types of filters is that the mass of collected particles was higher on quartz filters than on polycarbonate filters, although their correlation is excellent. The highest TPM depositions occurred between October 2009 and March 2010, when related to the proportion of rhinitis. Rhinitis was found to be related to TPM when the data were grouped seasonally and averaged over all the schools. For the data of 2006/2007, the seasonal variation was found to be related to outdoor particle deposition (below 10 μm).

Relevance:

20.00%

Abstract:

Chromium dioxide (CrO2) has been extensively used in the magnetic recording industry. However, it is its ferromagnetic half-metallic nature that has more recently attracted much attention, primarily for the development of spintronic devices. CrO2 is the only stoichiometric binary oxide theoretically predicted to be fully spin polarized at the Fermi level. It has a Curie temperature of ∼396 K, i.e. well above room temperature, and a magnetic moment of 2 μB per formula unit. However, an antiferromagnetic native insulating layer of Cr2O3 is always present on the CrO2 surface, which enhances the CrO2 magnetoresistance and might be used as a barrier in magnetic tunnel junctions.

Relevance:

20.00%

Abstract:

Thin films of TiO2 were doped with Au by ion implantation and in situ during the deposition. The films were grown by reactive magnetron sputtering and deposited on silicon and glass substrates at a temperature of around 150 °C. The undoped films were implanted with Au fluences in the range of 5 × 10^15 Au/cm^2 to 1 × 10^17 Au/cm^2 with an energy of 150 keV. At a fluence of 5 × 10^16 Au/cm^2, the formation of Au nanoclusters in the films is observed during the implantation at room temperature. The clustering process starts to occur during the implantation, where XRD estimates the presence of 3-5 nm precipitates. After annealing in a reducing atmosphere, the small precipitates coalesce into larger ones following an Ostwald ripening mechanism. In situ XRD studies reveal that Au atoms start to coalesce at 350 °C, with the precipitates reaching dimensions larger than 40 nm at 600 °C. Annealing above 700 °C promotes drastic changes in the Au profile of in situ doped films, with the formation of two Au-rich regions, at the interface and at the surface respectively. The optical properties reveal the presence of a broad band centered at 550 nm, related to the plasmon resonance of gold particles visible in AFM maps. © 2011 Elsevier B.V. All rights reserved.

Relevance:

20.00%

Abstract:

Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for a global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
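
To give a rough feel for what a minimum spanning tree over allelic profiles looks like, here is a minimal sketch. It is a plain Prim's algorithm over Hamming distances, not goeBURST and not PHYLOViZ code: goeBURST applies its own tie-break rules, which this sketch does not attempt to reproduce. The function names and the four toy profiles are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(
    profiles: Sequence[Sequence[int]],
) -> List[Tuple[int, int, int]]:
    """Prim's algorithm; returns edges as (tree_node, new_node, distance)."""
    n = len(profiles)
    if n == 0:
        return []
    in_tree = {0}  # start from the first profile
    edges: List[Tuple[int, int, int]] = []
    while len(in_tree) < n:
        best = None
        for u in sorted(in_tree):
            for v in range(n):
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                # Ties keep the first candidate found; goeBURST would
                # apply dedicated tie-break rules here instead.
                if best is None or d < best[2]:
                    best = (u, v, d)
        in_tree.add(best[1])
        edges.append(best)
    return edges


# Hypothetical allelic profiles: one allele number per locus.
profiles = [
    (1, 1, 1, 1),  # isolate 0
    (1, 1, 1, 2),  # isolate 1, differs from isolate 0 at one locus
    (1, 1, 2, 2),  # isolate 2, differs from isolate 1 at one locus
    (3, 3, 3, 3),  # isolate 3, differs from all others at every locus
]
tree = minimum_spanning_tree(profiles)
```

The resulting edges link each isolate to its closest already-connected neighbour, which is the kind of graph that can then be annotated with epidemiological data.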

Relevance:

20.00%

Abstract:

Personal memories composed of digital pictures are very popular at the moment. To retrieve these media items, annotation is required. In recent years, several approaches have been proposed to overcome the image annotation problem. This paper presents our proposals to address this problem. Automatic and semi-automatic learning methods for semantic concepts are presented. The automatic method is based on semantic concepts estimated using visual content, context metadata and audio information. The semi-automatic method is based on results provided by a computer game. The paper describes our proposals and presents their evaluations.

Relevance:

20.00%

Abstract:

Tomographic images can be degraded, in part by patient-based attenuation. The aim of this paper is to quantitatively verify the effects of the Chang and CT attenuation correction methods in 111In studies through the analysis of profiles from abdominal SPECT corresponding to an organ with uniform radionuclide uptake, the left kidney.

Relevance:

20.00%

Abstract:

Epidemiological studies have shown an increased prevalence of respiratory symptoms and adverse changes in pulmonary function parameters in poultry workers, corroborating the increased exposure to risk factors such as fungal load and fungal metabolites. This study aimed to determine the occupational exposure threat due to fungal contamination caused by toxigenic isolates belonging to the Aspergillus flavus species complex and by isolates from the Aspergillus fumigatus species complex. The study was carried out in seven Portuguese poultry units, using cultural and molecular methodologies. For conventional/cultural methods, air, surface, and litter samples were collected by the impaction method using the Millipore Air Sampler. For the molecular analysis, air samples were collected by the impinger method using the Coriolis μ air sampler. After DNA extraction, samples were analyzed by real-time PCR using specific primers and probes for toxigenic strains of the Aspergillus flavus complex and for detection of isolates from the Aspergillus fumigatus complex. Through conventional methods, and within the Aspergillus genus, different prevalences were detected for the Aspergillus flavus and Aspergillus fumigatus species complexes, namely: 74.5 versus 1.0% in the air samples, 24.0 versus 16.0% on the surfaces, 0 versus 32.6% in new litter, and 9.9 versus 15.9% in used litter. Through molecular biology, we were able to detect the presence of aflatoxigenic strains in pavilions in which Aspergillus flavus did not grow in culture. Aspergillus fumigatus was found in only one indoor air sample by conventional methods. Using molecular methodologies, however, the Aspergillus fumigatus complex was detected in seven indoor samples from three different poultry units. The characterization of fungal contamination caused by Aspergillus flavus and Aspergillus fumigatus raises the concern of an occupational threat, not only due to the detected fungal load but also because of the toxigenic potential of these species.

Relevance:

20.00%

Abstract:

This project was developed to fully assess the indoor air quality in archives and libraries from a fungal flora point of view. It uses classical methodologies such as traditional culture media – for the viable fungi – and modern molecular biology protocols, especially relevant to assess the non-viable fraction of the biological contaminants. Denaturing high-performance liquid chromatography (DHPLC) has emerged as an alternative to denaturing gradient gel electrophoresis (DGGE) and has already been applied to the study of a few bacterial communities. We propose the application of DHPLC to the study of fungal colonization on paper-based archive materials. This technology allows for the identification of each component of a mixture of fungi based on their genetic variation. In a highly complex mixture of microbial DNA this method can be used simply to study the population dynamics, and it also allows for sample fraction collection, which can, in many cases, be immediately sequenced, circumventing the need for cloning. Some examples of the methodological application are shown. Also applied is fragment length analysis for the study of mixed Candida samples. Both of these methods can later be applied in various fields, such as clinical and sand sample analysis. So far, the environmental analyses have been extremely useful to determine potentially pathogenic/toxinogenic fungi such as Stachybotrys sp., Aspergillus niger, Aspergillus fumigatus, and Fusarium sp. This work will hopefully lead to more accurate evaluation of environmental conditions for both human health and the preservation of documents.

Relevance:

20.00%

Abstract:

The handling of waste and compost that occurs frequently in composting plants (compost turning, shredding, and screening) has been shown to be responsible for the release of dust and airborne microorganisms and their compounds into the air. Thermophilic fungi, such as A. fumigatus, have been reported, and this kind of contamination in composting facilities has been associated with increased respiratory symptoms among compost workers. This study intended to characterize fungal contamination in a totally indoor composting plant located in Portugal. Besides conventional methods, molecular biology was also applied to overcome possible limitations.

Relevance:

20.00%

Abstract:

Microarrays allow the simultaneous monitoring of thousands of genes, quantifying the abundance of the transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments, which are the focus of this work, have arisen in numerous technical protocols associated with genomic studies. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background correction methods and six normalization methods were used, giving 36 preprocessing combinations, which were applied to a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
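
The count of 36 preprocessing combinations in the abstract above follows from pairing each of the six background methods with each of the six normalization methods. A minimal sketch of that enumeration is given below. The study itself used R; Python is used here only for illustration, and the method names are hypothetical placeholders, since the abstract does not list the actual methods.

```python
from itertools import product

# Hypothetical placeholder names: the abstract does not name the methods.
background_methods = [f"background_{i}" for i in range(1, 7)]
normalization_methods = [f"normalization_{i}" for i in range(1, 7)]

# Every background correction is paired with every normalization,
# giving the full 6 x 6 grid of preprocessing pipelines.
pipelines = list(product(background_methods, normalization_methods))

for background, normalization in pipelines[:3]:
    print(f"{background} -> {normalization}")
```

Enumerating the full grid in this way makes it straightforward to run the same downstream analysis on each pipeline and compare the results side by side.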