4 resultados para selection model
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Resumo:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion.
Resumo:
Toxic amides, such as acrylamide, are potentially harmful to Human health, so there is great interest in the fabrication of compact and economical devices to measure their concentration in food products and effluents. The CHEmically Modified Field Effect Transistor (CHEMFET) based onamorphous silicon technology is a candidate for this type of application due to its low fabrication cost. In this article we have used a semi-empirical modelof the device to predict its performance in a solution of interfering ions. The actual semiconductor unit of the sensor was fabricated by the PECVD technique in the top gate configuration. The CHEMFET simulation was performed based on the experimental current voltage curves of the semiconductor unit and on an empirical model of the polymeric membrane. Results presented here are useful for selection and design of CHEMFET membranes and provide an idea of the limitations of the amorphous CHEMFET device. In addition to the economical advantage, the small size of this prototype means it is appropriate for in situ operation and integration in a sensor array.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.