6 resultados para Automatic Analysis of Multivariate Categorical Data Sets

em Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Opposite enantiomers exhibit different NMR properties in the presence of an external common chiral element, and a chiral molecule exhibits different NMR properties in the presence of external enantiomeric chiral elements. Automatic prediction of such differences, and comparison with experimental values, leads to the assignment of the absolute configuration. Here two cases are reported, one using a dataset of 80 chiral secondary alcohols esterified with (R)-MTPA and the corresponding 1H NMR chemical shifts and the other with 94 13C NMR chemical shifts of chiral secondary alcohols in two enantiomeric chiral solvents. For the first application, counterpropagation neural networks were trained to predict the sign of the difference between chemical shifts of opposite stereoisomers. The neural networks were trained to process the chirality code of the alcohol as the input, and to give the NMR property as the output. In the second application, similar neural networks were employed, but the property to predict was the difference of chemical shifts in the two enantiomeric solvents. For independent test sets of 20 objects, 100% correct predictions were obtained in both applications concerning the sign of the chemical shifts differences. Additionally, with the second dataset, the difference of chemical shifts in the two enantiomeric solvents was quantitatively predicted, yielding r2 0.936 for the test set between the predicted and experimental values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Incidental findings on low-dose CT images obtained during hybrid imaging are an increasing phenomenon as CT technology advances. Understanding the diagnostic value of incidental findings along with the technical limitations is important when reporting image results and recommending follow-up, which may result in an additional radiation dose from further diagnostic imaging and an increase in patient anxiety. This study assessed lesions incidentally detected on CT images acquired for attenuation correction on two SPECT/CT systems. Methods: An anthropomorphic chest phantom containing simulated lesions of varying size and density was imaged on an Infinia Hawkeye 4 and a Symbia T6 using the low-dose CT settings applied for attenuation correction acquisitions in myocardial perfusion imaging. Twenty-two interpreters assessed 46 images from each SPECT/CT system (15 normal images and 31 abnormal images; 41 lesions). Data were evaluated using a jackknife alternative free-response receiver-operating-characteristic analysis (JAFROC). Results: JAFROC analysis showed a significant difference (P < 0.0001) in lesion detection, with the figures of merit being 0.599 (95% confidence interval, 0.568, 0.631) and 0.810 (95% confidence interval, 0.781, 0.839) for the Infinia Hawkeye 4 and Symbia T6, respectively. Lesion detection on the Infinia Hawkeye 4 was generally limited to larger, higher-density lesions. The Symbia T6 allowed improved detection rates for midsized lesions and some lower-density lesions. However, interpreters struggled to detect small (5 mm) lesions on both image sets, irrespective of density. Conclusion: Lesion detection is more reliable on low-dose CT images from the Symbia T6 than from the Infinia Hawkeye 4. This phantom-based study gives an indication of potential lesion detection in the clinical context as shown by two commonly used SPECT/CT systems, which may assist the clinician in determining whether further diagnostic imaging is justified.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Introduction: multimodality environment; requirement for greater understanding of the imaging technologies used, the limitations of these technologies, and how to best interpret the results; dose optimization; introduction of new techniques; current practice and best practice; incidental findings, in low-dose CT images obtained as part of the hybrid imaging process, are an increasing phenomenon with advancing CT technology; resultant ethical and medico-legal dilemmas; understanding limitations of these procedures important when reporting images and recommending follow-up; free-response observer performance study was used to evaluate lesion detection in low-dose CT images obtained during attenuation correction acquisitions for myocardial perfusion imaging, on two hybrid imaging systems.