968 resultados para Unsupervised classification
Resumo:
Imaging flow cytometry is an emerging technology that combines the statistical power of flow cytometry with spatial and quantitative morphology of digital microscopy. It allows high-throughput imaging of cells with good spatial resolution, while they are in flow. This paper proposes a general framework for the processing/classification of cells imaged using imaging flow cytometer. Each cell is localized by finding an accurate cell contour. Then, features reflecting cell size, circularity and complexity are extracted for the classification using SVM. Unlike the conventional iterative, semi-automatic segmentation algorithms such as active contour, we propose a noniterative, fully automatic graph-based cell localization. In order to evaluate the performance of the proposed framework, we have successfully classified unstained label-free leukaemia cell-lines MOLT, K562 and HL60 from video streams captured using custom fabricated cost-effective microfluidics-based imaging flow cytometer. The proposed system is a significant development in the direction of building a cost-effective cell analysis platform that would facilitate affordable mass screening camps looking cellular morphology for disease diagnosis. Lay description In this article, we propose a novel framework for processing the raw data generated using microfluidics based imaging flow cytometers. Microfluidics microscopy or microfluidics based imaging flow cytometry (mIFC) is a recent microscopy paradigm, that combines the statistical power of flow cytometry with spatial and quantitative morphology of digital microscopy, which allows us imaging cells while they are in flow. In comparison to the conventional slide-based imaging systems, mIFC is a nascent technology enabling high throughput imaging of cells and is yet to take the form of a clinical diagnostic tool. The proposed framework process the raw data generated by the mIFC systems. The framework incorporates several steps: beginning from pre-processing of the raw video frames to enhance the contents of the cell, localising the cell by a novel, fully automatic, non-iterative graph based algorithm, extraction of different quantitative morphological parameters and subsequent classification of cells. In order to evaluate the performance of the proposed framework, we have successfully classified unstained label-free leukaemia cell-lines MOLT, K562 and HL60 from video streams captured using cost-effective microfluidics based imaging flow cytometer. The cell lines of HL60, K562 and MOLT were obtained from ATCC (American Type Culture Collection) and are separately cultured in the lab. Thus, each culture contains cells from its own category alone and thereby provides the ground truth. Each cell is localised by finding a closed cell contour by defining a directed, weighted graph from the Canny edge images of the cell such that the closed contour lies along the shortest weighted path surrounding the centroid of the cell from a starting point on a good curve segment to an immediate endpoint. Once the cell is localised, morphological features reflecting size, shape and complexity of the cells are extracted and used to develop a support vector machine based classification system. We could classify the cell-lines with good accuracy and the results were quite consistent across different cross validation experiments. We hope that imaging flow cytometers equipped with the proposed framework for image processing would enable cost-effective, automated and reliable disease screening in over-loaded facilities, which cannot afford to hire skilled personnel in large numbers. Such platforms would potentially facilitate screening camps in low income group countries; thereby transforming the current health care paradigms by enabling rapid, automated diagnosis for diseases like cancer.
Resumo:
Among the multiple advantages and applications of remote sensing, one of the most important uses is to solve the problem of crop classification, i.e., differentiating between various crop types. Satellite images are a reliable source for investigating the temporal changes in crop cultivated areas. In this letter, we propose a novel bat algorithm (BA)-based clustering approach for solving crop type classification problems using a multispectral satellite image. The proposed partitional clustering algorithm is used to extract information in the form of optimal cluster centers from training samples. The extracted cluster centers are then validated on test samples. A real-time multispectral satellite image and one benchmark data set from the University of California, Irvine (UCI) repository are used to demonstrate the robustness of the proposed algorithm. The performance of the BA is compared with two other nature-inspired metaheuristic techniques, namely, genetic algorithm and particle swarm optimization. The performance is also compared with the existing hybrid approach such as the BA with K-means. From the results obtained, it can be concluded that the BA can be successfully applied to solve crop type classification problems.
Resumo:
Acoustic feature based speech (syllable) rate estimation and syllable nuclei detection are important problems in automatic speech recognition (ASR), computer assisted language learning (CALL) and fluency analysis. A typical solution for both the problems consists of two stages. The first stage involves computing a short-time feature contour such that most of the peaks of the contour correspond to the syllabic nuclei. In the second stage, the peaks corresponding to the syllable nuclei are detected. In this work, instead of the peak detection, we perform a mode-shape classification, which is formulated as a supervised binary classification problem - mode-shapes representing the syllabic nuclei as one class and remaining as the other. We use the temporal correlation and selected sub-band correlation (TCSSBC) feature contour and the mode-shapes in the TCSSBC feature contour are converted into a set of feature vectors using an interpolation technique. A support vector machine classifier is used for the classification. Experiments are performed separately using Switchboard, TIMIT and CTIMIT corpora in a five-fold cross validation setup. The average correlation coefficients for the syllable rate estimation turn out to be 0.6761, 0.6928 and 0.3604 for three corpora respectively, which outperform those obtained by the best of the existing peak detection techniques. Similarly, the average F-scores (syllable level) for the syllable nuclei detection are 0.8917, 0.8200 and 0.7637 for three corpora respectively. (C) 2016 Elsevier B.V. All rights reserved.
Resumo:
A significant cost in obtaining acoustic training data is the generation of accurate transcriptions. For some sources close-caption data is available. This allows the use of lightly-supervised training techniques. However, for some sources and languages close-caption is not available. In these cases unsupervised training techniques must be used. This paper examines the use of unsupervised techniques for discriminative training. In unsupervised training automatic transcriptions from a recognition system are used for training. As these transcriptions may be errorful data selection may be useful. Two forms of selection are described, one to remove non-target language shows, the other to remove segments with low confidence. Experiments were carried out on a Mandarin transcriptions task. Two types of test data were considered, Broadcast News (BN) and Broadcast Conversations (BC). Results show that the gains from unsupervised discriminative training are highly dependent on the accuracy of the automatic transcriptions. © 2007 IEEE.
Resumo:
Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features. Copyright © 2010 ACM.