989 results for Unsupervised classification


Relevance: 30.00%

Abstract:

This paper presents a comparative study of three closely related Bayesian models for unsupervised document-level sentiment classification: the latent sentiment model (LSM), the joint sentiment-topic (JST) model, and the Reverse-JST model. Extensive experiments were conducted on two corpora, the movie review dataset and the multi-domain sentiment dataset. While all three models achieve better or comparable performance on these corpora relative to existing unsupervised sentiment classification approaches, both JST and Reverse-JST are also able to extract sentiment-oriented topics. In addition, Reverse-JST always performs worse than JST, suggesting that the JST model is more appropriate for joint sentiment-topic detection.

Relevance: 30.00%

Abstract:

As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved great success in image information extraction. Traditionally, a CNN is trained by supervised learning on labeled data and used as a classifier by adding a classification layer at the end. Its ability to extract image features is limited largely by the difficulty of assembling a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm trains the CNN with unlabeled artificial images, which enables easy expansion of the training data and unsupervised learning. The CSAE algorithm is specifically designed for extracting complex features from particular objects such as Chinese characters. After the features of the artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and the CNN model is then fine-tuned on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and evaluated on a multilingual image dataset in which Chinese, English, and numeral text are labeled separately. A detection precision gain of more than 10% is observed over two CNN models.

Relevance: 30.00%

Abstract:

This research aims to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data, in order to improve the diagnostic process. Beckman-Coulter Corporation supplied flow cytometry data from numerous patients, which are used as training sets to exploit the different physiological characteristics of the samples provided. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and to provide medical doctors with diagnostic references for a specific disease state, leukemia. The results show that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm, the Density-based Adaptive Window Clustering (DAWC) algorithm, was designed to process large volumes of data and locate regions of high cluster density in real time. It reduces the computational load to roughly O(N) computations, making the algorithm more attractive and faster than current hierarchical algorithms.
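The abstract does not describe how DAWC works internally, so the following is not that algorithm. It is only a minimal sketch of why a density-based pass over the data can cost roughly O(N): each point is assigned to a fixed-width grid cell in constant time, and the most populated cell is reported. The function name `densest_cell` and the parameter `cell_width` are hypothetical and chosen for illustration.

```python
from collections import Counter


def densest_cell(points, cell_width):
    """Return (cell_index, count) for the most populated grid cell.

    Illustrative only: a fixed grid, not the adaptive window of DAWC.
    """
    counts = Counter()
    for p in points:
        # Floor division maps each coordinate to an integer cell index,
        # so every point is handled in constant time.
        cell = tuple(int(x // cell_width) for x in p)
        counts[cell] += 1
    # One more pass over the occupied cells finds the maximum.
    return counts.most_common(1)[0]


# Hypothetical 2-D measurements, e.g. two flow cytometry channels.
sample = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (5.5, 5.5), (-0.5, 0.5)]
cell, count = densest_cell(sample, cell_width=1.0)
print(cell, count)
```

A hierarchical method typically compares pairs of points or clusters, which grows faster than linearly with N; the single pass above avoids pairwise comparisons at the cost of a resolution fixed by `cell_width`.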

Relevance: 30.00%

Abstract:

This paper proposes a modification to the analytic hierarchy process (AHP) to select the most informative genes, which serve as inputs to an interval type-2 fuzzy logic system (IT2FLS) for cancer classification. Unlike the conventional AHP, the modified AHP can process quantitative factors, namely the ranking outcomes of individual gene selection methods including the t-test, entropy, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The IT2FLS is introduced for the classification task because of its ability to handle nonlinear, noisy, and outlier data, which are common problems in cancer microarray gene expression profiles. An unsupervised learning strategy using fuzzy c-means clustering is employed to initialize the parameters of the IT2FLS. Other classifiers, such as a multilayer perceptron network, a support vector machine, and fuzzy ARTMAP, are also implemented for comparison. Experiments are carried out on three well-known microarray datasets: diffuse large B-cell lymphoma, leukemia, and prostate cancer. Rather than traditional cross-validation, a leave-one-out cross-validation strategy is applied in the experiments. Results demonstrate the performance dominance of the IT2FLS over the competing classifiers. More notably, the modified AHP improves the classification performance not only of the IT2FLS but of all other classifiers as well. Accordingly, the proposed combination of the modified AHP and the IT2FLS is a powerful tool for cancer classification and can be implemented as a real clinical decision support system that is useful for medical practitioners.
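The abstract states that fuzzy c-means clustering initializes the IT2FLS parameters but does not say how the cluster output is mapped to those parameters. The sketch below therefore shows only the standard fuzzy c-means updates in plain Python; the function name `fuzzy_c_means`, its arguments, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships).

    points: list of equal-length tuples; m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Memberships follow from relative distances to the centers.
        shift = 0.0
        for i, p in enumerate(points):
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]
            new_row = [
                1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                          for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u


# Hypothetical toy data: two well-separated groups of expression values.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
print(sorted(centers))
```

In a setup like the one the abstract describes, the resulting centers and membership degrees would be a natural starting point for placing fuzzy membership functions, though the exact mapping used for the IT2FLS is not given in the source.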