939 results for Supervised classifier


Relevance:

20.00% 20.00%

Publisher:

Abstract:

This article presents two novel approaches for incorporating sentiment prior knowledge into the topic model for weakly supervised sentiment analysis, where sentiment labels are treated as topics. One modifies the Dirichlet prior for the topic-word distribution (LDA-DP); the other augments the model objective function with terms that express preferences on expectations of sentiment labels of the lexicon words using generalized expectation criteria (LDA-GE). We conducted extensive experiments on English movie review data and the multi-domain sentiment dataset, as well as Chinese product reviews about mobile phones, digital cameras, MP3 players, and monitors. The results show that while both LDA-DP and LDA-GE perform comparably to existing weakly supervised sentiment classification algorithms, they are much simpler and more computationally efficient, rendering them more suitable for online and real-time sentiment classification on the Web. We observed that LDA-GE is more effective than LDA-DP, suggesting that it should be preferred when employing the topic model for sentiment analysis. Moreover, both models are able to extract highly domain-salient polarity words from text.
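The idea of modifying the Dirichlet prior with a sentiment lexicon can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_sentiment_prior`, the base value 0.01, and the 0.9 share given to the matching label are all assumptions chosen for illustration. Each sentiment label acts as a topic, and a lexicon word keeps most of its prior mass under its own label while being down-weighted under the others.

```python
def build_sentiment_prior(vocab, lexicon, labels, base=0.01, strong=0.9):
    """Build an asymmetric topic-word Dirichlet prior from a sentiment lexicon.

    vocab   : list of words
    lexicon : dict mapping a word to its sentiment label (hypothetical input)
    labels  : list of sentiment labels, each treated as a topic
    base    : symmetric prior value for words outside the lexicon (assumed)
    strong  : share of the prior kept under the matching label (assumed)
    """
    # The remaining share is split evenly over the non-matching labels.
    weak = (1.0 - strong) / (len(labels) - 1)
    prior = {}
    for label in labels:
        row = []
        for word in vocab:
            polarity = lexicon.get(word)
            if polarity is None:
                row.append(base)            # no prior knowledge for this word
            elif polarity == label:
                row.append(base * strong)   # word agrees with this label
            else:
                row.append(base * weak)     # word disagrees with this label
        prior[label] = row
    return prior


vocab = ["good", "bad", "phone"]
lexicon = {"good": "positive", "bad": "negative"}
prior = build_sentiment_prior(vocab, lexicon, ["positive", "negative"])
```

The resulting rows could be passed as the per-topic word prior to any LDA sampler that accepts an asymmetric prior; which sampler to use is left open here.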

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly supervised sentiment classification methods despite using no labeled documents.
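The pseudo-labeling step described above can be sketched as follows. This is only an illustration under stated assumptions: the helper `select_pseudo_labeled`, the 0.9 confidence threshold, and the dictionary format for class probabilities are hypothetical and are not taken from the paper.

```python
def select_pseudo_labeled(docs, class_probs, threshold=0.9):
    """Keep documents whose most likely class reaches the confidence threshold.

    docs        : list of document identifiers or texts
    class_probs : list of dicts mapping class label to predicted probability
    threshold   : minimum probability for a document to be kept (assumed value)
    """
    selected = []
    for doc, probs in zip(docs, class_probs):
        # Pick the class with the highest predicted probability.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((doc, label))
    return selected


docs = ["d1", "d2"]
class_probs = [{"pos": 0.95, "neg": 0.05}, {"pos": 0.6, "neg": 0.4}]
pseudo = select_pseudo_labeled(docs, class_probs)
```

Only the confidently classified documents would then feed the estimation of word-class distributions for the second classifier.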

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Web APIs have gained increasing popularity in recent Web service technology development owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentation on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting Web API documentation as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines, namely naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Social streams have proven to be the most up-to-date and inclusive source of information on current events. In this paper we propose a novel probabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and the extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training; instead, it only needs the incorporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words, based on the intuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words are words whose usage is more topically diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared to a few competitive baselines.
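The intuition that low entropy words are more informative can be made concrete with a small sketch. It uses plain Shannon entropy over a word's counts per category, which is an assumption made for illustration: the abstract does not spell out the exact relative entropy formula, and the names `word_entropy` and `rank_by_entropy` are hypothetical.

```python
import math


def word_entropy(counts):
    """Shannon entropy (in bits) of a word's occurrence counts across categories."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing to the entropy and are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def rank_by_entropy(word_counts):
    """Return words ordered from lowest entropy (most informative) to highest."""
    return sorted(word_counts, key=lambda w: word_entropy(word_counts[w]))


# Hypothetical counts of each word in [violent, non-violent] documents.
word_counts = {"riot": [10, 0], "today": [5, 5], "police": [6, 2]}
ranking = rank_by_entropy(word_counts)
```

A word concentrated in one category, such as "riot" in this toy data, gets entropy 0 and ranks first, while an evenly spread word such as "today" gets the maximum of 1 bit and ranks last.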

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The early stages of dieting to lose weight have been associated with neuro-psychological impairments. Previous work has not elucidated whether these impairments are a function solely of unsupported or supported dieting. Raised cortico-steroid levels have been implicated as a possible causal mechanism. Healthy, overweight, pre-menopausal women were randomised to one of three conditions in which they dieted either as part of a commercially available weight loss group, dieted without any group support or acted as non-dieting controls for 8 weeks. Testing occurred at baseline and at 1, 4 and 8 weeks post baseline. During each session, participants completed measures of simple reaction time, motor speed, vigilance, immediate verbal recall, visuo-spatial processing and (at Week 1 only) executive function. Cortisol levels were gathered at the beginning and 30 min into each test session, via saliva samples. Also, food intake was self-recorded prior to each session and fasting body weight and percentage body fat were measured at each session. Participants in the unsupported diet condition displayed poorer vigilance performance (p=0.001) and impaired executive planning function (p=0.013) (along with a marginally significant trend for poorer visual recall (p=0.089)) after 1 week of dieting. No such impairments were observed in the other two groups. In addition, the unsupported dieters experienced a significant rise in salivary cortisol levels after 1 week of dieting (p<0.001). Both dieting groups lost roughly the same amount of body mass (p=0.011) over the course of the 8 weeks of dieting, although only the unsupported dieters experienced a significant drop in percentage body fat over the course of dieting (p=0.016). The precise causal nature of the relationship between stress, cortisol, unsupported dieting and cognitive function is, however, uncertain and should be the focus of further research. © 2005 Elsevier Ltd. All rights reserved.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Natural language understanding aims to specify a computational model that maps sentences to their semantic meaning representations. In this paper, we propose a novel framework to train statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied to two statistical models, conditional random fields (CRFs) and hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA Communicator data show that both CRFs and HM-SVMs outperform the baseline approach, the previously proposed hidden vector state (HVS) model, which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance to two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with relative error reduction rates of about 25% and 15%, respectively, being achieved in F-measure.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper aims at the development of procedures and algorithms for applying artificial intelligence tools to acquire, process, and analyze various types of knowledge. The proposed environment integrates techniques of knowledge and decision process modeling such as neural networks and fuzzy logic-based reasoning methods. The problem of identifying complex processes with the use of neuro-fuzzy systems is solved. The proposed classifier has been successfully applied to building a decision support system for solving a managerial problem.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Report published in the Proceedings of the National Conference on "Education in the Information Society", Plovdiv, May, 2013

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Resource discovery is one of the key services in digitised cultural heritage collections. It requires intelligent mining of heterogeneous digital content as well as the ability to perform at large scale; this explains the recent advances in classification methods. Associative classifiers are convenient data mining tools in the field of cultural heritage because they can take into account specific combinations of attribute values. Usually, associative classifiers prioritize support over confidence. The proposed classifier PGN questions this common approach and focuses on confidence first by retaining only 100% confidence rules. Classification tasks in the field of cultural heritage usually deal with data sets with many class labels. This variety is caused by the richness of the culture accumulated over the centuries. Comparisons of the PGN classifier with other classifiers, such as OneR, JRip and J48, show the competitiveness of PGN in recognizing multi-class datasets on collections of masterpieces from different West and East European fine art authors and movements.
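Retaining only 100% confidence rules can be sketched as below. This is a simplified illustration and not the PGN algorithm itself: the function `full_confidence_rules`, the brute-force enumeration of attribute-value combinations, and the `max_len` limit are assumptions. A pattern has 100% confidence when every training row containing it carries the same class label.

```python
from collections import defaultdict
from itertools import combinations


def full_confidence_rules(rows, labels, max_len=2):
    """Return {pattern: (class_label, support)} for rules with 100% confidence.

    rows    : list of dicts mapping attribute name to value
    labels  : class label of each row
    max_len : largest number of attribute-value pairs in a pattern (assumed)
    """
    class_counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        items = sorted(row.items())
        for size in range(1, max_len + 1):
            for pattern in combinations(items, size):
                class_counts[pattern][label] += 1

    rules = {}
    for pattern, counts in class_counts.items():
        # Exactly one class observed with this pattern means confidence is 100%.
        if len(counts) == 1:
            (label, support), = counts.items()
            rules[pattern] = (label, support)
    return rules


rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "round"},
]
labels = ["A", "B", "A"]
rules = full_confidence_rules(rows, labels)
```

In this toy data the pattern `color=red` is dropped because it appears with both classes, while `shape=round` is kept with support 2 for class A. Support is recorded only as a secondary value, reflecting the confidence-first ordering described above.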

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we present one approach for extending the learning set of a classification algorithm with additional metadata. It is used as a base for giving appropriate names to found regularities. The analysis of correspondence between connections established in the attribute space and existing links between concepts can be used as a test for creation of an adequate model of the observed world. Meta-PGN classifier is suggested as a possible tool for establishing these connections. Applying this approach in the field of content-based image retrieval of art paintings provides a tool for extracting specific feature combinations, which represent different sides of artists' styles, periods and movements.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Graph-based representations have been used with considerable success in computer vision in the abstraction and recognition of object shape and scene structure. Despite this, the methodology available for learning structural representations from sets of training examples is relatively limited. In this paper we take a simple yet effective Bayesian approach to attributed graph learning. We present a naïve node-observation model, where we make the important assumption that the observation of each node and each edge is independent of the others. We then propose an EM-like approach to learn a mixture of these models and a Minimum Message Length criterion for component selection. Moreover, in order to avoid the bias that could arise with a single estimation of the node correspondences, we estimate the sampling probability over all the possible matches. Finally, we show the utility of the proposed approach on popular computer vision tasks such as 2D and 3D shape recognition. © 2011 Springer-Verlag.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

ACM Computing Classification System (1998): H.2.8, H.3.3.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014

Relevance:

20.00% 20.00%

Publisher:

Abstract:

As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved huge success in image information extraction. Traditionally, a CNN is trained by a supervised learning method with labeled data and used as a classifier by adding a classification layer at the end. Its capability of extracting image features is largely limited due to the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and then the CNN model is fine-trained on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated with a multilingual image dataset, which labels Chinese, English and numeral texts separately. A detection precision gain of more than 10% is observed over two CNN models.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method for maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space.
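The dynamic stopword list built from terms appearing only once in the corpus can be sketched as follows. The function names and the pre-tokenized input format are assumptions made for this illustration; tokenization of real tweets is left out.

```python
from collections import Counter


def singleton_stopwords(tokenized_docs):
    """Return the set of terms that occur exactly once in the whole corpus."""
    freq = Counter(token for doc in tokenized_docs for token in doc)
    return {token for token, count in freq.items() if count == 1}


def remove_stopwords(tokenized_docs, stopwords):
    """Drop every token that is in the stopword set, keeping document order."""
    return [[t for t in doc if t not in stopwords] for doc in tokenized_docs]


# Hypothetical, already tokenized tweets.
tweets = [["great", "phone", "lol"], ["great", "battery", "phone"]]
stopwords = singleton_stopwords(tweets)
cleaned = remove_stopwords(tweets, stopwords)
```

On this toy corpus the feature space shrinks from four distinct terms to two, which mirrors the reduction in sparsity and feature space size that the abstract reports for this method.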