43 results for Latent class model


Relevance:

30.00%

Publisher:

Abstract:

Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been little work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multi-dimensional sentiment distribution prediction. It proposes effective mechanisms to incorporate supervised information, such as labeled feature constraints and document-level sentiment distributions derived from the training data, into model learning. We have evaluated our approach on datasets collected from the confession section of the Experience Project website, where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distribution prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.

Relevance:

30.00%

Publisher:

Abstract:

We propose a novel framework where an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon. Preferences on expectations of sentiment labels of those lexicon words are expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.

Relevance:

30.00%

Publisher:

Abstract:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification, which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity, and the use of minimal prior information has also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
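The abstract does not state how relation strength is computed, so the following is only a minimal sketch of the general idea of scoring entity pairs by co-occurrence. It assumes a Jaccard ratio over documents as a stand-in measure; the function names `relation_strengths` and `related_entities` and the sample entities are hypothetical and are not taken from LRD.

```python
from collections import Counter
from itertools import combinations


def relation_strengths(documents):
    """Return {(a, b): strength} from per-document entity lists.

    Strength is the Jaccard ratio of document co-occurrence, an
    illustrative assumption and not the measure defined by LRD.
    """
    doc_freq = Counter()   # documents mentioning each entity
    pair_freq = Counter()  # documents mentioning both entities of a pair
    for entities in documents:
        unique = sorted(set(entities))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    strengths = {}
    for (a, b), both in pair_freq.items():
        either = doc_freq[a] + doc_freq[b] - both
        strengths[(a, b)] = both / either
    return strengths


def related_entities(target, strengths):
    """Rank entities by relation strength to the target, strongest first."""
    scored = []
    for (a, b), strength in strengths.items():
        if a == target:
            scored.append((b, strength))
        elif b == target:
            scored.append((a, strength))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical entity lists, one per document.
docs = [["alice", "acme"], ["alice", "acme", "apollo"], ["bob", "apollo"]]
strengths = relation_strengths(docs)
ranking = related_entities("apollo", strengths)
```

Here `ranking` lists the entities that co-occur with "apollo", ordered by the assumed strength; such scores could then be used to reweight document vectors, which is the role the abstract assigns to relation strength.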

Relevance:

30.00%

Publisher:

Abstract:

We study the dynamics of a growing crystalline facet where the growth mechanism is controlled by the geometry of the local curvature. A continuum model in (2+1) dimensions, developed in analogy with the Kardar-Parisi-Zhang (KPZ) model, is considered for the purpose. Following standard coarse-graining procedures, it is shown that in the large-time, long-distance limit, the continuum model predicts a curvature-independent KPZ phase, thereby suppressing all explicit effects of curvature and local pinning in the system, in the "perturbative" limit. A direct numerical integration of this growth equation, in 1+1 dimensions, supports this observation below a critical parametric range, above which generic instabilities, in the form of isolated pillared structures, lead to deviations from standard scaling behaviour. Possibilities of controlling this instability by introducing statistically "irrelevant" (in the sense of renormalisation groups) higher-order nonlinearities are also discussed.

Relevance:

30.00%

Publisher:

Abstract:

Cleavage by the proteasome is responsible for generating the C terminus of T-cell epitopes. Modeling the process of proteasome cleavage as part of a multi-step algorithm for T-cell epitope prediction will reduce the number of non-binders and increase the overall accuracy of the predictive algorithm. Quantitative matrix-based models for prediction of the proteasome cleavage sites in a protein were developed using a training set of 489 naturally processed T-cell epitopes (nonamer peptides) associated with HLA-A and HLA-B molecules. The models were validated using an external test set of 227 T-cell epitopes. The performance of the models was good, identifying 76% of the C-termini correctly. The best model of proteasome cleavage was incorporated as the first step in a three-step algorithm for T-cell epitope prediction, where subsequent steps predicted TAP affinity and MHC binding using previously derived models.
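As a rough illustration of how a quantitative matrix can be applied, the sketch below scores each sequence window by summing position-specific residue contributions and reports windows that reach a threshold. The matrix values, the window width, the threshold, and the convention that the last residue of the window is the predicted C terminus are all assumptions made for this example; none of them comes from the published models.

```python
def window_score(window, matrix, default=0.0):
    """Sum the position-specific contribution of each residue in the window."""
    return sum(matrix[pos].get(res, default) for pos, res in enumerate(window))


def predict_cleavage_sites(protein, matrix, threshold):
    """Return 0-based indices of residues predicted to be a C terminus.

    Every window of len(matrix) residues is scored; when the score is at
    or above the threshold, the index of the window's last residue is
    reported (an assumed convention).
    """
    width = len(matrix)
    sites = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        if window_score(window, matrix) >= threshold:
            sites.append(start + width - 1)
    return sites


# Toy two-position matrix with made-up contributions.
toy_matrix = [
    {"A": 0.25, "L": 0.5},
    {"L": 1.0, "K": -0.5},
]
sites = predict_cleavage_sites("ALLK", toy_matrix, threshold=1.0)
```

In a multi-step pipeline of the kind the abstract describes, peptides ending at the reported positions would be passed on to the later TAP and MHC binding steps.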

Relevance:

30.00%

Publisher:

Abstract:

The accurate identification of T-cell epitopes remains a principal goal of bioinformatics within immunology. As the immunogenicity of peptide epitopes is dependent on their binding to major histocompatibility complex (MHC) molecules, the prediction of binding affinity is a prerequisite to the reliable prediction of epitopes. The iterative self-consistent (ISC) partial-least-squares (PLS)-based additive method is a recently developed bioinformatic approach for predicting class II peptide−MHC binding affinity. The ISC−PLS method overcomes many of the conceptual difficulties inherent in the prediction of class II peptide−MHC affinity, such as the binding of a mixed population of peptide lengths due to the open-ended class II binding site. The method has applications in both the accurate prediction of class II epitopes and the manipulation of affinity for heteroclitic and competitor peptides. The method is applied here to six class II mouse alleles (I-Ab, I-Ad, I-Ak, I-As, I-Ed, and I-Ek) and includes peptides up to 25 amino acids in length. A series of regression equations highlighting the quantitative contributions of individual amino acids at each peptide position was established. The initial model for each allele exhibited only moderate predictivity. Once the set of selected peptide subsequences had converged, the final models exhibited a satisfactory predictive power. Convergence was reached between the 4th and 17th iterations, and the leave-one-out cross-validation statistical terms (q², SEP, and NC) ranged between 0.732 and 0.925, 0.418 and 0.816, and 1 and 6, respectively. The non-cross-validated statistical terms r² and SEE ranged between 0.98 and 0.995 and 0.089 and 0.180, respectively. The peptides used in this study are available from the AntiJen database (http://www.jenner.ac.uk/AntiJen). The PLS method is available commercially in the SYBYL molecular modeling software package. The resulting models, which can be used for accurate T-cell epitope prediction, will be made freely available online (http://www.jenner.ac.uk/MHCPred).

Relevance:

30.00%

Publisher:

Abstract:

A set of 38 epitopes and 183 non-epitopes, which bind to alleles of the HLA-A3 supertype, was subjected to a combination of comparative molecular similarity indices analysis (CoMSIA) and soft independent modeling of class analogy (SIMCA). During the process of T cell recognition, T cell receptors (TCR) interact with the central section of the bound nonamer peptide; thus only positions 4−8 were considered in the study. The derived model distinguished 82% of the epitopes and 73% of the non-epitopes after cross-validation in five groups. The overall preference from the model is for polar amino acids with high electron density and the ability to form hydrogen bonds. These so-called “aggressive” amino acids are flanked by small-sized residues, which enable such residues to protrude from the binding cleft and take an active role in TCR-mediated T cell recognition. Combinations of “aggressive” and “passive” amino acids in the middle part of epitopes constitute a putative TCR binding motif.

Relevance:

30.00%

Publisher:

Abstract:

Background - The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that out-performed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC was able to out-perform several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with considerable future potential.

Relevance:

30.00%

Publisher:

Abstract:

Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 peptides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets, which gave r_pred = 0.593 and r_pred = 0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401 and we compared our results with four other online predictive methods. The additive method showed the best result, finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to the DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 are also available in MHCPred.

Relevance:

30.00%

Publisher:

Abstract:

In a pilot project, an optimized mobile latent heat storage unit based on a commercially available system has been tested at the Fraunhofer Institute for Environmental, Safety and Energy Technology. Initially, trials were conducted with the aim of optimizing the process of charging and discharging. A specifically constructed test rig at the incineration trials centre at the institute allowed charging and discharging procedures of the mobile latent heat storage with adjustable parameters. In addition, an evaluation model was constructed to further optimize the heat exchanger systems. Finally, the prototype of the mobile latent heat storage was tested in practical operation. The economic and technical feasibility of heat transportation was shown for cases where otherwise unused waste heat is available. © 2014 The Authors.

Relevance:

30.00%

Publisher:

Abstract:

Markovian models are widely used to analyse quality-of-service properties of both system designs and deployed systems. Thanks to the emergence of probabilistic model checkers, this analysis can be performed with high accuracy. However, its usefulness is heavily dependent on how well the model captures the actual behaviour of the analysed system. Our work addresses this problem for a class of Markovian models termed discrete-time Markov chains (DTMCs). We propose a new Bayesian technique for learning the state transition probabilities of DTMCs based on observations of the modelled system. Unlike existing approaches, our technique weighs observations based on their age, to account for the fact that older observations are less relevant than more recent ones. A case study from the area of bioinformatics workflows demonstrates the effectiveness of the technique in scenarios where the model parameters change over time.
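The abstract does not give the weighting scheme, so the following is a minimal sketch of one way age-weighted Bayesian learning of DTMC transition probabilities could look. It assumes a symmetric Dirichlet prior per state and an exponential forgetting factor applied to a state's counts each time a new transition from that state is observed; the class name `AgeWeightedDTMCLearner` and the parameters `prior` and `decay` are hypothetical.

```python
class AgeWeightedDTMCLearner:
    """Estimate DTMC transition probabilities from observed transitions.

    Assumptions (not from the paper): symmetric Dirichlet prior with
    pseudo-count `prior`, and exponential down-weighting of older
    observations by a factor `decay` per new observation from a state.
    """

    def __init__(self, states, prior=1.0, decay=0.9):
        self.states = list(states)
        self.prior = prior
        self.decay = decay
        # Weighted transition counts, one row per source state.
        self.counts = {s: {t: 0.0 for t in self.states} for s in self.states}

    def observe(self, source, target):
        """Record one transition, ageing earlier ones from the same source."""
        row = self.counts[source]
        for t in row:
            row[t] *= self.decay
        row[target] += 1.0

    def transition_probabilities(self, source):
        """Posterior mean of the outgoing distribution for `source`."""
        row = self.counts[source]
        total = sum(row.values()) + self.prior * len(self.states)
        return {t: (row[t] + self.prior) / total for t in self.states}


# Two failures followed by a recent success, with and without ageing.
aged = AgeWeightedDTMCLearner(["ok", "fail"], prior=1.0, decay=0.5)
plain = AgeWeightedDTMCLearner(["ok", "fail"], prior=1.0, decay=1.0)
for learner in (aged, plain):
    learner.observe("ok", "fail")
    learner.observe("ok", "fail")
    learner.observe("ok", "ok")
p_aged = aged.transition_probabilities("ok")
p_plain = plain.transition_probabilities("ok")
```

With `decay=1.0` the estimate reduces to ordinary Dirichlet-multinomial updating; a smaller `decay` lets the most recent transition count for more, which is the behaviour the abstract motivates for systems whose parameters change over time.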

Relevance:

30.00%

Publisher:

Abstract:

We investigate a class of simple models for Langevin dynamics of turbulent flows, including the one-layer quasi-geostrophic equation and the two-dimensional Euler equations. Starting from a path integral representation of the transition probability, we compute the most probable fluctuation paths from one attractor to any state within its basin of attraction. We prove that such fluctuation paths are the time reversed trajectories of the relaxation paths for a corresponding dual dynamics, which are also within the framework of quasi-geostrophic Langevin dynamics. Cases with or without detailed balance are studied. We discuss a specific example for which the stationary measure displays either a second order (continuous) or a first order (discontinuous) phase transition and a tricritical point. In situations where a first order phase transition is observed, the dynamics are bistable. Then, the transition paths between two coexisting attractors are instantons (fluctuation paths from an attractor to a saddle), which are related to the relaxation paths of the corresponding dual dynamics. For this example, we show how one can analytically determine the instantons and compute the transition probabilities for rare transitions between two attractors.