28 results for binary to multi-class classifiers
in Aston University Research Archive
Abstract:
Sentiment analysis has long focused on binary classification of text as either positive or negative. There has been little work on mapping sentiments or emotions into multiple dimensions. This paper studies a Bayesian modeling approach to multi-class sentiment classification and multi-dimensional sentiment distribution prediction. It proposes effective mechanisms to incorporate supervised information, such as labeled feature constraints and document-level sentiment distributions derived from the training data, into model learning. We have evaluated our approach on datasets collected from the confession section of the Experience Project website, where people share their life experiences and personal stories. Our results show that using the latent representation of the training documents derived from our approach as features to build a maximum entropy classifier outperforms other approaches on multi-class sentiment classification. In the more difficult task of multi-dimensional sentiment distribution prediction, our approach gives superior performance compared to a few competitive baselines. © 2012 ACM.
Abstract:
Two algorithms based on Bayesian Networks (BNs) for bacterial subcellular location prediction are explored in this paper: one predicts all locations for Gram+ bacteria and the other all locations for Gram- bacteria. Methods were evaluated using different numbers of residues (from the N-terminal 10 residues to the whole sequence) and different residue representations (amino acid composition, percentage amino acid composition or normalised amino acid composition). The accuracy of the best resulting BN was compared to PSORTB. The accuracy of this multi-location BN was roughly comparable to PSORTB; the difference in predictions is low, often less than 2%. The BN method thus represents both an important new avenue of methodological development for subcellular location prediction and a potentially valuable new tool for candidate subunit vaccine selection.
Combinatorial approach to multi-substituted 1,4-Benzodiazepines as novel non-peptide CCK-antagonists
Abstract:
For the drug discovery process, a library of 168 multisubstituted 1,4-benzodiazepines was prepared by a 5-step solid-phase combinatorial approach. Substituents were varied in the 3-, 5-, 7- and 8-positions on the benzodiazepine scaffold. The combinatorial library was evaluated in a CCK radiolabelled binding assay, and CCKA (alimentary) and CCKB (brain) selective lead structures were discovered. The template of CCKA selective 1,4-benzodiazepin-2-ones bearing the tryptophan moiety was chemically modified by selective alkylation and acylation reactions. These studies provided a series of analogues of the naturally occurring Asperlicin. The fully optimised Asperlicin-related compound possessed CCKA activity similar to that of the naturally occurring compound. 3-Alkylated 1,4-benzodiazepines with selectivity towards the CCKB receptor subtype were optimised on A) the lipophilic side chain and B) the 2-aminophenyl-ketone moiety, together with some stereochemical changes. A C3 unit in the 3-position of 1,4-benzodiazepines possessed CCKB activity within the nanomolar range. Further SAR optimisation on the N1-position by selective alkylation resulted in improved CCKB binding with potentially decreased activity on the GABAA/benzodiazepine receptor complex. The in vivo studies revealed two N1-alkylated compounds containing unsaturated alkyl groups with anxiolytic properties. Alternative chemical approaches have been developed, including a route that is suitable for scale-up of the desired target molecule in order to provide sufficient quantities for further in vivo evaluation.
Abstract:
We have investigated how optimal coding for neural systems changes with the time available for decoding. Optimization was in terms of maximizing information transmission. We have estimated the parameters for Poisson neurons that optimize Shannon transinformation under the assumption of rate coding. We observed a hierarchy of phase transitions from binary coding, for small decoding times, toward discrete (M-ary) coding with two, three and more quantization levels for larger decoding times. We postulate that the presence of subpopulations with specific neural characteristics could be a signature of an optimal population coding scheme, and we use the mammalian auditory system as an example.
Abstract:
Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, interest has shifted towards the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations: they involve more than two arguments and may also take another relation as an argument. In this paper, we conduct a thorough survey of the research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, with a focus on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges that we are facing with complex relation extraction and outline possible solutions and future directions.
Abstract:
We consider the problem of assigning an input vector $\mathbf{x}$ to one of $m$ classes by predicting $P(c \mid \mathbf{x})$ for $c = 1, \ldots, m$. For a two-class problem, the probability of class 1 given $\mathbf{x}$ is estimated by $s(y(\mathbf{x}))$, where $s(y) = 1/(1 + e^{-y})$. A Gaussian process prior is placed on $y(\mathbf{x})$, and is combined with the training data to obtain predictions for new $\mathbf{x}$ points. We provide a Bayesian treatment, integrating over uncertainty in $y$ and in the parameters that control the Gaussian process prior; the necessary integration over $y$ is carried out using Laplace's approximation. The method is generalized to multi-class problems ($m > 2$) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
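As a small illustration of the two link functions named in this abstract, the sketch below implements the logistic function s(y) and the softmax in plain Python. It covers only the mapping from activations to class probabilities; the Gaussian process prior and the Laplace approximation are not shown. The function names `sigmoid` and `softmax` are chosen here for illustration and do not come from the paper.

```python
import math


def sigmoid(y):
    """Logistic function s(y) = 1 / (1 + exp(-y)), written to avoid overflow."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)  # y < 0, so exp(y) cannot overflow
    return z / (1.0 + z)


def softmax(ys):
    """Map m real-valued activations to m class probabilities that sum to 1."""
    shift = max(ys)  # subtracting the maximum leaves the result unchanged
    exps = [math.exp(y - shift) for y in ys]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Two-class case: probability of class 1 for activation y = 1.3.
    print(sigmoid(1.3))
    # Multi-class case (m = 3): one activation per class.
    print(softmax([1.0, 2.0, 3.0]))
```

With two classes and activations (y, 0), the softmax probability of the first class equals s(y), which is one way to see the multi-class form as a generalization of the two-class one.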
Abstract:
This thesis presents a thorough and principled investigation into the application of artificial neural networks to the biological monitoring of freshwater. It contains original ideas on the classification and interpretation of benthic macroinvertebrates, and aims to demonstrate their superiority over the biotic systems currently used in the UK to report river water quality. The conceptual basis of a new biological classification system is described, and a full review and analysis of a number of river data sets is presented. The biological classification is compared to the common biotic systems using data from the Upper Trent catchment. This data contained 292 expertly classified invertebrate samples identified to mixed taxonomic levels. The neural network experimental work concentrates on the classification of the invertebrate samples into biological class, where only a subset of the sample is used to form the classification. Other experimentation is conducted into the identification of novel input samples, the classification of samples from different biotopes and the use of prior information in the neural network models. The biological classification is shown to provide an intuitive interpretation of a graphical representation, generated without reference to the class labels, of the Upper Trent data. The selection of key indicator taxa is considered using three different approaches; one novel, one from information theory and one from classical statistical methods. Good indicators of quality class based on these analyses are found to be in good agreement with those chosen by a domain expert. The change in information associated with different levels of identification and enumeration of taxa is quantified. The feasibility of using neural network classifiers and predictors to develop numeric criteria for the biological assessment of sediment contamination in the Great Lakes is also investigated.
Abstract:
In this paper, the problem of semantic place categorization in mobile robotics is addressed by considering a time-based probabilistic approach called the dynamic Bayesian mixture model (DBMM), which is an improved variation of the dynamic Bayesian network. More specifically, multi-class semantic classification is performed by a DBMM composed of a mixture of heterogeneous base classifiers, using geometrical features computed from 2D laser scanner data, where the sensor is mounted on board a moving robot operating indoors. Besides its capability to combine different probabilistic classifiers, the DBMM approach also incorporates time-based (dynamic) inferences in the form of previous class-conditional probabilities and priors. Extensive experiments were carried out on publicly available benchmark datasets, highlighting the influence of the number of time slices and the effect of additive smoothing on the classification performance of the proposed approach. Reported results, under different scenarios and conditions, show the effectiveness and competitive performance of the DBMM.
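The abstract does not give the DBMM update rule, so the following is only a rough sketch of the general idea it describes: a weighted mixture of base-classifier outputs combined with the previous time slice's posterior as a prior. The function `dbmm_step`, its parameters, the single-time-slice simplification and the way the smoothing constant `alpha` is applied are all assumptions made for illustration, not the authors' formulation.

```python
def dbmm_step(prev_posterior, base_outputs, weights, alpha=1e-3):
    """One hypothetical update over classes for a single time slice.

    prev_posterior: class probabilities from the previous time slice (the prior).
    base_outputs:   one probability vector per base classifier.
    weights:        one mixture weight per base classifier.
    alpha:          small additive constant so that no class is driven to zero.
    """
    n_classes = len(prev_posterior)
    # Weighted mixture of the base classifiers' class probabilities.
    mixture = [
        sum(w * out[c] for w, out in zip(weights, base_outputs))
        for c in range(n_classes)
    ]
    # Combine prior and mixture, then normalise to a probability vector.
    unnorm = [
        (prev_posterior[c] + alpha) * (mixture[c] + alpha)
        for c in range(n_classes)
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]


if __name__ == "__main__":
    prior = [0.5, 0.5]
    outputs = [[0.2, 0.8], [0.4, 0.6]]  # two base classifiers, two classes
    print(dbmm_step(prior, outputs, weights=[0.5, 0.5]))
```

Feeding each returned posterior back in as `prev_posterior` for the next scan gives a simple temporal filter; a longer history of time slices would need a different prior term.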
Abstract:
It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.
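The form of regularized error described in this abstract can be sketched as follows for a sum-of-squares error. The symbols are assumptions introduced here, not taken from the abstract: $y_k$ is the $k$-th network output, $x_i$ the $i$-th input, $\eta^2$ the noise variance, and the sum runs over $N$ training inputs $\mathbf{x}_n$. Constant factors may differ from those in the paper.

```latex
% Sketch under stated assumptions: small input noise of variance \eta^2
% adds a penalty on the first derivatives of the network mapping.
\tilde{E} = E + \eta^{2}\,\Omega,
\qquad
\Omega = \frac{1}{2N} \sum_{n=1}^{N} \sum_{k} \sum_{i}
  \left( \left. \frac{\partial y_k}{\partial x_i} \right|_{\mathbf{x}_n} \right)^{2}
```

Each term of $\Omega$ is a square, so the penalty is non-negative and involves only first derivatives of the network mapping, in line with the properties the abstract states for the reduced regularization term.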
Abstract:
Modelling class B G-protein-coupled receptors (GPCRs) using class A GPCR structural templates is difficult due to lack of homology. The plant GPCR, GCR1, has homology to both class A and class B GPCRs. We have used this to generate a class A-class B alignment, and by incorporating maximum lagged correlation of entropy and hydrophobicity into a consensus score, we have been able to align receptor transmembrane regions. We have applied this analysis to generate active and inactive homology models of the class B calcitonin gene-related peptide (CGRP) receptor, and have supported it with site-directed mutagenesis data using 122 CGRP receptor residues and 144 published mutagenesis results on other class B GPCRs. The variation of sequence variability with structure, the analysis of polarity violations, the alignment of group-conserved residues and the mutagenesis results at 27 key positions were particularly informative in distinguishing between the proposed and plausible alternative alignments. Furthermore, we have been able to associate the key molecular features of the class B GPCR signalling machinery with their class A counterparts for the first time. These include the [K/R]KLH motif in intracellular loop 1, [I/L]xxxL and KxxK at the intracellular end of TM5 and TM6, the NPXXY/VAVLY motif on TM7 and small group-conserved residues in TM1, TM2, TM3 and TM7. The equivalent of the class A DRY motif is proposed to involve Arg(2.39), His(2.43) and Glu(3.46), which makes a polar lock with T(6.37). These alignments and models provide useful tools for understanding class B GPCR function.
Abstract:
The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.
Abstract:
Effective clinical decision making depends upon identifying possible outcomes for a patient, selecting relevant cues, and processing the cues to arrive at accurate judgements of each outcome's probability of occurrence. These activities can be considered as classification tasks. This paper describes a new model of psychological classification that explains how people use cues to determine class or outcome likelihoods. It proposes that clinicians respond to conditional probabilities of outcomes given cues and that these probabilities compete with each other for influence on classification. The model explains why people appear to respond to base rates inappropriately, thereby overestimating the occurrence of rare categories, and a clinical example is provided for predicting suicide risk. The model makes an effective representation for expert clinical judgements and its psychological validity enables it to generate explanations in a form that is comprehensible to clinicians. It is a strong candidate for incorporation within a decision support system for mental-health risk assessment, where it can link with statistical and pattern recognition tools applied to a database of patients. The symbiotic combination of empirical evidence and clinical expertise can provide an important web-based resource for risk assessment, including multi-disciplinary education and training. © 2002 Informa UK Ltd All rights reserved.
Abstract:
Background - MHC Class I molecules present antigenic peptides to cytotoxic T cells, which forms an integral part of the adaptive immune response. Peptides are bound within a groove formed by the MHC heavy chain. Previous approaches to MHC Class I-peptide binding prediction have largely concentrated on the peptide anchor residues located at the P2 and C-terminus positions. Results - A large dataset comprising MHC-peptide structural complexes was created by re-modelling pre-determined x-ray crystallographic structures. Static energetic analysis, following energy minimisation, was performed on the dataset in order to characterise interactions between bound peptides and the MHC Class I molecule, partitioning the interactions within the groove into van der Waals, electrostatic and total non-bonded energy contributions. Conclusion - The QSAR techniques of Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS) algorithms were used to identify key interactions between the two molecules by comparing the calculated energy values with experimentally-determined BL50 data. Although the peptide termini binding interactions help ensure the stability of the MHC Class I-peptide complex, the central region of the peptide is also important in defining the specificity of the interaction. As thermodynamic studies indicate that peptide association and dissociation may be driven entropically, it may be necessary to incorporate entropic contributions into future calculations.