27 results for latent semantic analysis
in Aston University Research Archive
Abstract:
Summary writing is an important part of many English Language Examinations. As grading students' summary writings is a very time-consuming task, computer-assisted assessment will help teachers carry out the grading more effectively. Several techniques such as latent semantic analysis (LSA), n-gram co-occurrence and BLEU have been proposed to support automatic evaluation of summaries. However, their performance is not satisfactory for assessing summary writings. To improve the performance, this paper proposes an ensemble approach that integrates LSA and n-gram co-occurrence. As a result, the proposed ensemble approach is able to achieve high accuracy and improve the performance quite substantially compared with current techniques. A summary assessment system based on the proposed approach has also been developed.
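As an illustration of the n-gram co-occurrence idea mentioned above, here is a minimal sketch that scores a student summary by the fraction of reference n-grams it reproduces. The function names, the whitespace tokenisation, and the weighted-average `ensemble_score` are assumptions made for illustration; the abstract does not say how the paper combines LSA with n-gram co-occurrence.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams found in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_cooccurrence(candidate, reference, n=2):
    """Fraction of reference n-grams that also occur in the candidate.

    Counts are clipped, so a repeated n-gram in the reference is only
    matched as often as it appears in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def ensemble_score(ngram_score, semantic_score, weight=0.5):
    """Hypothetical ensemble: weighted average of two scores in [0, 1].

    `semantic_score` stands in for an LSA-based similarity computed elsewhere.
    """
    return weight * ngram_score + (1 - weight) * semantic_score
```

For example, `ngram_cooccurrence("the cat sat on the mat", "the cat sat on a mat")` matches three of the five reference bigrams.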
Abstract:
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches employ only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context, and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.
Abstract:
Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor does it require deep analyses of the syntactic structure of sentences in tweets. We evaluate our approach with tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets by 2.19% at the tweet-level and 7.5% at the entity-level in average F-measure.
Abstract:
This paper proposes a semantic analysis of the French free-choice indefinite 'n’importe qui'. The semantics of the indefinite is organised as a ternary structure. The (1) abstract meaning underlies all uses of the item and acts as a principle of creative interpretation generation and comprehension. This principle is actualised via (2) discrete contextual features through to (3) contextual interpretations. Thus, the “existential” reading of 'n’importe qui' is derived by a veridical reading of the arbitrary selection of a qualitatively-marked occurrence from the set of human animates. The derivation of contextual readings from the enrichment by contextual cues of an underspecified meaning has a claim to an explanatory model of the semantics of grammatical polysemous items, and is certainly relevant to model-theoretic approaches inasmuch as formal semantic notions are intricately linked to the contextual interpretation of items. It is not 'n’importe qui' itself, but its contextual interpretations which may be weak or strong, and a homonymous treatment is not possible given the continuity of the quality and free-choice dimensions from one observed reading of 'n’importe qui' to the next.
Abstract:
This paper examines the status of scalarity in the analysis of the meaning of the English determiner any. The latter’s position as a prime exemplar of the category of polarity-sensitive items has led it to be generally assumed to have scalar meaning. Scalar effects are absent however from a number of common uses of this word. This suggests that any does not involve scales as part of its core meaning, but produces them as a derived interpretative property. The role of three factors in the derivation of the expressive effect of scalarity is explored: grammatical number, stress and the presence of gradable concepts in the NP. The general conclusions point to the importance of developing a causal semantic analysis in which the contributions of each of the various meaningful components of an utterance to the overall message expressed are carefully distinguished.
Abstract:
Bove, Pervan, Beatty, and Shiu [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] develop and test a latent variable model of the role of service workers in encouraging customers' organizational citizenship behaviors. However, Bove et al. [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] claim support for hypothesized relationships between constructs that, due to insufficient discriminant validity regarding certain constructs, may be inaccurate. This research comment discusses what discriminant validity represents, procedures for establishing discriminant validity, and presents an example of inaccurate discriminant validity assessment based upon the work of Bove et al. [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.]. Solutions to discriminant validity problems and a five-step procedure for assessing discriminant validity then conclude the paper. This comment hopes to motivate a review of discriminant validity issues and offers assistance to future researchers conducting latent variable analysis.
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY WITH PRIOR ARRANGEMENT
Abstract:
Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the public's feelings towards their brand, business, directors, etc. A wide range of features and methods for training sentiment classifiers for Twitter datasets have been researched in recent years with varying results. In this paper, we introduce a novel approach of adding semantics as additional features into the training set for sentiment analysis. For each extracted entity (e.g. iPhone) from tweets, we add its semantic concept (e.g. Apple product) as an additional feature, and measure the correlation of the representative concept with negative/positive sentiment. We apply this approach to predict sentiment for three different Twitter datasets. Our results show an average increase of F harmonic accuracy score for identifying both negative and positive sentiment of around 6.5% and 4.8% over the baselines of unigrams and part-of-speech features respectively. We also compare against an approach based on sentiment-bearing topic analysis, and find that semantic features produce better Recall and F score when classifying negative sentiment, and better Precision with lower Recall and F score in positive sentiment classification.
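The feature augmentation described above (adding an entity's semantic concept alongside the original tokens) can be sketched roughly as follows. The lookup table `ENTITY_CONCEPTS`, the `CONCEPT=` feature prefix, and the function name are hypothetical; the abstract does not describe how entities are extracted or how concepts are encoded.

```python
# Hypothetical entity-to-concept lookup. In the paper the concepts come from
# an entity extraction step whose details are not given in the abstract.
ENTITY_CONCEPTS = {
    "iphone": "Apple product",
    "ipad": "Apple product",
}


def augment_with_concepts(tokens, concepts=ENTITY_CONCEPTS):
    """Return the original tokens plus one concept feature per known entity."""
    features = list(tokens)
    for token in tokens:
        concept = concepts.get(token.lower())
        if concept is not None:
            # The prefix keeps concept features distinct from ordinary unigrams.
            features.append("CONCEPT=" + concept)
    return features
```

The augmented feature list can then be passed to whichever classifier is trained on the unigram features, so that tweets mentioning different entities of the same concept share a feature.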
Abstract:
Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words is marked with fixed sentiment polarities. However, words' sentiment orientation (positive, neutral, negative) and/or sentiment strengths could change depending on context and targeted entities. In this paper we present SentiCircle, a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when comparing to state-of-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure. © 2014 Springer International Publishing.
Abstract:
Visualization has proven to be a powerful and widely-applicable tool for the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.
Abstract:
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based upon a probability model. In this paper we demonstrate how the principal axes of a set of observed data vectors may be determined through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss the advantages conveyed by the definition of a probability density function for PCA.
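The abstract does not write out the latent variable model. For orientation, the standard formulation of probabilistic PCA is summarised here; the notation ($\mathbf{t}$ for a $d$-dimensional observation, $\mathbf{x}$ for the $q$-dimensional latent variable, $\mathbf{W}$, $\boldsymbol{\mu}$, $\sigma^2$) is an assumption and may differ from the paper's.

```latex
\begin{align}
  \mathbf{t} &= \mathbf{W}\mathbf{x} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
  \qquad \mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q),
  \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_d), \\
  \mathbf{t} &\sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C}),
  \qquad \mathbf{C} = \mathbf{W}\mathbf{W}^{\mathsf{T}} + \sigma^2 \mathbf{I}_d, \\
  \mathbf{W}_{\mathrm{ML}} &= \mathbf{U}_q \left(\boldsymbol{\Lambda}_q - \sigma^2_{\mathrm{ML}} \mathbf{I}_q\right)^{1/2} \mathbf{R},
  \qquad \sigma^2_{\mathrm{ML}} = \frac{1}{d - q} \sum_{j=q+1}^{d} \lambda_j .
\end{align}
```

Here $\mathbf{U}_q$ holds the $q$ principal eigenvectors of the sample covariance matrix, $\boldsymbol{\Lambda}_q$ is the diagonal matrix of the corresponding eigenvalues $\lambda_1 \ge \dots \ge \lambda_q$, and $\mathbf{R}$ is an arbitrary orthogonal rotation. The isotropic noise term $\sigma^2 \mathbf{I}_d$ is what distinguishes the model from general factor analysis, and the maximum-likelihood noise variance is the average of the discarded eigenvalues.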
Abstract:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.
The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
Abstract:
This work explores the relevance of semantic and linguistic description to translation, theory and practice. It is aimed towards a practical model of approach to texts to translate. As literary texts [poetry mainly] are the focus of attention, so are stylistic matters. Note, however, that 'style', and, to some extent, the conclusions of the work, are not limited to so-called literary texts. The study of semantic description reveals that most translation problems do not stem from the cognitive (langue-related), but rather from the contextual (parole-related) aspects of meaning. Thus, any linguistic model that fails to account for the latter is bound to fall short. T.G.G. does, whereas Systemics, concerned with both the 'langue' and 'parole' (stylistic and sociolinguistic mainly) aspects of meaning, provides a useful framework of approach to texts to translate. Two essential semantic principles for translation are: that meaning is the property of a language (Firth); and the 'relativity of meaning assignments' (Tymoczko). Both imply that meaning can only be assessed, correctly, in the relevant socio-cultural background. Translation is seen as a restricted creation, and the translator's approach as a three-dimensional critical one. To encompass the most technical to the most literary text, and account for variations in emphasis in any text, translation theory must be based on a typology of function [Halliday's ideational, interpersonal and textual, or Bühler's symbol, signal, symptom functions]. Function [overall and specific] will dictate aims and method, and also provide the critic with criteria to assess translation faithfulness. Translation can never be reduced to purely objective methods, however. Intuitive procedures intervene, in textual interpretation and analysis, in the choice of equivalents, and in the reception of a translation.
Ultimately, translation, theory and practice, may perhaps constitute the touchstone as regards the validity of linguistic and semantic theories.