839 results for word representations
Abstract:
We introduce a two-tier convolutional neural network model for learning distributed paragraph representations for a specific task (e.g. paragraph- or short-document-level sentiment analysis and text topic categorization). We decompose paragraph semantics into three cascaded constituents: word representation, sentence composition and document composition. Specifically, we learn distributed word representations with a continuous bag-of-words model from a large unstructured text corpus. Then, using these word representations as pre-trained vectors, distributed task-specific sentence representations are learned from a sentence-level corpus with task-specific labels by the first tier of our model. Using these sentence representations as building blocks, distributed paragraph representations are learned from a paragraph-level corpus by the second tier of our model. The model is evaluated on the DBpedia ontology classification dataset and the Amazon review dataset. Empirical results show the effectiveness of the proposed model for generating distributed paragraph representations.
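As a concrete illustration of the first stage above, the continuous bag-of-words objective (predict a centre word from the average of its context vectors) can be sketched in a few lines of NumPy. The toy corpus, embedding size, window and learning rate below are invented for illustration and are not the paper's setup:

```python
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.1

rng = np.random.default_rng(0)
W_in = rng.normal(0.0, 0.1, (V, D))    # input (context) embeddings
W_out = rng.normal(0.0, 0.1, (D, V))   # output (prediction) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(50):
    for t, target in enumerate(corpus):
        # indices of the context words around position t
        ctx = [idx[corpus[j]]
               for j in range(max(0, t - window), min(len(corpus), t + window + 1))
               if j != t]
        h = W_in[ctx].mean(axis=0)       # averaged context representation
        p = softmax(h @ W_out)           # predicted distribution over centre word
        err = p.copy()
        err[idx[target]] -= 1.0          # cross-entropy gradient at the output
        grad_h = W_out @ err
        W_out -= lr * np.outer(h, err)
        for c in ctx:
            W_in[c] -= lr * grad_h / len(ctx)

word_vec = W_in[idx["cat"]]              # a learned distributed word representation
```

In the paper's pipeline, vectors like `word_vec` would be the pre-trained inputs handed to the first (sentence-level) tier.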
Abstract:
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which are input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
Abstract:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach that yields semantic representations of words satisfying all three of these desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
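For readers unfamiliar with the technique, a minimal sketch of learning non-negative word representations by factorizing a word-by-context matrix follows. It uses plain Lee-Seung multiplicative NMF updates on random toy data; the actual NNSE objective additionally imposes an l1 sparsity penalty and dictionary-norm constraints that are omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 30))          # toy matrix; rows: words, columns: context features
k = 5                             # number of latent dimensions

W = rng.random((20, k)) + 0.1     # word representations (kept non-negative)
H = rng.random((k, 30)) + 0.1     # latent basis / dictionary

err0 = np.linalg.norm(X - W @ H)  # reconstruction error before training
for _ in range(200):              # multiplicative updates for X ≈ W H
    H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
err1 = np.linalg.norm(X - W @ H)  # error after training (monotonically decreased)
```

Each row of `W` is then a non-negative representation of one word; NNSE's extra sparsity constraint is what makes the individual dimensions interpretable.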
Abstract:
In Fillmore's frame semantics, words take their meaning relative to the event or situational context in which they occur. FrameNet, a lexical resource for English, defines roughly 1000 conceptual frames, covering most possible contexts. Within a conceptual frame, a predicate calls arguments to fill the various semantic roles associated with the frame (for example: Victim, Manner, Recipient, Speaker). We aim to annotate these semantic roles automatically, given the semantic frame and the predicate. To do so, we train a machine learning algorithm on arguments whose role is known, so as to generalize to arguments whose role is unknown. In particular, we use lexical properties of semantic proximity between the most representative words of the arguments, notably by using vector representations of the words of the lexicon.
Abstract:
In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help address a shortcoming of traditional bag-of-words approaches, which fail to capture contextual semantics. In this paper we go beyond vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to nearby locations in the hidden feature space. We additionally incorporated these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results demonstrate the success of the proposed framework, with a 4% improvement in accuracy for subjectivity classification and improved results for sentiment classification over models trained without our higher-level features.
Abstract:
Models of word meaning, built from a corpus of text, have demonstrated success in emulating human performance on a number of cognitive tasks. Many of these models use geometric representations of words to store semantic associations between words. Often word order information is not captured in these models, and this lack of structural information has been raised as a weakness when performing cognitive tasks. This paper presents an efficient tensor-based approach to modelling word meaning that builds on recent attempts to encode word order information, while providing flexible methods for extracting task-specific semantic information.
Abstract:
Recent models of language comprehension have assumed a tight coupling between the semantic representations of action words and cortical motor areas. We combined functional MRI with cytoarchitectonically defined probabilistic maps of left hemisphere primary and premotor cortices to analyse responses of functionally delineated execution- and observation-related regions during comprehension of action word meanings associated with specific effectors (e.g., punch, bite or stomp) and processing of items with various levels of lexical information (non body part-related meanings, nonwords, and visual character strings). The comprehension of effector-specific action word meanings did not elicit preferential activity corresponding to the somatotopic organisation of effectors in either primary or premotor cortex. However, generic action word meanings did show increased BOLD signal responses compared to all other classes of lexical stimuli in the pre-SMA. As expected, the majority of the BOLD responses elicited by the lexical stimuli were in association cortex adjacent to the motor areas. We contrast our results with those of previous studies reporting significant effects for only one or two effectors outside cytoarchitectonically defined motor regions and discuss the importance of controlling for potentially confounding lexical variables such as imageability. We conclude that there is no strong evidence for a somatotopic organisation of action word meaning representations and argue that the pre-SMA might have a role in maintaining abstract representations of action words as instructional cues.
Abstract:
This article explores powerful, constraining representations of encounters between digital technologies and the bodies of students and teachers, using corpus-based Critical Discourse Analysis (CDA). It discusses examples from a corpus of UK Higher Education (HE) policy documents, and considers how confronting such documents may strengthen arguments from educators against narrow representations of an automatically enhanced learning. Examples reveal that a promise of enhanced ‘student experience’ through information and communication technologies internalizes the ideological constructs of technology and policy makers, to reinforce a primary logic of exchange value. The identified dominant discursive patterns are closely linked to the Californian ideology. By exposing these texts, they provide a form of ‘linguistic resistance’ for educators to disrupt powerful processes that serve the interests of a neoliberal social imaginary. To mine this current crisis of education, the authors introduce productive links between a Networked Learning approach and a posthumanist perspective. The Networked Learning approach emphasises conscious choices between political alternatives, which in turn could help us reconsider ways we write about digital technologies in policy. Then, based on the works of Haraway, Hayles, and Wark, a posthumanist perspective places human digital learning encounters at the juncture of non-humans and politics. Connections between the Networked Learning approach and the posthumanist perspective are necessary in order to replace a discourse of (mis)representations with a more performative view towards the digital human body, which then becomes situated at the centre of teaching and learning. In practice, however, establishing these connections is much more complex than resorting to the typically straightforward common sense discourse encountered in the Critical Discourse Analysis, and this may yet limit practical applications of this research in policy making.
Abstract:
My research investigates why nouns are learned disproportionately more frequently than other kinds of words during early language acquisition (Gentner, 1982; Gleitman, et al., 2004). This question must be considered in the context of cognitive development in general. Infants have two major streams of environmental information to make meaningful: perceptual and linguistic. Perceptual information flows in from the senses and is processed into symbolic representations by the primitive language of thought (Fodor, 1975). These symbolic representations are then linked to linguistic input to enable language comprehension and ultimately production. Yet, how exactly does perceptual information become conceptualized? Although this question is difficult, there has been progress. One way that children might have an easier job is if they have structures that simplify the data. Thus, if particular sorts of perceptual information could be separated from the mass of input, then it would be easier for children to refer to those specific things when learning words (Spelke, 1990; Pylyshyn, 2003). It would be easier still if linguistic input were segmented in predictable ways (Gentner, 1982; Gleitman, et al., 2004). Unfortunately, the frequency of patterns in lexical or grammatical input cannot explain the cross-cultural and cross-linguistic tendency to favor nouns over verbs and predicates. There are three examples of this failure: 1) a wide variety of nouns are uttered less frequently than a smaller number of verbs and yet are learnt far more easily (Gentner, 1982); 2) word order and morphological transparency offer no insight when one contrasts the sentence structures and word inflections of different languages (Slobin, 1973); and 3) particular language teaching behaviors (e.g. pointing at objects and repeating names for them) have little impact on children's tendency to prefer concrete nouns in their first fifty words (Newport, et al., 1977).
Although the linguistic solution appears problematic, there has been increasing evidence that the early visual system does indeed segment perceptual information in specific ways before the conscious mind begins to intervene (Pylyshyn, 2003). I argue that nouns are easier to learn because their referents directly connect with innate features of the perceptual faculty. This hypothesis stems from work done on visual indexes by Zenon Pylyshyn (2001, 2003). Pylyshyn argues that the early visual system (the architecture of the "vision module") segments perceptual data into pre-conceptual proto-objects called FINSTs. FINSTs typically correspond to physical things such as Spelke objects (Spelke, 1990). Hence, before conceptualization, visual objects are picked out by the perceptual system demonstratively, like a pointing finger indicating ‘this’ or ‘that’. I suggest that this primitive system of demonstration elaborates on Gareth Evans's (1982) theory of nonconceptual content. Nouns are learnt first because their referents attract demonstrative visual indexes. This theory also explains why infants less often name stationary objects such as ‘plate’ or ‘table’, but do name things that attract the focal attention of the early visual system, i.e., small objects that move, such as ‘dog’ or ‘ball’. This view leaves open the question of how blind children learn words for visible objects and why children learn category nouns (e.g. 'dog'), rather than proper nouns (e.g. 'Fido') or higher taxonomic distinctions (e.g. 'animal').
Abstract:
The aim of this paper is to provide a comparison of various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of five algorithms for constructing semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence in context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
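A minimal sketch of the SVD-plus-cosine pipeline used in such synonym-finding tests follows. The four words and their co-occurrence counts are invented toy values, not TASA data:

```python
import numpy as np

words = ["boat", "ship", "cat", "dog"]
# toy word-by-context count matrix (rows follow the order of `words`)
X = np.array([[4., 3., 0., 1.],
              [3., 5., 1., 0.],
              [0., 1., 6., 4.],
              [1., 0., 4., 5.]])

# SVD-based dimension reduction: keep the top-k latent dimensions
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
vecs = U[:, :k] * s[:k]            # reduced semantic vectors, one row per word

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# TOEFL-style question: which choice is the nearest synonym of the probe?
probe, choices = "boat", ["ship", "cat", "dog"]
i = words.index(probe)
best = max(choices, key=lambda w: cosine(vecs[i], vecs[words.index(w)]))
```

With these toy counts, `boat` and `ship` share context features, so the reduced vectors place them closest and `best` comes out as `"ship"`.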
Abstract:
Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics have often overlooked the role of complex numbers; in particular, previous models in Information Retrieval (IR) ignored them. We argue that to advance the use of quantum models in IR, one has to lift the constraint of real-valued representations of the information space and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex-valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector, whereas ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
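A minimal sketch of the proposed encoding, with invented feature values: the real part of a term vector holds distributional evidence, the imaginary part ontological evidence, and a Hermitian inner product scores a query against the term. This is an illustration of the idea, not the article's actual retrieval model:

```python
import numpy as np

dist = np.array([0.8, 0.1, 0.3])   # distributional features of a term (toy values)
onto = np.array([0.0, 1.0, 0.5])   # ontological features of the same term (toy values)
term = dist + 1j * onto            # complex-valued term vector in a Hilbert space

# a query vector built the same way (again, invented values)
query = np.array([0.7 + 0.0j, 0.2 + 1.0j, 0.1 + 0.3j])

# Hermitian inner product <q, t> = sum_i conj(q_i) * t_i; its magnitude can
# serve as a matching score between query and term/document vectors.
score = abs(np.vdot(query, term))
```

The point of the construction is that one vector carries both levels of semantic information at once, and the complex inner product mixes them when scoring.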
Abstract:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: to what degree corpus-based word association networks (CANs) can approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations, corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well-known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms were used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.
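The contrast between the two kinds of measure can be sketched as follows. The cosine is symmetric by construction; the projection-based function below is one simple asymmetric variant built from an orthogonal vector projection, not necessarily the exact formula used in the article:

```python
import numpy as np

def cosine(a, b):
    # symmetric: cosine(a, b) == cosine(b, a)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def projection_strength(a, b):
    # asymmetric: the coefficient of the orthogonal projection of a onto b,
    # i.e. how far a reaches along b relative to b's own length.
    return a @ b / (b @ b)

# toy word vectors, invented for illustration
a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

sim = cosine(a, b)                    # same in both directions
strength_ab = projection_strength(a, b)   # a projected onto b
strength_ba = projection_strength(b, a)   # b projected onto a (different value)
```

An asymmetric measure of this kind is attractive for association networks because human free association is itself directional (e.g. a cue word may evoke a response far more strongly than the reverse).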
Abstract:
Language processing is an example of implicit learning of multiple statistical cues that provide probabilistic information regarding word structure and use. Much of the current debate about language embodiment is devoted to how action words are represented in the brain, with motor cortex activity evoked by these words assumed to selectively reflect conceptual content and/or its simulation. We investigated whether motor cortex activity evoked by manual action words (e.g., caress) might reflect sensitivity to probabilistic orthographic-phonological cues to grammatical category embedded within individual words. We first review neuroimaging data demonstrating that nonwords evoke activity much more reliably than action words along the entire motor strip, encompassing regions proposed to be action category specific. Using fMRI, we found that disyllabic words denoting manual actions evoked increased motor cortex activity compared with non-body-part-related words (e.g., canyon), activity which overlaps that evoked by observing and executing hand movements. This result is typically interpreted in support of language embodiment. Crucially, we also found that disyllabic nonwords containing endings with probabilistic cues predictive of verb status (e.g., -eve) evoked increased activity compared with nonwords with endings predictive of noun status (e.g., -age) in the identical motor area. Thus, motor cortex responses to action words cannot be assumed to selectively reflect conceptual content and/or its simulation. Our results clearly demonstrate motor cortex activity reflects implicit processing of ortho-phonological statistical regularities that help to distinguish a word's grammatical class.
Abstract:
Studies of semantic context effects in spoken word production have typically distinguished between categorical (or taxonomic) and associative relations. However, associates tend to confound semantic features or morphological representations, such as whole-part relations and compounds (e.g., BOAT-anchor, BEE-hive). Using a picture-word interference paradigm and functional magnetic resonance imaging (fMRI), we manipulated categorical (COW-rat) and thematic (COW-pasture) TARGET-distractor relations in a balanced design, finding interference and facilitation effects on naming latencies, respectively, as well as differential patterns of brain activation compared with an unrelated distractor condition. While both types of distractor relation activated the middle portion of the left middle temporal gyrus (MTG) consistent with retrieval of conceptual or lexical representations, categorical relations involved additional activation of posterior left MTG, consistent with retrieval of a lexical cohort. Thematic relations involved additional activation of the left angular gyrus. These results converge with recent lesion evidence implicating the left inferior parietal lobe in processing thematic relations and may indicate a potential role for this region during spoken word production.
Abstract:
Word frequency (WF) and strength effects are two important phenomena associated with episodic memory. The former refers to the superior hit-rate (HR) for low-frequency (LF) compared to high-frequency (HF) words in recognition memory, while the latter describes the incremental effect(s) upon HRs associated with repeating an item at study. Using the "subsequent memory" method with event-related fMRI, we tested the attention-at-encoding (AE) [M. Glanzer, J.K. Adams, The mirror effect in recognition memory: data and theory, J. Exp. Psychol.: Learn Mem. Cogn. 16 (1990) 5-16] explanation of the WF effect. In addition to investigating encoding strength, we addressed whether study involves accessing prior representations of repeated items via the same mechanism as that at test [J.L. McClelland, M. Chappell, Familiarity breeds differentiation: a subjective-likelihood approach to the effects of experience in recognition memory, Psychol. Rev. 105 (1998) 724-760], entailing recollection [K.J. Malmberg, J.E. Holden, R.M. Shiffrin, Modeling the effects of repetitions, similarity, and normative word frequency on judgments of frequency and recognition memory, J. Exp. Psychol.: Learn Mem. Cogn. 30 (2004) 319-331], and whether less processing effort is entailed for encoding each repetition [M. Cary, L.M. Reder, A dual-process account of the list-length and strength-based mirror effects in recognition, J. Mem. Lang. 49 (2003) 231-248]. The increased BOLD responses observed in the left inferior prefrontal cortex (LIPC) for the WF effect provide support for an AE account. Less effort does appear to be required for encoding each repetition of an item, as reduced BOLD responses were observed in the LIPC and left lateral temporal cortex; both regions demonstrated increased responses in the conventional subsequent memory analysis.
At test, a left lateral parietal BOLD response was observed for studied versus unstudied items, while only medial parietal activity was observed for repeated items at study, indicating that accessing prior representations at encoding does not necessarily occur via the same mechanism as that at test, and is unlikely to involve a conscious recall-like process such as recollection. This information may prove useful for constraining cognitive theories of episodic memory.