992 resultados para Semantic space
Resumo:
Linguistic theory, cognitive, information, and mathematical modeling are all useful while we attempt to achieve a better understanding of the Language Faculty (LF). This cross-disciplinary approach will eventually lead to the identification of the key principles applicable in the systems of Natural Language Processing. The present work concentrates on the syntax-semantics interface. We start from recursive definitions and application of optimization principles, and gradually develop a formal model of syntactic operations. The result – a Fibonacci- like syntactic tree – is in fact an argument-based variant of the natural language syntax. This representation (argument-centered model, ACM) is derived by a recursive calculus that generates a mode which connects arguments and expresses relations between them. The reiterative operation assigns primary role to entities as the key components of syntactic structure. We provide experimental evidence in support of the argument-based model. We also show that mental computation of syntax is influenced by the inter-conceptual relations between the images of entities in a semantic space.
Resumo:
This paper shows that by using only symbolic language phrases, a mobile robot can purposefully navigate to specified rooms in previously unexplored environments. The robot intelligently organises a symbolic language description of the unseen environment and “imagines” a representative map, called the abstract map. The abstract map is an internal representation of the topological structure and spatial layout of symbolically defined locations. To perform goal-directed exploration, the abstract map creates a high-level semantic plan to reason about spaces beyond the robot’s known world. While completing the plan, the robot uses the metric guidance provided by a spatial layout, and grounded observations of door labels, to efficiently guide its navigation. The system is shown to complete exploration in unexplored spaces by travelling only 13.3% further than the optimal path.
Resumo:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSM for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
Resumo:
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to use these concepts to circumscribe what Web space is, what it represents and how it can be represented and analyzed. This is used to sketch the role that Semantic Web Mining and the software agents and human agents involved in it can play in the evolution of Web space.
Resumo:
Coarse semantic encoding and broad categorization behavior are the hallmarks of the right cerebral hemisphere's contribution to language processing. We correlated 40 healthy subjects' breadth of categorization as assessed with Pettigrew's category width scale with lateral asymmetries in perceptual and representational space. Specifically, we hypothesized broader category width to be associated with larger leftward spatial biases. For the 20 men, but not the 20 women, this hypothesis was confirmed both in a lateralized tachistoscopic task with chimeric faces and a random digit generation task; the higher a male participant's score on category width, the more pronounced were his left-visual field bias in the judgement of chimeric faces and his small-number preference in digit generation ("small" is to the left of "large" in number space). Subjects' category width was unrelated to lateral displacements in a blindfolded tactile-motor rod centering task. These findings indicate that visual-spatial functions of the right hemisphere should not be considered independent of the same hemisphere's contribution to language. Linguistic and spatial cognition may be more tightly interwoven than is currently assumed.
Resumo:
In computational linguistics, information retrieval and applied cognition, words and concepts are often represented as vectors in high dimensional spaces computed from a corpus of text. These high dimensional spaces are often referred to as Semantic Spaces. We describe a novel and efficient approach to computing these semantic spaces via the use of complex valued vector representations. We report on the practical implementation of the proposed method and some associated experiments. We also briefly discuss how the proposed system relates to previous theoretical work in Information Retrieval and Quantum Mechanics and how the notions of probability, logic and geometry are integrated within a single Hilbert space representation. In this sense the proposed system has more general application and gives rise to a variety of opportunities for future research.
Resumo:
The emergence of mobile and ubiquitous computing has created what is referred to as a hybrid space – a virtual layer of digital information and interaction opportunities that sits on top and augments the physical environment. The increasing connectedness through such media, from anywhere to anybody at anytime, makes us less dependent on being physically present somewhere in particular. But, what is the role of ubiquitous computing in making physical presence at a particular place more attractive? Acknowledging historic context and identity as important attributes of place, this work embarks on a ‘global sense of place’ in which the cultural diversity, multiple identities, backgrounds, skills and experiences of people traversing a place are regarded as social assets of that place. The aim is to explore ways how physical architecture and infrastructure of a place can be mediated towards making invisible social assets visible, thus augmenting people’s situated social experience. Thereby, the focus is on embodied media, i.e. media that materialise digital information as observable and sometimes interactive parts of the physical environment hence amplify people’s real world experience, rather than substituting or moving it to virtual spaces.
Resumo:
Two decades after its inception, Latent Semantic Analysis(LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, and the gist of which can be summarized as follows: (1) that LSA recovers latent semantic factors underlying the document space, (2) that such can be accomplished through lossy compression of the document space by eliminating lexical noise, and (3) that the latter can best be achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example may show that LSA does not recover the optimal semantic factors as intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.
Resumo:
This paper presents a combined structure for using real, complex, and binary valued vectors for semantic representation. The theory, implementation, and application of this structure are all significant. For the theory underlying quantum interaction, it is important to develop a core set of mathematical operators that describe systems of information, just as core mathematical operators in quantum mechanics are used to describe the behavior of physical systems. The system described in this paper enables us to compare more traditional quantum mechanical models (which use complex state vectors), alongside more generalized quantum models that use real and binary vectors. The implementation of such a system presents fundamental computational challenges. For large and sometimes sparse datasets, the demands on time and space are different for real, complex, and binary vectors. To accommodate these demands, the Semantic Vectors package has been carefully adapted and can now switch between different number types comparatively seamlessly. This paper describes the key abstract operations in our semantic vector models, and describes the implementations for real, complex, and binary vectors. We also discuss some of the key questions that arise in the field of quantum interaction and informatics, explaining how the wide availability of modelling options for different number fields will help to investigate some of these questions.
Resumo:
Conference curatorial outline The focus of this symposium was to question whether interior design is changing relative to local conditions, and the effect globalization has on the performance of regional, particularly Southern hemisphere identities. The intention being to understand how theory and practice is transposed to ‘distant lands’, and how ideas shift from one place to another. To this extent the symposium invited papers on the export, translation and adoption of theories and practices of interior design to differing climates, cultures, and landscapes. This process, sometimes referred to as a shift from ‘the centre to the margins’, seeks new perspectives on the adoption of European and US design ideas abroad, as well as their return to their place of origin. Papers were invited from a range of perspectives including the export of ideas/attitudes to interior spaces, history of interior spaces abroad, and the adoption of ideas/processes to new conditions. Paralleling this trafficking of ideas are broader observations about interior space that emerge through specificity of place. These include new and emerging directions and differences in our understanding of interiority; both real and virtual, and an ever-changing relationship to city, suburb and country. Keeping within the Symposium theme the intention was to examine other places, particularly on the margins of the discipline’s domain. Semantic slippage aside, there are a range of approaches that engage outside events and practices enabling a transdisciplinary practice that draws from other philosophical and theoretical frameworks. Moreover as the field expands and new territories are opened up, the virtual worlds of computer gaming, animations, and interactive environments, both rely on and produce new forms of expression. This raises questions about the extent such spaces adopt or translate existing theory and practice, that is the transposition from one area to another and their return to the discipline.
Resumo:
Complex numbers are a fundamental aspect of the mathematical formalism of quantum physics. Quantum-like models developed outside physics often overlooked the role of complex numbers. Specifically, previous models in Information Retrieval (IR) ignored complex numbers. We argue that to advance the use of quantum models of IR, one has to lift the constraint of real-valued representations of the information space, and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR, which explicitly uses complex valued Hilbert spaces, and thus where terms, documents and queries are represented as complex-valued vectors. The proposal consists of integrating distributional semantics evidence within the real component of a term vector; whereas, ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Record task of retrieving cohorts for clinical studies.
Resumo:
A straightforward computation of the list of the words (the `tail words' of the list) that are distributionally most similar to a given word (the `head word' of the list) leads to the question: How semantically similar to the head word are the tail words; that is: how similar are their meanings to its meaning? And can we do better? The experiment was done on nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.
Resumo:
SIR is a computer system, programmed in the LISP language, which accepts information and answers questions expressed in a restricted form of English. This system demonstrates what can reasonably be called an ability to "understand" semantic information. SIR's semantic and deductive ability is based on the construction of an internal model, which uses word associations and property lists, for the relational information normally conveyed in conversational statements. A format-matching procedure extracts semantic content from English sentences. If an input sentence is declarative, the system adds appropriate information to the model. If an input sentence is a question, the system searches the model until it either finds the answer or determines why it cannot find the answer. In all cases SIR reports its conclusions. The system has some capacity to recognize exceptions to general rules, resolve certain semantic ambiguities, and modify its model structure in order to save computer memory space. Judging from its conversational ability, SIR, is a first step toward intelligent man-machine communication. The author proposes a next step by describing how to construct a more general system which is less complex and yet more powerful than SIR. This proposed system contains a generalized version of the SIR model, a formal logical system called SIR1, and a computer program for testing the truth of SIR1 statements with respect to the generalized model by using partial proof procedures in the predicate calculus. The thesis also describes the formal properties of SIR1 and how they relate to the logical structure of SIR.
Resumo:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional
models of semantics that demonstrate cognitive plausibility. We find that word representations
learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse,
effective, and highly interpretable. To the best of our knowledge, this is the first approach which
yields semantic representation of words satisfying these three desirable properties. Though extensive
experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority
of semantic models learned by NNSE over other state-of-the-art baselines.
Resumo:
Vector space models (VSMs) represent word meanings as points in a high dimensional space. VSMs are typically created using a large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.