59 results for Semantic enrichment
Abstract:
The Dry Valleys of Antarctica are one of the coldest and driest environments on Earth with paleosols in selected areas that date to the emplacement of tills by warm-based ice during the Early Miocene. Cited as an analogue to the martian surface, the ability of the Antarctic environment to support microbial life-forms is a matter of special interest, particularly with the upcoming NASA/ESA 2018 ExoMars mission. Lipid biomarkers were extracted and analyzed by gas chromatography-mass spectrometry to assess sources of organic carbon and evaluate the contribution of microbial species to the organic matter of the paleosols. Paleosol samples from the ice-free Dry Valleys were also subsampled and cultivated in a growth medium from which DNA was extracted with the explicit purpose of the positive identification of bacteria. Several species of bacteria were grown in solution and the genus identified. A similar match of the data to sequenced DNA showed that Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes, and Actinobacteridae species were cultivated. The results confirm the presence of bacteria within some paleosols, but no assumptions have been made with regard to in situ activity at present. These results underscore the need not only to further investigate Dry Valley cryosols but also to develop reconnaissance strategies to determine whether such likely Earth-like environments on the Red Planet also contain life.
Abstract:
Latent semantic indexing (LSI) is a technique used for intelligent information retrieval (IR). It can be used as an alternative to traditional keyword matching IR and is attractive in this respect because of its ability to overcome problems with synonymy and polysemy. This study investigates various aspects of LSI: the effect of the Haar wavelet transform (HWT) as a preprocessing step for the singular value decomposition (SVD) in the key stage of the LSI process; and the effect of different threshold types in the HWT on the search results. The developed method allows the visualisation and processing of the term document matrix, generated in the LSI process, using HWT. The results have shown that precision can be increased by applying the HWT as a preprocessing step, with better results for hard thresholding than soft thresholding, whereas standard SVD-based LSI remains the most effective way of searching in terms of recall value.
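A minimal sketch of the pipeline this abstract describes, under stated assumptions: the abstract does not say along which axis or to how many levels the HWT is applied, so the code below assumes a single-level orthonormal Haar transform along the document axis of a terms-by-documents matrix, thresholds the detail coefficients (hard or soft), inverts the transform, and then runs a rank-k SVD for retrieval. All function names (`haar_step`, `hwt_preprocess`, `lsi_scores`) and the toy matrix are illustrative, not taken from the paper.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform along the last axis (even length assumed)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def inverse_haar_step(approx, detail):
    """Invert haar_step, interleaving the reconstructed even and odd samples."""
    out = np.empty(approx.shape[:-1] + (approx.shape[-1] * 2,))
    out[..., 0::2] = (approx + detail) / np.sqrt(2.0)
    out[..., 1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def threshold(d, t, kind="hard"):
    """Hard: zero coefficients with |d| <= t. Soft: additionally shrink the survivors by t."""
    if kind == "hard":
        return np.where(np.abs(d) > t, d, 0.0)
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

def hwt_preprocess(matrix, t, kind="hard"):
    """Assumed preprocessing: threshold Haar detail coefficients, then reconstruct."""
    approx, detail = haar_step(matrix)
    return inverse_haar_step(approx, threshold(detail, t, kind))

def lsi_scores(matrix, query, k):
    """Cosine similarity between query and documents after projection onto the top-k left singular vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    docs = (s_k[:, None] * vt_k).T   # each row: one document in the latent space
    q = query @ u_k                  # query projected into the same space
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    return docs @ q / np.where(denom == 0, 1.0, denom)

# Toy terms-by-documents matrix (4 terms, 4 documents), purely illustrative.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 1.0, 0.0, 0.0])

plain = lsi_scores(A, query, k=2)
hard = lsi_scores(hwt_preprocess(A, 0.5, "hard"), query, k=2)
soft = lsi_scores(hwt_preprocess(A, 0.5, "soft"), query, k=2)
```

Comparing `plain`, `hard`, and `soft` rankings against relevance judgments is how precision and recall would be measured; the toy data here says nothing about which variant performs better.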
Turning the tide: A critique of Natural Semantic Metalanguage from a translation studies perspective
Abstract:
Starting from the premise that human communication is predicated on translational phenomena, this paper applies theoretical insights and practical findings from Translation Studies to a critique of Natural Semantic Metalanguage (NSM), a theory of semantic analysis developed by Anna Wierzbicka. Key tenets of NSM, i.e. (1) culture-specificity of complex concepts; (2) the existence of a small set of universal semantic primes; and (3) definition by reductive paraphrase, are discussed critically with reference to the notions of untranslatability, equivalence, and intra-lingual translation, respectively. It is argued that a broad spectrum of research and theoretical reflection in Translation Studies may successfully feed into the study of cognition, meaning, language, and communication. The interdisciplinary exchange between Translation Studies and linguistics may be properly balanced, with the former not only being informed by but also informing and interrogating the latter.
Abstract:
Web databases are now pervasive. Such a database can be accessed via its query interface (usually HTML query form) only. Extracting Web query interfaces is a critical step in data integration across multiple Web databases, which creates a formal representation of a query form by extracting a set of query conditions in it. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by a tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by various short texts used in labels, form elements and their properties. We have implemented the proposed approach and our experimental results show that the approach is highly effective.
Abstract:
We determined whether pre-enrichment of low density lipoproteins (LDL) with alpha-tocopherol mitigates their adverse effects, following in vitro glycation, oxidation or glycoxidation, towards cultured bovine retinal capillary endothelial cells (RCEC) and pericytes.
Abstract:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach which yields semantic representation of words satisfying these three desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
Abstract:
Computational models of meaning trained on naturally occurring text successfully model human performance on tasks involving simple similarity measures, but they characterize meaning in terms of undifferentiated bags of words or topical dimensions. This has led some to question their psychological plausibility (Murphy, 2002; Schunn, 1999). We present here a fully automatic method for extracting a structured and comprehensive set of concept descriptions directly from an English part-of-speech-tagged corpus. Concepts are characterized by weighted properties, enriched with concept-property types that approximate classical relations such as hypernymy and function. Our model outperforms comparable algorithms in cognitive tasks pertaining not only to concept-internal structures (discovering properties of concepts, grouping properties by property type) but also to inter-concept relations (clustering into superordinates), suggesting the empirical validity of the property-based approach. Copyright © 2009 Cognitive Science Society, Inc. All rights reserved.
Abstract:
Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100 ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.
Abstract:
Many studies suggest a large capacity memory for briefly presented pictures of whole scenes. At the same time, visual working memory (WM) of scene elements is limited to only a few items. We examined the role of retroactive interference in limiting memory for visual details. Participants viewed a scene for 5 s and then, after a short delay containing either a blank screen or 10 distracter scenes, answered questions about the location, color, and identity of objects in the scene. We found that the influence of the distracters depended on whether they were from a similar semantic domain, such as "kitchen" or "airport." Increasing the number of similar scenes reduced, and eventually eliminated, memory for scene details. Although scene memory was firmly established over the initial study period, this memory was fragile and susceptible to interference. This may help to explain the discrepancy in the literature between studies showing limited visual WM and those showing a large capacity memory for scenes.
Abstract:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
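The idea of combining topic-based and type-based word vectors before composing an adjective-noun phrase can be illustrated with a small sketch. The abstract does not state how the two representations are merged or which composition function is used, so the choices below (concatenating length-normalized vectors, then additive or element-wise multiplicative composition) are assumptions made for illustration only, and every vector value is invented toy data.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def combine(topic_vec, type_vec):
    """Assumed combination: concatenate the normalized topic- and type-based vectors."""
    return np.concatenate([unit(topic_vec), unit(type_vec)])

def compose_additive(adj, noun):
    """Phrase vector as the sum of its word vectors."""
    return adj + noun

def compose_multiplicative(adj, noun):
    """Phrase vector as the element-wise product of its word vectors."""
    return adj * noun

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy statistics: 3 topical dimensions and 2 dependency-type dimensions per word.
red = combine([2.0, 0.0, 1.0], [1.0, 3.0])
car = combine([0.0, 4.0, 1.0], [2.0, 1.0])

red_car_add = compose_additive(red, car)
red_car_mul = compose_multiplicative(red, car)
similarity_to_noun = cosine(red_car_add, car)
```

In an evaluation like the one described, composed vectors such as `red_car_add` would be compared with vectors observed for the phrase itself; which composition function works best is an empirical question the toy data cannot answer.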