67 results for Latent semantic indexing
Abstract:
In this paper we address issues relating to vulnerability to economic exclusion and levels of economic exclusion in Europe. We do so by applying latent class models to data from the European Community Household Panel for thirteen countries. This approach allows us to distinguish between vulnerability to economic exclusion and exposure to multiple deprivation at a particular point in time. The results of our analysis confirm that in every country it is possible to distinguish between a vulnerable and a non-vulnerable class. The association between income poverty, life-style deprivation and subjective economic strain is accounted for by allocating individuals to the categories of this latent variable. The size of the vulnerable class varies across countries in line with expectations derived from welfare regime theory. Between-class differentiation is weakest in social democratic regimes but otherwise the pattern of differentiation is remarkably similar. The key discriminating factor is life-style deprivation, followed by income and economic strain. Social class and employment status are powerful predictors of latent class membership in all countries but the strength of these relationships varies across welfare regimes. Individual biography and life events are also related to vulnerability to economic exclusion. However, there is no evidence that they account for any significant part of the socio-economic structuring of vulnerability and no support is found for the hypothesis that social exclusion has come to transcend class boundaries and become a matter of individual biography. The extent of socio-economic structuring does, however, vary substantially across welfare regimes. Levels of economic exclusion, in the sense of current exposure to multiple deprivation, also vary systematically by welfare regime and social class.
Taking both vulnerability to economic exclusion and levels of exclusion into account suggests that care should be exercised in moving from evidence on the dynamic nature of poverty and economic exclusion to arguments relating to the superiority of selective over universal social policies.
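As a rough illustration of the latent class approach this abstract describes, the sketch below fits a two-class model to three binary indicators (standing in for income poverty, life-style deprivation and economic strain) by expectation-maximisation. It assumes the indicators are independent within each class. The data are synthetic, and the function names, class rates and indicator probabilities are all hypothetical; the abstract does not report the authors' estimation procedure.

```python
import math
import random


def simulate_indicators(n=300, seed=1):
    """Synthetic data only: three binary indicators per person."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        vulnerable = rng.random() < 0.3      # assumed class size
        rate = 0.8 if vulnerable else 0.1    # assumed indicator rates
        rows.append([1 if rng.random() < rate else 0 for _ in range(3)])
    return rows


def fit_two_class_lca(data, iters=100, seed=0):
    """EM for a 2-class latent class model with binary indicators."""
    rng = random.Random(seed)
    n_items = len(data[0])
    pi = 0.5  # prior probability of class 1
    # p[c][j] = P(indicator j = 1 | class c), randomly initialised
    p = [[rng.uniform(0.2, 0.8) for _ in range(n_items)] for _ in range(2)]
    loglik_trace = []
    for _ in range(iters):
        # E-step: posterior probability of class 1 for each person
        resp = []
        loglik = 0.0
        for row in data:
            lik = []
            for c, weight in ((0, 1.0 - pi), (1, pi)):
                value = weight
                for j, x in enumerate(row):
                    value *= p[c][j] if x else 1.0 - p[c][j]
                lik.append(value)
            total = lik[0] + lik[1]
            loglik += math.log(total)
            resp.append(lik[1] / total)
        loglik_trace.append(loglik)
        # M-step: re-estimate class size and conditional probabilities
        n1 = max(sum(resp), 1e-12)
        n0 = max(len(data) - n1, 1e-12)
        pi = min(max(n1 / len(data), 1e-6), 1.0 - 1e-6)
        for j in range(n_items):
            s1 = sum(r * row[j] for r, row in zip(resp, data))
            s0 = sum((1.0 - r) * row[j] for r, row in zip(resp, data))
            p[1][j] = min(max(s1 / n1, 1e-6), 1.0 - 1e-6)
            p[0][j] = min(max(s0 / n0, 1e-6), 1.0 - 1e-6)
    return pi, p, loglik_trace
```

Calling `fit_two_class_lca(simulate_indicators())` returns the estimated class size, the per-class indicator probabilities and the log-likelihood trace. Which of the two fitted classes corresponds to the "vulnerable" one depends on the random initialisation.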
Turning the tide: A critique of Natural Semantic Metalanguage from a translation studies perspective
Abstract:
Starting from the premise that human communication is predicated on translational phenomena, this paper applies theoretical insights and practical findings from Translation Studies to a critique of Natural Semantic Metalanguage (NSM), a theory of semantic analysis developed by Anna Wierzbicka. Key tenets of NSM, i.e. (1) culture-specificity of complex concepts; (2) the existence of a small set of universal semantic primes; and (3) definition by reductive paraphrase, are discussed critically with reference to the notions of untranslatability, equivalence, and intra-lingual translation, respectively. It is argued that a broad spectrum of research and theoretical reflection in Translation Studies may successfully feed into the study of cognition, meaning, language, and communication. The interdisciplinary exchange between Translation Studies and linguistics may be properly balanced, with the former not only being informed by but also informing and interrogating the latter.
Abstract:
Web databases are now pervasive. Such a database can be accessed only via its query interface (usually an HTML query form). Extracting Web query interfaces, which creates a formal representation of a query form by extracting the set of query conditions in it, is a critical step in data integration across multiple Web databases. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by a tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by various short texts used in labels, form elements and their properties. We have implemented the proposed approach and our experimental results show that the approach is highly effective.
Abstract:
In this paper, we introduce an application of matrix factorization to produce corpus-derived, distributional models of semantics that demonstrate cognitive plausibility. We find that word representations learned by Non-Negative Sparse Embedding (NNSE), a variant of matrix factorization, are sparse, effective, and highly interpretable. To the best of our knowledge, this is the first approach which yields semantic representations of words satisfying these three desirable properties. Through extensive experimental evaluations on multiple real-world tasks and datasets, we demonstrate the superiority of semantic models learned by NNSE over other state-of-the-art baselines.
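The abstract gives neither the objective nor the solver, so the following is only a sketch of one common way to pose a non-negative sparse factorization. It assumes the objective 0.5 * ||X - A D||^2 + lam * sum(A), with A constrained to be non-negative and each row of D constrained to norm at most 1, minimised by alternating proximal and projected gradient steps. The function name `nnse_sketch`, the parameter defaults and the solver are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def objective(X, A, D, lam):
    """0.5 * squared reconstruction error plus an L1 penalty on A (A >= 0)."""
    return 0.5 * np.linalg.norm(X - A @ D) ** 2 + lam * A.sum()


def nnse_sketch(X, k, lam=0.1, iters=200, seed=0):
    """Alternating proximal-gradient sketch of a non-negative sparse embedding."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = np.abs(rng.standard_normal((n, k)))   # non-negative word codes
    D = rng.standard_normal((k, m))           # dictionary, one row per dimension
    D /= np.maximum(1.0, np.linalg.norm(D, axis=1, keepdims=True))
    history = []
    for _ in range(iters):
        # A-step: gradient step, then shrink by lam and clip at zero
        # (proximal operator of the L1 penalty under non-negativity).
        step_a = 1.0 / (np.linalg.norm(D @ D.T, 2) + 1e-12)
        grad_a = (A @ D - X) @ D.T
        A = np.maximum(0.0, A - step_a * (grad_a + lam))
        # D-step: gradient step, then project each row onto the unit ball.
        step_d = 1.0 / (np.linalg.norm(A.T @ A, 2) + 1e-12)
        grad_d = A.T @ (A @ D - X)
        D = D - step_d * grad_d
        D /= np.maximum(1.0, np.linalg.norm(D, axis=1, keepdims=True))
        history.append(objective(X, A, D, lam))
    return A, D, history
```

Here each row of `X` would be a word's corpus-derived feature vector and each row of `A` its sparse, non-negative embedding. Larger values of `lam` push more entries of `A` to exactly zero.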
Abstract:
Computational models of meaning trained on naturally occurring text successfully model human performance on tasks involving simple similarity measures, but they characterize meaning in terms of undifferentiated bags of words or topical dimensions. This has led some to question their psychological plausibility (Murphy, 2002; Schunn, 1999). We present here a fully automatic method for extracting a structured and comprehensive set of concept descriptions directly from an English part-of-speech-tagged corpus. Concepts are characterized by weighted properties, enriched with concept-property types that approximate classical relations such as hypernymy and function. Our model outperforms comparable algorithms in cognitive tasks pertaining not only to concept-internal structures (discovering properties of concepts, grouping properties by property type) but also to inter-concept relations (clustering into superordinates), suggesting the empirical validity of the property-based approach. Copyright © 2009 Cognitive Science Society, Inc. All rights reserved.
Abstract:
Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100 ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.
Abstract:
Many studies suggest a large capacity memory for briefly presented pictures of whole scenes. At the same time, visual working memory (WM) of scene elements is limited to only a few items. We examined the role of retroactive interference in limiting memory for visual details. Participants viewed a scene for 5 s and then, after a short delay containing either a blank screen or 10 distracter scenes, answered questions about the location, color, and identity of objects in the scene. We found that the influence of the distracters depended on whether they were from a similar semantic domain, such as "kitchen" or "airport." Increasing the number of similar scenes reduced, and eventually eliminated, memory for scene details. Although scene memory was firmly established over the initial study period, this memory was fragile and susceptible to interference. This may help to explain the discrepancy in the literature between studies showing limited visual WM and those showing a large capacity memory for scenes.
Abstract:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
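The abstract does not say how the topic- and type-based statistics are combined or which composition functions are evaluated. As a hedged illustration only, the sketch below assumes simple concatenation of the two feature vectors and shows two widely used baseline composition functions, vector addition and pointwise multiplication, together with cosine similarity for comparing a composed vector with an observed phrase vector. All function names and the toy vectors are hypothetical.

```python
import math


def combine_spaces(topic_vec, type_vec):
    """Assumed combination: concatenate topic-based and type-based features."""
    return list(topic_vec) + list(type_vec)


def add_compose(adj, noun):
    """Additive composition of an adjective vector and a noun vector."""
    return [a + n for a, n in zip(adj, noun)]


def mult_compose(adj, noun):
    """Pointwise-multiplicative composition."""
    return [a * n for a, n in zip(adj, noun)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only.
adjective = combine_spaces([0.2, 0.0], [1.0, 0.5])
noun = combine_spaces([0.1, 0.9], [0.0, 0.5])
observed_phrase = combine_spaces([0.3, 0.8], [0.9, 1.0])

similarity_add = cosine(add_compose(adjective, noun), observed_phrase)
similarity_mult = cosine(mult_compose(adjective, noun), observed_phrase)
```

A composition function can then be scored by how close its output is, on average, to the corpus-observed vectors of the same phrases.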
Abstract:
Purpose – Under investigation is Prosecco wine, a sparkling white wine from North-East Italy. Information collection on consumer perceptions is particularly relevant when developing market strategies for wine, especially so when local production and certification of origin play an important role in the wine market of a given district, as in the case at hand. Investigating and characterizing the structure of preference heterogeneity become crucial steps in every successful marketing strategy. The purpose of this paper is to investigate the sources of systematic differences in consumer preferences.
Design/methodology/approach – The paper explores the effect of including answers to attitudinal questions in a latent class regression model of stated willingness to pay (WTP) for this specialty wine. These additional variables were included in the membership equations to investigate whether they could help in the identification of latent classes. The individual-specific WTPs of the sampled respondents were then derived from the best-fitting model and examined for consistency.
Findings – The use of answers to attitudinal questions in the latent class regression model is found to improve model fit, thereby helping in the identification of latent classes. The best-performing model obtained makes use of both attitudinal scores and socio-economic covariates, identifying five latent classes. A reasonable pattern of differences in WTP for Prosecco between CDO and TGI types was derived from this model.
Originality/value – The approach appears informative and promising: attitudes emerge as important ancillary indicators of taste differences for specialty wines. This might be of interest per se and of practical use in market segmentation. If future research shows that these variables can be of use in other contexts, it is quite possible that more attitudinal questions will be routinely incorporated in structural latent class hedonic models.