963 resultados para semantic lexicon
Resumo:
为解决传统的文档分类方法和手工分类方法都不适宜于处理查询分类的问题,提出了一种基于Web的自动构建特定主题的语义词典的方法来分类搜索查询,通过基于主题的Web信息采集和bootstrapping,由某个主题的少量关键词逐步扩充,最终得到该主题的语义词典及词典中每个单词的相对词频.Web中信息的冗余和各主题语义上的差别使各主题的语义词典中单词的种类和数量存在很大差异,这种差异可以用来对用户的搜索查询进行分类.实验结果表明,利用语义词典可以较准确地将用户的查询分类,同时该分类方法基本上不需要人工介入,且可适应搜索查询覆盖面广和实时性强的特点,较好地解决了搜索查询分类的问题.
Resumo:
Following an early claim by Nelson & McEvoy suggesting that word associations can display `spooky action at a distance behaviour', a serious investigation of the potentially quantum nature of such associations is currently underway. In this paper quantum theory is proposed as a framework suitable for modelling the mental lexicon, specifically the results obtained from both intralist and extralist word association experiments. Some initial models exploring this hypothesis are discussed, and they appear to be capable of substantial agreement with pre-existing experimental data. The paper concludes with a discussion of some experiments that will be performed in order to test these models.
Resumo:
The aim was to analyse the growth and compositional development of the receptive and expressive lexicons between the ages 0,9 and 2;0 in the full-term (FT) and the very-low-birth-weight (VLBW) children who are acquiring Finnish. The associations between the expressive lexicon and grammar at 1;6 and 2;0 in the FT children were also studied. In addition, the language skills of the VLBW children at 2;0 were analysed, as well as the predictive value of early lexicon to the later language performance. Four groups took part in the studies: the longitudinal (N = 35) and cross-sectional (N = 146) samples of the FT children, and the longitudinal (N = 32) and cross-sectional (N = 66) samples of VLBW children. The data was gathered by applying of the structured parental rating method (the Finnish version of the Communicative Development Inventory), through analysis of the children´s spontaneous speech and by administering a a formal test (Reynell Developmental Language Scales). The FT children acquired their receptive lexicons earlier, at a faster rate and with larger individual variation than their expressive lexicons. The acquisition rate of the expressive lexicon increased from slow to faster in most children (91%). Highly parallel developmental paths for lexical semantic categories were detected in the receptive and expressive lexicons of the Finnish children when they were analysed in relation to the growth of the lexicon size, as described in the literature for children acquiring other languages. The emergence of grammar was closely associated with expressive lexical growth. The VLBW children acquired their receptive lexicons at a slower rate and had weaker language skills at 2;0 than the full-term children. The compositional development of both lexicons happened at a slower rate in the VLBW children when compared to the FT controls. However, when the compositional development was analysed in relation to the growth of lexicon size, this development occurred qualitatively in a nearly parallel manner in the VLBW children as in the FT children. Early receptive and expressive lexicon sizes were significantly associated with later language skills in both groups. The effect of the background variables (gender, length of the mother s basic education, birth weight) on the language development in the FT and the VLBW children differed. The results provide new information of early language acquisition by the Finnish FT and VLBW children. The results support the view that the early acquisition of the semantic lexical categories is related to lexicon growth. The current findings also propose that the early grammatical acquisition is closely related to the growth of expressive vocabulary size. The language development of the VLBW children should be followed in clinical work.
Resumo:
Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100. ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.
Resumo:
This introductory chapter briefly introduces a few milestones in the voluminous previous literature on semantic roles, and charts the territory in which the papers of this volume aim to make a contribution. This territory is characterized by fairly disparate conceptualizations of semantic roles and their status in theories of grammar and the lexicon, as well as by diverse and probably complementary ways of deriving or identifying them based on linguistic data. Particular attention is given to the question of how selected roles appear to relate to each other, and we preliminarily address the issue of how roles, subroles, and role complexes are best thought of in general.
Resumo:
Lexica and terminology databases play a vital role in many NLP applications, but currently most such resources are published in application-specific formats, or with custom access interfaces, leading to the problem that much of this data is in ‘‘data silos’’ and hence difficult to access. The Semantic Web and in particular the Linked Data initiative provide effective solutions to this problem, as well as possibilities for data reuse by inter-lexicon linking, and incorporation of data categories by dereferencable URIs. The Semantic Web focuses on the use of ontologies to describe semantics on the Web, but currently there is no standard for providing complex lexical information for such ontologies and for describing the relationship between the lexicon and the ontology. We present our model, lemon, which aims to address these gaps
Resumo:
Many attempts have been made to provide multilinguality to the Semantic Web, by means of annotation properties in Natural Language (NL), such as RDFs or SKOS labels, and other lexicon-ontology models, such as lemon, but there are still many issues to be solved if we want to have a truly accessible Multilingual Semantic Web (MSW). Reusability of monolingual resources (ontologies, lexicons, etc.), accessibility of multilingual resources hindered by many formats, reliability of ontological sources, disambiguation problems and multilingual presentation to the end user of all this information in NL can be mentioned as some of the most relevant problems. Unless this NL presentation is achieved, MSW will be restricted to the limits of IT experts, but even so, with great dissatisfaction and disenchantment
Resumo:
The creation of language resources is a time-consuming process requiring the efforts of many people. The use of resources collaboratively created by non-linguists can potentially ameliorate this situation. However, such resources often contain more errors compared to resources created by experts. For the particular case of lexica, we analyse the case of Wiktionary, a resource created along wiki principles and argue that through the use of a principled lexicon model, namely lemon, the resulting data could be better understandable to machines. We then present a platform called lemon source that supports the creation of linked lexical data along the lemon model. This tool builds on the concept of a semantic wiki to enable collaborative editing of the resources by many users concurrently. In this paper, we describe the model, the tool and present an evaluation of its usability based on a small group of users.
Resumo:
We describe the case of a dysgraphic aphasic individual-S.G.W.-who, in writing to dictation, produced high rates of formally related errors consisting of both lexical substitutions and what we call morphological-compound errors involving legal or illegal combinations of morphemes. These errors were produced in the context of a minimal number of semantic errors. We could exclude problems with phonological discrimination and phonological short-term memory. We also excluded rapid decay of lexical information and/or weak activation of word forms and letter representations since S.G.W.'s spelling showed no effect of delay and no consistent length effects, but, instead, paradoxical complexity effects with segmental, lexical, and morphological errors that were more complex than the target. The case of S.G.W. strongly resembles that of another dysgraphic individual reported in the literature-D.W.-suggesting that this pattern of errors can be replicated across patients. In particular, both patients show unusual errors resulting in the production of neologistic compounds (e.g., "bed button" in response to "bed"). These patterns can be explained if we accept two claims: (a) Brain damage can produce both a reduction and an increase in lexical activation; and (b) there are direct connections between phonological and orthographic lexical representations (a third spelling route). We suggest that both patients are suffering from a difficulty of lexical selection resulting from excessive activation of formally related lexical representations. This hypothesis is strongly supported by S.G.W.'s worse performance in spelling to dictation than in written naming, which shows that a phonological input, activating a cohort of formally related lexical representations, increases selection difficulties. © 2014 Taylor & Francis.
Resumo:
Lexicon-based approaches to Twitter sentiment analysis are gaining much popularity due to their simplicity, domain independence, and relatively good performance. These approaches rely on sentiment lexicons, where a collection of words are marked with fixed sentiment polarities. However, words' sentiment orientation (positive, neural, negative) and/or sentiment strengths could change depending on context and targeted entities. In this paper we present SentiCircle; a novel lexicon-based approach that takes into account the contextual and conceptual semantics of words when calculating their sentiment orientation and strength in Twitter. We evaluate our approach on three Twitter datasets using three different sentiment lexicons. Results show that our approach significantly outperforms two lexicon baselines. Results are competitive but inconclusive when comparing to state-of-art SentiStrength, and vary from one dataset to another. SentiCircle outperforms SentiStrength in accuracy on average, but falls marginally behind in F-measure. © 2014 Springer International Publishing.
Resumo:
Through media such as newspapers, letterbox flyers, corporate brochures and television we are regularly confronted with descriptions for conventional (bricks 'n' mortar style) services. These representations vary in the terminology utilised, the depth of the description, the aspects of the service that are characterised and their applicability to candidate service requestors. Existing service catalogues (such as the Yellow Pages) provide little relief for service requestors from the burdensome task of discovering, comparing and substituting services. Add to this environment the rapidly evolving area of web services with its associated surfeit of standards, and the result is a considerably fragmented approach to the description of services. It leaves the reality of the Semantic Web somewhat clouded. --------- Let's consider service description briefly, before discussing our concerns with existing approaches to description. The act of describing is performed prior to advertising. This simple fact provides an interesting paradox as services cannot be described exactly before advertisement. This doesn't mean they can't be described comprehensively. By "exactly", we are referring to the fact that context provided by a service requestor (and their service needs) will alter the description of the service that is presented to the discoverer. For example, a service provider who operates a cinema wants to describe the price of their service. Let's say the advertised price is $15. They also want to state that a pensioner discount and a student discount is available which provides a 50% discount. A customer (i.e. service requestor) uses the cinema web site to purchase tickets online. They find the movie of their choice at a time that suits. However, its not until some context is provided by the requestor that the exact price is determined. The requestor might state that they are a pensioner. The same is applicable for a service requestor who purchases multiple tickets perhaps on behalf of other people. The disconnect between when the service is described and when a requestor provides context introduces challenges to the description process. A service provider would be ill-advised to offer independent descriptions that represent all the permutations possible for a single service. The descriptive effort would be prohibitive.
Resumo:
With the advent of Service Oriented Architecture, Web Services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. By considering the semantic relationships of words used in describing the services as well as the use of input and output parameters can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on the large quantity of text documents covering diverse areas of domain of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an allpair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase which is the system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine which is an integral part of the system integration phase makes the final recommendations including individual and composite Web services to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web services compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
Resumo:
Introduction Many bilinguals will have had the experience of unintentionally reading something in a language other than the intended one (e.g. MUG to mean mosquito in Dutch rather than a receptacle for a hot drink, as one of the possible intended English meanings), of finding themselves blocked on a word for which many alternatives suggest themselves (but, somewhat annoyingly, not in the right language), of their accent changing when stressed or tired and, occasionally, of starting to speak in a language that is not understood by those around them. These instances where lexical access appears compromised and control over language behavior is reduced hint at the intricate structure of the bilingual lexical architecture and the complexity of the processes by which knowledge is accessed and retrieved. While bilinguals might tend to blame word finding and other language problems on their bilinguality, these difficulties per se are not unique to the bilingual population. However, what is unique, and yet far more common than is appreciated by monolinguals, is the cognitive architecture that subserves bilingual language processing. With bilingualism (and multilingualism) the rule rather than the exception (Grosjean, 1982), this architecture may well be the default structure of the language processing system. As such, it is critical that we understand more fully not only how the processing of more than one language is subserved by the brain, but also how this understanding furthers our knowledge of the cognitive architecture that encapsulates the bilingual mental lexicon. The neurolinguistic approach to bilingualism focuses on determining the manner in which the two (or more) languages are stored in the brain and how they are differentially (or similarly) processed. The underlying assumption is that the acquisition of more than one language requires at the very least a change to or expansion of the existing lexicon, if not the formation of language-specific components, and this is likely to manifest in some way at the physiological level. There are many sources of information, ranging from data on bilingual aphasic patients (Paradis, 1977, 1985, 1997) to lateralization (Vaid, 1983; see Hull & Vaid, 2006, for a review), recordings of event-related potentials (ERPs) (e.g. Ardal et al., 1990; Phillips et al., 2006), and positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) studies of neurologically intact bilinguals (see Indefrey, 2006; Vaid & Hull, 2002, for reviews). Following the consideration of methodological issues and interpretative limitations that characterize these approaches, the chapter focuses on how the application of these approaches has furthered our understanding of (1) selectivity of bilingual lexical access, (2) distinctions between word types in the bilingual lexicon and (3) control processes that enable language selection.
Resumo:
The challenges of maintaining a building such as the Sydney Opera House are immense and are dependent upon a vast array of information. The value of information can be enhanced by its currency, accessibility and the ability to correlate data sets (integration of information sources). A building information model correlated to various information sources related to the facility is used as definition for a digital facility model. Such a digital facility model would give transparent and an integrated access to an array of datasets and obviously would support Facility Management processes. In order to construct such a digital facility model, two state-of-the-art Information and Communication technologies are considered: an internationally standardized building information model called the Industry Foundation Classes (IFC) and a variety of advanced communication and integration technologies often referred to as the Semantic Web such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). This paper reports on some technical aspects for developing a digital facility model focusing on Sydney Opera House. The proposed digital facility model enables IFC data to participate in an ontology driven, service-oriented software environment. A proof-of-concept prototype has been developed demonstrating the usability of IFC information to collaborate with Sydney Opera House’s specific data sources using semantic web ontologies.
Resumo:
Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.