Abstract:
The aim of this paper is to provide a comparison of various algorithms and parameters to build reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of five algorithms for constructing semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence in context information. Moreover, very small context windows resulted in better semantic vectors for the TOEFL test.
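A minimal sketch of two of the compared reduction methods, RP and SVD, applied to a toy term-by-context count matrix. The dimensions, the sparse ternary projection, and the TOEFL-style question at the end are illustrative assumptions, not the paper's TASA setup.

```python
# A sketch of RP and SVD on a toy term-by-context count matrix.
import numpy as np

rng = np.random.default_rng(0)
n_terms, n_contexts, k = 1000, 1000, 100   # illustrative sizes, not TASA's

# Toy co-occurrence counts standing in for the corpus statistics.
cooc = rng.poisson(0.05, size=(n_terms, n_contexts)).astype(float)

# Random projection: a sparse ternary matrix preserves distances in
# expectation (Achlioptas-style construction).
proj = rng.choice([-1.0, 0.0, 1.0], size=(n_contexts, k), p=[1/6, 2/3, 1/6])
vecs_rp = cooc @ (proj * np.sqrt(3.0 / k))

# SVD: keep the top-k singular directions.
u, s, vt = np.linalg.svd(cooc, full_matrices=False)
vecs_svd = u[:, :k] * s[:k]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# TOEFL-style synonym question: choose the candidate nearest the probe.
probe, candidates = 0, [10, 20, 30, 40]
best = max(candidates, key=lambda c: cosine(vecs_rp[probe], vecs_rp[c]))
```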
Abstract:
Many existing information retrieval models do not explicitly take into account information about word associations. Our approach makes use of first- and second-order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically significant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system were 0.249 and 0.192, respectively. Our results and discussion suggest that information about both syntagmatic and paradigmatic associations can assist with improving retrieval effectiveness on ad hoc retrieval.
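The two association types can be made concrete with a small sketch: syntagmatic (first-order) association scored here with pointwise mutual information, and paradigmatic (second-order) association scored as cosine similarity of context vectors. The toy corpus, window size, and PMI normalisation are assumptions for illustration, not the paper's formal model.

```python
# Syntagmatic vs paradigmatic association on a toy corpus.
import math
from collections import Counter, defaultdict

corpus = [["doctors", "treat", "patients"],
          ["nurses", "treat", "patients"],
          ["doctors", "prescribe", "drugs"]]
window = 2

word_counts, pair_counts = Counter(), Counter()
context_vecs = defaultdict(Counter)
total = 0
for sent in corpus:
    for i, w in enumerate(sent):
        word_counts[w] += 1
        total += 1
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                pair_counts[(w, sent[j])] += 1
                context_vecs[w][sent[j]] += 1

def syntagmatic(a, b):
    # First-order association: PMI with a rough token-count normalisation.
    pair = pair_counts[(a, b)]
    if pair == 0:
        return float("-inf")
    return math.log(pair * total / (word_counts[a] * word_counts[b]))

def paradigmatic(a, b):
    # Second-order association: cosine over shared-context vectors.
    va, vb = context_vecs[a], context_vecs[b]
    dot = sum(va[k] * vb[k] for k in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(syntagmatic("treat", "patients"))    # high: the words co-occur
print(paradigmatic("doctors", "nurses"))   # high: substitutable words
```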
Abstract:
Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the net together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long-lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates it in an array of related words that distract and reduce the probability of its selection. The activation of the semantic network produces priming benefits during encoding and search costs during retrieval. In extralist cuing, recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to-cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks, indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.
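One hedged way to picture how the four network measures might combine into a recall prediction is a logistic combination. The weights and functional form below are illustrative assumptions only; the abstract reports the measures' roles and signs but not a specific formula.

```python
# Illustrative logistic combination of the four network measures.
import math

def predicted_recall(cue_to_target, target_to_cue,
                     neighborhood_density, cue_to_distracter,
                     w=(1.5, 1.0, 0.8, -1.2), bias=-0.5):
    # Signs follow the abstract: recall rises with cue-to-target strength,
    # target-to-cue strength and neighborhood density, and falls with
    # cue-to-distracter strength. Weights are invented for illustration.
    z = (bias + w[0] * cue_to_target + w[1] * target_to_cue
         + w[2] * neighborhood_density + w[3] * cue_to_distracter)
    return 1.0 / (1.0 + math.exp(-z))      # squash to a probability

print(predicted_recall(0.30, 0.10, 0.25, 0.05))   # favourable cue
print(predicted_recall(0.05, 0.01, 0.02, 0.40))   # strong distracters
```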
Abstract:
Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against concept pairs judged by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approximately 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used to prime the measure, i.e., the corpus used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
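The evaluation described can be sketched as follows: score judged concept pairs with a corpus-driven measure and rank-correlate against the human judgements. The pairs, scores, and similarity function below are invented placeholders; only the correlation step reflects the evaluation design.

```python
# Rank-correlating a corpus-driven measure with human judgements.
from scipy.stats import spearmanr

# (concept 1, concept 2, mean human judgement) - invented values.
judged = [("renal failure", "kidney failure", 4.0),
          ("heart", "myocardium", 3.3),
          ("stroke", "infarct", 3.0),
          ("antibiotic", "allergy", 1.2)]

def corpus_similarity(c1, c2):
    # Placeholder for any of the eight measures, primed on a chosen corpus.
    toy = {("renal failure", "kidney failure"): 0.92,
           ("heart", "myocardium"): 0.71,
           ("stroke", "infarct"): 0.64,
           ("antibiotic", "allergy"): 0.20}
    return toy[(c1, c2)]

human = [h for _, _, h in judged]
model = [corpus_similarity(a, b) for a, b, _ in judged]
rho, _ = spearmanr(human, model)
print(f"Spearman rho = {rho:.2f}")
```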
Abstract:
Modelling how a word is activated in human memory is an important requirement for determining the probability of recall of a word in an extra-list cueing experiment. Previous research assumed a quantum-like model in which the semantic network was modelled as entangled qubits; however, the level of activation was clearly being overestimated. This paper explores three variations of this model, each of which is distinguished by a scaling factor designed to compensate for the overestimation.
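A hedged sketch of the scaling idea: if activation is read off a product of link amplitudes and squared into a probability, a scaling factor applied before squaring damps the estimate. The state construction and the three candidate scaling factors below are assumptions for illustration, not the paper's exact formulation.

```python
# Three candidate scaling factors damping a quantum-like activation.
import math

def activation(link_strengths, scale):
    # Amplitude of the joint "neighbourhood active" component, scaled
    # before squaring into a probability.
    amplitude = math.prod(math.sqrt(s) for s in link_strengths)
    return min(1.0, (scale * amplitude) ** 2)

links = [0.6, 0.5, 0.7]        # invented cue-to-neighbour link strengths
n = len(links)
for scale in (1.0, 1.0 / n, 1.0 / math.sqrt(n)):
    print(round(scale, 2), round(activation(links, scale), 3))
```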
Abstract:
A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations, which capture the likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that captures syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
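A minimal sketch of the expansion step this article describes: rank candidate terms by a weighted mix of syntagmatic and paradigmatic association with the query, then append the top-ranked terms. The scoring stubs and the mixing weight `alpha` are illustrative placeholders for the corpus-based model of word meaning.

```python
# Ranking expansion terms by mixed syntagmatic/paradigmatic association.
def expand_query(query_terms, candidates, syntagmatic, paradigmatic,
                 alpha=0.5, k=5):
    def score(term):
        syn = sum(syntagmatic(q, term) for q in query_terms)
        par = sum(paradigmatic(q, term) for q in query_terms)
        return alpha * syn + (1.0 - alpha) * par   # mix the two signals
    return query_terms + sorted(candidates, key=score, reverse=True)[:k]

# Stub association functions standing in for the corpus-based model.
def syn_stub(q, t):
    return {("kidney", "renal"): 0.20}.get((q, t), 0.05)

def par_stub(q, t):
    return {("kidney", "nephric"): 0.40}.get((q, t), 0.02)

print(expand_query(["kidney"], ["renal", "nephric", "guitar"],
                   syn_stub, par_stub, k=2))
```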
Abstract:
Many successful query expansion techniques ignore information about the term dependencies that exist within natural language. However, researchers have recently demonstrated that consistent and significant improvements in retrieval effectiveness can be achieved by explicitly modelling term dependencies within the query expansion process. This has created an increased interest in dependency-based models. State-of-the-art dependency-based approaches primarily model term associations known within structural linguistics as syntagmatic associations, which are formed when terms co-occur together more often than by chance. However, structural linguistics proposes that the meaning of a word is also dependent on its paradigmatic associations, which are formed between words that can substitute for each other without affecting the acceptability of a sentence. Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process, based on the (pseudo) relevant documents returned in web search. The results demonstrate that this approach can provide significant improvements in web retrieval effectiveness when compared to a strong benchmark retrieval system.
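The pseudo-relevance-feedback setting can be sketched as below: expansion evidence is mined only from the top-ranked documents of an initial retrieval run. The retrieval function is a stub, and raw term counts stand in for the association model; in the paper's setting a web search engine supplies the pseudo-relevant documents and the syntagmatic/paradigmatic model picks the terms.

```python
# Mining expansion terms from pseudo-relevant documents.
from collections import Counter

def prf_expand(query, retrieve, n_docs=10, n_terms=5):
    top_docs = retrieve(query)[:n_docs]      # initial, pseudo-relevant run
    counts = Counter(w for doc in top_docs for w in doc.split())
    for q in query.split():                  # drop the original query terms
        counts.pop(q, None)
    return query + " " + " ".join(w for w, _ in counts.most_common(n_terms))

# Stub retrieval returning canned documents for illustration.
docs = ["colorectal cancer screening bowel", "bowel cancer survival rates"]
print(prf_expand("cancer", lambda q: docs, n_terms=2))
```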
Abstract:
Plant growth can be limited by resource acquisition and defence against consumers, leading to contrasting trade-off possibilities. The competition-defence hypothesis posits a trade-off between competitive ability and defence against enemies (e.g. herbivores and pathogens). The growth-defence hypothesis suggests that strong competitors for nutrients are also defended against enemies, at a cost to growth rate. We tested these hypotheses using observations of 706 plant populations of over 500 species before and following identical fertilisation and fencing treatments at 39 grassland sites worldwide. Strong positive covariance in species responses to both treatments provided support for a growth-defence trade-off: populations that increased with the removal of nutrient limitation (poor competitors) also increased following removal of consumers. This result held globally across 4 years within plant life-history groups and within the majority of individual sites. Thus, a growth-defence trade-off appears to be the norm, and mechanisms maintaining grassland biodiversity may operate within this constraint.
Abstract:
The article focuses on how the information seeker makes decisions about relevance. It employs a novel decision theory based on quantum probabilities. This direction derives from mounting research within the field of cognitive science showing that decision theory based on quantum probabilities is superior to standard probability models for modelling human judgements [2, 1]. By quantum probabilities, we mean that the decision event space is modelled as a vector space rather than the usual Boolean algebra of sets. In this way, incompatible perspectives around a decision can be modelled, leading to an interference term which modifies the law of total probability. The interference term is crucial in modifying the probability judgements made by current probabilistic systems so they align better with human judgement. The goal of this article is thus to model the information seeker as a decision maker. For this purpose, signal detection models are sketched which are in principle applicable in a wide variety of information seeking scenarios.
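The interference term can be shown with a few lines of arithmetic: classically P(A) = P(A|B)P(B) + P(A|¬B)P(¬B), while squaring summed amplitudes in a vector-space model adds a phase-dependent cross term. The probabilities and phases below are illustrative.

```python
# The interference term added to the law of total probability.
import math

p_b = 0.5
p_a_given_b, p_a_given_not_b = 0.6, 0.3

# Classical law of total probability.
classical = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Quantum-like version: the two paths to A carry amplitudes, and their
# relative phase theta controls the interference cross term.
amp1 = math.sqrt(p_a_given_b * p_b)
amp2 = math.sqrt(p_a_given_not_b * (1 - p_b))
for theta in (0.0, math.pi / 2, math.pi):
    interference = 2 * amp1 * amp2 * math.cos(theta)
    print(f"theta={theta:.2f}  P(A)={classical + interference:.3f}")
# theta = pi/2 recovers the classical value; other phases shift P(A),
# which is how the model accommodates judgements that violate the
# classical law.
```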
Abstract:
The overall aim of our research was to characterize airborne particles from selected nanotechnology processes and to utilize the data to develop and test quantitative particle concentration-based criteria that can be used to trigger an assessment of particle emission controls. We investigated particle number concentration (PNC), particle mass (PM) concentration, count median diameter (CMD), alveolar deposited surface area, elemental composition, and morphology from sampling of aerosols arising from six nanotechnology processes. These included fibrous and non-fibrous particles, including carbon nanotubes (CNTs). We adopted standard occupational hygiene principles in relation to controlling peak emissions and exposures, as outlined by both Safe Work Australia (1) and the American Conference of Governmental Industrial Hygienists (ACGIH®). (2) Analysis of peak (highest value recorded) and 30-minute averaged particle number and mass concentration values measured during operation of the nanotechnology processes revealed that peak PNC20–1000 nm emitted from the nanotechnology processes was up to three orders of magnitude greater than the local background particle concentration (LBPC), peak PNC300–3000 nm was up to an order of magnitude greater, and PM2.5 concentrations were up to four orders of magnitude greater. For three of these nanotechnology processes, the 30-minute averaged particle number and mass concentrations were also significantly different from the LBPC (p-value < 0.001). We propose that emission or exposure controls may need to be implemented or modified, or further assessment of the controls undertaken, if concentrations exceed three times the LBPC, which is also used as the local particle reference value, for more than a total of 30 minutes during a workday, and/or if a single short-term measurement exceeds five times the local particle reference value. The use of these quantitative criteria, which we term the universal excursion guidance criteria, will account for the typical variation in LBPC and the inaccuracy of instruments, while remaining precautionary enough to highlight peaks in particle concentration likely to be associated with particle emission from the nanotechnology process. Recommendations on when to utilize local excursion guidance criteria are also provided.
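The proposed universal excursion guidance criteria translate directly into a small decision rule. The function below is a sketch of the criteria as stated in the abstract, with invented parameter names and readings.

```python
# The excursion criteria as a decision rule.
def needs_assessment(short_term_readings, avg_30min_windows, reference_value):
    # Trigger 1: any single short-term reading above 5x the reference value.
    peak_trigger = any(r > 5 * reference_value for r in short_term_readings)
    # Trigger 2: more than 30 minutes in total of 30-minute averages
    # above 3x the reference value during the workday.
    minutes_over = 30 * sum(w > 3 * reference_value for w in avg_30min_windows)
    return peak_trigger or minutes_over > 30

# Illustrative workday: reference value of 10,000 particles/cm^3.
print(needs_assessment(short_term_readings=[12_000, 48_000],
                       avg_30min_windows=[9_000, 31_000, 33_000],
                       reference_value=10_000))   # True: 60 min above 3x
```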
Abstract:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been demonstrated extensively in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks such as predicting future disruptive innovations. In this paper we perform a computational complexity analysis of four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
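The efficiency argument can be illustrated with random indexing, a representative fixed-dimension distributional model: each co-occurrence updates a term vector in O(d) time, and memory stays linear in the vocabulary however large the document stream. The dimensions and toy sentences below are illustrative assumptions; the paper analyses four specific models, for which this stands in as the fixed-dimension family.

```python
# Random indexing: O(d) per co-occurrence, memory linear in vocabulary.
import numpy as np

rng = np.random.default_rng(0)
d = 300                       # fixed dimension, independent of corpus size
index_vecs, term_vecs = {}, {}

def index_vector(term):
    # Sparse ternary index vector, created once per term.
    if term not in index_vecs:
        v = np.zeros(d)
        pos = rng.choice(d, size=10, replace=False)
        v[pos] = rng.choice([-1.0, 1.0], size=10)
        index_vecs[term] = v
    return index_vecs[term]

def update(term, context_term):
    # One co-occurrence costs a single d-length vector addition.
    term_vecs.setdefault(term, np.zeros(d))
    term_vecs[term] += index_vector(context_term)

for sent in [["fish", "oil", "raynaud"], ["fish", "oil", "blood"]]:
    for i, w in enumerate(sent):
        for c in sent[:i] + sent[i + 1:]:
            update(w, c)
```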
Abstract:
This article examines manual textual categorisation by human coders, with the hypothesis that the law of total probability may be violated for difficult categories. An empirical evaluation was conducted to compare a one-step categorisation task with a two-step categorisation task using crowdsourcing. It was found that the law of total probability was violated. Both quantum and classical probabilistic interpretations of this violation are presented. Further studies are required to resolve whether quantum models are more appropriate for this task.
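The violation being tested reduces to one line of arithmetic: the one-step task estimates P(A) directly, while the two-step task estimates the decomposition P(A|B)P(B) + P(A|¬B)P(¬B). The numbers below are illustrative, not the crowdsourcing results.

```python
# Checking the law of total probability across the two task designs.
def total_probability_gap(p_a_one_step, p_b, p_a_given_b, p_a_given_not_b):
    # Two-step estimate of P(A) from the decomposed judgements.
    two_step = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a_one_step - two_step       # nonzero gap = violation

# e.g. 62% assign the category directly, but the decomposed judgements
# predict only 50% - the kind of gap the article attributes to
# difficult categories.
print(total_probability_gap(0.62, 0.50, 0.70, 0.30))
```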
Abstract:
Background: To explore the impact of geographical remoteness and area-level socioeconomic disadvantage on colorectal cancer (CRC) survival. Methods: Multilevel logistic regression and Markov chain Monte Carlo simulations were used to analyze geographical variations in five-year all-cause and CRC-specific survival across 478 regions in Queensland, Australia, for 22,727 CRC cases aged 20–84 years diagnosed during 1997–2007. Results: Area-level disadvantage and geographic remoteness were independently associated with CRC survival. After full multivariate adjustment (both levels), patients from remote areas (odds ratio [OR] 1.24, 95% CrI 1.07–1.42) and more disadvantaged quintiles (OR = 1.12, 1.15, 1.20 and 1.23 for quintiles 4, 3, 2 and 1, respectively) had lower CRC-specific survival than patients from major cities and the least disadvantaged areas. Similar associations were found for all-cause survival. Area disadvantage accounted for a substantial amount of the between-area variation in all-cause survival. Conclusions: We have demonstrated that the area-level inequalities in survival of colorectal cancer patients cannot be explained by the measured individual-level characteristics of the patients or their cancer and remain after adjusting for cancer stage. Further research is urgently needed to clarify the factors that underlie the survival differences, including the importance of geographical differences in the clinical management of CRC.
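A hedged sketch of the model class named in the Methods: a random-intercept (two-level) logistic regression with patients nested in areas, fitted here with a deliberately simple Metropolis sampler on simulated data. The covariate, priors, and fixed random-effect scale are illustrative assumptions, not the Queensland analysis.

```python
# A toy two-level (random-intercept) logistic model via Metropolis MCMC.
import numpy as np

rng = np.random.default_rng(1)
n_areas, n_per_area = 20, 50
area = np.repeat(np.arange(n_areas), n_per_area)

# Simulated data: one area-level covariate and area random intercepts.
disadvantage = rng.normal(size=n_areas)[area]
u_true = rng.normal(0.0, 0.5, size=n_areas)[area]
eta_true = -0.3 * disadvantage + u_true
y = rng.random(eta_true.size) < 1.0 / (1.0 + np.exp(-eta_true))

def log_post(beta, u, u_sd=0.5):   # random-effect scale fixed for simplicity
    eta = beta * disadvantage + u[area]
    log_lik = np.sum(y * eta - np.log1p(np.exp(eta)))
    log_prior = -0.5 * beta ** 2 - 0.5 * np.sum((u / u_sd) ** 2)
    return log_lik + log_prior

beta, u = 0.0, np.zeros(n_areas)
current = log_post(beta, u)
for _ in range(5000):              # joint random-walk Metropolis updates
    beta_new = beta + rng.normal(0.0, 0.05)
    u_new = u + rng.normal(0.0, 0.05, size=n_areas)
    proposed = log_post(beta_new, u_new)
    if np.log(rng.random()) < proposed - current:
        beta, u, current = beta_new, u_new, proposed
print("posterior draw of the disadvantage effect:", round(beta, 2))
```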