945 results for Semantic enrichment
Abstract:
Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities or information about specific entities instead of documents. In this paper, we study the problem of finding entity-related information, referred to as attribute-value pairs, which plays a significant role in searching for target entities. We propose a novel decomposition framework combining reduced relations with a discriminative model, the Conditional Random Field (CRF), for automatically finding entity-related attribute-value pairs in free-text documents. This decomposition framework allows us to locate potential text fragments and identify the hidden semantics, in the form of attribute-value pairs, for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches owing to its ability to integrate syntactic and semantic features effectively.
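As a rough illustration of the kind of sequence-labelling component described above, the sketch below tags attribute and value spans in free text with a linear-chain CRF via the sklearn-crfsuite package. The BIO label scheme, the feature set, and the toy sentence are assumptions for illustration only, not the paper's actual feature design or data.

```python
# Minimal sketch: tagging attribute-value pairs with a linear-chain CRF (sklearn-crfsuite).
# Label scheme (B-ATTR / I-ATTR / B-VAL / I-VAL / O) and features are illustrative assumptions.
import sklearn_crfsuite

def token_features(sent, i):
    """Simple per-token features: surface form, shape, and neighbouring words."""
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_digit": word.isdigit(),
        "is_title": word.istitle(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training data: one tokenised sentence with BIO labels for attribute and value spans.
sents = [["The", "camera", "resolution", "is", "12", "megapixels"]]
labels = [["O", "O", "B-ATTR", "O", "B-VAL", "I-VAL"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # predicted BIO tags per token
```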
Abstract:
Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the network together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long-lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates the target within an array of related words that distract from it and reduce the probability of its selection. The activation of the semantic network produces priming benefits during encoding and search costs during retrieval. In extralist cuing, recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to-cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks, indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.
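The four network measures mentioned above (cue-to-target strength, target-to-cue strength, cue-to-distracter strength, and neighbourhood density) can be made concrete with a small sketch. The toy free-association table and the simplified measure definitions below are illustrative assumptions, not the norms or exact formulas used in the reviewed studies.

```python
# Toy free-association table: cue -> {response: forward association probability}.
assoc = {
    "bread":  {"butter": 0.45, "toast": 0.20, "jam": 0.10},
    "butter": {"bread": 0.50, "toast": 0.15},
    "toast":  {"bread": 0.30, "butter": 0.25, "jam": 0.05},
    "jam":    {"toast": 0.20, "bread": 0.10},
}

def strength(a, b):
    """Forward association strength from word a to word b (0 if never produced)."""
    return assoc.get(a, {}).get(b, 0.0)

def neighbourhood_density(word):
    """Proportion of possible links among the word's associates that actually occur."""
    neighbours = list(assoc.get(word, {}))
    pairs = [(a, b) for a in neighbours for b in neighbours if a != b]
    return sum(1 for a, b in pairs if strength(a, b) > 0) / len(pairs) if pairs else 0.0

def cue_to_distracter(cue, target):
    """Mean strength from the cue to the target's other associates (the distracters)."""
    distracters = [w for w in assoc.get(target, {}) if w != cue]
    return sum(strength(cue, w) for w in distracters) / len(distracters) if distracters else 0.0

cue, target = "toast", "bread"
print("cue-to-target strength:   ", strength(cue, target))
print("target-to-cue strength:   ", strength(target, cue))
print("cue-to-distracter strength:", cue_to_distracter(cue, target))
print("target neighbourhood density:", neighbourhood_density(target))
```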
Abstract:
Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To address these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on a study of the ontological structure. The proposed approach was evaluated with promising results against typical machine learning methods, including SVMs, Rocchio, and kNN.
Abstract:
Modelling how a word is activated in human memory is an important requirement for determining the probability of recall of a word in an extra-list cueing experiment. Previous research assumed a quantum-like model in which the semantic network was modelled as entangled qubits; however, the level of activation was clearly being overestimated. This paper explores three variations of this model, each of which is distinguished by a scaling factor designed to compensate for the overestimation.
Abstract:
In this paper we propose a method to generate a large-scale and accurate dense 3D semantic map of street scenes. A dense 3D semantic model of the environment can significantly improve a number of robotic applications such as autonomous driving, navigation, or localisation. Instead of using offline-trained classifiers for semantic segmentation, our approach employs a data-driven, nonparametric method to parse scenes, which easily scales to large environments and generalises to different scenes. We use stereo image pairs collected from cameras mounted on a moving car to produce dense depth maps, which are combined into a global 3D reconstruction using camera poses from stereo visual odometry. Simultaneously, 2D automatic semantic segmentation using a nonparametric scene parsing method is fused into the 3D model. Furthermore, the resulting 3D semantic model is improved by taking moving objects in the scene into account. We demonstrate our method on the publicly available KITTI dataset and evaluate the performance against manually generated ground truth.
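A much-simplified sketch of the fusion step described above: per-pixel semantic labels are attached to 3D points obtained by back-projecting a depth map through assumed camera intrinsics and transforming with a camera pose. The intrinsics, pose, depth, and labels below are toy placeholders; the actual pipeline (stereo matching, visual odometry, moving-object handling) is not reproduced here.

```python
# Minimal sketch: fuse 2D semantic labels into a 3D point cloud via depth back-projection.
import numpy as np

fx = fy = 500.0          # assumed focal lengths (pixels)
cx, cy = 320.0, 240.0    # assumed principal point
H, W = 480, 640

depth = np.full((H, W), 10.0)              # toy depth map (metres)
labels = np.zeros((H, W), dtype=np.int32)  # toy per-pixel semantic labels
labels[H // 2:, :] = 1                     # e.g. 0 = "building", 1 = "road"

pose = np.eye(4)                           # camera-to-world transform (from visual odometry)

u, v = np.meshgrid(np.arange(W), np.arange(H))
z = depth
x = (u - cx) * z / fx
y = (v - cy) * z / fy
pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
pts_world = (pose @ pts_cam.T).T[:, :3]

semantic_cloud = np.hstack([pts_world, labels.reshape(-1, 1)])
print(semantic_cloud.shape)  # (H*W, 4): x, y, z, label
```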
Abstract:
Text categorisation is challenging due to the complex structure of documents, with heterogeneous and changing topics. The performance of text categorisation relies on the quality of samples, the effectiveness of document features, and the topic coverage of categories, depending on the strategies employed: supervised or unsupervised, single-labelled or multi-labelled. To deal with these reliability issues in text categorisation, we propose an unsupervised multi-labelled text categorisation approach that maps the local knowledge in documents to global knowledge in a world ontology to optimise the categorisation result. The conceptual framework of the approach consists of three modules: pattern mining for feature extraction, feature-subject mapping for categorisation, and concept generalisation for optimised categorisation. The approach has been evaluated with promising results against typical text categorisation methods, based on ground truth encoded by human experts.
Abstract:
Chinese modal particles feature prominently in Chinese people’s daily use of the language, but their pragmatic and semantic functions are elusive, as commonly recognised by Chinese linguists and teachers of Chinese as a foreign language. This book originates from an extensive and intensive empirical study of the Chinese modal particle a (啊), one of the most frequently used modal particles in Mandarin Chinese. In order to capture all the uses and underlying meanings of the particle, the author transcribed the first 20 episodes, about 20 hours in length, of the popular Chinese TV drama series Kewang ‘Expectations’, which yielded corpus data of more than 142,000 Chinese characters with a total of 1,829 instances of the particle, all used in meaningful communicative situations. Within its context of use, every single occurrence of the particle was analysed in terms of its pragmatic and semantic contributions to the hosting utterance. On this basis, the core meanings were identified that were seen as constituting the modal nature of the particle.
Abstract:
Sediment samples from 13 sampling sites in Deception Bay, Australia were analysed for the presence of heavy metals. Enrichment factors, modified contamination indices, and Nemerow pollution indices were calculated for each sampling site to determine sediment quality. The results indicate significant pollution of most sites by lead (average enrichment factor (EF) of 13), but there is also enrichment of arsenic (average EF 2.3), zinc (average EF 2.7), and other heavy metals. The modified degree of contamination indices (average 1.0) suggest that there is little contamination. By contrast, the Nemerow pollution index (average 5.8) suggests that Deception Bay is heavily contaminated. Cluster analysis was undertaken to identify groups of elements; strong correlations between some elements were evident, as were two distinct clusters of sampling sites based on sediment type. These results have implications for pollution in complex marine environments where there is significant influx of sand and sediment into an estuarine environment.
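For reference, the sketch below shows how an enrichment factor and a Nemerow pollution index are commonly computed. The background concentrations, measured values, and the choice of aluminium as the normalising reference element are placeholder assumptions, not the study's data.

```python
# Minimal sketch: enrichment factor (EF) and Nemerow pollution index from toy values.
import math

background = {"Pb": 20.0, "Zn": 95.0, "As": 1.8, "Al": 80000.0}    # assumed background (mg/kg)
sample     = {"Pb": 260.0, "Zn": 250.0, "As": 4.2, "Al": 78000.0}  # toy measured values (mg/kg)

def enrichment_factor(metal, ref="Al"):
    """EF = (C_metal / C_ref)_sample / (C_metal / C_ref)_background."""
    return (sample[metal] / sample[ref]) / (background[metal] / background[ref])

def nemerow_index(metals):
    """Nemerow PI = sqrt((mean(PI)^2 + max(PI)^2) / 2), with PI_i = C_i / C_background_i."""
    pi = [sample[m] / background[m] for m in metals]
    mean_pi, max_pi = sum(pi) / len(pi), max(pi)
    return math.sqrt((mean_pi ** 2 + max_pi ** 2) / 2)

for m in ("Pb", "Zn", "As"):
    print(m, "EF =", round(enrichment_factor(m), 2))
print("Nemerow index =", round(nemerow_index(["Pb", "Zn", "As"]), 2))
```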
Abstract:
Determining similarity between business process models has recently gained interest in the business process management community. So far, similarity has been addressed separately at either the semantic or the structural aspect of process models. Moreover, most contributions that measure similarity of process models assume an ideal case in which process models are enriched with semantics, that is, a description of the meaning of process model elements. In real life, however, this requires a pre-processing phase that consumes heavy human effort and is often not feasible. In this paper we propose an automated approach for querying a business process model repository for structurally and semantically relevant models. Similar to search on the Internet, a user formulates a BPMN-Q query and receives as a result a list of process models ordered by relevance to the query. We provide a business process model search engine implementation for evaluation of the proposed approach.
Abstract:
We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts to link entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so as to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose the CoRE method, which uses context terms learned from the text surrounding the expression of a relation as the auxiliary information in formulating queries. The experimental results, based on several real-world web data collections, demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.
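A toy sketch of the query-formulation contrast described above: both PaRE-style patterns and CoRE-style context terms serve as auxiliary information appended to the query entity. The relation, entity, term lists, and the commented-out web_search call are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch: building search queries for a query entity from auxiliary information.
def formulate_query(entity, auxiliary_terms):
    """Combine the query entity with auxiliary terms into a single keyword query."""
    return " ".join([f'"{entity}"'] + auxiliary_terms)

# Toy relation: company -> headquarters city
alpha = "Acme Corporation"
pattern_terms = ['"is headquartered in"']                    # PaRE-style: a rigid extracted pattern
context_terms = ["headquarters", "offices", "based", "city"] # CoRE-style: terms seen around the relation

pare_query = formulate_query(alpha, pattern_terms)
core_query = formulate_query(alpha, context_terms)
print(pare_query)  # "Acme Corporation" "is headquartered in"
print(core_query)  # "Acme Corporation" headquarters offices based city

# docs = web_search(core_query)  # hypothetical retrieval call; the target entity beta
#                                # would then be detected among the returned documents
```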
Abstract:
Over 80% of women diagnosed with advanced-stage ovarian cancer die as a result of disease recurrence due to failure of chemotherapy treatment. In this study, using two distinct ovarian cancer cell lines (epithelial OVCA 433 and mesenchymal HEY), we demonstrate enrichment of a population of cells with high expression of cancer stem cell (CSC) markers at the protein and mRNA levels in response to cisplatin, paclitaxel, and the combination of both. We also demonstrate a significant enhancement in the sphere-forming abilities of ovarian cancer cells in response to chemotherapy drugs. These in vitro findings are supported by in vivo mouse xenograft models, in which intraperitoneal transplantation of cisplatin- or paclitaxel-treated residual HEY cells generated a significantly higher tumor burden compared to control untreated cells. Both the treated and untreated cells infiltrated the organs of the abdominal cavity. In addition, immunohistochemical studies on mouse tumors generated from cisplatin- or paclitaxel-treated residual cells displayed higher staining for the proliferative antigen Ki67, oncogenic CA125, and epithelial E-cadherin, as well as CSC markers such as Oct4 and CD117, compared to tumors from mice injected with control untreated cells. These results suggest that a short-term single treatment of chemotherapy leaves residual cells that are enriched in CSC-like traits, resulting in increased metastatic potential. The novel findings in this study are important in understanding the early molecular mechanisms by which chemoresistance and subsequent relapse may be triggered after the first line of chemotherapy treatment.
Abstract:
This thesis developed new search engine models that elicit the meaning behind the words found in documents and queries, rather than simply matching keywords. These new models were applied to searching medical records: an area where search is particularly challenging yet can have significant benefits to our society.
Abstract:
Semantic space models of word meaning derived from co-occurrence statistics within a corpus of documents, such as the Hyperspace Analogue to Language (HAL) model, have been proposed in the past. While word similarity can be computed using these models, it is not clear how semantic spaces derived from different sets of documents can be compared. In this paper, we focus on this problem and revisit the proposal of using semantic subspace distance measurements [1]. In particular, we outline the research questions that still need to be addressed to investigate and validate these distance measures. Then, we describe our plans for future research.
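As background, the sketch below builds a HAL-style semantic space: a word-by-word co-occurrence matrix accumulated over a sliding window, with closer words weighted more heavily, from which word similarity can be computed as the cosine between row vectors. The window size, the linear distance weighting, and the backward-only window are simplifying assumptions, not necessarily the choices of the cited work.

```python
# Minimal sketch: a HAL-style co-occurrence matrix and cosine word similarity.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

window = 4
M = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for d in range(1, window + 1):
        if i - d >= 0:
            M[index[w], index[corpus[i - d]]] += window - d + 1  # nearer words weigh more

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(cosine(M[index["cat"]], M[index["dog"]]))  # similarity of the two row vectors
```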
Abstract:
Semantic Space models, which provide a numerical representation of word meaning extracted from a corpus of documents, have been formalized in terms of Hermitian operators over real-valued Hilbert spaces by Bruza et al. [1]. The collapse of a word into a particular meaning has been investigated by applying the notion of quantum collapse of superpositional states [2]. While the semantic association between words in a Semantic Space can be computed by means of the Minkowski distance [3] or the cosine of the angle between the vector representations of each pair of words, a new procedure is needed in order to establish relations between two or more Semantic Spaces. We address the question: how can the distance between different Semantic Spaces be computed? By representing each Semantic Space as a subspace of a more general Hilbert space, the relationship between Semantic Spaces can be computed by means of the subspace distance. Such a distance needs to take into account the difference in dimensions between subspaces. The availability of a distance for comparing different Semantic Subspaces would make it possible to achieve a deeper understanding of the geometry of Semantic Spaces, which could translate into better effectiveness in Information Retrieval tasks.
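One concrete way to realise a subspace distance of the kind discussed above is via principal angles: orthonormalise the basis vectors of each semantic space embedded in a common ambient space and take the angles from the singular values of the product of the bases. The Grassmann-style distance in the sketch below is an illustrative choice, not necessarily the specific measure the authors have in mind, and differing subspace dimensions are handled only implicitly by the SVD.

```python
# Minimal sketch: distance between two semantic subspaces via principal angles.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))   # basis vectors of semantic space 1 (50-dim ambient space)
B = rng.standard_normal((50, 7))   # basis vectors of semantic space 2

Qa, _ = np.linalg.qr(A)            # orthonormal bases of each subspace
Qb, _ = np.linalg.qr(B)

sigma = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
angles = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles between the subspaces

grassmann_distance = float(np.sqrt(np.sum(angles ** 2)))
print(grassmann_distance)
```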
Abstract:
The localisation of previously unseen and unregistered individuals in complex camera networks from semantic descriptions is a time-consuming and often inaccurate process carried out by human operators or security staff on the ground. To promote the development and evaluation of automated semantic-description-based localisation systems, we present a new, publicly available, unconstrained 110-sequence database collected from 6 stationary cameras. Each sequence contains detailed semantic information for a single search subject who appears in the clip (gender, age, height, build, hair and skin colour, clothing type, texture and colour), and between 21 and 290 frames of each clip are annotated with the target subject's location (over 11,000 frames are annotated in total). A novel approach for localising a person given a semantic query is also proposed and demonstrated on this database. The proposed approach incorporates clothing colour and type (for clothing worn below the waist), as well as height and build, to detect people. A method to assess the quality of candidate regions, as well as a symmetry-driven approach to aid in modelling clothing on the lower half of the body, is proposed within this approach. An evaluation on the proposed dataset shows that a relative improvement in localisation accuracy of up to 21% is achieved over the baseline technique.