144 results for Semantic fields
Abstract:
The aim of this paper is to provide a comparison of various algorithms and parameters to build reduced semantic spaces. The effect of dimension reduction, the stability of the representation, and the effect of word order are examined in the context of five algorithms for building semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations, and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
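As an illustration of the random-projection technique compared here, a minimal sketch follows; the vocabulary size, target dimensionality, and toy co-occurrence counts are placeholder assumptions, not the paper's TASA settings.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder sizes: 5,000 context words, vectors reduced to 300 dimensions.
context_size, reduced_dim = 5_000, 300

# Sparse ternary random projection matrix with entries in {-1, 0, +1}
# (an Achlioptas-style construction commonly used for random indexing).
projection = rng.choice([-1.0, 0.0, 1.0],
                        size=(context_size, reduced_dim),
                        p=[1 / 6, 2 / 3, 1 / 6]) * np.sqrt(3)

# Toy stand-in for a word-by-context co-occurrence count matrix
# (1,000 target words against the 5,000 context words).
cooccurrence = rng.poisson(0.01, size=(1_000, context_size))

# Each word's reduced semantic vector is its co-occurrence row
# projected into the lower-dimensional space.
semantic_vectors = (cooccurrence @ projection) / np.sqrt(reduced_dim)

def cosine(u, v):
    """Similarity used to pick the closest synonym candidate."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
```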
Abstract:
Studies of Heritage Language learners' commitment and their ethnic identity are increasing, yet there is scant sociological research addressing topics relating to Chinese Heritage Language learners. Drawing on Bourdieu's signature notions of 'habitus', 'capital', and 'field', this mixed methods study investigates two problems: (1) impacts of "Chineseness" and accessible resources on Chinese Heritage Language proficiency of young Chinese Australian adults in urban Australia; and (2) the meanings of Chinese Heritage Language to these young people.
Abstract:
Entity-oriented search has become an essential component of modern search engines. It focuses on retrieving a list of entities, or information about specific entities, instead of documents. In this paper, we study the problem of finding entity-related information, referred to as attribute-value pairs, which plays a significant role in searching for target entities. We propose a novel decomposition framework combining reduced relations with a discriminative model, the Conditional Random Field (CRF), for automatically finding entity-related attribute-value pairs in free-text documents. This decomposition framework allows us to locate potential text fragments and identify their hidden semantics, in the form of attribute-value pairs, for user queries. Empirical analysis shows that the decomposition framework outperforms pattern-based approaches owing to its effective integration of syntactic and semantic features.
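The CRF sequence-labelling step can be sketched generically as below; this is not the paper's decomposition framework, and the feature set, labels, and sklearn-crfsuite toolkit are illustrative assumptions.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(sent, i):
    """Simple lexical/syntactic features for one token; the paper's
    actual feature set is richer and tied to its reduced relations."""
    word = sent[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Toy training fragment: labels mark attribute (ATTR) and value (VAL) spans.
sents = [["Canon", "EOS", "weight", "is", "580", "grams"]]
labels = [["O", "O", "B-ATTR", "O", "B-VAL", "I-VAL"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```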
Abstract:
Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the net together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long-lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates the target within an array of related words that distract and reduce the probability of its selection. The activation of the semantic network thus produces priming benefits during encoding and search costs during retrieval. In extralist cuing, recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to-cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks, indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading-activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.
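As a purely hypothetical illustration of combining the four network measures, one could imagine a logistic score like the following; the abstract states only the signs of the effects, so the functional form, weights, and bias here are invented for illustration.

```python
import math

def recall_probability(cue_to_target, target_to_cue,
                       neighborhood_density, cue_to_distracter,
                       weights=(1.0, 1.0, 1.0, -1.0), bias=0.0):
    """Hypothetical logistic combination of the four network measures.
    The signs follow the abstract (cue-to-distracter strength hurts
    recall; the other three help); everything else is invented."""
    score = (weights[0] * cue_to_target
             + weights[1] * target_to_cue
             + weights[2] * neighborhood_density
             + weights[3] * cue_to_distracter
             + bias)
    return 1.0 / (1.0 + math.exp(-score))
```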
Abstract:
Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck for finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on a study of the ontological structure. The proposed approach was evaluated promisingly by comparison with typical machine learning methods including SVMs, Rocchio, and kNN.
Abstract:
Modelling how a word is activated in human memory is an important requirement for determining the probability of recall of a word in an extra-list cueing experiment. Previous research assumed a quantum-like model in which the semantic network was modelled as entangled qubits; however, the level of activation was clearly being over-estimated. This paper explores three variations of this model, each of which is distinguished by a scaling factor designed to compensate for the overestimation.
Abstract:
Topic modeling has been widely utilized in the fields of information retrieval, text mining, and text classification. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This has two main shortcomings: firstly, popular or common words occur very often across different topics, which makes topics ambiguous and hard to understand; secondly, single words lack the coherent semantic meaning needed to accurately represent topics. To overcome these problems, in this paper we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.
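For context on the shortcoming described, the following sketch shows the standard single-word topic representation that LDA yields; scikit-learn is an assumed toolkit, the toy corpus is illustrative, and the paper's two-stage pattern-based model is not reproduced here.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # toy corpus; any plain-text collection works here
    "semantic topic models represent documents",
    "patterns of words improve topic representations",
    "mining frequent patterns from text collections",
]

counts = CountVectorizer(stop_words="english")
X = counts.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# The term-based representation the abstract criticises: each topic is
# shown as its highest-probability single words.
terms = counts.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```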
Abstract:
Introduction: There is a recognised relationship between dry weather conditions and increased risk of anterior cruciate ligament (ACL) injury. Previous studies have identified 28-day evaporation as an important weather-based predictor of non-contact ACL injuries in professional Australian Football League matches. The mechanism of non-contact injury to the ACL is believed to be increased traction and impact forces between footwear and playing surface. Ground hardness and the amount and quality of grass are factors that would most likely influence this and are, in turn, related to soil moisture content and prevailing weather conditions. This paper explores the relationship between soil moisture content, preceding weather conditions, and the Clegg Soil Impact Test (CSIT), an internationally recognised standard measure of ground hardness for sports fields. Methodology: A 2.25 kg Clegg Soil Impact Test and a pair of 12 cm soil moisture probes were used to measure ground hardness and percentage moisture content. Five football fields were surveyed at 13 prescribed sites just before seven football matches from October 2008 to January 2009 (matches of an FC Women's W-League team). Weather conditions recorded at the nearest weather station were obtained from the Bureau of Meteorology website, and total rainfall less evaporation was calculated for 7 and 28 days prior to each match. All non-contact injuries occurring during match play, and their location on the field, were recorded. Results/conclusions: Ground hardness varied between CSIT 5 and 17 (×10 G; 8 is considered a good value for sports fields). Variations within fields were typically greatest in the centre and goal areas. Soil moisture ranged from 3 to 40%, with some fields requiring twice the moisture content of others to maintain similar CSIT values. There was a non-linear, negative relationship between ground hardness and moisture content and a linear relationship with weather (R² of 0.30 and 0.34, respectively). Three non-contact ACL injuries occurred during the season; two of these were associated with hard and variable ground conditions.
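The reported non-linear hardness-moisture relationship could be fitted, for illustration, with a decaying exponential; the readings and functional form below are hypothetical stand-ins, since the abstract does not give the study's data or model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical readings standing in for the surveyed data: percentage
# soil moisture against Clegg Soil Impact Test hardness (x10 G).
moisture = np.array([3, 8, 12, 18, 25, 32, 40], dtype=float)
csit = np.array([17, 14, 11, 9, 7, 6, 5], dtype=float)

def hardness_model(m, a, b, c):
    """Decaying-exponential form for the non-linear, negative
    hardness-moisture relationship; the study's actual functional
    form is not given in the abstract."""
    return a * np.exp(-b * m) + c

params, _ = curve_fit(hardness_model, moisture, csit, p0=(15, 0.1, 5))
predicted = hardness_model(moisture, *params)
ss_res = np.sum((csit - predicted) ** 2)
ss_tot = np.sum((csit - csit.mean()) ** 2)
print("fitted params:", params, "R^2:", 1 - ss_res / ss_tot)
```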
Abstract:
In this paper we propose a method to generate a large-scale and accurate dense 3D semantic map of street scenes. A dense 3D semantic model of the environment can significantly improve a number of robotic applications such as autonomous driving, navigation, or localisation. Instead of using offline-trained classifiers for semantic segmentation, our approach employs a data-driven, nonparametric method to parse scenes, which easily scales to large environments and generalises to different scenes. We use stereo image pairs collected from cameras mounted on a moving car to produce dense depth maps, which are combined into a global 3D reconstruction using camera poses from stereo visual odometry. Simultaneously, 2D automatic semantic segmentation using a nonparametric scene parsing method is fused into the 3D model. Furthermore, the resultant 3D semantic model is improved by taking into account moving objects in the scene. We demonstrate our method on the publicly available KITTI dataset and evaluate its performance against manually generated ground truth.
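The dense depth-map stage of such a pipeline can be illustrated with a standard stereo block-matching sketch; OpenCV is an assumed toolkit here, and the file names, focal length, and baseline are placeholders rather than the paper's calibration.

```python
import cv2
import numpy as np

# left.png / right.png stand for a rectified stereo pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching produces a dense disparity map;
# OpenCV returns fixed-point disparities scaled by 16.
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                               blockSize=5)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

fx, baseline = 700.0, 0.54  # assumed focal length (px) and baseline (m)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = fx * baseline / disparity[valid]  # metres per pixel
```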
Abstract:
Text categorisation is challenging due to the complex structure of documents, with heterogeneous, changing topics. The performance of text categorisation relies on the quality of samples, the effectiveness of document features, and the topic coverage of categories, depending on the strategies employed: supervised or unsupervised, single-labelled or multi-labelled. To deal with these reliability issues in text categorisation, we propose an unsupervised multi-labelled text categorisation approach that maps the local knowledge in documents to global knowledge in a world ontology to optimise the categorisation result. The conceptual framework of the approach consists of three modules: pattern mining for feature extraction, feature-subject mapping for categorisation, and concept generalisation for optimised categorisation. The approach has been evaluated promisingly by comparison with typical text categorisation methods, based on ground truth encoded by human experts.
Abstract:
Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have shown that more than 80% of users prefer personalized search results. As a result, many studies have devoted a great deal of effort (under the heading of collaborative filtering) to investigating personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary.

Researchers have increasingly proposed the utilization of ontology-based techniques to improve current mining approaches. These techniques are able not only to refine search intentions within specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles from discovered user background knowledge. This knowledge is drawn from both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge.

This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs through a "bag of concepts" rather than a "bag of words". The concepts are gathered from a general world knowledge base, the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and to learn personalized ontologies as user profiles. The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs through a set of acknowledged concepts. Alongside the global and local analyses, a solid concept-matching approach is carried out to address the mismatch between local information and world knowledge. Relevance features, produced by the Relevance Feature Discovery model, are taken as representatives of local information. These features have been proven to be the best alternative to user queries for avoiding ambiguity, and they consistently outperform the features extracted by other filtering models. Both proposed approaches are evaluated scientifically on the standard Reuters Corpus Volume 1 test set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the Pattern Taxonomy Model, and an ontology-based model. The results indicate that top precision can be improved remarkably with the proposed ontology-mining approach, and that the matching approach is successful, achieving significant improvements on most information filtering measures.

This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The scientific findings have the potential to facilitate the design of advanced preference mining models that affect people's daily lives.
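For orientation, the TF-IDF baseline listed among the comparison models can be sketched as follows; scikit-learn is an assumed toolkit, and the toy profile and documents stand in for RCV1 data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for a user's relevance features and candidate documents;
# the thesis uses RCV1 and richer concept-based profiles.
profile = ["ontology user profile personalized search concepts"]
documents = [
    "personalized search with ontological user profiles",
    "stormwater pollutant capture in gross pollutant traps",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
profile_vector = vectorizer.transform(profile)

# Rank documents against the profile, as a TF-IDF filtering baseline would.
scores = cosine_similarity(profile_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]
print([(documents[i], round(float(scores[i]), 3)) for i in ranking])
```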
Abstract:
An experimental dataset representing a typical flow field in a stormwater gross pollutant trap (GPT) was visualised. A technique was developed to apply the image-based flow visualisation (IBFV) algorithm to the raw dataset. Particle image velocimetry (PIV) software was previously used to capture the flow field data by tracking neutrally buoyant particles with a high-speed camera. The dataset consisted of scattered 2D point velocity vectors, and the IBFV visualisation facilitates flow feature characterisation within the GPT. The flow features played a pivotal role in understanding stormwater pollutant capture and retention behaviour within the GPT. It was found that the IBFV animations revealed otherwise unnoticed flow features and experimental artefacts. For example, a circular tracer marker in the IBFV program visually highlighted streamlines to investigate the possible flow paths of pollutants entering the GPT. The investigated flow paths were compared with the behaviour of pollutants monitored during experiments.
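A minimal sketch of the scattered-to-grid interpolation step that any such rendering needs, followed by a streamline plot, is shown below; it is not IBFV itself (which advects a noise texture), and the toy velocity field is a placeholder for the experimental PIV data.

```python
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt

# Scattered 2D PIV vectors: points (x, y) with velocities (u, v).
# Random values stand in for the experimental GPT dataset here.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(500, 2))
u = np.sin(2 * np.pi * pts[:, 1])  # toy velocity field
v = np.cos(2 * np.pi * pts[:, 0])

# Interpolate the scattered vectors onto a regular grid, the usual
# preprocessing step before any texture- or streamline-based rendering.
gx, gy = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
gu = griddata(pts, u, (gx, gy), method="linear", fill_value=0.0)
gv = griddata(pts, v, (gx, gy), method="linear", fill_value=0.0)

# Streamlines give a quick view of possible pollutant flow paths.
plt.streamplot(gx, gy, gu, gv, density=1.2)
plt.title("Interpolated flow field (toy data)")
plt.show()
```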
Abstract:
This study presented a novel method for purification of three different grades of diatomite from China by scrubbing technique using sodium hexametaphosphate (SHMP) as dispersant combined with centrifugation. Effects of pH value and dispersant amount on the grade of purified diatomite were studied and the optimum experimental conditions were obtained. The characterizations of original diatomite and derived products after purification were determined by scanning electron microscopy (SEM), X-ray diffraction (XRD), infrared spectroscopy (IR) and specific surface area analyzer (BET). The results indicated that the pore size distribution, impurity content and bulk density of purified diatomite were improved significantly. The dispersive effect of pH and SHMP on the separation of diatomite from clay minerals was discussed systematically through zeta potential test. Additionally, a possible purification mechanism was proposed in the light of the obtained experimental results.
Abstract:
The Smart Fields programme has been active in Shell over the last decade and has delivered large benefits. In order to understand this value and to underpin strategies for the future implementation programme, a study was carried out to quantify the benefits to date. It focused on value actually achieved, through increased production or lower costs, and thus provided an estimate of the total value achieved to date. Future benefits, such as increased reserves or continued production gains, were recorded separately. The paper describes the process followed in the benefits quantification. It identifies the key solutions and technologies and describes the mechanism used to understand the relation between solutions and value. Examples are given of value from various assets around the world, in both existing fields and greenfields. Finally, the study provided a methodology for tracking value, which helps Shell estimate and track the benefits of the Smart Fields programme at company scale.