362 results for natural language


Relevance:

60.00%

Abstract:

Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are the three most active research areas. This survey aims to provide an overview of state-of-the-art developments in these areas. We begin by discussing what tagging means in the different sound areas. Some examples of sound tagging applications are introduced to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches. After reviewing work in music, speech, and environmental sound tagging, we compare the three areas and summarise the research progress to date. Research gaps are identified for each area, and the features common to, and distinguishing between, the three areas are also identified. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. Finally, we summarise the worldwide distribution of countries engaged in sound tagging research over the years.

Relevance:

60.00%

Abstract:

The aim of this paper is to provide a comparison of various algorithms and parameters used to build reduced semantic spaces. The effect of dimension reduction, the stability of the representation, and the effect of word order are examined in the context of five algorithms for building semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations, and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task, using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
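
As a concrete illustration of the kind of comparison described above, the following minimal sketch builds semantic vectors from a toy co-occurrence matrix and reduces them with SVD and with random projection. The data, dimensions, and probe are illustrative assumptions, not the paper's TASA/TOEFL pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(500, 2000)).astype(float)  # toy term-by-context counts
k = 100  # reduced dimensionality

# SVD: project terms onto the top-k left singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
svd_vectors = U[:, :k] * s[:k]

# Random projection: multiply by a random Gaussian matrix. Cheap, and only
# approximately distance-preserving (Johnson-Lindenstrauss), which is one
# source of the instability reported above.
R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
rp_vectors = X @ R

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Synonym-style probe: nearest neighbour of term 0 under each representation.
print(max(range(1, 500), key=lambda j: cosine(svd_vectors[0], svd_vectors[j])))
print(max(range(1, 500), key=lambda j: cosine(rp_vectors[0], rp_vectors[j])))
```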

Relevance:

60.00%

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.
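
A hypothetical sketch of the greedy, confidence-driven scheduling idea described above; the class, function names, and scoring are invented for illustration, and WebPut's actual queries are formulated from extended IE methods:

```python
from dataclasses import dataclass

@dataclass
class ImputationQuery:
    missing_cell: tuple      # (row_id, attribute) of the missing value
    query_string: str        # web search query built from the known values
    confidence: float        # estimated chance it retrieves the right value

def greedy_impute(pending, run_query):
    """Repeatedly issue the most confident query; newly filled values
    may raise the confidence of queries for the remaining cells."""
    while pending:
        best = max(pending, key=lambda q: q.confidence)
        pending.remove(best)
        value = run_query(best.query_string)   # extraction from web results
        if value is not None:
            yield best.missing_cell, value
            # In the full system, queries for the remaining cells would be
            # reformulated here using the newly imputed value.
```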

Relevance:

60.00%

Abstract:

This paper introduces PartSS, a new partition-based filtering method for tasks performing string comparisons under edit distance constraints. PartSS offers improvements over NGPP, the state-of-the-art method, through a new partitioning scheme, and also improves filtering abilities by exploiting theoretical results on shifting and scaling ranges, thus accelerating the calculation of edit distances between strings. PartSS filtering has been implemented within two major tasks of data integration: similarity join and approximate membership extraction under edit distance constraints. Evaluation on an extensive range of real-world datasets demonstrates major gains in efficiency over the NGPP and QGrams approaches.
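
To make the partition-filtering idea concrete, here is a minimal sketch of the underlying pigeonhole principle (illustrative only; PartSS's partitioning scheme and its shifting/scaling results are more refined than this): if ed(s, t) <= tau and s is cut into tau + 1 segments, at least one segment must occur verbatim in t, at a position shifted by at most tau.

```python
def partitions(s, tau):
    """Cut s into tau + 1 roughly equal segments, keeping start offsets."""
    n, k = len(s), tau + 1
    cuts = [round(i * n / k) for i in range(k + 1)]
    return [(cuts[i], s[cuts[i]:cuts[i + 1]]) for i in range(k)]

def may_match(s, t, tau):
    """Cheap filter: True means (s, t) survives and still needs an
    edit-distance verification; False safely prunes the pair."""
    for start, seg in partitions(s, tau):
        if not seg:
            continue
        lo, hi = max(0, start - tau), start + tau   # allowed start positions in t
        if t.find(seg, lo, hi + len(seg)) != -1:
            return True
    return False

# Candidate pairs that pass the filter go on to exact verification.
print(may_match("partition", "partitoin", 2))   # True  -> verify
print(may_match("partition", "clustering", 2))  # False -> pruned
```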

Relevance:

60.00%

Abstract:

Many existing information retrieval models do not explicitly take into account information about word associations. Our approach makes use of first- and second-order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically significant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system were 0.249 and 0.192, respectively. Our results and discussion suggest that information about both syntagmatic and paradigmatic associations can assist in improving retrieval effectiveness on ad hoc retrieval.
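
The distinction between the two association types can be sketched with a toy corpus; the scoring below is a deliberately crude stand-in for the paper's formal model of word meaning. Syntagmatic associates co-occur in the same context, while paradigmatic associates occur in similar contexts and can substitute for one another.

```python
from collections import Counter, defaultdict

docs = [["doctors", "treat", "patients"],
        ["nurses", "treat", "patients"],
        ["doctors", "prescribe", "drugs"]]

cooc = defaultdict(Counter)          # within-context co-occurrence counts
for d in docs:
    for w in d:
        for v in d:
            if v != w:
                cooc[w][v] += 1

def syntagmatic(a, b):               # first-order: terms appearing together
    return cooc[a][b]

def paradigmatic(a, b):              # second-order: terms sharing neighbours
    shared = set(cooc[a]) & set(cooc[b])
    return sum(min(cooc[a][w], cooc[b][w]) for w in shared)

print(syntagmatic("doctors", "patients"))   # high: they co-occur directly
print(paradigmatic("doctors", "nurses"))    # high: similar contexts, rarely co-occur
```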

Relevance:

60.00%

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities, and can be achieved using diarization and speaker linking. The main challenge in attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose, which has motivated the use of linkage clustering methods that require no retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems, with relative improvements of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
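
A minimal sketch of the clustering step (toy distance matrix; the actual system computes distances between speaker models for the segments): complete linkage merges clusters by their largest pairwise distance and needs only the precomputed matrix, with no model retraining.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy pairwise distances between 4 audio segments (symmetric, zero diagonal).
D = np.array([[0.00, 0.20, 0.90, 0.80],
              [0.20, 0.00, 0.85, 0.90],
              [0.90, 0.85, 0.00, 0.15],
              [0.80, 0.90, 0.15, 0.00]])

# Complete linkage: cluster distance = the *largest* pairwise distance,
# so every pair inside a cluster stays within the stopping threshold.
Z = linkage(squareform(D), method="complete")
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)  # e.g. [1 1 2 2]: segments attributed to two speakers
```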

Relevance:

60.00%

Abstract:

With the explosive growth of resources available through the Internet, information mismatch and overload have become a severe concern for users. Web users are commonly overwhelmed by huge volumes of information and are faced with the challenge of finding the most relevant and reliable information in a timely manner. Personalised information gathering and recommender systems represent state-of-the-art tools for the efficient selection of the most relevant and reliable information resources, and interest in such systems has increased dramatically over the last few years. However, web personalization has not yet been well exploited: from both technological and social perspectives, difficulties arise in selecting resources through recommender systems. Aiming to promote high-quality research to overcome these challenges, this paper provides a comprehensive survey of recent work and achievements in the areas of personalised web information gathering and recommender systems. The survey covers concept-based techniques exploited in personalised information gathering and recommender systems.

Relevance:

60.00%

Abstract:

For humans and robots to communicate using natural language it is necessary for the robots to develop concepts and associated terms that correspond to the human use of words. Time and space are foundational concepts in human language, and to develop a set of words that correspond to human notions of time and space, it is necessary to take into account the way that they are used in natural human conversations, where terms and phrases such as 'soon', 'in a while', or 'near' are often used. We present language learning robots called Lingodroids that can learn and use simple terms for time and space. In previous work, the Lingodroids were able to learn terms for space. In this work we extend their abilities by adding temporal variables which allow them to learn terms for time. The robots build their own maps of the world and interact socially to form a shared lexicon for location and duration terms. The robots successfully use the shared lexicons to communicate places and times to meet again.
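
The lexicon-forming interaction can be caricatured in a few lines. This is a deliberately simplified, hypothetical naming game; the actual Lingodroids ground words in robot-built maps and richer temporal interactions.

```python
import random

class Agent:
    def __init__(self):
        self.lexicon = {}                       # location -> word

    def word_for(self, loc):
        if loc not in self.lexicon:             # invent a word when needed
            self.lexicon[loc] = "".join(random.choices("aeioukzt", k=4))
        return self.lexicon[loc]

    def hear(self, loc, word):
        self.lexicon.setdefault(loc, word)      # adopt the other's term

a, b = Agent(), Agent()
for _ in range(20):                             # repeated meetings at places
    loc = random.choice(["ridge", "corner", "doorway"])
    speaker, hearer = random.sample([a, b], 2)
    hearer.hear(loc, speaker.word_for(loc))

print(a.lexicon == b.lexicon)  # a shared lexicon typically emerges
```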

Relevance:

60.00%

Abstract:

This project was a step forward in developing and evaluating a novel mathematical model that can deduce the meaning of words based on their use in language. The model can be applied to a wide range of natural language applications, including the information-seeking process most of us undertake on a daily basis.

Relevance:

60.00%

Abstract:

The INEX workshop is concerned with evaluating the effectiveness of XML retrieval systems. In 2004 a natural language query task was added to the INEX Ad hoc track. Standard INEX Ad hoc topic titles are specified in NEXI -- a simplified and restricted subset of XPath, with a similar feel, and yet with a distinct IR flavour and interpretation. The syntax of NEXI is rigid and it imposes some limitations on the kind of information need that it can faithfully capture. At INEX 2004 the NLP question to be answered was simple -- is it practical to use a natural language query that is the equivalent of the formal NEXI title? The results of this experiment are reported and some information on the future direction of the NLP task is presented.
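
For illustration, here is a NEXI-style topic title alongside a natural language rendering of the kind the task compared. The example query is invented for this sketch, not taken from the actual INEX topic set.

```python
# A NEXI title: XPath-like paths with about() predicates carrying IR semantics.
nexi_title = '//article[about(., "information retrieval")]//sec[about(., evaluation)]'

# Its natural language equivalent, as submitted to the INEX 2004 NLP task.
natural_language = ("Find sections about evaluation in articles "
                    "about information retrieval.")
```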

Relevance:

60.00%

Abstract:

Process models expressed in BPMN typically rely on a small subset of all available symbols. In our 2008 study, we examined the composition of these subsets and found that the distribution of BPMN symbols in practice closely resembles the frequency distribution of words in natural language. Based on our findings, we offered some suggestions on how to make the use of BPMN more manageable, and also outlined ideas for the further development of BPMN. Since its publication, this paper has provoked spirited debate in the BPM practitioner community, prompted the definition of a modeling standard in the US government, and helped shape the next generation of the BPMN standard.
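
The kind of rank-frequency analysis described above can be sketched as follows. The symbol counts are fabricated for illustration; the study tallied real model collections.

```python
from collections import Counter

# Hypothetical symbol occurrences tallied across a collection of BPMN models.
counts = Counter({"task": 420, "sequence_flow": 400, "xor_gateway": 150,
                  "start_event": 90, "end_event": 85, "and_gateway": 40,
                  "pool": 25, "message_flow": 12, "timer_event": 6})

freqs = sorted(counts.values(), reverse=True)
for rank, f in enumerate(freqs, start=1):
    # Under a Zipf-like distribution, f * rank stays roughly constant.
    print(rank, f, f * rank)
```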

Relevance:

60.00%

Abstract:

A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that, by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over approaches that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations, which reflect the likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique based on a formal, corpus-based model of word meaning that captures syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
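
A hypothetical sketch of the overall expansion loop; the function names and the additive weighting are invented, and the article's corpus-based model combines the two association types in a more principled way. Scoring functions like the toy syntagmatic/paradigmatic scorers sketched earlier in this listing could be plugged in.

```python
def expand_query(query_terms, search, syntagmatic, paradigmatic,
                 top_docs=10, n_expansion=5):
    """Pseudo-relevance feedback: score candidate terms from the top-ranked
    documents using both association types, then append the best ones."""
    pseudo_relevant = search(query_terms)[:top_docs]     # initial retrieval
    candidates = {t for doc in pseudo_relevant for t in doc
                  if t not in query_terms}

    def score(term):
        # Crude additive mix of co-occurrence (syntagmatic) and
        # substitutability (paradigmatic) evidence against the query.
        return sum(syntagmatic(term, q) + paradigmatic(term, q)
                   for q in query_terms)

    best = sorted(candidates, key=score, reverse=True)[:n_expansion]
    return list(query_terms) + best
```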

Relevance:

60.00%

Abstract:

Many successful query expansion techniques ignore information about the term dependencies that exist within natural language. However, researchers have recently demonstrated that consistent and significant improvements in retrieval effectiveness can be achieved by explicitly modelling term dependencies within the query expansion process. This has created an increased interest in dependency-based models. State-of-the-art dependency-based approaches primarily model term associations known within structural linguistics as syntagmatic associations, which are formed when terms co-occur more often than by chance. However, structural linguistics proposes that the meaning of a word also depends on its paradigmatic associations, which are formed between words that can substitute for one another without affecting the acceptability of a sentence. Given the reliance on word meanings when a user formulates their query, our approach takes the novel step of modelling both syntagmatic and paradigmatic associations within the query expansion process, based on the (pseudo) relevant documents returned in web search. The results demonstrate that this approach can provide significant improvements in web retrieval effectiveness when compared to a strong benchmark retrieval system.

Relevance:

60.00%

Abstract:

The Australian e-Health Research Centre (AEHRC) recently participated in the ShARe/CLEF eHealth Evaluation Lab Task 1. The goal of this task is to identify mentions of disorders in free-text electronic health records and map those disorders to SNOMED CT concepts in the UMLS metathesaurus. This paper details our participation in this ShARe/CLEF task. Our approaches are based on using the clinical natural language processing tool MetaMap and Conditional Random Fields (CRFs) to identify mentions of disorders and then map them to SNOMED CT concepts. Empirical results obtained on the 2013 ShARe/CLEF task highlight that our instance of MetaMap (after filtering irrelevant semantic types), although achieving a high level of precision, is only able to identify a small proportion of the disorders (about 21% to 28%) in free-text health records. On the other hand, the addition of the CRF models allows for a much higher recall (57% to 79%) of disorders from free text, without an appreciable loss in precision. When evaluating the accuracy of the mapping of disorders to SNOMED CT concepts in the UMLS, we observe that the mapping obtained by our filtered instance of MetaMap delivers state-of-the-art effectiveness if only the spans identified by our system are considered ('relaxed' accuracy).
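
A minimal sketch of the CRF component using the sklearn-crfsuite package, with BIO labels marking disorder spans. The sentences and features are toy stand-ins; the actual system combines MetaMap output with richer clinical features over the ShARe/CLEF data.

```python
import sklearn_crfsuite

def token_features(sent, i):
    """Simple lexical features for token i of a tokenised sentence."""
    w = sent[i]
    return {"lower": w.lower(), "is_title": w.istitle(), "suffix3": w[-3:],
            "prev": sent[i - 1].lower() if i > 0 else "<s>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>"}

train_sents = [["Patient", "denies", "chest", "pain", "."],
               ["History", "of", "atrial", "fibrillation", "."]]
train_tags = [["O", "O", "B-Disorder", "I-Disorder", "O"],
              ["O", "O", "B-Disorder", "I-Disorder", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X)[0])   # BIO tags marking disorder mentions
```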

Relevance:

60.00%

Abstract:

Expert searchers engage with information as information brokers, researchers, reference librarians, information architects, faculty who teach advanced search, and in a variety of other information-intensive professions. Their experiences are characterized by a profound understanding of information concepts and skills, and an agile ability to apply this knowledge to interacting with, and having an impact on, the information environment. This study explored the learning experiences of searchers to understand the acquisition of search expertise. The research question was: what can be learned about becoming an expert searcher from the learning experiences of proficient novice searchers and highly experienced searchers? The key objectives were: (1) to explore the existence of threshold concepts in search expertise; (2) to improve our understanding of how search expertise is acquired and how novice searchers, intent on becoming experts, can learn to search in more expert-like ways.

The participant sample drew from two population groups: (1) highly experienced searchers with a minimum of 20 years of relevant professional experience, including LIS faculty who teach advanced search, information brokers, and search engine developers (11 subjects); and (2) MLIS students who had completed coursework in information retrieval and online searching and demonstrated exceptional ability (9 subjects). Using these two groups allowed a nuanced understanding of the experience of learning to search in expert-like ways, with data from those who search at a very high level as well as those who may be actively developing expertise. The study used semi-structured interviews, search tasks with think-aloud narratives, and talk-after protocols. Searches were screen-captured with simultaneous audio-recording of the think-aloud narrative. Data were coded and analyzed using NVivo9 and manually. Grounded theory allowed categories and themes to emerge from the data; categories represented the conceptual knowledge and attributes of expert searchers. In accord with grounded theory method, once theoretical saturation was achieved, the data were viewed during the final stage of analysis through the lenses of existing theoretical frameworks. For this study, threshold concept theory (Meyer & Land, 2003) was used to explore which concepts might be threshold concepts. Threshold concepts have been used to explore transformative learning portals in subjects ranging from economics to mathematics. A threshold concept has five defining characteristics: it is transformative (causing a shift in perception), irreversible (unlikely to be forgotten), integrative (unifying separate concepts), troublesome (initially counter-intuitive), and may be bounded.

The themes that emerged provided evidence of four concepts with the characteristics of threshold concepts. These were: information environment (the total information environment is perceived and understood); information structures (content, index structures, and retrieval algorithms are understood); and information vocabularies (fluency in search behaviors related to language, including natural language, controlled vocabulary, and finesse in using proximity, truncation, and other language-based tools). The fourth threshold concept was concept fusion, the integration of the other three threshold concepts, further defined by three properties: visioning (anticipating next moves), being light on one's 'search feet' (the dancing property), and a profound ontological shift (identity as searcher).

In addition to the threshold concepts, findings were reported that were not concept-based, including the praxes and traits of expert searchers. A model of search expertise is proposed with the four threshold concepts at its core; it also integrates the traits and praxes elicited from the study, attributes long recognized in LIS research as present in professional searchers. The research provides a deeper understanding of the transformative learning experiences involved in the acquisition of search expertise. It adds to our understanding of search expertise in the context of today's information environment and has implications for teaching advanced search, for research more broadly within library and information science, and for methodologies used to explore threshold concepts.