553 resultados para Language Model
Resumo:
For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, that is, accommodating phone recognition errors and, secondly, modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out and a number of novel enhancements are developed that provide for faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS, by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing. Analysis shows that this use of language modelling can, however, be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing for a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state-of-the-art in phonetic STD, by improving the utility of such systems in a wide range of applications.
Resumo:
This paper details the participation of the Australian e- Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab { Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries signi cantly improves the e ectiveness of the language model baseline. Readability priors seem to increase retrieval e ectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
Resumo:
This paper presents our system to address the CogALex-IV 2014 shared task of identifying a single word most semantically related to a group of 5 words (queries). Our system uses an implementation of a neural language model and identifies the answer word by finding the most semantically similar word representation to the sum of the query representations. It is a fully unsupervised system which learns on around 20% of the UkWaC corpus. It correctly identifies 85 exact correct targets out of 2,000 queries, 285 approximate targets in lists of 5 suggestions.
Resumo:
Recent advances in neural language models have contributed new methods for learning distributed vector representations of words (also called word embeddings). Two such methods are the continuous bag-of-words model and the skipgram model. These methods have been shown to produce embeddings that capture higher order relationships between words that are highly effective in natural language processing tasks involving the use of word similarity and word analogy. Despite these promising results, there has been little analysis of the use of these word embeddings for retrieval. Motivated by these observations, in this paper, we set out to determine how these word embeddings can be used within a retrieval model and what the benefit might be. To this aim, we use neural word embeddings within the well known translation language model for information retrieval. This language model captures implicit semantic relations between the words in queries and those in relevant documents, thus producing more accurate estimations of document relevance. The word embeddings used to estimate neural language models produce translations that differ from previous translation language model approaches; differences that deliver improvements in retrieval effectiveness. The models are robust to choices made in building word embeddings and, even more so, our results show that embeddings do not even need to be produced from the same corpus being used for retrieval.
Resumo:
Recommender systems assist users in finding what they want. The challenging issue is how to efficiently acquire user preferences or user information needs for building personalized recommender systems. This research explores the acquisition of user preferences using data taxonomy information to enhance personalized recommendations for alleviating cold-start problem. A concept hierarchy model is proposed, which provides a two-dimensional hierarchy for acquiring user preferences. The language model is also extended for the proposed hierarchy in order to generate an effective recommender algorithm. Both Amazon.com book and music datasets are used to evaluate the proposed approach, and the experimental results show that the proposed approach is promising.
Resumo:
As business process management technology matures, organisations acquire more and more business process models. The management of the resulting collections of process models poses real challenges. One of these challenges concerns model retrieval where support should be provided for the formulation and efficient execution of business process model queries. As queries based on only structural information cannot deal with all querying requirements in practice, there should be support for queries that require knowledge of process model semantics. In this paper we formally define a process model query language that is based on semantic relationships between tasks in process models and is independent of any particular process modelling notation.
Resumo:
The design and development of process-aware information systems is often supported by specifying requirements as business process models. Although this approach is generally accepted as an effective strategy, it remains a fundamental challenge to adequately validate these models given the diverging skill set of domain experts and system analysts. As domain experts often do not feel confident in judging the correctness and completeness of process models that system analysts create, the validation often has to regress to a discourse using natural language. In order to support such a discourse appropriately, so-called verbalization techniques have been defined for different types of conceptual models. However, there is currently no sophisticated technique available that is capable of generating natural-looking text from process models. In this paper, we address this research gap and propose a technique for generating natural language texts from business process models. A comparison with manually created process descriptions demonstrates that the generated texts are superior in terms of completeness, structure, and linguistic complexity. An evaluation with users further demonstrates that the texts are very understandable and effectively allow the reader to infer the process model semantics. Hence, the generated texts represent a useful input for process model validation.
Resumo:
This constructivist theory-led case study explored how the term language learner autonomy (LLA) is interpreted and the appropriate pedagogy to foster LLA in the Vietnamese higher education context. Evidence through the exploration of the government policies and the cases of three EFL classes confirms the interpretation that learner autonomy and language acquisition are mutually supported. The study has proposed project work as a potential model while demonstrating the role of the teacher and the use of target language as mediators to enhance LLA in the local context. Findings of the study contribute a theoretical and pedagogical justification for encouraging LLA in Vietnam and other similar contexts.
Resumo:
Providing support for reversible transformations as a basis for round-trip engineering is a significant challenge in model transformation research. While there are a number of current approaches, they require the underlying transformation to exhibit an injective behaviour when reversing changes. This however, does not serve all practical transformations well. In this paper, we present a novel approach to round-trip engineering that does not place restrictions on the nature of the underlying transformation. Based on abductive logic programming, it allows us to compute a set of legitimate source changes that equate to a given change to the target model. Encouraging results are derived from an initial prototype that supports most concepts of the Tefkat transformation language
Resumo:
With the advent of Service Oriented Architecture, Web Services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. By considering the semantic relationships of words used in describing the services as well as the use of input and output parameters can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on the large quantity of text documents covering diverse areas of domain of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an allpair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase which is the system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine which is an integral part of the system integration phase makes the final recommendations including individual and composite Web services to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web services compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
Resumo:
This paper examines the interactional phenomenon of justification as it is produced in young children’s language. A justification provides a reason for one’s position and can be produced in children’s language at an early age. There are various pragmatic reasons for justifications. For example, justifications may be drawn upon by members to compensate for the disruption of the existing social order or to explain something that is possibly questionable. Justifications are also drawn upon to extend or close disputes. This study uses the analytical techniques of conversation analysis and membership categorisation to analyse video-recorded and transcribed interactions of young children (aged 4-6 years) in a preparatory classroom in a primary school in Australia. The focus is an episode that occurred within the block play area of the classroom that involved a dispute of ownership relating to a small, wooden plank. In analysing this dispute, justifications were frequent occurrences and the young participants drew upon justificatory devices in their everyday arguments. As the turns surrounding the justificatory language were examined, a pattern emerged: in each excerpt observed, a justification arose in response to a challenge. This pattern provided the basis for developing a model that helped to discern where, why and what type of justifications occurred in the interaction. To depict this interactional phenomenon, the model of ‘if x, then y’ was used, ‘x’ referring to the challenge or prompt, and ‘y’ referring to the justificatory response. Justifications related to the concepts of ownership and were used as devices by those engaged in disputes to support their positions and provide reasons for their actions. The children drew upon these child-constructed rules as resources to use in disputes with their peers, in order to construct and maintain the social order of the block area in the classroom.
Resumo:
Educational assessment was a worldwide commonplace practice in the last century. With the theoretical underpinnings of education shifting from behaviourism and social efficiency to constructivism and cognitive theories in the past two decades, the assessment theories and practices show a widespread changing movement. The emergent assessment paradigm, with a futurist perspective, indicates a deviation away from the prevailing large scale high-stakes standardised testing and an inclination towards classroom-based formative assessment. Innovations and reforms initiated in attempts to achieve better education outcomes for a sustainable future via more developed learning and assessment theories have included the 2007 College English Reform Program (CERP) in Chinese higher education context. This paper focuses on the College English Test (CET) - the national English as a Foreign Language (EFL) testing system for non-English majors at tertiary level in China. It seeks to explore the roles that the CET played in the past two College English curriculum reforms, and the new role that testing and assessment assumed in the newly launched reform. The paper holds that the CET was operationalised to uplift the standards. However, the extended use of this standardised testing system brings constraints as well as negative washback effects on the tertiary EFL education. Therefore in the newly launched reform -CERP, a new assessment model which combines summative and formative assessment approaches is proposed. The testing and assessment, assumed a new role - to engender desirable education outcomes. The question asked is: will the mixed approach to formative and summative assessment provide the intended cure to the agony that tertiary EFL education in China has long been suffering - spending much time, yet achieving little effects? The paper reports the progresses and challenges as informed by the available research literature, yet asserts a lot needs to be explored on the potential of the assessment mix in this examination tradition deep-rooted and examination-obsessed society.
Resumo:
While Business Process Management (BPM) is an established discipline, the increased adoption of BPM technology in recent years has introduced new challenges. One challenge concerns dealing with process model complexity in order to improve the understanding of a process model by stakeholders and process analysts. Features for dealing with this complexity can be classified in two categories: 1) those that are solely concerned with the appearance of the model, and 2) those that in essence change the structure of the model. In this paper we focus on the former category and present a collection of patterns that generalize and conceptualize various existing features. The paper concludes with a detailed analysis of the degree of support of a number of state-of-the-art languages and language implementations for these patterns.
Resumo:
How social class factors into linguistic practices and use, language change and loss has been a major theme in postwar sociolinguistics and ethnography of communication, language planning and sociology of language. Key foci of linguistic and sociological research include the study of social class in everyday language use, media and institutional texts. A further concern is to understand the relationship between social class stratification, intergenerational social reproduction, and language variation. Bourdieu’s model of linguistic habitus and cultural capital offers a broad theoretical template for examining these relations, even as they are complicated by forces of economic and cultural globalization, new media and identity formations.