21 results for Lingual Frenum
in Queensland University of Technology - ePrints Archive
Abstract:
At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-lingual document name triangulation performs very well. The evaluation shows encouraging results for our system.
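As a rough illustration of the two ideas named in this abstract, the sketch below estimates an anchor probability as the fraction of a phrase's occurrences that are linked, and "translates" a link target by looking up its title in a table of inter-language links. The function names, the count dictionaries, and the `interlanguage_links` table are hypothetical, and the exact estimator and triangulation procedure used in the paper may differ.

```python
def anchor_probabilities(anchor_counts, phrase_counts):
    """Estimate P(link | phrase) as (times seen as an anchor) / (times seen at all).

    Both arguments are hypothetical dictionaries mapping a phrase to a count
    gathered from the existing link structure of a document collection.
    """
    probs = {}
    for phrase, as_anchor in anchor_counts.items():
        total = phrase_counts.get(phrase, 0)
        if total > 0:
            # Clamp to 1.0 in case the two counts come from inconsistent passes.
            probs[phrase] = min(1.0, as_anchor / total)
    return probs


def triangulate(english_title, interlanguage_links, lang):
    """Map an English target title to its counterpart title in `lang`.

    `interlanguage_links` is an assumed table of the form
    {english_title: {language_code: title_in_that_language}}.
    Returns None when no counterpart is known.
    """
    return interlanguage_links.get(english_title, {}).get(lang)


if __name__ == "__main__":
    probs = anchor_probabilities({"Tokyo": 8, "city": 1}, {"Tokyo": 10, "city": 100})
    links = {"Tokyo": {"ja": "東京", "zh": "东京"}}
    # Rank candidate anchors by probability, then resolve each target title.
    for phrase in sorted(probs, key=probs.get, reverse=True):
        print(phrase, probs[phrase], triangulate(phrase, links, "zh"))
```

In this toy setup a phrase such as "city" gets a low probability because it is rarely linked, so it would rank below "Tokyo" as a candidate anchor.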
Abstract:
This paper presents an overview of the NTCIR-9 Cross-lingual Link Discovery (Crosslink) task. The overview covers the motivation for cross-lingual link discovery, the Crosslink task definition, the run submission specification, the assessment and evaluation framework, the evaluation metrics, and the evaluation results of the submitted runs. Cross-lingual link discovery (CLLD) is a way of automatically finding potential links between documents in different languages. The goal of this task is to create a reusable resource for evaluating automated CLLD approaches. The results of this research can be used in building and refining systems for automated link discovery. The task focuses on linking English source documents to Chinese, Korean, and Japanese target documents.
Abstract:
This paper describes an evaluation for benchmarking the effectiveness of cross-lingual link discovery (CLLD). Cross-lingual link discovery is a way of automatically finding prospective links between documents in different languages, which is particularly helpful for knowledge discovery across different language domains. A CLLD evaluation framework is proposed for benchmarking system performance. The framework includes standard document collections, evaluation metrics, and link assessment and evaluation tools. The evaluation methods described in this paper have been used to quantify system performance at the NTCIR-9 Crosslink task. It is shown that using manual assessment to generate the gold standard can deliver a more reliable evaluation result.
Abstract:
In this paper we examine automated Chinese to English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese to English translation on the hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bi-lingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language; as well as for language learning.
Abstract:
Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR where the CrossLink task has been running since the 2011 NTCIR-9. This paper presents the evaluation framework for benchmarking algorithms for cross-lingual link discovery evaluated in the context of NTCIR-9. This framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are further divided into two separate sets: manual assessments performed by human assessors; and automatic assessments based on links extracted from Wikipedia itself. Using this framework we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.
Abstract:
This paper presents an overview of the NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) task. For the task, we continued using the evaluation framework developed for the NTCIR-9 CrossLink-1 task. Overall, recommended links were evaluated at two levels (file-to-file and anchor-to-file), and system performance was evaluated with three metrics: LMAP, R-Prec and P@N.
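The metrics named above follow familiar ranked-retrieval definitions, which the sketch below spells out for a single topic. It assumes that P@N is the precision of the top N recommended links, that R-Prec is the precision at rank R (with R the number of relevant links), and that LMAP is the mean over topics of per-topic average precision. The task overview remains the authoritative source for the exact formulas, and the function names here are illustrative only.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended links that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for link in ranked[:n] if link in relevant) / n


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant links."""
    return precision_at_n(ranked, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values taken at each rank that holds a relevant link."""
    hits, total = 0, 0.0
    for rank, link in enumerate(ranked, start=1):
        if link in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average the per-topic average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d"]      # system output for one topic, best first
    relevant = {"a", "c", "x"}         # assessed relevant links for that topic
    print(precision_at_n(ranked, relevant, 2))
    print(r_precision(ranked, relevant))
    print(average_precision(ranked, relevant))
```

The same functions apply at either evaluation level; only the identity of a "link" changes (a target file, or an anchor paired with a target file).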
Abstract:
This creative writing work was selected for publication in a bi-lingual anthology, published in China, as suitable to be culturally applicable to both Chinese and Australian social contexts. The poem raises six social/ethical issues and comments on them. It is based on research into Chinese traditional poetry that focuses on an image, and after each image this poem provides an ethical comment. It is based on the ethical hypothesis that moral evaluation of individual and social behaviour cannot be achieved without ethical judgement which questions social norms. In particular, the poem questions the validity of fundamentalism – the belief in religious, scientific and moral absolutes. This is a key issue in contemporary research into the effect of religion on politics. It also draws on contemporary psychological theory, especially the concept of narcissism. The sociological basis of the work is in drawing parallels between eastern and western ethical issues, stressing similarity by inference. The imagery on which the poem is based selects objects, such as a single ‘stone’, that take on symbolic connotations common to both Australian and Chinese readers. This is innovative, since very little creative writing has been done to address commonalities between Australian and Chinese ethical thinking, especially by adopting Chinese motifs.
Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract, and accordingly is specific to humans rather than to any one language. As a result, a common inventory of sounds exists across languages; this property is exploitable, as sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques using two speech environments: clean read speech and conversational telephone speech. The languages used in the evaluations are German, Mandarin, Japanese and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly in the more taxing conversational environment.
Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
Abstract:
In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation, an online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using the test collection, topics and assessment results from the NTCIR-8 evaluation forum. This mechanism achieved 95% accuracy in NE translation and 0.3756 MAP in English–Chinese cross-lingual information retrieval for QA.
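The selection rule described above can be sketched as a simple majority vote over the candidate translations returned by each source. The source labels, the input format, and the tie-breaking rule (prefer the earliest-listed source) are assumptions made for this illustration; the abstract does not say how ties are resolved.

```python
from collections import Counter


def vote_translation(candidates):
    """Pick the translation proposed by the most sources.

    `candidates` is a list of (source_name, translation) pairs, listed in an
    assumed order of preference; a translation of None or "" means the source
    returned nothing. Ties go to the earliest-listed source (an assumption).
    """
    votes = Counter(t for _, t in candidates if t)
    if not votes:
        return None
    best = max(votes.values())
    for _, translation in candidates:
        if translation and votes[translation] == best:
            return translation
    return None


if __name__ == "__main__":
    candidates = [
        ("machine_translation", "奥巴马"),
        ("online_encyclopaedia", "欧巴马"),
        ("web_documents", "奥巴马"),
    ]
    print(vote_translation(candidates))  # two of the three sources agree
```

With only three sources a three-way disagreement is possible, which is why some explicit tie-breaking rule is needed in any implementation of this idea.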
Abstract:
In Australia, there is only one, newly established, dedicated mental health service catering specifically for the signing *Deaf community. It is staffed by four part-time hearing professionals and based in Brisbane. There are currently no Deaf psychologists or psychiatrists and there is no valid or reliable empirical evidence on outcomes for Deaf people accessing specialised or mainstream mental health services. Further compounding these issues is the fact that there are no sign language versions of the most common standardised mental health or psychological instruments available to clinicians in Australia. Contemporary counselling literature is acknowledging the role of the therapeutic alliance and the impact of 'common factors' on therapeutic outcomes. However, these issues are complicated by the relationship between the Deaf client and the hearing therapist being a cross-cultural exchange. The disability model of deafness is contentious and few professionals in Australia have the requisite knowledge and understanding of deafness from a cultural perspective to attend to the therapeutic relationship with this in mind. Consequently, Deaf people are severely disadvantaged by the current lack of services, resources and skilled professionals in the field of deafness and psychology in this country. The primary aim of the following program of research has been to propose a model for culturally affirmative service delivery and to provide clinicians with tools to evaluate the effect of their therapeutic work with Deaf people seeking mental health treatment. The research document is presented as a thesis by publication and comprises four specific objectives formulated in response to the lack of existing services and resources. The first objective was to explore the use of social constructionist counselling techniques and a reflecting team with Deaf clients, hearing therapists and an interpreter.
Following the establishment of a pilot counselling clinic, in-depth semi-structured interviews were conducted with two long-term clients after the one-year pilot of this service. These interviews generated recommendations for the development of a new 'enriched' model of counselling to be implemented and evaluated in later stages of the research program. The second objective was to identify appropriate psychometric measures that could be translated into Australian Sign Language (Auslan) for research into efficacy, effectiveness and counselling outcomes. Two instruments were identified as potentially suitable: the Outcome Rating Scale (ORS), a measure of global functioning, and the Session Rating Scale (SRS), a measure of therapeutic alliance. A specialised team of bi-lingual and bi-cultural interpreters, native signers and the primary researcher for this thesis produced the ORS-Auslan and the SRS-Auslan in DVD format, using the translation and back-translation process. The third objective was to establish the validity and reliability of these new Auslan measures based on normative data from the Deaf community. Data from the ORS-Auslan were collected from one clinical and one non-clinical sample of Deaf people. Statistical analyses revealed that the ORS-Auslan is reliable, valid and adequately distinguishes between clinical and non-clinical presentations. Furthermore, construct validity has been established using a yet to be validated sign language version of the Depression, Anxiety and Stress Scale-21 items (DASS-21), providing a platform for further research using the DASS-21 with Deaf people. The fourth objective was to evaluate counselling outcomes following the implementation of an enriched counselling service, based on the findings generated by the first objective, and using the newly translated Auslan measures. A second university counselling clinic was established and implemented over the course of one year.
Practice-based evidence guided the research and the ORS-Auslan and the SRS-Auslan were administered at every session and provided outcome data on Deaf clients' global functioning. Data from six clients over the course of ten months indicated that this culturally affirmative model was an effective approach for these six clients. This is the first time that outcome data have been collected in Australia using valid and reliable Auslan measures to establish preliminary evidence for the effectiveness of any therapeutic intervention for clinical work with adult, signing Deaf clients. The research generated by this thesis contributes theoretical knowledge, professional development and practical resources that can be used by a variety of mental health clinicians in the context of mental health service delivery to Deaf clients in Australia.
Abstract:
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could be used in many ways by the information retrieval research community and for knowledge sharing in Wikipedia; for example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.
Abstract:
Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of the cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated for achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments for better, automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. It is important in CLLD evaluation to have this framework, which helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been used to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of this kind.
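To give a feel for mutual-information-based segmentation, the sketch below uses a deliberately simplified variant: it scores each pair of adjacent characters by pointwise mutual information estimated from a corpus and places a word boundary wherever the score falls below a threshold. This is not the thesis's n-gram mutual information method, whose details are not given in the abstract; the function names, the bigram-only scoring, and the threshold are all assumptions.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in a list of strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information of the adjacent pair (a, b); -inf if unseen."""
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    joint = bigrams.get(a + b, 0)
    if joint == 0 or n_bi == 0:
        return float("-inf")
    p_ab = joint / n_bi
    return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Join adjacent characters whose PMI reaches the threshold; split otherwise."""
    if not text:
        return []
    words = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if pmi(prev, cur, unigrams, bigrams) >= threshold:
            words[-1] += cur
        else:
            words.append(cur)
    return words


if __name__ == "__main__":
    unigrams, bigrams = train_counts(["ab", "ab", "cd", "cd"])
    # "b" followed by "c" never occurs in the toy corpus, so a boundary falls there.
    print(segment("abcd", unigrams, bigrams))
```

On real Chinese text the counts would come from a large corpus, and the threshold would need tuning, since very small corpora give nearly every observed pair a positive score.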