35 resultados para Language Model
Resumo:
The research reported here is an investigation into the problems of social and economic development of a multiethic and multicultural country which has the added challenge of adopting a non-indigenous code to facilitate the development process. Malaysia's power to negotiate outcomes favourable to the interest of the country is critical for the successful attainment of the goals and objectives of VISION2020. Therefore the mechanisms of the human resource development programme have to be efficacious. The three hypotheses of this study are as follows: 1. there is a fear that the problems and challenges posed by the development plans, have been conceptually trivialised; 2. based on (1) above there is a concern that solutions proposed are inadequate and inappropriate and 3. the outcome of both (1) and (2) can lead to the potential underachievement of national goals and objectives. The study proposes a complex model for conceptualising the problem which looks at the relationship between society and language, which any solutions proposed must take into proper consideration. The study looks at the mechanisms available for the smooth absorption of new Malaysian members to new and international communities. A large scale investigation was undertaken with the researcher functioning as a participant observer. An in-depth study of one particular educational ecology yielded approximately 38 hours of interviews and 100 questionnaires. These data were analysed both for explicit information and implicit implications. By some criteria national policies appear to be having the desired effect, and can be given a clean bill of health. By others it is clear that major adjustments would be necessary if the nation is to achieve its objectives in full. Based on the evidence gathered, thr study proposes an apprenticeship approach to training programmes for effective participation of new members in the new ecologies.
Resumo:
Time after time… and aspect and mood. Over the last twenty five years, the study of time, aspect and - to a lesser extent - mood acquisition has enjoyed increasing popularity and a constant widening of its scope. In such a teeming field, what can be the contribution of this book? We believe that it is unique in several respects. First, this volume encompasses studies from different theoretical frameworks: functionalism vs generativism or function-based vs form-based approaches. It also brings together various sub-fields (first and second language acquisition, child and adult acquisition, bilingualism) that tend to evolve in parallel rather than learn from each other. A further originality is that it focuses on a wide range of typologically different languages, and features less studied languages such as Korean and Bulgarian. Finally, the book gathers some well-established scholars, young researchers, and even research students, in a rich inter-generational exchange, that ensures the survival but also the renewal and the refreshment of the discipline. The book at a glance The first part of the volume is devoted to the study of child language acquisition in monolingual, impaired and bilingual acquisition, while the second part focuses on adult learners. In this section, we will provide an overview of each chapter. The first study by Aviya Hacohen explores the acquisition of compositional telicity in Hebrew L1. Her psycholinguistic approach contributes valuable data to refine theoretical accounts. Through an innovating methodology, she gathers information from adults and children on the influence of definiteness, number, and the mass vs countable distinction on the constitution of a telic interpretation of the verb phrase. She notices that the notion of definiteness is mastered by children as young as 10, while the mass/count distinction does not appear before 10;7. However, this does not entail an adult-like use of telicity. She therefore concludes that beyond definiteness and noun type, pragmatics may play an important role in the derivation of Hebrew compositional telicity. For the second chapter we move from a Semitic language to a Slavic one. Milena Kuehnast focuses on the acquisition of negative imperatives in Bulgarian, a form that presents the specificity of being grammatical only with the imperfective form of the verb. The study examines how 40 Bulgarian children distributed in two age-groups (15 between 2;11-3;11, and 25 between 4;00 and 5;00) develop with respect to the acquisition of imperfective viewpoints, and the use of imperfective morphology. It shows an evolution in the recourse to expression of force in the use of negative imperatives, as well as the influence of morphological complexity on the successful production of forms. With Yi-An Lin’s study, we concentrate both on another type of informant and of framework. Indeed, he studies the production of children suffering from Specific Language Impairment (SLI), a developmental language disorder the causes of which exclude cognitive impairment, psycho-emotional disturbance, and motor-articulatory disorders. Using the Leonard corpus in CLAN, Lin aims to test two competing accounts of SLI (the Agreement and Tense Omission Model [ATOM] and his own Phonetic Form Deficit Model [PFDM]) that conflicts on the role attributed to spellout in the impairment. Spellout is the point at which the Computational System for Human Language (CHL) passes over the most recently derived part of the derivation to the interface components, Phonetic Form (PF) and Logical Form (LF). ATOM claims that SLI sufferers have a deficit in their syntactic representation while PFDM suggests that the problem only occurs at the spellout level. After studying the corpus from the point of view of tense / agreement marking, case marking, argument-movement and auxiliary inversion, Lin finds further support for his model. Olga Gupol, Susan Rohstein and Sharon Armon-Lotem’s chapter offers a welcome bridge between child language acquisition and multilingualism. Their study explores the influence of intensive exposure to L2 Hebrew on the development of L1 Russian tense and aspect morphology through an elicited narrative. Their informants are 40 Russian-Hebrew sequential bilingual children distributed in two age groups 4;0 – 4;11 and 7;0 - 8;0. They come to the conclusion that bilingual children anchor their narratives in perfective like monolinguals. However, while aware of grammatical aspect, bilinguals lack the full form-function mapping and tend to overgeneralize the imperfective on the principles of simplicity (as imperfective are the least morphologically marked forms), universality (as it covers more functions) and interference. Rafael Salaberry opens the second section on foreign language learners. In his contribution, he reflects on the difficulty L2 learners of Spanish encounter when it comes to distinguishing between iterativity (conveyed with the use of the preterite) and habituality (expressed through the imperfect). He examines in turn the theoretical views that see, on the one hand, habituality as part of grammatical knowledge and iterativity as pragmatic knowledge, and on the other hand both habituality and iterativity as grammatical knowledge. He comes to the conclusion that the use of preterite as a default past tense marker may explain the impoverished system of aspectual distinctions, not only at beginners but also at advanced levels, which may indicate that the system is differentially represented among L1 and L2 speakers. Acquiring the vast array of functions conveyed by a form is therefore no mean feat, as confirmed by the next study. Based on the prototype theory, Kathleen Bardovi-Harlig’s chapter focuses on the development of the progressive in L2 English. It opens with an overview of the functions of the progressive in English. Then, a review of acquisition research on the progressive in English and other languages is provided. The bulk of the chapter reports on a longitudinal study of 16 learners of L2 English and shows how their use of the progressive expands from the prototypical uses of process and continuousness to the less prototypical uses of repetition and future. The study concludes that the progressive spreads in interlanguage in accordance with prototype accounts. However, it suggests additional stages, not predicted by the Aspect Hypothesis, in the development from activities and accomplishments at least for the meaning of repeatedness. A similar theoretical framework is adopted in the following chapter, but it deals with a lesser studied language. Hyun-Jin Kim revisits the claims of the Aspect Hypothesis in relation to the acquisition of L2 Korean by two L1 English learners. Inspired by studies on L2 Japanese, she focuses on the emergence and spread of the past / perfective marker ¬–ess- and the progressive – ko iss- in the interlanguage of her informants throughout their third and fourth semesters of study. The data collected through six sessions of conversational interviews and picture description tasks seem to support the Aspect Hypothesis. Indeed learners show a strong association between past tense and accomplishments / achievements at the start and a gradual extension to other types; a limited use of past / perfective marker with states and an affinity of progressive with activities / accomplishments and later achievements. In addition, - ko iss– moves from progressive to resultative in the specific category of Korean verbs meaning wear / carry. While the previous contributions focus on function, Evgeniya Sergeeva and Jean-Pierre Chevrot’s is interested in form. The authors explore the acquisition of verbal morphology in L2 French by 30 instructed native speakers of Russian distributed in a low and high levels. They use an elicitation task for verbs with different models of stem alternation and study how token frequency and base forms influence stem selection. The analysis shows that frequency affects correct production, especially among learners with high proficiency. As for substitution errors, it appears that forms with a simple structure are systematically more frequent than the target form they replace. When a complex form serves as a substitute, it is more frequent only when it is replacing another complex form. As regards the use of base forms, the 3rd person singular of the present – and to some extent the infinitive – play this role in the corpus. The authors therefore conclude that the processing of surface forms can be influenced positively or negatively by the frequency of the target forms and of other competing stems, and by the proximity of the target stem to a base form. Finally, Martin Howard’s contribution takes up the challenge of focusing on the poorer relation of the TAM system. On the basis of L2 French data obtained through sociolinguistic interviews, he studies the expression of futurity, conditional and subjunctive in three groups of university learners with classroom teaching only (two or three years of university teaching) or with a mixture of classroom teaching and naturalistic exposure (2 years at University + 1 year abroad). An analysis of relative frequencies leads him to suggest a continuum of use going from futurate present to conditional with past hypothetic conditional clauses in si, which needs to be confirmed by further studies. Acknowledgements The present volume was inspired by the conference Acquisition of Tense – Aspect – Mood in First and Second Language held on 9th and 10th February 2008 at Aston University (Birmingham, UK) where over 40 delegates from four continents and over a dozen countries met for lively and enjoyable discussions. This collection of papers was double peer-reviewed by an international scientific committee made of Kathleen Bardovi-Harlig (Indiana University), Christine Bozier (Lund Universitet), Alex Housen (Vrije Universiteit Brussel), Martin Howard (University College Cork), Florence Myles (Newcastle University), Urszula Paprocka (Catholic University of Lublin), †Clive Perdue (Université Paris 8), Michel Pierrard (Vrije Universiteit Brussel), Rafael Salaberry (University of Texas at Austin), Suzanne Schlyter (Lund Universitet), Richard Towell (Salford University), and Daniel Véronique (Université d’Aix-en-Provence). We are very much indebted to that scientific committee for their insightful input at each step of the project. We are also thankful for the financial support of the Association for French Language Studies through its workshop grant, and to the Aston Modern Languages Research Foundation for funding the proofreading of the manuscript.
Resumo:
Component-based development (CBD) has become an important emerging topic in the software engineering field. It promises long-sought-after benefits such as increased software reuse, reduced development time to market and, hence, reduced software production cost. Despite the huge potential, the lack of reasoning support and development environment of component modeling and verification may hinder its development. Methods and tools that can support component model analysis are highly appreciated by industry. Such a tool support should be fully automated as well as efficient. At the same time, the reasoning tool should scale up well as it may need to handle hundreds or even thousands of components that a modern software system may have. Furthermore, a distributed environment that can effectively manage and compose components is also desirable. In this paper, we present an approach to the modeling and verification of a newly proposed component model using Semantic Web languages and their reasoning tools. We use the Web Ontology Language and the Semantic Web Rule Language to precisely capture the inter-relationships and constraints among the entities in a component model. Semantic Web reasoning tools are deployed to perform automated analysis support of the component models. Moreover, we also proposed a service-oriented architecture (SOA)-based semantic web environment for CBD. The adoption of Semantic Web services and SOA make our component environment more reusable, scalable, dynamic and adaptive.
Resumo:
In this article, we describe and model the language classroom as a complex adaptive system (see Logan & Schumann, 2005). We argue that linear, categorical descriptions of classroom processes and interactions do not sufficiently explain the complex nature of classrooms, and cannot account for how classroom change occurs (or does not occur), over time. A relational model of classrooms is proposed which focuses on the relations between different elements (physical, environmental, cognitive, social) in the classroom and on how their interaction is crucial in understanding and describing classroom action.
Resumo:
This paper proposes a novel framework of incorporating protein-protein interactions (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon the existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from PPI ontology through inference would be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm to information extraction.
Resumo:
Semantic Web Service, one of the most significant research areas within the Semantic Web vision, has attracted increasing attention from both the research community and industry. The Web Service Modelling Ontology (WSMO) has been proposed as an enabling framework for the total/partial automation of the tasks (e.g., discovery, selection, composition, mediation, execution, monitoring, etc.) involved in both intra- and inter-enterprise integration of Web services. To support the standardisation and tool support of WSMO, a formal model of the language is highly desirable. As several variants of WSMO have been proposed by the WSMO community, which are still under development, the syntax and semantics of WSMO should be formally defined to facilitate easy reuse and future development. In this paper, we present a formal Object-Z formal model of WSMO, where different aspects of the language have been precisely defined within one unified framework. This model not only provides a formal unambiguous model which can be used to develop tools and facilitate future development, but as demonstrated in this paper, can be used to identify and eliminate errors present in existing documentation.
Resumo:
This paper reports some of the more frequent language changes in Panjabi, the first language of bilingual Panjabi/English children in the West Midlands, UK. Spontaneous spoken data were collected in schools across both languages in three formatted elicitation procedures from 50 bilingual Panjabi/English-speaking children, aged 6–7 years old. Panjabi data from the children is analysed for lexical borrowings and code-switching with English. Several changes of vocabulary and word grammar patterns in Panjabi are identified, many due to interaction with English, and some due to developmental features of Panjabi. There is also evidence of pervasive changes of word order, suggesting a shift in Panjabi word order to that of English. Lexical choice is discussed in terms of language change rather than language deficit. The implications of a normative framework for comparison are explored. A psycholinguistic model interprets grammatical changes in Panjabi.
Resumo:
Current models of word production assume that words are stored as linear sequences of phonemes which are structured into syllables only at the moment of production. This is because syllable structure is always recoverable from the sequence of phonemes. In contrast, we present theoretical and empirical evidence that syllable structure is lexically represented. Storing syllable structure would have the advantage of making representations more stable and resistant to damage. On the other hand, re-syllabifications affect only a minimal part of phonological representations and occur only in some languages and depending on speech register. Evidence for these claims comes from analyses of aphasic errors which not only respect phonotactic constraints, but also avoid transformations which move the syllabic structure of the word further away from the original structure, even when equating for segmental complexity. This is true across tasks, types of errors, and, crucially, types of patients. The same syllabic effects are shown by apraxic patients and by phonological patients who have more central difficulties in retrieving phonological representations. If syllable structure was only computed after phoneme retrieval, it would have no way to influence the errors of phonological patients. Our results have implications for psycholinguistic and computational models of language as well as for clinical and educational practices.
Resumo:
This thesis provides a set of tools for managing uncertainty in Web-based models and workflows.To support the use of these tools, this thesis firstly provides a framework for exposing models through Web services. An introduction to uncertainty management, Web service interfaces,and workflow standards and technologies is given, with a particular focus on the geospatial domain.An existing specification for exposing geospatial models and processes, theWeb Processing Service (WPS), is critically reviewed. A processing service framework is presented as a solutionto usability issues with the WPS standard. The framework implements support for Simple ObjectAccess Protocol (SOAP), Web Service Description Language (WSDL) and JavaScript Object Notation (JSON), allowing models to be consumed by a variety of tools and software. Strategies for communicating with models from Web service interfaces are discussed, demonstrating the difficultly of exposing existing models on the Web. This thesis then reviews existing mechanisms for uncertainty management, with an emphasis on emulator methods for building efficient statistical surrogate models. A tool is developed to solve accessibility issues with such methods, by providing a Web-based user interface and backend to ease the process of building and integrating emulators. These tools, plus the processing service framework, are applied to a real case study as part of the UncertWeb project. The usability of the framework is proved with the implementation of aWeb-based workflow for predicting future crop yields in the UK, also demonstrating the abilities of the tools for emulator building and integration. Future directions for the development of the tools are discussed.
Resumo:
Social streams have proven to be the mostup-to-date and inclusive information on cur-rent events. In this paper we propose a novelprobabilistic modelling framework, called violence detection model (VDM), which enables the identification of text containing violent content and extraction of violence-related topics over social media data. The proposed VDM model does not require any labeled corpora for training, instead, it only needs the in-corporation of word prior knowledge which captures whether a word indicates violence or not. We propose a novel approach of deriving word prior knowledge using the relative entropy measurement of words based on the in-tuition that low entropy words are indicative of semantically coherent topics and therefore more informative, while high entropy words indicates words whose usage is more topical diverse and therefore less informative. Our proposed VDM model has been evaluated on the TREC Microblog 2011 dataset to identify topics related to violence. Experimental results show that deriving word priors using our proposed relative entropy method is more effective than the widely-used information gain method. Moreover, VDM gives higher violence classification results and produces more coherent violence-related topics compared toa few competitive baselines.
Resumo:
In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.
Resumo:
Social media data are produced continuously by a large and uncontrolled number of users. The dynamic nature of such data requires the sentiment and topic analysis model to be also dynamically updated, capturing the most recent language use of sentiments and topics in text. We propose a dynamic Joint Sentiment-Topic model (dJST) which allows the detection and tracking of views of current and recurrent interests and shifts in topic and sentiment. Both topic and sentiment dynamics are captured by assuming that the current sentiment-topic-specific word distributions are generated according to the word distributions at previous epochs. We study three different ways of accounting for such dependency information: (1) Sliding window where the current sentiment-topic word distributions are dependent on the previous sentiment-topic-specific word distributions in the last S epochs; (2) skip model where history sentiment topic word distributions are considered by skipping some epochs in between; and (3) multiscale model where previous long- and shorttimescale distributions are taken into consideration. We derive efficient online inference procedures to sequentially update the model with newly arrived data and show the effectiveness of our proposed model on the Mozilla add-on reviews crawled between 2007 and 2011. © 2013 ACM 2157-6904/2013/12-ART5 $ 15.00.
Resumo:
Natural language understanding is to specify a computational model that maps sentences to their semantic mean representation. In this paper, we propose a novel framework to train the statistical models without using expensive fully annotated data. In particular, the input of our framework is a set of sentences labeled with abstract semantic annotations. These annotations encode the underlying embedded semantic structural relations without explicit word/semantic tag alignment. The proposed framework can automatically induce derivation rules that map sentences to their semantic meaning representations. The learning framework is applied on two statistical models, the conditional random fields (CRFs) and the hidden Markov support vector machines (HM-SVMs). Our experimental results on the DARPA communicator data show that both CRFs and HM-SVMs outperform the baseline approach, previously proposed hidden vector state (HVS) model which is also trained on abstract semantic annotations. In addition, the proposed framework shows superior performance than two other baseline approaches, a hybrid framework combining HVS and HM-SVMs and discriminative training of HVS, with a relative error reduction rate of about 25% and 15% being achieved in F-measure.
Resumo:
In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.
Resumo:
Clinical decision support systems (CDSSs) often base their knowledge and advice on human expertise. Knowledge representation needs to be in a format that can be easily understood by human users as well as supporting ongoing knowledge engineering, including evolution and consistency of knowledge. This paper reports on the development of an ontology specification for managing knowledge engineering in a CDSS for assessing and managing risks associated with mental-health problems. The Galatean Risk and Safety Tool, GRiST, represents mental-health expertise in the form of a psychological model of classification. The hierarchical structure was directly represented in the machine using an XML document. Functionality of the model and knowledge management were controlled using attributes in the XML nodes, with an accompanying paper manual for specifying how end-user tools should behave when interfacing with the XML. This paper explains the advantages of using the web-ontology language, OWL, as the specification, details some of the issues and problems encountered in translating the psychological model to OWL, and shows how OWL benefits knowledge engineering. The conclusions are that OWL can have an important role in managing complex knowledge domains for systems based on human expertise without impeding the end-users' understanding of the knowledge base. The generic classification model underpinning GRiST makes it applicable to many decision domains and the accompanying OWL specification facilitates its implementation.