929 results for "second language processing"

Abstract:

This article has a twofold objective. The first is to show the importance of the concept of biliteracy for educational research and practice. The second is to prompt reflection on the various didactic and pedagogical issues related to biliteracy among young allophone students in Montreal. To this end, we discuss two complementary models that form part of the theoretical framework of our research project: Cummins's common underlying proficiency model (2008, 1991, 1981, 1979) and Hornberger's continua model (2004, 2003). The text illustrates the need to reconsider how French, the language of schooling in Quebec, is taught, given the sociolinguistic reality in which allophone students live.

Abstract:

Adults may have difficulty discriminating second-language (L2) phonemes that do not serve to distinguish lexical items in their native language (L1). Brown's (1998) Feature Model (FM) proposes that adults can successfully create new sound categories only if these can be built from distinctive features that already exist in the listeners' L1. This hypothesis has been tested on several consonantal contrasts in different languages; however, features that apply to vowels appear never to have been examined from this perspective, still less features that operate in both the vowel and consonant systems and that may have distinctive or non-distinctive status. The main objective of the present study was to test the validity of the FM with respect to the oral-nasal vowel contrast of Brazilian Portuguese (BP). The naive perception of the /i/-/ĩ/ contrast by speakers of French, English, Caribbean Spanish and conservative Spanish was examined, since these four languages differ in the status they give to nasality. In addition, perception of the non-naive /e/-/ẽ/ contrast was included in order to compare performance in naive and non-naive perception. The results for the naive discrimination of /i/-/ĩ/ support the following conclusions for first exposure to a non-native contrast: (1) a [nasal] feature that operates distinctively in the grammar of a given L1 can be redeployed within the vowel system; (2) a [nasal] feature that operates distinctively in the grammar of a given L1 cannot be redeployed across systems (from consonants to vowels); and (3) a [nasal] feature that operates non-distinctively in the grammar of a given L1 may or may not be redeployed to distinctive status.
Finally, all groups successfully discriminated the non-naive /e/-/ẽ/ contrast, suggesting that all three types of redeployment become possible with more L2 experience.

Abstract:

This paper presents the design and development of a frame-based approach to speech-to-sign-language machine translation in the railway and banking domains. This work aims to use artificial intelligence to assist deaf and mute people. Our work concentrates on the sign language used by the deaf community of the Indian subcontinent, Indian Sign Language (ISL). The input to the system is a clerk's speech, and the output is a 3D virtual human character performing the signs for the uttered phrases. The system builds the 3D animation from pre-recorded motion-capture data. Our work proposes to build a Malayalam-to-ISL …
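The frame-based idea can be illustrated with a minimal Python sketch. It assumes the speech has already been recognized and rendered as text; the frames, regular expressions and sign glosses (`TRAIN`, `PLATFORM`, and so on) are hypothetical English placeholders rather than the system's actual Malayalam frames, and each gloss is assumed to name one pre-recorded motion-capture clip. `phrase_to_signs` is an illustrative name, not part of the described system.

```python
import re
from typing import List, Optional

# Hypothetical frames: each pairs a phrase pattern with named slots and an ordered
# list of sign glosses. Each gloss stands for one pre-recorded motion-capture clip;
# "{slot}" glosses are filled from the matched phrase.
FRAMES = [
    (re.compile(r"the train to (?P<city>\w+) leaves from platform (?P<num>\d+)"),
     ["TRAIN", "{city}", "PLATFORM", "{num}", "GO"]),
    (re.compile(r"your account balance is (?P<amount>\d+) rupees"),
     ["YOUR", "ACCOUNT", "MONEY", "{amount}", "RUPEE"]),
]


def phrase_to_signs(utterance: str) -> Optional[List[str]]:
    """Map a recognized utterance to a sequence of sign-clip identifiers, or None."""
    text = utterance.lower().strip()
    for pattern, template in FRAMES:
        match = pattern.fullmatch(text)
        if match:
            # Fill the slots; slot values might be fingerspelled or mapped to their own clips.
            return [gloss.format(**match.groupdict()).upper() for gloss in template]
    return None  # no frame matched: the utterance is outside the supported domain
```

For example, `phrase_to_signs("The train to Kochi leaves from platform 2")` returns `["TRAIN", "KOCHI", "PLATFORM", "2", "GO"]`, which an animation layer could play back clip by clip. Restricting input to a closed set of domain frames keeps the mapping simple, at the cost of rejecting any utterance that matches no frame.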

Abstract:

The goal of this work is to develop an Open Agent Architecture for multilingual information retrieval from relational databases. Queries can be given in plain Hindi or Malayalam, two prominent regional languages of India. The system supports distributed processing of user requests through collaborating agents. Natural language processing techniques extract the meaning of the plain-language query, and the information is returned to the user in his or her native language. The system architecture is structured so that it can be adapted to other regional languages of India.

Abstract:

Statistical Machine Translation (SMT) is a prominent application of Natural Language Processing. In SMT, translation rules are acquired automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), such corpora are available only in very limited quantities, so a large proportion of the phrases encountered at run time will be unknown. This paper focuses on methods for handling out-of-vocabulary (OOV) Malayalam words that cannot be translated into English by conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and matches for these word variants are looked up in the phrase translation table of the translation model. A vocabulary filter chooses the best among the translations of these word variants using their unigram counts. A match for the OOV suffix is also looked up in the phrase entries, and the corresponding target translations are filtered. The filtered phrases are then structured, and the SMT translation model is extended by adding each OOV word with its new phrase translations. Manual evaluation shows that the number of OOV words in the input is reduced considerably.
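The OOV-handling steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the suffix inventory, the suffix translations and the toy transliterated tokens (`pada`, `padail`) are invented for the example and do not reflect real Malayalam morphology; the phrase table is reduced to a dictionary from source words to candidate English words; and the "structuring" step simply places the suffix translation before the root translation.

```python
from collections import Counter

# Toy transliterated suffixes and their English renderings (hypothetical; a real
# system would rely on a Malayalam morphological analyser and the phrase table).
SUFFIXES = ("kal", "il", "nte")
SUFFIX_TRANSLATIONS = {"il": "in", "nte": "of"}


def split_root_suffix(word):
    """Strip the longest known suffix and return (root, suffix)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def translate_oov(word, phrase_table, unigram_counts):
    """Build and add a phrase-table entry for an OOV word; return it, or None."""
    root, suffix = split_root_suffix(word)
    # Generate inflected variants of the root and collect their known translations.
    variants = [root] + [root + s for s in SUFFIXES]
    candidates = [t for v in variants for t in phrase_table.get(v, [])]
    if not candidates:
        return None
    # Vocabulary filter: keep the candidate with the highest target unigram count.
    root_translation = max(candidates, key=lambda t: unigram_counts.get(t, 0))
    # Structure the target phrase from the suffix and root translations.
    suffix_translation = SUFFIX_TRANSLATIONS.get(suffix)
    target = f"{suffix_translation} {root_translation}" if suffix_translation else root_translation
    # Extend the translation model with the new OOV entry.
    phrase_table.setdefault(word, []).append(target)
    return word, target


phrase_table = {"pada": ["boat"], "padakal": ["boats"]}
counts = Counter({"boat": 120, "boats": 45})  # toy target-side unigram counts
print(translate_oov("padail", phrase_table, counts))  # ('padail', 'in boat')
```

A real structuring step would also handle word order and articles; the sketch only shows how root variants, the unigram-count filter and the suffix lookup fit together.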

Abstract:

The research field of the present study is the German as a Second Language courses that, at the time of the investigation (2002), were still supported by the Sprachverband Deutsch (formerly Sprachverband Deutsch für ausländische Arbeitnehmer). Since only a few studies have so far dealt with this important and broad area of foreign-language didactics, an exploratory, qualitative research approach was chosen. Courses for adult immigrants are characterized by great heterogeneity among participants; accordingly, the central question of the study, which investigates teachers' professional practical knowledge, is that of internal differentiation (Binnendifferenzierung). Building on general didactic frameworks for working with heterogeneous learning groups, available since the 1970s, in which the principle of internal differentiation was developed, the first part of the study outlines didactic options for internal differentiation in German as a Second Language teaching. Based on this preliminary understanding, a survey of teachers was then conducted, which is presented, analyzed and discussed in the second part of the study. The aim is not to evaluate practice against predefined categories but, on the contrary, to explore the problem of working with heterogeneous learning groups in German as a Second Language teaching. Using categories developed from the material, the study identifies central didactic aspects that are characteristic of the research field of German as a Second Language with adult immigrants. These categories do not coincide with those that could be developed through the hermeneutic approach in the first part of the study. This discrepancy is used to analyze and critically examine the relationship between theory and practice in didactic research and teaching.
The study concludes by pointing to the professionalization debate and to the need for practice-oriented research that directly incorporates teachers' needs and, in the manner of action research, simultaneously contributes to their further training. Only in this way can teaching practice be developed directly. The present study offers promising starting points for cooperative action-research projects, suggested by the teachers themselves in the interviews.

Abstract:

Ontologies have been established for knowledge sharing and are widely used as a means of conceptually structuring domains of interest. With the growing use of ontologies, the problem of overlapping knowledge in a common domain becomes critical. In this short paper, we address two methods for merging ontologies based on Formal Concept Analysis: FCA-Merge and ONTEX. --- FCA-Merge is a method for merging ontologies following a bottom-up approach that offers a structural description of the merging process. The method is guided by application-specific instances of the given source ontologies. We apply techniques from natural language processing and formal concept analysis to derive a lattice of concepts as a structural result of FCA-Merge. The generated result is then explored and transformed into the merged ontology with human interaction. --- ONTEX is a method for systematically structuring the top-down level of ontologies. It is based on an interactive, top-down knowledge acquisition process, which ensures that the knowledge engineer considers all possible cases while avoiding redundant acquisition. The method is especially suited for creating or merging the top part(s) of ontologies, where high accuracy is required, and for supporting the merging of two (or more) ontologies at that level.
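To make the notion of a concept lattice concrete, the Python sketch below enumerates the formal concepts (extent, intent pairs) of a small formal context by closing object intents under intersection. It is a naive enumeration, not the FCA-Merge implementation, and it does not compute the lattice order. The context is a hypothetical example in the spirit of FCA-Merge: objects are documents, and attributes are concepts from two source ontologies (prefixed `O1:` and `O2:`) assumed to have been extracted from those documents by an NLP step.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a formal context.

    `context` maps each object to the set of attributes it has.
    """
    all_attributes = frozenset().union(*context.values())
    # Every concept intent is an intersection of object intents; the empty
    # intersection is the full attribute set, the intent of the bottom concept.
    intents = {all_attributes}
    for attributes in context.values():
        intents |= {frozenset(attributes) & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extents first (top of the lattice), ties broken by intent.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


context = {
    "d1": {"O1:Hotel", "O2:Accommodation"},
    "d2": {"O1:Hotel", "O2:Accommodation", "O2:Event"},
    "d3": {"O1:Concert", "O2:Event"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

In this toy context, `O1:Hotel` and `O2:Accommodation` occur in exactly the same documents, so no concept separates them; they appear together in the intent of the concept with extent {d1, d2}. That is the kind of evidence a knowledge engineer could weigh when deciding whether to merge two source concepts.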

Abstract:

This thesis aims to empower software customers with a tool for building software tests themselves, based on the gradual refinement of natural language scenarios into executable visual test models. The process is divided into five steps: 1. First, a natural language parser extracts a graph of grammatical relations from the textual scenario descriptions. 2. The resulting graph is transformed into an informal story pattern by interpreting structurization rules based on Fujaba Story Diagrams. 3. While the informal story pattern can already be used by humans, the diagram still lacks technical details, especially type information. To add them, a recommender-based framework uses websites and other resources to generate formalization rules. 4. In preparation for code generation, the classes derived for the formal story patterns are aligned across all story steps, substituting for a class diagram. 5. Finally, a headless version of Fujaba is used to generate an executable JUnit test. The graph transformations used in the browser application are specified in a textual domain-specific language and visualized as story patterns. Only the heavyweight parsing (step 1) and code generation (step 5) are executed on the server side; all graph transformation steps (2, 3 and 4) are executed in the browser by an interpreter written in JavaScript/GWT. This result paves the way for online collaboration between global teams of software customers, IT business analysts and software developers.

Abstract:

During the past few years, there has been much discussion of a shift from rule-based systems to principle-based systems for natural language processing. This paper outlines the major computational advantages of principle-based parsing, its differences from the usual rule-based approach, and surveys several existing principle-based parsing systems used for handling languages as diverse as Warlpiri, English, and Spanish, as well as language translation.

Abstract:

Machine translation has been a particularly difficult problem in Natural Language Processing for over two decades. Early approaches failed in part because the interaction effects of complex phenomena made translation appear unmanageable. Later approaches have succeeded (although only bilingually), but they rely on many language-specific, context-free rules. This report presents an alternative approach to natural language translation that relies on principle-based rather than rule-oriented descriptions of grammar. The model is based on abstract principles developed by Chomsky (1981) and several other researchers working within the "Government and Binding" (GB) framework. The grammar is thus viewed as a modular system of principles rather than a large set of ad hoc language-specific rules.

Abstract:

The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor-based analogical reasoning algorithms over this representation, (iii) argue, based on a theoretical analysis of the effectiveness of knowledge acquisition (KA) with this approach, that cumulative analogy is well suited for KA, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open-source system for KA by cumulative analogy, has been implemented, deployed and evaluated. (The site, "1001 Questions," is available at http://teach-computers.org/learner.html.) Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that the assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information."
Because similarity between topics is computed from what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I analyze commonsense knowledge collected by another research effort that did not rely on analogical reasoning and show that the knowledge base indeed contains enough correlation to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, the percentages of questions answered affirmatively, answered negatively and judged nonsensical in the cumulative analogy case compare favorably with those of the baseline no-similarity case, which relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively and marked 13% as nonsensical; in the control, no-similarity case, 8% of questions were answered affirmatively, 60% negatively and 26% were marked as nonsensical.
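The newspaper example can be reproduced with a small Python sketch of cumulative analogy. The representation and scoring are simplifying assumptions, not Learner's actual algorithm: each topic maps properties to known truth values, similarity is the number of assertions on which two topics agree, the `k` most similar topics vote on each property the target lacks, and true and false assertions add positive and negative evidence weighted by similarity. Only properties with positive total evidence become questions.

```python
from collections import defaultdict


def cumulative_analogy(kb, target, k=2):
    """Propose properties to ask about `target`, ranked by summed evidence.

    `kb` maps each topic to {property: True/False} for the assertions known so far.
    Returns [(property, score)] for properties with positive evidence.
    """
    known = kb[target]

    def similarity(other):
        # Number of known assertions on which `other` agrees with the target.
        return sum(1 for prop, value in known.items() if kb[other].get(prop) == value)

    neighbors = sorted((t for t in kb if t != target), key=similarity, reverse=True)[:k]
    evidence = defaultdict(int)
    for neighbor in neighbors:
        weight = similarity(neighbor)
        if weight == 0:
            continue  # a dissimilar topic contributes no evidence
        for prop, value in kb[neighbor].items():
            if prop not in known:
                # True assertions count for asking, false ones against.
                evidence[prop] += weight if value else -weight
    questions = [(prop, score) for prop, score in evidence.items() if score > 0]
    return sorted(questions, key=lambda q: (-q[1], q[0]))


# Toy knowledge base echoing the newspaper example in the text.
kb = {
    "newspaper": {"is made of paper": True, "has pages": True},
    "book": {"is made of paper": True, "has pages": True,
             "contains information": True, "is published weekly": False},
    "magazine": {"is made of paper": True, "has pages": True,
                 "contains information": True, "is published weekly": True},
    "apple": {"is edible": True, "is made of paper": False},
}
print(cumulative_analogy(kb, "newspaper"))  # [('contains information', 4)]
```

Here `book` and `magazine` are the nearest neighbors of `newspaper`, so "contains information" collects evidence from both and is proposed, while "is published weekly" receives +2 from `magazine` and -2 from `book`, cancels out, and is not asked. This small case shows how summing evidence for and against can suppress a question that a single neighbor would have suggested.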

Abstract:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities of pairs that were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed that explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM that largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
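As an illustration of this kind of mixture model, the Python sketch below fits an aspect model, P(x, y) = Σ_z P(z) P(x|z) P(y|z), to a co-occurrence count matrix with plain EM. This particular parameterization is assumed here as one simple member of the family described; the sketch uses standard EM rather than the improved variants mentioned, so it does nothing to prevent overfitting. The function name and the random initialization are illustrative choices.

```python
import math
import random


def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def aspect_model_em(counts, n_aspects, n_iter=50, seed=0):
    """Fit P(x, y) = sum_z P(z) P(x|z) P(y|z) to counts[i][j] = n(x_i, y_j) with EM.

    Returns (p_z, p_x_given_z, p_y_given_z, log_likelihoods).
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])
    # Random positive initialization so that the aspects can differentiate.
    p_z = [1.0 / n_aspects] * n_aspects
    p_x_given_z = [_normalize([rng.random() + 0.1 for _ in range(n_x)]) for _ in range(n_aspects)]
    p_y_given_z = [_normalize([rng.random() + 0.1 for _ in range(n_y)]) for _ in range(n_aspects)]
    log_likelihoods = []
    for _ in range(n_iter):
        # Expected counts accumulated during the E-step.
        n_z = [0.0] * n_aspects
        n_xz = [[0.0] * n_x for _ in range(n_aspects)]
        n_yz = [[0.0] * n_y for _ in range(n_aspects)]
        log_likelihood = 0.0
        for i in range(n_x):
            for j in range(n_y):
                n = counts[i][j]
                if n == 0:
                    continue  # unobserved pairs contribute nothing to the likelihood
                # E-step: posterior P(z | x_i, y_j) is proportional to P(z) P(x_i|z) P(y_j|z).
                joint = [p_z[z] * p_x_given_z[z][i] * p_y_given_z[z][j] for z in range(n_aspects)]
                p_xy = sum(joint)
                log_likelihood += n * math.log(p_xy)
                for z in range(n_aspects):
                    weight = n * joint[z] / p_xy
                    n_z[z] += weight
                    n_xz[z][i] += weight
                    n_yz[z][j] += weight
        log_likelihoods.append(log_likelihood)
        # M-step: re-estimate all distributions from the expected counts.
        p_z = _normalize(n_z)
        p_x_given_z = [_normalize(row) for row in n_xz]
        p_y_given_z = [_normalize(row) for row in n_yz]
    return p_z, p_x_given_z, p_y_given_z, log_likelihoods
```

For instance, `aspect_model_em([[4, 0, 1], [3, 0, 0], [0, 5, 2], [0, 4, 3]], n_aspects=2)` fits two aspects to a small count table. Each iteration records the log-likelihood of the current parameters; EM guarantees that this sequence never decreases, which makes it a convenient sanity check. The rows of `p_x_given_z` and `p_y_given_z` give each aspect's distributions over the two object sets, which is where grouping structure can be read off.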

Abstract:

Migratory movements have grown dramatically in recent years, and schools in our country have been affected by the demographic changes of the last decade. A large proportion of the students in our education system's classrooms are enrolled in home-to-school language-switch programs that do not meet the requirements of linguistic immersion. Given the great diversity of languages involved, the education system cannot be organized according to the parameters of bilingual education. This does not mean that these students are condemned to academic failure: through educational practice and changes to school organization, there are ways for all students to progress throughout compulsory schooling. The article analyzes the conditions involved in an educational practice that facilitates learning the language of the school. We also suggest some criteria for assessing these students.

Abstract:

Real-time geoparsing of social media streams (e.g. Twitter, YouTube, Instagram, Flickr, FourSquare) provides a new 'virtual sensor' capability to end users such as emergency response agencies (e.g. tsunami early warning centres, civil protection authorities) and news agencies (e.g. Deutsche Welle, BBC News). Challenges in this area include scaling up natural language processing (NLP) and information retrieval (IR) approaches to handle real-time traffic volumes, reducing false positives, creating real-time infographic displays useful for effective decision support, and supporting trust and credibility analysis using geosemantics. In this seminar I will present ongoing work by the IT Innovation Centre over the last four years (the TRIDEC and REVEAL FP7 projects) on building such systems, and highlight our research towards improving the trustworthiness and credibility of crisis map displays and real-time analytics for trending topics and influential social networks during major newsworthy events.

Abstract:

Title: Data-Driven Text Generation using Neural Networks
Speaker: Pavlos Vougiouklis, University of Southampton
Abstract: Recent work on neural networks shows their great potential for tackling a wide variety of Natural Language Processing (NLP) tasks. This talk will focus on the Natural Language Generation (NLG) problem and, more specifically, on the extent to which neural network language models can be employed for context-sensitive, data-driven text generation. In addition, it will discuss a neural network architecture for response generation in social media, along with the training methods that enable it to capture contextual information and participate effectively in public conversations.
Speaker Bio: Pavlos Vougiouklis obtained his five-year Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2013 and an MSc in Software Engineering from the University of Southampton in 2014. In 2015 he joined the Web and Internet Science (WAIS) research group at the University of Southampton, where he is working towards a PhD on neural network approaches for Natural Language Processing.

Title: Provenance is Complicated and Boring — Is there a solution?
Speaker: Darren Richardson, University of Southampton
Abstract: Paper trails, auditing, and accountability: arguably not the sexiest terms in computer science. But then you discover that you've possibly been eating horse meat, and the importance of provenance becomes almost palpable. Once we accept that we should be creating provenance-enabled systems, communicating that provenance to casual users is not trivial: users should not need a detailed working knowledge of your system, and they certainly should not be expected to understand the data model. So how do you give users insight into provenance without building a bespoke system for every different provenance installation?
Speaker Bio: Darren is a final-year Computer Science PhD student. He completed his undergraduate degree in Electronic Engineering at Southampton in 2012.