756 results for Language Analysis
Abstract:
OBJECTIVES. Oral foreign language skills are an integral part of one's social, academic and professional competence. This can be problematic for those suffering from foreign language communication apprehension (CA), or a fear of speaking a foreign language. CA manifests itself, for example, through feelings of anxiety and tension, physical arousal and avoidance of foreign language communication situations. According to scholars, foreign language CA may impede the language learning process significantly and have detrimental effects on one's language learning, academic achievement and career prospects. Drawing on upper secondary students' subjective experiences of communication situations in English as a foreign language, this study seeks, first, to describe, analyze and interpret why upper secondary students experience English language communication apprehension in English as a foreign language (EFL) classes. Second, this study seeks to analyse what the most anxiety-arousing oral production tasks in EFL classes are, and which features of different oral production tasks arouse English language communication apprehension and why. The ultimate objectives of the present study are to raise teachers' awareness of foreign language CA and its features, manifestations and impacts in foreign language classes as well as to suggest possible ways to minimize the anxiety-arousing features in foreign language classes. METHODS. The data was collected in two phases by means of six-part Likert-type questionnaires and theme interviews, and analysed using both quantitative and qualitative methods. The questionnaire data was collected in spring 2008. The respondents were 122 first-year upper secondary students, 68 % of whom were girls and 31 % of whom were boys. The data was analysed by statistical methods using SPSS software. The theme interviews were conducted in spring 2009. 
The interviewees were 11 second-year upper secondary students aged 17 to 19, who were chosen by purposeful selection on the basis of their English language CA level measured in the questionnaires. Six interviewees were classified as high apprehensives and five as low apprehensives according to their score in the foreign language CA scale in the questionnaires. The interview data was coded and thematized using the technique of content analysis. The analysis and interpretation of the data drew on a comparison of the self-reports of the highly apprehensive and low apprehensive upper secondary students. RESULTS. The causes of English language CA in EFL classes as reported by the students were both internal and external in nature. The most notable causes were a low self-assessed English proficiency, a concern over errors, a concern over evaluation, and a concern over the impression made on others. Other causes related to a high English language CA were a lack of authentic oral practice in EFL classes, discouraging teachers and negative experiences of learning English, unrealistic internal demands for oral English performance, high external demands and expectations for oral English performance, the conversation partner's higher English proficiency, and the audience's large size and unfamiliarity. The most anxiety-arousing oral production tasks in EFL classes were presentations or speeches with or without notes in front of the class, acting in front of the class, pair debates with the class as audience, expressing thoughts and ideas to the class, presentations or speeches without notes while seated, group debates with the class as audience, and answering the teacher's questions involuntarily. The main features affecting the anxiety-arousing potential of an oral production task were a high degree of attention, a large audience, a high degree of evaluation, little time for preparation, little linguistic support, and a long duration.
Abstract:
Researchers and developers in academia and industry would benefit from a facility that enables them to easily locate, license and use the kind of empirical data they need for testing and refining their hypotheses and to deposit and disseminate their data, e.g. to support replication and validation of reported scientific experiments. To answer these needs initially in Finland, there is an ongoing project at the University of Helsinki and its collaborators to create a user-friendly web service for researchers and developers in Finland and other countries. In our talk, we describe ongoing work to create a palette of extensive but easily available Finnish language resources and technologies for the research community, including lexical resources, wordnets, morphologically tagged corpora, dependency syntactic treebanks and parsebanks, open-source finite state toolkits and libraries and language models to support text analysis and processing at the customer site. The first publicly available results are also presented.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
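The windowed n-gram perplexity idea can be illustrated with a minimal sketch. It is not the toolkit's implementation: it counts n-grams with a plain hash table instead of a suffix array, and the add-one smoothing, the function names (`ngram_counts`, `window_perplexity`, `scan`) and the default window sizes are all assumptions made for illustration. The model order `n` is assumed to be at least 2.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, n, counts_n, counts_ctx, alphabet_size=4):
    """Perplexity of one window under an add-one-smoothed n-gram model.

    counts_n holds n-gram counts and counts_ctx holds (n-1)-gram counts,
    both taken over the whole sequence (an assumption of this sketch).
    """
    log_prob = 0.0
    steps = 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        context = gram[:-1]
        # P(last symbol | context) with add-one smoothing.
        p = (counts_n[gram] + 1) / (counts_ctx[context] + alphabet_size)
        log_prob += math.log2(p)
        steps += 1
    return 2 ** (-log_prob / steps) if steps else float("nan")


def scan(genome, n=3, window=12, step=6):
    """Return (start, perplexity) for each window slid over the sequence."""
    counts_n = ngram_counts(genome, n)
    counts_ctx = ngram_counts(genome, n - 1)
    return [
        (start, window_perplexity(genome[start:start + window], n,
                                  counts_n, counts_ctx))
        for start in range(0, len(genome) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence (hypothetical): a tandem repeat followed by a mixed region.
    toy = "AT" * 20 + "ACGTTGCAAGCTTAGC"
    for start, ppl in scan(toy):
        print(start, round(ppl, 3))
```

In this toy input the windows inside the tandem repeat score a lower perplexity than the windows in the mixed region, which is the kind of contrast a windowed scan is meant to expose.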
Abstract:
N-gram language models and lexicon-based word-recognition are popular methods in the literature to improve recognition accuracies of online and offline handwritten data. However, there are very few works that deal with application of these techniques on online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word recognition (8%) accuracies and while lexicon methods offer much greater improvements (30%) in terms of word recognition, there is a large dependency on choosing the right lexicon. For comparison to lexicon and language model based methods, we have also explored re-evaluation techniques which involve the use of expert classifiers to improve symbol and word recognition accuracies.
Abstract:
User authentication is essential for accessing computing resources, network resources, email accounts, online portals etc. To authenticate a user, the system stores user credentials (a user ID and password pair). Discovering user passwords from a system, and likewise protecting them against such attacks, has long been a problem of interest in the field. In this work we show that passwords are still vulnerable to hash chain based and efficient dictionary attacks. Human generated passwords use some identifiable patterns. We have analysed a sample of 19 million passwords, of different lengths, available online and studied the distribution of the symbols in the password strings. We show that the distribution of symbols in user passwords is affected by the native language of the user. From symbol distributions we can build smart and efficient dictionaries, which are smaller in size yet cover a large share of the plausible passwords in the key space. These smart dictionaries make dictionary based attacks practical.
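The symbol-distribution measurement described above can be sketched in a few lines. This is only an illustration of the counting step, not the authors' analysis pipeline; the function name `symbol_distribution` and the three-string sample are hypothetical.

```python
from collections import Counter


def symbol_distribution(passwords):
    """Relative frequency of each symbol across all password strings."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    # most_common() orders symbols from most to least frequent.
    return {ch: c / total for ch, c in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical toy sample; the study used about 19 million passwords.
    sample = ["sunshine1", "letmein", "qwerty12"]
    for symbol, freq in symbol_distribution(sample).items():
        print(repr(symbol), round(freq, 3))
```

Comparing such distributions between samples drawn from users with different native languages is one way the reported language effect could be examined.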
Abstract:
In the past many different methodologies have been devised to support software development and different sets of methodologies have been developed to support the analysis of software artefacts. We have identified this mismatch as one of the causes of the poor reliability of embedded systems software. The issue with software development styles is that they are "analysis-agnostic." They do not try to structure the code in a way that lends itself to analysis. The analysis is usually applied post-mortem after the software was developed and it requires a large amount of effort. The issue with software analysis methodologies is that they do not exploit available information about the system being analyzed.
In this thesis we address the above issues by developing a new methodology, called "analysis-aware" design, that links software development styles with the capabilities of analysis tools. This methodology forms the basis of a framework for interactive software development. The framework consists of an executable specification language and a set of analysis tools based on static analysis, testing, and model checking. The language enforces an analysis-friendly code structure and offers primitives that allow users to implement their own testers and model checkers directly in the language. We introduce a new approach to static analysis that takes advantage of the capabilities of a rule-based engine. We have applied the analysis-aware methodology to the development of a smart home application.
Abstract:
This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.
The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.
The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.
This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders of magnitude improvement over the lower predicate calculus technique.
Abstract:
The main contribution of this work is to analyze and describe the state-of-the-art performance of answer scoring systems from the SemEval-2013 task, as well as to continue the development of an answer scoring system (EHU-ALM) developed at the University of the Basque Country. Overall, this master's thesis focuses on finding configurations that improve the results on the SemEval dataset, using attribute engineering techniques to find optimal feature subsets and trying different hierarchical configurations in order to compare their performance against the traditional one-versus-all approach. Throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing the architecture, and, on the other hand, to improve the system by adapting it to a hierarchical configuration. To build these new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.
Abstract:
In the principles-and-parameters model of language, the principle known as "free indexation" plays an important part in determining the referential properties of elements such as anaphors and pronominals. This paper addresses two issues. (1) We investigate the combinatorics of free indexation. In particular, we show that free indexation must produce an exponential number of referentially distinct structures. (2) We introduce a compositional free indexation algorithm. We prove that the algorithm is "optimal." More precisely, by relating the compositional structure of the formulation to the combinatorial analysis, we show that the algorithm enumerates precisely all possible indexings, without duplicates.
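The requirement of enumerating every indexing exactly once can be illustrated with a small sketch. It is not the paper's compositional algorithm. It rests on an assumption not stated in the abstract: that two indexings count as referentially identical when they coindex the same noun phrases, so that distinct indexings correspond to partitions of the noun phrases. The function name `indexings` is hypothetical.

```python
def indexings(n):
    """Yield each coindexation pattern of n noun phrases exactly once.

    A pattern is a tuple of integer indices. A phrase may reuse any index
    already in use or take the next fresh one, which rules out patterns
    that differ only by renaming indices.
    """
    def extend(prefix, used):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for idx in range(used + 1):
            prefix.append(idx)
            yield from extend(prefix, max(used, idx + 1))
            prefix.pop()

    yield from extend([], 0)


if __name__ == "__main__":
    for k in range(1, 7):
        distinct = sum(1 for _ in indexings(k))
        # k ** k is the number of raw assignments of k possible indices to
        # k phrases, most of which are renamings of one another.
        print(k, distinct, k ** k)
```

Under this assumption the number of distinct patterns grows quickly with the number of noun phrases, while remaining far below the number of raw index assignments that a naive generator would have to filter for duplicates.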
Abstract:
The computer science technique of computational complexity analysis can provide powerful insights into the algorithm-neutral analysis of information processing tasks. Here we show that a simple, theory-neutral linguistic model of syntactic agreement and ambiguity demonstrates that natural language parsing may be computationally intractable. Significantly, we show that it may be syntactic features rather than rules that can cause this difficulty. Informally, human languages and the computationally intractable Satisfiability (SAT) problem share two costly computational mechanisms: both enforce agreement among symbols across unbounded distances (Subject-Verb agreement) and both allow ambiguity (is a word a Noun or a Verb?).
Abstract:
The STUDENT problem solving system, programmed in LISP, accepts as input a comfortable but restricted subset of English which can express a wide variety of algebra story problems. STUDENT finds the solution to a large class of these problems. STUDENT can utilize a store of global information not specific to any one problem, and may make assumptions about the interpretation of ambiguities in the wording of the problem being solved. If it uses such information or makes any assumptions, STUDENT communicates this fact to the user. The thesis includes a summary of other English language question-answering systems. All these systems, and STUDENT, are evaluated according to four standard criteria. The linguistic analysis in STUDENT is a first approximation to the analytic portion of a semantic theory of discourse outlined in the thesis. STUDENT finds the set of kernel sentences which are the base of the input discourse, and transforms this sequence of kernel sentences into a set of simultaneous equations which form the semantic base of the STUDENT system. STUDENT then tries to solve this set of equations for the values of requested unknowns. If it is successful it gives the answers in English. If not, STUDENT asks the user for more information, and indicates the nature of the desired information. The STUDENT system is a first step toward natural language communication with computers. Further work on the semantic theory proposed should result in much more sophisticated systems.
Abstract:
Jackson, R. (2007). Language, Policy and the Construction of a Torture Culture in the War on Terrorism. Review of International Studies. 33(3), pp.353-371 RAE2008
Abstract:
Watt, D. (2003). Amoral Gower: Language, Sex and Politics. Medieval Cultures Series, volume 38. Minneapolis: University of Minnesota Press. RAE2008
Abstract:
This study investigated the consistency of a measure of integrative motivation in the prediction of achievement in English as a foreign language in 18 samples of Polish school students. The results are shown to have implications for concerns expressed that integrative motivation might not be appropriate to the acquisition of English because it is a global language and moreover that other factors such as the gender of the student or the environment of the class might also influence its predictability. Results of a hierarchical linear modeling analysis indicated that for the older samples, integrative motivation was a consistent predictor of grades in English, unaffected by either the gender of the student or class environment acting as covariates. Comparable results were obtained for the younger samples except that student gender also contributed to the prediction of grades in English. Examination of the correlations of the elements of the integrative motivation score with English grades demonstrated that the aggregate score is the more consistent correlate from sample to sample than the elements themselves. Such results lead to the hypothesis that integrative motivation is a multi-dimensional construct and different aspects of the motivational complex come into play for each individual. That is, two individuals can hold the same level of integrative motivation and thus attain the same level of achievement but one might be higher in some elements and lower in others than another individual, resulting in consistent correlations of the aggregate but less so for the elements.