Digital Library

58 results for Language-based Editor

in Aston University Research Archive

The use of deterministic parsers on sublanguage for machine translation

Relevance:

80.00%

Publisher:

Abstract:

For more than forty years, research has been ongoing in the use of the computer in the processing of natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports on Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and Parsing are described in general terms, with a review of Deterministic Parsing systems pertinent to this research being presented in detail. The interaction between Machine Translation, Sublanguage and Parsing, including Deterministic Parsing, is also highlighted. Two types of Deterministic Parser have been investigated: a Marcus-type parser, based on the basic design of the original Deterministic Parser (Marcus, 1980), and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers, to be used on sublanguage, have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser, which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar (DCG).

Disciplinary talk: a systemic functional exploration of university seminar discussions

Relevance:

80.00%

Publisher:

Abstract:

Despite the growth of spoken academic corpora in recent years, relatively little is known about the language of seminar discussions in higher education. This thesis compares seminar discussions across three disciplinary areas. The aim of this thesis is to uncover the functions and patterns of talk used in different disciplinary discussions and to highlight language on a macro and micro level that would be useful for materials design and teaching purposes. A framework for identifying and analysing genres in spoken language based on Hallidayan Systemic Functional Linguistics (SFL) is used. Stretches of talk sharing a similar purpose and predictable functional staging, termed Discussion Macro Genres (DMGs), are identified. Language is compared across DMGs and across disciplines through use of corpus techniques in conjunction with SFL genre theory. Data for the study comprises just over 180,000 tokens and is drawn from the British Academic Spoken English corpus (BASE), recorded at two universities in the UK. The discipline areas investigated are Arts and Humanities, Social Sciences and Physical Sciences. Findings from this study make theoretical, empirical and methodological contributions to the field of spoken EAP. The empirical findings are, firstly, that the majority of the seminar discussion can be assigned to one of the three main DMGs in the corpus: Responding, Debating and Problem Solving. Secondly, the study characterises each discipline area according to two DMGs. Thirdly, the majority of the discussion is non-oppositional in nature, suggesting that ‘debate’ is not the only form of discussion that students need to be prepared for. Finally, while some characteristics of the discussion are tied to the DMG and common across disciplines, others are discipline specific.
On a theoretical level, this study shows that an SFL genre model for investigating spoken discourse can be successfully extended to investigate longer stretches of discourse than have previously been identified. The methodological contribution is to demonstrate how corpus techniques can be combined with SFL genre theory to investigate extended stretches of spoken discussion. The thesis will be of value to those working in the field of teaching spoken EAP/ESAP as well as to materials developers.

The usability of semantic search tools: a review

Relevance:

80.00%

Publisher:

Abstract:

The goal of semantic search is to improve on traditional search methods by exploiting the semantic metadata. In this paper, we argue that supporting iterative and exploratory search modes is important to the usability of all search systems. We also identify the types of semantic queries the users need to make, the issues concerning the search environment and the problems that are intrinsic to semantic search in particular. We then review the four modes of user interaction in existing semantic search systems, namely keyword-based, form-based, view-based and natural language-based systems. Future development should focus on multimodal search systems, which exploit the advantages of more than one mode of interaction, and on developing the search systems that can search heterogeneous semantic metadata on the open semantic Web.

What drives the prediction of early reading? An analysis of stimulus and response-type

Relevance:

80.00%

Publisher:

Abstract:

Purpose: Phonological accounts of reading implicate three aspects of phonological awareness tasks that underlie the relationship with reading: a) the language-based nature of the stimuli (words or nonwords), b) the verbal nature of the response, and c) the complexity of the stimuli (words can be segmented into units of speech). Yet it is uncertain which task characteristics are most important, as they are typically confounded. By systematically varying response-type and stimulus complexity across speech and non-speech stimuli, the current study seeks to isolate the characteristics of phonological awareness tasks that drive the prediction of early reading. Method: Four sets of tasks were created: tone stimuli (simple non-speech) requiring a non-verbal response, phonemes (simple speech) requiring a non-verbal response, phonemes requiring a verbal response, and nonwords (complex speech) requiring a verbal response. Tasks were administered to 570 2nd grade children along with standardized tests of reading and non-verbal IQ. Results: Three structural equation models comparing matched sets of tasks were built. Each model consisted of two 'task' factors with a direct link to a reading factor. The following factors predicted unique variance in reading: a) simple speech and non-speech stimuli, b) simple speech requiring a verbal response but not simple speech requiring a non-verbal response, and c) complex and simple speech stimuli. Conclusions: Results suggest that the prediction of reading by phonological tasks is driven by the verbal nature of the response and not the complexity or 'speechness' of the stimuli. Findings highlight the importance of phonological output processes to early reading.

A unification-based natural language interface to a database.

Relevance:

40.00%

Publisher:

Abstract:

An implementation of a Lexical Functional Grammar (LFG) natural language front-end to a database is presented, and its capabilities demonstrated by reference to a set of queries used in the Chat-80 system. The potential of LFG for such applications is explored. Other grammars previously used for this purpose are briefly reviewed and contrasted with LFG. The basic LFG formalism is fully described, both as to its syntax and semantics, and the deficiencies of the latter for database access application shown. Other current LFG implementations are reviewed and contrasted with the LFG implementation developed here specifically for database access. The implementation described here allows a natural language interface to a specific Prolog database to be produced from a set of grammar rule and lexical specifications in an LFG-like notation. In addition to this the interface system uses a simple database description to compile metadata about the database for later use in planning the execution of queries. Extensions to LFG's semantic component are shown to be necessary to produce a satisfactory functional analysis and semantic output for querying a database. A diverse set of natural language constructs are analysed using LFG and the derivation of Prolog queries from the F-structure output of LFG is illustrated. The functional description produced from LFG is proposed as sufficient for resolving many problems of quantification and attachment.

The impact of the language barrier on the management of multinational companies: a case study and survey-based exploration of the impact of the language barrier on the strategies, policies and systems by which multinational companies manage their subsidiaries

Relevance:

40.00%

Publisher:

Abstract:

The thesis begins with a conceptual model of the way that language diversity affects the strategies, organisation and subsidiary control policies of multinational companies. The model is based solely on the researcher's personal experience of working in a variety of international management roles, but in Chapter 2 a wide-ranging review of related academic literature finds evidence to support the key ideas. The model is developed as a series of propositions which are tested in a comparative case study, refined and then re-tested in a global survey of multinational subsidiaries. The principal findings of the empirical phases of the thesis endorse the main tenets of the model:
- That language difference between parent and subsidiary will impair communication, create mistrust and impede relationship development.
- That subsequently the feelings of uncertainty, suspicion and mistrust will influence the decisions taken by the parent company.
- They will have heightened sensitivity to language issues and will implement policies to manage language differences.
- They will adopt low-risk strategies in host countries where they are concerned about language difference.
- They will use organisational and manpower strategies to minimise the consequences and risks of the communications problems with the subsidiary.
- As a consequence the level of integration and knowledge flow between parent and subsidiary will be curtailed.
- They will adopt styles of control that depend least on their ability to communicate with their subsidiary.
Although there is adequate support for all of the above conclusions, on some key points the evidence of the Case Studies and Survey is contradictory. The thesis, therefore, closes with an agenda for further research that would address these inconsistencies.

Supporting English language teachers doing further degrees at a distance: a web based strategy researched

Relevance:

40.00%

Publisher:

Abstract:

This thesis explores how the World Wide Web can be used to support English language teachers doing further studies at a distance. The future of education worldwide is moving towards a requirement that we, as teacher educators, use the latest web technology not as a gambit, but as a viable tool to improve learning. By examining the literature on knowledge, teacher education and web training, a model of teacher knowledge development, along with statements of advice for web developers based upon the model, is developed. Next, the applicability and viability of both the model and statements of advice are examined by developing a teacher support site (http://www.philseflsupport.com) according to these principles. The data collected from one focus group of users from sixteen different countries, all studying on the same distance Masters programme, is then analysed in depth. The outcomes from the research are threefold: A functioning website that is averaging around 15,000 hits a month provides a professional contribution. An expanded model of teacher knowledge development that is based upon five theoretical principles that reflect the ever-expanding cyclical nature of teacher learning provides an academic contribution. A series of six statements of advice for developers of teacher support sites. These statements are grounded in the theoretical principles behind the model of teacher knowledge development and incorporate nine keys to effective web facilitation. Taken together, they provide a forward-looking contribution to the praxis of web supported teacher education, and thus to the potential dissemination of the research presented here. The research has succeeded in reducing the proliferation of terminology in teacher knowledge into a succinct model of teacher knowledge development. The model may now be used to further our understanding of how teachers learn and develop as other research builds upon the individual study here.
NB: Appendix 4 is only available for consultation at Aston University Library with prior arrangement.

Natural language processing as a foundation of the semantic Web

Relevance:

30.00%

Publisher:

Abstract:

The main argument of this paper is that Natural Language Processing (NLP) does, and will continue to, underlie the Semantic Web (SW), including its initial construction from unstructured sources like the World Wide Web (WWW), whether its advocates realise this or not. Chiefly, we argue, such NLP activity is the only way up to a defensible notion of meaning at conceptual levels (in the original SW diagram) based on lower level empirical computations over usage. Our aim is definitely not to claim logic-bad, NLP-good in any simple-minded way, but to argue that the SW will be a fascinating interaction of these two methodologies, again like the WWW (which has been basically a field for statistical NLP research) but with deeper content. Only NLP technologies (and chiefly information extraction) will be able to provide the requisite RDF knowledge stores for the SW from existing unstructured text databases in the WWW, and in the vast quantities needed. There is no alternative at this point, since a wholly or mostly hand-crafted SW is also unthinkable, as is a SW built from scratch and without reference to the WWW. We also assume that, whatever the limitations on current SW representational power we have drawn attention to here, the SW will continue to grow in a distributed manner so as to serve the needs of scientists, even if it is not perfect. The WWW has already shown how an imperfect artefact can become indispensable.

Token codeswitching and language alternation in narrative discourse: a functional-pragmatic approach

Relevance:

30.00%

Publisher:

Abstract:

This study is concerned with two phenomena of language alternation in biographic narrations in Yiddish and Low German, based on spoken language data recorded between 1988 and 1995. In both phenomena language alternation serves as an additional communicative tool which can be applied by bilingual speakers to enlarge their set of interactional devices in order to ensure a smoother or more pointed processing of communicative aims. The first phenomenon is a narrative strategy I call Token Codeswitching: In a bilingual narrative culminating in a line of reported speech, a single element of L2 indicates the original language of the reconstructed dialogue – a token for a quote. The second phenomenon has to do with directing procedures, carried out by the speaker and aimed at guiding the hearer's attention, which are frequently carried out in L2, supporting the hearer's attention at crucial points in the interaction. Both phenomena are analyzed following a model of narrative discourse as proposed in the framework of Functional Pragmatics. The model allows the adoption of an integral approach to previous findings in code-switching research.

Die Sprache der Auricher Juden. Zur Rekonstruktion westjiddischer Sprachreste in Ostfriesland: on the reconstruction of Western Yiddish language remnants in East Frisia

Relevance:

30.00%

Publisher:

Abstract:

Western Yiddish, the spoken language of the traditional Jewish society in the German- and Dutch-speaking countries, was abandoned by its speakers at the end of the 18th century in favour of the emerging standard varieties: Dutch and German, respectively. Remnants of Western Yiddish varieties, however, remained a medium of discourse in remote provinces and could be found well into the 19th and sometimes the 20th century in some South-western areas of Germany and Switzerland, the Alsace, some areas of the Netherlands and in parts of the German province of Westphalia. It appears that rural Jewish communities sometimes preserved in-group vernaculars, which were based on Western Yiddish. Sources discovered in 2004 in the town of Aurich prove that Jews living in East Frisia, a Low-German speaking peninsula in the North-west of Germany, used a variety based on Western Yiddish until the Second World War. It appears that until the Holocaust a number of small, close-knit Jewish communities in East Frisia, which depended economically mainly on cattle-trading and butchery, kept certain specific cultural features, among them the vernacular which they spoke alongside Low German and Standard German. The sources consist of two amateur theatre plays, a memoir and two word lists written in 1902, 1928 and the 1980s, respectively. In the monograph these sources are documented and annotated as well as analyzed linguistically against the background of rural Jewish life in Northern Germany. The study focuses on traces of language contact with Low German, processes of language change and on the question of the function of the variety in day-to-day life in a rural Jewish community.

Modeling geometric rules in object based models: an XML / GML approach

Relevance:

30.00%

Publisher:

Abstract:

Most object-based approaches to Geographical Information Systems (GIS) have concentrated on the representation of geometric properties of objects in terms of fixed geometry. In our road traffic marking application domain we have a requirement to represent the static locations of the road markings but also enforce the associated regulations, which are typically geometric in nature. For example, a give-way line of a pedestrian crossing in the UK must be within 1100-3000 mm of the edge of the crossing pattern. In previous studies of the application of spatial rules (often called 'business logic') in GIS, emphasis has been placed on the representation of topological constraints and data integrity checks. There is very little GIS literature that describes models for geometric rules, although there are some examples in the Computer Aided Design (CAD) literature. This paper introduces some of the ideas from so-called variational CAD models to the GIS application domain, and extends these using a Geography Markup Language (GML) based representation. In our application we have an additional requirement: the geometric rules are often changed and vary from country to country, so they should be represented in a flexible manner. In this paper we describe an elegant solution to the representation of geometric rules, such as requiring lines to be offset from other objects. The method uses a feature-property model embraced in GML 3.1 and extends the possible relationships in feature collections to permit the application of parameterized geometric constraints to sub features. We show the parametric rule model we have developed and discuss the advantage of using simple parametric expressions in the rule base. We discuss the possibilities and limitations of our approach and relate our data model to GML 3.1. © 2006 Springer-Verlag Berlin Heidelberg.
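
To make the idea of a parameterized geometric rule concrete, the give-way line example from the abstract above can be sketched in Python. This is only an illustration, not the paper's GML-based rule model: the `OffsetRule` class, its field names and the inclusive treatment of the bounds are assumptions made here. Only the 1100-3000 mm range comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRule:
    """Hypothetical parameterized rule: a feature must lie between
    min_mm and max_mm (inclusive, by assumption) from a reference feature."""
    name: str
    min_mm: float
    max_mm: float

    def is_satisfied(self, distance_mm: float) -> bool:
        # The rule holds when the measured offset falls inside the allowed band.
        return self.min_mm <= distance_mm <= self.max_mm


# Bounds taken from the abstract's UK pedestrian-crossing example.
UK_GIVE_WAY_LINE = OffsetRule("give_way_line_offset", 1100.0, 3000.0)

# A rule for another country would reuse the same class with other
# parameters; the values would have to come from that country's regulations.
print(UK_GIVE_WAY_LINE.is_satisfied(2000.0))
print(UK_GIVE_WAY_LINE.is_satisfied(500.0))
```

Keeping the bounds as data rather than hard-coding them in the check reflects the requirement stated in the abstract that rules change often and vary from country to country.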

A corpus-based investigation of junk emails

Relevance:

30.00%

Publisher:

Abstract:

Almost everyone who has an email account receives unwanted emails from time to time. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on these unwanted messages which try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails have been proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails, and try to decide if they constitute a distinct genre. Our corpus of junk emails was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563 files, but after eliminating the duplicates automatically we kept only 673 files, totalling just over 373,000 tokens. In order to decide if the junk emails constitute a different genre, a comparison with a corpus of leaflets extracted from the BNC and with the whole BNC corpus is carried out. Several characteristics at the lexical and grammatical levels were identified.

Morphology in autism spectrum disorders: local processing bias and language

Relevance:

30.00%

Publisher:

Abstract:

We conducted a detailed study of a case of linguistic talent in the context of autism spectrum disorder, specifically Asperger syndrome. I.A. displays language strengths at the level of morphology and syntax. Yet, despite this grammar advantage, processing of figurative language and inferencing based on context presents a problem for him. The morphology advantage for I.A. is consistent with the weak central coherence (WCC) account of autism. From this account, the presence of a local processing bias is evident in the ways in which autistic individuals solve common problems, such as assessing similarities between objects and finding common patterns, and may therefore provide an advantage in some cognitive tasks compared to typical individuals. We extend the WCC account to language and provide evidence for a connection between the local processing bias and the acquisition of morphology and grammar.

A statistical method for the identification and aggregation of regional linguistic variation

Relevance:

30.00%

Publisher:

Abstract:

This paper introduces a method for the analysis of regional linguistic variation. The method identifies individual and common patterns of spatial clustering in a set of linguistic variables measured over a set of locations based on a combination of three statistical techniques: spatial autocorrelation, factor analysis, and cluster analysis. To demonstrate how to apply this method, it is used to analyze regional variation in the values of 40 continuously measured, high-frequency lexical alternation variables in a 26-million-word corpus of letters to the editor representing 206 cities from across the United States.
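
As a rough illustration of the first of the three techniques named in the abstract above, the sketch below computes global Moran's I, one standard measure of spatial autocorrelation. The abstract does not say which statistic the authors used, so the choice of Moran's I, the function name `morans_i`, the binary neighbour weights and the toy values are all assumptions made here for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable measured at n locations.

    values  -- list of n measurements (e.g. a lexical alternation rate per city)
    weights -- n x n list of lists; weights[i][j] > 0 when i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighbouring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Toy example: four locations in a row, each linked to its adjacent neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, 5.0, 5.0]    # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]  # neighbours always differ

print(morans_i(clustered, chain))    # positive: spatial clustering
print(morans_i(alternating, chain))  # negative: neighbours are dissimilar
```

A positive value indicates that nearby locations tend to have similar values, which is the kind of regional patterning the method looks for before the factor and cluster analysis steps.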

A regional analysis of contraction rate in written Standard American English

Relevance:

30.00%

Publisher:

Abstract:

The goal of this study is to determine if various measures of contraction rate are regionally patterned in written Standard American English. In order to answer this question, this study employs a corpus-based approach to data collection and a statistical approach to data analysis. Based on a spatial autocorrelation analysis of the values of eleven measures of contraction across a 25 million word corpus of letters to the editor representing the language of 200 cities from across the contiguous United States, two primary regional patterns were identified: easterners tend to produce relatively few standard contractions (not contraction, verb contraction) compared to westerners, and northeasterners tend to produce relatively few non-standard contractions (to contraction, non-standard not contraction) compared to southeasterners. These findings demonstrate that regional linguistic variation exists in written Standard American English and that regional linguistic variation is more common than is generally assumed.
