617 results for dictionaries


Relevance: 10.00%

Abstract:

The present work is an empirical investigation into the 'reference skills' of Pakistani learners and their language needs on the semantic, phonetic, lexical and pragmatic levels in the dictionary. The introductory chapter discusses the relatively problematic nature of lexis in comparison with the other aspects of EFL learning and spells out the aim of this study. Chapter two provides an analytical survey of the various types of research undertaken in different contexts of the dictionary and explains the eclectic approach adopted in the present work. Chapter three studies the 'reference skills' of this category of learners against the background of the highly sophisticated information structure of the learners' dictionaries under evaluation and suggests some measures for improvement in this context. Chapter four considers various criteria, e.g. pedagogic, linguistic and sociolinguistic, for determining the macro-structure of a learner's dictionary, with a focus on specific L1 speakers. Chapter five is concerned with various aspects of the semantic information provided in the dictionaries, matched against the needs of Pakistani learners with regard to both comprehension and production. The type, scale and presentation of grammatical information in the dictionary are analysed in chapter six with the object of discovering their role and utility for the learner. Chapter seven explores the rationale for providing phonological information, the extent to which this guidance is vital, and the problems of the phonetic symbols employed in the dictionaries. Chapter eight brings into perspective the historical background of English-Urdu bilingual lexicography and evaluates the bilingual dictionaries currently popular among the student community, with the aim of discovering the extent to which they have taken account of the modern tenets of lexicography and investigating their validity as a useful reference tool in the learning of the English language. The final chapter draws the findings on these individual aspects together in a coherent fashion to assess the viability of the original hypothesis that learners' dictionaries, if compiled with a specific set of users in mind, would be more useful.

Relevance: 10.00%

Abstract:

University students encounter difficulties with academic English because of its vocabulary, phraseology, and variability, and also because academic English differs in many respects from general English, the language they have experienced before starting their university studies. Although students have been provided with many dictionaries that contain some helpful information on words used in academic English, these dictionaries remain focused on the uses of words in general English. There is therefore a gap in the dictionary market for a dictionary for university students, and this thesis provides a proposal for such a dictionary (called the Dictionary of Academic English; DOAE) in the form of a model which depicts how the dictionary should be designed, compiled, and offered to students. The model draws on state-of-the-art techniques in lexicography, dictionary-use research, and corpus linguistics. The model demanded the creation of a completely new corpus of academic language (Corpus of Academic Journal Articles; CAJA). The main advantages of the corpus are its large size (83.5 million words) and balance. Having access to a large corpus of academic language was essential for a corpus-driven approach to data analysis. A good corpus balance in terms of domains enabled a detailed domain-labelling of senses, patterns, collocates, etc. in the dictionary database, which was then used to tailor the output according to the needs of different types of student. The model proposes a dictionary that is conceived as an online dictionary from the outset. The proposed dictionary is revolutionary in the way it addresses the needs of different types of student: it presents students with a dynamic dictionary whose contents can be customised according to the user's native language, subject of study, variant spelling preferences, and/or visual preferences (e.g. black and white).

Relevance: 10.00%

Abstract:

This paper discusses three important aspects of John Sinclair’s legacy: the corpus, lexicography, and the notion of ‘corpus-driven’. The corpus represents his concern with the nature of linguistic evidence. Lexicography is for him the canonical mode of language description at the lexical level. And his belief that the corpus should ‘drive’ the description is reflected in his constant attempts to utilize the emergent computer technologies to automate the initial stages of analysis and defer the intuitive, interpretative contributions of linguists to increasingly later stages in the process. Sinclair’s model of corpus-driven lexicography has spread far beyond its initial implementation at Cobuild, to most EFL dictionaries, to native-speaker dictionaries (e.g. the New Oxford Dictionary of English, and many national language dictionaries in emerging or re-emerging speech communities) and bilingual dictionaries (e.g. Collins, Oxford-Hachette).

Relevance: 10.00%

Abstract:

A property of sparse representations in relation to their capacity for information storage is discussed. It is shown that this feature can be used for an application that we term Encrypted Image Folding. The proposed procedure is realizable through any suitable transformation. In particular, in this paper we illustrate the approach by recourse to the Discrete Cosine Transform and a combination of redundant Cosine and Dirac dictionaries. The main advantage of the proposed technique is that both storage and encryption can be achieved simultaneously using simple processing steps.
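To make the underlying mechanism concrete, here is a minimal Python sketch of sparse approximation over a redundant Cosine-Dirac dictionary using orthogonal matching pursuit. It illustrates the general technique the abstract builds on, not the authors' Encrypted Image Folding procedure; all names, parameters, and the toy signal are ours.

```python
import numpy as np

def build_cosine_dirac_dictionary(n, oversample=2):
    """Redundant dictionary: oversampled cosine atoms plus the Dirac (standard) basis."""
    m = oversample * n
    atoms = [np.cos(np.pi * (np.arange(n) + 0.5) * k / m) for k in range(m)]
    C = np.stack(atoms, axis=1)
    C /= np.linalg.norm(C, axis=0)        # unit-norm atoms
    return np.hstack([C, np.eye(n)])       # cosine block + Dirac block

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: greedily pick the best-correlated atom,
    then re-fit all selected atoms by least squares."""
    residual, support = y.copy(), []
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return support, coef

# Toy signal: a smooth oscillation plus one spike, well matched by
# a few cosine atoms plus a single Dirac atom.
n = 64
y = np.cos(2 * np.pi * 3 * np.arange(n) / n)
y[10] += 5.0
D = build_cosine_dirac_dictionary(n)
support, coef = omp(D, y, sparsity=5)
print(len(support), np.linalg.norm(y - D[:, support] @ coef))
```

The mixed dictionary captures this signal with a handful of atoms where either basis alone would need many more, which is the kind of storage gain the abstract exploits.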

Relevance: 10.00%

Abstract:

Sparse representation of astronomical images is discussed. It is shown that a significant gain in sparsity is achieved when particular mixed dictionaries are used for approximating these types of images with greedy selection strategies. Experiments are conducted to confirm (i) the effectiveness of the approach at producing sparse representations and (ii) its competitiveness with respect to the time required to process large images. The latter is a consequence of the suitability of the proposed dictionaries for approximating images in partitions of small blocks. This feature makes it possible to apply the effective greedy selection technique called orthogonal matching pursuit up to some block size. For blocks exceeding that size, a refinement of the original matching pursuit approach is considered. The resulting method is termed "self-projected matching pursuit", because it is shown to be effective for implementing, via matching pursuit itself, the optional backprojection intermediate steps in that approach. © 2013 Optical Society of America.
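The "self-projection" idea can be sketched as follows: plain matching pursuit selects atoms, and the orthogonal backprojection onto the span of the selected atoms is itself approximated by running matching pursuit restricted to those atoms. This is a hedged reconstruction from the abstract, not the published algorithm; it assumes a matrix D of unit-norm atoms (e.g. the Cosine-Dirac dictionary from the previous sketch).

```python
import numpy as np

def mp_step(D, r, idx):
    """One matching pursuit step restricted to the atoms listed in idx.
    Assumes unit-norm columns of D."""
    j = idx[int(np.argmax(np.abs(D[:, idx].T @ r)))]
    return j, D[:, j] @ r

def self_projected_mp(D, y, sparsity, refine=3):
    """Plain MP chooses atoms; the backprojection onto the span of the
    selected atoms is approximated by MP over those atoms only."""
    r, coefs = y.copy(), {}
    all_idx = list(range(D.shape[1]))
    for _ in range(sparsity):
        j, c = mp_step(D, r, all_idx)          # forward step: pick an atom
        coefs[j] = coefs.get(j, 0.0) + c
        r = r - c * D[:, j]
        for _ in range(refine):                # 'self-projection' refinement
            k, c = mp_step(D, r, list(coefs))
            coefs[k] += c
            r = r - c * D[:, k]
    return coefs, r
```

The refinement loop redistributes coefficients among the atoms already chosen, mimicking the least-squares re-fit of orthogonal matching pursuit at a fraction of its cost for large blocks.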

Relevance: 10.00%

Abstract:

The Semantic Web relies on carefully structured, well-defined data to allow machines to communicate and understand one another. In many domains (e.g. geospatial) the data being described contains some uncertainty, often due to incomplete knowledge; meaningful processing of this data requires these uncertainties to be carefully analysed and integrated into the process chain. Currently, within the Semantic Web there is no standard mechanism for the interoperable description and exchange of uncertain information, which renders the automated processing of such information implausible, particularly where error must be considered and captured as it propagates through a processing sequence. We adopt a Bayesian perspective and focus on the case where the inputs and outputs are naturally treated as random variables. This paper discusses a solution to the problem in the form of the Uncertainty Markup Language (UncertML). UncertML is a conceptual model, realised as an XML schema, that allows uncertainty to be quantified in a variety of ways, i.e. as realisations, statistics, and probability distributions. UncertML is based upon a soft-typed XML schema design that provides a generic framework from which any statistic or distribution may be created. Making extensive use of Geography Markup Language (GML) dictionaries, UncertML provides a collection of definitions for common uncertainty types. Containing both written descriptions and mathematical functions, encoded as MathML, the definitions within these dictionaries provide a robust mechanism for defining any statistic or distribution and can be easily extended. Uniform Resource Identifiers (URIs) are used to introduce semantics to the soft-typed elements by linking to these dictionary definitions. The INTAMAP (INTeroperability and Automated MAPping) project provides a use case for UncertML. This paper demonstrates how observation errors can be quantified using UncertML and wrapped within an Observations & Measurements (O&M) Observation. The interpolation service uses the information within these observations to influence the prediction outcome. The output uncertainties may be encoded in a variety of UncertML types, e.g. a series of marginal Gaussian distributions, a set of statistics such as the first three marginal moments, or a set of realisations from a Monte Carlo treatment. Quantifying and propagating uncertainty in this way allows such interpolation results to be consumed by other services. This could form part of a risk management chain or a decision support system, and ultimately paves the way for complex data processing chains in the Semantic Web.
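As a rough illustration of the soft-typed design, the following Python snippet builds an XML fragment in which a marginal Gaussian is encoded as a generic element whose meaning comes from a dictionary definition referenced by URI. The namespace, element names, and URIs here are placeholders of our own; the normative vocabulary is defined by the UncertML schema and its GML dictionaries.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for illustration only; not the real UncertML namespace.
UN = "http://example.org/uncertml"
ET.register_namespace("un", UN)

def gaussian(mean, variance):
    """Encode a marginal Gaussian as a soft-typed element that gains its
    semantics from a dictionary definition linked via a URI."""
    dist = ET.Element(f"{{{UN}}}Distribution",
                      {"definition": "http://example.org/dictionary/gaussian"})
    ET.SubElement(dist, f"{{{UN}}}parameter", {"name": "mean"}).text = str(mean)
    ET.SubElement(dist, f"{{{UN}}}parameter", {"name": "variance"}).text = str(variance)
    return dist

print(ET.tostring(gaussian(12.3, 0.8), encoding="unicode"))
```

The point of the soft typing is visible here: the element structure is generic, and a new statistic or distribution needs only a new dictionary entry, not a schema change.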

Relevance: 10.00%

Abstract:

Corpora—large collections of written and/or spoken text stored and accessed electronically—provide the means of investigating language that is of growing importance academically and professionally. Corpora are now routinely used in the following fields: the production of dictionaries and other reference materials; the development of aids to translation; language teaching materials; the investigation of ideologies and cultural assumptions; natural language processing; and the investigation of all aspects of linguistic behaviour, including vocabulary, grammar and pragmatics.

Relevance: 10.00%

Abstract:

Many people think of language as words. Words are small, convenient units, especially in written English, where they are separated by spaces. Dictionaries seem to reinforce this idea, because entries are arranged as a list of alphabetically-ordered words. Traditionally, linguists and teachers focused on grammar and treated words as self-contained units of meaning, which fill the available grammatical slots in a sentence. More recently, attention has shifted from grammar to lexis, and from words to chunks. Dictionary headwords are convenient points of access for the user, but modern dictionary entries usually deal with chunks, because meanings often do not arise from individual words, but from the chunks in which the words occur. Corpus research confirms that native speakers of a language actually work with larger “chunks” of language. This paper will show that teachers and learners will benefit from treating language as chunks rather than words.

Relevance: 10.00%

Abstract:

Malapropism is a semantic error that is difficult to detect because it usually retains the syntactic links between words in the sentence but replaces one content word with a similar word of quite different meaning. A method for the automatic detection of malapropisms is described, based on Web statistics and a specially defined Semantic Compatibility Index (SCI). For correction of the detected errors, special dictionaries and heuristic rules are proposed, which retain only a few highly SCI-ranked correction candidates for the user's selection. Experiments on Web-assisted detection and correction of Russian malapropisms are reported, demonstrating the efficacy of the described method.
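A minimal sketch of the flavour of such an index: a PMI-style score computed from web hit counts, keeping only the top-ranked correction candidates for the user. The exact SCI formula is defined in the paper; this version, and the hits stub, are illustrative only.

```python
import math

def hits(query: str) -> int:
    """Stub for a web-search hit count; a real system would call a search API."""
    raise NotImplementedError

def sci(word: str, context_word: str) -> float:
    """Illustrative compatibility score: how much more often the pair co-occurs
    than its parts occur separately (PMI-like; not the authors' exact SCI)."""
    pair = hits(f'"{word} {context_word}"')
    if pair == 0:
        return float("-inf")
    return (math.log(pair)
            - math.log(max(hits(word), 1))
            - math.log(max(hits(context_word), 1)))

def rank_candidates(context_word, candidates, keep=3):
    """Retain only the few highest-scoring correction candidates for selection."""
    return sorted(candidates, key=lambda w: sci(w, context_word), reverse=True)[:keep]
```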

Relevance: 10.00%

Abstract:

* Work done under partial support of Mexican Government (CONACyT, SNI), IPN (CGPI, COFAA) and Korean Government (KIPA Professorship for Visiting Faculty Positions). The second author is currently on Sabbatical leave at Chung-Ang University.

Relevance: 10.00%

Abstract:

In this paper we present how information technologies, as tools for the creation of digital bilingual dictionaries, can help the preservation of natural languages. Natural languages are an outstanding part of human cultural values, and for that reason they should be preserved as part of the world's cultural heritage. We describe our work on the bilingual lexical database supporting the Bulgarian-Polish Online dictionary. The main software tools for the web presentation of the dictionary are briefly described. We focus special attention on the presentation of verbs, the linguistic category that is richest in specific characteristics in Bulgarian.
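For illustration, a hypothetical record structure for one verb entry in such a bilingual lexical database might look as follows; the field names are ours, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VerbEntry:
    """Hypothetical record for one Bulgarian verb in a bilingual lexical
    database; illustrative only, not the project's actual schema."""
    lemma: str                        # Bulgarian headword, e.g. 'чета' (to read)
    aspect: str                       # 'imperfective' or 'perfective'
    aspectual_pair: Optional[str]     # counterpart verb of the opposite aspect
    polish_equivalents: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)  # (Bulgarian, Polish)

entry = VerbEntry(lemma="чета", aspect="imperfective",
                  aspectual_pair="прочета", polish_equivalents=["czytać"])
```

Encoding aspect and aspectual pairs explicitly is one way a database can capture the specific characteristics of Bulgarian verbs that the abstract highlights.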

Relevance: 10.00%

Abstract:

This paper presents research on the linguistic structure of knowledge about Bulgarian bells. The idea of building a semantic structure for Bulgarian bells arose during the "Multimedia fund - BellKnow" project, in which a large amount of data about bells was collected: their structure, history, technical data, etc. This is the first attempt to give a computational-linguistic account of bell knowledge and to deliver a semantic representation of that knowledge. Based on this research, some linguistic components aiming to realise different types of analysis of text objects are implemented in term dictionaries. Thus, we lay the foundation for the linguistic analysis services in these digital dictionaries, aiding the study of the kinds, number and frequency of the lexical units that constitute various bell objects.

Relevance: 10.00%

Abstract:

In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes word co-occurrence into account in order to avoid dependence on existing linguistic resources, such as electronic dictionaries or lexico-semantic databases (thesauri, ontologies). Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts, and it has been used extensively in information retrieval and text summarization. Our architecture proposes a language-independent topic segmentation system that addresses three main problems evidenced by previous research: systems based solely on lexical repetition show reliability problems; systems based on lexical cohesion use existing linguistic resources that are usually available only for dominant languages and consequently do not apply to less-favoured languages; and other systems need pre-existing training data. For that purpose, we use only statistics on words and sequences of words computed from a set of texts. This provides a flexible solution that may narrow the gap between dominant and less-favoured languages, thus allowing equivalent access to information.
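The lexical-cohesion baseline that such a system improves on can be sketched in a few lines: compare bag-of-word blocks on either side of each candidate boundary and look for dips in similarity. This is the generic, resource-free technique; the paper's informative similarity measure additionally weights words by their co-occurrence statistics, which this sketch omits.

```python
import math
from collections import Counter

def block_vector(sentences):
    """Bag of words for a block of sentences (language-independent: tokens only)."""
    return Counter(w for s in sentences for w in s.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def boundary_scores(sentences, window=3):
    """Similarity between the blocks before and after each gap; low scores
    (dips in lexical cohesion) suggest topic boundaries."""
    scores = []
    for i in range(window, len(sentences) - window + 1):
        left = block_vector(sentences[i - window:i])
        right = block_vector(sentences[i:i + window])
        scores.append((i, cosine(left, right)))
    return scores
```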

Relevance: 10.00%

Abstract:

Cooperative greedy pursuit strategies are considered for approximating a signal partition subject to a global constraint on sparsity. The approach aims at producing a high-quality sparse approximation of the whole signal using highly coherent redundant dictionaries. The cooperation takes place by ranking the partition units for their sequential stepwise approximation, and is realized by means of (i) forward steps for the upgrading of an approximation and/or (ii) backward steps for the corresponding downgrading. The advantage of the strategy is illustrated by the approximation of music signals using redundant trigonometric dictionaries. In addition to rendering stunning improvements in sparsity with respect to the concomitant trigonometric basis, these dictionaries enable a fast implementation of the approach via the Fast Fourier Transform.
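The closing remark about the FFT can be made concrete: with an oversampled cosine dictionary, the correlations of a residual with every atom can be computed in one shot by zero-padding to the redundant length and taking a DCT-II (itself an FFT-based transform), instead of an explicit matrix product. The dictionary and the omitted atom normalisation are our illustrative choices; a brute-force self-check is included.

```python
import numpy as np
from scipy.fft import dct   # DCT computed via FFT internally

def cosine_dict_correlations(residual, oversample=2):
    """Inner products of a residual with every atom of an oversampled cosine
    dictionary, via one zero-padded length-m DCT-II rather than an explicit
    m-column matrix product."""
    n = len(residual)
    m = oversample * n
    padded = np.zeros(m)
    padded[:n] = residual
    # scipy's unnormalized DCT-II sums 2*cos terms; halve to match <r, atom>.
    return dct(padded, type=2) / 2.0

# Brute-force check against the explicit (unnormalized) dictionary.
n, m = 8, 16
r = np.random.default_rng(0).standard_normal(n)
D = np.cos(np.pi * (np.arange(n)[:, None] + 0.5) * np.arange(m)[None, :] / m)
assert np.allclose(D.T @ r, cosine_dict_correlations(r))
```

Since every greedy step of the pursuit is dominated by exactly this correlation computation, replacing the matrix product with a transform is what makes large redundant trigonometric dictionaries practical.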

Relevance: 10.00%

Abstract:

I describe and discuss a series of court cases which focus on decoding the meaning of slang terms. Examples include sexual slang used in a description by a child and an Internet Relay Chat conversation containing a conspiracy to murder. I consider the task these cases present for the forensic linguist and the roles the linguist may assume in determining the meaning of slang terms for the courts. These roles are identified as linguist as naïve interpreter, lexicographer, case researcher, and cultural mediator. Each of these roles suggests different strategies that might be used, from consulting formal slang dictionaries and less formal Internet sources to collecting case-specific corpora and examining all the extraneous material in a particular case. Each strategy is evaluated both in terms of the strength of evidence provided and its applicability to the forensic context.