993 results for lexical approach


Relevance:

30.00% 30.00%

Publisher:

Abstract:

In Natural Language Processing (NLP), the development of lexical-semantic resources is a pressing need. Viewing NLP systems as an exercise in human language engineering, it is believed that the development of such resources can benefit from the knowledge representation models developed in Knowledge Engineering. These models, in particular, provide both the theoretical and methodological framework and the formal metalanguage for the computational treatment of the meaning of lexical units. In this article, after presenting the linguistic-computational conception of the lexicon, the main knowledge representation paradigms are elucidated, with emphasis on the approach to meaning and the formal metalanguage associated with each of them.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Pós-graduação em Estudos Linguísticos - IBILCE

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This dissertation investigates the notion of equivalence with particular reference to lexical cohesion in the translation of political speeches. Lexical cohesion poses a particular challenge to translators of political speeches, and preserving lexical cohesion elements, one of the major components of cohesion, is therefore crucial to translation equivalence. We rely on Halliday’s (1994) classification of lexical cohesion, which comprises repetition, synonymy, antonymy, meronymy and hyponymy. Other traditional models of lexical cohesion are also examined. We include Grammatical Parallelism for its role in creating textual semantic unity, which is what cohesion is all about. The study sheds light on the function of lexical cohesion elements as rhetorical devices. It also deals with the lexical problems that arise when lexical cohesion elements are transferred from the source language (SL) into the target language (TL), problems that most often result from differences between the languages. Three key issues are identified as being fundamental to equivalence and lexical cohesion in the translation of political speeches: sociosemiotic approach, register analysis, rhetoric, and poetic function. The study also investigates the lexical cohesion elements in the translation of political speeches from English into Arabic, Italian and French in relation to ideology, and its control, through bias and distortion. The findings are discussed, implications examined and topics for further research suggested.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Extracting opinions and emotions from text is becoming increasingly important, especially since the advent of micro-blogging and social networking. Opinion mining is particularly popular and now draws on many public services, datasets and lexical resources. Unfortunately, there are few available lexical and semantic resources for emotion recognition that could foster the development of new emotion-aware services and applications. The diversity of theories of emotion and the absence of a common vocabulary are two of the main barriers to the development of such resources. This situation motivated the creation of Onyx, a semantic vocabulary of emotions with a focus on lexical resources and emotion analysis services. It follows a linguistic Linked Data approach, is aligned with the Provenance Ontology, and has been integrated with the Lexicon Model for Ontologies (lemon), a popular RDF model for representing lexical entries. This approach also offers a new and interesting way to work with different theories of emotion. As part of this work, Onyx has been aligned with EmotionML and WordNet-Affect.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so the addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither provides any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established.
The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
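As a rough illustration of the ideas named in this abstract (rules expressed as character substitutions rather than plain segmentation, a lexical validity requirement, rule precedence, and a stoplist for exceptions), here is a minimal Python sketch. The lexicon, the rules, the stoplist entry and the function name `derive_base` are all hypothetical and chosen only for illustration; they are not taken from the thesis or from WordNet.

```python
# Toy lexicon: a candidate base form is accepted only if it is listed here
# (a stand-in for the "lexical validity requirement"). Hypothetical data.
LEXICON = {
    "happy", "happiness",
    "decide", "decision",
    "create", "creation",
    "teach", "teacher",
    "corn", "corner",
}

# Each rule maps a derived-word ending to a base-word ending.
# List order encodes precedence: earlier rules are tried first.
RULES = [
    ("iness", "y"),  # happiness -> happy
    ("sion", "de"),  # decision  -> decide
    ("ion", "e"),    # creation  -> create
    ("er", ""),      # teacher   -> teach
]

# Pairs that satisfy a rule and the lexicon check but are not real
# derivations; an assumed example of a stoplist entry.
STOPLIST = {("corner", "corn")}


def derive_base(word, lexicon=LEXICON, rules=RULES, stoplist=STOPLIST):
    """Return the base form linked to `word`, or None if no rule applies."""
    for suffix, replacement in rules:
        if not word.endswith(suffix):
            continue
        candidate = word[: -len(suffix)] + replacement
        # Reject candidates that are not valid words or are known exceptions.
        if candidate in lexicon and (word, candidate) not in stoplist:
            return candidate
    return None


if __name__ == "__main__":
    for w in ("happiness", "decision", "creation", "teacher", "corner"):
        print(w, "->", derive_base(w))
```

In this sketch "decision" also matches the later `ion` rule, but the `sion` rule takes precedence and yields a valid lexicon entry, while "corner" is blocked by the stoplist even though "corn" is a valid word.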

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The concept of plagiarism is not uncommonly associated with the concept of intellectual property, both for historical and legal reasons: the approach to the ownership of ‘moral’, nonmaterial goods has evolved to the right to individual property, and consequently a need arose to establish a legal framework to cope with the infringement of those rights. The solution to plagiarism therefore falls most often under two categories: ethical and legal. On the ethical side, education and intercultural studies have addressed plagiarism critically, not only as a means to improve academic ethics policies (PlagiarismAdvice.org, 2008), but mainly to demonstrate that, if anything, the concept of plagiarism is far from being universal (Howard & Robillard, 2008). Albeit in different ways, Howard (1995) and Scollon (1994, 1995) argued, and Angèlil-Carter (2000) and Pecorari (2008) later emphasised, that the concept of plagiarism cannot be studied on the assumption that one definition is clearly understandable by everyone. Scollon (1994, 1995), for example, claimed that authorship attribution is particularly a problem in non-native writing in English, and so did Pecorari (2008) in her comprehensive analysis of academic plagiarism. If among higher education students plagiarism is often a problem of literacy, with prior, conflicting social discourses that may interfere with academic discourse, as Angèlil-Carter (2000) demonstrates, we then have to aver that a distinction should be made between intentional and inadvertent plagiarism: plagiarism should be prosecuted when intentional, but if it is part of the learning process and results from the plagiarist’s unfamiliarity with the text or topic it should be considered ‘positive plagiarism’ (Howard, 1995: 796) and hence not an offense. Determining the intention behind the instances of plagiarism therefore determines the nature of the disciplinary action adopted.
Unfortunately, in order to demonstrate the intention to deceive and charge students with accusations of plagiarism, teachers necessarily have to position themselves as ‘plagiarism police’, although it has been argued otherwise (Robillard, 2008). Practice demonstrates that in their daily activities teachers find themselves required to command investigative skills and tools that they most often lack. We thus claim that the ‘intention to deceive’ cannot be dissociated from plagiarism as a legal issue, even if Garner (2009) asserts that generally plagiarism is immoral but not illegal, and Goldstein (2003) draws the same distinction. However, these claims, and the claim that only cases of copyright infringement tend to go to court, have recently been challenged, mainly by forensic linguists, who have been actively involved in cases of plagiarism. Turell (2008), for instance, demonstrated that plagiarism often carries the connotation of an illegal appropriation of ideas. Previously, she (Turell, 2004) had demonstrated, by comparing four translations of Shakespeare’s Julius Caesar into Spanish, that linguistic evidence is able to demonstrate instances of plagiarism. This challenge is also reinforced by practice in international organisations, such as the IEEE, for which plagiarism potentially has ‘severe ethical and legal consequences’ (IEEE, 2006: 57). What the plagiarism definitions used by publishers and organisations have in common, and what academia usually lacks, is their focus on the legal nature of plagiarism. We speculate that this is due to the relation they intentionally establish with copyright laws, whereas in education the focus tends to shift from the legal to the ethical aspects. However, the number of plagiarism cases taken to court is very small, and jurisprudence is still being developed on the topic.
In countries within the Civil Law tradition, Turell (2008) claims, (forensic) linguists are seldom called upon as expert witnesses in cases of plagiarism, either because plagiarists are rarely taken to court or because there is little tradition of accepting linguistic evidence. In spite of the investigative and evidential potential of forensic linguistics to demonstrate the plagiarist’s intention or otherwise, this potential is restricted by the ability to identify a text as being suspected of plagiarism. In an era with such massive textual production, ‘policing’ plagiarism thus becomes an extraordinarily difficult task without the assistance of plagiarism detection systems. Although plagiarism detection has attracted the attention of computer engineers and software developers for years, a lot of research is still needed. Given the investigative nature of academic plagiarism, plagiarism detection has of necessity to consider not only concepts of education and computational linguistics, but also forensic linguistics, especially if it is intended to counter claims of being a ‘simplistic response’ (Robillard & Howard, 2008). In this paper, we use a corpus of essays written by university students who were accused of plagiarism to demonstrate that a forensic linguistic analysis of improper paraphrasing in suspect texts has the potential to identify and provide evidence of intention. A linguistic analysis of the corpus texts shows that the plagiarist acts on the paradigmatic axis to replace relevant lexical items with a related word from the same semantic field, i.e. a synonym, a subordinate, a superordinate, etc. In other words, relevant lexical items were replaced with related, but not identical, ones. Additionally, the analysis demonstrates that the word order is often changed intentionally to disguise the borrowing. On the other hand, the linguistic analysis of linking and explanatory verbs (i.e. referencing verbs) and prepositions shows that these have the potential to discriminate between instances of ‘patchwriting’ and instances of plagiarism. This research demonstrates that the referencing verbs are borrowed from the original in an attempt to construct the new text cohesively when the plagiarism is inadvertent, and that the plagiarist has made an effort to prevent the reader from identifying the text as plagiarism when it is intentional. In some of these cases, the referencing elements prove able to identify direct quotations and thus ‘betray’ and denounce plagiarism. Finally, we demonstrate that a forensic linguistic analysis of these verbs is critical to allow detection software to identify them as proper paraphrasing and not, mistakenly and simplistically, as plagiarism.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The paper presents an approach to extracting facts from document texts. The approach relies on knowledge about the subject domain, a specialized dictionary, and fact schemes that describe fact structures, taking into account both the semantic and the syntactic compatibility of fact elements. The extracted facts combine into one structure the dictionary lexical objects found in the text and match them against concepts of the subject domain ontology.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 62H30

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this article I argue that the study of the linguistic aspects of epistemology has become unhelpfully focused on the corpus-based study of hedging and that a corpus-driven approach can help to improve upon this. By focusing on a corpus of texts from one discourse community (that of genetics) and identifying frequent tri-lexical clusters containing highly frequent lexical items identified as keywords, I undertake an inductive analysis identifying patterns of epistemic significance. Several of these patterns are shown to be hedging devices, and the whole-corpus frequencies of the most salient of these, ‘candidate’ and ‘putative’, are then compared to the whole-corpus frequencies for comparable wordforms and clusters of epistemic significance. Finally, I interviewed a ‘friendly geneticist’ in order to check my interpretation of some of the terms used and to get an expert interpretation of the overall findings. In summary, I argue that the highly unexpected patterns of hedging found in genetics demonstrate the value of adopting a corpus-driven approach and constitute an advance in our current understanding of how to approach the relationship between language and epistemology.
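The cluster-identification step described in this abstract can be sketched as counting three-word sequences that contain at least one keyword. The following Python sketch is only an assumption about how such a count might be set up; the function name `keyword_clusters`, the tokenisation rule and the sample sentence are hypothetical and are not the author's procedure or data.

```python
import re
from collections import Counter


def keyword_clusters(text, keywords, n=3):
    """Count n-word clusters that contain at least one keyword.

    Tokenisation is deliberately crude (lowercased alphabetic runs), and
    clusters are allowed to span sentence boundaries; a real corpus tool
    would probably handle both more carefully.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        cluster = tuple(tokens[i : i + n])
        if any(token in keywords for token in cluster):
            counts[cluster] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample text, not drawn from the genetics corpus.
    sample = "A candidate gene was found. The candidate gene was putative."
    clusters = keyword_clusters(sample, {"candidate", "putative"})
    for cluster, freq in clusters.most_common(3):
        print(" ".join(cluster), freq)
```

Sorting the resulting counts by frequency gives a list of candidate clusters that can then be inspected by hand for epistemic significance, which is the inductive step the abstract describes.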