Digital Library

980 results for Linguistic analysis (Linguistics)

A new hybrid summarizer based on Vector Space Model, statistical physics and linguistics

Abstract:

In this article we present a hybrid approach for the automatic summarization of Spanish medical texts. Many automatic summarization systems rely on either statistics or linguistics, but only a few combine both techniques. Our premise is that a good summary requires exploiting the linguistic aspects of texts while also benefiting from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We first compared these systems and then integrated them into a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
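To make the statistical/linguistic combination concrete, here is a minimal sketch of a hybrid extractive scorer. It assumes a deliberately simple design that is not the Cortex, Enertex or Disicosum algorithm: each sentence's statistical score is the cosine similarity between its term-frequency vector and that of the whole text (a basic Vector Space Model measure), mixed linearly with a linguistic score supplied from outside. The names `hybrid_summary` and `linguistic_scores` and the mixing weight `alpha` are hypothetical.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of lowercased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_summary(sentences, linguistic_scores, k=2, alpha=0.5):
    """Pick the k best sentences, returned in their original order.

    score = alpha * (vector-space similarity to the whole text)
            + (1 - alpha) * (hypothetical external linguistic score in [0, 1])
    """
    doc_vec = tf_vector(" ".join(sentences))
    scores = [
        alpha * cosine(tf_vector(s), doc_vec) + (1 - alpha) * ling
        for s, ling in zip(sentences, linguistic_scores)
    ]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

With `alpha=1.0` the ranking is purely statistical; with `alpha=0.0` it follows the linguistic scores alone, so the weight controls how much each source of evidence contributes to the summary.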

Discourse, knowledge, power and politics: towards critical epistemic discourse analysis

Abstract:

Although both are fundamental terms in the humanities and social sciences, discourse and knowledge have seldom been explicitly related, and even less so in critical discourse studies. After a brief summary of what we know about these relationships in linguistics, psychology, epistemology and the social sciences, with special emphasis on the role of knowledge in the formation of mental models as a basis for discourse, I examine in more detail how a critical study of discourse and knowledge may be articulated in critical discourse studies. Several areas of critical epistemic discourse analysis are identified and then applied in a study of Tony Blair’s Iraq speech of March 18, 2003, in which he sought to legitimize his decision to go to war in Iraq alongside George Bush. The analysis shows the various ways in which knowledge is managed and manipulated at all levels of discourse in this speech.

Distributional equivalence and subcompositional coherence in the analysis of contingency tables, ratio-scale measurements and compositional data

Abstract:

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
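Weighted log-ratio analysis can be sketched in a few steps: take logarithms of the strictly positive table, double-centre them using row and column masses as weights, and decompose the weighted matrix. The sketch below assumes the common formulation in which the masses r and c are the row and column margins divided by the grand total and the matrix decomposed is S = D_r^(1/2) Y D_c^(1/2). To stay within the standard library it extracts only the first dimension by power iteration instead of a full singular value decomposition; the function names are illustrative.

```python
import math


def weighted_lra(table):
    """Return (S, r, c, Y) for weighted log-ratio analysis of a positive table.

    r and c are row and column masses (margins / grand total), Y is the
    weighted double-centred log matrix and S = D_r^(1/2) Y D_c^(1/2).
    """
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]
    c = [sum(col) / total for col in zip(*table)]
    logs = [[math.log(x) for x in row] for row in table]
    # Row centring: remove each row's c-weighted mean log value.
    row_means = [sum(cj * v for cj, v in zip(c, row)) for row in logs]
    centred = [[v - m for v in row] for row, m in zip(logs, row_means)]
    # Column centring: remove each column's r-weighted mean.
    col_means = [sum(ri * row[j] for ri, row in zip(r, centred)) for j in range(len(c))]
    y = [[v - m for v, m in zip(row, col_means)] for row in centred]
    s = [[math.sqrt(ri * cj) * v for v, cj in zip(row, c)] for ri, row in zip(r, y)]
    return s, r, c, y


def first_dimension(s, iterations=1000):
    """Leading singular value of S and its right singular vector (power iteration on S^T S)."""
    n_rows, n_cols = len(s), len(s[0])
    # Start from the largest row of S: it lies in S's row space, so it is not
    # annihilated by S unless S is entirely zero.
    start = max(s, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:
        return 0.0, [0.0] * n_cols
    v = [x / norm for x in start]
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in s]                     # u = S v
        w = [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]   # w = S^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        sigma = math.sqrt(norm)  # ||S^T S v|| tends to sigma^2 for unit v
        v = [x / norm for x in w]
    return sigma, v
```

The share of inertia on the first axis is `sigma ** 2` divided by the sum of squares of S. For a table whose logarithms are purely additive (every row proportional to every other), the double-centred matrix is zero, the log-ratio counterpart of a table with no association. In practice a linear-algebra library's SVD would replace the power iteration and return all dimensions at once.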

Cape Verdean Creole of São Vicente: Its Genesis and Structure

Abstract:

Although the Santiago variety of Cape Verdean Creole (CVC) has been the subject of numerous linguistic works, the second major variety of the language, i.e. the São Vicente variety of CVC (CVSV), has hardly been described. Despite this lack of studies, and given the striking differences of CVSV from the Santiago variety (CVST) on all linguistic levels, the implicit explanation for these divergences, echoed for decades in the literature on CVC, has been the presumably decreolized character of CVSV. First, this study provides a comprehensive, fieldwork-based synchronic description of the major morphosyntactic categories of CVSV in order to document the variety. Second, it aims to place the study of CVSV within the broader scope of contact linguistics in order to explain its structure. Based on analyses of historical documents and studies, it reconstructs the sociohistorical scenario of the emergence and development of CVSV in the period 1797-1975. The comparison of the current structures of CVSV and CVST, the examination of linguistic data in historical texts and the analysis of sociohistorical facts make it clear that the contemporary structure of CVSV stems from contact-induced changes that occurred during intensive language and dialect contact on the island of São Vicente in the early days of its settlement in the late 18th century and its subsequent development in the early 19th century, rather than from modern-day pressure from Portuguese. Although this dissertation argues for multiple explanations rather than a single theory, by showing that processes such as language shift among the first Portuguese settlers, L2 acquisition, migration of Barlavento speakers and subsequent dialect leveling, as well as language borrowing at a later stage, were at stake, it demonstrates the usefulness of the partial restructuring model proposed by Holm (2004).

An error analysis of phonetic transcription: results from a pilot study

Corpus studies in applied linguistics

A citation analysis of Catalan literary studies (1974-2003): towards a bibliometrics of humanities studies in minority languages

Abstract:

A citation analysis was carried out on the most important research journals in the field of Catalan literature between 1974 and 2003. The indicators and qualitative parameters obtained show the value of performing citation analysis in cultural and linguistic areas that are poorly covered by the Arts & Humanities Citation Index (A&HCI). Catalan literature shows a pattern similar to that of the humanities in general, but it may still be in a stage of consolidation because too little work has yet been published.

Analyzing the Linguistic Dimension of Globalization in Media Communication: the Case of Insults and Violence in Debates

Paraphrase concept and typology. A linguistically based and computationally oriented approach

Abstract:

In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that no existing characterization of paraphrasing is at once comprehensive, linguistically based and computationally tractable. We then define and delimit the concept on the basis of propositional content, and present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.

The "handwriting brain": a meta-analysis of neuroimaging studies of motor versus orthographic processes.

Abstract:

INTRODUCTION: Handwriting is a modality of language production whose cerebral substrates remain poorly understood, although the existence of writing-specific regions has been postulated. Descriptions of brain-damaged patients with agraphia and, more recently, several neuroimaging studies suggest the involvement of different brain regions. However, results vary with the methodological choices made and do not always discriminate between "writing-specific" processes and motor or linguistic processes shared with other abilities.
METHODS: We used the "Activation Likelihood Estimate" (ALE) meta-analytical method to identify the cerebral network of areas commonly activated during handwriting in 18 neuroimaging studies published in the literature. Included contrasts were also classified according to the control tasks used, either non-specific motor/output control or linguistic/input control. These data were entered into two secondary meta-analyses in order to reveal the functional role of the different areas of this network.
RESULTS: An extensive, mainly left-hemisphere network of 12 cortical and subcortical areas was obtained. Three of these areas were considered primarily writing-specific (left superior frontal sulcus/middle frontal gyrus area, left intraparietal sulcus/superior parietal area, right cerebellum), while the others were related rather to non-specific motor processes (primary motor and sensorimotor cortex, supplementary motor area, thalamus and putamen) or to linguistic processes (ventral premotor cortex, posterior/inferior temporal cortex).
CONCLUSIONS: This meta-analysis provides a description of the cerebral network of handwriting as revealed by various types of neuroimaging experiments and confirms the crucial involvement of the left frontal and superior parietal regions. These findings provide new insights into the cognitive processes involved in handwriting and their cerebral substrates.
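The core of the ALE method is a voxel-wise union of per-study probabilities. The sketch below shows that combination step under simplifying assumptions: foci are (x, y, z) coordinates in millimetres, each focus is modelled by an isotropic Gaussian of fixed width `sigma` scaled to a hypothetical peak probability `peak`, and in this sketch a study's modelled activation is the maximum over its foci. Real ALE implementations derive the kernel width from each study's sample size and test the resulting map against a null distribution; both steps are omitted, and the coordinates used below are invented.

```python
import math


def modelled_activation(point, foci, sigma, peak):
    """Per-study modelled activation (MA) at `point`.

    Each focus contributes an isotropic Gaussian scaled to `peak`; the study's
    MA value is the maximum over its foci, so one study cannot reinforce itself.
    """
    best = 0.0
    for focus in foci:
        d2 = sum((p - q) ** 2 for p, q in zip(point, focus))
        best = max(best, peak * math.exp(-d2 / (2.0 * sigma ** 2)))
    return best


def ale(point, studies, sigma=10.0, peak=0.5):
    """ALE = 1 - prod_i (1 - MA_i): probability that at least one study
    reports activation at `point`, treating studies as independent."""
    prob_none = 1.0
    for foci in studies:
        prob_none *= 1.0 - modelled_activation(point, foci, sigma, peak)
    return 1.0 - prob_none
```

With `studies = [[(-30, 0, 50)], [(-30, 0, 50), (20, -60, -20)], [(40, 40, 0)]]`, the point where two studies' foci coincide scores 0.75, while the point reported by a single study scores about 0.5: convergence across studies, rather than any single report, drives the meta-analytic map.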

Plagiarism meets paraphrasing: insights for the new generation in automatic plagiarism detection

Abstract:

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Consequently, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyse the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. With this aim in mind, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analysed in the light of this annotation. The experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the most frequently used paraphrase mechanisms when plagiarising, and (iii) paraphrase mechanisms tend to shorten the plagiarised text. For the first time, the paraphrase mechanisms behind plagiarism have been analysed, providing critical insights for the improvement of automatic plagiarism detection systems.
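Why lexical substitution makes paraphrase plagiarism harder to catch can be illustrated with word n-gram containment, a common baseline measure of text reuse: the share of a suspicious text's word n-grams that also occur in a candidate source. The sketch below is an illustration only, not one of the systems evaluated in the competition, and the example sentences are invented.

```python
import re


def word_ngrams(text, n=3):
    """Set of lowercased word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also appear in the source."""
    sus = word_ngrams(suspicious, n)
    if not sus:
        return 0.0
    return len(sus & word_ngrams(source, n)) / len(sus)
```

For the source "the committee approved the new budget after a long debate", a verbatim copy has containment 1.0, whereas "the board endorsed the new budget following a lengthy debate" (four lexical substitutions, same meaning) shares only one of its eight trigrams, giving 0.125.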

An Italian to Catalan RBMT system reusing data from existing language pairs

Abstract:

This paper presents an Italian-to-Catalan rule-based machine translation (RBMT) system built automatically by combining the linguistic data of the existing Spanish-Catalan and Spanish-Italian pairs. Lightweight manual post-processing is carried out to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
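The idea of deriving Italian-Catalan data from two Spanish-based pairs can be sketched for bilingual dictionaries: invert the Spanish-Italian dictionary and compose it with Spanish-Catalan through the shared Spanish pivot. This is a minimal sketch assuming plain word-to-translations mappings; real RBMT dictionaries also carry part-of-speech tags and morphological information, which are ignored here, and the toy entries are illustrative.

```python
from collections import defaultdict


def invert(dictionary):
    """Invert a {source: [targets]} dictionary into {target: [sources]}."""
    inverted = defaultdict(list)
    for src, targets in dictionary.items():
        for tgt in targets:
            if src not in inverted[tgt]:
                inverted[tgt].append(src)
    return dict(inverted)


def compose_via_pivot(es_it, es_ca):
    """Derive an Italian->Catalan dictionary through Spanish as the pivot."""
    it_es = invert(es_it)
    it_ca = {}
    for it_word, es_words in it_es.items():
        candidates = []
        for es_word in es_words:
            for ca_word in es_ca.get(es_word, []):
                if ca_word not in candidates:
                    candidates.append(ca_word)
        if candidates:  # Italian words with no Catalan path are dropped
            it_ca[it_word] = candidates
    return it_ca
```

With `es_it = {"casa": ["casa"], "perro": ["cane"], "coger": ["prendere"], "tomar": ["prendere"], "gato": ["gatto"]}` and `es_ca = {"casa": ["casa"], "perro": ["gos"], "coger": ["agafar"], "tomar": ["prendre"]}`, the result maps "prendere" to both "agafar" and "prendre" and loses "gatto" entirely: pivot composition multiplies ambiguity and leaves gaps, the kind of inconsistencies and missing words that manual post-processing would need to address.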

Fuzzy Principal Components Analysis

Abstract:

When there are many characteristics, the most relevant ones need to be extracted from the input data so that the information lost is minimal and classification performed on the projected data set remains relevant with respect to the original data. Various statistical techniques, including principal components analysis (PCA), can be used for this feature extraction. This thesis describes an extension of PCA that allows a finite number of relevant features to be extracted from high-dimensional fuzzy data and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The two proposed methods were compared using postoperative patient data. Experimental results demonstrate that both proposed methods can be applied to complex data. Fuzzy PCA was then used in a classification problem: classification was performed with a similarity classifier whose total similarity measure weights are optimized with a differential evolution algorithm. The thesis presents a comparison of the classification results obtained from the fuzzy PCA data.
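The statement that PCA finds linear combinations of the original variables describing the most significant variation can be illustrated with ordinary (crisp) PCA. The sketch below assumes nothing about the thesis's fuzzy extension: it computes the first principal component of a small data set by power iteration on the sample covariance matrix and returns the unit direction together with the variance along it. The fuzzy PCA variants and the similarity classifier are not reproduced.

```python
import math


def covariance_matrix(data):
    """Sample covariance matrix (divisor n - 1) of equal-length observation rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]


def first_principal_component(data, iterations=200):
    """Return (unit direction of maximal variance, variance along it).

    Power iteration on the covariance matrix; it converges to the leading
    eigenvector when the largest eigenvalue is strictly dominant.
    """
    cov = covariance_matrix(data)
    d = len(cov)
    # Start from the largest covariance row, which lies in the matrix's range.
    start = max(cov, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:  # constant data: every direction has zero variance
        return [1.0] + [0.0] * (d - 1), 0.0
    v = [x / norm for x in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance
```

For points lying on the line y = 2x, the component is (1, 2)/√5 and carries all of the variance; projecting the centred observations onto it reduces each two-dimensional point to a single feature. The sign of the returned direction is arbitrary in general.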

Conceptualization and use of the passé composé and the imparfait in French as a foreign language (L3) in Sri Lanka: the impact of Sinhalese as mother tongue (L1) and English as second language (L2)

Abstract:

Our research is related to the fields of both linguistics and didactics, two almost inseparable research areas. As the title indicates, the thesis examines how two aspectual-temporal grammatical concepts of French (le passé composé and l'imparfait) are conceptualized and used under the influence of previously acquired grammatical knowledge of two other languages, Sinhalese and English. Our research is thus linked to the domains of psycholinguistics, acquisitional linguistics and comparative linguistics. However, within the framework of this study, we consider these two French grammatical concepts and their presumed equivalents in the other two languages as concepts belonging to three languages with specific social status [i.e. first language (L1), second language (L2) and foreign language (L3)], taught, learnt and acquired in a particular teaching/learning context [the teaching/learning of French as a foreign language (FFL) in Sri Lanka]. In that sense, our study is also associated with the fields of sociolinguistics and language teaching, especially foreign language teaching. What may distinguish this study is that it addresses certain linguistic and didactic issues that have received little attention so far, among them: the influence of two languages (the mother tongue, L1, and the second language, L2) on the teaching/learning of a third language (the foreign language, L3); foreign language teaching and learning in an exolingual context (where the target language is not spoken outside the classroom); and the role of language transfer in the conceptualization of grammatical notions.
However, in selecting the FFL teaching/learning context in Sri Lanka as our field of research, we also had further objectives in mind: 1) studying the verb systems of three languages whose combination has never been studied before; 2) studying the aspectual-temporal system of the Sinhalese verb (which is hardly taught explicitly) in the light of the linguistic descriptions of dominant European languages; 3) verifying certain preconceived ideas regarding the proximity and distance between the three chosen languages; and 4) studying the causes of these preconceptions. Our corpus was obtained from a number of FFL classes in Sri Lanka. The observed student groups consisted of bilingual adolescents and adults whose first language (L1) was Sinhalese and whose second language (L2) was English. The observed classes differed in many ways, but in each of them the students had been learning some aspect of the two grammatical concepts, le passé composé and l'imparfait. Having completed our study, we find that a considerable number of our initial hypotheses have been confirmed.
For example, in the exolingual French teaching/learning context in Sri Lanka where we carried out our research, language transfers between the first languages and the target language were recurrent and numerous in the work of the vast majority of the observed learners, and even of their teachers; these transfers were so frequent that they could hardly be ignored during the teaching/learning process. Although learners turned to their first languages to facilitate learning the new language, neither their teachers nor their textbooks, which come from abroad, helped them in this task. Transfers originating from English were far more numerous than those originating from Sinhalese. We also obtained an unexpected result: contrary to a popular belief among many Sinhalese learners of French, the contrastive analysis of the three aspectual-temporal verb systems and the study of our corpus showed that Sinhalese and French share common linguistic features, at least in the use of some of their past tenses. This fact had been overlooked until now, but it could probably be used to improve French teaching/learning in Sri Lanka. Taking all these observations into account, we made some pedagogical recommendations in the concluding part of our thesis with a view to improving foreign language teaching/learning in Sri Lanka and elsewhere.

MPRO-Spanish: development and experiments with a linguistic parser for Spanish texts

Abstract:

This paper describes the main features and presents results of MPRO-Spanish, a parser for the morphological and syntactic analysis of unrestricted Spanish text developed at the IAI. The parser makes direct use of X-phrase structure rules to handle a variety of patterns from derivational morphology and syntactic structure. The morphological and syntactic analyses are carried out by two successive modules: one analyses and disambiguates the source words at the morphological level, while the other consists of a series of programs and a deterministic, procedural and explicit grammar. The article explains the main features of MPRO and summarizes experiments with some of its applications, some of which, such as monolingual and bilingual term extraction, are still being implemented, while others, such as indexing, need further work. The results and applications obtained so far with simple and relatively complex sentences give us grounds to believe in its reliability.
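The two-module architecture described above (morphological analysis and disambiguation first, then a deterministic grammar) can be illustrated with a toy pipeline. Everything in it is hypothetical: the six-word lexicon, the single disambiguation heuristic (prefer a noun reading right after a determiner) and the one phrase rule NP -> DET N ADJ* stand in for MPRO's much richer lexicon and X-phrase structure rules, whose actual formalism is not shown here.

```python
# Hypothetical toy lexicon: word form -> list of (lemma, tag) readings.
LEXICON = {
    "el": [("el", "DET")],
    "la": [("el", "DET")],
    "perro": [("perro", "N")],
    "ve": [("ver", "V")],
    "casa": [("casar", "V"), ("casa", "N")],  # ambiguous: verb 'casar' or noun 'casa'
    "blanca": [("blanco", "ADJ")],
}


def analyse_morphology(tokens):
    """Module 1: look up each token and pick one reading deterministically."""
    analysed = []
    for tok in tokens:
        readings = LEXICON.get(tok.lower(), [(tok.lower(), "UNK")])
        lemma, tag = readings[0]
        # Disambiguation heuristic: right after a determiner, prefer a noun reading.
        if analysed and analysed[-1][2] == "DET":
            for cand_lemma, cand_tag in readings:
                if cand_tag == "N":
                    lemma, tag = cand_lemma, cand_tag
                    break
        analysed.append((tok, lemma, tag))
    return analysed


def chunk_noun_phrases(analysed):
    """Module 2: deterministic left-to-right application of NP -> DET N ADJ*."""
    phrases, i = [], 0
    while i < len(analysed):
        if i + 1 < len(analysed) and analysed[i][2] == "DET" and analysed[i + 1][2] == "N":
            j = i + 2
            while j < len(analysed) and analysed[j][2] == "ADJ":
                j += 1
            phrases.append(("NP", [w for w, _, _ in analysed[i:j]]))
            i = j
        else:
            phrases.append((analysed[i][2], [analysed[i][0]]))
            i += 1
    return phrases
```

For "el perro ve la casa blanca", the first module resolves the ambiguous form "casa" to the noun reading, which is what allows the second module to build the noun phrase "la casa blanca"; keeping disambiguation in a separate stage lets the grammar itself stay deterministic.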
