965 results for specialized corpora
Abstract:
Identifying translations from comparable corpora is a well-known problem with several applications, e.g. dictionary creation in resource-scarce languages. The scarcity of high-quality corpora, especially in Indian languages, makes this problem hard: state-of-the-art techniques achieve a mean reciprocal rank (MRR) of 0.66 for English-Italian, but a mere 0.187 for Telugu-Kannada. Comparable corpora do exist between many Indian languages and other "auxiliary" languages. We observe that translations have many topically related words in common in the auxiliary language. To model this, we define the notion of a translingual theme, a set of topically related words from auxiliary-language corpora, and present a probabilistic framework for translation induction. Extensive experiments on 35 comparable corpora using English and French as auxiliary languages show that this approach can yield dramatic improvements in performance (e.g. MRR improves by 124% to 0.419 for Telugu-Kannada). A user study on WikiTSu, a system for cross-lingual Wikipedia title suggestion that uses our approach, shows a 20% improvement in the quality of the suggested titles.
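The MRR figures above can be made concrete with a short computation. This is a minimal sketch, assuming each query word comes with a best-first list of candidate translations and a set of gold translations (a hypothetical input format, not the authors' data); a query whose gold translation never appears contributes 0, which is a common convention. It also checks the reported relative gain, (0.419 - 0.187) / 0.187 ≈ 1.24, i.e. the 124% improvement.

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """Mean of 1/rank of the first correct candidate per query.

    ranked_candidates: one best-first candidate list per query word.
    gold: one set of correct translations per query word (same order).
    Queries with no correct candidate contribute 0.
    """
    if len(ranked_candidates) != len(gold):
        raise ValueError("need one gold set per query")
    total = 0.0
    for candidates, answers in zip(ranked_candidates, gold):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in answers:
                total += 1.0 / rank
                break
    return total / len(ranked_candidates)


def relative_improvement(baseline, new):
    """Relative change; 1.24 means +124%."""
    return (new - baseline) / baseline


# Hypothetical toy queries: gold found at rank 1, at rank 2, and not at all.
toy_ranked = [["w1", "w2"], ["w3", "w4"], ["w5", "w6"]]
toy_gold = [{"w1"}, {"w4"}, {"w9"}]
print(mean_reciprocal_rank(toy_ranked, toy_gold))  # (1 + 1/2 + 0) / 3 = 0.5
print(round(100 * relative_improvement(0.187, 0.419)))  # 124
```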
Abstract:
Presentation for the 5th International Conference on Corpus Linguistics (CILC 2013), V Congreso Internacional de Lingüística de Corpus.
Abstract:
The main objective of this research is to investigate how Brazilian learners of English use adverbs ending in -ly in written English, and to compare this with their use by native speakers of English. To this end, the study draws theoretical and methodological support from Corpus Linguistics and is grounded in the field known as learner corpus research, which deals with collecting and storing linguistic data from learners of a foreign language in order to build a corpus that can be used for descriptive and pedagogical purposes. This field aims to identify the respects in which learners differ from or resemble native speakers. The corpora used in the research are the study corpus (Br-ICLE), containing English written by Brazilians and compiled according to the ICLE (International Corpus of Learner English) project, and two reference corpora (LOCNESS and BAWE), containing English written by native speakers of English. The results indicate that, compared with native speakers, Brazilian students overuse the categories of adverbs expressing truthfulness, reality and intensity, and also use these adverbs differently. The results suggest that, in addition to differences in frequency (whether through overuse or underuse of the adverbs), the learners produced erroneous combinations, either in terms of collocates or in terms of semantic prosody. Finally, the research reveals that the learners' preference for adverbs expressing truthfulness, reality and intensity creates the impression of a highly assertive discourse.
The study concludes that the differences found may be linked to factors such as the size of the corpora, the influence of the learners' mother tongue, the internalization of the linguistic elements needed to produce a text in a foreign language, the learners' lack of fluency, and the classroom context at the universities.
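Overuse and underuse claims in learner corpus research rest on comparing normalized frequencies across corpora of different sizes, often with a significance statistic. The sketch below uses the log-likelihood (G2) keyness measure as one common choice; the abstract does not say which test the study used, so this is an assumed technique, and the counts are hypothetical. The regular expression is deliberately naive and is labeled as such: it also matches non-adverbs like "family".

```python
import math
import re

# Naive pattern for -ly words: it also matches non-adverbs such as
# "family" or "only", so a real study would need POS tagging or manual checks.
LY_WORD = re.compile(r"\b[a-z]+ly\b")


def count_ly_words(text):
    """Count tokens ending in -ly (case-insensitive)."""
    return len(LY_WORD.findall(text.lower()))


def per_10k(freq, corpus_size):
    """Frequency normalized per 10,000 tokens, so corpora of different sizes compare."""
    return freq * 10_000 / corpus_size


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for an item seen a times in corpus 1 (c tokens)
    and b times in corpus 2 (d tokens); larger values mean a bigger difference."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


# Hypothetical counts: 20 hits in a 1,000-token learner sample vs 10 in a
# 1,000-token native sample.
print(per_10k(20, 1000), per_10k(10, 1000))  # 200.0 100.0
print(round(log_likelihood(20, 10, 1000, 1000), 2))  # 3.4
```

Here the learner sample shows twice the normalized frequency, yet G2 ≈ 3.4 falls below 3.84, the chi-square critical value for p < 0.05 at one degree of freedom, so with samples this small the apparent overuse would not be significant. This is one reason corpus size appears among the factors listed above.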
Abstract:
This study investigated whether the conception found in Replicator Theory, expressed through the concept of the meme (DAWKINS, 1979), could be a suitable model for explaining the propagation of memes in the substrate of social media. Within local studies, Recuero (2006) proposed a transduction of this model based on Dawkins's (1979) conceptions. Reflecting on Recuero's (2006) epistemological position, and drawing on Dennett (1995), Blackmore (2002) and Tyler (2011b; 2013b), the present work carried out Conceptual and Compositional Analyses of this transduction. Building on the concept of the memeplex (BLACKMORE, 2002), this linguistically based research (HALLIDAY, 1987) understands memes in the substrate of digital/social media as linguistic-media practices of production and distribution, spread through various units of propagation and through the relations that internet users create in this transmission process. Investigating these relations through Relational Analysis, we examine two units of propagation: memetic expressions (Que deselegante and #Tenso) and memetic images (originating from the memetic phenomenon Nana em desastres). The study comprises two corpora of memetic expressions (5,275 posts originating from or redirected to Twitter.com, totaling 83,655 words/tokens) and a bilingual (Portuguese/English) corpus of memetic images (a total of 134 images from Tumblr.com and Facebook.com). The corpora of memetic expressions were analyzed using Corpus Linguistics methodology (BERBER-SARDINHA, 2004; SHEPHERD, 2009; SOUZA JÚNIOR, 2012, 2013b, 2013c). The multimodal corpus of memetic images was analyzed using a methodology we call Propagatory Analysis.
We aimed to verify whether these units of propagation, and the linguistic-media practices they transmit, would evolve solely because of memetic-media factors, as Recuero (2006) had indicated, and with an internalist propagation pattern (DAWKINS, 1979; 1982). The data analysis revealed that, at the level of purpose, the local phenomena investigated did not evolve through an internalist (or homogeneous) propagation pattern; instead, these patterns proved to be externalist (or heterogeneous) in nature. Furthermore, constitutive memetic principles of evolution such as fecundity and longevity (DAWKINS 1979; 1982) and design (DENNETT, 1995), together with the media principle of evolution by reach (RECUERO, 2006), were found to remain present, with a high degree of influence, in the externalist propagations. By contrast, the memetic principle of fidelity (DAWKINS, 1979; 1982) had the least influence on these propagation patterns. With fidelity neutralized, and driven by the principle of design, the systematizing linguistic principles revealed by this study stood out in this evolutionary process, namely: the principle of functionality (memes evolve because they can indicate different purposes) and the principle of linguistic reach (memes can be directed at animate or inanimate items, and at internet users in their native or a foreign language).
Abstract:
The molecular phylogeny of nine species and subspecies belonging to three genera of the specialized schizothoracine fishes is investigated based on the complete nucleotide sequence of the mitochondrial cytochrome b gene. In addition, the relationships between the main cladogenetic events of the specialized schizothoracine fishes and the stepwise uplift of the Qinghai-Tibetan Plateau are examined using a molecular clock calibrated by geological isolation events between the upper reaches of the Yellow River and Qinghai Lake. The results indicate that the specialized schizothoracine fishes are not monophyletic. The five species and subspecies of Ptychobarbus form a monophyletic group, but the three species of Gymnodiptychus do not. Gd. integrigymnatus is the sister taxon of the highly specialized schizothoracine fishes, while Gd. pachycheilus is closely related to Gd. dybowskii, and together they form the sister group of Diptychus maculatus. The specialized schizothoracine fishes may have originated during the Miocene (about 10 Ma BP), after which the three genera diverged during the late Miocene (about 8 Ma BP). Their main specialization occurred during the late Pliocene and Pleistocene (3.54-0.42 Ma BP). The main cladogenetic events of the specialized schizothoracine fishes correlate mostly with the geological tectonic events and intense climate shifts that occurred at 8, 3.6, 2.5 and 1.7 Ma BP in the late Cenozoic. The molecular clock data support neither the hypothesis that the Qinghai-Tibetan Plateau had uplifted to near-present or even higher elevations during the Oligocene or Miocene, nor the view that the plateau had reached an altitude of only 2000 m during the late Pliocene (about 2.6 Ma BP).
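The calibration logic of a molecular clock can be sketched in a few lines. This assumes a strict clock with uncorrected distances, which is a simplification; the abstract does not report the calibration distance, its age, or the substitution model used, so all numbers below are hypothetical. The idea: a dated isolation event fixes the rate, and that rate then converts other pairwise distances into divergence times.

```python
def clock_rate(calibration_distance, calibration_age_ma):
    """Per-lineage rate (substitutions/site/Ma) under a strict clock.

    Two lineages that split T Ma ago each accumulate r * T changes,
    so their pairwise distance is d = 2 * r * T.
    """
    return calibration_distance / (2.0 * calibration_age_ma)


def divergence_time_ma(distance, rate):
    """Invert d = 2 * r * T to date a split from its pairwise distance."""
    return distance / (2.0 * rate)


# Hypothetical calibration: populations isolated 1.0 Ma ago differ by 2%.
rate = clock_rate(0.02, 1.0)  # 0.01 substitutions/site/Ma per lineage
print(divergence_time_ma(0.08, rate))  # 4.0
```

Because time scales linearly with distance under this model, any error in the calibration age propagates proportionally to every estimated date, which is why the choice of calibration event matters for correlating splits with uplift phases.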
Abstract:
We recovered the phylogenetic relationships among 23 species and subspecies of the highly specialized grade of schizothoracine fishes, distributed across 36 geographical sites on the Tibetan Plateau and in its surrounding regions, by analyzing cytochrome b gene sequences. We further estimated the possible divergence times among lineages based on a historical geological isolation event on the Tibetan Plateau. The molecular data revealed that the highly specialized grade of schizothoracine fishes is not a monophyletic group, and the same holds for the genera Gymnocypris and Schizopygopsis. Our results indicate that the molecular phylogenetic relationships clearly reflect the species' geographical and historical associations with drainages: species from the same or adjacent drainages clustered together and were closely related. The divergence times of the different lineages were consistent with the rapid uplift phases of the Tibetan Plateau in the late Cenozoic, suggesting that the origin and evolution of schizothoracine fishes were strongly influenced by environmental changes resulting from the uplift of the Tibetan Plateau.
Abstract:
Phylogeny of the specialized schizothoracine fishes (Teleostei: Cypriniformes: Cyprinidae). Zoological Studies 40(2): 147-157. To elucidate the phylogenetic relationships within the specialized schizothoracine fishes, we analyzed 41 variable osteological and external characters across this group, three species of Schizothorax, and one fossil species. With the three species of Schizothorax designated as the outgroup and all 41 characters treated as unordered and equally weighted, the data matrix yielded a single most-parsimonious tree with a length of 71 steps, a consistency index of 0.6761, and a retention index of 0.7416. A bootstrap test was also conducted to assess the reliability of the results. The matrix was further analyzed under alternative conditions: with all characters ordered and with the fossil species added as an outgroup. The phylogenetic analyses presented here support the following hypotheses: 1) all species of the specialized schizothoracine fishes form a monophyletic group; 2) the monophyly of the genus Ptychobarbus is supported neither by the bootstrap test nor when the characters are ordered; 3) the genus Gymnodiptychus forms a monophyletic group; and 4) all species of Ptychobarbus and Gymnodiptychus form a monophyletic group, with Diptychus as its sister group.
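The tree statistics above can be unpacked with a short sketch. Under equal weighting, tree length is the sum over characters of the minimum number of state changes on the tree; for unordered characters the textbook way to count them is Fitch's small-parsimony algorithm, shown here on a hypothetical four-taxon tree (the abstract does not name the software used). The ensemble consistency index is CI = m / s and the retention index is RI = (g - s) / (g - m), where s is the tree length, m the minimum possible number of steps, and g the maximum possible. Back-calculating from the reported values (our inference, not stated in the source): CI = 0.6761 with s = 71 implies m = 48, and RI = 0.7416 then implies g = 137.

```python
def fitch_steps(tree, states):
    """Fitch (1971) small parsimony for one unordered character.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping each leaf name to its character state.
    Returns (candidate state set at this node, number of steps below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No shared state: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(min_steps, tree_length):
    """CI = m / s (1.0 means no homoplasy)."""
    return min_steps / tree_length


def retention_index(min_steps, tree_length, max_steps):
    """RI = (g - s) / (g - m)."""
    return (max_steps - tree_length) / (max_steps - min_steps)


toy_tree = (("A", "B"), ("C", "D"))
print(fitch_steps(toy_tree, {"A": 0, "B": 0, "C": 1, "D": 1})[1])  # 1
print(fitch_steps(toy_tree, {"A": 0, "B": 1, "C": 0, "D": 1})[1])  # 2
print(round(consistency_index(48, 71), 4))  # 0.6761
print(round(retention_index(48, 71, 137), 4))  # 0.7416
```

Summing `fitch_steps` over all characters for a candidate tree gives its length; a parsimony search keeps the tree that minimizes this sum.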
Abstract:
The complete 1140 bp mitochondrial cytochrome b sequences were obtained from 39 individuals representing five species from all four genera of the highly specialized schizothoracine fishes distributed on the Qinghai-Tibet Plateau. Sequence variation in the cytochrome b gene was surveyed among these 39 individuals as well as three primitive schizothoracines and one outgroup. Phylogenetic analysis suggested that the grouping based on the 1140 bp cytochrome b sequence clearly differs from previous assignments, and that the highly specialized schizothoracine fishes (Schizopygopsis pylzovi, Gymnocypris przewalskii, G. eckloni, Chuanchia labiosa, and Platypharodon extremus) form a monophyletic group that is sister to the clade formed by the primitive schizothoracine fishes (Schizothorax prenanti, S. pseudaksaiensis, and S. argentatus). The haplotypes of Schizopygopsis pylzovi and G. przewalskii were paraphyletic based on the cytochrome b data, most likely reflecting incomplete sorting of mitochondrial DNA lineages. The diploid chromosome numbers of the Schizothoracinae were considered in the phylogenetic analysis and provided a clear pattern of relationships. Molecular dating suggested that the highly specialized schizothoracine fishes diverged from the late Miocene/Pliocene to the Pleistocene (4.5 × 10⁴ to 4.05 × 10⁶ years BP). The relationship between the cladogenesis of the highly specialized schizothoracine fishes and the geological events of the Qinghai-Tibet Plateau is discussed.
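Surveying sequence variation in an alignment usually starts with two simple quantities: which sites vary, and the uncorrected proportion of differing sites (p-distance) between pairs of sequences. The sketch below assumes gap-free, already aligned sequences and ignores ambiguity codes; the 12-bp alignment is hypothetical, and the abstract does not state which distance or model the authors used.

```python
def variable_sites(alignment):
    """0-based positions where the aligned sequences do not all agree."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i] for seq in alignment}) > 1]


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return diffs / len(seq1)


# Hypothetical 12-bp toy alignment (a real cytochrome b alignment here is 1140 bp).
toy = ["ATGACCAACATT", "ATGACTAACATC", "ATGGCCAACATT"]
print(variable_sites(toy))  # [3, 5, 11]
print(round(p_distance(toy[0], toy[1]), 4))  # 0.1667
```

Raw p-distances underestimate divergence once multiple substitutions hit the same site, so dating studies normally apply a substitution-model correction before feeding distances into a clock.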
Abstract:
A probabilistic, nonlinear supervised learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA employs a set of forward mapping functions that are estimated automatically from training data. Each specialized function maps a certain domain of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). The SMA can model ambiguous, one-to-many mappings that may yield multiple valid output hypotheses. Once learned, the mapping functions generate a set of output hypotheses for a given input via a statistical inference procedure. This inference procedure incorporates an inverse mapping, or feedback function, to evaluate the likelihood of each hypothesis. Possible feedback functions include computer graphics rendering routines that can generate images for given hypotheses. The SMA uses a variant of the Expectation-Maximization (EM) algorithm to learn the specialized domains and the mapping functions simultaneously, along with approximate strategies for inference. The framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human's body or hands from silhouettes in a single image. The accuracy and stability of the SMA are also tested on synthetic images of human bodies and hands, where the ground truth is known.
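To make the architecture concrete, here is a toy one-dimensional sketch, not the authors' formulation. It assumes linear specialized functions learned with standard EM for a mixture of linear regressions, and it uses `render(y) = |y|` as a hypothetical stand-in for the graphics-based feedback function. Because this toy "rendering" discards the sign of the output, each input supports two valid hypotheses, which is the one-to-many case the SMA is designed to handle.

```python
import math
import random


def make_toy_data(n=200, seed=0):
    """Hypothetical task: the 'feature' x = |y| + noise discards the sign of
    the 'pose' y, so the inverse map x -> y is one-to-many."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.uniform(-1.0, 1.0)
        data.append((abs(y) + rng.gauss(0.0, 0.01), y))
    return data


def fit_specialized_mappings(data, init_slopes=(0.5, -0.5), iters=50):
    """EM for a mixture of linear forward maps y = w_k * x + b_k.
    Returns one [w, b, sigma, prior] entry per specialized function."""
    experts = [[w, 0.0, 0.5, 1.0 / len(init_slopes)] for w in init_slopes]
    for _ in range(iters):
        # E-step: soft assignment of each (x, y) pair to each function,
        # computed in log space so tiny likelihoods do not underflow.
        resp = []
        for x, y in data:
            logs = [math.log(max(p, 1e-12)) - math.log(s)
                    - (y - (w * x + b)) ** 2 / (2.0 * s * s)
                    for w, b, s, p in experts]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([v / total for v in weights])
        # M-step: weighted least squares, noise level and prior per function.
        for k, expert in enumerate(experts):
            r = [row[k] for row in resp]
            s0 = sum(r)
            sx = sum(rk * x for rk, (x, _) in zip(r, data))
            sy = sum(rk * y for rk, (_, y) in zip(r, data))
            sxx = sum(rk * x * x for rk, (x, _) in zip(r, data))
            sxy = sum(rk * x * y for rk, (x, y) in zip(r, data))
            w = (s0 * sxy - sx * sy) / (s0 * sxx - sx * sx)
            b = (sy - w * sx) / s0
            var = sum(rk * (y - (w * x + b)) ** 2 for rk, (x, y) in zip(r, data)) / s0
            expert[:] = [w, b, max(math.sqrt(var), 1e-3), s0 / len(data)]
    return experts


def render(y):
    """Hypothetical feedback (inverse) function: maps a pose back to feature space."""
    return abs(y)


def infer(x, experts, sigma_fb=0.05):
    """Every specialized function proposes a hypothesis; the feedback function
    scores it by how well its rendering reproduces the observed input."""
    hyps = []
    for w, b, _, _ in experts:
        h = w * x + b
        score = math.exp(-((x - render(h)) ** 2) / (2.0 * sigma_fb ** 2))
        hyps.append((h, score))
    return sorted(hyps, key=lambda hs: hs[1], reverse=True)


data = make_toy_data()
experts = fit_specialized_mappings(data)
print([round(e[0], 2) for e in experts])  # slopes close to +1 and -1
print(infer(0.6, experts))  # two hypotheses, near +0.6 and -0.6
```

In this toy, both hypotheses score close to 1, so the feedback function cannot break the tie; with a richer renderer, such as one producing silhouettes, hypotheses whose rendering disagrees with the observed input would receive low likelihood and drop down the ranking.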