875 results for Corpus annotation


Relevance: 20.00%

Abstract:

This article reflects on the advantages of using corpora in language teaching and learning. Working with corpora in the classroom brings research practices and teaching/learning practices closer together. The student takes on the role of a researcher who seeks answers from the data available in the corpus. In this way, the student discovers the language through their own observations and becomes an agent of their own learning process. From a certain traditionalist standpoint, the use of computing in lexical analysis appears unprofitable; however, many scholars in the Humanities at large, besides showing a healthy awareness that the Humanities must embrace computing in order to ensure their vitality, maintain that, as far as statistical-lexical analysis is concerned, the use of the computer is an asset. Throughout this article, we seek to reflect on the following questions: What are the benefits of lexical approaches inspired by the exploration of corpora or by concepts from Corpus Linguistics? What is the role of computing in lexical analysis? What new possibilities do concordances offer in the classroom?
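To make the notion of a classroom concordance concrete, the following is a minimal keyword-in-context (KWIC) sketch. It is not taken from the article: the function name `concordance`, the `width` parameter and the sample sentence are assumptions chosen for illustration.

```python
import re

def concordance(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Hypothetical sample text; a real lesson would load a corpus file instead.
sample = (
    "The student explores the corpus. A corpus shows how words behave. "
    "Each corpus query returns real examples."
)
for line in concordance(sample, "corpus", width=25):
    print(line)
```

Aligning every hit on the keyword is what lets a learner scan the left and right contexts for recurring patterns, which is the kind of self-directed observation the abstract describes.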

Relevance: 20.00%

Abstract:

The use of (large) text corpora as an empirical basis for analysing grammatical phenomena occupies a central place in contemporary linguistics. The historical grammar of Spanish is no exception, and since the beginning of this millennium historians of the language have had at their disposal two large diachronic corpora that are widely used around the world: the CORDE of the Real Academia Española and Mark Davies's Corpus del español (2002-). Alongside the large corpora, smaller text samples with characteristics relevant to the research at hand are also used as a basis for empirical analysis.

Relevance: 20.00%

Abstract:

The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed-dispersing fruit eaters, and has great economic relevance for fruit quality. Development of the exocarp involves the regulated activity of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., 'Regina'), a fruit crop species with few public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit at 14 time points between flowering and maturity. Expression levels in each sample were estimated for 34 695 contigs from the number of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome, with 7760 full-length protein coding sequences and over 20 000 other annotated cDNA sequences together with their developmental expression patterns, is expected to accelerate molecular research on this important tree fruit crop.

Relevance: 20.00%

Abstract:

With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image-sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. The majority of these uploaded images are poorly annotated or exist in complete semantic isolation, which makes building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image-sharing websites offer manual annotation tools that allow users to “tag” their photos. However, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than four tags. Consequently, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), which attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often perform too poorly for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods, which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-occur in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy: does NY refer to New York or New Year?
This thesis proposes exploiting an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. number of faces present) and contextual (e.g. day of the week taken) signals for tag recommendation, achieving up to a 75% improvement in precision@5 over a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as an alternative to (slower-moving) Flickr on which to build recommendation models, showing that similar accuracy can be achieved on these faster-moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists, before discussing at length various problems with existing evaluation collections for automatic image annotation and photo tag recommendation. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, before (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we achieved over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we can predict whether an image will become “popular” (or not) with 74% accuracy, using an SVM classifier.
Finally, in chapter 9 we employ blur detection and perceptual-hash clustering to remove noisy images from lifelogs, before combining visual and geo-temporal signals to capture a user’s “key moments” within their day. We believe that the results of this thesis represent an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
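The tag co-occurrence idea behind photo tag recommendation, and the precision@k measure used in this abstract, can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis's model: the function names, the scoring rule (summed raw co-occurrence counts) and the four example photos are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(photos):
    """For each tag, count how often every other tag appears on the same photo."""
    cooc = defaultdict(Counter)
    for tags in photos:
        for a, b in permutations(set(tags), 2):
            cooc[a][b] += 1
    return cooc

def recommend(cooc, given_tags, k=5):
    """Rank candidate tags by summed co-occurrence with the tags already present."""
    scores = Counter()
    for tag in given_tags:
        scores.update(cooc.get(tag, {}))
    for tag in given_tags:
        scores.pop(tag, None)  # never suggest a tag the photo already has
    return [tag for tag, _ in scores.most_common(k)]

def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommendations that appear in the relevant set."""
    return sum(1 for tag in recommended[:k] if tag in relevant) / k

# Hypothetical historical photos; "ny" is used for both New York and New Year.
photos = [
    ["ny", "timessquare", "night"],
    ["ny", "timessquare", "taxi"],
    ["ny", "fireworks", "party"],
    ["beach", "sunset"],
]
cooc = build_cooccurrence(photos)
print(recommend(cooc, ["ny"], k=3))
```

With only the tag `ny` as input, the candidate list mixes New York tags with New Year tags, which is the ambiguity that motivates adding contextual signals.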

Relevance: 20.00%

Abstract:

Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs and the overall TALE diversity in Xanthomonas spp. are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes using PacBio sequencing. We present 'AnnoTALE', a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. © 2016, Nature Publishing Group. All rights reserved.

Relevance: 20.00%

Abstract:

Annotation Pro: a description of the techniques and methods implemented in the tool, together with a list of all built-in functionalities, user interface features and usage tips.

Relevance: 20.00%

Abstract:

Automatic video segmentation plays a vital role in sports video annotation. This paper presents a fully automatic and computationally efficient algorithm for the analysis of sports videos. Various methods of automatic shot boundary detection have been proposed to perform automatic video segmentation. These investigations mainly concentrate on detecting fades and dissolves for fast processing of the entire video scene, without providing any additional feedback on object relativity within the shots. The goal of the proposed method is to identify regions that perform certain activities in a scene. The model uses low-level video feature processing algorithms to extract the shot boundaries from a video scene and to identify dominant colours within these boundaries. An object classification method is used to cluster the seed distributions of the dominant colours into homogeneous regions. These regions are then classified as active or static using a simple tracking method. The efficiency of the proposed framework is demonstrated on a standard video benchmark covering numerous types of sports events, and the experimental results show that our algorithm can be used with high accuracy for the automatic annotation of active regions in sports videos.

Relevance: 20.00%

Abstract:

This paper presents a semi-parametric algorithm for parsing football video structures. The approach relies on two interleaved processes that closely collaborate towards a common goal. The core part of the proposed method performs fast automatic football video annotation by looking at the enhanced entropy variance within a series of shot frames. The entropy is extracted from the Hue parameter of the HSV color system, not as a global feature but in the spatial domain, to identify regions within a shot that characterize a certain activity during the shot period. The second part of the algorithm works towards identifying dominant color regions that could represent players and the playfield, for further activity recognition. Experimental results show that the proposed football video segmentation algorithm performs with high accuracy.
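One possible reading of "entropy on the Hue parameter in the spatial domain" is to compute a hue-histogram entropy per image tile and then take its variance across the frames of a shot. The sketch below follows that reading only; the tile size, the number of histogram bins and every function name are assumptions, and the paper's actual "enhanced" variant is not specified in the abstract.

```python
import colorsys
import math
from collections import Counter
from statistics import pvariance

def hue_entropy(pixels, bins=16):
    """Shannon entropy (in bits) of the hue histogram of RGB pixels (0-255)."""
    counts = Counter()
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def block_entropies(frame, block=2, bins=16):
    """Hue entropy of each block x block tile, keyed by (tile_row, tile_col).

    A frame is a list of rows, each row a list of (r, g, b) tuples.
    """
    result = {}
    for top in range(0, len(frame), block):
        for left in range(0, len(frame[0]), block):
            tile = [px for row in frame[top:top + block]
                    for px in row[left:left + block]]
            result[(top // block, left // block)] = hue_entropy(tile, bins)
    return result

def entropy_variance(frames, block=2, bins=16):
    """Variance of each tile's hue entropy across the frames of one shot."""
    per_frame = [block_entropies(f, block, bins) for f in frames]
    return {tile: pvariance([e[tile] for e in per_frame])
            for tile in per_frame[0]}

# Hypothetical 2x4 frames: green "playfield" on the left, mixed colours on the right.
G, R = (0, 255, 0), (255, 0, 0)
frame_a = [[G, G, R, G], [G, G, G, R]]
frame_b = [[G, G, G, G], [G, G, G, G]]
print(entropy_variance([frame_a, frame_b]))
```

Tiles whose hue entropy changes between frames get a non-zero variance, which is one simple way to flag regions where something is happening while a uniformly coloured playfield stays at zero.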

Relevance: 20.00%

Abstract:

In evaluating Plutarch’s contacts with other cultures of his era, scholars have so far not reached a consensus regarding the relationship between the Chaironean and early Christian writers. A good example of this lack of consensus arises when we come to views on the creation of the human soul. The aim of this paper is to address those contacts by, after an analysis of Plutarch’s texts, taking into account the sources of the NHC, the heresiologists, and the contemporary Corpus Hermeticum, in order to highlight their similarities and/or differences regarding the motif of the soul’s birth.

Relevance: 20.00%

Abstract:

[EU] Understanding what makes a text coherent is very useful for understanding the text itself, since coherence and coherence relations help us determine whether one or more texts are coherent. In this work, three coherence relations from Cross-document Structure Theory, or CST (Radev, 2000), holding between different texts on the same topic have been analysed and classified. To that end, guidelines are proposed for segmenting texts written in Basque on the same topic and for annotating the relations between them. A corpus of 10 texts has been annotated; of these, 3 clusters were analysed by two annotators. We report the inter-annotator agreement. Developing coherence relations is very important for a number of Natural Language Processing systems, such as information extraction, machine translation, question answering and automatic summarisation systems. If, in the future, all CST relations were analysed on a representative corpus, the work done here would be a first step towards using cross-text coherence relations in the automatic processing of Basque texts.
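The abstract reports inter-annotator agreement but does not say which measure was used. As an illustration only, the sketch below computes Cohen's kappa, a common choice for two annotators; the choice of measure, the function name and the placeholder labels `REL_A`, `REL_B` and `REL_C` are assumptions, since the three CST relations are not named here.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six segment pairs with placeholder relation labels.
annotator_1 = ["REL_A", "REL_A", "REL_B", "REL_C", "REL_B", "REL_A"]
annotator_2 = ["REL_A", "REL_B", "REL_B", "REL_C", "REL_B", "REL_A"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one relation label is much more frequent than the others.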