831 results for Semantic Annotation
Abstract:
Although blogs have existed since the beginning of the Internet, their use has increased considerably in the last decade. Nowadays, they are ready to be used by a broad range of people: from teenagers to multinationals, everyone can have a global communication space. Companies know that blogs are a valuable publicity tool for sharing information with participants, and they recognize the importance of creating consumer communities around them: participants come together to exchange ideas, review and recommend new products, and even support each other. Companies can also use blogs for different purposes, such as a content management system to manage website content, a bulletin board to support communication and document sharing in teams, a marketing instrument for communicating with Internet users, or a knowledge management tool. However, an increasing amount of blog content does not find its source in the personal experiences of the writer. Instead, the information may currently be kept in the user's desktop documents, in company catalogues, or in other blogs. Although the gap between blog and data source can be traversed by manual coding, this is a cumbersome task that defeats the blog's ease-of-use principle. Moreover, depending on the quantity of information and its characterisation (i.e., structured content, unstructured content, etc.), an automatic approach can be more effective. Based on these observations, the aim of this dissertation is to assist blog publication through annotation, model transformation, and crossblogging techniques. These techniques have been implemented to give rise to Blogouse, Catablog, and BlogUnion, tools that strive to improve the publication process with respect to the aforementioned data sources.
Abstract:
For sign languages used by deaf communities, linguistic corpora have until recently been unavailable, due to the lack of a writing system and a written culture in these communities, and the very recent advent of digital video. Recent improvements in video and computer technology have now made larger sign language datasets possible; however, large sign language datasets that are fully machine-readable are still elusive. This is due to two challenges: (1) inconsistencies that arise when signs are annotated by means of spoken/written language, and (2) the fact that many parts of signed interaction are not necessarily fully composed of lexical signs (the equivalent of words), instead consisting of constructions that are less conventionalised. As sign language corpus building progresses, the potential for some standards in annotation is beginning to emerge, but before this project there were no attempts to standardise these practices across corpora, which is required to be able to compare data cross-linguistically. This project therefore had the following aims: (1) to develop annotation standards for glosses (lexical/word level); (2) to test their reliability and validity; and (3) to improve current software tools that facilitate a reliable workflow. Overall, the project aimed not only to set a standard for the whole field of sign language studies throughout the world but also to make significant advances toward two of the world's largest machine-readable datasets for sign languages – specifically the BSL Corpus (British Sign Language, http://bslcorpusproject.org) and the Corpus NGT (Sign Language of the Netherlands, http://www.ru.nl/corpusngt).
Abstract:
Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs, or even documents) is an important research area in Natural Language Processing (NLP) with many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation, Summarization, and related tasks such as Information Retrieval or Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
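As a minimal sketch of one knowledge-based measure of the kind evaluated in such work, the following computes WordNet path similarity with NLTK; the word pairs and the choice of measure are illustrative assumptions, not the thesis's actual experimental setup.

# Knowledge-based word similarity via WordNet path similarity (NLTK).
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def max_path_similarity(word_a: str, word_b: str) -> float:
    """Highest path similarity over all synset pairs of the two words."""
    scores = [
        s1.path_similarity(s2) or 0.0  # path_similarity may return None
        for s1 in wn.synsets(word_a)
        for s2 in wn.synsets(word_b)
    ]
    return max(scores, default=0.0)

print(max_path_similarity("car", "automobile"))  # shared synset -> 1.0
print(max_path_similarity("car", "banana"))      # distant concepts -> low score

Sentence- or paragraph-level similarity is typically built on top of such word scores, for example by aligning each word with its best match in the other text and averaging the scores.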
Abstract:
In this work, semantic change is defined, the causes of its occurrence are analysed, and its types in Ancient Greek are specified.
Abstract:
Albacore and Atlantic bluefin tuna are two pelagic fish species. Atlantic bluefin tuna is included in the IUCN Red List of threatened species, and albacore is considered near threatened, so conservation plans are needed. However, no genomic resources are available for either of them. In this study, to better understand their transcriptome, we functionally annotated orthologous genes. In all, 159 SNPs distributed across 120 contigs of the muscle transcriptome were analyzed. Genes were predicted for 98 contigs (81.2%) using the bioinformatics tool BLAST. In addition, another bioinformatics tool, BLAST2GO, was used to assign GO terms to the genes: 41 sequences were assigned a biological process and 39 sequences a molecular function. The most frequent biological process was metabolism, and, notably, no cellular component was assigned to any of the sequences. The most abundant molecular function was binding, and very few catalytic activity annotations were assigned. Of the initial 159 SNPs, 40 were aligned with a sequence in the database after BLAST2GO was run; these were polymorphic in Atlantic bluefin tuna and monomorphic in albacore. Of these 40 SNPs, 24 were located in an open reading frame (four non-synonymous and 20 synonymous) and 16 were not located in a known open reading frame. This study provides information for better understanding the ecology and evolution of these species, which is important for establishing a proper conservation plan and appropriate management.
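As a small illustration of the synonymous/non-synonymous classification described above, the sketch below translates a reference and a variant open reading frame with Biopython and compares the resulting proteins. The toy ORF and SNP positions are made-up examples, not data from the albacore or bluefin tuna transcriptomes.

# Classify a SNP inside an ORF as synonymous or non-synonymous (Biopython).
from Bio.Seq import Seq

def classify_snp(orf: str, pos: int, alt_base: str) -> str:
    """Translate the reference and variant ORFs and compare the proteins."""
    ref_protein = str(Seq(orf).translate())
    variant = orf[:pos] + alt_base + orf[pos + 1:]
    alt_protein = str(Seq(variant).translate())
    return "synonymous" if ref_protein == alt_protein else "non-synonymous"

orf = "ATGGCTGAA"                  # toy ORF: Met-Ala-Glu
print(classify_snp(orf, 5, "A"))   # GCT -> GCA, still Ala: synonymous
print(classify_snp(orf, 4, "A"))   # GCT -> GAT, Ala -> Asp: non-synonymous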
Abstract:
Background: Two distinct trends are emerging with respect to how data are shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of its location.

Results: We use a well-curated Linked Dataset, OpenLifeData, and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through a comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval but also provides protection against resource-intensive queries.

Conclusions: We show, through a variety of clients and examples of varying complexity, that data from the myriad OpenLifeData endpoints can be recovered without any prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.
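For context, the sketch below shows the kind of hand-written SPARQL access that such service-mediated approaches abstract away, using the SPARQLWrapper library. The endpoint URL is a placeholder assumption; substitute any live OpenLifeData-style SPARQL endpoint.

# Direct SPARQL access of the kind contrasted with Web Service-mediated
# retrieval (requires: pip install SPARQLWrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://sparql.example.org/openlifedata")  # placeholder URL
endpoint.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])

Writing such queries requires knowing the endpoint's vocabulary and structure in advance, which is precisely the expertise that the content-driven service approach removes from the pipeline author.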