738 results for Annotation de génomes
Abstract:
Despite the presence of over 3 million transposons separated on average by ~500 bp, the human and mouse genomes each contain almost 1000 transposon-free regions (TFRs) over 10 kb in length. The majority of human TFRs correlate with orthologous TFRs in the mouse, despite the fact that most transposons are lineage specific. Many human TFRs also overlap with orthologous TFRs in the marsupial opossum, indicating that these regions have remained refractory to transposon insertion for long evolutionary periods. Over 90% of the bases covered by TFRs are noncoding, much of which is not highly conserved. Most TFRs are not associated with unusual nucleotide composition, but are significantly associated with genes encoding developmental regulators, suggesting that they represent extended regions of regulatory information that are largely unable to tolerate insertions, a conclusion difficult to reconcile with current conceptions of gene regulation.
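The interval arithmetic behind identifying such regions is simple enough to sketch. The following is a minimal illustration with toy coordinates, not the paper's actual pipeline; the function name and data are assumptions: given merged transposon intervals on a chromosome, report the gaps that meet a length threshold, i.e. candidate TFRs of at least 10 kb.

```python
# Minimal sketch (toy data, not the paper's pipeline): given sorted,
# merged transposon intervals on a chromosome, report the gaps between
# consecutive insertions that meet a length threshold -- candidate
# "transposon-free regions" (TFRs) of at least 10 kb.

def transposon_free_regions(transposons, chrom_length, min_len=10_000):
    """Return (start, end) gaps of length >= min_len between intervals."""
    tfrs = []
    prev_end = 0
    for start, end in sorted(transposons):
        if start - prev_end >= min_len:
            tfrs.append((prev_end, start))
        prev_end = max(prev_end, end)
    if chrom_length - prev_end >= min_len:      # trailing gap
        tfrs.append((prev_end, chrom_length))
    return tfrs

# Toy example: three transposons on a 60 kb "chromosome"
tx = [(0, 5_000), (8_000, 9_000), (30_000, 31_000)]
print(transposon_free_regions(tx, 60_000))
# -> [(9000, 30000), (31000, 60000)]
```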
Abstract:
Scorpion toxins are important physiological probes for characterizing ion channels. Molecular databases have limited functional annotation of scorpion toxins. Their function can be inferred by searching for conserved motifs in sequence signature databases that are derived statistically but are not necessarily biologically relevant. Mutation studies provide biological information on residues and positions important for structure-function relationship but are not normally used for extraction of binding motifs. 3D structure analyses also aid in the extraction of peptide motifs in which non-contiguous residues are clustered spatially. Here we present new, functionally relevant peptide motifs for ion channels, derived from the analyses of scorpion toxin native and mutant peptides. Copyright (c) 2006 European Peptide Society and John Wiley & Sons, Ltd.
Abstract:
The flood of new genomic sequence information together with technological innovations in protein structure determination have led to worldwide structural genomics (SG) initiatives. The goals of SG initiatives are to accelerate the process of protein structure determination, to fill in protein fold space and to provide information about the function of uncharacterized proteins. In the long term, these outcomes are likely to impact on medical biotechnology and drug discovery, leading to a better understanding of disease as well as the development of new therapeutics. Here we describe the high throughput pipeline established at the University of Queensland in Australia. In this focused pipeline, the targets for structure determination are proteins that are expressed in mouse macrophage cells and that are inferred to have a role in innate immunity. The aim is to characterize the molecular structure and the biochemical and cellular function of these targets by using a parallel processing pipeline. The pipeline is designed to work with tens to hundreds of target gene products and comprises target selection, cloning, expression, purification, crystallization and structure determination. The structures from this pipeline will provide insights into the function of previously uncharacterized macrophage proteins and could lead to the validation of new drug targets for chronic obstructive pulmonary disease and arthritis. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Collaborative working with the aid of computers is increasing rapidly due to the widespread use of computer networks, the geographic mobility of people, and small, powerful personal computers. For the past ten years, research has been conducted into this use of computing technology from a wide variety of perspectives and for a wide range of uses. This thesis adds to that previous work by examining collaborative writing amongst groups of people. The research brings together a number of disciplines: sociology for examining group dynamics, psychology for understanding individual writing and learning processes, and computer science for database, networking, and programming theory. The project initially looks at groups and how they form, communicate, and work together, then progresses to writing and the cognitive processes it entails for both composition and retrieval. The thesis then details a set of issues which need to be addressed in a collaborative writing system. These issues are then addressed by developing a model for collaborative writing, detailing an iterative process of co-ordination, writing and annotation, consolidation, and negotiation, based on a structured but extensible document model. Implementation issues for a collaborative application are then described, along with various methods of overcoming them. Finally, the design and implementation of a collaborative writing system, named Collaborwriter, is described in detail, concluding with some preliminary results from initial user trials and testing.
Abstract:
Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of such self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
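The bootstrapping idea behind such frameworks can be sketched in miniature. This is a simplified illustration, not the paper's generalized-expectation formulation: a toy lexicon seeds an initial vote-based classifier, and documents it labels with high confidence become pseudo-labeled examples from which domain-specific word-class counts are acquired. The lexicon, function names and confidence threshold are all assumptions made for the sketch.

```python
# Simplified illustration of lexicon-seeded self-training (NOT the
# paper's generalized expectation criteria; the lexicon, names and the
# confidence threshold are assumptions made for this sketch).
from collections import Counter

LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}

def lexicon_score(doc):
    """Initial classifier: majority vote over lexicon hits."""
    votes = Counter(LEXICON[w] for w in doc.split() if w in LEXICON)
    if not votes:
        return None, 0.0
    label, hits = votes.most_common(1)[0]
    return label, hits / sum(votes.values())

def self_learn(docs, threshold=1.0):
    """Acquire word-class counts from confidently labelled documents."""
    word_class = Counter()
    for doc in docs:
        label, conf = lexicon_score(doc)
        if label and conf >= threshold:          # pseudo-labelled example
            for w in doc.split():
                word_class[(w, label)] += 1      # self-learned feature
    return word_class

counts = self_learn(["great plot and good acting", "awful pacing bad script"])
print(counts[("plot", "pos")])   # -> 1: "plot" now associated with "pos"
```

The self-learned counts give domain words like "plot" a sentiment association even though they never appear in the seed lexicon, which is the adaptation effect the abstract describes.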
Abstract:
UK universities are accepting increasing numbers of students whose L1 is not English on a wide range of programmes at all levels. These students require additional support and training in English, focussing on their academic disciplines. Corpora have been used in EAP since the 1980s, mainly for research, but a growing number of researchers and practitioners have been advocating the use of corpora in EAP pedagogy, and such use is gradually increasing. This paper outlines the processes and factors to be considered in the design and compilation of an EAP corpus (e.g., the selection and acquisition of texts, metadata, data annotation, software tools and outputs, web interface, and screen displays), especially one intended to be used for teaching. Such a corpus would also facilitate EAP research in terms of longitudinal studies, student progression and development, and course and materials design. The paper has been informed by the preparatory work on the EAP subcorpus of the ACORN corpus project at Aston University. © 2007 Elsevier Ltd. All rights reserved.
Abstract:
The realization of the Semantic Web is constrained by a knowledge acquisition bottleneck, i.e. the problem of how to add RDF mark-up to the millions of ordinary web pages that already exist. Information Extraction (IE) has been proposed as a solution to the annotation bottleneck. In the task-based evaluation reported here, we compared the performance of users without access to annotation, users working with annotations which had been produced from manually constructed knowledge bases, and users working with annotations augmented using IE. We looked at retrieval performance, overlap between retrieved items and the two sets of annotations, and usage of annotation options. Automatically generated annotations were found to add value to the browsing experience in the scenario investigated. Copyright 2005 ACM.
Abstract:
Dance videos are interesting and semantics-intensive. At the same time, they are among the most complex types of video compared to other genres such as sports, news and movie videos. In fact, dance video is one of the least explored video types among researchers across the globe. Dance videos exhibit rich semantics, such as macro features and micro features, and can be classified into several types. Hence, the conceptual modeling of the expressive semantics of dance videos is both crucial and complex. This paper presents a generic Dance Video Semantics Model (DVSM) to represent the semantics of dance videos at different granularity levels, identified by the components of the accompanying song. This model incorporates both syntactic and semantic features of the videos and introduces a new entity type, called Agent, to specify the micro features of dance videos. Instantiations of the model are expressed as graphs. The model is implemented as a tool, using J2SE and JMF, to annotate the macro and micro features of dance videos. Finally, examples and evaluation results are provided to demonstrate the effectiveness of the proposed dance video model.
Abstract:
* This research is partially supported by a grant (bourse Lavoisier) from the French Ministry of Foreign Affairs (Ministère des Affaires Etrangères).
Abstract:
More and more researchers have realized that ontologies will play a critical role in the development of the Semantic Web, the next generation Web in which content is not only consumable by humans, but also by software agents. The development of tools to support ontology management including creation, visualization, annotation, database storage, and retrieval is thus extremely important. We have developed ImageSpace, an image ontology creation and annotation tool that features (1) full support for the standard web ontology language DAML+OIL; (2) image ontology creation, visualization, image annotation and display in one integrated framework; (3) ontology consistency assurance; and (4) storing ontologies and annotations in relational databases. It is expected that the availability of such a tool will greatly facilitate the creation of image repositories as islands of the Semantic Web.
Abstract:
* The following text was originally published in the Proceedings of the Language Resources and Evaluation Conference held in Lisbon, Portugal, 2004, under the title "Towards Intelligent Written Cultural Heritage Processing - Lexical processing". I present here a revised version of the aforementioned paper, adding the latest efforts made in the Center for Computational Linguistics in Prague in the field under discussion.
Abstract:
Content creation and presentation are key activities in a multimedia digital library (MDL). The proper design and intelligent implementation of these services provide a stable base for overall MDL functionality. This paper presents the framework and the implementation of these services in the latest version of the “Virtual Encyclopaedia of Bulgarian Iconography” multimedia digital library. For the semantic description of the iconographical objects, a tree-based annotation template is implemented. It provides options for autocompletion, reuse of values, bilingual data entry, and automated media watermarking, resizing and conversion. The paper describes in detail the algorithm for the automated appearance of dependent values for different characteristics of an iconographical object. An algorithm for avoiding duplicate image objects is also included. A service for the automated appearance of new objects in a collection after entry is included as an important part of the content presentation. The paper also presents the overall service-based architecture of the library, covering its main service panels, repositories and their relationships. The presented vision is based on long-term observation of users’ preferences, cognitive goals, and needs, aiming to find an optimal functionality solution for the end users.
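The abstract does not specify how duplicate image objects are detected; one common approach, shown here purely as an illustrative assumption, is to hash each file's bytes and reject uploads whose digest is already registered. The class and method names in the sketch are hypothetical.

```python
# Illustrative sketch only -- the paper does not describe its duplicate
# detection; content hashing is an assumption. Identical byte content
# maps to the same SHA-256 digest, so a repeat upload is rejected and
# the existing object's id is returned instead.
import hashlib

class ImageRepository:
    def __init__(self):
        self._digests = {}                      # digest -> object id

    def add(self, object_id, data: bytes):
        """Register an image unless identical bytes are already stored."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._digests:
            return self._digests[digest]        # duplicate: keep original
        self._digests[digest] = object_id
        return object_id

repo = ImageRepository()
print(repo.add("icon-001", b"...pixels..."))    # -> icon-001
print(repo.add("icon-002", b"...pixels..."))    # -> icon-001 (duplicate)
```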
Abstract:
In recent years, East-Christian iconographical art works have been digitized, producing a large volume of data. The need for effective classification, indexing and retrieval of iconography repositories motivated the design and development of a systemized ontological structure for the description of iconographical art objects. This paper presents the ontology of East-Christian iconographical art, developed to provide content annotation in the Virtual encyclopedia of Bulgarian iconography multimedia digital library. The ontology’s main classes, relations, facts, and rules are described, along with problems that appeared during the design and development. The paper also presents an application of the ontology for learning analysis in the iconography domain, implemented during the SINUS project “Semantic Technologies for Web Services and Technology Enhanced Learning”.
Abstract:
In this paper we present how information technologies, as tools for the creation of digital bilingual dictionaries, can help preserve natural languages. Natural languages are an outstanding part of human cultural values and for that reason should be preserved as part of the world's cultural heritage. We describe our work on the bilingual lexical database supporting the Bulgarian-Polish Online dictionary. The main software tools for the web presentation of the dictionary are briefly described. We focus special attention on the presentation of verbs, the linguistic category in Bulgarian that is richest in specific characteristics.
Abstract:
This article briefly reviews multilingual language resources for Bulgarian developed in the framework of several international projects: the first-ever annotated Bulgarian MTE digital lexical resources, the Bulgarian-Polish corpus, the Bulgarian-Slovak parallel and aligned corpus, and the Bulgarian-Polish-Lithuanian corpus. These resources form a valuable multilingual dataset for language engineering research and development for the Bulgarian language. The multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.