738 results for Annotation de génomes
Abstract:
This paper presents an empirical study of the anaphoric accessibility space in Spanish dialogues. According to this study, the antecedents of pronominal and adjectival anaphors can almost always (95.9%) be found in the set of noun phrases taken from spaces defined using a structure based on adjacency pairs. Furthermore, a reliable annotation scheme for Spanish dialogues is proposed in order to define this anaphoric accessibility space. Using this annotation scheme, anaphora resolution algorithms can locate the appropriate set of antecedent candidates for an anaphor.
Abstract:
Preliminary research demonstrated the relevance of the EmotiBlog annotated corpus as a Machine Learning resource for detecting subjective data. In this paper we compare EmotiBlog with the JRC Quotes corpus in order to check the robustness of its annotation. We concentrate on its coarse-grained labels and carry out extensive Machine Learning experiments, some of which also include lexical resources. The results are similar to those obtained with the JRC Quotes corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
The development of the Web 2.0 led to the birth of new textual genres such as blogs, reviews or forum entries. The increasing number of such texts and the highly diverse topics they discuss make blogs a rich source for analysis. This paper presents a comparative study of open-domain and opinion QA systems. A collection of opinion and mixed fact-opinion questions in English is defined, and two Question Answering systems are employed to retrieve the answers to these queries. The first one is generic, while the second is specific to emotions. We comparatively evaluate and analyze the systems’ results, concluding that opinion Question Answering requires the use of specific resources and methods.
Abstract:
The exponential growth of subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. These tools require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog, a fine-grained annotation scheme for subjectivity. We show how it is built and demonstrate the benefits it brings to the systems that use it for training, through experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres: a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from SemEval 2007 (Task 14), and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high-quality training data for systems dealing with both opinion mining and emotion detection.
Abstract:
This paper presents the first version of EmotiBlog, an annotation scheme for emotions in non-traditional textual genres such as blogs or forums. We collected a corpus composed of blog posts in three languages (English, Spanish and Italian) on three topics of interest. Subsequently, we annotated our collection, measured inter-annotator agreement and carried out a ten-fold cross-validation evaluation, obtaining promising results. The main aim of this research is to provide a finer-grained annotation scheme and annotated data, which are essential for evaluations focused on checking the quality of the created resources.
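As an illustration of the ten-fold cross-validation mentioned in this abstract, the following is a minimal sketch of how a corpus can be split into folds. The function name `k_fold_indices`, the shuffling seed and the corpus size in the usage note are assumptions made for this example; the abstract does not describe how the authors built their folds.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one item.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each annotated document then appears in exactly one test fold, so with, say, 25 documents and k=10 every document is used for testing once and for training nine times.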
Abstract:
This paper presents a preliminary study in which Machine Learning experiments were applied to Opinion Mining in blogs. We created a blog corpus in Spanish and annotated it using EmotiBlog. We evaluated the utility of the labelled features, first through experiments with combinations of them and then using feature selection techniques. We also deal with several problems, such as the noisy character of the input texts, the small size of the training set, the granularity of the annotation scheme, and the language under study, Spanish, which has fewer resources than English. We obtained promising results, considering that this is a preliminary study.
Abstract:
EmotiBlog is a corpus labelled with the annotation scheme of the same name, designed for detecting subjectivity in the new textual genres. Preliminary research demonstrated its relevance as a Machine Learning resource for detecting opinionated data. In this paper we compare EmotiBlog with the JRC corpus in order to check the robustness of the EmotiBlog annotation. For this research we concentrate on its coarse-grained labels. We carry out extensive ML experiments, some of which also include lexical resources. The results are similar to those obtained with the JRC corpus, demonstrating the validity of EmotiBlog as a resource for the SA task.
Abstract:
This paper presents a multilingual method for event ordering based on temporal expression resolution. The method has been implemented in the TERSEO system, which consists of three main units: recognition of temporal expressions, resolution of the coreference introduced by these expressions, and event ordering. By means of this system, chronological information related to events can be extracted from document databases. This information is automatically added to the document database so that question answering systems can use it for questions involving temporality. The system has been evaluated, obtaining 91% precision and 71% recall. For this purpose, a blind evaluation process was developed, guaranteeing a reliable annotation process whose agreement was measured with the kappa factor.
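The kappa factor mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which kappa variant was used, so the example assumes Cohen's kappa for two annotators; the function name and the labels in the usage note are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The abstract only
    says "kappa factor", so this variant is a guess for illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)
```

For example, if two annotators agree on three of four items, with labels `["date", "date", "none", "none"]` and `["date", "none", "none", "none"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.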
Abstract:
The McCabe-Thiele method is a classical approximate graphical method for the conceptual design of binary distillation columns. It is still widely used, mainly for didactic purposes, though it is also valuable for quick preliminary calculations. Nevertheless, no complete description of the method has been found, and situations such as different thermal feed conditions, multiple feeds, the extraction of by-products, or the addition or removal of heat are not always considered. In the present work we provide a systematic analysis of such situations by developing the generalized equations for: a) the operating lines (OL) of each sector, and b) the changeover line that provides the connection between two consecutive trays of the corresponding sectors separated by a lateral stream of feed or product, or by a heat removal or addition.
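For orientation, the sketch below steps off stages for the standard textbook case only: one feed, a total condenser, constant molar overflow and constant relative volatility. It does not implement the generalized operating-line or changeover-line equations that the abstract refers to, which are not given here. The function name, its parameters and the numbers in the usage note are assumptions made for this example.

```python
def mccabe_thiele_stages(alpha, z_f, x_d, x_b, reflux, q=1.0, max_stages=200):
    """Count theoretical stages for a single-feed binary column.

    Textbook simplification (assumed here, not taken from the paper):
    constant relative volatility `alpha`, constant molar overflow,
    total condenser, feed placed where the operating lines cross.
    Returns (stage_count, (x_i, y_i)), where (x_i, y_i) is the
    intersection of the two operating lines.
    """
    # Rectifying operating line: y = R/(R+1) * x + x_D/(R+1)
    def rectifying(x):
        return reflux / (reflux + 1.0) * x + x_d / (reflux + 1.0)

    # Intersection of the rectifying line with the feed (q) line,
    # y = q/(q-1) * x - z_F/(q-1).
    if q == 1.0:
        x_i = z_f  # saturated-liquid feed: the q-line is vertical
    else:
        q_slope = q / (q - 1.0)
        x_i = (x_d / (reflux + 1.0) + z_f / (q - 1.0)) / (
            q_slope - reflux / (reflux + 1.0)
        )
    y_i = rectifying(x_i)

    # Stripping operating line through (x_B, x_B) and (x_i, y_i).
    strip_slope = (y_i - x_b) / (x_i - x_b)

    def stripping(x):
        return x_b + strip_slope * (x - x_b)

    # Equilibrium curve y = alpha*x / (1 + (alpha-1)*x), solved for x.
    def x_equilibrium(y):
        return y / (alpha - (alpha - 1.0) * y)

    stages = 0
    y = x_d  # start at the distillate composition on the diagonal
    while stages < max_stages:
        x = x_equilibrium(y)  # horizontal step to the equilibrium curve
        stages += 1
        if x <= x_b:
            return stages, (x_i, y_i)
        # Vertical step down to whichever operating line applies.
        y = rectifying(x) if x > x_i else stripping(x)
    raise ValueError("stage stepping did not finish; reflux may be below minimum")
```

Calling `mccabe_thiele_stages(2.5, 0.5, 0.95, 0.05, 2.0)` places the operating-line intersection at x = 0.5, y = 0.65 for a saturated-liquid feed. The returned count includes the last step that reaches the bottoms composition, which corresponds to the reboiler when it acts as an equilibrium stage.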
Abstract:
IARG-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus with thematic roles. These corpora will serve as the basis for automatic semantic role labelling systems based on machine learning techniques. Semantic parsers are basic components of current language technology applications, which seek a deeper understanding of the text in order to make higher-level inferences and thereby obtain qualitative improvements in their results.
Abstract:
One of the main concerns is the nature of the missing values. For simplicity, let us consider the extremes. If values are missing at random, we need not worry about them. But if the missing values show structures that covary with substantive variables, we have to make decisions, and there are in fact several options to choose from. So far we are speaking about one country and one mode. But when research goes cross-cultural (or, more precisely, cross-national) and uses mixed modes, many questions arise. The simplest one is: what are we comparing? Reports and books usually go straight into variable distributions and coefficient comparisons. This is possible because the analyst presumes a "tabula rasa" effect from the data collection procedures. Frequently, however, this is not the real situation. This paper exposes the mixed-mode imprint on missing data in international surveys. This will help to evaluate how to deal with the problem and to consider the real meaning of observed cross-national differences.
Abstract:
Pochonia chlamydosporia is a soil fungus distributed worldwide, with a great capacity to infect and destroy the eggs and kill the females of plant-parasitic nematodes. Additionally, it can colonize the roots of economically important crop plants endophytically, thereby promoting their growth and eliciting plant defenses. This multitrophic behavior makes P. chlamydosporia a potentially useful tool for sustainable agriculture approaches. We sequenced and assembled ∼41 Mb of P. chlamydosporia genomic DNA and predicted 12,122 gene models, many of which were homologous to genes of fungal pathogens of invertebrates and fungal plant pathogens. Of the predicted genes, 65% were functionally annotated according to Gene Ontology, and 16% were found to share homology with genes in the Pathogen Host Interactions (PHI) database. The genome of this fungus is highly enriched in genes encoding hydrolytic enzymes, such as proteases, glycoside hydrolases and carbohydrate esterases. We used RNA-Seq technology to identify the genes expressed during the endophytic behavior of P. chlamydosporia when colonizing barley roots. Functional annotation of these genes showed that hydrolytic enzymes and transporters are expressed during endophytism. This structural and functional analysis of the P. chlamydosporia genome provides a starting point for understanding the molecular mechanisms involved in the multitrophic lifestyle of this fungus. The genomic information provided here should also prove useful for enhancing the capabilities of this fungus as a biocontrol agent of plant-parasitic nematodes and as a plant growth-promoting organism.