997 results for Computational Lexical Semantics
Abstract:
High-throughput prioritization of cancer-causing mutations (drivers) is a key challenge of cancer genome projects, due to the number of somatic variants detected in tumors. One important step in this task is to assess the functional impact of tumor somatic mutations. A number of computational methods have been employed for that purpose, although most were originally developed to distinguish disease-related nonsynonymous single nucleotide variants (nsSNVs) from polymorphisms. Our new method, transformed Functional Impact score for Cancer (transFIC), improves the assessment of the functional impact of tumor nsSNVs by taking into account the baseline tolerance of genes to functional variants.
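As a rough illustration of what "taking into account the baseline tolerance of genes" could look like, the sketch below standardizes a raw functional impact score against the scores of tolerated variants in comparable genes. The abstract does not give the transformation, so the z-score form, the function name `baseline_adjusted_score`, and the sample numbers are all assumptions made for illustration and are not the published transFIC definition.

```python
from statistics import mean, stdev


def baseline_adjusted_score(raw_score, baseline_scores):
    """Standardize a raw impact score against a baseline distribution.

    baseline_scores: raw scores of tolerated variants observed in genes
    comparable to the one carrying the mutation (hypothetical input).
    The z-score form is an illustrative assumption.
    """
    mu = mean(baseline_scores)
    sigma = stdev(baseline_scores)  # needs at least two baseline scores
    if sigma == 0:
        raise ValueError("baseline scores have zero spread")
    return (raw_score - mu) / sigma


# The same raw score stands out in a gene group that tolerates little
# variation and is unremarkable in a group that tolerates a lot.
intolerant_baseline = [1.0, 2.0, 3.0]  # made-up numbers
tolerant_baseline = [3.0, 4.0, 5.0]    # made-up numbers
score_in_intolerant = baseline_adjusted_score(4.0, intolerant_baseline)
score_in_tolerant = baseline_adjusted_score(4.0, tolerant_baseline)
```

Under this assumption, a raw score is read relative to what its gene group normally tolerates rather than on a single absolute scale.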
Abstract:
Finding an adequate paraphrase representation formalism is a challenging issue in Natural Language Processing. In this paper, we analyse the performance of Tree Edit Distance as a paraphrase representation baseline. Our experiments using the Edit Distance Textual Entailment Suite show that, because Tree Edit Distance is a purely syntactic approach, paraphrase alternations not based on structural reorganizations do not find an adequate representation. They also show that there is much scope for better modelling of the way trees are aligned.
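To make the notion of a purely syntactic distance concrete, here is a minimal sketch of ordered tree edit distance with unit costs for insertion, deletion and relabelling. It is a textbook recursion over rightmost roots, suited to small trees; it is not the implementation used in the Edit Distance Textual Entailment Suite, and the tree encoding and toy trees are assumptions chosen for the example.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Nested tuples keep everything hashable.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every remaining node
    if not f2:
        return size(f1)  # delete every remaining node
    (label1, kids1), rest1 = f1[-1], f1[:-1]
    (label2, kids2), rest2 = f2[-1], f2[:-1]
    return min(
        # delete the rightmost root of f1; its children move up
        forest_distance(rest1 + kids1, f2) + 1,
        # insert the rightmost root of f2
        forest_distance(f1, rest2 + kids2) + 1,
        # align the two rightmost roots, relabelling if they differ
        forest_distance(kids1, kids2)
        + forest_distance(rest1, rest2)
        + (label1 != label2),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def sentence(noun):
    """Toy parse tree with a single variable leaf (hypothetical example)."""
    return ("S", (("NP", ((noun, ()),)), ("VP", (("great", ()),))))


# A synonym and an unrelated word both cost exactly one relabel.
d_synonym = tree_edit_distance(sentence("film"), sentence("movie"))
d_unrelated = tree_edit_distance(sentence("film"), sentence("table"))
```

Because the cost depends only on node labels and tree shape, a lexical substitution that preserves meaning is indistinguishable from one that does not, which is the kind of limitation the abstract reports.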
Abstract:
Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is scattered across a variety of formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval, allowing different views of the information. In addition, release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid the training and assessment of motif-finding programs.
Abstract:
Sickness absence (SA) is an important social, economic and public health issue. Identifying and understanding the determinants, whether biological, regulatory or health services-related, of variability in SA duration is essential for better management of SA. The conditional frailty model (CFM) is useful when repeated SA events occur within the same individual, as it allows simultaneous analysis of event dependence and heterogeneity due to unknown, unmeasured, or unmeasurable factors. However, its use may encounter computational limitations when applied to very large data sets, as may frequently occur in the analysis of SA duration. To overcome the computational issue, we propose a Poisson-based conditional frailty model (CFPM) for repeated SA events that accounts for both event dependence and heterogeneity. To demonstrate the usefulness of the proposed model in the SA duration context, we used data from all non-work-related SA episodes that occurred in Catalonia (Spain) in 2007, initiated by either a diagnosis of neoplasm or mental and behavioral disorders. As expected, the CFPM results were very similar to those of the CFM for both diagnosis groups. The CPU time for the CFPM was substantially shorter than that for the CFM. The CFPM is a suitable alternative to the CFM in survival analysis with recurrent events, especially with large databases.
Differences in the evolutionary history of disease genes affected by dominant or recessive mutations
Abstract:
Background: Global analyses of human disease genes by computational methods have yielded important advances in the understanding of human diseases. Generally, these studies have treated the group of disease genes uniformly, thus ignoring the type of disease-causing mutations (dominant or recessive). In this report we present a comprehensive study of the evolutionary history of autosomal disease genes separated by mode of inheritance. Results: We examine differences in protein and coding sequence conservation between dominant and recessive human disease genes. Our analysis shows that disease genes affected by dominant mutations are more conserved than those affected by recessive mutations. This could be a consequence of the fact that recessive mutations remain hidden from selection while heterozygous. Furthermore, we employ functional annotation analysis and investigations into disease severity to support this hypothesis. Conclusion: This study elucidates important differences between dominantly- and recessively-acting disease genes in terms of protein and DNA sequence conservation, paralogy and essentiality. We propose that the division of disease genes by mode of inheritance will enhance both understanding of the disease process and prediction of candidate disease genes in the future.
Abstract:
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that there exists no characterization of paraphrasing that is comprehensive, linguistically based and computationally tractable at the same time. The following sets out to define and delimit the concept on the basis of the propositional content. We present a general, inclusive and computationally oriented typology of the linguistic mechanisms that give rise to form variations between paraphrase pairs.
Abstract:
Morphological transitions are analyzed for a radial multiparticle diffusion-limited aggregation process grown under a convective drift. The introduction of a tangential flow changes the morphology of the diffusion-limited structure into multiarm structures, inclined opposite to the flow, which reduce to single arms in the limit of decreasing density. The case of shear flow is also considered. The anisotropy of the patterns is characterized by an analysis based on a tangential correlation function. The simulation results are compared with preliminary experimental results.
Abstract:
In Livius 1,5,1 the reading Lupercal hoc ludicrum has been interpreted as meaning the Lupercal game or festivity; however, this interpretation goes against the Latin use of the singular form Lupercal, which refers to the cavern and not to the games, which are always referred to with the plural form Lupercalia. Lexical reasons suggest that Lupercal should also be interpreted here in its usual local sense.
Abstract:
Drug safety issues pose serious health threats to the population and constitute a major cause of mortality worldwide. Due to the prominent implications for both public health and the pharmaceutical industry, it is of great importance to unravel the molecular mechanisms by which an adverse drug reaction can be potentially elicited. These mechanisms can be investigated by placing the pharmaco-epidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the proposed framework investigates previous evidence on the drug-event association in the context of biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. The mechanistic connections include the activity of the drug, related compounds and drug metabolites on protein targets, the association of protein targets with clinical events, and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining, in silico drug-target profiling, and analyses based on gene-disease networks and biological pathways. Application examples of these workflows carried out on selected cases of drug safety signals are discussed. The methodology and workflows presented offer a novel approach to explore the molecular mechanisms underlying adverse drug reactions.
Abstract:
For computational studies of makam music, it is essential to gather a list of characteristics that constitute a makam and to explore corresponding quantitative features for automatic analysis. This study is such an attempt, in which we address the characteristics of makams as defined in theory books and deduce a list of quantitative features. The aim here is to prompt discussion of some measurable features rather than to provide a complete analysis of the discriminative potential of each proposed feature, which could be the subject of a few larger studies.
Abstract:
This document describes some of the technological aspects of a project devoted to the creation of a factory for language resources. The project’s objectives are explained, as well as the idea of creating a distributed infrastructure of web services. This document focuses on two main topics of the factory: (1) the technological approaches chosen to develop the factory, i.e. software, protocols, servers, etc., and (2) interoperability, since the main challenge is to allow different NLP tools to work together in the factory. This document explains why XCES and GrAF were chosen as the main formats for linguistic data exchange.
Abstract:
This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined in complex acquisition chains using workflows. We show a case study, which acquires a Translation Memory for a given pair of languages and a domain using web services for crawling, sentence alignment and conversion to TMX.
Abstract:
Next-generation sequencing techniques such as exome sequencing can successfully detect all genetic variants in a human exome and, together with variant filters, have been useful for identifying disease-causing mutations. Two filters are mainly used for mutation identification: low allele frequency and the computational annotation of the genetic variant. Bioinformatic tools that predict the effect of a given variant may make errors because of existing biases in databases, and their predictions sometimes show limited agreement with one another. Advances in functional and comparative genomics are needed in order to properly annotate these variants. The goal of this study is twofold. First, we functionally annotate Common Variable Immunodeficiency (CVID) variants with the available bioinformatic methods in order to assess the reliability of these strategies. Second, as the development of new methods to reduce the number of candidate genetic variants is an active and necessary field of research, we explore the utility of gene function information at the organism level as a filter for identifying rare disease genes. Recently, it has been proposed that only 10-15% of human genes are essential, and therefore we would expect severe rare diseases to be caused mostly by mutations in them. Our goal is to determine whether or not these rare and severe diseases are caused by deleterious mutations in these essential genes. If this hypothesis were true, using essential genes as a filter would be a useful criterion for identifying disease-causing mutations.
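The filtering strategy described above can be sketched as three successive conditions: low allele frequency, a predicted deleterious effect, and membership in a set of essential genes. The record layout, the 1% frequency threshold, the gene names and the function `filter_candidates` are all hypothetical choices made for this sketch; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_frequency: float      # population frequency, 0.0 to 1.0
    predicted_deleterious: bool  # output of some annotation tool


def filter_candidates(variants, essential_genes, max_frequency=0.01):
    """Keep rare, predicted-deleterious variants in essential genes.

    max_frequency is an illustrative threshold, not a value from the study.
    """
    return [
        v
        for v in variants
        if v.allele_frequency <= max_frequency
        and v.predicted_deleterious
        and v.gene in essential_genes
    ]


# Made-up variants and gene names, for illustration only.
variants = [
    Variant("GENE_A", 0.001, True),   # rare, deleterious, essential
    Variant("GENE_B", 0.200, True),   # too common
    Variant("GENE_C", 0.002, False),  # predicted benign
    Variant("GENE_D", 0.003, True),   # gene not in the essential set
]
essential_genes = {"GENE_A", "GENE_C"}
candidates = filter_candidates(variants, essential_genes)
```

Each condition only removes candidates, so the order of the checks does not change the result; adding the essential-gene condition is what narrows the list beyond the two standard filters.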
Abstract:
Automatic environmental monitoring networks supported by wireless communication technologies now provide large and ever-increasing volumes of data. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows prediction maps to be produced in real time, which makes them superior to physical models for operational use in risk assessment and mitigation. This situation arises in particular in the spatial prediction of climatic variables (topo-climatic mapping). In the complex topographies of mountainous regions, meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Abstract:
This paper studies the relationship between morphology and lexicography through the analysis of six verbs prefixed with re-. Their definitions in three dictionaries are compared, and new definitions are proposed following the lexicographic entry model of the Diccionario de Aprendizaje de Español como Lengua Extranjera.