9 resultados para Computational biology and bioinformatics

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition the reader will already know that the answer is with difficulty or not at all. In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort, and required using new versions and new software that posed challenges to reconstructing and validating the results. The quantification leads to “reproducibility maps” that reveal that novice researchers would only be able to reproduce a few of the steps in the method, and that only expert researchers with advance knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a desiderata with our observations and guidelines for improving reproducibility. This has implications not only in reproducing the work of others from published papers, but reproducing work from one’s own laboratory.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although several profiling techniques for identifying performance bottlenecks in logic programs have been developed, they are generally not automatic and in most cases they do not provide enough information for identifying the root causes of such bottlenecks. This complicates using their results for guiding performance improvement. We present a profiling method and tool that provides such explanations. Our profiler associates cost centers to certain program elements and can measure different types of resource-related properties that affect performance, preserving the precedence of cost centers in the cali graph. It includes an automatic method for detecting procedures that are performance bottlenecks. The profiling tool has been integrated in a previously developed run-time checking framework to allow verification of certain properties when they cannot be verified statically. The approach allows checking global computational properties which require complex instrumentation tracking information about previous execution states, such as, e.g., that the execution time accumulated by a given procedure is not greater than a given bound. We have built a prototype implementation, integrated it in the Ciao/CiaoPP system and successfully applied it to performance improvement, automatic optimization (e.g., resource-aware specialization of programs), run-time checking, and debugging of global computational properties (e.g., resource usage) in Prolog programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although several profiling techniques for identifying performance bottlenecks in logic programs have been developed, they are generally not automatic and in most cases they do not provide enough information for identifying the root causes of such bottlenecks. This complicates using their results for guiding performance improvement. We present a profiling method and tool that provides such explanations. Our profiler associates cost centers to certain program elements and can measure different types of resource-related properties that affect performance, preserving the precedence of cost centers in the call graph. It includes an automatic method for detecting procedures that are performance bottlenecks. The profiling tool has been integrated in a previously developed run-time checking framework to allow verification of certain properties when they cannot be verified statically. The approach allows checking global computational properties which require complex instrumentation tracking information about previous execution states, such as, e.g., that the execution time accumulated by a given procedure is not greater than a given bound. We have built a prototype implementation, integrated it in the Ciao/CiaoPP system and successfully applied it to performance improvement, automatic optimization (e.g., resource-aware specialization of programs), run-time checking, and debugging of global computational properties (e.g., resource usage) in Prolog programs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Astragalus gines-lopezii Talavera, Podlech, Devesa & F.M.Vázquez (Fabaceae) is a threatened endemic species with a distribution restricted to a very small area in Badajoz Province (Extremadura Region, SW Spain) and only 2 populations are known. This species was catalogued in the ?Endangered? category in the 2008 Red List and the 2010 Threatened Spanish Vascular Flora List. Despite its status as an endangered species, at present very little is known about the distribution, census, and reproductive biology of this species. In this study we have carried out an exhaustive census of A. gines-lopezii , and we have evaluated the production of flowers, fruits, and seeds and the existence or not of intra- and interpopulation variability in seed germination. Results have highlighted the high reproductive capacity of this species on the basis of a high production of flowers, fruits, and seeds. Mechanical scarification of seeds was effective for increasing germination. Thus, initial germination (22%?60%) was increased to 97%?99% when seeds were rubbed with sandpapers. A high intra- and interpopulation variability in seed germination was found in this species. A. gines-lopezii produces seeds with different degrees of physical dormancy, varying this grade among different individuals within a population.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Astragalus gines-lopezii Talavera, Podlech, Devesa & F.M.Vazquez (Fabaceae) is a threatened endemic species with a distribution restricted to a very small area in Badajoz Province (Extremadura Region, SW Spain) and only 2 populations are known.This species was catalogued in the "Endangered" category in the 2008 Red List and the 2010 Threatened Spanish Vascular Flora List. Despite its status as an endangered species, at present very little is known about the distribution, census, and reproductive biology of this species. In this study we have carried out anexhaustive census of A. gines-lopezii, and we have evaluated the production of flowers, fruits, and seeds and the existence or not of intra- and interpopulation variability in seed germination. Results have highlighted the high reproductive capacity of this species on the basis of a high production of flowers, fruits, and seeds. Mechanical scarification of seeds was effective for increasing germination. Thus, initial germination (22%-60%) was increased to 97%-99% when seeds were rubbed with sandpapers. A high intra- and interpopulation variability in seed germination was found in this species. A. gines-lopezii produces seeds with different degrees of physical dormancy, varying this grade among different individuals within a population.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purely data-driven approaches for machine learning present difficulties when data are scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data-driven modeling with a physical model of the system. We show how different, physically inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology, and geostatistics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents some brief considerations on the role of Computational Logic in the construction of Artificial Intelligence systems and in programming in general. It does not address how the many problems in AI can be solved but, rather more modestly, tries to point out some advantages of Computational Logic as a tool for the AI scientist in his quest. It addresses the interaction between declarative and procedural views of programs (deduction and action), the impact of the intrinsic limitations of logic, the relationship with other apparently competing computational paradigms, and finally discusses implementation-related issues, such as the efficiency of current implementations and their capability for efficiently exploiting existing and future sequential and parallel hardware. The purpose of the discussion is in no way to present Computational Logic as the unique overall vehicle for the development of intelligent systems (in the firm belief that such a panacea is yet to be found) but rather to stress its strengths in providing reasonable solutions to several aspects of the task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A gene expression atlas is an essential resource to quantify and understand the multiscale processes of embryogenesis in time and space. The automated reconstruction of a prototypic 4D atlas for vertebrate early embryos, using multicolor fluorescence in situ hybridization with nuclear counterstain, requires dedicated computational strategies. To this goal, we designed an original methodological framework implemented in a software tool called Match-IT. With only minimal human supervision, our system is able to gather gene expression patterns observed in different analyzed embryos with phenotypic variability and map them onto a series of common 3D templates over time, creating a 4D atlas. This framework was used to construct an atlas composed of 6 gene expression templates from a cohort of zebrafish early embryos spanning 6 developmental stages from 4 to 6.3 hpf (hours post fertilization). They included 53 specimens, 181,415 detected cell nuclei and the segmentation of 98 gene expression patterns observed in 3D for 9 different genes. In addition, an interactive visualization software, Atlas-IT, was developed to inspect, supervise and analyze the atlas. Match-IT and Atlas-IT, including user manuals, representative datasets and video tutorials, are publicly and freely available online. We also propose computational methods and tools for the quantitative assessment of the gene expression templates at the cellular scale, with the identification, visualization and analysis of coexpression patterns, synexpression groups and their dynamics through developmental stages.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Los hipergrafos dirigidos se han empleado en problemas relacionados con lógica proposicional, bases de datos relacionales, linguística computacional y aprendizaje automático. Los hipergrafos dirigidos han sido también utilizados como alternativa a los grafos (bipartitos) dirigidos para facilitar el estudio de las interacciones entre componentes de sistemas complejos que no pueden ser fácilmente modelados usando exclusivamente relaciones binarias. En este contexto, este tipo de representación es conocida como hiper-redes. Un hipergrafo dirigido es una generalización de un grafo dirigido especialmente adecuado para la representación de relaciones de muchos a muchos. Mientras que una arista en un grafo dirigido define una relación entre dos de sus nodos, una hiperarista en un hipergrafo dirigido define una relación entre dos conjuntos de sus nodos. La conexión fuerte es una relación de equivalencia que divide el conjunto de nodos de un hipergrafo dirigido en particiones y cada partición define una clase de equivalencia conocida como componente fuertemente conexo. El estudio de los componentes fuertemente conexos de un hipergrafo dirigido puede ayudar a conseguir una mejor comprensión de la estructura de este tipo de hipergrafos cuando su tamaño es considerable. En el caso de grafo dirigidos, existen algoritmos muy eficientes para el cálculo de los componentes fuertemente conexos en grafos de gran tamaño. Gracias a estos algoritmos, se ha podido averiguar que la estructura de la WWW tiene forma de “pajarita”, donde más del 70% del los nodos están distribuidos en tres grandes conjuntos y uno de ellos es un componente fuertemente conexo. Este tipo de estructura ha sido también observada en redes complejas en otras áreas como la biología. Estudios de naturaleza similar no han podido ser realizados en hipergrafos dirigidos porque no existe algoritmos capaces de calcular los componentes fuertemente conexos de este tipo de hipergrafos. En esta tesis doctoral, hemos investigado como calcular los componentes fuertemente conexos de un hipergrafo dirigido. En concreto, hemos desarrollado dos algoritmos para este problema y hemos determinado que son correctos y cuál es su complejidad computacional. Ambos algoritmos han sido evaluados empíricamente para comparar sus tiempos de ejecución. Para la evaluación, hemos producido una selección de hipergrafos dirigidos generados de forma aleatoria inspirados en modelos muy conocidos de grafos aleatorios como Erdos-Renyi, Newman-Watts-Strogatz and Barabasi-Albert. Varias optimizaciones para ambos algoritmos han sido implementadas y analizadas en la tesis. En concreto, colapsar los componentes fuertemente conexos del grafo dirigido que se puede construir eliminando ciertas hiperaristas complejas del hipergrafo dirigido original, mejora notablemente los tiempos de ejecucion de los algoritmos para varios de los hipergrafos utilizados en la evaluación. Aparte de los ejemplos de aplicación mencionados anteriormente, los hipergrafos dirigidos han sido también empleados en el área de representación de conocimiento. En concreto, este tipo de hipergrafos se han usado para el cálculo de módulos de ontologías. Una ontología puede ser definida como un conjunto de axiomas que especifican formalmente un conjunto de símbolos y sus relaciones, mientras que un modulo puede ser entendido como un subconjunto de axiomas de la ontología que recoge todo el conocimiento que almacena la ontología sobre un conjunto especifico de símbolos y sus relaciones. En la tesis nos hemos centrado solamente en módulos que han sido calculados usando la técnica de localidad sintáctica. Debido a que las ontologías pueden ser muy grandes, el cálculo de módulos puede facilitar las tareas de re-utilización y mantenimiento de dichas ontologías. Sin embargo, analizar todos los posibles módulos de una ontología es, en general, muy costoso porque el numero de módulos crece de forma exponencial con respecto al número de símbolos y de axiomas de la ontología. Afortunadamente, los axiomas de una ontología pueden ser divididos en particiones conocidas como átomos. Cada átomo representa un conjunto máximo de axiomas que siempre aparecen juntos en un modulo. La decomposición atómica de una ontología es definida como un grafo dirigido de tal forma que cada nodo del grafo corresponde con un átomo y cada arista define una dependencia entre una pareja de átomos. En esta tesis introducimos el concepto de“axiom dependency hypergraph” que generaliza el concepto de descomposición atómica de una ontología. Un modulo en una ontología correspondería con un componente conexo en este tipo de hipergrafos y un átomo de una ontología con un componente fuertemente conexo. Hemos adaptado la implementación de nuestros algoritmos para que funcionen también con axiom dependency hypergraphs y poder de esa forma calcular los átomos de una ontología. Para demostrar la viabilidad de esta idea, hemos incorporado nuestros algoritmos en una aplicación que hemos desarrollado para la extracción de módulos y la descomposición atómica de ontologías. A la aplicación la hemos llamado HyS y hemos estudiado sus tiempos de ejecución usando una selección de ontologías muy conocidas del área biomédica, la mayoría disponibles en el portal de Internet NCBO. Los resultados de la evaluación muestran que los tiempos de ejecución de HyS son mucho mejores que las aplicaciones más rápidas conocidas. ABSTRACT Directed hypergraphs are an intuitive modelling formalism that have been used in problems related to propositional logic, relational databases, computational linguistic and machine learning. Directed hypergraphs are also presented as an alternative to directed (bipartite) graphs to facilitate the study of the interactions between components of complex systems that cannot naturally be modelled as binary relations. In this context, they are known as hyper-networks. A directed hypergraph is a generalization of a directed graph suitable for representing many-to-many relationships. While an edge in a directed graph defines a relation between two nodes of the graph, a hyperedge in a directed hypergraph defines a relation between two sets of nodes. Strong-connectivity is an equivalence relation that induces a partition of the set of nodes of a directed hypergraph into strongly-connected components. These components can be collapsed into single nodes. As result, the size of the original hypergraph can significantly be reduced if the strongly-connected components have many nodes. This approach might contribute to better understand how the nodes of a hypergraph are connected, in particular when the hypergraphs are large. In the case of directed graphs, there are efficient algorithms that can be used to compute the strongly-connected components of large graphs. For instance, it has been shown that the macroscopic structure of the World Wide Web can be represented as a “bow-tie” diagram where more than 70% of the nodes are distributed into three large sets and one of these sets is a large strongly-connected component. This particular structure has been also observed in complex networks in other fields such as, e.g., biology. Similar studies cannot be conducted in a directed hypergraph because there does not exist any algorithm for computing the strongly-connected components of the hypergraph. In this thesis, we investigate ways to compute the strongly-connected components of directed hypergraphs. We present two new algorithms and we show their correctness and computational complexity. One of these algorithms is inspired by Tarjan’s algorithm for directed graphs. The second algorithm follows a simple approach to compute the stronglyconnected components. This approach is based on the fact that two nodes of a graph that are strongly-connected can also reach the same nodes. In other words, the connected component of each node is the same. Both algorithms are empirically evaluated to compare their performances. To this end, we have produced a selection of random directed hypergraphs inspired by existent and well-known random graphs models like Erd˝os-Renyi and Newman-Watts-Strogatz. Besides the application examples that we mentioned earlier, directed hypergraphs have also been employed in the field of knowledge representation. In particular, they have been used to compute the modules of an ontology. An ontology is defined as a collection of axioms that provides a formal specification of a set of terms and their relationships; and a module is a subset of an ontology that completely captures the meaning of certain terms as defined in the ontology. In particular, we focus on the modules computed using the notion of syntactic locality. As ontologies can be very large, the computation of modules facilitates the reuse and maintenance of these ontologies. Analysing all modules of an ontology, however, is in general not feasible as the number of modules grows exponentially in the number of terms and axioms of the ontology. Nevertheless, the modules can succinctly be represented using the Atomic Decomposition of an ontology. Using this representation, an ontology can be partitioned into atoms, which are maximal sets of axioms that co-occur in every module. The Atomic Decomposition is then defined as a directed graph such that each node correspond to an atom and each edge represents a dependency relation between two atoms. In this thesis, we introduce the notion of an axiom dependency hypergraph which is a generalization of the atomic decomposition of an ontology. A module in the ontology corresponds to a connected component in the hypergraph, and the atoms of the ontology to the strongly-connected components. We apply our algorithms for directed hypergraphs to axiom dependency hypergraphs and in this manner, we compute the atoms of an ontology. To demonstrate the viability of this approach, we have implemented the algorithms in the application HyS which computes the modules of ontologies and calculate their atomic decomposition. In the thesis, we provide an experimental evaluation of HyS with a selection of large and prominent biomedical ontologies, most of which are available in the NCBO Bioportal. HyS outperforms state-of-the-art implementations in the tasks of extracting modules and computing the atomic decomposition of these ontologies.