926 results for Genomic data integration


Relevance:

30.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be achieved better than is possible from a single source alone. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
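
As an illustration of the feature-level integration mentioned in the abstract above, here is a minimal Python sketch. It is not the dissertation's implementation. It assumes that integration means standardizing each data source and concatenating the per-sample feature vectors, and it classifies with a plain k-nearest-neighbour vote. All sample values, labels and function names are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def integrate_features(source_a, source_b):
    """Feature-level integration: scale each source, then concatenate per sample."""
    a, b = zscore_columns(source_a), zscore_columns(source_b)
    return [row_a + row_b for row_a, row_b in zip(a, b)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical data: two sources describing the same seven samples.
# The last sample is the unlabeled query.
source_a = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
            [5.0, 8.0], [5.2, 7.9], [4.8, 8.1],
            [5.1, 8.0]]
source_b = [[100.0], [110.0], [105.0],
            [900.0], [950.0], [920.0],
            [910.0]]
labels = ["A", "A", "A", "B", "B", "B"]

integrated = integrate_features(source_a, source_b)
prediction = knn_predict(integrated[:6], labels, integrated[6], k=3)
print(prediction)
```

Standardizing before concatenation keeps a source with large raw values (here the second one) from dominating the distance computation. That is one common design choice, not necessarily the one used in the dissertation.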

Relevance:

30.00%

Publisher:

Abstract:

Natural and man-made disasters have gained attention at all levels of policy-making in recent years. Emergency management tasks are inherently complex and unpredictable, and often require coordination among multiple organizations across different levels and locations. Effectively managing various knowledge areas and the organizations involved has become a critical emergency management success factor. However, there is a general lack of understanding about how to describe and assess the complex nature of emergency management tasks and how knowledge integration can help managers improve emergency management task performance. The purpose of this exploratory research was, first, to understand how emergency management operations are impacted by tasks that are complex and inter-organizational and, second, to investigate how knowledge integration as a particular knowledge management strategy can improve the efficiency and effectiveness of the emergency tasks. Three types of specific knowledge were considered: context-specific, technology-specific, and context-and-technology-specific. The research setting was the Miami-Dade Emergency Operations Center (EOC), and the study was based on survey responses from participants in past EOC activations about their emergency tasks and knowledge areas. The data included task attributes related to complexity, knowledge area, knowledge integration, specificity of knowledge, and task performance. The data were analyzed using multiple linear regressions and path analyses to examine (1) the relationships between task complexity, knowledge integration, and performance, (2) the moderating effects of each type of specific knowledge on the relationship between task complexity and performance, and (3) the mediating role of knowledge integration. Consistent with the theory-based propositions, the results indicated that overall component complexity and interactive complexity tend to have a negative effect on task performance. Surprisingly, however, procedural rigidity tended to have a positive effect on performance in emergency management tasks. Also as expected, knowledge integration had a positive relationship with task performance. Interestingly, the moderating effects of each type of specific knowledge on the relationship between task complexity and performance varied, and the extent of mediation by knowledge integration depended on the dimension of task complexity.

Relevance:

30.00%

Publisher:

Abstract:

The No Child Left Behind Act of 2001 (NCLB) brought many significant changes to American schools including accessibility to technology. Through an extensive literature review of the relationship between technology leadership and student achievement, five major themes emerged from data that support the need for more effective computer-based education in schools.

Relevance:

30.00%

Publisher:

Abstract:

This dissertation examines the relationship between the degree of openness to international trade and the level of growth experienced by a country. More precisely, it explores how trade liberalization and economic integration can lead to specialization in production, affect national levels of welfare, productivity, and competition and at the same time reinforce deflationary efforts. A large part of this investigation is carried out using industry-level data from Spain.

Relevance:

30.00%

Publisher:

Abstract:

Bio-systems are inherently complex information processing systems. Furthermore, the physiological complexities of biological systems limit both the formulation of hypotheses about their behavior and the ability to test them. More importantly, the identification and classification of mutations in patients are central topics in today's cancer research. Next generation sequencing (NGS) technologies can provide genome-wide coverage at single-nucleotide resolution and at reasonable speed and cost. The unprecedented molecular characterization provided by NGS offers the potential for an individualized approach to treatment. These advances in cancer genomics have enabled scientists to interrogate cancer-specific genomic variants and compare them with the normal variants in the same patient. Analysis of these data provides a catalog of somatic variants, present in the tumor genome but not in the normal tissue DNA. In this dissertation, we present a new computational framework for the problem of predicting the number of mutations on a chromosome for a given patient, which is a fundamental problem in clinical and research fields. We begin this dissertation with the development of a framework system that is capable of utilizing published data from a longitudinal study of patients with acute myeloid leukemia (AML), whose DNA from both normal and malignant tissues was subjected to NGS analysis at various points in time. By processing the sequencing data from the time of cancer diagnosis using the components of our framework, we tested it by predicting the genomic regions to be mutated at the time of relapse and, later, by comparing our results with the actual regions that showed mutations (discovered at relapse time). We demonstrate that this coupling of the algorithm pipeline can drastically improve predictive ability when searching for a reliable molecular signature. Arguably, the most important result of our research is its superior performance over other methods such as Radial Basis Function Network, Sequential Minimal Optimization, and Gaussian Process. In the final part of this dissertation, we present a detailed significance, stability and statistical analysis of our model. A performance comparison of the results is presented. This work lays a good foundation for future research on other types of cancer.

Relevance:

30.00%

Publisher:

Abstract:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High Performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.

Relevance:

30.00%

Publisher:

Abstract:

The metabolic rate of organisms may either be viewed as a basic property from which other vital rates and many ecological patterns emerge and that follows a universal allometric mass scaling law; or it may be considered a property of the organism that emerges as a result of the organism's adaptation to the environment, with consequently less universal mass scaling properties. Data on body mass, maximum ingestion and clearance rates, respiration rates and maximum growth rates of animals living in the ocean epipelagic were compiled from the literature, mainly from original papers but also from previous compilations by other authors. Data were read from tables or digitized from graphs. Only measurements made on individuals of known size, or on groups of individuals of similar and known size, were included. We show that clearance and respiration rates have life-form-dependent allometries that have similar scaling but different elevations, such that the mass-specific rates converge on a rather narrow size-independent range. In contrast, ingestion and growth rates follow a near-universal taxa-independent ~3/4 mass scaling power law. We argue that the decline in mass-specific clearance rates with size within taxa is related to the inherent decrease in feeding efficiency of any particular feeding mode. The transitions between feeding modes and the simultaneous transitions in clearance and respiration rates may then represent adaptations to the food environment and be the result of the optimization of tradeoffs that allow sufficient feeding and growth rates to balance mortality.
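
For readers unfamiliar with allometric scaling, a power law of the form rate = a · mass^b becomes a straight line on log-log axes, so the exponent b can be estimated by ordinary least squares on the logarithms. The Python sketch below illustrates this on synthetic values only. It does not use the compiled data set, and the coefficient 2.0, the masses and the function name are assumptions chosen for illustration.

```python
import math

def fit_power_law(masses, rates):
    """Least-squares fit of log(rate) = log(a) + b * log(mass); returns (a, b)."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope of the regression line on log-log axes is the scaling exponent b.
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic example (arbitrary units): rates generated exactly from a 3/4 power law.
masses = [1e-6, 1e-4, 1e-2, 1.0, 100.0]
rates = [2.0 * m ** 0.75 for m in masses]

a, b = fit_power_law(masses, rates)

# Mass-specific rate = rate / mass = a * mass^(b - 1), which declines with size when b < 1.
mass_specific = [r / m for r, m in zip(rates, masses)]
print(round(b, 3), [round(v, 3) for v in mass_specific])
```

With an exponent below 1, the mass-specific rate falls as body mass increases, which is the sense in which the abstract speaks of declining mass-specific rates with size.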

Relevance:

30.00%

Publisher:

Abstract:

The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years since its first description by Nisumaa et al. (2010). Most of the study sites from which data are archived are still in the Northern Hemisphere, and the number of archived data sets from studies in the Southern Hemisphere and polar oceans is still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcome shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, and to help develop standard vocabularies describing the variables and define best practices for archiving ocean acidification data.

Relevance:

30.00%

Publisher:

Abstract:

The present data compilation includes dinoflagellate growth rate, grazing rate and gross growth efficiency determined either in the field or in laboratory experiments. From the existing literature, we synthesized all data that we could find on dinoflagellates. Some sources might be missing but none were purposefully ignored. We did not include autotrophic dinoflagellates in the database, but mixotrophic organisms may have been included. This is due to the large uncertainty about which taxa are mixotrophic, heterotrophic or symbiont bearing. Field data on microzooplankton grazing consist mostly of grazing rates measured using the dilution technique with a 24 h incubation period. Laboratory grazing and growth data are focused on pelagic ciliates and heterotrophic dinoflagellates. The experiments measured grazing or growth as a function of prey concentration or at saturating prey concentration (maximal grazing rate). When considering every single data point available (each measured rate for a defined predator-prey pair and a certain prey concentration), there is a total of 801 data points for the dinoflagellates, counting experiments that measured growth and grazing simultaneously as 1 data point.

Relevance:

30.00%

Publisher:

Abstract:

The present data compilation includes ciliate growth rate, grazing rate and gross growth efficiency determined either in the field or in laboratory experiments. From the existing literature, we synthesized all data that we could find on ciliates. Some sources might be missing but none were purposefully ignored. Field data on microzooplankton grazing consist mostly of grazing rates measured using the dilution technique with a 24 h incubation period. Laboratory grazing and growth data are focused on pelagic ciliates and heterotrophic dinoflagellates. The experiments measured grazing or growth as a function of prey concentration or at saturating prey concentration (maximal grazing rate). When considering every single data point available (each measured rate for a defined predator-prey pair and a certain prey concentration), there is a total of 1485 data points for the ciliates, counting experiments that measured growth and grazing simultaneously as 1 data point.

Relevance:

30.00%

Publisher:

Abstract:

Acknowledgements: This study was funded by a BBSRC studentship (MAW) and NERC grants NE/H00775X/1 and NE/D000602/1 (SBP). The authors are grateful to Mario Röder and Keliya Bai for fieldwork assistance, and all estate owners, factors and keepers for access to field sites, most particularly MJ Taylor and Mike Nisbet (Airlie), Neil Brown (Allargue), RR Gledson and David Scrimgeour (Delnadamph), Andrew Salvesen and John Hay (Dinnet), Stuart Young and Derek Calder (Edinglassie), Kirsty Donald and David Busfield (Glen Dye), Neil Hogbin and Ab Taylor (Glen Muick), Alistair Mitchell (Glenlivet), Simon Blackett, Jim Davidson and Liam Donald (Invercauld), Richard Cooke and Fred Taylor† (Invermark), Shaila Rao and Christopher Murphy (Mar Lodge), and Ralph Peters and Philip Astor (Tillypronie). Data accessibility: genotype data (DataDryad: doi:10.5061/dryad.4t7jk); metadata, with information on sampling sites, phenotypes and medication regimen (DataDryad: doi:10.5061/dryad.4t7jk).

Relevance:

30.00%

Publisher:

Abstract:

This paper explores the dynamics of inter-sectoral technological integration by introducing the concept of a bridging platform as a node of pervasive technologies, whose collective broad applicability may enhance the connection between ‘distant’ knowledge by offering a technological coupling. Using data on patents obtained from the CRIOS-PATSTAT database for four EU countries (Germany, UK, France and Italy), we provide empirical evidence that bridging platforms are likely to connect innovations across distant technological domains more effectively, fostering inter-sectoral technological integration and the development of original innovation. Public research organisations are also found to play a crucial role in terms of technological integration and original innovation due to their higher capacity to access and use bridging platforms within their innovation activities.

Relevance:

30.00%

Publisher:

Abstract:

The European Union has expanded significantly in recent years. Sustainable trade within the Union, leading to economic growth to the benefit of the ‘old’ and ‘new’ member states is thus extremely important. The road infrastructure is strategic and vital to such development since an uneven transport infrastructure, in terms of capacity and condition, has the potential to reinforce uneven development trends and hinder economic convergence of old and new member states. In the decades since their design and construction, loading conditions have significantly changed for many major highway infrastructure elements/networks owing primarily to increased freight volumes and vehicle sizes. This, coupled with the gradual deterioration of a significant number of highway structures due to their age, and the absence of a pan-European assessment framework, can be expected to affect the smooth functioning of the infrastructure in its as-built condition. Increased periods of reduced flow can be expected owing to planned and unplanned interventions for repair/rehabilitation. This paper reports the findings of a survey regarding the current status of the highway infrastructure elements in six countries within the European Union as reported by the owners/operators. The countries surveyed include a cross-section of ‘existing’ older countries and ‘new’ member states. The current situations for bridges, culverts, tunnels and retaining walls are reported, along with their potential replacement costs. The findings act as a departure point for further studies in support of a centralised and/or synchronised EU approach to infrastructure maintenance management. Information in the form presented in this paper is central to any future decision-making frameworks in terms of trade route choice and operations, monetary investment, optimised maintenance, management and rehabilitation of the built infrastructure and the economic integration of the newly joined member states.