967 results for biological data


Relevance:

70.00%

Publisher:

Abstract:

In the past decade, the amount of data in the biological field has grown steadily; bio-techniques for the analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes, including clustering and visualization, e.g. Self-Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of result quality and learning speed depends strongly on the initialization of the neuron weights. In this paper we present a new initialization technique based on a fully connected undirected graph that reports relations among some interesting features of the input data. Results of experimental tests, in which the proposed algorithm is compared to the original initialization techniques, show that our technique assures faster learning and better performance in terms of quantization error.
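The abstract does not describe the graph-based initialization in enough detail to reproduce it, so the sketch below only illustrates the evaluation it refers to: training a small SOM from different starting weights and comparing the quantization error. The one-dimensional neuron chain, the decay schedules, the toy data and all function names are assumptions made for illustration, not the authors' method.

```python
import math
import random


def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching unit (BMU)."""
    return sum(min(math.dist(x, w) for w in weights) for x in data) / len(data)


def train_som(data, weights, epochs=20, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D SOM (a chain of neurons) and return the updated weights."""
    rng = random.Random(seed)
    weights = [list(w) for w in weights]  # copy so the caller's list is untouched
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the chain index around the BMU.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: two Gaussian blobs standing in for real feature vectors.
    data = [[rng.gauss(c, 0.05), rng.gauss(c, 0.05)]
            for c in (0.2, 0.8) for _ in range(50)]
    # Two simple stand-in initializations (neither is the graph-based one).
    random_init = [[rng.random(), rng.random()] for _ in range(6)]
    sample_init = [list(x) for x in rng.sample(data, 6)]
    for name, init in (("random", random_init), ("sample", sample_init)):
        trained = train_som(data, init)
        print(name,
              round(quantization_error(data, init), 4), "->",
              round(quantization_error(data, trained), 4))
```

Running the script prints the quantization error before and after training for each starting point, which is the kind of comparison the abstract reports.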

Relevance:

70.00%

Publisher:

Abstract:

The microbial fermentability, ruminal degradability and digestibility of 48 maize silages were determined using in vitro gas production (GP), in situ degradability and in vitro digestibility procedures. The silages were produced from forage maize harvested throughout the summer of 1998, and represent a wide range of physiological maturities. Large variations among samples were observed for all biological parameters, with the exception of in vitro digestibility and the asymptote of in vitro GP. The potential of near infrared reflectance spectroscopy (NIRS) to predict the biological parameters measured was determined by regression of the biological data against the respective spectral profile. NIRS demonstrated only a moderate ability (R² 0.60-0.80) to predict in vitro digestibility, modelled kinetics of gas production (excluding the asymptote of gas production) and the modelled ruminally soluble dry matter (DM) fraction. Calibration statistics for remaining biological parameters were unacceptably poor (R² < 0.60). (C) 2004 Elsevier B.V. All rights reserved.
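The calibration statistic quoted above is the coefficient of determination. As a minimal sketch of how it is computed, the example below fits a straight line by least squares and scores it; real NIRS calibrations regress against a whole spectral profile, so the single predictor, the function names and the toy numbers are assumptions for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical spectral feature vs. measured digestibility (made-up values).
    feature = [0.41, 0.45, 0.50, 0.53, 0.60, 0.66]
    measured = [0.62, 0.66, 0.65, 0.71, 0.74, 0.73]
    slope, intercept = fit_line(feature, measured)
    predicted = [slope * f + intercept for f in feature]
    print(round(r_squared(measured, predicted), 3))
```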

Relevance:

70.00%

Publisher:

Abstract:

There is remarkable agreement in expectations today for vastly improved ocean data management a decade from now -- capabilities that will help to bring significant benefits to ocean research and to society. Advancing data management to such a degree, however, will require cultural and policy changes that are slow to effect. The technological foundations upon which data management systems are built are certain to continue advancing rapidly in parallel. These considerations argue for adopting attitudes of pragmatism and realism when planning data management strategies. In this paper we adopt those attitudes as we outline opportunities for progress in ocean data management. We begin with a synopsis of expectations for integrated ocean data management a decade from now. We discuss factors that should be considered by those evaluating candidate “standards”. We highlight challenges and opportunities in a number of technical areas, including “Web 2.0” applications, data modeling, data discovery and metadata, real-time operational data, archival of data, biological data management and satellite data management. We discuss the importance of investments in the development of software toolkits to accelerate progress. We conclude the paper by recommending a few specific, short term targets for implementation, that we believe to be both significant and achievable, and calling for action by community leadership to effect these advancements.

Relevance:

70.00%

Publisher:

Abstract:

Background
Medical and biological data commonly have small sample sizes, missing values and, most importantly, imbalanced class distributions. In this study we propose a particle swarm based hybrid system for remedying the class imbalance problem in medical and biological data mining. This hybrid system combines the particle swarm optimization (PSO) algorithm with multiple classifiers and evaluation metrics for evaluation fusion. Samples from the majority class are ranked using multiple objectives according to their merit in compensating the class imbalance, and then combined with the minority class to form a balanced dataset.

Results
One important finding of this study is that different classifiers and metrics often provide different evaluation results. Nevertheless, the proposed hybrid system demonstrates consistent improvements over several alternative methods with three different metrics. The sampling results also demonstrate good generalization on different types of classification algorithms, indicating the advantage of information fusion applied in the hybrid system.

Conclusion
The experimental results demonstrate that, unlike many currently available methods, which often perform unevenly with different datasets, the proposed hybrid system has a better generalization property, which alleviates the method-data dependency problem. From the biological perspective, the system provides indications for further investigation of the highly ranked samples, which may result in the discovery of new conditions or disease subtypes.
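The rank-then-combine step described in the Background can be sketched as follows. The PSO search and the fusion of multiple classifiers and metrics are not reproduced here; a single stand-in score (closeness to the minority-class centroid) replaces them, and every name in the block is hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def closeness_to(point):
    """Return a score function: higher means closer to `point`."""
    return lambda x: -math.dist(x, point)


def balance_by_ranking(majority, minority, score):
    """Keep the len(minority) best-scoring majority samples, then merge.

    Returns (sample, label) pairs with label 0 = majority, 1 = minority.
    """
    ranked = sorted(majority, key=score, reverse=True)
    kept = ranked[: len(minority)]
    return [(x, 0) for x in kept] + [(x, 1) for x in minority]


if __name__ == "__main__":
    majority = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0], [2.0, 2.5]]
    minority = [[1.0, 2.0], [2.0, 1.0]]
    # Stand-in merit: majority samples near the minority class are assumed
    # to be the most informative ones to keep.
    score = closeness_to(centroid(minority))
    for sample, label in balance_by_ranking(majority, minority, score):
        print(label, sample)
```

Any other merit function can be passed as `score`, which is where a classifier-based or multi-objective ranking would plug in.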

Relevance:

70.00%

Publisher:

Abstract:

This paper proposes to apply multiagent based data mining technologies to biological data analysis. The rationale is justified from multiple perspectives with an emphasis on biological context. Following that, an initial multiagent based bio-data mining framework is presented. Based on the framework, we developed a prototype system to demonstrate how it helps biologists to perform a comprehensive mining task for answering biological questions. The system offers a new way to reuse biological datasets and available data mining algorithms with ease.

Relevance:

70.00%

Publisher:

Abstract:

Inferring transcriptional regulatory networks from high-throughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed TReNGO (Transcriptional Regulatory Networks reconstruction based on Global Optimization), a global and threshold-free algorithm with simulated annealing for inferring regulatory networks by the integration of ChIP-chip and expression data. Superior to existing methods, TReNGO was expected to find the optimal structure of transcriptional regulatory networks without any arbitrary thresholds or predetermined number of transcriptional modules (TMs). TReNGO was applied to both synthetic data and real yeast data in the rapamycin response. In these applications, we demonstrated an improved functional coherence of TMs and transcription factor (TF)-target predictions by TReNGO when compared to GRAM, COGRIM or to analyzing ChIP-chip data alone. We also demonstrated the ability of TReNGO to discover unexpected biological processes that TFs may be involved in and to identify interesting novel combinations of TFs.
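To make the optimization strategy concrete, here is a generic simulated-annealing loop applied to a toy edge-selection problem. It is not TReNGO: the per-edge scores, the energy function, the neighbour move and the cooling schedule are all invented for illustration, whereas the actual objective integrates ChIP-chip and expression data.

```python
import math
import random

# Hypothetical evidence scores for candidate TF -> target edges (made-up values).
EDGE_SCORES = {
    ("TF1", "geneA"): 0.9,
    ("TF1", "geneB"): -0.4,
    ("TF2", "geneA"): 0.7,
    ("TF2", "geneC"): -0.8,
    ("TF3", "geneB"): 0.4,
    ("TF3", "geneC"): -0.1,
}
EDGES = list(EDGE_SCORES)


def energy(state):
    """Lower is better: negative total score of the selected edges."""
    return -sum(EDGE_SCORES[e] for e in state)


def flip_one_edge(state, rng):
    """Neighbour move: add or remove one randomly chosen edge."""
    return state ^ {rng.choice(EDGES)}


def simulated_annealing(initial, neighbor, energy_fn,
                        t0=1.0, cooling=0.995, steps=5000, seed=0):
    """Metropolis acceptance with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current, e_cur = initial, energy_fn(initial)
    best, e_best = current, e_cur
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        e_cand = energy_fn(cand)
        delta = e_cand - e_cur
        # Always accept improvements; accept worse moves with prob. exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = current, e_cur
        t *= cooling
    return best, e_best


if __name__ == "__main__":
    network, e = simulated_annealing(frozenset(), flip_one_edge, energy)
    print(sorted(network), e)
```

In this toy problem the optimum is simply the set of positively scored edges, which makes the result easy to check; a realistic energy function would couple the edges so that no per-edge cutoff suffices.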

Relevance:

70.00%

Publisher:

Abstract:

Many techniques used to model ecosystems cannot be meaningfully applied to large-scale ecological problems due to data constraints. Disparate collection methods, data types and incomplete data sets, or limited theoretical understanding mean that a wide range of modelling techniques used to model physical processes or for problems specific to species or populations cannot be used at an ecosystem scale. In developing an ecological response model for the Coorong, a South Australian hypersaline estuary, we combined several flexible modelling approaches in a statistical framework to develop an approach we call ‘ecosystem states’. This model uses simulated hydrodynamic conditions as input to predict one of a suite of states per space and time, allowing prediction of likely ecological conditions under a variety of scenarios. Each ecosystem state has defined sets of biota and physico-chemical parameters. The existing model is limited in that its predictions have yet to be tested and, as yet, no spatial or temporal connectivity has been incorporated into simulated time series of ecosystem states. This approach can be used in a wide range of ecosystems, where enough data are available to model ecosystem states. We are in the process of applying the technique to a nearby lake system. This has been more difficult than for the Coorong as there is little overlap in the spatial and temporal coverage of biological data sets for that region. The approach is robust to low-quality biological data and missing environmental data, so should suit situations where community or management monitoring programs have occurred through time.

Relevance:

70.00%

Publisher:

Abstract:

The consideration of information on social values in conjunction with biological data is critical for achieving both socially acceptable and scientifically defensible conservation planning outcomes. However, the influence of social values on spatial conservation priorities has received limited attention and is poorly understood. We present an approach that incorporates quantitative data on social values for conservation and social preferences for development into spatial conservation planning. We undertook a public participation GIS survey to spatially represent social values and development preferences and used species distribution models for 7 threatened fauna species to represent biological values. These spatially explicit data were simultaneously included in the conservation planning software Zonation to examine how conservation priorities changed with the inclusion of social data. Integrating spatially explicit information about social values and development preferences with biological data produced prioritizations that differed spatially from the solution based on only biological data. However, the integrated solutions protected a similar proportion of the species' distributions, indicating that Zonation effectively combined the biological and social data to produce socially feasible conservation solutions of approximately equivalent biological value. We were able to identify areas of the landscape where synergies and conflicts between different value sets are likely to occur. Identification of these synergies and conflicts will allow decision makers to target communication strategies to specific areas and ensure effective community engagement and positive conservation outcomes.

Relevance:

70.00%

Publisher:

Abstract:

Abstract
Background: Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with the predicted candidate genes can potentially identify novel therapeutics suitable for repositioning towards treatment of CAD.
Methods: Previously, we were able to predict 264 candidate genes and 104 potential therapeutic targets for CAD using Gentrepid (www.gentrepid.org), a candidate gene prediction platform with two bioinformatic modules to reanalyze Wellcome Trust Case-Control Consortium GWAS data. In an expanded study, using five bioinformatics modules on the same data, Gentrepid predicted 647 candidate genes and successfully replicated 55% of the candidate genes identified by the more powerful CARDIoGRAMplusC4D consortium meta-analysis. Hence, Gentrepid was capable of enhancing lower quality genotype-phenotype data, using an independent knowledgebase of existing biological data. Here, we used our methodology to integrate drug data from three drug databases (the Therapeutic Target Database, PharmGKB and DrugBank) with the 647 candidate gene predictions from Gentrepid. We utilized known CAD targets, the scientific literature, existing drug data and the CARDIoGRAMplusC4D meta-analysis study as benchmarks to validate Gentrepid predictions for CAD.
Results: Our analysis identified a total of 184 predicted candidate genes as novel therapeutic targets for CAD, and 981 novel therapeutics feasible for repositioning in clinical trials towards treatment of CAD. The benchmarks based on known CAD targets and the scientific literature showed that our results were significant (p < 0.05).
Conclusions: We have demonstrated that available drugs may potentially be repositioned as novel therapeutics for the treatment of CAD. Drug repositioning can save valuable time and money spent on preclinical and phase I clinical studies.

Relevance:

70.00%

Publisher:

Abstract:

Background
The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information.

Results
We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck. We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications.

Conclusions
Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
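For readers unfamiliar with the Entity-Attribute-Value (EAV) model mentioned above, the following sketch stores clinical facts as one row per (patient, attribute) pair in an in-memory SQLite database. The table and column names are hypothetical and are not taken from the Chado schema or from the Clinical Module.

```python
import sqlite3


def build_eav_db():
    """Create an in-memory database with a minimal EAV layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
        CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE patient_value (
            patient_id INTEGER REFERENCES patient(patient_id),
            attribute_id INTEGER REFERENCES attribute(attribute_id),
            value TEXT,
            PRIMARY KEY (patient_id, attribute_id)
        );
    """)
    return conn


def set_value(conn, patient_code, attr_name, value):
    """Insert or update one attribute value; new attributes need no schema change."""
    conn.execute("INSERT OR IGNORE INTO patient (code) VALUES (?)", (patient_code,))
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attr_name,))
    conn.execute(
        """INSERT OR REPLACE INTO patient_value (patient_id, attribute_id, value)
           SELECT p.patient_id, a.attribute_id, ?
           FROM patient p, attribute a
           WHERE p.code = ? AND a.name = ?""",
        (str(value), patient_code, attr_name),
    )


def get_record(conn, patient_code):
    """Pivot the EAV rows of one patient back into a {attribute: value} dict."""
    rows = conn.execute(
        """SELECT a.name, v.value
           FROM patient_value v
           JOIN patient p ON p.patient_id = v.patient_id
           JOIN attribute a ON a.attribute_id = v.attribute_id
           WHERE p.code = ?""",
        (patient_code,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_eav_db()
    set_value(conn, "P001", "age", 57)
    set_value(conn, "P001", "tumor_site", "larynx")
    print(get_record(conn, "P001"))
```

The trade-off of this layout is flexibility at the cost of typing: every value is stored as text here, so a production design would typically add a data-type column or link attributes to ontology terms.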

Relevance:

70.00%

Publisher:

Abstract:

This thesis is part of the STOCKMAPPING project, one of the studies developed within the framework of the RITMARE Flagship project. The main goals of STOCKMAPPING were the creation of a genomic map for stocks of demersal target species and the assembly of a population genomics database, in order to identify stocks and stock boundaries. The thesis focuses on three main objectives, which form the core of the initial assessment of the methodologies and structure to be applied to the entire STOCKMAPPING project: identification of an analytical design to identify and locate stocks and stock boundaries of Mullus barbatus, application of a multidisciplinary approach to validate biological methods, and an initial assessment and improvement of the genotyping-by-sequencing technique used (2b-RAD). The first step is the identification of an analytical design that takes into account the biological characteristics of red mullet and is representative of the STOCKMAPPING commitments. In this framework, reduction and selection steps were needed because of budget cuts. Sampling areas were ranked according to four priorities. To guarantee a multidisciplinary approach, the biological data associated with the collected samples were used to investigate differences between sampling areas and GSAs. Genomic techniques were applied to red mullet for the first time, so an initial assessment of molecular protocols for DNA extraction and 2b-RAD processing was needed. In the end, 192 good-quality DNA samples were extracted and eight samples were processed with 2b-RAD. Using the software Stacks for sequence analysis, a large number of SNP markers were identified among the eight samples. Several tests were performed, changing the main parameters of the Stacks pipeline in order to identify the most explanatory and functional sets of parameters.

Relevance:

70.00%

Publisher:

Abstract:

This thesis was developed in the context of WP1 of the Ritmare project, whose main objective is the development of a sustainable fishery through the identification of population boundaries in commercially important species in Italian seas. Three main objectives are discussed in support of the overall aim of identifying stock boundaries in Parapenaeus longirostris: (1) development of a representative sampling design for Italian seas; (2) evaluation of the 2b-RAD protocol; (3) investigation of populations through biological data analysis. First, we defined and carried out a sampling design that properly represents all Italian seas. We then used information and data on the distribution of nursery areas, population abundance and the importance of P. longirostris in local fisheries to develop an experimental design that prioritizes the most important areas, so as to maximize the results achievable with the current project funds. We introduced the use of 2b-RAD on this species for the first time; it is a genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases. Thanks to this method we were able to move from genetics to the more complex field of genomics. To proceed with 2b-RAD we performed several tests to identify the best DNA extraction kit and protocol, and were finally able to obtain 192 high-quality DNA extracts ready to be processed. We tested 2b-RAD on five samples and, after high-throughput sequencing of the libraries, used the software “Stacks” to analyze the sequences. We obtained positive results, identifying a large number of SNP markers among the five samples. To guarantee a multidisciplinary approach, we used the biological data associated with the collected samples to investigate differences between geographical samples. This approach ensures continuity with other projects, for instance STOCKMED, which also use a combination of molecular and biological analyses.

Relevance:

70.00%

Publisher:

Abstract:

We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural MRI.
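The separation between marginals and dependence that copulas provide can be sketched with a bivariate Gaussian copula. The abstract's skew t marginals and location-varying parameters are not reproduced; exponential marginals serve as a simple skewed stand-in, and the function names and parameter values are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution (a simple right-skewed marginal)."""
    return lambda u: -math.log(1.0 - u) / rate


def gaussian_copula_pairs(n, rho, inv_cdf_a, inv_cdf_b, seed=0):
    """Draw n pairs whose dependence comes from a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Latent correlated standard normals.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms, clamped away from 0 and 1 for numerical safety.
        u1 = min(max(std_normal.cdf(z1), 1e-12), 1.0 - 1e-12)
        u2 = min(max(std_normal.cdf(z2), 1e-12), 1.0 - 1e-12)
        # Apply each marginal independently of the dependence structure.
        pairs.append((inv_cdf_a(u1), inv_cdf_b(u2)))
    return pairs


if __name__ == "__main__":
    sample = gaussian_copula_pairs(
        5, 0.8, exponential_inv_cdf(1.0), exponential_inv_cdf(2.0))
    for a, b in sample:
        print(round(a, 3), round(b, 3))
```

Swapping in a different inverse CDF changes the marginal shape without touching `rho`, which is the modelling convenience the abstract relies on.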

Relevance:

70.00%

Publisher:

Abstract:

BACKGROUND: Despite a large body of clinical and histological data demonstrating beneficial effects of enamel matrix proteins (EMPs) for regenerative periodontal therapy, it is less clear how the available biological data can explain the mechanisms underlying the supportive effects of EMPs. OBJECTIVE: To analyse all available biological data of EMPs at the cellular and molecular levels that are relevant in the context of periodontal wound healing and tissue formation. METHODS: A stringent systematic approach was applied using the key words "enamel matrix proteins" OR "enamel matrix derivative" OR "emdogain" OR "amelogenin". The literature search was performed separately for epithelial cells, gingival fibroblasts, periodontal ligament cells, cementoblasts, osteogenic/chondrogenic/bone marrow cells, wound healing, and bacteria. RESULTS: A total of 103 papers met the inclusion criteria. EMPs affect many different cell types. Overall, the available data show that EMPs have effects on: (1) cell attachment, spreading, and chemotaxis; (2) cell proliferation and survival; (3) expression of transcription factors; (4) expression of growth factors, cytokines, extracellular matrix constituents, and other macromolecules; and (5) expression of molecules involved in the regulation of bone remodelling. CONCLUSION: Altogether, the data analysis provides strong evidence that EMPs support wound healing and new periodontal tissue formation.

Relevance:

70.00%

Publisher:

Abstract:

Sedimentary sequences in ancient or long-lived lakes can reach several thousands of meters in thickness and often provide an unrivalled perspective of the lake's regional climatic, environmental, and biological history. Over the last few years, deep-drilling projects in ancient lakes became increasingly multi- and interdisciplinary, as, among others, seismological, sedimentological, biogeochemical, climatic, environmental, paleontological, and evolutionary information can be obtained from sediment cores. However, these multi- and interdisciplinary projects pose several challenges. The scientists involved typically approach problems from different scientific perspectives and backgrounds, and setting up the program requires clear communication and the alignment of interests. One of the most challenging tasks, besides the actual drilling operation, is to link diverse datasets with varying resolution, data quality, and age uncertainties to answer interdisciplinary questions synthetically and coherently. These problems are especially relevant when secondary data, i.e., datasets obtained independently of the drilling operation, are incorporated in analyses. Nonetheless, the inclusion of secondary information, such as isotopic data from fossils found in outcrops or genetic data from extant species, may help to achieve synthetic answers. Recent technological and methodological advances in paleolimnology are likely to increase the possibilities of integrating secondary information. Some of the new approaches have started to revolutionize scientific drilling in ancient lakes, but at the same time, they also add a new layer of complexity to the generation and analysis of sediment-core data. The enhanced opportunities presented by new scientific approaches to study the paleolimnological history of these lakes, therefore, come at the expense of higher logistic, communication, and analytical efforts. 
Here we review types of data that can be obtained in ancient lake drilling projects and the analytical approaches that can be applied to empirically and statistically link diverse datasets to create an integrative perspective on geological and biological data. In doing so, we highlight strengths and potential weaknesses of new methods and analyses, and provide recommendations for future interdisciplinary deep-drilling projects.