13 results for R-package
Abstract:
Summary: We present a new R package, diveRsity, for the calculation of various diversity statistics, including common diversity partitioning statistics (θ, G) and population differentiation statistics (D, G'ST, χ² test for population heterogeneity), among others. The package calculates these estimators along with their respective bootstrapped confidence intervals at the locus, sample population pairwise and global levels. Various plotting tools are also provided for a visual evaluation of estimated values, allowing users to critically assess the validity and significance of statistical tests from a biological perspective. diveRsity has a set of unique features, which facilitate the use of an informed framework for assessing the validity of the use of traditional F-statistics for the inference of demography, with reference to specific marker types, particularly focusing on highly polymorphic microsatellite loci. However, the package can be readily used for other co-dominant marker types (e.g. allozymes, SNPs). Detailed examples of usage and descriptions of package capabilities are provided. The examples demonstrate useful strategies for the exploration of data and interpretation of results generated by diveRsity. Additional online resources for the package are also described, including a GUI web app version intended for those with more limited experience using R for statistical analysis. © 2013 British Ecological Society.
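To make the contrast between a traditional fixation index and a differentiation statistic concrete, the following is a minimal Python sketch, not the diveRsity implementation. It assumes the textbook definitions G_ST = (H_T - H_S) / H_T and Jost's D = [(H_T - H_S) / (1 - H_S)] * [n / (n - 1)], computed from known allele frequencies at a single locus with equal population weights and no sample-size correction. The function names are hypothetical.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def differentiation(pop_freqs):
    """Return (G_ST, Jost's D) for one locus.

    pop_freqs: one list of allele frequencies per population, all in the
    same allele order. Assumes at least two equally weighted populations
    and uncorrected (plug-in) estimators.
    """
    n = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Mean within-population heterozygosity.
    h_s = sum(heterozygosity(f) for f in pop_freqs) / n
    # Heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[a] for f in pop_freqs) / n for a in range(n_alleles)]
    h_t = heterozygosity(mean_freqs)
    g_st = (h_t - h_s) / h_t if h_t > 0 else 0.0
    d = ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1)) if h_s < 1 else 0.0
    return g_st, d


# Two populations that share no alleles, each with 10 equally common alleles,
# mimicking a highly polymorphic microsatellite locus.
pop_a = [0.1] * 10 + [0.0] * 10
pop_b = [0.0] * 10 + [0.1] * 10
g_st, d = differentiation([pop_a, pop_b])
print(round(g_st, 3), round(d, 3))
```

In this constructed case the populations are completely differentiated, yet G_ST stays near 0.05 because within-population diversity is high, while D is close to 1. This is the kind of marker-dependent behaviour the abstract refers to when it questions the use of traditional F-statistics for highly polymorphic loci.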
Abstract:
Modern biology and medicine aim at hunting molecular and cellular causes of biological functions and diseases. Gene regulatory networks (GRN) inferred from gene expression data are considered an important aid for this research by providing a map of molecular interactions. Hence, GRNs have the potential to enable and enhance basic as well as applied research in the life sciences. In this paper, we introduce a new method called BC3NET for inferring causal gene regulatory networks from large-scale gene expression data. BC3NET is an ensemble method that is based on bagging the C3NET algorithm, which means it corresponds to a Bayesian approach with noninformative priors. In this study we demonstrate, for a variety of simulated and biological gene expression data from S. cerevisiae, that BC3NET is an important enhancement over other inference methods and is capable of sensibly capturing biochemical interactions from transcription regulation and protein-protein interaction. An implementation of BC3NET is freely available as an R package from the CRAN repository. © 2012 de Matos Simoes, Emmert-Streib.
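The bagging idea behind an ensemble network method can be sketched in a few lines of Python. This is an illustration under loud assumptions, not the BC3NET algorithm: the base learner below is a hypothetical stand-in that links each gene to the partner with the largest absolute Pearson correlation, whereas C3NET itself is not reproduced here. All function names are invented for the example, and how edge frequencies are finally thresholded or tested is left open.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def strongest_partner_edges(samples):
    """Stand-in base learner: connect every gene to its most correlated partner.

    samples: list of rows, one row per sample, one column per gene.
    Returns a set of undirected edges (i, j) with i < j.
    """
    n_genes = len(samples[0])
    cols = [[row[g] for row in samples] for g in range(n_genes)]
    edges = set()
    for g in range(n_genes):
        others = [h for h in range(n_genes) if h != g]
        best = max(others, key=lambda h: abs(pearson(cols[g], cols[h])))
        edges.add((min(g, best), max(g, best)))
    return edges


def bagged_edge_frequencies(samples, n_boot=100, seed=0):
    """Bagging: rerun the base learner on bootstrap resamples and count edges."""
    rng = random.Random(seed)
    n = len(samples)
    counts = {}
    for _ in range(n_boot):
        # Bootstrap resample: draw n samples with replacement.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        for edge in strongest_partner_edges(resample):
            counts[edge] = counts.get(edge, 0) + 1
    # Fraction of bootstrap networks in which each edge appeared.
    return {edge: c / n_boot for edge, c in counts.items()}
```

Edges that recur across most bootstrap networks are the stable ones. An ensemble method would keep those and discard edges that appear only sporadically.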
Abstract:
A parametric regression model for right-censored data with a log-linear median regression function and a transformation in both response and regression parts, named the parametric Transform-Both-Sides (TBS) model, is presented. The TBS model has a parameter that handles data asymmetry while allowing various distributions for the error, as long as they are unimodal symmetric distributions centered at zero. The discussion is focused on the estimation procedure with five important error distributions (normal, double-exponential, Student's t, Cauchy and logistic) and presents properties, associated functions (that is, survival and hazard functions) and estimation methods based on maximum likelihood and on the Bayesian paradigm. These procedures are implemented in TBSSurvival, an open-source, fully documented R package. The use of the package is illustrated and the performance of the model is analyzed using both simulated and real data sets.
Abstract:
Background: Pedigree reconstruction using genetic analysis provides a useful means to estimate fundamental population biology parameters relating to population demography, trait heritability and individual fitness when combined with other sources of data. However, there remain limitations to pedigree reconstruction in wild populations, particularly in systems where parent-offspring relationships cannot be directly observed, there is incomplete sampling of individuals, or molecular parentage inference relies on low quality DNA from archived material. While much can still be inferred from incomplete or sparse pedigrees, it is crucial to evaluate the quality and power of available genetic information before testing specific biological hypotheses. Here, we used microsatellite markers to reconstruct a multi-generation pedigree of wild Atlantic salmon (Salmo salar L.) using archived scale samples collected with a total trapping system within a river over a 10-year period. Using a simulation-based approach, we determined the optimal microsatellite marker number for accurate parentage assignment, and evaluated the power of the resulting partial pedigree to investigate important evolutionary and quantitative genetic characteristics of salmon in the system.
Results: We show that at least 20 microsatellites (average 12 alleles/locus) are required to maximise parentage assignment and to improve the power to estimate reproductive success and heritability in this study system. We also show that 1.5-fold differences can be detected between groups simulated to have differing reproductive success, and that it is possible to detect moderate heritability values for continuous traits (h² ≈ 0.40) with more than 80% power when using 28 moderately to highly polymorphic markers.
Conclusion: The methodologies and work flow described provide a robust approach for evaluating archived samples for pedigree-based research, even where only a proportion of the total population is sampled. The results demonstrate the feasibility of pedigree-based studies to address challenging ecological and evolutionary questions in free-living populations, where genealogies can be traced only using molecular tools, and that significant increases in pedigree assignment power can be achieved by using higher numbers of markers.
Abstract:
We present a robust Dirichlet process for estimating survival functions from samples with right-censored data. It adopts a prior near-ignorance approach to avoid almost any assumption about the distribution of the population lifetimes, as well as the need to elicit an infinite-dimensional parameter (in case of lack of prior information), as happens with the usual Dirichlet process prior. We show how such a model can be used to derive robust inferences from right-censored lifetime data. Robustness is due to the identification of the decisions that are prior-dependent, and can be interpreted as an analysis of sensitivity with respect to the hypothetical inclusion of fictitious new samples in the data. In particular, we derive a nonparametric estimator of the survival probability and a hypothesis test about the probability that the lifetime of an individual from one population is shorter than the lifetime of an individual from another. We evaluate these ideas on simulated data and on the Australian AIDS survival dataset. The methods are publicly available through an easy-to-use R package.
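For readers unfamiliar with nonparametric survival estimation from right-censored data, the sketch below shows the classical Kaplan-Meier product-limit estimator in Python. It is a swapped-in baseline, not the robust Dirichlet process estimator of the abstract, and it produces a single curve rather than prior-dependent bounds. The function name and input layout are assumptions made for the example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  observed times (event time or censoring time) per individual.
    events: True if the event was observed, False if right-censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both events and censored individuals leave the risk set.
        n_at_risk -= removed
    return curve


# Four individuals; the second is censored at time 2.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
# → [(1, 0.75), (3, 0.375), (4, 0.0)]
```

The censored individual contributes to the risk set up to time 2 and then drops out without lowering the curve, which is the basic mechanism any estimator for right-censored lifetimes has to handle.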
Abstract:
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.
Abstract:
MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is in part attributed to the two counteracting goals of (a) a cost-efficient and (b) an optimal experimental design, leading to a compromise, e.g., in the sequencing depth of experiments.
RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.
AVAILABILITY: samExploreR is available as an R package from Bioconductor (after acceptance of the paper, download link: http://www.bio-complexity.com/samExploreR_1.0.0.tar.gz).
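The subsampling step described above can be illustrated with a short Python sketch. It is not the samExploreR API. It assumes the reads are already held in memory as a list of alignment records and that m is chosen as a fraction of n and drawn without replacement; the function name `subsample_reads` and its parameters are hypothetical.

```python
import random


def subsample_reads(reads, fraction, seed=None):
    """Draw m = round(fraction * n) reads without replacement.

    reads:    list of alignment records (e.g. lines from a SAM file body).
    fraction: target sequencing depth as a share of the full data, in (0, 1].
    seed:     optional seed so that a subsample can be reproduced.
    The original order of the retained reads is preserved.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    m = round(fraction * len(reads))
    keep = sorted(rng.sample(range(len(reads)), m))
    return [reads[i] for i in keep]


# Simulate a reduced sequencing depth of 25% for 100 placeholder reads.
reads = [f"read{i}" for i in range(100)]
subset = subsample_reads(reads, 0.25, seed=1)
print(len(subset))  # → 25
```

Repeating the draw with different seeds at each depth, and rerunning the downstream analysis on every subsample, gives a spread of results from which the effect of sequencing depth can be judged.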
Abstract:
Self-injurious and aggressive behaviours have often been identified as the cause for students' lack of academic progress, parental distress, health risks and teachers' low satisfaction levels. Functional analysis has been identified in the research literature as the benchmark of effective treatments for disruptive and/or inappropriate behaviours. The present study was completed with a girl diagnosed with ASD. An experimental functional analysis was conducted identifying the function of self-injurious behaviours and tantrums to be escaping from tasks. A treatment package was consequently put in place integrating several components that aimed at reducing overall levels of inappropriate behaviours. Results showed a clear and meaningful improvement in the student's overall health and academic progress, as well as in parental involvement, teachers' satisfaction and school inclusion. These outcomes are discussed in the light of evidence-based experimental procedures based on applied behaviour analysis and more specifically on the functional-analytic literature, which, if put in place consistently, can bring valuable positive changes in the quality of life of individuals with ASD.
Abstract:
Survivorship is an important issue in cancer care in the UK. More people are being diagnosed with the disease and many more are living for longer after diagnosis. The National Cancer Survivorship Initiative recommends that patients with cancer have a package of care designed to improve outcomes and support for those living with and beyond the disease. The recovery package consists of a holistic needs assessment, treatment summary, cancer care review and health and wellbeing event. Although these interventions are recommended as a way to improve care, many people do not have access to the combined package, or even some of its components. The Cancer Nursing Partnership (CNP), a collaboration of cancer nursing organisations and communities of influence, has been established to support nurses with delivery of the recovery package in practice. This article describes the package and its components, introduces the CNP and outlines the work it has carried out to date.