980 resultados para R package
Resumo:
Microsatellite markers have demonstrated their value for performing paternity exclusion and hence exploring mating patterns in plants and animals. Methodology is well established for diploid species, and several software packages exist for elucidating paternity in diploids; however, these issues are not so readily addressed in polyploids due to the increased complexity of the exclusion problem and a lack of available software. We introduce polypatex, an r package for paternity exclusion analysis using microsatellite data in autopolyploid, monoecious or dioecious/bisexual species with a ploidy of 4n, 6n or 8n. Given marker data for a set of offspring, their mothers and a set of candidate fathers, polypatex uses allele matching to exclude candidates whose marker alleles are incompatible with the alleles in each offspring–mother pair. polypatex can analyse marker data sets in which allele copy numbers are known (genotype data) or unknown (allelic phenotype data) – for data sets in which allele copy numbers are unknown, comparisons are made taking into account all possible genotypes that could arise from the compared allele sets. polypatex is a software tool that provides population geneticists with the ability to investigate the mating patterns of autopolyploids using paternity exclusion analysis on data from codominant markers having multiple alleles per locus.
Resumo:
[EN]Probability models on permutations associate a probability value to each of the permutations on n items. This paper considers two popular probability models, the Mallows model and the Generalized Mallows model. We describe methods for making inference, sampling and learning such distributions, some of which are novel in the literature. This paper also describes operations for permutations, with special attention in those related with the Kendall and Cayley distances and the random generation of permutations. These operations are of key importance for the efficient computation of the operations on distributions. These algorithms are implemented in the associated R package. Moreover, the internal code is written in C++.
Resumo:
Background: Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition as well as the identification of new genes classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample ... ) belongs to one of these previously identified clusters or to a new group. Results: ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group, and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. Conclusions: We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.
Resumo:
Summary: We present a new R package, diveRsity, for the calculation of various diversity statistics, including common diversity partitioning statistics (?, G) and population differentiation statistics (D, GST ', ? test for population heterogeneity), among others. The package calculates these estimators along with their respective bootstrapped confidence intervals for loci, sample population pairwise and global levels. Various plotting tools are also provided for a visual evaluation of estimated values, allowing users to critically assess the validity and significance of statistical tests from a biological perspective. diveRsity has a set of unique features, which facilitate the use of an informed framework for assessing the validity of the use of traditional F-statistics for the inference of demography, with reference to specific marker types, particularly focusing on highly polymorphic microsatellite loci. However, the package can be readily used for other co-dominant marker types (e.g. allozymes, SNPs). Detailed examples of usage and descriptions of package capabilities are provided. The examples demonstrate useful strategies for the exploration of data and interpretation of results generated by diveRsity. Additional online resources for the package are also described, including a GUI web app version intended for those with more limited experience using R for statistical analysis. © 2013 British Ecological Society.
Resumo:
The R-package “compositions”is a tool for advanced compositional analysis. Its basic functionality has seen some conceptual improvement, containing now some facilities to work with and represent ilr bases built from balances, and an elaborated subsys- tem for dealing with several kinds of irregular data: (rounded or structural) zeroes, incomplete observations and outliers. The general approach to these irregularities is based on subcompositions: for an irregular datum, one can distinguish a “regular” sub- composition (where all parts are actually observed and the datum behaves typically) and a “problematic” subcomposition (with those unobserved, zero or rounded parts, or else where the datum shows an erratic or atypical behaviour). Systematic classification schemes are proposed for both outliers and missing values (including zeros) focusing on the nature of irregularities in the datum subcomposition(s). To compute statistics with values missing at random and structural zeros, a projection approach is implemented: a given datum contributes to the estimation of the desired parameters only on the subcompositon where it was observed. For data sets with values below the detection limit, two different approaches are provided: the well-known imputation technique, and also the projection approach. To compute statistics in the presence of outliers, robust statistics are adapted to the characteristics of compositional data, based on the minimum covariance determinant approach. The outlier classification is based on four different models of outlier occur- rence and Monte-Carlo-based tests for their characterization. Furthermore the package provides special plots helping to understand the nature of outliers in the dataset. Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator, robustness, rounded zeros
Resumo:
Approximate Bayesian computation (ABC) is a popular family of algorithms which perform approximate parameter inference when numerical evaluation of the likelihood function is not possible but data can be simulated from the model. They return a sample of parameter values which produce simulations close to the observed dataset. A standard approach is to reduce the simulated and observed datasets to vectors of summary statistics and accept when the difference between these is below a specified threshold. ABC can also be adapted to perform model choice. In this article, we present a new software package for R, abctools which provides methods for tuning ABC algorithms. This includes recent dimension reduction algorithms to tune the choice of summary statistics, and coverage methods to tune the choice of threshold. We provide several illustrations of these routines on applications taken from the ABC literature.
Resumo:
We present a new version of the hglm package for fittinghierarchical generalized linear models (HGLM) with spatially correlated random effects. A CAR family for conditional autoregressive random effects was implemented. Eigen decomposition of the matrix describing the spatial structure (e.g. the neighborhood matrix) was used to transform the CAR random effectsinto an independent, but heteroscedastic, gaussian random effect. A linear predictor is fitted for the random effect variance to estimate the parameters in the CAR model.This gives a computationally efficient algorithm for moderately sized problems (e.g. n<5000).
Resumo:
Genotyping platforms such as Affymetrix can be used to assess genotype-phenotype as well as copy number-phenotype associations at millions of markers. While genotyping algorithms are largely concordant when assessed on HapMap samples, tools to assess copy number changes are more variable and often discordant. One explanation for the discordance is that copy number estimates are susceptible to systematic differences between groups of samples that were processed at different times or by different labs. Analysis algorithms that do not adjust for batch effects are prone to spurious measures of association. The R package crlmm implements a multilevel model that adjusts for batch effects and provides allele-specific estimates of copy number. This paper illustrates a workflow for the estimation of allele-specific copy number, develops markerand study-level summaries of batch effects, and demonstrates how the marker-level estimates can be integrated with complimentary Bioconductor software for inferring regions of copy number gain or loss. All analyses are performed in the statistical environment R. A compendium for reproducing the analysis is available from the author’s website (http://www.biostat.jhsph.edu/~rscharpf/crlmmCompendium/index.html).
Resumo:
Mathematical models of disease progression predict disease outcomes and are useful epidemiological tools for planners and evaluators of health interventions. The R package gems is a tool that simulates disease progression in patients and predicts the effect of different interventions on patient outcome. Disease progression is represented by a series of events (e.g., diagnosis, treatment and death), displayed in a directed acyclic graph. The vertices correspond to disease states and the directed edges represent events. The package gems allows simulations based on a generalized multistate model that can be described by a directed acyclic graph with continuous transition-specific hazard functions. The user can specify an arbitrary hazard function and its parameters. The model includes parameter uncertainty, does not need to be a Markov model, and may take the history of previous events into account. Applications are not limited to the medical field and extend to other areas where multistate simulation is of interest. We provide a technical explanation of the multistate models used by gems, explain the functions of gems and their arguments, and show a sample application.
Resumo:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
© 2016 Springer Science+Business Media New YorkResearchers studying mammalian dentitions from functional and adaptive perspectives increasingly have moved towards using dental topography measures that can be estimated from 3D surface scans, which do not require identification of specific homologous landmarks. Here we present molaR, a new R package designed to assist researchers in calculating four commonly used topographic measures: Dirichlet Normal Energy (DNE), Relief Index (RFI), Orientation Patch Count (OPC), and Orientation Patch Count Rotated (OPCR) from surface scans of teeth, enabling a unified application of these informative new metrics. In addition to providing topographic measuring tools, molaR has complimentary plotting functions enabling highly customizable visualization of results. This article gives a detailed description of the DNE measure, walks researchers through installing, operating, and troubleshooting molaR and its functions, and gives an example of a simple comparison that measured teeth of the primates Alouatta and Pithecia in molaR and other available software packages. molaR is a free and open source software extension, which can be found at the doi:10.13140/RG.2.1.3563.4961(molaR v. 2.0) as well as on the Internet repository CRAN, which stores R packages.
Resumo:
When we study the variables that a ffect survival time, we usually estimate their eff ects by the Cox regression model. In biomedical research, e ffects of the covariates are often modi ed by a biomarker variable. This leads to covariates-biomarker interactions. Here biomarker is an objective measurement of the patient characteristics at baseline. Liu et al. (2015) has built up a local partial likelihood bootstrap model to estimate and test this interaction e ffect of covariates and biomarker, but the R code developed by Liu et al. (2015) can only handle one variable and one interaction term and can not t the model with adjustment to nuisance variables. In this project, we expand the model to allow adjustment to nuisance variables, expand the R code to take any chosen interaction terms, and we set up many parameters for users to customize their research. We also build up an R package called "lplb" to integrate the complex computations into a simple interface. We conduct numerical simulation to show that the new method has excellent fi nite sample properties under both the null and alternative hypothesis. We also applied the method to analyze data from a prostate cancer clinical trial with acid phosphatase (AP) biomarker.
Resumo:
The metapopulation paradigm is central in ecology and conservation biology to understand the dynamics of spatially-structured populations in fragmented landscapes. Metapopulations are often studied using simulation modelling, and there is an increasing demand of user-friendly software tools to simulate metapopulation responses to environmental change. Here we describe the MetaLandSim R package, mwhich integrates ideas from metapopulation and graph theories to simulate the dynamics of real and virtual metapopulations. The package offers tools to (i) estimate metapopulation parameters from empirical data, (ii) to predict variation in patch occupancy over time in static and dynamic landscapes, either real or virtual, and (iii) to quantify the patterns and speed of metapopulation expansion into empty landscapes. MetaLandSim thus provides detailed information on metapopulation processes, which can be easily combined with land use and climate change scenarios to predict metapopulation dynamics and range expansion for a variety of taxa and ecological systems.
Resumo:
Stratigraphic Columns (SC) are the most useful and common ways to represent the eld descriptions (e.g., grain size, thickness of rock packages, and fossil and lithological components) of rock sequences and well logs. In these representations the width of SC vary according to the grain size (i.e., the wider the strata, the coarser the rocks (Miall 1990; Tucker 2011)), and the thickness of each layer is represented at the vertical axis of the diagram. Typically these representations are drawn 'manually' using vector graphic editors (e.g., Adobe Illustrator®, CorelDRAW®, Inskape). Nowadays there are various software which automatically plot SCs, but there are not versatile open-source tools and it is very di cult to both store and analyse stratigraphic information. This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package1 designed for both plotting and facilitate the analysis of Stratigraphic Data in R (R Core Team 2014). SDAR, uses simple stratigraphic data and takes advantage of the exible plotting tools available in R to produce detailed SCs. The main bene ts of SDAR are: (i) used to generate accurate and complete SC plot including multiple features (e.g., sedimentary structures, samples, fossil content, color, structural data, contacts between beds), (ii) developed in a free software environment for statistical computing and graphics, (iii) run on a wide variety of platforms (i.e., UNIX, Windows, and MacOS), (iv) both plotting and analysing functions can be executed directly on R's command-line interface (CLI), consequently this feature enables users to integrate SDAR's functions with several others add-on packages available for R from The Comprehensive R Archive Network (CRAN).