103 results for statistical inference


Relevance: 20.00%

Publisher:

Abstract:

Adaptation to different ecological environments can promote speciation. Although numerous examples of such 'ecological speciation' now exist, the genomic basis of the process, and the role of gene flow in it, remain less well understood. This is, at least in part, because systems that are well characterized in terms of their ecology often lack genomic resources. In this study, we characterize the transcriptome of Timema cristinae stick insects, a system that has been researched intensively in terms of ecological speciation but for which genomic resources had not previously been developed. We obtained >1 million 454 sequencing reads that assembled into 84,937 contigs representing approximately 18,282 unique genes and tens of thousands of potential molecular markers. As an illustration of their utility, we then used these genomic resources to assess multilocus genetic divergence within both an ecotype pair and a species pair of Timema stick insects. The results suggest variable levels of genetic divergence and gene flow among taxon pairs and genes, and illustrate a first step towards future genomic work in Timema.

Relevance: 20.00%

Publisher:

Abstract:

Phylogenetic reconstructions are a major component of many studies in evolutionary biology, but their accuracy can be reduced under certain conditions. Recent studies showed that the convergent evolution of some phenotypes resulted from recurrent amino acid substitutions in genes belonging to distant lineages. It has been suggested that these convergent substitutions could bias phylogenetic reconstruction toward grouping convergent phenotypes together, but such an effect has never been appropriately tested. We used computer simulations to determine the effect of convergent substitutions on the accuracy of phylogenetic inference. We show that, in some realistic conditions, even a relatively small proportion of convergent codons can strongly bias phylogenetic reconstruction, especially when amino acid sequences are used as characters. The strength of this bias does not depend on the reconstruction method but varies as a function of how much divergence had occurred among the lineages prior to any episodes of convergent substitutions. While the occurrence of this bias is difficult to predict, the risk of spurious groupings is strongly decreased by considering only 3rd codon positions, which are less subject to selection, as long as saturation problems are not present. Therefore, we recommend that, whenever possible, topologies obtained with amino acid sequences and 3rd codon positions be compared to identify potential phylogenetic biases and avoid evolutionarily misleading conclusions.

Relevance: 20.00%

Publisher:

Abstract:

Pearson correlation coefficients were applied for the objective comparison of 30 black gel pen inks analysed by laser desorption ionization mass spectrometry (LDI-MS). The mass spectra were obtained from ink lines directly on paper, in positive and negative ion modes and at several laser intensities. This methodology has the advantage of taking into account both the reproducibility of the results and the variability between spectra of different pens. A differentiation threshold could thus be selected so as to avoid the risk of false differentiation. Combining results from the positive and negative modes yielded a discriminating power of up to 85%, better than that obtained previously with optical comparison methodologies. The technique also allowed discrimination between pens of the same brand.
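
The abstract does not give the computational details, but the core comparison step can be illustrated with a minimal Python sketch: bin each spectrum on a common m/z axis, compute the Pearson correlation between the intensity vectors, and differentiate two inks only when the correlation falls below a threshold chosen so that replicate measurements of the same pen are not falsely differentiated. The synthetic spectra, the 0.95 threshold, and the function name below are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.stats import pearsonr

def differentiated(spectrum_a, spectrum_b, threshold=0.95):
    """Return True if two spectra are treated as coming from different inks.

    spectrum_a, spectrum_b: intensity vectors binned on a common m/z axis.
    threshold: hypothetical correlation cut-off; in practice it would be
    derived from replicate measurements of the same pen so that same-ink
    comparisons are not falsely differentiated.
    """
    r, _ = pearsonr(spectrum_a, spectrum_b)
    return r < threshold

# Toy example with synthetic spectra on a shared 100-bin m/z grid.
rng = np.random.default_rng(0)
pen1 = rng.random(100)
pen2 = pen1 + rng.normal(scale=0.05, size=100)   # near-replicate of pen1
pen3 = rng.random(100)                            # unrelated ink

print(differentiated(pen1, pen2))  # expected False (same ink)
print(differentiated(pen1, pen3))  # expected True  (different inks)
```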

Relevance: 20.00%

Publisher:

Abstract:

Detecting local differences between groups of connectomes is a great challenge in neuroimaging, because of the large number of tests that have to be performed and the resulting burden of multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and apply screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, filtering is performed to seek real differences at the node/connection level. The proposed strategy can be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application comparing the connectomes of preschool children and adolescents.
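
As a hedged illustration of the screening-plus-filtering idea (not the paper's exact procedure or its error-rate guarantees), the sketch below screens subnetworks by combining edge-level p-values with Fisher's method and then applies Benjamini-Hochberg filtering only to edges inside surviving subnetworks. The function name, the Fisher combination, and the t-test on edge weights are assumptions made for the example.

```python
import numpy as np
from scipy.stats import ttest_ind, combine_pvalues
from statsmodels.stats.multitest import multipletests

def screen_then_filter(group_a, group_b, subnetworks, alpha=0.05):
    """Two-stage comparison of connectomes (illustrative sketch).

    group_a, group_b: arrays of shape (n_subjects, n_edges) of edge weights.
    subnetworks: list of index arrays, each giving the edges of one subnetwork.
    Returns indices of edges flagged after screening and BH filtering.
    """
    # Stage 1: per-edge p-values, combined within each subnetwork for screening.
    edge_p = np.array([ttest_ind(group_a[:, e], group_b[:, e]).pvalue
                       for e in range(group_a.shape[1])])
    screened = []
    for edges in subnetworks:
        _, p_sub = combine_pvalues(edge_p[edges], method="fisher")
        if p_sub < alpha:
            screened.extend(edges)

    if not screened:
        return np.array([], dtype=int)

    # Stage 2: BH correction restricted to edges in surviving subnetworks.
    reject, _, _, _ = multipletests(edge_p[screened], alpha=alpha, method="fdr_bh")
    return np.array(screened)[reject]
```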

Relevance: 20.00%

Publisher:

Abstract:

Accurate determination of subpopulation sizes in bimodal populations remains problematic, yet it represents a powerful way to compare cellular heterogeneity under different environmental conditions. So far, most studies have relied on qualitative descriptions of population distribution patterns, on population-independent descriptors, or on arbitrary placement of thresholds distinguishing biological ON from OFF states. We found that all of these methods fall short of accurately describing small population sizes in bimodal populations. Here we propose a simple, statistics-based method for the analysis of small subpopulation sizes for use in the free software environment R and test this method on real as well as simulated data. Four so-called population-splitting methods were designed, with different algorithms that can estimate subpopulation sizes from bimodal populations. All four methods proved more precise than previously used methods when analyzing the sizes of the subpopulation of transfer-competent cells arising in populations of the bacterium Pseudomonas knackmussii B13. The methods' resolving powers were further explored by bootstrapping and simulations. Two of the methods were not severely limited by the proportions of subpopulations they could estimate correctly, but the other two only allowed accurate quantification when the subpopulation amounted to less than 25% of the total population. In contrast, only one method remained sufficiently accurate with subpopulations smaller than 1% of the total population. This study proposes a number of rational approximations for quantifying small subpopulations and offers an easy-to-use protocol for their implementation in the open-source statistical software environment R.
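
The four population-splitting algorithms are not specified in the abstract, and the study's implementation is in R. As a loosely analogous Python sketch of the underlying idea only, a two-component mixture model can estimate a small subpopulation fraction without placing an arbitrary ON/OFF threshold; the simulated fluorescence values and the 5% ON fraction below are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulated bimodal data: a small ON subpopulation (5%) within a large OFF one,
# e.g. log fluorescence of transfer-competent vs. non-competent cells.
rng = np.random.default_rng(1)
off = rng.normal(loc=2.0, scale=0.4, size=9500)
on = rng.normal(loc=5.0, scale=0.5, size=500)
data = np.concatenate([off, on]).reshape(-1, 1)

# Two-component Gaussian mixture; the fitted weights estimate the
# subpopulation proportions without an arbitrary ON/OFF threshold.
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
on_component = int(np.argmax(gmm.means_.ravel()))
print(f"Estimated ON fraction: {gmm.weights_[on_component]:.3f}")  # ~0.05
```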

Relevance: 20.00%

Publisher:

Abstract:

In recent years there has been explosive growth in the development of adaptive and data-driven methods. One efficient data-driven approach is based on statistical learning theory (SLT; Vapnik 1998). The theory is built on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we try not only to reduce the training error, i.e. to fit the available data with a model, but also to reduce the complexity of the model and thereby the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is Support Vector Machines (SVM). At present SLT is still under intensive development and SVM are finding new areas of application (www.kernel-machines.org). SVM develop robust, nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are particularly effective when the input space is high-dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only the support vectors to derive decision boundaries, which opens the way to sampling optimisation, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in Kanevski and Maignan (2004).
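
A minimal sketch of the sparsity property mentioned above, using scikit-learn's epsilon-SVR on synthetic data (the RBF kernel and the values of C and epsilon are illustrative choices, not recommendations from the text):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem with noise.
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 10, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

# Epsilon-SVR with an RBF kernel: C and epsilon trade data fit against
# model complexity, in the spirit of structural risk minimisation.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)

# Only the support vectors enter the decision function.
print(f"{len(model.support_)} support vectors out of {len(X)} training points")
```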

Relevance: 20.00%

Publisher:

Abstract:

The aim of this research was to evaluate how fingerprint analysts would incorporate information from newly developed tools into their decision-making processes. Specifically, we assessed the effects of using the following: (1) a quality tool to aid in the assessment of the clarity of the friction ridge details, (2) a statistical tool to provide likelihood ratios representing the strength of the corresponding features between compared fingerprints, and (3) consensus information from a group of trained fingerprint experts. The measured variables for the effect on examiner performance were the accuracy and reproducibility of the conclusions against the ground truth (including the impact on error rates) and the analysts' accuracy and variation in feature selection and comparison. The results showed that participants using the consensus information from other fingerprint experts demonstrated more consistency and accuracy in minutiae selection. They also demonstrated higher accuracy, sensitivity, and specificity in the decisions reported. The quality tool also affected minutiae selection (which, in turn, had limited influence on the reported decisions); the statistical tool did not appear to influence the reported decisions.

Relevance: 20.00%

Publisher:

Abstract:

Analysis of variance is commonly used in morphometry in order to ascertain differences in parameters between several populations. Failure to detect significant differences between populations (type II error) may be due to suboptimal sampling and can lead to erroneous conclusions; the concept of statistical power allows one to avoid such failures by means of adequate sampling. Several examples are given from the morphometry of the nervous system, showing the use of the power of a hierarchical analysis of variance test for the choice of appropriate sample and subsample sizes. In the first case chosen, neuronal densities in the human visual cortex, we find the number of observations to be of little effect. For dendritic spine densities in the visual cortex of mice and humans, the effect is somewhat larger. A substantial effect is shown in our last example, dendritic segmental lengths in the monkey lateral geniculate nucleus. It is in the nature of the hierarchical model that sample size is always more important than subsample size. The relative weight to be attributed to subsample size thus depends on the magnitude of the between-observations variance relative to the between-individuals variance.
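
A hedged simulation sketch of the sample-versus-subsample trade-off described above: power is estimated for a balanced two-group nested design by averaging observations within individuals and comparing group means, which matches the nested ANOVA test of the group effect for balanced data. The effect size and variance components below are hypothetical, not values from the morphometric examples.

```python
import numpy as np
from scipy.stats import ttest_ind

def power_nested(n_individuals, n_obs, delta=1.0, sd_between=1.0,
                 sd_within=1.0, n_sim=2000, alpha=0.05, seed=0):
    """Simulated power for a balanced two-group nested design.

    Each group has n_individuals subjects with n_obs observations per subject;
    observations are averaged within subjects and group means are compared
    with a t-test.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        mean_a = rng.normal(0.0, sd_between, n_individuals)
        mean_b = rng.normal(delta, sd_between, n_individuals)
        obs_a = mean_a[:, None] + rng.normal(0, sd_within, (n_individuals, n_obs))
        obs_b = mean_b[:, None] + rng.normal(0, sd_within, (n_individuals, n_obs))
        p = ttest_ind(obs_a.mean(axis=1), obs_b.mean(axis=1)).pvalue
        hits += p < alpha
    return hits / n_sim

# Adding subjects helps much more than adding observations per subject
# when between-individual variance dominates.
print(power_nested(n_individuals=5, n_obs=10))
print(power_nested(n_individuals=10, n_obs=10))
print(power_nested(n_individuals=5, n_obs=50))
```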

Relevance: 20.00%

Publisher:

Abstract:

BACKGROUND: As part of EUROCAT's surveillance of congenital anomalies in Europe, a statistical monitoring system has been developed to detect recent clusters or long-term (10-year) time trends. The purpose of this article is to describe the system for the identification and investigation of 10-year time trends, conceived as a "screening" tool ultimately leading to the identification of trends which may be due to changing teratogenic factors.

METHODS: The EUROCAT database consists of all cases of congenital anomalies including livebirths, fetal deaths from 20 weeks gestational age, and terminations of pregnancy for fetal anomaly. Monitoring of 10-year trends is performed for each registry for each of 96 non-independent EUROCAT congenital anomaly subgroups, while a Pan-Europe analysis combines data from all registries. The monitoring results are reviewed, prioritized according to a prioritization strategy, and communicated to registries for investigation. Twenty-one registries covering over 4 million births from 1999 to 2008 were included in monitoring in 2010.

CONCLUSIONS: Significant increasing trends were detected for abdominal wall anomalies, gastroschisis, hypospadias, Trisomy 18 and renal dysplasia in the Pan-Europe analysis, while 68 increasing trends were identified in individual registries. A decreasing trend was detected in over one-third of anomaly subgroups in the Pan-Europe analysis and in 16.9% of individual registry tests. Registry preliminary investigations indicated that many trends are due to changes in data quality, ascertainment, screening, or diagnostic methods. Some trends are inevitably chance phenomena related to multiple testing, while others seem to represent real and continuing change needing further investigation and response by regional/national public health authorities.
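
The abstract does not state which trend statistic EUROCAT uses, so the following is only an illustrative sketch of a 10-year trend test for a single anomaly subgroup in a single registry: a Poisson regression of yearly case counts on year, with total births as an offset. The counts and denominators below are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative 10-year trend test for one anomaly subgroup in one registry.
years = np.arange(1999, 2009)
births = np.full(10, 40_000)                                  # hypothetical denominators
cases = np.array([18, 20, 19, 23, 25, 24, 28, 27, 31, 33])   # hypothetical counts

# Poisson regression of counts on year, offset by log births (i.e. a test
# for a multiplicative trend in prevalence over the 10-year window).
X = sm.add_constant(years - years.min())
model = sm.GLM(cases, X, family=sm.families.Poisson(),
               offset=np.log(births)).fit()
print(f"Yearly change: {np.exp(model.params[1]):.3f}x, p = {model.pvalues[1]:.4f}")
```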

Relevance: 20.00%

Publisher:

Abstract:

In the past 20 years, the theory of robust estimation has become an important topic in mathematical statistics. We discuss here some basic concepts of this theory with the help of simple examples. Furthermore, we describe a subroutine library for the application of robust statistical procedures, which was developed with the support of the Swiss National Science Foundation.
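
As one simple example in the spirit of those alluded to above, a basic robust location estimate can be contrasted with the sample mean. The sketch below implements a plain Huber M-estimator with a MAD-based scale; the tuning constant 1.345 and the toy data are illustrative, and this is not the subroutine library described in the text.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-6, max_iter=100):
    """Huber M-estimate of location (scale fixed at the MAD), a basic
    robust alternative to the sample mean."""
    x = np.asarray(x, dtype=float)
    scale = 1.4826 * np.median(np.abs(x - np.median(x)))  # MAD-based scale
    mu = np.median(x)
    for _ in range(max_iter):
        absr = np.abs(x - mu) / scale
        w = np.where(absr <= c, 1.0, c / np.maximum(absr, 1e-12))  # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = np.array([9.8, 10.1, 9.9, 10.2, 10.0, 55.0])  # one gross outlier
print(np.mean(data))          # pulled towards the outlier (~17.5)
print(huber_location(data))   # stays near 10
```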

Relevance: 20.00%

Publisher:

Abstract:

Uncertainty quantification of petroleum reservoir models is one of the present challenges, usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. This paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and to describe the stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and to learn dependencies from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study; the resulting probabilistic production forecasts are described by uncertainty envelopes.
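
As a hedged simplification of how uncertainty envelopes can be obtained from multiple fitted models (not the paper's Bayesian semi-supervised SVR), the sketch below fits an ensemble of standard SVR models on bootstrap resamples of the conditioning data and summarises the spread of their predictions; all data and parameter values are synthetic.

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative ensemble: fit SVR models on bootstrap resamples and summarise
# the spread of predictions as an uncertainty envelope.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)

preds = []
for _ in range(50):
    idx = rng.integers(0, len(X), len(X))                     # bootstrap resample
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X[idx], y[idx])
    preds.append(model.predict(X_grid))

preds = np.array(preds)
lower, upper = np.percentile(preds, [5, 95], axis=0)          # 90% envelope
print(lower[:3], upper[:3])
```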