993 results for Statistical variance


Relevance: 30.00%

Abstract:

A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.
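
As a minimal illustration of the classical non-parametric group of methods surveyed here, the sketch below runs a Brown-Forsythe (median-centred Levene) test for variance heterogeneity at each locus. The data, function name, and effect size are hypothetical, and the surveyed methods differ in how they handle covariates, random effects, and mean-variance relationships.

```python
# Sketch of a classical non-parametric vQTL scan: a Brown-Forsythe
# (median-centred Levene) test for unequal phenotypic variance across
# genotype classes at each locus. Illustrative only.
import numpy as np
from scipy import stats

def brown_forsythe_vqtl(phenotype, genotypes):
    """Test each locus for variance heterogeneity across genotype classes.

    phenotype: (n,) array of trait values
    genotypes: (n, m) array of genotype codes (e.g. 0/1/2) for m loci
    Returns one p-value per locus.
    """
    pvals = []
    for locus in genotypes.T:
        groups = [phenotype[locus == g] for g in np.unique(locus)]
        # scipy's levene with center='median' is the Brown-Forsythe test
        _, p = stats.levene(*groups, center='median')
        pvals.append(p)
    return np.array(pvals)

rng = np.random.default_rng(0)
n, m = 500, 10
geno = rng.integers(0, 3, size=(n, m))
# locus 0 controls the variance: higher allele dosage, noisier phenotype
pheno = rng.normal(0.0, 1.0 + 0.5 * geno[:, 0])
print(brown_forsythe_vqtl(pheno, geno))  # small p-value expected at locus 0
```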

Relevance: 30.00%

Abstract:

One of the fundamental machine learning tasks is that of predictive classification. Given that organisations collect an ever-increasing amount of data, predictive classification methods must be able to handle large amounts of data effectively and efficiently. However, present requirements push existing algorithms to, and sometimes beyond, their limits, since many classification prediction algorithms were designed when currently common data set sizes were beyond imagination. This has led to a significant amount of research into ways of making classification learning algorithms more effective and efficient. Although substantial progress has been made, a number of key questions have not been answered. This dissertation investigates two of these key questions. The first is whether different types of algorithms from those currently employed are required when using large data sets. This is answered by analysing how the bias plus variance decomposition of predictive classification error changes as training set size is increased. Experiments find that larger training sets require different types of algorithms from those currently used. Some insight into the characteristics of suitable algorithms is provided, which may give some direction for the development of future classification prediction algorithms specifically designed for use with large data sets.

The second question investigated is the role of sampling in machine learning with large data sets. Sampling has long been used to avoid the need to scale up algorithms to suit the size of the data set, by scaling down the size of the data set to suit the algorithm. However, the costs of performing sampling have not been widely explored. Two popular sampling methods are compared with learning from all available data in terms of predictive accuracy, model complexity, and execution time. The comparison shows that sub-sampling generally produces models with accuracy close to, and sometimes greater than, that obtainable from learning with all available data. This result suggests that it may be possible to develop algorithms that take advantage of the sub-sampling methodology to reduce the time required to infer a model while sacrificing little if any accuracy. Methods of improving effective and efficient learning via sampling are also investigated, and new sampling methodologies are proposed. These methodologies include using a varying proportion of instances to determine the next inference step, and using a statistical calculation at each inference step to determine a sufficient sample size. Experiments show that using a statistical calculation of sample size can substantially reduce execution time with only a small loss, and occasional gain, in accuracy.

One common use of sampling is in the construction of learning curves. Learning curves are often used to attempt to determine the optimal training set size that maximally reduces execution time while not being detrimental to accuracy. An analysis of the performance of methods for detecting convergence of learning curves is performed, focusing on methods that calculate the gradient of the tangent to the curve. Given that such methods can be susceptible to local accuracy plateaus, an investigation into the frequency of local plateaus is also performed. It is shown that local accuracy plateaus are a common occurrence, and that ensuring a small loss of accuracy often incurs greater computational cost than learning from all available data. These results cast doubt on the applicability of gradient-of-tangent methods for detecting convergence, and on the viability of learning curves for reducing execution time in general.
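
The gradient-of-tangent convergence detection analysed above can be sketched as follows. The window size, tolerance, and curve values are illustrative assumptions; the example shows how a local accuracy plateau can trigger a premature convergence signal.

```python
# Sketch of gradient-of-tangent convergence detection on a learning
# curve, the method whose susceptibility to local accuracy plateaus is
# analysed above. Threshold and window are illustrative assumptions.
import numpy as np

def converged(sample_sizes, accuracies, window=3, tol=1e-5):
    """Declare convergence when the slope of the learning curve,
    estimated by a least-squares fit over the last `window` points,
    falls below `tol` accuracy units per training instance."""
    if len(accuracies) < window:
        return False
    x = np.asarray(sample_sizes[-window:], dtype=float)
    y = np.asarray(accuracies[-window:], dtype=float)
    slope = np.polyfit(x, y, 1)[0]   # gradient of the fitted tangent
    return abs(slope) < tol

# A curve with a local plateau between 2,000 and 4,000 instances: the
# detector fires at 4,000 even though accuracy later improves again.
sizes = [500, 1000, 2000, 3000, 4000, 8000, 16000]
accs  = [0.70, 0.78, 0.820, 0.8201, 0.8202, 0.86, 0.88]
for i in range(3, len(sizes) + 1):
    print(sizes[i - 1], converged(sizes[:i], accs[:i]))
```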

Relevance: 30.00%

Abstract:

A statistically optimized technique for the rapid development of reliable prediction intervals (PIs) is presented in this study. The mean-variance estimation (MVE) technique is employed here for quantification of the uncertainties associated with wind power predictions. In this method, two separate neural network models are used to estimate wind power generation and its variance. A novel PI-based training algorithm is also presented to enhance the performance of the MVE method and improve the quality of the PIs. For an in-depth analysis, comprehensive experiments are conducted with seasonal datasets taken from three geographically dispersed wind farms in Australia. PIs are constructed at five confidence levels between 50% and 90%. The results show that while both the traditional and the optimized PIs are theoretically valid, the optimized PIs are much more informative than the traditional MVE PIs. The informativeness of these PIs paves the way for their application in the trouble-free operation and smooth integration of wind farms into energy systems. © 2014 Elsevier Ltd. All rights reserved.
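
A minimal sketch of the MVE idea, not the paper's actual networks or training algorithm: one regressor estimates the expected power, a second estimates the error variance from squared residuals, and a Gaussian assumption turns the two into a prediction interval. The model settings and data below are placeholders.

```python
# Minimal sketch of mean-variance estimation (MVE): one model predicts
# the expected response, a second predicts the variance of the errors,
# and a prediction interval follows under a Gaussian assumption.
import numpy as np
from scipy.stats import norm
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(2000, 1))                       # e.g. wind speed
y = np.sin(X[:, 0]) + rng.normal(0, 0.1 + 0.05 * X[:, 0])    # heteroscedastic

mean_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                        random_state=0).fit(X, y)
resid2 = (y - mean_net.predict(X)) ** 2
var_net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                       random_state=0).fit(X, resid2)

def prediction_interval(x, confidence=0.9):
    """Symmetric MVE prediction interval at the given confidence level."""
    mu = mean_net.predict(x)
    sigma = np.sqrt(np.clip(var_net.predict(x), 1e-12, None))
    z = norm.ppf(0.5 + confidence / 2.0)
    return mu - z * sigma, mu + z * sigma

lo, hi = prediction_interval(np.array([[2.0], [8.0]]), confidence=0.9)
print(lo, hi)   # wider interval where the noise is larger (x = 8)
```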

Relevance: 30.00%

Abstract:

Objectives: To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Design: Statistical review. Data sources: Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Review methods: Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. Results: The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR] = .24-.71), .98 (IQR = .85-1.00), and 1.00 (IQR = 1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR = .26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. Conclusion: The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results of inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of experiment-wise Type I error inflation in their studies. © 2013.
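
The two headline quantities reviewed here can be reproduced in a few lines. The per-group sample size is an illustrative assumption, and the number of tests per paper is a rough average implied by the figures above (10,337 statistics over 333 papers), not a value taken from the review itself.

```python
# Back-of-envelope versions of two quantities reviewed above: power of a
# two-sample t test at Cohen's small/medium/large effect sizes, and the
# experiment-wise Type I error rate 1 - (1 - alpha)^k for k independent
# tests at alpha = .05.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    power = analysis.solve_power(effect_size=d, nobs1=64, alpha=0.05)
    print(f"power to detect a {label} effect (n = 64/group): {power:.2f}")

alpha, k = 0.05, 31   # 10,337 statistics / 333 papers is roughly 31 per paper
print(f"experiment-wise Type I error over {k} tests: {1 - (1 - alpha)**k:.2f}")
```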

Relevance: 30.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 30.00%

Abstract:

Traditionally, an X̄ chart is used to control the process mean and an R chart is used to control the process variance. However, these charts are not sensitive to small changes in the process parameters. The adaptive X̄ and R charts might be considered if the aim is to detect small disturbances. Owing to the statistical character of the joint X̄ and R charts, with fixed or adaptive parameters, they are not reliable in identifying the nature of the disturbance: whether it shifts the process mean, increases the process variance, or leads to a combination of both effects. In practice, the speed with which the control charts detect process changes may be more important than their ability to identify the nature of the change. Under these circumstances, it seems advantageous to consider a single chart, based on only one statistic, to simultaneously monitor the process mean and variance. In this paper, we propose the adaptive non-central chi-square statistic chart. This new chart is more effective than the adaptive X̄ and R charts in detecting disturbances that shift the process mean, increase the process variance, or lead to a combination of both effects. Copyright © 2006 John Wiley & Sons, Ltd.
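
A sketch of the single-statistic idea, under illustrative parameters rather than the paper's adaptive design: each sample is reduced to one sum of squares that reacts both to mean shifts and to variance increases. The offset d and the control limit below are assumptions for the demonstration.

```python
# Sketch of a non-central chi-square statistic chart: each sample of
# size n is reduced to T = sum((x_i - mu0 + d)^2) / sigma0^2, which is
# sensitive to shifts in the mean and increases in the variance.
import numpy as np
from scipy.stats import ncx2

mu0, sigma0, n = 0.0, 1.0, 5
d = 1.0 * sigma0                      # offset that makes mean shifts visible

def t_statistic(sample):
    return np.sum((sample - mu0 + d) ** 2) / sigma0**2

# In control, T follows a non-central chi-square with n degrees of
# freedom and non-centrality n*(d/sigma0)^2, giving a probability limit.
limit = ncx2.ppf(0.995, df=n, nc=n * (d / sigma0) ** 2)

rng = np.random.default_rng(2)
cases = [("in control",  rng.normal(mu0, sigma0, size=n)),
         ("mean shift",  rng.normal(mu0 + 1.5 * sigma0, sigma0, size=n)),
         ("variance up", rng.normal(mu0, 2.0 * sigma0, size=n))]
for name, s in cases:
    T = t_statistic(s)
    print(f"{name}: T = {T:.1f}, signal = {T > limit}")
```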

Relevance: 30.00%

Abstract:

Variance dispersion graphs have become a popular tool in aiding the choice of a response surface design. Often differences in response from some particular point, such as the expected position of the optimum or standard operating conditions, are more important than the response itself. We describe two examples from food technology. In the first, an experiment was conducted to find the levels of three factors which optimized the yield of valuable products enzymatically synthesized from sugars and to discover how the yield changed as the levels of the factors were changed from the optimum. In the second example, an experiment was conducted on a mixing process for pastry dough to discover how three factors affected a number of properties of the pastry, with a view to using these factors to control the process. We introduce the difference variance dispersion graph (DVDG) to help in the choice of a design in these circumstances. The DVDG for blocked designs is developed and the examples are used to show how the DVDG can be used in practice. In both examples a design was chosen by using the DVDG, as well as other properties, and the experiments were conducted and produced results that were useful to the experimenters. In both cases the conclusions were drawn partly by comparing responses at different points on the response surface.
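
A variance dispersion graph summarises prediction variance over spheres of growing radius; the DVDG does the same for the variance of a difference in predictions from a reference point. The sketch below computes that quantity for a hypothetical two-factor design and full quadratic model, which are stand-ins for the designs actually compared in the examples.

```python
# Sketch of the quantity a difference variance dispersion graph (DVDG)
# plots: Var(yhat(x) - yhat(x0)) / sigma^2, where x0 is a reference
# point such as the expected optimum, summarised over spheres of
# radius r. Design and model here are illustrative.
import numpy as np

def model_terms(x):
    """Full second-order model in two factors."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

# 3^2 factorial (factorial, axial and centre points on the unit cube)
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                   [-1, 0], [1, 0], [0, -1], [0, 1], [0, 0]])
X = np.vstack([model_terms(x) for x in design])
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.zeros(2)                       # reference point: design centre
def diff_variance(x):
    d = model_terms(x) - model_terms(x0)
    return d @ XtX_inv @ d             # Var(yhat(x) - yhat(x0)) / sigma^2

for r in (0.25, 0.5, 0.75, 1.0):
    angles = np.linspace(0, 2 * np.pi, 200, endpoint=False)
    v = [diff_variance(r * np.array([np.cos(a), np.sin(a)])) for a in angles]
    print(f"r = {r:4.2f}: min {min(v):.3f}  mean {np.mean(v):.3f}  max {max(v):.3f}")
```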

Relevance: 30.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 30.00%

Abstract:

The present paper deals with estimation of variance components, prediction of breeding values and selection in a population of rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Müell.-Arg.] from Rio Branco, State of Acre, Brazil. The REML/BLUP (restricted maximum likelihood/best linear unbiased prediction) procedure was applied. For this purpose, 37 rubber tree families were obtained and assessed in a randomized complete block design, with three unbalanced replications. The field trial was carried out at the Experimental Station of UNESP, located in Selvíria, State of Mato Grosso do Sul, Brazil. The quantitative traits evaluated were: girth (G), bark thickness (BT), number of latex vessel rings (NR), and plant height (PH). Given the unbalanced condition of the progeny test, the REML/BLUP procedure was used for estimation. The narrow-sense individual heritability estimates were 0.43 for G, 0.18 for BT, 0.01 for NR, and 0.51 for PH. Two selection strategies were adopted: one short-term (ST - selection intensity of 8.85%) and the other long-term (LT - selection intensity of 26.56%). For G, the estimated genetic gains in relation to the population average were 26.80% and 17.94%, respectively, according to the ST and LT strategies. The effective population sizes were 22.35 and 46.03, respectively. The LT and ST strategies maintained 45.80% and 28.24%, respectively, of the original genetic diversity represented in the progeny test. So, it can be inferred that this population has potential for both breeding and ex situ genetic conservation as a supplier of genetic material for advanced rubber tree breeding programs. Copyright by the Brazilian Society of Genetics.
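
The reported gains come from BLUP-predicted breeding values; as a rough textbook approximation only, the breeder's equation links the two selection intensities and the girth heritability reported above to an expected response. The phenotypic standard deviation below is a placeholder, so the output is in units of sigma_P rather than the paper's percentages.

```python
# Textbook approximation (not the paper's REML/BLUP machinery): the
# breeder's equation R = i * h^2 * sigma_P, using the girth heritability
# and the two selection intensities reported above.
from scipy.stats import norm

def selection_intensity(p):
    """Standardised intensity of truncation selection at proportion p:
    i = phi(z) / p, where z is the standard normal truncation point."""
    z = norm.ppf(1 - p)
    return norm.pdf(z) / p

h2 = 0.43                      # narrow-sense heritability of girth (G)
sigma_p = 1.0                  # phenotypic SD, placeholder units
for label, p in [("short-term", 0.0885), ("long-term", 0.2656)]:
    i = selection_intensity(p)
    print(f"{label}: i = {i:.2f}, expected response = {i * h2 * sigma_p:.2f} sigma_P")
```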

Relevance: 30.00%

Abstract:

Purpose - The aim of this paper is to present a synthetic chart based on the non-central chi-square statistic that is operationally simpler and more effective than the joint X̄ and R charts in detecting assignable cause(s). This chart also assists in identifying which parameter (the mean or the variance) changed owing to the occurrence of the assignable causes. Design/methodology/approach - The approach used is based on the non-central chi-square statistic, and the steady-state average run length (ARL) of the developed chart is evaluated using a Markov chain model. Findings - The proposed chart always detects process disturbances faster than the joint X̄ and R charts, and it allows the process to be monitored with one chart rather than two separate charts. Originality/value - The most important advantage of using the proposed chart is that practitioners can monitor the process by looking at only one chart instead of two charts separately. © Emerald Group Publishing Limited.
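
The steady-state ARL evaluation via a Markov chain can be sketched generically: collect the chart's transient states into a substochastic matrix Q and solve a linear system. The 3-state transition matrix below is a made-up illustration, not the proposed chart's actual states or probabilities.

```python
# Sketch of the Markov chain ARL computation named above: with transient
# states Q and an absorbing "signal" state, the average run length from
# starting state s is ARL_s = [ (I - Q)^{-1} 1 ]_s.
import numpy as np

# Hypothetical transient states: 0 = central zone, 1 = warning zone,
# 2 = one point beyond warning (another such point would signal).
Q = np.array([
    [0.90, 0.08, 0.02],
    [0.85, 0.10, 0.05],
    [0.80, 0.10, 0.00],   # remaining probability 0.10 -> absorbing signal
])
arl = np.linalg.solve(np.eye(3) - Q, np.ones(3))
print(f"ARL from each starting state: {np.round(arl, 1)}")
```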

Relevance: 30.00%

Abstract:

Background: The purpose of this study is to analyze the stress distribution in bone tissue around implants with different angulations (0 degrees, 17 degrees, and 30 degrees) and connections (external hexagon and tapered) through the use of three-dimensional finite element and statistical analyses. Methods: Twelve different configurations of three-dimensional finite element models, combining three implant inclinations (0 degrees, 17 degrees, and 30 degrees), two connections (external hexagon and tapered), and two load applications (axial and oblique), were simulated. The maximum principal stress values for cortical bone were measured at the mesial, distal, buccal, and lingual regions around the implant for each analyzed situation, totaling 48 groups. Loads of 200 and 100 N were applied at the occlusal surface in the axial and oblique directions, respectively. Maximum principal stress values were measured at the bone crest and statistically analyzed using analysis of variance. Stress patterns in the bone tissue around the implant were analyzed qualitatively. Results: The results demonstrated that, under oblique loading, the external hexagon connection showed significantly higher stress concentrations in the bone tissue (P < 0.05) compared with the tapered connection. Moreover, the buccal and mesial regions of the cortical bone concentrated significantly higher stress (P < 0.005) for the external hexagon implant type. Under oblique loading, increased external hexagon implant angulation induced a significantly higher stress concentration (P = 0.045). Conclusions: The study results show that: 1) the oblique load was more damaging to bone tissue, mainly when associated with external hexagon implants; and 2) there was a higher stress concentration in the buccal region in comparison with all other regions under oblique load.

Relevance: 30.00%

Abstract:

The study of short implants is relevant to the biomechanics of dental implants, and research on crown height increase has implications for daily clinical practice. The aim of this study was to analyze the biomechanical interactions of a single implant-supported prosthesis with different crown heights under vertical and oblique forces, using the 3-D finite element method. Six 3-D models were designed with Invesalius 3.0, Rhinoceros 3D 4.0, and Solidworks 2010 software. Each model was constructed with a mandibular segment of bone block including an implant supporting a screwed metal-ceramic crown. The crown height was set at 10, 12.5, and 15 mm. The applied force was 200 N (axial) and 100 N (oblique). We performed ANOVA and Tukey tests; p < 0.05 was considered statistically significant. The increase in crown height did not influence the stress distribution on the prosthetic screw (p > 0.05) under axial load. However, crown heights of 12.5 and 15 mm significantly worsened the stress distribution in the screws and the cortical bone (p < 0.001) under oblique load. A high crown-to-implant (C/I) ratio impaired the microstrain distribution in bone tissue under both axial and oblique loads (p < 0.001). Increased crown height was a possible deleterious factor for the screws and for the different regions of bone tissue. © 2014 Elsevier Ltd. All rights reserved.
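
The statistical treatment described here (one-way ANOVA followed by Tukey tests at p < 0.05) looks like the following sketch. The peak-stress values are simulated placeholders, not the paper's finite element results.

```python
# Sketch of one-way ANOVA followed by Tukey's HSD across the three
# crown heights, on made-up peak-stress values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
stress = {10.0: rng.normal(40, 3, 8),     # MPa, hypothetical
          12.5: rng.normal(46, 3, 8),
          15.0: rng.normal(52, 3, 8)}

f, p = stats.f_oneway(*stress.values())
print(f"ANOVA: F = {f:.1f}, p = {p:.4f}")

tukey = stats.tukey_hsd(*stress.values())
print(tukey)   # pairwise comparisons among the three crown heights
```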

Relevance: 30.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance: 30.00%

Abstract:

The objectives of the present study were to determine whether the variance components of calving intervals varied with age at calving, and whether considering calving intervals as a longitudinal trait would be a useful approach for the fertility analysis of Zebu dairy herds. With these purposes, calving records from females born from 1940 to 2006 in a Guzerat dairy subpopulation in Brazil were analyzed. The fixed effects of contemporary groups, formed by year and farm at birth or at calving, and the regressions of age at calving, equivalent inbreeding coefficient, and day of the year on the studied traits were considered in the statistical models. In one approach, calving intervals (CI) were analyzed as a single trait, by fitting a statistical model in which both animal and permanent environment effects were adjusted for the effect of age at calving by random regression. In a second approach, a four-trait analysis was conducted, including age at first calving (AFC) and three different female categories for the calving intervals: first-calving females; young females (less than 80 months old, but not first calving); or mature females (80 months old or more). Finally, a two-trait analysis was performed, also including AFC and CI, but calving intervals were regarded as a single trait in a repeatability model. Additionally, the ranking of sires was compared among approaches. Calving intervals decreased with age until females were about 80 months old, remaining nearly constant after that age. A quasi-linear increase of 11.5 days in the calving intervals was observed for each 10% increase in the female's equivalent inbreeding coefficient. The heritability of AFC was 0.37. For CI, the genetic-phenotypic variance ratios ranged from 0.064 to 0.141, depending on the approach and on the age at calving. Differences among the genetic variance components for calving intervals were observed along the animal's lifetime. Those differences confirmed the longitudinal aspect of that trait, indicating the importance of such consideration when assessing the fertility of Zebu dairy females, especially in situations where the available information relies on their calving intervals. Spearman rank correlations among approaches ranged from 0.90 to 0.95, and the changes observed in the ranking of sires suggested that the genetic progress of the population could be affected by the approach chosen for the analysis of calving intervals. © 2012 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Concentrations of 39 organic compounds were determined in three fractions (head, heart, and tail) obtained from the pot still distillation of fermented sugarcane juice. The results were evaluated using analysis of variance (ANOVA), Tukey's test, principal component analysis (PCA), hierarchical cluster analysis (HCA), and linear discriminant analysis (LDA). According to the PCA and HCA, the experimental data lead to the formation of three clusters. The head fractions give rise to a more clearly defined group, whereas the heart and tail fractions show some overlap, consistent with their acid composition. The predictive abilities of the LDA model in calibration and validation for classifying the three fractions were 90.5% and 100%, respectively. This model recognized twelve of the thirteen commercial cachaças (92.3%) with good sensory characteristics as heart fractions, thus showing potential for guiding the cutting process.
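
A sketch of the chemometric pipeline, with simulated stand-ins for the 39 measured concentrations: PCA for the cluster structure, then a cross-validated LDA classifier. The library choices and preprocessing are assumptions; the original analysis details may differ.

```python
# Sketch of the PCA + LDA workflow described above, on simulated
# concentration data for three distillation fractions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n_per, n_compounds = 30, 39
centers = rng.normal(0, 2, size=(3, n_compounds))   # head, heart, tail
X = np.vstack([c + rng.normal(0, 1, (n_per, n_compounds)) for c in centers])
y = np.repeat(["head", "heart", "tail"], n_per)

# PCA scores on standardised data: the three fractions form clusters
scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print("PC1/PC2 class means:")
for cls in ("head", "heart", "tail"):
    print(cls, scores[y == cls].mean(axis=0).round(2))

# Cross-validated LDA classification of the three fractions
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
acc = cross_val_score(lda, X, y, cv=5)
print(f"cross-validated LDA accuracy: {acc.mean():.2f}")
```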