897 results for SAMPLE SIZE


Relevance: 100.00%

Abstract:

Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
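The contrast drawn above between resampling estimators can be sketched in code. This is a minimal illustration only: the synthetic two-class data and the nearest-centroid classifier are hypothetical stand-ins for the MAQC-II expression data and models, comparing a 10-fold cross-validation error estimate with an out-of-bag bootstrap estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class "expression" data: 60 samples x 20 genes,
# with a mean shift between classes (illustrative only).
n, p = 60, 20
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p)) + 0.8 * y[:, None]

def nearest_centroid_error(X_tr, y_tr, X_te, y_te):
    """Train a nearest-centroid classifier and return its test error rate."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return np.mean(pred != y_te)

# 10-fold cross-validation estimate of the error rate.
folds = np.array_split(rng.permutation(n), 10)
cv_err = np.mean([
    nearest_centroid_error(np.delete(X, f, 0), np.delete(y, f), X[f], y[f])
    for f in folds
])

# Bootstrap estimate: train on a resample drawn with replacement,
# test on the samples left out of that resample ("out-of-bag").
boot_errs = []
for _ in range(50):
    idx = rng.integers(0, n, n)
    oob = np.setdiff1d(np.arange(n), idx)
    boot_errs.append(nearest_centroid_error(X[idx], y[idx], X[oob], y[oob]))
boot_err = np.mean(boot_errs)

print(f"CV error: {cv_err:.3f}, bootstrap (OOB) error: {boot_err:.3f}")
```

Both estimators reuse the same training set; which one tracks independent-validation accuracy more closely is exactly the empirical question the study addresses.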

Abstract:

This paper analyzes whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size. In the latter case, the singularity of the sample covariance matrix makes likelihood ratio tests degenerate, but other tests based on quadratic forms of sample covariance matrix eigenvalues remain well-defined. We study the consistency property and limiting distribution of these tests as dimensionality and sample size go to infinity together, with their ratio converging to a finite non-zero limit. We find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix. For the latter test, we develop a new correction to the existing test statistic that makes it robust against high dimensionality.
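As an illustration of a sphericity test that remains well-defined when dimensionality exceeds sample size, the sketch below computes John's statistic U = (1/p) tr[(S/((1/p) tr S) − I)²], a quadratic form in the sample covariance eigenvalues; the data and the dimensions (p > n) are hypothetical, chosen only to show that the statistic separates a spherical from a non-spherical covariance.

```python
import numpy as np

def john_sphericity_U(X):
    """John's sphericity statistic U = (1/p) tr[(S/((1/p) tr S) - I)^2],
    where S is the sample covariance matrix.  Small U supports sphericity
    (covariance proportional to the identity)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    S_scaled = S / (np.trace(S) / p)
    D = S_scaled - np.eye(p)
    return np.trace(D @ D) / p

rng = np.random.default_rng(1)
n, p = 100, 150                               # dimension larger than sample size

X_sph = rng.normal(size=(n, p))               # H0: Sigma = I (spherical)
X_het = X_sph * np.linspace(0.2, 2.0, p)      # heteroscedastic columns

u_sph = john_sphericity_U(X_sph)
u_het = john_sphericity_U(X_het)
print(f"U (spherical): {u_sph:.3f}, U (non-spherical): {u_het:.3f}")
```

Note that even under the null, U does not shrink to zero when p/n stays bounded away from zero; accounting for that inflation is what the high-dimensional asymptotics in the paper provide.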

Abstract:

Analysis of variance is commonly used in morphometry to ascertain differences in parameters between several populations. Failure to detect significant differences between populations (type II error) may be due to suboptimal sampling and lead to erroneous conclusions; the concept of statistical power allows one to avoid such failures by means of adequate sampling. Several examples from the morphometry of the nervous system show the use of the power of a hierarchical analysis of variance test for the choice of appropriate sample and subsample sizes. In the first case, neuronal densities in the human visual cortex, we find the number of observations to have little effect. For dendritic spine densities in the visual cortex of mice and humans, the effect is somewhat larger. A substantial effect is shown in our last example, dendritic segmental lengths in the monkey lateral geniculate nucleus. It is in the nature of the hierarchical model that sample size is always more important than subsample size. The relative weight to be attributed to subsample size thus depends on the magnitude of the between-observations variance relative to the between-individuals variance.

Abstract:

We have devised a program that allows computation of the power of the F-test, and hence determination of appropriate sample and subsample sizes, in the context of the one-way hierarchical analysis of variance with fixed effects. The power at a fixed alternative is an increasing function of both the sample size and the subsample size. The program makes it easy to obtain the power of the F-test for a range of sample and subsample sizes, and therefore to choose appropriate sizes based on a desired power. The program can be used for the 'ordinary' case of one-way analysis of variance, as well as for hierarchical analysis of variance with two stages of sampling. Examples are given of the practical use of the program.
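The core computation such a program performs can be sketched for the 'ordinary' one-way case: power is the upper-tail probability of a noncentral F distribution beyond the critical value of the central F. The Cohen's-f parameterization of the effect size below is an assumption chosen for illustration, not the program's own interface.

```python
import numpy as np
from scipy import stats

def anova_power(k, n, effect_f, alpha=0.05):
    """Power of the one-way fixed-effects ANOVA F-test.

    k        : number of groups
    n        : observations per group (the sample size)
    effect_f : Cohen's f, sqrt(sum_i (mu_i - mu)^2 / k) / sigma
    """
    df1, df2 = k - 1, k * (n - 1)
    nc = effect_f**2 * k * n              # noncentrality parameter lambda = f^2 * N
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, nc)

# Power increases with sample size, as stated above.
for n in (5, 10, 20, 40):
    print(n, round(anova_power(k=3, n=n, effect_f=0.25), 3))
```

Scanning n (and, in the hierarchical case, the subsample size as well) until the desired power is reached is exactly the sample-size determination the abstract describes.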

Abstract:

Evaluation of root traits may be facilitated if they are assessed on samples of the root system. The objective of this work was to determine the sample size of the root system needed to estimate root traits of common bean (Phaseolus vulgaris L.) cultivars by digital image analysis. One plant was grown per pot and harvested at pod setting, with 64 and 16 pots corresponding to two and four cultivars in the first and second experiments, respectively. Root samples were scanned successively until the entire root system had been processed, and root area and length were estimated. Scanning a root sample took 21 minutes, whereas scanning the entire root system took 4 hours and 53 minutes. In the first experiment, root area and length estimated with two samples showed correlations of 0.977 and 0.860, respectively, with the same traits measured on the entire root. In the second experiment, the correlations were 0.889 and 0.915. The increase in correlation with more than two samples was negligible. The two samples corresponded to 13.4% and 16.9% of total root mass (excluding taproot and nodules) in the first and second experiments. The taproot accounts for a high proportion of root mass and must be excluded when estimating root traits. Samples with nearly 15% of total root mass produce reliable root trait estimates.

Abstract:

The objective of this study was to determine the minimum number of plants per plot that must be sampled in experiments with sugarcane (Saccharum officinarum) full-sib families in order to provide an effective estimation of genetic and phenotypic parameters of yield-related traits. The data were collected in a randomized complete block design with 18 sugarcane full-sib families and 6 replicates, with 20 plants per plot. The sample size was determined using resampling techniques with replacement, followed by estimation of genetic and phenotypic parameters. Sample-size estimates varied according to the evaluated parameter and trait. The resampling method permits an efficient comparison of sample-size effects on the estimation of genetic and phenotypic parameters. A sample of 16 plants per plot, or 96 individuals per family, was sufficient to obtain good estimates for all the characters evaluated. However, if sample sizes could be set separately by trait, ten plants per plot would give efficient estimates for most of the characters evaluated, with Brix as the exception.
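The resampling-with-replacement idea can be sketched as follows. The trait values and their distribution are simulated, not taken from the sugarcane data; the point is only that the spread of the resampled estimate shrinks as more plants per plot are drawn, which is how a minimum adequate sample size is read off.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical trait values for one plot of 20 plants (e.g., stalk mass, kg).
plot = rng.normal(loc=1.5, scale=0.3, size=20)

def resample_estimate(data, k, n_boot=1000):
    """Mean and spread of a parameter estimate (here the sample mean)
    when only k plants per plot are sampled, via resampling with
    replacement from the observed plot."""
    est = [rng.choice(data, size=k, replace=True).mean() for _ in range(n_boot)]
    return np.mean(est), np.std(est)

for k in (4, 10, 16, 20):
    m, s = resample_estimate(plot, k)
    print(f"k = {k:2d} plants: mean = {m:.3f}, sd = {s:.3f}")
```

In the study the resampled quantities are genetic and phenotypic parameters estimated across families and replicates rather than a single plot mean, but the mechanism is the same.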

Abstract:

A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample sizes (n < 30); this should encourage highly conservative use of predictions based on small samples and restrict them to exploratory modelling.
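The AUC used above to score predictions can be computed directly from its rank (Mann-Whitney) formulation: the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen absence site. The labels and scores below are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the fraction of presence/absence pairs ranked correctly by the
    score, with ties counted as one half."""
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical habitat-suitability scores at presence (1) / absence (0) sites.
y = np.array([1, 1, 1, 0, 0, 0, 0])
s = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])
print(auc(y, s))   # one misranked pair out of 12, i.e. 11/12
```

Because AUC depends only on ranks, it lets models with very different score scales (distance metrics, entropy-based suitabilities, regression outputs) be compared on a common footing.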

Abstract:

The objectives of this study were to evaluate baby corn yield, green corn yield, and grain yield in corn cultivar BM 3061, with weed control achieved via a combination of hoeing and intercropping with gliricidia, and determine how sample size influences weed growth evaluation accuracy. A randomized block design with ten replicates was used. The cultivar was submitted to the following treatments: A = hoeings at 20 and 40 days after corn sowing (DACS), B = hoeing at 20 DACS + gliricidia sowing after hoeing, C = gliricidia sowing together with corn sowing + hoeing at 40 DACS, D = gliricidia sowing together with corn sowing, and E = no hoeing. Gliricidia was sown at a density of 30 viable seeds m-2. After harvesting the mature ears, the area of each plot was divided into eight sampling units measuring 1.2 m² each to evaluate weed growth (above-ground dry biomass). Treatment A provided the highest baby corn, green corn, and grain yields. Treatment B did not differ from treatment A with respect to the yield values for the three products, and was equivalent to treatment C for green corn yield, but was superior to C with regard to baby corn weight and grain yield. Treatments D and E provided similar yields and were inferior to the other treatments. Therefore, treatment B is a promising one. The relation between coefficient of experimental variation (CV) and sample size (S) to evaluate growth of the above-ground part of the weeds was given by the equation CV = 37.57 S^(-0.15), i.e., CV decreased as S increased. The optimal sample size indicated by this equation was 4.3 m².
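The fitted power law can be evaluated directly to see how the coefficient of experimental variation falls as the sampled area grows; the sample areas below are illustrative (multiples of the 1.2 m² sampling unit, plus the reported optimum).

```python
def cv_weeds(S, a=37.57, b=-0.15):
    """Coefficient of experimental variation (%) as a function of
    sampled area S (m^2), from the fitted relation CV = 37.57 * S**-0.15."""
    return a * S**b

for S in (1.2, 2.4, 4.3, 9.6):
    print(f"S = {S:4.1f} m^2 -> CV = {cv_weeds(S):.1f}%")
```

The curve flattens quickly, which is why enlarging the sample beyond the reported 4.3 m² buys little further reduction in CV.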

Abstract:

Abstract taken from the publication.

Abstract:

We consider the comparison of two formulations in terms of average bioequivalence using the 2 × 2 cross-over design. In a bioequivalence study, the primary outcome is a pharmacokinetic measure, such as the area under the plasma concentration by time curve, which is usually assumed to have a lognormal distribution. The criterion typically used for claiming bioequivalence is that the 90% confidence interval for the ratio of the means should lie within the interval (0.80, 1.25), or equivalently the 90% confidence interval for the differences in the means on the natural log scale should be within the interval (-0.2231, 0.2231). We compare the gold standard method for calculation of the sample size based on the non-central t distribution with those based on the central t and normal distributions. In practice, the differences between the various approaches are likely to be small. Further approximations to the power function are sometimes used to simplify the calculations. These approximations should be used with caution, because the sample size required for a desirable level of power might be under- or overestimated compared to the gold standard method. However, in some situations the approximate methods produce very similar sample sizes to the gold standard method. Copyright © 2005 John Wiley & Sons, Ltd.
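A sketch of a noncentral-t power calculation for the two one-sided tests (TOST) procedure in a 2 × 2 cross-over, used here to search for the smallest sample size reaching 80% power. The standard-error formula se = σ_w·√(2/N) for a balanced design with N subjects in total, the CV-to-log-scale conversion σ_w = √(ln(1 + CV²)), and the CV value are standard crossover assumptions chosen for illustration, not figures from the paper.

```python
import numpy as np
from scipy import stats

def tost_power(N, cv, delta=0.0, alpha=0.05,
               theta_l=np.log(0.80), theta_u=np.log(1.25)):
    """Approximate power of the TOST procedure for average bioequivalence
    in a balanced 2x2 cross-over with N subjects in total.

    cv    : within-subject coefficient of variation of the lognormal PK metric
    delta : true difference of the means on the natural-log scale
    """
    sigma_w = np.sqrt(np.log(1 + cv**2))   # within-subject SD on the log scale
    se = sigma_w * np.sqrt(2.0 / N)
    df = N - 2
    t_crit = stats.t.ppf(1 - alpha, df)
    ncp_l = (delta - theta_l) / se         # noncentrality vs lower limit
    ncp_u = (delta - theta_u) / se         # noncentrality vs upper limit
    power = (stats.nct.cdf(-t_crit, df, ncp_u)
             - stats.nct.cdf(t_crit, df, ncp_l))
    return max(power, 0.0)

# Smallest even N giving at least 80% power at CV = 25%, true ratio 1.
N = 4
while tost_power(N, cv=0.25) < 0.80:
    N += 2
print(N)
```

Replacing the noncentral t with normal quantiles gives the simpler approximate formulas the abstract warns about: close for moderate N, but liable to under- or overestimate the required sample size at small N.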

Abstract:

This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations is consistent with a given success rate p0. Next, the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
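The binary-stream case has a simple frequentist analogue, which is useful for intuition since the Bayesian formulation reduces to it under 'non-informative' priors: choose the smallest n such that an exact one-sided binomial test of H0: p = p0 controls both error rates. The rates p0 = 0.20 and p1 = 0.35 below are hypothetical, and this sketch is an illustration of the idea, not the paper's Bayesian procedure.

```python
from scipy import stats

def binomial_sample_size(p0, p1, alpha=0.05, beta=0.10, n_max=2000):
    """Smallest n such that the exact one-sided binomial test of
    H0: p = p0 (reject when X >= c successes) has size <= alpha and
    power >= 1 - beta at the alternative p = p1 > p0."""
    for n in range(1, n_max + 1):
        # Smallest critical value c with P(X >= c | n, p0) <= alpha.
        c = stats.binom.ppf(1 - alpha, n, p0) + 1
        size = 1 - stats.binom.cdf(c - 1, n, p0)
        power = 1 - stats.binom.cdf(c - 1, n, p1)
        if size <= alpha and power >= 1 - beta:
            return n, int(c)
    raise ValueError("no n <= n_max satisfies both error criteria")

n, c = binomial_sample_size(p0=0.20, p1=0.35)
print(f"n = {n} observations, reject H0 when successes >= {c}")
```

The Bayesian version replaces the fixed error rates with posterior probabilities of the two conclusions, but the structure of the search for n is the same.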