201 resultados para Statistical variance
em Queensland University of Technology - ePrints Archive
Resumo:
NAPLAN RESULTS HAVE gained socio-political prominence and have been used as indicators of educational outcomes for all students, including Indigenous students. Despite the promise of open and in-depth access to NAPLAN data as a vehicle for intervention, we argue that the use of NAPLAN data as a basis for teachers and schools to reduce variance in learning outcomes is insufficient. NAPLAN tests are designed to show statistical variance at the level of the school and the individual, yet do not factor in the sociocultural and cognitive conditions Indigenous students’ experience when taking the tests. We contend that further understanding of these influences may help teachers understand how to develop their classroom practices to secure better numeracy and literacy outcomes for all students. Empirical research findings demonstrate how teachers can develop their classroom practices from an understanding of the extraneous cognitive load imposed by test taking. We have analysed Indigenous students’ experience of solving mathematical test problems to discover evidence of extraneous cognitive load. We have also explored conditions that are more supportive of learning derived from a classroom intervention which provides an alternative way to both assess and build learning for Indigenous students. We conclude that conditions to support assessment for more equitable learning outcomes require a reduction in cognitive load for Indigenous students while maintaining a high level of expectation and participation in problem solving.
Resumo:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there is also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the eect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression. This method finds evidence for a dierent subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
Resumo:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Despite the fact that hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests commonly are incorrectly applied, misinterpreted, and ignored—by novices and expert researchers alike. On initial glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing is first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
Resumo:
The success rate of carrier phase ambiguity resolution (AR) is the probability that the ambiguities are successfully fixed to their correct integer values. In existing works, an exact success rate formula for integer bootstrapping estimator has been used as a sharp lower bound for the integer least squares (ILS) success rate. Rigorous computation of success rate for the more general ILS solutions has been considered difficult, because of complexity of the ILS ambiguity pull-in region and computational load of the integration of the multivariate probability density function. Contributions of this work are twofold. First, the pull-in region mathematically expressed as the vertices of a polyhedron is represented by a multi-dimensional grid, at which the cumulative probability can be integrated with the multivariate normal cumulative density function (mvncdf) available in Matlab. The bivariate case is studied where the pull-region is usually defined as a hexagon and the probability is easily obtained using mvncdf at all the grid points within the convex polygon. Second, the paper compares the computed integer rounding and integer bootstrapping success rates, lower and upper bounds of the ILS success rates to the actual ILS AR success rates obtained from a 24 h GPS data set for a 21 km baseline. The results demonstrate that the upper bound probability of the ILS AR probability given in the existing literatures agrees with the actual ILS success rate well, although the success rate computed with integer bootstrapping method is a quite sharp approximation to the actual ILS success rate. The results also show that variations or uncertainty of the unit–weight variance estimates from epoch to epoch will affect the computed success rates from different methods significantly, thus deserving more attentions in order to obtain useful success probability predictions.
Resumo:
Analytical expressions are derived for the mean and variance, of estimates of the bispectrum of a real-time series assuming a cosinusoidal model. The effects of spectral leakage, inherent in discrete Fourier transform operation when the modes present in the signal have a nonintegral number of wavelengths in the record, are included in the analysis. A single phase-coupled triad of modes can cause the bispectrum to have a nonzero mean value over the entire region of computation owing to leakage. The variance of bispectral estimates in the presence of leakage has contributions from individual modes and from triads of phase-coupled modes. Time-domain windowing reduces the leakage. The theoretical expressions for the mean and variance of bispectral estimates are derived in terms of a function dependent on an arbitrary symmetric time-domain window applied to the record. the number of data, and the statistics of the phase coupling among triads of modes. The theoretical results are verified by numerical simulations for simple test cases and applied to laboratory data to examine phase coupling in a hypothesis testing framework
Resumo:
Multivariate volatility forecasts are an important input in many financial applications, in particular portfolio optimisation problems. Given the number of models available and the range of loss functions to discriminate between them, it is obvious that selecting the optimal forecasting model is challenging. The aim of this thesis is to thoroughly investigate how effective many commonly used statistical (MSE and QLIKE) and economic (portfolio variance and portfolio utility) loss functions are at discriminating between competing multivariate volatility forecasts. An analytical investigation of the loss functions is performed to determine whether they identify the correct forecast as the best forecast. This is followed by an extensive simulation study examines the ability of the loss functions to consistently rank forecasts, and their statistical power within tests of predictive ability. For the tests of predictive ability, the model confidence set (MCS) approach of Hansen, Lunde and Nason (2003, 2011) is employed. As well, an empirical study investigates whether simulation findings hold in a realistic setting. In light of these earlier studies, a major empirical study seeks to identify the set of superior multivariate volatility forecasting models from 43 models that use either daily squared returns or realised volatility to generate forecasts. This study also assesses how the choice of volatility proxy affects the ability of the statistical loss functions to discriminate between forecasts. Analysis of the loss functions shows that QLIKE, MSE and portfolio variance can discriminate between multivariate volatility forecasts, while portfolio utility cannot. An examination of the effective loss functions shows that they all can identify the correct forecast at a point in time, however, their ability to discriminate between competing forecasts does vary. That is, QLIKE is identified as the most effective loss function, followed by portfolio variance which is then followed by MSE. The major empirical analysis reports that the optimal set of multivariate volatility forecasting models includes forecasts generated from daily squared returns and realised volatility. Furthermore, it finds that the volatility proxy affects the statistical loss functions’ ability to discriminate between forecasts in tests of predictive ability. These findings deepen our understanding of how to choose between competing multivariate volatility forecasts.
Resumo:
The cotton strip assay (CSA) is an established technique for measuring soil microbial activity. The technique involves burying cotton strips and measuring their tensile strength after a certain time. This gives a measure of the rotting rate, R, of the cotton strips. R is then a measure of soil microbial activity. This paper examines properties of the technique and indicates how the assay can be optimised. Humidity conditioning of the cotton strips before measuring their tensile strength reduced the within and between day variance and enabled the distribution of the tensile strength measurements to approximate normality. The test data came from a three-way factorial experiment (two soils, two temperatures, three moisture levels). The cotton strips were buried in the soil for intervals of time ranging up to 6 weeks. This enabled the rate of loss of cotton tensile strength with time to be studied under a range of conditions. An inverse cubic model accounted for greater than 90% of the total variation within each treatment combination. This offers support for summarising the decomposition process by a single parameter R. The approximate variance of the decomposition rate was estimated from a function incorporating the variance of tensile strength and the differential of the function for the rate of decomposition, R, with respect to tensile strength. This variance function has a minimum when the measured strength is approximately 2/3 that of the original strength. The estimates of R are almost unbiased and relatively robust against the cotton strips being left in the soil for more or less than the optimal time. We conclude that the rotting rate X should be measured using the inverse cubic equation, and that the cotton strips should be left in the soil until their strength has been reduced to about 2/3.
Resumo:
Bounds on the expectation and variance of errors at the output of a multilayer feedforward neural network with perturbed weights and inputs are derived. It is assumed that errors in weights and inputs to the network are statistically independent and small. The bounds obtained are applicable to both digital and analogue network implementations and are shown to be of practical value.