989 results for Statistical variance
Abstract:
NAPLAN results have gained socio-political prominence and have been used as indicators of educational outcomes for all students, including Indigenous students. Despite the promise of open and in-depth access to NAPLAN data as a vehicle for intervention, we argue that the use of NAPLAN data as a basis for teachers and schools to reduce variance in learning outcomes is insufficient. NAPLAN tests are designed to show statistical variance at the level of the school and the individual, yet do not factor in the sociocultural and cognitive conditions Indigenous students experience when taking the tests. We contend that further understanding of these influences may help teachers understand how to develop their classroom practices to secure better numeracy and literacy outcomes for all students. Empirical research findings demonstrate how teachers can develop their classroom practices from an understanding of the extraneous cognitive load imposed by test taking. We have analysed Indigenous students' experience of solving mathematical test problems to discover evidence of extraneous cognitive load. We have also explored conditions that are more supportive of learning, derived from a classroom intervention that provides an alternative way to both assess and build learning for Indigenous students. We conclude that conditions to support assessment for more equitable learning outcomes require a reduction in cognitive load for Indigenous students while maintaining a high level of expectation and participation in problem solving.
Abstract:
Three fertilizer types (NPK, superphosphate and cow dung) were applied at two levels (low, 0.3 kg/25 m²/2 weeks, and high, 0.7 kg/25 m²/2 weeks) to 12 ponds, with two ponds serving as controls. Each pond had an area of 25 m². Application of fertilizers and monitoring of plankton productivity and water quality parameters continued fortnightly for 52 days. The results were subjected to statistical variance analysis. The abundance of phytoplankton was in the order Chlorophyceae > Bacillariophyceae > Cyanophyceae > Desmideaceae, while that of zooplankton followed the order Crustacean > Rotifer > Protozoan. Primary productivity varied between treatments, with the lowest value of 5592 mg/O₂/m³/day obtained in the control and the low cow dung application rate (1.5 kg/25 m²/2 weeks). The highest primary productivity was obtained at M₂ (0.7 kg/25 m²/2 weeks, NPK), with 7200 mg/O₂/m³/day, closely followed by M₄ (0.7 kg/25 m²/2 weeks, superphosphate) with 6792 mg/O₂/m³/day.
Abstract:
Doctoral thesis, Psychology (Clinical Psychology), Universidade de Lisboa, Faculdade de Psicologia, 2015
Abstract:
Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually converted to categorical types first and then classified using information gain. Information gain is a popular and useful measure that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. The new algorithm has been shown, using the ANOVA test, to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmark data sets. The specific advantages of the proposed algorithm are that it avoids the computational overhead of computing information gain for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends two closely related fields, statistics and data mining.
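The abstract does not give the algorithm itself, so the following is only a minimal sketch of one possible reading: each numeric attribute is split at its mean, and the attribute whose split leaves the smallest weighted variance of the (numerically coded) class labels is chosen. The function names `split_score` and `best_attribute`, and the 0/1 label coding, are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pvariance


def split_score(values, labels):
    """Split one numeric attribute at its mean.

    Returns (threshold, weighted population variance of the labels
    in the two resulting groups). Lower scores mean purer groups.
    """
    threshold = mean(values)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        # A split that puts every row on one side is useless.
        return threshold, float("inf")
    n = len(labels)
    score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return threshold, score


def best_attribute(rows, labels):
    """Pick the attribute index whose mean split has the lowest score.

    rows: list of equal-length numeric tuples; labels: numeric class codes.
    Returns (attribute index, threshold, score).
    """
    best = None
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        threshold, score = split_score(column, labels)
        if best is None or score < best[2]:
            best = (j, threshold, score)
    return best


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (8.0, 4.0), (9.0, 2.0)]
labels = [0, 0, 1, 1]
index, threshold, score = best_attribute(rows, labels)
```

No discretisation of the numeric attributes and no entropy computation is needed in this sketch, which is the kind of saving the abstract describes.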
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Multi-site time series studies of air pollution and mortality and morbidity have figured prominently in the literature as comprehensive approaches for estimating acute effects of air pollution on health. Hierarchical models are generally used to combine site-specific information and estimate pooled air pollution effects, taking into account both within-site statistical uncertainty and across-site heterogeneity. Within a site, characteristics of time series data of air pollution and health (small pollution effects, missing data, highly correlated predictors, nonlinear confounding, etc.) make modelling all sources of uncertainty challenging. One potential consequence is underestimation of the statistical variance of the site-specific effects to be combined. In this paper we investigate the impact of variance underestimation on the pooled relative rate estimate. We focus on two-stage normal-normal hierarchical models and on underestimation of the statistical variance at the first stage. By mathematical considerations and simulation studies, we found that variance underestimation does not affect the pooled estimate substantially. However, some sensitivity of the pooled estimate to variance underestimation is observed when the number of sites is small and underestimation is severe. These simulation results are applicable to any two-stage normal-normal hierarchical model for combining site-specific results, and they can be easily extended to more general hierarchical formulations. We also examined the impact of variance underestimation on the national average relative rate estimate from the National Morbidity Mortality Air Pollution Study, and we found that variance underestimation of as much as 40% has little effect on the national average.
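To make the pooling step concrete, here is a minimal sketch of the standard inverse-variance weighted estimate in a two-stage normal-normal model, with the between-site variance `tau2` treated as known. That simplification, the function name `pooled_estimate`, and the numbers used are assumptions for illustration; the abstract does not specify the authors' estimation procedure.

```python
def pooled_estimate(estimates, variances, tau2):
    """Inverse-variance weighted pooled effect.

    estimates: site-specific effect estimates (first stage)
    variances: their first-stage statistical variances
    tau2:      between-site variance, assumed known here
    Returns (pooled estimate, variance of the pooled estimate).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mu = sum(w * b for w, b in zip(weights, estimates)) / total
    return mu, 1.0 / total


# Hypothetical site-level inputs.
estimates = [0.5, 0.3, 0.8]
variances = [0.04, 0.09, 0.01]

# Mimic a uniform 40% underestimation of the first-stage variances.
shrunk = [0.6 * v for v in variances]

mu_full, var_full = pooled_estimate(estimates, variances, tau2=0.0)
mu_shrunk, var_shrunk = pooled_estimate(estimates, shrunk, tau2=0.0)
```

With `tau2 = 0`, scaling every first-stage variance by the same factor rescales all weights equally, so the pooled estimate is unchanged and only its reported variance shrinks. With `tau2 > 0` or non-uniform underestimation the relative weights do shift, which is the situation the simulation studies in the abstract address.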
Abstract:
Doctoral thesis, Psychology (Clinical Psychology), Universidade de Lisboa, Faculdade de Psicologia, 2016
Abstract:
Home literacy environment explains between 12 and 18.5% of the variance of children's language skills. Although most authors agree that children whose parents encourage them to read tend to develop better and earlier reading skills, some authors consider that the impact of family environment on reading skills is overvalued. Other variables of the parent–child relationship, such as parenting styles, might be relevant in this field. Nevertheless, no previous studies on the effect of parenting styles on literacy have been found. This study analyzes the role of parenting styles in the reading processes of children. Children's perceptions of parenting styles contribute significantly to the explanation of statistical variance of children's reading processes. A total of 110 children (67 boys and 43 girls), aged between 7 and 11 years (M = 9.22 and SD = 1.14), from Portuguese schools answered a socio-demographic questionnaire. To assess reading processes, the Portuguese adaptation (Figueira et al. in press) of the Bateria de Avaliação dos Processos Leitores-Revista (PROLEC-R) was administered. To assess parenting styles, Egna Minnen av Barndoms Uppfostran-parents (EMBU-P) and EMBU-C (children version) were administered. According to multiple hierarchical linear regressions, individual factors contribute to explaining all reading tests of PROLEC-R, while family factors contribute to explaining most of these tests. Regarding parenting styles, the results show explanatory power for grammatical structures, sentence comprehension and listening. Parenting styles have an important role in the explanation of higher reading processes (syntactic and semantic) but not of lexical processes, which are the focus of the main theories concerning dyslexia.
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights into modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to correct the bias in genetic variance component estimation and to gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole-genome models were shown to perform well in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of the mean, which validated the idea of variance-controlling genes. The work in the thesis is accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Abstract:
Understanding the complexities that are involved in the genetics of multifactorial diseases is still a monumental task. In addition to environmental factors that can influence the risk of disease, there are also a number of other complicating factors. Genetic variants associated with age of disease onset may be different from those variants associated with overall risk of disease, and variants may be located in positions that are not consistent with the traditional protein coding genetic paradigm. Latent Variable Models are well suited for the analysis of genetic data. A latent variable is one that we do not directly observe, but which is believed to exist or is included for computational or analytic convenience in a model. This thesis presents a mixture of methodological developments utilising latent variables, and results from case studies in genetic epidemiology and comparative genomics. Epidemiological studies have identified a number of environmental risk factors for appendicitis, but the disease aetiology of this oft thought useless vestige remains largely a mystery. The effects of smoking on other gastrointestinal disorders are well documented, and in light of this, the thesis investigates the association between smoking and appendicitis through the use of latent variables. By utilising data from a large Australian twin study questionnaire as both cohort and case-control, evidence is found for the association between tobacco smoking and appendicitis. Twin and family studies have also found evidence for the role of heredity in the risk of appendicitis. Results from previous studies are extended here to estimate the heritability of age-at-onset and account for the effect of smoking. This thesis presents a novel approach for performing a genome-wide variance components linkage analysis on transformed residuals from a Cox regression.
This method finds evidence for a different subset of genes responsible for variation in age at onset than those associated with overall risk of appendicitis. Motivated by increasing evidence of functional activity in regions of the genome once thought of as evolutionary graveyards, this thesis develops a generalisation to the Bayesian multiple changepoint model on aligned DNA sequences for more than two species. This sensitive technique is applied to evaluating the distributions of evolutionary rates, with the finding that they are much more complex than previously apparent. We show strong evidence for at least 9 well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least 7 classes in an alignment of four mammals, including human. A pattern of enrichment and depletion of genic regions in the profiled segments suggests they are functionally significant, and most likely consist of various functional classes. Furthermore, a method of incorporating alignment characteristics representative of function such as GC content and type of mutation into the segmentation model is developed within this thesis. Evidence of fine-structured segmental variation is presented.
Abstract:
Most statistical methods use hypothesis testing. Analysis of variance, regression, discrete choice models, contingency tables, and other analysis methods commonly used in transportation research share hypothesis testing as the means of making inferences about the population of interest. Although hypothesis testing has been a cornerstone of empirical research for many years, various aspects of hypothesis tests are commonly applied incorrectly, misinterpreted, or ignored, by novices and expert researchers alike. At first glance, hypothesis testing appears straightforward: develop the null and alternative hypotheses, compute the test statistic to compare to a standard distribution, estimate the probability of rejecting the null hypothesis, and then make claims about the importance of the finding. This is an oversimplification of the process of hypothesis testing. Hypothesis testing as applied in empirical research is examined here. The reader is assumed to have a basic knowledge of the role of hypothesis testing in various statistical methods. Through the use of an example, the mechanics of hypothesis testing are first reviewed. Then, five precautions surrounding the use and interpretation of hypothesis tests are developed; examples of each are provided to demonstrate how errors are made, and solutions are identified so that similar errors can be avoided. Remedies are provided for common errors, and conclusions are drawn on how to use the results of this paper to improve the conduct of empirical research in transportation.
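The mechanical steps the abstract lists can be sketched with a two-sided one-sample z-test. This is not the paper's own example; the known standard deviation `sigma`, the function name `z_test`, and the sample values are assumptions chosen to keep the sketch short.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test.

    H0: the population mean equals mu0; sigma is assumed known.
    Returns (z statistic, p-value).
    """
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # p-value: probability, under H0, of a statistic at least as extreme as z.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: four observations with mean 12, tested against mu0 = 10.
z, p = z_test([11.0, 13.0, 12.0, 12.0], mu0=10.0, sigma=2.0)
```

The p-value here is computed under the assumption that the null hypothesis is true; it says nothing by itself about the size or practical importance of the effect, which is one of the interpretation issues the abstract raises.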