996 results for Statistical variance
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually converted to categorical types first and then classified using information gain. Information gain is a popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark data sets, the new algorithm is shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids the time-consuming conversion of huge numeric data sets to categorical data. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends two closely related fields, statistics and data mining.
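The abstract does not spell out the exact splitting rule, so the following is only a minimal sketch under a stated assumption: at each node, the attribute with the largest variance is chosen and the rows are split at that attribute's mean. The function names `choose_split` and `partition` and the sample rows are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Return (attribute_index, threshold) for one tree node.

    Assumption (not stated in the abstract): pick the attribute with the
    largest population variance and split at that attribute's mean.
    """
    n_attrs = len(rows[0])
    # Transpose the rows into one list of values per attribute.
    columns = [[row[j] for row in rows] for j in range(n_attrs)]
    variances = [pvariance(col) for col in columns]
    best = max(range(n_attrs), key=lambda j: variances[j])
    return best, mean(columns[best])


def partition(rows, attr, threshold):
    """Split rows into those at or below the threshold and those above it."""
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return left, right


# Made-up numeric rows with two attributes each.
rows = [
    (1.0, 10.0),
    (2.0, 10.5),
    (8.0, 9.5),
    (9.0, 10.0),
]
attr, threshold = choose_split(rows)
left, right = partition(rows, attr, threshold)
print(attr, threshold, left, right)
```

No entropy or class-frequency tables are computed here, which is the kind of saving the abstract describes; whether the published algorithm uses this exact criterion is not confirmed by the text.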
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Multi-site time series studies of air pollution and mortality and morbidity have figured prominently in the literature as comprehensive approaches for estimating acute effects of air pollution on health. Hierarchical models are generally used to combine site-specific information and estimate pooled air pollution effects, taking into account both within-site statistical uncertainty and across-site heterogeneity. Within a site, characteristics of time series data of air pollution and health (small pollution effects, missing data, highly correlated predictors, nonlinear confounding, etc.) make modelling all sources of uncertainty challenging. One potential consequence is underestimation of the statistical variance of the site-specific effects to be combined. In this paper we investigate the impact of variance underestimation on the pooled relative rate estimate. We focus on two-stage normal-normal hierarchical models and on underestimation of the statistical variance at the first stage. By mathematical considerations and simulation studies, we found that variance underestimation does not affect the pooled estimate substantially. However, some sensitivity of the pooled estimate to variance underestimation is observed when the number of sites is small and underestimation is severe. These simulation results are applicable to any two-stage normal-normal hierarchical model for combining information of site-specific results, and they can be easily extended to more general hierarchical formulations. We also examined the impact of variance underestimation on the national average relative rate estimate from the National Morbidity Mortality Air Pollution Study and we found that variance underestimation of as much as 40% has little effect on the national average.
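To make the two-stage normal-normal setup concrete, here is a minimal sketch that assumes the between-site variance is known, so the pooled estimate is a precision-weighted mean with weights 1 / (v_i + tau2). In practice the between-site variance is estimated, and the paper's own procedure may differ. The function name `pooled_estimate` and all numbers are made up for illustration.

```python
def pooled_estimate(betas, variances, tau2):
    """Precision-weighted pooled mean for a two-stage normal-normal model.

    betas:     site-specific effect estimates (first stage)
    variances: their first-stage statistical variances v_i
    tau2:      between-site variance, assumed known here (a simplification)
    """
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * b for w, b in zip(weights, betas)) / sum(weights)


# Made-up site-specific estimates and variances, not data from the study.
betas = [0.2, 0.5, 0.8, 0.3]
variances = [0.04, 0.09, 0.16, 0.05]
tau2 = 0.02

# Underestimate every first-stage variance by 40%.
shrunk = [0.6 * v for v in variances]

full = pooled_estimate(betas, variances, tau2)
under = pooled_estimate(betas, shrunk, tau2)
print(full, under)
```

In this sketch, when `tau2` is zero a uniform scaling of the variances cancels out of the weights and leaves the pooled mean unchanged; with a positive `tau2` the weights shift only modestly.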
Abstract:
Doctoral thesis, Psychology (Clinical Psychology), Universidade de Lisboa, Faculdade de Psicologia, 2016
Abstract:
Home literacy environment explains between 12 and 18.5% of the variance of children’s language skills. Although most authors agree that children whose parents encourage them to read tend to develop better and earlier reading skills, some authors consider that the impact of family environment on reading skills is overestimated. Other variables of the parent–child relationship, such as parenting styles, may also be relevant to this field. Nevertheless, no previous studies on the effect of parenting styles on literacy were found. The aim of this study is to analyze the role of parenting styles in the reading processes of children. Children’s perceptions of parenting styles contribute significantly to the explanation of the statistical variance of children’s reading processes. A total of 110 children (67 boys and 43 girls), aged between 7 and 11 years (M = 9.22, SD = 1.14), from Portuguese schools completed a socio-demographic questionnaire. Reading processes were assessed with the Portuguese adaptation (Figueira et al., in press) of the Bateria de Avaliação dos Processos Leitores-Revista (PROLEC-R). Parenting styles were assessed with the Egna Minnen av Barndoms Uppfostran-parents (EMBU-P) and the EMBU-C (children's version). According to multiple hierarchical linear regressions, individual factors contribute to explaining all reading tests of the PROLEC-R, while family factors contribute to explaining most of these tests. Regarding parenting styles, the results show explanatory power for grammatical structures, sentence comprehension and listening. Parenting styles play an important role in the explanation of higher reading processes (syntactic and semantic) but not in lexical processes, which are the focus of the main theories concerning dyslexia.
Abstract:
Analysis of variance is commonly used in morphometry in order to ascertain differences in parameters between several populations. Failure to detect significant differences between populations (type II error) may be due to suboptimal sampling and lead to erroneous conclusions; the concept of statistical power allows one to avoid such failures by means of adequate sampling. Several examples are given in the morphometry of the nervous system, showing the use of the power of a hierarchical analysis of variance test for the choice of appropriate sample and subsample sizes. In the first case chosen, neuronal densities in the human visual cortex, we find the number of observations to be of little effect. For dendritic spine densities in the visual cortex of mice and humans, the effect is somewhat larger. A substantial effect is shown in our last example, dendritic segmental lengths in monkey lateral geniculate nucleus. It is in the nature of the hierarchical model that sample size is always more important than subsample size. The relative weight to be attributed to subsample size thus depends on the relative magnitude of the between-observations variance compared to the between-individuals variance.
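The claim that sample size outweighs subsample size can be illustrated with the standard variance of a group mean in a balanced two-level nested design: the between-individuals variance divided by the number of individuals, plus the between-observations variance divided by the total number of observations. The sketch below assumes such a balanced design; the function name and the variance values are hypothetical and are not taken from the paper.

```python
def variance_of_group_mean(var_between, var_within, n_individuals, n_obs):
    """Variance of a group mean in a balanced two-level nested design.

    var_between:   between-individuals variance
    var_within:    between-observations (within-individual) variance
    n_individuals: sample size (individuals per group)
    n_obs:         subsample size (observations per individual)
    """
    return var_between / n_individuals + var_within / (n_individuals * n_obs)


# Made-up variance components for illustration only.
base = variance_of_group_mean(4.0, 1.0, n_individuals=5, n_obs=10)
more_observations = variance_of_group_mean(4.0, 1.0, n_individuals=5, n_obs=20)
more_individuals = variance_of_group_mean(4.0, 1.0, n_individuals=10, n_obs=10)
print(base, more_observations, more_individuals)
```

Adding observations per individual shrinks only the second term, so the variance can never fall below `var_between / n_individuals`; adding individuals shrinks both terms. How much the subsample size matters therefore depends on the size of `var_within` relative to `var_between`.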
Abstract:
In this research, the effectiveness of Naive Bayes and Gaussian Mixture Model classifiers in segmenting exudates in retinal images is studied, and the results are evaluated with metrics commonly used in medical imaging. A color variation analysis of retinal images is also carried out to determine how effectively retinal images can be segmented using only the color information of the pixels.
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Abstract:
Because many kinds of services use the Internet and data network infrastructures, present-day networks are characterized by diverse types of traffic with statistical properties such as complex temporal correlation and non-Gaussian distribution. The complex temporal correlation of networks may be characterized by Short Range Dependence (SRD) and Long Range Dependence (LRD). Models such as fGN (Fractional Gaussian Noise) may capture the LRD but not the SRD. This work presents two methods for traffic generation that synthesize approximate realizations of the self-similar fGN with SRD random process. The first employs the IDWT (Inverse Discrete Wavelet Transform) and the second the IDWPT (Inverse Discrete Wavelet Packet Transform). The variance map concept was developed, which allows the LRD and SRD behaviors to be associated directly with the wavelet transform coefficients. The developed methods are extremely flexible and allow the generation of Gaussian time series with complex statistical behaviors.
Abstract:
Objective: The aim of this article is to propose an integrated framework for extracting and describing patterns of disorders from medical images using a combination of linear discriminant analysis and active contour models. Methods: A multivariate statistical methodology was first used to identify the most discriminating hyperplane separating two groups of images (from healthy controls and patients with schizophrenia) contained in the input data. After this, the present work makes explicit the differences found by the multivariate statistical method by subtracting the discriminant models of controls and patients, weighted by the pooled variance between the two groups. A variational level-set technique was used to segment clusters of these differences. We obtain a label of each anatomical change using the Talairach atlas. Results: In this work all the data were analysed simultaneously rather than assuming a priori regions of interest. As a consequence, by using active contour models, we were able to obtain regions of interest that were emergent from the data. The results were evaluated using, as gold standard, well-known facts about the neuroanatomical changes related to schizophrenia. Most of the items in the gold standard were covered in our result set. Conclusions: We argue that such investigation provides a suitable framework for characterising the high complexity of magnetic resonance images in schizophrenia, as the results obtained indicate a high sensitivity rate with respect to the gold standard. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
The effect of number of samples and selection of data for analysis on the calculation of surface motor unit potential (SMUP) size in the statistical method of motor unit number estimates (MUNE) was determined in 10 normal subjects and 10 with amyotrophic lateral sclerosis (ALS). We recorded 500 sequential compound muscle action potentials (CMAPs) at three different stable stimulus intensities (10–50% of maximal CMAP). Estimated mean SMUP sizes were calculated using Poisson statistical assumptions from the variance of 500 sequential CMAPs obtained at each stimulus intensity. The results with the 500 data points were compared with smaller subsets from the same data set. The results using a range of 50–80% of the 500 data points were compared with the full 500. The effect of restricting analysis to data between 5 and 20% of the CMAP and to standard deviation limits was also assessed. No differences in mean SMUP size were found with stimulus intensity or use of different ranges of data. Consistency was improved with a greater sample number. Data within 5% of CMAP size gave both increased consistency and reduced mean SMUP size in many subjects, but excluded valid responses present at that stimulus intensity. These changes were more prominent in ALS patients, in whom the presence of isolated SMUP responses was a striking difference from normal subjects. Noise, spurious data, and large SMUP limited the Poisson assumptions. When these factors are considered, consistent statistical MUNE can be calculated from a continuous sequence of data points. A 2 to 2.5 SD or 10% window are reasonable methods of limiting data for analysis. Muscle Nerve 27: 320–331, 2003
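The link between variance and SMUP size under Poisson assumptions can be sketched with a deliberately simplified model, which is an assumption of this example and not the study's protocol: each stimulus activates a Poisson-distributed number of motor units of identical size, with no noise and no baseline offset. The response is then the unit size times a Poisson count, so its variance divided by its mean equals the unit size. All names and parameter values below are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def estimate_smup_size(responses):
    """Variance-to-mean ratio of the responses.

    Under the simplified model above (identical units, Poisson counts,
    no noise, zero baseline) this ratio equals the single-unit size.
    """
    return pvariance(responses) / mean(responses)


rng = random.Random(0)   # fixed seed so the run is repeatable
true_size = 0.1          # hypothetical SMUP size (arbitrary units)
mean_units = 5.0         # hypothetical mean number of units per stimulus

# 500 sequential simulated responses, mirroring the 500 CMAPs per intensity.
responses = [true_size * poisson_sample(rng, mean_units) for _ in range(500)]
estimate = estimate_smup_size(responses)
print(estimate)
```

With 500 samples the ratio should land near `true_size`, and smaller subsets of `responses` give noisier estimates, which is consistent with the reported gain in consistency from larger sample numbers. Real recordings add noise, unequal unit sizes and windowing choices that this sketch ignores.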
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the degree of Master in Environmental Engineering