57 results for Statistic nonparametric
Abstract:
Background: The evaluation of the complexity of an observed object is an old but outstanding problem. In this paper we address this problem by introducing a measure called statistic complexity.
Abstract:
A nonparametric, small-sample-size test for the homogeneity of two psychometric functions against the left- and right-shift alternatives has been developed. The test is designed to determine whether it is safe to amalgamate psychometric functions obtained in different experimental sessions. The sum of the lower and upper p-values of the exact (conditional) Fisher test for several 2 × 2 contingency tables (one for each point of the psychometric function) is employed as the test statistic. The probability distribution of the statistic under the null (homogeneity) hypothesis is evaluated to obtain corresponding p-values. Power functions of the test have been computed by randomly generating samples from Weibull psychometric functions. The test is free of any assumptions about the shape of the psychometric function; it requires only that all observations are statistically independent. © 2011 Psychonomic Society, Inc.
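The test statistic described above can be illustrated with a minimal sketch. It assumes one reading of the abstract: the lower one-sided Fisher p-values are summed across the 2 × 2 tables for one shift alternative, and the upper ones for the other. The function names and the table layout (session 1 successes and failures, then session 2 successes and failures, at one stimulus level) are hypothetical, and the null distribution of the sums, which the paper evaluates, is not computed here.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Return (lower, upper) one-sided exact Fisher p-values for [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric law.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    denom = comb(total, col1)

    def pmf(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lower = sum(pmf(k) for k in range(k_min, a + 1))  # P(cell <= a)
    upper = sum(pmf(k) for k in range(a, k_max + 1))  # P(cell >= a)
    return lower, upper


def summed_p_values(tables):
    """Sum lower and upper p-values over tables (assumed form of the statistic)."""
    pvals = [fisher_one_sided(*table) for table in tables]
    return sum(p[0] for p in pvals), sum(p[1] for p in pvals)


# Hypothetical counts at three stimulus levels, two sessions each.
tables = [(3, 1, 1, 3), (5, 5, 4, 6), (9, 1, 8, 2)]
lower_sum, upper_sum = summed_p_values(tables)
```

Small sums of lower (or upper) p-values would point toward a shift in one direction; how small is small enough depends on the null distribution derived in the paper.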
Abstract:
In many applications in applied statistics researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index we find that the four variables that are used in its construction have dimension three and the index loses information.
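As an illustration of the estimator named above, the following is a minimal sketch of the Takens estimator in its standard maximum-likelihood form: for all pairwise distances r below a cutoff r0, the dimension estimate is the negative reciprocal of the mean of ln(r / r0). The cutoff value, the function name, and the synthetic data are assumptions made for this example and are not taken from the paper.

```python
import math
import random
from itertools import combinations


def takens_estimator(points, r0):
    """Estimate the correlation dimension from pairwise distances below r0."""
    logs = []
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        if 0.0 < r < r0:
            logs.append(math.log(r / r0))
    if not logs:
        raise ValueError("no pair of points is closer than r0")
    # Every logged value is negative, so the estimate is positive.
    return -len(logs) / sum(logs)


# Synthetic example: three variables that all depend on a single factor t,
# so the points lie on a line embedded in three dimensions.
rng = random.Random(0)
line_points = [(t, 2.0 * t, -t) for t in (rng.random() for _ in range(300))]
estimate = takens_estimator(line_points, r0=0.2)
```

For data of this kind the estimate should come out close to one, which signals that compressing the three variables into a single index loses little. An estimate close to the number of variables would signal the opposite.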
Abstract:
We consider the local order estimation of nonlinear autoregressive systems with exogenous inputs (NARX), which may have different local dimensions at different points. By minimizing the kernel-based local information criterion introduced in this paper, strongly consistent estimates of the local orders of the NARX system at points of interest are obtained. A modification of the criterion and a simple procedure for searching for its minimum are also discussed. The theoretical results derived here are tested on simulation examples.
Abstract:
Gross Motor Function Classification System (GMFCS) level was reported by three independent assessors in a population of children with cerebral palsy (CP) aged between 4 and 18 years (n=184; 112 males, 72 females; mean age 10y 10mo [SD 3y 7mo]). A software algorithm also provided a computed GMFCS level from a regional CP registry. Participants had clinical diagnoses of unilateral (n=94) and bilateral (n=84) spastic CP, ataxia (n=4), dyskinesia (n=1), and hypotonia (n=1), and could walk independently with or without the use of an aid (GMFCS Levels I-IV). Research physiotherapist (n=184) and parent/guardian data (n=178) were collected in a research environment. Data from the child's community physiotherapist (n=143) were obtained by postal questionnaire. Results, using the kappa statistic with linear weighting (κlw), showed good agreement between the parent/guardian and research physiotherapist (κlw=0.75) with more moderate levels of agreement between the clinical physiotherapist and researcher (κlw=0.64) and the clinical physiotherapist and parent/guardian (κlw=0.57). Agreement was consistently better for older children (>2y). This study has shown that agreement with parent report increases with therapists' experience of the GMFCS and knowledge of the child at the time of grading. Substantial agreement between a computed GMFCS and an experienced therapist (κlw=0.74) also demonstrates the potential for extrapolation of GMFCS rating from an existing CP registry, providing the latter has sufficient data on locomotor ability.
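The agreement measure used above, Cohen's kappa with linear weighting, can be sketched as follows. The sketch assumes ordinal levels coded as integers 0 to n_levels - 1 (for example GMFCS Levels I-IV coded 0-3, a coding chosen here for illustration); the function name and the sample ratings are hypothetical and do not reproduce the study data.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_levels):
    """Cohen's kappa with linear disagreement weights |i - j| / (n_levels - 1)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_levels < 2:
        raise ValueError("need at least two levels")
    n = len(ratings_a)

    # Joint proportions of (rater A level, rater B level).
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = abs(i - j) / (n_levels - 1)
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]  # expected under independence

    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical ratings of eight children by two assessors.
parent = [0, 0, 1, 1, 2, 2, 3, 3]
therapist = [0, 1, 1, 1, 2, 3, 3, 3]
kappa_lw = linear_weighted_kappa(parent, therapist, n_levels=4)
```

Linear weights penalize a two-level disagreement twice as heavily as a one-level disagreement, which suits an ordinal scale such as the GMFCS better than unweighted kappa.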
Abstract:
This brief examines the application of nonlinear statistical process control to the detection and diagnosis of faults in automotive engines. In this statistical framework, the computed score variables may have a complicated nonparametric distribution function, which hampers statistical inference, notably for fault detection and diagnosis. This brief shows that introducing the statistical local approach into nonlinear statistical process control produces statistics that follow a normal distribution, thereby enabling a simple statistical inference for fault detection. Further, for fault diagnosis, this brief introduces a compensation scheme that approximates the fault condition signature. Experimental results from a Volkswagen 1.9-L turbo-charged diesel engine are included.
Abstract:
Modeling of on-body propagation channels is of paramount importance to those wishing to evaluate radio channel performance for wearable devices in body area networks (BANs). Difficulties in modeling arise due to the highly variable channel conditions related to changes in the user's state and local environment. This study characterizes these influences by using time-series analysis to examine and model signal characteristics for on-body radio channels in user stationary and mobile scenarios in four different locations: anechoic chamber, open office area, hallway, and outdoor environment. Autocorrelation and cross-correlation functions are reported and shown to be dependent on body state and surroundings. Autoregressive (AR) transfer functions are used to perform time-series analysis and develop models for fading in various on-body links. Due to the non-Gaussian nature of the logarithmically transformed observed signal envelope in the majority of mobile user states, a simple method for reproducing the fading based on lognormal and Nakagami statistics is proposed. The validity of the AR models is evaluated using hypothesis testing, which is based on the Ljung-Box statistic, and the estimated distributional parameters of the simulator output are compared with those from experimental results.
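The Ljung-Box statistic mentioned above has the standard form Q = n(n + 2) Σ ρ̂ₖ² / (n − k), summed over lags k = 1..h, where ρ̂ₖ is the sample autocorrelation of the model residuals. The following is a minimal sketch of that formula only; the function name and the sample residuals are hypothetical, and the p-value step is left out because it needs a chi-square distribution function that the Python standard library does not provide.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic for lags 1..max_lag of a residual series."""
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must satisfy 0 < max_lag < len(residuals)")
    mean = sum(residuals) / n
    centered = [x - mean for x in residuals]
    c0 = sum(x * x for x in centered)
    if c0 == 0.0:
        raise ValueError("residuals are constant; autocorrelation is undefined")

    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum(centered[t] * centered[t - k] for t in range(k, n))
        rho_k = ck / c0  # sample autocorrelation at lag k
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total


# Hypothetical residuals from a fitted AR model.
residuals = [0.3, -0.1, 0.4, -0.5, 0.2, 0.1, -0.3, 0.0, 0.2, -0.4]
q_stat = ljung_box_q(residuals, max_lag=3)
```

When the residuals come from a fitted AR(p) model, Q is usually compared with a chi-square distribution with h − p degrees of freedom; a large Q indicates remaining serial correlation and hence an inadequate model.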
Abstract:
Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they are testing different null hypotheses.
Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both univariate and multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set for which expression changes between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), testing different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because their null hypotheses are complementary, each test highlights different aspects of the data, and for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.
Abstract:
In this paper, we extend the heterogeneous panel data stationarity test of Hadri [Econometrics Journal, Vol. 3 (2000) pp. 148–161] to the cases where breaks are taken into account. Four models with different patterns of breaks under the null hypothesis are specified. Two of the models have already been proposed by Carrion-i-Silvestre et al. [Econometrics Journal, Vol. 8 (2005) pp. 159–175]. The moments of the statistics corresponding to the four models are derived in closed form via characteristic functions. We also provide the exact moments of a modified statistic that do not asymptotically depend on the location of the break point under the null hypothesis. The cases where the break point is unknown are also considered. For the model with breaks in the level and no time trend and for the model with breaks in the level and in the time trend, Carrion-i-Silvestre et al. [Econometrics Journal, Vol. 8 (2005) pp. 159–175] showed that the number of breaks and their positions may be allowed to differ across individuals for cases with known and unknown breaks. Their results can easily be extended to the proposed modified statistic. The asymptotic distributions of all the statistics proposed are derived under the null hypothesis and are shown to be normally distributed. We show by simulations that our suggested tests generally perform well in finite samples, with the exception of the modified test. In an empirical application to the consumer prices of 22 OECD countries during the period from 1953 to 2003, we found evidence of stationarity once a structural break and cross-sectional dependence are accommodated.
Abstract:
Background. The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10³ to ca. 10⁴ nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic picture, some regions requiring more than 2.8 × 10⁵ nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions. Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.
Abstract:
In this study we show that forest areas contribute significantly to the estimated benefits from outdoor recreation in Northern Ireland. Secondly, we provide empirical evidence of the gains in the statistical efficiency of both benefit and parameter estimates obtained by analysing follow-up responses with Double Bounded interval data analysis. As these gains are considerable, it is clearly worth considering this method in CVM survey design even when moderately large sample sizes are used. Finally, we demonstrate that estimates of means and medians of WTP distributions for access to forest recreation show plausible magnitude, are consistent with previous UK studies, and converge across parametric and non-parametric methods of estimation.
Abstract:
This paper investigates the performance of the tests proposed by Hadri and by Hadri and Larsson for testing for stationarity in heterogeneous panel data under model misspecification. The panel tests are based on the well-known KPSS test (cf. Kwiatkowski et al.), which considers two models: stationarity around a deterministic level and stationarity around a deterministic trend. There is no study, as far as we know, on the statistical properties of the test when the wrong model is used. We also consider the case of the simultaneous presence of the two types of models in a panel. We employ two asymptotics: joint asymptotics, with T, N -> infinity simultaneously, and T fixed with N allowed to grow indefinitely. We use Monte Carlo experiments to investigate the effects of misspecification in sample sizes usually used in practice. The results indicate that the assumption that T is fixed rather than asymptotic leads to tests with fewer size distortions than the tests derived under the joint asymptotics, particularly for panels with relatively small T and large N (micro-panels). We also find that choosing a deterministic trend when a deterministic level is true does not significantly affect the properties of the test. However, choosing a deterministic level when a deterministic trend is true leads to extreme over-rejections. Therefore, when unsure about which model has generated the data, we suggest using the model with a trend. We also propose a new statistic for testing for stationarity in mixed panel data where the mixture is known. The performance of this new test is very good for both cases of T asymptotic and T fixed. The statistic for T asymptotic is slightly undersized when T is very small (