100 resultados para SAMPLE ERROR
Resumo:
In moment structure analysis with nonnormal data, asymptotic valid inferences require the computation of a consistent (under general distributional assumptions) estimate of the matrix $\Gamma$ of asymptotic variances of sample second--order moments. Such a consistent estimate involves the fourth--order sample moments of the data. In practice, the use of fourth--order moments leads to computational burden and lack of robustness against small samples. In this paper we show that, under certain assumptions, correct asymptotic inferences can be attained when $\Gamma$ is replaced by a matrix $\Omega$ that involves only the second--order moments of the data. The present paper extends to the context of multi--sample analysis of second--order moment structures, results derived in the context of (simple--sample) covariance structure analysis (Satorra and Bentler, 1990). The results apply to a variety of estimation methods and general type of statistics. An example involving a test of equality of means under covariance restrictions illustrates theoretical aspects of the paper.
Resumo:
We extend to score, Wald and difference test statistics the scaled and adjusted corrections to goodness-of-fit test statistics developed in Satorra and Bentler (1988a,b). The theory is framed in the general context of multisample analysis of moment structures, under general conditions on the distribution of observable variables. Computational issues, as well as the relation of the scaled and corrected statistics to the asymptotic robust ones, is discussed. A Monte Carlo study illustrates thecomparative performance in finite samples of corrected score test statistics.
Resumo:
This paper analyzes whether standard covariance matrix tests work whendimensionality is large, and in particular larger than sample size. Inthe latter case, the singularity of the sample covariance matrix makeslikelihood ratio tests degenerate, but other tests based on quadraticforms of sample covariance matrix eigenvalues remain well-defined. Westudy the consistency property and limiting distribution of these testsas dimensionality and sample size go to infinity together, with theirratio converging to a finite non-zero limit. We find that the existingtest for sphericity is robust against high dimensionality, but not thetest for equality of the covariance matrix to a given matrix. For thelatter test, we develop a new correction to the existing test statisticthat makes it robust against high dimensionality.
Resumo:
Consider the problem of testing k hypotheses simultaneously. In this paper,we discuss finite and large sample theory of stepdown methods that providecontrol of the familywise error rate (FWE). In order to improve upon theBonferroni method or Holm's (1979) stepdown method, Westfall and Young(1993) make eective use of resampling to construct stepdown methods thatimplicitly estimate the dependence structure of the test statistics. However,their methods depend on an assumption called subset pivotality. The goalof this paper is to construct general stepdown methods that do not requiresuch an assumption. In order to accomplish this, we take a close look atwhat makes stepdown procedures work, and a key component is a monotonicityrequirement of critical values. By imposing such monotonicity on estimatedcritical values (which is not an assumption on the model but an assumptionon the method), it is demonstrated that the problem of constructing a validmultiple test procedure which controls the FWE can be reduced to the problemof contructing a single test which controls the usual probability of a Type 1error. This reduction allows us to draw upon an enormous resamplingliterature as a general means of test contruction.
Resumo:
In this article we propose using small area estimators to improve the estimatesof both the small and large area parameters. When the objective is to estimateparameters at both levels accurately, optimality is achieved by a mixed sampledesign of fixed and proportional allocations. In the mixed sample design, oncea sample size has been determined, one fraction of it is distributedproportionally among the different small areas while the rest is evenlydistributed among them. We use Monte Carlo simulations to assess theperformance of the direct estimator and two composite covariant-freesmall area estimators, for different sample sizes and different sampledistributions. Performance is measured in terms of Mean Squared Errors(MSE) of both small and large area parameters. It is found that the adoptionof small area composite estimators open the possibility of 1) reducingsample size when precision is given, or 2) improving precision for a givensample size.
Resumo:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessingthe {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk.The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover.Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent,and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension.For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
Resumo:
In this paper I explore the issue of nonlinearity (both in the datageneration process and in the functional form that establishes therelationship between the parameters and the data) regarding the poorperformance of the Generalized Method of Moments (GMM) in small samples.To this purpose I build a sequence of models starting with a simple linearmodel and enlarging it progressively until I approximate a standard (nonlinear)neoclassical growth model. I then use simulation techniques to find the smallsample distribution of the GMM estimators in each of the models.
Resumo:
We derive a new inequality for uniform deviations of averages from their means. The inequality is a common generalization of previous results of Vapnik and Chervonenkis (1974) and Pollard (1986). Usingthe new inequality we obtain tight bounds for empirical loss minimization learning.
Resumo:
A national survey designed for estimating a specific population quantity is sometimes used for estimation of this quantity also for a small area, such as a province. Budget constraints do not allow a greater sample size for the small area, and so other means of improving estimation have to be devised. We investigate such methods and assess them by a Monte Carlo study. We explore how a complementary survey can be exploited in small area estimation. We use the context of the Spanish Labour Force Survey (EPA) and the Barometer in Spain for our study.
Resumo:
This paper demonstrates that, unlike what the conventional wisdom says, measurement error biases in panel data estimation of convergence using OLS with fixed effects are huge, not trivial. It does so by way of the "skipping estimation"': taking data from every m years of the sample (where m is an integer greater than or equal to 2), as opposed to every single year. It is shown that the estimated speed of convergence from the OLS with fixed effects is biased upwards by as much as 7 to 15%.
Resumo:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provideexplicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate,and study in particular the connection between the richness of the classof density estimates and the performance bound. For example, our methodallows one to pick the bandwidth and kernel order in the kernel estimatesimultaneously and still assure that for {\it all densities}, the $L_1$error of the corresponding kernel estimate is not larger than aboutthree times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variablekernel estimates.
Resumo:
We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical {\sc vc} dimension, empirical {\sc vc} entropy, andmargin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
Resumo:
This paper proposes a new time-domain test of a process being I(d), 0 < d = 1, under the null, against the alternative of being I(0) with deterministic components subject to structural breaks at known or unknown dates, with the goal of disentangling the existing identification issue between long-memory and structural breaks. Denoting by AB(t) the different types of structural breaks in the deterministic components of a time series considered by Perron (1989), the test statistic proposed here is based on the t-ratio (or the infimum of a sequence of t-ratios) of the estimated coefficient on yt-1 in an OLS regression of ?dyt on a simple transformation of the above-mentioned deterministic components and yt-1, possibly augmented by a suitable number of lags of ?dyt to account for serial correlation in the error terms. The case where d = 1 coincides with the Perron (1989) or the Zivot and Andrews (1992) approaches if the break date is known or unknown, respectively. The statistic is labelled as the SB-FDF (Structural Break-Fractional Dickey- Fuller) test, since it is based on the same principles as the well-known Dickey-Fuller unit root test. Both its asymptotic behavior and finite sample properties are analyzed, and two empirical applications are provided.
Resumo:
This paper presents a new respiratory impedance estimator to minimize the error due to breathing. Its practical reliability was evaluated in a simulation using realistic signals. These signals were generated by superposing pressure and flow records obtained in two conditions: 1) when applying forced oscillation to a resistance- inertance- elastance (RIE) mechanical model; 2) when healthy subjects breathed through the unexcited forced oscillation generator. Impedances computed (4-32 Hz) from the simulated signals with the new estimator resulted in a mean value which was scarcely biased by the added breathing (errors less than 1 percent in the mean R, I , and E ) and had a small variability (coefficients of variation of R, I, and E of 1.3, 3.5, and 9.6 percent, respectively). Our results suggest that the proposed estimator reduces the error in measurement of respiratory impedance without appreciable extracomputational cost.