Biblioteca Digital

39 resultados para SAMPLE SIZE

em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain

Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes whether standard covariance matrix tests work whendimensionality is large, and in particular larger than sample size. Inthe latter case, the singularity of the sample covariance matrix makeslikelihood ratio tests degenerate, but other tests based on quadraticforms of sample covariance matrix eigenvalues remain well-defined. Westudy the consistency property and limiting distribution of these testsas dimensionality and sample size go to infinity together, with theirratio converging to a finite non-zero limit. We find that the existingtest for sphericity is robust against high dimensionality, but not thetest for equality of the covariance matrix to a given matrix. For thelatter test, we develop a new correction to the existing test statisticthat makes it robust against high dimensionality.

Data size sufficiency analyses of haplotype inference algortihms

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We present experimental and theoretical analyses of data requirements for haplotype inference algorithms. Our experiments include a broad range of problem sizes under two standard models of tree distribution and were designed to yield statistically robust results despite the size of the sample space. Our results validate Gusfield's conjecture that a population size of n log n is required to give (with high probability) sufficient information to deduce the n haplotypes and their complete evolutionary history. The experimental results inspired our experimental finding with theoretical bounds on the population size. We also analyze the population size required to deduce some fixed fraction of the evolutionary history of a set of n haplotypes and establish linear bounds on the required sample size. These linear bounds are also shown theoretically.

Calculation of the variance in surveys of the economic climate

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Public opinion surveys have become progressively incorporated into systems of official statistics. Surveys of the economic climate are usually qualitative because they collect opinions of businesspeople and/or experts about the long-term indicators described by a number of variables. In such cases the responses are expressed in ordinal numbers, that is, the respondents verbally report, for example, whether during a given trimester the sales or the new orders have increased, decreased or remained the same as in the previous trimester. These data allow to calculate the percent of respondents in the total population (results are extrapolated), who select every one of the three options. Data are often presented in the form of an index calculated as the difference between the percent of those who claim that a given variable has improved in value and of those who claim that it has deteriorated. As in any survey conducted on a sample the question of the measurement of the sample error of the results has to be addressed, since the error influences both the reliability of the results and the calculation of the sample size adequate for a desired confidence interval. The results presented here are based on data from the Survey of the Business Climate (Encuesta de Clima Empresarial) developed through the collaboration of the Statistical Institute of Catalonia (Institut d’Estadística de Catalunya) with the Chambers of Commerce (Cámaras de Comercio) of Sabadell and Terrassa.

Quantificació de l'error comès pels models de níxol ecològic al predir la distribució d'especies generalistes

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Report for the scientific sojourn carried out at the University of California at Berkeley, from September to December 2007. Environmental niche modelling (ENM) techniques are powerful tools to predict species potential distributions. In the last ten years, a plethora of novel methodological approaches and modelling techniques have been developed. During three months, I stayed at the University of California, Berkeley, working under the supervision of Dr. David R. Vieites. The aim of our work was to quantify the error committed by these techniques, but also to test how an increase in the sample size affects the resultant predictions. Using MaxEnt software we generated distribution predictive maps, from different sample sizes, of the Eurasian quail (Coturnix coturnix) in the Iberian Peninsula. The quail is a generalist species from a climatic point of view, but an habitat specialist. The resultant distribution maps were compared with the real distribution of the species. This distribution was obtained from recent bird atlases from Spain and Portugal. Results show that ENM techniques can have important errors when predicting the species distribution of generalist species. Moreover, an increase of sample size is not necessary related with a better performance of the models. We conclude that a deep knowledge of the species’ biology and the variables affecting their distribution is crucial for an optimal modelling. The lack of this knowledge can induce to wrong conclusions.

Intrinsic test of independence in contingency tables

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A condition needed for testing nested hypotheses from a Bayesianviewpoint is that the prior for the alternative model concentratesmass around the small, or null, model. For testing independencein contingency tables, the intrinsic priors satisfy this requirement.Further, the degree of concentration of the priors is controlled bya discrete parameter m, the training sample size, which plays animportant role in the resulting answer regardless of the samplesize.In this paper we study robustness of the tests of independencein contingency tables with respect to the intrinsic priors withdifferent degree of concentration around the null, and comparewith other “robust” results by Good and Crook. Consistency ofthe intrinsic Bayesian tests is established.We also discuss conditioning issues and sampling schemes,and argue that conditioning should be on either one margin orthe table total, but not on both margins.Examples using real are simulated data are given

The Effect of occupational sex-composition on earnings: job-specialisation, sex-role attitudes and the division of domestic-labour in Spain

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Important theoretical controversies remain unresolved in the literatire on occupational sex-segregation and the gender wage-gap. A useful way of summarising these controversies is viewing them as a debate between - cultural -socialisation. The paper discusses these theories in detail and carries out a preliminary test of the relative explanatory performance of some of their most consequential predictions. This is done by drawing on the Spanish sample of the second wave of the European Social Survey, ESS. The empirical analysis of ESS data illustrates the notable analytical pay-offs that can stem from using rich individual-level indicators, but also exemplifies the statistical llimitations generated by small sample size and high rates of non-response. Empirical results should, therefore, be taken as preliminary. They seem to suggest that the effect of occupational sex-segregation on wages could be explicable by workers' sex-role attitutes, their relative input in domestic production and the job-specific human capital requirements of their jobs. Of these three factors, job-specialisation seeems clearly the most important one.

Strong minimax lower bounds for learning

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Minimax lower bounds for concept learning state, for example, thatfor each sample size $n$ and learning rule $g_n$, there exists a distributionof the observation $X$ and a concept $C$ to be learnt such that the expectederror of $g_n$ is at least a constant times $V/n$, where $V$ is the VC dimensionof the concept class. However, these bounds do not tell anything about therate of decrease of the error for a {\sl fixed} distribution--concept pair.\\In this paper we investigate minimax lower bounds in such a--stronger--sense.We show that for several natural $k$--parameter concept classes, includingthe class of linear halfspaces, the class of balls, the class of polyhedrawith a certain number of faces, and a class of neural networks, for any{\sl sequence} of learning rules $\{g_n\}$, there exists a fixed distributionof $X$ and a fixed concept $C$ such that the expected error is larger thana constant times $k/n$ for {\sl infinitely many n}. We also obtain suchstrong minimax lower bounds for the tail distribution of the probabilityof error, which extend the corresponding minimax lower bounds.

Construyendo las bases para una comparación fiable: la Encuesta Social Europea

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Surveys are a valuable instrument to find out about the social and politicalreality of our context. However, the work of researchers is often limitedby a number of handicaps that are mainly two. On one hand, the samples areusually low technical quality ones and the fieldwork is not carried out inthe finest conditions. On the other hand, many surveys are not especiallydesigned to allow their comparison, a precisely appreciated operation inpolitical research. The article presents the European Social Survey andjustifies its methodological bases. The survey, promoted by the EuropeanScience Foundation and the European Commission, is born from the collectiveeffort of the scientific community with the explicit aim to establishcertain quality standards in the sample design and in the carrying out ofthe fieldwork so as to guarantee the quality of the data and allow eachcomparison between countries.

Using composite estimators to improve both domain and total area estimation

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this article we propose using small area estimators to improve the estimatesof both the small and large area parameters. When the objective is to estimateparameters at both levels accurately, optimality is achieved by a mixed sampledesign of fixed and proportional allocations. In the mixed sample design, oncea sample size has been determined, one fraction of it is distributedproportionally among the different small areas while the rest is evenlydistributed among them. We use Monte Carlo simulations to assess theperformance of the direct estimator and two composite covariant-freesmall area estimators, for different sample sizes and different sampledistributions. Performance is measured in terms of Mean Squared Errors(MSE) of both small and large area parameters. It is found that the adoptionof small area composite estimators open the possibility of 1) reducingsample size when precision is given, or 2) improving precision for a givensample size.

Estimating parliamentary composition through electoral polls

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Any electoral system has an electoral formula that converts voteproportions into parliamentary seats. Pre-electoral polls usually focuson estimating vote proportions and then applying the electoral formulato give a forecast of the parliament's composition. We here describe theproblems arising from this approach: there is always a bias in theforecast. We study the origin of the bias and some methods to evaluateand to reduce it. We propose some rules to compute the sample sizerequired for a given forecast accuracy. We show by Monte Carlo simulationthe performance of the proposed methods using data from Spanish electionsin last years. We also propose graphical methods to visualize how electoralformulae and parliamentary forecasts work (or fail).

Improving small area estimation by combining surveys: new perspectives in regional statistics

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A national survey designed for estimating a specific population quantity is sometimes used for estimation of this quantity also for a small area, such as a province. Budget constraints do not allow a greater sample size for the small area, and so other means of improving estimation have to be devised. We investigate such methods and assess them by a Monte Carlo study. We explore how a complementary survey can be exploited in small area estimation. We use the context of the Spanish Labour Force Survey (EPA) and the Barometer in Spain for our study.

The economic effects of the Protestant Reformation: Testing the Weber hypothesis in the German Lands

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Many theories, most famously Max Weber s essay on the Protestant ethic, have hypothesizedthat Protestantism should have favored economic development. With their considerablereligious heterogeneity and stability of denominational affiliations until the 19th century, theGerman Lands of the Holy Roman Empire present an ideal testing ground for this hypothesis.Using population figures in a dataset comprising 272 cities in the years 1300 1900, I find no effectsof Protestantism on economic growth. The finding is robust to the inclusion of a varietyof controls, and does not appear to depend on data selection or small sample size. In addition,Protestantism has no effect when interacted with other likely determinants of economic development.I also analyze the endogeneity of religious choice; instrumental variables estimates ofthe effects of Protestantism are similar to the OLS results.

Country effects in ISSP-1993 environmental data: Comparison of SEM approaches

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Structural equation models (SEM) are commonly used to analyze the relationship between variables some of which may be latent, such as individual ``attitude'' to and ``behavior'' concerning specific issues. A number of difficulties arise when we want to compare a large number of groups, each with large sample size, and the manifest variables are distinctly non-normally distributed. Using an specific data set, we evaluate the appropriateness of the following alternative SEM approaches: multiple group versus MIMIC models, continuous versus ordinal variables estimation methods, and normal theory versus non-normal estimation methods. The approaches are applied to the ISSP-1993 Environmental data set, with the purpose of exploring variation in the mean level of variables of ``attitude'' to and ``behavior''concerning environmental issues and their mutual relationship across countries. Issues of both theoretical and practical relevance arise in the course of this application.

Inequalities for a new data-based method for selecting nonparametric density estimates

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provideexplicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate,and study in particular the connection between the richness of the classof density estimates and the performance bound. For example, our methodallows one to pick the bandwidth and kernel order in the kernel estimatesimultaneously and still assure that for {\it all densities}, the $L_1$error of the corresponding kernel estimate is not larger than aboutthree times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variablekernel estimates.

Uso de la estadística en la oncología y la hematología

Relevância:

60.00% 60.00%

Publicador:

Resumo:

En este artículo abordamos el uso y la importancia de las herramientas estadísticas que se utilizan principalmente en los estudios médicos del ámbito de la oncología y la hematología, pero aplicables a muchos otros campos tanto médicos como experimentales o industriales. El objetivo del presente trabajo es presentar de una manera clara y precisa la metodología estadística necesaria para analizar los datos obtenidos en los estudios rigurosa y concisamente en cuanto a las hipótesis de trabajo planteadas por los investigadores. La medida de la respuesta al tratamiento elegidas en al tipo de estudio elegido determinarán los métodos estadísticos que se utilizarán durante el análisis de los datos del estudio y también el tamaño de muestra. Mediante la correcta aplicación del análisis estadístico y de una adecuada planificación se puede determinar si la relación encontrada entre la exposición a un tratamiento y un resultado es casual o por el contrario, está sujeto a una relación no aleatoria que podría establecer una relación de causalidad. Hemos estudiado los principales tipos de diseño de los estudios médicos más utilizados, tales como ensayos clínicos y estudios observacionales (cohortes, casos y controles, estudios de prevalencia y estudios ecológicos). También se presenta una sección sobre el cálculo del tamaño muestral de los estudios y cómo calcularlo, ¿Qué prueba estadística debe utilizarse?, los aspectos sobre fuerza del efecto ¿odds ratio¿ (OR) y riesgo relativo (RR), el análisis de supervivencia. Se presentan ejemplos en la mayoría de secciones del artículo y bibliografía más relevante.

«
1
2
3
»