896 results for large sample distributions
Abstract:
Acknowledgments: This work has been undertaken with the support of the A*MIDEX project (no. ANR-11-IDEX-0001-02) funded by the “Investissements d’Avenir” French Government program, managed by the French National Research Agency (ANR). We are grateful to Julian Williams, Editor Badi H. Baltagi and an anonymous referee for helpful comments. We are responsible for any errors.
Abstract:
The Fornax Spectroscopic Survey will use the Two-degree Field spectrograph (2dF) of the Anglo-Australian Telescope to obtain spectra for a complete sample of all 14000 objects with 16.5 ≤ b(j) ≤ 19.7 in a 12 square degree area centred on the Fornax Cluster. The aims of this project include the study of dwarf galaxies in the cluster (both known low surface brightness objects and putative normal surface brightness dwarfs) and a comparison sample of background field galaxies. We will also measure quasars and other active galaxies, any previously unrecognised compact galaxies and a large sample of Galactic stars. By selecting all objects - both stars and galaxies - independent of morphology, we cover a much larger range of surface brightness and scale size than previous surveys. In this paper we first describe the design of the survey. Our targets are selected from UK Schmidt Telescope sky survey plates digitised by the Automated Plate Measuring (APM) facility. We then describe the photometric and astrometric calibration of these data and show that the APM astrometry is accurate enough for use with the 2dF. We also describe a general approach to object identification using cross-correlations which allows us to identify and classify both stellar and galaxy spectra. We present results from the first 2dF field. Redshift distributions and velocity structures are shown for all observed objects in the direction of Fornax, including Galactic stars, galaxies in and around the Fornax Cluster, and the background galaxy population. The velocity data for the stars show the contributions from the different Galactic components, plus a small tail to high velocities. We find no galaxies in the foreground to the cluster in our 2dF field. The Fornax Cluster is clearly defined kinematically. The mean velocity from the 26 cluster members having reliable redshifts is 1560 +/- 80 km s^-1. They show a velocity dispersion of 380 +/- 50 km s^-1. Large-scale structure can be traced behind the cluster to a redshift beyond z = 0.3. Background compact galaxies and low surface brightness galaxies are found to follow the general galaxy distribution.
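As a rough back-of-the-envelope check (not part of the survey analysis itself), the quoted uncertainties are consistent with the usual large-sample standard errors of a mean and a dispersion estimated from N = 26 velocities; a minimal sketch:

```python
import math

# Quoted values from the abstract: 26 members, velocity dispersion 380 km/s.
n_members = 26
sigma = 380.0  # km/s

# Approximate standard errors for a roughly Gaussian velocity sample.
se_mean = sigma / math.sqrt(n_members)              # ~74 km/s, consistent with the quoted +/- 80
se_sigma = sigma / math.sqrt(2 * (n_members - 1))   # ~54 km/s, consistent with the quoted +/- 50

print(f"standard error of mean velocity: {se_mean:.0f} km/s")
print(f"standard error of dispersion:    {se_sigma:.0f} km/s")
```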
Abstract:
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
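As an illustration of the kind of evaluation described above, the sketch below fits a boosted-tree classifier (a stand-in for the study's GBM) at the three sample sizes and scores it with AUC on an independent evaluation set; the data, predictors and model settings are all synthetic and hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: two environmental predictors and a binary presence/absence label.
X_eval = rng.normal(size=(1000, 2))
y_eval = (X_eval[:, 0] + 0.5 * X_eval[:, 1] + rng.normal(size=1000) > 0).astype(int)
X_pool = rng.normal(size=(5000, 2))
y_pool = (X_pool[:, 0] + 0.5 * X_pool[:, 1] + rng.normal(size=5000) > 0).astype(int)

for n in (100, 30, 10):
    idx = rng.choice(len(X_pool), size=n, replace=False)
    while len(np.unique(y_pool[idx])) < 2:            # ensure both classes are present
        idx = rng.choice(len(X_pool), size=n, replace=False)
    model = GradientBoostingClassifier().fit(X_pool[idx], y_pool[idx])
    auc = roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])
    print(f"training records: {n:4d}  AUC on independent evaluation data: {auc:.3f}")
```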
Abstract:
In this paper, we propose exact inference procedures for asset pricing models that can be formulated in the framework of a multivariate linear regression (CAPM), allowing for stable error distributions. The normality assumption on the distribution of stock returns is usually rejected in empirical studies, due to excess kurtosis and asymmetry. To model such data, we propose a comprehensive statistical approach which allows for alternative - possibly asymmetric - heavy-tailed distributions without the use of large-sample approximations. The methods suggested are based on Monte Carlo test techniques. Goodness-of-fit tests are formally incorporated to ensure that the error distributions considered are empirically sustainable, from which exact confidence sets for the unknown tail area and asymmetry parameters of the stable error distribution are derived. Tests for the efficiency of the market portfolio (zero intercepts) which explicitly allow for the presence of (unknown) nuisance parameters in the stable error distribution are also derived. The methods proposed are applied to monthly returns on 12 portfolios of the New York Stock Exchange over the period 1926-1995 (5-year subperiods). We find that stable, possibly skewed, distributions provide a statistically significant improvement in goodness-of-fit and lead to fewer rejections of the efficiency hypothesis.
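The exact inference described above rests on the classical Monte Carlo test idea: simulate the test statistic under the null with the assumed error law and compare it with the observed value. A minimal, simplified sketch (a single-asset market model with stable errors, fixed nuisance parameters, and none of the paper's goodness-of-fit or nuisance-parameter machinery) is:

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def mc_pvalue(stat_obs, stat_simulated):
    """Standard Monte Carlo test p-value: (1 + #{simulated >= observed}) / (N + 1)."""
    n = len(stat_simulated)
    return (1 + np.sum(stat_simulated >= stat_obs)) / (n + 1)

# Toy example: test that the intercept of a one-asset market-model regression is zero,
# simulating the null with (possibly skewed) stable errors, alpha=1.8, beta=0.2.
T = 120
market = rng.normal(0.01, 0.04, size=T)

def intercept_tstat(returns, market):
    X = np.column_stack([np.ones_like(market), market])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    resid = returns - X @ coef
    s2 = resid @ resid / (T - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return abs(coef[0]) / np.sqrt(cov[0, 0])

# Hypothetical "observed" returns with a small nonzero intercept.
returns_obs = 0.002 + 1.1 * market + levy_stable.rvs(1.8, 0.2, scale=0.02, size=T, random_state=rng)
stat_obs = intercept_tstat(returns_obs, market)

# Simulate the statistic under the null (zero intercept) with the same stable error law.
stat_null = np.array([
    intercept_tstat(1.1 * market + levy_stable.rvs(1.8, 0.2, scale=0.02, size=T, random_state=rng),
                    market)
    for _ in range(199)
])
print("Monte Carlo p-value:", mc_pvalue(stat_obs, stat_null))
```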
Abstract:
Statistical tests in vector autoregressive (VAR) models are typically based on large-sample approximations, involving the use of asymptotic distributions or bootstrap techniques. After documenting that such methods can be very misleading even with fairly large samples, especially when the number of lags or the number of equations is not small, we propose a general simulation-based technique that allows one to control completely the level of tests in parametric VAR models. In particular, we show that maximized Monte Carlo tests [Dufour (2002)] can provide provably exact tests for such models, whether they are stationary or integrated. Applications to order selection and causality testing are considered as special cases. The technique developed is applied to quarterly and monthly VAR models of the U.S. economy, comprising income, money, interest rates and prices, over the period 1965-1996.
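A minimal size experiment in the spirit of the documentation step mentioned above (not the authors' maximized Monte Carlo procedure) can be sketched with statsmodels: simulate a bivariate VAR with no Granger causality, over-fit the lag length on a modest sample, and record how often the asymptotic causality test rejects at the nominal 5% level.

```python
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)

# Bivariate VAR(1) with NO causality from the second variable to the first;
# fit an over-parameterised VAR on a modest sample and check the rejection rate.
T, n_rep, fitted_lags = 100, 200, 6
A = np.array([[0.5, 0.0],    # variable 0 does not depend on lagged variable 1
              [0.3, 0.4]])

rejections = 0
for _ in range(n_rep):
    y = np.zeros((T, 2))
    for t in range(1, T):
        y[t] = A @ y[t - 1] + rng.normal(size=2)
    res = VAR(y).fit(fitted_lags)
    test = res.test_causality(caused=0, causing=1, kind='wald')
    rejections += test.pvalue < 0.05

print("empirical size of nominal 5% asymptotic causality test:", rejections / n_rep)
```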
Abstract:
In this paper we introduce the Weibull power series (WPS) class of distributions, which is obtained by compounding Weibull and power series distributions, where the compounding procedure follows the same approach previously carried out by Adamidis and Loukas (1998). This new class of distributions has as a particular case the two-parameter exponential power series (EPS) class of distributions (Chahkandi and Ganjali, 2009), which contains several lifetime models such as the exponential geometric (Adamidis and Loukas, 1998), exponential Poisson (Kus, 2007) and exponential logarithmic (Tahmasbi and Rezaei, 2008) distributions. The hazard function of our class can be increasing, decreasing and upside-down bathtub shaped, among others, while the hazard function of an EPS distribution is only decreasing. We obtain several properties of the WPS distributions, such as moments, order statistics, estimation by maximum likelihood and inference for a large sample. Furthermore, the EM algorithm is also used to determine the maximum likelihood estimates of the parameters, and we discuss maximum entropy characterizations under suitable constraints. Special distributions are studied in some detail. Applications to two real data sets are given to show the flexibility and potentiality of the new class of distributions. (C) 2010 Elsevier B.V. All rights reserved.
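The WPS construction takes X = min(X_1, ..., X_N) with Weibull X_i and a power-series-distributed N, which gives the survival function S(x) = C(θ e^{-(x/λ)^α}) / C(θ). A small simulation check for the Weibull-geometric special case, where this reduces to S(x) = (1-θ)s/(1-θs) with s = e^{-(x/λ)^α} (parameter values chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Weibull-geometric as a special case of the WPS construction:
# X = min(X_1, ..., X_N), X_i ~ Weibull(shape=alpha, scale=lam), N geometric on {1, 2, ...}.
alpha, lam, theta = 1.5, 2.0, 0.6
size = 50_000

N = rng.geometric(1 - theta, size=size)   # P(N = n) = (1 - theta) * theta**(n - 1)
X = np.array([lam * rng.weibull(alpha, size=n).min() for n in N])

def survival_wg(x):
    """Closed-form survival of the Weibull-geometric: S(x) = (1-theta)s / (1 - theta*s)."""
    s = np.exp(-(x / lam) ** alpha)
    return (1 - theta) * s / (1 - theta * s)

for x in (0.5, 1.0, 2.0):
    print(f"x={x}: empirical S(x)={np.mean(X > x):.4f}   closed form={survival_wg(x):.4f}")
```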
Abstract:
The 3PL model is a flexible and widely used tool in assessment. However, it suffers from limitations due to its need for large sample sizes. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using several variations of DupER Augmentation (based on different imputation methodologies, deletion rates, and duplication rates), analyzed in BILOG-MG 3, and results are compared to those obtained from analyzing the raw data. Additional manipulated variables include test length and sample size. Estimates are compared using seven different evaluative criteria. Results are mixed and inconclusive. DupER-augmented data tend to result in larger root mean squared errors (RMSEs) and lower correlations between estimates and parameters for both item and ability parameters. However, some DupER variations produce estimates that are much less biased than those obtained from the raw data alone. For one DupER variation, DupER produced better results for low-ability simulees and worse results for those with high abilities. Findings, limitations, and recommendations for future studies are discussed. Specific recommendations for future studies include applying DupER Augmentation (1) to empirical data and (2) with additional IRT models, and (3) analyzing the efficacy of the procedure for different item and ability parameter distributions.
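A schematic sketch of the general duplicate-erase-replace idea described above, with a naive item-proportion imputation standing in for the study's imputation methodologies (the rates, data and helper name are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

def duper_augment(responses, duplication_rate=1.0, deletion_rate=0.2):
    """Schematic duplicate-erase-replace augmentation of a 0/1 response matrix.
    A naive Bernoulli(item proportion-correct) draw stands in for the study's
    imputation methods."""
    n_persons, n_items = responses.shape
    n_dup = int(duplication_rate * n_persons)

    # Duplicate: resample examinee response vectors.
    dup = responses[rng.choice(n_persons, size=n_dup, replace=True)].astype(float)

    # Erase: blank out a fraction of the duplicated cells ...
    mask = rng.random(dup.shape) < deletion_rate
    # ... and Replace: fill them with draws from each item's observed proportion correct.
    p_item = responses.mean(axis=0)
    dup[mask] = (rng.random(dup.shape) < p_item)[mask]

    return np.vstack([responses, dup.astype(int)])

# Toy usage: 500 simulees, 20 dichotomous items.
raw = (rng.random((500, 20)) < 0.6).astype(int)
augmented = duper_augment(raw)
print(raw.shape, "->", augmented.shape)
```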
Abstract:
The degree of homogeneity is normally assessed by the variability of the results of independent analyses of several (e.g., 15) normal-scale replicates. Large sample instrumental neutron activation analysis (LS-INAA) with a collimated Ge detector allows inspection of the degree of homogeneity of the initial batch material, using a kilogram-size sample. The test is based on the spatial distributions of induced radioactivity. Such a test was applied to samples of Brazilian whole (green) coffee beans (Coffea arabica and Coffea canephora) of approximately 1 kg in the framework of the development of a coffee reference material. Results indicated that the material does not contain significant element composition inhomogeneities between batches of approximately 30-50 g, masses typically forming the starting base of a reference material.
Abstract:
BACKGROUND: Several European HIV observational databases have, over the last decade, accumulated a substantial number of resistance test results and developed large sample repositories. There is a need to link these efforts together. We here describe the development of a novel tool that allows these databases to be bound together in a distributed fashion, in which the control and data remain with the cohorts rather than with classic data mergers. METHODS: As proof-of-concept we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01 and between 180 and 195 cm of height, and how many samples or resistance tests would be available for these patients. The queries were uploaded with the tool to a central web server, from which each participating cohort downloaded the queries with the tool and ran them against their database. The numbers gathered were then submitted back to the server and we could accumulate the number of available samples and resistance tests. RESULTS: We obtained the following results from the cohorts on available samples/resistance tests: EuResist: not available/11,194; EuroSIDA: 20,716/1,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,783/1,485. In total, 78,552 samples and 15,473 resistance tests were available amongst these five cohorts. Once these data items have been identified, it is trivial to generate lists of relevant samples that would be useful for ultra-deep sequencing in addition to the already available resistance tests. Soon the tool will include small analysis packages that allow each cohort to pull a report on their cohort profile and also survey emerging resistance trends in their own cohort. CONCLUSIONS: We plan on providing this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) and will provide the tool free of charge to others for any non-commercial use. The potential of this tool is to ease collaborations, that is, in projects requiring data to speed up the identification of novel resistance mutations by increasing the number of observations across multiple cohorts instead of awaiting single cohorts or studies to reach the critical number needed to address such issues.
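Only aggregated counts travel back to the central server in this design; summing the per-cohort figures quoted above reproduces the reported totals:

```python
# Per-cohort (samples, resistance tests) as reported back to the central server.
cohorts = {
    "EuResist": (None, 11_194),   # sample count not available
    "EuroSIDA": (20_716, 1_992),
    "ICONA":    (3_751, 500),
    "Rega":     (302, 302),
    "SHCS":     (53_783, 1_485),
}

total_samples = sum(s for s, _ in cohorts.values() if s is not None)
total_tests = sum(t for _, t in cohorts.values())
print(total_samples, total_tests)   # 78552 15473, matching the totals in the abstract
```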
Abstract:
In this thesis X-ray tomography is discussed from the Bayesian statistical viewpoint. The unknown parameters are assumed to be random variables and, in contrast to traditional methods, the solution is obtained as a large sample of the distribution of all possible solutions. As an introduction to tomography, an inversion formula for the Radon transform is presented on a plane, and the widely used filtered backprojection algorithm is derived. The traditional regularization methods are presented sufficiently to ground the Bayesian approach. The measurements are photon counts at the detector pixels, so the assumption of a Poisson distributed measurement error is justified. Often the error is assumed Gaussian, although the electronic noise caused by the measurement device can change the error structure; the assumption of a Gaussian measurement error is discussed. The thesis then discusses the use of different prior distributions in X-ray tomography. Especially in severely ill-posed problems, the use of a suitable prior is the main part of the whole solution process. In the empirical part, the presented prior distributions are tested using simulated measurements, and the effects produced by different prior distributions are shown. The use of a prior is shown to be indispensable in the case of a severely ill-posed problem.
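A minimal sketch of the classical filtered backprojection baseline mentioned above, with Poisson (photon-count-like) measurement noise, using scikit-image (an illustration, not the thesis's code):

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

rng = np.random.default_rng(4)

# Classical (non-Bayesian) baseline: filtered backprojection of a noisy, sparse-angle sinogram.
image = shepp_logan_phantom()
angles = np.linspace(0.0, 180.0, 60, endpoint=False)       # few angles -> ill-posed problem
sinogram = radon(image, theta=angles)

# Poisson-type measurement noise on the scaled sinogram; 'scale' plays the role of the dose.
scale = 50.0
noisy_sinogram = rng.poisson(np.clip(sinogram, 0, None) * scale) / scale

reconstruction = iradon(noisy_sinogram, theta=angles, filter_name="ramp")
print("reconstruction error (RMSE):", np.sqrt(np.mean((reconstruction - image) ** 2)))
```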
Abstract:
In this thesis, stepwise titration with hydrochloric acid was used to obtain chemical reactivities and dissolution rates of ground limestones and dolostones of varying geological backgrounds (sedimentary, metamorphic or magmatic). Two different ways of conducting the calculations were used: 1) a first-order mathematical model was used to calculate extrapolated initial reactivities (and dissolution rates) at pH 4, and 2) a second-order mathematical model was used to acquire integrated mean specific chemical reaction constants (and dissolution rates) at pH 5. The calculations of the reactivities and dissolution rates were based on the rate of change of pH and on the particle size distributions of the sample powders obtained by laser diffraction. The initial dissolution rates at pH 4 were repeatedly higher than previously reported literature values, whereas the dissolution rates at pH 5 were consistent with former observations. Reactivities and dissolution rates varied substantially for dolostones, whereas for limestones and calcareous rocks the variation can be primarily explained by relatively large sample standard deviations. A list of the dolostone samples in decreasing order of initial reactivity at pH 4 is: 1) metamorphic dolostones with a calcite/dolomite ratio higher than about 6%, 2) sedimentary dolostones without calcite, and 3) metamorphic dolostones with a calcite/dolomite ratio lower than about 6%. The reactivities and dissolution rates were accompanied by a wide range of experimental techniques to characterise the samples, to reveal how different rocks changed during the dissolution process, and to find out which factors had an influence on their chemical reactivities. An emphasis was put on chemical and morphological changes taking place at the surfaces of the particles via X-ray Photoelectron Spectroscopy (XPS) and Scanning Electron Microscopy (SEM). Supporting chemical information was obtained with X-Ray Fluorescence (XRF) measurements of the samples, and Inductively Coupled Plasma-Mass Spectrometry (ICP-MS) and Inductively Coupled Plasma-Optical Emission Spectrometry (ICP-OES) measurements of the solutions used in the reactivity experiments. Information on mineral (modal) compositions and their occurrence was provided by X-Ray Diffraction (XRD), Energy Dispersive X-ray analysis (EDX) and studying thin sections with a petrographic microscope. BET (Brunauer, Emmett, Teller) surface areas were determined from nitrogen physisorption data. Factors increasing the chemical reactivity of dolostones and calcareous rocks were found to be sedimentary origin, higher calcite concentration and smaller quartz concentration. Also, it is assumed that finer grain size and larger BET surface areas increase the reactivity, although no certain correlation was found in this thesis. Atomic concentrations did not correlate with the reactivities. Sedimentary dolostones, unlike metamorphic ones, were found to have porous surface structures after dissolution. In addition, conventional (XPS) and synchrotron-based (HRXPS) X-ray Photoelectron Spectroscopy were used to study bonding environments on calcite and dolomite surfaces. Both samples are insulators, which is why neutralisation measures such as an electron flood gun and a conductive mask were used. Surface core level shifts of 0.7 ± 0.1 eV for the Ca 2p spectrum of calcite and 0.75 ± 0.05 eV for the Mg 2p and Ca 3s spectra of dolomite were obtained. Some satellite features of the Ca 2p, C 1s and O 1s spectra have been suggested to be bulk plasmons.
The origin of carbide bonds was suggested to be beam-assisted interaction with hydrocarbons found on the surface. The results presented in this thesis are of particular importance for choosing raw materials for wet Flue Gas Desulphurisation (FGD) and the construction industry. Wet FGD benefits from high reactivity, whereas the construction industry can take advantage of the slow reactivity of carbonate rocks often used in the facades of fine buildings. Information on chemical bonding environments may help to create more accurate models for water-rock interactions of carbonates.
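As a purely generic illustration of extracting an initial dissolution rate from a pH-versus-time record (the thesis's own first- and second-order models are not reproduced here, and all numbers below are hypothetical):

```python
import numpy as np

# Hypothetical pH-vs-time record from one titration step (not data from the thesis).
t = np.array([0.0, 5.0, 10.0, 20.0, 40.0, 80.0])   # s
pH = np.array([4.00, 4.08, 4.15, 4.27, 4.46, 4.72])

H = 10.0 ** (-pH)                 # proton concentration, mol/L
dHdt = np.gradient(H, t)          # d[H+]/dt, mol/(L*s)

# Schematic dissolution rate per unit surface area (stoichiometry ignored), assuming a
# known solution volume and a specific surface area estimated from the particle size
# distribution or BET data; both values below are assumptions for the example.
volume_L = 0.5
surface_m2 = 1.2

rate = -dHdt[0] * volume_L / surface_m2
print(f"schematic initial dissolution rate at pH {pH[0]:.1f}: {rate:.2e} mol m^-2 s^-1")
```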
Abstract:
The last decade has seen growing interest in the econometric literature in the problems raised by weak instrumental variables, that is, situations where the instrumental variables are only weakly correlated with the variable to be instrumented. It is well known that when instruments are weak, the distributions of the Student, Wald, likelihood-ratio and Lagrange-multiplier statistics are no longer standard and often depend on nuisance parameters. Several empirical studies, notably on returns-to-education models [Angrist and Krueger (1991, 1995), Angrist et al. (1999), Bound et al. (1995), Dufour and Taamouti (2007)] and asset pricing models (C-CAPM) [Hansen and Singleton (1982, 1983), Stock and Wright (2000)], in which the instrumental variables are weakly correlated with the variable to be instrumented, have shown that the use of these statistics often leads to unreliable results. One remedy for this problem is the use of identification-robust tests [Anderson and Rubin (1949), Moreira (2002), Kleibergen (2003), Dufour and Taamouti (2007)]. However, there is no econometric literature on the quality of identification-robust procedures when the available instruments are endogenous, or both endogenous and weak. This raises the question of what happens to identification-robust inference procedures when some instrumental variables assumed to be exogenous are in fact not. More precisely, what happens if an invalid instrumental variable is added to a set of valid instruments? Do these procedures behave differently? And if the endogeneity of the instrumental variables poses major difficulties for statistical inference, can test procedures be proposed that select the instruments when they are both strong and valid? Is it possible to propose instrument-selection procedures that remain valid even under weak identification? This thesis focuses on structural models (simultaneous-equation models) and answers these questions through four essays. The first essay is published in the Journal of Statistical Planning and Inference 138 (2008) 2649-2661. In this essay, we analyse the effects of instrument endogeneity on two identification-robust test statistics: the Anderson and Rubin statistic (AR, 1949) and the Kleibergen statistic (K, 2003), with or without weak instruments. First, when the parameter controlling instrument endogeneity is fixed (does not depend on the sample size), we show that all these procedures are in general consistent against the presence of invalid instruments (that is, they detect the presence of invalid instruments) regardless of instrument quality (strong or weak). We also describe cases where this consistency may fail, but where the asymptotic distribution is modified in a way that could lead to size distortions even in large samples. This includes, in particular, cases where the two-stage least squares estimator remains consistent but the tests are asymptotically invalid.
Second, when the instruments are locally exogenous (that is, the endogeneity parameter converges to zero as the sample size grows), we show that these tests converge to noncentral chi-square distributions, whether the instruments are strong or weak. We also characterize the situations in which the noncentrality parameter is zero and the asymptotic distribution of the statistics remains the same as with valid instruments (despite the presence of invalid instruments). The second essay studies the impact of weak instruments on specification tests of the Durbin-Wu-Hausman (DWH) type as well as the Revankar and Hartley (1973) test. We provide a finite-sample and large-sample analysis of the distribution of these tests under the null hypothesis (size) and under the alternative (power), including cases where identification is deficient or weak (weak instruments). Our finite-sample analysis yields several insights as well as extensions of earlier procedures. In particular, the characterization of the finite-sample distribution of these statistics allows the construction of exact Monte Carlo exogeneity tests even with non-Gaussian errors. We show that these tests are typically robust to weak instruments (the size is controlled). Moreover, we provide a characterization of the power of the tests, which clearly exhibits the factors that determine power. We show that the tests have no power when all the instruments are weak [similar to Guggenberger (2008)]. However, power exists as soon as at least one instrument is strong. The conclusion of Guggenberger (2008) concerns the case where all instruments are weak (a case of minor practical interest). Our asymptotic theory under weakened assumptions confirms the finite-sample theory. Furthermore, we present a Monte Carlo analysis indicating that: (1) the ordinary least squares estimator is more efficient than two-stage least squares when the instruments are weak and the endogeneity is moderate [a conclusion similar to that of Kiviet and Niemczyk (2007)]; (2) pre-test estimators based on exogeneity tests perform very well relative to two-stage least squares. This suggests that the instrumental-variables method should be applied only when one is confident of having strong instruments. The conclusions of Guggenberger (2008) are therefore qualified and could be misleading. We illustrate our theoretical results through simulation experiments and two empirical applications: the relationship between trade openness and economic growth, and the well-known problem of returns to education. The third essay extends the Wald-type exogeneity test proposed by Dufour (1987) to cases where the regression errors have a non-normal distribution. We propose a new version of the earlier test that is valid even in the presence of non-Gaussian errors. Unlike the usual exogeneity test procedures (Durbin-Wu-Hausman and Revankar-Hartley tests), the Wald test makes it possible to address a problem common in empirical work, namely testing the partial exogeneity of a subset of variables.
We propose two new pre-test estimators based on the Wald test that perform better (in terms of mean squared error) than the usual IV estimator when the instrumental variables are weak and the endogeneity is moderate. We also show that this test can serve as an instrument-selection procedure. We illustrate the theoretical results with two empirical applications: the well-known wage equation model [Angrist and Krueger (1991, 1999)] and returns to scale [Nerlove (1963)]. Our results suggest that a mother's education explains her son's dropping out of school, that output is an endogenous variable in the estimation of the firm's cost function, and that the price of fuel is a valid instrument for output. The fourth essay addresses two very important problems in the econometric literature. First, although the initial or extended Wald test makes it possible to build confidence regions and to test linear restrictions on covariances, it assumes that the model parameters are identified. When identification is weak (instruments weakly correlated with the variable to be instrumented), this test is in general no longer valid. This essay develops an identification-robust (weak-instrument) inference procedure for building confidence regions for the covariance matrix between the regression errors and the (possibly endogenous) explanatory variables. We provide analytical expressions for the confidence regions and characterize the necessary and sufficient conditions under which they are bounded. The proposed procedure remains valid even in small samples and is also asymptotically robust to heteroskedasticity and autocorrelation of the errors. Second, these results are used to develop identification-robust partial exogeneity tests. Monte Carlo simulations indicate that these tests control the size and have power even when the instruments are weak. This allows us to propose a valid instrument-selection procedure even when identification is a concern. The instrument-selection procedure is based on two new pre-test estimators that combine the usual IV estimator and partial IV estimators. Our simulations show that: (1) just like the ordinary least squares estimator, the partial IV estimators are more efficient than the usual IV estimator when the instruments are weak and the endogeneity is moderate; (2) the pre-test estimators have overall an excellent performance compared with the usual IV estimator. We illustrate our theoretical results with two empirical applications: the relationship between trade openness and economic growth, and the returns-to-education model. In the first application, earlier studies concluded that the instruments were not too weak [Dufour and Taamouti (2007)], whereas they are very weak in the second [Bound (1995), Doko and Dufour (2009)]. Consistent with our theoretical results, we find unbounded confidence regions for the covariance when the instruments are quite weak.
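For reference, the Anderson-Rubin statistic analysed in the first essay has the textbook form AR(β0) = [(y − Yβ0)′P_Z(y − Yβ0)/k] / [(y − Yβ0)′M_Z(y − Yβ0)/(n − k)], compared with an F(k, n − k) distribution under the null. A minimal sketch on toy data (not the thesis's own applications or code):

```python
import numpy as np
from scipy import stats

def anderson_rubin_test(y, Y, Z, beta0):
    """Anderson-Rubin (1949) test of H0: beta = beta0 in y = Y*beta + u,
    with instrument matrix Z (a constant included in Z). The test is robust
    to weak instruments as long as Z is exogenous."""
    n, k = Z.shape
    u0 = y - Y @ beta0
    PZ_u0 = Z @ np.linalg.lstsq(Z, u0, rcond=None)[0]   # projection of u0 onto span(Z)
    ssr_fit = PZ_u0 @ PZ_u0                             # u0' P_Z u0
    ssr_res = u0 @ u0 - ssr_fit                         # u0' M_Z u0
    ar = (ssr_fit / k) / (ssr_res / (n - k))
    return ar, stats.f.sf(ar, k, n - k)

# Toy usage: one endogenous regressor, three instruments plus a constant.
rng = np.random.default_rng(5)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
v = rng.normal(size=n)
Y = 0.3 * Z[:, 1] + v                                   # moderately weak first stage
u = 0.8 * v + rng.normal(size=n)                        # endogeneity of Y
y = 0.5 * Y + u
stat, pval = anderson_rubin_test(y, Y.reshape(-1, 1), Z, np.array([0.5]))
print(f"AR statistic: {stat:.2f}, p-value: {pval:.3f}")
```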
Abstract:
This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations is consistent with a given success rate p(0). Next, the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
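The paper notes that with 'non-informative' priors its answers coincide with standard frequentist formulations; as a reference point, a sketch of the classical known-variance, two-group formula n = 2σ²(z_{1−α} + z_{1−β})²/δ² per arm (an illustration of that benchmark, not the paper's derivation):

```python
import math
from scipy.stats import norm

def per_group_sample_size(delta, sigma, alpha=0.025, power=0.9):
    """Classical per-group sample size for comparing two normal means with known
    common variance: the frequentist benchmark that a Bayesian method with a
    'non-informative' prior reproduces."""
    z_alpha = norm.ppf(1 - alpha)   # one-sided significance level
    z_beta = norm.ppf(power)
    return math.ceil(2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

# Example: detect a clinically relevant difference of 5 units with standard deviation 12.
print(per_group_sample_size(delta=5.0, sigma=12.0))
```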
Abstract:
The translation of an ensemble of model runs into a probability distribution is a common task in model-based prediction. Common methods for such ensemble interpretations proceed as if verification and ensemble were draws from the same underlying distribution, an assumption not viable for most, if any, real-world ensembles. An alternative is to consider an ensemble merely as a source of information rather than as the possible scenarios of reality. This approach, which looks for maps between ensembles and probability distributions, is investigated and extended. Common methods are revisited, and an improvement to standard kernel dressing, called 'affine kernel dressing' (AKD), is introduced. AKD assumes an affine mapping between ensemble and verification, typically not acting on individual ensemble members but on the entire ensemble as a whole; the parameters of this mapping are determined in parallel with the other dressing parameters, including a weight assigned to the unconditioned (climatological) distribution. These amendments to standard kernel dressing, albeit simple, can improve performance significantly and are shown to be appropriate for both overdispersive and underdispersive ensembles, unlike standard kernel dressing, which exacerbates overdispersion. Studies are presented using operational numerical weather predictions for two locations and data from the Lorenz63 system, demonstrating both effectiveness given operational constraints and statistical significance given a large sample.
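A generic kernel-dressing density of the kind discussed above, with an affine map applied to the ensemble and a weighted climatological component, can be sketched as follows (an illustrative form with hypothetical parameter values, not the paper's exact AKD parameterisation):

```python
import numpy as np
from scipy.stats import norm

def dressed_density(y, ensemble, a=1.0, b=0.0, h=1.0, w_clim=0.1,
                    clim_mean=0.0, clim_std=5.0):
    """Kernel-dressed predictive density: Gaussian kernels centred on affinely
    transformed ensemble members, blended with a climatological distribution.
    In practice (a, b, h, w_clim) would be fitted by minimising a proper score
    on training data."""
    z = a * np.asarray(ensemble) + b                   # affine transform of the members
    kernels = norm.pdf(y, loc=z, scale=h).mean()       # equal-weight Gaussian kernels
    clim = norm.pdf(y, loc=clim_mean, scale=clim_std)  # unconditioned (climatological) part
    return (1.0 - w_clim) * kernels + w_clim * clim

ensemble = np.array([1.2, 0.7, 1.9, 1.4, 0.3])         # toy 5-member forecast
print(dressed_density(1.0, ensemble, a=0.9, b=0.1, h=0.6, w_clim=0.15))
```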
Abstract:
Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10(-7). Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.
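The hypergeometric enrichment test used above asks whether candidate genes are over-represented among the genes in a top quantile of association P-values; a sketch with hypothetical counts (not the study's actual numbers):

```python
from scipy.stats import hypergeom

# Hypothetical counts (the published analysis used its own gene sets and SNP-to-gene
# mapping): M genes tested in total, n of them candidates, N genes in the top 5% of
# BMI association P-values, k candidates falling in that top 5%.
M, n, N, k = 17_000, 547, 850, 33

# P(X >= k) when drawing N genes at random from M genes containing n candidates.
p_enrichment = hypergeom.sf(k - 1, M, n, N)
print(f"hypergeometric enrichment P-value: {p_enrichment:.3f}")
```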