984 results for LIKELIHOOD RATIO TESTS
Abstract:
I introduce the new mgof command to compute distributional tests for discrete (categorical, multinomial) variables. The command supports large-sample tests for complex survey designs and exact tests for small samples as well as classic large-sample χ²-approximation tests based on Pearson’s X², the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read, 1984, Journal of the Royal Statistical Society, Series B (Methodological) 46: 440–464). The complex survey correction is based on the approach by Rao and Scott (1981, Journal of the American Statistical Association 76: 221–230) and parallels the survey design correction used for independence tests in svy: tabulate. mgof computes the exact tests by using Monte Carlo methods or exhaustive enumeration. mgof also provides an exact one-sample Kolmogorov–Smirnov test for discrete data.
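To make the power-divergence family concrete, here is a minimal Python sketch of the Cressie-Read statistic. It is an illustration only, not the mgof implementation: the function name `power_divergence` and the restriction to λ > -1 are assumptions made for this example, and the observed and expected counts are assumed to have the same total. With λ = 1 the statistic reduces to Pearson's X², and the limit λ → 0 gives the likelihood-ratio statistic G².

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power-divergence statistic (illustrative sketch).

    Assumes sum(observed) == sum(expected), all expected counts > 0,
    and lam > -1. lam = 1 gives Pearson's X2; lam = 0 gives G2.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if lam <= -1:
        raise ValueError("this sketch only handles lam > -1")
    # Cells with an observed count of zero contribute nothing when lam > -1.
    pairs = [(o, e) for o, e in zip(observed, expected) if o > 0]
    if abs(lam) < 1e-12:
        # Limiting case lam -> 0: likelihood-ratio statistic G2.
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs)
    scale = 2.0 / (lam * (lam + 1.0))
    return scale * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)


if __name__ == "__main__":
    obs = [30, 50, 20]           # hypothetical observed counts
    exp = [25.0, 50.0, 25.0]     # hypothetical expected counts
    print("Pearson X2:", power_divergence(obs, exp, 1.0))
    print("G2:        ", power_divergence(obs, exp, 0.0))
    print("CR (2/3):  ", power_divergence(obs, exp, 2.0 / 3.0))
```

Under the null hypothesis, each member of the family is compared with the same large-sample χ² reference distribution, so the choice of λ mainly affects small-sample behaviour.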
Abstract:
mgof computes goodness-of-fit tests for the distribution of a discrete (categorical, multinomial) variable. The default is to perform classical large-sample chi-squared approximation tests based on Pearson's X² statistic and the log likelihood ratio (G²) statistic or a statistic from the Cressie-Read family. Alternatively, mgof computes exact tests using Monte Carlo methods or exhaustive enumeration. A Kolmogorov-Smirnov test for discrete data is also provided. The moremata package, also available from SSC, is required.
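The Monte Carlo route to an exact test can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the algorithm mgof uses: the function names, the default number of replications, the fixed seed, and the (hits + 1) / (reps + 1) p-value convention are all choices made for this example.

```python
import random


def pearson_x2(counts, probs):
    """Pearson's X2 for observed counts against null cell probabilities."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def monte_carlo_gof(counts, probs, reps=2000, seed=12345):
    """Monte Carlo p-value for a multinomial goodness-of-fit test.

    Draws `reps` multinomial samples under the null and reports the share
    of simulated statistics at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = sum(counts)
    categories = list(range(len(probs)))
    observed_stat = pearson_x2(counts, probs)
    hits = 0
    for _ in range(reps):
        simulated = [0] * len(probs)
        for j in rng.choices(categories, weights=probs, k=n):
            simulated[j] += 1
        # Small tolerance guards against floating-point ties.
        if pearson_x2(simulated, probs) >= observed_stat - 1e-12:
            hits += 1
    return (hits + 1) / (reps + 1)


if __name__ == "__main__":
    # Hypothetical data: 20 observations over three equally likely cells.
    print(monte_carlo_gof([12, 5, 3], [1 / 3, 1 / 3, 1 / 3]))
```

Exhaustive enumeration replaces the random draws with a sum over every possible table of counts, which is only feasible when the sample and the number of categories are small.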
Abstract:
A new Stata command called -mgof- is introduced. The command is used to compute distributional tests for discrete (categorical, multinomial) variables. Apart from classic large sample $\chi^2$-approximation tests based on Pearson's $X^2$, the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read 1984), large sample tests for complex survey designs and exact tests for small samples are supported. The complex survey correction is based on the approach by Rao and Scott (1981) and parallels the survey design correction used for independence tests in -svy:tabulate-. The exact tests are computed using Monte Carlo methods or exhaustive enumeration. An exact Kolmogorov-Smirnov test for discrete data is also provided.
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
This work develops a methodology for using the chemical characteristics of tire traces to help answer the following question: "Is the offending tire at the origin of the trace found at the crime scene?" The methodology extends from sampling the trace on the road to the statistical analysis of its chemical characteristics. Knowledge of the composition and manufacture of tire treads, together with a review of the instrumental techniques used to analyse polymeric materials, led to the selection of pyrolysis coupled to a gas chromatograph with a mass spectrometry detector (Py-GC/MS) as the analytical technique for this research. An analytical method was developed and optimized to obtain the lowest variability between replicates of the same sample. Within-variability of the tread was evaluated across its width and circumference using several samples taken from twelve tires of different brands and/or models. The variability within each tread (within-variability) and between treads (between-variability) could be quantified. The statistical methods applied showed that within-variability is lower than between-variability, which made it possible to differentiate these tires. Ten tire traces were produced by braking tests with tires of different brands and/or models. These traces were adequately sampled using sheets of gelatine. Particles from each trace were analysed using the same methodology as for the tires at their origin. The general chemical profile of a trace or of a tire was characterized by eighty-six compounds. A statistical comparison of the chemical profiles showed that a tire trace is not differentiable from the tire at its origin but is generally differentiable from tires that are not at its origin. Thereafter, a sample of sixty tires was analysed to assess the discrimination potential of the methodology. The statistical results showed that most tires of different brands and models are differentiable, so the methodology has good discrimination potential. However, tires of the same brand and model with identical PTD characteristics (i.e. country of manufacture, size and DOT number) are not differentiable. A model based on a likelihood ratio approach was chosen to evaluate the results of the comparisons between the chemical profiles of the traces and tires. The methodology was finally tested blind using three simulated scenarios. Each scenario involved a trace from an unknown tire and two tires suspected of being at its origin. The correct results obtained for the three scenarios validated the methodology. The different steps of this work provided the information needed to test and validate the underlying assumption that a statistical comparison of chemical profiles can help determine whether an offending tire is or is not at the origin of a trace. This aid was formalized by a measure of the probative value of the evidence, which is represented by the chemical profile of the tire trace.
Abstract:
We validated the polymerase chain reaction (PCR) against a composite reference standard in 61 patients clinically suspected of having mucosal leishmaniasis, 36 of whom were cases and 25 non-cases according to this reference standard. Patient classification and test application were carried out independently by two blinded observers. One pair of primers was used to amplify a 120 bp fragment in the conserved region of kDNA and another pair was used to amplify the internal transcribed spacer (ITS) rDNA. When the kDNA molecular target was amplified, PCR showed 68.6% (95% CI 59.2-72.6) sensitivity and 92% (95% CI 78.9-97.7) specificity; positive likelihood ratio: 8.6 (95% CI 2.8-31.3) and negative likelihood ratio: 0.3 (95% CI 0.3-0.5). The test was more sensitive with this target than with the ITS rDNA molecular target, which showed 40% (95% CI 31.5-42.3) sensitivity and 96% (95% CI 84.1-99.3) specificity; positive likelihood ratio: 10 (95% CI 2.0-58.8) and negative likelihood ratio: 0.6 (95% CI 0.6-0.8). The inter-observer agreement was excellent for both tests. Based on these results and on the low performance of conventional methods for diagnosing mucosal leishmaniasis, we consider PCR with kDNA as the molecular target to be a useful diagnostic test, and the ITS rDNA molecular target to be useful when the aim is to identify species.
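The likelihood ratios reported above follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short Python sketch below recomputes the point estimates from the abstract's figures; the function name is hypothetical, and confidence intervals are not reproduced because the abstract does not give the underlying counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test.

    Both arguments are proportions in [0, 1]; specificity must lie
    strictly between 0 and 1 so that neither ratio divides by zero.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Point estimates taken from the abstract.
    for target, sens, spec in [("kDNA", 0.686, 0.92), ("ITS rDNA", 0.40, 0.96)]:
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{target}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.1f}")
```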
Abstract:
The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq, and q² respectively, where p is the allele frequency of A, and q = 1-p. There are many statistical tests being used to check whether empirical marker data obeys the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's Exact test, and exact tests in combination with Monte Carlo and Markov Chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample space for the use of graphics in HWE tests, in particular for the ternary plot. Nowadays, many genetical studies are using genetical markers known as Single Nucleotide Polymorphisms (SNPs). SNP data comes in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically "close" to the parabola, whereas compositions that differ significantly from HWE are "far". By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. This way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to nice graphical representations where large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE (implemented in R software) will be shown, using SNP data from different human populations.
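As a small numerical companion to the tests listed above, the following Python sketch computes the classical chi-square statistic for HWE (without continuity correction) from genotype counts. It is an illustrative example, not the authors' R implementation; the function name and the sample counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg equilibrium.

    Returns (p, expected_counts, x2), where p is the estimated frequency
    of allele A and expected_counts follow n*p^2, 2*n*p*q, n*q^2.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype count must be positive")
    p = (2 * n_aa + n_ab) / (2 * n)   # allele frequency of A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Cells with zero expectation (monomorphic markers) are skipped.
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, x2


if __name__ == "__main__":
    p, expected, x2 = hwe_chi_square(30, 40, 30)  # hypothetical counts
    print("p =", p, "expected =", expected, "X2 =", x2)
```

The statistic is referred to a χ² distribution with one degree of freedom, whose 5% critical value is about 3.84. Expressing the same rejection rule in terms of the heterozygote frequency is what yields the acceptance region around the HWE parabola in the ternary plot.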
Abstract:
SUMMARY The diagnosis of tuberculous infection relies essentially on the tuberculin skin test (Mantoux test). However, its result is also influenced by other factors, the most important being vaccination with bacillus Calmette-Guérin (BCG), an interaction that has been known for many years. It is generally accepted that vaccination can produce positive reactions up to an induration diameter of 15 mm. Beyond that, a positive test is generally attributed to primary tuberculous infection. Few studies have really examined the question. For health care staff who undergo repeated Mantoux tests, this notion is particularly important for correctly interpreting a strongly positive reaction in the absence of tuberculosis risk factors, in a country with low tuberculosis endemicity. Our study sought to determine whether the transverse diameter of the Mantoux induration is a reliable criterion for distinguishing positivity associated with tuberculous infection from positivity associated with past vaccination. It set out to find a threshold beyond which tuberculous infection could be considered probable. Between January 1991 and March 1998, all new employees of the CHUV were invited to receive a tuberculin test at their entry visit to the staff medical service. If the response was negative, a second test was performed one week later to detect a possible booster effect. At the first visit, the nurse completed a questionnaire covering the usual demographic data and information on factors that can influence test positivity, notably history of BCG vaccination, exposure to tuberculosis, and history of tuberculous infection. Among the 5117 subjects included in the study, we found that the influence of vaccination varied with age. In subjects under 40 years of age, BCG vaccination was the most important predictor of a positive Mantoux below 18 mm, far stronger than the usual risk factors for tuberculous infection, which were also significant. The effect of BCG was present for reactions up to 20 mm. For Mantoux reactions larger than 20 mm, the odds ratio (OR) for BCG remained clearly elevated (above 3.4) although not significant. By contrast, for employees over 40 years of age, BCG was a predictive factor for tests larger than 10 mm (OR 2.4) but was no longer a significant factor for sizes above 15 mm. These results show that even a strongly positive tuberculin test must be interpreted with caution and discernment. Indeed, our study demonstrates that in vaccinated subjects under 40 years of age, in areas of low tuberculosis endemicity and particularly in the absence of risk factors for tuberculous infection, a positive Mantoux up to 18 mm is most probably due to past BCG vaccination rather than to infection with M. tuberculosis. Interpretation of Mantoux reactions smaller than 18 mm, and of Mantoux tests performed in subjects under 40 years of age, must take prior BCG vaccination into account. Consequently, a strongly positive Mantoux reaction should not systematically lead to preventive treatment. The lack of specificity of the Mantoux test, used for tuberculosis screening for nearly a hundred years, is a known problem. We demonstrate that the size of the induration cannot be used reliably as a criterion for identifying tuberculous infection in a person vaccinated with BCG, with the risk of overtreating a substantial number of subjects. In our study, 21% of subjects had a Mantoux of 15 mm or more and would have had to be treated under the recommendations in force in Switzerland if prior BCG vaccination were not taken into account. More specific tests are currently under study and will probably, in the future, make it possible to overcome the lack of specificity of the Mantoux test.
Abstract: Background. Previous bacillus Calmette-Guérin (BCG) vaccination can confound the results of a tuberculin skin test (TST). We sought to determine a cutoff diameter of TST induration beyond which the influence of BCG vaccination was negligible in evaluating potential Mycobacterium tuberculosis infection in a population of health care workers with a high vaccination rate and low incidence of tuberculosis. Methods. From 1991 through 1998, all new employees at the University Hospital of Lausanne, Switzerland, underwent a 2-step TST at entry visit. We also gathered information on demographic characteristics, along with factors commonly associated with tuberculin positivity, including previous BCG vaccination, history of latent M. tuberculosis infection, and predictors for M. tuberculosis infection. Results. Among the 5117 investigated subjects, we found that the influence of BCG vaccination on TST results varied across categories of age (likelihood ratio test, 0.0001). Prior BCG vaccination had a strong influence on skin test results of ≤18 mm in diameter among persons <40 years old, compared with the influence of factors predictive of M. tuberculosis infection. Prior latent M. tuberculosis infection and travel or employment in a country in which tuberculosis is endemic also had significant influences. Conclusions. Interpretation of TST reactions of ≤18 mm among BCG-vaccinated persons <40 years of age must be done with caution in areas with a low incidence of tuberculosis. In such a population, except for persons who have never been vaccinated, TST reactions of ≤18 mm are more likely to be the result of prior vaccination than infection and should not systematically lead to preventive treatment.
Abstract:
This paper analyzes whether standard covariance matrix tests work when dimensionality is large, and in particular larger than sample size. In the latter case, the singularity of the sample covariance matrix makes likelihood ratio tests degenerate, but other tests based on quadratic forms of sample covariance matrix eigenvalues remain well-defined. We study the consistency property and limiting distribution of these tests as dimensionality and sample size go to infinity together, with their ratio converging to a finite non-zero limit. We find that the existing test for sphericity is robust against high dimensionality, but not the test for equality of the covariance matrix to a given matrix. For the latter test, we develop a new correction to the existing test statistic that makes it robust against high dimensionality.
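The degeneracy can be seen directly from the textbook form of the Gaussian likelihood ratio statistic for sphericity. The notation below is assumed for illustration and is not taken from the paper: $\hat\Sigma$ is the $p \times p$ sample covariance matrix with eigenvalues $l_1, \dots, l_p$, and $n$ is the sample size. The second statistic is one example of an eigenvalue-based quadratic form; the abstract does not say which statistics the paper studies.

```latex
% Likelihood ratio statistic for H_0: \Sigma = \sigma^2 I (assumed textbook form)
-2\log\Lambda
  = -n\left[\log\det\hat\Sigma
            - p\,\log\!\left(\tfrac{1}{p}\operatorname{tr}\hat\Sigma\right)\right]

% If p > n, then \operatorname{rank}(\hat\Sigma) < p, so \det\hat\Sigma = 0
% and -2\log\Lambda = +\infty for every sample: the test is degenerate.

% A quadratic form in the eigenvalues stays finite (illustrative example):
U = \frac{1}{p}\operatorname{tr}\!\left[\left(
      \frac{\hat\Sigma}{\tfrac{1}{p}\operatorname{tr}\hat\Sigma} - I\right)^{2}\right]
  = \frac{1}{p}\sum_{i=1}^{p}\left(\frac{l_i}{\bar{l}} - 1\right)^{2},
\qquad \bar{l} = \frac{1}{p}\sum_{i=1}^{p} l_i
```

Because $U$ depends only on how far the eigenvalues spread around their mean, zero eigenvalues pose no problem, which is why statistics of this kind remain usable when $p$ exceeds $n$.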
Abstract:
INTRODUCTION: A clinical decision rule to improve the accuracy of a diagnosis of influenza could help clinicians avoid unnecessary use of diagnostic tests and treatments. Our objective was to develop and validate a simple clinical decision rule for diagnosis of influenza. METHODS: We combined data from 2 studies of influenza diagnosis in adult outpatients with suspected influenza: one set in California and one in Switzerland. Patients in both studies underwent a structured history and physical examination and had a reference standard test for influenza (polymerase chain reaction or culture). We randomly divided the dataset into derivation and validation groups and then evaluated simple heuristics and decision rules from previous studies and 3 rules based on our own multivariate analysis. Cutpoints for stratification of risk groups in each model were determined using the derivation group before evaluating them in the validation group. For each decision rule, the positive predictive value and likelihood ratio for influenza in low-, moderate-, and high-risk groups, and the percentage of patients allocated to each risk group, were reported. RESULTS: The simple heuristics (fever and cough; fever, cough, and acute onset) were helpful when positive but not when negative. The most useful and accurate clinical rule assigned 2 points for fever plus cough, 2 points for myalgias, and 1 point each for duration <48 hours and chills or sweats. The risk of influenza was 8% for 0 to 2 points, 30% for 3 points, and 59% for 4 to 6 points; the rule performed similarly in derivation and validation groups. Approximately two-thirds of patients fell into the low- or high-risk group and would not require further diagnostic testing. CONCLUSION: A simple, valid clinical rule can be used to guide point-of-care testing and empiric therapy for patients with suspected influenza.
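The scoring rule described in the results can be written as a short Python sketch. The point values and risk percentages come from the abstract; the function names and the mapping of score bands to the labels "low", "moderate" and "high" are assumptions made for this illustration, which is not a clinical tool.

```python
def flu_score(fever_and_cough, myalgias, onset_under_48h, chills_or_sweats):
    """Score from the abstract: 2 points for fever plus cough, 2 for
    myalgias, 1 each for duration <48 hours and chills or sweats."""
    return (2 * bool(fever_and_cough)
            + 2 * bool(myalgias)
            + 1 * bool(onset_under_48h)
            + 1 * bool(chills_or_sweats))


def flu_risk_group(score):
    """Map a score to (assumed group label, risk of influenza reported
    in the abstract for that score band)."""
    if not 0 <= score <= 6:
        raise ValueError("score must be between 0 and 6")
    if score <= 2:
        return "low", 0.08
    if score == 3:
        return "moderate", 0.30
    return "high", 0.59


if __name__ == "__main__":
    # Hypothetical patient: fever plus cough, myalgias, onset 3 days ago.
    score = flu_score(True, True, False, False)
    print(score, flu_risk_group(score))
```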
Abstract:
The HIERFSTAT package for R (the statistical software created by the R Development Core Team) estimates hierarchical F-statistics from a hierarchy with any number of levels. In addition, it tests the statistical significance of population differentiation at each of these levels, using a generalized likelihood-ratio test. The package HIERFSTAT is available at http://www.unil.ch/popgen/softwares/hierfstat.htm.
Abstract:
BACKGROUND: Pneumonia is the leading cause of death in young children in developing countries, but early diagnosis and intervention can effectively reduce mortality. We aimed to assess the diagnostic value of clinical signs and symptoms to identify radiological pneumonia in children younger than 5 years and to review the accuracy of WHO criteria for diagnosis of clinical pneumonia. METHODS: We searched Medline (PubMed), Embase (Ovid), the Cochrane Database of Systematic Reviews, and reference lists of relevant studies, without date restrictions, to identify articles assessing clinical predictors of radiological pneumonia in children. Selection was based on: design (diagnostic accuracy studies), target disease (pneumonia), participants (children aged <5 years), setting (ambulatory or hospital care), index test (clinical features), and reference standard (chest radiography). Quality assessment was based on the 2011 Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria. For each index test, we calculated sensitivity and specificity and, when the tests were assessed in four or more studies, calculated pooled estimates using a bivariate model and hierarchical summary receiver operating characteristic plots for meta-analysis. FINDINGS: We included 18 articles in our analysis. The WHO-approved signs of age-related fast breathing (six studies; pooled sensitivity 0·62, 95% CI 0·26-0·89; specificity 0·59, 0·29-0·84) and lower chest wall indrawing (four studies; 0·48, 0·16-0·82; 0·72, 0·47-0·89) showed poor diagnostic performance in the meta-analysis. Features with the highest pooled positive likelihood ratios were respiratory rate higher than 50 breaths per min (1·90, 1·45-2·48), grunting (1·78, 1·10-2·88), chest indrawing (1·76, 0·86-3·58), and nasal flaring (1·75, 1·20-2·56). 
Features with the lowest pooled negative likelihood ratio were cough (0·30, 0·09-0·96), history of fever (0·53, 0·41-0·69), and respiratory rate higher than 40 breaths per min (0·43, 0·23-0·83). INTERPRETATION: Not one clinical feature was sufficient to diagnose pneumonia definitively. Combination of clinical features in a decision tree might improve diagnostic performance, but the addition of new point-of-care tests for diagnosis of bacterial pneumonia would help to attain an acceptable level of accuracy. FUNDING: Swiss National Science Foundation.
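To see why likelihood ratios close to 1 cannot diagnose pneumonia on their own, it helps to convert a pre-test probability into a post-test probability through the odds form of Bayes' theorem (post-test odds = pre-test odds × likelihood ratio). The Python sketch below does this; the function name and the 20% pre-test probability are hypothetical, while the two likelihood ratios are the pooled point estimates quoted in the abstract.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a disease probability with a diagnostic likelihood ratio."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    pre = 0.20  # hypothetical pre-test probability of radiological pneumonia
    # Respiratory rate > 50 breaths per min present (pooled LR+ 1.90).
    print(round(post_test_probability(pre, 1.90), 3))
    # Cough absent (pooled LR- 0.30).
    print(round(post_test_probability(pre, 0.30), 3))
```

Neither finding moves the probability near 0 or 1 from this starting point, which is consistent with the abstract's conclusion that no single clinical feature is sufficient.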
Abstract:
The evaluation of forensic evidence can occur at any level within the hierarchy of propositions depending on the question being asked and the amount and type of information that is taken into account within the evaluation. Commonly DNA evidence is reported given propositions that deal with the sub-source level in the hierarchy, which deals only with the possibility that a nominated individual is a source of DNA in a trace (or contributor to the DNA in the case of a mixed DNA trace). We explore the use of information obtained from examinations, presumptive and discriminating tests for body fluids, DNA concentrations and some case circumstances within a Bayesian network in order to provide assistance to the Courts that have to consider propositions at source level. We use a scenario in which the presence of blood is of interest as an exemplar and consider how DNA profiling results and the potential for laboratory error can be taken into account. We finish with examples of how the results of these reports could be presented in court using either numerical values or verbal descriptions of the results.
Abstract:
In 2003, prostate cancer (PCa) is estimated to be the most commonly diagnosed cancer and third leading cause of cancer death in Canada. During PCa population screening, approximately 25% of patients with a normal digital rectal examination (DRE) and intermediate serum prostate specific antigen (PSA) level have PCa. Since all patients typically undergo biopsy, it is expected that approximately 75% of these procedures are unnecessary. The purpose of this study was to compare the degree of efficacy of clinical tests and algorithms in stage II screening for PCa while preventing unnecessary biopsies from occurring. The sample consisted of 201 consecutive men who were suspected of PCa based on the results of a DRE and serum PSA. These men were referred for venipuncture and transrectal ultrasound (TRUS). Clinical tests included TRUS, age-specific reference range PSA (Age-PSA), prostate specific antigen density (PSAD), and free-to-total prostate specific antigen ratio (%fPSA). Clinical results were evaluated individually and within algorithms. Cutoffs of 0.12 and 0.15 ng/ml/cc were employed for PSAD. Cutoffs that would provide a minimum sensitivity of 0.90 and 0.95, respectively, were utilized for %fPSA. Statistical analysis included ROC curve analysis, calculated sensitivity (Sens), specificity (Spec), and positive likelihood ratio (LR), with corresponding confidence intervals (CI). The %fPSA, at a 23% cutoff ({Sens=0.92; CI, 0.06}, {Spec=0.41; CI, 0.09}, {LR=1.56; CI, 0.11}), proved to be the most efficacious independent clinical test. The combination of PSAD (cutoff 0.15 ng/ml/cc) and %fPSA (cutoff 23%) ({Sens=0.93; CI, 0.06}, {Spec=0.38; CI, 0.08}, {LR=1.50; CI, 0.10}) was the most efficacious clinical algorithm. This study advocates the use of %fPSA at a cutoff of 23% when screening patients with an intermediate serum PSA and benign DRE.
Abstract:
In this paper we propose exact likelihood-based mean-variance efficiency tests of the market portfolio in the context of the Capital Asset Pricing Model (CAPM), allowing for a wide class of error distributions which includes normality as a special case. These tests are developed in the framework of multivariate linear regressions (MLR). It is well known, however, that despite their simple statistical structure, standard asymptotically justified MLR-based tests are unreliable. In financial econometrics, exact tests have been proposed for a few specific hypotheses [Jobson and Korkie (Journal of Financial Economics, 1982), MacKinlay (Journal of Financial Economics, 1987), Gibbons, Ross and Shanken (Econometrica, 1989), Zhou (Journal of Finance 1993)], most of which depend on normality. For the Gaussian model, our tests correspond to Gibbons, Ross and Shanken’s mean-variance efficiency tests. In non-Gaussian contexts, we reconsider mean-variance efficiency tests allowing for multivariate Student-t and Gaussian mixture errors. Our framework sheds more light on whether the normality assumption is too restrictive when testing the CAPM. We also propose exact multivariate diagnostic checks (including tests for multivariate GARCH and a multivariate generalization of the well-known variance ratio tests) and goodness-of-fit tests, as well as a set estimate for the intervening nuisance parameters. Our results [over five-year subperiods] show the following: (i) multivariate normality is rejected in most subperiods, (ii) residual checks reveal no significant departures from the multivariate i.i.d. assumption, and (iii) mean-variance efficiency of the market portfolio is not rejected as frequently once the possibility of non-normal errors is allowed for.