21 resultados para 010401 Applied Statistics
em CentAUR: Central Archive University of Reading - UK
Resumo:
Many time series are measured monthly, either as averages or totals, and such data often exhibit seasonal variability-the values of the series are consistently larger for some months of the year than for others. A typical series of this type is the number of deaths each month attributed to SIDS (Sudden Infant Death Syndrome). Seasonality can be modelled in a number of ways. This paper describes and discusses various methods for modelling seasonality in SIDS data, though much of the discussion is relevant to other seasonally varying data. There are two main approaches, either fitting a circular probability distribution to the data, or using regression-based techniques to model the mean seasonal behaviour. Both are discussed in this paper.
Resumo:
In this paper we consider the estimation of population size from onesource capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, . . . contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution.We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
A hierarchical Bayesian model for predicting the functional consequences of amino-acid polymorphisms
Resumo:
Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information of the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with both a BMARS and a frequen-tist MARS model, a support vector machine classifier and an optimally pruned classification tree.
Resumo:
The paper concerns the design and analysis of serial dilution assays to estimate the infectivity of a sample of tissue when it is assumed that the sample contains a finite number of indivisible infectious units such that a subsample will be infectious if it contains one or more of these units. The aim of the study is to estimate the number of infectious units in the original sample. The standard approach to the analysis of data from such a study is based on the assumption of independence of aliquots both at the same dilution level and at different dilution levels, so that the numbers of infectious units in the aliquots follow independent Poisson distributions. An alternative approach is based on calculation of the expected value of the total number of samples tested that are not infectious. We derive the likelihood for the data on the basis of the discrete number of infectious units, enabling calculation of the maximum likelihood estimate and likelihood-based confidence intervals. We use the exact probabilities that are obtained to compare the maximum likelihood estimate with those given by the other methods in terms of bias and standard error and to compare the coverage of the confidence intervals. We show that the methods have very similar properties and conclude that for practical use the method that is based on the Poisson assumption is to be recommended, since it can be implemented by using standard statistical software. Finally we consider the design of serial dilution assays, concluding that it is important that neither the dilution factor nor the number of samples that remain untested should be too large.
Resumo:
Survival times for the Acacia mangium plantation in the Segaliud Lokan Project, Sabah, East Malaysia were analysed based on 20 permanent sample plots (PSPs) established in 1988 as a spacing experiment. The PSPs were established following a complete randomized block design with five levels of spacing randomly assigned to units within four blocks at different sites. The survival times of trees in years are of interest. Since the inventories were only conducted annually, the actual survival time for each tree was not observed. Hence, the data set comprises censored survival times. Initial analysis of the survival of the Acacia mangium plantation suggested there is block by spacing interaction; a Weibull model gives a reasonable fit to the replicate survival times within each PSP; but a standard Weibull regression model is inappropriate because the shape parameter differs between PSPs. In this paper we investigate the form of the non-constant Weibull shape parameter. Parsimonious models for the Weibull survival times have been derived using maximum likelihood methods. The factor selection for the parameters is based on a backward elimination procedure. The models are compared using likelihood ratio statistics. The results suggest that both Weibull parameters depend on spacing and block.
Resumo:
Objective: To assess the effect on growth and iron status in preterm infants of a specially devised weaning strategy compared with current best practices in infant feeding. The preterm weaning strategy recommended the early onset of weaning and the use of foods with a higher energy and protein content than standard milk formula, and foods that are rich sources of iron and zinc. Subjects and design: In a blinded, controlled study, 68 preterm infants (mean (SD) birth weight 1470 (430) g and mean (SD) gestational age 31.3 (2.9) weeks) were randomised to either the preterm weaning strategy group (n = 37) or a current best practice control group (n = 31), from hospital discharge until 1 year gestation corrected age (GCA). Main outcome measures: Weight, supine length, occipitofrontal head circumference, and intakes of energy, protein, and minerals were determined at 0, 6, and 12 months GCA. Levels of haemoglobin, serum iron, and serum ferritin were assayed at 0 and 6 months GCA. Results: Significant positive effects of treatment included: greater increase in standard deviation length scores and length growth velocity; increased intake of energy, protein, and carbohydrate at 6 months GCA and iron at 12 months GCA; increased haemoglobin and serum iron levels at 6 months GCA. Conclusions: The preterm weaning strategy significantly influenced dietary intakes with consequent beneficial effects on growth in length and iron status. This strategy should be adopted as the basis of feeding guidelines for preterm infants after hospital discharge. School of Applied Statistics Faculty of Life Sciences
Resumo:
Many recent papers have documented periodicities in returns, return volatility, bid–ask spreads and trading volume, in both equity and foreign exchange markets. We propose and employ a new test for detecting subtle periodicities in time series data based on a signal coherence function. The technique is applied to a set of seven half-hourly exchange rate series. Overall, we find the signal coherence to be maximal at the 8-h and 12-h frequencies. Retaining only the most coherent frequencies for each series, we implement a trading rule that is based on these observed periodicities. Our results demonstrate in all cases except one that, in gross terms, the rules can generate returns that are considerably greater than those of a buy-and-hold strategy, although they cannot retain their profitability net of transactions costs. We conjecture that this methodology could constitute an important tool for financial market researchers which will enable them to detect, quantify and rank the various periodic components in financial data better.