550 results for SMOOTHING SPLINES
Abstract:
In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion (AIC) have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is no longer an asymptotically unbiased estimator of the Akaike information, and in fact favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that leads to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. An implementation in an R package is provided. All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia.
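For reference, the two criteria contrasted here have the following standard forms for a linear mixed model y = Xβ + Zb + ε (textbook definitions, not reproduced from the paper itself):

\[
\mathrm{mAIC} = -2\log f(y \mid \hat\beta, \hat\theta) + 2(p + q), \qquad
\mathrm{cAIC} = -2\log f(y \mid \hat\beta, \hat b, \hat\theta) + 2\rho,
\]

where f(y | \hat\beta, \hat\theta) is the marginal likelihood with the random effects b integrated out, f(y | \hat\beta, \hat b, \hat\theta) is the conditional likelihood evaluated at the predicted random effects, p and q count the fixed-effect and covariance parameters, and \rho is an effective-degrees-of-freedom term. The correction derived in the paper concerns how \rho should account for the estimation of the random-effects covariance matrix.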
Abstract:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package CRLMM available at Bioconductor (http://www.bioconductor.org).
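As a loose illustration of what shrinkage of locus-level uncertainty can look like (a generic empirical-Bayes form, not necessarily the estimator implemented in CRLMM): a raw locus- and batch-specific variance estimate s_\ell^2 with d_\ell degrees of freedom can be moderated toward a prior value s_0^2,

\[
\tilde s_\ell^2 = \frac{d_0 s_0^2 + d_\ell s_\ell^2}{d_0 + d_\ell},
\]

so that loci or batches with few observations borrow strength from the rest of the array and extreme variance estimates are pulled toward the centre.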
Abstract:
Numerous time series studies have provided strong evidence of an association between increased levels of ambient air pollution and increased levels of hospital admissions, typically at 0, 1, or 2 days after an air pollution episode. An important research aim is to extend existing statistical models so that a more detailed understanding of the time course of hospitalization after exposure to air pollution can be obtained. Information about this time course, combined with prior knowledge about biological mechanisms, could provide the basis for hypotheses concerning the mechanism by which air pollution causes disease. Previous studies have identified two important methodological questions: (1) How can we estimate the shape of the distributed lag between increased air pollution exposure and increased mortality or morbidity? and (2) How should we estimate the cumulative population health risk from short-term exposure to air pollution? Distributed lag models are appropriate tools for estimating air pollution health effects that may be spread over several days. However, estimation for distributed lag models in air pollution and health applications is hampered by the substantial noise in the data and the inherently weak signal that is the target of investigation. We introduce a hierarchical Bayesian distributed lag model that incorporates prior information about the time course of pollution effects and combines information across multiple locations. The model has a connection to penalized spline smoothing using a special type of penalty matrix. We apply the model to estimating the distributed lag between exposure to particulate matter air pollution and hospitalization for cardiovascular and respiratory disease using data from a large United States air pollution and hospitalization database of Medicare enrollees in 94 counties covering the years 1999-2002.
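For orientation, a single-location version of such a model can be written as follows (generic notation chosen here, not taken from the paper): with daily admission counts Y_t and pollution series x_t,

\[
Y_t \sim \mathrm{Poisson}(\mu_t), \qquad
\log \mu_t = \text{confounders}_t + \sum_{\ell=0}^{L} \theta_\ell\, x_{t-\ell},
\]

where \theta_0, \dots, \theta_L form the distributed lag curve and \sum_{\ell=0}^{L} \theta_\ell is the cumulative effect of a unit increase in exposure. Placing a Gaussian prior on \theta with a structured precision (penalty) matrix that discourages rough lag curves is what creates the connection to penalized spline smoothing, and a hierarchical layer ties the lag curves together across locations.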
Abstract:
Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMMs) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve previous approaches that utilize an HMM framework for inference in high-throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
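The dependence structure being exploited is the usual hidden Markov factorization, shown here in generic notation rather than as the ICE implementation:

\[
P(q_{1:T}, o_{1:T}) = \pi_{q_1} B_{q_1}(o_1) \prod_{t=2}^{T} A_{q_{t-1} q_t}\, B_{q_t}(o_t),
\]

where the hidden state q_t encodes the copy number and genotype at SNP t, the transition matrix A captures dependence between neighbouring SNPs, and B_{q_t}(o_t) is the emission probability of the observed call and intensity. One natural way to integrate confidence scores, in the spirit of the abstract, is to let low-confidence observations contribute flatter emission terms so that their states are determined more by the neighbouring SNPs, which produces the kind of smoothing behaviour described above.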
Abstract:
Recurrent event data are largely characterized by the rate function, but smoothing techniques for estimating the rate function have never been rigorously developed or studied in the statistical literature. This paper considers the moment and least squares methods for estimating the rate function from recurrent event data. With an independent censoring assumption on the recurrent event process, we study statistical properties of the proposed estimators and propose bootstrap procedures for the bandwidth selection and for the approximation of confidence intervals in the estimation of the occurrence rate function. It is shown that the moment method, without resmoothing via a smaller bandwidth, produces a curve with nicks occurring at the censoring times, whereas there is no such problem with the least squares method. Furthermore, the asymptotic variance of the least squares estimator is shown to be smaller under regularity conditions. However, in the implementation of the bootstrap procedures, the moment method is computationally more efficient than the least squares method because the former approach uses condensed bootstrap data. The performance of the proposed procedures is studied through Monte Carlo simulations and an epidemiological example on intravenous drug users.
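As a sketch of a moment-type estimator of the occurrence rate (a kernel-smoothed Nelson-Aalen form; the paper's estimators may differ in detail), with event times T_{ij} for subject i and censoring times C_k,

\[
\hat\lambda_h(t) = \sum_{i=1}^{n} \sum_{j} \frac{K_h(t - T_{ij})}{\sum_{k=1}^{n} \mathbf{1}\{C_k \ge T_{ij}\}},
\qquad K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right),
\]

with the bandwidth h chosen by the bootstrap procedures referred to above. The abrupt changes in the risk set at the censoring times are what give rise to the nicks mentioned for the moment method, and a second smoothing pass with a smaller bandwidth irons them out.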
Abstract:
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
Abstract:
Visual fixation is employed by humans and some animals to keep a specific 3D location at the center of the visual gaze. Inspired by this phenomenon in nature, this paper explores the idea of transferring this mechanism to the context of video stabilization for a handheld video camera. A novel approach is presented that stabilizes a video by fixating on automatically extracted 3D target points. This approach is different from existing automatic solutions that stabilize the video by smoothing. To determine the 3D target points, the recorded scene is analyzed with a state-of-the-art structure-from-motion algorithm, which estimates camera motion and reconstructs a 3D point cloud of the static scene objects. Special algorithms are presented that search for either virtual or real 3D target points, which back-project close to the center of the image for as long a period of time as possible. The stabilization algorithm then transforms the original images of the sequence so that these 3D target points are kept exactly in the center of the image, which, in the case of real 3D target points, produces a perfectly stable result at the image center. Furthermore, different methods of additional user interaction are investigated. It is shown that the stabilization process can easily be controlled and that it can be combined with state-of-the-art tracking techniques in order to obtain a powerful image stabilization tool. The approach is evaluated on a variety of videos taken with a hand-held camera in natural scenes.
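A minimal sketch of the fixation constraint under an idealized rotation-only warp (not the paper's full algorithm): if a real 3D target point X projects into frame t at pixel x_t, the frame can be warped by the homography of a virtual camera rotation about the optical centre,

\[
H_t = K R_t^{\mathrm{corr}} K^{-1}, \qquad H_t\, x_t \propto c,
\]

where K is the intrinsic camera matrix, c = (c_x, c_y, 1)^\top is the image centre in homogeneous coordinates, and R_t^{corr} is chosen so that the viewing ray towards X is mapped onto the optical axis. Applying H_t to each frame keeps the target point at the image centre, which is the sense in which fixating on a real 3D point yields a perfectly stable result there.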
Abstract:
BACKGROUND Results of epidemiological studies linking census with mortality records may be affected by unlinked deaths and changes in cause of death classification. We examined these issues in the Swiss National Cohort (SNC). METHODS The SNC is a longitudinal study of the entire Swiss population, based on the 1990 (6.8 million persons) and 2000 (7.3 million persons) censuses. Among 1,053,393 deaths recorded in 1991-2007, 5.4% could not be linked using stringent probabilistic linkage. We included the unlinked deaths using pragmatic linkages and compared mortality rates for selected causes with official mortality rates. We also examined the impact of the 1995 change in cause of death coding from version 8 (with some additional rules) to version 10 of the International Classification of Diseases (ICD), using Poisson regression models with restricted cubic splines. Finally, we compared results from Cox models, including and excluding unlinked deaths, of the association of education, marital status, and nationality with selected causes of death. RESULTS SNC mortality rates underestimated all-cause mortality by 9.6% (range 2.4%-17.9%) in the 85+ population. Underestimation was less pronounced in years nearer the censuses and in the 75-84 age group. After including 99.7% of unlinked deaths, annual all-cause SNC mortality rates closely matched official rates (relative difference between -1.4% and +1.8%). In the 85+ population the rates for prostate and breast cancer dropped, by 16% and 21% respectively, between 1994 and 1995, coincident with the change in cause of death coding policy. For suicide in males, almost no change was observed. Hazard ratios were only negligibly affected by including the unlinked deaths. A sudden decrease in breast (21% less, 95% confidence interval: 12%-28%) and prostate (16% less, 95% confidence interval: 7%-23%) cancer mortality rates in the 85+ population coincided with the 1995 change in cause of death coding policy. CONCLUSIONS Unlinked deaths bias analyses of absolute mortality rates downwards but have little effect on relative mortality. To describe time trends of cause-specific mortality in the SNC, accounting for the unlinked deaths and for the possible effect of the change in death certificate coding was necessary.
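The coding-change analysis can be pictured with a model of the following generic form (illustrative notation, not the paper's exact specification): for deaths D_y from a given cause in year y with person-years P_y,

\[
D_y \sim \mathrm{Poisson}(\mu_y), \qquad
\log \mu_y = \log P_y + s(y) + \gamma\, \mathbf{1}\{y \ge 1995\},
\]

where s(y) is a restricted cubic spline capturing the smooth secular trend in the cause-specific rate and \gamma measures a level shift coincident with the 1995 change in cause-of-death coding; under such a model, a clearly non-zero \gamma for breast and prostate cancer in the 85+ population, but not for suicide, would correspond to the pattern reported above.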
Abstract:
Stochastic models for three-dimensional particles have many applications in the applied sciences. Lévy-based particle models are a flexible approach to particle modelling. The structure of the random particles is given by a kernel smoothing of a Lévy basis. The models are easy to simulate, but statistical inference procedures have not yet received much attention in the literature. The kernel is not always identifiable and we suggest one approach to remedy this problem. We propose a method to draw inference about the kernel from data often used in local stereology and study the performance of our approach in a simulation study.
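A typical construction of this kind is a random radial function obtained by kernel smoothing (generic notation; the paper may use a transformed version):

\[
R(u) = \int_{\mathcal{S}} k(u, v)\, Z(\mathrm{d}v), \qquad u \in \mathcal{S},
\]

where Z is a Lévy basis on an index set \mathcal{S} such as the unit sphere, k is the smoothing kernel, and R(u) is the particle's radius in direction u. The identifiability problem mentioned above reflects the fact that different kernels can induce the same distribution of R, so additional structure or constraints on k are needed before it can be estimated from stereological data.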
Abstract:
BACKGROUND Lead exposure is associated with low birth weight. The objective of this study is to determine whether lead exposure is associated with lower body weight in children, adolescents and adults. METHODS We analyzed data from NHANES 1999-2006 for participants aged ≥3 years using multiple logistic and multivariate linear regression. Using age- and sex-standardized BMI Z-scores, overweight and obese children (ages 3-19) were classified by BMI ≥85th and ≥95th percentiles, respectively. The adult population (age ≥20) was classified as overweight and obese with BMI measures of 25-29.9 and ≥30, respectively. Blood lead level (BLL) was categorized by weighted quartiles. RESULTS Multivariate linear regressions revealed a lower BMI Z-score in children and adolescents when the highest lead quartile was compared to the lowest lead quartile (β (SE)=-0.33 (0.07), p<0.001), and a decreased BMI in adults (β (SE)=-2.58 (0.25), p<0.001). Multiple logistic analyses in children and adolescents found lower odds of obesity and of overweight for BLL in the highest quartile compared to the lowest quartile (OR=0.42, 95% CI: 0.30-0.59; and OR=0.67, 95% CI: 0.52-0.88, respectively). Adults in the highest lead quartile were less likely to be obese (OR=0.42, 95% CI: 0.35-0.50) compared to those in the lowest lead quartile. Further analyses modelling blood lead with restricted cubic splines confirmed the dose-response relationship between blood lead and body weight outcomes. CONCLUSIONS Higher BLLs are associated with lower body mass index and lower odds of overweight and obesity in children, adolescents and adults.
Abstract:
The considerable search for synergistic agents in cancer research is motivated by the therapeutic benefits achieved by combining anti-cancer agents. Synergistic agents make it possible to reduce dosage while maintaining or enhancing a desired effect. Other favorable outcomes of synergistic agents include reduction in toxicity and minimizing or delaying drug resistance. Dose-response assessment and drug-drug interaction analysis play an important part in the drug discovery process; however, such analyses are often poorly done. This dissertation is an effort to notably improve dose-response assessment and drug-drug interaction analysis. The most commonly used method in published analyses is the Median-Effect Principle/Combination Index method (Chou and Talalay, 1984). The Median-Effect Principle/Combination Index method leads to inefficiency by ignoring important sources of variation inherent in dose-response data and discarding data points that do not fit the Median-Effect Principle. Previous work has shown that the conventional method yields a high rate of false positives (Boik, Boik, Newman, 2008; Hennessey, Rosner, Bast, Chen, 2010) and, in some cases, low power to detect synergy. There is a great need for improving the current methodology. We developed a Bayesian framework for dose-response modeling and drug-drug interaction analysis. First, we developed a hierarchical meta-regression dose-response model that accounts for various sources of variation and uncertainty and allows one to incorporate knowledge from prior studies into the current analysis, thus offering a more efficient and reliable inference. Second, in the case that parametric dose-response models do not fit the data, we developed a practical and flexible nonparametric regression method for meta-analysis of independently repeated dose-response experiments. Third, and lastly, we developed a method, based on Loewe additivity, that allows one to quantitatively assess the interaction between two agents combined at a fixed dose ratio. The proposed method gives a comprehensive and honest account of uncertainty within drug interaction assessment. Extensive simulation studies show that the novel methodology improves the screening process for effective/synergistic agents and reduces the incidence of type I error. We consider an ovarian cancer cell line study that investigates the combined effect of DNA methylation inhibitors and histone deacetylation inhibitors in human ovarian cancer cell lines. The hypothesis is that the combination of DNA methylation inhibitors and histone deacetylation inhibitors will enhance antiproliferative activity in human ovarian cancer cell lines compared to treatment with each inhibitor alone. By applying the proposed Bayesian methodology, in vitro synergy was declared for the DNA methylation inhibitor 5-AZA-2'-deoxycytidine combined with a histone deacetylation inhibitor, suberoylanilide hydroxamic acid or trichostatin A, in the cell lines HEY and SKOV3. This suggests potential new epigenetic therapies for cell growth inhibition of ovarian cancer cells.
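For reference, the quantities involved have the following standard forms (textbook definitions rather than the dissertation's notation). Loewe additivity assesses a combination (d_1, d_2) producing effect y through the interaction index

\[
\tau = \frac{d_1}{D_{y,1}} + \frac{d_2}{D_{y,2}},
\]

where D_{y,i} is the dose of drug i alone producing the same effect y; \tau < 1 indicates synergy, \tau = 1 additivity and \tau > 1 antagonism. The Median-Effect Principle assumes dose-response curves of the form f_a / f_u = (D / D_m)^m, with f_a the fraction affected, f_u = 1 - f_a, D_m the median-effect dose and m a slope parameter; the Combination Index is the interaction index computed under this parametric assumption, which is why departures from the Median-Effect Principle propagate into the synergy call.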
Abstract:
One common assumption in interpreting ice-core CO2 records is that diffusion in the ice does not affect the concentration profile. However, this assumption remains untested because the extremely small CO2 diffusion coefficient in ice has not been accurately determined in the laboratory. In this study we take advantage of high levels of CO2 associated with refrozen layers in an ice core from Siple Dome, Antarctica, to study CO2 diffusion rates. We use noble gases (Xe/Ar and Kr/Ar), electrical conductivity and Ca^2+ ion concentrations to show that substantial CO2 diffusion may occur in ice on timescales of thousands of years. We estimate that the permeation coefficient for CO2 in ice is ~4 × 10^-21 mol m^-1 s^-1 Pa^-1 at -23 °C in the top 287 m (corresponding to 2.74 kyr). Smoothing of the CO2 record by diffusion at this depth/age is one or two orders of magnitude smaller than the smoothing in the firn. However, simulations for depths of ~930-950 m (~60-70 kyr) indicate that smoothing of the CO2 record by diffusion in deep ice is comparable to smoothing in the firn. Other types of diffusion (e.g. via liquid in ice grain boundaries or veins) may also be important, but their influence has not been quantified.
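A simple way to picture the scale of the effect (a textbook diffusion result, not a calculation from the paper): diffusion with effective diffusivity D acting over a time t smooths a concentration profile approximately as convolution with a Gaussian of standard deviation

\[
\sigma \approx \sqrt{2 D t},
\]

so only variations in the CO2 record on length scales comparable to or shorter than \sigma are appreciably attenuated. Converting the quoted permeation coefficient into an effective diffusivity additionally requires the solubility of CO2 in ice, since permeability is the product of diffusivity and solubility.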
Abstract:
BACKGROUND Rising levels of overweight and obesity are important public-health concerns worldwide. The purpose of this study is to elucidate their prevalence and trends in Switzerland by analyzing variations in the Body Mass Index (BMI) of Swiss conscripts. METHODS The conscription records were provided by the Swiss Army. This study focussed on conscripts 18.5-20.5 years of age from the seven one-year birth cohorts spanning the period 1986-1992. BMI across professional status, area-based socioeconomic position (abSEP), urbanicity and regions was analyzed. Two piecewise quantile regression models with linear splines for three birth-cohort groups were used to examine the association of median BMI with explanatory variables and to determine the extent to which BMI has varied over time. RESULTS The study population consisted of 188,537 individuals. Median BMI was 22.51 kg/m2 (95% confidence interval (CI): 22.45-22.57). BMI was lower among conscripts of high professional status (-0.46 kg/m2; 95% CI: -0.50, -0.42, compared with low), living in areas of high abSEP (-0.11 kg/m2; 95% CI: -0.16, -0.07, compared with medium) and from urban communities (-0.07 kg/m2; 95% CI: -0.11, -0.03, compared with peri-urban). Compared with the Midland region, median BMI was highest in the North-West (0.25 kg/m2; 95% CI: 0.19-0.30) and Central regions (0.11 kg/m2; 95% CI: 0.05-0.16) and lowest in the East (-0.19 kg/m2; 95% CI: -0.24, -0.14) and Lake Geneva regions (-0.15 kg/m2; 95% CI: -0.20, -0.09). Trajectories of regional BMI growth varied across birth cohorts, with median BMI remaining high in the Central and North-West regions, whereas stabilization and in some cases a decline were observed elsewhere. CONCLUSIONS The BMI of Swiss conscripts is associated with individual and area-based socioeconomic position and with urbanicity. Results show regional variation in the levels and temporal trajectories of BMI growth and signal a possible slowdown among recent birth cohorts.
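The birth-cohort trend component of such a model can be sketched as follows (illustrative notation, not the paper's exact specification): with birth cohort c, knots \kappa_1 < \kappa_2 separating the three birth-cohort groups, and covariates x,

\[
Q_{0.5}(\mathrm{BMI} \mid c, x) = \beta_0 + \beta_1 c + \beta_2 (c - \kappa_1)_+ + \beta_3 (c - \kappa_2)_+ + x^\top \gamma,
\]

where (u)_+ = \max(u, 0), so the slope of the median-BMI trend is allowed to change at each knot while remaining continuous, and x collects professional status, abSEP, urbanicity and region.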
Abstract:
BACKGROUND Estimates of the size of the undiagnosed HIV-infected population are important to understand the HIV epidemic and to plan interventions, including "test-and-treat" strategies. METHODS We developed a multi-state back-calculation model to estimate HIV incidence, time between infection and diagnosis, and the undiagnosed population by CD4 count strata, using surveillance data on new HIV and AIDS diagnoses. The HIV incidence curve was modelled using cubic splines. The model was tested on simulated data and applied to surveillance data on men who have sex with men in The Netherlands. RESULTS The number of HIV infections could be estimated accurately using simulated data, with most values within the 95% confidence intervals of model predictions. When applying the model to Dutch surveillance data, 15,400 (95% confidence interval [CI] = 15,000, 16,000) men who have sex with men were estimated to have been infected between 1980 and 2011. HIV incidence showed a bimodal distribution, with peaks around 1985 and 2005 and a decline in recent years. Mean time to diagnosis was 6.1 (95% CI = 5.8, 6.4) years between 1984 and 1995 and decreased to 2.6 (2.3, 3.0) years in 2011. By the end of 2011, 11,500 (11,000, 12,000) men who have sex with men in The Netherlands were estimated to be living with HIV, of whom 1,750 (1,450, 2,200) were still undiagnosed. Of the undiagnosed men who have sex with men, 29% (22, 37) were infected for less than 1 year, and 16% (13, 20) for more than 5 years. CONCLUSIONS This multi-state back-calculation model will be useful to estimate HIV incidence, time to diagnosis, and the undiagnosed HIV epidemic based on routine surveillance data.
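The core back-calculation relationship, written here in a generic single-stage form (the model above additionally stratifies the progression and diagnosis process by CD4 count strata), links the expected number of new diagnoses at time t to past incidence h(s) and the distribution f of the time from infection to diagnosis:

\[
\mathbb{E}[D(t)] = \int_0^t h(s)\, f(t - s)\, \mathrm{d}s,
\]

with h modelled by cubic splines as described above. The undiagnosed population at time t is then the cumulative number infected by t minus the cumulative number diagnosed, which is how the size and duration-of-infection profile of the undiagnosed group are obtained.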
Abstract:
Recently a new method to set the scale in lattice gauge theories, based on the gradient flow generated by the Wilson action, has been proposed, and the systematic errors of the new scales t0 and w0 have been investigated by various groups. The Wilson flow also provides an interesting alternative smoothing procedure, particularly useful for the measurement of the topological charge as a pure gluonic observable. We show the viability of this method for N=1 supersymmetric Yang-Mills theory by analysing the configurations produced by the DESY-Muenster Collaboration. The relation between the scale and the topological charge has been investigated, showing a strong correlation. We have found that the scale depends linearly on the topological charge, with a slope that increases as the volume and the gluino mass decrease. Moreover, we have investigated this dependence as a function of the reference parameter used to define the scale: the tuning of this parameter turns out to be fundamental for a more reliable scale setting. Similar conclusions hold for the Sommer parameter r0.
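For reference, the standard definitions involved are as follows (not specific to this paper's analysis). The gradient flow evolves the gauge field B_\mu(t, x) in flow time t according to

\[
\partial_t B_\mu = D_\nu G_{\nu\mu}, \qquad B_\mu|_{t=0} = A_\mu,
\]

and the scales t_0 and w_0 are defined implicitly through the flowed action density E(t) = \tfrac{1}{4} G^a_{\mu\nu} G^a_{\mu\nu} by

\[
\left. t^2 \langle E(t) \rangle \right|_{t = t_0} = 0.3, \qquad
\left. t \frac{\mathrm{d}}{\mathrm{d}t} \big( t^2 \langle E(t) \rangle \big) \right|_{t = w_0^2} = 0.3,
\]

while the topological charge measured on the flowed (smoothed) configurations is Q = \frac{1}{32\pi^2} \int \mathrm{d}^4x\, \epsilon_{\mu\nu\rho\sigma}\, \mathrm{tr}\big[ F_{\mu\nu} F_{\rho\sigma} \big]. The conventional reference value 0.3 on the right-hand side is the kind of reference parameter whose tuning is discussed above.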