885 results for Context data
Abstract:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and easier to fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, as comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time-varying covariates and segment-of-the-night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
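The likelihood equivalence mentioned in this abstract can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: subject $i$ spends time $t_{ij}$ at risk in interval $j$, $d_{ij} \in \{0,1\}$ indicates whether a transition occurred there, and the hazard $\lambda_{ij} = \exp(x_{ij}^\top \beta)$ is constant within the interval.

```latex
% Survival log-likelihood under piecewise constant hazards
\ell_{\text{surv}}(\beta) = \sum_{i}\sum_{j} \left[ d_{ij} \log \lambda_{ij} - \lambda_{ij} t_{ij} \right]

% Poisson log-likelihood with d_{ij} \sim \text{Poisson}(\mu_{ij}) and a log(time) offset
\log \mu_{ij} = \log t_{ij} + x_{ij}^\top \beta
\quad\Longrightarrow\quad \mu_{ij} = \lambda_{ij} t_{ij}

\ell_{\text{pois}}(\beta)
  = \sum_{i}\sum_{j} \left[ d_{ij} \log \mu_{ij} - \mu_{ij} - \log d_{ij}! \right]
  = \ell_{\text{surv}}(\beta) + \sum_{i}\sum_{j} d_{ij} \log t_{ij}
```

Because $d_{ij}! = 1$ for $d_{ij} \in \{0,1\}$ and the remaining term does not involve $\beta$, the two log-likelihoods differ only by a constant and are maximized by the same $\beta$.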
Abstract:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data exhibit unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.
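As a rough illustration of the quantile normalization step only, here is a minimal Python sketch. It is not the CQN algorithm from the abstract: it omits the regression on GC-content, and the function name and the genes-by-samples matrix layout are assumptions made for this example.

```python
import numpy as np


def quantile_normalize(x):
    """Force every column (sample) of x to share one empirical distribution.

    x is assumed to be a genes-by-samples matrix. Ties are broken by
    position rather than averaged, which is a simplification.
    """
    x = np.asarray(x, dtype=float)
    # Position of each value within its own column (0 = smallest).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the k-th smallest values across samples.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]


if __name__ == "__main__":
    # Made-up expression values for four genes in three samples.
    expression = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
    print(quantile_normalize(expression))
```

After normalization every sample has exactly the same set of values, so differences in overall scale or shape between samples are removed while each gene's rank within a sample is preserved.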
Abstract:
In evaluating the accuracy of diagnostic tests, it is common to apply two imperfect tests jointly or sequentially to a study population. In a recent meta-analysis of the accuracy of microsatellite instability testing (MSI) and traditional mutation analysis (MUT) in predicting germline mutations of the mismatch repair (MMR) genes, a Bayesian approach (Chen, Watson, and Parmigiani 2005) was proposed to handle missing data resulting from partial testing and the lack of a gold standard. In this paper, we demonstrate an improved estimation of the sensitivities and specificities of MSI and MUT by using a nonlinear mixed model and a Bayesian hierarchical model, both of which account for the heterogeneity across studies through study-specific random effects. The methods can be used to estimate the accuracy of two imperfect diagnostic tests in other meta-analyses when the prevalence of disease, the sensitivities and/or the specificities of diagnostic tests are heterogeneous among studies. Furthermore, simulation studies have demonstrated the importance of carefully selecting appropriate random effects on the estimation of diagnostic accuracy measurements in this scenario.
Abstract:
Recurrent event data are largely characterized by the rate function, but smoothing techniques for estimating the rate function have never been rigorously developed or studied in the statistical literature. This paper considers the moment and least squares methods for estimating the rate function from recurrent event data. With an independent censoring assumption on the recurrent event process, we study statistical properties of the proposed estimators and propose bootstrap procedures for the bandwidth selection and for the approximation of confidence intervals in the estimation of the occurrence rate function. We find that the moment method without resmoothing via a smaller bandwidth will produce a curve with nicks occurring at the censoring times, whereas there is no such problem with the least squares method. Furthermore, the asymptotic variance of the least squares estimator is shown to be smaller under regularity conditions. However, in the implementation of the bootstrap procedures, the moment method is computationally more efficient than the least squares method because the former approach uses condensed bootstrap data. The performance of the proposed procedures is studied through Monte Carlo simulations and an epidemiological example on intravenous drug users.
Abstract:
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
Abstract:
The stashR package (a Set of Tools for Administering SHared Repositories) for R implements a simple key-value style database where character string keys are associated with data values. The key-value databases can be either stored locally on the user's computer or accessed remotely via the Internet. Methods specific to the stashR package allow users to share data repositories or access previously created remote data repositories. In particular, methods are available for the S4 classes localDB and remoteDB to insert, retrieve, or delete data from the database as well as to synchronize local copies of the data to the remote version of the database. Users efficiently access information from a remote database by retrieving only the data files indexed by user-specified keys and caching this data in a local copy of the remote database. The local and remote counterparts of the stashR package offer the potential to enhance reproducible research by allowing users of Sweave to cache their R computations for a research paper in a localDB database. This database can then be stored on the Internet as a remoteDB database. When readers of the research paper wish to reproduce the computations involved in creating a specific figure or calculating a specific numeric value, they can access the remoteDB database and obtain the R objects involved in the computation.
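The key-value and caching ideas described in this abstract can be sketched in Python. This is not the stashR API, which is an R package with S4 classes; the class names `LocalKeyValueDB` and `CachingRemoteDB`, the hashed file names, and the `download` callable are all hypothetical choices made for this illustration.

```python
import hashlib
import pickle
from pathlib import Path


class LocalKeyValueDB:
    """File-backed store: one pickled file per string key (hypothetical)."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.directory / digest

    def insert(self, key, value):
        with open(self._path(key), "wb") as handle:
            pickle.dump(value, handle)

    def fetch(self, key):
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        with open(path, "rb") as handle:
            return pickle.load(handle)

    def delete(self, key):
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        path.unlink()


class CachingRemoteDB:
    """Fetch only requested keys from a remote source and cache them locally.

    `download` is an assumed callable mapping a key to its value; a real
    client would perform a network request here.
    """

    def __init__(self, download, cache):
        self.download = download
        self.cache = cache

    def fetch(self, key):
        try:
            return self.cache.fetch(key)
        except KeyError:
            value = self.download(key)
            self.cache.insert(key, value)
            return value
```

A reader reproducing a single figure would call `fetch` with that figure's key, so only the objects behind that figure are transferred, and repeated requests are served from the local copy.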
Abstract:
In medical follow-up studies, ordered bivariate survival data are frequently encountered when bivariate failure events are used as the outcomes to identify the progression of a disease. In cancer studies interest could be focused on bivariate failure times, for example, time from birth to cancer onset and time from cancer onset to death. This paper considers a sampling scheme where the first failure event (cancer onset) is identified within a calendar time interval, the time of the initiating event (birth) can be retrospectively confirmed, and the occurrence of the second event (death) is observed subject to right censoring. To analyze this type of bivariate failure time data, it is important to recognize the presence of bias arising due to interval sampling. In this paper, nonparametric and semiparametric methods are developed to analyze the bivariate survival data with interval sampling under stationary and semi-stationary conditions. Numerical studies demonstrate that the proposed estimating approaches perform well with practical sample sizes in different simulated models. We apply the proposed methods to SEER ovarian cancer registry data for illustration of the methods and theory.
Abstract:
BACKGROUND: Pain is a common experience in later life. There is conflicting evidence of the prevalence, impact, and context of pain in older people. GPs are criticised for underestimating and under-treating pain. AIM: To assess the extent to which older people experience pain, and to explore relationships between self-reported pain and functional ability and depression. DESIGN OF STUDY: Secondary analysis of baseline data from a randomised controlled trial of health risk appraisal. SETTING: A total of 1090 community-dwelling non-disabled people aged 65 years and over were included in the study from three group practices in suburban London. METHOD: Main outcome measures were pain in the last 4 weeks and the impact of pain, measured using the 24-item Geriatric Pain Measure; depression symptoms captured using the 5-item Mental Health Inventory; social relationships measured using the 6-item Lubben Social Network Scale; Basic and Instrumental Activities of Daily Living and self-reported symptoms. RESULTS: Forty-five per cent of women and 34% of men reported pain in the previous 4 weeks. Pain experience appeared to be less in the 'oldest old': 27.5% of those aged 85 years and over reported pain compared with 38-53% of the 'younger old'. Those with arthritis were four times more likely to report pain. Pain had a profound impact on activities of daily living, but most of those reporting pain described their health as good or excellent. Although there was a significant association between the experience of pain and depressed mood, the majority of those reporting pain did not have depressed mood. CONCLUSION: A multidimensional approach to assessing pain is appropriate. Primary care practitioners should also assess the impact of pain on activities of daily living.
Abstract:
In 1998-2001 Finland suffered the most severe insect outbreak ever recorded, covering over 500,000 hectares. The outbreak was caused by the common pine sawfly (Diprion pini L.). The outbreak has continued in the study area, Palokangas, ever since. To find a good method to monitor this type of outbreak, the purpose of this study was to examine the efficacy of multi-temporal ERS-2 and ENVISAT SAR imagery for estimating Scots pine (Pinus sylvestris L.) defoliation. Three methods were tested: unsupervised k-means clustering, supervised linear discriminant analysis (LDA) and logistic regression. In addition, I assessed whether harvested areas could be differentiated from the defoliated forest using the same methods. Two different speckle filters were used to determine the effect of filtering on the SAR imagery and subsequent results. The logistic regression performed best, producing a classification accuracy of 81.6% (kappa 0.62) with two classes (no defoliation, >20% defoliation). With two classes, LDA accuracy was at best 77.7% (kappa 0.54) and k-means accuracy 72.8% (kappa 0.46). In general, the largest speckle filter, a 5 x 5 image window, performed best. When additional classes were added, the accuracy usually degraded step by step. The results were good, but because of the restrictions in the study they should be confirmed with independent data before they can be considered reliable. The restrictions include the small size of the field data and, thus, the problems with accuracy assessment (no separate testing data), as well as the lack of meteorological data from the imaging dates.
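The accuracy and kappa figures quoted in this abstract are standard summaries of a confusion matrix. The sketch below shows how the two numbers relate; the function name and the example counts are invented for illustration and are not the study's data.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Return overall accuracy and Cohen's kappa for a square confusion matrix.

    Rows are assumed to be reference classes, columns predicted classes.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    # Observed agreement: share of samples on the diagonal.
    observed = np.trace(confusion) / total
    # Agreement expected by chance, from the row and column totals.
    row_totals = confusion.sum(axis=1)
    column_totals = confusion.sum(axis=0)
    expected = (row_totals * column_totals).sum() / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Made-up two-class counts (no defoliation vs. defoliation).
    accuracy, kappa = accuracy_and_kappa([[20, 5], [10, 15]])
    print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

Kappa discounts the agreement that would occur by chance, which is why it is noticeably lower than the raw accuracy when there are only two classes.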
Abstract:
The purpose of this project was to investigate the effect of using data collection technology on student attitudes towards science instruction. The study was conducted over the course of two years at Madison High School in Adrian, Michigan, primarily in college preparatory physics classes, but also in one college preparatory chemistry class and one environmental science class. A preliminary study was conducted at a Lenawee County Intermediate Schools student summer environmental science day camp. The data collection technology used was a combination of Texas Instruments TI-84 Silver Plus graphing calculators and Vernier LabPro data collection sleds with various probeware attachments, including motion sensors, pH probes and accelerometers. Students were given written procedures for most laboratory activities and were provided with data tables and analysis questions to answer about the activities. The first year of the study included a pretest and posttest measuring student attitudes towards the class they were enrolled in. Pretest and posttest data were analyzed to determine effect size, which was found to be very small (Coe, 2002). The second year of the study focused only on a physics class and used Keller’s ARCS model for measuring student motivation based on the four aspects of motivation: Attention, Relevance, Confidence and Satisfaction (Keller, 2010). According to this model, it was found that there were two distinct groups in the class, one of which was motivated to learn and the other that was not. The data suggest that the use of data collection technology in science classes should be started early in a student’s career, possibly in early middle school or late elementary. This would build familiarity with the equipment and allow for greater exploration by the student as they progress through high school and into upper level science courses.
Abstract:
Riparian zones are dynamic, transitional ecosystems between aquatic and terrestrial ecosystems with well-defined vegetation and soil characteristics. Development of an all-encompassing definition for riparian ecotones is challenging because of their high variability. However, there are two primary factors that all riparian ecotones are dependent on: the watercourse and its associated floodplain. Previous approaches to riparian boundary delineation have utilized fixed-width buffers, but this methodology has proven to be inadequate as it only takes the watercourse into consideration and ignores critical geomorphology, associated vegetation and soil characteristics. Our approach offers advantages over other previously used methods by utilizing: the geospatial modeling capabilities of ArcMap GIS; a better sampling technique along the watercourse that can distinguish the 50-year floodplain, which is the optimal hydrologic descriptor of riparian ecotones; the Soil Survey Database (SSURGO) and National Wetland Inventory (NWI) databases to distinguish contiguous areas beyond the 50-year floodplain; and land use/cover characteristics associated with the delineated riparian zones. The model utilizes spatial data readily available from Federal and State agencies and geospatial clearinghouses. An accuracy assessment was performed to assess the impact of varying the 50-year flood height, changing the DEM spatial resolution (1, 3, 5 and 10 m), and positional inaccuracies with the National Hydrography Dataset (NHD) streams layer on the boundary placement of the delineated variable-width riparian ecotones area. The result of this study is a robust and automated GIS-based model attached to ESRI ArcMap software to delineate and classify variable-width riparian ecotones.
Abstract:
This research project measured the effects of real-world content in a science classroom by determining change (deep knowledge of life science content, including ecosystems from MDE – Grade Level Content Expectations) in a subset of students (6th Grade Science) that may result from the addition of curriculum (real-world content of rearing trout in the classroom). Data showed large gains from the pre-test to post-test in students from both the experimental and control groups. The ecology unit with the implementation of real-world content [trout] was even more successful, and improved students’ deep knowledge of ecosystem content from Michigan’s Department of Education Grade Level Content Expectations. The gains by the experimental group on the constructed response section of the test, which included higher cognitive level items, were significant. Clinical interviews after the post-test confirmed increases in deep knowledge of ecosystem concepts in the experimental group, by revealing that a sample of experimental group students had a better grasp of important ecology concepts as compared to a sample of control group students.
Abstract:
The primary challenge in groundwater and contaminant transport modeling is obtaining the data needed for constructing, calibrating and testing the models. Large amounts of data are necessary for describing the hydrostratigraphy in areas with complex geology. Increasingly, states are making spatial data available that can be used for input to groundwater flow models. The appropriateness of this data for large-scale flow systems has not been tested. This study focuses on modeling a plume of 1,4-dioxane in a heterogeneous aquifer system in Scio Township, Washtenaw County, Michigan. The analysis consisted of: (1) characterization of hydrogeology of the area and construction of a conceptual model based on publicly available spatial data, (2) development and calibration of a regional flow model for the site, (3) conversion of the regional model to a more highly resolved local model, (4) simulation of the dioxane plume, and (5) evaluation of the model's ability to simulate field data and estimation of the possible dioxane sources and subsequent migration until maximum concentrations are at or below the Michigan Department of Environmental Quality's residential cleanup standard for groundwater (85 ppb). MODFLOW-2000 and MT3D programs were utilized to simulate the groundwater flow and the development and movement of the 1,4-dioxane plume, respectively. MODFLOW simulates transient groundwater flow in a quasi-3-dimensional sense, subject to a variety of boundary conditions that can simulate recharge, pumping, and surface-/groundwater interactions. MT3D simulates solute advection with groundwater flow (using the flow solution from MODFLOW), dispersion, source/sink mixing, and chemical reaction of contaminants. This modeling approach was successful at simulating the groundwater flows by calibrating recharge and hydraulic conductivities. The plume transport was adequately simulated using literature dispersivity and sorption coefficients, although the plume geometries were not well constrained.
Abstract:
Turrialba is one of the largest and most active stratovolcanoes in the Central Cordillera of Costa Rica and an excellent target for validation of satellite data using ground based measurements due to its high elevation, relative ease of access, and persistent elevated SO2 degassing. The Ozone Monitoring Instrument (OMI) aboard the Aura satellite makes daily global observations of atmospheric trace gases and it is used in this investigation to obtain volcanic SO2 retrievals in the Turrialba volcanic plume. We present and evaluate the relative accuracy of two OMI SO2 data analysis procedures, the automatic Band Residual Index (BRI) technique and the manual Normalized Cloud-mass (NCM) method. We find a linear correlation and good quantitative agreement between SO2 burdens derived from the BRI and NCM techniques, with an improved correlation when wet season data are excluded. We also present the first comparisons between volcanic SO2 emission rates obtained from ground-based mini-DOAS measurements at Turrialba and three new OMI SO2 data analysis techniques: the MODIS smoke estimation, OMI SO2 lifetime, and OMI SO2 transect techniques. A robust validation of OMI SO2 retrievals was made, with both qualitative and quantitative agreements under specific atmospheric conditions, proving the utility of satellite measurements for estimating accurate SO2 emission rates and monitoring passively degassing volcanoes.