930 results for BIASES
Abstract:
Economists and other social scientists often face situations where they have access to two datasets that they can use but one set of data suffers from censoring or truncation. If the censored sample is much bigger than the uncensored sample, it is common for researchers to use the censored sample alone and attempt to deal with the problem of partial observation in some manner. Alternatively, they simply use only the uncensored sample and ignore the censored one so as to avoid biases. It is rarely the case that researchers use both datasets together, mainly because they lack guidance about how to combine them. In this paper, we develop a tractable semiparametric framework for combining the censored and uncensored datasets so that the resulting estimators are consistent, asymptotically normal, and use all information optimally. When the censored sample, which we refer to as the master sample, is much bigger than the uncensored sample (which we call the refreshment sample), the latter can be thought of as providing identification where it is otherwise absent. In contrast, when the refreshment sample is large and could typically be used alone, our methodology can be interpreted as using information from the censored sample to increase efficiency. To illustrate our results in an empirical setting, we show how to estimate the effect of changes in compulsory schooling laws on age at first marriage, a variable that is censored for younger individuals. We also demonstrate how refreshment samples for this application can be created by matching cohort information across census datasets.
Abstract:
The purpose of this study was to assess the accuracy and precision of airborne volatile organic compound (VOC) concentrations measured using passive air samplers (3M 3500 organic vapor monitors) over extended sampling durations (9 and 15 days). A total of forty-five organic vapor monitor samples were collected at a State of Texas air monitoring site during two sampling periods (July/August and November 2008). The results indicate that, for most of the tested compounds, there was no significant difference between long-term (9 or 15 days) sample concentrations and the means of parallel consecutive short-term (3 days) sample concentrations. Biases of 9- or 15-day measurements vs. consecutive 3-day measurements showed considerable variability. Compounds with percent bias values of <10% are suggested as acceptable for long-term sampling (9 and 15 days). Of the twenty-one compounds examined, 10 are classified as acceptable for long-term sampling: m,p-xylene, 1,2,4-trimethylbenzene, n-hexane, ethylbenzene, benzene, toluene, o-xylene, d-limonene, dimethylpentane and methyl tert-butyl ether. The ratio of sampling procedure variability relative to variability within days was approximately 1.89 for both sampling periods for the 3-day vs. 9-day comparisons and approximately 2.19 for both sampling periods for the 3-day vs. 15-day comparisons. Considerably higher concentrations of most VOCs were measured during the November sampling period than during the July/August period. These differences may result from varying meteorological conditions during the two periods, e.g., differences in wind direction and wind speed. Further studies are suggested to evaluate the accuracy and precision of 3M 3500 organic vapor monitors over extended sampling durations.
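The percent-bias screening described in this abstract can be illustrated with a short sketch. The abstract does not give the exact formula, so the definition below (long-term value minus the mean of the parallel short-term values, divided by that mean) is an assumption, and the concentrations are hypothetical, not study data.

```python
from statistics import mean


def percent_bias(long_term, short_term_values):
    """Percent bias of one long-term measurement relative to the mean of
    the parallel consecutive short-term measurements (assumed definition)."""
    reference = mean(short_term_values)
    return 100.0 * (long_term - reference) / reference


# Hypothetical concentrations (arbitrary units), not taken from the study.
nine_day = 1.05
three_day = [0.9, 1.0, 1.1]

bias = percent_bias(nine_day, three_day)
# The abstract's acceptance criterion: percent bias below 10%.
acceptable = abs(bias) < 10.0
print(f"percent bias = {bias:.1f}%, acceptable = {acceptable}")
```

Taking the absolute value before comparing with 10% is also an assumption; the abstract does not say whether the sign of the bias matters.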
Abstract:
Many public health agencies and researchers are interested in comparing hospital outcomes, for example, morbidity, mortality, and hospitalization, across areas and hospitals. However, because rates in clinical trials vary among hospitals owing to several biases, we are interested in controlling for the bias and assessing real differences in clinical practices. In this study, we compared the variation between hospitals in rates of severe intraventricular haemorrhage (IVH) in infants using a frequentist statistical approach vs. a Bayesian hierarchical model through a simulation study. The template data set for the simulation study comprised the number of preterm infants with severe IVH in 24 intensive care units in the Australian and New Zealand Neonatal Network from 1995 to 1997. We evaluated the rates of severe IVH for the 24 hospitals with two hierarchical models in the Bayesian approach and compared their performance with the shrunken rates from the frequentist method. Gamma-Poisson (BGP) and Beta-Binomial (BBB) models were used in the Bayesian approach, and the shrunken estimator of a Gamma-Poisson (FGP) hierarchical model, fitted by maximum likelihood, was calculated as the frequentist approach. To simulate data, the total number of infants in each hospital was kept fixed, and we analyzed the simulated data with both the Bayesian and frequentist models under two true parameters for the severe IVH rate. One was the observed rate and the other was the expected severe IVH rate obtained by adjusting for five predictor variables in the template data. The bias in the estimated rate of severe IVH showed that the Bayesian models gave less variable estimates than the frequentist model. We also discussed and compared the results from the three models to examine the variation in the rate of severe IVH using 20th centile rates and the avoidable number of severe IVH cases.
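To make the idea of a shrunken rate concrete, here is a minimal sketch of Gamma-Poisson shrinkage. It assumes each hospital's count is Poisson with mean equal to the number of infants times a hospital rate, and that the rates share a Gamma(alpha, beta) prior with beta as the rate parameter. The hyperparameters and counts are hypothetical; the study estimates the hyperparameters (by maximum likelihood in the frequentist version), which this sketch does not do.

```python
def gamma_poisson_shrunken_rate(events, exposure, alpha, beta):
    """Posterior mean of a hospital's rate under a Gamma(alpha, beta) prior
    and a Poisson likelihood with the given exposure (number of infants)."""
    return (alpha + events) / (beta + exposure)


# Hypothetical hyperparameters: prior mean rate = alpha / beta = 0.10.
alpha, beta = 2.0, 20.0

# Two hypothetical hospitals with the same raw rate (0.30) but different sizes.
small = gamma_poisson_shrunken_rate(events=3, exposure=10, alpha=alpha, beta=beta)
large = gamma_poisson_shrunken_rate(events=30, exposure=100, alpha=alpha, beta=beta)

print(f"small hospital: raw 0.30 -> shrunken {small:.3f}")
print(f"large hospital: raw 0.30 -> shrunken {large:.3f}")
```

The smaller hospital is pulled further toward the prior mean because its raw rate carries less information, which is why shrunken estimates vary less across hospitals than raw rates.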
Abstract:
HIV/AIDS is a treatable although incurable disease that presents immense challenges to those infected, including physical, social and psychological effects. As of 2009, an estimated 2.4 million people were living with HIV or AIDS in India, 0.3% of the country's population. In India, the disease is difficult not only to treat but also to track because it is associated with socio-economic factors such as illiteracy, social biases, poor sanitation, malnutrition and social class. Nevertheless, it is important to know the prevalence of HIV/AIDS for several reasons. At the individual level, the quality of life of people living with HIV/AIDS is markedly lower than that of their counterparts without the disease and is associated with challenges. At the community level, it is important to identify high-risk groups, monitor prevention efforts, and allocate appropriate resources to target programs for the reduction of transmission of HIV.
Abstract:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves with increasing sequencing depth. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression on the gene level, modeling on the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases on the base pair level. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer that is used in priming. We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume the thresholds are specific for each gene. It is also intriguing to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this new bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status and consequently identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
Abstract:
Staphylococcus aureus is a common microorganism in humans, typically colonizing the nasopharynx, skin and other mucosal surfaces. It is among the most frequent causes of clinically significant bacterial infections, accounting for increased morbidity and mortality among individuals with HIV/AIDS. Evidence of higher colonization rates among high-risk HIV populations has been observed; however, prevalence estimates have varied. Additionally, behavioral, biological, and/or environmental factors that may account for these high colonization rates are not understood. Previous clinic-based surveys were subject to considerable biases. Additionally, representative samples of high-risk HIV populations were difficult to obtain, due in part to an underrepresentation of individuals who may not regularly obtain health care. The main objective of this project is to determine the prevalence of methicillin-sensitive S. aureus (MSSA) and methicillin-resistant S. aureus (MRSA) nasal colonization in two populations: 1) men who have sex with men (MSM) and 2) injection drug users (IDU). Both of these populations are included in the third round of the National HIV Behavioral Surveillance System (NHBS) in Houston, Texas. In the NHBS-MSM3 study, logistic regression was used to report odds ratios and 95% confidence intervals (CI). For the NHBS-IDU3 study, to account for the lack of independence between samples, the method of generalized estimating equations (GEE) was used to report adjusted odds ratios and 95% CI. The NHBS-MSM3 study enrolled 202 participants, with an MSSA colonization rate of 26.7% and an MRSA rate of 3%. In the NHBS-IDU3 study, 18.4% were nasally colonized with MSSA and 5.7% were nasally colonized with MRSA. Among the NHBS-MSM3 population, high-risk sexual practices were associated with colonization. For the NHBS-IDU3 population, age, marital status, employment status, and the presence of scabs were associated with colonization status when controlling for size of recruitment network. In multivariate GEE analyses, the use of antiretroviral medications and age remained significantly associated with S. aureus nasal colonization when controlling for size of recruitment network and gender. In both studies, significantly higher than expected S. aureus and MRSA colonization rates were observed compared with colonization rates described for the general population. However, these estimates were moderate in comparison to reported clinic-based MSM and IDU S. aureus colonization findings. This study confirms that substantial prevalence differences and biases may exist in data collected from clinic-based MSM and IDU. The prevalence of MSSA and MRSA nasal colonization did not differ significantly with respect to HIV status among NHBS-MSM3/NHBS-IDU3 participants. The effects of S. aureus colonization and infection should continue to be examined longitudinally to confirm additional community-based determinants in populations that are disproportionately affected.
Abstract:
Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprised of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).
Abstract:
Three complementary imaging techniques were used to describe a complex rosette-shaped microboring that penetrates the shells of brachiopods from the Ordovician-Silurian shallow marine limestones of Anticosti Island, Canada. Pyrodendrina cupra n. igen. and isp. is among the oldest dendrinid microborings and consists of shallow and deep penetrating canals that radiate from a central polygonal chamber. The affinity of the tracemaker is unknown, but a foraminiferal origin, as proposed for some dendrinid borings, is rejected. Combining microCT with traditional stereomicroscopy and SEM helped distinguish and quantify fine morphological features while maintaining contextual information of the microboring within the shell substrate. Different imaging techniques inherently bias the description of microborings. These biases must be accounted for as new methods in ichnotaxonomy are integrated with past research based on different methods.
Abstract:
The Wilkes and Aurora basins are large, low-lying sub-glacial basins that may cause areas of weakness in the overlying East Antarctic ice sheet. Previous work based on ice-rafted debris (IRD) provenance analyses found evidence for massive iceberg discharges from these areas during the late Miocene and Pliocene. Here we characterize the sediments shed from the inferred areas of weakness along this margin (94°E to 165°E) by measuring 40Ar/39Ar ages of 292 individual detrital hornblende grains from eight marine sediment core locations off East Antarctica and Nd isotopic compositions of the bulk fine fraction from the same sediments. We further expand the toolbox for Antarctic IRD provenance analyses by exploring the application of 40Ar/39Ar ages of detrital biotites; biotite as an IRD tracer eliminates lithological biases imposed by only analyzing hornblendes and allows for characterization of samples with low IRD concentrations. Our data quadruple the number of detrital 40Ar/39Ar ages from this margin of East Antarctica and lead to the following conclusions: (1) four main sectors between the Ross Sea and Prydz Bay, separated by ice drainage divides, are distinguishable based upon the combination of 40Ar/39Ar ages of detrital hornblende and biotite grains and the εNd of the bulk fine fraction; (2) 40Ar/39Ar biotite ages can be used as a robust provenance tracer for this part of East Antarctica; and (3) sediments shed from the coastal areas of the Aurora and Wilkes sub-glacial basins can be clearly distinguished from one another based upon their isotopic fingerprints.
Abstract:
A nested ice flow model was developed for eastern Dronning Maud Land to assist with the dating and interpretation of the EDML deep ice core. The model consists of a high-resolution higher-order ice dynamic flow model that was nested into a comprehensive 3-D thermomechanical model of the whole Antarctic ice sheet. As the drill site is on a flank position, the calculations specifically take into account the effects of horizontal advection, since deeper ice in the core originated from higher inland. First, the regional velocity field and ice sheet geometry are obtained from a forward experiment over the last 8 glacial cycles. The result is subsequently employed in a Lagrangian backtracing algorithm to provide particle paths back to their time and place of deposition. The procedure directly yields the depth-age distribution, surface conditions at particle origin, and a suite of relevant parameters such as initial annual layer thickness. This paper discusses the method and the main results of the experiment, including the ice core chronology, the non-climatic corrections needed to extract the climatic part of the signal, and the thinning function. The focus is on the upper 89% of the ice core (approx. 170 kyr) as the dating below that is increasingly less robust owing to the unknown value of the geothermal heat flux. It is found that the temperature biases resulting from variations of surface elevation are up to half of the magnitude of the climatic changes themselves.
Abstract:
The application of quantitative and semiquantitative methods to assemblage data from dinoflagellate cysts shows potential for interpreting past environments, both in terms of paleotemperature estimates and in recognizing water masses and circulation patterns. Estimates of winter sea-surface temperature (WSST) were produced by using the Impagidinium Index (II) method, and by applying a winter-temperature transfer function (TFw). Estimates of summer sea-surface temperature (SSST) were produced by using a summer-temperature transfer function (TFs), two methods based on a temperature-distribution chart (ACT and ACTpo), and a method based on the ratio of gonyaulacoid:protoperidinioid specimens (G:P). WSST estimates from the II and TFw methods are in close agreement except where Impagidinium species are sparse. SSST estimates from TFs are more variable. The value of the G:P ratio for the Pliocene data in this paper is limited by the apparent sparsity of protoperidinioids, which results in monotonous SSST estimates of 14-26°C. The ACT methods show two biases for the Pliocene data set: taxonomic substitution may force 'matches' yielding incorrect temperature estimates, and the method is highly sensitive to the end-points of species distributions. Dinocyst assemblage data were applied to reconstruct Pliocene sea-surface temperatures between 3.5-2.5 Ma from DSDP Hole 552A, and ODP Holes 646B and 642B, which are presently located beneath cold and cool-temperate waters north of 56°N. Our initial results suggest that at 3.0 Ma, WSSTs were a few degrees C warmer than the present and that there was a somewhat reduced north-south temperature gradient. For all three sites, it is likely that SSSTs were also warmer, but by an unknown, perhaps large, amount. Past oceanic circulation in the North Atlantic was probably different from the present.
Abstract:
Beryllium 10 concentrations (10Becon) were measured at annual resolution from varved sediment cores of Lakes Tiefer See (TSK) and Czechowskie (JC) for the period 1983-2009 (~solar cycles 22 and 23). Calibrating the 10Becon time-series against complementary proxy records from the same archive as well as local precipitation and neutron monitor data, reflecting solar-forced changes in atmospheric radionuclide production, allowed (i) identifying the main depositional processes and (ii) evaluating the potential for solar activity reconstruction. 10Becon in TSK and JC sediments is significantly correlated with varying neutron monitor counts (TSK: r=0.5, p=0.05, n=16; JC: r=0.46, p=0.03, n=22). However, the further correlations with changes in organic carbon contents in TSK as well as varying organic carbon and detrital matter contents in JC point to catchment-specific biases in the 10Becon time-series. In an attempt to correct for these biases, multiple regression analysis was applied to extract an atmospheric 10Be production signal (10Be atmosphere). To increase the signal-to-noise ratio, a 10Be composite record (10Be composite) was calculated from the TSK and JC 10Be atmosphere time-series. 10Be composite is significantly correlated with variations in the neutron monitor record (r=0.49, p=0.01, n=27) and matches the expected amplitude changes in 10Be production between solar cycle minima and maxima. This calibration study on 10Be from two sites indicates the large potential but also the partly site-specific limitations of 10Be in varved lake sediments for solar activity reconstruction.
Abstract:
Aerial surveys of narwhals (Monodon monoceros) were conducted in the Canadian High Arctic during the month of August from 2002 to 2004. The surveys covered the waters of Barrow Strait, Prince Regent Inlet, the Gulf of Boothia, Admiralty Inlet, Eclipse Sound, and the eastern coast of Baffin Island, using systematic sampling methods. Fiords were flown along a single transect down the middle. Near-surface population estimates increased by 1.9%-8.7% when corrected for perception bias. The estimates were further increased by a factor of approximately 3, to account for individuals not seen because they were diving when the survey plane flew over (availability bias). These corrections resulted in estimates of 27 656 (SE = 14 939) for the Prince Regent and Gulf of Boothia area, 20 225 (SE = 7285) for the Eclipse Sound area, and 10 073 (SE = 3123) for the East Baffin Island fiord area. The estimate for the Admiralty Inlet area was 5362 (SE = 2681) but is thought to be biased. Surveys could not be done in other known areas of occupation, such as the waters of the Cumberland Peninsula of East Baffin, and channels farther west of the areas surveyed (Peel Sound, Viscount Melville Sound, Smith Sound and Jones Sound, and other channels of the Canadian Arctic archipelago). Despite these probable biases and the incomplete coverage, results of these surveys show that the summering range of narwhals in the Canadian High Arctic is vast. If narwhals are philopatric to their summering areas, as they appear to be, the total population of that range could number more than 60 000 animals. The largest numbers are in the western portion of their summer range, around Somerset Island, and also in the Eclipse Sound area. However, these survey estimates have large variances due to narwhal aggregation in some parts of the surveyed areas.
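The two-step correction in this abstract (a perception-bias increase followed by an availability-bias multiplier) amounts to simple arithmetic, sketched below. The function name and the input values are hypothetical. The abstract reports only the range of the perception correction (1.9%-8.7%) and an availability factor of approximately 3, and treating the two corrections as multiplicative is an assumption about how they combine.

```python
def corrected_abundance(surface_estimate, perception_increase, availability_factor):
    """Apply a perception-bias increase (as a fraction, e.g. 0.05 for 5%)
    and then an availability-bias multiplier to a near-surface estimate."""
    return surface_estimate * (1.0 + perception_increase) * availability_factor


# Hypothetical near-surface estimate, not a figure from the surveys.
surface = 1000.0
estimate = corrected_abundance(surface, perception_increase=0.05, availability_factor=3.0)
print(f"corrected estimate: {estimate:.0f} animals")
```

Because the availability factor is roughly 3, most of the difference between the near-surface count and the final estimate comes from animals that were diving when the aircraft passed, not from observers missing visible animals.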