949 resultados para MAXIMUM LIKELIHOOD
Resumo:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point using the status of mild, moderate or severe disease as outcome measures. As in many other outcome oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status. Also, this study estimates the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data based on a set of hypothetical parameters and hypothetical rates of misclassification was created. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead Simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters. β1 was hypothesized at 0.50 and the mean estimate was 0.488, β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates for the rates of misclassification of X1 were not as close as β1 and β2, they validate this method. X 1 0-1 misclassification was hypothesized as 2.98% and the mean of the simulated estimates was 1.54% and, in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimate for the odds ratio of X 1 of having both copies of the APOE 4 allele changed from an estimate of 1.377 to an estimate 1.418, demonstrating that the estimates of the odds ratio changed when the analysis includes adjustment for misclassification. ^
Resumo:
Many public health agencies and researchers are interested in comparing hospital outcomes, for example, morbidity, mortality, and hospitalization across areas and hospitals. However, since there is variation of rates in clinical trials among hospitals because of several biases, we are interested in controlling for the bias and assessing real differences in clinical practices. In this study, we compared the variations between hospitals in rates of severe Intraventricular Haemorrhage (IVH) infant using Frequentist statistical approach vs. Bayesian hierarchical model through simulation study. The template data set for simulation study was included the number of severe IVH infants of 24 intensive care units in Australian and New Zealand Neonatal Network from 1995 to 1997 in severe IVH rate in preterm babies. We evaluated the rates of severe IVH for 24 hospitals with two hierarchical models in Bayesian approach comparing their performances with the shrunken rates in Frequentist method. Gamma-Poisson (BGP) and Beta-Binomial (BBB) were introduced into Bayesian model and the shrunken estimator of Gamma-Poisson (FGP) hierarchical model using maximum likelihood method were calculated as Frequentist approach. To simulate data, the total number of infants in each hospital was kept and we analyzed the simulated data for both Bayesian and Frequentist models with two true parameters for severe IVH rate. One was the observed rate and the other was the expected severe IVH rate by adjusting for five predictors variables for the template data. The bias in the rate of severe IVH infant estimated by both models showed that Bayesian models gave less variable estimates than Frequentist model. We also discussed and compared the results from three models to examine the variation in rate of severe IVH by 20th centile rates and avoidable number of severe IVH cases. ^
Resumo:
A Bayesian approach to estimating the intraclass correlation coefficient was used for this research project. The background of the intraclass correlation coefficient, a summary of its standard estimators, and a review of basic Bayesian terminology and methodology were presented. The conditional posterior density of the intraclass correlation coefficient was then derived and estimation procedures related to this derivation were shown in detail. Three examples of applications of the conditional posterior density to specific data sets were also included. Two sets of simulation experiments were performed to compare the mean and mode of the conditional posterior density of the intraclass correlation coefficient to more traditional estimators. Non-Bayesian methods of estimation used were: the methods of analysis of variance and maximum likelihood for balanced data; and the methods of MIVQUE (Minimum Variance Quadratic Unbiased Estimation) and maximum likelihood for unbalanced data. The overall conclusion of this research project was that Bayesian estimates of the intraclass correlation coefficient can be appropriate, useful and practical alternatives to traditional methods of estimation. ^
Resumo:
Standardization is a common method for adjusting confounding factors when comparing two or more exposure category to assess excess risk. Arbitrary choice of standard population in standardization introduces selection bias due to healthy worker effect. Small sample in specific groups also poses problems in estimating relative risk and the statistical significance is problematic. As an alternative, statistical models were proposed to overcome such limitations and find adjusted rates. In this dissertation, a multiplicative model is considered to address the issues related to standardized index namely: Standardized Mortality Ratio (SMR) and Comparative Mortality Factor (CMF). The model provides an alternative to conventional standardized technique. Maximum likelihood estimates of parameters of the model are used to construct an index similar to the SMR for estimating relative risk of exposure groups under comparison. Parametric Bootstrap resampling method is used to evaluate the goodness of fit of the model, behavior of estimated parameters and variability in relative risk on generated sample. The model provides an alternative to both direct and indirect standardization method. ^
Resumo:
A Bayesian approach to estimation of the regression coefficients of a multinominal logit model with ordinal scale response categories is presented. A Monte Carlo method is used to construct the posterior distribution of the link function. The link function is treated as an arbitrary scalar function. Then the Gauss-Markov theorem is used to determine a function of the link which produces a random vector of coefficients. The posterior distribution of the random vector of coefficients is used to estimate the regression coefficients. The method described is referred to as a Bayesian generalized least square (BGLS) analysis. Two cases involving multinominal logit models are described. Case I involves a cumulative logit model and Case II involves a proportional-odds model. All inferences about the coefficients for both cases are described in terms of the posterior distribution of the regression coefficients. The results from the BGLS method are compared to maximum likelihood estimates of the regression coefficients. The BGLS method avoids the nonlinear problems encountered when estimating the regression coefficients of a generalized linear model. The method is not complex or computationally intensive. The BGLS method offers several advantages over Bayesian approaches. ^
Resumo:
In geographical epidemiology, maps of disease rates and disease risk provide a spatial perspective for researching disease etiology. For rare diseases or when the population base is small, the rate and risk estimates may be unstable. Empirical Bayesian (EB) methods have been used to spatially smooth the estimates by permitting an area estimate to "borrow strength" from its neighbors. Such EB methods include the use of a Gamma model, of a James-Stein estimator, and of a conditional autoregressive (CAR) process. A fully Bayesian analysis of the CAR process is proposed. One advantage of this fully Bayesian analysis is that it can be implemented simply by using repeated sampling from the posterior densities. Use of a Markov chain Monte Carlo technique such as Gibbs sampler was not necessary. Direct resampling from the posterior densities provides exact small sample inferences instead of the approximate asymptotic analyses of maximum likelihood methods (Clayton & Kaldor, 1987). Further, the proposed CAR model provides for covariates to be included in the model. A simulation demonstrates the effect of sample size on the fully Bayesian analysis of the CAR process. The methods are applied to lip cancer data from Scotland, and the results are compared. ^
Resumo:
The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke. ^
Resumo:
Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology.^ The age-specific mortality ratio (MR) for the covariates of interest can be written as MR(,ij...m) = ((mu)(,ij...m)/(theta)(,ij...m)) = r(.)exp (Z('')(,ij...m)(beta)) where (mu)(,ij...m) and (theta)(,ij...m) denote the force of mortality in the study and chosen standard populations in the ij...m('th) stratum, respectively, r is the intercept, Z(,ij...m) is the vector of covariables associated with the i('th) age interval, and (beta) is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients.^ This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable. The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate.^ This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when there is an interaction effect between confounding variables needs to be evaluated. ^
Resumo:
The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since CBMN (Cytokinesis-blocked micronucleus) has been found to be extremely sensitive to NNK-induced genetic damage, it is a potential important factor to predict the lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by CBMN assay has not been rigorously examined. ^ This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed very long due to laboratory cost and time, a resampling technique was applied to generate the Markov chain of the normal and the damaged cell for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model including transition probabilities of this chain as covariates was established. The Maximum likelihood estimation was applied to carry on the statistical test for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. ^ Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated the extent of DNA damage/non-damage using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.^
Resumo:
Injection drug use is the third most frequent risk factor for new HIV infections in the United States. A dual mode of exposure: unsafe drug using practices and risky sexual behaviors underlies injection drug users' (IDUs) risk for HIV infection. This research study aims to characterize patterns of drug use and sexual behaviors and to examine the social contexts associated with risk behaviors among a sample of injection drug users. ^ This cross-sectional study includes 523 eligible injection drug users from Houston, Texas, recruited into the 2009 National HIV Behavioral Surveillance project. Three separate set of analyses were carried out. First, using latent class analysis (LCA) and maximum likelihood we identified classes of behavior describing levels of HIV risk, from nine drug and sexual behaviors. Second, eight separate multivariable regression models were built to examine the odds of reporting a given risk behavior. We constructed the most parsimonious multivariable model using a manual backward stepwise process. Third, we examined whether HIV serostatus knowledge (self-reported positive, negative, or unknown serostatus) is associated with drug use and sexual HIV risk behaviors. ^ Participants were mostly male, older, and non-Hispanic Black. Forty-two percent of our sample had behaviors putting them at high risk, 25% at moderate risk, and 33% at low risk for HIV infection. Individuals in the High-risk group had the highest probability of risky behaviors, categorized as almost always sharing needles (0.93), seldom using condoms (0.10), reporting recent exchange sex partners (0.90), and practicing anal sex (0.34). We observed that unsafe injecting practices were associated with high risk sexual behaviors. IDUs who shared needles had higher odds of having anal sex (OR=2.89, 95%CI: 1.69-4.92) and unprotected sex (OR=2.66, 95%CI: 1.38-5.10) at last sex. Additionally, homelessness was associated with needle sharing (OR=2.24, 95% CI: 1.34-3.76) and cocaine use was associated with multiple sex partners (OR=1.82, 95% CI: 1.07-3.11). Furthermore, twenty-one percent of the sample was unaware of their HIV serostatus. The three groups were not different from each other in terms of drug-use behaviors: always using a new sterile needle, or in sharing needles or drug preparation equipment. However, IDUs unaware of their HIV serostatus were 33% more likely to report having more than three sexual partners in the past 12 months; 45% more likely to report to have unprotected sex and 85% more likely to use drug and or alcohol during or before at last sex compared to HIV-positive IDUs. ^ This analysis underscores the merit of LCA approach to empirically categorize injection drug users into distinct classes and identify their risk pattern using multiple indicators and our results show considerable overlap of high risk sexual and drug use behaviors among the high-risk class members. The observed clustering pattern of drug and sexual risk behavior among this population confirms that injection drug users do not represent a homogeneous population in terms of HIV risk. These findings will help develop tailored prevention programs.^
Resumo:
It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data. No one solution has been satisfactory. The purpose of this study is to provide another analytic method by extending the Cox's lifetable regression model with time-dependent covariates. The new approach contains the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible so that both the cohort and period factors can be treated as dummy or continuous variables, and the parameter estimations can be obtained for numerous combinations of variables as in a regression analysis. (3) The model is applicable even when the time period is unequally spaced.^ Two specific models are considered to illustrate the new approach and applied to the U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age indicating that old people have much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to the previous study by Holford (1983).^ The new approach offers a general method to analyze the age-period-cohort data without using any arbitrary constraint in the model. ^
Resumo:
In this dissertation, we propose a continuous-time Markov chain model to examine the longitudinal data that have three categories in the outcome variable. The advantage of this model is that it permits a different number of measurements for each subject and the duration between two consecutive time points of measurements can be irregular. Using the maximum likelihood principle, we can estimate the transition probability between two time points. By using the information provided by the independent variables, this model can also estimate the transition probability for each subject. The Monte Carlo simulation method will be used to investigate the goodness of model fitting compared with that obtained from other models. A public health example will be used to demonstrate the application of this method. ^
Resumo:
We compare six high-resolution Holocene, sediment cores along a S-N transect on the Norwegian-Svalbard continental margin from ca 60°N to 77.4°N, northern North Atlantic. Planktonic foraminifera in the cores were investigated to show the changes in upper surface and subsurface water mass distribution and properties, including summer sea-surface temperatures (SST). The cores are located below the axis of the Norwegian Current and the West Spitsbergen Current, which today transport warm Atlantic Water to the Arctic. Sediment accumulation rates are generally high at all the core sites, allowing for a temporal resolution of 10-102 years. SST is reconstructed using different types of transfer functions, resulting in very similar SST trends, with deviations of no more than +- 1.0/1.5 °C. A transfer function based on the maximum likelihood statistical approach is found to be most relevant. The reconstruction documents an abrupt change in planktonic foraminiferal faunal composition and an associated warming at the Younger Dryas-Preboreal transition. The earliest part of the Holocene was characterized by large temperature variability, including the Preboreal Oscillations and the 8.2 k event. In general, the early Holocene was characterized by SSTs similar to those of today in the south and warmer than today in the north, and a smaller S-N temperature gradient (0.23 °C/°N) compared to the present temperature gradient (0.46 °C/°N). The southern proxy records (60-69°N) were more strongly influenced by slightly cooler subsurface water probably due to the seasonality of the orbital forcing and increased stratification due to freshening. The northern records (72-77.4°N) display a millennial-scale change associated with reduced insolation and a gradual weakening of the North Atlantic thermohaline circulation (THC). The observed northwards amplification of the early Holocene warming is comparable to the pattern of recent global warming and future climate modelling, which predicts greater warming at higher latitudes. The overall trend during mid and late Holocene was a cooling in the north, stable or weak warming in the south, and a maximum S-N SST gradient of ca 0.7 °C/°N at 5000 cal. years BP. Superimposed on this trend were several abrupt temperature shifts. Four of these shifts, dated to 9000-8000, 5500-3000 and 1000 and ~400 cal. years BP, appear to be global, as they correlate with periods of global climate change. In general, there is a good correlation between the northern North Atlantic temperature records and climate records from Norway and Svalbard.
Resumo:
Introduction Many marine planktonic crustaceans such as copepods have been considered as widespread organisms. However, the growing evidence for cryptic and pseudo-cryptic speciation has emphasized the need of re-evaluating the status of copepod species complexes in molecular and morphological studies to get a clearer picture about pelagic marine species as evolutionary units and their distributions. This study analyses the molecular diversity of the ecologically important Paracalanus parvus species complex. Its seven currently recognized species are abundant and also often dominant in marine coastal regions worldwide from temperate to tropical oceans. Results COI and Cytochrome b sequences of 160 specimens of the Paracalanus parvus complex from all oceans were obtained. Furthermore, 42 COI sequences from GenBank were added for the genetic analyses. Thirteen distinct molecular operational taxonomic units (MOTU) and two single sequences were revealed with cladistic analyses (Maximum Likelihood, Bayesian Inference), of which seven were identical with results from species delimitation methods (barcode gaps, ABDG, GMYC, Rosenberg's P(AB)). In total, 10 to 12 putative species were detected and could be placed in three categories: (1) temperate geographically isolated, (2) warm-temperate to tropical wider spread and (3) circumglobal warm-water species. Conclusions The present study provides evidence of cryptic or pseudocryptic speciation in the Paracalanus parvus complex. One major insight is that the species Paracalanus parvus s.s. is not panmictic, but may be restricted in its distribution to the northeastern Atlantic.
Resumo:
Documenting changes in distribution is necessary for understanding species' response to environmental changes, but data on species distributions are heterogeneous in accuracy and resolution. Combining different data sources and methodological approaches can fill gaps in knowledge about the dynamic processes driving changes in species-rich, but data-poor regions. We combined recent bird survey data from the Neotropical Biodiversity Mapping Initiative (NeoMaps) with historical distribution records to estimate potential changes in the distribution of eight species of Amazon parrots in Venezuela. Using environmental covariates and presence-only data from museum collections and the literature, we first used maximum likelihood to fit a species distribution model (SDM) estimating a historical maximum probability of occurrence for each species. We then used recent, NeoMaps survey data to build single-season occupancy models (OM) with the same environmental covariates, as well as with time- and effort-dependent detectability, resulting in estimates of the current probability of occurrence. We finally calculated the disagreement between predictions as a matrix of probability of change in the state of occurrence. Our results suggested negative changes for the only restricted, threatened species, Amazona barbadensis, which has been independently confirmed with field studies. Two of the three remaining widespread species that were detected, Amazona amazonica, Amazona ochrocephala, also had a high probability of negative changes in northern Venezuela, but results were not conclusive for Amazona farinosa. The four remaining species were undetected in recent field surveys; three of these were most probably absent from the survey locations (Amazona autumnalis, Amazona mercenaria and Amazona festiva), while a fourth (Amazona dufresniana) requires more intensive targeted sampling to estimate its current status. Our approach is unique in taking full advantage of available, but limited data, and in detecting a high probability of change even for rare and patchily-distributed species. However, it is presently limited to species meeting the strong assumptions required for maximum-likelihood estimation with presence-only data, including very high detectability and representative sampling of its historical distribution.