999 results for Biology, Biostatistics|Statistics


Relevance: 100.00%

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and whether the values of the dependent variable were generated under model fit or lack of fit.

The study found that the $\hat{C}_g$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into from 8 to 30 quantiles was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions of lack of fit when the model actually fit the data. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
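As a concrete illustration of the statistic discussed above, the sketch below computes a Hosmer-Lemeshow-type $\hat{C}$ over deciles of risk; the grouping rule, function name, and reference $\chi^2$ degrees of freedom ($g-2$) are standard textbook choices assumed here, not taken from the dissertation.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Group observations by quantiles of fitted probability and compare
    observed vs. expected event counts within each group."""
    order = np.argsort(p_hat)
    groups = np.array_split(order, g)            # ~equal-size groups; ties are not split
    stat = 0.0
    for idx in groups:
        n_k = len(idx)
        o_k = y[idx].sum()                       # observed events in group k
        e_k = p_hat[idx].sum()                   # expected events in group k
        pbar = e_k / n_k
        stat += (o_k - e_k) ** 2 / (n_k * pbar * (1 - pbar))
    return stat, chi2.sf(stat, df=g - 2)         # reference chi-square with g-2 df

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, 500)                 # toy fitted probabilities
y = rng.binomial(1, p)                           # outcomes generated under model fit
print(hosmer_lemeshow(y, p))
```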

Relevance: 100.00%

Abstract:

Multi-center clinical trials are very common in the development of new drugs and devices. One concern in such trials is the effect that individual investigational sites enrolling small numbers of patients may have on the overall result. Can the presence of small centers cause an ineffective treatment to appear effective when treatment-by-center interaction is not statistically significant?

In this research, simulations are used to study the effect that centers enrolling few patients may have on the analysis of clinical trial data. A multi-center clinical trial with 20 sites is simulated to investigate the effect of a new treatment in comparison to a placebo. Twelve of these 20 investigational sites are considered small, each enrolling fewer than four patients per treatment group. Three clinical trials are simulated, with sample sizes of 100, 170 and 300. The simulated data are generated with various characteristics, one in which the treatment should be considered effective and another in which it is not. Qualitative interactions are also produced within the small sites to further investigate the effect of small centers under various conditions.

Standard analysis of variance methods and the "sometimes-pool" testing procedure are applied to the simulated data. One model investigates treatment and center effects and treatment-by-center interaction; another investigates treatment effect alone. These analyses are used to determine the power to detect treatment-by-center interactions and the probability of type I error.

We find it is difficult to detect treatment-by-center interactions when only a few investigational sites enrolling a limited number of patients participate in the interaction. However, we find no increased risk of type I error in these situations. In a pooled analysis, when the treatment is not effective, the probability of finding a significant treatment effect in the absence of a significant treatment-by-center interaction is well within standard limits of type I error.
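A minimal sketch of one such simulated trial is given below, assuming a normally distributed response, a 0.5 SD treatment effect, and made-up center sizes (12 small centers with 3 patients per arm, 8 centers with 10); it fits the treatment-by-center interaction model with ordinary ANOVA, not the dissertation's exact "sometimes-pool" procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n_per_arm = [3] * 12 + [10] * 8                  # 12 small centers, 8 larger ones (assumed)
rows = []
for center, n in enumerate(n_per_arm):
    for trt in (0, 1):
        y = 0.5 * trt + rng.normal(0, 1, n)      # 0.5 SD treatment effect, no center effect
        rows += [{"center": center, "trt": trt, "y": v} for v in y]
df = pd.DataFrame(rows)

full = smf.ols("y ~ C(trt) * C(center)", data=df).fit()
print(anova_lm(full, typ=2))                     # F-tests for treatment, center, interaction
```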

Relevance: 100.00%

Abstract:

In this dissertation, we propose a continuous-time Markov chain model for longitudinal data whose outcome variable has three categories. The advantage of this model is that it permits a different number of measurements for each subject, and the duration between two consecutive measurement times can be irregular. Using the maximum likelihood principle, we can estimate the transition probability between two time points. By using the information provided by the independent variables, this model can also estimate the transition probability for each subject. The Monte Carlo simulation method will be used to investigate the goodness of fit of the model compared with that obtained from other models. A public health example will be used to demonstrate the application of this method.
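The core computation behind such a model can be sketched as follows: for a three-state continuous-time Markov chain with generator matrix Q, the transition probabilities over an arbitrary (possibly irregular) gap t are the matrix exponential of Qt. The generator values below are illustrative only.

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.30,  0.20,  0.10],
              [ 0.15, -0.40,  0.25],
              [ 0.05,  0.10, -0.15]])   # generator matrix: rows sum to zero

def transition_matrix(Q, t):
    """P(t) with entries P[i, j] = Pr(state j at time s + t | state i at time s)."""
    return expm(Q * t)

print(transition_matrix(Q, 2.5))        # works for any positive gap, however irregular
```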

Relevance: 90.00%

Abstract:

Many studies in biostatistics deal with binary data. Some of these studies involve correlated observations, which can complicate the analysis of the resulting data. Studies of this kind typically arise when a high degree of commonality exists between test subjects; two examples are measurements on identical twins and studies of symmetrical organs or appendages, as in ophthalmic studies. If there is a natural hierarchy in the data, multilevel analysis is an appropriate tool for the analysis. Although this type of matching appears ideal for purposes of comparison, analyzing the resulting data while ignoring the effect of intra-cluster correlation has been shown to produce biased results.

This paper will explore the use of multilevel modeling of simulated binary data with predetermined levels of correlation. Data will be generated using the beta-binomial method, with varying degrees of correlation between the lower-level observations, and will be analyzed using the multilevel software package MLwiN (Woodhouse et al., 1995). Comparisons between the specified intra-cluster correlation of these data and the correlations estimated by multilevel analysis will be used to examine the accuracy of this technique for analyzing this type of data.
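The data-generation step can be sketched as below: cluster-level success probabilities are drawn from a Beta distribution whose parameters are chosen so that the intra-cluster correlation equals a target rho, using the standard relation rho = 1/(alpha + beta + 1). The marginal mean, rho, and cluster sizes are assumed values, and the MLwiN analysis step is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

def beta_binomial_clusters(n_clusters, cluster_size, pi=0.3, rho=0.2):
    """Binary responses with marginal mean pi and intra-cluster correlation rho."""
    alpha = pi * (1 - rho) / rho
    beta = (1 - pi) * (1 - rho) / rho
    p = rng.beta(alpha, beta, size=n_clusters)            # one probability per cluster
    return rng.binomial(1, p[:, None], size=(n_clusters, cluster_size))

y = beta_binomial_clusters(n_clusters=200, cluster_size=2)  # e.g. twins or paired eyes
print(y.mean())                                            # should be near pi
```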

Relevance: 90.00%

Abstract:

A non-parametric method was developed and tested to compare the partial areas under two correlated receiver operating characteristic (ROC) curves. Based on the theory of generalized U-statistics, mathematical formulas were derived for computing the ROC area and the variance and covariance between portions of two ROC curves. A practical SAS application was also developed to facilitate the calculations. The accuracy of the non-parametric method was evaluated by comparing it to other methods. When our method was applied to data from a published ROC analysis of CT images, our results were very close to those reported. A hypothetical example was used to demonstrate the effect of two crossed ROC curves: the total ROC areas are the same, yet the corresponding portions of the two curves were found to differ significantly by the partial ROC analysis. For ROC curves computed on a large scale, such as from a logistic regression model, we applied our method to a breast cancer study with Medicare claims data; it yielded the same ROC area as the SAS LOGISTIC procedure. Our method also provides an alternative to the global summary of ROC area comparison, by directly comparing the true-positive rates of two regression models and by determining the range of false-positive values where the models differ.
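A simple non-parametric partial-area computation, in the spirit of the method described, is sketched below: the empirical ROC curve is built from the scores and the area is taken by the trapezoidal rule over a chosen false-positive range. The variance and covariance formulas from the generalized U-statistic theory are not reproduced, and the data and FPR bounds are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
scores = np.r_[rng.normal(1, 1, 100), rng.normal(0, 1, 100)]   # diseased then healthy
labels = np.r_[np.ones(100), np.zeros(100)]

def partial_auc(y_true, y_score, fpr_lo=0.0, fpr_hi=0.2):
    """Trapezoidal area under the empirical ROC curve between two FPR bounds
    (boundary points are not interpolated in this simple version)."""
    neg = np.sort(y_score[y_true == 0])[::-1]
    pos = y_score[y_true == 1]
    thresholds = np.r_[np.inf, neg]                 # one ROC point per negative score
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    tpr = np.array([(pos >= t).mean() for t in thresholds])
    keep = (fpr >= fpr_lo) & (fpr <= fpr_hi)
    f, t = fpr[keep], tpr[keep]
    return np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2)

print(partial_auc(labels, scores))
```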

Relevance: 90.00%

Abstract:

Monte Carlo simulation was conducted to investigate parameter estimation and hypothesis testing in several well-known adaptive randomization procedures. The four urn models studied are the Randomized Play-the-Winner (RPW), the Randomized Pólya Urn (RPU), the Birth and Death Urn with Immigration (BDUI), and the Drop-the-Loser Urn (DL). Two sequential estimation methods, sequential maximum likelihood estimation (SMLE) and the doubly adaptive biased coin design (DABC), are simulated at three optimal allocation targets that minimize the expected number of failures while holding constant the variance of the simple difference (RSIHR), the relative risk (ORR), and the odds ratio (OOR), respectively. The log likelihood ratio test and three Wald-type tests (simple difference, log of relative risk, log of odds ratio) are compared across the adaptive procedures.

Simulation results indicate that although RPW is slightly better at assigning more patients to the superior treatment, the DL method is considerably less variable and its test statistics have better normality. Compared with SMLE, DABC has a slightly higher overall response rate with lower variance, but larger bias and variance in parameter estimation. Additionally, the test statistics under SMLE have better normality and a lower type I error rate, and the power of hypothesis testing is more comparable with equal randomization. RSIHR usually has the highest power among the three optimal allocation ratios; however, the ORR allocation has better power and a lower type I error rate when the log of the relative risk is the test statistic, and the expected number of failures under ORR is smaller than under RSIHR. The simple difference of response rates has the worst normality among the four test statistics, and the power of the hypothesis test is always inflated when it is used. On the other hand, the normality of the log likelihood ratio test statistic is robust to the choice of adaptive randomization procedure.
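To make one of the urn designs concrete, the sketch below simulates the Randomized Play-the-Winner rule: a ball is drawn to assign the next patient, and a ball of the same type (after a success) or of the opposite type (after a failure) is added back. The response rates and starting urn composition are assumed, and the estimation and testing steps compared in the dissertation are not shown.

```python
import numpy as np

rng = np.random.default_rng(3)

def rpw_trial(n_patients, p_success=(0.7, 0.4), start_balls=(1, 1)):
    urn = list(start_balls)                     # ball counts for treatment 0 and treatment 1
    assignments, responses = [], []
    for _ in range(n_patients):
        arm = 0 if rng.random() < urn[0] / sum(urn) else 1
        success = rng.random() < p_success[arm]
        urn[arm if success else 1 - arm] += 1   # reward the arm that just "won"
        assignments.append(arm)
        responses.append(success)
    return np.array(assignments), np.array(responses)

arms, resp = rpw_trial(200)
print("share assigned to the better arm:", (arms == 0).mean())
```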

Relevance: 90.00%

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting associations of common variants but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases.

This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions.

In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques are used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare.

Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with a scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.

This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, leading to the discovery of networks significantly associated with psoriasis.
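A rough sketch of the functional principal component idea is given below: each subject's genotype profile across a genomic segment is treated as a function of position, smoothed onto a common grid, and scored on the leading eigenfunctions of the sample covariance. The kernel smoother, grid, and simulated genotypes are placeholder choices, not the dissertation's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)
n_subjects, n_variants = 500, 40
positions = np.sort(rng.uniform(0, 1, n_variants))        # variant positions in the segment
genotypes = rng.binomial(2, 0.05, (n_subjects, n_variants)).astype(float)  # mostly rare variants

grid = np.linspace(0, 1, 101)
# crude smoothing: Gaussian-kernel average of genotype values over position
weights = np.exp(-0.5 * ((grid[:, None] - positions[None, :]) / 0.05) ** 2)
weights /= weights.sum(axis=1, keepdims=True)
curves = genotypes @ weights.T                             # one smoothed curve per subject

centered = curves - curves.mean(axis=0)
cov = centered.T @ centered / (n_subjects - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
fpc_scores = centered @ eigvecs[:, ::-1][:, :3]            # scores on the top 3 FPCs
print(fpc_scores.shape)                                    # these feed the association test
```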

Relevance: 90.00%

Abstract:

Most studies of differential gene expression have been conducted between two given conditions. The two-condition experiment (TCE) approach is simple in that all genes detected display a common differential expression pattern responsive to a common two-condition difference. Therefore, genes that are differentially expressed under conditions other than the given two are undetectable with the TCE approach. To address this problem, we propose a new approach called the multiple-condition experiment (MCE) without replication and develop corresponding statistical methods, including inference of pairs of conditions for genes, new t-statistics, and a generalized multiple-testing method applicable to any multiple-testing procedure via a control parameter C. We applied these statistical methods to our real MCE data from breast cancer cell lines and found that 85 percent of gene-expression variation was caused by genotypic effects and genotype-ANAX1 overexpression interactions, which agrees well with our expected results. We also applied our methods to the adenoma dataset of Notterman et al. and identified 93 differentially expressed genes that could not be found with a TCE. The MCE approach is a conceptual breakthrough in many respects: (a) many conditions of interest can be studied simultaneously; (b) studying the association between differential expression of genes and conditions becomes easy; (c) it can provide more precise information for molecular classification and diagnosis of tumors; and (d) it can save investigators a great deal of experimental resources and time.
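The generalized multiple-testing procedure with control parameter C is specific to this work, so the sketch below only shows where a multiple-testing step would sit once per-gene statistics have been reduced to p-values, using the standard Benjamini-Hochberg rule as a neutral stand-in; the p-values are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
p_values = np.r_[rng.uniform(0, 0.001, 90), rng.uniform(0, 1, 4000)]  # toy mix of signal and noise

def benjamini_hochberg(p, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    order = np.argsort(p)
    m = len(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

print(benjamini_hochberg(p_values).sum(), "genes flagged")
```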

Relevance: 90.00%

Abstract:

A cohort study was conducted in the Texas and Louisiana Gulf Coast area on individual workers who had been exposed to asbestos for 15 years or more. Most of these workers were employed in petrochemical industries. Of the 15,742 subjects initially selected for the cohort study, 3,258 had positive chest X-ray findings believed to be related to prolonged asbestos exposure. These subjects were investigated further; their work-up included a detailed medical and occupational history, laboratory tests and spirometry. One thousand eight hundred and three cases with positive chest X-ray findings whose data files were considered complete at the end of May 1986 were analyzed, and their findings are included in this report.

The prevalence of lung cancer and of cancers at the following sites: skin, stomach, oropharynx, pancreas and kidney, was significantly increased when compared to data from the Connecticut Tumor Registry. The prevalence of other chronic conditions such as hypertension, emphysema, heart disease and peptic ulcer was also significantly high when compared to data for the general U.S. population furnished by the National Center for Health Statistics (NCHS). In most instances the occurrence of cancer and of the chronic ailments mentioned above appeared to follow 15-25 years of exposure to asbestos.

Relevance: 90.00%

Abstract:

An extension of k-ratio multiple comparison methods to rank-based analyses is described. The new method is analogous to the Duncan-Godbold approximate k-ratio procedure for unequal sample sizes or correlated means. The close parallel of the new methods to the Duncan-Godbold approach is shown by demonstrating that they are based upon different parameterizations as starting points.

A semi-parametric basis for the new methods is shown by starting from the Cox proportional hazards model and using Wald statistics; from there, the log-rank and Gehan-Breslow-Wilcoxon methods may be seen as score-statistic-based methods.

Simulations and analysis of a published data set are used to show the performance of the new methods.
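A hedged illustration of the semi-parametric starting point is given below: a Cox proportional hazards model with a group indicator is fitted and the Wald statistic for the group contrast is read from the summary. It uses the lifelines package and simulated survival times, not the published data set or the k-ratio machinery itself.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 200
group = rng.integers(0, 2, n)
time = rng.exponential(scale=np.where(group == 1, 8.0, 5.0))   # group 1 survives longer
event = (rng.random(n) < 0.8).astype(int)                      # roughly 20% censoring
df = pd.DataFrame({"T": time, "E": event, "group": group})

cph = CoxPHFitter()
cph.fit(df, duration_col="T", event_col="E")
print(cph.summary[["coef", "se(coef)", "z", "p"]])             # z is the Wald statistic
```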

Relevance: 90.00%

Abstract:

The purpose of this study was to evaluate the adequacy of computerized vital records in Texas for conducting etiologic studies of neural tube defects (NTDs), using the revised and expanded National Center for Health Statistics vital record forms introduced in Texas in 1989.

Cases of NTDs (anencephaly and spina bifida) among Harris County (Houston) residents were identified from the computerized birth and death records for 1989-1991. The validity of the system was then measured against cases ascertained independently through medical records and death certificates. The computerized system performed poorly in its identification of NTDs, particularly for anencephaly, where the false positive rate was 80% with little or no improvement over the 3-year period. For both NTDs, the sensitivity and positive predictive value of the tapes were somewhat higher for Hispanic than for non-Hispanic mothers.

Case-control studies were conducted using the tape data set and the independently verified data set, with controls selected from the live birth tapes. Findings varied widely between the data sets. For example, the anencephaly odds ratio for Hispanic mothers (vs. non-Hispanic) was 1.91 (CI = 1.38-2.65) for the tape file, but 3.18 (CI = 1.81-5.58) for verified records. The odds ratio for diabetes was elevated for the tape set (OR = 3.33, CI = 1.67-6.66) but not for verified cases (OR = 1.09, CI = 0.24-4.96), among whom few mothers were diabetic. It was concluded that computerized tapes should not be relied on solely for NTD studies.

Using the verified cases, Hispanic ethnicity of the mother was associated with spina bifida, and Hispanic ethnicity, teenage motherhood, and previous pregnancy terminations were associated with anencephaly. Mother's birthplace, education, parity, and diabetes were not significant for either NTD.

Stratified analyses revealed several notable examples of statistical interaction. For anencephaly, strong interaction was observed between Hispanic origin and trimester of first prenatal care.

The prevalence was 3.8 per 10,000 live births for anencephaly and 2.0 for spina bifida (5.8 per 10,000 births for the two categories combined).
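For reference, the validity measures (sensitivity, positive predictive value) and the odds ratio with a Woolf-type 95% confidence interval used in these comparisons can be computed as sketched below; the 2x2 counts are placeholders, not the study's data.

```python
import numpy as np

def validity(tp, fp, fn):
    """Sensitivity and positive predictive value from verified case counts."""
    return {"sensitivity": tp / (tp + fn), "ppv": tp / (tp + fp)}

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a/b = exposed/unexposed cases, c/d = exposed/unexposed controls (Woolf CI)."""
    or_ = (a * d) / (b * c)
    se_log = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = np.exp(np.log(or_) + np.array([-z, z]) * se_log)
    return or_, lo, hi

print(validity(tp=20, fp=80, fn=10))            # e.g. an 80% false-positive rate
print(odds_ratio_ci(a=30, b=20, c=200, d=300))  # OR and 95% CI for a toy 2x2 table
```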

Relevance: 90.00%

Abstract:

When choosing among models to describe categorical data, the necessity to consider interactions makes selection more difficult. With just four variables and all interactions considered, there are 166 different hierarchical models and many more non-hierarchical models. Two procedures have been developed for categorical data which produce the "best" subset or subsets of each model size, where size refers to the number of effects in the model. Both procedures are patterned after the Leaps and Bounds approach used by Furnival and Wilson for continuous data and do not generally require fitting all models. For hierarchical models, likelihood ratio statistics ($G^2$) are computed using iterative proportional fitting, and "best" is determined by comparing, among models with the same number of effects, $\Pr(\chi^2_k \ge G^2_{ij})$, where $k$ is the degrees of freedom for the $i$th model of size $j$. To fit non-hierarchical as well as hierarchical models, a weighted least squares procedure has been developed.

The procedures are applied to published occupational data on the occurrence of byssinosis, and the results are compared to previously published analyses of the same data. The procedures are also applied to published data on symptoms in psychiatric patients and again compared to previously published analyses.

These procedures will make categorical data analysis more accessible to researchers who are not statisticians. They should also encourage more complex exploratory analyses of epidemiologic data and contribute to the development of new hypotheses for study.
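The sketch below illustrates the machinery mentioned above on a toy table: iterative proportional fitting of the independence log-linear model and the likelihood ratio statistic $G^2$, evaluated as $\Pr(\chi^2_k \ge G^2)$. The 2x3 counts and the choice of model are illustrative, not from the byssinosis or psychiatric data.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([[20., 30., 25.],
                     [10., 40., 15.]])

def ipf_independence(table, iters=50):
    """Fit the row x column independence model by alternately matching margins."""
    fitted = np.ones_like(table)
    for _ in range(iters):
        fitted *= table.sum(axis=1, keepdims=True) / fitted.sum(axis=1, keepdims=True)
        fitted *= table.sum(axis=0, keepdims=True) / fitted.sum(axis=0, keepdims=True)
    return fitted

fitted = ipf_independence(observed)
g2 = 2 * np.sum(observed * np.log(observed / fitted))      # likelihood ratio statistic
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(g2, chi2.sf(g2, df))                                 # Pr(chi2_df >= G^2), as in the text
```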

Relevance: 90.00%

Abstract:

Evaluation of the impact of a disease on life expectancy is an important part of public health. Potential gains in life expectancy (PGLEs) that properly take into account competing risks are an effective indicator for measuring the impact of multiple causes of death. This study aimed to measure the PGLEs from reducing or eliminating the major causes of death in the USA from 2001 to 2008.

To calculate the PGLEs due to the elimination of specific causes of death, the age-specific mortality rates for heart disease, malignant neoplasms, Alzheimer's disease, kidney diseases and HIV/AIDS, together with life-table construction data, were obtained from the National Center for Health Statistics, and multiple-decrement life tables were constructed. The PGLEs from eliminating heart disease, malignant neoplasms or HIV/AIDS decreased steadily from 2001 to 2008, whereas the PGLEs from eliminating Alzheimer's disease or kidney diseases showed increasing trends. The PGLEs (in years) at birth from complete elimination of heart disease over 2001-2008 were 0.336-0.299 for all races, 0.327-0.301 for males, 0.344-0.295 for females, 0.360-0.315 for whites, 0.349-0.317 for white males, 0.371-0.316 for white females, 0.278-0.251 for blacks, 0.272-0.255 for black males, and 0.282-0.246 for black females. The corresponding PGLEs at birth from complete elimination of malignant neoplasms, Alzheimer's disease, kidney diseases or HIV/AIDS over 2001-2008 were also obtained for each of these groups.

Most of these diseases affect specific populations: HIV/AIDS tends to have a greater impact on people of working age, heart disease and malignant neoplasms have a greater impact on people over 65 years of age, and Alzheimer's disease and kidney diseases have a greater impact on people over 75 years of age. To measure the impact of these diseases on life expectancy in people of working age, partial multiple-decrement life tables were constructed and the PGLEs were computed for partial or complete elimination of the various causes of death during the working years. The results of the study thus outline how each single disease affects life expectancy in age-, race- and sex-specific populations in the USA. The findings will not only help evaluate current public health improvements, but also provide useful information for future research and disease-control programs.
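A compressed sketch of the cause-elimination calculation behind a PGLE is shown below: the all-cause death probabilities are reduced using the proportion of deaths from the cause of interest (a Chiang-style proportionality assumption), and the gain is the difference in life expectancy at birth. All inputs are made-up numbers, not NCHS data.

```python
import numpy as np

ages = np.arange(0, 101)                          # single-year ages 0..100
qx = np.clip(0.0005 * np.exp(0.09 * ages), 0, 1)  # toy all-cause death probabilities
frac_cause = np.full_like(qx, 0.25)               # toy share of deaths from the cause

def life_expectancy(qx):
    lx = np.cumprod(np.r_[1.0, 1 - qx[:-1]])      # survivors to each age
    dx = lx * qx
    Lx = lx - dx / 2                              # person-years, deaths at mid-year
    return Lx.sum()                               # e0 for one-year intervals

qx_deleted = 1 - (1 - qx) ** (1 - frac_cause)     # cause-deleted probabilities of death
pgle = life_expectancy(qx_deleted) - life_expectancy(qx)
print(f"potential gain in life expectancy at birth: {pgle:.2f} years")
```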

Relevance: 90.00%

Abstract:

In biomedical studies, the common data structures are matched (paired) and unmatched designs. Recently, many researchers have become interested in meta-analysis to obtain a better understanding of a medical treatment from several sets of clinical data. A hybrid design, which combines the two data structures, raises fundamental questions for statistical methods and challenges for statistical inference. The applicable methods depend on the underlying distribution: if the outcomes are normally distributed, we would use the classic paired and two-independent-sample t-tests on the matched and unmatched cases; if not, we can apply the Wilcoxon signed-rank and rank-sum tests to each case.

To assess an overall treatment effect in a hybrid design, we can apply the inverse-variance weighting method used in meta-analysis. In the nonparametric case, we can use a test statistic that combines the two Wilcoxon test statistics. However, these two test statistics are not on the same scale. We propose a hybrid test statistic based on the Hodges-Lehmann estimates of the treatment effects, which are medians on the same scale.

To compare the proposed method, we use the classic meta-analysis t-test statistic, which combines the estimates of the treatment effects from the two t-test statistics. Theoretically, the efficiency of two unbiased estimators of a parameter is the ratio of their variances. Using the concept of asymptotic relative efficiency (ARE) developed by Pitman, we derive the ARE of the hybrid test statistic relative to the classic meta-analysis t-test statistic, using the Hodges-Lehmann estimators associated with the two test statistics.

From several simulation studies, we calculate the empirical type I error rate and power of the test statistics. The proposed statistic provides an effective tool for evaluating and understanding treatment effects in various public health studies as well as clinical trials.
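The two Hodges-Lehmann location estimates referred to above, together with a simple inverse-variance-weighted combination, can be sketched as follows; the simulated data, plug-in variances, and weighting are illustrative choices rather than the proposed hybrid test statistic itself.

```python
import numpy as np
from itertools import combinations

def hl_paired(diffs):
    """Median of the Walsh averages of the within-pair differences."""
    walsh = [(a + b) / 2 for a, b in combinations(diffs, 2)] + list(diffs)
    return float(np.median(walsh))

def hl_two_sample(x, y):
    """Median of all pairwise differences between the two independent samples."""
    return float(np.median(np.subtract.outer(x, y)))

def inverse_variance_combine(estimates, variances):
    w = 1 / np.asarray(variances)
    return float(np.sum(w * np.asarray(estimates)) / np.sum(w))

rng = np.random.default_rng(7)
d = rng.normal(0.5, 1, 30)                          # paired differences (matched part)
x, y = rng.normal(0.5, 1, 40), rng.normal(0, 1, 40) # unmatched part
print(inverse_variance_combine([hl_paired(d), hl_two_sample(x, y)], [0.05, 0.04]))
```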

Relevance: 90.00%

Abstract:

Scholars have found that socioeconomic status is one of the key factors influencing early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. For each of the 254 counties in Texas, lung cancer incidence rates from 2004 to 2008 and median household income in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, a simultaneous autoregressive (SAR) structure and a conditional autoregressive (CAR) structure, were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the results were used to detect clusters of similar lung cancer incidence rates and disease patterns in Texas.
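A minimal sketch of the Moran's I check on model residuals is given below; the residual vector and the binary contiguity matrix are simulated stand-ins for the Texas county data, and Geary's C and the SAR/CAR fits are not shown.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 254
resid = rng.normal(size=n)                        # e.g. residuals from the income model
W = (rng.random((n, n)) < 0.02).astype(float)     # toy neighbour (contiguity) matrix
W = np.triu(W, 1)
W = W + W.T                                       # symmetric, zero diagonal

def morans_i(x, W):
    """Moran's I spatial autocorrelation statistic for values x and weights W."""
    z = x - x.mean()
    n_obs = len(x)
    return n_obs * np.sum(W * np.outer(z, z)) / (W.sum() * np.sum(z ** 2))

print(morans_i(resid, W))   # near -1/(n-1) when there is no spatial autocorrelation
```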