941 results for Biology, Biostatistics|Hydrology
Abstract:
The problem of analyzing data with updated measurements in the time-dependent proportional hazards model arises frequently in practice. One available option is to reduce the number of intervals (or updated measurements) to be included in the Cox regression model. We empirically investigated the bias of the estimator of the coefficient of the time-dependent covariate while varying the failure rate, sample size, true values of the parameters, and the number of intervals. We also evaluated how often a time-dependent covariate needs to be collected and assessed the effect of sample size and failure rate on the power of testing a time-dependent effect.

A time-dependent proportional hazards model with two binary covariates was considered. The time axis was partitioned into k intervals. The baseline hazard was assumed to be 1, so that the failure times were exponentially distributed within the ith interval. A type II censoring model was adopted to characterize the failure rate. The factors of interest were sample size (500, 1000), type II censoring with failure rates of 0.05, 0.10, and 0.20, and three values for each of the non-time-dependent and time-dependent covariates (1/4, 1/2, 3/4).

The mean bias of the estimator of the coefficient of the time-dependent covariate decreased as sample size and number of intervals increased, whereas it increased as the failure rate and the true values of the covariates increased. The mean bias was smallest when all of the updated measurements were used in the model, compared with two models that used only selected measurements of the time-dependent covariate. For the model that included all the measurements, the coverage rates of the estimator of the coefficient of the time-dependent covariate were in most cases 90% or more, except when the failure rate was high (0.20). The power associated with testing a time-dependent effect was highest when all of the measurements of the time-dependent covariate were used. An example from the Systolic Hypertension in the Elderly Program Cooperative Research Group is presented.
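For orientation, the model described above can be written (with illustrative notation for the two binary covariates, x fixed and z(t) updated over the k intervals) as

$$\lambda\bigl(t \mid x, z(t)\bigr) = \lambda_0(t)\,\exp\{\beta_1 x + \beta_2 z(t)\}, \qquad \lambda_0(t) \equiv 1,$$

so that, within each of the k intervals, failure times are exponentially distributed given the covariate values in force there.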
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.

The study found that the $\hat{C}_g$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.

The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.

Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
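As a point of reference, the published Hosmer-Lemeshow statistic with the deciles-of-risk grouping studied above can be sketched as follows; this is a generic illustration, not the simulation code used in the study, and the function and variable names are invented.

    import numpy as np
    from scipy import stats

    def hosmer_lemeshow(y, p_hat, n_groups=10):
        """Hosmer-Lemeshow C-hat statistic with quantile ("deciles of risk") grouping."""
        y = np.asarray(y, dtype=float)
        p_hat = np.asarray(p_hat, dtype=float)
        order = np.argsort(p_hat)                 # sort subjects by estimated risk
        groups = np.array_split(order, n_groups)  # roughly equal-sized groups
        chi2 = 0.0
        for g in groups:
            n_g = len(g)
            obs = y[g].sum()                      # observed events in the group
            exp = p_hat[g].sum()                  # expected events in the group
            pbar = exp / n_g                      # mean estimated probability (assumed strictly in (0, 1))
            chi2 += (obs - exp) ** 2 / (n_g * pbar * (1.0 - pbar))
        df = n_groups - 2                         # reference distribution: chi-square on G - 2 df
        return chi2, stats.chi2.sf(chi2, df)

Note that the positional split used here can separate tied estimated probabilities into different groups, which the study advises against; a more careful implementation would keep tied values together.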
Abstract:
Multi-center clinical trials are very common in the development of new drugs and devices. One concern in such trials is the effect of individual investigational sites enrolling small numbers of patients on the overall result. Can the presence of small centers cause an ineffective treatment to appear effective when treatment-by-center interaction is not statistically significant?

In this research, simulations are used to study the effect that centers enrolling few patients may have on the analysis of clinical trial data. A multi-center clinical trial with 20 sites is simulated to investigate the effect of a new treatment in comparison to a placebo treatment. Twelve of these 20 investigational sites are considered small, each enrolling fewer than four patients per treatment group. Three clinical trials are simulated, with sample sizes of 100, 170, and 300. The simulated data are generated under various scenarios, one in which the treatment should be considered effective and another in which it is not effective. Qualitative interactions are also produced within the small sites to further investigate the effect of small centers under various conditions.

Standard analysis of variance methods and the "sometimes-pool" testing procedure are applied to the simulated data. One model investigates treatment and center effects and the treatment-by-center interaction; another investigates the treatment effect alone. These analyses are used to determine the power to detect treatment-by-center interactions and the probability of type I error.

We find it is difficult to detect treatment-by-center interactions when only a few investigational sites enrolling a limited number of patients participate in the interaction. However, we find no increased risk of type I error in these situations. In a pooled analysis, when the treatment is not effective, the probability of finding a significant treatment effect in the absence of a significant treatment-by-center interaction is well within standard limits of type I error.
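A rough sketch of this kind of simulation is given below. The center sizes, effect size, and error variance are illustrative assumptions (they do not reproduce the sample sizes of 100, 170, or 300 used in the thesis), the "sometimes-pool" procedure is not implemented, and the two ANOVA models are fit with statsmodels rather than the procedures used in the research.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)

    # 20 centers: 12 "small" centers (3 patients per arm) and 8 larger ones (8 per arm), illustrative only
    per_arm = [3] * 12 + [8] * 8
    rows = []
    for center, n in enumerate(per_arm):
        for treat in (0, 1):                      # 0 = placebo, 1 = new treatment
            effect = 0.5 * treat                  # assumed treatment effect; set to 0 for a null scenario
            y = rng.normal(loc=effect, scale=1.0, size=n)
            rows += [{"center": center, "treat": treat, "y": v} for v in y]
    df = pd.DataFrame(rows)

    # Model with treatment, center, and treatment-by-center interaction
    full = smf.ols("y ~ C(treat) * C(center)", data=df).fit()
    print(sm.stats.anova_lm(full))

    # Model with the treatment effect alone
    reduced = smf.ols("y ~ C(treat)", data=df).fit()
    print(sm.stats.anova_lm(reduced))

Repeating such a simulation many times under null and non-null configurations is what allows power for the interaction test and the type I error rate to be estimated empirically.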
Abstract:
The purpose of this research is to develop a new statistical method to determine the minimum set of rows (R) in an R x C contingency table of discrete data that explains the dependence of observations. The statistical power of the method will be determined empirically by computer simulation to judge its efficiency over presently existing methods. The method will be applied to data on DNA fragment length variation at six VNTR loci in over 72 populations from five major human racial groups (total sample size is over 15,000 individuals, with each sample having at least 50 individuals). DNA fragment lengths grouped in bins will form the basis of studying inter-population DNA variation within the racial groups and, where such variation is significant, will provide a rigorous re-binning procedure for forensic computation of DNA profile frequencies that takes into account intra-racial DNA variation among populations.
Abstract:
Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprised of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).
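For the piecewise-exponential interval-censored component, a standard formulation (sketched here in generic notation; the thesis's exact parameterization may differ) places constant hazards $\lambda_1, \dots, \lambda_k$ on the cut intervals, so that the survival function and the likelihood contribution of a subject whose event is known only to lie in $(L_i, R_i]$ are

$$S(t) = \exp\Bigl\{-\sum_{j=1}^{k}\lambda_j\,\Delta_j(t)\Bigr\}, \qquad L_i(\lambda_1,\dots,\lambda_k) = S(L_i) - S(R_i),$$

where $\Delta_j(t)$ is the time at risk accumulated in the $j$th interval up to time $t$; the exact-time events enter through the Wei-Lin-Weissfeld marginal Cox models.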
Abstract:
In this dissertation, we propose a continuous-time Markov chain model to examine longitudinal data that have three categories in the outcome variable. The advantage of this model is that it permits a different number of measurements for each subject, and the durations between consecutive measurement times can be irregular. Using the maximum likelihood principle, we can estimate the transition probabilities between two time points. By using the information provided by the independent variables, the model can also estimate the transition probabilities for each individual subject. Monte Carlo simulation will be used to compare the goodness of fit of this model with that obtained from other models. A public health example will be used to demonstrate the application of the method.
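A minimal numerical sketch of the transition probabilities implied by such a model, using an invented generator (intensity) matrix for the three outcome categories, is given below; the rates and time gaps are purely illustrative.

    import numpy as np
    from scipy.linalg import expm

    # Illustrative 3-state generator matrix Q: rows sum to zero,
    # off-diagonal entries are instantaneous transition rates between categories.
    Q = np.array([[-0.30,  0.20,  0.10],
                  [ 0.15, -0.40,  0.25],
                  [ 0.05,  0.10, -0.15]])

    def transition_matrix(Q, t):
        """P(t) = exp(Qt): probability of being in state j at time t given state i at time 0."""
        return expm(Q * t)

    # Irregular spacing between measurements is handled by plugging in the actual elapsed time
    print(transition_matrix(Q, 0.5))
    print(transition_matrix(Q, 2.0))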
Abstract:
The present study identified and compared Coronary Heart Disease (CHD) risk factors quantified as “CHD risk point standards” (CHDRPS) among tri-ethnic (White non-Hispanic [WNH], Hispanic [H], and Black non-Hispanic [BNH]) college students. All 300 tri-ethnic subjects completed the Cardiovascular Risk Assessment Instruments and had blood pressure readings recorded on three occasions. Bioelectrical Impedance Analysis (BIA) was used to measure body composition. Students' knowledge of CHD risk factors was also measured. In addition, a 15 ml fasting blood sample was collected from 180 subjects, and blood lipids and homocysteine (tHcy) levels were measured. Data were analyzed by gender and ethnicity using one-way Analysis of Variance (ANOVA) with Bonferroni's pairwise mean comparison procedure, Pearson correlation, and Chi-square tests with follow-up Bonferroni's Chi-square tests.

The mean CHDRPS score for all subjects was 19.15 ± 6.79. When assigned to CHD risk categories, college students were at below-average risk of developing CHD. Males scored significantly (p < 0.013) higher for CHD risk than females, and BNHs scored significantly (p < 0.033) higher than WNHs. High consumption of dietary fat, saturated fat, and cholesterol resulted in a high CHDRPS among H males and females and WNH females. High alcohol consumption resulted in a high CHDRPS among all subjects. Mean tHcy ± SD for all subjects was 6.33 ± 3.15 μmol/L. Males had significantly (p < 0.001) higher tHcy than females. Black non-Hispanic females and H females had significantly (p < 0.003) lower tHcy than WNH females. Positive associations were found between tHcy levels and CHDRPS among females (p < 0.001), Hs (p < 0.001), H males (p < 0.049), H females (p < 0.009), and BNH females (p < 0.005). Significant positive correlations were found between BMI levels and CHDRPS in males (p < 0.001), females (p < 0.001), WNHs (p < 0.008), Hs (p < 0.001), WNH males (p < 0.024), H males (p < 0.004), and H females (p < 0.001). The mean score on the CHD knowledge questions for all subjects was 71.70 ± 7.92 out of 100. Mean CHD knowledge was significantly higher for WNH males (p < 0.039) than for BNH males. A significant inverse correlation (r = 0.392, p < 0.032) was found between CHD knowledge and CHDRPS in WNH females. The researcher's findings indicate strong gender and ethnic differences in CHD risk factors among the college-age population.
Abstract:
This dissertation develops a new figure of merit to measure the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary of Gaussian distributions in multiple dimensions. Real-world data used for implementation and in carrying out feasibility studies were provided by Beckman-Coulter. Although the data used are flow cytometric in nature, the mathematics is general enough in its derivation to cover other types of data, as long as their statistical behavior approximates a Gaussian distribution.

Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to accommodate the accumulation process involved with histogram data. When data are accumulated into a frequency histogram, they are inherently smoothed in a linear fashion, since an averaging effect takes place as the histogram is generated. This new filtering scheme addresses data that are accumulated in the uneven resolution of the channels of the frequency histogram.

The qualitative interpretation of flow cytometric data is currently a time-consuming and imprecise method for evaluating histogram data. The present method offers a broader spectrum of capabilities in the analysis of histograms, since the figure of merit derived in this dissertation integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis.
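As a simple one-dimensional illustration of the kind of quantity involved (a generic overlap computation, not the figure of merit derived in the dissertation; all parameters are invented), the percentage of overlap between two Gaussian densities can be approximated numerically:

    import numpy as np
    from scipy.stats import norm

    def gaussian_overlap(mu1, sd1, mu2, sd2, n=20000):
        """Approximate overlap area between two 1-D Gaussian densities (0 = disjoint, 1 = identical)."""
        lo = min(mu1 - 6 * sd1, mu2 - 6 * sd2)
        hi = max(mu1 + 6 * sd1, mu2 + 6 * sd2)
        x = np.linspace(lo, hi, n)
        dx = x[1] - x[0]
        overlap = np.minimum(norm.pdf(x, mu1, sd1), norm.pdf(x, mu2, sd2))
        return float(overlap.sum() * dx)          # simple Riemann-sum approximation

    # Two hypothetical histogram channel distributions
    print(gaussian_overlap(100.0, 15.0, 140.0, 20.0))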
Abstract:
The marked decline in tree island cover across the Everglades over the last century has been attributed to landscape-scale hydrologic degradation. To preserve and restore Everglades tree islands, a clear understanding of tree island groundwater-surface water interactions is needed, as these interactions strongly influence the chemistry of shallow groundwater and the location and patterns of vegetation in many wetlands. The goal of this work was to define the relationship between groundwater-surface water interactions, plant-water uptake, and the groundwater geochemical condition of tree islands. Groundwater and surface water levels, temperature, and chemistry were monitored on eight constructed and one natural tree island in the Everglades from 2007–2010. Sap flow, diurnal water table fluctuations, and stable oxygen isotopes of stem, ground, and soil water were used to determine the effect of plant-water uptake on groundwater-surface water interactions. Hydrologic and geochemical modeling was used to further explore the effect of plant-groundwater-surface water interactions on ion concentrations and potential mineral formation.
Abstract:
Estuaries and estuarine wetlands are ecologically and societally important systems, exhibiting high rates of primary production that fuel offshore secondary production. Hydrological processes play a central role in shaping estuarine ecosystem structure and function by controlling nutrient loading and the relative contributions of marine and terrestrial influences on the estuary. The Comprehensive Everglades Restoration Plan includes plans to restore freshwater delivery to Taylor Slough, a shallow drainage basin in the southern Everglades, ultimately resulting in increased freshwater flow to the downstream Taylor River estuary. The existing seasonal and inter-annual variability of water flow and source in Taylor River affords the opportunity to investigate relationships between ecosystem function and hydrologic forcing. Estimates of aquatic ecosystem metabolism, derived from free-water, diel changes in dissolved oxygen, were combined with assessments of wetland flocculent detritus quality and transport within the context of seasonal changes in Everglades hydrology. Variation in ecosystem gross primary production and respiration was linked to seasonal changes in estuarine water quality using multiple autoregression models. Furthermore, Taylor River was observed to be net heterotrophic, indicating that an allochthonous source of carbon maintained ecosystem respiration in excess of autochthonous primary production. Wetland-derived detritus appears to be an important vector of energy and nutrients across the Everglades landscape and, in Taylor River, is seasonally flushed into ponded segments of the river, where it is then respired. Lastly, seasonal water delivery appears to govern feedbacks regulating water-column phosphorus availability in the Taylor River estuary.
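A heavily simplified sketch of the free-water diel oxygen bookkeeping behind such metabolism estimates is shown below; it ignores air-water gas exchange, which real estimates must account for, and all numbers are invented for illustration.

    import numpy as np

    # Hypothetical hourly dissolved-oxygen record (mg O2/L) over one 24-hour period
    do = np.array([6.1, 6.0, 5.9, 5.8, 5.7, 5.6, 5.7, 5.9, 6.2, 6.6, 7.0, 7.3,
                   7.5, 7.6, 7.6, 7.5, 7.3, 7.0, 6.8, 6.6, 6.5, 6.4, 6.3, 6.2])
    daylight = np.zeros(24, dtype=bool)
    daylight[6:18] = True                      # assume daylight from hour 6 through hour 17

    rate = np.diff(do)                         # hourly change in DO (mg O2/L/h)
    is_day = daylight[:-1]                     # interval i spans hour i to hour i+1
    r_hourly = -rate[~is_day].mean()           # night-time decline reflects respiration only
    er = r_hourly * 24.0                       # ecosystem respiration over the day
    gpp = (rate[is_day] + r_hourly).sum()      # daytime change plus the respiration it offsets
    nep = gpp - er                             # net ecosystem production (< 0 implies net heterotrophy)
    print(f"GPP={gpp:.2f}  ER={er:.2f}  NEP={nep:.2f}  (mg O2/L/d)")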
Abstract:
This article introduces a new listing of published scientific contributions from the Freshwater Biological Association (FBA) and its later Research Council associates – the Institute of Freshwater Ecology (1989–2000) and the Centre for Ecology and Hydrology (2000+). The period 1929–2006 is covered. The authors also offer information on specific features of the listing, as well as an outline of the influences that underlay the research and its scientific scope.
Abstract:
Oreochromis niloticus (the Nile tilapia) and three other tilapiine species: Oreochromis leucostictus, Tilapia zillii and T. rendallii were introduced into Lakes Victoria, Kyoga and Nabugabo in the 1950s and 1960s. The source and foci of the stockings are given by Welcomme (1966), but the origin of the stocked species was Lake Albert. The Nile tilapia was introduced as a management measure to relieve fishing pressure on the endemic tilapiines and, since it grows to a larger size, to encourage a return to the use of larger-mesh gill nets. Tilapia zillii was introduced to fill a vacant niche of macrophytes which could not be utilised by the other tilapiines. Tilapia rendallii, and possibly T. leucostictus, could have been introduced into these lakes accidentally as a consequence of one of the species being tried out for aquaculture. The Nile perch and Nile tilapia have since fully established themselves and presently dominate the commercial fisheries of Lakes Victoria and Kyoga. The original fisheries based on the endemic tilapiines O. esculentus and O. variabilis have collapsed. It is hypothesized that the ecological and limnological changes observed in Lakes Victoria and Kyoga are due to a truncation of the original food webs of the two lakes. Under the changed conditions, O. niloticus appears to be either playing a stabilizing role or fuelling nutrient turnover in the lakes. Other testable hypotheses point to the possible role of predation by the Nile perch and of changes in regional climate and hydrology in the lake basins.
Abstract:
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" (CM) of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters $\gamma$ in probability models $f(y; \gamma)$ that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters $\gamma = (\theta, \eta)$ into a subset of interest $\theta$ and other "nuisance parameters" $\eta$ necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
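As a concrete reminder of the inferential style being described, Cox's partial likelihood for the proportional hazards model eliminates the nuisance baseline hazard entirely:

$$L(\beta) = \prod_{i:\,\delta_i = 1} \frac{\exp(\beta^{\mathsf T} x_i)}{\sum_{j \in R(t_i)} \exp(\beta^{\mathsf T} x_j)},$$

where $\delta_i$ indicates an observed failure and $R(t_i)$ is the risk set at time $t_i$; inference about the parameter of interest $\beta$ proceeds without estimating the baseline hazard $\lambda_0(t)$.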
Abstract:
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms, and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible software solutions in a relatively short time. The use of an object-oriented programming paradigm, the adoption and development of a software package system, designing by contract, distributed development, and collaboration with other projects are elements of this project's success. Individually, each of these concepts is useful and important, but when combined they have provided a strong basis for rapid development and deployment of innovative and flexible research software for scientific computation. A primary objective of this initiative is the achievement of total remote reproducibility of novel algorithmic research results.