35 results for log-ratio analysis
Abstract:
An interim analysis is usually applied in later phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to trial termination at an earlier point than planned at the beginning. This can result in the saving of patient resources and shortening of drug development and approval time. In addition, ethics and economics are also the reasons to stop a trial earlier. In clinical trials of eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, data may include distribution-free random variables with matched and unmatched subjects in one study. It is important to properly include both subjects in the interim and the final analyses so that the maximum efficiency of statistical and clinical inferences can be obtained at different stages of the trials. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects in the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate the empirical powers and the empirical type I errors among the simulated datasets with different sample sizes, different effect sizes, different correlation coefficients for matched pairs, and different data distributions, respectively, in the interim and final analysis with 4 different group sequential methods. Empirical powers and empirical type I errors were also compared to those estimated by using the meta-analysis t-test among the same simulated datasets. Results from this simulation study show that, compared to the meta-analysis t-test commonly used for data with normally distributed observations, the hybrid statistic has a greater power for data observed from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Powers rose with the increase in sample size, effect size, and correlation coefficient for the matched pairs. 
In addition, the hybrid statistic yielded lower type I errors, which indicates that this test is also conservative for data with outliers in the interim analysis of clinical trials.
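The role of the group sequential boundaries in the simulation design described above can be illustrated with a small Monte Carlo sketch. It does not implement the hybrid statistic or the meta-analysis t-test from the abstract; it assumes a plain two-arm z-test with known unit variance, two equally spaced looks, and hypothetical sample sizes. The constant 2.178 is the commonly tabulated Pocock boundary for two looks at a two-sided alpha of 0.05, used here only as one example of a group sequential method.

```python
import math
import random


def empirical_type1_error(boundary, looks=2, n_per_look=20, n_sims=4000, seed=1):
    """Estimate the type I error of a two-arm z-test with interim looks.

    Data are simulated under the null (no treatment difference, variance 1).
    A trial counts as a rejection if |z| crosses `boundary` at any look.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diff_sum = 0.0      # running sum of (treatment - control) observations
        n = 0               # cumulative subjects per arm
        rejected = False
        for _ in range(looks):
            for _ in range(n_per_look):
                diff_sum += rng.gauss(0.0, 1.0) - rng.gauss(0.0, 1.0)
            n += n_per_look
            z = diff_sum / math.sqrt(2.0 * n)
            if abs(z) >= boundary:
                rejected = True
        rejections += rejected
    return rejections / n_sims


if __name__ == "__main__":
    # Reusing the fixed-sample critical value at every look inflates the error.
    print("naive 1.96 boundary:", empirical_type1_error(1.96))
    # A constant Pocock-style boundary keeps the overall error near 0.05.
    print("Pocock 2.178 boundary:", empirical_type1_error(2.178))
```

Empirical power can be estimated with the same loop by simulating under a nonzero treatment difference instead of the null.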
Abstract:
Background. Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death among females, accounting for 23% (1.38 million) of the total new cancer cases and 14% (458,400) of the total cancer deaths in 2008. [1] Triple-negative breast cancer (TNBC) is an aggressive phenotype comprising 10–20% of all breast cancers (BCs). [2-4] TNBCs show absence of estrogen, progesterone and HER2/neu receptors on the tumor cells. Because of the absence of these receptors, TNBCs are not candidates for targeted therapies. Circulating tumor cells (CTCs) are observed in the blood of breast cancer patients even at early stages (Stage I & II) of the disease. Immunological and molecular analysis can be used to detect the presence of CTCs in the blood of many breast cancer patients. These cells may explain relapses in early stage breast cancer patients even after adequate local control. CTC detection may be useful in identifying patients at risk for disease progression, and therapies targeting CTCs may improve outcome in patients harboring them. Methods. In this study we evaluated 80 patients with TNBC who are enrolled in a larger prospective study conducted at M D Anderson Cancer Center in order to determine whether the presence of circulating tumor cells is a significant prognostic factor in relapse free and overall survival. Patients with metastatic disease at the time of presentation were excluded from the study. CTCs were assessed using CellSearch System™ (Veridex, Raritan, NJ). CTCs were defined as nucleated cells lacking the presence of CD45 but expressing cytokeratins 8, 18 or 19. The distribution of patient and tumor characteristics was analyzed using the chi square test and Fisher's exact test. The log rank test and Cox regression analysis were applied to establish the association of circulating tumor cells with relapse free and overall survival. Results. The median age of the study participants was 53 years. 
The median duration of follow-up was 40 months. Eighty-eight percent (88%) of patients were newly diagnosed (without a previous history of breast cancer), and 60% of patients were chemo naïve (had not received chemotherapy at the time of their blood draw for CTC analysis). Tumor characteristics such as stage (P=0.40), tumor size (P=69), sentinel nodal involvement (P=0.87), axillary lymph node involvement (P=0.13), adjuvant therapy (P=0.83), and high histological grade of tumor (P=0.26) did not predict the presence of CTCs. However, CTCs predicted worse relapse free survival (at 1 or more CTCs log rank P value = 0.04, at 2 or more CTCs P = 0.02 and at 3 or more CTCs P < 0.0001) and overall survival (at 1 or more CTCs log rank P value = 0.08, at 2 or more CTCs P = 0.01 and at 3 or more CTCs P = 0.0001). Conclusions. The number of circulating tumor cells predicted worse relapse free survival and overall survival in TNBC patients.
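The log rank comparison used in the Methods above can be sketched as follows. This is a minimal two-group implementation under the usual hypergeometric variance formula, not the study's actual analysis code; the function name and the follow-up times in the usage example are hypothetical and are not data from the study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test.

    times_*  : follow-up time for each patient
    events_* : 1 if the event (relapse or death) occurred, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    subjects = [(t, e, "a") for t, e in zip(times_a, events_a)]
    subjects += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in subjects if ti >= t)
        at_risk_a = sum(1 for ti, _, g in subjects if ti >= t and g == "a")
        deaths = sum(1 for ti, e, _ in subjects if ti == t and e)
        deaths_a = sum(1 for ti, e, g in subjects if ti == t and e and g == "a")
        observed_a += deaths_a
        expected_a += deaths * at_risk_a / at_risk
        if at_risk > 1:
            variance += (deaths * at_risk_a * (at_risk - at_risk_a)
                         * (at_risk - deaths)) / (at_risk ** 2 * (at_risk - 1))

    if variance == 0.0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical months to relapse for patients with and without CTCs.
    with_ctc = [1, 2, 3, 4, 5]
    without_ctc = [10, 11, 12, 13, 14]
    all_events = [1, 1, 1, 1, 1]
    print(logrank_test(with_ctc, all_events, without_ctc, all_events))
```

Grouping patients by a CTC threshold (1 or more, 2 or more, 3 or more) and calling the function once per threshold mirrors the structure of the comparisons reported in the Results.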
Abstract:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. 
The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. 
Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. 
In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (i.e., without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
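The idea of turning a window of raw time series values into a single trend feature can be sketched as below. The abstract does not specify which time series operations were used, so the least-squares slope here is an assumption chosen for illustration, and the function names, the window length, and the pulse oximetry readings are all hypothetical.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (units of value per unit time)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    if sxx == 0.0:
        raise ValueError("all time points are identical")
    return sxy / sxx


def build_features(times, values):
    """Combine a snapshot value with a trend feature derived from the same window."""
    return {
        "last_value": values[-1],                   # traditional one-value-per-variable input
        "trend_slope": trend_slope(times, values),  # latent time series feature
    }


if __name__ == "__main__":
    # Hypothetical pulse oximetry readings (%) at minutes 0, 5, 10, 15 of a window.
    minutes = [0, 5, 10, 15]
    spo2 = [98, 96, 94, 92]
    print(build_features(minutes, spo2))
```

Two patients with the same last reading would be indistinguishable to a snapshot model, while the slope separates a stable patient from a deteriorating one.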
Abstract:
The genomic era brought by recent advances in next-generation sequencing technology makes genome-wide scans of natural selection a reality. Currently, almost all the statistical tests and analytical methods for identifying genes under selection are performed on an individual gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power in discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and rapidly increasing completeness of many gene network and protein-protein interaction databases accompanying the genomic era open avenues for enhancing the power of discovering genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover genes under moderate or weak selection that bridge the genes under strong selection, and that it helps our understanding of the biology underlying complex diseases and related natural selection phenotypes.
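One way to read the combination described above is that, under a naïve Bayes independence assumption, evidence from the mixture model and evidence from the network add on the log odds scale. The sketch below follows that reading and is not the thesis's actual method: it assumes a two-component mixture with a standard normal null, and it assumes the Guilt-By-Association score has already been converted to a log likelihood ratio. All function names and parameter values are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def mixture_posterior_log_odds(z, prior_selected, alt_mean, alt_sd):
    """Posterior log odds that a gene's test statistic z comes from the
    'selected' component rather than the standard normal null component."""
    log_selected = math.log(prior_selected) + normal_logpdf(z, alt_mean, alt_sd)
    log_null = math.log(1.0 - prior_selected) + normal_logpdf(z, 0.0, 1.0)
    return log_selected - log_null


def combined_log_odds(z, network_log_lr, prior_selected, alt_mean, alt_sd):
    """Naive Bayes combination: independent evidence adds on the log odds scale.

    network_log_lr is an assumed log likelihood ratio derived from a
    Guilt-By-Association score; the mapping from score to ratio is not shown.
    """
    return mixture_posterior_log_odds(z, prior_selected, alt_mean, alt_sd) + network_log_lr


def log_odds_to_probability(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # A gene with a borderline statistic but strongly selected network neighbours.
    score = combined_log_odds(z=1.0, network_log_lr=1.5,
                              prior_selected=0.5, alt_mean=2.0, alt_sd=1.0)
    print(score, log_odds_to_probability(score))
```

In this reading, a gene whose own statistic is inconclusive can still reach a high posterior probability when its network neighbours supply supporting evidence, which is the situation the abstract describes for genes under moderate or weak selection.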
Abstract:
It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data, but no single solution has been satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach has the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible so that both the cohort and period factors can be treated as dummy or continuous variables, and the parameter estimates can be obtained for numerous combinations of variables as in a regression analysis. (3) The model is applicable even when the time periods are unequally spaced. Two specific models are considered to illustrate the new approach and are applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that old people have a much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to those of the previous study by Holford (1983). The new approach offers a general method to analyze age-period-cohort data without using any arbitrary constraint in the model.
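The identification problem mentioned at the start of this abstract can be made concrete with a short sketch. It assumes a model with purely linear age, period, and cohort effects, which is a simplification for illustration and not the Cox-based model the abstract proposes; the function name and the coefficient values are hypothetical.

```python
def linear_predictor(age, period, b_age, b_period, b_cohort):
    """Linear age-period-cohort predictor, with cohort fixed by the identity
    date of birth = date of death - age at death."""
    cohort = period - age
    return b_age * age + b_period * period + b_cohort * cohort


if __name__ == "__main__":
    # Two different coefficient sets: the second shifts the first by k = 3,
    # adding k to the age and cohort slopes and subtracting k from the period slope.
    original = (2, 5, -1)
    shifted = (2 + 3, 5 - 3, -1 + 3)
    for age, period in [(60, 1950), (70, 1975), (85, 1980)]:
        a = linear_predictor(age, period, *original)
        b = linear_predictor(age, period, *shifted)
        print(age, period, a, b)  # the two predictors agree for every record
```

Because k*age - k*period + k*(period - age) is always zero, the data cannot distinguish between these coefficient sets, which is why some constraint or an alternative formulation is needed.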