4 results for Optimal test set
in DigitalCommons@The Texas Medical Center
Abstract:
Cervical cancer is the leading cause of death and disease from malignant neoplasms among women in developing countries. Even though the Pap smear has significantly decreased the number of deaths from cervical cancer in past years, it has its limitations. Researchers have developed an automated screening machine which can potentially detect abnormal cases that are overlooked by conventional screening. The goal of quantitative cytology is to classify the patient's tissue sample based on quantitative measurements of the individual cells. It is also much cheaper and can potentially take less time. One of the major challenges of collecting cells with a cytobrush is the possibility of not sampling any existing dysplastic cells on the cervix. Being able to correctly classify patients who have disease without the presence of dysplastic cells could improve the accuracy of quantitative cytology algorithms. Subtle morphologic changes in normal-appearing tissues adjacent to or distant from malignant tumors have been shown to exist, but a comparison of various statistical methods, including many recent advances in the statistical learning field, has not previously been done. The objective of this thesis is to use different classification methods applied to quantitative cytology data for the detection of malignancy-associated changes (MACs). In this thesis, Elastic Net was the best-performing algorithm. When we applied the Elastic Net algorithm to the test set, we combined the training and validation sets into a single "training" set and used 5-fold cross-validation to choose the parameter for Elastic Net. It achieved a sensitivity of 47% at 80% specificity, an AUC of 0.52, and a partial AUC of 0.10 (95% CI 0.09-0.11).
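The evaluation metrics quoted above (sensitivity at a fixed specificity, AUC, and partial AUC) can be illustrated with a minimal pure-Python sketch. It is not the thesis's code: the function names, the toy labels and scores, and the false-positive-rate range used for the partial AUC (0 to 0.2) are all assumptions, since the abstract does not state how its partial AUC was defined.

```python
def roc_points(labels, scores):
    """Empirical ROC curve as a list of (FPR, TPR) points, starting at (0, 0)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lower the decision threshold step by step through the observed scores.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    max_fpr=1.0 gives the full AUC; a smaller value gives an
    unnormalized partial AUC (the range is an assumption here).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def sensitivity_at_specificity(points, specificity):
    """Highest TPR among ROC points whose FPR does not exceed 1 - specificity."""
    max_fpr = 1.0 - specificity
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


# Hypothetical toy data: 1 = diseased, 0 = normal.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.8, 0.9]
toy_roc = roc_points(toy_labels, toy_scores)
```

With real data, `toy_scores` would be the predicted probabilities from the fitted classifier on the held-out test set. An AUC near 0.5, as reported in the abstract, corresponds to a ROC curve close to the diagonal.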
Abstract:
When conducting a randomized comparative clinical trial, ethical, scientific or economic considerations often motivate the use of interim decision rules after successive groups of patients have been treated. These decisions may pertain to the comparative efficacy or safety of the treatments under study, cost considerations, the desire to accelerate the drug evaluation process, or the likelihood of therapeutic benefit for future patients. At the time of each interim decision, an important question is whether patient enrollment should continue or be terminated, either due to a high probability that one treatment is superior to the other, or a low probability that the experimental treatment will ultimately prove to be superior. The use of frequentist group sequential decision rules has become routine in the conduct of phase III clinical trials. In this dissertation, we present a new Bayesian decision-theoretic approach to the problem of designing a randomized group sequential clinical trial, focusing on two-arm trials with time-to-failure outcomes. Forward simulation is used to obtain optimal decision boundaries for each of a set of possible models. At each interim analysis, we use Bayesian model selection to adaptively choose the model having the largest posterior probability of being correct, and we then make the interim decision based on the boundaries that are optimal under the chosen model. We provide a simulation study to compare this method, which we call Bayesian Doubly Optimal Group Sequential (BDOGS), to corresponding frequentist designs using either O'Brien-Fleming (OF) or Pocock boundaries, as obtained from EaSt 2000. Our simulation results show that, over a wide variety of different cases, BDOGS either performs at least as well as both OF and Pocock, or on average provides a much smaller trial.
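For readers unfamiliar with the two frequentist comparators named above, the following sketch shows the shape of Pocock and O'Brien-Fleming stopping boundaries on the standardized Z scale for equally spaced looks. It does not implement BDOGS. The function names are hypothetical, and the constants 2.413 (Pocock) and 2.040 (O'Brien-Fleming) are commonly tabulated values for five looks at a two-sided 0.05 level; they are an assumption here and should be checked against a reference table or design software before any real use.

```python
import math


def pocock_boundaries(c, num_looks):
    """Pocock: the same critical value c at every interim look."""
    return [c] * num_looks


def obrien_fleming_boundaries(c, num_looks):
    """O'Brien-Fleming: c * sqrt(K / k) at look k, strict early, c at the end."""
    return [c * math.sqrt(num_looks / k) for k in range(1, num_looks + 1)]


def first_crossing(z_stats, boundaries):
    """Return the 1-based look at which |Z| first reaches the boundary, else None."""
    for look, (z, b) in enumerate(zip(z_stats, boundaries), start=1):
        if abs(z) >= b:
            return look
    return None


# Assumed constants for K = 5 looks, two-sided alpha = 0.05 (verify before use).
K = 5
pocock = pocock_boundaries(2.413, K)
obf = obrien_fleming_boundaries(2.040, K)

# Hypothetical sequence of interim Z statistics.
z_path = [1.0, 2.5, 3.0, 3.1, 3.2]
stop_pocock = first_crossing(z_path, pocock)
stop_obf = first_crossing(z_path, obf)
```

On this made-up path the Pocock rule stops one look earlier than O'Brien-Fleming, which illustrates why the two designs can lead to different trial sizes.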
Abstract:
Southeast Texas, including Houston, has a large presence of industrial facilities and has been documented to have poorer air quality and significantly higher cancer rates than the remainder of Texas. Given citizens’ concerns in the fourth-largest city in the U.S., Mayor Bill White recently partnered with the UT School of Public Health to determine methods to evaluate the health risks of hazardous air pollutants (HAPs). Sexton et al. (2007) published a report that strongly encouraged analytic studies linking these pollutants with health outcomes. In response, we set out to complete the following aims: (1) determine the optimal exposure assessment strategy to assess the association between childhood cancer rates and increased ambient levels of benzene and 1,3-butadiene (in an ecologic setting) and (2) evaluate whether census tracts with the highest levels of benzene or 1,3-butadiene have a higher incidence of childhood lymphohematopoietic cancer compared with census tracts with the lowest levels of benzene or 1,3-butadiene, using Poisson regression. The first aim was achieved by evaluating the usefulness of four data sources: geographic information systems (GIS) to identify proximity to point sources of industrial air pollution, industrial emission data from the U.S. EPA’s Toxics Release Inventory (TRI), routine monitoring data from the U.S. EPA Air Quality System (AQS) from 1999-2000 and modeled ambient air levels from the U.S. EPA’s 1999 National Air Toxic Assessment Project (NATA) ASPEN model. Further, once these four data sources were evaluated, we narrowed them down to two: the routine monitoring data from the AQS for the years 1998-2000 and the 1999 U.S. EPA NATA ASPEN modeled data. We applied kriging (spatial interpolation) methodology to the monitoring data and compared the kriged values to the ASPEN modeled data. Our results indicated poor agreement between the two methods. Relative to the U.S. EPA ASPEN modeled estimates, relying on kriging to classify census tracts into exposure groups would have caused a great deal of misclassification. To address the second aim, we additionally obtained childhood lymphohematopoietic cancer data for 1995-2004 from the Texas Cancer Registry. The U.S. EPA ASPEN modeled data were used to estimate ambient levels of benzene and 1,3-butadiene in separate Poisson regression analyses. All data were analyzed at the census tract level. We found that census tracts with the highest benzene levels had elevated rates of all leukemia (rate ratio (RR) = 1.37; 95% confidence interval (CI), 1.05-1.78). Among census tracts with the highest 1,3-butadiene levels, we observed RRs of 1.40 (95% CI, 1.07-1.81) for all leukemia. We detected no associations between benzene or 1,3-butadiene levels and childhood lymphoma incidence. This study is the first to examine this association in Harris and surrounding counties in Texas and is among the first to correlate monitored levels of HAPs with childhood lymphohematopoietic cancer incidence, evaluating several analytic methods in an effort to determine the most appropriate approach to test this association. Despite the recognized weaknesses of ecologic analyses, our analysis suggests an association between childhood leukemia and hazardous air pollution.
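A rate ratio with a 95% confidence interval, as reported above, can be sketched for the simplest case of two exposure groups and no covariates, where a Poisson regression with a single binary exposure term reduces to the crude ratio of incidence rates. This is only an illustration under that assumption: the study's actual models may have included adjustment variables, and the case counts and person-years below are hypothetical, not taken from the thesis.

```python
import math


def rate_ratio_ci(cases_exposed, py_exposed, cases_ref, py_ref, z=1.96):
    """Crude incidence rate ratio with a Wald confidence interval on the log scale.

    cases_* are event counts and py_* are person-years at risk.
    z=1.96 gives an approximate 95% interval.
    """
    rate_exposed = cases_exposed / py_exposed
    rate_ref = cases_ref / py_ref
    rr = rate_exposed / rate_ref
    # Standard error of log(RR) for two independent Poisson counts.
    se_log_rr = math.sqrt(1 / cases_exposed + 1 / cases_ref)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: highest-exposure tracts vs. lowest-exposure tracts.
rr, lower, upper = rate_ratio_ci(
    cases_exposed=40, py_exposed=100_000,
    cases_ref=30, py_ref=100_000,
)
```

An interval that excludes 1, like the ones reported in the abstract for leukemia, is what indicates a statistically detectable difference in rates between the exposure groups.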
Abstract:
Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard of care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment. The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important since tumor shrinkage is an important cancer treatment endpoint that is correlated with probability of disease progression and overall survival. Accurate prediction of tumor shrinkage could also lead to individually customized treatment plans. To accomplish this objective, 64 stage NSCLC patients with similar treatments were all imaged using the same CT scanner and protocol. Quantitative image features were extracted and principal component regression with simulated annealing subset selection was used to predict shrinkage. Cross validation and permutation tests were used to validate the results. The optimal model gave a strong correlation between the observed and predicted shrinkages with . The second objective of this work was to identify sets of NSCLC CT image features that are reproducible, non-redundant, and informative across multiple machines. Feature sets with these qualities are needed for NSCLC radiomics models to be robust to machine variation and spurious correlation. To accomplish this objective, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines from two institutions. For each machine, quantitative image features with concordance correlation coefficient values greater than 0.90 were considered reproducible. Multi-machine reproducible feature sets were created by taking the intersection of individual machine reproducible feature sets. Redundant features were removed through hierarchical clustering. 
The findings showed that image feature reproducibility and redundancy depended on both the CT machine and the CT image type (average cine 4D-CT imaging vs. end-exhale cine 4D-CT imaging vs. helical inspiratory breath-hold 3D CT). For each image type, a set of cross-machine reproducible, non-redundant, and informative image features was identified. Compared to end-exhale 4D-CT and breath-hold 3D-CT, average 4D-CT derived image features showed superior multi-machine reproducibility and are the best candidates for clinical correlation.
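The reproducibility screen described above (a concordance correlation coefficient above 0.90 per machine, then the intersection across machines) can be sketched as follows. This is a minimal illustration using Lin's concordance correlation coefficient with population (1/n) variances; the function names, the data layout, and the feature and machine names are assumptions, and the hierarchical-clustering step for removing redundant features is not shown.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic shifts between test and retest.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def reproducible_features(test, retest, threshold=0.90):
    """Names of features whose test-retest CCC exceeds the threshold.

    test and retest map feature name -> list of per-patient values.
    """
    return {
        name for name in test
        if concordance_cc(test[name], retest[name]) > threshold
    }


def multi_machine_reproducible(per_machine_sets):
    """Features that are reproducible on every machine (set intersection)."""
    return set.intersection(*per_machine_sets)


# Hypothetical per-machine results for three made-up feature names.
machine_a = {"entropy", "mean_hu", "kurtosis"}
machine_b = {"entropy", "mean_hu"}
machine_c = {"entropy", "kurtosis"}
shared = multi_machine_reproducible([machine_a, machine_b, machine_c])
```

Because the intersection can only shrink as machines are added, a feature that is unstable on even one scanner is dropped from the multi-machine set.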