2 resultados para hierarchical clustering
em DigitalCommons@The Texas Medical Center
Resumo:
Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard of care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment. The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important since tumor shrinkage is an important cancer treatment endpoint that is correlated with probability of disease progression and overall survival. Accurate prediction of tumor shrinkage could also lead to individually customized treatment plans. To accomplish this objective, 64 stage NSCLC patients with similar treatments were all imaged using the same CT scanner and protocol. Quantitative image features were extracted and principal component regression with simulated annealing subset selection was used to predict shrinkage. Cross validation and permutation tests were used to validate the results. The optimal model gave a strong correlation between the observed and predicted shrinkages with . The second objective of this work was to identify sets of NSCLC CT image features that are reproducible, non-redundant, and informative across multiple machines. Feature sets with these qualities are needed for NSCLC radiomics models to be robust to machine variation and spurious correlation. To accomplish this objective, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines from two institutions. For each machine, quantitative image features with concordance correlation coefficient values greater than 0.90 were considered reproducible. Multi-machine reproducible feature sets were created by taking the intersection of individual machine reproducible feature sets. Redundant features were removed through hierarchical clustering. The findings showed that image feature reproducibility and redundancy depended on both the CT machine and the CT image type (average cine 4D-CT imaging vs. end-exhale cine 4D-CT imaging vs. helical inspiratory breath-hold 3D CT). For each image type, a set of cross-machine reproducible, non-redundant, and informative image features was identified. Compared to end-exhale 4D-CT and breath-hold 3D-CT, average 4D-CT derived image features showed superior multi-machine reproducibility and are the best candidates for clinical correlation.
Resumo:
In numerous intervention studies and education field trials, random assignment to treatment occurs in clusters rather than at the level of observation. This departure of random assignment of units may be due to logistics, political feasibility, or ecological validity. Data within the same cluster or grouping are often correlated. Application of traditional regression techniques, which assume independence between observations, to clustered data produce consistent parameter estimates. However such estimators are often inefficient as compared to methods which incorporate the clustered nature of the data into the estimation procedure (Neuhaus 1993).1 Multilevel models, also known as random effects or random components models, can be used to account for the clustering of data by estimating higher level, or group, as well as lower level, or individual variation. Designing a study, in which the unit of observation is nested within higher level groupings, requires the determination of sample sizes at each level. This study investigates the design and analysis of various sampling strategies for a 3-level repeated measures design on the parameter estimates when the outcome variable of interest follows a Poisson distribution. ^ Results study suggest that second order PQL estimation produces the least biased estimates in the 3-level multilevel Poisson model followed by first order PQL and then second and first order MQL. The MQL estimates of both fixed and random parameters are generally satisfactory when the level 2 and level 3 variation is less than 0.10. However, as the higher level error variance increases, the MQL estimates become increasingly biased. If convergence of the estimation algorithm is not obtained by PQL procedure and higher level error variance is large, the estimates may be significantly biased. In this case bias correction techniques such as bootstrapping should be considered as an alternative procedure. For larger sample sizes, those structures with 20 or more units sampled at levels with normally distributed random errors produced more stable estimates with less sampling variance than structures with an increased number of level 1 units. For small sample sizes, sampling fewer units at the level with Poisson variation produces less sampling variation, however this criterion is no longer important when sample sizes are large. ^ 1Neuhaus J (1993). “Estimation efficiency and Tests of Covariate Effects with Clustered Binary Data”. Biometrics , 49, 989–996^