2 resultados para multisensory statistical learning

em DigitalCommons@The Texas Medical Center


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cervical cancer is the leading cause of death and disease from malignant neoplasms among women in developing countries. Even though the Pap smear has significantly decreased the number of deaths from cervical cancer in the past years, it has its limitations. Researchers have developed an automated screening machine which can potentially detect abnormal cases that are overlooked by conventional screening. The goal of quantitative cytology is to classify the patient's tissue sample based on quantitative measurements of the individual cells. It is also much cheaper and potentially can take less time. One of the major challenges of collecting cells with a cytobrush is the possibility of not sampling any existing dysplastic cells on the cervix. Being able to correctly classify patients who have disease without the presence of dysplastic cells could improve the accuracy of quantitative cytology algorithms. Subtle morphologic changes in normal-appearing tissues adjacent to or distant from malignant tumors have been shown to exist, but a comparison of various statistical methods, including many recent advances in the statistical learning field, has not previously been done. The objective of this thesis is to use different classification methods applied to quantitative cytology data for the detection of malignancy associated changes (MACs). In this thesis, Elastic Net is the best algorithm. When we applied the Elastic Net algorithm to the test set, we combined the training set and validation set as "training" set and used 5-fold cross validation to choose the parameter for Elastic Net. It has a sensitivity of 47% at 80% specificity, an AUC 0.52, and a partial AUC 0.10 (95% CI 0.09-0.11).^

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. ^ In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science- derived data analysis methods, predominantly machine learning approaches, were proposed and explored in this study. ^ The goal of this study was to develop a set of models using decision trees/ensemble and neural networks methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1 trichloroethane, methylene dichloride and, trichloroethylene were used. ^ When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy. Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine `expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for the independence from the expert judgment.^