2 resultados para Visual search method
em DigitalCommons@The Texas Medical Center
Resumo:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^
Resumo:
These three manuscripts are presented as a PhD dissertation for the study of using GeoVis application to evaluate telehealth programs. The primary reason of this research was to understand how the GeoVis applications can be designed and developed using combined approaches of HC approach and cognitive fit theory and in terms utilized to evaluate telehealth program in Brazil. First manuscript The first manuscript in this dissertation presented a background about the use of GeoVisualization to facilitate visual exploration of public health data. The manuscript covered the existing challenges that were associated with an adoption of existing GeoVis applications. The manuscript combines the principles of Human Centered approach and Cognitive Fit Theory and a framework using a combination of these approaches is developed that lays the foundation of this research. The framework is then utilized to propose the design, development and evaluation of “the SanaViz” to evaluate telehealth data in Brazil, as a proof of concept. Second manuscript The second manuscript is a methods paper that describes the approaches that can be employed to design and develop “the SanaViz” based on the proposed framework. By defining the various elements of the HC approach and CFT, a mixed methods approach is utilized for the card sorting and sketching techniques. A representative sample of 20 study participants currently involved in the telehealth program at the NUTES telehealth center at UFPE, Recife, Brazil was enrolled. The findings of this manuscript helped us understand the needs of the diverse group of telehealth users, the tasks that they perform and helped us determine the essential features that might be necessary to be included in the proposed GeoVis application “the SanaViz”. Third manuscript The third manuscript involved mix- methods approach to compare the effectiveness and usefulness of the HC GeoVis application “the SanaViz” against a conventional GeoVis application “Instant Atlas”. The same group of 20 study participants who had earlier participated during Aim 2 was enrolled and a combination of quantitative and qualitative assessments was done. Effectiveness was gauged by the time that the participants took to complete the tasks using both the GeoVis applications, the ease with which they completed the tasks and the number of attempts that were taken to complete each task. Usefulness was assessed by System Usability Scale (SUS), a validated questionnaire tested in prior studies. In-depth interviews were conducted to gather opinions about both the GeoVis applications. This manuscript helped us in the demonstration of the usefulness and effectiveness of HC GeoVis applications to facilitate visual exploration of telehealth data, as a proof of concept. Together, these three manuscripts represent challenges of combining principles of Human Centered approach, Cognitive Fit Theory to design and develop GeoVis applications as a method to evaluate Telehealth data. To our knowledge, this is the first study to explore the usefulness and effectiveness of GeoVis to facilitate visual exploration of telehealth data. The results of the research enabled us to develop a framework for the design and development of GeoVis applications related to the areas of public health and especially telehealth. The results of our study showed that the varied users were involved with the telehealth program and the tasks that they performed. Further it enabled us to identify the components that might be essential to be included in these GeoVis applications. The results of our research answered the following questions; (a) Telehealth users vary in their level of understanding about GeoVis (b) Interaction features such as zooming, sorting, and linking and multiple views and representation features such as bar chart and choropleth maps were considered the most essential features of the GeoVis applications. (c) Comparing and sorting were two important tasks that the telehealth users would perform for exploratory data analysis. (d) A HC GeoVis prototype application is more effective and useful for exploration of telehealth data than a conventional GeoVis application. Future studies should be done to incorporate the proposed HC GeoVis framework to enable comprehensive assessment of the users and the tasks they perform to identify the features that might be necessary to be a part of the GeoVis applications. The results of this study demonstrate a novel approach to comprehensively and systematically enhance the evaluation of telehealth programs using the proposed GeoVis Framework.