3 resultados para INTERVAL METHOD

em DigitalCommons@The Texas Medical Center


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This investigation compares two different methodologies for calculating the national cost of epilepsy: provider-based survey method (PBSM) and the patient-based medical charts and billing method (PBMC&BM). The PBSM uses the National Hospital Discharge Survey (NHDS), the National Hospital Ambulatory Medical Care Survey (NHAMCS) and the National Ambulatory Medical Care Survey (NAMCS) as the sources of utilization. The PBMC&BM uses patient data, charts and billings, to determine utilization rates for specific components of hospital, physician and drug prescriptions. ^ The 1995 hospital and physician cost of epilepsy is estimated to be $722 million using the PBSM and $1,058 million using the PBMC&BM. The difference of $336 million results from $136 million difference in utilization and $200 million difference in unit cost. ^ Utilization. The utilization difference of $136 million is composed of an inpatient variation of $129 million, $100 million hospital and $29 million physician, and an ambulatory variation of $7 million. The $100 million hospital variance is attributed to inclusion of febrile seizures in the PBSM, $−79 million, and the exclusion of admissions attributed to epilepsy, $179 million. The former suggests that the diagnostic codes used in the NHDS may not properly match the current definition of epilepsy as used in the PBMC&BM. The latter suggests NHDS errors in the attribution of an admission to the principal diagnosis. ^ The $29 million variance in inpatient physician utilization is the result of different per-day-of-care physician visit rates, 1.3 for the PBMC&BM versus 1.0 for the PBSM. The absence of visit frequency measures in the NHDS affects the internal validity of the PBSM estimate and requires the investigator to make conservative assumptions. ^ The remaining ambulatory resource utilization variance is $7 million. Of this amount, $22 million is the result of an underestimate of ancillaries in the NHAMCS and NAMCS extrapolations using the patient visit weight. ^ Unit cost. The resource cost variation is $200 million, inpatient is $22 million and ambulatory is $178 million. The inpatient variation of $22 million is composed of $19 million in hospital per day rates, due to a higher cost per day in the PBMC&BM, and $3 million in physician visit rates, due to a higher cost per visit in the PBMC&BM. ^ The ambulatory cost variance is $178 million, composed of higher per-physician-visit costs of $97 million and higher per-ancillary costs of $81 million. Both are attributed to the PBMC&BM's precise identification of resource utilization that permits accurate valuation. ^ Conclusion. Both methods have specific limitations. The PBSM strengths are its sample designs that lead to nationally representative estimates and permit statistical point and confidence interval estimation for the nation for certain variables under investigation. However, the findings of this investigation suggest the internal validity of the estimates derived is questionable and important additional information required to precisely estimate the cost of an illness is absent. ^ The PBMC&BM is a superior method in identifying resources utilized in the physician encounter with the patient permitting more accurate valuation. However, the PBMC&BM does not have the statistical reliability of the PBSM; it relies on synthesized national prevalence estimates to extrapolate a national cost estimate. While precision is important, the ability to generalize to the nation may be limited due to the small number of patients that are followed. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprised of a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with the exclusion of interval-censored events. A simulation study demonstrates that our approach produces reliable estimates for the model parameters and their variance-covariance matrix. As a real-world data example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction, fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event). ^