4 resultados para mutual information

em DigitalCommons@The Texas Medical Center


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite current enthusiasm for investigation of gene-gene interactions and gene-environment interactions, the essential issue of how to define and detect gene-environment interactions remains unresolved. In this report, we define gene-environment interactions as a stochastic dependence in the context of the effects of the genetic and environmental risk factors on the cause of phenotypic variation among individuals. We use mutual information that is widely used in communication and complex system analysis to measure gene-environment interactions. We investigate how gene-environment interactions generate the large difference in the information measure of gene-environment interactions between the general population and a diseased population, which motives us to develop mutual information-based statistics for testing gene-environment interactions. We validated the null distribution and calculated the type 1 error rates for the mutual information-based statistics to test gene-environment interactions using extensive simulation studies. We found that the new test statistics were more powerful than the traditional logistic regression under several disease models. Finally, in order to further evaluate the performance of our new method, we applied the mutual information-based statistics to three real examples. Our results showed that P-values for the mutual information-based statistics were much smaller than that obtained by other approaches including logistic regression models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A nonlinear viscoelastic image registration algorithm based on the demons paradigm and incorporating inverse consistent constraint (ICC) is implemented. An inverse consistent and symmetric cost function using mutual information (MI) as a similarity measure is employed. The cost function also includes regularization of transformation and inverse consistent error (ICE). The uncertainties in balancing various terms in the cost function are avoided by alternatively minimizing the similarity measure, the regularization of the transformation, and the ICE terms. The diffeomorphism of registration for preventing folding and/or tearing in the deformation is achieved by the composition scheme. The quality of image registration is first demonstrated by constructing brain atlas from 20 adult brains (age range 30-60). It is shown that with this registration technique: (1) the Jacobian determinant is positive for all voxels and (2) the average ICE is around 0.004 voxels with a maximum value below 0.1 voxels. Further, the deformation-based segmentation on Internet Brain Segmentation Repository, a publicly available dataset, has yielded high Dice similarity index (DSI) of 94.7% for the cerebellum and 74.7% for the hippocampus, attesting to the quality of our registration method.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

BACKGROUND: Variants in the complement cascade genes and the LOC387715/HTRA1, have been widely reported to associate with age-related macular degeneration (AMD), the most common cause of visual impairment in industrialized countries. METHODS/PRINCIPAL FINDINGS: We investigated the association between the LOC387715 A69S and complement component C3 R102G risk alleles in the Finnish case-control material and found a significant association with both variants (OR 2.98, p = 3.75 x 10(-9); non-AMD controls and OR 2.79, p = 2.78 x 10(-19), blood donor controls and OR 1.83, p = 0.008; non-AMD controls and OR 1.39, p = 0.039; blood donor controls), respectively. Previously, we have shown a strong association between complement factor H (CFH) Y402H and AMD in the Finnish population. A carrier of at least one risk allele in each of the three susceptibility loci (LOC387715, C3, CFH) had an 18-fold risk of AMD when compared to a non-carrier homozygote in all three loci. A tentative gene-gene interaction between the two major AMD-associated loci, LOC387715 and CFH, was found in this study using a multiplicative (logistic regression) model, a synergy index (departure-from-additivity model) and the mutual information method (MI), suggesting that a common causative pathway may exist for these genes. Smoking (ever vs. never) exerted an extra risk for AMD, but somewhat surprisingly, only in connection with other factors such as sex and the C3 genotype. Population attributable risks (PAR) for the CFH, LOC387715 and C3 variants were 58.2%, 51.4% and 5.8%, respectively, the summary PAR for the three variants being 65.4%. CONCLUSIONS/SIGNIFICANCE: Evidence for gene-gene interaction between two major AMD associated loci CFH and LOC387715 was obtained using three methods, logistic regression, a synergy index and the mutual information (MI) index.