809 results for Permutation testing
Abstract:
The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.
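As an illustration of the node label permutation idea named in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes two undirected graphs over the same node set and uses the number of shared edges as the association statistic; that statistic, and the names `node_label_permutation_test`, `edges_a` and `edges_b`, are assumptions made for the example.

```python
import random

def node_label_permutation_test(nodes, edges_a, edges_b, n_perm=2000, seed=0):
    """Permutation p-value for the overlap between two graphs on the same nodes.

    Statistic (an assumption of this sketch): number of edges present in both graphs.
    Null model: node labels of graph B are shuffled, keeping B's topology intact.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    observed = len(a & b)

    hits = 0
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        # Relabel every edge of B with the shuffled node names.
        b_perm = {frozenset(relabel[u] for u in e) for e in b}
        if len(a & b_perm) >= observed:
            hits += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

An edge permutation test would follow the same pattern but would rewire or resample the edges of one graph instead of shuffling its node labels.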
Abstract:
Declarative memory impairments are common in patients with bipolar illness, suggesting underlying hippocampal pathology. However, hippocampal volume deficits are rarely observed in bipolar disorder. Here we used surface-based anatomic mapping to examine hippocampal anatomy in bipolar patients treated with lithium relative to matched control subjects and unmedicated patients with bipolar disorder. High-resolution brain magnetic resonance images were acquired from 33 patients with bipolar disorder (21 treated with lithium and 12 unmedicated) and 62 demographically matched healthy control subjects. Three-dimensional parametric mesh models were created from manual tracings of the hippocampal formation. Total hippocampal volume was significantly larger in lithium-treated bipolar patients compared with healthy controls (by 10.3%; p=0.001) and unmedicated bipolar patients (by 13.9%; p=0.003). Statistical mapping results, confirmed by permutation testing, revealed localized deficits in the right hippocampus, in regions corresponding primarily to cornu ammonis 1 subfields, in unmedicated bipolar patients, as compared to both normal controls (p=0.01) and lithium-treated bipolar patients (p=0.03). These findings demonstrate the sensitivity of these anatomic mapping methods for detecting subtle alterations in hippocampal structure in bipolar disorder. The observed reduction in subregions of the hippocampus in unmedicated bipolar patients suggests a possible neural correlate for memory deficits frequently reported in this illness. Moreover, increased hippocampal volume in lithium-treated bipolar patients may reflect postulated neurotrophic effects of this agent, a possibility warranting further study in longitudinal investigations.
Abstract:
Molecular and behavioural evidence points to an association between sex-steroid hormones and autism spectrum conditions (ASC) and/or autistic traits. Prenatal androgen levels are associated with autistic traits, and several genes involved in steroidogenesis are associated with autism, Asperger Syndrome and/or autistic traits. Furthermore, higher rates of androgen-related conditions (such as Polycystic Ovary Syndrome, hirsutism, acne and hormone-related cancers) are reported in women with autism spectrum conditions. A key question therefore is if serum levels of gonadal and adrenal sex-steroids (particularly testosterone, estradiol, dehydroepiandrosterone sulfate and androstenedione) are elevated in individuals with ASC. This was tested in a total sample of n=166 participants. The final eligible sample for hormone analysis comprised n=128 participants, n=58 of whom had a diagnosis of Asperger Syndrome or high functioning autism (33 males and 25 females) and n=70 of whom were age- and IQ-matched typical controls (39 males and 31 females). ASC diagnosis (without any interaction with sex) strongly predicted androstenedione levels (p<0.01), and serum androstenedione levels were significantly elevated in the ASC group (Mann-Whitney W=2677, p=0.002), a result confirmed by permutation testing in females (permutation-corrected p=0.02). This result is discussed in terms of androstenedione being the immediate precursor of, and being converted into, testosterone, dihydrotestosterone, or estrogens in hormone-sensitive tissues and organs.
Abstract:
Permutation tests are useful for drawing inferences from imaging data because of their flexibility and ability to capture features of the brain that are difficult to capture parametrically. However, most implementations of permutation tests ignore important confounding covariates. To employ covariate control in a nonparametric setting we have developed a Markov chain Monte Carlo (MCMC) algorithm for conditional permutation testing using propensity scores. We present the first use of this methodology for imaging data. Our MCMC algorithm is an extension of algorithms developed to approximate exact conditional probabilities in contingency tables, logit, and log-linear models. An application of our non-parametric method to remove potential bias due to the observed covariates is presented.
Abstract:
With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r² measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather, it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with a small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power.
We show that an empirical genotype-based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple, the multi-locus global test does as well as or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies.
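The idea of using permutations to approximate the null distribution of the maximum test statistic can be sketched as follows. This is a hedged illustration, not the dissertation's procedure: the per-SNP statistic is assumed here to be the absolute Pearson correlation between genotype and a quantitative phenotype, and the function names are hypothetical.

```python
import random
from statistics import mean

def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def max_stat_permutation(genotypes, phenotype, n_perm=1000, seed=0):
    """Familywise-adjusted p-values from the permutation null of the maximum statistic.

    genotypes: list of per-SNP genotype lists (one value per subject).
    phenotype: list of quantitative trait values, in the same subject order.
    """
    rng = random.Random(seed)
    observed = [abs_corr(snp, phenotype) for snp in genotypes]
    exceed = [0] * len(genotypes)
    y = list(phenotype)
    for _ in range(n_perm):
        # Shuffling the phenotype breaks any genotype-phenotype link
        # while preserving the correlation (LD) among SNPs.
        rng.shuffle(y)
        max_null = max(abs_corr(snp, y) for snp in genotypes)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Because every observed statistic is compared with the same null maximum, the smallest adjusted p-value also serves as a global test across all loci.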
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be dealt with effectively through several approaches, such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, due to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used with imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence interval or statistical test of differences) require more cost-effective methods or a more efficient computing system, neither of which can currently be accomplished in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and those SNPs with good discriminant power are not necessarily causal markers for the disease.
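The HMSS criterion used in this abstract is the harmonic mean of sensitivity and specificity. A minimal sketch of how it could be computed from confusion-matrix counts is given below; the function name `hmss` and its argument names are assumptions made for illustration, not taken from the study.

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion-matrix counts.

    tp, fn: cases classified correctly / incorrectly.
    tn, fp: controls classified correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)
```

A classifier that labels every subject as a control has specificity 1 but sensitivity 0, so its HMSS is 0 even when its raw accuracy is high, which is why the criterion suits imbalanced data.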
Abstract:
We developed an anatomical mapping technique to detect hippocampal and ventricular changes in Alzheimer disease (AD). The resulting maps are sensitive to longitudinal changes in brain structure as the disease progresses. An anatomical surface modeling approach was combined with surface-based statistics to visualize the region and rate of atrophy in serial MRI scans and isolate where these changes link with cognitive decline. Fifty-two high-resolution MRI scans were acquired from 12 AD patients (age: 68.4 +/- 1.9 years) and 14 matched controls (age: 71.4 +/- 0.9 years), each scanned twice (2.1 +/- 0.4 years apart). 3D parametric mesh models of the hippocampus and temporal horns were created in sequential scans and averaged across subjects to identify systematic patterns of atrophy. As an index of radial atrophy, 3D distance fields were generated relating each anatomical surface point to a medial curve threading down the medial axis of each structure. Hippocampal atrophic rates and ventricular expansion were assessed statistically using surface-based permutation testing and were faster in AD than in controls. Using color-coded maps and video sequences, these changes were visualized as they progressed anatomically over time. Additional maps localized regions where atrophic changes linked with cognitive decline. Temporal horn expansion maps were more sensitive to AD progression than maps of hippocampal atrophy, but both maps correlated with clinical deterioration. These quantitative, dynamic visualizations of hippocampal atrophy and ventricular expansion rates in aging and AD may provide a promising measure to track AD progression in drug trials. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
In a weighted spatial network, as specified by an exchange matrix, the variances of the spatial values are inversely proportional to the size of the regions. Spatial values are no longer exchangeable under independence, thus weakening the rationale for ordinary permutation and bootstrap tests of spatial autocorrelation. We propose an alternative permutation test for spatial autocorrelation, based upon exchangeable spatial modes, constructed as linear orthogonal combinations of spatial values. The coefficients are obtained as eigenvectors of the standardised exchange matrix appearing in spectral clustering, and generalise to the weighted case the concept of spatial filtering for connectivity matrices. Also, two proposals aimed at transforming an accessibility matrix into an exchange matrix with a priori fixed margins are presented. Two examples (inter-regional migratory flows and binary adjacency networks) illustrate the formalism, rooted in the theory of spectral decomposition for reversible Markov chains.
Abstract:
There has been great interest in deciding whether a combinatorial structure satisfies some property, or in estimating the value of some numerical function associated with this combinatorial structure, by considering only a randomly chosen substructure of sufficiently large, but constant size. These problems are called property testing and parameter testing, where a property or parameter is said to be testable if it can be estimated accurately in this way. The algorithmic appeal is evident, as, conditional on sampling, this leads to reliable constant-time randomized estimators. Our paper addresses property testing and parameter testing for permutations in a subpermutation perspective; more precisely, we investigate permutation properties and parameters that can be well approximated based on a randomly chosen subpermutation of much smaller size. In this context, we use a theory of convergence of permutation sequences developed by the present authors [C. Hoppen, Y. Kohayakawa, C.G. Moreira, R.M. Sampaio, Limits of permutation sequences through permutation regularity, Manuscript, 2010, 34pp.] to characterize testable permutation parameters along the lines of the work of Borgs et al. [C. Borgs, J. Chayes, L. Lovász, V.T. Sós, B. Szegedy, K. Vesztergombi, Graph limits and parameter testing, in: STOC '06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, ACM, New York, 2006, pp. 261-270.] in the case of graphs. Moreover, we obtain a permutation result in the direction of a famous result of Alon and Shapira [N. Alon, A. Shapira, A characterization of the (natural) graph properties testable with one-sided error, SIAM J. Comput. 37 (6) (2008) 1703-1727.] stating that every hereditary graph property is testable. (C) 2011 Elsevier B.V. All rights reserved.
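To make the subpermutation viewpoint concrete, the sketch below estimates one illustrative parameter, the density of a fixed pattern, from randomly sampled subpermutations of constant size. The choice of parameter and the function names are assumptions for this example and are not taken from the paper.

```python
import random

def pattern_of(values):
    """Relative order of distinct values, e.g. [30, 10, 20] -> (3, 1, 2)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    pattern = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        pattern[i] = rank
    return tuple(pattern)

def estimate_pattern_density(perm, pattern, n_samples=2000, seed=0):
    """Estimate the fraction of k-subsets of positions whose entries form `pattern`."""
    rng = random.Random(seed)
    k = len(pattern)
    target = tuple(pattern)
    hits = 0
    for _ in range(n_samples):
        # Sample k positions and read the induced subpermutation left to right.
        positions = sorted(rng.sample(range(len(perm)), k))
        if pattern_of([perm[i] for i in positions]) == target:
            hits += 1
    return hits / n_samples
```

The work per sample depends only on the pattern length k, not on the length of `perm`, which is the sense in which such estimators run in constant time once sampling is available.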
Abstract:
Consider the problem of testing k hypotheses simultaneously. In this paper, we discuss finite and large sample theory of stepdown methods that provide control of the familywise error rate (FWE). In order to improve upon the Bonferroni method or Holm's (1979) stepdown method, Westfall and Young (1993) make effective use of resampling to construct stepdown methods that implicitly estimate the dependence structure of the test statistics. However, their methods depend on an assumption called subset pivotality. The goal of this paper is to construct general stepdown methods that do not require such an assumption. In order to accomplish this, we take a close look at what makes stepdown procedures work, and a key component is a monotonicity requirement of critical values. By imposing such monotonicity on estimated critical values (which is not an assumption on the model but an assumption on the method), it is demonstrated that the problem of constructing a valid multiple test procedure which controls the FWE can be reduced to the problem of constructing a single test which controls the usual probability of a Type 1 error. This reduction allows us to draw upon an enormous resampling literature as a general means of test construction.
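For orientation, the Holm (1979) stepdown method that this abstract takes as a baseline can be sketched in a few lines. This shows only that baseline, not the resampling-based procedure the paper constructs; the function name `holm_stepdown` is chosen here for illustration.

```python
def holm_stepdown(p_values, alpha=0.05):
    """Holm's stepdown procedure; returns a reject/retain flag per hypothesis.

    Hypotheses are visited from the smallest p-value upward. At step s
    (0-based) the p-value is compared with alpha / (k - s); the procedure
    stops at the first hypothesis that cannot be rejected.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (k - step):
            reject[i] = True
        else:
            break
    return reject
```

The thresholds alpha/k, alpha/(k-1), ..., alpha increase from step to step, which is a simple instance of the monotonicity of critical values that the abstract identifies as the key requirement.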
Abstract:
Assaying a large number of genetic markers from patients in clinical trials is now possible in order to tailor drugs with respect to efficacy. The statistical methodology for analysing such massive data sets is challenging. The most popular type of statistical analysis is to use a univariate test for each genetic marker, once all the data from a clinical study have been collected. This paper presents a sequential method for conducting an omnibus test for detecting gene-drug interactions across the genome, thus allowing informed decisions at the earliest opportunity and overcoming the multiple testing problems from conducting many univariate tests. We first propose an omnibus test for a fixed sample size. This test is based on combining F-statistics that test for an interaction between treatment and the individual single nucleotide polymorphism (SNP). As SNPs tend to be correlated, we use permutations to calculate a global p-value. We extend our omnibus test to the sequential case. In order to control the type I error rate, we propose a sequential method that uses permutations to obtain the stopping boundaries. The results of a simulation study show that the sequential permutation method is more powerful than alternative sequential methods that control the type I error rate, such as the inverse-normal method. The proposed method is flexible as we do not need to assume a mode of inheritance and can also adjust for confounding factors. An application to real clinical data illustrates that the method is computationally feasible for a large number of SNPs. Copyright (c) 2007 John Wiley & Sons, Ltd.
Abstract:
In about 50% of first trimester spontaneous abortions, the cause remains undetermined after standard cytogenetic investigation. We evaluated the usefulness of array-CGH in diagnosing chromosome abnormalities in products of conception from first trimester spontaneous abortions. Cell culture was carried out in short- and long-term cultures of 54 specimens and cytogenetic analysis was successful in 49 of them. Cytogenetic abnormalities (numerical and structural) were detected in 22 (44.89%) specimens. Subsequently, array-CGH based on large insert clones spaced at ~1 Mb intervals over the whole genome was used in 17 cases with normal G-banding karyotype. This revealed chromosome aneuploidies in three additional cases, giving a final total of 51% of cases in which an abnormal karyotype was detected. In keeping with other recently published works, this study shows that array-CGH detects abnormalities in a further ~10% of spontaneous abortion specimens considered to be normal using standard cytogenetic methods. As such, the array-CGH technique may present a suitable complementary test to cytogenetic analysis in cases with a normal karyotype.
Abstract:
The aim of this study was to test the hypothesis of differences in performance including differences in ST-T wave changes between healthy men and women submitted to an exercise stress test. Two hundred (45.4%) men and 241 (54.6%) women (mean age: 38.7 ± 11.0 years) were submitted to an exercise stress test. Physiologic and electrocardiographic variables were compared by the Student t-test and the chi-square test. To test the hypothesis of differences in ST-segment changes, data were ranked with functional models based on weighted least squares. To evaluate the influence of gender and age on the diagnosis of ST-segment abnormality, a logistic model was adjusted; P < 0.05 was considered to be significant. Rate-pressure product, duration of exercise and estimated functional capacity were higher in men (P < 0.05). Sixteen (6.7%) women and 9 (4.5%) men demonstrated ST-segment upslope ≥0.15 mV or downslope ≥0.10 mV; the difference was not statistically significant. Age increase of one year added 4% to the chance of upsloping of segment ST ≥0.15 mV or downsloping of segment ST ≥0.1 mV (P = 0.03; risk ratio = 1.040, 95% confidence interval (CI) = 1.002-1.080). Heart rate recovery was higher in women (P < 0.05). The chance of women showing an increase of systolic blood pressure ≤30 mmHg was 85% higher (P = 0.01; risk ratio = 1.85, 95%CI = 1.1-3.05). No significant difference in the frequency of ST-T wave changes was observed between men and women. Other differences may be related to different physical conditioning.
Abstract:
The network of HIV counseling and testing centers in São Paulo, Brazil is a major source of data used to build epidemiological profiles of the client population. We examined HIV-1 incidence from November 2000 to April 2001, comparing epidemiological and socio-behavioral data of recently infected individuals with those with long-standing infection. A less sensitive ELISA was employed to identify recent infection. The overall incidence of HIV-1 infection was 0.53/100/year (95% CI: 0.31-0.85/100/year): 0.77/100/year for males (95% CI: 0.42-1.27/100/year) and 0.22/100/year (95% CI: 0.05-0.59/100/year) for females. Overall HIV-1 prevalence was 3.2% (95% CI: 2.8-3.7%), being 4.0% among males (95% CI: 3.3-4.7%) and 2.1% among females (95% CI: 1.6-2.8%). Recent infections accounted for 15% of the total (95% CI: 10.2-20.8%). Recent infection correlated with being younger and male (p = 0.019). Therefore, recent infection was more common among younger males and older females.