12 resultados para false positive

em DigitalCommons@The Texas Medical Center


Relevância:

70.00% 70.00%

Publicador:

Resumo:

False-positive and false-negative values were calculated for five different designs of the trend test and it was demonstrated that a design suggested by Portier and Hoel in 1984 for a different problem produced the lowest false-positive and false-negative rates when applied to historical spontaneous tumor rate data for Fischer Rats. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A non-parametric method was developed and tested to compare the partial areas under two correlated Receiver Operating Characteristic curves. Based on the theory of generalized U-statistics the mathematical formulas have been derived for computing ROC area, and the variance and covariance between the portions of two ROC curves. A practical SAS application also has been developed to facilitate the calculations. The accuracy of the non-parametric method was evaluated by comparing it to other methods. By applying our method to the data from a published ROC analysis of CT image, our results are very close to theirs. A hypothetical example was used to demonstrate the effects of two crossed ROC curves. The two ROC areas are the same. However each portion of the area between two ROC curves were found to be significantly different by the partial ROC curve analysis. For computation of ROC curves with large scales, such as a logistic regression model, we applied our method to the breast cancer study with Medicare claims data. It yielded the same ROC area computation as the SAS Logistic procedure. Our method also provides an alternative to the global summary of ROC area comparison by directly comparing the true-positive rates for two regression models and by determining the range of false-positive values where the models differ. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With hundreds of single nucleotide polymorphisms (SNPs) in a candidate gene and millions of SNPs across the genome, selecting an informative subset of SNPs to maximize the ability to detect genotype-phenotype association is of great interest and importance. In addition, with a large number of SNPs, analytic methods are needed that allow investigators to control the false positive rate resulting from large numbers of SNP genotype-phenotype analyses. This dissertation uses simulated data to explore methods for selecting SNPs for genotype-phenotype association studies. I examined the pattern of linkage disequilibrium (LD) across a candidate gene region and used this pattern to aid in localizing a disease-influencing mutation. The results indicate that the r2 measure of linkage disequilibrium is preferred over the common D′ measure for use in genotype-phenotype association studies. Using step-wise linear regression, the best predictor of the quantitative trait was not usually the single functional mutation. Rather it was a SNP that was in high linkage disequilibrium with the functional mutation. Next, I compared three strategies for selecting SNPs for application to phenotype association studies: based on measures of linkage disequilibrium, based on a measure of haplotype diversity, and random selection. The results demonstrate that SNPs selected based on maximum haplotype diversity are more informative and yield higher power than randomly selected SNPs or SNPs selected based on low pair-wise LD. The data also indicate that for genes with small contribution to the phenotype, it is more prudent for investigators to increase their sample size than to continuously increase the number of SNPs in order to improve statistical power. When typing large numbers of SNPs, researchers are faced with the challenge of utilizing an appropriate statistical method that controls the type I error rate while maintaining adequate power. We show that an empirical genotype based multi-locus global test that uses permutation testing to investigate the null distribution of the maximum test statistic maintains a desired overall type I error rate while not overly sacrificing statistical power. The results also show that when the penetrance model is simple the multi-locus global test does as well or better than the haplotype analysis. However, for more complex models, haplotype analyses offer advantages. The results of this dissertation will be of utility to human geneticists designing large-scale multi-locus genotype-phenotype association studies. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously has a problem of multiple testing and will give false-positive results. Although, this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects by several genes, each with weak effect, might not be able to be determined. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations. ^ In this study, we take two steps to achieve the goal. First we selected 1000 SNPs through an effective filter method and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. And also we developed a novel classification method-sequential information bottleneck method wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with the classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square test to look at the relationship between each SNP and disease from another point of view. ^ In general, our results show that filtering features using harmononic mean of sensitivity and specificity(HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs or 3 SNP subset based on best 100 composite 2-SNPs can find an optimal subset and further inclusion of more SNPs through heuristic algorithm doesn't always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent from the nesting effect of forward selection, it does not always out-perform the latter due to overfitting from observing more complex subset states. ^ Our results also indicate that HMSS as a criterion to evaluate the classification ability of a function can be used in imbalanced data without modifying the original dataset as against classification accuracy. Our four studies suggest that Sequential Information Bottleneck(sIB), a new unsupervised technique, can be adopted to predict the outcome and its ability to detect the target status is superior to the traditional LDA in the study. ^ From our results we can see that the best test probability-HMSS for predicting CVD, stroke,CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275 respectively in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436 respectively in the four studies if the test accuracy among controls is required to be at least 0.4. ^ A further genome-wide association study through Chi square test shows that there are no significant SNPs detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis most of top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07. ^ Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results(95% confidence interval or statistical test of differences) require more cost-effective methods or efficient computing system, both of which can't be accomplished currently in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability and those SNPs with good discriminant power are not necessary to be causal markers for the disease.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Numerous studies have been carried out to try to better understand the genetic predisposition for cardiovascular disease. Although it is widely believed that multifactorial diseases such as cardiovascular disease is the result from effects of many genes which working alone or interact with other genes, most genetic studies have been focused on identifying of cardiovascular disease susceptibility genes and usually ignore the effects of gene-gene interactions in the analysis. The current study applies a novel linkage disequilibrium based statistic for testing interactions between two linked loci using data from a genome-wide study of cardiovascular disease. A total of 53,394 single nucleotide polymorphisms (SNPs) are tested for pair-wise interactions, and 8,644 interactions are found to be significant with p-values less than 3.5×10-11. Results indicate that known cardiovascular disease susceptibility genes tend not to have many significantly interactions. One SNP in the CACNG1 (calcium channel, voltage-dependent, gamma subunit 1) gene and one SNP in the IL3RA (interleukin 3 receptor, alpha) gene are found to have the most significant pair-wise interactions. Findings from the current study should be replicated in other independent cohort to eliminate potential false positive results.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The purpose of this study was to evaluate the adequacy of computerized vital records in Texas for conducting etiologic studies on neural tube defects (NTDs), using the revised and expanded National Centers for Health Statistics vital record forms introduced in Texas in 1989.^ Cases of NTDs (anencephaly and spina bifida) among Harris County (Houston) residents were identified from the computerized birth and death records for 1989-1991. The validity of the system was then measured against cases ascertained independently through medical records and death certificates. The computerized system performed poorly in its identification of NTDs, particularly for anencephaly, where the false positive rate was 80% with little or no improvement over the 3-year period. For both NTDs the sensitivity and predictive value positive of the tapes were somewhat higher for Hispanic than non-Hispanic mothers.^ Case control studies were conducted utilizing the tape set and the independently verified data set, using controls selected from the live birth tapes. Findings varied widely between the data sets. For example, the anencephaly odds ratio for Hispanic mothers (vs. non-Hispanic) was 1.91 (CI = 1.38-2.65) for the tape file, but 3.18 (CI = 1.81-5.58) for verified records. The odds ratio for diabetes was elevated for the tape set (OR = 3.33, CI = 1.67-6.66) but not for verified cases (OR = 1.09, CI = 0.24-4.96), among whom few mothers were diabetic. It was concluded that computerized tapes should not be solely relied on for NTD studies.^ Using the verified cases, Hispanic mother was associated with spina bifida, and Hispanic mother, teen mother, and previous pregnancy terminations were associated with anencephaly. Mother's birthplace, education, parity, and diabetes were not significant for either NTD.^ Stratified analyses revealed several notable examples of statistical interaction. For anencephaly, strong interaction was observed between Hispanic origin and trimester of first prenatal care.^ The prevalence was 3.8 per 10,000 live births for anencephaly and 2.0 for spina bifida (5.8 per 10,000 births for the combined categories). ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nutrient intake and specific food item data from 24-hour dietary recalls were utilized to study the relationship between measures of diet diversity and dietary adequacy in a population of white females of child-bearing age and socioeconomic subgroups of that population. As the basis of the diet diversity measures, twelve food groups were constructed from the 24-hour recall data and the number of unique foods per food group counted and weighted according to specified weighting schemes. Utilizing these food groups, nine diet diversity indices were developed.^ Sensitivity/specificity analysis was used to determine the ability of varying levels of selected diet diversity indices to identify individuals above and below preselected intakes of different nutrients. The true prevalence proportions, sensitivity and specificity, false positive and false negative rates, and positive predictive values observed at the selected levels of diet diversity indices were investigated in relation to the objectives and resources of a variety of nutrition improvement programs. Diet diversity indices constructed from the total population data were evaluated as screening tools for respondent nutrient intakes in each of the socioeconomic subgroups as well.^ The results of the sensitivity/specificity analysis demonstrated that the false positive rate, the false negative rate, or both were too high at each diversity cut-off level to validate the widespread use of any of the diversity indices in the dietary assessment of the study population. Although diet diversity has been shown to be highly correlated with the intakes of a number of nutrients, the diet diversity indices constructed in this study did not adequately represent nutrient intakes in the diet as reported, in this study, intakes as reported in the 24-hour dietary recall. Specific cut-off levels of selected diversity indices might have limited application in some nutrition programs. The results were applicable to the sensitivity/specificity analyses in the socioeconomic subgroups as well as in the total population. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Because of its simplicity and low cost, arm circumference (AC) is being used increasingly in screening for protein energy malnutrition among pre-school children in many parts of the developing world, especially where minimally trained health workers are employed. The objectives of this study were as follows: (1) To determine the relationship of the AC measure with weight for age and weight for height in the detection of malnutrition among pre-school children in a Guatemalan Indian village. (2) To determine the performance of minimally trained promoters under field conditions in measuring AC, weight and height. (3) To describe the practical aspects of taking AC measures versus weight, age and height.^ The study was conducted in San Pablo La Laguna, one of four villages situated on the shores of Lake Atitlan, Guatemala, in which a program of simplified medical care was implemented by the Institute for Nutrition for Central America and Panama (INCAP). Weight, height, AC and age data were collected for 144 chronically malnourished children. The measurements obtained by the trained investigator under the controlled conditions of the health post were correlated against one another and AC was found to have a correlation with weight for age of 0.7127 and with weight for height of 0.7911, both well within the 0.65 to 0.80 range reported in the literature. False positive and false negative analysis showed that AC was more sensitive when compared with weight for height than with weight for age. This was fortunate since, especially in areas with widespread chronic malnutrition, weight for height detects those acute cases in immediate danger of complicating illness or death. Moreover, most of the cases identified as malnourished by AC, but not by weight for height (false positives), were either young or very stunted which made their selection by AC better than weight for height. The large number of cases detected by weight for age, but not by AC (false negative rate--40%) were, however, mostly beyond the critical age period and had normal weight for heights.^ The performance of AC, weight for height and weight for age under field conditions in the hands of minimally trained health workers was also analyzed by correlating these measurements against the same criterion measurements taken under ideally controlled conditions of the health post. AC had the highest correlation with itself indicating that it deteriorated the least in the move to the field. Moreover, there was a high correlation between AC in the field and criterion weight for height (0.7509); this correlation was almost as high as that for field weight for height versus the same measure in the health post (0.7588). The implication is that field errors are so great for the compounded weight for height variable that, in the field, AC is about as good a predictor of the ideal weight for height measure.^ Minimally trained health workers made more errors than the investigator as exemplified by their lower intra-observer correlation coefficients. They consistently measured larger than the investigator for all measures. Also there was a great deal of variability between these minimally trained workers indicating that careful training and followup is necessary for the success of the AC measure.^ AC has many practical advantages compared to the other anthropometric tools. It does not require age data, which are often unreliable in these settings, and does not require sophisticated subtraction and two dimensional table-handling skills that weight for age and weight for height require. The measure is also more easily applied with less disturbance to the child and the community. The AC tape is cheap and not easily damaged or jarred out of calibration while being transported in rugged settings, as is often the case with weight scales. Moreover, it can be kept in a health worker's pocket at all times for continual use in a widespread range of settings. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A community bioassay of copper was performed using benthic macroinvertebrates colonized on multiplate substrate samplers. Five copper concentrations ranging from 0.080-2.20 mg/l total copper were administered to five artificial streams by a Mount and Brungs proportional dilutor. Free copper ion as Cu('++) ranged from .002-.053 mg/l. A sixth stream received no copper and served as a control. Substrates were sampled at days 0, 14, and 28, and the results were used to compare 13 indices used or proposed to assess aquatic environmental impact. Sensitivity of the indices to changes in communities with respect to concentration and time was the basis for the comparison.^ Results indicated that all of the 8 diversity or richness indices tested gave approximately the same result (with the exception of number of species); they increased over the first 2-3 concentrations, then declined. Included among these was the Shannon index which gave false positive results, i.e., it increased, indicating enrichment, when in fact perturbation had occurred. This result was due to the disproportionate effect on the most abundant taxa, which caused a more even distribution of individuals among species. Number of species and individuals declined with increased concentration and time, with only one exception in the case of species, indicating perturbation.^ Results of five community comparison indices were varied at day 14 but by day 28 the results indicated a clear, nearly monotonic, trend due to copper impact. It was assumed that day 28 observations, though probably still changing, were nearer stability than at day 14 and therefore more representative of natural conditions. The changes in community comparison indices showed good agreement at 28 days and reflected the general decline in species and individuals. No single community comparison index could be set apart as superior to the others. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Two molecular epidemiological studies were conducted to examine associations between genetic variation and risk of squamous cell carcinoma of the head and neck (SCCHN). In the first study, we hypothesized that genetic variation in p53 response elements (REs) may play roles in the etiology of SCCHN. We selected and genotyped five polymorphic p53 REs as well as a most frequently studied p53 codon 72 (Arg72Pro, rs1042522) polymorphism in 1,100 non-Hispanic White SCCHN patients and 1,122 age-and sex-matched cancer-free controls recruited at The University of Texas M. D. Anderson Cancer Center. In multivariate logistic regression analysis with adjustment for age, sex, smoking and drinking status, marital status and education level, we observed that the EOMES rs3806624 CC genotype had a significant effect of protection against SCCHN risk (adjusted odds ratio= 0.79, 95% confidence interval =0.64–0.98), compared with the -838TT+CT genotypes. Moreover, a significantly increased risk associated with the combined genotypes of p53 codon 72CC and EOMES -838TT+CT was observed, especially in the subgroup of non-oropharyneal cancer patients. The values of false-positive report probability were also calculated for significant findings. In the second study, we assessed the association between SCCHN risk and four potential regulatory single nucleotide polymorphisms (SNPs) of DEC1 (deleted in esophageal cancer 1) gene, a candidate tumor suppressor gene for esophageal cancer. After adjustment for age, sex, and smoking and drinking status, the variant -606CC (i.e., -249CC) homozygotes had a significantly reduced SCCHN risk (adjusted odds ratio = 0.71, 95% confidence interval = 0.52–0.99), compared with the -606TT homozygotes. Stratification analyses showed that a reduced risk associated with the -606CC genotype was more pronounced in subgroups of non-smokers, non-drinkers, younger subjects (defined as ≤ 57 years), carriers of TP53 Arg/Arg (rs1042522) genotype, patients with oropharyngeal cancer or late-stage SCCHN. Further in silico analysis revealed that the -249 T-to-C change led to a gain of a transcription factor binding site. Additional functional analysis showed that the -249T-to-C change significantly enhanced transcriptional activity of the DEC1 promoter and the DNA-protein binding activity. We conclude that the DEC1 promoter -249 T>C (rs2012775) polymorphism is functional, modulating susceptibility to SCCHN among non-Hispanic Whites. Additional large-scale, preferably population-based studies are needed to validate our findings.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.