901 results for false positives
Abstract:
Lint-like program checkers are popular tools that ensure code quality by verifying compliance with best practices for a particular programming language. The proliferation of internal domain-specific languages and models, however, poses new challenges for such tools. Traditional program checkers produce many false positives and fail to accurately check constraints, best practices, common errors, possible optimizations and portability issues particular to domain-specific languages. We advocate the use of dedicated rules to check domain-specific practices. We demonstrate the implementation of domain-specific rules, the automatic fixing of violations, and their application to two case studies: (1) Seaside defines several internal DSLs through a creative use of the syntax of the host language; and (2) Magritte adds meta-descriptions to existing code by means of special methods. Our empirical validation demonstrates that domain-specific program checking significantly improves code quality when compared with general-purpose program checking.
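To make the idea concrete, here is a minimal sketch, in Python rather than the Smalltalk the paper targets, of what a domain-specific rule with an automatic fix can look like; the rule, the DSL convention it enforces, and the method names are hypothetical, not taken from Seaside or Magritte.

```python
# Minimal sketch of a domain-specific lint rule with an automatic fix.
# The rule, DSL convention, and method names are hypothetical.
import ast

class DeprecatedDslCallRule:
    """Flags calls to a hypothetical deprecated DSL method and rewrites them."""
    OLD, NEW = "addHtml", "render"  # hypothetical DSL method names

    def check(self, source: str):
        tree = ast.parse(source)
        return [node for node in ast.walk(tree)
                if isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == self.OLD]

    def fix(self, source: str) -> str:
        # A textual auto-fix; a production checker would rewrite the AST.
        return source.replace(f".{self.OLD}(", f".{self.NEW}(")

rule = DeprecatedDslCallRule()
code = "canvas.addHtml('<b>hi</b>')"
print(len(rule.check(code)), "violation(s) ->", rule.fix(code))
```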
Abstract:
A marker that is strongly associated with outcome (or disease) is often assumed to be effective for classifying individuals according to their current or future outcome. However, for this to be true, the associated odds ratio must be of a magnitude rarely seen in epidemiological studies. An illustration of the relationship between odds ratios and receiver operating characteristic (ROC) curves shows, for example, that a marker with an odds ratio as high as 3 is in fact a very poor classification tool. If a marker identifies 10 percent of controls as positive (false positives) and has an odds ratio of 3, then it will only correctly identify 25 percent of cases as positive (true positives). Moreover, the authors illustrate that a single measure of association such as an odds ratio does not meaningfully describe a marker’s ability to classify subjects. Appropriate statistical methods for assessing and reporting the classification power of a marker are described. The serious pitfalls of using more traditional methods based on parameters in logistic regression models are illustrated.
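The arithmetic behind the 25 percent figure follows from the identity linking an odds ratio to a point on the ROC curve, OR = [TPR/(1-TPR)] / [FPR/(1-FPR)]; fixing OR and the false positive rate determines the true positive rate. A short Python check:

```python
# Sketch of the abstract's arithmetic linking an odds ratio (OR) to ROC points:
# OR = [TPR/(1-TPR)] / [FPR/(1-FPR)], so fixing OR and FPR determines TPR.
def tpr_from_or(odds_ratio: float, fpr: float) -> float:
    odds = odds_ratio * fpr / (1.0 - fpr)  # odds of a positive test among cases
    return odds / (1.0 + odds)

print(tpr_from_or(3, 0.10))    # 0.25: an OR of 3 gives only 25% sensitivity
print(tpr_from_or(100, 0.10))  # ~0.92: ORs this large are rare in epidemiology
```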
Abstract:
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement, in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to that first observed in microarrays. In particular, we find that GC-content has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression, to remove systematic bias introduced by deterministic features such as GC-content, with quantile normalization, to correct for global distortions.
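As a rough illustration of one ingredient of such normalization, here is a sketch of plain quantile normalization in Python/NumPy; this is not the authors' CQN algorithm, which additionally removes GC-content bias with robust generalized regression before equalizing distributions.

```python
# A minimal sketch of the quantile-normalization half of the approach:
# force each sample (column) to share a common distribution.
import numpy as np

def quantile_normalize(counts: np.ndarray) -> np.ndarray:
    """counts: genes x samples matrix of expression measurements."""
    order = np.argsort(counts, axis=0)           # per-sample ranks
    ranked = np.sort(counts, axis=0)
    reference = ranked.mean(axis=1)              # common target distribution
    normalized = np.empty_like(counts, dtype=float)
    for j in range(counts.shape[1]):
        normalized[order[:, j], j] = reference   # map ranks back to genes
    return normalized

rng = np.random.default_rng(0)
x = rng.poisson(lam=[[20, 60]], size=(1000, 2)).astype(float)  # sample 2 is biased
print(quantile_normalize(x).mean(axis=0))  # column means now agree
```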
Abstract:
The purpose of this retrospective study was to evaluate the impact of energy subtraction (ES) chest radiography on the detection of pulmonary nodules and masses in daily routine. Seventy-seven patients and 25 healthy subjects were examined with a single-exposure digital radiography system. Five blinded readers first evaluated the non-subtracted PA and lateral chest radiographs alone, and then evaluated them together with the subtracted PA soft-tissue images. The size, location and number of lung nodules or masses were recorded together with the confidence level. CT was used as the standard of reference. For the 200 total lesions, sensitivity was 33.5-52.5% with non-subtracted and 43.5-58.5% with energy-subtracted radiography, corresponding to a significant improvement in four of five readers (p < 0.05). However, in three of five readers the rate of false positives was higher with ES. With ES, sensitivity, but not the area under the alternative free-response receiver operating characteristic (AFROC) curve, showed a good correlation with reader experience (R = 0.90, p = 0.026). In four of five readers, diagnostic confidence improved with ES (p = 0.0036). We conclude that single-exposure digital ES chest radiography improves detection of most pulmonary nodules and masses, but the identification of nodules <1 cm and false-positive findings remain problematic.
Abstract:
OBJECTIVE: To assess the types and numbers of cases, gestational age at specific prenatal diagnosis and diagnostic accuracy of the diagnosis of skeletal dysplasias in a prenatal population from a single tertiary center. METHODS: This was a retrospective database review of the type, prenatal and definitive postnatal diagnoses and gestational age at specific prenatal diagnosis of all cases of skeletal dysplasia from a mixed referral and screening population between 1985 and 2007. Prenatal diagnoses were grouped into 'correct ultrasound diagnosis' (complete concordance with postnatal pediatric or pathological findings) or 'partially correct ultrasound diagnosis' (the skeletal dysplasia found postnatally differed from that diagnosed prenatally). RESULTS: We included 178 fetuses in this study, of which 176 had a prenatal ultrasound diagnosis of 'skeletal dysplasia'. In 160 cases the prenatal diagnosis of a skeletal dysplasia was confirmed; two cases with skeletal dysplasias identified postnatally had not been diagnosed prenatally, giving 162 fetuses with skeletal dysplasias in total. There were 23 different classifiable types of skeletal dysplasia. The specific diagnoses based on prenatal ultrasound examination alone were correct in 110/162 (67.9%) cases and partially correct in 50/162 (30.9%) cases (160/162 overall, 98.8%). In 16 cases, skeletal dysplasia was diagnosed prenatally but was not confirmed postnatally (n = 12 false positives) or the case was lost to follow-up (n = 4). The following skeletal dysplasias were recorded: thanatophoric dysplasia (35 diagnosed correctly prenatally of 40 overall), osteogenesis imperfecta (lethal and non-lethal, 31/35), short-rib dysplasias (5/10), chondroectodermal dysplasia Ellis-van Creveld (4/9), achondroplasia (7/9), achondrogenesis (7/8), campomelic dysplasia (6/8), asphyxiating thoracic dysplasia Jeune (3/7), hypochondrogenesis (1/6), diastrophic dysplasia (2/5), chondrodysplasia punctata (2/2), hypophosphatasia (0/2), as well as a further 7/21 cases with rare or unclassifiable skeletal dysplasias. CONCLUSION: Prenatal diagnosis of skeletal dysplasias can present a considerable diagnostic challenge. However, a meticulous sonographic examination yields a high overall detection rate. In the two most common disorders, thanatophoric dysplasia and osteogenesis imperfecta (25% and 22% of all cases, respectively), typical sonomorphology accounts for the high rates of completely correct prenatal diagnosis (88% and 89%, respectively) at the first diagnostic examination.
Abstract:
OBJECTIVES This study sought to validate the Logistic Clinical SYNTAX (Synergy Between Percutaneous Coronary Intervention With Taxus and Cardiac Surgery) score in patients with non-ST-segment elevation acute coronary syndromes (ACS), in order to further legitimize its clinical application. BACKGROUND The Logistic Clinical SYNTAX score allows for an individualized prediction of 1-year mortality in patients undergoing contemporary percutaneous coronary intervention. It is composed of a "Core" Model (anatomical SYNTAX score, age, creatinine clearance, and left ventricular ejection fraction) and an "Extended" Model (the Core Model plus an additional 6 clinical variables), and has previously been cross-validated in 7 contemporary stent trials (>6,000 patients). METHODS One-year all-cause death was analyzed in 2,627 patients undergoing percutaneous coronary intervention from the ACUITY (Acute Catheterization and Urgent Intervention Triage Strategy) trial. Mortality predictions from the Core and Extended Models were studied with respect to discrimination, that is, separation of those with and without 1-year all-cause death (assessed by the concordance [C] statistic), and calibration, that is, agreement between observed and predicted outcomes (assessed with validation plots). Decision curve analyses, which weigh the harms (false positives) against the benefits (true positives) of using a risk score to make mortality predictions, were undertaken to assess clinical usefulness. RESULTS In the ACUITY trial, the median SYNTAX score was 9.0 (interquartile range 5.0 to 16.0); approximately 40% of patients had 3-vessel disease, 29% had diabetes, and 85% underwent drug-eluting stent implantation. Validation plots confirmed agreement between observed and predicted mortality. The Core and Extended Models demonstrated substantial improvements in discriminative ability for 1-year all-cause death compared with the anatomical SYNTAX score in isolation (C-statistics: SYNTAX score: 0.64, 95% confidence interval [CI]: 0.56 to 0.71; Core Model: 0.74, 95% CI: 0.66 to 0.79; Extended Model: 0.77, 95% CI: 0.70 to 0.83). Decision curve analyses confirmed the increasing ability to correctly identify patients who would die at 1 year with the Extended Model versus the Core Model versus the anatomical SYNTAX score, over a wide range of thresholds for mortality risk predictions. CONCLUSIONS Compared with the anatomical SYNTAX score alone, the Core and Extended Models of the Logistic Clinical SYNTAX score more accurately predicted individual 1-year mortality in patients presenting with non-ST-segment elevation acute coronary syndromes undergoing percutaneous coronary intervention. These findings support the clinical application of the Logistic Clinical SYNTAX score.
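For readers unfamiliar with decision curve analysis, a minimal sketch of the net-benefit calculation it rests on is given below; the formula, net benefit = TP/n - (FP/n) x p_t/(1 - p_t), is the standard one, but the toy risks and outcomes are invented, not ACUITY data.

```python
# Sketch of the decision-curve idea: at a risk threshold p_t, net benefit
# trades true positives against false positives weighted by the odds of
# the threshold. Data below are invented for illustration.
def net_benefit(risks, outcomes, p_t: float) -> float:
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= p_t and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= p_t and y == 0)
    return tp / n - (fp / n) * (p_t / (1 - p_t))

# toy predicted 1-year mortality risks and observed deaths (hypothetical)
risks    = [0.02, 0.05, 0.10, 0.20, 0.40, 0.70]
outcomes = [0,    0,    0,    1,    0,    1]
for p_t in (0.05, 0.10, 0.20):
    print(p_t, round(net_benefit(risks, outcomes, p_t), 3))
```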
Abstract:
Most published genome-wide association studies (GWAS) in sheep have investigated recessively inherited monogenic traits. The objective here was to assess the feasibility of performing GWAS for a dominant trait for which the genetic basis was already known. A total of 42 Manchega and Rasa Aragonesa sheep segregating solid black or white coat pigmentation were genotyped using the SNP50 BeadChip. Previous analysis in Manchegas demonstrated a complete association between the pigmentation trait and alleles of the MC1R gene, setting an a priori expectation for GWAS. Multiple methods were used to identify and quantify the strength of population substructure between black and white animals before allelic association testing was performed for 49,034 SNPs. Following correction for substructure, GWAS identified that the most strongly associated SNP (s26449) was also the closest to the MC1R gene. The finding was strongly supported by a permutation tree-based random forest (RF) analysis. Importantly, GWAS also identified unlinked SNPs with only slightly lower p-values than that of s26449. Random forest analysis indicated these were false positives, suggesting that interpretation based on both approaches is beneficial. The results indicate that a combined analytical approach can be successful in studies where a modest number of animals are available and substantial population stratification exists.
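As background, a per-SNP allelic association test of the basic kind reduces to a chi-square test on a 2x2 table of allele counts; the sketch below uses invented counts and omits the substructure correction that was central to the study.

```python
# Sketch of a basic allelic association test of the kind run per SNP:
# a chi-square test on a 2x2 table of allele counts in black vs. white
# animals. Counts are invented; substructure correction is omitted.
def allelic_chi2(a_case, b_case, a_ctrl, b_ctrl):
    table = [[a_case, b_case], [a_ctrl, b_ctrl]]
    total = a_case + b_case + a_ctrl + b_ctrl
    chi2 = 0.0
    for i, row in enumerate(table):
        for j in range(2):
            expected = sum(row) * (table[0][j] + table[1][j]) / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2  # 1 df; chi2 > 3.84 means p < 0.05

# hypothetical allele counts at a SNP near MC1R, in 2 x 42 chromosomes
print(allelic_chi2(30, 12, 8, 34))  # strong association -> large chi2
```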
Abstract:
The considerable search for synergistic agents in cancer research is motivated by the therapeutic benefits achieved by combining anti-cancer agents. Synergistic agents make it possible to reduce dosage while maintaining or enhancing a desired effect. Other favorable outcomes of synergistic agents include reduced toxicity and minimized or delayed drug resistance. Dose-response assessment and drug-drug interaction analysis play an important part in the drug discovery process; however, such analyses are often poorly done. This dissertation is an effort to notably improve dose-response assessment and drug-drug interaction analysis. The most commonly used method in published analyses is the Median-Effect Principle/Combination Index method (Chou and Talalay, 1984). The Median-Effect Principle/Combination Index method leads to inefficiency by ignoring important sources of variation inherent in dose-response data and by discarding data points that do not fit the Median-Effect Principle. Previous work has shown that the conventional method yields a high rate of false positives (Boik, Boik, Newman, 2008; Hennessey, Rosner, Bast, Chen, 2010) and, in some cases, low power to detect synergy. There is a great need to improve the current methodology. We developed a Bayesian framework for dose-response modeling and drug-drug interaction analysis. First, we developed a hierarchical meta-regression dose-response model that accounts for various sources of variation and uncertainty and allows one to incorporate knowledge from prior studies into the current analysis, thus offering more efficient and reliable inference. Second, for cases in which parametric dose-response models do not fit the data, we developed a practical and flexible nonparametric regression method for meta-analysis of independently repeated dose-response experiments. Third, we developed a method, based on Loewe additivity, that allows one to quantitatively assess the interaction between two agents combined at a fixed dose ratio. The proposed method gives a comprehensive and honest account of the uncertainty in drug interaction assessment. Extensive simulation studies show that the novel methodology improves the screening process for effective/synergistic agents and reduces the incidence of type I error. We consider an ovarian cancer cell line study that investigates the combined effect of DNA methylation inhibitors and histone deacetylation inhibitors in human ovarian cancer cell lines. The hypothesis is that the combination of DNA methylation inhibitors and histone deacetylation inhibitors will enhance antiproliferative activity in human ovarian cancer cell lines compared with treatment with each inhibitor alone. By applying the proposed Bayesian methodology, in vitro synergy was declared for the DNA methylation inhibitor 5-AZA-2'-deoxycytidine combined with either histone deacetylation inhibitor, suberoylanilide hydroxamic acid or trichostatin A, in the cell lines HEY and SKOV3. This suggests potential new epigenetic therapies for cell growth inhibition of ovarian cancer cells.
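For orientation, the Loewe-additivity-based combination index (CI) at the heart of such analyses can be sketched as follows: CI = d1/Dx1 + d2/Dx2, where (d1, d2) is the combination producing effect x and Dx1, Dx2 are the single-agent doses producing the same effect; CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism. The Hill parameters and doses below are invented, and none of the dissertation's Bayesian uncertainty handling is reproduced.

```python
# Sketch of the Loewe-additivity combination index. Dose-response curves
# are assumed Hill-shaped; all parameter values are hypothetical.
def dose_for_effect(effect, ec50, hill):
    """Invert a Hill dose-response curve; effect must lie in (0, 1)."""
    return ec50 * (effect / (1.0 - effect)) ** (1.0 / hill)

def combination_index(d1, d2, effect, ec50_1, hill_1, ec50_2, hill_2):
    return d1 / dose_for_effect(effect, ec50_1, hill_1) + \
           d2 / dose_for_effect(effect, ec50_2, hill_2)

# hypothetical: the dose pair (0.3, 0.4) achieves 50% inhibition
print(combination_index(0.3, 0.4, 0.5, ec50_1=1.0, hill_1=1.0,
                        ec50_2=1.5, hill_2=2.0))  # ~0.57 < 1: synergy
```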
Abstract:
High-throughput assays, such as the yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This tremendously increases the need for reliable methods to systematically and automatically suggest protein functions and the relationships between them. With the available PPI data, it is now possible to study functions and relationships in the context of a large-scale network. To date, several network-based schemes have been proposed to effectively annotate protein functions on a large scale. However, due to the noise inherent in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work in a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins and hence suggest their functions. One advantage of that work is that the algorithm is not sensitive to noise (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods, which we applied to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work by Samanta and Liang, the algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins. Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and performed pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. In particular, we performed a detailed analysis of a subcluster enriched in the transforming growth factor β signaling pathway (P < 10^-50), which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
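The common-neighbor statistic that such schemes build on can be sketched with the hypergeometric distribution: the chance that two proteins share at least k neighbors at random. The sketch below uses invented numbers and omits the hub-protein down-weighting that the study introduces.

```python
# Sketch of a common-neighbor significance score: the hypergeometric tail
# probability that two proteins share at least `shared` neighbors by chance.
from math import comb

def shared_neighbor_pvalue(n_proteins, deg1, deg2, shared):
    """P(X >= shared) when deg2 neighbors are drawn from n_proteins,
    of which deg1 are neighbors of protein 1."""
    upper = min(deg1, deg2)
    return sum(comb(deg1, k) * comb(n_proteins - deg1, deg2 - k)
               for k in range(shared, upper + 1)) / comb(n_proteins, deg2)

# hypothetical: in a 5,000-protein network, two proteins of degree 20 and 30
# share 5 neighbors -- far more overlap than expected by chance
print(shared_neighbor_pvalue(5000, 20, 30, 5))
```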
Abstract:
In search of transmittable epigenetic marks, we investigated gene expression in the testes and sperm cells of differentially fed F0 boars from a three-generation pig feeding experiment that showed phenotypic differences in the F2 generation. RNA samples from 8 testes of boars that received either a diet enriched in methylating micronutrients or a control diet were analyzed by microarray analysis. We found moderate differential expression between the testes of differentially fed boars, with a high FDR of 0.82 indicating that most of the differentially expressed genes were false positives. Nevertheless, we performed a pathway analysis and found disparate pathway maps (development_A2B receptor: action via G-protein alpha s, cell adhesion_Tight junctions, and cell adhesion_Endothelial cell contacts by junctional mechanisms) that show no conclusive relation to epigenetic inheritance. Four RNA samples from sperm cells of these differentially fed boars were analyzed by RNA-Seq methodology. We found no differential gene expression in the sperm cells of the two groups (adjusted P-value > 0.05). Nevertheless, we also explored gene expression in sperm by a pathway analysis, which showed that genes were enriched for the pathway maps of bacterial infections in cystic fibrosis (CF) airways, glycolysis and gluconeogenesis p.3, and cell cycle_Initiation of mitosis. Again, these pathway maps are miscellaneous, without an obvious relationship to epigenetic inheritance. It is concluded that the methylating micronutrients affect RNA expression in the testes of differentially fed boars moderately, if at all. Furthermore, gene expression in sperm cells is not significantly affected by extensive supplementation of methylating micronutrients, and thus RNA molecules could not be established as the epigenetic mark in this feeding experiment.
Abstract:
Dynamically typed languages lack information about the types of variables in the source code. Developers care about this information as it supports program comprehension. Basic type inference techniques are helpful, but may yield many false positives or negatives. We propose to mine information from the software ecosystem on how frequently given types are inferred unambiguously to improve the quality of type inference for a single system. This paper presents an approach to augment existing type inference techniques by supplementing the information available in the source code of a project with data from other projects written in the same language. For all available projects, we track how often messages are sent to instance variables throughout the source code. Predictions for the type of a variable are made based on the messages sent to it. The evaluation of a proof-of-concept prototype shows that this approach works well for types that are sufficiently popular, like those from the standard libraries, and tends to create false positives for unpopular or domain-specific types. The false positives are, in most cases, fairly easy to identify. Also, the evaluation data shows a substantial increase in the number of correctly inferred types when compared to the non-augmented type inference.
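A minimal sketch of the ecosystem-augmented prediction step might look as follows; the message and type names are illustrative Smalltalk-style strings, and the real prototype's data collection and ranking details are not reproduced.

```python
# Sketch of ecosystem-augmented type inference: record, across many projects,
# which receiver types variables with a given message turned out to have,
# then rank candidate types for an unresolved variable by frequency.
from collections import Counter, defaultdict

ecosystem = defaultdict(Counter)  # message -> Counter of observed receiver types

def train(message: str, receiver_type: str) -> None:
    ecosystem[message][receiver_type] += 1

def predict(messages_sent: set) -> list:
    votes = Counter()
    for m in messages_sent:
        votes.update(ecosystem[m])
    return votes.most_common(3)  # top candidate types

# hypothetical observations harvested from other projects
for t, msgs in [("OrderedCollection", ["add:", "do:", "size"]),
                ("Dictionary", ["at:put:", "do:", "size"]),
                ("OrderedCollection", ["add:", "removeFirst"])]:
    for m in msgs:
        train(m, t)

print(predict({"add:", "size"}))  # OrderedCollection ranks first
```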
Abstract:
AIMS A non-invasive gene-expression profiling (GEP) test for rejection surveillance of heart transplant recipients originated in the USA. A European-based study, the Cardiac Allograft Rejection Gene Expression Observational II Study (CARGO II), was conducted to further clinically validate the GEP test's performance. METHODS AND RESULTS Blood samples for GEP testing (AlloMap®, CareDx, Brisbane, CA, USA) were collected during post-transplant surveillance. The reference standard for rejection status was based on histopathology grading of tissue from endomyocardial biopsy. The area under the receiver operating characteristic curve (AUC-ROC) and the negative (NPV) and positive predictive values (PPV) for the GEP scores (range 0-39) were computed. Considering a GEP score of 34 as the cut-off (>6 months post-transplantation), 95.5% (381/399) of GEP tests were true negatives, 4.5% (18/399) were false negatives, 10.2% (6/59) were true positives, and 89.8% (53/59) were false positives. Based on 938 paired biopsies, the GEP test score AUC-ROC for distinguishing ≥3A rejection was 0.70 and 0.69 for ≥2-6 and >6 months post-transplantation, respectively. Depending on the chosen threshold score, the NPV and PPV range from 98.1 to 100% and from 2.0 to 4.7%, respectively. CONCLUSION For ≥2-6 and >6 months post-transplantation, CARGO II GEP score performance (AUC-ROC = 0.70 and 0.69) is similar to the CARGO study results (AUC-ROC = 0.71 and 0.67). The low prevalence of acute cellular rejection (ACR) contributes to the high NPV and limited PPV of GEP testing. The choice of threshold score for practical use of GEP testing should consider the overall clinical assessment of the patient's baseline risk for rejection.
Abstract:
Because of its simplicity and low cost, arm circumference (AC) is being used increasingly in screening for protein-energy malnutrition among pre-school children in many parts of the developing world, especially where minimally trained health workers are employed. The objectives of this study were as follows: (1) to determine the relationship of the AC measure with weight for age and weight for height in the detection of malnutrition among pre-school children in a Guatemalan Indian village; (2) to determine the performance of minimally trained promoters under field conditions in measuring AC, weight and height; and (3) to describe the practical aspects of taking AC measures versus weight, age and height. The study was conducted in San Pablo La Laguna, one of four villages situated on the shores of Lake Atitlan, Guatemala, in which a program of simplified medical care was implemented by the Institute of Nutrition of Central America and Panama (INCAP). Weight, height, AC and age data were collected for 144 chronically malnourished children. The measurements obtained by the trained investigator under the controlled conditions of the health post were correlated against one another, and AC was found to have a correlation with weight for age of 0.7127 and with weight for height of 0.7911, both well within the 0.65 to 0.80 range reported in the literature. False-positive and false-negative analysis showed that AC was more sensitive when compared with weight for height than with weight for age. This was fortunate since, especially in areas with widespread chronic malnutrition, weight for height detects those acute cases in immediate danger of complicating illness or death. Moreover, most of the cases identified as malnourished by AC but not by weight for height (false positives) were either young or very stunted, which made their selection by AC preferable to selection by weight for height. The large number of cases detected by weight for age but not by AC (false negative rate of 40%) were, however, mostly beyond the critical age period and had normal weight for height. The performance of AC, weight for height and weight for age under field conditions in the hands of minimally trained health workers was also analyzed by correlating these measurements against the same criterion measurements taken under the ideally controlled conditions of the health post. AC had the highest correlation with itself, indicating that it deteriorated the least in the move to the field. Moreover, there was a high correlation between AC in the field and criterion weight for height (0.7509); this correlation was almost as high as that for field weight for height versus the same measure in the health post (0.7588). The implication is that field errors are so great for the compounded weight for height variable that, in the field, AC is about as good a predictor of the ideal weight for height measure. Minimally trained health workers made more errors than the investigator, as shown by their lower intra-observer correlation coefficients. They consistently measured larger than the investigator for all measures. Also, there was a great deal of variability between these minimally trained workers, indicating that careful training and follow-up are necessary for the success of the AC measure. AC has many practical advantages compared to the other anthropometric tools.
It does not require age data, which are often unreliable in these settings, and does not require the sophisticated subtraction and two-dimensional table-handling skills that weight for age and weight for height require. The measure is also more easily applied, with less disturbance to the child and the community. The AC tape is cheap and not easily damaged or jarred out of calibration while being transported in rugged settings, as is often the case with weight scales. Moreover, it can be kept in a health worker's pocket at all times for continual use in a wide range of settings.
Abstract:
Social desirability and the fear of sanctions can deter survey respondents from responding truthfully to sensitive questions. Self-reports on norm-breaking behavior such as shoplifting, non-voting, or tax evasion may therefore be subject to considerable misreporting. To mitigate such misreporting, various indirect techniques for asking sensitive questions, such as the randomized response technique (RRT), have been proposed in the literature. In our study, we evaluate the viability of several variants of the RRT, including the recently proposed crosswise-model RRT, by comparing respondents' self-reports on cheating in dice games to actual cheating behavior, thereby distinguishing between false negatives (underreporting) and false positives (overreporting). The study was implemented as an online survey on Amazon Mechanical Turk (N = 6,505). Our results indicate that the forced-response RRT and the unrelated-question RRT, as implemented in our survey, fail to reduce the level of misreporting compared to conventional direct questioning. For the crosswise-model RRT, we do observe a reduction of false negatives (that is, an increase in the proportion of cheaters who admit having cheated). At the same time, however, there is an increase in false positives (that is, an increase in non-cheaters who falsely admit having cheated). Overall, our findings suggest that none of the implemented sensitive question techniques substantially outperforms direct questioning. Furthermore, our study demonstrates the importance of distinguishing false negatives from false positives when evaluating the validity of sensitive question techniques.
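For context, the forced-response RRT mentioned above protects respondents by randomizing some answers, and the prevalence estimate is recovered by inverting the design probabilities; the sketch below uses one common die-based design (answer truthfully on 2-5, say "yes" on 1, "no" on 6), not necessarily the study's exact parameters.

```python
# Sketch of the forced-response RRT estimator: observed "yes" rate is
# lambda = p_forced_yes + p_truth * pi, solved for the true rate pi.
def rrt_prevalence(observed_yes_rate, p_truth=4/6, p_forced_yes=1/6):
    """Invert lambda = p_forced_yes + p_truth * pi for pi."""
    return (observed_yes_rate - p_forced_yes) / p_truth

# if 35% answer "yes" under this design, the estimated cheating rate is:
print(rrt_prevalence(0.35))  # ~0.275
```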
Abstract:
Is Benford's law a good instrument to detect fraud in reports of statistical and scientific data? For a valid test the probability of "false positives" and "false negatives" has to be low. However, it is very doubtful whether the Benford distribution is an appropriate tool to discriminate between manipulated and non-manipulated estimates. Further research should focus more on the validity of the test and test results should be interpreted more carefully.
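As a concrete reference point, a first-digit Benford test compares observed leading-digit frequencies with P(d) = log10(1 + 1/d); the sketch below implements that comparison, with the caveat the note raises: a large test statistic is evidence of deviation from Benford's law, not of fraud.

```python
# Sketch of a first-digit Benford check via a chi-square statistic.
from math import log10

def benford_chi2(values):
    counts = [0] * 9
    for v in values:
        # leading non-zero digit (handles integers and decimals like 0.0045)
        counts[int(str(abs(v)).lstrip("0.")[0]) - 1] += 1
    n = sum(counts)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * log10(1 + 1 / d)
        chi2 += (counts[d - 1] - expected) ** 2 / expected
    return chi2  # compare against a chi-square with 8 degrees of freedom

# powers of 2 approximately follow Benford's law -> small statistic
print(benford_chi2([2 ** k for k in range(1, 200)]))
```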