941 results for False positives reduction
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
The impact of erroneous genotypes having passed standard quality control (QC) can be severe in genome-wide association studies, genotype imputation, and estimation of heritability and prediction of genetic risk based on single nucleotide polymorphisms (SNP). To detect such genotyping errors, a simple two-locus QC method, based on the difference in test statistic of association between single SNPs and pairs of SNPs, was developed and applied. The proposed approach could detect many problematic SNPs with statistical significance even when standard single SNP QC analyses fail to detect them in real data. Depending on the data set used, the number of erroneous SNPs that were not filtered out by standard single SNP QC but detected by the proposed approach varied from a few hundred to thousands. Using simulated data, it was shown that the proposed method was powerful and performed better than other tested existing methods. The power of the proposed approach to detect erroneous genotypes was approximately 80% for a 3% error rate per SNP. This novel QC approach is easy to implement and computationally efficient, and can lead to a better quality of genotypes for subsequent genotype-phenotype investigations.
Abstract:
The Bloom filter is a space-efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh the disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper proposes a variant of the Bloom filter, the optihash. The optihash introduces an optimization for the false positive rate at the stage of Bloom filter formation, using the same amount of space at the cost of slightly more processing than the classic Bloom filter. Often Bloom filters are used in situations where a fixed amount of space is a primary constraint. We present the optihash as a good alternative to Bloom filters since the amount of space is the same and the improvements in false positives can justify the additional processing. Specifically, we show via simulations and numerical analysis that, with the optihash, false positive occurrences can be reduced and controlled at the cost of a small amount of additional processing. The simulations are carried out for in-packet forwarding. In this framework, the Bloom filter is used as a compact link/route identifier and is placed in the packet header to encode the route. At each node, the Bloom filter is queried for membership in order to make forwarding decisions. A false positive in the forwarding decision translates into packets forwarded along an unintended outgoing link. By using the optihash, false positives can be reduced. The optimization processing is carried out in an entity termed the Topology Manager, which is part of the control plane of the multicast forwarding fabric. This processing is only carried out on a per-session basis, not for every packet.
The aim of this paper is to present the optihash and evaluate its false positive performance via simulations in order to measure the influence of different parameters on the false positive rate. The false positive rate for the optihash is then compared with the false positive probability of the classic Bloom filter.
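As a rough illustration of the false positive behaviour described in this abstract, the following Python sketch implements a classic Bloom filter together with the standard approximation for its false positive probability, (1 - exp(-k*n/m))^k, for n items stored in m bits with k hash functions. It does not implement the optihash. The class name, the salted SHA-256 hashing scheme, and the `link-…` identifiers are assumptions made for illustration only.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter over an m-bit array with k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index
        # (an illustrative choice, not taken from the paper).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        # True for every inserted item; may also be True for others (false positive).
        return all(self.bits[pos] for pos in self._positions(item))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation (1 - exp(-k*n/m))**k for n inserted items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def measured_false_positive_rate(bf: BloomFilter, non_members) -> float:
    """Fraction of known non-members that the filter wrongly reports as present."""
    non_members = list(non_members)
    hits = sum(1 for item in non_members if item in bf)
    return hits / len(non_members)
```

In a forwarding setting like the one described, the inserted items would be the link identifiers of a route, and `measured_false_positive_rate` would be evaluated over the identifiers of links that are not on the route.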
Abstract:
Some unexpected promiscuous inhibitors were observed in a virtual screening protocol applied to select cruzain inhibitors from the ZINC database. Physical-chemical and pharmacophore model filters were used to reduce the database size. The selected compounds were docked into the cruzain active site. Six hit compounds were tested as inhibitors. Although the compounds were designed to be nucleophilically attacked by the catalytic cysteine of cruzain, three of them showed typical promiscuous behavior, revealing that false positives are a prevalent concern in VS programs. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can be used as the alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, which include the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of a null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue for reducing the costs of biological validation. Results: For all the tests, the target null model presented the lowest number of false positives when using random sequences as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies.
In these cases the target model is more dependable for biological validation due to its higher specificity.
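The scoring scheme in this abstract can be sketched as a log-odds score of a sequence under a family model relative to a null model. The Python example below is a toy under stated assumptions: a position-independent, GC-rich base distribution stands in for the family model (the tools named above use HMMs or covariance models), and the function names, pseudocount, and example sequence are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

# Uniform null model: every base equally likely.
UNIFORM_NULL = {b: 0.25 for b in BASES}

# Toy stand-in for a family model (assumption: position-independent, GC-rich).
FAMILY_MODEL = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}


def target_null(seq: str, pseudocount: float = 1.0) -> dict:
    """Null model estimated from the target sequence's own base composition."""
    counts = Counter(seq)
    total = len(seq) + len(BASES) * pseudocount
    return {b: (counts.get(b, 0) + pseudocount) / total for b in BASES}


def log_odds(seq: str, model: dict, null: dict) -> float:
    """Sum over positions of log2(P(base | model) / P(base | null)).

    Assumes seq contains only the characters A, C, G and T.
    """
    return sum(math.log2(model[b] / null[b]) for b in seq)


# A GC-only sequence that is not a family member by construction.
gc_rich_random = "GC" * 20

score_uniform = log_odds(gc_rich_random, FAMILY_MODEL, UNIFORM_NULL)
score_target = log_odds(gc_rich_random, FAMILY_MODEL, target_null(gc_rich_random))
```

With a decision threshold of zero, `score_uniform` is positive (the sequence would be accepted on composition alone) while `score_target` is negative, which mirrors the GC bias of the uniform null model reported in the conclusions.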
Abstract:
False-positive and false-negative values were calculated for five different designs of the trend test, and it was demonstrated that a design suggested by Portier and Hoel in 1984 for a different problem produced the lowest false-positive and false-negative rates when applied to historical spontaneous tumor rate data for Fischer rats.
Abstract:
This study investigated the role of contextual factors in personnel selection. Specifically, I explored if specific job factors such as the wage, training, available applicant pool and security concerns around a job, influenced personnel decisions. Additionally, I explored if the individual differences of decision makers played a role in how the previously mentioned job factors affected their decisions. A policy-capturing methodology was employed to determine the weight participants place on the job factors when selecting candidates for different jobs. Regression and correlational analyses were computed with the beta weights obtained from individual regression analyses. The results obtained from the two samples (student and general population) revealed that specific job characteristics did indeed influence personnel decisions. Participants were more concerned with making mistakes and thus less likely to accept candidates when selecting candidates for jobs having high salary and/or high training requirements.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Regulatory authorities, the food industry and the consumer demand reliable determination of chemical contaminants present in foods. A relatively new analytical technique that addresses this need is an immunobiosensor based on surface plasmon resonance (SPR) measurements. Although a range of tests have been developed to measure residues in milk, meat, animal bile and honey, a considerable problem has been encountered with both serum and plasma samples. The high degree of non-specific binding of some sample components can lead to loss of assay robustness, increased rates of false positives and general loss of assay sensitivity. In this paper we describe a straightforward precipitation technique to remove interfering substances from serum samples to be analysed for veterinary anthelmintics by SPR. This technique enabled development of an assay to detect a wide range of benzimidazole residues in serum samples by immunobiosensor. The limit of quantification was below 5 ng/ml and coefficients of variation were about 2%.
Abstract:
Identification of hot spots, also known as the sites with promise, black spots, accident-prone locations, or priority investigation locations, is an important and routine activity for improving the overall safety of roadway networks. Extensive literature focuses on methods for hot spot identification (HSID). A subset of this considerable literature is dedicated to conducting performance assessments of various HSID methods. A central issue in comparing HSID methods is the development and selection of quantitative and qualitative performance measures or criteria. The authors contend that currently employed HSID assessment criteria—namely false positives and false negatives—are necessary but not sufficient, and additional criteria are needed to exploit the ordinal nature of site ranking data. With the intent to equip road safety professionals and researchers with more useful tools to compare the performances of various HSID methods and to improve the level of HSID assessments, this paper proposes four quantitative HSID evaluation tests that are, to the authors’ knowledge, new and unique. These tests evaluate different aspects of HSID method performance, including reliability of results, ranking consistency, and false identification consistency and reliability. It is intended that road safety professionals apply these different evaluation tests in addition to existing tests to compare the performances of various HSID methods, and then select the most appropriate HSID method to screen road networks to identify sites that require further analysis. This work demonstrates four new criteria using 3 years of Arizona road section accident data and four commonly applied HSID methods [accident frequency ranking, accident rate ranking, accident reduction potential, and empirical Bayes (EB)]. The EB HSID method reveals itself as the superior method in most of the evaluation tests. 
In contrast, identifying hot spots using accident rate rankings performs the least well among the tests. The accident frequency and accident reduction potential methods perform similarly, with slight differences explained. The authors believe that the four new evaluation tests offer insight into HSID performance heretofore unavailable to analysts and researchers.
Abstract:
This paper presents visual detection and classification of light vehicles and personnel on a mine site. We capitalise on the rapid advances of ConvNet-based object recognition but highlight that a naive black-box approach results in a significant number of false positives. In particular, the lack of domain-specific training data and the unique landscape of a mine site cause a high rate of errors. We exploit the abundance of background-only images to train a k-means classifier to complement the ConvNet. Furthermore, localisation of objects of interest and a reduction in computation are enabled through region proposals. Our system is tested on over 10 km of real mine site data, and we were able to detect both light vehicles and personnel. We show that the introduction of our background model can reduce the false positive rate by an order of magnitude.
Abstract:
GC-MS data on veterinary drug residues in bovine urine are used for controlling the illegal practice of fattening cattle. According to current detection criteria, peak patterns of preferably four ions should agree within 10 or 20% from a corresponding standard pattern. These criteria are rigid, rather arbitrary and do not match daily practice. A new model, based on multivariate modeling of log peak abundance ratios, provides a theoretical basis for the identification of analytes and optimizes the balance between the avoidance of false positives and false negatives. The performance of the model is demonstrated on data provided by five laboratories, each supplying GC-MS measurements on the detection of clenbuterol, dienestrol and 19 beta-nortestosterone in urine. The proposed model shows a better performance than confirmation by using the current criteria and provides a statistical basis for inspection criteria in terms of error probabilities.
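To make the fixed-tolerance criterion mentioned in this abstract concrete, the Python sketch below checks whether the ion pattern of a sample agrees with a standard pattern within 10% or 20%. It illustrates the rigid rule the abstract criticises, not the proposed multivariate model. Normalising to the most abundant ion, applying the tolerance relative to the standard ratio, and all abundance values are assumptions for illustration.

```python
def ion_ratios(abundances):
    """Relative abundance of each ion, normalised to the most abundant ion."""
    base = max(abundances)
    return [a / base for a in abundances]


def matches_standard(sample, standard, tolerance=0.20):
    """True if every sample ion ratio is within `tolerance` (relative) of the standard ratio."""
    if len(sample) != len(standard):
        raise ValueError("sample and standard must list the same ions")
    sample_ratios = ion_ratios(sample)
    standard_ratios = ion_ratios(standard)
    return all(
        abs(s - r) <= tolerance * r
        for s, r in zip(sample_ratios, standard_ratios)
    )


# Hypothetical peak abundances for four monitored ions.
standard_pattern = [1000, 620, 310, 150]
borderline_sample = [1000, 700, 310, 150]

accepted_at_20 = matches_standard(borderline_sample, standard_pattern, tolerance=0.20)
accepted_at_10 = matches_standard(borderline_sample, standard_pattern, tolerance=0.10)
```

The same hypothetical sample is accepted at 20% and rejected at 10%, which shows how the identification decision hinges on an arbitrary cut-off rather than on stated error probabilities.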
Abstract:
The unresolved issue of false-positive D-dimer results in the diagnostic workup of pulmonary embolism. Pulmonary embolism (PE) remains a difficult diagnosis as it lacks specific symptoms and clinical signs. After the determination of the pretest PE probability by a validated clinical score, D-dimer (DD) measurement is the initial blood test in the majority of patients whose probability is low or intermediate. The low specificity of DD results in a high number of false positives that then require thoracic angio-CT. A new clinical decision rule, called the Pulmonary Embolism Rule-out Criteria (PERC), identifies patients at such low risk that PE can be safely ruled out without a DD test. Its safety has been confirmed in US emergency departments, but retrospective European studies showed that it would lead to 5-7% of undiagnosed PE. Alternative strategies are needed to reduce the proportion of false-positive DD results.
Abstract:
Recent studies have indicated that research practices in psychology may be susceptible to factors that increase false-positive rates, raising concerns about the possible prevalence of false-positive findings. The present article discusses several practices that may run counter to the inflation of false-positive rates. Taking these practices into account would lead to a more balanced view on the false-positive issue. Specifically, we argue that an inflation of false-positive rates would diminish, sometimes to a substantial degree, when researchers (a) have explicit a priori theoretical hypotheses, (b) include multiple replication studies in a single paper, and (c) collect additional data based on observed results. We report findings from simulation studies and statistical evidence that support these arguments. Being aware of these preventive factors allows researchers not to overestimate the pervasiveness of false-positives in psychology and to gauge the susceptibility of a paper to possible false-positives in practical and fair ways.
Abstract:
The epidemiology of temporomandibular disorders varies widely in the literature. The aim of this study was to determine the prevalence of TMD in dental students of the Federal University of Rio Grande do Norte assessed by different indexes. The sample consisted of 101 individuals selected by a randomized process, whose general outline was systematic sampling. For evaluation of the signs and symptoms of TMD, an anamnestic index, Fonseca's protocol, and two clinical indexes, the RDC/TMD (Research Diagnostic Criteria for Temporomandibular Disorders), or standard index, and Helkimo's Clinical Dysfunction Index, were applied. Data were analyzed using the chi-square test and kappa, in addition to verifying sensitivity and specificity (5% significance). The diagnosis of TMD by different indexes showed a variation in prevalence between 72.3% (Helkimo's Clinical index), 64.4% (Fonseca's anamnestic index) and 35.6% (RDC/TMD). There was no statistical difference between the sexes for the RDC/TMD, although this difference was found for Fonseca's and Helkimo's indexes (p<0.05). The most frequent types of TMD were joint disorders (Groups II and III), and the most frequent subtypes were disc displacement with reduction (17.8%) and arthralgia (15.8%). Most individuals showed a mild TMD (45.5%) for both indexes, Fonseca and Helkimo. When comparing the types of diagnoses, RDC/TMD with Fonseca and Helkimo, low agreement was found (k=0.17 and k=0.35, respectively). A moderate correlation between the severity of TMD was obtained (kw=0.53) for Fonseca's protocol and Helkimo's index. High sensitivity and low specificity were seen for both diagnoses compared to the standard, resulting in excessive false positives. Within the limitations of the study, it was concluded that the prevalence of TMD can vary widely, depending on the index used for its diagnosis.
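The agreement measures named in this abstract (sensitivity, specificity and kappa) can be computed from a 2x2 table comparing an index against the standard. The Python sketch below shows the usual formulas. The counts are hypothetical and are not the study's data, and the function names are chosen for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of standard-positive cases that the index also calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of standard-negative cases that the index also calls negative."""
    return tn / (tn + fp)


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa: agreement between index and standard beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both raters.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts (index vs. standard), NOT taken from the study.
tp, fp, fn, tn = 30, 40, 5, 25

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
kappa = cohen_kappa(tp, fp, fn, tn)
```

In this hypothetical table the index misses few true cases but flags many healthy subjects, so sensitivity is high, specificity is low, and kappa is small, which is the same pattern the abstract describes as producing excessive false positives.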