25 resultados para false positive rates

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent studies have indicated that research practices in psychology may be susceptible to factors that increase false-positive rates, raising concerns about the possible prevalence of false-positive findings. The present article discusses several practices that may run counter to the inflation of false-positive rates. Taking these practices into account would lead to a more balanced view on the false-positive issue. Specifically, we argue that an inflation of false-positive rates would diminish, sometimes to a substantial degree, when researchers (a) have explicit a priori theoretical hypotheses, (b) include multiple replication studies in a single paper, and (c) collect additional data based on observed results. We report findings from simulation studies and statistical evidence that support these arguments. Being aware of these preventive factors allows researchers not to overestimate the pervasiveness of false-positives in psychology and to gauge the susceptibility of a paper to possible false-positives in practical and fair ways.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Bloom filter is a space efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh the disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper proposes a variant of the Bloom filter, the optihash. The optihash introduces an optimization for the false positive rate at the stage of Bloom filter formation using the same amount of space at the cost of slightly more processing than the classic Bloom filter. Often Bloom filters are used in situations where a fixed amount of space is a primary constraint. We present the optihash as a good alternative to Bloom filters since the amount of space is the same and the improvements in false positives can justify the additional processing. Specifically, we show via simulations and numerical analysis that using the optihash the false positives occurrences can be reduced and controlled at a cost of small additional processing. The simulations are carried out for in-packet forwarding. In this framework, the Bloom filter is used as a compact link/route identifier and it is placed in the packet header to encode the route. At each node, the Bloom filter is queried for membership in order to make forwarding decisions. A false positive in the forwarding decision is translated into packets forwarded along an unintended outgoing link. By using the optihash, false positives can be reduced. The optimization processing is carried out in an entity termed the Topology Manger which is part of the control plane of the multicast forwarding fabric. This processing is only carried out on a per-session basis, not for every packet. The aim of this paper is to present the optihash and evaluate its false positive performances via simulations in order to measure the influence of different parameters on the false positive rate. The false positive rate for the optihash is then compared with the false positive probability of the classic Bloom filter.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: In order to enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network which was trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also makes use of PSIPRED predicted secondary structure and bi-directional scoring in order to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment which are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input and it is now trained to learn the FSSP Z-score as a measurement of similarity between two proteins. Results: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years an increasing number of papers have employed meta-analysis to integrate effect sizes of researchers’ own series of studies within a single paper (“internal meta-analysis”). Although this approach has the obvious advantage of obtaining narrower confidence intervals, we show that it could inadvertently inflate false-positive rates if researchers are motivated to use internal meta-analysis in order to obtain a significant overall effect. Specifically, if one decides whether to stop or continue a further replication experiment depending on the significance of the results in an internal meta-analysis, false-positive rates would increase beyond the nominal level. We conducted a set of Monte-Carlo simulations to demonstrate our argument, and provided a literature review to gauge awareness and prevalence of this issue. Furthermore, we made several recommendations when using internal meta-analysis to make a judgment on statistical significance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An extensive set of machine learning and pattern classification techniques trained and tested on KDD dataset failed in detecting most of the user-to-root attacks. This paper aims to provide an approach for mitigating negative aspects of the mentioned dataset, which led to low detection rates. Genetic algorithm is employed to implement rules for detecting various types of attacks. Rules are formed of the features of the dataset identified as the most important ones for each attack type. In this way we introduce high level of generality and thus achieve high detection rates, but also gain high reduction of the system training time. Thenceforth we re-check the decision of the user-to- root rules with the rules that detect other types of attacks. In this way we decrease the false-positive rate. The model was verified on KDD 99, demonstrating higher detection rates than those reported by the state- of-the-art while maintaining low false-positive rate.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

High resolution descriptions of plant distribution have utility for many ecological applications but are especially useful for predictive modelling of gene flow from transgenic crops. Difficulty lies in the extrapolation errors that occur when limited ground survey data are scaled up to the landscape or national level. This problem is epitomized by the wide confidence limits generated in a previous attempt to describe the national abundance of riverside Brassica rapa (a wild relative of cultivated rapeseed) across the United Kingdom. Here, we assess the value of airborne remote sensing to locate B. rapa over large areas and so reduce the need for extrapolation. We describe results from flights over the river Nene in England acquired using Airborne Thematic Mapper (ATM) and Compact Airborne Spectrographic Imager (CASI) imagery, together with ground truth data. It proved possible to detect 97% of flowering B. rapa on the basis of spectral profiles. This included all stands of plants that occupied >2m square (>5 plants), which were detected using single-pixel classification. It also included very small populations (<5 flowering plants, 1-2m square) that generated mixed pixels, which were detected using spectral unmixing. The high detection accuracy for flowering B. rapa was coupled with a rather large false positive rate (43%). The latter could be reduced by using the image detections to target fieldwork to confirm species identity, or by acquiring additional remote sensing data such as laser altimetry or multitemporal imagery.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Flooding is a major hazard in both rural and urban areas worldwide, but it is in urban areas that the impacts are most severe. An investigation of the ability of high resolution TerraSAR-X Synthetic Aperture Radar (SAR) data to detect flooded regions in urban areas is described. The study uses a TerraSAR-X image of a 1 in 150 year flood near Tewkesbury, UK, in 2007, for which contemporaneous aerial photography exists for validation. The DLR SAR End-To-End simulator (SETES) was used in conjunction with airborne scanning laser altimetry (LiDAR) data to estimate regions of the image in which water would not be visible due to shadow or layover caused by buildings and taller vegetation. A semi-automatic algorithm for the detection of floodwater in urban areas is described, together with its validation using the aerial photographs. 76% of the urban water pixels visible to TerraSAR-X were correctly detected, with an associated false positive rate of 25%. If all urban water pixels were considered, including those in shadow and layover regions, these figures fell to 58% and 19% respectively. The algorithm is aimed at producing urban flood extents with which to calibrate and validate urban flood inundation models, and these findings indicate that TerraSAR-X is capable of providing useful data for this purpose.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Flooding is a major hazard in both rural and urban areas worldwide, but it is in urban areas that the impacts are most severe. An investigation of the ability of high resolution TerraSAR-X data to detect flooded regions in urban areas is described. An important application for this would be the calibration and validation of the flood extent predicted by an urban flood inundation model. To date, research on such models has been hampered by lack of suitable distributed validation data. The study uses a 3m resolution TerraSAR-X image of a 1-in-150 year flood near Tewkesbury, UK, in 2007, for which contemporaneous aerial photography exists for validation. The DLR SETES SAR simulator was used in conjunction with airborne LiDAR data to estimate regions of the TerraSAR-X image in which water would not be visible due to radar shadow or layover caused by buildings and taller vegetation, and these regions were masked out in the flood detection process. A semi-automatic algorithm for the detection of floodwater was developed, based on a hybrid approach. Flooding in rural areas adjacent to the urban areas was detected using an active contour model (snake) region-growing algorithm seeded using the un-flooded river channel network, which was applied to the TerraSAR-X image fused with the LiDAR DTM to ensure the smooth variation of heights along the reach. A simpler region-growing approach was used in the urban areas, which was initialized using knowledge of the flood waterline in the rural areas. Seed pixels having low backscatter were identified in the urban areas using supervised classification based on training areas for water taken from the rural flood, and non-water taken from the higher urban areas. Seed pixels were required to have heights less than a spatially-varying height threshold determined from nearby rural waterline heights. Seed pixels were clustered into urban flood regions based on their close proximity, rather than requiring that all pixels in the region should have low backscatter. This approach was taken because it appeared that urban water backscatter values were corrupted in some pixels, perhaps due to contributions from side-lobes of strong reflectors nearby. The TerraSAR-X urban flood extent was validated using the flood extent visible in the aerial photos. It turned out that 76% of the urban water pixels visible to TerraSAR-X were correctly detected, with an associated false positive rate of 25%. If all urban water pixels were considered, including those in shadow and layover regions, these figures fell to 58% and 19% respectively. These findings indicate that TerraSAR-X is capable of providing useful data for the calibration and validation of urban flood inundation models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One of the largest uncertainties in quantifying the impact of aviation on climate concerns the formation and spreading of persistent contrails. The inclusion of a cloud scheme that allows for ice supersaturation into the integrated forecast system (IFS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) can be a useful tool to help reduce these uncertainties. This study evaluates the quality of the ECMWF forecasts with respect to ice super saturation in the upper troposphere by comparing them to visual observations of persistent contrails and radiosonde measurements of ice supersaturation over England. The performance of 1- to 3-day forecasts is compared including also the vertical accuracy of the supersaturation forecasts. It is found that the operational forecasts from the ECMWF are able to predict cold ice supersaturated regions very well. For the best cases Peirce skill scores of 0.7 are obtained, with hit rates at times exceeding 80% and false-alarm rates below 20%. Results are very similar for comparisons with visual observations and radiosonde measurements, the latter providing the better statistical significance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

It is generally accepted that genetics may be an important factor in explaining the variation between patients’ responses to certain drugs. However, identification and confirmation of the responsible genetic variants is proving to be a challenge in many cases. A number of difficulties that maybe encountered in pursuit of these variants, such as non-replication of a true effect, population structure and selection bias, can be mitigated or at least reduced by appropriate statistical methodology. Another major statistical challenge facing pharmacogenetics studies is trying to detect possibly small polygenic effects using large volumes of genetic data, while controlling the number of false positive signals. Here we review statistical design and analysis options available for investigations of genetic resistance to anti-epileptic drugs.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

When assessing hypotheses, the possibility and consequences of false-positive conclusions should be considered along with the avoidance of false-negative ones. A recent assessment of the system of rice intensification (SRI) by McDonald et al. [McDonald, A.J., Hobbs, P.R., Riha, S.J., 2006. Does the system of rice intensification outperform conventional best management? A synopsis of the empirical record. Field Crops Res. 96, 31-36] provides a good example where this was not done as it was preoccupied with avoiding false-positives only. It concluded, based on a desk study using secondary data assembled selectively from diverse sources and with a 95% level of confidence, that 'best management practices' (BMPs) on average produce 11% higher rice yields than SRI methods, and that, therefore, SRI has little to offer beyond what is already known by scientists.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

High resolution descriptions of plant distribution have utility for many ecological applications but are especially useful for predictive modeling of gene flow from transgenic crops. Difficulty lies in the extrapolation errors that occur when limited ground survey data are scaled up to the landscape or national level. This problem is epitomized by the wide confidence limits generated in a previous attempt to describe the national abundance of riverside Brassica rapa (a wild relative of cultivated rapeseed) across the United Kingdom. Here, we assess the value of airborne remote sensing to locate B. rapa over large areas and so reduce the need for extrapolation. We describe results from flights over the river Nene in England acquired using Airborne Thematic Mapper (ATM) and Compact Airborne Spectrographic Imager (CASI) imagery, together with ground truth data. It proved possible to detect 97% of flowering B. rapa on the basis of spectral profiles. This included all stands of plants that occupied >2m square (>5 plants), which were detected using single-pixel classification. It also included very small populations (<5 flowering plants, 1-2m square) that generated mixed pixels, which were detected using spectral unmixing. The high detection accuracy for flowering B. rapa was coupled with a rather large false positive rate (43%). The latter could be reduced by using the image detections to target fieldwork to confirm species identity, or by acquiring additional remote sensing data such as laser altimetry or multitemporal imagery.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a new face verification algorithm based on Gabor wavelets and AdaBoost. In the algorithm, faces are represented by Gabor wavelet features generated by Gabor wavelet transform. Gabor wavelets with 5 scales and 8 orientations are chosen to form a family of Gabor wavelets. By convolving face images with these 40 Gabor wavelets, the original images are transformed into magnitude response images of Gabor wavelet features. The AdaBoost algorithm selects a small set of significant features from the pool of the Gabor wavelet features. Each feature is the basis for a weak classifier which is trained with face images taken from the XM2VTS database. The feature with the lowest classification error is selected in each iteration of the AdaBoost operation. We also address issues regarding computational costs in feature selection with AdaBoost. A support vector machine (SVM) is trained with examples of 20 features, and the results have shown a low false positive rate and a low classification error rate in face verification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We have discovered a novel approach of intrusion detection system using an intelligent data classifier based on a self organizing map (SOM). We have surveyed all other unsupervised intrusion detection methods, different alternative SOM based techniques and KDD winner IDS methods. This paper provides a robust designed and implemented intelligent data classifier technique based on a single large size (30x30) self organizing map (SOM) having the capability to detect all types of attacks given in the DARPA Archive 1999 the lowest false positive rate being 0.04 % and higher detection rate being 99.73% tested using full KDD data sets and 89.54% comparable detection rate and 0.18% lowest false positive rate tested using corrected data sets.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A near real-time flood detection algorithm giving a synoptic overview of the extent of flooding in both urban and rural areas, and capable of working during night-time and day-time even if cloud was present, could be a useful tool for operational flood relief management. The paper describes an automatic algorithm using high resolution Synthetic Aperture Radar (SAR) satellite data that builds on existing approaches, including the use of image segmentation techniques prior to object classification to cope with the very large number of pixels in these scenes. Flood detection in urban areas is guided by the flood extent derived in adjacent rural areas. The algorithm assumes that high resolution topographic height data are available for at least the urban areas of the scene, in order that a SAR simulator may be used to estimate areas of radar shadow and layover. The algorithm proved capable of detecting flooding in rural areas using TerraSAR-X with good accuracy, classifying 89% of flooded pixels correctly, with an associated false positive rate of 6%. Of the urban water pixels visible to TerraSAR-X, 75% were correctly detected, with a false positive rate of 24%. If all urban water pixels were considered, including those in shadow and layover regions, these figures fell to 57% and 18% respectively.