948 results for error rates
Abstract:
This paper addresses the problem of maximum-margin classification given the moments of the class-conditional densities and the false-positive and false-negative error rates. Using Chebyshev inequalities, the problem can be posed as a second-order cone programming problem. The dual of the formulation leads to a geometric optimization problem, that of computing the distance between two ellipsoids, which is solved by an iterative algorithm. The formulation is extended to non-linear classifiers using kernel methods. The resulting classifiers are applied to the classification of unbalanced datasets with asymmetric misclassification costs. Experimental results on benchmark datasets show the efficacy of the proposed method.
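As a sketch of the kind of constraint such Chebyshev-based formulations typically produce (the paper's exact notation and derivation may differ), bounding the misclassification probability of one class, with mean μ₁ and covariance Σ₁, by a target error rate η₁ yields a condition on the separating hyperplane (w, b):

```latex
% Hedged sketch of a Chebyshev-type constraint; \mu_1, \Sigma_1 and \eta_1 are
% illustrative symbols for one class's mean, covariance and error-rate bound.
\[
  \Pr\!\left(w^{\top}X + b \le 0\right) \le \eta_1
  \quad\Longleftarrow\quad
  w^{\top}\mu_1 + b \;\ge\; \kappa(\eta_1)\,\sqrt{w^{\top}\Sigma_1 w},
  \qquad
  \kappa(\eta) = \sqrt{\frac{1-\eta}{\eta}}.
\]
```

The right-hand condition is a second-order cone constraint in (w, b), which is what makes the overall problem an SOCP.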
Abstract:
Next-generation sequencing (NGS) is a valuable tool for the detection and quantification of HIV-1 variants in vivo. However, these technologies require detailed characterization and control of artificially induced errors to be applicable for accurate haplotype reconstruction. To investigate the occurrence of substitutions, insertions, and deletions at the individual steps of RT-PCR and NGS, 454 pyrosequencing was performed on amplified and non-amplified HIV-1 genomes. Artificial recombination was explored by mixing five different HIV-1 clonal strains (5-virus-mix) and applying different RT-PCR conditions followed by 454 pyrosequencing. Error rates ranged from 0.04% to 0.66% and were similar in amplified and non-amplified samples. Discrepancies between forward and reverse reads indicated that most errors were introduced during the pyrosequencing step. Using the 5-virus-mix, non-optimized, standard RT-PCR conditions introduced artificial recombinants in at least 30% of the reads, which subsequently led to an underestimation of true haplotype frequencies. We reduced the fraction of recombinants to 0.9-2.6% by optimized, artifact-reducing RT-PCR conditions. This approach enabled correct haplotype reconstruction and frequency estimates consistent with reference data obtained by single-genome amplification. RT-PCR conditions are crucial for correct frequency estimation and analysis of haplotypes in heterogeneous virus populations. We developed an RT-PCR procedure to generate NGS data useful for reliable haplotype reconstruction and quantification.
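For concreteness, a minimal sketch of how per-base error rates of this kind can be tallied from pairwise alignments of reads against a known reference (an illustration only, not the study's actual analysis pipeline; the alignments are assumed to come from an external aligner):

```python
# Hedged sketch: estimate substitution, insertion and deletion rates from
# pairwise alignments of reads against a known reference. Each alignment is a
# pair of equal-length strings with '-' marking gaps.

def error_rates(alignments):
    subs = ins = dels = ref_len = 0
    for ref_aln, read_aln in alignments:
        assert len(ref_aln) == len(read_aln)
        for r, q in zip(ref_aln, read_aln):
            if r != '-':
                ref_len += 1          # position exists in the reference
            if r == '-' and q != '-':
                ins += 1              # base in read, none in reference
            elif r != '-' and q == '-':
                dels += 1             # reference base missing from read
            elif r != q:
                subs += 1             # mismatched base
    return {kind: count / ref_len
            for kind, count in (("substitution", subs),
                                ("insertion", ins),
                                ("deletion", dels))}

# Toy example: one read carrying a substitution and a deletion.
print(error_rates([("ACGTACGT", "ACGAAC-T")]))
```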
Abstract:
The use of presence/absence data in wildlife management and biological surveys is widespread. There is a growing interest in quantifying the sources of error associated with these data. Using simulated data, we show that false-negative errors (failure to record a species when in fact it is present) can have a significant impact on statistical estimation of habitat models. We then introduce an extension of logistic modeling, the zero-inflated binomial (ZIB) model, which permits estimation of the false-negative error rate and correction of estimates of the probability of occurrence by using repeated visits to the same site. Our simulations show that even relatively low rates of false negatives bias statistical estimates of habitat effects. The method with three repeated visits eliminates the bias, but estimates are relatively imprecise. Six repeated visits improve the precision of estimates to levels comparable to those achieved with conventional statistics in the absence of false-negative errors. In general, when error rates are less than or equal to 50%, greater efficiency is gained by adding more sites, whereas when error rates are greater than 50% it is better to increase the number of repeated visits. We highlight the flexibility of the method with three case studies, clearly demonstrating the effect of false-negative errors for a range of commonly used survey methods.
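A minimal sketch of the zero-inflated binomial likelihood described here, assuming a constant occupancy probability psi and detection probability p (the full model would additionally link these parameters to habitat covariates through logistic regression):

```python
# Hedged sketch of a ZIB occupancy likelihood: each site is occupied with
# probability psi, and an occupied site is detected on each of T repeated
# visits with probability p. Constant parameters, no covariates.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

def neg_log_lik(theta, detections, T):
    psi, p = 1 / (1 + np.exp(-theta))          # logit scale -> probabilities
    lik = psi * binom.pmf(detections, T, p)    # occupied and detected y times
    lik = lik + (1 - psi) * (detections == 0)  # unoccupied sites give all zeros
    return -np.sum(np.log(lik))

# Simulated data: 200 sites, 3 visits, true psi = 0.6, true p = 0.4.
rng = np.random.default_rng(0)
occupied = rng.random(200) < 0.6
detections = rng.binomial(3, 0.4 * occupied)

fit = minimize(neg_log_lik, x0=[0.0, 0.0], args=(detections, 3))
psi_hat, p_hat = 1 / (1 + np.exp(-fit.x))
print(psi_hat, p_hat)
```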
Abstract:
Classifier selection is a problem encountered by multi-biometric systems that aim to improve performance through fusion of decisions. A particular decision fusion architecture that combines multiple instances (n classifiers) and multiple samples (m attempts at each classifier) has been proposed in previous work to achieve a controlled trade-off between false alarms and false rejects. Although analysis on text-dependent speaker verification has demonstrated better performance for fusion of decisions with favourable dependence compared to statistically independent decisions, the performance is not always optimal. Given a pool of instances, the best performance with this architecture is obtained for certain combinations of instances. Heuristic rules and diversity measures have been commonly used for classifier selection, but it is shown that optimal performance is achieved with the 'best combination performance' rule. As the search complexity for this rule increases exponentially with the addition of classifiers, a measure, the sequential error ratio (SER), is proposed in this work that is specifically adapted to the characteristics of the sequential fusion architecture. The proposed measure can be used to select the classifier that is most likely to produce a correct decision at each stage. Error rates for fusion of text-dependent HMM-based speaker models using SER are compared with other classifier selection methodologies. SER is shown to achieve near-optimal performance for sequential fusion of multiple instances with or without the use of multiple samples. The methodology applies to multiple speech utterances for telephone- or internet-based access control and to other systems such as multiple-fingerprint and multiple-handwriting-sample based identity verification systems.
Abstract:
After attending this presentation, attendees will gain awareness of: (1) the error and uncertainty associated with the application of the Suchey-Brooks (S-B) method of age estimation of the pubic symphysis to a contemporary Australian population; (2) the implications of sexual dimorphism and bilateral asymmetry of the pubic symphysis through preliminary geometric morphometric assessment; and (3) the value of three-dimensional (3D) autopsy data acquisition for creating forensic anthropological standards. This presentation will impact the forensic science community by demonstrating that, in the absence of demographically sound skeletal collections, post-mortem autopsy data provide an exciting platform for the construction of large contemporary 'virtual osteological libraries' with which forensic anthropological research can be conducted on Australian individuals. More specifically, this study assesses the applicability and accuracy of the S-B method in a contemporary adult population in Queensland, Australia, and, using a geometric morphometric approach, provides insight into the age-related degeneration of the pubic symphysis. Despite the prominent use of the Suchey-Brooks (1990) method of age estimation in forensic anthropological practice, it is subject to intrinsic limitations, with reports of differential inter-population error rates between geographical locations [1-4]. Australian forensic anthropology is constrained by a paucity of population-specific standards due to a lack of repositories of documented skeletons. Consequently, in Australian casework proceedings, standards constructed from predominantly American reference samples are applied to establish a biological profile. In the global era of terrorism and natural disasters, more specific population standards are required to improve the efficiency of medico-legal death investigation in Queensland. The sample comprises multi-slice computed tomography (MSCT) scans of the pubic symphysis (slice thickness: 0.5 mm, overlap: 0.1 mm) of 195 individuals of Caucasian ethnicity aged 15-70 years. Volume rendering reconstruction of the symphyseal surface was conducted in Amira® (v.4.1) and quantitative analyses in Rapidform® XOS. The sample was divided into ten-year age subsets (e.g., 15-24) with a final subset of 65-70 years. Errors with respect to the method's assigned means were analysed on the basis of bias (directionality of error), inaccuracy (magnitude of error) and the percentage of correct classification of left and right symphyseal surfaces. Morphometric variables including surface area, circumference, maximum height and width of the symphyseal surface, together with micro-architectural assessment of cortical and trabecular bone composition, were quantified using novel automated engineering software capabilities. The results of this study demonstrated correct age classification, using the mean and standard deviations of each phase of the S-B method, of 80.02% and 86.18% in Australian males and females, respectively. Application of the S-B method resulted in positive biases and mean inaccuracies of 7.24 (±6.56) years for individuals less than 55 years of age, compared to negative biases and mean inaccuracies of 5.89 (±3.90) years for individuals greater than 55 years of age. Statistically significant differences between chronological and S-B mean age were demonstrated in 83.33% and 50% of the six age subsets in males and females, respectively.
Asymmetry of the pubic symphysis was a frequent phenomenon, with 53.33% of the Queensland population exhibiting statistically significant (χ²: p<0.01) differential phase classification of the left and right surfaces of the same individual. Directionality was found in the bilateral asymmetry, with the right symphyseal faces appearing slightly older on average and providing more accurate estimates using the S-B method [5]. Morphometric analysis verified these findings, with the left surface exhibiting significantly greater circumference and surface area than the right (p<0.05). Morphometric analysis also demonstrated an increase in the maximum height and width of the surface with age, with the most significant changes (p<0.05) occurring between the 25-34 and 55-64 year age subsets. These differences may be attributed to hormonal components linked to menopause in females and a reduction in testosterone in males. Micro-architectural analysis demonstrated degradation of cortical composition with age, with differential bone resorption between the medial, ventral and dorsal surfaces of the pubic symphysis. This study recommends that the S-B method be applied with caution in medico-legal death investigations of unknown skeletal remains in Queensland. Age estimation will always be accompanied by error; this study therefore demonstrates the potential of quantitative morphometric modelling of age-related changes of the pubic symphysis as a tool for methodological refinement, providing a rigorous and robust assessment that removes the subjectivity associated with current pelvic aging methods.
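As an illustration of the bias and inaccuracy summary used above (mean signed and mean absolute differences between phase-assigned and chronological age), a small sketch with placeholder phase means, not the published Suchey-Brooks values:

```python
# Hedged sketch: bias is the mean signed error and inaccuracy the mean absolute
# error of a phase-based age estimate relative to chronological age. The phase
# means below are illustrative placeholders only.
import numpy as np

PHASE_MEAN_AGE = {1: 19.0, 2: 25.0, 3: 31.0, 4: 40.0, 5: 48.0, 6: 60.0}  # placeholders

def bias_and_inaccuracy(phases, chronological_ages):
    estimated = np.array([PHASE_MEAN_AGE[p] for p in phases])
    chronological = np.asarray(chronological_ages, dtype=float)
    errors = estimated - chronological
    return errors.mean(), np.abs(errors).mean()   # (bias, inaccuracy)

# Toy example: three individuals scored into phases 3, 4 and 6.
print(bias_and_inaccuracy([3, 4, 6], [28, 51, 67]))
```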
Abstract:
So far, most Phase II trials have been designed and analysed under a frequentist framework. Under this framework, a trial is designed so that the overall Type I and Type II errors of the trial are controlled at some desired levels. Recently, a number of articles have advocated the use of Bayesian designs in practice. Under a Bayesian framework, a trial is designed so that it stops when the posterior probability of the treatment effect lies within certain prespecified thresholds. In this article, we argue that trials under a Bayesian framework can also be designed to control frequentist error rates. We introduce a Bayesian version of Simon's well-known two-stage design to achieve this goal. We also consider two other errors, which we call Bayesian errors in this article because of their similarities to posterior probabilities. We show that our method can also control these Bayesian-type errors. We compare our method with other recent Bayesian designs in a numerical study and discuss the implications of the different designs for error rates. An example of a clinical trial for patients with nasopharyngeal carcinoma is used to illustrate the differences between the designs.
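For reference, a sketch of how the frequentist operating characteristics of a Simon-type two-stage design are computed; the design parameters below are illustrative and the Bayesian monitoring layer described in the abstract is not reproduced:

```python
# Hedged sketch of a Simon-type two-stage design: stage 1 enrols n1 patients
# and stops for futility if responses <= r1; otherwise n patients in total are
# enrolled and the null is rejected if total responses > r.
from scipy.stats import binom

def reject_prob(p, n1, r1, n, r):
    """Probability of rejecting H0 when the true response rate is p."""
    prob = 0.0
    for x1 in range(r1 + 1, n1 + 1):                 # continue past stage 1
        needed = r - x1 + 1                          # further responses required
        if needed <= 0:
            prob += binom.pmf(x1, n1, p)
        else:
            prob += binom.pmf(x1, n1, p) * binom.sf(needed - 1, n - n1, p)
    return prob

# Illustrative design parameters (not taken from the article):
n1, r1, n, r = 19, 4, 54, 15
print("type I error:", reject_prob(0.20, n1, r1, n, r))  # under p0 = 0.20
print("power       :", reject_prob(0.35, n1, r1, n, r))  # under p1 = 0.35
```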
Abstract:
With technology scaling, vulnerability to soft errors in random logic is increasing. There is a need for on-line error detection and protection for logic gates even at sea level. The error checker is the key element of an on-line detection mechanism. We compare three different checkers for error detection from the point of view of area, power and false error detection rates. We find that the double sampling checker (used in Razor) is the simplest and most area- and power-efficient, but suffers from very high false detection rates of 1.15 times the actual error rates. We also find that the alternative approaches of triple sampling and the integrate-and-sample (I&S) method can be designed to have zero false detection rates, but at increased area, power and implementation complexity. The triple sampling method has about 1.74 times the area and twice the power of the double sampling method and also needs a complex clock generation scheme. The I&S method needs about 16% more power with 0.58 times the area of double sampling, but comes with more stringent implementation constraints as it requires detection of small voltage swings.
Abstract:
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the errors in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
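A toy illustration of the hybrid-correction idea, far simpler than the actual algorithm: short, high-fidelity reads that are already aligned to a noisy long read vote on each covered position, and the majority base replaces the long read's base (substitutions only; the alignment offsets are assumed to come from an external aligner):

```python
# Hedged toy sketch of consensus-based correction of a long read by aligned
# short reads (majority vote per position; indels are ignored).
from collections import Counter

def correct_long_read(long_read, aligned_short_reads):
    """aligned_short_reads: list of (offset_in_long_read, short_read_sequence)."""
    votes = [Counter() for _ in long_read]
    for offset, short in aligned_short_reads:
        for i, base in enumerate(short):
            votes[offset + i][base] += 1
    corrected = [
        vote.most_common(1)[0][0] if vote else base   # keep base if uncovered
        for base, vote in zip(long_read, votes)
    ]
    return "".join(corrected)

# The erroneous 'T' at position 3 is outvoted by the two short reads.
print(correct_long_read("ACGTACGT", [(0, "ACGAAC"), (2, "GAACGT")]))
```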
Abstract:
In the IEEE 802.11 MAC layer protocol, there are different trade-off points between the number of nodes competing for the medium and the network capacity provided to them. There is also a trade-off between the wireless channel condition during the transmission period and the energy consumption of the nodes. Current approaches to modeling energy consumption in 802.11-based networks do not consider the influence of the channel condition on all types of frames (control and data) in the WLAN, nor do they consider the effect across the different MAC and PHY schemes that can occur in 802.11 networks. In this paper, we investigate the energy consumption as a function of the number of competing nodes in IEEE 802.11's MAC and PHY layers under error-prone wireless channel conditions, and present a new energy consumption model. Analysis of the power consumed by each type of MAC and PHY over different bit error rates shows that the parameters in these layers play a critical role in determining the overall energy consumption of the ad-hoc network. The goal of this research is not only to compare the energy consumption using exact formulae in saturated IEEE 802.11-based DCF networks under varying numbers of competing nodes, but also, as the results show, to demonstrate that channel errors have a significant impact on the energy consumption.
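As a simplified illustration of how bit error rates feed into an energy budget (not the paper's exact model), the frame error rate and the expected energy per successfully delivered frame can be sketched as follows:

```python
# Hedged sketch: any bit error corrupts the frame, and retransmissions multiply
# the energy spent per frame ultimately delivered (geometric retries, no retry
# limit). The frame size and per-transmission energy are illustrative.

def frame_error_rate(ber, frame_bytes):
    bits = 8 * frame_bytes
    return 1.0 - (1.0 - ber) ** bits

def energy_per_delivered_frame(ber, frame_bytes, tx_energy_joules):
    fer = frame_error_rate(ber, frame_bytes)
    expected_transmissions = 1.0 / (1.0 - fer)
    return expected_transmissions * tx_energy_joules

for ber in (1e-6, 1e-5, 1e-4):
    print(ber, energy_per_delivered_frame(ber, frame_bytes=1500,
                                          tx_energy_joules=2e-3))
```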
Abstract:
Multiuser diversity gain has been investigated extensively in the literature in terms of system capacity. In practice, however, the design of multiuser systems with nonzero error rates requires a relationship between the error rates and the number of users within a cell. We focus on the uplink with best-user scheduling, where the user with the best channel condition is scheduled to transmit in each scheduling interval. We assume that each user communicates with the base station through a single-input multiple-output channel. We derive a closed-form expression for the average BER, and analyze how the average BER goes to zero asymptotically as the number of users increases for a given SNR. Note that the analysis of the average BER, even in SISO multiuser diversity systems, has not been done with respect to the number of users for a given SNR. Our analysis can be applied to multiuser diversity systems with any number of antennas.
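A Monte Carlo sketch of the trend analysed here, using single-antenna Rayleigh fading and BPSK for simplicity rather than the paper's SIMO setting: with best-user scheduling, the average BER falls as the number of users grows at a fixed average SNR.

```python
# Hedged sketch: instantaneous SNR of each user is exponential (Rayleigh
# fading); the scheduler picks the best user each trial; BPSK bit error
# probability is Q(sqrt(2*SNR)).
import numpy as np
from scipy.special import erfc

def q_function(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

def average_ber(num_users, mean_snr_db, trials=200_000, seed=0):
    rng = np.random.default_rng(seed)
    mean_snr = 10.0 ** (mean_snr_db / 10.0)
    snr = rng.exponential(mean_snr, size=(trials, num_users)).max(axis=1)
    return q_function(np.sqrt(2.0 * snr)).mean()

for users in (1, 2, 4, 8, 16):
    print(users, average_ber(users, mean_snr_db=5.0))
```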
Abstract:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
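A minimal sketch of the replicate-based error estimate (illustrative data structures, not the Stacks workflow): genotypes called independently for two replicates of the same sample are compared at the loci scored in both, and any disagreement is counted as a genotyping error.

```python
# Hedged sketch: locus-level and allele-level error rates from a pair of
# sample replicates. Genotypes are unordered allele pairs; None marks a
# missing call.
from collections import Counter

def replicate_error_rates(genotypes_a, genotypes_b):
    """genotypes_*: dict mapping locus -> pair of alleles, or None if missing."""
    shared = [locus for locus in genotypes_a
              if genotypes_a[locus] is not None
              and genotypes_b.get(locus) is not None]
    locus_errors = allele_errors = 0
    for locus in shared:
        a = Counter(genotypes_a[locus])
        b = Counter(genotypes_b[locus])
        matched = sum((a & b).values())        # alleles shared by the two calls
        allele_errors += 2 - matched
        locus_errors += (matched < 2)          # any mismatch counts at locus level
    return {"locus_error_rate": locus_errors / len(shared),
            "allele_error_rate": allele_errors / (2 * len(shared))}

rep1 = {"loc1": ("A", "G"), "loc2": ("C", "C"), "loc3": ("A", "T")}
rep2 = {"loc1": ("G", "A"), "loc2": ("C", "T"), "loc3": None}
print(replicate_error_rates(rep1, rep2))
```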
Abstract:
While channel coding is a standard method of improving a system's energy efficiency in digital communications, its practice does not extend to high-speed links. Increasing demands on network speeds are placing a large burden on the energy efficiency of high-speed links and make the benefit of channel coding for these systems a timely subject. The low error rates of interest and the presence of residual intersymbol interference (ISI) caused by hardware constraints impede the analysis and simulation of coded high-speed links. Focusing on the residual ISI and combined noise as the dominant error mechanisms, this paper analyses error correlation through the concepts of error region, channel signature, and correlation distance. This framework provides a deeper insight into joint error behaviours in high-speed links, extends the range of statistical simulation for coded high-speed links, and provides a case against the use of biased Monte Carlo methods in this setting.
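A toy simulation of the kind of error correlation discussed here (the one-tap channel, decision-feedback detector and noise level are illustrative, not the paper's link model): a wrong decision corrupts the ISI cancellation for the next bit, so errors cluster and their correlation decays over a finite lag.

```python
# Hedged toy sketch: a one-tap decision-feedback detector cancels the
# post-cursor ISI using its own past decision, so errors propagate and are
# positively correlated at short lags.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
symbols = rng.choice([-1.0, 1.0], n)                      # BPSK-like data
tap = 0.6                                                 # residual post-cursor ISI
received = (symbols + tap * np.concatenate(([0.0], symbols[:-1]))
            + rng.normal(0.0, 0.3, n))

decisions = np.empty(n)
prev = 0.0
for k in range(n):                                        # cancel ISI with past decision
    decisions[k] = 1.0 if received[k] - tap * prev > 0 else -1.0
    prev = decisions[k]

errors = decisions != symbols
p = errors.mean()
print("error rate:", p)
for lag in (1, 2, 3, 5, 10):                              # correlation dies off with lag
    joint = np.mean(errors[:-lag] & errors[lag:])
    print("lag", lag, "joint/p^2:", joint / p**2)         # 1.0 would mean independence
```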
Abstract:
Background: Medication errors are an important cause of morbidity and mortality in primary care. The aims of this study are to determine the effectiveness, cost-effectiveness and acceptability of a pharmacist-led, information-technology-based complex intervention compared with simple feedback in reducing the proportions of patients at risk from potentially hazardous prescribing and medicines management in general (family) practice. Methods: Research subject group: "at-risk" patients registered with computerised general practices in two geographical regions in England. Design: parallel group pragmatic cluster randomised trial. Interventions: practices will be randomised to either (i) computer-generated feedback or (ii) a pharmacist-led intervention comprising computer-generated feedback, educational outreach and dedicated support. Primary outcome measures: the proportion of patients in each practice at six and 12 months post intervention (i) with a computer-recorded history of peptic ulcer being prescribed non-selective non-steroidal anti-inflammatory drugs; (ii) with a computer-recorded diagnosis of asthma being prescribed beta-blockers; and (iii) aged 75 years and older receiving long-term prescriptions for angiotensin-converting enzyme inhibitors or loop diuretics without a recorded assessment of renal function and electrolytes in the preceding 15 months. Secondary outcome measures: these relate to a number of other examples of potentially hazardous prescribing and medicines management. Economic analysis: an economic evaluation of the cost per error avoided will be conducted from the perspective of the UK National Health Service (NHS), comparing the pharmacist-led intervention with simple feedback. Qualitative analysis: a qualitative study will be conducted to explore the views and experiences of health care professionals and NHS managers concerning the interventions, and to investigate possible reasons why the interventions prove effective or ineffective. Sample size: 34 practices in each of the two treatment arms would provide at least 80% power (two-tailed alpha of 0.05) to demonstrate a 50% reduction in error rates for each of the three primary outcome measures in the pharmacist-led intervention arm compared with an 11% reduction in the simple feedback arm. Discussion: at the time of submission of this article, 72 general practices have been recruited (36 in each arm of the trial) and the interventions have been delivered. Analysis has not yet been undertaken.
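For orientation only, a generic cluster-trial sample-size sketch using a two-proportion formula inflated by a design effect; the baseline rate, cluster size and intracluster correlation below are placeholders, not the protocol's actual assumptions or calculation.

```python
# Hedged sketch: clusters (practices) per arm for comparing two proportions,
# inflated by the design effect 1 + (m - 1) * ICC for clustering.
import math
from scipy.stats import norm

def clusters_per_arm(p1, p2, patients_per_cluster, icc, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_individuals = z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
    design_effect = 1 + (patients_per_cluster - 1) * icc
    return math.ceil(n_individuals * design_effect / patients_per_cluster)

# Placeholder inputs: a 10% baseline rate falling to 5.0% in one arm and 8.9%
# in the other (50% vs. 11% relative reductions), 50 at-risk patients per
# practice, ICC = 0.05.
print(clusters_per_arm(0.050, 0.089, patients_per_cluster=50, icc=0.05))
```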