65 results for Cryptography Statistical methods
Abstract:
This paper presents and discusses the use of Bayesian procedures - introduced through the use of Bayesian networks in Part I of this series of papers - for 'learning' probabilities from data. The discussion relates to a set of real data on characteristics of black toners commonly used in printing and copying devices. Particular attention is drawn to the incorporation of the proposed procedures as an integral part of probabilistic inference schemes (notably in the form of Bayesian networks) that are intended to address uncertainties related to particular propositions of interest (e.g., whether or not a sample originates from a particular source). The conceptual tenets of the proposed methodologies are presented along with aspects of their practical implementation using currently available Bayesian network software.
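As a minimal illustration of the 'learning' step described above (not the authors' toner data or networks), the sketch below updates a conjugate Beta prior with hypothetical counts of a binary characteristic; this is the same mechanism Bayesian network software applies to each entry of a conditional probability table. All counts and prior parameters are assumptions for demonstration only.

```python
# Minimal sketch: Bayesian 'learning' of a probability from count data
# with a conjugate Beta prior. Counts below are invented.
from scipy import stats

prior_a, prior_b = 1.0, 1.0           # uniform Beta(1, 1) prior
successes, failures = 37, 13          # hypothetical counts of a toner characteristic

posterior = stats.beta(prior_a + successes, prior_b + failures)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```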
Abstract:
As a thorough combination of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in the literature, the primary aim of this paper is to offer an essentially non-technical introduction to how interested readers may use these analytical approaches - with the help of Bayesian networks - for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocks while constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).
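For readers who want to experiment with a context-independent network fragment of the kind described, the sketch below builds a two-node source-evidence fragment and queries it with the pgmpy library (assuming a recent 0.x release where BayesianNetwork is the model class). The node names, states and probabilities are illustrative assumptions, not the paper's networks.

```python
# Two-node fragment: a binary proposition node and a binary evidence node.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("Source", "Evidence")])
cpd_source = TabularCPD("Source", 2, [[0.5], [0.5]])                 # flat prior on the proposition
cpd_evidence = TabularCPD("Evidence", 2,
                          [[0.9, 0.2],                               # P(Evidence=0 | Source)
                           [0.1, 0.8]],                              # P(Evidence=1 | Source)
                          evidence=["Source"], evidence_card=[2])
model.add_cpds(cpd_source, cpd_evidence)

infer = VariableElimination(model)
print(infer.query(["Source"], evidence={"Evidence": 1}))             # posterior on the proposition
```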
Abstract:
This paper presents a validation study of statistical unsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different levels of noise and intensity nonuniformity are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. In this way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data, where a quantitative validation compares the methods' results with a ground truth estimated from manual segmentations by experts. The validity of the various classification methods, both in the labeling of the image and in the estimated tissue volumes, is assessed with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also show that the partial volume effect is not perfectly modeled, even though methods that account for mixture classes outperform methods that consider only pure Gaussian classes. Finally, we show that results obtained on simulated data can also be extended to real data.
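As a sketch of the simplest model family assessed here (pure Gaussian intensity classes, no spatial prior and no partial-volume model), the example below fits a three-class Gaussian mixture to synthetic one-dimensional intensities; the class means and the scikit-learn implementation are stand-ins for demonstration only.

```python
# Unsupervised intensity classification with a finite Gaussian mixture.
# Synthetic 1-D intensities stand in for MR voxel values; three classes
# loosely mimic CSF / grey matter / white matter.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
intensities = np.concatenate([
    rng.normal(50, 5, 2000),    # CSF-like class
    rng.normal(110, 8, 3000),   # grey-matter-like class
    rng.normal(160, 6, 3000),   # white-matter-like class
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(intensities)
labels = gmm.predict(intensities)            # hard tissue labels
posteriors = gmm.predict_proba(intensities)  # soft memberships, usable for volume estimates
print(np.bincount(labels))
```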
Abstract:
The role of land cover change as a significant component of global change has become increasingly recognized in recent decades. Large databases measuring land cover change, and the data which can potentially be used to explain the observed changes, are also becoming more commonly available. When developing statistical models to investigate observed changes, it is important to be aware that the chosen sampling strategy and modelling techniques can influence results. We present a comparison of three sampling strategies and two forms of grouped logistic regression models (multinomial and ordinal) in the investigation of patterns of successional change after agricultural land abandonment in Switzerland. Results indicated that both ordinal and nominal transitional change occurs in the landscape and that the use of different sampling regimes and modelling techniques as investigative tools yields different results. Synthesis and applications. Our multimodel inference successfully identified a set of consistently selected indicators of land cover change, which can be used to predict further change, including annual average temperature, the number of already overgrown neighbouring areas of land and distance to historically destructive avalanche sites. This allows for more reliable decision making and planning with respect to landscape management. Although both model approaches gave similar results, ordinal regression yielded more parsimonious models that identified the important predictors of land cover change more efficiently. This approach is therefore preferable where the land cover change pattern can be interpreted as an ordinal process; otherwise, multinomial logistic regression is a viable alternative.
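A hedged sketch of the two grouped logistic models being compared, fitted with statsmodels to simulated data: the predictors (temperature, number of overgrown neighbours) echo the abstract, but all values, the three-state outcome, and the model settings are invented for illustration.

```python
# Multinomial vs. ordinal (proportional-odds) logistic regression on simulated data.
import numpy as np
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 500
temp = rng.normal(8, 2, n)                     # annual average temperature (degC)
neigh = rng.integers(0, 9, n)                  # already overgrown neighbouring cells
latent = 0.6 * temp + 0.4 * neigh + rng.logistic(size=n)
state = np.digitize(latent, np.quantile(latent, [0.33, 0.66]))  # ordinal outcome 0/1/2

X = np.column_stack([temp, neigh])

# Nominal model: ignores the ordering of the land cover states.
mnl = sm.MNLogit(state, sm.add_constant(X)).fit(disp=False)

# Ordinal model: one slope per predictor plus thresholds (more parsimonious).
ordm = OrderedModel(state, X, distr='logit').fit(method='bfgs', disp=False)

print(mnl.params.shape, ordm.params)
```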
Abstract:
For several decades, the mechanical properties of shallow formations (soils) obtained by sonic-to-ultrasonic wave testing were reported to be greater than those based on mechanical tests. The present article, relying on a statistical analysis of more than 300 tests, shows that the elastic moduli of a soil can indeed be obtained from (ultra)sonic tests and that they are identical to those resulting from mechanical tests.
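The dynamic moduli in question follow from the standard isotropic-elasticity relations between wave velocities and bulk density; the sketch below applies these textbook formulas to hypothetical soil values and is not necessarily the authors' exact procedure.

```python
# Elastic moduli from P- and S-wave velocities (isotropic case). Inputs are hypothetical.
import numpy as np

rho = 1800.0   # bulk density, kg/m^3
vp = 400.0     # P-wave velocity, m/s
vs = 220.0     # S-wave velocity, m/s

G = rho * vs**2                                               # shear modulus, Pa
E = rho * vs**2 * (3 * vp**2 - 4 * vs**2) / (vp**2 - vs**2)   # Young's modulus, Pa
nu = (vp**2 - 2 * vs**2) / (2 * (vp**2 - vs**2))              # Poisson's ratio

print(f"G = {G/1e6:.1f} MPa, E = {E/1e6:.1f} MPa, nu = {nu:.2f}")
```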
Abstract:
Oscillations have been increasingly recognized as a core property of neural responses that contribute to spontaneous, induced, and evoked activities within and between individual neurons and neural ensembles. They are considered a prominent mechanism for information processing within, and communication between, brain areas. More recently, it has been proposed that interactions between periodic components at different frequencies, known as cross-frequency couplings, may support the integration of neuronal oscillations at different temporal and spatial scales. The present study details methods based on an adaptive frequency tracking approach that improve the quantification and statistical analysis of oscillatory components and cross-frequency couplings. This approach allows for time-varying instantaneous frequency, which is particularly important when measuring phase interactions between components. We compared this adaptive approach with traditional band-pass filters in their measurement of phase-amplitude and phase-phase cross-frequency couplings. Evaluations were performed with synthetic signals and with EEG data recorded from healthy humans performing an illusory contour discrimination task. First, the synthetic signals in conjunction with Monte Carlo simulations highlighted two desirable features of the proposed algorithm versus classical filter-bank approaches: resilience to broad-band noise and to oscillatory interference. Second, the analyses with real EEG signals revealed statistically more robust effects (i.e. improved sensitivity) when using the adaptive frequency tracking framework, particularly when identifying phase-amplitude couplings. This was further confirmed after generating surrogate signals from the real EEG data. Adaptive frequency tracking thus appears to improve the measurement of cross-frequency couplings through precise extraction of neuronal oscillations.
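For orientation, the sketch below implements the classical filter-bank baseline against which the adaptive tracking approach is compared: band-pass filtering, Hilbert transforms, and the mean-vector-length phase-amplitude coupling index. The synthetic signal, band edges, and filter order are assumptions, not the study's settings.

```python
# Phase-amplitude coupling via band-pass filtering + Hilbert transform (filter-bank baseline).
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0
t = np.arange(0, 20, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)                          # 6 Hz phase-giving rhythm
gamma = (1 + theta) * 0.3 * np.sin(2 * np.pi * 60 * t)     # 60 Hz amplitude modulated by theta
x = theta + gamma + 0.5 * np.random.default_rng(0).standard_normal(t.size)

def bandpass(sig, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype='band')
    return filtfilt(b, a, sig)

phase = np.angle(hilbert(bandpass(x, 4, 8, fs)))           # low-frequency phase
amp = np.abs(hilbert(bandpass(x, 50, 70, fs)))             # high-frequency amplitude envelope

mvl = np.abs(np.mean(amp * np.exp(1j * phase))) / np.mean(amp)  # normalised mean vector length
print(f"modulation index (normalised MVL): {mvl:.3f}")
```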
Abstract:
Fraud is as old as mankind. An enormous number of historical documents show the interplay between truth and untruth, so it is not really surprising that the prevalence of publication discrepancies is increasing. More surprising is that new cases, especially in the medical field, still generate such astonishment. In financial mathematics a statistical tool for the detection of fraud is known which uses the observation by Newcomb and Benford regarding the distribution of naturally occurring numbers: this distribution is not uniform, and lower leading digits occur more frequently than higher ones. In this investigation, all numbers contained in the blinded abstracts submitted to the 2009 annual meeting of the Swiss Society of Anesthesia and Resuscitation (SGAR) were recorded and analyzed with respect to their distribution; a manipulated abstract was also included in the investigation. The χ²-test was used to determine statistical differences between expected and observed counts of numbers, with p<0.05 considered significant. The distribution of the 1,800 numbers in the 77 submitted abstracts followed Benford's law. The manipulated abstract was detected by statistical means (expected versus observed difference, p<0.05). Statistics cannot prove whether content is true or not, but it can give serious hints to look into the details of such conspicuous material. These are the first results of a test of the distribution of numbers presented in medical research.
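A minimal sketch of the digit test described: compare observed counts of leading digits 1-9 against the Benford proportions with a chi-squared goodness-of-fit test. The counts below are invented (the study analysed 1,800 numbers from 77 abstracts), so the printed p-value is illustrative only.

```python
# Newcomb-Benford first-digit test with a chi-squared goodness-of-fit statistic.
import numpy as np
from scipy.stats import chisquare

observed = np.array([551, 320, 218, 173, 142, 116, 103, 94, 83])  # hypothetical counts of digits 1..9
digits = np.arange(1, 10)
benford = np.log10(1 + 1 / digits)            # expected proportions under Benford's law
expected = benford * observed.sum()           # expected counts (same total as observed)

stat, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")      # p < 0.05 would flag a suspicious digit distribution
```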
Abstract:
We extend PML theory to account for information on the conditional moments up to order four, but without assuming a parametric model, to avoid the risk of misspecification of the conditional distribution. The key statistical tool is the quartic exponential family, which allows us to generalize the PML2 and QGPML1 methods proposed in Gourieroux et al. (1984) to PML4 and QGPML2 methods, respectively. An asymptotic theory is developed. The key numerical tool is the Gauss-Freud integration scheme, which solves a computational problem that had previously been raised in several fields. Simulation exercises demonstrate the feasibility and robustness of the methods.
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can be estimated through repeated measurements of the pollutant concentration in air, through expert judgment, or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte Carlo samples from these distributions feed two level-2 models: a physical, two-compartment model, and a non-parametric, neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure or exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
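A heavily simplified sketch of the level-1 updating idea: a normal prior on the log of the long-term geometric mean (here a plain normal distribution standing in for the Monte Carlo predictive distribution produced by the level-2 models) is updated with log-transformed measurements, assuming a known geometric standard deviation. All numbers are invented.

```python
# Conjugate normal-normal update of the mean of log-exposure (lognormal exposure model).
import numpy as np

prior_mu, prior_sd = np.log(0.5), 0.8         # prior on log(GM); e.g. GM around 0.5 mg/m^3
gsd = 2.2                                     # assumed known geometric standard deviation
sigma = np.log(gsd)

measurements = np.array([0.31, 0.55, 0.42])   # hypothetical exposure measurements, mg/m^3
logs = np.log(measurements)

post_var = 1.0 / (1.0 / prior_sd**2 + logs.size / sigma**2)
post_mu = post_var * (prior_mu / prior_sd**2 + logs.sum() / sigma**2)

lo, hi = post_mu - 1.96 * np.sqrt(post_var), post_mu + 1.96 * np.sqrt(post_var)
print(f"posterior GM: {np.exp(post_mu):.2f} mg/m^3 "
      f"(95% credible interval {np.exp(lo):.2f}-{np.exp(hi):.2f})")
```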
Abstract:
Purpose: To evaluate the diagnostic value and image quality of CT with filtered back projection (FBP) compared with adaptive statistical iterative reconstruction (ASIR) in body stuffers with ingested cocaine-filled packets. Methods and Materials: Twenty-nine body stuffers (mean age 31.9 years, 3 women) suspected of having ingested cocaine-filled packets underwent routine-dose 64-row multidetector CT with FBP (120 kV, pitch 1.375, 100-300 mA with automatic tube current modulation (auto mA), rotation time 0.7 s, collimation 2.5 mm), secondarily reconstructed with 30% and 60% ASIR. In 13 (44.83%) of the body stuffers, cocaine-filled packets were detected and confirmed by exact analysis of the faecal content, including verification of their number (range 1-25). Three radiologists independently and blindly evaluated the anonymized CT examinations (29 FBP-CT and 68 ASIR-CT) for the presence and number of cocaine-filled packets, indicating their confidence, and graded them for diagnostic quality, image noise, and sharpness. Sensitivity, specificity, area under the receiver operating characteristic (ROC) curve Az, and interobserver agreement between the 3 radiologists were calculated for FBP-CT and ASIR-CT. Results: Increasing the percentage of ASIR significantly reduced the objective image noise (p<0.001). Overall sensitivity and specificity for the detection of the cocaine-filled packets were 87.72% and 76.15%, respectively. The difference in ROC area Az between the reconstruction techniques was significant (p=0.0101): 0.938 for FBP-CT, 0.916 for 30% ASIR-CT, and 0.894 for 60% ASIR-CT. Conclusion: Despite the evident image noise reduction obtained by ASIR, the diagnostic value for detecting cocaine-filled packets decreases, depending on the applied ASIR percentage.
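The reader-study metrics reported here (sensitivity, specificity, ROC area) can be computed from per-case truth and reader scores as in the generic sketch below; the data are invented and the scikit-learn AUC is only a stand-in for the study's ROC analysis.

```python
# Sensitivity, specificity and ROC AUC from per-case truth and reader confidence scores.
import numpy as np
from sklearn.metrics import roc_auc_score

truth = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])        # packets present (1) or absent (0)
confidence = np.array([5, 4, 2, 1, 3, 1, 5, 2, 4, 1])   # reader confidence rating, 1-5
calls = confidence >= 3                                  # binarised reading

tp = np.sum(calls & (truth == 1)); fn = np.sum(~calls & (truth == 1))
tn = np.sum(~calls & (truth == 0)); fp = np.sum(calls & (truth == 0))

print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("ROC AUC (Az):", roc_auc_score(truth, confidence))
```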
Abstract:
Interpretability and power of genome-wide association studies can be increased by imputing unobserved genotypes, using a reference panel of individuals genotyped at higher marker density. For many markers, genotypes cannot be imputed with complete certainty, and the uncertainty needs to be taken into account when testing for association with a given phenotype. In this paper, we compare currently available methods for testing association between uncertain genotypes and quantitative traits. We show that some previously described methods offer poor control of the false-positive rate (FPR), and that satisfactory performance of these methods is obtained only by using ad hoc filtering rules or by using a harsh transformation of the trait under study. We propose new methods that are based on exact maximum likelihood estimation and use a mixture model to accommodate nonnormal trait distributions when necessary. The new methods adequately control the FPR and also have equal or better power compared to all previously described methods. We provide a fast software implementation of all the methods studied here; our new method requires computation time of less than one computer-day for a typical genome-wide scan, with 2.5 M single nucleotide polymorphisms and 5000 individuals.
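As a point of reference, the sketch below implements the simplest strategy in this setting: regressing the quantitative trait on the expected genotype dosage while ignoring the structure of the imputation uncertainty. It is not the authors' exact maximum-likelihood mixture method, and the genotype probabilities and trait values are simulated.

```python
# Dosage-based single-marker association test on simulated imputed genotypes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
probs = rng.dirichlet([5, 3, 1], size=n)       # P(genotype = 0, 1, 2) per individual
dosage = probs @ np.array([0.0, 1.0, 2.0])     # expected allele count
trait = 0.15 * dosage + rng.standard_normal(n)

model = sm.OLS(trait, sm.add_constant(dosage)).fit()
print(model.params, model.pvalues)             # slope and its association p-value
```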
Abstract:
This book gives a general view of sequence analysis, the statistical study of successions of states or events. It includes innovative contributions on life course studies, transitions into and out of employment, contemporaneous and historical careers, and political trajectories. The approach presented in this book is now central to the life-course perspective and to the study of social processes more generally. The volume promotes dialogue between approaches to sequence analysis that developed separately, within traditions contrasted in space and disciplines. It includes the latest developments in sequential concepts, coding, atypical datasets and time patterns, optimal matching and alternative algorithms, survey optimization, and visualization. Field studies include original sequential material related to parenting in 19th-century Belgium, higher education and work in Finland and Italy, family formation before and after German reunification, French Jews persecuted in occupied France, long-term trends in electoral participation, and regime democratization. Overall, the book reassesses the classical uses of sequences and promotes new ways of collecting, formatting, representing and processing them. The introduction provides basic sequential concepts and tools, as well as a history of the method. Chapters are presented in a way that is both accessible to the beginner and informative to the expert.
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. This paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot-spot detection and monitoring network optimization, are discussed as well.
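One of the mapping tasks mentioned (interpolating a continuous pollution variable from scattered monitoring stations) can be sketched with kernel ridge regression and an RBF kernel, as below; the coordinates, values, and hyperparameters are synthetic stand-ins, not the Aral Sea or radionuclide data.

```python
# Kernel ridge regression (RBF kernel) for spatial mapping from scattered observations.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(200, 2))               # monitoring station coordinates (km)
values = (np.sin(coords[:, 0] / 15) + 0.5 * np.cos(coords[:, 1] / 20)
          + 0.1 * rng.standard_normal(200))               # measured concentration (arbitrary units)

krr = KernelRidge(kernel='rbf', gamma=0.01, alpha=0.1).fit(coords, values)

grid_x, grid_y = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid = np.column_stack([grid_x.ravel(), grid_y.ravel()])
prediction_map = krr.predict(grid).reshape(grid_x.shape)  # interpolated pollution map
print(prediction_map.shape)
```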
Abstract:
The question of where retroviral DNA becomes integrated in chromosomes is important for (i) understanding the mechanisms of viral growth, (ii) devising new anti-retroviral therapies, (iii) understanding how genomes evolve, and (iv) developing safer methods for gene therapy. With the completion of genome sequences for many organisms, it has become possible to study integration targeting by cloning and sequencing large numbers of host-virus DNA junctions and then mapping the host DNA segments back onto the genomic sequence. This allows statistical analysis of the distribution of integration sites relative to the myriad types of genomic features that are also being mapped onto the sequence scaffold. Here we present methods for recovering and analyzing integration site sequences.