149 results for Statistical count
in Queensland University of Technology - ePrints Archive
Abstract:
At NDSS 2012, Yan et al. analyzed the security of several challenge-response type user authentication protocols against passive observers, and proposed a generic counting-based statistical attack to recover the secret of some counting-based protocols given a number of observed authentication sessions. Roughly speaking, the attack is based on the fact that secret (pass) objects appear in challenges with a different probability from non-secret (decoy) objects when the responses are taken into account. Although they mentioned that a protocol susceptible to this attack should minimize this difference, they did not give details as to how this can be achieved beyond a few suggestions. In this paper, we attempt to fill this gap by generalizing the attack with a much more comprehensive theoretical analysis. Our treatment is more quantitative, which enables us to describe a method to theoretically estimate a lower bound on the number of sessions for which a protocol can be safely used against the attack. Our results include 1) two proposed fixes to make counting protocols practically safe against the attack at the cost of usability, 2) the observation that the attack can be used on non-counting-based protocols too as long as challenge generation is contrived, and 3) two main design principles for user authentication protocols which can be considered extensions of the principles from Yan et al. This detailed theoretical treatment can be used as a guideline during the design of counting-based protocols to determine their susceptibility to this attack. The Foxtail protocol, one of the protocols analyzed by Yan et al., is used as a representative to illustrate our theoretical and experimental results.
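To make the counting idea concrete, the minimal sketch below assumes a simplified Foxtail-style response rule (the user answers floor(x/2) mod 2, where x is the number of pass objects shown in the challenge): a passive observer tallies how often each object appears in challenges grouped by the response they elicited, then ranks objects by how much those appearance rates differ. All sizes, session counts and names are illustrative assumptions, not values from the paper.

import random
from collections import defaultdict

# Simplified Foxtail-style setting; all parameter values are assumptions.
N_OBJECTS, N_SECRET, CHALLENGE_SIZE, N_SESSIONS = 80, 5, 20, 20000

rng = random.Random(1)
objects = list(range(N_OBJECTS))
secret = set(rng.sample(objects, N_SECRET))

def response(challenge):
    # User counts pass objects x in the challenge and answers floor(x/2) mod 2.
    x = len(secret.intersection(challenge))
    return (x // 2) % 2

appear = defaultdict(lambda: [0, 0])   # object -> appearances when response was 0 / 1
totals = [0, 0]                        # number of sessions per response value
for _ in range(N_SESSIONS):
    challenge = rng.sample(objects, CHALLENGE_SIZE)
    r = response(challenge)
    totals[r] += 1
    for o in challenge:
        appear[o][r] += 1

def bias(o):
    # Difference between an object's appearance rate in response-1 and response-0 sessions.
    a0, a1 = appear[o]
    return abs(a1 / max(totals[1], 1) - a0 / max(totals[0], 1))

guess = set(sorted(objects, key=bias, reverse=True)[:N_SECRET])
print("recovered", len(guess & secret), "of", N_SECRET, "pass objects")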
Abstract:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales and not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
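As a rough illustration of that argument, the sketch below simulates site-level crash counts as Bernoulli trials with small, unequal probabilities (Poisson trials) under low exposure and across-site heterogeneity, with no dual-state mechanism, and compares the observed share of zero counts with the share predicted by a single Poisson model fitted to the pooled mean. All parameter values are illustrative assumptions.

import numpy as np

# Simulated crash counts: Bernoulli trials with unequal, small probabilities
# (Poisson trials), heterogeneous site means, low exposure, no dual-state process.
rng = np.random.default_rng(0)

n_sites = 2000
site_mean = rng.gamma(shape=0.5, scale=0.8, size=n_sites)   # heterogeneous low crash risk

counts = np.empty(n_sites, dtype=int)
for i, mu in enumerate(site_mean):
    exposure = rng.integers(200, 2000)        # vehicles passing the site in the period
    p = min(mu / exposure, 1.0)               # small, unequal per-vehicle crash probability
    counts[i] = rng.binomial(exposure, p)     # Poisson-trial style count for the site

# A single homogeneous Poisson model fitted to the pooled mean predicts fewer
# zeros than the heterogeneous, low-exposure process actually produces.
pooled_mean = counts.mean()
print("observed share of zeros:", round(float((counts == 0).mean()), 3))
print("Poisson-predicted share:", round(float(np.exp(-pooled_mean)), 3))

Because the zero probability exp(-mu) is convex in mu, averaging over heterogeneous site means always yields more zeros than a single Poisson fitted to the pooled mean, which is the sense in which the "excess" reflects the fitted model and the chosen time/space scale rather than a dual-state process.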
Abstract:
Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.
Abstract:
A fundamental prerequisite of population health research is the ability to establish an accurate denominator. This in turn requires that every individual in the study population is counted. However, this seemingly simple principle has become a point of conflict between researchers whose aim is to produce evidence of disparities in population health outcomes and governments whose policies promote (intentionally or not) inequalities that are the underlying causes of health disparities. Research into the health of asylum seekers is a case in point. There is a growing body of evidence documenting the adverse effects of recent changes in asylum-seeking legislation, including mandatory detention. However, much of this evidence has been dismissed by some governments as unsound, biased and unscientific because, it is argued, it is derived from small samples or from case studies. Yet it is the policies of governments that are the key barrier to the conduct of rigorous population health research on asylum seekers. In this paper, the authors discuss the challenges of counting asylum seekers and the limitations of data reported in some industrialized countries. They argue that the lack of accurate statistical data on asylum seekers has been an effective neo-conservative strategy for erasing the health inequalities in this vulnerable population, indeed a strategy that renders this population invisible. They describe some alternative strategies that may be used by researchers to obtain denominator data on hard-to-reach populations such as asylum seekers.
Abstract:
In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains $N$ particles by repeating the simulation until $N+1$ exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.
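A minimal sketch of the exact-matching ("alive") particle filter idea follows, assuming a simple hypothetical latent count model (binomial survival plus Poisson immigration, with binomially thinned observations); the model, parameter values, toy data and function names are assumptions chosen only to make the sketch self-contained, not the models used in the paper.

import math
import random

rng = random.Random(42)

def propagate(x, survival=0.7, immigration=1.0):
    # One step of a hypothetical latent count process: binomial survival plus
    # Poisson immigration (Poisson draw by inversion, fine for small rates).
    survivors = sum(rng.random() < survival for _ in range(x))
    k, u, term = 0, rng.random(), math.exp(-immigration)
    cum = term
    while u > cum:
        k += 1
        term *= immigration / k
        cum += term
    return survivors + k

def observe(x, detection=0.5):
    # Partially observed state: each individual detected independently.
    return sum(rng.random() < detection for _ in range(x))

def alive_particle_filter(y_obs, n_particles=100, x0=5, max_tries=200000):
    """Unbiased log-likelihood estimate via exact matching of simulated counts."""
    particles = [x0] * n_particles
    loglik = 0.0
    for y in y_obs:
        matched, tries = [], 0
        # Keep simulating until N + 1 exact matches are obtained, retain the first N.
        while len(matched) < n_particles + 1:
            tries += 1
            if tries > max_tries:
                return float("-inf")          # matching practically infeasible here
            x_new = propagate(rng.choice(particles))
            if observe(x_new) == y:
                matched.append(x_new)
        particles = matched[:n_particles]
        loglik += math.log(n_particles / (tries - 1))   # unbiased increment N/(m-1)
    return loglik

print(alive_particle_filter([2, 3, 1, 0, 2, 4, 3, 1]))   # toy observed low counts

Plugging such an unbiased likelihood estimate into a standard Metropolis-Hastings acceptance ratio is what yields the exact posterior inferences described in the abstract.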
Abstract:
In this paper we present a new method for performing Bayesian parameter inference and model choice for low count time series models with intractable likelihoods. The method involves incorporating an alive particle filter within a sequential Monte Carlo (SMC) algorithm to create a novel pseudo-marginal algorithm, which we refer to as alive SMC^2. The advantages of this approach over competing approaches are that it is naturally adaptive, it does not involve the between-model proposals required in reversible jump Markov chain Monte Carlo, and it does not rely on potentially rough approximations. The algorithm is demonstrated on Markov process and integer autoregressive moving average models applied to real biological datasets of hospital-acquired pathogen incidence, animal health time series and the cumulative number of prion disease cases in mule deer.
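For context, the pseudo-marginal mechanism referred to above can be sketched generically as a Metropolis-Hastings sampler that only ever sees an unbiased likelihood estimate and recycles the current estimate between iterations. The estimator below is a toy stub standing in for an alive particle filter, and the target and noise level are assumptions; this is a generic illustration, not the alive SMC^2 algorithm itself.

import math
import random

rng = random.Random(0)
NOISE_SD = 0.3

def estimate_loglik(theta):
    exact = -0.5 * (theta - 1.0) ** 2                        # toy log-likelihood
    # Log-normal noise with mean correction, so the likelihood estimate is unbiased.
    return exact + rng.gauss(0.0, NOISE_SD) - 0.5 * NOISE_SD ** 2

def pseudo_marginal_mh(n_iter=5000, step=0.5):
    theta, loglik_hat = 0.0, estimate_loglik(0.0)
    draws = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        prop_loglik_hat = estimate_loglik(prop)
        # Accept using the estimates; the current estimate is recycled rather
        # than recomputed, which keeps the exact posterior invariant.
        if math.log(rng.random()) < prop_loglik_hat - loglik_hat:
            theta, loglik_hat = prop, prop_loglik_hat
        draws.append(theta)
    return draws

draws = pseudo_marginal_mh()
print(sum(draws) / len(draws))   # should be close to the toy posterior mean of 1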
Abstract:
Background: The problem of silent multiple comparisons is one of the most difficult statistical problems faced by scientists. It is a particular problem for investigating a one-off cancer cluster reported to a health department because any one of hundreds, or possibly thousands, of neighbourhoods, schools, or workplaces could have reported a cluster, which could have been for any one of several types of cancer or any one of several time periods.
Methods: This paper contrasts the frequentist approach with a Bayesian approach for dealing with silent multiple comparisons in the context of a one-off cluster reported to a health department. Two published cluster investigations were re-analysed using the Dunn-Sidak method to adjust frequentist p-values and confidence intervals for silent multiple comparisons. Bayesian methods were based on the Gamma distribution.
Results: Bayesian analysis with non-informative priors produced results similar to the frequentist analysis, and suggested that both clusters represented a statistical excess. In the frequentist framework, the statistical significance of both clusters was extremely sensitive to the number of silent multiple comparisons, which can only ever be a subjective "guesstimate". The Bayesian approach is also subjective: whether there is an apparent statistical excess depends on the specified prior.
Conclusion: In cluster investigations, the frequentist approach is just as subjective as the Bayesian approach, but the Bayesian approach is less ambitious in that it treats the analysis as a synthesis of data and personal judgements (possibly poor ones), rather than as objective reality. Bayesian analysis is (arguably) a useful tool to support complicated decision-making, because it makes the uncertainty associated with silent multiple comparisons explicit.
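To make the contrast concrete, the short numerical sketch below uses hypothetical values (x observed cases against E expected, k silent comparisons, and an assumed weakly informative Gamma prior, none taken from the paper): the frequentist side applies a Poisson p-value with the Dunn-Sidak adjustment, and the Bayesian side reports the posterior probability of an excess from the conjugate Gamma posterior.

from scipy import stats

# Hypothetical numbers: x observed cases, E expected, k silent comparisons.
x, E, k = 8, 2.5, 500

# Frequentist: Poisson p-value, then the Dunn-Sidak adjustment for k comparisons.
p = stats.poisson.sf(x - 1, E)            # P(X >= x) under the expected rate E
p_adj = 1 - (1 - p) ** k                  # extremely sensitive to the guessed k
print(f"unadjusted p = {p:.4f}, Dunn-Sidak adjusted (k={k}) = {p_adj:.3f}")

# Bayesian: a Gamma(a, b) prior on the standardised incidence ratio combines
# with the Poisson likelihood to give a Gamma(a + x, b + E) posterior.
a, b = 1.0, 1.0                           # assumed weakly informative prior
posterior = stats.gamma(a + x, scale=1.0 / (b + E))
print(f"P(SIR > 1 | data) = {posterior.sf(1.0):.3f}")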