996 results for Statistical principles
Abstract:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty lies in parameter estimation, because the marginal likelihood cannot be maximized analytically. Many methods have been proposed for this purpose, and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles (marginal likelihood, extended likelihood, and Bayesian analysis) via simulation studies. Real data on contact wrestling are used for illustration.
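As a concrete illustration of the first of these principles, here is a minimal sketch (not the paper's code) of maximizing the marginal likelihood of a Poisson model with a single random intercept per cluster, with the random effect integrated out by Gauss-Hermite quadrature; all data and settings are simulated stand-ins.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize
from scipy.special import gammaln

# Model: y_ij | b_i ~ Poisson(exp(beta0 + b_i)),  b_i ~ N(0, sigma^2).
# The marginal likelihood integrates b_i out numerically.
rng = np.random.default_rng(0)
n_clusters, n_per = 50, 10
sigma_true, beta0_true = 0.7, 1.0
b = rng.normal(0.0, sigma_true, n_clusters)
y = rng.poisson(np.exp(beta0_true + b)[:, None], size=(n_clusters, n_per))

nodes, weights = hermgauss(30)  # physicists' Gauss-Hermite rule

def neg_marginal_loglik(params):
    beta0, log_sigma = params
    sigma = np.exp(log_sigma)
    # change of variable b = sqrt(2)*sigma*t gives weights w/sqrt(pi)
    eta = beta0 + np.sqrt(2.0) * sigma * nodes
    # cluster-by-node Poisson log-likelihood, summed over observations
    loglik = (y.sum(1)[:, None] * eta
              - n_per * np.exp(eta)[None, :]
              - gammaln(y + 1.0).sum(1)[:, None])
    # log-sum-exp over quadrature nodes for numerical stability
    m = loglik.max(axis=1, keepdims=True)
    cluster_ll = m[:, 0] + np.log(
        ((weights / np.sqrt(np.pi)) * np.exp(loglik - m)).sum(axis=1))
    return -cluster_ll.sum()

res = minimize(neg_marginal_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
print(f"beta0_hat={res.x[0]:.3f}, sigma_hat={np.exp(res.x[1]):.3f}")
```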
Abstract:
Solutions to combinatorial optimization problems, such as problems of locating facilities, frequently rely on heuristics to minimize the objective function. The optimum is sought iteratively, and a criterion is needed to decide when the procedure (almost) attains it. Pre-setting the number of iterations dominates in OR applications, which implies that the quality of the solution cannot be ascertained. A small, almost dormant, branch of the literature suggests using statistical principles to estimate the minimum and its bounds, both as a stopping rule and as a way of evaluating the quality of the solution. In this paper we examine how well statistical bounds obtained from four different estimators work, using simulated annealing on p-median test problems taken from Beasley’s OR-library. We find the Weibull estimator and the 2nd-order jackknife estimator preferable, and the required sample size to be about 10, much smaller than the current recommendation. However, reliable statistical bounds are found to depend critically on a sample of heuristic solutions of high quality, and we give a simple statistic useful for checking that quality. We end the paper with an illustration of using statistical bounds in a problem of locating some 70 distribution centers of the Swedish Post in one Swedish region.
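To make the two preferred estimators concrete, the following hedged sketch applies a three-parameter Weibull fit and a second-order jackknife endpoint formula (assumed here to take the form 3y(1) - 3y(2) + y(3)) to a hypothetical sample of heuristic objective values; the paper's exact estimators and bound constructions may differ.

```python
import numpy as np
from scipy.stats import weibull_min

def weibull_min_estimate(values):
    """Fit a three-parameter Weibull; the fitted location parameter
    estimates the unknown optimum (left endpoint of the distribution)."""
    shape, loc, scale = weibull_min.fit(values)
    return loc

def jackknife2_min_estimate(values):
    """Second-order jackknife endpoint estimator from the three smallest
    order statistics (assumed form: 3*y(1) - 3*y(2) + y(3))."""
    y = np.sort(np.asarray(values, dtype=float))
    return 3 * y[0] - 3 * y[1] + y[2]

# e.g., 10 simulated-annealing objective values for one test problem
# (hypothetical numbers, for illustration only)
sample = [5893, 5901, 5888, 5912, 5890, 5907, 5885, 5899, 5895, 5903]
print("Weibull location estimate :", round(weibull_min_estimate(sample), 1))
print("2nd-order jackknife       :", round(jackknife2_min_estimate(sample), 1))
print("Best observed solution    :", min(sample))
```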
Abstract:
Effective risk management is crucial for any organisation. One of its key steps is risk identification, but few tools exist to support this process. Here we present a method for the automatic discovery of a particular type of process-related risk, the danger of deadline transgressions or overruns, based on the analysis of event logs. We define a set of time-related process risk indicators, i.e., patterns observable in event logs that highlight the likelihood of an overrun, and then show how instances of these patterns can be identified automatically using statistical principles. To demonstrate its feasibility, the approach has been implemented as a plug-in module to the process mining framework ProM and tested using an event log from a Dutch financial institution.
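A minimal sketch of one plausible time-related risk indicator of the kind described (not the ProM plug-in itself): flag a running case once its elapsed time exceeds an upper percentile of historical case durations. The log entries below are hypothetical.

```python
from datetime import datetime

def percentile(data, q):
    """Linear-interpolation sample percentile, 0 <= q <= 1."""
    s = sorted(data)
    k = (len(s) - 1) * q
    f = int(k)
    return s[f] + (s[min(f + 1, len(s) - 1)] - s[f]) * (k - f)

# historical completed cases: (case_id, start, end) -- hypothetical log
completed = [
    ("c1", datetime(2024, 1, 1), datetime(2024, 1, 8)),
    ("c2", datetime(2024, 1, 2), datetime(2024, 1, 6)),
    ("c3", datetime(2024, 1, 3), datetime(2024, 1, 12)),
    ("c4", datetime(2024, 1, 4), datetime(2024, 1, 9)),
]
durations = [(end - start).days for _, start, end in completed]
threshold = percentile(durations, 0.90)  # 90th-percentile duration

# running cases: (case_id, start); "now" fixed for reproducibility
now = datetime(2024, 1, 20)
running = [("c9", datetime(2024, 1, 5)), ("c10", datetime(2024, 1, 18))]
for case_id, start in running:
    elapsed = (now - start).days
    if elapsed > threshold:
        print(f"{case_id}: elapsed {elapsed}d > {threshold:.1f}d, overrun risk")
```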
Abstract:
This paper presents our experience with combining statistical principles and participatory methods to generate national statistics. The methodology was developed in Malawi during 1999–2002. We demonstrate that if participatory rural appraisal (PRA) is combined with statistical principles (including probability-based sampling and standardization), it can produce total population statistics and estimates of the proportion of households with certain characteristics (e.g., poverty). It can also provide quantitative data on complex issues of national importance, such as poverty targeting. This approach is distinct from previous PRA-based approaches, which generate numbers at the community level but provide only qualitative information at the national level.
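The statistical principle invoked here can be illustrated with a short sketch: under probability-based sampling, a sample proportion yields a national estimate with a quantifiable margin of error. The figures are hypothetical, and the paper's actual design (stratification, clustering) may differ.

```python
import math

def proportion_estimate(successes, n, z=1.96):
    """Point estimate and 95% CI for a proportion under
    simple random sampling."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p, (p - z * se, p + z * se)

# e.g., 312 of 800 sampled households classified as poor by PRA wealth ranking
p, (lo, hi) = proportion_estimate(312, 800)
print(f"poverty headcount estimate: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```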
Abstract:
Solutions to combinatorial optimization problems, such as p-median problems of locating facilities, frequently rely on heuristics to minimize the objective function. The minimum is sought iteratively, and a criterion is needed to decide when the procedure (almost) attains it. However, pre-setting the number of iterations dominates in OR applications, which implies that the quality of the solution cannot be ascertained. A small branch of the literature suggests using statistical principles to estimate the minimum and to use the estimate either for stopping or for evaluating the quality of the solution. In this paper we take test problems from Beasley's OR-library and apply simulated annealing to these p-median problems. We do this for the purpose of comparing suggested methods of minimum estimation and, eventually, providing a recommendation for practitioners. The paper ends with an illustration: a problem of locating some 70 distribution centers of the Swedish Post in a region.
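For readers unfamiliar with the heuristic, a hedged sketch of simulated annealing for the p-median problem follows; the move rule and cooling schedule are illustrative choices, not the paper's settings, and the instance is a small random stand-in rather than an OR-library problem.

```python
import math
import random

def objective(dist, facilities):
    """Total distance from each demand node to its nearest open facility."""
    return sum(min(dist[i][j] for j in facilities) for i in range(len(dist)))

def anneal(dist, p, t0=100.0, cooling=0.995, iters=20000, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    current = rng.sample(range(n), p)
    cost = objective(dist, current)
    best, best_cost, t = list(current), cost, t0
    for _ in range(iters):
        # swap move: close one open facility, open a closed one
        out = rng.randrange(p)
        inn = rng.choice([j for j in range(n) if j not in current])
        cand = list(current)
        cand[out] = inn
        cand_cost = objective(dist, cand)
        # accept improvements always, worsenings with Boltzmann probability
        if cand_cost < cost or rng.random() < math.exp((cost - cand_cost) / t):
            current, cost = cand, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        t *= cooling  # geometric cooling
    return best, best_cost

# tiny random instance (symmetric distances), for illustration only
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(30)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
print(anneal(dist, p=4))
```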
Abstract:
Solutions to combinatorial optimization problems frequently rely on heuristics to minimize an objective function. The optimum is sought iteratively, and pre-setting the number of iterations dominates in operations research applications, which implies that the quality of the solution cannot be ascertained. Deterministic bounds offer a means of ascertaining the quality, but such bounds are available for only a limited number of heuristics, and the length of the interval may be difficult to control in an application. A small, almost dormant, branch of the literature suggests using statistical principles to derive statistical bounds for the optimum. We discuss alternative approaches to deriving statistical bounds. We also assess their performance by testing them on 40 p-median test problems on facility location, taken from Beasley’s OR-library, for which the optimum is known. We consider three popular heuristics for solving such location problems: simulated annealing, vertex substitution, and Lagrangian relaxation, of which only the last offers deterministic bounds. Moreover, we illustrate statistical bounds in the location of 71 regional delivery points of the Swedish Post. We find statistical bounds reliable and much more efficient than deterministic bounds, provided that the heuristic solutions are sampled close to the optimum. Statistical bounds are also found computationally affordable.
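Of the three heuristics, vertex substitution is the simplest to sketch; the following is a hedged Teitz-Bart-style implementation on a small random instance, not one of the OR-library test problems.

```python
import math
import random

def objective(dist, facilities):
    """Total distance from each demand node to its nearest open facility."""
    return sum(min(dist[i][j] for j in facilities) for i in range(len(dist)))

def vertex_substitution(dist, facilities):
    """Swap an open facility for a closed vertex whenever the swap lowers
    the p-median objective; stop when no improving swap remains."""
    facilities, n = list(facilities), len(dist)
    cost, improved = objective(dist, facilities), True
    while improved:
        improved = False
        for k in range(len(facilities)):
            for v in range(n):
                if v in facilities:
                    continue
                cand = facilities.copy()
                cand[k] = v
                cand_cost = objective(dist, cand)
                if cand_cost < cost:  # accept first improving swap
                    facilities, cost = cand, cand_cost
                    improved = True
    return facilities, cost

# small random stand-in instance
rng = random.Random(2)
pts = [(rng.random(), rng.random()) for _ in range(25)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
print(vertex_substitution(dist, facilities=[0, 1, 2]))
```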
Abstract:
Gravity models, entropy models, potential-type models and the like have been adopted to formulate interregional trade coefficients within the framework of Multi-Regional I-O (MRIO) analysis. Since most of these models are based on analogies with physics or on statistical principles, they do not provide a theoretical explanation grounded in a firm's or individual's rational, deterministic decision making. In this paper, following deterministic choice theory, we not only present an alternative formulation of the trade coefficients but also discuss an appropriate definition of purchasing price indices. Since this formulation is consistent with the MRIO system, it can be employed as a useful model-building tool in multi-regional models such as the spatial CGE model.
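As background for the class of models being critiqued, here is a minimal sketch of a doubly-constrained gravity model of interregional trade, balanced by iterative proportional fitting so that row sums match regional supply and column sums match regional demand; all numbers are hypothetical.

```python
import numpy as np

supply = np.array([100.0, 80.0, 60.0])   # output available in each region
demand = np.array([90.0, 70.0, 80.0])    # requirements in each region
distance = np.array([[1.0, 2.0, 3.0],
                     [2.0, 1.0, 2.5],
                     [3.0, 2.5, 1.0]])
trade = np.exp(-0.5 * distance)          # deterrence function f(d) = e^(-beta*d)

for _ in range(100):                     # IPF / RAS balancing
    trade *= (supply / trade.sum(axis=1))[:, None]   # match row totals
    trade *= demand / trade.sum(axis=0)              # match column totals

# trade coefficients: share of region j's purchases sourced from region i
coef = trade / trade.sum(axis=0)
print(np.round(coef, 3))
```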
Abstract:
Aiming to establish a rigorous link between macroscopic random motion (described, e.g., by Langevin-type theories) and microscopic dynamics, we have undertaken a kinetic-theoretical study of the dynamics of a classical test-particle weakly coupled to a large heat-bath in thermal equilibrium. Both subsystems are subject to an external force field. From the (time-non-local) generalized master equation, a Fokker-Planck-type equation follows as a "quasi-Markovian" approximation. The kinetic operator thus defined is shown to be ill-defined; specifically, it does not preserve the positivity of the test-particle distribution function f(x, v; t). An alternative approach, previously introduced for open quantum systems, is proposed and shown to lead to a correct kinetic operator, which yields all the expected properties. Explicit expressions for the diffusion and drift coefficients are obtained, allowing macroscopic diffusion and dynamical friction phenomena to be modelled in terms of the external field and intrinsic physical parameters.
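For orientation, the generic form of such a Fokker-Planck-type kinetic equation is shown below; this is the standard textbook structure only, and the paper's specific drift and diffusion coefficients are not reproduced here.

```latex
% Generic Fokker-Planck-type equation for the test-particle distribution
% f(x, v; t) under an external force F(x); A(v) is the dynamical-friction
% (drift) vector and D(v) the diffusion tensor.
\[
  \frac{\partial f}{\partial t}
  + v \cdot \nabla_x f
  + \frac{F(x)}{m} \cdot \nabla_v f
  = \nabla_v \cdot \Bigl[ A(v)\, f + \nabla_v \cdot \bigl( D(v)\, f \bigr) \Bigr]
\]
% Positivity of f is preserved only if the kinetic operator on the right is
% well-defined, which is precisely the issue the abstract addresses.
```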
Abstract:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions that come with each, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales, not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
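The abstract's simulation argument can be reproduced in miniature: heterogeneous Poisson trials observed under low exposure yield more zeros than a homogeneous Poisson fit predicts, with no dual-state mechanism involved. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sites, n_trials = 1000, 5000
# heterogeneous per-vehicle crash probabilities at each site (all tiny)
p = np.minimum(rng.gamma(shape=0.2, scale=5e-4, size=n_sites), 1.0)
crashes = rng.binomial(n_trials, p)      # counts over a low-exposure period

lam = crashes.mean()
expected_zeros = n_sites * np.exp(-lam)  # zeros under a single Poisson(lam)
observed_zeros = (crashes == 0).sum()
print(f"mean count {lam:.2f}: observed zeros {observed_zeros}, "
      f"Poisson-expected {expected_zeros:.0f}")
# observed zeros exceed the homogeneous-Poisson expectation because of
# unobserved heterogeneity and low exposure, not a dual-state process
```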
Abstract:
Background: The four principles of Beauchamp and Childress (autonomy, non-maleficence, beneficence and justice) have been extremely influential in the field of medical ethics, and are fundamental for understanding the current approach to ethical assessment in health care. This study tests whether these principles can be quantitatively measured on an individual level, and subsequently whether they are used in the decision-making process when individuals are faced with ethical dilemmas.
Methods: The Analytic Hierarchy Process was used as a tool for the measurement of the principles. Four scenarios, which involved conflicts between the medical ethical principles, were presented to participants, who made judgements about the ethicality of the action in the scenario and their intentions to act in the same manner if they were in that situation.
Results: Individual preferences for these medical ethical principles can be measured using the Analytic Hierarchy Process. This technique provides a useful tool with which to highlight individual medical ethical values. On average, individuals have a significant preference for non-maleficence over the other principles; however, and perhaps counter-intuitively, this preference does not seem to relate to applied ethical judgements in specific ethical dilemmas.
Conclusions: People state that they value these medical ethical principles, but they do not actually seem to use them directly in the decision-making process. The reasons for this are explained by the lack of a behavioural model that accounts for the relevant situational factors not captured by the principles. The limitations of the principles in predicting ethical decision making are discussed.
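The measurement step rests on a standard AHP computation: derive priority weights from a pairwise comparison matrix via its principal eigenvector. The sketch below uses a hypothetical respondent's matrix, not data from the study.

```python
import numpy as np

labels = ["autonomy", "non-maleficence", "beneficence", "justice"]
# A[i, j] = how strongly principle i is preferred to principle j (Saaty 1-9);
# the matrix is reciprocal: A[j, i] = 1 / A[i, j]
A = np.array([[1.0, 1/3, 1.0, 2.0],
              [3.0, 1.0, 2.0, 4.0],
              [1.0, 1/2, 1.0, 2.0],
              [1/2, 1/4, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)               # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                              # normalised priority weights

for name, weight in sorted(zip(labels, w), key=lambda t: -t[1]):
    print(f"{name:16s} {weight:.3f}")
# consistency ratio CI/RI, with RI = 0.90 for a 4x4 matrix (Saaty)
ci = (eigvals.real.max() - 4) / 3
print(f"consistency ratio: {ci / 0.90:.3f}")
```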
Abstract:
At NDSS 2012, Yan et al. analyzed the security of several challenge-response user authentication protocols against passive observers, and proposed a generic counting-based statistical attack that recovers the secret of some counting-based protocols given a number of observed authentication sessions. Roughly speaking, the attack exploits the fact that secret (pass) objects appear in challenges with a different probability from non-secret (decoy) objects once the responses are taken into account. Although they mentioned that a protocol susceptible to this attack should minimize this difference, they gave few details, barring a few suggestions, as to how this can be achieved. In this paper, we attempt to fill this gap by generalizing the attack with a much more comprehensive theoretical analysis. Our treatment is more quantitative, which enables us to describe a method to theoretically estimate a lower bound on the number of sessions for which a protocol can be safely used against the attack. Our results include 1) two proposed fixes to make counting protocols practically safe against the attack at the cost of usability, 2) the observation that the attack can be used on non-counting-based protocols too, as long as challenge generation is contrived, and 3) two main design principles for user authentication protocols, which can be considered extensions of the principles from Yan et al. This detailed theoretical treatment can be used as a guideline during the design of counting-based protocols to determine their susceptibility to this attack. The Foxtail protocol, one of the protocols analyzed by Yan et al., is used as a representative to illustrate our theoretical and experimental results.
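A toy simulation conveys the flavour of the counting attack, using a hypothetical counting protocol (response = parity of pass objects shown) rather than the exact Foxtail rules: conditioning object appearances on the responses separates pass objects from decoys.

```python
import random
from collections import defaultdict

rng = random.Random(7)
N, K, CHALLENGE = 60, 5, 15              # objects, secret size, window size
secret = set(rng.sample(range(N), K))

score = defaultdict(int)
for _ in range(2000):                    # observed authentication sessions
    challenge = rng.sample(range(N), CHALLENGE)
    # hypothetical response rule: parity of pass objects in the challenge
    response = len(secret.intersection(challenge)) % 2
    # count appearances conditioned on the response
    for obj in challenge:
        score[obj] += 1 if response else -1

# objects most correlated with response=1 are candidate pass objects
guess = set(sorted(score, key=score.get, reverse=True)[:K])
print("recovered:", len(guess & secret), "of", K, "pass objects")
```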
Abstract:
The geometry of tree branches can have a considerable effect on their efficiency in terms of carbon export per unit carbon investment in structure. The purpose of this study was to evaluate different design criteria using data describing the form of Picea sitchensis branches. Allometric analysis of the data suggests that resources are distributed to favour shoots with the greatest opportunity for extension into new space, with priority given to the extension of the leader. The distribution of allometric relations of links (branch elements) was tested against two models: the pipe model, based on hydraulic transport requirements, and a static load model based on the requirement that shoots provide mechanical resistance to static loads. Static load resistance required the load parameter to be proportional to the link radius raised to the power of 4, which was shown to hold within a 95% statistical confidence limit. The pipe model would require total distal length to be proportional to the link radius squared, but the measured branches did not conform well to this model. The comparison suggests that the diameters of branch elements were related more to the requirements of mechanical load. The cost of following a hydraulic design principle (the pipe model) in terms of mechanical efficiency was estimated; the estimate suggested that a pipe-model branch would not be mechanically compromised but would use structural resources inefficiently. Resource allocation among branch elements was found to be consistent with mechanical stability criteria, but it also indicated the possibility of allocation based on other criteria, such as potential light interception by shoots. The evidence suggests that whilst branch topology increments by reiteration of units of morphogenesis, the geometry follows a functional design pattern.
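The allometric test described can be sketched as a log-log regression of load and total distal length on link radius, checking the fitted exponents against 4 (static load model) and 2 (pipe model); the data below are synthetic stand-ins for the Picea sitchensis measurements.

```python
import numpy as np

rng = np.random.default_rng(3)
radius = rng.uniform(0.2, 2.0, 80)                    # link radii
load = radius ** 4 * np.exp(rng.normal(0, 0.2, 80))   # ~ r^4 with noise
distal_len = radius ** 2.6 * np.exp(rng.normal(0, 0.3, 80))

for name, y, target in [("load", load, 4.0), ("distal length", distal_len, 2.0)]:
    x = np.log(radius)
    slope, intercept = np.polyfit(x, np.log(y), 1)
    # crude slope standard error for a 95% check, as in the abstract
    resid = np.log(y) - (slope * x + intercept)
    se = np.sqrt(resid.var(ddof=2) / ((x - x.mean()) ** 2).sum())
    ok = abs(slope - target) < 1.96 * se
    print(f"{name}: exponent {slope:.2f} +/- {1.96 * se:.2f}; "
          f"consistent with {target}? {ok}")
```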
Abstract:
In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data, in a way that makes such an analysis no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case meaningless. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE for such data sets.
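The filtering effect can be illustrated with a generic competitive statistic (not GSEA itself): adding unexpressed noise genes to the background shifts it and can make a null gene set look significant.

```python
import numpy as np

rng = np.random.default_rng(11)

def competitive_z(set_scores, background_scores):
    """Generic competitive test: set mean vs. background mean,
    standardised by the background spread."""
    bg_mean, bg_sd = background_scores.mean(), background_scores.std(ddof=1)
    return (set_scores.mean() - bg_mean) / (bg_sd / np.sqrt(len(set_scores)))

gene_set = rng.normal(0.0, 1.0, 50)        # a null (uninteresting) gene set
expressed_bg = rng.normal(0.0, 1.0, 5000)  # well-measured background genes
noise_bg = rng.normal(-0.5, 1.0, 20000)    # unexpressed noise genes, biased low

print("filtered background   z =", round(competitive_z(gene_set, expressed_bg), 2))
print("unfiltered background z =",
      round(competitive_z(gene_set, np.concatenate([expressed_bg, noise_bg])), 2))
# the same null set becomes "significant" once noise genes drag the
# background down: power gained from unrelated data, as the abstract warns
```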
Abstract:
Doctoral thesis, Pharmacy (Pharmaceutical Technology), Universidade de Lisboa, Faculdade de Farmácia, 2014
Abstract:
For many types of learners one can compute the statistically 'optimal' way to select data. We review how these techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
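A hedged sketch of the selection principle applied to locally weighted regression: among candidate queries, pick the point where the current model's predictive variance is largest. The variance formula here is a simplified proxy, not the exact expression from this line of work.

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda x: np.sin(3 * x)               # unknown target function
X = rng.uniform(-1, 1, 8)                 # data gathered so far
y = f(X) + rng.normal(0, 0.1, len(X))

def lwr_predict(x0, X, y, h=0.3):
    """Locally weighted linear fit at x0 with a Gaussian kernel; returns
    the prediction and a simple predictive-variance proxy."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    W = np.sum(w)
    xb, yb = np.sum(w * X) / W, np.sum(w * y) / W
    sxx = max(np.sum(w * (X - xb) ** 2) / W, 1e-9)
    beta = np.sum(w * (X - xb) * (y - yb)) / W / sxx
    pred = yb + beta * (x0 - xb)
    resid = y - (yb + beta * (X - xb))
    noise = np.sum(w * resid ** 2) / W    # local noise estimate
    var = noise * (1.0 / W + (x0 - xb) ** 2 / (W * sxx))
    return pred, var

# active data selection: query where predictive variance peaks
candidates = np.linspace(-1, 1, 101)
variances = [lwr_predict(c, X, y)[1] for c in candidates]
x_next = candidates[int(np.argmax(variances))]
print(f"query next at x = {x_next:.2f} (highest predictive variance)")
```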