964 results for Statistical count
Abstract:
Historically, the statistical capture of agricultural employment (and, more specifically, of agricultural wage employment) has presented a series of problems and limitations for traditional data sources. High levels of transience, seasonality, non-registration and informality have made its quantification through traditional census and sample sources difficult. The more recent processes affecting this social group (urbanization, shortening of productive and occupational cycles, reduced labour requirements due to the mechanization of certain harvests, etc.) appear to have increased these difficulties. Previous work shows that population censuses and agricultural censuses yield different results when quantifying wage workers in the sector. This paper presents a comparative analysis of the results obtained in Argentina by the 2001 National Population and Housing Census and the 2002 National Agricultural Census. The aim is to examine the differing figures for agricultural wage workers produced by the two surveys in every department of the country. In turn, we attempt to link these differences to the various territories and to the different social and agrarian structures, in order to determine whether they help explain the divergent results. To this end, a database covering the whole country was built, disaggregated at the provincial and departmental level (the maximum level of disaggregation allowed by the published sources), containing the total number of agricultural wage workers and various indicators of the social and agrarian structure (number of poor farms, levels of urbanization, land distribution, etc.).
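A minimal sketch of the kind of department-level comparison described above, assuming a pandas DataFrame with hypothetical column names and file (neither is specified in the source):

```python
# Illustrative sketch: compare department-level counts of agricultural wage
# workers from the 2001 population census and the 2002 agricultural census,
# and relate the gap to agrarian-structure indicators. Column names are made up.
import pandas as pd

# One row per department; illustrative columns:
#   wage_workers_cnpv2001, wage_workers_cna2002, pct_urban, gini_land, poor_farms
df = pd.read_csv("departamentos.csv")

# Relative gap between the two censuses, per department.
df["gap"] = (df["wage_workers_cnpv2001"] - df["wage_workers_cna2002"]) / df["wage_workers_cna2002"]

# Simple association between the gap and structural indicators.
print(df[["gap", "pct_urban", "gini_land", "poor_farms"]].corr()["gap"])
```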
Abstract:
At NDSS 2012, Yan et al. analyzed the security of several challenge-response-type user authentication protocols against passive observers, and proposed a generic counting-based statistical attack to recover the secret of some counting-based protocols given a number of observed authentication sessions. Roughly speaking, the attack is based on the fact that secret (pass) objects appear in challenges with a different probability from non-secret (decoy) objects when the responses are taken into account. Although they mentioned that a protocol susceptible to this attack should minimize this difference, they did not give details as to how this can be achieved, apart from a few suggestions. In this paper, we attempt to fill this gap by generalizing the attack with a much more comprehensive theoretical analysis. Our treatment is more quantitative, which enables us to describe a method to theoretically estimate a lower bound on the number of sessions for which a protocol can be safely used against the attack. Our results include 1) two proposed fixes to make counting protocols practically safe against the attack at the cost of usability, 2) the observation that the attack can be used on non-counting-based protocols as well, as long as challenge generation is contrived, and 3) two main design principles for user authentication protocols which can be considered as extensions of the principles from Yan et al. This detailed theoretical treatment can be used as a guideline during the design of counting-based protocols to determine their susceptibility to this attack. The Foxtail protocol, one of the protocols analyzed by Yan et al., is used as a representative to illustrate our theoretical and experimental results.
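As an illustration of the counting idea (not the exact attack specification from the paper), the sketch below ranks candidate objects by how often they appear in observed challenges, conditioned on the responses; the session format and the assumption that a response of 1 signals the presence of a pass object are hypothetical, Foxtail-like simplifications:

```python
# Illustrative counting attack: objects whose appearance frequency in
# "informative" challenges is highest are guessed to be the pass objects.
from collections import Counter

def counting_attack(sessions, k):
    """sessions: list of (challenge_objects, response) pairs from passive observation.
    Assumption (for illustration only): response == 1 indicates that at least
    one pass object appeared in the challenge."""
    counts = Counter()
    for challenge, response in sessions:
        if response == 1:              # condition the counts on the response
            counts.update(challenge)
    # Guess the k objects with the highest conditional appearance frequency.
    return [obj for obj, _ in counts.most_common(k)]
```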
Abstract:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the "excess" zeros frequently observed in crash data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales, not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
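The following small simulation is in the spirit of the experiment described above (illustrative only, not the study's actual design): crash counts are generated as sums of independent Bernoulli trials with small, heterogeneous probabilities, and low exposure alone produces a preponderance of zeros without any dual-state mechanism.

```python
# Poisson-trials simulation (illustrative assumptions): each site accumulates
# crashes as independent Bernoulli trials with small, unequal probabilities.
import numpy as np

rng = np.random.default_rng(0)
n_sites = 2000
exposure = 50                                         # trials (e.g. vehicles) per site; "low" exposure
p = rng.gamma(shape=0.5, scale=2e-3, size=n_sites)    # heterogeneous crash probabilities across sites

# Crash count per site = number of successes in `exposure` Bernoulli trials.
crashes = rng.binomial(n=exposure, p=np.clip(p, 0, 1), size=n_sites)

print("share of zero-crash sites:", np.mean(crashes == 0))
print("mean crash count:", crashes.mean())
```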
Abstract:
Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.
Abstract:
A fundamental prerequisite of population health research is the ability to establish an accurate denominator. This in turn requires that every individual in the study population is counted. However, this seemingly simple principle has become a point of conflict between researchers whose aim is to produce evidence of disparities in population health outcomes and governments whose policies promote (intentionally or not) inequalities that are the underlying causes of health disparities. Research into the health of asylum seekers is a case in point. There is a growing body of evidence documenting the adverse effects of recent changes in asylum-seeking legislation, including mandatory detention. However, much of this evidence has been dismissed by some governments as being unsound, biased and unscientific because, it is argued, the evidence is derived from small samples or from case studies. Yet, it is the policies of governments that are the key barrier to the conduct of rigorous population health research on asylum seekers. In this paper, the authors discuss the challenges of counting asylum seekers and the limitations of data reported in some industrialized countries. They argue that the lack of accurate statistical data on asylum seekers has been an effective neo-conservative strategy for erasing the health inequalities in this vulnerable population, indeed, a strategy that renders this population invisible. They describe some alternative strategies that may be used by researchers to obtain denominator data on hard-to-reach populations such as asylum seekers.
Abstract:
In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains $N$ particles by repeating the simulation until $N+1$ exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.
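A minimal sketch of one step of such an exact-matching ("alive") particle filter, assuming a user-supplied simulate() function and ignoring the auxiliary variables mentioned above; unbiasedness of the incremental-likelihood estimate comes from the negative-binomial stopping rule N/(M-1):

```python
# Illustrative alive-particle-filter step: keep simulating until N + 1
# trajectories exactly match the observed low-valued count y_t.
import random

def alive_step(particles, y_t, simulate, N):
    """particles: list of latent states at time t-1.
    simulate(state) -> (new_state, simulated_observation); assumed cheap to call.
    Returns (new particle set of size N, unbiased estimate of p(y_t | y_{1:t-1}))."""
    matched, trials = [], 0
    while len(matched) < N + 1:
        state = random.choice(particles)       # resample a parent uniformly
        new_state, y_sim = simulate(state)
        trials += 1
        if y_sim == y_t:                       # exact match on the observed count
            matched.append(new_state)
    # The (N+1)-th match is discarded; it only serves to make N/(trials-1) unbiased.
    return matched[:N], N / (trials - 1)
```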
Abstract:
In this paper we present a new method for performing Bayesian parameter inference and model choice for low-count time series models with intractable likelihoods. The method involves incorporating an alive particle filter within a sequential Monte Carlo (SMC) algorithm to create a novel pseudo-marginal algorithm, which we refer to as alive SMC^2. The advantages of this approach over competing approaches are that it is naturally adaptive, that it does not involve the between-model proposals required in reversible jump Markov chain Monte Carlo, and that it does not rely on potentially rough approximations. The algorithm is demonstrated on Markov process and integer autoregressive moving average models applied to real biological datasets of hospital-acquired pathogen incidence, animal health time series and the cumulative number of prion disease cases in mule deer.
Abstract:
The quality of raw and processed fishery products depends on several factors, such as physiological conditions at the time of capture, morphological differences, rigor mortis, species, rate of icing and subsequent storage conditions. Sensory evaluation is still the most reliable method for evaluating the freshness of raw and processed fishery products. Sophisticated methods such as the Intelectron fish tester and the cell-fragility technique, as well as chemical and bacteriological methods such as the estimation of trimethylamine, hypoxanthine, carbonyl compounds, volatile acids and total bacterial count, have no doubt been developed for assessing spoilage in fish products.
Abstract:
The variable start and duration of the Grey seal breeding season make the estimation of total pup production from a single census very difficult. Classifying the count into morphological age classes enables the form and timing of the birth-rate curve, and estimates of pup mortality rates, to be elucidated. A simulation technique is described which enables the duration of each morphological stage to be determined from a series of such classified counts taken over one season. A further statistical technique uses these estimates to calculate the mean timing and duration of the breeding season from a single classified count taken from similar populations in subsequent years. This information allows total pup production to be calculated for any appropriate breeding colony. Some guidance is given as to the optimal timing of that single census which would yield the best estimate of production, although the precise date is not critical to the success of the technique. Results from single-census estimates obtained in this way are compared with known production data from more detailed surveys for a number of different colonies.
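A deliberately simplified sketch of the single-census idea, assuming a Gaussian birth curve whose timing and spread have already been estimated from classified counts, and ignoring pup mortality and departures from the colony (which the actual method accounts for); all numbers are illustrative:

```python
# Scale a single census count up to total production using the expected
# fraction of the season's pups already born by the census date.
from math import erf, sqrt

def estimate_production(count_on_day, census_day, birth_mean, birth_sd):
    """count_on_day: pups counted at a single census.
    birth_mean, birth_sd: timing and spread of the (assumed Gaussian) birth curve, in days."""
    frac_born = 0.5 * (1 + erf((census_day - birth_mean) / (birth_sd * sqrt(2))))
    return count_on_day / frac_born

# e.g. 480 pups counted on day 30 of the season, births centred on day 25 with sd 10
print(round(estimate_production(480, 30, 25, 10)))
```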
Abstract:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from parallel corpora. However, for many language pairs (e.g. Malayalam-English), these are available only in very limited quantities. Therefore, for these language pairs a huge portion of phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated, and a match is looked up for the word variants in the phrase translation table of the translation model. A vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries and the target translations are filtered out. The filtered phrases are then structured, and the SMT translation model is extended by adding the OOV word with its new phrase translations. Manual evaluation shows that the amount of OOV words in the input is reduced considerably.
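An illustrative sketch of the described OOV pipeline; the helper functions stand in for a morphological analyser, an inflection generator, a phrase-table lookup and a target-side unigram-count filter, none of which are specified in the source:

```python
# Hypothetical OOV handling: split the OOV word into root and suffix, generate
# inflected variants, look them up in the phrase table, and keep the candidate
# translation with the highest target-side unigram count.
def translate_oov(oov_word, split_root_suffix, generate_inflections,
                  phrase_table, unigram_count):
    root, suffix = split_root_suffix(oov_word)           # pre-process the OOV word

    # Collect phrase-table translations of the root's inflected variants.
    candidates = []
    for variant in generate_inflections(root):
        candidates.extend(phrase_table.get(variant, []))

    if not candidates:
        return None                                       # leave the OOV untranslated

    # Vocabulary filter: keep the candidate whose target side is most frequent.
    best_root_translation = max(candidates, key=unigram_count)

    # Translations matching the suffix are looked up analogously.
    suffix_translations = phrase_table.get(suffix, [])
    return best_root_translation, suffix_translations
```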
Abstract:
The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system aims to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), a disease-related condition (HIV, heroin use, ...) or a specific behavior (driving a car without a license). Every case in such a registration system has a certain notification history, in that it might have been identified several times (at least once), which can be understood as a particular capture-recapture situation. Typically, cases that have never been listed on any occasion are left out, and it is this frequency one wants to estimate. In this paper, modelling concentrates on the counting distribution, i.e. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered, providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 completes the contribution.
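As the simplest instance of this idea (ignoring the mixture extension discussed above), the sketch below fits a zero-truncated Poisson to the counting distribution and corrects for the never-listed cases; the fixed-point iteration and the simulated data are illustrative, not the paper's EM-based fit:

```python
# Zero-truncated Poisson sketch: estimate the listing rate from the counts of
# observed cases (all >= 1) and apply a Horvitz-Thompson-style correction.
import numpy as np

def ztpoisson_population_size(counts, n_iter=200):
    """counts: array of listing counts for the observed cases (all >= 1)."""
    counts = np.asarray(counts, dtype=float)
    n, mean = len(counts), counts.mean()
    lam = mean                                   # starting value
    for _ in range(n_iter):                      # fixed-point iteration for the MLE:
        lam = mean * (1.0 - np.exp(-lam))        # E[X | X > 0] = lam / (1 - e^-lam)
    p_observed = 1.0 - np.exp(-lam)              # probability of being listed at least once
    return n / p_observed                        # estimated total population size

# Simulated check: 5000 cases listed Poisson(0.8) times; only those listed >= 1 time are observed.
x = np.random.default_rng(1).poisson(0.8, 5000)
observed = x[x > 0]
print(len(observed), round(ztpoisson_population_size(observed)))
```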
Abstract:
Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of constructing the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered, with some generalisations of Zelterman's estimator. All computations are done with CAMCR, a special software package developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems with using the mixture-model-based estimators are highlighted.
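For reference, a minimal sketch of the two classical estimators mentioned above, Chao's lower-bound estimator and Zelterman's estimator, computed from the frequencies of cases listed exactly once (f1) and exactly twice (f2); variable names are illustrative and the generalisations discussed in the paper are not included:

```python
# Chao and Zelterman population size estimators from the counting distribution.
import numpy as np

def chao_estimator(counts):
    """counts: listing counts of the observed cases (all >= 1); assumes f1, f2 > 0."""
    counts = np.asarray(counts)
    n = len(counts)
    f1, f2 = np.sum(counts == 1), np.sum(counts == 2)
    return n + f1**2 / (2.0 * f2)             # Chao's lower bound for the population size

def zelterman_estimator(counts):
    counts = np.asarray(counts)
    n = len(counts)
    f1, f2 = np.sum(counts == 1), np.sum(counts == 2)
    lam = 2.0 * f2 / f1                        # robust estimate of the Poisson parameter
    return n / (1.0 - np.exp(-lam))            # Horvitz-Thompson-style correction
```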