956 results for Models for count data


Relevance: 90.00%
Publisher:
Abstract:

Diabetes patients may face an unhealthy lifestyle, long-term treatment, and chronic complications. Reducing the hospitalization rate is a crucial problem for health care centers. This study combines the bagging method, with decision trees as base classifiers, and cost-sensitive analysis to classify diabetes patients. Real patient data collected from a regional hospital in Thailand were analyzed. Relevant factors were selected and used to construct base-classifier decision tree models that distinguish diabetes from non-diabetes patients. The bagging method was then applied to improve accuracy. Finally, asymmetric classification cost matrices were used to provide alternative models for diabetes data analysis.
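The ensemble idea in this abstract, bootstrap resampling plus majority voting over tree-based base classifiers, can be sketched in a few lines. This is a minimal illustration only: one-feature threshold "stumps" stand in for full decision trees, and the data and function names are invented, not from the study.

```python
import random

def bagging_predict(train, x, n_models=25, seed=0):
    """Minimal bagging sketch: fit a one-feature threshold 'stump' on each
    bootstrap resample and combine the base models by majority vote. The
    study uses full decision trees as base classifiers; a stump keeps the
    illustration short."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]   # bootstrap resample
        pos = [v for v, label in sample if label == 1]
        neg = [v for v, label in sample if label == 0]
        if not pos or not neg:
            continue  # degenerate resample: skip this base model
        # stump threshold: midpoint between the two class means
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        votes += 1 if x >= threshold else 0
    return 1 if votes > n_models / 2 else 0

# toy data: (feature value, class label)
train = [(1.0, 0), (1.2, 0), (1.1, 0), (2.8, 1), (3.1, 1), (2.9, 1)]
```

An asymmetric cost matrix, as in the study, would be mimicked here by lowering the vote fraction required to predict the costlier class.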

Relevance: 90.00%
Publisher:
Abstract:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means of addressing the RTM problem. This method requires information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003, along with traffic and geometric data, were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data were based on one-mile segments with homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of the data show that the relationships between crashes and traffic exposure are nonlinear: crashes increase with traffic exposure at an increasing rate. Four regression models, namely Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and the Akaike Information Criterion (AIC).
The NBRM was found to be appropriate for only one category, and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial model generally fits the data better than the Poisson model. In addition, the ZINB model gave the best fit for most roadway categories when the count data exhibited excess zeros and over-dispersion. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected, as traffic volume is only one of the many factors contributing to the overall crash experience, and because the SPFs are meant to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.
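The model-selection step, fitting competing count distributions and ranking them by AIC, can be illustrated in miniature. The sketch below compares a Poisson fit against a geometric fit (the simplest negative binomial case, which has a closed-form MLE); it is not the dissertation's actual NBRM/ZIP/ZINB machinery, and the data are invented.

```python
import math

def aic_poisson(data):
    """AIC of a Poisson fit; the MLE of the rate is the sample mean."""
    lam = sum(data) / len(data)
    ll = sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)
    return 2 * 1 - 2 * ll  # one fitted parameter

def aic_geometric(data):
    """AIC of a geometric fit on {0,1,2,...}: P(X=k) = (1-p)^k * p.
    The geometric is the negative binomial with shape 1, so it can absorb
    over-dispersion that the Poisson cannot."""
    p = 1.0 / (1.0 + sum(data) / len(data))  # closed-form MLE
    ll = sum(k * math.log(1 - p) + math.log(p) for k in data)
    return 2 * 1 - 2 * ll

# over-dispersed toy counts: variance well above the mean, many zeros
crashes = [0, 0, 0, 0, 1, 0, 9, 0, 0, 10]
```

On these over-dispersed counts the geometric fit attains the lower AIC, mirroring the dissertation's finding that Negative Binomial variants beat the plain Poisson.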

Relevance: 90.00%
Publisher:
Abstract:

My thesis examines fine-scale habitat use and movement patterns of age-1 Greenland cod (Gadus macrocephalus ogac) tracked using acoustic telemetry. Recent advances in tracking technologies such as GPS and acoustic telemetry have led to increasingly large and detailed datasets that present new opportunities for researchers to address fine-scale ecological questions regarding animal movement and spatial distribution. There is a growing demand for home range models that will not only work with massive quantities of autocorrelated data, but that can also exploit the added detail inherent in these high-resolution datasets. Most published home range studies use radio-telemetry or satellite data from terrestrial mammals or avian species, and most studies that evaluate the relative performance of home range models use simulated data. In Chapter 2, I used actual field-collected data from age-1 Greenland cod tracked with acoustic telemetry to evaluate the accuracy and precision of six home range models: minimum convex polygons, kernel densities with plug-in bandwidth selection and the reference bandwidth, adaptive local convex hulls, Brownian bridges, and dynamic Brownian bridges. I then applied the most appropriate model to two years (2010-2012) of tracking data collected from 82 tagged Greenland cod in Newman Sound, Newfoundland, Canada, to determine diel and seasonal differences in habitat use and movement patterns (Chapter 3). Little is known of juvenile cod ecology, so resolving these relationships will provide valuable insight into activity patterns, habitat use, and predator-prey dynamics, while filling a knowledge gap regarding the use of space by age-1 Greenland cod in a coastal nursery habitat. By doing so, my thesis demonstrates an appropriate technique for modelling the spatial use of fish from acoustic telemetry data that can be applied to high-resolution, high-frequency tracking datasets collected from mobile organisms in any environment.
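Of the six home range models compared, the minimum convex polygon is simple enough to sketch directly: the home range is the area of the convex hull of an animal's relocations. The implementation below (Andrew's monotone chain plus the shoelace formula) is a generic illustration, not the thesis code.

```python
def mcp_area(points):
    """Minimum convex polygon (MCP) home range: the area of the convex
    hull of the relocation points, via Andrew's monotone chain and the
    shoelace formula. A generic 100% MCP, with no outlier peeling."""
    pts = sorted(set(points))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    hull = lower[:-1] + upper[:-1]
    # shoelace formula for polygon area
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0
```

The interior relocations drop out of the hull, which is exactly the MCP's known weakness: one excursion can inflate the estimated range.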

Relevance: 90.00%
Publisher:
Abstract:

In time series analysis, the usual stochastic processes assume continuous marginal distributions and are generally not suitable for modelling count series, whose nonlinear characteristics pose statistical problems, particularly in parameter estimation. Appropriate methodologies for analysing and modelling series with discrete marginal distributions were therefore investigated. In this context, Al-Osh and Alzaid (1987) and McKenzie (1988) introduced the class of non-negative integer-valued autoregressive models, the INAR processes. These models have been treated frequently in the scientific literature over recent decades, as their importance for applications across many fields of knowledge has attracted considerable interest. In this work, after a brief review of time series and the classical methods for their analysis, we present the first-order non-negative integer-valued autoregressive model, INAR(1), and its extension to order p, together with their properties and several parameter estimation methods, namely the Yule-Walker method, Conditional Least Squares (CLS), Conditional Maximum Likelihood (CML), and Quasi-Maximum Likelihood (QML). We also present an automatic order-selection criterion for INAR models based on the corrected Akaike Information Criterion, AICc, one of the criteria used to determine the order of autoregressive (AR) models. Finally, the INAR methodology is applied to real count data from the maritime transport and insurance sectors of Cape Verde.
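The INAR(1) recursion and the Yule-Walker estimator described above can be sketched in a few lines. This is a generic illustration of the model class, not the thesis code, and the parameter values are invented.

```python
import math
import random

def _poisson(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_inar1(alpha, lam, n, seed=42):
    """INAR(1): X_t = alpha o X_{t-1} + eps_t, where 'o' is binomial
    thinning (each of the X_{t-1} units survives with probability alpha)
    and eps_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    x = [_poisson(rng, lam / (1 - alpha))]  # start near the stationary mean
    for _ in range(n - 1):
        survivors = sum(1 for _ in range(x[-1]) if rng.random() < alpha)
        x.append(survivors + _poisson(rng, lam))
    return x

def yule_walker_inar1(x):
    """Yule-Walker estimates: alpha-hat is the lag-1 sample autocorrelation;
    lambda-hat then follows from the stationary mean lam / (1 - alpha)."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((v - m) ** 2 for v in x)
    alpha_hat = num / den
    return alpha_hat, m * (1 - alpha_hat)

series = simulate_inar1(alpha=0.5, lam=2.0, n=4000)
alpha_hat, lam_hat = yule_walker_inar1(series)
```

With 4,000 simulated observations the Yule-Walker estimates land close to the true (0.5, 2.0), illustrating why the method is a convenient first estimator before CLS, CML, or QML refinements.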

Relevance: 90.00%
Publisher:
Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance: 90.00%
Publisher:
Abstract:

Presentation at the CRIS2016 conference in St Andrews, June 10, 2016

Relevance: 90.00%
Publisher:
Abstract:

Leafy greens are an essential part of a healthy diet. Because of their health benefits, production and consumption of leafy greens have increased considerably in the U.S. over the last few decades. However, leafy greens have also been associated with a large number of foodborne disease outbreaks in recent years. The overall goal of this dissertation was to use the current knowledge of predictive models and available data to understand the growth, survival, and death of enteric pathogens in leafy greens at pre- and post-harvest levels. Temperature plays a major role in the growth and death of bacteria in foods. A growth-death model was developed for Salmonella and Listeria monocytogenes in leafy greens under the varying temperature conditions typically encountered in the supply chain. The developed growth-death models were validated using experimental dynamic time-temperature profiles available in the literature. Furthermore, these growth-death models for Salmonella and Listeria monocytogenes, and a similar model for E. coli O157:H7, were used to predict the growth of these pathogens in leafy greens during transportation without temperature control. Refrigerating leafy greens both extends their shelf-life and mitigates bacterial growth, but storing foods at lower temperatures increases the storage cost. Nonlinear programming was used to optimize the storage temperature of leafy greens in the supply chain, minimizing the storage cost while maintaining the desired levels of sensory quality and microbial safety. Most of the outbreaks associated with consumption of leafy greens contaminated with E. coli O157:H7 in the U.S. have occurred during July-November. A dynamic system model consisting of subsystems and inputs (soil, irrigation, cattle, wildlife, and rainfall), simulating a farm in a major leafy-greens-producing area of California, was developed.
The model was simulated incorporating the events of planting, irrigation, harvesting, ground preparation for the new crop, contamination of soil and plants, and survival of E. coli O157:H7. The predictions of this system model agree with the observed seasonality of outbreaks. This dissertation thus applied growth, survival, and death models of enteric pathogens in leafy greens at both the production and supply-chain stages.
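The growth side of such a model, a temperature-dependent rate integrated over a supply-chain time-temperature profile, can be sketched with the classical Ratkowsky square-root secondary model. The parameter values below are illustrative placeholders, not the values fitted in the dissertation.

```python
def sqrt_model_rate(temp_c, b=0.023, t_min=1.3):
    """Ratkowsky square-root secondary model: sqrt(mu) = b * (T - T_min),
    giving a growth rate mu in log10 CFU/h. b and T_min are illustrative
    placeholders, not fitted values."""
    if temp_c <= t_min:
        return 0.0
    return (b * (temp_c - t_min)) ** 2

def grow(log10_n0, profile):
    """Integrate growth over a piecewise-constant time-temperature profile,
    given as (hours, temp_C) steps; returns the final log10 count."""
    log_n = log10_n0
    for hours, temp_c in profile:
        log_n += sqrt_model_rate(temp_c) * hours
    return log_n

# intact cold chain vs a 2 h temperature-abuse period at 25 C
cold  = grow(2.0, [(8, 4)])
abuse = grow(2.0, [(6, 4), (2, 25)])
```

Even this toy profile shows the supply-chain trade-off the dissertation optimizes: two hours without temperature control add far more growth than eight refrigerated hours.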

Relevance: 90.00%
Publisher:
Abstract:

This paper is concerned with SIR (susceptible-infected-removed) household epidemic models in which the infection response may be either mild or severe, with the type of response also affecting the infectiousness of an individual. Two different models are analysed. In the first model, the infection status of an individual is predetermined, perhaps due to partial immunity, and in the second, the infection status of an individual depends on the infection status of its infector and on whether the individual was infected by a within- or between-household contact. The first scenario may be modelled using a multitype household epidemic model, and the second scenario by a model we denote by the infector-dependent-severity household epidemic model. Large population results of the two models are derived, with the focus being on the distribution of the total numbers of mild and severe cases in a typical household, of any given size, in the event that the epidemic becomes established. The aim of the paper is to investigate whether it is possible to determine which of the two underlying explanations is causing the varying response when given final size household outbreak data containing mild and severe cases. We conduct numerical studies which show that, given data on sufficiently many households, it is generally possible to discriminate between the two models by comparing the Kullback-Leibler divergence for the two fitted models to these data.
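The discrimination step, comparing the Kullback-Leibler divergence of each fitted model from the household outbreak data, reduces to a small computation once the final-size distributions are in hand. The distributions below are invented toy numbers, not output of the paper's models.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) over a common finite outcome space, e.g. the joint
    counts of (mild, severe) cases in a household of a given size.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# empirical final-size distribution and two fitted candidates (toy numbers)
p_data      = [0.50, 0.30, 0.20]
q_multitype = [0.48, 0.32, 0.20]   # multitype household model fit
q_ids       = [0.70, 0.20, 0.10]   # infector-dependent-severity model fit

kl_a = kl_divergence(p_data, q_multitype)
kl_b = kl_divergence(p_data, q_ids)
best = "multitype" if kl_a < kl_b else "ids"
```

Whichever fitted model sits closer to the empirical distribution in KL divergence is preferred, which is the decision rule the paper's numerical studies exercise at scale.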

Relevance: 90.00%
Publisher:
Abstract:

Contingent protection has grown to become an important trade-restricting device. In the European Union, protection instruments such as antidumping are used extensively. This paper analyses whether macroeconomic pressures help explain the variations in the intensity of antidumping protectionism in the EU. The empirical analysis uses count data models, applying various specification tests to derive the most appropriate specification. Our results suggest that filing activity is inversely related to macroeconomic conditions. Moreover, they confirm existing evidence for the US suggesting that domestic macroeconomic pressures are a more important determinant of contingent protection policy than external pressures.
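A count-data specification of the kind used here, antidumping filings regressed on macroeconomic conditions with a log link, can be fitted by Newton-Raphson in a few lines. The single-regressor sketch below uses invented data and does not reproduce the paper's specification tests (e.g. for over-dispersion).

```python
import math

def poisson_regression_1d(x, y, iters=50):
    """Newton-Raphson MLE for the Poisson regression log E[y_i] = b0 + b1*x_i
    (one regressor). Starts at the null model b0 = log(mean(y)), b1 = 0."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))               # score wrt b0
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x)) # score wrt b1
        h00 = sum(mu)                                            # Fisher information
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# toy data: filings fall as the macro indicator (e.g. GDP growth) rises
growth  = [-2.0, -1.0, 0.0, 1.0, 2.0]
filings = [9, 6, 4, 3, 2]
b0, b1 = poisson_regression_1d(growth, filings)
```

A negative fitted slope is the toy analogue of the paper's finding that filing activity moves inversely with macroeconomic conditions.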

Relevance: 90.00%
Publisher:
Abstract:

Species occurrence and abundance models are important tools that can be used in biodiversity conservation, and can be applied to predict or plan actions needed to mitigate the environmental impacts of hydropower dams. In this study our objectives were: (i) to model the occurrence and abundance of threatened plant species, (ii) to verify the relationship between predicted occurrence and true abundance, and (iii) to assess whether models based on abundance are more effective in predicting species occurrence than those based on presence–absence data. Individual representatives of nine species were counted within 388 randomly georeferenced plots (10 m × 50 m) around the Barra Grande hydropower dam reservoir in southern Brazil. We modelled their relationship with 15 environmental variables using both occurrence (Generalised Linear Models) and abundance data (Hurdle and Zero-Inflated models). Overall, occurrence models were more accurate than abundance models. For all species, observed abundance was significantly, although not strongly, correlated with the probability of occurrence. This correlation lost significance when zero-abundance (absence) sites were excluded from analysis, but only when this entailed a substantial drop in sample size. The same occurred when analysing relationships between abundance and probability of occurrence from previously published studies on a range of different species, suggesting that future studies could potentially use probability of occurrence as an approximate indicator of abundance when the latter is not possible to obtain. This possibility might, however, depend on life history traits of the species in question, with some traits favouring a relationship between occurrence and abundance. 
Reconstructing species abundance patterns from occurrence could be an important tool for conservation planning and the management of threatened species, allowing scientists to indicate the best areas for collection and reintroduction of plant germplasm or choose conservation areas most likely to maintain viable populations.
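The hurdle idea behind the abundance models can be sketched compactly: a zero hurdle (occurrence) plus a zero-truncated Poisson for the positive counts. The intercept-only fit below is a hedged illustration of the structure, not the study's covariate-based Hurdle and Zero-Inflated models.

```python
import math

def fit_hurdle_poisson(counts):
    """Intercept-only hurdle-Poisson fit. pi_hat is the observed zero
    fraction (the occurrence 'hurdle'); lambda_hat solves the
    zero-truncated Poisson mean equation m = lam / (1 - exp(-lam))
    for the mean m of the positive counts, by bisection."""
    n = len(counts)
    positives = [c for c in counts if c > 0]
    pi_hat = (n - len(positives)) / n
    m = sum(positives) / len(positives)
    # lam/(1 - e^-lam) is increasing, tends to 1 as lam -> 0, exceeds m at lam = m
    lo, hi = 1e-9, m
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < m:
            lo = mid
        else:
            hi = mid
    return pi_hat, (lo + hi) / 2

# toy plot counts: many empty plots, a few occupied ones
pi_hat, lam_hat = fit_hurdle_poisson([0, 0, 0, 0, 0, 0, 1, 1, 2, 3])
```

Separating the zero process from the positive-count process is exactly what lets hurdle models cope with the excess absences typical of threatened-species plot data.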

Relevance: 90.00%
Publisher:
Abstract:

Knowledge of the geographical distribution of timber tree species in the Amazon is still scarce. This is especially true at the local level, thereby limiting natural resource management actions. Forest inventories are key sources of information on the occurrence of such species. However, areas with approved forest management plans are mostly located near access roads and the main industrial centers. The present study aimed to assess the spatial scale effects of forest inventories used as sources of occurrence data in the interpolation of potential species distribution models. The occurrence data of a group of six forest tree species were divided into four geographical areas during the modeling process. Several sampling schemes were then tested applying the maximum entropy algorithm, using the following predictor variables: elevation, slope, exposure, normalized difference vegetation index (NDVI) and height above the nearest drainage (HAND). The results revealed that using occurrence data from only one geographical area with unique environmental characteristics increased both model overfitting to input data and omission error rates. The use of a diagonal systematic sampling scheme and lower threshold values led to improved model performance. Forest inventories may be used to predict areas with a high probability of species occurrence, provided they are located in forest management plan regions representative of the environmental range of the model projection area.
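Among the predictor variables listed, NDVI has a simple closed form worth stating: the normalized difference of near-infrared and red reflectances. A one-line generic sketch, not tied to the study's imagery:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).
    Ranges over [-1, 1]; dense green vegetation pushes it towards 1."""
    return (nir - red) / (nir + red)
```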

Relevance: 90.00%
Publisher:
Abstract:

In silico methods, such as musculoskeletal modelling, may aid the selection of the optimal surgical treatment for highly complex pathologies such as scoliosis. Many musculoskeletal models use a generic, simplified representation of the intervertebral joints, which are fundamental to the flexibility of the spine. Therefore, to model and simulate the spine, a suitable representation of the intervertebral joint is crucial. The aim of this PhD was to characterise specimen-specific models of the intervertebral joint for multi-body models from experimental datasets. First, the project investigated the characterisation of a specimen-specific lumped-parameter model of the intervertebral joint from an experimental dataset of a four-vertebra lumbar spine segment. Specimen-specific stiffnesses were determined with an optimisation method, and the sensitivity of the parameters to the joint pose was investigated. Results showed that the stiffnesses and the predicted motions were highly dependent on the joint pose. Following the first study, the method was reapplied to another dataset comprising six complete lumbar spine segments under three different loading conditions. Specimen-specific stiffnesses uniform across joint levels and level-dependent stiffnesses were calculated by optimisation. The specimen-specific stiffnesses showed high inter-specimen variability and were also specific to the loading condition. Level-dependent stiffnesses are necessary for accurate kinematic predictions and should be determined independently of one another. Second, a framework to create subject-specific musculoskeletal models of individuals with severe scoliosis was developed, resulting in a robust codified pipeline for creating subject-specific, severely scoliotic spine models from CT data. In conclusion, this thesis showed that specimen-specific intervertebral joint stiffnesses are highly sensitive to the joint pose definition, and demonstrated the importance of level-dependent optimisation.
Further, an open-source codified pipeline to create patient-specific scoliotic spine models from CT data was released. These studies and this pipeline can facilitate the specimen-specific characterisation of the scoliotic intervertebral joint and its incorporation into scoliotic musculoskeletal spine models.
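The stiffness-optimisation step can be reduced to its scalar essence: choose the stiffness that minimises the squared error between measured moments and a linear moment-rotation response. The thesis optimises a full lumped-parameter, multi-degree-of-freedom joint against experimental kinematics; the one-DOF closed form below is only the core idea, with invented sample numbers.

```python
def fit_joint_stiffness(rotations_rad, moments_nm):
    """Least-squares stiffness for the linear model M = k * theta:
    k = sum(M_i * theta_i) / sum(theta_i^2), the closed-form minimiser
    of sum (M_i - k * theta_i)^2. One rotational DOF only."""
    num = sum(m * t for m, t in zip(moments_nm, rotations_rad))
    den = sum(t * t for t in rotations_rad)
    return num / den

# invented moment-rotation pairs from one loading condition (rad, Nm)
k = fit_joint_stiffness([0.01, 0.02, 0.03], [0.5, 1.0, 1.5])
```

Level-dependent fitting, as the thesis recommends, amounts to solving this kind of problem separately at each joint level rather than forcing one shared stiffness.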

Relevance: 80.00%
Publisher:
Abstract:

Size distributions in woody plant populations have been used to assess their regeneration status, assuming that size structures with reverse-J shapes represent stable populations. We present an empirical approach to this issue using five woody species from the Cerrado. Considering count data for all plants of these five species over a 12-year period, we analyzed size distribution by: a) plotting frequency distributions and their adjustment to the negative exponential curve and b) calculating the Gini coefficient. To look for a relationship between size structure and future trends, we considered the size structures from the first census year. We analyzed changes in number over time and performed a simple population viability analysis, which gives the mean population growth rate, its variance and the probability of extinction in a given time period. Frequency distributions and the Gini coefficient were not able to predict future trends in population numbers. We recommend that managers should not use measures of size structure as a basis for management decisions without applying more appropriate demographic studies.
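The Gini coefficient used alongside the frequency distributions has a compact computable form. A generic sketch of the usual sorted-data formula, not the study's code:

```python
def gini(sizes):
    """Gini coefficient of a plant-size distribution via the sorted-data
    formula G = sum_i (2i - n - 1) * x_(i) / (n * sum x). 0 means all
    individuals are the same size; values near 1 mean size is concentrated
    in a few large individuals."""
    x = sorted(sizes)
    n, total = len(x), sum(x)
    return sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1)) / (n * total)
```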

Relevance: 80.00%
Publisher:
Abstract:

We propose a physically transparent analytic model of astrophysical S factors as a function of the center-of-mass energy E of colliding nuclei (below and above the Coulomb barrier) for nonresonant fusion reactions. For any given reaction, the S(E) model contains four parameters [two of which approximate the barrier potential, U(r)]. They are easily interpolated along many reactions involving isotopes of the same elements; they give accurate practical expressions for S(E) with only a few input parameters for many reactions. The model reproduces the suppression of S(E) at low energies (of astrophysical importance) due to the shape of the low-r wing of U(r). The model can be used to reconstruct U(r) from computed or measured S(E). For illustration, we parametrize our recent calculations of S(E) (using the São Paulo potential and the barrier penetration formalism) for 946 reactions involving stable and unstable isotopes of C, O, Ne, and Mg (with nine parameters for all reactions involving many isotopes of the same elements, e.g., C+O). In addition, we analyze the astrophysically important ¹²C+¹²C reaction, compare theoretical models with experimental data, and discuss the problem of interpolating reliably known S(E) values to low energies (E ≲ 2-3 MeV).
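For reference, the S factor itself is defined by factoring the Coulomb penetration out of the cross section: S(E) = σ(E) E exp(2πη), with Sommerfeld parameter η = Z₁Z₂α√(μc²/2E). The helper below implements that standard definition with approximate constants; it is not the paper's parametrised S(E) model.

```python
import math

ALPHA = 7.2973525693e-3   # fine-structure constant
AMU_MEV = 931.49410242    # atomic mass unit, MeV/c^2

def s_factor(e_cm_mev, sigma_mb, z1, z2, a1, a2):
    """Astrophysical S factor S(E) = sigma(E) * E * exp(2*pi*eta), with
    Sommerfeld parameter eta = Z1*Z2*alpha*sqrt(mu c^2 / (2E)).
    E is the centre-of-mass energy in MeV and sigma is in mb, so S is
    returned in MeV*mb."""
    mu_c2 = a1 * a2 / (a1 + a2) * AMU_MEV          # reduced mass, MeV/c^2
    eta = z1 * z2 * ALPHA * math.sqrt(mu_c2 / (2.0 * e_cm_mev))
    return sigma_mb * e_cm_mev * math.exp(2.0 * math.pi * eta)
```

At low energies the exponential factor is enormous (of order 10²⁷ for ¹²C+¹²C near 2 MeV), which is precisely why the slowly varying S(E), rather than σ(E) itself, is what gets modelled and interpolated.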