963 resultados para Count data models


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Cette thèse présente des méthodes de traitement de données de comptage en particulier et des données discrètes en général. Il s'inscrit dans le cadre d'un projet stratégique du CRNSG, nommé CC-Bio, dont l'objectif est d'évaluer l'impact des changements climatiques sur la répartition des espèces animales et végétales. Après une brève introduction aux notions de biogéographie et aux modèles linéaires mixtes généralisés aux chapitres 1 et 2 respectivement, ma thèse s'articulera autour de trois idées majeures. Premièrement, nous introduisons au chapitre 3 une nouvelle forme de distribution dont les composantes ont pour distributions marginales des lois de Poisson ou des lois de Skellam. Cette nouvelle spécification permet d'incorporer de l'information pertinente sur la nature des corrélations entre toutes les composantes. De plus, nous présentons certaines propriétés de ladite distribution. Contrairement à la distribution multidimensionnelle de Poisson qu'elle généralise, celle-ci permet de traiter les variables avec des corrélations positives et/ou négatives. Une simulation permet d'illustrer les méthodes d'estimation dans le cas bidimensionnel. Les résultats obtenus par les méthodes bayésiennes par les chaînes de Markov par Monte Carlo (CMMC) indiquent un biais relatif assez faible de moins de 5% pour les coefficients de régression des moyennes contrairement à ceux du terme de covariance qui semblent un peu plus volatils. Deuxièmement, le chapitre 4 présente une extension de la régression multidimensionnelle de Poisson avec des effets aléatoires ayant une densité gamma. En effet, conscients du fait que les données d'abondance des espèces présentent une forte dispersion, ce qui rendrait fallacieux les estimateurs et écarts types obtenus, nous privilégions une approche basée sur l'intégration par Monte Carlo grâce à l'échantillonnage préférentiel. L'approche demeure la même qu'au chapitre précédent, c'est-à-dire que l'idée est de simuler des variables latentes indépendantes et de se retrouver dans le cadre d'un modèle linéaire mixte généralisé (GLMM) conventionnel avec des effets aléatoires de densité gamma. Même si l'hypothèse d'une connaissance a priori des paramètres de dispersion semble trop forte, une analyse de sensibilité basée sur la qualité de l'ajustement permet de démontrer la robustesse de notre méthode. Troisièmement, dans le dernier chapitre, nous nous intéressons à la définition et à la construction d'une mesure de concordance donc de corrélation pour les données augmentées en zéro par la modélisation de copules gaussiennes. Contrairement au tau de Kendall dont les valeurs se situent dans un intervalle dont les bornes varient selon la fréquence d'observations d'égalité entre les paires, cette mesure a pour avantage de prendre ses valeurs sur (-1;1). Initialement introduite pour modéliser les corrélations entre des variables continues, son extension au cas discret implique certaines restrictions. En effet, la nouvelle mesure pourrait être interprétée comme la corrélation entre les variables aléatoires continues dont la discrétisation constitue nos observations discrètes non négatives. Deux méthodes d'estimation des modèles augmentés en zéro seront présentées dans les contextes fréquentiste et bayésien basées respectivement sur le maximum de vraisemblance et l'intégration de Gauss-Hermite. Enfin, une étude de simulation permet de montrer la robustesse et les limites de notre approche.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Les données comptées (count data) possèdent des distributions ayant des caractéristiques particulières comme la non-normalité, l’hétérogénéité des variances ainsi qu’un nombre important de zéros. Il est donc nécessaire d’utiliser les modèles appropriés afin d’obtenir des résultats non biaisés. Ce mémoire compare quatre modèles d’analyse pouvant être utilisés pour les données comptées : le modèle de Poisson, le modèle binomial négatif, le modèle de Poisson avec inflation du zéro et le modèle binomial négatif avec inflation du zéro. À des fins de comparaisons, la prédiction de la proportion du zéro, la confirmation ou l’infirmation des différentes hypothèses ainsi que la prédiction des moyennes furent utilisées afin de déterminer l’adéquation des différents modèles. Pour ce faire, le nombre d’arrestations des membres de gangs de rue sur le territoire de Montréal fut utilisé pour la période de 2005 à 2007. L’échantillon est composé de 470 hommes, âgés de 18 à 59 ans. Au terme des analyses, le modèle le plus adéquat est le modèle binomial négatif puisque celui-ci produit des résultats significatifs, s’adapte bien aux données observées et produit une proportion de zéro très similaire à celle observée.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The log-ratio methodology makes available powerful tools for analyzing compositional data. Nevertheless, the use of this methodology is only possible for those data sets without null values. Consequently, in those data sets where the zeros are present, a previous treatment becomes necessary. Last advances in the treatment of compositional zeros have been centered especially in the zeros of structural nature and in the rounded zeros. These tools do not contemplate the particular case of count compositional data sets with null values. In this work we deal with \count zeros" and we introduce a treatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichlet probability distribution as a prior and we estimate the posterior probabilities. Then we apply a multiplicative modi¯cation for the non-zero values. We present a case study where this new methodology is applied. Key words: count data, multiplicative replacement, composition, log-ratio analysis

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Even though antenatal care is universally regarded as important, determinants of demand for antenatal care have not been widely studied. Evidence concerning which and how socioeconomic conditions influence whether a pregnant woman attends or not at least one antenatal consultation or how these factors affect the absences to antenatal consultations is very limited. In order to generate this evidence, a two-stage analysis was performed with data from the Demographic and Health Survey carried out by Profamilia in Colombia during 2005. The first stage was run as a logit model showing the marginal effects on the probability of attending the first visit and an ordinary least squares model was performed for the second stage. It was found that mothers living in the pacific region as well as young mothers seem to have a lower probability of attending the first visit but these factors are not related to the number of absences to antenatal consultation once the first visit has been achieved. The effect of health insurance was surprising because of the differing effects that the health insurers showed. Some familiar and personal conditions such as willingness to have the last children and number of previous children, demonstrated to be important in the determination of demand. The effect of mother’s educational attainment was proved as important whereas the father’s educational achievement was not. This paper provides some elements for policy making in order to increase the demand inducement of antenatal care, as well as stimulating research on demand for specific issues on health.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Annual loss of nests by industrial (nonwoodlot) forest harvesting in Canada was estimated using two avian point-count data sources: (1) the Boreal Avian Monitoring Project (BAM) dataset for provinces operating in this biome and (2) available data summarized for the major (nonboreal) forest regions of British Columbia. Accounting for uncertainty in the proportion of harvest occurring during the breeding season and in avian nesting densities, our estimate ranges from 616 thousand to 2.09 million nests. Estimates of the impact on numbers of individuals recruited into the adult breeding population were made based on the application of survivorship estimates at various stages of the life cycle. Future improvements to this estimate are expected as better and more extensive avian breeding pair density estimates become available and as provincial forestry statistics become more refined, spatially and temporally. The effect of incidental take due to forestry is not uniform and is disproportionately centered in the southern boreal. Those species whose ranges occur primarily in these regions are most at risk for industrial forestry in general and for incidental take in particular. Refinements to the nest loss estimate for industrial forestry in Canada will be achieved primarily through the provision of more accurate estimates of the area of forest harvested annually during the breeding season stratified by forest type and Bird Conservation Region (BCR). A better understanding of survivorship among life-history stages for forest birds would also allow for better modeling of the effect of nest loss on adult recruitment. Finally, models are needed to project legacy effects of forest harvesting on avian populations that take into account forest succession and accompanying cumulative effects of landscape change.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper analyses the cut flower market as an example of an invasion pathway along which species of non-indigenous plant pests can travel to reach new areas. The paper examines the probability of pest detection by assessing information on pest detection and detection effort associated with the import of cut flowers. We test the link between the probability of plant pest arrivals as a precursor to potential invasion, and volume of traded flowers using count data regression models. The analysis is applied to the UK import of specific genera of cut flowers form Kenya between 1996 and 2004. There is a link between pest detection and the Genus of cut flower imported. Hence, pest detection efforts should focus on identifying and targeting those imported plants with a high risk of carrying pest species. For most of the plants studied efforts allocated to inspection have a significant influence on the probabilty of pest detction. However, by better targetting inspection efforts, it is shown that plant inspection effort could be reduced without increasing the risk of pest entry. Similarly, for most of the plants analysed, an increase in volume traded will not necessarily lead to an increase in the number of pests entering the UK. For some species, such as conclude that analysis at the rank of plant Genus is important both to understand the effectiveness of plant pest detection efforts and consequently to manage the risk of introduction of non-indigenous species.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We describe ncWMS, an implementation of the Open Geospatial Consortium’s Web Map Service (WMS) specification for multidimensional gridded environmental data. ncWMS can read data in a large number of common scientific data formats – notably the NetCDF format with the Climate and Forecast conventions – then efficiently generate map imagery in thousands of different coordinate reference systems. It is designed to require minimal configuration from the system administrator and, when used in conjunction with a suitable client tool, provides end users with an interactive means for visualizing data without the need to download large files or interpret complex metadata. It is also used as a “bridging” tool providing interoperability between the environmental science community and users of geographic information systems. ncWMS implements a number of extensions to the WMS standard in order to fulfil some common scientific requirements, including the ability to generate plots representing timeseries and vertical sections. We discuss these extensions and their impact upon present and future interoperability. We discuss the conceptual mapping between the WMS data model and the data models used by gridded data formats, highlighting areas in which the mapping is incomplete or ambiguous. We discuss the architecture of the system and particular technical innovations of note, including the algorithms used for fast data reading and image generation. ncWMS has been widely adopted within the environmental data community and we discuss some of the ways in which the software is integrated within data infrastructures and portals.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The humpback whale (Megaptera novaeangliae) population that uses Abrolhos Bank, off the east coast of Brazil as a breeding ground is increasing. To describe temporal changes in the relative abundance of humpback whales around Abrolhos, seven years (1998-2004) of whale count data were collected during July through to November. During one-hour-scans, observers determined group size within 9.3 km (5 n.m.) of a land-based observing station. A total Of 930 scans, comprising 7996 sightings of adults and 2044 calves were analysed using generalized linear models that included variables for time of day, day of the season, years and two-way interactions as possible predictors. The pattern observed was the gradual build-up and decline in whale counts within seasons. Patterns and peaks of adult and calf counts varied among years. Although fluctuation was observed, there was generally an increasing trend in adult counts among years. Calf counts increased only in 2004. These fluctuations may have been caused by some environmental conditions in humpback whales` summering grounds and also by changes in spatial-temporal concentrations in Abrolhos Bank. The general pattern observed within the study area mirrored what was observed in the whole Abrolhos Bank. Knowledge of the consistency with which humpback whales use this important nursing area should prove beneficial for designing future monitoring programmes especially related to whale watching activities around Abrolhos Archipelago.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

HydroShare is an online, collaborative system being developed for open sharing of hydrologic data and models. The goal of HydroShare is to enable scientists to easily discover and access hydrologic data and models, retrieve them to their desktop or perform analyses in a distributed computing environment that may include grid, cloud or high performance computing model instances as necessary. Scientists may also publish outcomes (data, results or models) into HydroShare, using the system as a collaboration platform for sharing data, models and analyses. HydroShare is expanding the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated, creating new capability to share models and model components, and taking advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. One of the fundamental concepts in HydroShare is that of a Resource. All content is represented using a Resource Data Model that separates system and science metadata and has elements common to all resources as well as elements specific to the types of resources HydroShare will support. These will include different data types used in the hydrology community and models and workflows that require metadata on execution functionality. The HydroShare web interface and social media functions are being developed using the Drupal content management system. A geospatial visualization and analysis component enables searching, visualizing, and analyzing geographic datasets. The integrated Rule-Oriented Data System (iRODS) is being used to manage federated data content and perform rule-based background actions on data and model resources, including parsing to generate metadata catalog information and the execution of models and workflows. This presentation will introduce the HydroShare functionality developed to date, describe key elements of the Resource Data Model and outline the roadmap for future development.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Physical activity has been scientifically discussed as fundamental in the process of healthy ageing. Hence, this study aimed at determining the factors that influence older people to perform physical activities. The complete IPAQ (International Physical Activity Questionnaire) was applied to a population-based sample consisting of 364 elderly persons in the city of Botucatu, São Paulo, Brazil. Days of physical activity performed by the older people were considered by taking into account household and leisure activities. Models for count data were fitted by including socio-demographic variables as well as those related to life satisfaction. It was shown that housework physical-activity performance is associated with female, who predominantly showed to be more active in all levels. Male seemed to be more predisposed to perform lighter recreation, sports and leisure-time physical activities, such as walking. Additionally, poor schooling showed to be decisive for not performing physical activities both at home and during leisure.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The purpose of this paper is to develop a Bayesian analysis for the right-censored survival data when immune or cured individuals may be present in the population from which the data is taken. In our approach the number of competing causes of the event of interest follows the Conway-Maxwell-Poisson distribution which generalizes the Poisson distribution. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the proposed model. Also, some discussions on the model selection and an illustration with a real data set are considered.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this paper, we propose a random intercept Poisson model in which the random effect is assumed to follow a generalized log-gamma (GLG) distribution. This random effect accommodates (or captures) the overdispersion in the counts and induces within-cluster correlation. We derive the first two moments for the marginal distribution as well as the intraclass correlation. Even though numerical integration methods are, in general, required for deriving the marginal models, we obtain the multivariate negative binomial model from a particular parameter setting of the hierarchical model. An iterative process is derived for obtaining the maximum likelihood estimates for the parameters in the multivariate negative binomial model. Residual analysis is proposed and two applications with real data are given for illustration. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Background The use of the knowledge produced by sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however it lacks supporting representation of clinical and socio-demographic information. Results We have implemented an extension of Chado – the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and data integration process; and web interface level, to allow interaction between the user and the system. The clinical module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. We implemented the IPTrans tool that is a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it in the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions Open-source computational solutions currently available for translational science does not have a model to represent biomolecular information and also are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patient’s clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed in http://dcm.ffclrp.usp.br/caib/pg=iptrans webcite.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.