956 resultados para Models for count data
Resumo:
On-farm records are essential for managing mastitis in dairy herds. Mastitis records are a useful tool for caring for an individual cow, to monitor compliance of farm personnel working with groups of animals, to understand the epidemiology of mastitis in the herd, to ensure responsible drug utilization, and to document accountability in care of the cow. Herds have become larger and more people are involved with individual animal care. This article describes a records plan that can be used to monitor mastitis at the herd level, aid in decision-making processes for individual cows, and improve drug use on dairy herds.
Resumo:
In this paper, we propose nonlinear elliptical models for correlated data with heteroscedastic and/or autoregressive structures. Our aim is to extend the models proposed by Russo et al. [22] by considering a more sophisticated scale structure to deal with variations in data dispersion and/or a possible autocorrelation among measurements taken throughout the same experimental unit. Moreover, to avoid the possible influence of outlying observations or to take into account the non-normal symmetric tails of the data, we assume elliptical contours for the joint distribution of random effects and errors, which allows us to attribute different weights to the observations. We propose an iterative algorithm to obtain the maximum-likelihood estimates for the parameters and derive the local influence curvatures for some specific perturbation schemes. The motivation for this work comes from a pharmacokinetic indomethacin data set, which was analysed previously by Bocheng and Xuping [1] under normality.
Resumo:
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretestposttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses.
Resumo:
The recent advent of Next-generation sequencing technologies has revolutionized the way of analyzing the genome. This innovation allows to get deeper information at a lower cost and in less time, and provides data that are discrete measurements. One of the most important applications with these data is the differential analysis, that is investigating if one gene exhibit a different expression level in correspondence of two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim will be statistical testing and for modeling these data the Negative Binomial distribution is considered the most adequate one especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because few information are usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it is the best one in reaching the nominal value for the first-type error, while keeping elevate power. The method is finally illustrated on prostate cancer RNA-seq data.
Resumo:
Chlamydia trachomatis is the most common bacterial sexually transmitted infection (STI) in many developed countries. The highest prevalence rates are found among young adults who have frequent partner change rates. Three published individual-based models have incorporated a detailed description of age-specific sexual behaviour in order to quantify the transmission of C. trachomatis in the population and to assess the impact of screening interventions. Owing to varying assumptions about sexual partnership formation and dissolution and the great uncertainty about critical parameters, such models show conflicting results about the impact of preventive interventions. Here, we perform a detailed evaluation of these models by comparing the partnership formation and dissolution dynamics with data from Natsal 2000, a population-based probability sample survey of sexual attitudes and lifestyles in Britain. The data also allow us to describe the dispersion of C. trachomatis infections as a function of sexual behaviour, using the Gini coefficient. We suggest that the Gini coefficient is a useful measure for calibrating infectious disease models that include risk structure and highlight the need to estimate this measure for other STIs.
Resumo:
Riparian zones are dynamic, transitional ecosystems between aquatic and terrestrial ecosystems with well defined vegetation and soil characteristics. Development of an all-encompassing definition for riparian ecotones, because of their high variability, is challenging. However, there are two primary factors that all riparian ecotones are dependent on: the watercourse and its associated floodplain. Previous approaches to riparian boundary delineation have utilized fixed width buffers, but this methodology has proven to be inadequate as it only takes the watercourse into consideration and ignores critical geomorphology, associated vegetation and soil characteristics. Our approach offers advantages over other previously used methods by utilizing: the geospatial modeling capabilities of ArcMap GIS; a better sampling technique along the water course that can distinguish the 50-year flood plain, which is the optimal hydrologic descriptor of riparian ecotones; the Soil Survey Database (SSURGO) and National Wetland Inventory (NWI) databases to distinguish contiguous areas beyond the 50-year plain; and land use/cover characteristics associated with the delineated riparian zones. The model utilizes spatial data readily available from Federal and State agencies and geospatial clearinghouses. An accuracy assessment was performed to assess the impact of varying the 50-year flood height, changing the DEM spatial resolution (1, 3, 5 and 10m), and positional inaccuracies with the National Hydrography Dataset (NHD) streams layer on the boundary placement of the delineated variable width riparian ecotones area. The result of this study is a robust and automated GIS based model attached to ESRI ArcMap software to delineate and classify variable-width riparian ecotones.
Resumo:
This paper is concerned with the analysis of zero-inflated count data when time of exposure varies. It proposes a modified zero-inflated count data model where the probability of an extra zero is derived from an underlying duration model with Weibull hazard rate. The new model is compared to the standard Poisson model with logit zero inflation in an application to the effect of treatment with thiotepa on the number of new bladder tumors.