858 results for Zero-inflated models, Poisson distribution, Negative binomial distribution, Bernoulli trials, Safety performance functions, Small area analysis
Abstract:
Do siblings of centenarians tend to have longer life spans? To answer this question, the life spans of 184 siblings of 42 centenarians were evaluated. Two important issues were addressed in analyzing the sibling data. First, a standard must be established against which the life spans of the 184 siblings are compared. In this report, an external reference population is constructed from U.S. life tables. Its estimated mortality rates are treated as known baseline hazards, relative to which the relative mortality of the siblings is estimated. Second, standard survival models that assume independent observations are invalid when life spans are correlated within families, because they underestimate the true variance. Three approaches that allow for within-family correlation are illustrated. First, the cumulative relative excess mortality between the siblings and their comparison group is calculated and used as an effective graphical tool, along with the product-limit estimator of the survival function. The variance estimator of the cumulative relative excess mortality is adjusted for potential within-family correlation using a Taylor linearization approach. Second, approaches that adjust for the inflated variance are examined: a one-sample log-rank test adjusted by the design effect originally proposed by Rao and Scott for correlated binomial or Poisson data, and a robust variance estimator derived from the log-likelihood function of a multiplicative model. Neither of these two approaches provides an estimate of the within-family correlation, but the comparison with the standard remains valid under dependence. Last, using the frailty model concept, the multiplicative model with known baseline hazards is extended by adding a random frailty term based on either the positive stable or the gamma distribution. The two frailty distributions are compared by simulation.
Based on the results of these approaches, we conclude that the siblings of centenarians had significantly lower mortality rates than their cohorts. The frailty models also indicate significant correlation between the life spans of siblings.
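To make the comparison with an external standard concrete, here is a minimal sketch. It assumes each sibling contributes an observed death indicator and an expected count equal to the reference-population cumulative hazard accumulated over that sibling's follow-up. It computes the standardized mortality ratio O/E, which is the maximum-likelihood estimate of the relative mortality in a multiplicative model with known baseline hazard, along with the one-sample log-rank chi-square and a design-effect-adjusted version. The design effect here is estimated from family-level residuals (a cluster-robust variance divided by E). That estimator, the function name and the record layout are illustrative assumptions, not necessarily the report's exact procedure.

```python
from collections import defaultdict


def one_sample_logrank(records, design_effect=None):
    """Compare observed deaths with those expected under a known baseline hazard.

    records: iterable of (family_id, died, expected), where `died` is 0/1 and
    `expected` is the reference cumulative hazard over the sibling's follow-up.
    If design_effect is None it is estimated from family-level residuals
    (an illustrative cluster-robust choice, not a prescribed method).
    """
    records = list(records)
    observed = sum(died for _, died, _ in records)
    expected = sum(e for _, _, e in records)
    chi2 = (observed - expected) ** 2 / expected  # naive: assumes independence

    if design_effect is None:
        # Sum the residuals (died - expected) within each family; the squared
        # family totals give a variance estimate that allows within-family correlation.
        family_resid = defaultdict(float)
        for family, died, e in records:
            family_resid[family] += died - e
        robust_var = sum(r * r for r in family_resid.values())
        design_effect = robust_var / expected  # ratio to the Poisson variance E

    return {
        "O": observed,
        "E": expected,
        "SMR": observed / expected,  # MLE of the relative mortality theta
        "chi2": chi2,
        "deff": design_effect,
        "chi2_adjusted": chi2 / design_effect,
    }


siblings = [  # hypothetical (family_id, died, expected cumulative hazard)
    ("fam1", 1, 0.9), ("fam1", 1, 1.1), ("fam1", 0, 0.6),
    ("fam2", 0, 0.8), ("fam2", 1, 1.3),
]
print(one_sample_logrank(siblings))
```

Dividing the chi-square by a design effect greater than 1 makes the test more conservative when siblings' outcomes are positively correlated.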
Abstract:
The objective of this research was to evaluate genetic aspects related to in vitro embryo production in the Guzerá breed. The first study focused on estimating genetic and phenotypic (co)variances for traits related to embryo production and on detecting a possible association with age at first calving (AFC). Low to moderate heritabilities were detected for traits related to oocyte and embryo production. There was a weak genetic association between traits related to artificial reproduction and age at first calving. The second study evaluated genetic and inbreeding trends in a Guzerá population in Brazil. Donors and in vitro-produced embryos were treated as two subpopulations in order to compare differences in annual genetic change and in the inbreeding coefficient. The annual trend in the inbreeding coefficient (F) was greater for the general population, and a quadratic effect was detected. However, mean F for the embryo subpopulation was higher than for the general population and the donors. Higher annual genetic gain for age at first calving and for milk yield (305 days) was observed among in vitro-produced embryos than among donors or in the general population. The third study examined the effects of the inbreeding coefficients of the donor, the sire (used in in vitro fertilization) and the embryos on in vitro embryo production outcomes in the Guzerá breed. Effects of donor and embryo inbreeding on the studied traits were detected. The fourth (and final) study was designed to compare linear and generalized linear mixed models fitted by restricted maximum likelihood (REML) and their suitability for discrete variables. Four hierarchical models were evaluated, each assuming a different distribution for the count data in the dataset.
Inference was based on residual diagnostics and on comparison of the ratios between variance components of the models for each variable. Poisson models outperformed both the linear model (with and without transformation of the variable) and the negative binomial model in goodness of fit and predictive ability, despite clear differences in the distributions of the variables. Among the models tested, the poorest fit was obtained for the linear model with a logarithmic transformation (log10(X + 1)) of the response variable.
Abstract:
The use of presence/absence data in wildlife management and biological surveys is widespread, and there is growing interest in quantifying the sources of error associated with these data. Using simulated data, we show that false-negative errors (failure to record a species when it is in fact present) can have a significant impact on the statistical estimation of habitat models. We then introduce an extension of logistic modeling, the zero-inflated binomial (ZIB) model, which uses repeated visits to the same site to estimate the rate of false-negative errors and to correct estimates of the probability of occurrence for those errors. Our simulations show that even relatively low rates of false negatives bias statistical estimates of habitat effects. The method with three repeated visits eliminates the bias, but the estimates are relatively imprecise. Six repeated visits improve the precision of estimates to levels comparable to those achieved with conventional statistics in the absence of false-negative errors. In general, when error rates are ≤50%, greater efficiency is gained by adding more sites, whereas when error rates are >50%, it is better to increase the number of repeated visits. We highlight the flexibility of the method with three case studies, clearly demonstrating the effect of false-negative errors for a range of commonly used survey methods.
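The zero-inflated binomial likelihood can be sketched for the simplest case. This sketch assumes a constant occupancy probability psi and detection probability p at every site; the paper's model lets these depend on habitat covariates through logistic links, and that extension is omitted here. A site visited k times with y detections contributes psi * C(k, y) * p^y * (1 - p)^(k - y), plus (1 - psi) when y = 0, because a site with no detections may be either unoccupied or occupied but missed. The coarse grid search stands in for a proper numerical optimizer, and the function names are hypothetical.

```python
import math


def zib_loglik(psi, p, detections, k):
    """Log-likelihood of a zero-inflated binomial (occupancy) model.

    detections: number of visits (out of k) on which the species was recorded
    at each site; psi = occupancy probability, p = per-visit detection probability.
    """
    ll = 0.0
    for y in detections:
        detected_if_present = math.comb(k, y) * p**y * (1 - p) ** (k - y)
        # Zeros can also come from unoccupied sites (probability 1 - psi).
        lik = psi * detected_if_present + (1 - psi) * (1 if y == 0 else 0)
        ll += math.log(lik)
    return ll


def fit_zib_grid(detections, k, steps=99):
    """Crude maximum-likelihood fit over an interior grid of (psi, p) values."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda t: zib_loglik(t[0], t[1], detections, k),
    )


# Hypothetical survey: 10 sites, 3 visits each.
sites = [3, 2, 2, 1, 3, 0, 0, 0, 0, 0]
psi_hat, p_hat = fit_zib_grid(sites, k=3)
naive = sum(y > 0 for y in sites) / len(sites)  # ignores false negatives
print(f"naive occupancy={naive:.2f}, ZIB psi={psi_hat:.2f}, p={p_hat:.2f}")
```

Because some occupied sites can be missed on every visit, the fitted psi is at least the naive proportion of sites with a detection (up to grid resolution). The gap between the two widens as p falls or the number of visits shrinks.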
Abstract:
Run-off-road (ROR) crashes have increasingly become a serious concern for transportation officials in the State of Florida. These crashes have increased proportionally in recent years statewide and have been a focus of the Florida Department of Transportation. The goal of this research was to develop statistical models that can be used to investigate possible causal relationships between roadway geometric features and ROR crashes on Florida's rural and urban principal arterials.
In this research, Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models were used to better model the excessive number of roadway segments with no ROR crashes. Because Florida covers a diverse area comprising sixty-seven counties, the state was divided into four geographical regions to minimize possible unobserved heterogeneity. Three years of crash data (2000–2002) for principal arterials on the Florida State Highway System were used. Several models based on the ZIP and ZINB regression methods were fitted to predict the expected number of ROR crashes on urban and rural roads in each region; dividing each region into urban and rural areas yielded a total of eight crash models. A best-fit predictive model was identified for each of these eight cases using AIC values. ZINB regression was found to be appropriate for seven of the eight models, and ZIP regression was more appropriate for the remaining one. To achieve model convergence, some explanatory variables that were not statistically significant were included; therefore, strong conclusions cannot be drawn from some of these models.
Given the complex nature of crashes, recommendations for additional research are made. The interaction of weather and human condition would be quite valuable in discerning additional causal relationships for these types of crashes. Additionally, roadside data should be considered and incorporated into future research on ROR crashes.
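For readers less familiar with zero-inflated count models, the sketch below writes out the ZIP and ZINB probability mass functions and the AIC used to choose between them. It assumes the common parameterization in which a structural zero occurs with probability pi and the count part has mean mu. For the negative binomial it also uses a dispersion alpha, so the variance is mu + alpha * mu^2. In a regression such as the crash models, mu would depend on segment covariates through a log link and pi through a logit link; those links, and the fitting itself, are omitted. Function names, parameter values and counts are illustrative.

```python
import math


def zip_logpmf(y, mu, pi):
    """Log-probability of count y under a zero-inflated Poisson model."""
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    if y == 0:
        # A zero is either structural (probability pi) or a Poisson zero.
        return math.log(pi + (1 - pi) * math.exp(log_pois))
    return math.log(1 - pi) + log_pois


def zinb_logpmf(y, mu, alpha, pi):
    """Log-probability of y under a zero-inflated negative binomial model
    with mean mu and dispersion alpha (variance mu + alpha * mu**2)."""
    r = 1.0 / alpha  # negative binomial "size" parameter
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(log_nb))
    return math.log(1 - pi) + log_nb


def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * loglik


# Hypothetical crash counts on road segments, scored at fixed parameter values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 7, 0, 2, 0, 0, 5]
ll_zip = sum(zip_logpmf(y, mu=3.0, pi=0.6) for y in counts)
ll_zinb = sum(zinb_logpmf(y, mu=3.0, alpha=0.8, pi=0.5) for y in counts)
print(aic(ll_zip, 2), aic(ll_zinb, 3))  # ZINB spends one extra parameter on alpha
```

In practice the log-likelihoods come from maximum-likelihood fits, not from fixed values as here. The AIC penalty of 2 per parameter means a ZINB model is preferred only when its extra dispersion parameter improves the fit enough.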
Abstract:
The relationship between workplace absenteeism and adverse lifestyle factors (smoking, physical inactivity and poor dietary patterns) remains ambiguous. Reliance on self-reported absenteeism and obesity measures may contribute to this uncertainty. Using objective absenteeism and health status measures, the present study aimed to investigate which health status outcomes and lifestyle factors influence workplace absenteeism. Cross-sectional data were obtained from a complex workplace dietary intervention trial, the Food Choice at Work Study, conducted in four multinational manufacturing workplaces in Cork, Republic of Ireland. Participants included 540 randomly selected employees from the four workplaces. Annual absenteeism count data were collected. Physical assessments included objective health status measures (BMI, midway waist circumference and blood pressure). A food frequency questionnaire (FFQ) measured diet quality, from which DASH (Dietary Approaches to Stop Hypertension) scores were constructed. A zero-inflated negative binomial (ZINB) regression model examined associations between health status outcomes, lifestyle characteristics and absenteeism. The mean number of absences was 2.5 (SD 4.5) days. After controlling for sociodemographic and lifestyle characteristics, the ZINB model indicated that absenteeism was positively associated with central obesity, which increased the expected absence rate by 72%. Consuming a high-quality diet and engaging in moderate levels of physical activity were negatively associated with absenteeism, reducing the expected frequency by 50% and 36%, respectively. Being in a managerial or supervisory position also reduced the expected frequency by 50%. To reduce absenteeism, workplace health promotion policies should incorporate recommendations designed to prevent and manage excess weight, improve diet quality and increase employees' physical activity levels.
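In a ZINB model the count component uses a log link, so a coefficient beta multiplies the expected count (among those not in the always-zero group) by exp(beta). A 72% increase therefore corresponds to exp(beta) = 1.72. Assuming the percentages above were obtained this way, the sketch back-calculates the coefficients from the reported percentages, since the coefficients themselves are not given in the abstract. The helper names are hypothetical.

```python
import math


def percent_change(beta):
    """Percent change in the expected count implied by a log-link coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


def coefficient_for(percent):
    """Log-link coefficient that corresponds to a given percent change."""
    return math.log(1.0 + percent / 100.0)


# Back-calculate coefficients from the percentages reported in the abstract.
for label, pct in [("central obesity", 72), ("high-quality diet", -50),
                   ("moderate physical activity", -36)]:
    beta = coefficient_for(pct)
    print(f"{label}: beta = {beta:.3f}, rate ratio = {math.exp(beta):.2f}")
```

For example, a 72% increase corresponds to beta = ln 1.72 ≈ 0.542, a 50% reduction to beta = ln 0.5 ≈ -0.693, and a 36% reduction to beta = ln 0.64 ≈ -0.446.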
Abstract:
Field infestation and spatial distribution of the introduced Bactrocera carambolae Drew and Hancock and native species of Anastrepha in common guavas [Psidium guajava (L.)] were investigated in the eastern Amazon. Fruit sampling was carried out in the municipalities of Calçoene and Oiapoque in the state of Amapá, Brazil. The frequency distribution of larvae in fruit was fitted to the negative binomial distribution. Anastrepha striata was more abundant in both sampled areas than Anastrepha fraterculus (Wiedemann) and B. carambolae. Frequency distribution analysis of adults revealed an aggregated pattern, described by the negative binomial distribution, for B. carambolae as well as for A. fraterculus and Anastrepha striata Schiner. Although the populations of Anastrepha spp. may have suffered some impact from the presence of B. carambolae, the results are not yet robust enough to indicate an effective general reduction in the abundance of Anastrepha spp. caused by B. carambolae. The high degree of aggregation observed for both species suggests interspecific co-occurrence, with both species simultaneously present in the analysed fruit. Moreover, the substantial fraction of uninfested guavas also indicates an absence of competitive displacement.
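A quick way to see the aggregation that the negative binomial captures is the variance-to-mean ratio together with the moment estimate of the negative binomial exponent, k = m^2 / (s^2 - m). A ratio above 1 and a small k indicate a clumped (aggregated) distribution; as k grows large, the negative binomial approaches the Poisson. This is a minimal sketch with hypothetical larvae-per-fruit counts. The paper does not state which estimator of k it used, and maximum-likelihood estimation is generally preferred for formal fitting.

```python
import math
import statistics


def aggregation_summary(counts):
    """Mean, sample variance, variance-to-mean ratio and moment estimate of
    the negative binomial k for a list of counts per sampling unit."""
    m = statistics.mean(counts)
    s2 = statistics.variance(counts)  # sample variance (n - 1 denominator)
    vmr = s2 / m
    # k is only defined when the data are over-dispersed (s2 > m);
    # otherwise report infinity (Poisson-like or more regular than random).
    k = m * m / (s2 - m) if s2 > m else math.inf
    return m, s2, vmr, k


larvae_per_fruit = [0, 0, 0, 1, 2, 5, 0, 8]  # hypothetical counts
m, s2, vmr, k = aggregation_summary(larvae_per_fruit)
print(f"mean={m}, variance={s2:.2f}, VMR={vmr:.2f}, k={k:.3f}")
```

For these counts the variance (about 8.86) is more than four times the mean (2), and k is about 0.58, which indicates a strongly aggregated pattern.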
Abstract:
This paper develops a semiparametric estimation approach for mixed count regression models based on series expansion for the unknown density of the unobserved heterogeneity. We use the generalized Laguerre series expansion around a gamma baseline density to model unobserved heterogeneity in a Poisson mixture model. We establish the consistency of the estimator and present a computational strategy to implement the proposed estimation techniques in the standard count model as well as in truncated, censored, and zero-inflated count regression models. Monte Carlo evidence shows that the finite sample behavior of the estimator is quite good. The paper applies the method to a model of individual shopping behavior. © 1999 Elsevier Science S.A. All rights reserved.
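The series-expansion idea can be sketched as follows. The sketch assumes the common construction in which the heterogeneity density is a gamma baseline multiplied by a squared series of generalized Laguerre polynomials, which keeps the density non-negative. Because the Laguerre polynomials are orthogonal under the gamma weight, the normalizing constant has a closed form. With a single coefficient the density reduces to the gamma baseline, and a Poisson mixed over a gamma gives the negative binomial. The function names and the shape a / rate b parameterization are illustrative, and the paper's exact parameterization (for example, any normalization of the heterogeneity mean) is not reproduced.

```python
import math


def gen_laguerre(n, alpha, x):
    """Generalized Laguerre polynomial L_n^(alpha)(x) via the three-term recurrence."""
    if n == 0:
        return 1.0
    prev, curr = 1.0, 1.0 + alpha - x
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1 + alpha - x) * curr - (k + alpha) * prev) / (k + 1)
    return curr


def laguerre_gamma_density(v, a, b, coeffs):
    """Gamma(shape a, rate b) baseline times a squared Laguerre series, for v > 0."""
    x = b * v
    log_gamma_pdf = a * math.log(b) + (a - 1) * math.log(v) - b * v - math.lgamma(a)
    poly = sum(c * gen_laguerre(k, a - 1, x) for k, c in enumerate(coeffs))
    # Orthogonality under the gamma weight: E[L_j L_k] = 0 for j != k and
    # Gamma(k + a) / (k! Gamma(a)) for j == k, so the squared series integrates to:
    norm = sum(c * c * math.exp(math.lgamma(k + a) - math.lgamma(k + 1) - math.lgamma(a))
               for k, c in enumerate(coeffs))
    return math.exp(log_gamma_pdf) * poly ** 2 / norm


# With one coefficient the density is the Gamma(2, 1) baseline evaluated at v = 1.
print(laguerre_gamma_density(1.0, 2.0, 1.0, [1.0]))
```

Additional coefficients let the heterogeneity density depart from the gamma shape (for example, becoming bimodal) while still integrating to one.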
Abstract:
Background: Detection of outbreaks is an important part of disease surveillance. Although many algorithms have been designed for detecting outbreaks, few have been specifically assessed against diseases that have distinct seasonal incidence patterns, such as those caused by vector-borne pathogens.
Methods: We applied five previously reported outbreak detection algorithms to Ross River virus (RRV) disease data (1991-2007) for four local government areas (LGAs), Brisbane, Emerald, Redland and Townsville, in Queensland, Australia. The methods used were the Early Aberration Reporting System (EARS) C1, C2 and C3 methods, the negative binomial cusum (NBC), the historical limits method (HLM), the Poisson outbreak detection (POD) method and the purely temporal SaTScan analysis. Seasonally adjusted variants of the NBC and SaTScan methods were developed. Some of the algorithms were applied using a range of parameter values, resulting in 17 variants of the five algorithms.
Results: The 9,188 RRV disease notifications that occurred in the four selected regions over the study period showed marked seasonality, which adversely affected the performance of some of the outbreak detection algorithms. Most of the methods examined were able to detect the same major events. The exceptions were the seasonally adjusted NBC methods, which detected an excess of short signals. The NBC, POD and temporal SaTScan algorithms were the only methods that consistently had high true positive rates and low false positive and false negative rates across the four study areas. The timeliness of outbreak signals generated by each method was also compared, but there was no consistency across outbreaks and LGAs.
Conclusions: This study has highlighted several issues associated with applying outbreak detection algorithms to seasonal disease data. In the absence of a true gold standard, a quantitative comparison is difficult, and caution should be taken when interpreting the true positives, false positives, sensitivity and specificity.
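To illustrate how the simplest of these detectors works, the sketch below implements an EARS C1-style rule as it is commonly described. Each day's count is standardized against the mean and standard deviation of the preceding seven days and flagged when the statistic exceeds 3. C2 differs by inserting a two-day gap between the baseline and the current day, and C3 accumulates recent C2 statistics; neither is shown. The `min_sd` floor, which avoids division by zero when the baseline is constant, is an assumption of this sketch rather than part of the study, and the daily counts are hypothetical.

```python
import statistics


def ears_c1(counts, baseline=7, threshold=3.0, min_sd=0.5):
    """EARS C1-style detector.

    Returns (day index, statistic, flagged) for every day that has a full
    baseline window of `baseline` preceding days. `min_sd` is a hypothetical
    floor on the baseline standard deviation.
    """
    results = []
    for t in range(baseline, len(counts)):
        window = counts[t - baseline:t]  # the preceding `baseline` days
        mean = statistics.mean(window)
        sd = max(statistics.stdev(window), min_sd)
        stat = (counts[t] - mean) / sd
        results.append((t, stat, stat > threshold))
    return results


daily = [2, 3, 2, 3, 2, 3, 2, 10]  # hypothetical daily notifications
for day, stat, flagged in ears_c1(daily):
    print(day, round(stat, 2), flagged)
```

The short, moving baseline makes C1 responsive but means its reference level follows whatever happened in the previous week, so its behaviour on strongly seasonal series needs checking, as the study did.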
Abstract:
Background: Multilevel and spatial models are increasingly used to obtain substantive information on area-level inequalities in cancer survival. Multilevel models assume independent geographical areas, whereas spatial models explicitly incorporate geographical correlation, often via a conditional autoregressive prior. However, the relative merits of these methods for large population-based studies have not been explored. Using a case-study approach, we report on the implications of using multilevel and spatial survival models to study geographical inequalities in all-cause survival.
Methods: Multilevel discrete-time and Bayesian spatial survival models were used to study geographical inequalities in all-cause survival for a population-based colorectal cancer cohort of 22,727 cases aged 20–84 years diagnosed during 1997–2007 in Queensland, Australia.
Results: Both approaches were viable on this large dataset and produced similar estimates of the fixed effects. After area-level covariates were added, the between-area variability in survival in the multilevel discrete-time models was no longer significant. Spatial inequalities in survival were also markedly reduced after adjustment for aggregated area-level covariates. Only the multilevel approach, however, provided an estimate of the contribution of geographical variation to the total variation in survival between individual patients.
Conclusions: With little difference observed between the two approaches in the estimation of fixed effects, multilevel models should be favored if there is a clear hierarchical data structure and measuring the independent impact of individual- and area-level effects on survival differences is of primary interest. Bayesian spatial analyses may be preferred if spatial correlation between areas is important and if the priority is to assess small-area variations in survival and map spatial patterns. Both approaches can be readily fitted to geographically enabled survival data from international settings.
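Multilevel discrete-time survival models are usually fitted by expanding each patient's record into one row per time interval at risk. Each row has a binary outcome that is 1 only in the interval in which death occurs, and a logistic model with an area-level random intercept is then fitted to these rows. The sketch below shows only the expansion step and assumes follow-up has already been grouped into whole intervals (for example, years). The record layout and function name are hypothetical.

```python
def to_person_period(patients):
    """Expand (patient_id, area_id, n_intervals, died) records into
    (patient_id, area_id, interval, event) rows for discrete-time models.

    n_intervals: number of intervals the patient was observed at risk (>= 1);
    died: True if death occurred in the last observed interval, False if censored.
    """
    rows = []
    for patient_id, area_id, n_intervals, died in patients:
        for interval in range(1, n_intervals + 1):
            event = 1 if (died and interval == n_intervals) else 0
            rows.append((patient_id, area_id, interval, event))
    return rows


cohort = [("p1", "area_A", 3, True), ("p2", "area_B", 2, False)]
for row in to_person_period(cohort):
    print(row)
```

Patient p1 contributes three rows with outcomes 0, 0, 1. The censored patient p2 contributes two rows, both 0.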
Abstract:
Bacteria play an important role in many ecological systems. Molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities and the vast number of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments, and also within populations, should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method infers genetic population structure within an unsupervised clustering framework in which the dependence between loci is represented using graphical models. The population dynamics that generate such population stratification were investigated using a stochastic model in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step toward understanding the overall influence of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation: zero values are observed much more frequently than would be expected, for example, from a Poisson distribution in the discrete case or a Gaussian distribution in the continuous case. We provide two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computationally efficient algorithms for analyzing large-scale data were adopted to meet increasing computational needs. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, is an initial effort toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
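The definition of zero-inflation above can be checked directly for discrete count data. Compare the observed share of zeros with the share exp(-mean) that a Poisson distribution with the same mean would produce. The sketch also computes van den Broek's score statistic for zero-inflation against an intercept-only Poisson model, which is approximately chi-square with one degree of freedom when the data are Poisson. This is a generic diagnostic, not one of the clustering models developed in the thesis, and the counts are hypothetical.

```python
import math


def zero_inflation_check(counts):
    """Compare observed zeros with the Poisson expectation and return
    (observed zero share, Poisson-expected share, score statistic).

    Assumes at least one non-zero count, so the sample mean is positive.
    """
    n = len(counts)
    mean = sum(counts) / n
    p0 = math.exp(-mean)  # Poisson probability of a zero at the sample mean
    n0 = sum(1 for y in counts if y == 0)
    # Score statistic for zero-inflation against an intercept-only Poisson model;
    # compare with a chi-square(1) critical value such as 3.84 (5% level).
    numerator = ((n0 - n * p0) / p0) ** 2
    denominator = n * ((1 - p0) / p0 - mean)  # positive whenever mean > 0
    return n0 / n, p0, numerator / denominator


observed, expected, score = zero_inflation_check([0] * 8 + [5, 5])
print(f"observed zeros {observed:.2f}, Poisson-expected {expected:.2f}, score {score:.1f}")
```

Here 80% of the values are zero, whereas a Poisson distribution with the same mean of 1 would give about 37% zeros, and the score statistic is far above 3.84.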
Abstract:
Published as an article in: Investigaciones Economicas, 2005, vol. 29, issue 3, pages 483-523.
Abstract:
In Asia, especially in China, knowledge of the distribution of testate amoebae is still limited. In this paper, the geographical distribution of testate amoebae on the Tibetan Plateau and the northwestern Yunnan Plateau, southwest China, and its relationship with climatic factors were studied. From northwest to southeast, the most dominant species shifted and species (or genus) richness increased. Linear regression analyses further revealed that both species richness and genus richness were more strongly and positively correlated with the mean temperature of the warmest month and with annual mean precipitation than with mean altitude, which showed a weak negative correlation. This indicates that temperature and precipitation have a greater influence on richness than altitude does. Cluster analysis of community structure, based on a Sørensen's coefficient matrix, identified four groups among the 10 physiographical regions. This geographical distribution pattern was also closely related to the climatic regionalization. The present climatic regionalization of the study area originated from the uplift of the Tibetan Plateau and developed mainly in or after the late Pleistocene. Therefore, the geographical distribution of testate amoebae in the study area may have undergone complicated and drastic changes corresponding to the climatic variation caused by these geological events.
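Sørensen's coefficient for presence/absence data is 2C / (A + B), where A and B are the numbers of taxa recorded in two regions and C is the number they share. The sketch below computes the coefficient and the pairwise similarity matrix that a cluster analysis of regions would start from; 1 minus the similarity is a common dissimilarity. Region names and taxon lists are hypothetical, and the clustering algorithm used in the study is not reproduced.

```python
def sorensen(taxa_a, taxa_b):
    """Sørensen similarity 2C / (A + B) between two presence/absence lists."""
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        return 1.0  # convention: two empty communities are identical
    return 2 * len(a & b) / (len(a) + len(b))


def similarity_matrix(regions):
    """Pairwise Sørensen similarities for a dict {region: taxa}."""
    names = sorted(regions)
    return names, [[sorensen(regions[r], regions[s]) for s in names] for r in names]


regions = {  # hypothetical taxon lists
    "region_1": ["Arcella", "Difflugia", "Nebela"],
    "region_2": ["Difflugia", "Nebela", "Centropyxis", "Euglypha"],
    "region_3": ["Arcella", "Euglypha"],
}
names, sim = similarity_matrix(regions)
for name, row in zip(names, sim):
    print(name, [round(x, 2) for x in row])
```

For region_1 and region_2, C = 2, A = 3 and B = 4, so the similarity is 4/7 ≈ 0.57.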
Abstract:
We study the kinetics of protein folding via statistical energy landscape theory. We concentrate on the local-connectivity case, in which configurational changes can occur only between neighboring states, with folding progress described by an order parameter given by the fraction of native conformations. The non-Markovian diffusion dynamics is analyzed in detail, and an expression for the mean first-passage time (MFPT) from non-native unfolded states to the native folded state is obtained. We find that the MFPT has a V-shaped dependence on temperature. We also find that the MFPT is shortened as the gap between the energy of the native state and the average energy of non-native folded states increases relative to the fluctuations of the energy landscape. The second- and higher-order moments are studied to infer the first-passage time distribution. At high temperatures the distribution becomes close to a Poisson distribution, while at low temperatures it becomes a Lévy-type distribution with power-law tails, indicating non-self-averaging, intermittent folding dynamics. We note the likely relevance of this result to single-molecule dynamics experiments, in which a power-law (Lévy) distribution of the relaxation time of the underlying protein energy landscape is observed.
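For readers unfamiliar with mean first-passage times, the sketch below computes the exact MFPT for a much simpler Markovian stand-in: a nearest-neighbour chain on a discretized order parameter, in which the system can only step to adjacent states, mirroring the local-connectivity assumption. The paper's dynamics are non-Markovian and derived from an energy-landscape model. Here the per-step forward and backward probabilities are hypothetical inputs, and the sketch does not attempt to reproduce the V-shaped temperature dependence.

```python
def mean_first_passage_time(forward, backward):
    """Exact mean first-passage time from state 0 to state N in a
    nearest-neighbour (birth-death) chain with states 0..N.

    forward[i], backward[i]: probabilities of moving from state i to i + 1
    and to i - 1 in one step (the remainder is the probability of staying).
    State 0 is reflecting, so backward[0] must be 0.
    """
    if backward[0] != 0:
        raise ValueError("state 0 must be reflecting (backward[0] == 0)")
    total, t_prev = 0.0, 0.0
    for p_i, q_i in zip(forward, backward):
        # Expected steps to go from i to i + 1:
        # T_i = 1 + q_i * (T_{i-1} + T_i) + (1 - p_i - q_i) * T_i
        #  =>  T_i = (1 + q_i * T_{i-1}) / p_i
        t_i = (1.0 + q_i * t_prev) / p_i
        total += t_i
        t_prev = t_i
    return total


# Hypothetical 3-step order-parameter chain, unfolded (0) to folded (3).
print(mean_first_passage_time([1.0, 0.5, 0.4], [0.0, 0.4, 0.4]))
```

Because each stage can only be left through its neighbours, the MFPT decomposes into a sum of stage-by-stage passage times. Making backward steps more likely (a rougher landscape) lengthens every later stage through the q_i * T_{i-1} term.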