932 results for Generalized Linear Models


Relevance: 100.00%

Abstract:

OBJECTIVE To assess trends in the frequency of concomitant vascular reconstructions (VRs) from 2000 through 2009 among patients who underwent pancreatectomy, as well as to compare the short-term outcomes between patients who underwent pancreatic resection with and without VR. DESIGN Single-center series have been conducted to evaluate the short-term and long-term outcomes of VR during pancreatic resection. However, its effectiveness from a population-based perspective is still unknown. Unadjusted, multivariable, and propensity score-adjusted generalized linear models were performed. SETTING Nationwide Inpatient Sample from 2000 through 2009. PATIENTS A total of 10 206 patients were involved. MAIN OUTCOME MEASURES Incidence of VR during pancreatic resection, perioperative in-hospital complications, and length of hospital stay. RESULTS Overall, 10 206 patients were included in this analysis. Of these, 412 patients (4.0%) underwent VR, with the rate increasing from 0.7% in 2000 to 6.0% in 2009 (P < .001). Patients who underwent pancreatic resection with VR were at a higher risk for intraoperative (propensity score-adjusted odds ratio, 1.94; P = .001) and postoperative (propensity score-adjusted odds ratio, 1.36; P = .008) complications, while the mortality and median length of hospital stay were similar to those of patients without VR. Among the 25% of hospitals with the highest surgical volume, patients who underwent pancreatic surgery with VR had significantly higher rates of postoperative complications and mortality than patients without VR. CONCLUSIONS The frequency of VR during pancreatic surgery is increasing in the United States. In contrast with most single-center analyses, this population-based study demonstrated that patients who underwent VR during pancreatic surgery had higher rates of adverse postoperative outcomes than their counterparts who underwent pancreatic resection only. Prospective studies incorporating long-term outcomes are warranted to further define which patients benefit from VR.
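
As an illustration of the propensity score-adjusted generalized linear models described above, here is a minimal Python sketch (not the authors' code; all column names are hypothetical) that fits a treatment model for VR and a score-adjusted logistic outcome model with statsmodels:

```python
# Hedged sketch of a propensity score-adjusted GLM; column names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def ps_adjusted_or(df: pd.DataFrame) -> float:
    """Return a propensity score-adjusted odds ratio for `vr` on `complication`."""
    # Step 1: model the probability of receiving a vascular reconstruction (VR)
    # from baseline covariates (hypothetical columns).
    ps_model = smf.glm("vr ~ age + female + comorbidity_score + C(year)",
                       data=df, family=sm.families.Binomial()).fit()
    df = df.assign(pscore=ps_model.fittedvalues)

    # Step 2: outcome model adjusted for the estimated propensity score.
    out_model = smf.glm("complication ~ vr + pscore",
                        data=df, family=sm.families.Binomial()).fit()
    return float(np.exp(out_model.params["vr"]))
```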

Relevance: 100.00%

Abstract:

Generalized linear mixed models (GLMM) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, has encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes. Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.
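
For intuition about the integrals that PQL avoids, the following sketch evaluates the marginal log-likelihood of a random-intercept logistic GLMM by ordinary Gauss-Hermite quadrature (a simplified, non-adaptive version of the quadrature discussed above; the data layout and names are illustrative):

```python
# Hedged sketch: Gauss-Hermite quadrature for a random-intercept logistic GLMM.
import numpy as np
from scipy.special import expit

def marginal_loglik(beta, sigma, y_by_cluster, x_by_cluster, n_nodes=20):
    """log L = sum_i log ∫ prod_j p(y_ij | x_ij, beta, b_i) N(b_i; 0, sigma^2) db_i."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for y, x in zip(y_by_cluster, x_by_cluster):
        # change of variables b = sqrt(2)*sigma*node turns the Gaussian integral
        # into a weighted sum over the Hermite nodes
        b = np.sqrt(2.0) * sigma * nodes                      # (n_nodes,)
        eta = x @ beta                                        # (n_j,)
        p = expit(eta[:, None] + b[None, :])                  # (n_j, n_nodes)
        lik_given_b = np.prod(np.where(y[:, None] == 1, p, 1 - p), axis=0)
        total += np.log(np.sum(weights * lik_given_b) / np.sqrt(np.pi))
    return total
```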

Relevance: 100.00%

Abstract:

BACKGROUND: Risk factors and outcomes of bronchial stricture after lung transplantation are not well defined. An association between acute rejection and development of stricture has been suggested in small case series. We evaluated this relationship using a large national registry. METHODS: All lung transplantations between April 1994 and December 2008 per the United Network for Organ Sharing (UNOS) database were analyzed. Generalized linear models were used to determine the association between early rejection and development of stricture after adjusting for potential confounders. The association of stricture with postoperative lung function and overall survival was also evaluated. RESULTS: Nine thousand three hundred thirty-five patients were included for analysis. The incidence of stricture was 11.5% (1,077/9,335), with no significant change in incidence during the study period (P=0.13). Early rejection was associated with a significantly greater incidence of stricture (adjusted odds ratio [AOR], 1.40; 95% confidence interval [CI], 1.22-1.61; p<0.0001). Male sex, restrictive lung disease, and pretransplantation requirement for hospitalization were also associated with stricture. Those who experienced stricture had a lower postoperative peak percent predicted forced expiratory volume at 1 second (FEV1) (median 74% versus 86% for bilateral transplants only; p<0.0001), shorter unadjusted survival (median 6.09 versus 6.82 years; p<0.001) and increased risk of death after adjusting for potential confounders (adjusted hazard ratio 1.13; 95% CI, 1.03-1.23; p=0.007). CONCLUSIONS: Early rejection is associated with an increased incidence of stricture. Recipients with stricture demonstrate worse postoperative lung function and survival. Prospective studies may be warranted to further assess causality and the potential for coordinated rejection and stricture surveillance strategies to improve postoperative outcomes.
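
A minimal sketch of the kind of adjusted-odds-ratio analysis described above, using a logistic generalized linear model in statsmodels (this is not the UNOS registry code; covariate names are hypothetical):

```python
# Hedged sketch: adjusted odds ratio and 95% CI for early rejection on stricture.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def adjusted_or(df):
    model = smf.glm(
        "stricture ~ early_rejection + male + restrictive_disease + hospitalized_pretx",
        data=df, family=sm.families.Binomial()).fit()
    or_point = float(np.exp(model.params["early_rejection"]))
    or_ci = np.exp(model.conf_int().loc["early_rejection"])
    return or_point, tuple(or_ci)
```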

Relevance: 100.00%

Abstract:

We examined outcomes and trends in surgery and radiation use for patients with locally advanced esophageal cancer, for whom optimal treatment is unclear. Trends in surgery and radiation for patients with T1-T3N1M0 squamous cell or adenocarcinoma of the mid or distal esophagus in the Surveillance, Epidemiology, and End Results database from 1998 to 2008 were analyzed using generalized linear models including year as a predictor; Surveillance, Epidemiology, and End Results does not record chemotherapy data. Local treatment was unimodal if patients had only surgery or radiation and bimodal if they had both. Five-year cancer-specific survival (CSS) and overall survival (OS) were analyzed using propensity score-adjusted Cox proportional-hazard models. Overall 5-year survival for the 3295 patients identified (mean age 65.1 years, standard deviation 11.0) was 18.9% (95% confidence interval: 17.3-20.7). Local treatment was bimodal for 1274 (38.7%) and unimodal for 2021 (61.3%) patients; 1325 (40.2%) had radiation alone and 696 (21.1%) underwent only surgery. The use of bimodal therapy (32.8-42.5%, P = 0.01) and radiation alone (29.3-44.5%, P < 0.001) increased significantly from 1998 to 2008. Bimodal therapy predicted improved CSS (hazard ratio [HR]: 0.68, P < 0.001) and OS (HR: 0.58, P < 0.001) compared with unimodal therapy. For the first 7 months (before the survival curves crossed), CSS after radiation therapy alone was similar to surgery alone (HR: 0.86, P = 0.12), while OS was worse for surgery only (HR: 0.70, P = 0.001). However, worse CSS (HR: 1.43, P < 0.001) and OS (HR: 1.46, P < 0.001) after that initial timeframe were found for radiation therapy only. The use of radiation to treat locally advanced mid and distal esophageal cancers increased from 1998 to 2008. Survival was best when both surgery and radiation were used.
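
A hedged sketch of a propensity score-adjusted Cox proportional-hazards analysis of the sort described above, using statsmodels and lifelines (column names are hypothetical; this is not the SEER analysis code):

```python
# Hedged sketch: propensity of receiving bimodal therapy, then a score-adjusted Cox model.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from lifelines import CoxPHFitter

def adjusted_hazard_ratio(df):
    # propensity of receiving bimodal therapy (surgery + radiation); covariates illustrative
    ps = smf.glm("bimodal ~ age + year + histology_adeno + stage_t",
                 data=df, family=sm.families.Binomial()).fit()
    df = df.assign(pscore=ps.fittedvalues)

    cph = CoxPHFitter()
    cph.fit(df[["survival_months", "death", "bimodal", "pscore"]],
            duration_col="survival_months", event_col="death")
    return float(np.exp(cph.params_["bimodal"]))  # hazard ratio for bimodal therapy
```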

Relevance: 100.00%

Abstract:

BACKGROUND Estimating the prevalence of comorbidities and their associated costs in patients with diabetes is fundamental to optimizing health care management. This study assesses the prevalence and health care costs of comorbid conditions among patients with diabetes compared with patients without diabetes. Distinguishing potentially diabetes- and nondiabetes-related comorbidities in patients with diabetes, we also determined the most frequent chronic conditions and estimated their effect on costs across different health care settings in Switzerland. METHODS Using health care claims data from 2011, we calculated the prevalence and average health care costs of comorbidities among patients with and without diabetes in inpatient and outpatient settings. Patients with diabetes and comorbid conditions were identified using pharmacy-based cost groups. Generalized linear models with a negative binomial distribution were used to analyze the effect of comorbidities on health care costs. RESULTS A total of 932,612 persons, including 50,751 patients with diabetes, were enrolled. The most frequent potentially diabetes- and nondiabetes-related comorbidities in patients older than 64 years were cardiovascular diseases (91%), rheumatologic conditions (55%), and hyperlipidemia (53%). The mean total health care costs for diabetes patients varied substantially by comorbidity status (US$3,203-$14,223). Patients with diabetes and more than two comorbidities incurred US$10,584 higher total costs than patients without comorbidity. Costs were significantly higher in patients with diabetes and comorbid cardiovascular disease (US$4,788), hyperlipidemia (US$2,163), hyperacidity disorders (US$8,753), and pain (US$8,324) compared with those without the given disease. CONCLUSION Comorbidities in patients with diabetes are highly prevalent and have substantial consequences for medical expenditures. Interestingly, hyperacidity disorders and pain were the most costly conditions. Our findings highlight the importance of developing strategies that meet the needs of patients with diabetes and comorbidities. Integrated diabetes care such as that used in the Chronic Care Model may represent a useful strategy.
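
A minimal sketch of a generalized linear model with a negative binomial family for health care costs, as described above (hypothetical column names; not the Swiss claims data schema):

```python
# Hedged sketch: negative binomial GLM for total costs with diabetes-comorbidity interactions.
import statsmodels.api as sm
import statsmodels.formula.api as smf

def cost_model(df):
    model = smf.glm(
        "total_cost ~ diabetes * (cardiovascular + hyperlipidemia + hyperacidity + pain)"
        " + age_group + male",
        data=df,
        family=sm.families.NegativeBinomial(alpha=1.0),
    ).fit()
    return model
```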

Relevance: 100.00%

Abstract:

Aim Our aims were to compare the composition of testate amoeba (TA) communities from Santa Cruz Island, Galápagos Archipelago, which are likely in existence only as a result of anthropogenic habitat transformation, with similar naturally occurring communities from northern and southern continental peatlands. Additionally, we aimed at assessing the importance of niche-based and dispersal-based processes in determining community composition and taxonomic and functional diversity. Location The humid highlands of the central island of Santa Cruz, Galápagos Archipelago. Methods We survey the alpha, beta and gamma taxonomic and functional diversities of TA, and the changes in functional traits along a gradient of wet to dry habitats. We compare the TA community composition, abundance and frequency recorded in the insular peatlands with that recorded in continental peatlands of the Northern and Southern Hemispheres. We use generalized linear models to determine how environmental conditions influence taxonomic and functional diversity as well as the mean values of functional traits within communities. We finally apply variance partitioning to assess the relative importance of niche- and dispersal-based processes in determining community composition. Results TA communities in Santa Cruz Island were different from their Northern Hemisphere and South American counterparts, with most genera considered characteristic of Northern Hemisphere and South American Sphagnum peatlands missing or very rare in the Galápagos. Functional traits were most strongly correlated with elevation and site topography, and alpha functional diversity with the type of material sampled and site topography. Community composition was more strongly correlated with spatial variables than with environmental ones. Main conclusions TA communities of the Sphagnum peatlands of Santa Cruz Island and the mechanisms shaping these communities contrast with those of Northern Hemisphere and South American peatlands. Soil moisture was not a strong predictor of community composition, most likely because rainfall and clouds provide sufficient moisture. Dispersal limitation was more important than environmental filtering because of the isolation of the insular peatlands from continental ones and the young ecological history of these ecosystems.
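
The variance-partitioning step can be illustrated with a simple deviance-based partition between environmental and spatial predictors in generalized linear models (a simplified sketch with illustrative predictor names, not the authors' analysis):

```python
# Hedged sketch: partition explained deviance into pure environmental, pure spatial,
# and shared fractions using three Poisson GLMs for a diversity measure.
import statsmodels.api as sm
import statsmodels.formula.api as smf

def partition_deviance(df):
    def explained(formula):
        m = smf.glm(formula, data=df, family=sm.families.Poisson()).fit()
        return 1.0 - m.deviance / m.null_deviance

    both = explained("richness ~ moisture + elevation + topography + x + y")
    env = explained("richness ~ moisture + elevation + topography")
    spa = explained("richness ~ x + y")
    return {
        "pure_environment": both - spa,
        "pure_space": both - env,
        "shared": env + spa - both,
    }
```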

Relevance: 100.00%

Abstract:

Secchi depth is a measure of water transparency. In the Baltic Sea region, Secchi depth maps are used to assess eutrophication and as input for habitat models. Due to their spatial and temporal coverage, satellite data would be the most suitable data source for such maps. But the Baltic Sea's optical properties are so different from the open ocean that globally calibrated standard models suffer from large errors. Regional predictive models that take the Baltic Sea's special optical properties into account are thus needed. This paper tests how accurately generalized linear models (GLMs) and generalized additive models (GAMs) with MODIS/Aqua and auxiliary data as inputs can predict Secchi depth at a regional scale. It uses cross-validation to test the prediction accuracy of hundreds of GAMs and GLMs with up to 5 input variables. A GAM with 3 input variables (chlorophyll a, remote sensing reflectance at 678 nm, and long-term mean salinity) made the most accurate predictions. Tested against field observations not used for model selection and calibration, the best model's mean absolute error (MAE) for daily predictions was 1.07 m (22%), more than 50% lower than for other publicly available Baltic Sea Secchi depth maps. The MAE for predicting monthly averages was 0.86 m (15%). Thus, the proposed model selection process was able to find a regional model with good prediction accuracy. It could be useful to find predictive models for environmental variables other than Secchi depth, using data from other satellite sensors, and for other regions where non-standard remote sensing models are needed for prediction and mapping. Annual and monthly mean Secchi depth maps for 2003-2012 come with this paper as Supplementary materials.
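
The model-selection procedure can be sketched as a cross-validated search over predictor subsets for a GLM, scored by mean absolute error (a simplified stand-in for the paper's GAM/GLM comparison; column names are illustrative):

```python
# Hedged sketch: cross-validated MAE for GLM candidates built from predictor subsets.
from itertools import combinations
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.model_selection import KFold

def cv_mae(df, predictors, k=5):
    formula = "secchi_depth ~ " + " + ".join(predictors)
    errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(df):
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        model = smf.glm(formula, data=train,
                        family=sm.families.Gamma(link=sm.families.links.Log())).fit()
        errors.append(np.abs(model.predict(test) - test["secchi_depth"]).mean())
    return float(np.mean(errors))

def select_model(df, candidates, max_vars=5):
    # score every subset of up to max_vars candidate predictors and keep the best
    scored = {subset: cv_mae(df, list(subset))
              for r in range(1, max_vars + 1)
              for subset in combinations(candidates, r)}
    return min(scored, key=scored.get)
```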

Relevance: 100.00%

Abstract:

The principal risks in the railway industry are mainly associated with collisions, derailments and level crossing accidents. An understanding of the nature of previous accidents on the railway network is required to identify potential causes, develop safety systems and deploy safety procedures. Risk assessment is a process for determining the risk magnitude to assist with decision-making. We propose a three-step methodology to predict the mean number of fatalities in railway accidents. The first step is to predict the mean number of accidents by analyzing generalized linear models and selecting the one that best fits the available historical data on the basis of goodness-of-fit statistics. The second is to compute the mean number of fatalities per accident, and the third is to estimate the mean number of fatalities. The methodology is illustrated on the Spanish railway system. Statistical models accounting for annual and grouped data for the 1992-2009 time period have been analyzed. After identifying the models for broad and narrow gauges, we predicted the mean number of accidents and the number of fatalities for the 2010-2018 time period.
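
A minimal sketch of the three-step idea, assuming a yearly table of accidents and fatalities (illustrative column names, not the Spanish railway dataset):

```python
# Hedged sketch: (1) count GLM for yearly accidents, (2) empirical mean fatalities per
# accident, (3) their product as the expected number of fatalities per future year.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

def expected_fatalities(history, future_years):
    # Step 1: accidents per year modelled as Poisson with year as predictor
    accident_model = smf.glm("accidents ~ year", data=history,
                             family=sm.families.Poisson()).fit()
    mean_accidents = accident_model.predict(pd.DataFrame({"year": future_years}))

    # Step 2: mean fatalities per accident from the historical record
    fatalities_per_accident = history["fatalities"].sum() / history["accidents"].sum()

    # Step 3: expected fatalities per future year
    return np.asarray(mean_accidents) * fatalities_per_accident
```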

Relevance: 100.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance: 100.00%

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance: 100.00%

Abstract:

This paper presents an effective decision-making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system are obtained by setting up an experimental, fully operational pipeline distribution system. The system is equipped with data logging for three variables, namely inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed such that multiple operational conditions of the distribution system, including multiple pressure and flow levels, can be obtained. We then statistically tested and showed that the pressure and flow variables can be used as a signature of a leak under the designed multi-operational conditions. It is then shown that leak detection based on training and testing the proposed multi-model decision system with prior data clustering, under multi-operational conditions, produces better recognition rates than training based on a single-model approach. The decision system is then equipped with the estimation of confidence limits, and a method is proposed for using these confidence limits to obtain more robust leak-recognition results.
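
A hedged sketch of the multi-model idea: cluster the operating conditions, fit one generalized linear model per cluster relating outlet flow to the pressures, and flag a leak when a new observation falls outside a residual-based confidence band (column names, families and thresholds are illustrative, not the authors' implementation):

```python
# Hedged sketch: per-cluster GLMs with residual-based confidence limits for leak detection.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.cluster import KMeans

class LeakDetector:
    def __init__(self, n_clusters=3, z=3.0):
        self.kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        self.z = z                       # width of the confidence band in residual SDs
        self.models, self.sigmas = {}, {}

    def fit(self, df: pd.DataFrame):
        labels = self.kmeans.fit_predict(df[["inlet_pressure", "outlet_pressure"]])
        for c in np.unique(labels):
            sub = df[labels == c]
            m = smf.glm("outlet_flow ~ inlet_pressure + outlet_pressure",
                        data=sub, family=sm.families.Gaussian()).fit()
            self.models[c] = m
            self.sigmas[c] = float(np.std(m.resid_response))
        return self

    def is_leak(self, row: pd.DataFrame) -> bool:
        # assign the new observation to an operating condition, then test its residual
        c = int(self.kmeans.predict(row[["inlet_pressure", "outlet_pressure"]])[0])
        residual = float(row["outlet_flow"].iloc[0] - self.models[c].predict(row).iloc[0])
        return abs(residual) > self.z * self.sigmas[c]
```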

Relevance: 100.00%

Abstract:

2002 Mathematics Subject Classification: 62M10.

Relevance: 100.00%

Abstract:

2000 Mathematics Subject Classification: 62P10, 62J12.

Relevance: 100.00%

Abstract:

This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
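
The workflow described above can be sketched as follows: fit a Poisson regression with an exposure offset, check the Pearson dispersion statistic, and refit as a negative binomial model if overdispersion is present (illustrative variable names, not the paper's data):

```python
# Hedged sketch: Poisson regression with an overdispersion check and a negative
# binomial fallback, for a count outcome such as the number of suicides per region.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def fit_count_model(df):
    poisson = smf.glm("suicides ~ unemployment + median_age",
                      data=df, offset=np.log(df["population"]),
                      family=sm.families.Poisson()).fit()

    # Pearson chi-square divided by residual df well above 1 signals overdispersion
    dispersion = poisson.pearson_chi2 / poisson.df_resid
    if dispersion > 1.5:
        return smf.glm("suicides ~ unemployment + median_age",
                       data=df, offset=np.log(df["population"]),
                       family=sm.families.NegativeBinomial()).fit()
    return poisson
```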

Relevance: 100.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
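
For intuition only (this is not the optimal Gaussian approximation derived in Chapter 4), a standard Laplace-type Gaussian approximation to the posterior of a Poisson log-linear model, with a Gaussian prior standing in for the Diaconis--Ylvisaker prior, can be sketched as:

```python
# Hedged sketch: Laplace-type Gaussian approximation (mode + inverse Hessian) to the
# posterior of a Poisson log-linear model with a N(0, prior_var I) prior.
import numpy as np
from scipy.optimize import minimize

def laplace_approx(counts, design, prior_var=10.0):
    """Return (mean, covariance) of a Gaussian approximation to p(theta | counts)."""
    def neg_log_post(theta):
        eta = design @ theta
        log_lik = np.sum(counts * eta - np.exp(eta))           # Poisson log-likelihood (up to a constant)
        log_prior = -0.5 * np.dot(theta, theta) / prior_var
        return -(log_lik + log_prior)

    def hessian(theta):
        mu = np.exp(design @ theta)
        return design.T @ (mu[:, None] * design) + np.eye(len(theta)) / prior_var

    res = minimize(neg_log_post, np.zeros(design.shape[1]), method="BFGS")
    cov = np.linalg.inv(hessian(res.x))   # inverse Hessian of -log posterior at the mode
    return res.x, cov
```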

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
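
The truncated Normal data augmentation sampler referred to above is the classical Albert-Chib Gibbs sampler for Bayesian probit regression; a minimal sketch (with a N(0, tau^2 I) prior on the coefficients; not the thesis code) is given below. Running it on data with a very small number of successes reproduces, qualitatively, the high autocorrelation of the coefficient draws discussed in Chapter 7.

```python
# Hedged sketch: truncated-normal (Albert-Chib) data augmentation Gibbs sampler for probit regression.
import numpy as np
from scipy.stats import truncnorm

def probit_da_gibbs(X, y, n_iter=2000, tau2=100.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)   # posterior covariance of beta given z
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # z_i | beta, y_i: N(x_i'beta, 1) truncated to (0, inf) if y_i = 1, (-inf, 0) if y_i = 0
        mean = X @ beta
        lower = np.where(y == 1, -mean, -np.inf)    # truncation bounds in standardized units
        upper = np.where(y == 1, np.inf, -mean)
        z = mean + truncnorm.rvs(lower, upper, size=n, random_state=rng)
        # beta | z: multivariate normal with mean V X'z and covariance V
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```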