961 results for Generalized Additive Models


Relevance: 80.00%

Abstract:

The analysis of concurrent constraint programs is a challenge due to the inherently concurrent behaviour of their computational model. However, most implementations of the concurrent paradigm can be viewed as a computation with a fixed scheduling rule that suspends some goals so that their execution is postponed until some condition awakens them. For certain kinds of properties, an analysis defined in these terms is correct. Furthermore, it is much more tractable and, in addition, can make use of existing analysis technology for the underlying fixed computation rule. We show how this can be done when the starting point is a framework for the analysis of sequential programs. The resulting analysis, which incorporates suspensions, is adequate for concurrent models where concurrency is localized, e.g. the Andorra model. We refine the analysis for this particular case. Another model in which concurrency is preferably encapsulated, and thus suspensions are local to parts of the computation, is that of CIAO. Nonetheless, the analysis scheme can be generalized to models with global concurrency. We also sketch how this could be done, and we show how the resulting analysis framework could be used for analyzing typical properties, such as suspension freeness.
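The suspension mechanism described above can be pictured as a small scheduler that postpones goals until a guard condition on the constraint store is satisfied. The sketch below is illustrative only; the class and method names are invented for this example and do not come from the paper or from any actual CCP implementation.

```python
# Minimal sketch of a fixed scheduling rule with goal suspension, in the
# spirit of concurrent constraint execution. All names are hypothetical.

from collections import deque

class Scheduler:
    def __init__(self):
        self.store = set()          # the constraint store (told facts)
        self.ready = deque()        # goals that can run now
        self.suspended = []         # (guard, goal) pairs waiting on the store

    def tell(self, fact):
        """Add a constraint to the store and wake any goals it enables."""
        self.store.add(fact)
        still_waiting = []
        for guard, goal in self.suspended:
            if guard(self.store):
                self.ready.append(goal)   # condition met: awaken the goal
            else:
                still_waiting.append((guard, goal))
        self.suspended = still_waiting

    def ask(self, guard, goal):
        """Run the goal if its guard holds; otherwise suspend it."""
        if guard(self.store):
            self.ready.append(goal)
        else:
            self.suspended.append((guard, goal))

    def run(self):
        while self.ready:
            self.ready.popleft()(self)

# Usage: a goal suspends on 'x=1' until another goal tells that constraint.
sched = Scheduler()
sched.ask(lambda s: "x=1" in s, lambda sc: print("awakened: x=1 holds"))
sched.ready.append(lambda sc: sc.tell("x=1"))
sched.run()
```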

Relevance: 80.00%

Abstract:

The principal risks in the railway industry are mainly associated with collisions, derailments and level crossing accidents. An understanding of the nature of previous accidents on the railway network is required to identify potential causes, develop safety systems and deploy safety procedures. Risk assessment is a process for determining the magnitude of risk to assist with decision-making. We propose a three-step methodology to predict the mean number of fatalities in railway accidents. The first step is to predict the mean number of accidents by analyzing generalized linear models and selecting the one that best fits the available historical data on the basis of goodness-of-fit statistics. The second is to compute the mean number of fatalities per accident, and the third is to estimate the mean number of fatalities. The methodology is illustrated on the Spanish railway system. Statistical models accounting for annual and grouped data for the 1992-2009 time period have been analyzed. After identifying the models for broad and narrow gauges, we predicted the mean number of accidents and the number of fatalities for the 2010-18 time period.
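A hedged sketch of the three-step idea: fit candidate count GLMs, keep the best by a goodness-of-fit statistic, then scale by fatalities per accident. The file name, column names, and the use of AIC are assumptions of this example; the paper's covariates and fit statistics may differ.

```python
# Step 1: candidate GLMs for annual accident counts, selected by AIC.
# Step 2: historical mean fatalities per accident.
# Step 3: expected fatalities = predicted mean accidents x fatalities/accident.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("accidents_by_year.csv")   # hypothetical: year, gauge, accidents, fatalities

candidates = {
    "poisson": smf.glm("accidents ~ year + gauge", df,
                       family=sm.families.Poisson()).fit(),
    "negbin":  smf.glm("accidents ~ year + gauge", df,
                       family=sm.families.NegativeBinomial()).fit(),
}
best = min(candidates.values(), key=lambda r: r.aic)

fatalities_per_accident = df["fatalities"].sum() / df["accidents"].sum()

future = pd.DataFrame({"year": range(2010, 2019), "gauge": "broad"})
expected_fatalities = best.predict(future) * fatalities_per_accident
print(expected_fatalities)
```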

Relevance: 80.00%

Abstract:

Species distribution models (SDMs) predict species occurrence based on statistical relationships with environmental conditions. The R package biomod2, which includes 10 different SDM techniques and 10 different evaluation methods, was used in this study. Macroalgae are the main biomass producers in Potter Cove, King George Island (Isla 25 de Mayo), Antarctica, and they are sensitive to climate change factors such as suspended particulate matter (SPM). Macroalgae presence and absence data were used to test the suitability of SDMs and, simultaneously, to assess the environmental response of macroalgae, as well as to model four scenarios of distribution shifts under SPM conditions varied to reflect climate change. According to the evaluation scores of the Relative Operating Characteristic (ROC) and the True Skill Statistic (TSS), averaged across models, the methods based on a multitude of decision trees, such as Random Forest and Classification Tree Analysis, reached the highest predictive power, followed by generalized boosted models (GBM) and maximum-entropy approaches (Maxent). The final ensemble model used 135 of the 200 calculated models (TSS > 0.7) and identified hard substrate and SPM as the most influential parameters, followed by distance to glacier, total organic carbon (TOC), bathymetry and slope. The climate change scenarios show an invasive response of the macroalgae in the case of lower SPM and a retreat of the macroalgae in the case of higher assumed SPM values.
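biomod2 is an R package; as a language-agnostic illustration of the workflow (fit a tree ensemble to presence/absence data, score it with ROC AUC and TSS), here is a Python sketch. The predictor files and names are hypothetical, and random forest stands in for the full biomod2 ensemble.

```python
# Fit a random forest to presence/absence data and evaluate with
# ROC AUC and TSS (= sensitivity + specificity - 1). Inputs are hypothetical.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

X = np.load("predictors.npy")   # hypothetical: SPM, hard substrate, depth, ...
y = np.load("presence.npy")     # 1 = macroalgae present, 0 = absent
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
prob = rf.predict_proba(X_te)[:, 1]

print("ROC AUC:", roc_auc_score(y_te, prob))

# TSS at a 0.5 probability threshold.
tn, fp, fn, tp = confusion_matrix(y_te, prob > 0.5).ravel()
tss = tp / (tp + fn) + tn / (tn + fp) - 1
print("TSS:", tss)
```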

Relevance: 80.00%

Abstract:

Blue whiting (Micromesistius poutassou, http://www.marinespecies.org/aphia.php?p=taxdetails&id=126439) is a small mesopelagic planktivorous gadoid found throughout the North-East Atlantic. This dataset contains the results of a model-based analysis of larvae captured by the Continuous Plankton Recorder (CPR) during the period 1951-2005. The observations are analysed using Generalised Additive Models (GAMs) of the spatial, seasonal and interannual variation in the occurrence of larvae. The best-fitting model is chosen using the Akaike Information Criterion (AIC). The probability of occurrence in the Continuous Plankton Recorder is then normalised and converted to a probability distribution function in space (UTM projection Zone 28) and season (day of year). The best-fitting model splits the distribution into two separate spawning grounds north and south of a dividing line at 53°N. The probability distribution is therefore normalised in these two regions (i.e. the space-time integral over each of the two regions is 1). The modelled outputs are on a UTM Zone 28 grid; however, for convenience, the latitude ("lat") and longitude ("lon") of each of these grid points are also included as variables in the NetCDF file. The assignment of each grid point to either the northern or southern component (defined here as north/south of 53°N) is also included as a further variable ("component"). Finally, the day of year ("doy") is stored as the number of days elapsed from and including January 1 (i.e. doy=1 on January 1); the year is divided into 180 grid points.
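As a rough illustration of this kind of model selection, the sketch below fits spline-based logistic GLMs to larval presence/absence and compares them by AIC. It uses patsy B-splines in statsmodels as a stand-in for a full GAM; the file and column names are invented for the example, not taken from the dataset.

```python
# Spline-based stand-in for a GAM of larval occurrence; candidate models
# of increasing complexity are compared by AIC. Column names (occ, doy,
# easting, northing, year) are hypothetical placeholders.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("cpr_larvae.csv")   # hypothetical extract of the CPR records

formulas = [
    "occ ~ bs(doy, df=8)",                                           # season only
    "occ ~ bs(doy, df=8) + bs(easting, df=6) + bs(northing, df=6)",  # + space
    "occ ~ bs(doy, df=8) + bs(easting, df=6) + bs(northing, df=6) + C(year)",  # + year
]
fits = [smf.glm(f, df, family=sm.families.Binomial()).fit() for f in formulas]
best = min(fits, key=lambda r: r.aic)
print(best.model.formula, best.aic)
```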

Relevance: 80.00%

Abstract:

Context: There is evidence suggesting that the prevalence of disability in late life has declined over time while the prevalence of disabling chronic diseases has increased. The dynamic equilibrium of morbidity hypothesis suggests that these seemingly contradictory trends are due to the attenuation of the morbidity-disability link over time. The aim of this study was to empirically test this assumption. Methods: Data were drawn from three repeated cross-sections of SWEOLD, a population-based survey of Swedish men and women aged 77 and older. Logistic regression models were fitted to assess trends in the prevalence of Activities of Daily Living (ADL) disability, Instrumental ADL (IADL) disability, and selected groups of chronic conditions. Changes in the associations between chronic conditions and disabilities were examined in both multiplicative and additive models. Results: Between 1992 and 2011, the odds of ADL disability declined significantly among women, whereas the odds of IADL disability declined significantly among men. During the same period, the prevalence of most chronic morbidities, including multimorbidity, went up. Significant attenuations of the morbidity-disability associations were found for cardiovascular diseases, metabolic disorders, poor lung function, psychological distress, and multimorbidity. Conclusion: In agreement with the dynamic equilibrium hypothesis, this study concludes that the associations between chronic conditions and disability among Swedish older adults have largely waned over time.
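A hedged sketch of how such an attenuation test might look on the multiplicative (odds ratio) scale: a logistic model with a survey-period by condition interaction, where a negative interaction coefficient indicates a weakening morbidity-disability link. Variable names are invented; the study's actual model specification may differ.

```python
# Illustrative attenuation test: interaction between survey period and a
# chronic condition in a logistic model for disability. Column names are
# hypothetical, not SWEOLD's actual variables.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("sweold_waves.csv")   # hypothetical pooled cross-sections

# A negative cvd:period coefficient would suggest the cardiovascular-
# disease/ADL-disability association weakened over time.
m = smf.glm("adl_disability ~ cvd * period + age + sex",
            df, family=sm.families.Binomial()).fit()
print(m.summary())
```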

Relevance: 80.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance: 80.00%

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance: 80.00%

Abstract:

Thesis (Master's)--University of Washington, 2016-06

Relevance: 80.00%

Abstract:

This paper presents an effective decision-making system for leak detection based on multiple generalized linear models and clustering techniques. The training data for the proposed decision system are obtained from an experimental pipeline set up as a fully operational distribution system. The system is also equipped with data logging for three variables: inlet pressure, outlet pressure, and outlet flow. The experimental setup is designed so that multiple operational conditions of the distribution system, including multiple pressures and flows, can be obtained. We then show by statistical tests that the pressure and flow variables can be used as signatures of a leak under the designed multi-operational conditions. It is then shown that detecting leaks by training and testing the proposed multi-model decision system, with prior clustering of the data, under multiple operational conditions produces better recognition rates than training based on a single-model approach. The decision system is then equipped with estimated confidence limits, and a method is proposed for using these confidence limits to obtain more robust leak recognition results.
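To make the cluster-then-classify idea concrete, here is a hedged sketch: k-means on the operating conditions followed by one classifier per cluster. Logistic regression stands in for the paper's generalized linear models, and the file and variable names are invented for this example.

```python
# Sketch of the multi-model idea: cluster pressure/flow regimes first, then
# fit one leak/no-leak model per regime. Data layout is hypothetical.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

data = np.load("pipeline_runs.npy")   # columns: inlet_p, outlet_p, outlet_flow
leak = np.load("leak_labels.npy")     # 1 = leak present during the run

# Cluster the operating conditions (pressure/flow regimes).
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(data)

# One model per regime instead of a single global model.
models = {}
for k in range(4):
    idx = km.labels_ == k
    models[k] = LogisticRegression().fit(data[idx], leak[idx])

def predict_leak(x):
    """Route a new observation to its regime's model."""
    k = km.predict(x.reshape(1, -1))[0]
    return models[k].predict_proba(x.reshape(1, -1))[0, 1]
```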

Relevance: 80.00%

Abstract:

2002 Mathematics Subject Classification: 62M10.

Relevance: 80.00%

Abstract:

2000 Mathematics Subject Classification: 62P10, 62J12.

Relevance: 80.00%

Abstract:

This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
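A minimal sketch of the workflow the abstract describes: fit a Poisson model with an exposure offset, check for overdispersion, and refit with a negative binomial. The dataset and column names are hypothetical, not the paper's suicide data.

```python
# Poisson regression for rare-event counts, an overdispersion check, and a
# negative binomial fallback. Inputs are hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("region_counts.csv")   # hypothetical: suicides, population, covariates

pois = smf.glm("suicides ~ unemployment + urbanicity", df,
               family=sm.families.Poisson(),
               offset=np.log(df["population"])).fit()

# Pearson chi-square / residual df near 1 is consistent with the Poisson
# assumption; values well above 1 signal overdispersion.
dispersion = pois.pearson_chi2 / pois.df_resid
print("dispersion:", dispersion)

if dispersion > 1.5:                     # arbitrary illustrative cutoff
    nb = smf.glm("suicides ~ unemployment + urbanicity", df,
                 family=sm.families.NegativeBinomial(),
                 offset=np.log(df["population"])).fit()
    print(nb.summary())
```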

Relevance: 80.00%

Abstract:

Smoking prevalence among adolescents in the Middle East remains high while rates of smoking have been declining among adolescents elsewhere. The aims of this research were to (1) describe patterns of cigarette and waterpipe (WP) smoking, (2) identify determinants of WP smoking initiation, and (3) identify determinants of cigarette smoking initiation in a cohort of Jordanian school children.

Among this cohort of school children in Irbid, Jordan (age ≈ 12.6 at baseline), the first aim (N=1,781) described time trends in smoking behavior, age at initiation, and changes in frequency of smoking from 2008-2011 (grades 7-10). The second aim (N=1,243) identified determinants of WP initiation among WP-naïve students, and the third aim (N=1,454) identified determinants of cigarette smoking initiation among cigarette-naïve participants. Determinants of initiation were assessed with generalized mixed models. All analyses were stratified by gender.

Baseline prevalence of current smoking (cigarettes or WP) for boys and girls was 22.9% and 8.7%, respectively. Prevalence of ever- and current- any smoking, cigarette smoking, WP smoking, and dual cigarette/WP smoking was higher in boys than girls each year (p<0.001). At all time points, prevalence of WP smoking was higher than that of cigarette smoking (p<0.001) for both boys and girls. WP initiation was documented in 39% of boys and 28% of girls. Cigarette initiation was documented in 37% of boys and 24% of girls. Determinants of WP initiation included ever-cigarette smoking, low WP refusal self-efficacy, intention to smoke, and having teachers and friends who smoke WP. Determinants of cigarette smoking initiation included ever-WP smoking, low cigarette refusal self-efficacy, intention to start smoking cigarettes, and having friends and family who smoke.

These studies reveal intensive smoking patterns at early ages among Jordanian youth in Irbid, characterized by a predominance of WP smoking. WP may be a vehicle for tobacco dependence and subsequent cigarette uptake. The sizeable incidence of WP and cigarette initiation among students of both sexes points to a need for culturally relevant smoking prevention interventions. Gender-specific factors, refusal skills, and smoking cessation of both WP and cigarettes for youth and their parents/teachers would be important components of such initiatives.

Relevance: 80.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even after the huge increases in n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" has little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and it is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is the design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms and for characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
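For intuition, a rank-k latent class (PARAFAC-style) model expresses the joint pmf of d categorical variables as a mixture of product-multinomials. The EM sketch below illustrates that factorization generically; it is not the collapsed Tucker decomposition proposed in Chapter 2, and the function and variable names are assumptions of this example.

```python
# Generic latent class / rank-k PARAFAC model for d categorical variables:
# p(x1..xd) = sum_h pi[h] * prod_j psi[j][x_j, h], fit by EM.
# Purely illustrative; not the thesis's collapsed Tucker model.

import numpy as np

def latent_class_em(X, n_levels, k, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)
    psi = [rng.dirichlet(np.ones(c), size=k).T for c in n_levels]  # (levels, k)

    for _ in range(iters):
        # E-step: responsibility of class h for each observation.
        logr = np.log(pi) + sum(np.log(psi[j][X[:, j]]) for j in range(d))
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update class weights and per-variable multinomials.
        pi = r.mean(axis=0)
        for j in range(d):
            for c in range(n_levels[j]):
                psi[j][c] = r[X[:, j] == c].sum(axis=0)
            psi[j] /= psi[j].sum(axis=0, keepdims=True)
    return pi, psi

# Usage on synthetic data: 3 ternary variables, 2 latent classes.
X = np.random.default_rng(1).integers(0, 3, size=(500, 3))
pi, psi = latent_class_em(X, n_levels=[3, 3, 3], k=2)
```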

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and we provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and in other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
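To fix ideas, the sketch below computes a Laplace (mode plus curvature) Gaussian approximation to a Poisson log-linear posterior with a Gaussian prior. This is a simpler cousin of the optimal, KL-minimizing Gaussian approximation derived in Chapter 4, not that construction itself; the design matrix, cell counts, and prior scale are all invented for the example.

```python
# Laplace Gaussian approximation to a Poisson log-linear posterior with a
# N(0, tau2 I) prior on the coefficients. Illustrative stand-in only.

import numpy as np
from scipy.optimize import minimize

X = np.column_stack([np.ones(8), np.tile([0, 1], 4), np.repeat([0, 1], 4)])
y = np.array([12, 7, 9, 4, 10, 6, 8, 3])   # hypothetical cell counts
tau2 = 10.0                                 # prior variance on coefficients

def neg_log_post(b):
    eta = X @ b
    return np.sum(np.exp(eta) - y * eta) + 0.5 * b @ b / tau2

def grad(b):
    return X.T @ (np.exp(X @ b) - y) + b / tau2

opt = minimize(neg_log_post, np.zeros(X.shape[1]), jac=grad, method="BFGS")
mode = opt.x
# Hessian of the negative log posterior at the mode -> Gaussian precision.
W = np.diag(np.exp(X @ mode))
precision = X.T @ W @ X + np.eye(X.shape[1]) / tau2
cov = np.linalg.inv(precision)
print("Gaussian approx: mean", mode, "\ncov\n", cov)
```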

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
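The core quantity in this framework, waiting times between exceedances of a high threshold, is easy to compute from a series. The sketch below extracts inter-exceedance times at a chosen quantile, purely as an illustration of the object the chapter builds its inference on; the data are synthetic.

```python
# Extract waiting times between exceedances of a high threshold from a
# time series; these inter-exceedance times carry the tail dependence
# information the chapter's framework is built on.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=10_000)        # heavy-tailed synthetic series

u = np.quantile(x, 0.99)                      # high threshold
exceed_times = np.flatnonzero(x > u)          # indices of threshold exceedances
waiting_times = np.diff(exceed_times)         # gaps between successive exceedances

print("mean waiting time:", waiting_times.mean())
# Clustering of extremes shows up as an excess of short waiting times
# relative to the geometric gaps an independent series would produce.
print("share of gaps <= 5:", np.mean(waiting_times <= 5))
```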

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
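One of the kernel approximations mentioned, random subsets of data, can be sketched as a Metropolis step whose log-likelihood is estimated from a subsample and rescaled by n/m. This is a generic illustration of an approximate transition kernel carrying exactly the kind of error the chapter's framework would quantify, not the thesis's specific algorithm; the model (a normal mean with a flat prior) and all names are assumptions.

```python
# Approximate Metropolis kernel: the log-likelihood is estimated on a random
# subsample of size m and rescaled by n/m. Illustrative only; the resulting
# chain targets the posterior approximately, not exactly.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100_000)
n, m = data.size, 1_000

def subsampled_loglik(theta):
    sub = rng.choice(data, size=m, replace=False)
    return (n / m) * np.sum(-0.5 * (sub - theta) ** 2)   # unit-variance normal

theta, chain = 0.0, []
ll = subsampled_loglik(theta)
for _ in range(5_000):
    prop = theta + 0.05 * rng.standard_normal()
    ll_prop = subsampled_loglik(prop)
    if np.log(rng.uniform()) < ll_prop - ll:   # flat prior for simplicity
        theta, ll = prop, ll_prop
    chain.append(theta)

print("posterior mean estimate:", np.mean(chain[1000:]))
```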

Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution of the parameters of generalized linear models. The truncated normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size, up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence-chain Metropolis algorithm show good mixing on the same dataset.
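For reference, the truncated normal scheme mentioned above is the classic Albert-Chib data augmentation for probit regression; a minimal sketch follows on synthetic rare-event data. The data generation and prior scale are assumptions of this example. With few successes, the draws of beta from such a chain would show the high autocorrelation the chapter analyzes.

```python
# Albert-Chib data augmentation Gibbs sampler for probit regression:
# latent z_i ~ N(x_i'beta, 1), truncated to (0, inf) if y_i = 1 and to
# (-inf, 0) otherwise, followed by a conjugate normal update for beta.
# Synthetic data with a rare outcome mimic the regime studied in Chapter 7.

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n, p = 5_000, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([-3.0, 0.5, -0.5])       # intercept -3 => few successes
y = (X @ beta_true + rng.standard_normal(n) > 0).astype(int)

tau2 = 100.0                                   # prior: beta ~ N(0, tau2 I)
V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # fixed posterior covariance
L = np.linalg.cholesky(V)

beta = np.zeros(p)
draws = []
for _ in range(2_000):
    mu = X @ beta
    # Sample latent utilities, truncated according to the observed labels.
    lo = np.where(y == 1, -mu, -np.inf)        # z > 0 when y = 1
    hi = np.where(y == 1, np.inf, -mu)         # z < 0 when y = 0
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # Conjugate update for beta given z.
    beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
    draws.append(beta.copy())

print("posterior mean:", np.mean(draws[500:], axis=0))
```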