922 results for Generalized Linear Model


Relevance: 90.00%

Abstract:

The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
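As a minimal sketch of the posterior-mean computation described in this abstract, the code below assumes a squared-exponential covariance and Gaussian noise; neither choice is stated in the abstract, and all function names are illustrative. The O(n^3) cost comes from the Cholesky factorization of the n x n covariance matrix.

```python
import math


def sq_exp_kernel(a, b, length=1.0):
    """Squared-exponential covariance (an assumed prior, not the paper's)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a; this is the O(n^3) step."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) v = b by forward then backward substitution."""
    n = len(b)
    u = [0.0] * n
    for i in range(n):
        u[i] = (b[i] - sum(low[i][k] * u[k] for k in range(i))) / low[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (u[i] - sum(low[k][i] * v[k] for k in range(i + 1, n))) / low[i][i]
    return v


def gp_posterior_mean(x_train, y_train, x_new, noise_var=0.1):
    """Posterior mean k(x_new, X) (K + noise_var * I)^{-1} y."""
    n = len(x_train)
    cov = [[sq_exp_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    alpha = chol_solve(cholesky(cov), y_train)
    return [sum(sq_exp_kernel(xs, xi) * a for xi, a in zip(x_train, alpha))
            for xs in x_new]
```

With a single observation y = 2 at x = 0 and unit noise variance, the posterior mean at x = 0 shrinks halfway toward the zero prior mean, which illustrates the link between Bayesian prediction and regularization mentioned in the abstract.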

Relevance: 90.00%

Abstract:

Magdalina Vasileva Todorova - The paper describes an approach to verifying procedural programs by building models of them, defined by means of generalized nets. The approach integrates the "design by contract" concept with verification approaches of the theorem-proving and model-consistency-checking type. To this end, the functions that make up the program are verified separately against specifications according to their purpose. A generalized net model is built that specifies the relationships between the functions in the form of correct sequences of calls. For the main function of the program, a generalized net model is constructed and checked for conformance with the net model of the relationships between the program's functions. Each function of the program that uses other functions is also verified against the specification given by the net model of the relationships between the program's functions.

Relevance: 90.00%

Abstract:

2000 Mathematics Subject Classification: 62P10, 62J12.

Relevance: 90.00%

Abstract:

This paper explains how Poisson regression can be used in studies in which the dependent variable describes the number of occurrences of some rare event such as suicide. After pointing out why ordinary linear regression is inappropriate for treating dependent variables of this sort, we go on to present the basic Poisson regression model and show how it fits in the broad class of generalized linear models. Then we turn to discussing a major problem of Poisson regression known as overdispersion and suggest possible solutions, including the correction of standard errors and negative binomial regression. The paper ends with a detailed empirical example, drawn from our own research on suicide.
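The basic model and the overdispersion check described in this abstract can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it uses a single covariate, fits by iteratively reweighted least squares, uses the Pearson statistic as the dispersion estimate, and runs on made-up counts.

```python
import math


def fit_poisson_irls(x, y, n_iter=50):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response for the log link; the IRLS weights equal mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        s_w = sum(mu)
        s_wx = sum(m * xi for m, xi in zip(mu, x))
        s_wxx = sum(m * xi * xi for m, xi in zip(mu, x))
        s_wz = sum(m * zi for m, zi in zip(mu, z))
        s_wxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        # Solve the 2 x 2 weighted normal equations by Cramer's rule.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical event counts, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1, 4, 2, 9, 5, 16, 8, 30]
b0, b1 = fit_poisson_irls(x, y)
phi = pearson_dispersion(x, y, b0, b1)
print(f"b0={b0:.3f}, b1={b1:.3f}, dispersion={phi:.2f}")
```

A dispersion estimate well above 1 suggests overdispersion. One common correction, in the spirit of the standard-error correction mentioned in the abstract, multiplies the Poisson standard errors by the square root of this estimate; negative binomial regression is the model-based alternative.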

Relevance: 90.00%

Abstract:

We investigate by means of Monte Carlo simulation and finite-size scaling analysis the critical properties of the three-dimensional O(5) non-linear σ model and of the antiferromagnetic RP^2 model, both of them regularized on a lattice. High accuracy estimates are obtained for the critical exponents, universal dimensionless quantities and critical couplings. It is concluded that both models belong to the same universality class, provided that rather non-standard identifications are made for the momentum-space propagator of the RP^2 model. We have also investigated the phase diagram of the RP^2 model extended by a second-neighbor interaction. A rich phase diagram is found, where most of the phase transitions are of the first order.

Relevance: 90.00%

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
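To make the "reduced rank tensor factorization" concrete, the sketch below builds the joint probability mass function of a latent class (PARAFAC-type) model as a mixture of product distributions. The parameter values and function name are hypothetical, and the collapsed Tucker class proposed in the thesis is not reproduced here.

```python
from itertools import product


def latent_class_pmf(class_weights, conditionals):
    """Joint pmf of categorical variables under a latent class model.

    class_weights[h] is P(class = h); conditionals[j][h][c] is
    P(variable j = c | class = h). Returns a dict {cell: probability}.
    """
    levels = [range(len(var[0])) for var in conditionals]
    pmf = {}
    for cell in product(*levels):
        total = 0.0
        for h, weight in enumerate(class_weights):
            term = weight
            for j, c in enumerate(cell):
                term *= conditionals[j][h][c]  # conditional independence within a class
            total += term
        pmf[cell] = total
    return pmf


# Hypothetical parameters: two latent classes, three binary variables.
weights = [0.7, 0.3]
cond = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.6, 0.4], [0.5, 0.5]],
]
pmf = latent_class_pmf(weights, cond)
```

The resulting 2 x 2 x 2 probability tensor is a sum of two rank-one nonnegative tensors, so its nonnegative rank is at most the number of latent classes.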

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
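The slow mixing described in this paragraph can be illustrated with a toy version of the truncated Normal data augmentation sampler. The sketch below makes several assumptions that are not in the thesis: an intercept-only probit model, a flat prior on the intercept, and simulated data with 2 successes in 200 trials. Function names are illustrative.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) truncated to (0, inf) if positive, else (-inf, 0].

    Uses inverse-CDF sampling; adequate for moderate |mean| only.
    """
    p_below_zero = STD_NORMAL.cdf(-mean)
    u = rng.random()
    p = p_below_zero + u * (1.0 - p_below_zero) if positive else u * p_below_zero
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return mean + STD_NORMAL.inv_cdf(p)


def probit_intercept_gibbs(y, n_iter, seed=0):
    """Data augmentation Gibbs sampler for P(y = 1) = Phi(beta), flat prior."""
    rng = random.Random(seed)
    n = len(y)
    beta, draws = 0.0, []
    for _ in range(n_iter):
        # Step 1: latent z_i | beta, y_i are truncated normals.
        z_bar = sum(draw_truncated_normal(beta, yi == 1, rng) for yi in y) / n
        # Step 2: beta | z is N(z_bar, 1/n) under the flat prior.
        beta = rng.gauss(z_bar, (1.0 / n) ** 0.5)
        draws.append(beta)
    return draws


def lag1_autocorrelation(xs):
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    return num / sum((a - m) ** 2 for a in xs)


# Hypothetical rare-event data: 2 successes among 200 observations.
y = [1] * 2 + [0] * 198
draws = probit_intercept_gibbs(y, n_iter=1500)[500:]  # discard burn-in
print("posterior mean:", sum(draws) / len(draws))
print("lag-1 autocorrelation:", lag1_autocorrelation(draws))
```

Because each update of the intercept moves only a small distance relative to the width of the posterior, successive draws are strongly correlated, which is the behaviour the chapter quantifies through the spectral gap.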

Relevance: 90.00%

Abstract:

Background: Conifer populations appear disproportionately threatened by global change. Most examples are, however, drawn from the northern hemisphere, and long-term rates of population decline are not well documented because historical data are often lacking. We use a large and long-term (1931-2013) repeat photography dataset together with environmental data and fire records to account for the decline of the critically endangered Widdringtonia cedarbergensis. Eighty-seven historical and repeat photo-pairs were analysed to establish 20th century changes in W. cedarbergensis demography. A generalized linear mixed-effects model was fitted to determine the relative importance of environmental factors and fire-return interval on mortality for the species.

Results: From an initial total of 1313 live trees in historical photographs, 74% had died and only 44 (3.4%) had recruited in the repeat photographs, leaving 387 live individuals. Over the intervening period, the share of juveniles decreased from 27% to 8%, while the share of mature adults increased from 73% to 92%. Our model demonstrates that mortality is related to greater fire frequency, higher temperatures, lower elevations, less rocky habitats and aspect (i.e. east-facing slopes had the least mortality).

Conclusions: Our results show that W. cedarbergensis populations have declined significantly over the recorded period, with a pronounced decline in the last 30 years. Individuals that established in open habitats at lower, hotter elevations and experienced a greater fire frequency appear to be more vulnerable to mortality than individuals growing within protected, rocky environments at higher, cooler locations with less frequent fires. Climate models predict increasing temperatures for our study area (and likely increases in wildfires). If these predictions are realised, further declines in the species can be expected. Urgent management interventions, including seedling out-planting in fire-protected high elevation sites, reducing fire frequency in higher elevation populations, and assisted migration, should be considered.

Relevance: 90.00%

Abstract:

Municipal management in any country requires planning and a balanced allocation of resources. In Brazil, the Law of Budgetary Guidelines (LDO) guides municipal managers toward that balance. This research develops a model that seeks a balanced allocation of public resources in Brazilian municipalities, taking the LDO as a parameter. In a first step, statistical techniques and multicriteria analysis are used to define allocation strategies, based on the technical aspects provided by the municipal manager. In a second step, an optimization based on linear programming is presented, in which the objective function is derived from the preferences of the manager and his staff. The statistical representation, built on time series, supports the multicriteria development in the definition of replacement rates. The multicriteria analysis was structured by defining the criteria and alternatives and by applying UTASTAR methods to calculate replacement rates. After these initial settings, a linear programming application was developed to find the optimal allocation of enforcement resources of the municipal budget. Budget data from a municipality in southwestern Paraná were used to apply the model and analyze the results.

Relevance: 90.00%

Abstract:

The objective of this study was to characterize the floristic composition and structure of the tree component in a fragment of Upper Montane Mixed Ombrophilous Forest and to evaluate the influence of the edge effect on the organization, structure, richness and diversity of species. Fifty permanent plots of 10 x 20 m were established, divided among five transects at least 100 m apart, in a forest fragment in the municipality of Bom Jardim da Serra, SC. Trees with a circumference at breast height (CBH) ≥ 15.7 cm were measured (CBH and total height), identified and classified into regeneration guilds (pioneers, light-demanding climax species and shade-tolerant climax species). The data were analyzed using importance value indices (IVI), NMDS (Nonmetric Multidimensional Scaling), a generalized additive model and simple linear regressions. A total of 1,457 individual trees were observed, distributed among 29 families, 43 genera and 55 species. The species with the highest importance value was Dicksonia sellowiana Hook. No influence of the edge effect was observed on the organization or structure (mean diameter, mean height and density) of the community, or on the relative participation of the regeneration guilds. However, higher values of diversity, richness and evenness were found in the edge areas. It is therefore concluded that part of the variation in tree species diversity in the Upper Montane Mixed Ombrophilous Forest was determined by distance from the edge.

Relevance: 90.00%

Abstract:

In this work, the relationship between diameter at breast height (d) and total height (h) of individual trees was modeled with the aim of establishing provisional height-diameter (h-d) equations for maritime pine (Pinus pinaster Ait.) stands in the Lomba ZIF, Northeast Portugal. Using data collected locally, several local and generalized h-d equations from the literature were tested, and adaptations were also considered. Models were fitted by the usual nonlinear least squares (nls) methods. The best local and generalized models selected were also tested as mixed models, applying a first-order conditional expectation (FOCE) approximation procedure and maximum likelihood methods to estimate fixed and random effects. For the calibration of the mixed models, and in order to be consistent with the fitting procedure, the FOCE method was also used to test different sampling designs. The results showed that the local h-d equations with two parameters performed better than the analogous models with three parameters. However, a single set of parameter values for the local model cannot be used for all maritime pine stands in the Lomba ZIF, and thus a generalized model including stand-level covariates, in addition to d, was necessary to obtain adequate predictive performance. No evident superiority of the generalized mixed model over the generalized model with nonlinear least squares parameter estimates was observed. In the case of the local model, on the other hand, predictive performance improved greatly when random effects were included. The results showed that the mixed model based on the selected local h-d equation is a viable alternative for estimating h if stand variables are not available. Moreover, an adequate calibrated response can be obtained using only 2 to 5 additional h-d measurements on quantile (or random) trees from the distribution of d in the plot (stand). Balancing sampling effort, accuracy and straightforwardness in practical applications, the generalized model from the nls fit is recommended. Examples of applications of the selected generalized equation to forest management are presented, namely how to use it to complete missing information from forest inventory and how such an equation can be incorporated in a stand-level decision support system that aims to optimize forest management for the maximization of wood volume production in Lomba ZIF maritime pine stands.

Relevance: 90.00%

Abstract:

The prognosis of tooth loss is one of the main problems in the clinical practice of dental medicine. One of the main prognostic factors is the amount of bone support of the tooth, defined by the intraosseous root surface area. This quantity has been estimated with different research methodologies, with heterogeneous results. In this work we use planimetry with microtomography to calculate the root surface area (ASR) of a sample of five mandibular second premolars obtained from the Portuguese population, with the final aim of creating a statistical model to estimate the intraosseous root surface area from clinical indicators of bone loss. Finally, we propose a method for applying the results in practice. The data on root surface area, total tooth length (CT) and maximum mesiodistal crown dimension (MDeq) were used to establish the statistical relationships between variables and to define a multivariate normal distribution. A sample of 37 simulated observations, statistically identical to the data from the five-tooth sample, was then generated from the defined multivariate normal distribution. Five generalized linear models were fitted to the simulated data. The statistical model was selected according to criteria of goodness of fit, predictability, statistical power, parameter accuracy and information loss, and was validated by graphical residual analysis. Based on the results, we propose a three-phase method for estimating the lost or remaining root surface area. In the first phase we use the statistical model to estimate the root surface area; in the second we estimate the proportion (in deciles) of intraosseous root using an adapted Schei ruler; and in the third we multiply the value obtained in the first phase by a coefficient representing the proportion of root lost (ASRp) or remaining (ASRr) for the decile estimated in the second phase. The strength of this study was the application of validated statistical methodology to operationalize clinical data in the estimation of lost bone support. As weaknesses, we consider that these results apply only to mandibular second premolars and that clinical validation is lacking.

Relevance: 90.00%

Abstract:

The objective of this study was to estimate the spatial distribution of work accident risk in the informal work market in the urban zone of an industrialized city in southeast Brazil and to examine concomitant effects of age, gender, and type of occupation after controlling for spatial risk variation. The basic methodology adopted was that of a population-based case-control study with particular interest focused on the spatial location of work. Cases were all casual workers in the city suffering work accidents during a one-year period; controls were selected from the source population of casual laborers by systematic random sampling of urban homes. The spatial distribution of work accidents was estimated via a semiparametric generalized additive model with a nonparametric bidimensional spline of the geographical coordinates of cases and controls as the nonlinear spatial component, and including age, gender, and occupation as linear predictive variables in the parametric component. We analyzed 1,918 cases and 2,245 controls between 1/11/2003 and 31/10/2004 in Piracicaba, Brazil. Areas of significantly high and low accident risk were identified in relation to mean risk in the study region (p < 0.01). Work accident risk for informal workers varied significantly in the study area. Significant age, gender, and occupational group effects on accident risk were identified after correcting for this spatial variation. A good understanding of high-risk groups and high-risk regions underpins the formulation of hypotheses concerning accident causality and the development of effective public accident prevention policies.

Relevance: 90.00%

Abstract:

The milling of thin parts is a high added value operation where the machinist has to face the chatter problem. The study of the stability of these operations is a complex task due to the changing modal parameters as the part loses mass during the machining and the complex shape of the tools that are used. The present work proposes a methodology for chatter avoidance in the milling of flexible thin floors with a bull-nose end mill. First, a stability model for the milling of compliant systems in the tool axis direction with bull-nose end mills is presented. The contribution is the averaging method used to be able to use a linear model to predict the stability of the operation. Then, the procedure for the calculation of stability diagrams for the milling of thin floors is presented. The method is based on the estimation of the modal parameters of the part and the corresponding stability lobes during the machining. As in thin floor milling the depth of cut is already defined by the floor thickness previous to milling, the use of stability diagrams that relate the tool position along the tool-path with the spindle speed is proposed. Hence, the sequence of spindle speeds that the tool must have during the milling can be selected. Finally, this methodology has been validated by means of experimental tests.

Relevance: 90.00%

Abstract:

Several studies have investigated the possible determinants of the price of the European emission allowance. This bachelor's thesis aims to analyze which factors influence the price of this financial product and how they do so, and to check for possible changes in the functioning of the market. The methodology used for this analysis is based mainly on the general linear regression model. Unlike other existing studies, the sample runs from 2008 to 2015, so it includes the second phase (2008-2012) of this emission allowance market and the third (2013-2015), which makes it possible to analyze potential differences in the functioning of the market between the two phases. The results support the existence of this structural change: in the second phase the most influential factors are natural gas and oil, whereas in the third phase the behavior of the market changes drastically and coal appears to be the most influential factor.

Relevance: 90.00%

Abstract:

Linear and nonlinear models in genetic analyses of lamb survival in the Santa Inês hair sheep breed. Records of survival from birth to weaning for 3,846 lambs of the Santa Inês hair sheep breed were analyzed with linear and nonlinear (threshold) sire models to estimate variance components and heritability (h^2). The models used for survival, analyzed as a trait of the lamb, included the fixed effects of sex of the lamb, the combination of type of birth and rearing of the lamb, and age of the ewe at lambing; birth weight of the lamb as a covariate; and the random effects of sire, herd-year-season class and residual. Variance components were estimated by restricted maximum likelihood (REML) for the linear model and by an approximation to marginal maximum likelihood (MML) for the threshold model, using the CMMAT2 program. The heritability estimate was 0.29 from the threshold model and 0.14 from the linear model. The Spearman rank correlation between the sire transmitting abilities (sire solutions) from the two models was 0.96. The h^2 estimates obtained indicate that genetic gain in survival can be achieved through selection.
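For orientation, heritability in a sire model is usually derived from the sire variance component as sketched below. This is the textbook identity, not a formula quoted from the paper: the symbols are introduced here for illustration, and the abstract does not say whether the herd-year-season variance enters the denominator.

```latex
% Sire variance captures one quarter of the additive genetic variance:
%   \sigma_s^2 = \tfrac{1}{4}\sigma_A^2
% so, with residual variance \sigma_e^2,
\[
  h^2 = \frac{4\,\sigma_s^2}{\sigma_s^2 + \sigma_e^2}
\]
% In a threshold (probit) model the residual variance on the underlying
% liability scale is conventionally fixed at \sigma_e^2 = 1.
```

Under this convention the threshold-model estimate refers to the underlying liability scale and the linear-model estimate to the observed 0/1 scale, which is one common explanation for the two values differing.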