12 results for generalized linear models

at Universitat de Girona, Spain


Relevance:

100.00%

Publisher:

Abstract:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and was hailed as “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long-lasting debate around its authorship stemming from its first edition, whose introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last quarter of the book is by Galba (?-1490), written after the death of Martorell. Some of the authors who support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences. In the following sections we propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models to the sequence of chi-square distances between each row profile and the average profile; the one in Section 6 fits models to the sequence of values taken by the first component of the correspondence analysis as well as to sequences of other summary measures like the average word length. In Section 7 we fit models to the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
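The idea of a single sudden change-point in a sequence of multinomial observations can be illustrated with a minimal sketch. It is not the GLM-based procedure of the paper: it simply assumes one row profile before the boundary and another after it, and picks the split that maximises the two-segment multinomial log-likelihood. The function names and the toy counts are hypothetical.

```python
import math

def multinomial_loglik(rows):
    """Log-likelihood (up to a constant) of rows that share one category profile."""
    totals = [sum(col) for col in zip(*rows)]
    n = sum(totals)
    return sum(t * math.log(t / n) for t in totals if t > 0)

def estimate_change_point(table):
    """Return k such that rows [0, k) and rows [k, end) are best fitted
    by two separate multinomial profiles (maximum likelihood split)."""
    best_k, best_ll = None, -math.inf
    for k in range(1, len(table)):
        ll = multinomial_loglik(table[:k]) + multinomial_loglik(table[k:])
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

# Toy data (invented): six "chapters", counts of short / medium / long words.
table = [
    [50, 30, 20],
    [48, 32, 20],
    [52, 29, 19],
    [30, 30, 40],
    [28, 33, 39],
    [31, 29, 40],
]
change_point = estimate_change_point(table)  # index of the first row after the shift
```

In this toy table the profile shifts between the third and fourth rows, so the estimated boundary falls there.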

Relevance:

90.00%

Publisher:

Abstract:

Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAM were recently stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published results. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants.

Relevance:

90.00%

Publisher:

Abstract:

Several methods have been suggested to estimate non-linear models with interaction terms in the presence of measurement error. Structural equation models eliminate measurement error bias, but require large samples. Ordinary least squares regression on summated scales, regression on factor scores and partial least squares are appropriate for small samples but do not correct measurement error bias. Two stage least squares regression does correct measurement error bias but the results strongly depend on the instrumental variable choice. This article discusses the old disattenuated regression method as an alternative for correcting measurement error in small samples. The method is extended to the case of interaction terms and is illustrated on a model that examines the interaction effect of innovation and style of use of budgets on business performance. Alternative reliability estimates that can be used to disattenuate the estimates are discussed. A comparison is made with the alternative methods. Methods that do not correct for measurement error bias perform very similarly and considerably worse than disattenuated regression.
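The basic idea behind disattenuation can be sketched with the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two reliabilities. This is only the textbook starting point, not the article's extension to interaction terms; the function name and the numbers are hypothetical.

```python
import math

def disattenuate_correlation(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y).

    r_xy  : correlation observed between two error-prone scales
    rel_x : reliability of the first scale (0 < rel_x <= 1)
    rel_y : reliability of the second scale (0 < rel_y <= 1)
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Invented illustration: observed correlation 0.30, reliabilities 0.80 and 0.70.
corrected = disattenuate_correlation(0.30, 0.80, 0.70)
```

With perfectly reliable scales (both reliabilities equal to 1) the correction leaves the correlation unchanged, and the lower the reliabilities, the larger the upward correction.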

Relevance:

90.00%

Publisher:

Abstract:

Linear response functions are implemented for a vibrational configuration interaction state allowing accurate analytical calculations of pure vibrational contributions to dynamical polarizabilities. Sample calculations are presented for the pure vibrational contributions to the polarizabilities of water and formaldehyde. We discuss the convergence of the results with respect to various details of the vibrational wave function description as well as the potential and property surfaces. We also analyze the frequency dependence of the linear response function and the effect of accounting phenomenologically for the finite lifetime of the excited vibrational states. Finally, we compare the analytical response approach to a sum-over-states approach.

Relevance:

90.00%

Publisher:

Abstract:

A variational approach for reliably calculating vibrational linear and nonlinear optical properties of molecules with large electrical and/or mechanical anharmonicity is introduced. This approach utilizes a self-consistent solution of the vibrational Schrödinger equation for the complete field-dependent potential-energy surface and, then, adds higher-level vibrational correlation corrections as desired. An initial application is made to static properties for three molecules of widely varying anharmonicity using the lowest-level vibrational correlation treatment (i.e., vibrational Møller-Plesset perturbation theory). Our results indicate when the conventional Bishop-Kirtman perturbation method can be expected to break down and when high-level vibrational correlation methods are likely to be required. Future improvements and extensions are discussed.

Relevance:

90.00%

Publisher:

Abstract:

Survival studies are concerned with the time that elapses from the start of the study (diagnosis of the disease, start of treatment, ...) until the event of interest occurs (death, cure, improvement, ...). However, this event is often observed more than once in the same individual during the follow-up period (multivariate survival data). In this case, a methodology different from the one used in standard survival analysis is required. The main problem in studying this type of data is that the observations may not be independent. Until now, this problem has been handled in two different ways, depending on the dependent variable. If this variable follows a distribution from the exponential family, generalized linear mixed models (GLMM) are used; if the variable is time, a variable whose probability distribution does not belong to that family, multivariate survival analysis is used. The aim of this thesis is to unify these two approaches, that is, to use time as the dependent variable, with clusters of individuals or observations, within a GLMM, in order to introduce new methods for the treatment of this type of data.

Relevance:

80.00%

Publisher:

Abstract:

A joint distribution of two discrete random variables with finite support can be displayed as a two-way table of probabilities adding to one. Assume that this table has n rows and m columns and all probabilities are non-null. This kind of table can be seen as an element in the simplex of n · m parts. In this context, the marginals are identified as compositional amalgams, and conditionals (rows or columns) as subcompositions. Also, simplicial perturbation appears as Bayes' theorem. However, the Euclidean elements of the Aitchison geometry of the simplex can also be translated into the table of probabilities: subspaces, orthogonal projections, distances. Two important questions are addressed: a) given a table of probabilities, which is the nearest independent table to the initial one? b) which is the largest orthogonal projection of a row onto a column? or, equivalently, what is the information in a row explained by a column, thus explaining the interaction? To answer these questions three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a column-wise geometric marginal, (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row and column)-wise geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models. Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table
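The stated result on the nearest independent table can be illustrated with a small sketch. It assumes one reading of the abstract: each geometric marginal is the closed (rescaled to sum to one) vector of geometric means taken along rows or along columns, and the nearest independent table is their outer product. The function names and the example tables are hypothetical.

```python
import math

def closure(values):
    """Rescale positive values so that they add to one."""
    total = sum(values)
    return [v / total for v in values]

def geometric_marginals(table):
    """Closed geometric means of each row and of each column of a positive table."""
    n_rows, n_cols = len(table), len(table[0])
    row_means = [math.exp(sum(math.log(p) for p in row) / n_cols) for row in table]
    col_means = [
        math.exp(sum(math.log(table[i][j]) for i in range(n_rows)) / n_rows)
        for j in range(n_cols)
    ]
    return closure(row_means), closure(col_means)

def nearest_independent_table(table):
    """Outer product of the two closed geometric marginals."""
    rows, cols = geometric_marginals(table)
    return [[r * c for c in cols] for r in rows]

# An independent table: outer product of (0.2, 0.8) and (0.3, 0.7).
independent = [[0.06, 0.14], [0.24, 0.56]]
# A table with strong row-column association (invented).
dependent = [[0.40, 0.10], [0.10, 0.40]]

projected_independent = nearest_independent_table(independent)
projected_dependent = nearest_independent_table(dependent)
```

For the independent table the construction returns the table itself, in line with the corollary that geometric and arithmetic marginals agree under independence; for the symmetric dependent table it returns the uniform table.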

Relevance:

80.00%

Publisher:

Abstract:

The theory of compositional data analysis is often focused on the composition only. However, in practical applications we often treat a composition together with covariables measured on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections, which on one side show the compositional geometry and on the other side are still comprehensible by a non-expert analyst and readable for all locations and scales of the data. This is done, for example, by defining special balance displays with carefully selected axes. Following this idea, we need to systematically ask how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation, and for how they might be taught to analysts who are not already experts in multivariate analysis.

Relevance:

80.00%

Publisher:

Abstract:

Piecewise linear systems arise as mathematical models of systems in many practical applications, often from the linearization of nonlinear systems. There are two main approaches to dealing with these systems, according to their continuous-time or discrete-time aspects. We propose an approach based on a state transformation, more particularly on the partition of the phase portrait into different regions, where each subregion is modeled as a two-dimensional linear time-invariant system. Then the Takagi-Sugeno model, which is a combination of local models, is calculated. The simulation results show that the Alpha partition is well suited for dealing with such a system.
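The way a Takagi-Sugeno model combines local models can be sketched in its generic textbook form: each rule pairs a membership function with a local linear model, and the output is the membership-weighted average of the local outputs. This scalar sketch does not reproduce the Alpha partition of the abstract; the triangular memberships, the rule list and all names are assumptions made for illustration.

```python
def triangular(x, left, center, right):
    """Triangular membership function with its peak (value 1) at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

def ts_output(x, rules):
    """Takagi-Sugeno output: weighted average of local linear models a*x + b."""
    weights = [membership(x) for membership, _ in rules]
    total = sum(weights)
    if total == 0:
        raise ValueError("x is not covered by any rule")
    blended = sum(w * (a * x + b) for w, (_, (a, b)) in zip(weights, rules))
    return blended / total

# Two hypothetical local models blended over the interval [0, 1].
rules = [
    (lambda x: triangular(x, -1.0, 0.0, 1.0), (2.0, 0.0)),   # y = 2x near x = 0
    (lambda x: triangular(x, 0.0, 1.0, 2.0), (-1.0, 3.0)),   # y = -x + 3 near x = 1
]
midpoint = ts_output(0.5, rules)
```

At a point where only one membership is non-zero the output coincides with that local model, while in between the two local lines are interpolated, which is how a set of linear pieces yields a non-linear overall map.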

Relevance:

80.00%

Publisher:

Abstract:

Topological indices have been applied to build QSAR models for a set of 20 antimalarial cyclic peroxy ketals. In order to evaluate the reliability of the proposed linear models, leave-n-out and Internal Test Sets (ITS) approaches have been considered. The proposed procedure resulted in a robust, consensus prediction equation, and it is shown here why it is superior to the standard cross-validation algorithms employed with multilinear regression models.

Relevance:

80.00%

Publisher:

Abstract:

In recent years, some epidemiologic studies have attributed adverse effects of air pollutants on health not only to particles and sulfur dioxide but also to photochemical air pollutants (nitrogen dioxide and ozone). The effects are usually small, leading to some inconsistencies in the results of the studies. Furthermore, the different methodologic approaches used in the studies have made it difficult to derive generic conclusions. We provide here a quantitative summary of the short-term effects of photochemical air pollutants on mortality in seven Spanish cities involved in the EMECAM project, using generalized additive models from analyses of single and multiple pollutants. Nitrogen dioxide and ozone data were provided by seven EMECAM cities (Barcelona, Gijón, Huelva, Madrid, Oviedo, Seville, and Valencia). Mortality indicators included daily total mortality from all causes excluding external causes, daily cardiovascular mortality, and daily respiratory mortality. Individual estimates, obtained from city-specific generalized additive Poisson autoregressive models, were combined by means of fixed effects models and, if significant heterogeneity among local estimates was found, also by random effects models. Significant positive associations were found between daily mortality (all causes and cardiovascular) and NO2, once the rest of the air pollutants were taken into account. A 10 μg/m3 increase in the 24-hr average 1-day NO2 level was associated with an increase in the daily number of deaths of 0.43% [95% confidence interval (CI), –0.003–0.86%] for all causes excluding external. In the case of significant relationships, relative risks for cause-specific mortality were nearly twice as large as those for total mortality for all the photochemical pollutants. Ozone was independently related only to cardiovascular daily mortality. No independent statistically significant relationship between photochemical air pollutants and respiratory mortality was found.
The results of this study suggest that, given the present levels of photochemical pollutants, people living in Spanish cities are exposed to health risks derived from air pollution.
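Combining city-specific estimates by a fixed effects model is commonly done with inverse-variance weights, and a minimal sketch of that calculation follows. The abstract does not say which heterogeneity test was used, so Cochran's Q is included here only as one common choice; the function name and the three estimates are invented and are not results from the study.

```python
import math

def fixed_effects_pool(estimates, std_errors):
    """Inverse-variance fixed effects pooling.

    Returns the pooled estimate, its standard error and Cochran's Q,
    a heterogeneity statistic that is compared with a chi-square
    distribution on len(estimates) - 1 degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q

# Invented local estimates (percent change in deaths) and standard errors.
local_estimates = [0.4, 0.6, 0.5]
local_std_errors = [0.2, 0.2, 0.1]
pooled, pooled_se, q = fixed_effects_pool(local_estimates, local_std_errors)
```

Estimates with smaller standard errors receive more weight, and a large Q relative to its degrees of freedom is the usual signal for also fitting a random effects model.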

Relevance:

80.00%

Publisher:

Abstract:

The aim of this thesis is to narrow the gap between two different control techniques: continuous control and discrete event system (DES) control. This gap can be reduced by studying Hybrid systems and by interpreting the majority of large-scale systems as Hybrid systems. In particular, when looking closely at a process, it is often possible to identify interaction between discrete and continuous signals. Hybrid systems are systems that have both continuous and discrete signals. Continuous signals are generally assumed to be continuous and differentiable in time, whereas discrete signals are neither continuous nor differentiable in time because of their abrupt changes. Continuous signals often represent measurements of natural physical magnitudes such as temperature or pressure. Discrete signals are normally artificial signals operated by human artefacts, such as current, voltage or light. Typical processes modelled as Hybrid systems are production systems, chemical processes, or continuous production in which time and continuous measurements interact with the transport and stock inventory system. Complex systems such as manufacturing lines are hybrid in a global sense: they can be decomposed into several subsystems and their links. Another motivation for the study of Hybrid systems is the set of tools developed in other research domains. These tools use temporal logic to analyse several properties of Hybrid system models and to design systems and controllers that satisfy physical or imposed restrictions. This thesis focuses on particular types of systems with discrete and continuous signals in interaction. These can model hard non-linearities such as hysteresis, jumps in the state and limit cycles, together with their possible non-deterministic future behaviour, expressed by an interpretable model description.
The Hybrid systems treated in this work are systems with several discrete states, always fewer than thirty (otherwise the problem can become NP-hard), and continuous dynamics evolving according to an expression with Ki ∈ R^n constant vectors or matrices for the components of the vector X. In several states, some of the Ki can be zero. In this formulation, the mathematics can express a linear time-invariant system. By using this expression for a local part, the combination of several local linear models can represent non-linear systems, and through the interaction with the discrete events of the system the model can compose non-linear Hybrid systems. Multistage processes with high continuous dynamics are especially well represented by the proposed methodology. State vectors with more than two components, as in third-order or higher models, are well approximated by the proposed approximation. Flexible belt transmissions, chemical reactions with an initial start-up and mobile robots with significant friction are examples of physical systems that benefit from the accuracy of the proposed methodology. The motivation of this thesis is to obtain a solution that can control and drive Hybrid systems from the origin or starting point to the goal. How to obtain this solution, and which solution is best in terms of a cost function subject to the physical restrictions and control actions, is analysed. Hybrid systems that have several possible states, different ways to drive the system to the goal and different continuous control signals are the problems that motivate this research. The requirements of the system on which we work are a model that can represent the behaviour of non-linear systems and that makes it possible to predict the possible future behaviour of the model, in order to apply a supervisor that decides the optimal and safe action to drive the system toward the goal.
Specific problems that can be addressed by the use of this kind of hybrid model are: the unity of order; control of the system along a reachable path; control of the system along a safe path; optimisation of the cost function; and modularity of control. The proposed model solves the specified problems in the switching models problem, the initial condition calculus and the unity of the order models. Continuous and discrete phenomena are represented in Linear hybrid models, defined by an eight-tuple of parameters to model different types of hybrid phenomena. Applying a transformation to the state vector of an LTI system, we obtain from a two-dimensional state space (SS) a single parameter, alpha, which still retains the dynamical information. Combining this parameter with the system output, a complete description of the system is obtained in the form of a graph in polar representation. The Takagi-Sugeno type III fuzzy model includes a linear time-invariant (LTI) model for each local model; the fuzzification of the different LTI local models results in a non-linear time-invariant model. In our case the output and the alpha measure govern the membership function. Hybrid systems control is a huge task: the processes need to be guided from the Starting point to the desired End point, passing through different specific states and points along the trajectory. The system can be structured in different levels of abstraction, and the control of Hybrid systems in three layers, from planning the process to producing the actions: the planning, process and control layers. In this case the algorithms will be applied to robotics, a domain where improvements are well accepted; it is expected that simple repetitive processes will be found for which the extra effort in complexity can be compensated by some cost reductions. It may also be interesting to apply some control optimisation to processes such as fuel injection or DC-DC converters.
In order to apply the Ramadge-Wonham (RW) theory of discrete event systems to a Hybrid system, we must abstract the continuous signals and project the events generated by these signals, to obtain new sets of observable and controllable events. Ramadge & Wonham's theory, along with the TCT software, gives a Controllable Sublanguage of the legal language generated by a Discrete Event System (DES). Continuous abstraction transforms predicates over continuous variables into controllable or uncontrollable events, and modifies the sets of uncontrollable, controllable, observable and unobservable events. Continuous signals produce virtual events in the system when they cross the bound limits. If these events are deterministic, they can be projected. It is necessary to determine the controllability of each event in order to assign it to the corresponding set of controllable, uncontrollable, observable or unobservable events. Finding optimal trajectories that minimise some cost function is the goal of the modelling procedure. A mathematical model of the system allows the user to apply mathematical techniques to this expression. These possibilities are to minimise a specific cost function, to obtain optimal controllers and to approximate a specific trajectory. The combination of Dynamic Programming with Bellman's principle of optimality gives us the procedure to solve the minimum-time trajectory for Hybrid systems. The problem is harder when there is interaction between adjacent states. In Hybrid systems the problem is to determine the partial set points to be applied to the local models. An optimal controller can be implemented in each local model in order to ensure the minimisation of the local costs. The solution of this problem must give us the trajectory for the system to follow, a trajectory marked by a set of set points that the system is forced to pass through. Several ways are possible to drive the system from the Starting point Xi to the End point Xf.
Different ways are of interest in terms of dynamics, minimum number of states, approximation at set points, etc. These ways need to be safe, viable and reachable (RchW), and only one of them must be applied, normally the best one, which minimises the proposed cost function. A Reachable Way, that is, a controllable and safe way, will be evaluated in order to determine which one minimises the cost function. The contribution of this work is a complete framework for working with the majority of Hybrid systems: the procedures to model, control and supervise are defined and explained, and their use is demonstrated. The procedure to model the systems to be analysed for automatic verification is also explained. Great improvements were obtained by using this methodology in comparison with other piecewise linear approximations. It is demonstrated that in particular cases this methodology can provide the best approximation. The most important contribution of this work is the Alpha approximation for non-linear systems with high dynamics. Although this kind of process is not typical, in this case the Alpha approximation is the best linear approximation to use, and it gives a compact representation.
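The role of Dynamic Programming and Bellman's principle of optimality in finding a minimum-time way between discrete states can be sketched on a toy graph. This is a generic illustration, not the thesis' algorithm: the states, the transition times and the function names are hypothetical, and the continuous dynamics inside each state are reduced to a fixed transition time.

```python
import math

def minimum_time(graph, goal):
    """Bellman recursion: cost[s] = min over successors of (time + cost[next]).

    graph maps each state to a dict {next_state: transition_time}, with
    positive times; every state, including the goal, must be a key.
    """
    cost = {state: math.inf for state in graph}
    cost[goal] = 0.0
    policy = {}
    for _ in range(len(graph) - 1):          # enough passes for any simple path
        changed = False
        for state, edges in graph.items():
            for nxt, time in edges.items():
                if time + cost[nxt] < cost[state]:
                    cost[state] = time + cost[nxt]
                    policy[state] = nxt
                    changed = True
        if not changed:
            break
    return cost, policy

def trajectory(policy, start, goal):
    """Follow the optimal successor of each state from start to goal."""
    path = [start]
    while path[-1] != goal:
        path.append(policy[path[-1]])
    return path

# Hypothetical discrete states with transition times between them.
graph = {
    "start": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "goal": 6.0},
    "b": {"goal": 2.0},
    "goal": {},
}
cost, policy = minimum_time(graph, "goal")
best_way = trajectory(policy, "start", "goal")
```

Among the several ways from the starting state to the goal, the recursion keeps only the one with the lowest accumulated time, which mirrors the selection of a single best way that minimises the cost function.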