923 results for Generalised Linear Models
Abstract:
A joint distribution of two discrete random variables with finite support can be displayed as a two-way table of probabilities adding to one. Assume that this table has n rows and m columns and that all probabilities are non-null. Such a table can be seen as an element of the simplex of n·m parts. In this context, the marginals are identified as compositional amalgams and the conditionals (rows or columns) as subcompositions. Simplicial perturbation also appears as Bayes' theorem. Moreover, the Euclidean elements of the Aitchison geometry of the simplex can be translated into the table of probabilities: subspaces, orthogonal projections, distances. Two important questions are addressed: (a) given a table of probabilities, what is the nearest independent table to the initial one? (b) what is the largest orthogonal projection of a row onto a column, or, equivalently, what is the information in a row explained by a column, thus explaining the interaction? To answer these questions, three orthogonal decompositions are presented: (1) by columns and a row-wise geometric marginal, (2) by rows and a column-wise geometric marginal, (3) by independent two-way tables and fully dependent tables representing row-column interaction. An important result is that the nearest independent table is the product of the two (row- and column-wise) geometric marginal tables. A corollary is that, in an independent table, the geometric marginals conform with the traditional (arithmetic) marginals. These decompositions can be compared with standard log-linear models.
Key words: balance, compositional data, simplex, Aitchison geometry, composition, orthonormal basis, arithmetic and geometric marginals, amalgam, dependence measure, contingency table
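As a hedged numerical illustration of the closing result (not code from the paper), the sketch below builds a small hypothetical probability table, forms its row-wise and column-wise geometric marginals, and takes their closed product as the nearest independent table; the closed ratio of the table to that product then carries the row-column interaction.

```python
import numpy as np

def closure(x):
    """Rescale a positive array so its entries sum to one."""
    return x / x.sum()

# Hypothetical 2x3 table of strictly positive probabilities summing to one.
P = closure(np.array([[0.10, 0.20, 0.05],
                      [0.15, 0.30, 0.20]]))

# Geometric marginals: geometric means across columns (row-wise marginal)
# and across rows (column-wise marginal), each closed to sum to one.
row_geo = closure(np.exp(np.log(P).mean(axis=1)))   # one entry per row
col_geo = closure(np.exp(np.log(P).mean(axis=0)))   # one entry per column

# Per the abstract, the independent table nearest to P (in Aitchison distance)
# is the closed product of the two geometric marginal tables.
P_indep = closure(np.outer(row_geo, col_geo))

# The closed ratio P / P_indep carries the row-column interaction.
interaction = closure(P / P_indep)
print(P_indep, interaction, sep="\n")
```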
Abstract:
The theory of compositional data analysis is often focused on the composition only. However, in practical applications we often treat a composition together with covariables on some other scale. This contribution systematically gathers and develops statistical tools for this situation. For instance, for the graphical display of the dependence of a composition on a categorical variable, a colored set of ternary diagrams might be a good idea for a first look at the data, but it will quickly hide important aspects if the composition has many parts or takes extreme values. On the other hand, colored scatterplots of ilr components may not be very instructive for the analyst if the conventional, black-box ilr is used. Thinking in terms of the Euclidean structure of the simplex, we suggest setting up appropriate projections, which on the one hand show the compositional geometry and on the other hand are still comprehensible to a non-expert analyst and readable for all locations and scales of the data. This is done, e.g., by defining special balance displays with carefully selected axes. Following this idea, we need to ask systematically how to display, explore, describe, and test the relation to complementary or explanatory data of categorical, real, ratio or again compositional scales. This contribution shows that it is sufficient to use some basic concepts and very few advanced tools from multivariate statistics (principal covariances, multivariate linear models, trellis or parallel plots, etc.) to build appropriate procedures for all these combinations of scales. This has some fundamental implications for their software implementation and for how they might be taught to analysts who are not already experts in multivariate analysis.
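As a hedged sketch of the kind of carefully chosen balance such displays rely on (a hypothetical grouping of parts, not the authors' own axes), the following computes a single ilr balance contrasting one group of parts against another:

```python
import numpy as np

def balance(comp, num_idx, den_idx):
    """Isometric log-ratio balance of the parts in num_idx against those in den_idx:
    sqrt(r*s/(r+s)) * ln(gmean(numerator parts) / gmean(denominator parts))."""
    comp = np.asarray(comp, dtype=float)
    r, s = len(num_idx), len(den_idx)
    g_num = np.exp(np.log(comp[num_idx]).mean())
    g_den = np.exp(np.log(comp[den_idx]).mean())
    return np.sqrt(r * s / (r + s)) * np.log(g_num / g_den)

# Hypothetical 4-part composition; contrast parts {0, 1} against parts {2, 3}.
x = np.array([0.4, 0.3, 0.2, 0.1])
print(balance(x, [0, 1], [2, 3]))
```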
Abstract:
Piecewise linear systems arise as mathematical models in many practical applications, often from the linearization of nonlinear systems. There are two main approaches to dealing with these systems, according to their continuous-time or discrete-time aspects. We propose an approach based on state transformation, more particularly on the partition of the phase portrait into different regions, where each subregion is modeled as a two-dimensional linear time-invariant system. Then the Takagi-Sugeno model, which is a combination of local models, is calculated. The simulation results show that the Alpha partition is well suited to dealing with such systems.
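A hedged sketch of the underlying idea (hypothetical local models and membership functions, not the paper's Alpha partition): a Takagi-Sugeno model blends local linear time-invariant dynamics with state-dependent weights.

```python
import numpy as np

# Two hypothetical local LTI models dx/dt = A_i x for two regions of the phase plane.
A1 = np.array([[0.0, 1.0], [-1.0, -0.5]])
A2 = np.array([[0.0, 1.0], [-4.0, -0.2]])

def memberships(x):
    """Simple sigmoidal weights on the first state component (illustrative only)."""
    w1 = 1.0 / (1.0 + np.exp(5.0 * x[0]))   # active for x[0] << 0
    w2 = 1.0 - w1                           # active for x[0] >> 0
    return np.array([w1, w2])

def ts_dynamics(x):
    """Takagi-Sugeno output: membership-weighted combination of the local models."""
    w = memberships(x)
    return w[0] * (A1 @ x) + w[1] * (A2 @ x)

# Euler integration of the blended (nonlinear) dynamics from a sample initial state.
x = np.array([1.0, 0.0])
for _ in range(1000):
    x = x + 0.01 * ts_dynamics(x)
print(x)
```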
Abstract:
This paper studies the mathematics and language results of 32,000 students on the 2008 Saber 11 test in the city of Bogotá. The analysis recognizes that individuals are nested within neighbourhoods and schools, but not all individuals from the same neighbourhood attend the same school, and vice versa. In order to model this data structure, several econometric models are used, including a cross-classified hierarchical multilevel regression. Our central objective is to identify to what extent, and which, neighbourhood and school conditions correlate with the educational outcomes of the target population, and which characteristics of neighbourhoods and schools are most strongly associated with test results. We use data from the Saber 11 test, the C600 school census, the 2005 population census, and the Bogotá metropolitan police. Our estimates show that both the neighbourhood and the school are correlated with test results, but the school effect appears to be much stronger than the neighbourhood effect. The school characteristics most strongly associated with test results are teacher education, the length of the school day, the tuition fee, and the socio-economic context of the school. The neighbourhood characteristics most strongly associated with test results are the presence of university students in the UPZ, a cluster of high education levels, and the level of crime in the neighbourhood, which is negatively correlated. These results were obtained controlling for family and individual characteristics.
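A minimal sketch of the cross-classified structure described above (hypothetical column and file names; one way to express crossed random effects for school and neighbourhood in statsmodels is through variance components within a single all-encompassing group):

```python
import pandas as pd
import statsmodels.api as sm

# Assumed layout: one row per student with hypothetical columns
# 'score', 'ses' (socio-economic controls), 'school', 'neighbourhood'.
df = pd.read_csv("saber11_bogota.csv")   # hypothetical file name
df["all"] = 1                            # single top-level group

# Crossed random intercepts for school and neighbourhood, expressed as
# variance components within the all-encompassing group.
model = sm.MixedLM.from_formula(
    "score ~ ses",
    groups="all",
    vc_formula={"school": "0 + C(school)",
                "neighbourhood": "0 + C(neighbourhood)"},
    data=df,
)
result = model.fit()
print(result.summary())
```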
Abstract:
Introduction: Delirium is a disorder of consciousness of acute onset associated with confusion or cognitive dysfunction; it can occur in up to 42% of patients, and up to 80% of cases occur in the ICU. Delirium increases hospital stay, time on mechanical ventilation, and morbidity and mortality. The aim was to evaluate the period prevalence of delirium in adults admitted to the ICU of a fourth-level hospital during 2012 and the factors associated with its development. Methods: A cross-sectional study with an analytical component was carried out, including patients hospitalized in the medical and surgical ICUs. The CAM-ICU scale and the Mini-Mental State Examination were applied to assess mental status. Significant associations were adjusted using multivariate analysis. Results: 110 patients were included; the average stay was 5 days; the period prevalence of delirium was 19.9%, and the median age was 64.5 years. A statistically significant association was found between delirium and baseline cognitive impairment, depression, administration of anticholinergics, and sepsis (p < 0.05). Discussion: To date this is the first study at the institution. The associations between delirium in the ICU and sepsis, use of anticholinergics, and baseline cognitive impairment are consistent and comparable with risk factors described in the international literature.
Abstract:
Even though antenatal care is universally regarded as important, the determinants of demand for antenatal care have not been widely studied. Evidence concerning which socioeconomic conditions influence whether a pregnant woman attends at least one antenatal consultation, and how these factors affect absences from antenatal consultations, is very limited. In order to generate this evidence, a two-stage analysis was performed with data from the Demographic and Health Survey carried out by Profamilia in Colombia during 2005. The first stage was a logit model showing the marginal effects on the probability of attending the first visit, and an ordinary least squares model was estimated for the second stage. It was found that mothers living in the Pacific region, as well as young mothers, seem to have a lower probability of attending the first visit, but these factors are not related to the number of absences from antenatal consultations once the first visit has taken place. The effect of health insurance was surprising because of the differing effects that the health insurers showed. Some family and personal conditions, such as willingness to have the last child and the number of previous children, proved to be important in determining demand. The mother's educational attainment proved to be important, whereas the father's educational achievement was not. This paper provides some elements for policy making aimed at inducing demand for antenatal care, as well as stimulating research on demand for specific health issues.
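A minimal sketch of such a two-stage specification (hypothetical variable and file names, statsmodels): a logit for attending the first visit, then ordinary least squares for the number of absences among those who attended.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per mother, with 'first_visit' (0/1),
# 'absences', and a few illustrative covariates.
df = pd.read_csv("dhs_2005.csv")   # hypothetical file name

# Stage 1: probability of attending at least one antenatal consultation.
logit = smf.logit("first_visit ~ age + education + region + insured", data=df).fit()
print(logit.get_margeff().summary())   # marginal effects, as in the abstract

# Stage 2: number of absences, conditional on having attended the first visit.
attenders = df[df["first_visit"] == 1]
ols = smf.ols("absences ~ age + education + region + insured", data=attenders).fit()
print(ols.summary())
```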
Abstract:
We study the role of natural resource windfalls in explaining the efficiency of public expenditures. Using a rich dataset of expenditures and public good provision for 1,836 municipalities in Peru for the period 2001-2010, we estimate a non-monotonic relationship between the efficiency of public good provision and the level of natural resource transfers. Local governments that were extremely favored by the boom of mineral prices were more efficient in using fiscal windfalls, whereas those that benefited from modest transfers were less efficient. These results can be explained by the increase in political competition associated with the boom. However, the fact that increases in efficiency were related to reductions in public good provision casts doubt on the beneficial effects of political competition in promoting efficiency.
Abstract:
Topological indices have been applied to build QSAR models for a set of 20 antimalarial cyclic peroxy ketals. In order to evaluate the reliability of the proposed linear models, leave-n-out and Internal Test Set (ITS) approaches have been considered. The proposed procedure resulted in a robust and consensus prediction equation, and it is shown here why it is superior to the standard cross-validation algorithms employed with multilinear regression models.
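A hedged sketch of the internal-test-set idea (synthetic stand-in data, scikit-learn): repeatedly hold out a few compounds at random, refit the multilinear model, and examine how stable the held-out R² is.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.metrics import r2_score

# X: hypothetical matrix of topological indices (20 compounds x 4 descriptors),
# y: hypothetical antimalarial activities; stand-ins for the paper's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.2, size=20)

# Repeated internal test sets: hold out five compounds at random many times and
# collect the external R^2 of the multilinear model on each held-out set.
scores = []
for train_idx, test_idx in ShuffleSplit(n_splits=200, test_size=5, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
print(np.mean(scores), np.std(scores))
```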
Abstract:
The aim of this thesis is to narrow the gap between two different control techniques: continuous control and discrete event (DES) control techniques. This gap can be reduced by the study of hybrid systems, and by interpreting the majority of large-scale systems as hybrid systems. In particular, when looking deeply into a process, it is often possible to identify interactions between discrete and continuous signals. Hybrid systems are systems that have both continuous and discrete signals. Continuous signals are generally assumed to be continuous and differentiable in time, whereas discrete signals are neither continuous nor differentiable in time because of their abrupt changes. Continuous signals often represent measurements of natural physical magnitudes such as temperature, pressure, etc. Discrete signals are normally artificial signals operated through human artefacts, such as current, voltage, light, etc. Typical processes modelled as hybrid systems are production systems, chemical processes, or continuous production in which time and continuous measures interact with the transport and stock inventory systems. Complex systems such as manufacturing lines are hybrid in a global sense; they can be decomposed into several subsystems and their links. Another motivation for the study of hybrid systems is the set of tools developed in other research domains. These tools benefit from the use of temporal logic for the analysis of several properties of hybrid system models, and use it to design systems and controllers which satisfy physical or imposed restrictions. This thesis focuses on particular types of systems with discrete and continuous signals in interaction, which can model hard non-linearities, such as hysteresis, jumps in the state, limit cycles, etc., and whose possible non-deterministic future behaviour is expressed by an interpretable model description. The hybrid systems treated in this work are systems with several discrete states, always fewer than thirty states (otherwise the problem can become NP-hard), and continuous dynamics evolving according to expressions of the form Ẋ = Ki·X, with Ki ∈ Rn constant vectors or matrices acting on the components of the state vector X; in several states the continuous evolution can have Ki = 0. In this formulation, the mathematics can express time-invariant linear systems. By using this expression for each local part, the combination of several local linear models makes it possible to represent non-linear systems, and through interaction with the discrete events of the system the model can compose non-linear hybrid systems. Multistage processes with high continuous dynamics are especially well represented by the proposed methodology. State vectors with more than two components, as in third-order models or higher, are well approximated by the proposed approach. Flexible belt transmissions, chemical reactions with an initial start-up phase, and mobile robots with significant friction are physical systems which profit from the benefits (accuracy) of the proposed methodology. The motivation of this thesis is to obtain a solution that can control and drive hybrid systems from the origin or starting point to the goal. How to obtain this solution, and which solution is best in terms of a cost function subject to the physical restrictions and control actions, is analysed. Hybrid systems that have several possible states, different ways to drive the system to the goal and different continuous control signals are the problems that motivate this research.
The requirements of the system on which we work are: a model that can represent the behaviour of non-linear systems and that makes possible the prediction of the model's future behaviour, in order to apply a supervisor which decides the optimal and safe action to drive the system towards the goal. Specific problems that can be addressed by the use of this kind of hybrid model are: the unity of order; controlling the system along a reachable path; controlling the system along a safe path; optimising the cost function; and modularity of control. The proposed model solves the specified problems of model switching, initial condition calculation and unity of model order. Continuous and discrete phenomena are represented in linear hybrid models, defined by an eight-tuple of parameters to model different types of hybrid phenomena. By applying a transformation to the state vector of an LTI system, we obtain from a two-dimensional state space a single parameter, alpha, which still maintains the dynamical information. Combining this parameter with the system output, a complete description of the system is obtained in the form of a graph in polar representation. A type-III Takagi-Sugeno fuzzy model, which includes a linear time-invariant (LTI) model for each local model, is used; the fuzzification of the different LTI local models results in a non-linear time-invariant model. In our case the output and the alpha measure govern the membership functions. Hybrid systems control is a huge task: processes need to be guided from the starting point to the desired end point, passing through different specific states and points along the trajectory. The system can be structured into different levels of abstraction, and the control of hybrid systems into three layers, from planning the process to producing the actions: the planning, process and control layers. In this case the algorithms will be applied to robotics, a domain where improvements are well accepted, and where we expect to find simple repetitive processes for which the extra effort in complexity can be compensated by some cost reductions. It may also be interesting to apply some control optimisation to processes such as fuel injection, DC-DC converters, etc. In order to apply the Ramadge-Wonham (RW) theory of discrete event systems to a hybrid system, we must abstract the continuous signals and project the events generated by these signals, in order to obtain new sets of observable and controllable events. Ramadge and Wonham's theory, along with the TCT software, gives a controllable sublanguage of the legal language generated for a discrete event system (DES). Continuous abstraction transforms predicates over continuous variables into controllable or uncontrollable events, and modifies the sets of controllable, uncontrollable, observable and unobservable events. Continuous signals produce virtual events in the system when they cross the bound limits. If these events are deterministic, they can be projected. It is necessary to determine the controllability of each event in order to assign it to the corresponding set of controllable, uncontrollable, observable or unobservable events. Finding optimal trajectories that minimise some cost function is the goal of the modelling procedure. A mathematical model of the system allows the user to apply mathematical techniques to this expression: to minimise a specific cost function, to obtain optimal controllers, and to approximate a specific trajectory.
The combination of dynamic programming with Bellman's principle of optimality gives us the procedure to solve the minimum-time trajectory for hybrid systems. The problem is greater when there is interaction between adjacent states. In hybrid systems the problem is to determine the partial set points to be applied to the local models. An optimal controller can be implemented in each local model in order to ensure the minimisation of the local costs. The solution of this problem needs to give us the trajectory the system must follow, a trajectory marked by a set of set points which the system is forced to pass through. Several ways are possible to drive the system from the starting point Xi to the end point Xf. Different ways are interesting in different respects: dynamics, minimum number of states, approximation to set points, etc. These ways need to be safe, viable and reachable (RchW), and only one of them must be applied, normally the best one, which minimises the proposed cost function. A reachable way, meaning one that is controllable and safe, will be evaluated in order to determine which one minimises the cost function. The contribution of this work is a complete framework for working with the majority of hybrid systems: the procedures to model, control and supervise are defined and explained, and their use is demonstrated. Also explained is the procedure for modelling the systems to be analysed for automatic verification. Great improvements were obtained by using this methodology in comparison with other piecewise linear approximations, and it is demonstrated that in particular cases this methodology can provide the best approximation. The most important contribution of this work is the Alpha approximation for non-linear systems with high dynamics. While this kind of process is not typical, in such cases the Alpha approximation is the best linear approximation to use and gives a compact representation.
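As a hedged illustration of the dynamic-programming step (a toy discrete-state abstraction with hypothetical states and transit times, not the thesis's models), a Bellman backup yields the minimum time from each discrete state to the goal:

```python
import math

# Hypothetical discrete-state abstraction of a hybrid system: nodes are discrete
# states, edge weights are the (continuous-dynamics) transit times between them.
transitions = {
    "start": {"heat": 2.0, "fill": 3.0},
    "heat":  {"mix": 4.0},
    "fill":  {"mix": 1.5},
    "mix":   {"goal": 2.5},
    "goal":  {},
}

def min_time_to_goal(transitions, goal="goal", sweeps=50):
    """Bellman backup: V(s) = min over successors s' of cost(s, s') + V(s')."""
    V = {s: (0.0 if s == goal else math.inf) for s in transitions}
    for _ in range(sweeps):  # enough sweeps for this small graph
        for s, succ in transitions.items():
            if s != goal and succ:
                V[s] = min(cost + V[t] for t, cost in succ.items())
    return V

print(min_time_to_goal(transitions))   # minimum time from each state to the goal
```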
Abstract:
Satellite-based rainfall monitoring is widely used for climatological studies because of its full global coverage, but it is also of great importance for operational purposes, especially in areas such as Africa where there is a lack of ground-based rainfall data. Satellite rainfall estimates have enormous potential benefits as input to hydrological and agricultural models because of their real-time availability, low cost and full spatial coverage. One issue that needs to be addressed is the uncertainty in these estimates. This is particularly important in assessing the likely errors in the output from non-linear models (rainfall-runoff or crop yield) which use the rainfall estimates, aggregated over an area, as input. Correct assessment of the uncertainty in the rainfall is non-trivial, as it must take account of:
• the difference in spatial support between the satellite information and the independent data used for calibration
• uncertainties in the independent calibration data
• the non-Gaussian distribution of rainfall amount
• the spatial intermittency of rainfall
• the spatial correlation of the rainfall field
This paper describes a method for estimating the uncertainty in satellite-based rainfall values that takes account of these factors. The method involves, firstly, a stochastic calibration which completely describes the probability of rainfall occurrence and the pdf of rainfall amount for a given satellite value, and, secondly, the generation of an ensemble of rainfall fields based on the stochastic calibration but with the correct spatial correlation structure within each ensemble member. This is achieved by the use of geostatistical sequential simulation. The ensemble generated in this way may be used to estimate uncertainty at larger spatial scales. A case study of daily rainfall monitoring in The Gambia, West Africa, for the purpose of crop yield forecasting is presented to illustrate the method.
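A hedged, partial sketch of the pointwise part of such a scheme (illustrative calibration constants, not the paper's): for each satellite value, draw occurrence from a calibrated probability and, if wet, an amount from a calibrated gamma distribution. The spatial-correlation step that the paper adds via geostatistical sequential simulation is deliberately omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_draw(sat_value):
    """Draw one rainfall value given a satellite estimate, under a hypothetical
    stochastic calibration: a logistic occurrence probability and a gamma
    amount distribution whose mean scales with the satellite value."""
    p_wet = 1.0 / (1.0 + np.exp(-(0.4 * sat_value - 1.0)))   # illustrative calibration
    if rng.random() >= p_wet:
        return 0.0                                           # intermittency: dry pixel
    shape = 1.5                                              # illustrative gamma shape
    scale = max(sat_value, 0.1) / shape                      # mean roughly the satellite value
    return rng.gamma(shape, scale)

# One ensemble member over a hypothetical 10x10 grid of satellite values.
# NOTE: drawing each pixel independently ignores the spatial correlation that
# the paper imposes; this only illustrates the pointwise calibration step.
sat_field = rng.uniform(0.0, 10.0, size=(10, 10))
member = np.vectorize(conditional_draw)(sat_field)
print(member.mean())
```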
Abstract:
A physically motivated statistical model is used to diagnose variability and trends in wintertime (October-March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity qs has a generally negative effect on global precipitation occurrence and on the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for thresholds that are not too small, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.
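A hedged sketch of that two-component GLM mixture (hypothetical column and file names, statsmodels): a binomial GLM with a logit link for occurrence and a Gamma GLM with a log link for the wet amount, so that the predictors act multiplicatively on the amount.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical pentad data: 'precip' (mm), 'qs' (saturation specific humidity),
# 'nao' (North Atlantic Oscillation index), 'pgrad' (pressure gradient).
df = pd.read_csv("gpcp_pentads.csv")            # hypothetical file name
df["wet"] = (df["precip"] > 1.0).astype(int)    # occurrence above a chosen threshold

# Occurrence: binomial GLM with logit link.
occurrence = smf.glm("wet ~ qs + nao + pgrad", data=df,
                     family=sm.families.Binomial()).fit()

# Amount on wet pentads: Gamma GLM with log link (multiplicative predictors).
amount = smf.glm("precip ~ qs + nao + pgrad", data=df[df["wet"] == 1],
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(occurrence.summary(), amount.summary(), sep="\n")
```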
Abstract:
We develop the linearization of a semi-implicit semi-Lagrangian model of the one-dimensional shallow-water equations using two different methods. The usual tangent linear model, formed by linearizing the discrete nonlinear model, is compared with a model formed by first linearizing the continuous nonlinear equations and then discretizing. Both models are shown to perform equally well for finite perturbations. However, the asymptotic behaviour of the two models differs as the perturbation size is reduced. This leads to difficulties in showing that the models are correctly coded using the standard tests. To overcome this difficulty we propose a new method for testing linear models, which we demonstrate both theoretically and numerically. © Crown copyright, 2003. Royal Meteorological Society
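One common form of the standard correctness test the abstract alludes to can be sketched as follows (a toy stand-in model, not the shallow-water scheme): the ratio of the nonlinear perturbation growth to the tangent-linear prediction should approach one as the perturbation size is reduced, until round-off error dominates.

```python
import numpy as np

def nonlinear_model(x):
    """Stand-in nonlinear model (not the shallow-water scheme): one step of a toy map."""
    return x + 0.1 * np.sin(x)

def tangent_linear(x, dx):
    """Hand-coded tangent linear of the stand-in model, linearized about x."""
    return dx + 0.1 * np.cos(x) * dx

# Standard check: ||M(x + a*dx) - M(x)|| / ||a * L(dx)|| should approach 1 as a -> 0,
# until round-off error starts to dominate for very small a.
x = np.linspace(0.0, 1.0, 5)
dx = np.ones_like(x)
for a in [1e-1, 1e-3, 1e-5, 1e-7]:
    ratio = np.linalg.norm(nonlinear_model(x + a * dx) - nonlinear_model(x)) / \
            np.linalg.norm(a * tangent_linear(x, dx))
    print(a, ratio)
```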
Abstract:
An extensive statistical ‘downscaling’ study is done to relate large-scale climate information from a general circulation model (GCM) to local-scale river flows in SW France for 51 gauging stations ranging from nival (snow-dominated) to pluvial (rainfall-dominated) river systems. This study helps to select the appropriate statistical method at a given spatial and temporal scale to downscale hydrology for future climate change impact assessment of hydrological resources. The four proposed statistical downscaling models use large-scale predictors (derived from climate model outputs or reanalysis data) that characterize precipitation and evaporation processes in the hydrological cycle to estimate summary flow statistics. The four statistical models used are generalized linear (GLM) and additive (GAM) models, aggregated boosted trees (ABT) and multi-layer perceptron neural networks (ANN). These four models were each applied at two different spatial scales, namely that of a single flow-gauging station (local downscaling) and that of a group of flow-gauging stations having the same hydrological behaviour (regional downscaling). For each statistical model and each spatial resolution, three temporal resolutions were considered, namely daily mean flows, summary statistics of fortnightly flows, and a daily ‘integrated approach’. The results show that flow sensitivity to atmospheric factors is significantly different between nival and pluvial hydrological systems, which are mainly influenced, respectively, by shortwave solar radiation and atmospheric temperature. The non-linear models (i.e. GAM, ABT and ANN) performed better than the linear GLM when simulating fortnightly flow percentiles. The aggregated boosted trees method showed higher and less variable R2 values when downscaling the hydrological variability in both nival and pluvial regimes. Based on the GCM cnrm-cm3 and scenarios A2 and A1B, future relative changes in fortnightly median flows were projected using the regional downscaling approach. The results suggest a global decrease of flow in both pluvial and nival regimes, especially in spring, summer and autumn, whatever the scenario considered. The discussion considers the performance of each statistical method for downscaling flow at different spatial and temporal scales as well as the relationship between atmospheric processes and flow variability.
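A hedged, much-simplified sketch of the kind of model comparison described (synthetic stand-in data, scikit-learn; a plain linear regression stands in for the GLM and gradient boosting for the aggregated boosted trees):

```python
import numpy as np
from sklearn.linear_model import LinearRegression       # stands in for the GLM
from sklearn.ensemble import GradientBoostingRegressor  # stands in for boosted trees
from sklearn.model_selection import cross_val_score

# Hypothetical large-scale predictors (e.g. temperature, radiation, humidity
# averaged per fortnight) and a fortnightly flow percentile as the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = np.exp(0.5 * X[:, 0]) + 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

for name, model in [("GLM-like linear model", LinearRegression()),
                    ("boosted trees", GradientBoostingRegressor(random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, round(r2.mean(), 3), round(r2.std(), 3))
```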
Abstract:
Previous attempts to apply statistical models, which correlate nutrient intake with methane production, have been of limited value where predictions are obtained for nutrient intakes and diet types outside those used in model construction. Dynamic mechanistic models have proved more suitable for extrapolation, but they remain computationally expensive and are not applied easily in practical situations. The first objective of this research focused on employing conventional techniques to generate statistical models of methane production appropriate to United Kingdom dairy systems. The second objective was to evaluate these models and a model published previously using both United Kingdom and North American data sets. Thirdly, nonlinear models were considered as alternatives to the conventional linear regressions. The United Kingdom calorimetry data used to construct the linear models were also used to develop the three nonlinear alternatives, which were all of modified Mitscherlich (monomolecular) form. Of the linear models tested, an equation from the literature proved most reliable across the full range of evaluation data (root mean square prediction error = 21.3%). However, the Mitscherlich models demonstrated the greatest degree of adaptability across diet types and intake levels. The most successful model for simulating the independent data was a modified Mitscherlich equation with the steepness parameter set to represent the dietary starch-to-ADF ratio (root mean square prediction error = 20.6%). However, when such data were unavailable, simpler Mitscherlich forms relating dry matter or metabolizable energy intake to methane production remained better alternatives relative to their linear counterparts.
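A hedged sketch of fitting one common monomolecular (Mitscherlich) parameterisation to intake-methane data (hypothetical numbers, SciPy); in the abstract's best model, the steepness parameter is additionally made a function of the dietary starch-to-ADF ratio:

```python
import numpy as np
from scipy.optimize import curve_fit

def mitscherlich(mei, a, b, c):
    """Monomolecular (Mitscherlich) form: methane rises towards an asymptote a
    as metabolizable energy intake (mei) increases, with steepness c."""
    return a - (a + b) * np.exp(-c * mei)

# Hypothetical calorimetry-style data: metabolizable energy intake (MJ/d)
# versus methane output (MJ/d); stand-ins for the UK data set.
mei = np.array([100, 150, 200, 250, 300, 350], dtype=float)
ch4 = np.array([10.5, 14.0, 16.5, 18.0, 19.0, 19.6])

params, _ = curve_fit(mitscherlich, mei, ch4, p0=[22.0, -5.0, 0.005])
print(params)   # fitted asymptote, intercept term and steepness

# The steepness c could then be replaced by a diet-dependent term, e.g. a linear
# function of the starch-to-ADF ratio, within the same monomolecular equation.
```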
Abstract:
The aims of this study were to explore the environmental factors that determine the distribution of plant communities in temporary rock pools and to provide a quantitative analysis of vegetation-environment relationships for five study sites on the island of Gavdos, southwest of Crete, Greece. Data from 99 rock pools were collected and analysed using Two-Way Indicator Species Analysis (TWINSPAN), Detrended Correspondence Analysis (DCA) and Canonical Correspondence Analysis (CCA) to identify the principal communities and the environmental gradients that are linked to community distribution. A total of 46 species belonging to 21 families were recorded within the study area. The dominant families were Labiatae, Gramineae and Compositae, while therophytes and chamaephytes were the most frequent life forms. The samples were classified into six community types using TWINSPAN, a classification that was also corroborated by the CCA. The principal gradients for vegetation distribution, identified by CCA, were associated with water storage and water retention ability, as expressed by pool perimeter and water depth. Generalised Additive Models (GAMs) were employed to identify the responses of four dominant rock pool species to water depth. The resulting species response curves showed niche differentiation in the cases of Callitriche pulchra and Tillaea vaillantii and revealed competition between Zannichellia pedunculata and Chara vulgaris. The use of classification in combination with ordination techniques resulted in a good discrimination between plant communities. Generalised Additive Models are a powerful tool for investigating species response curves to environmental gradients. The methodology adopted can be employed to improve baseline information on plant community ecology and distribution in Mediterranean ephemeral pools.
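A hedged sketch of a species response curve along a water-depth gradient (synthetic presence/absence data; a spline-based binomial GLM in statsmodels stands in for the GAMs used in the study):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical presence/absence records of one rock-pool species against water
# depth (cm); stand-ins for the Gavdos survey data.
rng = np.random.default_rng(2)
depth = rng.uniform(0, 40, size=200)
linpred = 0.4 * (depth - 10) - 0.02 * (depth - 10) ** 2   # unimodal true response
presence = (rng.random(200) < 1 / (1 + np.exp(-linpred))).astype(int)
df = pd.DataFrame({"depth": depth, "presence": presence})

# A smooth response curve via a spline-based binomial GLM (a simple stand-in
# for a Generalised Additive Model).
fit = smf.glm("presence ~ bs(depth, df=4)", data=df, family=sm.families.Binomial()).fit()

grid = pd.DataFrame({"depth": np.linspace(0, 40, 81)})
response = fit.predict(grid)   # fitted probability of occurrence along the gradient
print(response.round(2).values)
```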