962 resultados para variable data printing
Resumo:
Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed form solution and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects model in terms of accuracy, efficiency and computing time. The computational time will have significant implications as to the preferred approach by researchers. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modeling employment status of women using a GLMM, over three waves of the HILDA survey.
Resumo:
Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.
Resumo:
There is currently considerable interest in developing general non-linear density models based on latent, or hidden, variables. Such models have the ability to discover the presence of a relatively small number of underlying `causes' which, acting in combination, give rise to the apparent complexity of the observed data set. Unfortunately, to train such models generally requires large computational effort. In this paper we introduce a novel latent variable algorithm which retains the general non-linear capabilities of previous models but which uses a training procedure based on the EM algorithm. We demonstrate the performance of the model on a toy problem and on data from flow diagnostics for a multi-phase oil pipeline.
Resumo:
Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.
Resumo:
Obtaining wind vectors over the ocean is important for weather forecasting and ocean modelling. Several satellite systems used operationally by meteorological agencies utilise scatterometers to infer wind vectors over the oceans. In this paper we present the results of using novel neural network based techniques to estimate wind vectors from such data. The problem is partitioned into estimating wind speed and wind direction. Wind speed is modelled using a multi-layer perceptron (MLP) and a sum of squares error function. Wind direction is a periodic variable and a multi-valued function for a given set of inputs; a conventional MLP fails at this task, and so we model the full periodic probability density of direction conditioned on the satellite derived inputs using a Mixture Density Network (MDN) with periodic kernel functions. A committee of the resulting MDNs is shown to improve the results.
Resumo:
Purpose: To assess repeatability and reproducibility, to determine normative data, and to investigate the effect of age-related macular disease, compared with normals, on photostress recovery time measured using the Eger Macular Stressometer (EMS). Method: The study population comprised 49 healthy eyes of 49 participants. Four EMS measurements were taken in two sessions separated by 1 h by two practitioners, with reversal of order in the second session. EMS readings were also taken from 17 age-related maculopathy (ARM), and 12 age-related macular degeneration (AMD), affected eyes. Results: EMS readings are repeatable to within ± 7 s. There is a statistically significant difference between controls and ARM affected eyes (t = 2.169, p = 0.045), and AMD affected eyes (t = 2.817, p = 0.016). The EMS is highly specific, and demonstrates sensitivity of 29% for ARM, and 50% for AMD. Conclusions: The EMS may be a useful screening test for ARM, however, direct illumination of the macula of greater intensity and longer duration may yield less variable results. © 2004 The College of Optometrists.
Resumo:
This paper introduces a new technique in the investigation of limited-dependent variable models. This paper illustrates that variable precision rough set theory (VPRS), allied with the use of a modern method of classification, or discretisation of data, can out-perform the more standard approaches that are employed in economics, such as a probit model. These approaches and certain inductive decision tree methods are compared (through a Monte Carlo simulation approach) in the analysis of the decisions reached by the UK Monopolies and Mergers Committee. We show that, particularly in small samples, the VPRS model can improve on more traditional models, both in-sample, and particularly in out-of-sample prediction. A similar improvement in out-of-sample prediction over the decision tree methods is also shown.
Resumo:
Data envelopment analysis (DEA) is defined based on observed units and by finding the distance of each unit to the border of estimated production possibility set (PPS). The convexity is one of the underlying assumptions of the PPS. This paper shows some difficulties of using standard DEA models in the presence of input-ratios and/or output-ratios. The paper defines a new convexity assumption when data includes a ratio variable. Then it proposes a series of modified DEA models which are capable to rectify this problem.
Resumo:
The book aims to introduce the reader to DEA in the most accessible manner possible. It is specifically aimed at those who have had no prior exposure to DEA and wish to learn its essentials, how it works, its key uses, and the mechanics of using it. The latter will include using DEA software. Students on degree or training courses will find the book especially helpful. The same is true of practitioners engaging in comparative efficiency assessments and performance management within their organisation. Examples are used throughout the book to help the reader consolidate the concepts covered. Table of content: List of Tables. List of Figures. Preface. Abbreviations. 1. Introduction to Performance Measurement. 2. Definitions of Efficiency and Related Measures. 3. Data Envelopment Analysis Under Constant Returns to Scale: Basic Principles. 4. Data Envelopment Analysis under Constant Returns to Scale: General Models. 5. Using Data Envelopment Analysis in Practice. 6. Data Envelopment Analysis under Variable Returns to Scale. 7. Assessing Policy Effectiveness and Productivity Change Using DEA. 8. Incorporating Value Judgements in DEA Assessments. 9. Extensions to Basic DEA Models. 10. A Limited User Guide for Warwick DEA Software. Author Index. Topic Index. References.
Resumo:
The ability to distinguish one visual stimulus from another slightly different one depends on the variability of their internal representations. In a recent paper on human visual-contrast discrimination, Kontsevich et al (2002 Vision Research 42 1771 - 1784) re-considered the long-standing question whether the internal noise that limits discrimination is fixed (contrast-invariant) or variable (contrast-dependent). They tested discrimination performance for 3 cycles deg-1 gratings over a wide range of incremental contrast levels at three masking contrasts, and showed that a simple model with an expansive response function and response-dependent noise could fit the data very well. Their conclusion - that noise in visual-discrimination tasks increases markedly with contrast - has profound implications for our understanding and modelling of vision. Here, however, we re-analyse their data, and report that a standard gain-control model with a compressive response function and fixed additive noise can also fit the data remarkably well. Thus these experimental data do not allow us to decide between the two models. The question remains open. [Supported by EPSRC grant GR/S74515/01]
Resumo:
Different types of numerical data can be collected in a scientific investigation and the choice of statistical analysis will often depend on the distribution of the data. A basic distinction between variables is whether they are ‘parametric’ or ‘non-parametric’. When a variable is parametric, the data come from a symmetrically shaped distribution known as the ‘Gaussian’ or ‘normal distribution’ whereas non-parametric variables may have a distribution which deviates markedly in shape from normal. This article describes several aspects of the problem of non-normality including: (1) how to test for two common types of deviation from a normal distribution, viz., ‘skew’ and ‘kurtosis’, (2) how to fit the normal distribution to a sample of data, (3) the transformation of non-normally distributed data and scores, and (4) commonly used ‘non-parametric’ statistics which can be used in a variety of circumstances.
Resumo:
A local area network that can support both voice and data packets offers economic advantages due to the use of only a single network for both types of traffic, greater flexibility to changing user demands, and it also enables efficient use to be made of the transmission capacity. The latter aspect is very important in local broadcast networks where the capacity is a scarce resource, for example mobile radio. This research has examined two types of local broadcast network, these being the Ethernet-type bus local area network and a mobile radio network with a central base station. With such contention networks, medium access control (MAC) protocols are required to gain access to the channel. MAC protocols must provide efficient scheduling on the channel between the distributed population of stations who want to transmit. No access scheme can exceed the performance of a single server queue, due to the spatial distribution of the stations. Stations cannot in general form a queue without using part of the channel capacity to exchange protocol information. In this research, several medium access protocols have been examined and developed in order to increase the channel throughput compared to existing protocols. However, the established performance measures of average packet time delay and throughput cannot adequately characterise protocol performance for packet voice. Rather, the percentage of bits delivered within a given time bound becomes the relevant performance measure. Performance evaluation of the protocols has been examined using discrete event simulation and in some cases also by mathematical modelling. All the protocols use either implicit or explicit reservation schemes, with their efficiency dependent on the fact that many voice packets are generated periodically within a talkspurt. Two of the protocols are based on the existing 'Reservation Virtual Time CSMA/CD' protocol, which forms a distributed queue through implicit reservations. This protocol has been improved firstly by utilising two channels, a packet transmission channel and a packet contention channel. Packet contention is then performed in parallel with a packet transmission to increase throughput. The second protocol uses variable length packets to reduce the contention time between transmissions on a single channel. A third protocol developed, is based on contention for explicit reservations. Once a station has achieved a reservation, it maintains this effective queue position for the remainder of the talkspurt and transmits after it has sensed the transmission from the preceeding station within the queue. In the mobile radio environment, adaptions to the protocols were necessary in order that their operation was robust to signal fading. This was achieved through centralised control at a base station, unlike the local area network versions where the control was distributed at the stations. The results show an improvement in throughput compared to some previous protocols. Further work includes subjective testing to validate the protocols' effectiveness.
Resumo:
Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) algorithm and during the visualization process. We have tested the proposed algorithms by visualizing electrostatic potential data for Major Histocompatibility Complex (MHC) class-I proteins. The experiments show that the variation in the original version of GTM and GTM-FS worked successfully with data of more than 2000 dimensions and we compare the results with other linear/nonlinear projection methods: Principal Component Analysis (PCA), Neuroscale (NSC) and Gaussian Process Latent Variable Model (GPLVM).
Resumo:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
Resumo:
Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straight forward to fit as taking logarithms of the Y variable linearises the relationship which can then be treated by the methods of linear regression.