923 results for Categorical variable
Abstract:
To provide sample estimates with the probability foundation needed to generalize from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy three necessary, but not sufficient, conditions: (i) all inclusion probabilities in the target population to be sampled must be greater than zero; if some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed; (ii) the inclusion probabilities must be (a) knowable for nonsampled units and (b) known for those units selected in the sample; since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas, if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne Very High Resolution (VHR) images, where: (I) an original Categorical Variable Pair Similarity Index (CVPSI, proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and (II) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement, in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge. The combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts. The several degrees of novelty of the proposed probability sampling protocol are highlighted in this paper, at the levels of understanding of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic Mapper™ (SIAM™) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps, and in agreement with theoretical expectations, visual (qualitative) evidence and quantitative quality indexes of operativeness (OQIs) claimed for SIAM™ by related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAM™ pre-classification maps proposed in this contribution, together with the OQIs claimed for SIAM™ by related works, makes the operational (automatic, accurate, near real-time, robust, scalable) SIAM™ software product eligible for opening up new inter-disciplinary research and market opportunities, in accordance with the visionary goal of the Global Earth Observation System of Systems (GEOSS) initiative and the QA4EO international guidelines.
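To make condition (ii) concrete: in design-based accuracy estimation, the weight of each sampled unit is the inverse of its inclusion probability (a Horvitz-Thompson-style estimator). Below is a minimal illustrative sketch of this weighting; it is not code from the paper, and the function name and data are ours.

```python
import numpy as np

def overall_accuracy(correct, inclusion_prob):
    """Design-based (Horvitz-Thompson style) estimate of overall map accuracy.

    correct        : boolean array, True where the map label matches the reference.
    inclusion_prob : known inclusion probabilities of the sampled units,
                     all > 0 -- conditions (i) and (ii)(b) in the abstract.
    """
    correct = np.asarray(correct, dtype=float)
    p = np.asarray(inclusion_prob, dtype=float)
    if np.any(p <= 0):
        raise ValueError("every inclusion probability must be > 0 (condition (i))")
    w = 1.0 / p                      # estimation weight of each sampled unit
    return float(np.sum(w * correct) / np.sum(w))

# Example: three sampled pixels, two classified correctly.
print(overall_accuracy([True, True, False], [0.01, 0.02, 0.01]))
```

If any inclusion probability were zero or unknown, the weights, and hence the estimate, would be undefined, which is exactly what the conditions above rule out.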
Abstract:
The prognostic value of exercise (EXE) and dobutamine echocardiograms (DbE) has been well defined in large studies. However, while risk is determined by both clinical and echo features, no simple means of combining these data has been defined. We sought to combine these data into risk scores. Methods. At 3 expert centers, 7650 pts underwent standard EXE (n=5211) and DbE (n=2439) for evaluation of known or suspected CAD and were followed for up to 10 years (mean 5.2) for major events (death or myocardial infarction). A subgroup of 2953 EXE and 1025 DbE pts was randomly selected to develop separate multivariate models for prediction of events. After simplification of each model for clinical use, the models were validated in the remaining EXE and DbE pts. Results. The total number of events was 200 in the EXE and 225 in the DbE pts, of which 58 and 99 events occurred in the respective testing groups. The following regression equations gave equivalent results in the testing and validation groups for both EXE and DbE:
DbE = (Age × 0.02) + (DM × 1.0) + (Low RPP × 0.6) + ([CHF + Ischemia + Scar] × 0.7)
EXE = ([DM + CHF] × 0.5) + 0.5 × (Ischemia#) + 1.8 × (Scar#) − (METS × 0.19)
(where each categorical variable scored 1 when present and 0 when absent; Ischemia# = 1 for 1-2 VD, 1.6 for 3 VD; Scar# = 1 for 1-2 VD, 1.7 for 3 VD). The table summarizes the scores and equivalent outcomes for EXE and DbE. Conclusions. Risk scores based on clinical and EXE or DbE results may be used to quantify the risk of events during follow-up.
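For readers who want to apply the scores, here is a small sketch implementing the two equations as reconstructed above. The coding follows the abstract (each categorical variable is a 0/1 indicator; METS is the achieved metabolic equivalents); the value 1.6 for three-vessel ischemia is our reading of a garbled character in the source and should be checked against the original paper.

```python
def dbe_risk_score(age, dm, low_rpp, chf, ischemia, scar):
    """Dobutamine-echo risk score as reconstructed from the abstract.
    dm, low_rpp, chf, ischemia, scar are 0/1 indicators; age in years."""
    return 0.02 * age + 1.0 * dm + 0.6 * low_rpp + 0.7 * (chf + ischemia + scar)

def exe_risk_score(dm, chf, ischemia_vd, scar_vd, mets):
    """Exercise-echo risk score; ischemia_vd / scar_vd give the number of
    diseased vascular territories (0, 1-2, or 3)."""
    # 1.6 for 3-VD ischemia is a reconstruction of an OCR-damaged value.
    ischemia_n = 0.0 if ischemia_vd == 0 else (1.0 if ischemia_vd <= 2 else 1.6)
    scar_n = 0.0 if scar_vd == 0 else (1.0 if scar_vd <= 2 else 1.7)
    return 0.5 * (dm + chf) + 0.5 * ischemia_n + 1.8 * scar_n - 0.19 * mets

# Example: 65-year-old diabetic with 1-vessel ischemia, no scar, 7 METS.
print(exe_risk_score(dm=1, chf=0, ischemia_vd=1, scar_vd=0, mets=7))
```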
Abstract:
This study presents an impact assessment of the Library of the Faculty of Sciences of the University of Porto (FCUP), from the students' point of view and drawing on some data on the use of the FCUP Library. The investigation uses a mixed methods approach, i.e., it evaluates qualitative data, which describe but do not measure characteristics, in particular human actions, and quantitative data, which take numerical form, express exact amounts and can be subjected to statistical manipulation. «Notícias da Biblioteca», a newsletter published bimonthly by the FCUP Library, includes a section called "Voz do Utilizador" ("User Voice") containing short opinion texts written by randomly chosen users who frequent the library. Applying International Standard ISO 16439:2014 (E) - Information and documentation - Methods and procedures for assessing the impact of libraries, we examined twenty opinion texts about the library, written by students of various nationalities and published in «Notícias da Biblioteca» from January 2013 to December 2014. We also analysed seven interviews. Using Descriptive Statistics, we produced contingency tables, which are useful for examining how the data relate across particular groups, i.e., for observing the frequency of one variable as a function of the categories of another. The contingency tables yield percentages relating the effects of the library's impact to the groups of students, allowing the relationship between them to be analysed.
This study also used some statistical indicators that show the impact of the library within the institution, namely performance indicators relating to library use - the number of visits per capita and the number of loans per capita. Such impact assessment studies in libraries provide useful information for top management, supporting the development of new projects and the optimization of the impact and performance of these services in higher education institutions.
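As an illustration of the contingency-table step, a minimal pandas sketch follows; the group and impact categories are invented for the example and are not the study's actual coding scheme.

```python
import pandas as pd

# Hypothetical coded opinion-text data: each row is one student's text,
# coded for student group and the type of impact mentioned.
data = pd.DataFrame({
    "group":  ["undergraduate", "graduate", "undergraduate", "international"],
    "impact": ["study space",   "resources", "resources",    "study space"],
})

# Contingency table: frequency of each impact category within each group,
# expressed as row percentages.
table = pd.crosstab(data["group"], data["impact"], normalize="index") * 100
print(table.round(1))
```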
Abstract:
Ordinal outcomes are frequently employed in diagnosis and clinical trials. Clinical trials of Alzheimer's disease (AD) treatments are a case in point, using the status of mild, moderate or severe disease as the outcome measure. As in many other outcome-oriented studies, the disease status may be misclassified. This study estimates the extent of misclassification in an ordinal outcome such as disease status, as well as the extent of misclassification of a predictor variable such as genotype status. An ordinal logistic regression model is commonly used to model the relationship between disease status, the effect of treatment, and other predictive factors. A simulation study was done. First, data were created based on a set of hypothetical parameters and hypothetical rates of misclassification. Next, the maximum likelihood method was employed to generate likelihood equations accounting for misclassification. The Nelder-Mead simplex method was used to solve for the misclassification and model parameters. Finally, this method was applied to an AD dataset to detect the amount of misclassification present. The estimates of the ordinal regression model parameters were close to the hypothetical parameters: β1 was hypothesized at 0.50 and the mean estimate was 0.488; β2 was hypothesized at 0.04 and the mean of the estimates was 0.04. Although the estimates of the rates of misclassification of X1 were not as close as β1 and β2, they validate this method. The X1 0-1 misclassification rate was hypothesized as 2.98% and the mean of the simulated estimates was 1.54%; in the best case, the misclassification of k from high to medium was hypothesized at 4.87% and had a sample mean of 3.62%. In the AD dataset, the estimated odds ratio for X1, having both copies of the APOE 4 allele, changed from 1.377 to 1.418, demonstrating that estimates of the odds ratio change when the analysis adjusts for misclassification.
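To illustrate the estimation machinery, the sketch below fits a plain proportional-odds (ordinal logistic) model with the Nelder-Mead simplex method; the dissertation's actual likelihood additionally includes misclassification-rate parameters, which are omitted here for brevity. All names and simulated values are ours.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_loglik(params, x, y):
    """Negative log-likelihood of a proportional-odds model with 3 outcome
    levels (y in {0, 1, 2}) and one predictor x. params = (a1, d, beta),
    with cutpoints a1 and a2 = a1 + exp(d) to keep them ordered."""
    a1, d, beta = params
    a2 = a1 + np.exp(d)
    c1 = expit(a1 - beta * x)          # P(y <= 0 | x)
    c2 = expit(a2 - beta * x)          # P(y <= 1 | x)
    probs = np.column_stack([c1, c2 - c1, 1.0 - c2]).clip(1e-12, 1.0)
    return -np.sum(np.log(probs[np.arange(len(y)), y]))

# Simulate from the model with a1 = -1, a2 = 1, beta = 0.5.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
u = rng.uniform(size=500)
c1, c2 = expit(-1 - 0.5 * x), expit(1 - 0.5 * x)
y = (u > c1).astype(int) + (u > c2).astype(int)

fit = minimize(neg_loglik, x0=[-1.0, 0.0, 0.0], args=(x, y),
               method="Nelder-Mead")
print(fit.x)  # recovered (a1, log(a2 - a1), beta)
```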
Abstract:
Speech can be understood at widely varying production rates. A working memory is described for short-term storage of temporal lists of input items. The working memory is a cooperative-competitive neural network that automatically adjusts its integration rate, or gain, to generate a short-term memory code for a list that is independent of item presentation rate. Such an invariant working memory model is used to simulate the data of Repp (1980) concerning changes of phonetic category boundaries as a function of their presentation rate. Thus the variability of categorical boundaries can be traced to the temporal invariance of the working memory code.
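The following toy sketch (ours, not the paper's actual network equations) illustrates the key idea of rate invariance: if the integration gain is tied to the presentation rate, the normalized pattern of stored activities comes out identical for fast and slow lists.

```python
import numpy as np

def store_list(n_items, item_duration, gain_per_item=1.0):
    """Toy rate-invariant short-term memory: each incoming item excites its
    own node while previously stored activities decay competitively. The
    integration gain is inversely tied to item duration (presentation
    rate), so the *pattern* of stored activities is rate-independent."""
    gain = gain_per_item / item_duration          # automatic gain control
    x = np.zeros(n_items)
    for i in range(n_items):
        x *= np.exp(-gain * 0.1 * item_duration)  # decay of earlier items
        x[i] = gain * item_duration               # encode the current item
    return x / x.sum()                            # normalized memory code

# Same relative code regardless of presentation rate:
print(store_list(4, item_duration=0.1))   # fast list
print(store_list(4, item_duration=0.4))   # slow list
```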
Abstract:
The need for timely population data for health planning and indicators of need has increased the demand for population estimates. The data required to produce estimates are difficult to obtain and the process is time consuming. Estimation methods that require less effort and fewer data are needed. The structure preserving estimator (SPREE) is a promising technique not previously used to estimate county population characteristics. This study first uses traditional regression estimation techniques to produce estimates of county population totals. Then the structure preserving estimator, using the results produced in the first phase as constraints, is evaluated.
Regression methods are among the most frequently used demographic methods for estimating populations. These methods use symptomatic indicators to predict population change. This research evaluates three regression methods to determine which produces the best estimates based on the 1970 to 1980 indicators of population change. Strategies for stratifying data to improve the ability of the methods to predict change were tested. Difference-correlation using PMSA strata produced the equation that fit the data best. Regression diagnostics were used to evaluate the residuals.
The second phase of this study evaluates the use of the structure preserving estimator in making estimates of population characteristics. The SPREE approach uses existing data (the association structure) to establish the relationship between the variable of interest and the associated variable(s) at the county level. Marginals at the state level (the allocation structure) supply the current relationship between the variables. The full allocation structure model uses current estimates of county population totals to limit the magnitude of county estimates; the limited full allocation structure model places no constraints on county size. The 1970 county census age-gender population provides the association structure; the allocation structure is the 1980 state age-gender distribution.
The full allocation model produces good estimates of the 1980 county age-gender populations. An unanticipated finding of this research is that the limited full allocation model produces estimates of county population totals that are superior to those produced by the regression methods. The full allocation model is used to produce estimates of 1986 county population characteristics.
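SPREE estimates of this kind are commonly computed by iterative proportional fitting: the census-year association structure is raked until its margins match the current allocation structure. A minimal sketch under that assumption (array shapes, function name and toy numbers are ours):

```python
import numpy as np

def spree_ipf(association, row_margins, col_margins, tol=1e-10, max_iter=1000):
    """Structure preserving estimate via iterative proportional fitting.

    association : counties x categories table from the census year
                  (the association structure).
    row_margins : current estimates of county totals.
    col_margins : current state-level category totals (the allocation
                  structure).
    """
    est = association.astype(float)
    for _ in range(max_iter):
        est *= (row_margins / est.sum(axis=1))[:, None]   # match county totals
        est *= (col_margins / est.sum(axis=0))[None, :]   # match state margins
        if np.allclose(est.sum(axis=1), row_margins, rtol=tol):
            break
    return est

# Toy example: 3 counties x 2 age-gender groups.
assoc = np.array([[40, 60], [70, 30], [50, 50]])
print(spree_ipf(assoc, row_margins=np.array([120, 110, 90]),
                col_margins=np.array([180, 140])))
```

The "limited full allocation" variant described above would drop the county-total (row-margin) constraint and rake only to the state-level margins.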
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The factors varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.
The study found that the $\hat{C}_g$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into between 8 and 30 quantiles was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.
The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.
Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
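For reference, a short sketch of the deciles-of-risk statistic described above; the grouping and formula follow the standard Hosmer-Lemeshow construction, while the function name and the tie handling (argsort, which may split ties across groups, the approach the abstract warns about) are our choices.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    """Deciles-of-risk Hosmer-Lemeshow statistic for a fitted logistic model.

    y     : 0/1 outcomes; p_hat : fitted probabilities.
    Returns the statistic and its p-value against a chi-square
    distribution with groups - 2 degrees of freedom.
    """
    y, p_hat = np.asarray(y, float), np.asarray(p_hat, float)
    order = np.argsort(p_hat)                  # sort by estimated risk
    bins = np.array_split(order, groups)       # ~equal-size risk groups
    stat = 0.0
    for idx in bins:
        n_g = len(idx)
        obs = y[idx].sum()                     # observed events in group
        exp = p_hat[idx].sum()                 # expected events in group
        pbar = exp / n_g                       # mean estimated probability
        stat += (obs - exp) ** 2 / (n_g * pbar * (1.0 - pbar))
    return stat, chi2.sf(stat, df=groups - 2)
```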
Abstract:
In this dissertation, we propose a continuous-time Markov chain model to examine longitudinal data that have three categories in the outcome variable. The advantage of this model is that it permits a different number of measurements for each subject, and the duration between two consecutive measurement time points can be irregular. Using the maximum likelihood principle, we can estimate the transition probability between two time points. By using the information provided by the independent variables, this model can also estimate the transition probability for each subject. The Monte Carlo simulation method will be used to investigate the goodness of model fit compared with that obtained from other models. A public health example will be used to demonstrate the application of this method.
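The core computation in such a model is the transition probability matrix over an arbitrary interval, P(t) = exp(Qt), where Q is the intensity (generator) matrix; this is what accommodates irregular gaps between measurements. A minimal sketch with an illustrative Q (covariates would typically enter through, e.g., log-linear models on the intensities):

```python
import numpy as np
from scipy.linalg import expm

# Generator (intensity) matrix Q for a 3-state continuous-time Markov chain:
# rows sum to zero; off-diagonal entries are transition intensities.
# The numerical values here are illustrative only.
Q = np.array([[-0.30,  0.20,  0.10],
              [ 0.05, -0.15,  0.10],
              [ 0.02,  0.08, -0.10]])

def transition_matrix(Q, t):
    """P(t) = exp(Q t): probabilities of moving between the three outcome
    categories over an (irregular) interval of length t."""
    return expm(Q * t)

print(transition_matrix(Q, 0.5))   # short gap between measurements
print(transition_matrix(Q, 2.0))   # longer gap between measurements
```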
Abstract:
Many variables of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between different employment states or occupations (for example). In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed-form solution, and hence numerical integration must be used to obtain maximum likelihood estimates of the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive, requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data, and they are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. The computing time has significant implications for the approach preferred by researchers. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women using a GLMM, over three waves of the HILDA survey.
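As a sketch of what the quadrature approach involves, the code below evaluates the marginal likelihood of one subject under a random-intercept multinomial logit using plain (non-adaptive) Gauss-Hermite quadrature; adaptive quadrature would additionally recenter and rescale the nodes around each subject's posterior mode. The parameterization is illustrative, not the specific HILDA specification.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def subject_marginal_loglik(X, y, beta, sigma, n_quad=15):
    """Marginal log-likelihood of one subject's repeated 3-category outcomes
    under a multinomial logit with a shared normal random intercept.

    X     : (T, p) covariates; y : (T,) outcomes in {0, 1, 2} (0 = reference).
    beta  : (2, p) fixed effects for categories 1 and 2.
    sigma : SD of the random intercept u ~ N(0, sigma^2), added to both
            non-reference linear predictors.
    """
    z, w = hermgauss(n_quad)                 # Gauss-Hermite nodes and weights
    lik = 0.0
    for zq, wq in zip(z, w):
        u = np.sqrt(2.0) * sigma * zq        # change of variables for N(0, s^2)
        eta = X @ beta.T + u                 # (T, 2) linear predictors
        denom = 1.0 + np.exp(eta).sum(axis=1)
        probs = np.column_stack([1.0 / denom, np.exp(eta) / denom[:, None]])
        lik += wq * np.prod(probs[np.arange(len(y)), y])
    return np.log(lik / np.sqrt(np.pi))
```

Summing this quantity over all subjects and maximizing over (beta, sigma) gives the maximum likelihood estimates; the per-subject integral is what makes computing time grow with the number of individuals.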