985 results for Ordered Categorical Data


80.00% 80.00%



Introduction: Prognostic scoring systems have been developed to measure disease severity and patient prognosis in the intensive care unit. These measures are useful for clinical decision-making, for standardizing research, and for comparing the quality of care given to critically ill patients. Materials and methods: Analytical observational cohort study that reviewed the medical records of 283 oncology patients admitted to the intensive care unit (ICU) between January 2014 and January 2016, for whom the probability of mortality was estimated with the APACHE IV and MPM II prognostic scores. Logistic regression was performed with the predictor variables from which each model was derived in its original study; calibration and discrimination were determined, and the Akaike (AIC) and Bayesian (BIC) information criteria were calculated. Results: In the performance evaluation of the prognostic scores, APACHE IV showed greater predictive ability (AUC = 0.95) than MPM II (AUC = 0.78). Both models showed adequate calibration by the Hosmer-Lemeshow statistic, for APACHE IV (p = 0.39) and for MPM II (p = 0.99). The ΔBIC is 2.9, which shows positive evidence against APACHE IV. The AIC statistic is also reported and is lower for APACHE IV, which indicates that it is the model with the better fit to the data. Conclusions: APACHE IV performs well in predicting mortality in critically ill patients, including oncology patients. It is therefore a useful tool for clinicians in their daily work, as it allows them to identify patients with a high probability of mortality.
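The information criteria mentioned in this abstract can be sketched as follows. This is only an illustration of the standard AIC and BIC formulas and of the conventional verbal scale for BIC differences under which 2.9 counts as "positive" evidence; the function names are hypothetical, and the log-likelihoods and parameter counts in the usage lines are invented, since the abstract does not report them. Only the sample size of 283 comes from the abstract.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik

def bic_evidence(delta_bic):
    """Conventional verbal grade for an absolute BIC difference."""
    d = abs(delta_bic)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"

# Hypothetical fits of two logistic models to the same 283 patients.
# The log-likelihoods and parameter counts below are made up for illustration.
n_obs = 283
model_a = {"log_lik": -60.0, "n_params": 12}
model_b = {"log_lik": -75.0, "n_params": 6}

for name, m in (("A", model_a), ("B", model_b)):
    print(name, aic(m["log_lik"], m["n_params"]), bic(m["log_lik"], m["n_params"], n_obs))

print(bic_evidence(2.9))
```

Because BIC penalizes each parameter by ln n rather than 2, a model with more predictors can have the lower AIC and still the higher BIC, which is one way the two criteria can point in different directions.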


40.00% 40.00%



In longitudinal studies of disease, patients may experience several events during a follow-up period. In these studies, the sequentially ordered events are often of interest and lead to problems that have received much attention recently. Issues of interest include the estimation of bivariate survival, marginal distributions and the conditional distribution of gap times. In this work we consider the estimation of the survival function conditional on a previous event. Different nonparametric approaches are considered for estimating these quantities, all based on the Kaplan-Meier estimator of the survival function. We explore the finite sample behavior of the estimators through simulations. The different methods proposed in this article are applied to a data set from a German Breast Cancer Study. The methods are used to obtain predictors for the conditional survival probabilities as well as to study the influence of recurrence on overall survival.
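As a point of reference for the abstract above, the Kaplan-Meier building block can be sketched in a few lines. This is the plain product-limit estimator on right-censored data, not the conditional estimators the authors propose; the function name and the toy data are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # events plus censorings that leave the risk set at t
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of the risk set surviving t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve

# Hypothetical toy data: seven subjects, three of them censored
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

Subjects censored at a time t are counted as still at risk for events at t, which is the usual convention.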


40.00% 40.00%



When health status is an ordered response variable, Allison and Foster (2004) postulate that a distribution Q exhibits more inequality than a distribution P if Q is obtained from P via a sequence of median-preserving spreads. This paper introduces a parametric family of inequality indices which are founded on the Allison and Foster ordering. [Authors]


40.00% 40.00%



Because self-reported health status [SRHS] is an ordered response variable, inequality measurement for SRHS data requires a numerical scale for converting individual responses into a summary statistic. The choice of scale is however problematic, since small variations in the numerical scale may reverse the ordering of a given pair of distributions of SRHS data in relation to conventional inequality indices such as the variance. This paper introduces a parametric family of inequality indices, founded on an inequality ordering proposed by Allison and Foster [Allison, R.A., Foster, J., 2004. Measuring health inequalities using qualitative data. Journal of Health Economics 23, 505-524], which satisfy a suitable invariance property with respect to the choice of numerical scale. Several key members of the parametric family are also derived, and an empirical application using data from the Swiss Health Survey illustrates the proposed methodology. [Authors]
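The Allison and Foster ordering that these indices build on can be checked directly from cumulative category shares, with no numerical scale attached to the categories. The sketch below assumes the usual statement of the ordering: P and Q share the same median category, Q has at least as much cumulative mass in every category below the median, and no more in every category at or above it. The function names and the two example distributions are hypothetical, and the sketch does not implement the parametric index family itself.

```python
from itertools import accumulate

def cumulative(shares):
    """Cumulative shares across the ordered categories."""
    return list(accumulate(shares))

def median_category(shares):
    """Index of the first category whose cumulative share reaches one half."""
    for k, c in enumerate(cumulative(shares)):
        if c >= 0.5 - 1e-12:
            return k
    raise ValueError("shares must sum to 1")

def at_least_as_spread(q, p, tol=1e-12):
    """True if q is at least as spread away from the median as p."""
    if len(p) != len(q):
        return False
    m = median_category(p)
    if median_category(q) != m:
        return False
    P, Q = cumulative(p), cumulative(q)
    below = all(Q[k] >= P[k] - tol for k in range(m))
    at_or_above = all(Q[k] <= P[k] + tol for k in range(m, len(p)))
    return below and at_or_above

# Hypothetical shares over five ordered health categories (poor ... excellent)
p = [0.1, 0.2, 0.4, 0.2, 0.1]
q = [0.2, 0.1, 0.4, 0.1, 0.2]   # mass pushed from the inner to the outer categories

print(at_least_as_spread(q, p))   # q versus p
print(at_least_as_spread(p, q))   # p versus q
```

The check uses only the order of the categories, so relabeling them with any increasing numerical scale leaves the result unchanged.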


40.00% 40.00%



There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson’s fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.


40.00% 40.00%



The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.


40.00% 40.00%



Many variables that are of interest in social science research are nominal variables with two or more categories, such as employment status, occupation, political preference, or self-reported health status. With longitudinal survey data it is possible to analyse the transitions of individuals between, for example, different employment states or occupations. In the statistical literature, models for analysing categorical dependent variables with repeated observations belong to the family of models known as generalized linear mixed models (GLMMs). The specific GLMM for a dependent variable with three or more categories is the multinomial logit random effects model. For these models, the marginal distribution of the response does not have a closed-form solution and hence numerical integration must be used to obtain maximum likelihood estimates for the model parameters. Techniques for implementing the numerical integration are available but are computationally intensive, requiring a large amount of computer processing time that increases with the number of clusters (or individuals) in the data, and are not always readily accessible to the practitioner in standard software. For the purposes of analysing categorical response data from a longitudinal social survey, there is clearly a need to evaluate the existing procedures for estimating multinomial logit random effects models in terms of accuracy, efficiency and computing time. The computational time will have significant implications for the approach researchers prefer. In this paper we evaluate statistical software procedures that utilise adaptive Gaussian quadrature and MCMC methods, with specific application to modelling the employment status of women using a GLMM, over three waves of the HILDA survey.


30.00% 30.00%



The mesoporous SBA-15 silica with uniform hexagonal pores, narrow pore size distribution and tuneable pore diameter was organofunctionalized with a glutaraldehyde-bridged silylating agent. The precursor and its derivative silicas were loaded with ibuprofen for controlled delivery in simulated biological fluids. The synthesized silicas were characterized by elemental analysis, infrared spectroscopy, ¹³C and ²⁹Si solid state NMR spectroscopy, nitrogen adsorption, X-ray diffractometry, thermogravimetry and scanning electron microscopy. Surface functionalization with an amine-containing bridged hydrophobic structure significantly decreased the surface area from 802.4 to 63.0 m² g⁻¹ and the pore diameter from 8.0 to 6.0 nm, which ultimately increased the drug-loading capacity from 18.0% up to 28.3% and gave a very slow release rate of ibuprofen over a period of 72.5 h. The in vitro drug release demonstrated that SBA-15 presented the fastest release, from 25% to 27%, while SBA-15GA gave nearly 10% drug release in all fluids during 72.5 h. The Korsmeyer-Peppas model better fits the release data, with a Fickian diffusion mechanism and zero order kinetics for the synthesized mesoporous silicas. Both pore size and hydrophobicity influenced the rate of the release process, indicating that the chemically modified silica can be suggested for designing formulations with slow and constant release over a defined period, to avoid repeated administration.


30.00% 30.00%



Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time variant, noisy, and high dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been used in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).


30.00% 30.00%



In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, referred to official statistics, shows its usefulness.
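To make the modelling assumption in the abstract above concrete, here is a minimal EM sketch for a finite mixture over categorical features. It is plain EM only and leaves out the feature selection step that is the paper's contribution; it also assumes the features are independent within each component, which the abstract does not state. The function name, the random initialization, and the toy data are hypothetical.

```python
import math
import random

def em_categorical_mixture(data, n_levels, k, n_iter=50, seed=0):
    """Fit a k-component mixture of independent categorical features by EM.

    data     : list of rows, each a list of integer category codes
    n_levels : number of categories of each feature
    Returns (weights, theta, trace), where trace[i] is the log-likelihood
    of the parameters held at the start of iteration i.
    """
    rng = random.Random(seed)
    n, d = len(data), len(n_levels)
    weights = [1.0 / k] * k
    # theta[c][j][v] = P(feature j = v | component c), random positive start
    theta = []
    for _ in range(k):
        comp = []
        for j in range(d):
            raw = [rng.random() + 0.5 for _ in range(n_levels[j])]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)

    trace = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each row
        resp, log_lik = [], 0.0
        for row in data:
            joint = [weights[c] * math.prod(theta[c][j][v] for j, v in enumerate(row))
                     for c in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(log_lik)
        # M-step: responsibility-weighted relative frequencies
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / n
            if n_c == 0.0:
                continue  # empty component: keep its previous theta
            for j in range(d):
                counts = [0.0] * n_levels[j]
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                theta[c][j] = [x / n_c for x in counts]
    return weights, theta, trace

# Hypothetical toy data: two groups of rows over three binary features
toy = [[0, 0, 0]] * 8 + [[0, 0, 1]] * 2 + [[1, 1, 1]] * 8 + [[1, 1, 0]] * 2
weights, theta, trace = em_categorical_mixture(toy, [2, 2, 2], k=2)
print([round(w, 3) for w in weights])
```

A feature whose fitted distribution is nearly the same in every component carries little information about the clustering, which is the intuition behind selecting features inside the EM loop.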


30.00% 30.00%



In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.


30.00% 30.00%



Dissertation presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management




30.00% 30.00%



In this paper we investigate the ability of a number of different ordered probit models to predict ratings based on firm-specific data on business and financial risks. We investigate models based on momentum, drift and ageing and compare them against alternatives that take into account the initial rating of the firm and its previous actual rating. Using data on US bond issuing firms rated by Fitch over the years 2000 to 2007 we compare the performance of these models in predicting the rating in-sample and out-of-sample using root mean squared errors, Diebold-Mariano tests of forecast performance and contingency tables. We conclude that initial and previous states have a substantial influence on rating prediction.
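For readers unfamiliar with the model class, the way an ordered probit turns a linear index into rating probabilities can be sketched as follows. This shows only the generic textbook mapping, P(y = j) = Φ(c_j − xβ) − Φ(c_{j−1} − xβ), not the momentum, drift or ageing specifications compared in the paper; the function names, the cutpoints and the index value are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cutpoints):
    """Probabilities of the len(cutpoints) + 1 ordered categories.

    xb        : linear index x'beta for one firm
    cutpoints : strictly increasing thresholds separating adjacent categories
    """
    cdf = [0.0] + [norm_cdf(c - xb) for c in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]

def predicted_category(xb, cutpoints):
    """Most probable category, as an index from 0 (lowest) upward."""
    probs = ordered_probit_probs(xb, cutpoints)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical example: four rating classes separated by three cutpoints
cutpoints = [-1.0, 0.0, 1.5]
probs = ordered_probit_probs(0.4, cutpoints)
print([round(p, 3) for p in probs], predicted_category(0.4, cutpoints))
```

Out-of-sample comparisons such as the root mean squared errors mentioned above would then compare the predicted categories with the ratings actually assigned.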