937 results for Categorical variables
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
The article presents abstracts of papers for a conference on research methods including "On the Folly of Rewarding A While Hoping for B: A Critical Assessment of Theory Development," "All That Jazz: A Methodological Story of Stories," and "An Accounting of Counting: Universalism, Particularism, and the Counting of Qualitative Data."
Abstract:
2000 Mathematics Subject Classification: 62P10, 62H30
Abstract:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. To estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, compared with the criteria mentioned above, is the speed of execution, which is especially relevant when dealing with large data sets.
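For orientation, the model and the BIC baseline mentioned in this abstract can be written out as follows. This is only a sketch under assumptions that the abstract does not state: the notation is introduced here, and the J categorical variables are taken to be conditionally independent within each cluster (the usual latent class form). The MML criterion of Figueiredo and Jain is not reproduced.

```latex
% Assumed notation: n observations, J categorical variables, variable j has C_j
% categories, K clusters with mixing proportions \alpha_k and category
% probabilities \theta_{kjc}.
p(x_i \mid \theta) = \sum_{k=1}^{K} \alpha_k \prod_{j=1}^{J} \prod_{c=1}^{C_j}
    \theta_{kjc}^{\,\mathbb{1}[x_{ij} = c]},
\qquad \sum_{k=1}^{K} \alpha_k = 1, \quad \sum_{c=1}^{C_j} \theta_{kjc} = 1 .

% BIC for a K-cluster model, with d_K free parameters:
\mathrm{BIC}(K) = -2 \log L(\hat{\theta}_K) + d_K \log n,
\qquad d_K = (K - 1) + K \sum_{j=1}^{J} (C_j - 1).
```

With a criterion such as BIC, the mixture is fitted separately for each candidate K and the value minimising the criterion is kept; this repeated fitting is the cost that a single-run algorithm avoids.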
Abstract:
Introduction: Leprosy is a chronic disease that affects skin and peripheral nerves. Disease complications include reactional episodes and physical impairment. One World Health Organization (WHO) goal of leprosy programs is to decrease the number of grade 2 impairment diagnoses by 2015. This study aims to evaluate clinical factors associated with the occurrence of leprosy reactions and physical impairment in leprosy patients. Methods: We conducted a retrospective study of data from medical records of patients followed in two important centers for the treatment of leprosy in Aracaju, Sergipe, Brazil, from 2005 to 2011. We used the chi-square test to analyze associations between the following categorical variables: gender, age, operational classification, clinical forms, leprosy reactions, corticosteroid treatment, and physical impairment at diagnosis and after cure. Clinical variables associated with multibacillary leprosy and/or reactional episodes and the presence of any grade of physical impairment after cure were evaluated using a logistic regression model. Results: We found that men were more affected by multibacillary forms, reactional episodes, and grade 2 physical impairment at diagnosis. Leprosy reactions were detected in a total of 40% of patients, and all were treated with corticosteroids. However, physical impairment was observed in 29.8% of the patients analyzed at the end of treatment, and our multivariate analysis associated a low dose and short period of corticosteroid treatment with persistence of physical impairments. Conclusions: Physical impairment should receive increased attention before and after treatment, and adequate treatment should be emphasized.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Categorical data cannot be interpolated directly because they are outcomes of discrete random variables. Thus, types of categorical variables are transformed into indicator functions that can be handled by interpolation methods. Interpolated indicator values are then backtransformed to the original types of categorical variables. However, aspects such as variability and uncertainty of interpolated values of categorical data have never been considered. In this paper we show that the interpolation variance can be used to map an uncertainty zone around boundaries between types of categorical variables. Moreover, it is shown that the interpolation variance is a component of the total variance of the categorical variables, as measured by the coefficient of unalikeability. (C) 2011 Elsevier Ltd. All rights reserved.
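The indicator workflow described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the category names, sample values, and interpolation weights are hypothetical, and the weights stand in for whatever an interpolator would supply (they are only assumed to be non-negative and to sum to 1). The coefficient of unalikeability is computed as 1 minus the sum of squared category proportions.

```python
def indicators(values, categories):
    """One indicator (0/1) column per category for each sample value."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def interpolate(indicator_rows, weights):
    """Weighted average of the indicator columns (weights assumed to sum to 1)."""
    n_categories = len(indicator_rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, indicator_rows))
        for j in range(n_categories)
    ]

def unalikeability(proportions):
    """Coefficient of unalikeability: 1 - sum of squared proportions."""
    return 1.0 - sum(p * p for p in proportions)

# Hypothetical data: four neighbouring samples and their interpolation weights.
categories = ["sand", "clay", "silt"]
samples = ["sand", "sand", "clay", "silt"]
weights = [0.4, 0.3, 0.2, 0.1]

probs = interpolate(indicators(samples, categories), weights)
predicted = categories[probs.index(max(probs))]  # back-transform: largest value wins
u = unalikeability(probs)
print(probs, predicted, u)
```

A value of `u` near 0 means one category dominates at the interpolated location; larger values flag locations close to a boundary between categories.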
Abstract:
The fuzzy min–max neural network classifier is a supervised learning method. This classifier takes the hybrid neural networks and fuzzy systems approach. All input variables in the network are required to correspond to continuously valued variables, and this can be a significant constraint in many real-world situations where there are not only quantitative but also categorical data. The usual way of dealing with this type of variable is to replace the categorical values with numerical ones and treat them as if they were continuously valued. But this method implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides for greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data of this type of poll (the set of the respondents’ individual answers to the questions) are especially suited for evaluating the method since they include a large number of numerical and categorical attributes.
Abstract:
There are many situations where input feature vectors are incomplete and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method to perform categorical missing data imputation from numerical and categorical variables. The imputations are based on Simpson’s fuzzy min-max neural networks where the input variables for learning and classification are just numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with others using opinion poll data.
Abstract:
Despite the high risks to the health and work ability of electric utility workers, studies on the subject are lacking in Brazil. The objective of this study is to identify the health and work ability profile of electric utility workers in São Paulo. A cross-sectional study was conducted with 475 workers of a company in the electricity sector. Data were collected through questionnaires on work ability, health status, job stress, physical activity, and tobacco and alcohol dependence. The internal consistency of the scales was assessed using Cronbach's alpha coefficient. Descriptive analysis was performed using means, standard deviations, and minimum and maximum values of the scores, and proportions for the qualitative variables. The workers' health status scored high in the dimensions analyzed, with means ranging from 72.8 to 91.2 (score from 0.0 to 100.0 points). Work ability scored high, with a mean of 41.8 (score from 7.0 to 49.0 points). It was concluded that the workers in the study population showed high standards of health status and work ability. Longitudinal studies are suggested to assess causal relationships and the existence of a healthy worker effect.
Abstract:
OBJECTIVE: To describe the profile of adult patients living in the city of São Paulo whose death was associated with tuberculosis, according to biological, environmental, and institutional factors. METHODS: Descriptive study covering all tuberculosis deaths (N=416) that occurred in 2002 among persons older than 15 years. The data analyzed were obtained from the Municipal Mortality Information System, hospital records, the Death Verification Service, and the Tuberculosis Surveillance System. Relative risks and 95% confidence intervals (95% CI) were calculated using females, the 15 to 29 age group, and those born in the State of São Paulo as reference. The comparative analysis used Pearson's chi-square test and Fisher's exact test for categorical variables and the Kruskal-Wallis test for continuous variables. RESULTS: Of all deaths, 78% had the pulmonary form; diagnosis was made after death in 30% and in primary care units in 14% of cases; 44% did not start treatment; 49% were not notified; 76% were men and the median age was 51 years; 52% had up to four years of schooling, and 4% were probably homeless. Mortality rates increased with age; the rate was 5.0/100,000 in the municipality, ranging from zero to 35 depending on the district. For 82 of 232 patients with a treatment record, previous treatment was mentioned, and 41 of these had abandoned it. Diabetes (16%), chronic obstructive pulmonary disease (19%), HIV (11%), smoking (71%), and alcoholism (64%) were found among the patients. CONCLUSIONS: Men over 50 years of age, migrants, and residents of districts with a low Human Development Index are at higher risk of death. Low schooling and the presence of comorbidities are important characteristics. Low participation of primary health care units in diagnosis and high under-reporting were observed.
Abstract:
Background and Study Aim: This study evaluated the influence of competitive practice and training aspects on the incidence of injuries to the lower limb joints in formalized (taolu) and combat (sanshou) kung fu athletes. Material/Methods: One hundred and twenty-seven kung fu athletes (taolu, n=82; sanshou, n=45) were interviewed about kung fu practice (practice time, competition time and competition level), training volume (days of training per week and hours per training session) and injury profiles (incidence and type). Continuous variables were compared by the non-parametric Kolmogorov-Smirnov test (disciplines and competition levels as grouping variables). The effects of categorical variables (kung fu practice) on injury profiles were analyzed using Pearson's chi-square test. The level of significance was set at p<0.05. Results: Our data exhibited a high frequency of injury reports (70.1%) and significant differences in injury profiles between disciplines and competition levels. Taolu athletes, despite their lower practice/competition time (-51.5 and -41.8%, respectively), presented a twofold greater frequency of injury reports, longer daily training volume (23.3%) and higher incidence of lower limb joint injuries than sanshou athletes (35.4% and 11.8%, respectively). Conclusions: Our results suggest a link between injury profiles (incidence and type) and specific characteristics of kung fu disciplines.
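As an illustration of the Pearson chi-square test named in this abstract, the sketch below computes the statistic and p-value for a 2x2 contingency table using only the standard library. The counts are hypothetical (the abstract reports no contingency table), and the p-value shortcut via `math.erfc` holds only for one degree of freedom, that is, for a 2x2 table without continuity correction.

```python
import math

def pearson_chi2_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows are two disciplines, columns are injured / not injured.
table = [[30, 10],
         [20, 40]]
chi2, p_value = pearson_chi2_2x2(table)
significant = p_value < 0.05  # significance level used in the abstract
print(chi2, p_value, significant)
```

For larger tables, or when expected counts are small, a statistics library with a general chi-square distribution or an exact test would be the safer choice.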
Abstract:
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed-crop competitiveness using as input categorical variables for the total density of weeds and the corresponding proportions of narrow and broad-leaved weeds. The inferred categorical variable values for the weed-crop competitiveness, along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage, are then used as input for a second Bayesian network classifier to infer categorical variable values for the risk of infestation. Weed biomass and yield loss data samples are used to learn the probability relationship among the nodes of the first and second Bayesian classifiers, respectively, in a supervised fashion. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focuses on knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Aims and objectives. To compare the clinical profile of patients included in a clinical trial of autologous bone marrow cells as an adjunctive therapy to coronary artery bypass grafting with that of patients undergoing routine coronary artery bypass grafting. Background. The therapeutic potential of autologous bone marrow cells has been explored in the treatment of severe coronary artery disease. There are few data regarding the clinical and socio-economic profile of patients included in clinical trials using bone marrow cells. Design. Case-control study. Method. Sixty-seven patients (61 (SD 9) years, 82% men) with multivessel coronary artery disease were divided into two groups: patients in the bone marrow cell group (n = 34) underwent incomplete coronary artery bypass grafting + intramyocardial injection of autologous bone marrow cells (lymphomonocytic fraction: 2.0 (SD 0.2) × 10^8 cells/patient) in the ischaemic, non-revascularised myocardium, whereas patients in the coronary artery bypass grafting group (n = 33) underwent routine bypass surgery. Demographics, socio-economic status, clinical and echocardiographic data were collected. Statistical analysis included Fisher's exact test (categorical variables) and Student's t-test (continuous variables). Results. There were no significant differences between groups regarding age, gender, BMI, heart rate, blood pressure and echo data. There was a greater prevalence of obesity (65 vs. 33%; OR = 3.7 [1.3-10.1]), of previous myocardial infarction (68 vs. 39%; OR = 3.2 [1.2-8.8]) and prior revascularisation procedures (59 vs. 24%; OR = 4.5 [1.6-12.7]) in the autologous bone marrow cells group and of smokers in the coronary artery bypass grafting group (51 vs. 23%; OR = 3.5 [1.2-10.4]). Conclusions. Patients included in this clinical trial of autologous bone marrow cells for severe coronary artery disease presented a greater prevalence of myocardial revascularisation procedures, indicating a more severe clinical presentation of the disease. Fewer smokers in this group could be attributable to lifestyle changes after previous cardiovascular events and/or interventions. Relevance to clinical practice. The knowledge of the clinical profile of patients included in cell therapy trials may help researchers in the identification of patients that may be enrolled in future clinical trials of this new therapeutic strategy.
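The odds ratios reported in this abstract can be illustrated with a short calculation. The sketch below rests on two assumptions that the abstract does not state: the counts (22 of 34 obese patients in the bone marrow cell group, 11 of 33 in the bypass group) are reconstructed from the rounded percentages 65% and 33%, and the confidence interval uses the standard log-odds (Woolf) approximation, which may differ from the method the authors used.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a, b: with / without the characteristic in group 1
    c, d: with / without the characteristic in group 2
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Assumed counts reconstructed from the reported percentages (not given in the abstract).
odds_ratio, lower, upper = odds_ratio_ci(22, 12, 11, 22)
print(round(odds_ratio, 1), round(lower, 1), round(upper, 1))
```

Under these assumed counts the result rounds to 3.7 [1.3-10.1], which agrees with the figures reported for obesity.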