991 results for CATEGORICAL-DATA
Abstract:
This paper addresses the application of PCA to categorical data prior to diagnosing a patient data set using a Case-Based Reasoning (CBR) system. Standard PCA techniques are designed to deal with numerical attributes, but our medical data set contains many categorical attributes, so alternative methods such as RS-PCA are required. Thus, we propose to hybridize RS-PCA (Regular Simplex PCA) and a simple CBR. Results show that, when diagnosing a medical data set, the hybrid system produces results similar to those obtained with the original attributes. These results are quite promising, since they allow diagnosis with less computational effort and memory storage.
Abstract:
In the analysis of multivariate categorical data, typically the analysis of questionnaire data, it is often advantageous, for substantive and technical reasons, to analyse a subset of response categories. In multiple correspondence analysis, where each category is coded as a column of an indicator matrix or as a row and column of the Burt matrix, it is not correct to simply analyse the corresponding submatrix of data, since the whole geometric structure is different for the submatrix. A simple modification of the correspondence analysis algorithm allows the overall geometric structure of the complete data set to be retained while calculating the solution for the selected subset of points. This strategy is useful for analysing patterns of response amongst any subset of categories and relating these patterns to demographic factors, especially for studying patterns of particular responses such as missing and neutral responses. The methodology is illustrated using data from the International Social Survey Program on Family and Changing Gender Roles in 1994.
Abstract:
The generalization of simple (two-variable) correspondence analysis to more than two categorical variables, commonly referred to as multiple correspondence analysis, is neither obvious nor well-defined. We present two alternative ways of generalizing correspondence analysis, one based on the quantification of the variables and intercorrelation relationships, and the other based on the geometric ideas of simple correspondence analysis. We propose a version of multiple correspondence analysis, with adjusted principal inertias, as the method of choice for the geometric definition, since it contains simple correspondence analysis as an exact special case, which is not the situation of the standard generalizations. We also clarify the issue of supplementary point representation and the properties of joint correspondence analysis, a method that visualizes all two-way relationships between the variables. The methodology is illustrated using data on attitudes to science from the International Social Survey Program on Environment in 1993.
Abstract:
It is shown how correspondence analysis may be applied to a subset of response categories from a questionnaire survey, for example the subset of undecided responses or the subset of responses for a particular category. The idea is to maintain the original relative frequencies of the categories and not re-express them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square metric assigned to the data subset are the same as those in the correspondence analysis of the whole data set. This variant of the method, called Subset Correspondence Analysis, is illustrated on data from the ISSP survey on Family and Changing Gender Roles.
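The key point above, that the subset keeps the masses and chi-square metric of the whole table, can be illustrated with a minimal sketch. It is not the authors' implementation: the counts, the function names and the choice of column 3 as the "undecided" category are all hypothetical. The standardized residuals are computed once from the full table and the subset is selected only afterwards, so the inertia of the subset and that of the remaining categories add up to the total inertia.

```python
def standardized_residuals(table):
    """Return the CA residuals (p_ij - r_i * c_j) / sqrt(r_i * c_j)."""
    total = sum(sum(row) for row in table)
    P = [[x / total for x in row] for row in table]
    r = [sum(row) for row in P]            # row masses from the full table
    c = [sum(col) for col in zip(*P)]      # column masses from the full table
    return [[(P[i][j] - r[i] * c[j]) / (r[i] * c[j]) ** 0.5
             for j in range(len(c))] for i in range(len(r))]


def inertia(S, columns):
    """Inertia carried by the chosen columns, using the full-table masses."""
    return sum(S[i][j] ** 2 for i in range(len(S)) for j in columns)


# Hypothetical counts: 3 respondent groups x 4 response categories,
# where column 3 plays the role of an "undecided" category.
counts = [
    [20, 15, 10, 5],
    [10, 20, 15, 5],
    [5, 10, 20, 15],
]
S = standardized_residuals(counts)
total_inertia = inertia(S, range(4))
subset_inertia = inertia(S, [3])          # the subset of interest
rest_inertia = inertia(S, [0, 1, 2])      # all other categories
print(total_inertia, subset_inertia, rest_inertia)
```

A full subset analysis would go on to decompose the selected columns of `S` (for example by a singular value decomposition); the sketch stops at the inertia decomposition to stay dependency-free.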
Abstract:
PURPOSE: The aim of this study was to evaluate serum levels of inducible nitric oxide synthase (INOS), myeloperoxidase (MPO), total antioxidant status (TAS), and total oxidative status (TOS) in women with primary ovarian insufficiency (POI) and to compare them with those of healthy fertile women. We also examined the possible risk factors associated with POI. METHODS: This cross-sectional case-control study was conducted in Zekai Tahir Burak Women's Health Education and Research Hospital. The study population consisted of 44 women with POI (study group) and 36 healthy fertile women (control group). In all patients, serum levels of INOS, MPO, TAS, and TOS were determined. INOS and MPO levels were measured by enzyme-linked immunosorbent assay, whereas a colorimetric method was used to evaluate TAS and TOS levels. Age, body mass index (BMI), obstetric history, smoking status, family history, comorbidities, sonographic findings, complete blood count values, C-reactive protein and baseline hormone levels were also analyzed. Student's t-test or the Mann-Whitney U test was used to compare continuous variables between the groups; categorical data were evaluated using the Pearson χ² or Fisher exact test, as appropriate. Binary logistic regression was used to identify risk factors for POI. RESULTS: We found significantly elevated levels of INOS (234.1±749.5 versus 133.8±143.0; p=0.005), MPO (3,438.7±1,228.6 versus 2,481.9±1,230.1; p=0.001), and TOS (4.3±1.4 versus 3.6±1.4; p=0.02) in the sera of the study group compared with the BMI- and age-matched control group. However, the difference in serum TAS levels between the 2 groups was not significant (1.7±0.2 versus 1.6±0.2; p=0.15). Logistic regression demonstrated that BMI <25 kg/m², nulliparity, family history of POI, smoking, and elevated serum levels of INOS, MPO, and TOS were independent risk factors for POI. CONCLUSION: We found an increase in INOS, MPO, and TOS in women with POI. These serum markers may be promising in the early diagnosis of POI. Further large-scale studies are required to determine whether oxidative stress markers have a role in diagnosing POI.
Abstract:
Introduction: The O6-methylguanine-DNA methyltransferase (MGMT) gene encodes a specific DNA repair enzyme that protects cells from the toxicity of alkylating agents. MGMT activity is thus a major mechanism of resistance to alkylating agents. Decreased MGMT gene expression through promoter hypermethylation has been shown to result in improved survival in patients with certain tumor types treated with alkylating chemotherapeutic agents. Objectives: To determine the prevalence of MGMT gene methylation in patients with locally advanced squamous cell carcinoma of the head and neck treated with chemoradiotherapy, and to evaluate the impact of this methylation on survival. Methods: Of 428 consecutive patients treated with chemoradiotherapy at our institution and followed for a median of 37 months, 199 paraffin-embedded surgical specimens were retrieved. DNA was extracted and modified by bisulfite treatment. A methylation-specific polymerase chain reaction was performed to assess the methylation status of the MGMT gene promoter. Laboratory results were correlated with clinical response. Statistical analysis was performed using Fisher's test for categorical data and Kaplan-Meier curves for treatment failures. Results: Of the 199 initial DNA extracts, 173 (87%) were successfully bisulfite-modified. Of these modified specimens, 71 (41%) showed MGMT hypermethylation. Patient characteristics did not differ significantly between MGMT-methylated and unmethylated cases. Response rates were 71% and 73% (p=NS), respectively. At 3 years, locoregional control was 87% and 77% (p=0.26), disease-free survival was 80% and 60% (p=0.38), distant metastasis-free survival was 92% and 78% (p=0.08), and overall survival was 64% and 62% (p=0.99), respectively. Conclusions: MGMT methylation is highly prevalent (41%) and appears to have a possible beneficial impact on survival when chemoradiotherapy is given to patients with advanced-stage head and neck cancers.
Abstract:
Background: Although tobacco and alcohol are the main causal factors of oropharyngeal squamous cell carcinoma, human papillomavirus (HPV) is thought to be responsible for the recent rise in the incidence of these cancers, particularly in young and/or non-smoking patients. The prevalence of high-risk HPV, mainly type 16, has risen from 20% to more than 60% over the last twenty years. Some studies indicate that HPV-positive cancers have a better prognosis than HPV-negative ones, but prospective data on this point are scarce in the literature, especially for phase III studies with risk-based stratification. Hypotheses and objectives: The presence of HPV is presumed to be a favorable prognostic factor. The study aims to document the prevalence of HPV in oropharyngeal cancers and to establish its impact on prognosis in patients treated with a regimen that includes chemoradiotherapy. Methodology: The tumors come from cases treated at the CHUM for locally advanced (stage III, IVA and IVB) squamous cell carcinoma of the head and neck. They are stored in a tumor bank, and clinical data on treatment efficacy and side effects were collected prospectively. The presence of HPV is established by molecular biology methods that detect the HPV genome and determine its genotype. Results: 255 specimens were submitted to the Linear Array HPV genotyping test. After PCR amplification, viral DNA was detected in 175 (68.6%) tumor samples; HPV type 16 was involved in 133 cases (52.25%). Conclusion: A growing proportion of head and neck cancers is linked to HPV. Our study confirms that the presence of HPV is strongly associated with improved prognosis in patients with head and neck cancers treated with chemoradiotherapy, and that it should be a stratification factor in clinical trials that include head and neck cancer cases.
Abstract:
Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually converted to categorical types first and then classified using information gain. Information gain is a popular and useful measure that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, for splitting attributes in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark data sets, the new algorithm is shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The approach also blends two closely related fields, statistics and data mining.
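The abstract does not spell out the exact selection rule, so the following is only a sketch under stated assumptions: each numeric attribute is split at its mean, and the attribute whose mean split most reduces the within-branch variance of a numerically coded class label is chosen. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def mean_split_score(values, labels):
    """Split at the attribute mean; return (threshold, variance reduction)."""
    threshold = mean(values)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return threshold, 0.0          # degenerate split, no benefit
    n = len(labels)
    # Size-weighted average of the label variance inside the two branches.
    within = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
    return threshold, pvariance(labels) - within


def best_attribute(rows, labels):
    """Pick the column index whose mean split gives the largest reduction."""
    scores = [mean_split_score(col, labels) for col in zip(*rows)]
    best = max(range(len(scores)), key=lambda j: scores[j][1])
    return best, scores[best][0]


# Hypothetical toy data: two numeric attributes, binary class labels.
rows = [(1.0, 7.0), (2.0, 3.0), (3.0, 8.0), (8.0, 4.0), (9.0, 6.0), (10.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
attribute, threshold = best_attribute(rows, labels)
print(attribute, threshold)  # prints: 0 5.5
```

No discretization into categories and no entropy computation is needed here, which is the saving the abstract describes; only means and variances are evaluated per attribute.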
Abstract:
By using suitable parameters, we present a unified approach for describing four methods for representing categorical data in a contingency table. These methods are: correspondence analysis (CA), the alternative approach using Hellinger distance (HD), the log-ratio (LR) alternative, which is appropriate for compositional data, and the so-called non-symmetrical correspondence analysis (NSCA). We then compare these four methods and give some illustrative examples. Some approaches based on cumulative frequencies are also linked and studied using matrices. Key words: Correspondence analysis, Hellinger distance, Non-symmetrical correspondence analysis, log-ratio analysis, Taguchi inertia
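To make the difference between two of these methods concrete, the sketch below computes the chi-square distance between row profiles (the distance underlying CA) and the Hellinger distance between the same profiles. It is a minimal illustration, not the unified parametrization of the paper: the table is hypothetical, the function names are invented for the example, and the Hellinger distance is written without the 1/sqrt(2) factor that some texts include.

```python
from math import sqrt


def row_profiles(table):
    """Divide each row by its total so that every profile sums to 1."""
    return [[x / sum(row) for x in row] for row in table]


def column_masses(table):
    """Column totals divided by the grand total."""
    total = sum(sum(row) for row in table)
    return [sum(col) / total for col in zip(*table)]


def chi_square_distance(a, b, masses):
    """Distance between two row profiles, weighted by 1 / column mass."""
    return sqrt(sum((x - y) ** 2 / c for x, y, c in zip(a, b, masses)))


def hellinger_distance(a, b):
    """Euclidean distance between the square-rooted profiles."""
    return sqrt(sum((sqrt(x) - sqrt(y)) ** 2 for x, y in zip(a, b)))


# Hypothetical 3x3 contingency table; rows 0 and 1 are proportional.
table = [[10, 20, 30], [20, 40, 60], [30, 10, 20]]
profiles = row_profiles(table)
masses = column_masses(table)
for i, j in [(0, 1), (0, 2)]:
    print(i, j,
          chi_square_distance(profiles[i], profiles[j], masses),
          hellinger_distance(profiles[i], profiles[j]))
```

Rows with proportional counts have identical profiles, so both distances between rows 0 and 1 are zero. The chi-square distance depends on the column masses of the whole table, whereas the Hellinger distance uses only the two profiles being compared.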
Abstract:
Organic solvents are chemical substances that, because of their physicochemical properties, are easily inhaled or absorbed through the skin and can cause various kinds of damage to health. Colombia has regulations that set out protective measures; however, informality persists in the car-painting sector, so the health of exposed workers may be affected in the long term. This study analyzed the relationship between individuals occupationally exposed to organic solvents and unexposed individuals with respect to telomere length and the formation of chromosomal fragilities. Blood samples drawn by venipuncture were collected in two tubes: one with heparin, intended for lymphocyte culture to obtain metaphase chromosomes and assess them for fragilities; the other with EDTA, used for DNA extraction and to obtain telomere length values by quantitative PCR. Statistical analyses were performed with the Wilcoxon rank test. For the presence of fragilities, the ratio of the number of fragilities to the number of metaphases was analyzed; the Wilcoxon method showed a statistically significant difference between exposed and unexposed individuals (p = 0.036), with the exposed group showing a higher frequency of fragilities. In addition, the relative telomere length of the exposed group was greater than that observed in the unexposed group, and this difference was statistically significant (Wilcoxon, p = 0.002).
Abstract:
Objective To undertake a process evaluation of pharmacists' recommendations arising in the context of a complex IT-enabled pharmacist-delivered randomised controlled trial (PINCER trial) to reduce the risk of hazardous medicines management in general practices. Methods PINCER pharmacists manually recorded patients’ demographics, details of interventions recommended, actions undertaken by practice staff and time taken to manage individual cases of hazardous medicines management. Data were coded and double entered into SPSS v15, and then summarised using percentages for categorical data (with 95% CI) and, as appropriate, means (SD) or medians (IQR) for continuous data. Key findings Pharmacists spent a median of 20 minutes (IQR 10, 30) reviewing medical records, recommending interventions and completing actions in each case of hazardous medicines management. Pharmacists judged 72% (95%CI 70, 74) (1463/2026) of cases of hazardous medicines management to be clinically relevant. Pharmacists recommended 2105 interventions in 74% (95%CI 73, 76) (1516/2038) of cases and 1685 actions were taken in 61% (95%CI 59, 63) (1246/2038) of cases; 66% (95%CI 64, 68) (1383/2105) of interventions recommended by pharmacists were completed and 5% (95%CI 4, 6) (104/2105) of recommendations were accepted by general practitioners (GPs), but not completed at the end of the pharmacists’ placement; the remaining recommendations were rejected or considered not relevant by GPs. Conclusions The outcome measures were used to target pharmacist activity in general practice towards patients at risk from hazardous medicines management. Recommendations from trained PINCER pharmacists were found to be broadly acceptable to GPs and led to ameliorative action in the majority of cases. It seems likely that the approach used by the PINCER pharmacists could be employed by other practice pharmacists following appropriate training.
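The percentages with 95% CI reported above can be approximated from the raw counts. The abstract does not state which interval method was used, so the normal-approximation (Wald) interval in this sketch is an assumption, and the function name is hypothetical. It is applied to the 1463/2026 cases judged clinically relevant.

```python
from math import sqrt


def proportion_ci(successes, n, z=1.96):
    """Return (p, lower, upper) for a Wald interval; z=1.96 gives about 95%."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to [0, 1], since the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken from the abstract: cases judged clinically relevant.
p, low, high = proportion_ci(1463, 2026)
print(f"{100 * p:.0f}% (95% CI {100 * low:.0f}, {100 * high:.0f})")
# prints: 72% (95% CI 70, 74)
```

Other interval methods (for example Wilson or exact intervals) can give slightly different bounds, especially for proportions close to 0 or 1 such as the 5% figure.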
Abstract:
This article presents important properties of standard discrete distributions and their conjugate densities. The Bernoulli and Poisson processes are described as generators of such discrete models. A characterization of distributions by mixtures is also introduced. This article adopts a novel singular notation and representation. Singular representations are unusual in statistical texts. Nevertheless, the singular notation makes it simpler to extend and generalize theoretical results and greatly facilitates numerical and computational implementation.
Genetic analysis of visual evaluation scores in cattle using Bayesian threshold and linear models
Abstract:
The objective of this study was to compare estimates of genetic parameters obtained from single-trait and two-trait Bayesian analyses under linear and threshold animal models, considering categorical morphological traits of Nelore cattle. Data on muscling, physical structure and conformation were collected between 2000 and 2005 from 3,864 animals on 13 farms participating in the Programa Nelore Brasil. Single-trait and two-trait Bayesian analyses were carried out with threshold and linear models. In general, both the threshold and linear models were efficient in estimating genetic parameters for visual scores in single-trait Bayesian analyses. In the two-trait analyses, it was observed that: with continuous and categorical data, the threshold model gave genetic correlation estimates of greater magnitude than those of the linear model; and with categorical data, heritability estimates were similar. The advantage of the linear model was the shorter processing time of the analyses. In the genetic evaluation of animals for visual scores, the use of the threshold or the linear model did not affect the ranking of animals by predicted breeding values, which indicates that both models can be used in genetic improvement programs.