949 results for monotone missing data
Abstract:
OBJECTIVE: The objective of this study was to analyse the use of lights and siren (L&S) during transport to the hospital according to the prehospital severity status of the patient, and the time saved according to the time of day of the mission. METHODS: We searched the Public Health Services data of a Swiss state from 1 January 2010 to 31 December 2010. All primary patient transports within the state were included (24 718). The data collected covered the use of L&S, patient demographics, the time and duration of transport, the type of mission (trauma vs. nontrauma) and the severity of the condition according to the National Advisory Committee for Aeronautics (NACA) score assigned by the paramedics and/or emergency physician. We excluded 212 transports because of missing data. RESULTS: A total of 24 506 ambulance transports met the inclusion criteria. L&S were used 4066 times, or in 16.6% of all missions. Of these, 40% were graded NACA less than 4. Overall, the mean total time to return to the hospital was 11.09 min (confidence interval 10.84-11.34) with L&S and 12.84 min (confidence interval 12.72-12.96) without. The difference was 1.75 min (105 s; P<0.001). For night-time runs alone, the mean time saved using L&S was 0.17 min (10.2 s; P=0.27). CONCLUSION: At present, the use of L&S seems questionable given the severity status or NACA score of transported patients. Our results should prompt the implementation of more specific regulations for L&S use during transport to the hospital, taking into consideration certain physiological criteria of the victim as well as the time of day of transport.
Abstract:
Background: A patient's chest pain raises concern for the possibility of coronary heart disease (CHD). An easy-to-use clinical prediction rule has been derived from the TOPIC study in Lausanne. Our objective was to validate this clinical score for ruling out CHD in primary care patients with chest pain. Methods: This secondary analysis used data collected from a one-year follow-up cohort study of patients attending 76 GPs in Germany. Patients attending their GP with chest pain were questioned on their age, gender, duration of chest pain (1-60 min), sternal pain location, pain increase with exertion, absence of a tenderness point at palpation, cardiovascular risk factors, and personal history of cardiovascular disease. The area under the ROC curve, sensitivity and specificity of the Lausanne CHD score were calculated for patients with full data. Results: 1190 patients were included. Full data were available for 509 patients (42.8%). Missing data were not related to having CHD (p = 0.397) or to having a cardiovascular risk factor (p = 0.275). 76 patients (14.9%) were diagnosed with CHD. The prevalence of CHD was 68/344 (19.8%), 2/62 (3.2%) and 6/103 (5.8%) in the high, intermediate and low risk categories, respectively. The area under the ROC curve was 72.9 (95% CI 66.8; 78.9). Ruling out patients in the low-risk category had a sensitivity of 92.1% (95% CI 83.0; 96.7) and a specificity of 22.4% (95% CI 18.6; 26.7). Conclusion: The Lausanne CHD score shows reasonably good sensitivity and can be used to rule out coronary events in patients with chest pain. Patients at risk of CHD for other, rarer reasons should nevertheless also be investigated.
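As an illustration of the reported rule-out performance, the sketch below recomputes sensitivity and specificity from the category counts quoted in the abstract; it uses no study data beyond those counts.

```python
# Illustrative check of the rule-out performance reported above, using only the
# per-category counts given in the abstract (CHD cases / patients per risk group).
cases = {"high": 68, "intermediate": 2, "low": 6}        # CHD cases per category
totals = {"high": 344, "intermediate": 62, "low": 103}   # patients per category

total_cases = sum(cases.values())                        # 76
total_controls = sum(totals.values()) - total_cases      # 433

# "Ruling out" the low-risk group: a positive test = intermediate or high risk.
true_positives = total_cases - cases["low"]              # CHD cases not ruled out
true_negatives = totals["low"] - cases["low"]            # non-CHD patients ruled out

sensitivity = true_positives / total_cases               # ~0.921
specificity = true_negatives / total_controls            # ~0.224
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```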
Abstract:
There has been relatively little change over recent decades in the methods used in research on self-reported delinquency. Face-to-face interviews and self-administered interviews in the classroom are still the predominant alternatives envisaged. New methods have been brought into the picture by recent computer technology, the Internet, and the increasing availability of computer equipment and Internet access in schools. In the autumn of 2004, a controlled experiment was conducted with 1,203 students in Lausanne (Switzerland), in which "paper-and-pencil" questionnaires were compared with computer-assisted interviews through the Internet. The experiment included a test of two different definitions of the (same) reference period. After the introductory question ("Did you ever..."), students were asked how many times they had done it (or experienced it), if ever, "over the last 12 months" or "since the October 2003 vacation". Few significant differences were found between the results obtained by the two methods and for the two definitions of the reference period in the answers concerning victimisation, self-reported delinquency, drug use, and failure to respond (missing data). Students were found to be more motivated to respond through the Internet, to take less time to fill out the questionnaire, and to be apparently more confident of privacy, while school principals were less reluctant to allow classes to be interviewed through the Internet. The Internet method also involves considerable cost reductions, which is a critical advantage if self-reported delinquency surveys are to become a routinely applied method of evaluation, particularly in countries with limited resources. On balance, the Internet may be instrumental in making research on self-reported delinquency far more feasible in situations where limited resources have so far prevented its implementation.
Abstract:
INTRODUCTION: This study describes the characteristics of the metabolic syndrome in HIV-positive patients in the Data Collection on Adverse Events of Anti-HIV Drugs study and discusses the impact of different methodological approaches on estimates of the prevalence of the metabolic syndrome over time. METHODS: We described the prevalence of the metabolic syndrome in patients under follow-up at the end of six calendar periods from 2000 to 2007. The definition of the metabolic syndrome was modified to take account of the use of lipid-lowering and antihypertensive medication, measurement variability and missing values, and we assessed the impact of these modifications on the estimated prevalence. RESULTS: For all definitions considered, the prevalence of the metabolic syndrome increased over time, although the prevalence estimates themselves varied widely. Using our primary definition, we found an increase in prevalence from 19.4% in 2000/2001 to 41.6% in 2006/2007. Modifying the definition to incorporate antihypertensive and lipid-lowering medication had relatively little impact on the prevalence estimates, as did modifying it to allow for missing data. In contrast, modifications allowing the metabolic syndrome to be reversible and allowing for measurement variability lowered the prevalence estimates substantially. DISCUSSION: The prevalence of the metabolic syndrome in cohort studies is largely based on nonstandardized measurements as they are captured in daily clinical care. As a result, bias is easily introduced, particularly when measurements are both highly variable and may be missing. We suggest that the prevalence of the metabolic syndrome in cohort studies should be based on two consecutive measurements of the laboratory components of the syndrome definition.
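The suggestion to require two consecutive measurements can be illustrated with a small simulation; the sketch below uses entirely synthetic values (a hypothetical triglyceride threshold, made-up visit counts and noise levels), not the cohort data, to show how measurement variability inflates a single-measurement prevalence estimate.

```python
import random

# Toy illustration (synthetic data, not the cohort described above): requiring two
# consecutive abnormal values lowers the estimated prevalence when measurements
# vary from visit to visit. "Abnormal" here is triglycerides >= 1.7 mmol/L.
random.seed(1)
THRESHOLD = 1.7

def patient_series(n_visits=6):
    true_level = random.gauss(1.6, 0.4)                               # underlying level
    return [random.gauss(true_level, 0.3) for _ in range(n_visits)]   # visit-to-visit noise

patients = [patient_series() for _ in range(10_000)]

single = sum(any(v >= THRESHOLD for v in series) for series in patients)
consecutive = sum(
    any(a >= THRESHOLD and b >= THRESHOLD for a, b in zip(series, series[1:]))
    for series in patients
)
print(f"any single abnormal value : {single / len(patients):.1%}")
print(f"two consecutive abnormal  : {consecutive / len(patients):.1%}")
```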
Abstract:
Background: The Valais cancer registry (RVsT) of the Observatoire valaisan de la santé (OVS) and the department of oncology of the Valais Hospital conducted a study on the epidemiology and patterns of care of colorectal cancer in Valais. Colorectal cancer is the third cause of death by cancer in Switzerland, with about 1600 deaths per year. It is the third most frequent cancer for males and the second most frequent for females in Valais. The number of new colorectal cancer cases (average per year) increased between 1989 and 2009 for both males and females in Valais, and the number of colorectal cancer deaths (average per year) increased slightly over the same period. Age-standardized incidence rates were stable for males and females in Valais and in Switzerland between 1989 and 2009, while age-standardized mortality rates decreased for males and females in Valais and in Switzerland. Methods: The RVsT has collected information on all cancer cases since 1989 for people registered in the communes of Valais and is authorized to collect non-anonymized data. All new incident cancers are coded according to the International Classification of Diseases for Oncology (ICD-O-3) and stages are coded according to the TNM classification. We studied all cases of in situ and invasive colorectal cancer diagnosed between 2006 and 2009 and registered routinely at the RVsT. We checked for data completeness and, where necessary, sent questionnaires to avoid missing data. A distance of 15 cm was chosen to delimit colon (sigmoid) from rectal cancers. We carried out active follow-up of vital status to allow a valid survival analysis. We analyzed the characteristics of the tumors according to age, sex, localization and stage with Stata 9 software. Kaplan-Meier curves were generated and Cox models were fitted to analyze survival. Results: 774 cases were recorded (59% males). Median age at diagnosis was 70 years. Most cancers were invasive (79%) and the main localization was the colon (71%). The most frequent mode of detection was a consultation for non-emergency symptoms (75%), but almost 10% of patients presented as emergencies. 82% of patients were treated within 30 days of diagnosis, and 90% were treated by surgery alone or as part of combined treatment. The first treatment was surgery, including endoscopic resection, in 86% of cases. Treatment differed according to the localization and stage of the cancer. Survival was 95% at 30 days and 79% at one year and depended on stage and age at diagnosis. The Cox model showed an association between mortality and age (better survival for younger patients) and between mortality and stage (better survival for lower stages). Conclusion: The characteristics of patients and tumors and the one-year survival were similar to those observed in Switzerland and some European countries. Patterns of care were close to those recommended in guidelines. Routine data recorded in a cancer registry can be used not only to provide general statistics but also to help clinicians assess local practices.
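A minimal sketch of the survival analysis described (Kaplan-Meier estimation plus a Cox model with age and stage) is given below; it runs on simulated records with made-up column names and effect sizes, not on RVsT data, and assumes the Python lifelines package.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

# Sketch of a Kaplan-Meier curve plus a Cox model with age and stage, on simulated
# records rather than registry data. Variable names and effects are illustrative.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(45, 90, n)
stage = rng.integers(1, 5, n)
# Older age and higher stage shorten survival in this toy simulation.
hazard = 0.01 * np.exp(0.03 * (age - 70) + 0.5 * (stage - 2))
months = rng.exponential(1 / hazard)
censor = rng.uniform(6, 60, n)                   # administrative censoring time
df = pd.DataFrame({
    "months": np.minimum(months, censor),
    "died": (months <= censor).astype(int),
    "age": age,
    "stage": stage,
})

kmf = KaplanMeierFitter().fit(df["months"], event_observed=df["died"])
print(kmf.median_survival_time_)                 # overall median survival

cph = CoxPHFitter().fit(df, duration_col="months", event_col="died")
cph.print_summary()                              # hazard ratios for age and stage
```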
Abstract:
OBJECTIVES: A high prevalence of trauma has been reported in psychosis. While the role of trauma as a risk factor for developing psychosis is still debated, its negative impact on outcome has been described. Few studies have explored this issue in first-episode psychosis (FEP) patients. We assessed the rate of stressful events, as well as the premorbid and outcome correlates of past sexual and/or physical abuse (SPA), in an epidemiological cohort of FEP patients. METHODS: The Early Psychosis Prevention and Intervention Centre admitted 786 FEP patients between 1998 and 2000. Data were collected from patients' files using a standardized questionnaire. A total of 704 files were available; 43 were excluded because of a nonpsychotic diagnosis at end point and 3 because of missing data on past stressful events, leaving 658 patients for analysis. RESULTS: A total of 83% of patients had been exposed to at least one stressful event and 34% to SPA. SPA patients were more likely to have presented other psychiatric disorders before psychosis onset (posttraumatic stress disorder, substance use disorder), to have made suicide attempts in the past, and to have had poorer premorbid functional levels. Additionally, SPA patients had a higher rate of comorbid diagnoses at program entry and were more likely to attempt suicide during treatment. CONCLUSIONS: SPA prevalence is high in FEP patients and must be explored by clinicians, considering its durable impact on psychological balance and its link with long-lasting suicidal risk. More research is warranted to better understand the mechanisms linking trauma and its potential consequences, as well as to develop psychological interventions adapted to this very sensitive and complex issue.
Abstract:
AIM: The aim of this cross-sectional study was to provide normative data (ordinal scores and timed performances) for gross and fine motor tasks in typically developing children between 3 and 5 years of age using the Zurich Neuromotor Assessment (ZNA). METHOD: Typically developing children (n=101; 48 males, 53 females) between 3 and 5 years of age were enrolled from day-care centres in the greater Zurich area and tested using a modified version of the ZNA; the tests were recorded digitally on video. Intraobserver reliability was assessed on the videos of 20 children by one examiner. Interobserver reliability was assessed by two examiners. Test-retest reliability was performed on an additional 20 children. The modelling approach summarized the data with a linear age effect and an additive term for sex, while incorporating informative missing data in the normative values. Normative data for adaptive motor tasks, pure motor tasks, and static and dynamic balance were calculated with centile curves (for timed performance) and expected ordinal scores (for ordinal scales). RESULTS: Interobserver, intraobserver, and test-retest reliability of tasks were moderate to good. Nearly all tasks showed significant age effects, whereas sex was significant only for stringing beads and hopping on one leg. INTERPRETATION: These results indicate that timed performance and ordinal scales of neuromotor tasks can be reliably measured in preschool children and are characterized by developmental change and high interindividual variability.
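A minimal sketch of this kind of normative modelling is given below, assuming simulated timed-performance data and a simple least-squares fit with a linear age effect and an additive sex term; the ZNA's actual normative model is more elaborate.

```python
import numpy as np

# Minimal sketch: a linear age effect plus an additive sex term for a timed task,
# with centile curves taken from the residual spread. Simulated data only.
rng = np.random.default_rng(42)
n = 101
age = rng.uniform(3.0, 5.0, n)                  # years
sex = rng.integers(0, 2, n)                     # 0 = male, 1 = female
time = 20 - 2.5 * age + 0.8 * sex + rng.normal(0, 1.5, n)   # seconds to finish task

X = np.column_stack([np.ones(n), age, sex])
coef, *_ = np.linalg.lstsq(X, time, rcond=None)
residuals = time - X @ coef

def centile_curve(age_grid, sex_value, q):
    """Predicted time at the q-th centile for a given sex, across an age grid."""
    pred = coef[0] + coef[1] * age_grid + coef[2] * sex_value
    return pred + np.quantile(residuals, q)

ages = np.linspace(3, 5, 5)
print(centile_curve(ages, sex_value=0, q=0.50))  # median curve for boys
print(centile_curve(ages, sex_value=0, q=0.90))  # 90th centile curve
```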
Abstract:
Background: In order to provide a cost-effective tool for analysing pharmacogenetic markers in malaria treatment, DNA microarray technology was compared with sequencing of polymerase chain reaction (PCR) fragments for detecting single nucleotide polymorphisms (SNPs) in a larger number of samples. Methods: The microarray was developed to affordably generate SNP data for genes encoding the human cytochrome P450 enzyme family (CYP) and N-acetyltransferase-2 (NAT2), which are involved in antimalarial drug metabolism and have known polymorphisms, i.e. CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, and NAT2. Results: For some SNPs, i.e. CYP2A6*2, CYP2B6*5, CYP2C8*3, CYP2C9*3/*5, CYP2C19*3, CYP2D6*4 and NAT2*6/*7/*14, agreement between the two techniques ranged from substantial to almost perfect (kappa index between 0.61 and 1.00), whilst for other SNPs, e.g. CYP2D6*17 (2850C>T), CYP3A4*1B and CYP3A5*3, a large variability from slight to substantial agreement (kappa index between 0.39 and 1.00) was found. Conclusion: The major limitation of the microarray technology for this purpose was a lack of robustness, with a large amount of missing data or incorrect specificity.
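For reference, the agreement statistic quoted above (Cohen's kappa) can be computed as in the sketch below; the paired genotype calls are invented examples, not study data.

```python
from collections import Counter

# Minimal sketch of Cohen's kappa between microarray calls and sequencing calls
# for one SNP. The genotype calls are made-up examples.
array_calls = ["CC", "CT", "CT", "TT", "CC", "CT", "CC", "TT", "CT", "CC"]
seq_calls   = ["CC", "CT", "CC", "TT", "CC", "CT", "CT", "TT", "CT", "CC"]

n = len(array_calls)
observed = sum(a == b for a, b in zip(array_calls, seq_calls)) / n

# Expected agreement if the two methods labelled samples independently.
pa, pb = Counter(array_calls), Counter(seq_calls)
expected = sum(pa[g] / n * pb[g] / n for g in set(pa) | set(pb))

kappa = (observed - expected) / (1 - expected)
print(f"observed = {observed:.2f}, expected = {expected:.2f}, kappa = {kappa:.2f}")
```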
Abstract:
This work analyses the effect of data selection on heritability estimates. The heritability of litter size was estimated in a pig population in which the records of the oldest sows were a selected sample. Estimates were obtained using different data sets derived from all the available information, and these data sets were compared by evaluating their predictive ability. Heritability estimates obtained using all the available data turned out to be underestimates. A maternal trait was also simulated, and a selected data set was generated by removing the records of females with unknown parents. Several models commonly used when there is no selection of records were considered for estimating heritability. The results showed that none of these models provided unbiased estimates. Only models that accounted for the effect of selection on the residual mean and on the genetic mean and variance gave estimates with little bias; however, applying them requires knowing how the selection was carried out. The problem of data selection is difficult to address when the selection process applied to a population is unknown.
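A toy simulation, not the study's animal-model analysis, can illustrate the central point: when records are selected on the trait itself, a naive heritability estimate (here a simple offspring-on-parent regression) is biased downward. All parameter values below are made up.

```python
import numpy as np

# Toy illustration of selection bias in a naive heritability estimate: culling the
# poorest offspring records attenuates the offspring-on-parent regression slope.
rng = np.random.default_rng(7)
n = 50_000
h2_true = 0.4
vp = 1.0

parent_bv = rng.normal(0, np.sqrt(h2_true * vp), n)
parent_phe = parent_bv + rng.normal(0, np.sqrt((1 - h2_true) * vp), n)
# The other parent's contribution and Mendelian sampling are lumped into one term.
offspring_bv = 0.5 * parent_bv + rng.normal(0, np.sqrt(0.75 * h2_true * vp), n)
offspring_phe = offspring_bv + rng.normal(0, np.sqrt((1 - h2_true) * vp), n)

def h2_estimate(parent, offspring):
    slope = np.cov(parent, offspring)[0, 1] / np.var(parent)
    return 2 * slope                           # offspring-on-one-parent regression

keep = offspring_phe > np.quantile(offspring_phe, 0.3)   # cull the lowest 30% of records
print(f"true h2          = {h2_true}")
print(f"all records      = {h2_estimate(parent_phe, offspring_phe):.2f}")
print(f"selected records = {h2_estimate(parent_phe[keep], offspring_phe[keep]):.2f}")
```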
Abstract:
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is rapidly growing, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. We observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as Bayesian principal component analysis (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are an example of the outcome of multiple biological experiments, such as those using gene microarray techniques. Such networks are typically very large and highly connected, so fast algorithms are needed to produce visually pleasant layouts. We developed a computationally efficient way to produce layouts of large biological interaction networks. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
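A minimal sketch of the k-NN imputation mentioned above is shown below, on a tiny made-up expression matrix; the thesis's approach additionally uses curated biological information to guide the imputation, which is not reproduced here.

```python
import numpy as np

# Minimal k-NN missing value imputation: each missing entry is replaced by the
# mean of that column over the k rows (genes) most similar on shared columns.
def knn_impute(X, k=3):
    X = X.astype(float)
    filled = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        dists = []
        for j, other in enumerate(X):
            if j == i:
                continue
            shared = ~np.isnan(row) & ~np.isnan(other)
            if shared.any():
                d = np.sqrt(np.mean((row[shared] - other[shared]) ** 2))
                dists.append((d, j))
        neighbours = [j for _, j in sorted(dists)[:k]]
        for col in np.where(miss)[0]:
            vals = X[neighbours, col]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                filled[i, col] = vals.mean()
    return filled

X = np.array([[1.0, 2.0, np.nan],
              [1.1, 1.9, 3.0],
              [0.9, 2.1, 2.8],
              [5.0, 5.2, np.nan]])
print(knn_impute(X, k=2))
```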
Abstract:
Affiliation: Département de Biochimie, Université de Montréal
Abstract:
Missing data are frequent in surveys and can lead to substantial errors in parameter estimation. This methodological master's thesis in sociology examines the influence of missing data on the estimation of the effect of a prevention programme. The first two sections describe the biases that missing data can introduce and present the theoretical frameworks used to describe them. The third section covers methods for handling missing data: the classical methods are described, along with three more recent ones. The fourth section presents the Enquête longitudinale et expérimentale de Montréal (ELEM) and describes the data used. The fifth section presents the analyses performed; it contains the method for analysing the effect of an intervention from longitudinal data, a detailed description of the missing data in the ELEM, and a diagnosis of the missingness patterns and mechanism. The sixth section contains the estimates of the programme effect under different assumptions about the missing data mechanism and using four methods: complete-case analysis, maximum likelihood, weighting, and multiple imputation. The results indicate (I) that the assumption made about the MAR missingness mechanism appears to influence the estimate of the programme effect, and (II) that the estimates obtained with the different estimation methods lead to similar conclusions about the effect of the intervention.
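The contrast between complete-case analysis and imputation under a MAR mechanism can be illustrated with a small simulation; the sketch below uses synthetic data (not the ELEM survey) and a single regression imputation as a stand-in for the point estimate step of multiple imputation.

```python
import numpy as np

# Illustrative sketch: under a MAR mechanism where missingness in the outcome
# depends on an observed covariate, a complete-case mean is biased, whereas a
# simple regression imputation using that covariate recovers the target.
# Multiple imputation would repeat such an imputation with added noise.
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(0, 1, n)                       # always observed (e.g. baseline score)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)       # outcome, true mean = 2.0

p_missing = 1 / (1 + np.exp(-1.5 * x))        # higher x -> more likely missing (MAR)
observed = rng.uniform(size=n) > p_missing

print(f"complete-case mean of y : {y[observed].mean():.2f}")

# Regression imputation: fit y ~ x on the observed cases, predict the missing ones.
b1, b0 = np.polyfit(x[observed], y[observed], 1)
y_imputed = np.where(observed, y, b0 + b1 * x)
print(f"after imputation        : {y_imputed.mean():.2f}")
```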
Abstract:
The explosion in the number of available sequences has allowed phylogenomics, that is, the study of relationships among species from large multi-gene alignments, to take off. It is unquestionably a way of overcoming the stochastic errors of single-gene phylogenies, but many problems remain despite progress in modelling the evolutionary process. In this thesis we characterize certain aspects of model misfit and study their impact on the accuracy of inference. In contrast to heterotachy, variation over time of the amino acid substitution process has so far received little attention. We show not only that this heterogeneity is widespread among animals, but also that its existence can harm the quality of phylogenomic inference: in the absence of an adequate model, removing the heterogeneous columns that the model handles poorly can make a reconstruction artefact disappear. In a phylogenomic setting, the sequencing techniques used often mean that not all genes are present for all species. The controversy over the impact of the amount of missing cells has recently been revived, but most studies of missing data have been carried out on small simulated sequence data sets. We therefore set out to quantify this impact for a large alignment of real data. For a reasonable proportion of missing data, it appears that the incompleteness of the alignment affects the accuracy of inference less than the choice of model does. Conversely, adding an incomplete sequence that breaks a long branch can restore, at least partially, an erroneous phylogeny. Since model violations remain the main limitation on the accuracy of phylogenetic inference, improving the sampling of species and genes remains a useful alternative in the absence of an adequate model. We therefore developed a sequence-selection program that builds reproducible data sets based on the amount of data present, the rate of evolution and compositional biases. In the course of this study we showed that human expertise still provides indispensable knowledge. The various analyses carried out for this thesis all point to the paramount importance of the evolutionary model.
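One of the criteria mentioned for the sequence-selection tool, the amount of data present per species, can be sketched as below; taxon names, sequences and the completeness threshold are illustrative assumptions.

```python
# Toy sketch of one selection criterion: the fraction of missing sites ('-' or '?')
# per taxon in a concatenated alignment, used to flag poorly covered species.
alignment = {
    "Homo":       "ATGCC-ATTG??GA",
    "Mus":        "ATGCCGATTGCAGA",
    "Drosophila": "??????ATTG??GA",
    "Hydra":      "ATG---ATT---GA",
}

def missing_fraction(seq, gap_chars="-?"):
    return sum(c in gap_chars for c in seq) / len(seq)

MAX_MISSING = 0.4
for taxon, seq in alignment.items():
    frac = missing_fraction(seq)
    status = "keep" if frac <= MAX_MISSING else "drop"
    print(f"{taxon:<12} missing = {frac:.0%}  -> {status}")
```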
Abstract:
Despite steady progress in computing power, memory and the amount of available data, machine learning algorithms must use these resources efficiently. Minimizing costs is obviously an important factor, but another motivation is the search for learning mechanisms able to reproduce the behaviour of intelligent beings. This thesis addresses the efficiency problem through several articles dealing with a variety of learning algorithms: efficiency is considered not only from the computational point of view (computation time and memory used) but also from the statistical one (number of examples required to accomplish a given task). A first contribution of this thesis is to expose statistical inefficiencies in existing algorithms. We show that decision trees generalize poorly on certain kinds of tasks (chapter 3), as do classical graph-based semi-supervised learning algorithms (chapter 5), each being affected by a particular form of the curse of dimensionality. For a certain class of neural networks, called sum-product networks, we show that representing some functions with a single hidden layer can be exponentially less efficient than with deep networks (chapter 4). These analyses give a better understanding of intrinsic problems of these algorithms and point research in directions that might resolve them. We also identify computational inefficiencies in graph-based semi-supervised learning algorithms (chapter 5) and in learning Gaussian mixtures in the presence of missing values (chapter 6); in both cases we propose new algorithms able to handle significantly larger data sets. The last two chapters address computational efficiency from a different angle. In chapter 7 we theoretically analyse an existing algorithm for efficient learning in restricted Boltzmann machines (contrastive divergence) in order to better understand the reasons for its success. Finally, chapter 8 presents an application of machine learning to video games, where the problem of computational efficiency is tied to software and hardware engineering considerations that are often ignored in research yet extremely important in practice.
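As a reminder of the family of methods analysed in chapter 5, the sketch below implements a basic graph-based label-propagation scheme on made-up two-dimensional data; it is illustrative only and does not reproduce the thesis's algorithms.

```python
import numpy as np

# Minimal graph-based semi-supervised learning: labels are propagated from a few
# labelled points to unlabelled ones through a Gaussian similarity graph.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.full(40, np.nan)
y[0], y[20] = 0.0, 1.0                         # one labelled point per cluster

# Gaussian similarity graph and row-normalised transition matrix.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
P = W / W.sum(axis=1, keepdims=True)

f = np.where(np.isnan(y), 0.5, y)              # initial label guesses
labelled = ~np.isnan(y)
for _ in range(200):                           # propagate, clamping labelled points
    f = P @ f
    f[labelled] = y[labelled]

print((f > 0.5).astype(int))                   # predicted class for every point
```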
Abstract:
We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e. functions of the slack variables of the SVM) are derived.
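The abstract does not state the bound itself; purely as a reminder of the generic shape that distribution-independent, margin-based bounds take (with $h_\gamma$ a capacity term such as the $V_\gamma$ dimension, $\hat{R}_\gamma$ the empirical margin error, and $n$ the sample size), not the paper's exact result:

```latex
% Generic shape of a margin-based generalization bound (illustrative only):
% with probability at least 1 - \delta over a sample of size n,
R(f) \;\le\; \hat{R}_{\gamma}(f) + O\!\left(\sqrt{\frac{h_{\gamma}\,\log^{2} n + \log(1/\delta)}{n}}\right).
```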