111 results for data matching
at Université de Lausanne, Switzerland
Abstract:
Given the very large amount of data collected every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that despite the lack of data, this procedure can perform better than standard matching procedures.
Abstract:
The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources.
The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. 
Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.
Abstract:
One major methodological problem in analysis of sequence data is the determination of costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it has some similarity with problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly promote one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme in comparison with solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while it alleviates the justification problem of cost schemes.
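To make the role of the costs concrete, here is a minimal sketch of how a distance between two sequences can be derived from substitution and insertion/deletion costs, using a standard dynamic-programming edit distance. It does not reproduce the authors' cost optimization. The function name `om_distance`, the state labels, and the cost values are hypothetical and chosen only for illustration.

```python
def om_distance(a, b, sub_cost, indel):
    """Edit distance between sequences a and b.

    sub_cost(x, y) returns the (hypothetical) cost of substituting state x
    by state y; indel is the cost of one insertion or deletion.
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn a[:i] into b[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(
                d[i - 1][j] + indel,      # delete a[i-1]
                d[i][j - 1] + indel,      # insert b[j-1]
                d[i - 1][j - 1] + s,      # substitute (or match)
            )
    return d[n][m]


# Illustrative states: E = employed, U = unemployed (assumed labels).
cheap_sub = lambda x, y: 1.5
costly_sub = lambda x, y: 3.0
print(om_distance("EEU", "EU", cheap_sub, indel=1.0))
print(om_distance("E", "U", cheap_sub, indel=1.0))
print(om_distance("E", "U", costly_sub, indel=1.0))
```

With these values, raising the substitution cost above twice the indel cost makes a deletion followed by an insertion cheaper than a substitution, which is one reason the choice of cost scheme changes the resulting distances and clusters.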
Abstract:
Synchronization of data coming from different sources is of high importance in biomechanics to ensure reliable analyses. This synchronization can either be performed through hardware, to obtain perfect matching of data, or digitally in post-processing. In many situations, hardware synchronization can be achieved using trigger cables connecting different devices; however, this is often impractical, and sometimes impossible in outdoor situations. The aim of this paper is to describe a wireless system for outdoor use, allowing synchronization of different types of - potentially embedded and moving - devices. In this system, each synchronization device is composed of: (i) a GPS receiver (used as time reference), (ii) a radio transmitter, and (iii) a microcontroller. These components are used to provide synchronized trigger signals at the desired frequency to the connected measurement device. The synchronization devices communicate wirelessly, are very lightweight and battery-operated, and are thus very easy to set up. They are adaptable to every measurement device equipped with either a trigger input or a recording channel. The accuracy of the system was validated using an oscilloscope. The mean synchronization error was found to be 0.39 μs and pulses are generated with an accuracy of <2 μs. The system provides synchronization accuracy about two orders of magnitude better than commonly used post-processing methods, and does not suffer from any drift in trigger generation.
Abstract:
OBJECTIVES: This study aimed to investigate whether data from medical teleconsultations may contribute to influenza surveillance. METHODS: International Classification of Primary Care 2nd Edition (ICPC-2) codes were used to analyse the proportion of teleconsultations due to influenza-related symptoms. Results were compared with the weekly Swiss Sentinel reports. RESULTS: When using the ICPC-2 code for fever we could reproduce the seasonal influenza peaks of the winter seasons 07/08, 08/09 and 09/10 as depicted by the Sentinel data. For the pandemic influenza 09/10, we detected a much higher first peak in summer 2009 which correlated with a potential underreporting in the Sentinel system. CONCLUSIONS: ICPC-2 data from medical teleconsultations allow influenza surveillance in real time and correlate very well with the Swiss Sentinel system.
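The proportion described in the methods could be computed along the following lines. This is only a sketch under assumptions: the record layout (a week label paired with one ICPC-2 code per teleconsultation), the function name `weekly_proportion`, and the sample records are all hypothetical, and "A03" is used here on the assumption that it is the ICPC-2 code for fever.

```python
from collections import defaultdict


def weekly_proportion(consultations, target_code):
    """Share of consultations per week that carry target_code.

    consultations: iterable of (week_label, icpc2_code) pairs (assumed layout).
    Returns a dict mapping each week label to a proportion in [0, 1].
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for week, code in consultations:
        totals[week] += 1
        if code == target_code:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


# Made-up records for illustration only, not study data.
records = [
    ("2009-W30", "A03"),
    ("2009-W30", "R05"),
    ("2009-W31", "A03"),
]
print(weekly_proportion(records, "A03"))
```

The resulting weekly series is what would then be set against the weekly Sentinel reports.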
Abstract:
This letter describes a data telemetry biomedical experiment. An implant, consisting of a biometric data sensor, electronics, an antenna, and a biocompatible capsule, is described. All the elements were co-designed in order to maximize the transmission distance. The device was implanted in a pig for an in vivo experiment of temperature monitoring.
Abstract:
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open 'data commoning' culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared 'Investigation-Study-Assay' framework to support that vision.
Abstract:
Tobacco control has been recognized as a main public health concern in Seychelles for the past two decades. Tobacco advertising, sponsorship and promotion have been banned for years, tobacco products are subject to high taxes, high-profile awareness programs are organized regularly, and several other control measures have been implemented. The Republic of Seychelles was the first country in the African region to ratify the WHO Framework Convention on Tobacco Control (FCTC). Three population-based surveys have been conducted in adults in Seychelles, and results showed a substantial decrease in the prevalence of smoking among adults between 1989 and 2004. A first survey in adolescents was conducted in Seychelles in 2002 (the Global Youth Tobacco Survey, GYTS) in a representative sample of 1321 girls and boys aged 13-15 years. The results show that approximately half of students had tried smoking and a quarter of both boys and girls had smoked at least one cigarette during the past 30 days. Although "current smoking" is defined differently in adolescents (≥1 cigarette during the past 30 days) and in adults (≥1 cigarette per day), which precludes direct comparison, the high smoking prevalence in youth in Seychelles likely predicts an increasing prevalence of tobacco use in the next adult generation, particularly in women. GYTS 2002 also provides important data on a wide range of specific individual and societal factors influencing tobacco use. Hence, GYTS can be a powerful tool for monitoring the situation of tobacco use in adolescents, for highlighting the need for new policy and programs, and for evaluating the impact of current and future programs.
Abstract:
A computerized handheld procedure is presented in this paper. It is intended as a tool complementary to a database, to enhance prospective risk analysis in the field of occupational health. The Pendragon forms software (version 3.2) has been used to implement acquisition procedures on Personal Digital Assistants (PDAs) and to transfer data to a computer in an MS-Access format. The proposed data acquisition strategy relies on the risk assessment method practiced at the Institute of Occupational Health Sciences (IST). It involves the use of a systematic hazard list and semi-quantitative risk assessment scales. A set of 7 modular forms has been developed to cover the basic needs of field audits. Despite the minor drawbacks observed, the results obtained so far show that handhelds are adequate to support field risk assessment and follow-up activities. Further improvements must still be made in order to increase the tool's effectiveness and field adequacy.
Abstract:
Knowledge of the spatial distribution of hydraulic conductivity (K) within an aquifer is critical for reliable predictions of solute transport and the development of effective groundwater management and/or remediation strategies. While core analyses and hydraulic logging can provide highly detailed information, such information is inherently localized around boreholes that tend to be sparsely distributed throughout the aquifer volume. Conversely, larger-scale hydraulic experiments like pumping and tracer tests provide relatively low-resolution estimates of K in the investigated subsurface region. As a result, traditional hydrogeological measurement techniques leave a gap in terms of spatial resolution and coverage, and on their own they are often inadequate for characterizing heterogeneous aquifers. Geophysical methods have the potential to bridge this gap. The recent increased interest in the application of geophysical methods to hydrogeological problems is clearly evidenced by the formation and rapid growth of the domain of hydrogeophysics over the past decade (e.g., Rubin and Hubbard, 2005).
Abstract:
Pygmy Shrews in North America have variously been considered to be one species (Sorex hoyi) or two species (S. hoyi and S. thompsoni). Currently, only S. hoyi is recognized. In this study, we examine mitochondrial DNA sequence data for the cytochrome b gene to evaluate the level of differentiation and phylogeographic relationships among eleven samples of Pygmy Shrews from across Canada. Pygmy Shrews from eastern Canada (i.e., Ontario, Quebec, New Brunswick, Nova Scotia, and Prince Edward Island) are distinct from Pygmy Shrews from western Canada (Alberta, Yukon) and Alaska. The average level of sequence divergence between these clades (3.3%) falls within the range of values for other recognized pairs of sister species of shrews. A molecular clock based on third position transversion substitutions suggests that these two lineages diverged between 0.44 and 1.67 million years ago. These molecular phylogenetic data, combined with a reinterpretation of previously published morphological data, are suggestive of separate species status for S. hoyi and S. thompsoni, as has been previously argued by others. Further analysis of specimens from geographically intermediate areas (e.g., Manitoba, northern Ontario) is required to determine if there is secondary contact and/or introgression between these two putative species.
Abstract:
Neurally adjusted ventilatory assist (NAVA) is a ventilation assist mode that delivers pressure in proportion to the electrical activity of the diaphragm (Eadi). Compared to pressure support ventilation (PS), it improves patient-ventilator synchrony and should allow a better expression of the patient's intrinsic respiratory variability. We hypothesize that NAVA provides better matching of ventilator tidal volume (Vt) to the patient's inspiratory demand. Twenty-two patients with acute respiratory failure, ventilated with PS, were included in the study. A comparative study was carried out between PS and NAVA, with the NAVA gain set to ensure the same peak airway pressure as PS. Robust coefficients of variation (CVR) for Eadi and Vt were compared for each mode. The integral of Eadi (∫Eadi) was used to represent the patient's inspiratory demand. To evaluate the matching of tidal volume to patient demand, Range90, the 5-95 % range of the Vt/∫Eadi ratio, was calculated to normalize and compare differences in demand within and between patients and modes. In this study, peak Eadi and ∫Eadi were correlated, with a median correlation coefficient R > 0.95. Median ∫Eadi, Vt, neural inspiratory time (Ti_Neural), inspiratory time (Ti) and peak inspiratory pressure (PIP) were similar in PS and NAVA. However, individual patients showed higher or lower values of ∫Eadi, Vt, Ti_Neural, Ti and PIP. CVR analysis showed greater Vt variability for NAVA (p < 0.005). Range90 was lower for NAVA than PS for 21 of 22 patients. NAVA provided better matching of Vt to ∫Eadi for 21 of 22 patients, and provided greater Vt variability. These results were achieved regardless of differences in ventilatory demand (Eadi) between patients and modes.
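As a rough illustration of the Range90 measure, the sketch below computes the spread between the 5th and 95th percentiles of the breath-by-breath ratio of Vt to the Eadi integral. The abstract does not say how percentiles were estimated, so the linear interpolation used here is an assumption, as are the function names and the sample values.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list.

    The interpolation rule is an assumption, not taken from the study.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def range90(vt, eadi_integral):
    """5-95 % range of the per-breath Vt / Eadi-integral ratio."""
    ratios = [v / e for v, e in zip(vt, eadi_integral)]
    return percentile(ratios, 95) - percentile(ratios, 5)


# Made-up breaths: Vt tracks demand exactly, so the ratio is constant.
print(range90([200.0, 400.0, 600.0], [1.0, 2.0, 3.0]))
```

A smaller Range90 means the delivered tidal volume follows the patient's demand more closely from breath to breath, which is how the abstract interprets the lower values seen with NAVA.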