80 results for Data Mining and its Application

at Université de Lausanne, Switzerland


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The paper presents some contemporary approaches to spatial environmental data analysis. The main topics concentrate on the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, and the development and application of conditional stochastic simulation models. The innovative part of the paper presents an integrated/hybrid model: machine learning (ML) residuals sequential simulations (MLRSS). The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends, combined with sequential simulations of the residuals. ML algorithms deliver non-linear solutions for spatially non-stationary problems, which are difficult for the geostatistical approach. Geostatistical tools (variography) are used to characterize the performance of the ML algorithms by analyzing the quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide efficient assessment of uncertainty and spatial variability. A case study from the Chernobyl fallout illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data-driven and geostatistical model-based approaches, can be efficiently used in the decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Originally composed of the single family Chlamydiaceae, the Chlamydiales order has expanded considerably over the last several decades. Chlamydia-related bacteria were added and classified into six different families and family-level lineages: the Criblamydiaceae, Parachlamydiaceae, Piscichlamydiaceae, Rhabdochlamydiaceae, Simkaniaceae, and Waddliaceae. While several members of the Chlamydiaceae family are known pathogens, recent studies have shown diverse associations of Chlamydia-related bacteria with human and animal infections. Some of these latter bacteria might be of medical importance since, given their ability to replicate in free-living amoebae, they may also replicate efficiently in other phagocytic cells, including cells of the innate immune system. Thus, a new Chlamydiales-specific real-time PCR targeting the conserved 16S rRNA gene was developed. This new molecular tool can detect as few as five DNA copies and shows very high specificity, without cross-amplification of DNA from other bacterial clades. The new PCR was validated with 128 clinical samples positive or negative for Chlamydia trachomatis or C. pneumoniae. Of 65 positive samples, 61 (93.8%) were found to be positive with the new PCR. The four discordant samples, retested with the original test, were determined to be negative or below detection limits. Then, the new PCR was applied to 422 nasopharyngeal swabs taken from children with or without pneumonia; a total of 48 (11.4%) samples were determined to be positive, and 45 of these were successfully sequenced. The majority of the sequences corresponded to Chlamydia-related bacteria and especially to members of the Parachlamydiaceae family.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Q fever is a worldwide zoonotic infectious disease due to Coxiella burnetii. The clinical presentation may be acute (pneumonia and/or hepatitis) or chronic (most commonly endocarditis). Diagnosis mainly relies on serology and PCR. We therefore developed a quantitative real-time PCR. We first tested its performance blindly on various clinical samples and then, once it was thoroughly validated, applied it over a 7-year period for the diagnosis of both acute and persistent C. burnetii infection. Analytical sensitivity (< 10 copies/PCR) was excellent. When tested blindly on 183 samples, the specificity of the PCR was 100% (142/142) and the sensitivity was 71% (29/41). The sensitivity was 88% (7/8) on valvular samples, 69% (20/29) on blood samples and 50% (2/4) on urine samples. This new quantitative PCR was then successfully applied for the diagnosis of acute Q fever and endovascular infection due to C. burnetii, allowing the diagnosis of Q fever in six patients over a 7-year period. During a small local cluster of cases, the PCR was also applied to blood from 1355 blood donors; all were negative, confirming the high specificity of this test. In conclusion, we developed a highly specific method with excellent sensitivity, which may be used on sera for the diagnosis of acute Q fever and on various samples such as sera, valvular samples, aortic specimens, bone and liver, for the diagnosis of persistent C. burnetii infection.
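The reported percentages can be reproduced directly from the counts in the abstract. The following is a minimal sketch in Python; the helper name `proportion` is an assumption introduced here for illustration and is not part of the study.

```python
def proportion(hits: int, total: int) -> float:
    """Return hits/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total

# Counts reported in the abstract for the 183 blinded samples.
specificity = proportion(142, 142)  # PCR-negative among true negatives
sensitivity = proportion(29, 41)    # PCR-positive among true positives

# Per-specimen sensitivity: (PCR-positive, true positives), as reported.
by_specimen = {
    "valvular": (7, 8),
    "blood": (20, 29),
    "urine": (2, 4),
}

print(f"specificity: {specificity:.0f}%")
print(f"sensitivity: {sensitivity:.0f}%")
for name, (tp, positives) in by_specimen.items():
    print(f"{name}: {proportion(tp, positives):.1f}%")
```

The per-specimen counts add up to the overall figures (7 + 20 + 2 = 29 detected out of 8 + 29 + 4 = 41 positives), and 142 + 41 gives the 183 samples tested.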

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The study investigates the possibility of incorporating fracture intensity and block geometry as spatially continuous parameters in GIS-based systems. For this purpose, a deterministic method has been implemented to estimate block size (Bloc3D) and joint frequency (COLTOP). In addition to measuring the block size, the Bloc3D method provides a 3D representation of the shape of individual blocks. These two methods were applied using field measurements (joint set orientation and spacing) performed over a large field area in the Swiss Alps. This area is characterized by a complex geology, a number of different rock masses and varying degrees of metamorphism. The spatial variability of the parameters was evaluated with regard to lithology and major faults. A model incorporating these measurements and observations into a GIS system to assess the risk associated with rock falls is proposed. The analysis concludes with a discussion of the feasibility of such an application in regularly and irregularly jointed rock masses, with persistent and impersistent discontinuities.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Ethyl glucuronide (EtG) is a minor and direct metabolite of ethanol. EtG is incorporated into growing hair, allowing retrospective investigation of chronic alcohol abuse. In this study, we report the development and validation of a method using gas chromatography-negative chemical ionization tandem mass spectrometry (GC-NCI-MS/MS) for the quantification of EtG in hair. EtG was extracted from about 30 mg of hair by aqueous incubation and purified by solid-phase extraction (SPE) using mixed-mode extraction cartridges, followed by derivatization with perfluoropentanoic anhydride (PFPA). The analysis was performed in the selected reaction monitoring (SRM) mode using the transitions m/z 347 → 163 (for quantification) and m/z 347 → 119 (for identification) for EtG, and m/z 352 → 163 for EtG-d5, used as internal standard. For validation, we prepared quality controls (QC) using hair samples taken post mortem from 2 subjects with a known history of alcoholism. These samples were confirmed by a proficiency test with 7 participating laboratories. The assay linearity of EtG was confirmed over the range from 8.4 to 259.4 pg/mg hair, with a coefficient of determination (r²) above 0.999. The limit of detection (LOD) was estimated at 3.0 pg/mg. The lower limit of quantification (LLOQ) of the method was fixed at 8.4 pg/mg. Repeatability and intermediate precision (relative standard deviation, RSD%), tested at 4 QC levels, were less than 13.2%. The analytical method was applied to several hair samples obtained from autopsy cases with a history of alcoholism and/or lesions caused by alcohol. EtG concentrations in hair ranged from 60 to 820 pg/mg hair.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recombinant adeno-associated viruses (rAAV) are effective gene delivery vehicles that can mediate long-lasting transgene expression. However, tight regulation and tissue-specific transgene expression are required for certain therapeutic applications. For regulatable expression from the liver, we designed a hepatospecific bidirectional and autoregulatory tetracycline (Tet)-On system (Tet(bidir)Alb) flanked by AAV inverted terminal repeats (ITRs). We characterized the inducible hepatospecific system in comparison with an inducible ubiquitous expression system (Tet(bidir)CMV) using luciferase (luc). Although the ubiquitous system led to luc expression throughout the mouse, luc expression derived from the hepatospecific system was restricted to the liver. Interestingly, the induction rate of Tet(bidir)Alb was significantly higher than that of Tet(bidir)CMV, whereas leakage of Tet(bidir)Alb was significantly lower. To evaluate the therapeutic potential of this vector, an AAV-Tet(bidir)-Alb vector expressing interleukin-12 (IL-12) was tested in a murine model of hepatic colorectal metastasis. The vector induced dose-dependent levels of IL-12 and interferon-γ (IFN-γ), showing no significant toxicity. AAV-Tet(bidir)-Alb-IL-12 was highly efficient in preventing the establishment of metastasis in the liver and induced an efficient T-cell memory response to tumor cells. Thus, we have demonstrated persistent and inducible in vivo expression of a gene from a liver-specific Tet-On inducible construct delivered via an AAV vector, and shown it to be an efficient tool for treating liver cancer.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Understanding molecular recognition is one major requirement for drug discovery and design. Physicochemical and shape complementarity between two binding partners is the driving force during complex formation. In this study, the impact of shape within this process is analyzed. Protein binding pockets and co-crystallized ligands are represented by normalized principal moments of inertia ratios (NPRs). The corresponding descriptor space is triangular, with its corners occupied by spherical, discoid, and elongated shapes. An analysis of a selected set of sc-PDB complexes suggests that pockets and bound ligands avoid spherical shapes, which are, however, prevalent in small unoccupied pockets. Furthermore, a direct shape comparison confirms previous findings that on average only one third of a pocket is filled by its bound ligand, supplemented by a 50% subpocket coverage. In this study, we found that shape complementarity is expressed by low pairwise shape distances in NPR space, short distances between the centers of mass, and small deviations in the angle between the first principal ellipsoid axes. Furthermore, it is assessed how different binding pocket parameters are related to bioactivity and binding efficiency of the co-crystallized ligand. In addition, the performance of different shape and size parameters of pockets and ligands is evaluated in a virtual screening scenario performed on four representative targets.
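To make the NPR descriptor concrete, the sketch below computes the two ratios I1/I3 and I2/I3 from the sorted principal moments of inertia (I1 ≤ I2 ≤ I3) of a point set. It is an illustration only: the function names are hypothetical, every point is given unit mass (an assumption; the study may weight atoms differently), and the eigenvalues are obtained with the standard closed-form solution for symmetric 3x3 matrices rather than with any tool named in the abstract.

```python
import math

def inertia_tensor(points):
    """Inertia tensor of unit-mass points about their centroid (unit mass is an assumption)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for px, py, pz in points:
        x, y, z = px - cx, py - cy, pz - cz
        ixx += y * y + z * z
        iyy += x * x + z * z
        izz += x * x + y * y
        ixy -= x * y
        ixz -= x * z
        iyz -= y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def principal_moments(points):
    """Eigenvalues of the inertia tensor in ascending order."""
    a = inertia_tensor(points)
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # tensor is already diagonal
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))  # clamp against round-off
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return sorted([e_min, 3.0 * q - e_max - e_min, e_max])

def npr(points):
    """Return (I1/I3, I2/I3) for a set of 3D points."""
    i1, i2, i3 = principal_moments(points)
    if i3 <= 0.0:
        raise ValueError("need at least two distinct points")
    return i1 / i3, i2 / i3

# Idealized shapes at the three corners of the triangular NPR space.
rod = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sphere = disc + [(0, 0, 1), (0, 0, -1)]
for name, pts in (("rod", rod), ("disc", disc), ("sphere", sphere)):
    print(name, npr(pts))
```

For these idealized point sets the ratios land on the corners of the triangle: (0, 1) for the elongated rod, (0.5, 0.5) for the disc, and (1, 1) for the sphere-like arrangement.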

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. Positive and negative acquisition results were combined during data mining to simplify the process and interrogate a larger lipidome in a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
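The data-reduction step, drawing a fixed number of spectra at random from a region of interest, can be sketched as follows. This is a minimal illustration: the function name, the use of integer spectrum indices, and the sizes (5000 spectra reduced to 500) are assumptions chosen to give a 10-fold reduction, not values from the study.

```python
import random

def subsample_roi(spectrum_ids, n_keep, seed=None):
    """Randomly keep n_keep spectra (without replacement) from one region of interest."""
    ids = list(spectrum_ids)
    if n_keep > len(ids):
        raise ValueError("cannot keep more spectra than the region contains")
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return rng.sample(ids, n_keep)

# Hypothetical region of interest with 5000 spectra, reduced 10-fold.
roi = range(5000)
subset = subsample_roi(roi, n_keep=500, seed=0)
print(len(roi) / len(subset))
```

Fixing the number of spectra per region, rather than a fraction, keeps regions of different sizes equally represented in the combined data set.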

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in different fields, e.g., remote sensing, biometry, and speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty related to the use of such data lies in its heterogeneity, which comes from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structures in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process. It allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was taken, using the presence and absence of products. Methods from graph theory were used to extract patterns in data consisting of links between products and the place and date of seizure. A data mining process carried out using graph techniques is called "graph mining". Patterns were detected that had to be interpreted and compared with preliminary knowledge to establish their relevance. The illicit drug profiling process is actually an intelligence process that uses preliminary illicit drug classes to classify new samples. Methods proposed in this study could be used a priori to compare structures from preliminary and post-detection patterns. This new knowledge of a repeated structure may provide valuable complementary information to profiling and become a source of intelligence.
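One simple way to picture the graph described above is a co-occurrence graph: products are nodes, and an edge links two products found together in the same seizure, annotated with the place and date. The sketch below is an assumption about how such a structure could be built; the function names, record layout, and placeholder values ("A", "P1", and so on) are invented for illustration and do not come from the study.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(seizures):
    """Map each pair of co-present products to the (place, date) contexts where it occurs."""
    edges = defaultdict(list)
    for seizure in seizures:
        for pair in combinations(sorted(seizure["agents"]), 2):
            edges[pair].append((seizure["place"], seizure["date"]))
    return dict(edges)

def repeated_patterns(edges, min_count=2):
    """Keep only the product pairs seen in at least min_count seizures."""
    return {pair: ctx for pair, ctx in edges.items() if len(ctx) >= min_count}

# Placeholder records; products, places and dates are purely illustrative.
seizures = [
    {"agents": {"A", "B"}, "place": "P1", "date": "2000-01-01"},
    {"agents": {"A", "B", "C"}, "place": "P2", "date": "2000-02-01"},
    {"agents": {"C"}, "place": "P1", "date": "2000-03-01"},
]
graph = cooccurrence_graph(seizures)
print(repeated_patterns(graph))
```

Here only the pair ("A", "B") recurs across seizures, so it is the only candidate pattern left for interpretation against prior knowledge.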

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means of modelling local anomalies that may typically arise in the early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
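The abstract does not give the form of the multi-scale model. One common way to mix short- and large-scale components in a kernel method is a non-negative weighted sum of Gaussian (RBF) kernels with different bandwidths, which is itself a valid kernel. The sketch below shows only that assumed kernel; the function name, the bandwidths, and the weights are hypothetical, and in the paper the mixture is learned from data rather than fixed by hand.

```python
import math

def multiscale_rbf(x, y, scales, weights):
    """Weighted sum of Gaussian kernels, one per spatial scale (assumed form)."""
    if len(scales) != len(weights):
        raise ValueError("scales and weights must have the same length")
    if any(s <= 0 for s in scales) or any(w < 0 for w in weights):
        raise ValueError("scales must be positive and weights non-negative")
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))  # squared Euclidean distance
    return sum(w * math.exp(-d2 / (2.0 * s * s)) for s, w in zip(scales, weights))

# Hypothetical setup: a short-scale (1 unit) and a large-scale (10 units) component.
scales = (1.0, 10.0)
weights = (0.5, 0.5)
near = multiscale_rbf((0.0, 0.0), (0.5, 0.0), scales, weights)
far = multiscale_rbf((0.0, 0.0), (3.0, 4.0), scales, weights)
print(near, far)
```

At a distance of 5 units the short-scale term has effectively vanished and only the large-scale term contributes, which is how such a kernel can represent a local anomaly on top of a smooth regional trend.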

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Purpose/Aim: To learn about the development of post mortem CT angiography, its indications, benefits, pitfalls and practical application. Content Organization: A. Development of post mortem CT angiography B. Technical prerequisites C. Practical application of post mortem CT angiography (preparation of the body, injection of contrast agent, examination protocol) D. Indications and benefits (including a comparison with conventional autopsy) E. Interpretation of imaging data (with case demonstrations) F. Artifacts, pitfalls and limitations G. Current and potential future use. Summary: This exhibit demonstrates the development, application and interpretation of post mortem CT angiography. Teaching points: 1. post mortem CT angiography is feasible and useful for identification of the cause of death 2. depending on the indication it can be superior to autopsy 3. limitations and artifacts need to be known for interpreta

Relevance:

100.00% 100.00%

Publisher:

Abstract:

DNA microarray technology has caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data, are quickly becoming essential knowledge repositories of the research community. The present paper surveys several databases that are considered "pillars" of research and important nodes in the network. It focuses on a generalized workflow scheme typical for microarray experiments, using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources.
The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. 
Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.
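The matching step can be illustrated with a greedy one-to-one nearest-neighbor match on propensity scores within a caliper. This is a sketch under stated assumptions: the abstract does not say which matching algorithm was used, the scores are taken as already estimated (for example from age, gender, and education), and the function name, respondent identifiers, scores, and caliper value are all hypothetical.

```python
def greedy_match(posttest, pretest, caliper=0.05):
    """Pair each posttest respondent with the closest unused pretest respondent.

    posttest, pretest: dicts mapping respondent id to propensity score.
    Pairs whose score difference exceeds the caliper are left unmatched.
    """
    available = dict(pretest)
    pairs = []
    for post_id, score in sorted(posttest.items(), key=lambda kv: kv[1]):
        if not available:
            break
        pre_id = min(available, key=lambda k: abs(available[k] - score))
        if abs(available[pre_id] - score) <= caliper:
            pairs.append((post_id, pre_id))
            del available[pre_id]  # match without replacement
    return pairs

# Hypothetical propensity scores for one neighborhood type in one city.
posttest = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
pretest = {"c1": 0.31, "c2": 0.60, "c3": 0.50}
print(greedy_match(posttest, pretest))
```

In this toy example "t3" stays unmatched because no pretest respondent lies within the caliper; dropping such cases is what balances the two samples before the outcome measures are compared.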