138 results for rule mining, closed sequential patterns
at Université de Lausanne, Switzerland
Abstract:
Data mining is an attractive set of methods and techniques that has rapidly grown in popularity in recent years, especially in marketing. The recent development of crime analysis and criminal intelligence raises problems to which it is tempting to apply these methods and techniques. The potential and place of data mining in the context of crime analysis need to be better defined in order to guide its application. This reflection is carried out within the framework of the intelligence produced by systems for the systematic detection and monitoring of repetitive crime, known as operational monitoring processes. Their operation requires the existence of patterns embedded in the data and justified by situational approaches in criminology. With this theoretical background, the main challenge is to explore the possibilities of detecting these patterns through data mining methods and techniques. To meet this objective, research is currently being conducted in Switzerland through an interdisciplinary approach combining forensic, criminological and computational knowledge.
Abstract:
Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in various fields, e.g., remote sensing, biometry and speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty related to the use of such data lies in its heterogeneity, which comes from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structures in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process. It allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was taken, using the presence and absence of products. Methods from graph theory were used to extract patterns in data constituted by links between products and the place and date of seizure. A data mining process completed using graphing techniques is called "graph mining". Patterns were detected that had to be interpreted and compared with preliminary knowledge to establish their relevancy. The illicit drug profiling process is actually an intelligence process that uses preliminary illicit drug classes to classify new samples. Methods proposed in this study could be used a priori to compare structures from preliminary and post-detection patterns. 
This new knowledge of a repeated structure may provide valuable complementary information to profiling and become a source of intelligence.
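The graph-based step described above can be illustrated with a minimal sketch. It assumes each seizure is reduced to the list of cutting agents detected in it, and it only links products that occur together; the place and date links mentioned in the abstract are left out. The function names and the sample seizures are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(seizures):
    """Count how often each pair of cutting agents appears in the same seizure."""
    edges = defaultdict(int)
    for agents in seizures:
        # Sort so that each pair has one canonical orientation.
        for a, b in combinations(sorted(set(agents)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def frequent_pairs(edges, min_count):
    """Keep only the links observed at least min_count times."""
    return {pair: n for pair, n in edges.items() if n >= min_count}


# Hypothetical seizures: each entry lists the cutting agents detected.
seizures = [
    ["caffeine", "paracetamol"],
    ["caffeine", "paracetamol", "lidocaine"],
    ["lidocaine", "phenacetin"],
    ["caffeine", "paracetamol"],
]
edges = cooccurrence_graph(seizures)
repeated = frequent_pairs(edges, min_count=2)
print(repeated)
```

A repeated link of this kind is only a candidate pattern; as the abstract stresses, it still has to be interpreted against prior knowledge before it counts as intelligence.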
Abstract:
Directed evolution of life through millions of years, such as increasing adult body size, is one of the most intriguing patterns displayed by fossil lineages. Processes and causes of such evolutionary trends are still poorly understood. Ammonoids (externally shelled marine cephalopods) are well known to have experienced repetitive morphological evolutionary trends of their adult size, shell geometry and ornamentation. This study analyses the evolutionary trends of the family Acrochordiceratidae Arthaber, 1911 from the Early to Middle Triassic (251–228 Ma). Exceptionally large and bed-rock-controlled collections of this ammonoid family were obtained from strata of Anisian age (Middle Triassic) in north-west Nevada and north-east British Columbia. They enable quantitative and statistical analyses of its morphological evolutionary trends. This study demonstrates that the monophyletic clade Acrochordiceratidae underwent the classical evolute to involute evolutionary trend (i.e. increasing coiling of the shell), an increase in its shell adult size (conch diameter) and an increase in the indentation of its shell suture shape. These evolutionary trends are statistically robust and seem more or less gradual. Furthermore, they are nonrandom with the sustained shift in the mean, the minimum and the maximum of studied shell characters. These results can be classically interpreted as being constrained by the persistence and common selection pressure on this mostly anagenetic lineage characterized by relatively moderate evolutionary rates. Increasing involution of ammonites is traditionally interpreted by increasing adaptation mostly in terms of improved hydrodynamics. However, this trend in ammonoid geometry can also be explained as a case of Cope's rule (increasing adult body size) instead of functional explanation of coiling, because both shell diameter and shell involution are two possible paths for ammonoids to accommodate size increase.
Abstract:
The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources. 
The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. 
Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.
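The matching step can be sketched as follows. This is a minimal illustration that assumes a propensity score has already been estimated for every respondent (for example from age, gender and education). The greedy 1:1 nearest-neighbour rule, the caliper value and the function name are assumptions made here for illustration, since the abstract does not specify which matching variant was used.

```python
def match_nearest(pre_scores, post_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    Returns (pretest_index, posttest_index) pairs whose scores differ
    by at most `caliper`. Each posttest respondent is used at most once.
    """
    available = set(range(len(post_scores)))
    pairs = []
    for i, score in enumerate(pre_scores):
        if not available:
            break
        # Closest still-unmatched posttest respondent.
        j = min(available, key=lambda k: abs(post_scores[k] - score))
        if abs(post_scores[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores for one neighborhood type.
pre = [0.20, 0.50, 0.90]
post = [0.52, 0.21, 0.60]
pairs = match_nearest(pre, post)
print(pairs)
```

Respondents with no counterpart inside the caliper stay unmatched and drop out of the comparison, which is how matching trades sample size for balance on the observed covariates.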
Abstract:
(from the journal abstract) Scientific interest in the concept of alliance has been maintained and stimulated by repeated findings that a strong alliance is associated with facilitative treatment process and favourable treatment outcome. However, because the alliance is not in itself a therapeutic technique, these findings were unsuccessful in bringing about significant improvements in clinical practice. An essential issue in modern psychotherapeutic research concerns the relation between common factors which are known to explain great variance in empirical results and the specific therapeutic techniques which are the primary basis of clinical training and practice. This pilot study explored sequences in therapist interventions over four sessions of brief psychodynamic investigation. It aims at determining if patterns of interventions can be found during brief psychodynamic investigation and if these patterns can be associated with differences in the therapeutic alliance. Therapist interventions were coded using the Psychodynamic Intervention Rating Scale (PIRS) which enables the classification of each therapist utterance into one of 9 categories of interpretive interventions (defence interpretation, transference interpretation), supportive interventions (question, clarification, association, reflection, supportive strategy) or interventions about the therapeutic frame (work-enhancing statement, contractual arrangement). Data analysis was done using lag sequential analysis, a statistical procedure which identifies contingent relationships in time among a large number of behaviours. The sample includes N = 20 therapist-patient dyads assigned to three groups with: (1) a high and stable alliance profile, (2) a low and stable alliance profile and (3) an improving alliance profile. Results suggest that therapists most often have one single intention when interacting with patients. 
Large sequences of questions, associations and clarifications were found, which indicate that if a therapist asks a question, clarifies or associates, there is a significant probability that he will continue doing so. A single theme sequence involving frame interventions was also observed. These sequences were found in all three alliance groups. One exception was found for mixed sequences of interpretations and supportive interventions. The simultaneous use of these two interventions was associated with a high or an improving alliance over the course of treatment, but not with a low and stable alliance where only single theme sequences of interpretations were found. In other words, in this last group, therapists were either supportive or interpretative, whereas with high or improving alliance, interpretations were always given along with supportive interventions. This finding provides evidence that examining therapist interpretation individually can only yield incomplete findings. How interpretations were given is important for alliance building. It also suggests that therapists should carefully dose their interpretations and be supportive when necessary in order to build a strong therapeutic alliance. And from a research point of view, to study technical interventions, we must look into dynamic variables such as dosage, the supportive quality of an intervention, and timing. (PsycINFO Database Record (c) 2005 APA, all rights reserved)
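The core quantity behind lag sequential analysis can be sketched in a few lines. The example below computes only lag-1 conditional probabilities, P(next code | current code), from one coded sequence. The significance testing against chance expectation that a full lag sequential analysis performs is omitted, and the single-letter codes (Q for question, C for clarification, I for interpretation, S for supportive strategy) are hypothetical shorthand, not the PIRS coding itself.

```python
from collections import Counter


def lag1_transitions(codes):
    """Return P(next | current) for every observed lag-1 transition."""
    pair_counts = Counter(zip(codes, codes[1:]))
    # The last code has no successor, so it is excluded from the denominators.
    from_counts = Counter(codes[:-1])
    return {
        (current, nxt): n / from_counts[current]
        for (current, nxt), n in pair_counts.items()
    }


# Hypothetical coded therapist utterances from one session.
session = ["Q", "Q", "Q", "C", "C", "I", "S", "Q", "Q"]
probs = lag1_transitions(session)
print(probs[("Q", "Q")])
```

A high probability on the diagonal, such as question followed by question, corresponds to the single-theme sequences reported above, while off-diagonal entries such as interpretation followed by a supportive strategy correspond to mixed sequences.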
Abstract:
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
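A pattern-matching rule of the kind described might look like the sketch below. The regular expression, the keyword list and the function names are hypothetical illustrations, since the abstract does not give the system's actual rules. The second function shows how precision and recall are computed from counts of correct, spurious and missed extractions.

```python
import re

# Hypothetical rule: a modification keyword followed, within a short
# window, by a residue name and a sequence position.
PTM_PATTERN = re.compile(
    r"(?P<ptm>phosphorylat\w+|acetylat\w+|methylat\w+)\b"
    r".{0,40}?"
    r"\b(?P<residue>Ser|Thr|Tyr|Lys|Arg)[- ]?(?P<position>\d+)\b",
    re.IGNORECASE,
)


def extract_ptms(sentence):
    """Return (keyword, residue, position) tuples found in a sentence."""
    return [
        (
            m.group("ptm").lower(),
            m.group("residue").capitalize(),
            int(m.group("position")),
        )
        for m in PTM_PATTERN.finditer(sentence)
    ]


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return true_pos / (true_pos + false_pos), true_pos / (true_pos + false_neg)


hits = extract_ptms("Akt is phosphorylated at Ser473 by mTORC2.")
print(hits)
```

Rules this simple are easy to inspect and adjust, which suits a curation aid where a human reviews every extracted sentence.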
Abstract:
New plate-tectonic reconstructions of the Gondwana margin suggest that the location of Gondwana-derived terranes should not only be guided by the models, but should also consider the possible detrital input from some Asian blocks (Hunia), supposed to have been located along the Cambrian Gondwana margin, and accreted in the Silurian to the North-Chinese block. Consequently, the Gondwana margin has to be subdivided into a more western domain, where the future Avalonian blocks will be separated from Gondwana by the opening Rheic Ocean, whereas in its eastern continuation, hosting the future basement areas of Central Europe, different periods of crustal extension should be distinguished. Instead of applying a rather cylindrical model, it is supposed that crustal extension follows a much more complex pattern, where local back-arcs or intra-continental rifts are involved. Guided by the age data of magmatic rocks and the pattern of subsidence curves, the following extensional events can be distinguished: During the early to middle Cambrian, a back-arc setting guided the evolution at the Gondwana margin. Contemporaneous intra-continental rift basins developed at other places related to a general post-PanAfrican extensional phase affecting Africa. Upper Cambrian formation of oceanic crust is manifested in the Chamrousse area, and may have lateral cryptic relics preserved in other places. This is regarded as the oceanisation of some marginal basins in a context of back-arc rifting. These basins were closed in a mid-Ordovician tectonic phase, related to the subduction of buoyant material (mid-ocean ridge?). Since the Early Ordovician, a new phase of extension is observed, accompanied by a large-scale volcanic activity, erosion of the rift shoulders generated detritus (Armorican Quartzite) and the rift basins collected detrital zircons from a wide hinterland. 
This phase heralded the opening of Palaeotethys, but it failed due to the Silurian collision (Eo-Variscan phase) of an intra-oceanic arc with the Gondwana margin. During this time period, at the eastern wing of the Gondwana margin begins the drift of the future Hunia microcontinents, through the opening of an eastern prolongation of the already existing Rheic Ocean. The passive margin of the remaining Gondwana was composed of the Galatian superterranes, constituents of the future Variscan basement areas. Remaining under the influence of crustal extension, they will start their drift to Laurussia since the earliest Devonian during the opening of the Palaeotethys Ocean. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
The paper presents some contemporary approaches to spatial environmental data analysis. The main topics are concentrated on the decision-oriented problems of environmental spatial data mining and modeling: valorization and representativity of data with the help of exploratory data analysis, spatial predictions, probabilistic and risk mapping, development and application of conditional stochastic simulation models. The innovative part of the paper presents an integrated/hybrid model: machine learning (ML) residuals sequential simulations (MLRSS). The models are based on multilayer perceptron and support vector regression ML algorithms used for modeling long-range spatial trends and sequential simulations of the residuals. ML algorithms deliver non-linear solutions for spatially non-stationary problems, which are difficult for the geostatistical approach. Geostatistical tools (variography) are used to characterize the performance of ML algorithms, by analyzing the quality and quantity of the spatially structured information extracted from data with ML algorithms. Sequential simulations provide efficient assessment of uncertainty and spatial variability. A case study from the Chernobyl fallout illustrates the performance of the proposed model. It is shown that probability mapping, provided by the combination of ML data-driven and geostatistical model-based approaches, can be efficiently used in the decision-making process. (C) 2003 Elsevier Ltd. All rights reserved.
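The trend-plus-residuals idea can be illustrated with a short sketch. It assumes a trend model is already available as a callable (standing in for the multilayer perceptron or support vector regression of the paper), computes the residuals at the sample points, and then estimates an empirical semivariogram of those residuals to see how much spatial structure the trend model left behind. The sequential simulation step itself is not shown, and all names and sample values are hypothetical.

```python
import math


def residuals(points, values, trend):
    """Observed value minus the trend model's prediction at each point."""
    return [z - trend(p) for p, z in zip(points, values)]


def empirical_semivariogram(points, values, bin_width, n_bins):
    """Average of 0.5 * (z_i - z_j)^2 over point pairs, binned by distance.

    Bin b collects pairs with b * bin_width <= distance < (b + 1) * bin_width.
    Bins without any pair are reported as None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            b = int(h // bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical samples along a transect and a hypothetical linear trend.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 3.0, 2.0]
res = residuals(points, values, trend=lambda p: 1.0 + 0.5 * p[0])
gamma = empirical_semivariogram(points, res, bin_width=1.0, n_bins=3)
print(res, gamma)
```

A residual semivariogram that is flat from the shortest lags onward would suggest the trend model has absorbed most of the spatial structure, whereas a rising curve indicates structure still to be modelled by the simulations.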
Abstract:
PURPOSE: To evaluate a diagnostic strategy for pulmonary embolism that combined clinical assessment, plasma D-dimer measurement, lower limb venous ultrasonography, and helical computed tomography (CT). METHODS: A cohort of 965 consecutive patients presenting to the emergency departments of three general and teaching hospitals with clinically suspected pulmonary embolism underwent sequential noninvasive testing. Clinical probability was assessed by a prediction rule combined with implicit judgment. All patients were followed for 3 months. RESULTS: A normal D-dimer level (<500 microg/L by a rapid enzyme-linked immunosorbent assay) ruled out venous thromboembolism in 280 patients (29%), and finding a deep vein thrombosis by ultrasonography established the diagnosis in 92 patients (9.5%). Helical CT was required in only 593 patients (61%) and showed pulmonary embolism in 124 patients (12.8%). Pulmonary embolism was considered ruled out in the 450 patients (46.6%) with a negative ultrasound and CT scan and a low-to-intermediate clinical probability. The 8 patients with a negative ultrasound and CT scan despite a high clinical probability proceeded to pulmonary angiography (positive: 2; negative: 6). Helical CT was inconclusive in 11 patients (pulmonary embolism: 4; no pulmonary embolism: 7). The overall prevalence of pulmonary embolism was 23%. Patients classified as not having pulmonary embolism were not anticoagulated during follow-up and had a 3-month thromboembolic risk of 1.0% (95% confidence interval: 0.5% to 2.1%). CONCLUSION: A noninvasive diagnostic strategy combining clinical assessment, D-dimer measurement, ultrasonography, and helical CT yielded a diagnosis in 99% of outpatients suspected of pulmonary embolism, and appeared to be safe, provided that CT was combined with ultrasonography to rule out the disease.
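The sequential strategy and its patient counts can be written out as a small sketch. The function follows the order of tests given in the abstract; the argument names and return labels are hypothetical, and the abstract does not state what followed an inconclusive CT, so that branch only reports the finding. The counts are those reported above and are used to check that the steps account for all 965 patients.

```python
def workup(d_dimer_ug_per_l, dvt_on_ultrasound, ct_result, clinical_probability):
    """Outcome of the sequential noninvasive workup for one patient.

    ct_result is "positive", "negative" or "inconclusive";
    clinical_probability is "low", "intermediate" or "high".
    """
    if d_dimer_ug_per_l < 500:
        return "ruled out (normal D-dimer)"
    if dvt_on_ultrasound:
        return "diagnosed (deep vein thrombosis)"
    if ct_result == "positive":
        return "diagnosed (helical CT)"
    if ct_result == "inconclusive":
        return "inconclusive CT"
    if clinical_probability == "high":
        return "pulmonary angiography"
    return "ruled out (negative ultrasound and CT)"


# Patient counts reported in the abstract.
total = 965
normal_d_dimer = 280
dvt_found = 92
ct_required = 593
ct_breakdown = {
    "positive": 124,
    "negative, low-to-intermediate probability": 450,
    "negative, high probability": 8,
    "inconclusive": 11,
}

print(normal_d_dimer + dvt_found + ct_required == total)
print(sum(ct_breakdown.values()) == ct_required)
print(round(100 * normal_d_dimer / total, 1))
```

Both sums close exactly, so every patient in the cohort is accounted for by one of the branches.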
Abstract:
Oxygen uptake was studied during the establishment of cephalocaudal polarity in the very early chick embryo, i.e., 10 hr before (stage VI) and at laying (stage X). Oxygen fluxes in minute regions of the intact blastoderms were measured in vitro by scanning microspectrophotometry in the presence or absence of glucose. The oxygen consumption of the whole blastoderm remained constant (6 nmol O₂·hr⁻¹) throughout the period studied, although the number of cells increased more than twofold. The regional oxygen fluxes varied from 0.41 to 1.13 nmol O₂·hr⁻¹·mm⁻² at stage VI and from 0.42 to 0.70 nmol O₂·hr⁻¹·mm⁻² at stage X. At stage VI, the oxygen flux in the center of the blastoderm was significantly higher than that in its periphery. This pattern remained evident when the values were corrected for cell number or for cytoplasmic volume. At stage X, there was a tendency for the oxygen fluxes to decrease from the posterior to the anterior regions of the area pellucida. Thus the pattern of oxidative metabolism in the late uterine embryos seems to change from radial to bilateral. This change of symmetry probably reflects the process of formation of the embryonic axis. In addition, the fact that the oxygen uptake was similar in the presence or absence of glucose suggests that early chick embryos metabolize essentially intracellular stores.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge on the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs-137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
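One common way to make a kernel method sensitive to several spatial scales is to combine kernels of different bandwidths. The sketch below assumes a weighted sum of two Gaussian kernels with a fixed weight, which is an illustrative simplification: in the paper the mixture of short and large-scale models is learned from the data, and the exact formulation is not given in the abstract. All names and parameter values are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel with bandwidth sigma."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def multiscale_kernel(x, y, sigma_short, sigma_long, weight_short):
    """Convex combination of a short-scale and a large-scale Gaussian kernel."""
    return (
        weight_short * gaussian_kernel(x, y, sigma_short)
        + (1.0 - weight_short) * gaussian_kernel(x, y, sigma_long)
    )


# Two locations 10 distance units apart (hypothetical coordinates).
a, b = (0.0, 0.0), (10.0, 0.0)
k_far = multiscale_kernel(a, b, sigma_short=1.0, sigma_long=100.0, weight_short=0.5)
k_same = multiscale_kernel(a, a, sigma_short=1.0, sigma_long=100.0, weight_short=0.5)
print(k_far, k_same)
```

At this separation the short-scale term is negligible and only the large-scale term contributes, so nearby anomalies and regional trends are captured by different components of the same kernel.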
Abstract:
Knowledge of the spatial distribution of hydraulic conductivity (K) within an aquifer is critical for reliable predictions of solute transport and the development of effective groundwater management and/or remediation strategies. While core analyses and hydraulic logging can provide highly detailed information, such information is inherently localized around boreholes that tend to be sparsely distributed throughout the aquifer volume. Conversely, larger-scale hydraulic experiments like pumping and tracer tests provide relatively low-resolution estimates of K in the investigated subsurface region. As a result, traditional hydrogeological measurement techniques contain a gap in terms of spatial resolution and coverage, and on their own they are often inadequate for characterizing heterogeneous aquifers. Geophysical methods have the potential to bridge this gap. The recent increased interest in the application of geophysical methods to hydrogeological problems is clearly evidenced by the formation and rapid growth of the domain of hydrogeophysics over the past decade (e.g., Rubin and Hubbard, 2005).