853 resultados para Unsupervised clustering
Resumo:
La présente étude est à la fois une évaluation du processus de la mise en oeuvre et des impacts de la police de proximité dans les cinq plus grandes zones urbaines de Suisse - Bâle, Berne, Genève, Lausanne et Zurich. La police de proximité (community policing) est à la fois une philosophie et une stratégie organisationnelle qui favorise un partenariat renouvelé entre la police et les communautés locales dans le but de résoudre les problèmes relatifs à la sécurité et à l'ordre public. L'évaluation de processus a analysé des données relatives aux réformes internes de la police qui ont été obtenues par l'intermédiaire d'entretiens semi-structurés avec des administrateurs clés des cinq départements de police, ainsi que dans des documents écrits de la police et d'autres sources publiques. L'évaluation des impacts, quant à elle, s'est basée sur des variables contextuelles telles que des statistiques policières et des données de recensement, ainsi que sur des indicateurs d'impacts construit à partir des données du Swiss Crime Survey (SCS) relatives au sentiment d'insécurité, à la perception du désordre public et à la satisfaction de la population à l'égard de la police. Le SCS est un sondage régulier qui a permis d'interroger des habitants des cinq grandes zones urbaines à plusieurs reprises depuis le milieu des années 1980. L'évaluation de processus a abouti à un « Calendrier des activités » visant à créer des données de panel permettant de mesurer les progrès réalisés dans la mise en oeuvre de la police de proximité à l'aide d'une grille d'évaluation à six dimensions à des intervalles de cinq ans entre 1990 et 2010. L'évaluation des impacts, effectuée ex post facto, a utilisé un concept de recherche non-expérimental (observational design) dans le but d'analyser les impacts de différents modèles de police de proximité dans des zones comparables à travers les cinq villes étudiées. Les quartiers urbains, délimités par zone de code postal, ont ainsi été regroupés par l'intermédiaire d'une typologie réalisée à l'aide d'algorithmes d'apprentissage automatique (machine learning). Des algorithmes supervisés et non supervisés ont été utilisés sur les données à haute dimensionnalité relatives à la criminalité, à la structure socio-économique et démographique et au cadre bâti dans le but de regrouper les quartiers urbains les plus similaires dans des clusters. D'abord, les cartes auto-organisatrices (self-organizing maps) ont été utilisées dans le but de réduire la variance intra-cluster des variables contextuelles et de maximiser simultanément la variance inter-cluster des réponses au sondage. Ensuite, l'algorithme des forêts d'arbres décisionnels (random forests) a permis à la fois d'évaluer la pertinence de la typologie de quartier élaborée et de sélectionner les variables contextuelles clés afin de construire un modèle parcimonieux faisant un minimum d'erreurs de classification. Enfin, pour l'analyse des impacts, la méthode des appariements des coefficients de propension (propensity score matching) a été utilisée pour équilibrer les échantillons prétest-posttest en termes d'âge, de sexe et de niveau d'éducation des répondants au sein de chaque type de quartier ainsi identifié dans chacune des villes, avant d'effectuer un test statistique de la différence observée dans les indicateurs d'impacts. De plus, tous les résultats statistiquement significatifs ont été soumis à une analyse de sensibilité (sensitivity analysis) afin d'évaluer leur robustesse face à un biais potentiel dû à des covariables non observées. L'étude relève qu'au cours des quinze dernières années, les cinq services de police ont entamé des réformes majeures de leur organisation ainsi que de leurs stratégies opérationnelles et qu'ils ont noué des partenariats stratégiques afin de mettre en oeuvre la police de proximité. La typologie de quartier développée a abouti à une réduction de la variance intra-cluster des variables contextuelles et permet d'expliquer une partie significative de la variance inter-cluster des indicateurs d'impacts avant la mise en oeuvre du traitement. Ceci semble suggérer que les méthodes de géocomputation aident à équilibrer les covariables observées et donc à réduire les menaces relatives à la validité interne d'un concept de recherche non-expérimental. Enfin, l'analyse des impacts a révélé que le sentiment d'insécurité a diminué de manière significative pendant la période 2000-2005 dans les quartiers se trouvant à l'intérieur et autour des centres-villes de Berne et de Zurich. Ces améliorations sont assez robustes face à des biais dus à des covariables inobservées et covarient dans le temps et l'espace avec la mise en oeuvre de la police de proximité. L'hypothèse alternative envisageant que les diminutions observées dans le sentiment d'insécurité soient, partiellement, un résultat des interventions policières de proximité semble donc être aussi plausible que l'hypothèse nulle considérant l'absence absolue d'effet. Ceci, même si le concept de recherche non-expérimental mis en oeuvre ne peut pas complètement exclure la sélection et la régression à la moyenne comme explications alternatives. The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources. The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.
Resumo:
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. We have developed for this purpose a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications, requiring a local installation. The program written in C++ can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
Resumo:
We develop a full theoretical approach to clustering in complex networks. A key concept is introduced, the edge multiplicity, that measures the number of triangles passing through an edge. This quantity extends the clustering coefficient in that it involves the properties of two¿and not just one¿vertices. The formalism is completed with the definition of a three-vertex correlation function, which is the fundamental quantity describing the properties of clustered networks. The formalism suggests different metrics that are able to thoroughly characterize transitive relations. A rigorous analysis of several real networks, which makes use of this formalism and the metrics, is also provided. It is also found that clustered networks can be classified into two main groups: the weak and the strong transitivity classes. In the first class, edge multiplicity is small, with triangles being disjoint. In the second class, edge multiplicity is high and so triangles share many edges. As we shall see in the following paper, the class a network belongs to has strong implications in its percolation properties.
Resumo:
The percolation properties of clustered networks are analyzed in detail. In the case of weak clustering, we present an analytical approach that allows us to find the critical threshold and the size of the giant component. Numerical simulations confirm the accuracy of our results. In more general terms, we show that weak clustering hinders the onset of the giant component whereas strong clustering favors its appearance. This is a direct consequence of the differences in the k-core structure of the networks, which are found to be totally different depending on the level of clustering. An empirical analysis of a real social network confirms our predictions.
Resumo:
We present a generator of random networks where both the degree-dependent clustering coefficient and the degree distribution are tunable. Following the same philosophy as in the configuration model, the degree distribution and the clustering coefficient for each class of nodes of degree k are fixed ad hoc and a priori. The algorithm generates corresponding topologies by applying first a closure of triangles and second the classical closure of remaining free stubs. The procedure unveils an universal relation among clustering and degree-degree correlations for all networks, where the level of assortativity establishes an upper limit to the level of clustering. Maximum assortativity ensures no restriction on the decay of the clustering coefficient whereas disassortativity sets a stronger constraint on its behavior. Correlation measures in real networks are seen to observe this structural bound.
Resumo:
Background: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure.Results: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae.Conclusion: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Resumo:
The present research deals with the review of the analysis and modeling of Swiss franc interest rate curves (IRC) by using unsupervised (SOM, Gaussian Mixtures) and supervised machine (MLP) learning algorithms. IRC are considered as objects embedded into different feature spaces: maturities; maturity-date, parameters of Nelson-Siegel model (NSM). Analysis of NSM parameters and their temporal and clustering structures helps to understand the relevance of model and its potential use for the forecasting. Mapping of IRC in a maturity-date feature space is presented and analyzed for the visualization and forecasting purposes.
Resumo:
Thy-1 is a membrane glycoprotein suggested to stabilize or inhibit growth of neuronal processes. However, its precise function has remained obscure, because its endogenous ligand is unknown. We previously showed that Thy-1 binds directly to α(V)β(3) integrin in trans eliciting responses in astrocytes. Nonetheless, whether α(V)β(3) integrin might also serve as a Thy-1-ligand triggering a neuronal response has not been explored. Thus, utilizing primary neurons and a neuron-derived cell line CAD, Thy-1-mediated effects of α(V)β(3) integrin on growth and retraction of neuronal processes were tested. In astrocyte-neuron co-cultures, endogenous α(V)β(3) integrin restricted neurite outgrowth. Likewise, α(V)β(3)-Fc was sufficient to suppress neurite extension in Thy-1(+), but not in Thy-1(-) CAD cells. In differentiating primary neurons exposed to α(V)β(3)-Fc, fewer and shorter dendrites were detected. This effect was abolished by cleavage of Thy-1 from the neuronal surface using phosphoinositide-specific phospholipase C (PI-PLC). Moreover, α(V)β(3)-Fc also induced retraction of already extended Thy-1(+)-axon-like neurites in differentiated CAD cells as well as of axonal terminals in differentiated primary neurons. Axonal retraction occurred when redistribution and clustering of Thy-1 molecules in the plasma membrane was induced by α(V)β(3) integrin. Binding of α(V)β(3)-Fc was detected in Thy-1 clusters during axon retraction of primary neurons. Moreover, α(V)β(3)-Fc-induced Thy-1 clustering correlated in time and space with redistribution and inactivation of Src kinase. Thus, our data indicates that α(V)β(3) integrin is a ligand for Thy-1 that upon binding not only restricts the growth of neurites, but also induces retraction of already existing processes by inducing Thy-1 clustering. We propose that these events participate in bi-directional astrocyte-neuron communication relevant to axonal repair after neuronal damage.
Resumo:
We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering our approach solves a single optimization problem, rather than an ad-hoc two-stage optimization approach, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an ?out-of-sample? problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.
Resumo:
T cell receptor (TCR-CD3) triggering involves both receptor clustering and conformational changes at the cytoplasmic tails of the CD3 subunits. The mechanism by which TCRalphabeta ligand binding confers conformational changes to CD3 is unknown. By using well-defined ligands, we showed that induction of the conformational change requires both multivalent engagement and the mobility restriction of the TCR-CD3 imposed by the plasma membrane. The conformational change is elicited by cooperative rearrangements of two TCR-CD3 complexes and does not require accompanying changes in the structure of the TCRalphabeta ectodomains. This conformational change at CD3 reverts upon ligand dissociation and is required for T cell activation. Thus, our permissive geometry model provides a molecular mechanism that rationalizes how the information of ligand binding to TCRalphabeta is transmitted to the CD3 subunits and to the intracellular signaling machinery.
Resumo:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Resumo:
BACKGROUND: HCV coinfection remains a major cause of morbidity and mortality among HIV-infected individuals and its incidence has increased dramatically in HIV-infected men who have sex with men(MSM). METHODS: Hepatitis C virus (HCV) coinfection in the Swiss HIV Cohort Study(SHCS) was studied by combining clinical data with HIV-1 pol-sequences from the SHCS Drug Resistance Database(DRDB). We inferred maximum-likelihood phylogenetic trees, determined Swiss HIV-transmission pairs as monophyletic patient pairs, and then considered the distribution of HCV on those pairs. RESULTS: Among the 9748 patients in the SHCS-DRDB with known HCV status, 2768(28%) were HCV-positive. Focusing on subtype B(7644 patients), we identified 1555 potential HIV-1 transmission pairs. There, we found that, even after controlling for transmission group, calendar year, age and sex, the odds for an HCV coinfection were increased by an odds ratio (OR) of 3.2 [95% confidence interval (CI) 2.2, 4.7) if a patient clustered with another HCV-positive case. This strong association persisted if transmission groups of intravenous drug users (IDUs), MSMs and heterosexuals (HETs) were considered separately(in all cases OR>2). Finally we found that HCV incidence was increased by a hazard ratio of 2.1 (1.1, 3.8) for individuals paired with an HCV-positive partner. CONCLUSIONS: Patients whose HIV virus is closely related to the HIV virus of HIV/HCV-coinfected patients have a higher risk for carrying or acquiring HCV themselves. This indicates the occurrence of domestic and sexual HCV transmission and allows the identification of patients with a high HCV-infection risk.