175 results for agglomerative clustering
Abstract:
It is estimated that around 230 people die each year due to radon (222Rn) exposure in Switzerland. 222Rn occurs mainly in closed environments like buildings and originates primarily from the subjacent ground. It therefore depends strongly on geology and shows substantial regional variations. Correct identification of these regional variations would lead to a substantial reduction of the population's 222Rn exposure through appropriate construction of new buildings and mitigation of existing ones. Prediction of indoor 222Rn concentrations (IRC) and identification of 222Rn-prone areas is, however, difficult, since IRC depend on a variety of variables such as building characteristics, meteorology, geology and anthropogenic factors. The present work aims at the development of predictive models and the understanding of IRC in Switzerland, taking into account a maximum of information in order to minimize the prediction uncertainty. The predictive maps will be used as a decision-support tool for 222Rn risk management. The construction of these models is based on different data-driven statistical methods, in combination with geographical information systems (GIS). In a first phase we performed univariate analysis of IRC for different variables, namely the detector type, building category, foundation, year of construction, the average outdoor temperature during measurement, altitude and lithology. All variables showed significant associations with IRC. Buildings constructed after 1900 showed significantly lower IRC compared to earlier constructions. We observed a further drop of IRC after 1970. In addition, we found an association of IRC with altitude. With regard to lithology, we observed the lowest IRC in sedimentary rocks (excluding carbonates) and sediments and the highest IRC in the Jura carbonates and igneous rock. The IRC data were systematically analyzed for potential bias due to spatially unbalanced sampling of measurements.
To facilitate modeling and interpretation of the influence of geology on IRC, we developed an algorithm based on k-medoids clustering that defines geological classes that are coherent in terms of IRC. We performed a soil gas 222Rn concentration (SRC) measurement campaign to determine the predictive power of SRC with respect to IRC and found that SRC is of limited use for IRC prediction. The second part of the project was dedicated to predictive mapping of IRC using models that take into account the multidimensionality of the process of 222Rn entry into buildings. We used kernel regression and ensemble regression trees for this purpose. We could explain up to 33% of the variance of the log-transformed IRC all over Switzerland, which is a good performance compared to former attempts at IRC modeling in Switzerland. As predictor variables we considered geographical coordinates, altitude, outdoor temperature, building type, foundation, year of construction and detector type. Ensemble regression trees such as random forests make it possible to determine the role of each IRC predictor in a multidimensional setting. We found spatial information such as geology, altitude and coordinates to have a stronger influence on IRC than building-related variables such as foundation type, building type and year of construction. Based on kernel estimation we developed an approach to determine the local probability that IRC exceeds 300 Bq/m3. In addition, we developed a confidence index to provide an estimate of the uncertainty of the map. All methods allow easy creation of tailor-made maps for different building characteristics. Our work is an essential step towards a 222Rn risk assessment that accounts simultaneously for different architectural situations and for geological and geographical conditions. For communicating 222Rn hazard to the population, we recommend using the probability map based on kernel estimation.
The communication of 222Rn hazard could, for example, be implemented via a web interface where users specify the characteristics and coordinates of their home in order to obtain the probability of being above a given IRC, together with a corresponding index of confidence. Taking into account the health effects of 222Rn, our results have the potential to substantially improve the estimation of the effective dose from 222Rn delivered to the Swiss population.
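The kernel-based exceedance probability described in this abstract can be illustrated roughly as follows. This is a minimal sketch, assuming a Gaussian kernel over map coordinates and a kernel-weighted average of the indicator "IRC above 300 Bq/m3"; the function name, the bandwidth and the toy inputs are hypothetical and are not taken from the study.

```python
import math

def exceedance_probability(query, points, values, threshold=300.0, bandwidth=1.0):
    """Kernel-weighted share of nearby measurements above `threshold`.

    query     -- (x, y) location where the probability is wanted
    points    -- list of (x, y) measurement locations
    values    -- measured concentrations, same order as `points`
    bandwidth -- Gaussian kernel width (hypothetical tuning parameter)
    """
    weighted_hits = 0.0
    total_weight = 0.0
    for (x, y), value in zip(points, values):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        weight = math.exp(-d2 / (2.0 * bandwidth ** 2))
        weighted_hits += weight * (1.0 if value > threshold else 0.0)
        total_weight += weight
    # No usable neighbours (all weights underflowed): report "unknown".
    return weighted_hits / total_weight if total_weight > 0 else float("nan")
```

A building-specific map, as the abstract suggests, would amount to restricting `points` and `values` to measurements from comparable buildings before calling the function.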
Abstract:
The present research studies the spatial patterns of the distribution of the Swiss population (DSP). The description is carried out using a wide variety of global spatial structural analysis tools, such as topological, statistical and fractal measures, which enable the estimation of the degree of spatial clustering of a point pattern. Particular attention is given to multifractal analysis in order to characterize the spatial structure of the DSP at different scales. This is achieved by measuring the generalized q-dimensions and the singularity spectrum. The research is based on high-quality data from the Swiss Population Census of the year 2000 at a hectometric resolution (100 x 100 m grid) issued by the Swiss Federal Statistical Office (FSO).
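To make the generalized q-dimensions concrete, here is a minimal sketch, assuming the standard box-counting (Rényi) estimator: for each box size the point pattern is binned, the box probabilities p_i are formed, and D_q is read off as the slope of log(sum of p_i^q)/(q - 1) against log(box size). The function names and box sizes are hypothetical; the abstract does not state which estimator was actually used.

```python
import math
from collections import Counter

def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def generalized_dimension(points, q, box_sizes):
    """Box-counting estimate of the generalized dimension D_q of a 2-D point set."""
    n = len(points)
    log_eps, log_moment = [], []
    for eps in box_sizes:
        counts = Counter((math.floor(x / eps), math.floor(y / eps)) for x, y in points)
        probs = [c / n for c in counts.values()]
        if q == 1:
            # Limit q -> 1 (information dimension): sum of p * log(p).
            moment = sum(p * math.log(p) for p in probs)
        else:
            moment = math.log(sum(p ** q for p in probs)) / (q - 1)
        log_eps.append(math.log(eps))
        log_moment.append(moment)
    return _slope(log_eps, log_moment)
```

For a homogeneous pattern D_q is the same for all q; a dependence of D_q on q is what signals multifractality.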
Abstract:
A headspace solid-phase microextraction (HS-SPME) procedure was developed for the profiling of traces present in 3,4-methylenedioxymethamphetamine (MDMA). Traces were first extracted using HS-SPME and then analyzed by gas chromatography-mass spectrometry (GC-MS). The HS-SPME conditions were optimized by varying the extraction parameters. Optimal results were obtained when 40 mg of crushed MDMA sample was heated at 80 °C for 15 min, followed by extraction at 80 °C for 15 min with a polydimethylsiloxane/divinylbenzene-coated fibre. A total of 31 compounds were identified as traces related to MDMA synthesis, namely precursors, intermediates or by-products. In addition, some fatty acids used as tabletting materials and caffeine used as an adulterant were also detected. The use of a restricted set of 10 target compounds was also proposed for developing a screening tool for clustering samples with similar profiles. 114 seizures were analyzed using an SPME auto-sampler (MultiPurpose Sampler MPS2), purchased from Gerstel GmbH & Co. (Germany), coupled to GC-MS. The data were handled using various pre-treatment methods, followed by the study of similarities between sample pairs based on the Pearson correlation. The results show that HS-SPME, coupled with a suitable statistical method, is a powerful tool for distinguishing specimens coming from the same seizure from specimens coming from different seizures. This information can be used by law enforcement personnel to visualize the ecstasy distribution network as well as clandestine tablet manufacturing.
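The pairwise comparison step can be sketched as below. This is only an illustration: the abstract mentions "various pre-treatment methods" without naming them, so the pre-treatment shown here (normalisation to the total signal followed by a square root) is an assumed example, and the function names are hypothetical.

```python
import math

def pretreat(profile):
    """Hypothetical pre-treatment: share of total signal, then square root."""
    total = sum(profile)
    return [math.sqrt(peak / total) for peak in profile]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def similarity(profile_a, profile_b):
    """Correlation between two pre-treated target-compound profiles."""
    return pearson(pretreat(profile_a), pretreat(profile_b))
```

Two specimens would then be treated as linked when their similarity exceeds a threshold calibrated on samples known to come from the same seizure.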
Abstract:
Remote sensing image processing is nowadays a mature research area. The techniques developed in the field allow many real-life applications with great societal value. For instance, urban monitoring, fire detection or flood prediction can have a great impact on economic and environmental issues. To attain such objectives, the remote sensing community has turned into a multidisciplinary field of science that embraces physics, signal theory, computer science, electronics, and communications. From a machine learning and signal/image processing point of view, all the applications are tackled under specific formalisms, such as classification and clustering, regression and function approximation, image coding, restoration and enhancement, source unmixing, data fusion or feature selection and extraction. This paper serves as a survey of methods and applications, and reviews the latest methodological advances in remote sensing image processing.
Abstract:
Interviewer performance with respect to convincing sample members to participate in surveys is an important dimension of survey quality. However, unlike in CAPI surveys where each sample case 'belongs' to one interviewer, there are hardly any good measures of interviewer performance for centralised CATI surveys, where even single contacts are assigned to interviewers at random. If more than one interviewer works one sample case, it is not clear how to attribute success or failure to the interviewers involved. In this article, we propose two correlated methods to measure interviewer contact performance in centralised CATI surveys. Their modelling must take into account complex multilevel clustering effects, which need not be hierarchical. Results are consistent with findings from CAPI data modelling, and we find that, compared with these methods, a direct ('naive') measure of interviewer contact results largely underestimates interviewer random effects.
Abstract:
A three-dimensional cell culture system was used as a model to study the influence of low levels of mercury in the developing brain. Aggregating cell cultures of fetal rat telencephalon were treated with mercury for 10 days either during an early developmental period (i.e., between days 5 and 15 in vitro) or during a phase of advanced maturation (i.e., between days 25 and 35). An inorganic (HgCl2) and an organic mercury compound (monomethylmercury chloride, MeHgCl) were examined. By monitoring changes in cell type-specific enzyme activities, the concentration-dependent toxicity of the compounds was determined. In immature cultures, a general cytotoxicity was observed at 10⁻⁶ M for both mercury compounds. In these cultures, HgCl2 appeared somewhat more toxic than MeHgCl. However, no appreciable demethylation of MeHgCl could be detected, indicating similar toxic potencies for both mercury compounds. In highly differentiated cultures, by contrast, MeHgCl exhibited a higher toxic potency than HgCl2. In addition, at 10⁻⁶ M, MeHgCl showed pronounced neuron-specific toxicity. Below the cytotoxic concentrations, distinct glia-specific reactions could be observed with both mercury compounds. An increase in the immunoreactivity for glial fibrillary acidic protein, typical for gliosis, could be observed at concentrations between 10⁻⁹ M and 10⁻⁷ M in immature cultures, and between 10⁻⁸ M and 3 × 10⁻⁵ M in highly differentiated cultures. A conspicuous increase in the number and clustering of GSI-B4 lectin-binding cells, indicating a microglial response, was found at concentrations between 10⁻¹⁰ M and 10⁻⁷ M. These development-dependent and cell type-specific effects may reflect the pathogenic potential of long-term exposure to subclinical doses of mercury.
Abstract:
Single-trial analysis of human electroencephalography (EEG) has been recently proposed for better understanding the contribution of individual subjects to a group-analysis effect as well as for investigating single-subject mechanisms. Independent Component Analysis (ICA) has been repeatedly applied to concatenated single-trial responses and at a single-subject level in order to extract those components that resemble activities of interest. More recently we have proposed a single-trial method based on topographic maps that determines which voltage configurations are reliably observed at the event-related potential (ERP) level taking advantage of repetitions across trials. Here, we investigated the correspondence between the maps obtained by ICA versus the topographies that we obtained by the single-trial clustering algorithm that best explained the variance of the ERP. To do this, we used exemplar data provided from the EEGLAB website that are based on a dataset from a visual target detection task. We show there to be robust correspondence both at the level of the activation time courses and at the level of voltage configurations of a subset of relevant maps. We additionally show the estimated inverse solution (based on low-resolution electromagnetic tomography) of two corresponding maps occurring at approximately 300 ms post-stimulus onset, as estimated by the two aforementioned approaches. The spatial distribution of the estimated sources significantly correlated and had in common a right parietal activation within Brodmann's Area (BA) 40. Despite their differences in terms of theoretical bases, the consistency between the results of these two approaches shows that their underlying assumptions are indeed compatible.
Abstract:
Coordination games are important for explaining efficient and desirable social behavior. Here we study these games by extensive numerical simulation on networked social structures using an evolutionary approach. We show that local network effects may promote selection of efficient equilibria in both pure and general coordination games and may explain social polarization. These results are put into perspective with respect to known theoretical results. The main insight we obtain is that clustering, and especially community structure, in social networks plays a positive role in promoting socially efficient outcomes.
Abstract:
Defining the limits of an urban agglomeration is essential for both fundamental and applied studies in quantitative and theoretical geography. A simple and consistent way of defining such urban clusters is important for performing statistical analyses and comparisons. Traditionally, agglomerations are defined using a rather qualitative approach based on various statistical measures. This definition generally varies from one country to another, and the data taken into account differ. In this paper, we explore the use of the City Clustering Algorithm (CCA) for defining agglomerations in Switzerland. This algorithm provides a systematic and easy way to define an urban area based only on population data. The CCA allows the spatial resolution used for defining the urban clusters to be specified. The results from different resolutions are compared and analysed, and the effect of filtering the data is investigated. Different scales and parameters highlight different phenomena. The study of Zipf's law using the visual rank-size rule shows that it is valid only for some specific urban clusters, within a narrow range of the spatial resolution of the CCA. The scale at which one main cluster emerges can also be found in the analysis using Zipf's law. The study of the urban clusters at different scales using the lacunarity measure, a complementary measure to the fractal dimension, highlights the change of scale at a given range.
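The idea behind the CCA can be sketched as follows. This is a minimal sketch, assuming the common formulation in which populated grid cells lying within a distance `ell` of each other are merged into one cluster; the function name, the `min_pop` filter and the cell-index representation are hypothetical and may differ from the implementation used in the paper.

```python
from collections import deque

def city_clusters(population, ell=1.0, min_pop=1):
    """Group populated grid cells into clusters.

    population -- dict mapping (col, row) cell indices to inhabitants
    ell        -- merging distance in cell units (the spatial resolution)
    min_pop    -- cells below this population are filtered out
    """
    occupied = {cell for cell, pop in population.items() if pop >= min_pop}
    reach = int(ell)
    # Relative positions of all cells within Euclidean distance ell.
    offsets = [(dx, dy)
               for dx in range(-reach, reach + 1)
               for dy in range(-reach, reach + 1)
               if (dx, dy) != (0, 0) and dx * dx + dy * dy <= ell * ell]
    seen = set()
    clusters = []
    for start in sorted(occupied):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cells = []
        while queue:  # breadth-first growth of one cluster
            x, y = queue.popleft()
            cells.append((x, y))
            for dx, dy in offsets:
                neighbour = (x + dx, y + dy)
                if neighbour in occupied and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cells)
    return clusters
```

Summing the population over each returned cluster and ranking the totals gives the rank-size distribution on which Zipf's law can be examined; repeating this for several values of `ell` shows how the result depends on resolution.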
Abstract:
The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TCGA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported a subgroup defined by an EMT expression signature. We performed a prior-free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and compared our findings with established molecular determinants and with clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with a literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features differed significantly between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity.
Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026. © 2013 Swiss Institute of Bioinformatics. Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
Abstract:
Aquatic nymphs of mayflies (Ephemeroptera) colonize all types of freshwaters throughout the world and are extensively used as bio-indicators of water quality. Rhithrogena (Heptageniidae) is the second most species-rich genus of mayflies, and several European species have restricted distributions in sensitive Alpine environments and therefore are of conservation interest. The European Rhithrogena species are arranged into "species groups" that are easily identifiable. However, despite their ecological and conservation importance, ambiguous morphological differences among many species suggest that the current taxonomy may not accurately reflect their evolutionary diversity. Moreover, no information about their relationships, origin, timing of speciation and mechanisms promoting their successful diversification in the Alps is available. We first examined the species status of ca. 50% of European Rhithrogena diversity using a widespread sampling scheme of Alpine species that included 22 type localities, and a general mixed Yule-coalescent (GMYC) model analysis of one standard mitochondrial (cox1) and one newly developed nuclear marker. We observed significant clustering of cox1 into 31 GMYC species, and our results strongly suggest the presence of both cryptic diversity and taxonomic oversplitting in Rhithrogena. Phylogenetic analyses recovered four of the six recognized species groups in our samples as monophyletic.
The DNA taxonomy developed here lays the groundwork for a future revision of this important but cryptic genus in Europe. We then conducted a species-level, multiple-gene phylogenetic study of European Rhithrogena. Data from three nuclear and two mitochondrial loci were broadly congruent, and species-level relationships were well resolved within most species groups in a combined analysis. In the absence of external calibration points like fossils, we applied a standard insect molecular clock hypothesis to our mitochondrial data, suggesting an origin of Alpine Rhithrogena at the Oligocene/Miocene boundary. Our results highlighted the preponderant role that Quaternary glaciations played in their diversification, promoting speciation of at least half of the current diversity in the Alps. Madagascar's biodiversity and endemism are among the most extraordinary and endangered in the world. This includes the island's freshwater biodiversity, although detailed knowledge of the diversity, endemism, and biogeographic origin of freshwater invertebrates is lacking. Many mayfly species are thought to be restricted to single river basins (microendemic species) in forested areas, making them particularly sensitive to habitat reduction and degradation. The Heptageniidae are practically unknown in Madagascar except for two described species, Afronurus matitensis and Compsoneuria josettae. Both genera have a disjunct distribution in Africa, Madagascar and Southeast Asia, and a complex taxonomic status still in flux. The standard approach to understanding their diversity, endemism, and origin would require extensive field sampling on several continents and years of taxonomic work. Here we circumvent this using museum collections and freshly collected individuals in a combined approach of DNA taxonomy and phylogeny. The cox1-based GMYC analysis revealed 14 putative species on Madagascar, 70% of which are potentially microendemic.
A phylogenetic analysis that included African and Asian species and data from two mitochondrial and four nuclear loci indicated the Malagasy Heptageniidae are monophyletic and sister to African Compsoneuria. The observed monophyly and high microendemism highlight their conservation importance. Our results also underline the important role that museum collections can play in molecular studies, especially in critically endangered biodiversity hotspots like Madagascar.
Abstract:
This paper examines a dataset derived from observational tracking in order to analyze where and how middle-class working families spend time at home. We use an ethnographic approach to study the everyday lives of Italian dual-income middle-class families, with the aim of analyzing quantitatively the use of home spaces and the types of activities of family members on weekday afternoons and evenings. The different analyses (multiple correspondence analysis, agglomerative hierarchical clustering, discriminant analysis) show how particular spaces, and activities in these spaces, are dominated by certain family members. We suggest a combination of qualitative and quantitative methodologies as a useful tool to explore in detail the everyday lives of families and to understand how family members use domestic spaces. In particular, we consider the use of quantitative analyses to examine ethnographic data relevant, especially in connection with methodological reflexivity among researchers.
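For readers unfamiliar with agglomerative hierarchical clustering, the procedure can be sketched as below. This is a minimal sketch, assuming Euclidean distances and average linkage; the abstract does not say which distance or linkage the authors used, and the function name and stopping rule (a fixed number of clusters) are hypothetical. It is a naive implementation meant for small inputs only.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain.

    points -- list of coordinate tuples
    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def average_linkage(c1, c2):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters
```

Recording the merge distance at each step, rather than stopping at a fixed count, would give the dendrogram that such analyses usually report.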
Abstract:
The giant hogweed (Heracleum mantegazzianum) has successfully invaded 19 European countries as well as parts of North America. It has become a problematic species due to its ability to displace native flora and to cause public health hazards. Applying population genetics to species invasion can help reconstruct invasion history and may promote more efficient management practice. We thus analysed levels of genetic variation and population genetic structure of H. mantegazzianum in an invaded area of the western Swiss Alps as well as in its native range (the Caucasus), using eight nuclear microsatellite loci together with plastid DNA markers and sequences. On both nuclear and plastid genomes, native populations exhibited significantly higher levels of genetic diversity compared to invasive populations, confirming an important founder event during the invasion process. Invasive populations were also significantly more differentiated than native populations. Bayesian clustering analysis identified five clusters in the native range that corresponded to geographically and ecologically separated groups. In the invaded range, 10 clusters occurred. Unlike native populations, invasive clusters were characterized by a mosaic pattern in the landscape, possibly caused by anthropogenic dispersal of the species via roads and direct collection for ornamental purposes. Lastly, our analyses revealed four main divergent groups in the western Swiss Alps, likely as a consequence of multiple independent establishments of H. mantegazzianum.
Abstract:
Microarray gene expression profiles of fresh clinical samples of chronic myeloid leukaemia in chronic phase, acute promyelocytic leukaemia and acute monocytic leukaemia were compared with profiles from cell lines representing the corresponding types of leukaemia (K562, NB4, HL60). In a hierarchical clustering analysis, all clinical samples clustered separately from the cell lines, regardless of leukaemic subtype. Gene ontology analysis showed that cell lines chiefly overexpressed genes related to macromolecular metabolism, whereas in clinical samples genes related to the immune response were abundantly expressed. These findings must be taken into consideration when conclusions from cell line-based studies are extrapolated to patients.
Abstract:
The ability to obtain gene expression profiles from human disease specimens provides an opportunity to identify relevant gene pathways, but is limited by the absence of data sets spanning a broad range of conditions. Here, we analyzed publicly available microarray data from 16 diverse skin conditions in order to gain insight into disease pathogenesis. Unsupervised hierarchical clustering separated samples by disease as well as common cellular and molecular pathways. Disease-specific signatures were leveraged to build a multi-disease classifier, which predicted the diagnosis of publicly and prospectively collected expression profiles with 93% accuracy. In one sample, the molecular classifier differed from the initial clinical diagnosis and correctly predicted the eventual diagnosis as the clinical presentation evolved. Finally, integration of IFN-regulated gene programs with the skin database revealed a significant inverse correlation between IFN-β and IFN-γ programs across all conditions. Our study provides an integrative approach to the study of gene signatures from multiple skin conditions, elucidating mechanisms of disease pathogenesis. In addition, these studies provide a framework for developing tools for personalized medicine toward the precise prediction, prevention, and treatment of disease on an individual level.