39 results for constrained clustering

at Université de Lausanne, Switzerland


Relevance:

20.00%

Publisher:

Abstract:

The long term goal of this research is to develop a program able to produce an automatic segmentation and categorization of textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to produce a categorization of sequences, such as narrative, argumentative, descriptive and so on. Also, this work aims at investigating a possible convergence between the typological approach developed in particular in the field of text and discourse analysis in French by Adam (2008) and Bronckart (1997) and unsupervised statistical learning.

Relevance:

20.00%

Publisher:

Abstract:

Specific properties emerge from the structure of large networks, such as that of worldwide air traffic, including a highly hierarchical node structure and multi-level small world sub-groups that strongly influence future dynamics. We have developed clustering methods to understand the form of these structures, to identify structural properties, and to evaluate the effects of these properties. Graph clustering methods are often constructed from different components: a metric, a clustering index, and a modularity measure to assess the quality of a clustering method. To understand the impact of each of these components on the clustering method, we explore and compare different combinations. These different combinations are used to compare multilevel clustering methods to delineate the effects of geographical distance, hubs, network densities, and bridges on worldwide air passenger traffic. The ultimate goal of this methodological research is to demonstrate evidence of combined effects in the development of an air traffic network. In fact, the network can be divided into different levels of "cohesion", which can be qualified and measured by comparative studies (Newman, 2002; Guimera et al., 2005; Sales-Pardo et al., 2007).
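As an illustration of the "modularity measure" component mentioned in this abstract, here is a minimal sketch that assumes the standard Newman-Girvan modularity of an undirected, unweighted graph. The abstract does not say which variant the authors use. The function name `modularity` and the toy six-node network are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition (assumed variant).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where L_c counts edges inside c, d_c is the total degree of c,
    and m is the number of edges.
    """
    m = 0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy network: two triangles joined by one bridge edge (C-D).
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy network at the bridge gives a higher Q than keeping all nodes in a single community, which is the sense in which modularity can rank competing clusterings.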

Relevance:

20.00%

Publisher:

Abstract:

The distribution of socio-economic features in urban space is an important source of information for land and transportation planning. The metropolization phenomenon has changed the spatial distribution of types of professions and has given rise to different spatial patterns that the urban planner must know in order to plan a sustainable city. Such distributions can be discovered by statistical and learning algorithms through different methods. In this paper, an unsupervised classification method and a cluster detection method are discussed and applied to analyze the socio-economic structure of Switzerland. The unsupervised classification method, based on Ward's classification and self-organizing maps, is used to classify the municipalities of the country and makes it possible to reduce high-dimensional input information in order to interpret the socio-economic landscape. The cluster detection method, the spatial scan statistic, is used in a more specific manner in order to detect hot spots of certain types of service activities. The method is applied to the distribution services in the agglomeration of Lausanne. Results show the emergence of new centralities and can be analyzed in both transportation and social terms.
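To make the hot-spot detection step concrete, here is a minimal sketch of the log-likelihood ratio used to score a candidate zone in a spatial scan statistic. It assumes a Poisson model; the abstract does not state which probability model the authors chose. The function names, zone names, and counts are all hypothetical.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of one candidate zone (Poisson model, assumed).

    expected_in must be positive and scaled so that the expected counts
    over the whole study area sum to total_cases. Zones that do not
    exceed their expectation score 0.0, so only hot spots are flagged.
    """
    c, e, big_c = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if big_c > c:
        llr += (big_c - c) * math.log((big_c - c) / (big_c - e))
    return llr


def most_likely_cluster(zones, total_cases):
    """zones: dict name -> (observed, expected). Returns (name, llr)."""
    scored = ((name, poisson_scan_llr(c, e, total_cases))
              for name, (c, e) in zones.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical counts of service establishments per candidate zone.
zones = {"centre": (30, 15.0), "west": (12, 12.0), "north": (8, 10.0)}
best_zone, best_llr = most_likely_cluster(zones, total_cases=100)
```

In a full analysis the candidate zones are scanning windows of varying size, and the significance of the best score is usually assessed by Monte Carlo replication under the null hypothesis; both steps are omitted here.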

Relevance:

20.00%

Publisher:

Abstract:

A methodology of exploratory data analysis investigating the phenomenon of orographic precipitation enhancement is proposed. The precipitation observations obtained from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels from radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from subsequent radar images and integrated into a set of topographic features to highlight the slopes exposed to main flows. Following the exploratory data analysis with a recent algorithm of spectral clustering, it is shown that orographic precipitation cells are generated under specific flow and topographic conditions. Repeatability of precipitation patterns in particular spatial locations is found to be linked to specific local terrain shapes, e.g. at the top of hills and on the upwind side of the mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society.

Relevance:

20.00%

Publisher:

Abstract:

Inspired by experiments that use single-particle tracking to measure the regions of confinement of selected chromosomal regions within cell nuclei, we have developed an analytical approach that takes into account various possible positions and shapes of the confinement regions. We show, in particular, that confinement of a particle into a subregion that is entirely enclosed within a spherical volume can lead to a higher limit of the mean radial square displacement value than the one associated with a particle that can explore the entire spherical volume. Finally, we apply the theory to analyse the motion of extrachromosomal chromatin rings within nuclei of living yeast.

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVE: This study assessed clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruits/vegetables intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. 18,005 subjects (8052 men and 9953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruits/vegetables intake, and high alcohol consumption than non- and ex-smokers. Frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women for corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and intervention with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
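For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, here is a minimal sketch of the unadjusted calculation from a 2x2 table, using the usual Wald interval on the log scale. The study reports odds ratios adjusted for age, nationality, and educational level, which this sketch does not reproduce. The function name and the counts are hypothetical and are not taken from the survey.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    All four cell counts must be positive. z=1.96 gives an
    approximate 95% interval.
    """
    a, b = exposed_cases, exposed_noncases
    c, d = unexposed_cases, unexposed_noncases
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: heavy smokers vs non-smokers,
# with / without two or more other risk behaviors.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

An interval that excludes 1 indicates an association at the chosen level; in the abstract, the intervals for ex-smokers and for male light smokers include 1 (or touch it), while those for moderate and heavy smokers do not.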

Relevance:

20.00%

Publisher:

Abstract:

This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed application that runs on top of P2P networks. Typical P2P applications are video streaming, file sharing, etc. While interesting because they are fully distributed, P2P applications suffer from several deployment problems, due to the nature of the environment in which they run. Indeed, defining an application on top of a P2P network often means defining an application where peers contribute resources in exchange for their ability to use the P2P application. For example, in a P2P file-sharing application, while the user is downloading a file, the P2P application is in parallel serving that file to other users. Such peers could have limited hardware resources, e.g., CPU, bandwidth and memory, or the end-user could decide a priori to limit the resources dedicated to the P2P application. In addition, a P2P network is typically immersed in an unreliable environment, where communication links and processes are subject to message losses and crashes, respectively. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution increases availability and reduces communication overhead. The broadcast solutions aim at providing a communication substrate encapsulating one of the key communication paradigms used by P2P applications: broadcast. Our broadcast solutions typically aim at offering reliability and scalability to some upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer.
Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment. Each protocol is evaluated through a set of simulations. The adaptiveness of our solutions relies on the fact that they take into account the constraints of the underlying system in a proactive manner. To model these constraints, we define an environment approximation algorithm that gives us an approximate view of the system or part of it. This approximate view includes the topology and the reliability of the components, expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays that maximize the broadcast reliability. Here, the broadcast reliability is expressed as a function of the reliability of the selected paths and of the use of available resources. These resources are modeled in terms of message quotas reflecting the receiving and sending capacities at each node. To allow deployment in a large-scale system, we take into account the memory available at processes by limiting the view they have to maintain about the system. Using this partial view, we propose three scalable broadcast algorithms, which are based on a propagation overlay that tends to the global tree overlay and adapts to some constraints of the underlying system. At a higher level, this thesis also proposes a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account, in order to maximize reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed, in order to define a set of replicas that minimizes communication cost.
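The idea of routing broadcast messages along the most reliable paths can be sketched as follows. This is not the thesis's protocol: it is a simplified illustration that assumes independent link reliabilities, ignores the message quotas and partial views described above, and simply builds the tree of most reliable paths from a source with a Dijkstra-style search. The function name and the toy network are hypothetical.

```python
import heapq


def most_reliable_tree(links, source):
    """Tree of maximum-reliability paths from `source` (illustrative only).

    links: dict node -> dict neighbour -> link reliability in (0, 1].
    Path reliability is the product of its link reliabilities, assuming
    independent failures. Returns (best, parent): the best reliability
    reached for each node and its parent in the resulting tree.
    """
    best = {source: 1.0}
    parent = {source: None}
    heap = [(-1.0, source)]  # max-heap emulated with negated reliabilities
    done = set()
    while heap:
        neg_rel, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for neighbour, p in links.get(node, {}).items():
            rel = -neg_rel * p
            if rel > best.get(neighbour, 0.0):
                best[neighbour] = rel
                parent[neighbour] = node
                heapq.heappush(heap, (-rel, neighbour))
    return best, parent


# Hypothetical overlay: the direct link a->c is less reliable than a->b->c.
links = {"a": {"b": 0.9, "c": 0.5}, "b": {"c": 0.9}, "c": {}}
best, parent = most_reliable_tree(links, "a")
```

Because multiplying by a reliability of at most 1 never increases a path's value, the greedy search is valid here in the same way that Dijkstra's algorithm is valid for non-negative edge lengths.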

Relevance:

20.00%

Publisher:

Abstract:

The importance of competition between similar species in driving community assembly is much debated. Recently, phylogenetic patterns in species composition have been investigated to help resolve this question: phylogenetic clustering is taken to imply environmental filtering, and phylogenetic overdispersion to indicate limiting similarity between species. We used experimental plant communities with random species compositions and initially even abundance distributions to examine the development of phylogenetic pattern in species abundance distributions. Where composition was held constant by weeding, abundance distributions became overdispersed through time, but only in communities that contained distantly related clades, some with several species (i.e., a mix of closely and distantly related species). Phylogenetic pattern in composition therefore constrained the development of overdispersed abundance distributions, and this might indicate limiting similarity between close relatives and facilitation/complementarity between distant relatives. Comparing the phylogenetic patterns in these communities with those expected from the monoculture abundances of the constituent species revealed that interspecific competition caused the phylogenetic patterns. Opening experimental communities to colonization by all species in the species pool led to convergence in phylogenetic diversity. At convergence, communities were composed of several distantly related but species-rich clades and had overdispersed abundance distributions. This suggests that limiting similarity processes determine which species dominate a community but not which species occur in a community. Crucially, as our study was carried out in experimental communities, we could rule out local evolutionary or dispersal explanations for the patterns and identify ecological processes as the driving force, underlining the advantages of studying these processes in experimental communities. 
Our results show that phylogenetic relations between species provide a good guide to understanding community structure and add a new perspective to the evidence that niche complementarity is critical in driving community assembly.

Relevance:

20.00%

Publisher:

Abstract:

To cluster textual sequence types (discourse types/modes) in French texts, a K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For high-dimensional embeddings, power transformations on the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
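The chi-squared dissimilarity between clause profiles can be sketched as below. This is a minimal illustration under stated assumptions: it uses the usual squared chi-squared dissimilarity between row profiles of a clause-by-n-gram count table, and it models the "power transformation" as raising that dissimilarity to an exponent `power`, which may differ from the authors' exact definition. The function name and the toy counts are hypothetical.

```python
def chi2_dissimilarities(counts, power=1.0):
    """Pairwise chi-squared dissimilarities between rows of a count table.

    counts: list of rows (one per clause), each a list of n-gram counts;
    every row must have a positive total.
    D_ij = sum_k (f_ik - f_jk)^2 / rho_k, where f_ik is the share of
    n-gram k within clause i and rho_k is its overall relative frequency.
    The result is raised to `power` (assumed form of the transformation).
    """
    n_rows, n_cols = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)
    col_weight = [sum(row[k] for row in counts) / total
                  for k in range(n_cols)]
    profiles = [[x / sum(row) for x in row] for row in counts]
    dist = [[0.0] * n_rows for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            d = sum((profiles[i][k] - profiles[j][k]) ** 2 / col_weight[k]
                    for k in range(n_cols) if col_weight[k] > 0)
            dist[i][j] = dist[j][i] = d ** power
    return dist


# Hypothetical POS unigram counts (NOUN, VERB, ADJ) for three clauses.
counts = [[4, 1, 0], [3, 2, 0], [0, 1, 4]]
plain = chi2_dissimilarities(counts)
damped = chi2_dissimilarities(counts, power=0.5)
```

An exponent below 1 compresses large dissimilarities relative to small ones while preserving their ordering; the resulting matrix could then be passed to a clustering routine such as K-means on an embedding of the clauses.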

Relevance:

20.00%

Publisher:

Abstract:

AIMS/HYPOTHESIS: The metabolic syndrome comprises a clustering of cardiovascular risk factors, but the underlying mechanism is not known. Mice with targeted disruption of endothelial nitric oxide synthase (eNOS) are hypertensive and insulin resistant. We asked whether eNOS deficiency in mice is associated with a phenotype mimicking the human metabolic syndrome. METHODS AND RESULTS: In addition to arterial pressure and insulin sensitivity (euglycaemic hyperinsulinaemic clamp), we measured the plasma concentration of leptin, insulin, cholesterol, triglycerides, free fatty acids, fibrinogen and uric acid in 10- to 12-week-old eNOS-/- and wild-type mice. We also assessed glucose tolerance under basal conditions and following a metabolic stress with a high-fat diet. As expected, eNOS-/- mice were hypertensive and insulin resistant, as evidenced by fasting hyperinsulinaemia and a roughly 30% lower steady-state glucose infusion rate during the clamp. eNOS-/- mice had a 1.5- to 2-fold elevation of the cholesterol, triglyceride and free fatty acid plasma concentrations. Even though body weight was comparable, the leptin plasma level was 30% higher in eNOS-/- than in wild-type mice. Finally, uric acid and fibrinogen were elevated in the eNOS-/- mice. Whereas glucose tolerance was comparable in knockout and control mice under basal conditions, on a high-fat diet knockout mice became significantly more glucose intolerant than control mice. CONCLUSIONS: A single gene defect, eNOS deficiency, causes a clustering of cardiovascular risk factors in young mice. We speculate that defective nitric oxide synthesis could trigger many of the abnormalities making up the metabolic syndrome in humans.

Relevance:

20.00%

Publisher:

Abstract:

BACKGROUND: Little is known about engagement in multiple health behaviours in childhood cancer survivors. METHODS: Using latent class analysis, we identified health behaviour patterns in 835 adult survivors of childhood cancer (age 20-35 years) and 1670 age- and sex-matched controls from the general population. Behaviour groups were determined from replies to questions on smoking, drinking, cannabis use, sporting activities, diet, sun protection and skin examination. RESULTS: The model identified four health behaviour patterns: 'risk-avoidance', with generally healthy behaviour; 'moderate drinking', with higher levels of sporting activities but moderate alcohol consumption; 'risk-taking', engaging in several risk behaviours; and 'smoking', smoking but not drinking. Similar proportions of survivors and controls fell into the 'risk-avoidance' (42% vs 44%) and the 'risk-taking' cluster (14% vs 12%), but more survivors were in the 'moderate drinking' (39% vs 28%) and fewer in the 'smoking' cluster (5% vs 16%). Determinants of health behaviour clusters were gender, migration background, income and therapy. CONCLUSION: The proportion of childhood cancer survivors engaging in multiple health-compromising behaviours is comparable to that in the general population. Because of the increased vulnerability of survivors, multiple risk behaviours should be addressed in targeted health interventions.