83 resultados para Labeling hierarchical clustering

em CentAUR: Central Archive University of Reading - UK


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a hierarchical clustering method for semantic Web service discovery. This method aims to improve the accuracy and efficiency of the traditional service discovery using vector space model. The Web service is converted into a standard vector format through the Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to make semantic expansion to meet the user’s service request. The process and algorithm of hierarchical clustering based semantic Web service discovery is discussed. Validation is carried out on the dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cognitive experiments involving motor execution (ME) and motor imagery (MI) have been intensively studied using functional magnetic resonance imaging (fMRI). However, the functional networks of a multitask paradigm which include ME and MI were not widely explored. In this article, we aimed to investigate the functional networks involved in MI and ME using a method combining the hierarchical clustering analysis (HCA) and the independent component analysis (ICA). Ten right-handed subjects were recruited to participate a multitask experiment with conditions such as visual cue, MI, ME and rest. The results showed that four activation clusters were found including parts of the visual network, ME network, the MI network and parts of the resting state network. Furthermore, the integration among these functional networks was also revealed. The findings further demonstrated that the combined HCA with ICA approach was an effective method to analyze the fMRI data of multitasks.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The diversity of social bees was assessed at 15 sites across five locations of the Nilgiri Biosphere Reserve, Western Ghats, India, from January to December 2007. We also conducted floristic analyses of local vegetation in each site using one-hectare sample plots. All woody species with a dbh (diameter at breast height) : 30 cm were recorded within the plots. A total area of 9.72 ha was assessed for floristic composition. Similarity of floristic composition between sites was determined using the Jaccard's distance measure and a dendrogram constructed based on the hierarchical clustering of floristic dissimilarities between sites. A Bee Importance Index (BII) was developed to give a measure of the bee diversity at each site. This index was a sum of the species richness of bee species in a site and their visitation frequencies to flowers, calculated as mean flower visits hour 1 within 2 focal patches within one hectare plots. The visits of bee species to flowers were also recorded. The Jaccard distance measure indicated that the montane sites were quite dissimilar to the low elevation sites in floristic diversity. The BII was 7-9 for the wet forest sites and ranged from 4-6 for drier forest sites. Seventy three plant species were identified as social bee plants and of them 45% were visited by one species of bee, 37% by two bee species and 18% by more than two bee species, indicating a certain degree of floral specialization among bees.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The polar winter stratospheric vortex is a coherent structure that undergoes different types of deformation that can be revealed by the geometric invariant moments. Three moments are used—the aspect ratio, the centroid latitude, and the area of the vortex based on stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40) project—to study sudden stratospheric warmings. Hierarchical clustering combined with data image visualization techniques is used as well. Using the gap statistic, three optimal clusters are obtained based on the three geometric moments considered here. The 850-K potential vorticity field, as well as the vertical profiles of polar temperature and zonal wind, provides evidence that the clusters represent, respectively, the undisturbed (U), displaced (D), and split (S) states of the polar vortex. This systematic method for identifying and characterizing the state of the polar vortex using objective methods is useful as a tool for analyzing observations and as a test for climate models to simulate the observations. The method correctly identifies all previously identified major warmings and also identifies significant minor warmings where the atmosphere is substantially disturbed but does not quite meet the criteria to qualify as a major stratospheric warming.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The presence of 10 virulence genes was examined using polymerase chain reaction (PCR) in 365 European O157 and non-O157 Escherichia coli isolates associated with verotoxin production. Strain-specific PCR data were analysed using hierarchical clustering. The resulting dendrogram clearly separated O157 from non-O157 strains. The former clustered typical high-risk seropathotype (SPT) A strains from all regions, including Sweden and Spain, which were homogenous by Cramer's V statistic, and strains with less typical O157 features mostly from Hungary. The non-O157 strains divided into a high-risk SPTB harbouring O26, O111 and O103 strains, a group pathogenic to pigs, and a group with few virulence genes other than for verotoxin. The data demonstrate SPT designation and selected PCR separated verotoxigenic E. coli of high and low risk to humans; although more virulence genes or pulsed-field gel electrophoresis will need to be included to separate high-risk strains further for epidemiological tracing.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The urban heat island is a well-known phenomenon that impacts a wide variety of city operations. With greater availability of cheap meteorological sensors, it is possible to measure the spatial patterns of urban atmospheric characteristics with greater resolution. To develop robust and resilient networks, recognizing sensors may malfunction, it is important to know when measurement points are providing additional information and also the minimum number of sensors needed to provide spatial information for particular applications. Here we consider the example of temperature data, and the urban heat island, through analysis of a network of sensors in the Tokyo metropolitan area (Extended METROS). The effect of reducing observation points from an existing meteorological measurement network is considered, using random sampling and sampling with clustering. The results indicated the sampling with hierarchical clustering can yield similar temperature patterns with up to a 30% reduction in measurement sites in Tokyo. The methods presented have broader utility in evaluating the robustness and resilience of existing urban temperature networks and in how networks can be enhanced by new mobile and open data sources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The clustering in time (seriality) of extratropical cyclones is responsible for large cumulative insured losses in western Europe, though surprisingly little scientific attention has been given to this important property. This study investigates and quantifies the seriality of extratropical cyclones in the Northern Hemisphere using a point-process approach. A possible mechanism for serial clustering is the time-varying effect of the large-scale flow on individual cyclone tracks. Another mechanism is the generation by one parent cyclone of one or more offspring through secondary cyclogenesis. A long cyclone-track database was constructed for extended October March winters from 1950 to 2003 using 6-h analyses of 850-mb relative vorticity derived from the NCEP NCAR reanalysis. A dispersion statistic based on the varianceto- mean ratio of monthly cyclone counts was used as a measure of clustering. It reveals extensive regions of statistically significant clustering in the European exit region of the North Atlantic storm track and over the central North Pacific. Monthly cyclone counts were regressed on time-varying teleconnection indices with a log-linear Poisson model. Five independent teleconnection patterns were found to be significant factors over Europe: the North Atlantic Oscillation (NAO), the east Atlantic pattern, the Scandinavian pattern, the east Atlantic western Russian pattern, and the polar Eurasian pattern. The NAO alone is not sufficient for explaining the variability of cyclone counts in the North Atlantic region and western Europe. Rate dependence on time-varying teleconnection indices accounts for the variability in monthly cyclone counts, and a cluster process did not need to be invoked.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We demonstrate that it is possible to link multi-chain molecular dynamics simulations with the tube model using a single chain slip-links model as a bridge. This hierarchical approach allows significant speed up of simulations, permitting us to span the time scales relevant for a comparison with the tube theory. Fitting the mean-square displacement of individual monomers in molecular dynamics simulations with the slip-spring model, we show that it is possible to predict the stress relaxation. Then, we analyze the stress relaxation from slip-spring simulations in the framework of the tube theory. In the absence of constraint release, we establish that the relaxation modulus can be decomposed as the sum of contributions from fast and longitudinal Rouse modes, and tube survival. Finally, we discuss some open questions regarding possible future directions that could be profitable in rendering the tube model quantitative, even for mildly entangled polymers

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A methodology for discovering the mechanisms and dynamics of protein clustering on solid surfaces is presented. In situ atomic force microscopy images are quantitatively compared to Monte Carlo simulations using cluster statistics to differentiate various models. We study lysozyme adsorption on mica as a model system and find that all surface-supported clusters are mobile, not just the monomers, with diffusion constant inversely related to cluster size. The surface monomer diffusion constant is measured to be D1∼9×10-16  cm2 s-1, such a low value being difficult to measure using other techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The variogram is essential for local estimation and mapping of any variable by kriging. The variogram itself must usually be estimated from sample data. The sampling density is a compromise between precision and cost, but it must be sufficiently dense to encompass the principal spatial sources of variance. A nested, multi-stage, sampling with separating distances increasing in geometric progression from stage to stage will do that. The data may then be analyzed by a hierarchical analysis of variance to estimate the components of variance for every stage, and hence lag. By accumulating the components starting from the shortest lag one obtains a rough variogram for modest effort. For balanced designs the analysis of variance is optimal; for unbalanced ones, however, these estimators are not necessarily the best, and the analysis by residual maximum likelihood (REML) will usually be preferable. The paper summarizes the underlying theory and illustrates its application with data from three surveys, one in which the design had four stages and was balanced and two implemented with unbalanced designs to economize when there were more stages. A Fortran program is available for the analysis of variance, and code for the REML analysis is listed in the paper. (c) 2005 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

An unbalanced nested sampling design was used to investigate the spatial scale of soil and herbicide interactions at the field scale. A hierarchical analysis of variance based on residual maximum likelihood (REML) was used to analyse the data and provide a first estimate of the variogram. Soil samples were taken at 108 locations at a range of separating distances in a 9 ha field to explore small and medium scale spatial variation. Soil organic matter content, pH, particle size distribution, microbial biomass and the degradation and sorption of the herbicide, isoproturon, were determined for each soil sample. A large proportion of the spatial variation in isoproturon degradation and sorption occurred at sampling intervals less than 60 m, however, the sampling design did not resolve the variation present at scales greater than this. A sampling interval of 20-25 m should ensure that the main spatial structures are identified for isoproturon degradation rate and sorption without too great a loss of information in this field.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The variogram is essential for local estimation and mapping of any variable by kriging. The variogram itself must usually be estimated from sample data. The sampling density is a compromise between precision and cost, but it must be sufficiently dense to encompass the principal spatial sources of variance. A nested, multi-stage, sampling with separating distances increasing in geometric progression from stage to stage will do that. The data may then be analyzed by a hierarchical analysis of variance to estimate the components of variance for every stage, and hence lag. By accumulating the components starting from the shortest lag one obtains a rough variogram for modest effort. For balanced designs the analysis of variance is optimal; for unbalanced ones, however, these estimators are not necessarily the best, and the analysis by residual maximum likelihood (REML) will usually be preferable. The paper summarizes the underlying theory and illustrates its application with data from three surveys, one in which the design had four stages and was balanced and two implemented with unbalanced designs to economize when there were more stages. A Fortran program is available for the analysis of variance, and code for the REML analysis is listed in the paper. (c) 2005 Elsevier Ltd. All rights reserved.