933 results for agglomerative clustering


Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we describe the approaches adopted to generate the runs submitted to ImageCLEFPhoto 2009 with the aim of promoting document diversity in the rankings. Four of our runs are text-based approaches that employ textual statistics extracted from the captions of images, i.e. MMR [1] as a state-of-the-art method for result diversification, two approaches that combine relevance information and clustering techniques, and an instantiation of the Quantum Probability Ranking Principle. The fifth run exploits visual features of the provided images to re-rank the initial results by means of Factor Analysis. The results reveal that our methods based only on text captions consistently improve the performance of the respective baselines, while the approach that combines visual features with textual statistics shows smaller improvements.
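As a rough illustration of the MMR idea mentioned in this abstract, the following is a minimal sketch of greedy Maximal Marginal Relevance re-ranking over image captions. The captions, relevance scores, the Jaccard similarity and the trade-off value `lam` are all hypothetical choices made for illustration; the abstract does not state which textual statistics or parameters the authors used.

```python
def mmr_rerank(relevance, similarity, lam=0.7, k=None):
    """Greedy MMR: repeatedly pick the item that best trades off
    relevance against similarity to the items already selected."""
    remaining = list(relevance)
    selected = []
    k = len(remaining) if k is None else k
    while remaining and len(selected) < k:
        def score(d):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical captions and relevance scores (not from the paper).
captions = {
    "a": "red bus in london street",
    "b": "red bus in london road",
    "c": "boat on the river thames",
}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}


def jaccard(x, y):
    """Token-set overlap between two captions (an assumed similarity)."""
    tx, ty = set(captions[x].split()), set(captions[y].split())
    return len(tx & ty) / len(tx | ty)


ranking = mmr_rerank(relevance, jaccard, lam=0.5)
print(ranking)
```

With `lam=1.0` the ranking reduces to plain relevance order; lowering `lam` pushes near-duplicate captions further down the list.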

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The impact of acid rock drainage (ARD) and eutrophication on microbial communities in stream sediments above and below an abandoned mine site in the Adelaide Hills, South Australia, was quantified by PLFA analysis. Multivariate analysis of water quality parameters, including anions, soluble heavy metals, pH, and conductivity, as well as total extractable metal concentrations in sediments, produced clustering of sample sites into three distinct groups. These groups corresponded with levels of nutrient enrichment and/or concentration of pollutants associated with ARD. Total PLFA concentration, which is indicative of microbial biomass, was reduced by >70% at sites along the stream between the mine site and as far as 18 km downstream. Further downstream, however, recovery of the microbial abundance was apparent, possibly reflecting a dilution effect by downstream tributaries. Total PLFA was >40% higher at, and immediately below, the mine site (0-0.1 km), compared with sites further downstream (2.5-18 km), even after accounting for differences in specific surface area of different sediment samples. The increased microbial population in the proximity of the mine source may be associated with the presence of a thriving iron-oxidizing bacteria community as a consequence of optimal conditions for these organisms, while the lower microbial population further downstream corresponded with higher sediment metal concentrations. PCA of relative abundance revealed a number of PLFAs which were most influential in discriminating between ARD-polluted sites and the rest of the sites. These PLFAs included the hydroxy fatty acids: 2OH12:0, 3OH12:0, 2OH16:0; the fungal marker: 18:2ω6; the sulfate-reducing bacteria marker 10Me16:1ω7; and the saturated fatty acids 12:0, 16:0, 18:0. Partial constrained ordination revealed that the environmental parameters with the greatest bearing on the PLFA profiles included pH, soluble aluminum, total extractable iron, and zinc.
The study demonstrated the successful application of PLFA analysis to rapidly assess the toxicity of ARD-affected waters and sediments and to differentiate this response from the effects of other pollutants, such as increased nutrients and salinity.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Data associated with germplasm collections are typically large and multivariate with a considerable number of descriptors measured on each of many accessions. Pattern analysis methods of clustering and ordination have been identified as techniques for statistically evaluating the available diversity in germplasm data. While used in many studies, the approaches have not dealt explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions). To consider the application of these techniques to germplasm evaluation data, 11328 accessions of groundnut (Arachis hypogaea L.) from the International Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the rainy and post-rainy growing seasons were used. The ordination technique of principal component analysis was used to reduce the dimensionality of the germplasm data. The identification of phenotypically similar groups of accessions within large-scale data via the computationally intensive hierarchical clustering techniques was not feasible and non-hierarchical techniques had to be used. Finite mixture models that maximise the likelihood of an accession belonging to a cluster were used to cluster the accessions in this collection. The patterns of response for the different growing seasons were found to be highly correlated. However, in relating the results to passport and other characterisation and evaluation descriptors, the observed patterns did not appear to be related to taxonomy or any other well-known characteristics of groundnut.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely, ordered multicategory and quantitative data types. While various pattern analysis techniques have been identified as suitable for analysis of the mixed data types which occur in germplasm collections, the clustering and ordination methods used often cannot deal explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11436 accessions of groundnut (Arachis hypogaea L.) from the International Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be distinguished into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties. This implies that within each of the habit types there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Information on the variation available for different plant attributes has enabled germplasm collections to be effectively utilised in plant breeding. A world-sourced collection of white clover germplasm has been developed at the White Clover Resource Centre at Glen Innes, New South Wales. This collection of 439 accessions was characterised under field conditions as a preliminary study of the genotypic variation for morphological attributes: stolon density, stolon branching, number of nodes, number of rooted nodes, stolon thickness, internode length, leaf length, plant height and plant spread, together with seasonal herbage yield. Characterisation was conducted on different batches of germplasm (subsets of accessions taken from the complete collection) over a period of five years. Inclusion of two check cultivars, Haifa and Huia, in each batch enabled adjustment of the characterisation data for year effects and attribute-by-year interaction effects. The component of variance for seasonal herbage yield among batches was large relative to that for accessions. Accession-by-experiment and accession-by-season interactions for herbage yield were not detected. Accession mean repeatability for herbage yield across seasons was intermediate (0.453). The components of genotypic variance among accessions for all attributes, except plant height, were larger than their respective standard errors. The estimates of accession mean repeatability for the attributes ranged from low (0.277 for plant height) to intermediate (0.544 for internode length). Multivariate techniques of clustering and ordination were used to investigate the diversity present among the accessions in the collection. Both cluster analysis and principal component analysis suggested that seven groups of accessions existed.
It was also proposed from the pattern analysis results that accessions from a group characterised by large leaves, tall plants and thick stolons could be crossed with accessions from a group that had above-average stolon density and stolon branching. This material could produce breeding populations to be used in recurrent selection for the development of white clover cultivars for dryland summer moisture stress environments in Australia. The germplasm collection was also found to be deficient in genotypes with high stolon density, a high number of branches, a high number of rooted nodes and large leaves. This warrants the addition of new germplasm accessions possessing these characteristics to the present germplasm collection.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper introduces a minimalistic approach to produce a visual hybrid map of a mobile robot’s working environment. The proposed system uses omnidirectional images along with odometry information to build an initial dense pose-graph map. Then a two-level hybrid map is extracted from the dense graph. The hybrid map consists of global and local levels. The global level contains a sparse topological map extracted from the initial graph using a dual clustering approach. The local level contains a spherical view stored at each node of the global level. The spherical views provide both an appearance signature for the nodes, which the robot uses to localize itself in the environment, and heading information when the robot uses the map for visual navigation. To show the usefulness of the map, an experiment was conducted in which the map was used for multiple visual navigation tasks inside an office workplace.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Time series classification has been extensively explored in many fields of study. Most methods are based on the historical or current information extracted from data. However, if interest is in a specific future time period, methods that directly relate to forecasts of time series are much more appropriate. An approach to time series classification is proposed based on a polarization measure of forecast densities of time series. By fitting autoregressive models, forecast replicates of each time series are obtained via the bias-corrected bootstrap, and a stationarity correction is considered when necessary. Kernel estimators are then employed to approximate forecast densities, and discrepancies of forecast densities of pairs of time series are estimated by a polarization measure, which evaluates the extent to which two densities overlap. Following the distributional properties of the polarization measure, a discriminant rule and a clustering method are proposed to conduct the supervised and unsupervised classification, respectively. The proposed methodology is applied to both simulated and real data sets, and the results show desirable properties.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Spreading cell fronts are essential features of development, repair and disease processes. Many mathematical models used to describe the motion of cell fronts, such as Fisher’s equation, invoke a mean–field assumption which implies that there is no spatial structure, such as cell clustering, present. Here, we examine the presence of spatial structure using a combination of in vitro circular barrier assays, discrete random walk simulations and pair correlation functions. In particular, we analyse discrete simulation data using pair correlation functions to show that spatial structure can form in a spreading population of cells either through sufficiently strong cell–to–cell adhesion or sufficiently rapid cell proliferation. We analyse images from a circular barrier assay describing the spreading of a population of MM127 melanoma cells using the same pair correlation functions. Our results indicate that the spreading melanoma cell populations remain very close to spatially uniform, suggesting that the strength of cell–to–cell adhesion and the rate of cell proliferation are both sufficiently small so as not to induce any spatial patterning in the spreading populations.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The primary aim of this paper was to investigate heterogeneity in language abilities of children with a confirmed diagnosis of an ASD (N = 20) and children with typical development (TD; N = 15). Group comparisons revealed no differences between ASD and TD participants on standard clinical assessments of language ability, reading ability or nonverbal intelligence. However, a hierarchical cluster analysis based on spoken nonword repetition and sentence repetition identified two clusters within the combined group of ASD and TD participants. The first cluster (N = 6) presented with significantly poorer performances than the second cluster (N = 29) on both of the clustering variables in addition to single word and nonword reading. The significant differences between the two clusters occur within a context of Cluster 1 having language impairment and a tendency towards more severe autistic symptomatology. Differences between the oral language abilities of the first and second clusters are considered in light of diagnosis, attention and verbal short term memory skills and reading impairment.
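To make the hierarchical cluster analysis in this abstract more concrete, here is a minimal sketch of naive agglomerative clustering on two variables. The abstract does not report the linkage rule, the distance measure or any raw scores, so the average linkage, the Euclidean distance and the seven score pairs below are assumptions made purely for illustration.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Starts with one cluster per point and merges the two closest
    clusters until only n_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical (nonword repetition, sentence repetition) scores.
scores = [(12, 10), (11, 9), (13, 11), (30, 28), (29, 31), (31, 30), (28, 29)]
groups = agglomerative(scores, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

This brute-force version recomputes every linkage at each step, which is acceptable for a few dozen participants but scales poorly with sample size.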

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Variations that exist in the treatment of patients (with similar symptoms) across different hospitals do substantially impact the quality and costs of healthcare. Consequently, it is important to understand the similarities and differences between the practices across different hospitals. This paper presents a case study on the application of process mining techniques to measure and quantify the differences in the treatment of patients presenting with chest pain symptoms across four South Australian hospitals. Our case study focuses on cross-organisational benchmarking of processes and their performance. Techniques such as clustering, process discovery, performance analysis, and scientific workflows were applied to facilitate such comparative analyses. Lessons learned in overcoming unique challenges in cross-organisational process mining, such as ensuring population comparability, data granularity comparability, and experimental repeatability are also presented.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Objective: To evaluate methods for monitoring monthly aggregated hospital adverse event data that display clustering, non-linear trends and possible autocorrelation. Design: Retrospective audit. Setting: The Northern Hospital, Melbourne, Australia. Participants: 171,059 patients admitted between January 2001 and December 2006. Measurements: The analysis is illustrated with 72 months of patient fall injury data using a modified Shewhart U control chart, and charts derived from a quasi-Poisson generalised linear model (GLM) and a generalised additive mixed model (GAMM) that included an approximate upper control limit. Results: The data were overdispersed and displayed a downward trend and possible autocorrelation. The downward trend was followed by a predictable period after December 2003. The GLM-estimated incidence rate ratio was 0.98 (95% CI 0.98 to 0.99) per month. The GAMM-fitted count fell from 12.67 (95% CI 10.05 to 15.97) in January 2001 to 5.23 (95% CI 3.82 to 7.15) in December 2006 (p<0.001). The corresponding values for the GLM were 11.9 and 3.94. Residual plots suggested that the GLM underestimated the rate at the beginning and end of the series and overestimated it in the middle. The data suggested a more rapid rate fall before 2004 and a steady state thereafter, a pattern reflected in the GAMM chart. The approximate upper two-sigma equivalent control limit in the GLM and GAMM charts identified 2 months that showed possible special-cause variation. Conclusion: Charts based on GAMM analysis are a suitable alternative to Shewhart U control charts with these data.
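For readers unfamiliar with the baseline method named in this abstract, the following is a minimal sketch of a standard (unmodified) Shewhart U chart, where the centre line is the pooled rate and the limits for each month are the centre line plus or minus a multiple of sqrt(centre / exposure). The paper's modification is not described in the abstract, so this is not the authors' chart; the monthly counts and exposures are hypothetical.

```python
from math import sqrt


def u_chart(counts, exposures, sigmas=3.0):
    """Standard U chart: pooled rate as centre line, with
    per-period limits that widen as exposure shrinks."""
    u_bar = sum(counts) / sum(exposures)
    rows = []
    for c, n in zip(counts, exposures):
        half_width = sigmas * sqrt(u_bar / n)
        rows.append({
            "rate": c / n,
            "lcl": max(0.0, u_bar - half_width),  # a rate cannot be negative
            "ucl": u_bar + half_width,
        })
    return u_bar, rows


# Hypothetical monthly fall injuries and exposure (thousands of bed-days).
falls = [12, 9, 11, 25, 8, 7]
bed_days = [2.4, 2.5, 2.3, 2.4, 2.6, 2.5]

u_bar, rows = u_chart(falls, bed_days)
flagged = [i for i, r in enumerate(rows)
           if r["rate"] > r["ucl"] or r["rate"] < r["lcl"]]
print(round(u_bar, 3), flagged)
```

Passing `sigmas=2.0` gives two-sigma limits. Note that these limits assume Poisson variation around a constant rate, which is exactly the assumption the abstract reports as violated by overdispersion and trend.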

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Given the drawbacks of using geo-political areas in mapping outcomes unrelated to geo-politics, a compromise is to aggregate and analyse data at the grid level. This has the advantage of allowing spatial smoothing and modelling at a biologically or physically relevant scale. This article addresses two consequent issues: the choice of the spatial smoothness prior and the scale of the grid. Firstly, we describe several spatial smoothness priors applicable to grid data and discuss the contexts in which these priors can be employed based on different aims. Two such aims are considered, i.e., to identify regions with clustering and to model spatial dependence in the data. Secondly, the choice of the grid size is shown to depend largely on the spatial patterns. We present a guide on the selection of spatial scales and smoothness priors for various point patterns based on the two aims for spatial smoothing.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Spatial data are now prevalent in a wide range of fields including environmental and health science. This has led to the development of a range of approaches for analysing patterns in these data. In this paper, we compare several Bayesian hierarchical models for analysing point-based data based on the discretization of the study region, resulting in grid-based spatial data. The approaches considered include two parametric models and a semiparametric model. We highlight the methodology and computation for each approach. Two simulation studies are undertaken to compare the performance of these models for various structures of simulated point-based data which resemble environmental data. A case study of a real dataset is also conducted to demonstrate a practical application of the modelling approaches. Goodness-of-fit statistics are computed to compare estimates of the intensity functions. The deviance information criterion is also considered as an alternative model evaluation criterion. The results suggest that the adaptive Gaussian Markov random field model performs well for highly sparse point-based data where there are large variations or clustering across the space, whereas the discretized log Gaussian Cox process produces a good fit for dense and clustered point-based data. One should generally consider the nature and structure of the point-based data in order to choose the appropriate method for modelling discretized spatial point-based data.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Immigrant entrepreneurs tend to start businesses within their ethnic enclave (EE), as it is an integral part of their social and cultural context and the location where ethnic resources reside (Logan, Alba, & Stults, 2003). Ethnic enclaves can be seen as a form of geographic cluster; Chinatowns are exemplar EEs, easily identified by the clustering of Chinese restaurants and other ethnic businesses in one central location. Studies on EEs thus far have neglected the life cycle stages of an EE and their impact on the business experiences of the entrepreneurs. In this paper, we track the formation, growth and decline of an EE. We argue that an EE is a special industrial cluster and as such follows the growth conditions proposed by the cluster life cycle theory (Menzel & Fornahl, 2009). We report a mixed-method study of Chinese restaurants in South East Queensland. Based on multiple sources of data, we conclude that changes in government policies leading to a sharp increase in immigrant numbers from a distinctive culture group can lead to the initiation and growth of the EE. A continuous influx of new immigrants and increased competition within the cluster mark the mature stage of the EE, making the growth conditions more favourable “inside” the cluster. A decline in new immigrants from the same ethnic group and the increased competition within the EE may eventually lead to the decline of such an industrial cluster, thus providing more favourable conditions for the growth of businesses outside the cluster.