974 results for cluster analysis


Relevance:

100.00%

Abstract:

In this paper, an architecture for a short-term wind farm power estimator is proposed. The estimator is made up of a Linear Machine classifier and a set of k Multilayer Perceptrons, each trained for a specific subspace of the input space. The input dataset is split into the k clusters using a k-means technique, and the equivalent Linear Machine classifier is obtained from the cluster centroids...
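The link between k-means centroids and a Linear Machine can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy data and the plain Lloyd's k-means are all hypothetical, and the per-cluster Multilayer Perceptrons are left out. It relies on the identity that assigning x to the nearest centroid c_j is the same as maximising the linear score c_j . x - ||c_j||^2 / 2.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def linear_machine(centroids):
    """One linear unit per centroid: w_j = c_j, b_j = -||c_j||^2 / 2."""
    return [(list(c), -0.5 * sum(a * a for a in c)) for c in centroids]

def classify(x, machine):
    """Return the index of the unit with the largest score w_j . x + b_j."""
    def score(j):
        w, b = machine[j]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(range(len(machine)), key=score)

# Hypothetical two-dimensional inputs forming two well-separated groups.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
centroids = kmeans(points, k=2)
machine = linear_machine(centroids)
routes = [classify(p, machine) for p in points]
```

In an estimator of the kind described, the index returned by `classify` would select which of the k trained networks handles a given input.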

Relevance:

100.00%

Abstract:

Cluster analysis has been identified as a core task in data mining. What constitutes a cluster, or a good clustering, may depend on the background of researchers and applications. This paper proposes two optimization criteria, abstract degree and fidelity, in the field of image abstract. To satisfy the fidelity criterion, a novel clustering algorithm named Global Optimized Color-based DBSCAN Clustering (GOC-DBSCAN) is provided. A non-optimized version of GOC-DBSCAN based on local color information, called HSV-DBSCAN, is also given. Both are based on the HSV color space. Clusters of GOC-DBSCAN are analyzed to find the factors that affect both abstract degree and fidelity. Examples show that, in general, the greater the abstract degree, the lower the fidelity. They also show that GOC-DBSCAN outperforms HSV-DBSCAN when both are evaluated by the two optimization criteria.

Relevance:

100.00%

Abstract:

A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification, contrary to other aggregation methods in which the areas' classification determines their grouping. Other aggregation methods distribute the areas into classes in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed; CA does not constrain the distribution of the areas across the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia, where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. The sum of component scores and the weighted sum of component scores showed similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability, whereas the results obtained using CA show a distinct differentiation of vulnerability in which hot spots can be clearly identified. The information provided by records of previous flood events corroborates the results obtained with CA, because the inundated areas with greater damages are those identified as high and very high vulnerability areas by CA. This supports the conclusion that CA provides a reliable FloodVI.

Relevance:

80.00%

Abstract:

The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is carried out by introducing perturbations through sub-sampling techniques. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
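The combination of average-linkage clustering and sub-sampling perturbation can be sketched roughly as below. This is an illustrative sketch only, not the procedure of the paper: the band profiles, the Hamming distance, the 80% sub-sample size and the pair-agreement score are all assumptions chosen to keep the example small.

```python
import random
from itertools import combinations

def average_linkage(items, dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[i] for i in items]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist(x, y) for x in clusters[a] for y in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def labels_of(clusters):
    """Map each item to the index of its cluster."""
    return {i: c for c, members in enumerate(clusters) for i in members}

def stability(items, dist, k, frac=0.8, rounds=20, seed=0):
    """Mean fraction of item pairs whose co-membership in a sub-sample
    clustering agrees with their co-membership in the full clustering."""
    rng = random.Random(seed)
    full = labels_of(average_linkage(items, dist, k))
    scores = []
    for _ in range(rounds):
        sub = rng.sample(items, max(k, int(frac * len(items))))
        part = labels_of(average_linkage(sub, dist, k))
        pairs = list(combinations(sub, 2))
        agree = sum((full[i] == full[j]) == (part[i] == part[j]) for i, j in pairs)
        scores.append(agree / len(pairs))
    return sum(scores) / len(scores)

# Hypothetical presence/absence band profiles for six strains.
profiles = {
    "s1": (1, 1, 0, 0, 1, 0), "s2": (1, 1, 0, 0, 1, 1), "s3": (1, 1, 0, 0, 0, 0),
    "s4": (0, 0, 1, 1, 0, 1), "s5": (0, 0, 1, 1, 1, 1), "s6": (0, 1, 1, 1, 0, 1),
}

def hamming(i, j):
    """Number of band positions at which two strains differ."""
    return sum(a != b for a, b in zip(profiles[i], profiles[j]))

score = stability(list(profiles), hamming, k=2)
```

A score close to 1 suggests that the grouping survives the removal of strains; repeating the calculation for different numbers of enzymes (that is, different profile lengths) would give stability as a function of the amount of data used.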

Relevance:

70.00%

Abstract:

Chromatographic fingerprints of 46 Eucommia Bark samples were obtained by liquid chromatography-diode array detector (LC-DAD). These samples were collected from eight provinces in China with different geographical locations and climates. Seven common LC peaks that could be used for fingerprinting this popular traditional Chinese medicine were found, and six were identified as substituted resinols (4 compounds), geniposidic acid and chlorogenic acid by LC-MS. Principal components analysis (PCA) indicated that samples from Sichuan, Hubei, Shanxi and Anhui (the SHSA provinces) clustered together. The other objects from the four provinces, Guizhou, Jiangxi, Gansu and Henan, were discriminated and widely scattered on the biplot in four province clusters. The SHSA provinces are geographically close together while the others are spread out. Thus, such results suggested that the composition of the Eucommia Bark samples was dependent on their geographic location and environment. In general, the basis for discrimination on the PCA biplot from the original 46 objects × 7 variables data matrix was the same as that for the SHSA subset (36 × 7 matrix). The seven marker compound loading vectors grouped into three sets: (1) three closely correlating substituted resinol compounds and chlorogenic acid; (2) the fourth resinol compound identified by the OCH3 substituent in the R4 position, and an unknown compound; and (3) the geniposidic acid, which was independent of the set 1 variables, and which negatively correlated with the set 2 ones above. These observations from the PCA biplot were supported by hierarchical cluster analysis (HCA), and indicated that Eucommia Bark preparations may be successfully compared with the use of the HPLC responses from the seven marker compounds and chemometric methods such as PCA and the complementary HCA.

Relevance:

70.00%

Abstract:

This overview focuses on the application of chemometrics techniques for the investigation of soils contaminated by polycyclic aromatic hydrocarbons (PAHs) and metals because these two important and very diverse groups of pollutants are ubiquitous in soils. The salient features of various studies carried out in the micro- and recreational environments of humans, are highlighted in the context of the various multivariate statistical techniques available across discipline boundaries that have been effectively used in soil studies. Particular attention is paid to techniques employed in the geosciences that may be effectively utilized for environmental soil studies; classical multivariate approaches that may be used in isolation or as complementary methods to these are also discussed. Chemometrics techniques widely applied in atmospheric studies for identifying sources of pollutants or for determining the importance of contaminant source contributions to a particular site, have seen little use in soil studies, but may be effectively employed in such investigations. Suitable programs are also available for suggesting mitigating measures in cases of soil contamination, and these are also considered. Specific techniques reviewed include pattern recognition techniques such as Principal Components Analysis (PCA), Fuzzy Clustering (FC) and Cluster Analysis (CA); geostatistical tools include variograms, Geographical Information Systems (GIS), contour mapping and kriging; source identification and contribution estimation methods reviewed include Positive Matrix Factorisation (PMF), and Principal Component Analysis on Absolute Principal Component Scores (PCA/APCS). Mitigating measures to limit or eliminate pollutant sources may be suggested through the use of ranking analysis and multi criteria decision making methods (MCDM). 
These methods are mainly represented in this review by studies employing the Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and its associated graphic output, Geometrical Analysis for Interactive Aid (GAIA).

Relevance:

70.00%

Abstract:

This research paper aims to develop a method to explore the travel behaviour differences between disadvantaged and non-disadvantaged populations. It also aims to develop a modelling approach or a framework to integrate disadvantage analysis into transportation planning models (TPMs). The methodology employed identifies significantly disadvantaged groups through a cluster analysis and the paper presents a disadvantage-integrated TPM. This model could be useful in determining areas with concentrated disadvantaged population and also developing and formulating relevant disadvantage sensitive policies. (a) For the covering entry of this conference, please see ITRD abstract no. E214666.

Relevance:

70.00%

Abstract:

Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three-dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model, which consists of eight litho-stratigraphic units, has subsequently been used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.

Relevance:

70.00%

Abstract:

Barmah Forest virus (BFV) disease is one of the most widespread mosquito-borne diseases in Australia. The number of outbreaks and the incidence rate of BFV in Australia have attracted growing concerns about the spatio-temporal complexity and underlying risk factors of BFV disease. A large number of notifications has been recorded continuously in Queensland since 1992. Yet, little is known about the spatial and temporal characteristics of the disease. I aim to use notification data to better understand the effects of climatic, demographic, socio-economic and ecological risk factors on the spatial epidemiology of BFV disease transmission, develop predictive risk models and forecast future disease risks under climate change scenarios. Computerised data files of daily notifications of BFV disease and climatic variables in Queensland during 1992-2008 were obtained from Queensland Health and the Australian Bureau of Meteorology, respectively. Projections of climate data for the years 2025, 2050 and 2100 were obtained from the Commonwealth Scientific and Industrial Research Organisation. Data on socio-economic, demographic and ecological factors were also obtained from relevant government departments as follows: 1) socio-economic and demographic data from the Australian Bureau of Statistics; 2) wetlands data from the Department of Environment and Resource Management; and 3) tidal readings from the Queensland Department of Transport and Main Roads. Disease notifications were geocoded and spatial and temporal patterns of disease were investigated using geostatistics. Visualisation of BFV disease incidence rates through mapping reveals the presence of substantial spatio-temporal variation at the statistical local area (SLA) level over time. Results reveal high incidence rates of BFV disease along coastal areas compared to the whole area of Queensland. 
A Mantel-Haenszel Chi-square analysis for trend reveals a statistically significant relationship between BFV disease incidence rates and age groups (χ² = 7587, p<0.01). Semi-variogram analysis and smoothed maps created from interpolation techniques indicate that the pattern of spatial autocorrelation was not homogeneous across the state. A cluster analysis was used to detect the hot spots/clusters of BFV disease at an SLA level. Most likely spatial and space-time clusters are detected at the same locations across coastal Queensland (p<0.05). The study demonstrates heterogeneity of disease risk at an SLA level and reveals the spatial and temporal clustering of BFV disease in Queensland. Discriminant analysis was employed to establish a link between wetland classes, climate zones and BFV disease. This is because the importance of wetlands in the transmission of BFV disease remains unclear. The multivariable discriminant modelling analyses demonstrate that wetland types of saline 1, riverine and saline tidal influence were the most significant risk factors for BFV disease in all climate and buffer zones, while lacustrine, palustrine, estuarine and saline 2 and saline 3 wetlands were less important. The model accuracies were 76%, 98% and 100% for BFV risk in subtropical, tropical and temperate climate zones, respectively. This study demonstrates that BFV disease risk varied with wetland class and climate zone. The study suggests that wetlands may act as potential breeding habitats for BFV vectors. Multivariable spatial regression models were applied to assess the impact of spatial climatic, socio-economic and tidal factors on the BFV disease in Queensland. Spatial regression models were developed to account for spatial effects. Spatial regression models generated superior estimates over a traditional regression model. 
In the spatial regression models, BFV disease incidence shows an inverse relationship with minimum temperature, low tide and distance to coast, and positive relationship with rainfall in coastal areas whereas in whole Queensland the disease shows an inverse relationship with minimum temperature and high tide and positive relationship with rainfall. This study determines the most significant spatial risk factors for BFV disease across Queensland. Empirical models were developed to forecast the future risk of BFV disease outbreaks in coastal Queensland using existing climatic, socio-economic and tidal conditions under climate change scenarios. Logistic regression models were developed using BFV disease outbreak data for the existing period (2000-2008). The most parsimonious model had high sensitivity, specificity and accuracy and this model was used to estimate and forecast BFV disease outbreaks for years 2025, 2050 and 2100 under climate change scenarios for Australia. Important contributions arising from this research are that: (i) it is innovative to identify high-risk coastal areas by creating buffers based on grid-centroid and the use of fine-grained spatial units, i.e., mesh blocks; (ii) a spatial regression method was used to account for spatial dependence and heterogeneity of data in the study area; (iii) it determined a range of potential spatial risk factors for BFV disease; and (iv) it predicted the future risk of BFV disease outbreaks under climate change scenarios in Queensland, Australia. In conclusion, the thesis demonstrates that the distribution of BFV disease exhibits a distinct spatial and temporal variation. Such variation is influenced by a range of spatial risk factors including climatic, demographic, socio-economic, ecological and tidal variables. The thesis demonstrates that spatial regression method can be applied to better understand the transmission dynamics of BFV disease and its risk factors. 
The research findings show that disease notification data can be integrated with multi-factorial risk factor data to develop build-up models and forecast future potential disease risks under climate change scenarios. This thesis may have implications in BFV disease control and prevention programs in Queensland.

Relevance:

70.00%

Abstract:

The latest paradigm shift in government, termed Transformational Government, puts the citizen in the centre of attention. Including citizens in the design of online one-stop portals can help governmental organisations to become more customer focussed. This study describes the initial efforts of an Australian state government to develop an information architecture to structure the content of their future one-stop portal. Hereby, card sorting exercises have been conducted and analysed, utilising contemporary approaches found in academic and non-scientific literature. This paper describes the findings of the card sorting exercises in this particular case and discusses the suitability of the applied approaches in general. These are distinguished into non-statistical, statistical, and hybrid approaches. Thus, on the one hand, this paper contributes to academia by describing the application of different card sorting approaches and discussing their strengths and weaknesses. On the other hand, this paper contributes to practice by explaining the approach that has been taken by the authors’ research partner in order to develop a customer-focussed governmental one-stop portal. Thus, they provide decision support for practitioners with regard to different analysis methods that can be used to complement recent approaches in Transformational Government.

Relevance:

70.00%

Abstract:

Data in germplasm collections contain a mixture of data types; binary, multistate and quantitative. Given the multivariate nature of these data, the pattern analysis methods of classification and ordination have been identified as suitable techniques for statistically evaluating the available diversity. The proximity (or resemblance) measure, which is in part the basis of the complementary nature of classification and ordination techniques, is often specific to particular data types. The use of a combined resemblance matrix has an advantage over data type specific proximity measures. This measure accommodates the different data types without manipulating them to be of a specific type. Descriptors are partitioned into their data types and an appropriate proximity measure is used on each. The separate proximity matrices, after range standardisation, are added as a weighted average and the combined resemblance matrix is then used for classification and ordination. Germplasm evaluation data for 831 accessions of groundnut (Arachis hypogaea L.) from the Australian Tropical Field Crops Genetic Resource Centre, Biloela, Queensland were examined. Data for four binary, five ordered multistate and seven quantitative descriptors have been documented. The interpretative value of different weightings - equal and unequal weighting of data types to obtain a combined resemblance matrix - was investigated by using principal co-ordinate analysis (ordination) and hierarchical cluster analysis. Equal weighting of data types was found to be more valuable for these data as the results provided a greater insight into the patterns of variability available in the Australian groundnut germplasm collection. The complementary nature of pattern analysis techniques enables plant breeders to identify relevant accessions in relation to the descriptors which distinguish amongst them. 
This additional information may provide plant breeders with a more defined entry point into the germplasm collection for identifying sources of variability for their plant improvement program, thus improving the utilisation of germplasm resources.
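The construction of a combined resemblance matrix described above (one proximity matrix per data type, range standardisation, then a weighted average) can be sketched as follows. This is a minimal illustration under stated assumptions: the three accessions and their descriptor values are invented, and the choice of proximity measure for each data type (simple mismatch, Manhattan, Euclidean) is an assumption, since the abstract does not name the measures used.

```python
def pairwise(rows, measure):
    """Square dissimilarity matrix for a list of descriptor tuples."""
    n = len(rows)
    return [[measure(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def simple_mismatch(a, b):
    """Proportion of binary descriptors on which two accessions differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def manhattan(a, b):
    """City-block distance, used here for ordered multistate scores."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance, used here for quantitative descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def range_standardise(matrix):
    """Rescale all entries to [0, 1] by the matrix range."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(v - lo) / span for v in row] for row in matrix]

def combine(matrices, weights):
    """Weighted average of equally sized, standardised matrices."""
    total = sum(weights)
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for m, w in zip(matrices, weights)) / total for j in range(n)]
        for i in range(n)
    ]

# Hypothetical descriptors for three accessions, partitioned by data type.
binary = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0)]
multistate = [(1, 2), (1, 3), (3, 1)]
quantitative = [(10.0, 2.0), (12.0, 2.5), (30.0, 7.0)]

parts = [
    range_standardise(pairwise(binary, simple_mismatch)),
    range_standardise(pairwise(multistate, manhattan)),
    range_standardise(pairwise(quantitative, euclidean)),
]
combined = combine(parts, weights=[1, 1, 1])  # equal weighting of data types
```

Changing `weights` gives the unequal-weighting variants; the resulting matrix is what would be passed to the ordination and hierarchical clustering steps.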

Relevance:

70.00%

Abstract:

Information on the variation available for different plant attributes has enabled germplasm collections to be effectively utilised in plant breeding. A world sourced collection of white clover germplasm has been developed at the White Clover Resource Centre at Glen Innes, New South Wales. This collection of 439 accessions was characterised under field conditions as a preliminary study of the genotypic variation for morphological attributes: stolon density, stolon branching, number of nodes, number of rooted nodes, stolon thickness, internode length, leaf length, plant height and plant spread, together with seasonal herbage yield. Characterisation was conducted on different batches of germplasm (subsets of accessions taken from the complete collection) over a period of five years. Inclusion of two check cultivars, Haifa and Huia, in each batch enabled adjustment of the characterisation data for year effects and attribute-by-year interaction effects. The component of variance for seasonal herbage yield among batches was large relative to that for accessions. Accession-by-experiment and accession-by-season interactions for herbage yield were not detected. Accession mean repeatability for herbage yield across seasons was intermediate (0.453). The components of genotypic variance among accessions for all attributes, except plant height, were larger than their respective standard errors. The estimates of accession mean repeatability for the attributes ranged from low (0.277 for plant height) to intermediate (0.544 for internode length). Multivariate techniques of clustering and ordination were used to investigate the diversity present among the accessions in the collection. Both cluster analysis and principal component analysis suggested that seven groups of accessions existed. 
It was also proposed from the pattern analysis results that accessions from a group characterised by large leaves, tall plants and thick stolons could be crossed with accessions from a group that had above average stolon density and stolon branching. This material could produce breeding populations to be used in recurrent selection for the development of white clover cultivars for dryland summer moisture stress environments in Australia. The germplasm collection was also found to be deficient in genotypes with high stolon density, high number of branches, high number of rooted nodes and large leaves. This warrants the addition of new germplasm accessions possessing these characteristics to the present germplasm collection.