891 resultados para Hierarchical clustering model
Resumo:
ICT clusters have attracted much attention because of their rapid growth and their value for other economic activities. Using a nested multi-level model, we examine how conditions at the country level and at the city level affect ICT clustering activity in 227 cities across 22 European countries. We test for the influence of three country regulations (starting a business, registering property, enforcing contracts) and two city conditions (proximity to university, network density) on ICT clustering. We consider heterogeneity within the sector and study two types of ICT activities: ICT product firms and ICT content firms. Our results indicate that country conditions and city conditions each have idiosyncratic implications for ICT clustering, and further, that these can vary by activities in ICT products or ICT content manufacturing.
Resumo:
Clustering methods are increasingly being applied to residential smart meter data, providing a number of important opportunities for distribution network operators (DNOs) to manage and plan the low voltage networks. Clustering has a number of potential advantages for DNOs including, identifying suitable candidates for demand response and improving energy profile modelling. However, due to the high stochasticity and irregularity of household level demand, detailed analytics are required to define appropriate attributes to cluster. In this paper we present in-depth analysis of customer smart meter data to better understand peak demand and major sources of variability in their behaviour. We find four key time periods in which the data should be analysed and use this to form relevant attributes for our clustering. We present a finite mixture model based clustering where we discover 10 distinct behaviour groups describing customers based on their demand and their variability. Finally, using an existing bootstrapping technique we show that the clustering is reliable. To the authors knowledge this is the first time in the power systems literature that the sample robustness of the clustering has been tested.
Resumo:
Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or Tensor datasets. Often in dealing with those datasets, standard practice is to use subspace clustering that is based on vectorizing multiarray data. However, vectorization of tensorial data does not exploit complete structure information. In this paper, we propose a subspace clustering algorithm without adopting any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model taking into account cluster membership information. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold for which we investigate second order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm compete effectively with state-of-the-art clustering algorithms that are based on tensor factorization.
Resumo:
The benefits of breastfeeding for the children`s health have been highlighted in many studies. The innovative aspect of the present study lies in its use of a multilevel model, a technique that has rarely been applied to studies on breastfeeding. The data reported were collected from a larger study, the Family Budget Survey-Pesquisa de Orcamentos Familiares, carried out between 2002 and 2003 in Brazil that involved a sample of 48 470 households. A representative national sample of 1477 infants aged 0-6 months was used. The statistical analysis was performed using a multilevel model, with two levels grouped by region. In Brazil, breastfeeding prevalence was 58%. The factors that bore a negative influence on breastfeeding were over four residents living in the same household [odds ratio (OR) = 0.68, 90% confidence interval (CI) = 0.51-0.89] and mothers aged 30 years or more (OR = 0.68, 90% CI = 0.53-0.89). The factors that positively influenced breastfeeding were the following: higher socio-economic levels (OR = 1.37, 90% CI = 1.01-1.88), families with over two infants under 5 years (OR = 1.25, 90% CI = 1.00-1.58) and being a resident in rural areas (OR = 1.25, 90% CI = 1.00-1.58). Although majority of the mothers was aware of the value of maternal milk and breastfed their babies, the prevalence of breastfeeding remains lower than the rate advised by the World Health Organization, and the number of residents living in the same household along with mothers aged 30 years or older were both factors associated with early cessation of infant breastfeeding before 6 months.
Resumo:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The algorithm proposed can deal with data sets presenting different types of clusters, without the need of expertise in cluster analysis. its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective Clustering with automatic K-determination (MOCK). the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Clustering quality or validation indices allow the evaluation of the quality of clustering in order to support the selection of a specific partition or clustering structure in its natural unsupervised environment, where the real solution is unknown or not available. In this paper, we investigate the use of quality indices mostly based on the concepts of clusters` compactness and separation, for the evaluation of clustering results (partitions in particular). This work intends to offer a general perspective regarding the appropriate use of quality indices for the purpose of clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, key issues regarding the practical use of quality indices are addressed. A general methodological approach is presented which considers the identification of appropriate indices thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.
Resumo:
Differently from theoretical scale-free networks, most real networks present multi-scale behavior, with nodes structured in different types of functional groups and communities. While the majority of approaches for classification of nodes in a complex network has relied on local measurements of the topology/connectivity around each node, valuable information about node functionality can be obtained by concentric (or hierarchical) measurements. This paper extends previous methodologies based on concentric measurements, by studying the possibility of using agglomerative clustering methods, in order to obtain a set of functional groups of nodes, considering particular institutional collaboration network nodes, including various known communities (departments of the University of Sao Paulo). Among the interesting obtained findings, we emphasize the scale-free nature of the network obtained, as well as identification of different patterns of authorship emerging from different areas (e.g. human and exact sciences). Another interesting result concerns the relatively uniform distribution of hubs along concentric levels, contrariwise to the non-uniform pattern found in theoretical scale-free networks such as the BA model. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
Hierarchical assemblies of CaMoO4 (CM) nano-octahedrons were obtained by microwave-assisted hydrothemial synthesis at 120 degrees C for different times. These structures were structurally, morphologically and optically characterized by X-ray diffraction, micro-Raman spectroscopy, field-emission gun scanning electron microscopy, ultraviolet-visible absorption spectroscopy and photoluminescence measurements. First-principle calculations have been carried out to understand the structural and electronic order-disorder effects as a function of the particle/region size. Supercells of different dimensions were constructed to simulate the geometric distortions along both they and z planes of the scheelite structure. Based on these experimental results and with the help of detailed structural simulations, we were able to model the nature of the order-disorder in this important class of materials and discuss the consequent implications on its physical properties, in particular, the photoluminescence properties of CM nanocrystals.
Resumo:
There is a need of scientific evidence of claimed nutraceutical effects, but also there is a social movement towards the use of natural products and among them algae are seen as rich resources. Within this scenario, the development of methodology for rapid and reliable assessment of markers of efficiency and security of these extracts is necessary. The rat treated with streptozotocin has been proposed as the most appropriate model of systemic oxidative stress for studying antioxidant therapies. Cystoseira is a brown alga containing fucoxanthin and other carothenes whose pressure-assisted extracts were assayed to discover a possible beneficial effect on complications related to diabetes evolution in an acute but short-term model. Urine was selected as the sample and CE-TOF-MS as the analytical technique to obtain the fingerprints in a non-target metabolomic approach. Multivariate data analysis revealed a good clustering of the groups and permitted the putative assignment of compounds statistically significant in the classification. Interestingly a group of compounds associated to lysine glycation and cleavage from proteins was found to be increased in diabetic animals receiving vehicle as compared to control animals receiving vehicle (N6, N6, N6-trimethyl-L-lysine, N-methylnicotinamide, galactosylhydroxylysine, L-carnitine, N6-acetyl-N6-hydroxylysine, fructose-lysine, pipecolic acid, urocanic acid, amino-isobutanoate, formylisoglutamine. Fructoselysine significantly decreased after the treatment changing from a 24% increase to a 19% decrease. CE-MS fingerprinting of urine has provided a group of compounds different to those detected with other techniques and therefore proves the necessity of a cross-platform analysis to obtain a broad view of biological samples.
Resumo:
Background: Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature) and called macro-environmental or unknown and called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate bias and precision of resulting estimates of genetic parameters and to develop and evaluate use of Akaike’s information criterion using h-likelihood to select the best fitting model. Methods: We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity using a double hierarchical generalized linear model in ASReml. Akaike’s information criterion was constructed as model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate bias and precision of estimated genetic parameters. Results: Designs with 100 sires, each with at least 100 offspring, are required to have standard deviations of estimated variances lower than 50% of the true value. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically, no bias was observed for estimates of any of the parameters. Using Akaike’s information criterion the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. Conclusion: The algorithm and model selection criterion presented here can contribute to better understand genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires each with 100 offspring.
Resumo:
We present the hglm package for fitting hierarchical generalized linear models. It can be used for linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects. Fixed effects can also be fitted in the dispersion part of the model.
Resumo:
Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.
Resumo:
The development of strategies for structural health monitoring (SHM) has become increasingly important because of the necessity of preventing undesirable damage. This paper describes an approach to this problem using vibration data. It involves a three-stage process: reduction of the time-series data using principle component analysis (PCA), the development of a data-based model using an auto-regressive moving average (ARMA) model using data from an undamaged structure, and the classification of whether or not the structure is damaged using a fuzzy clustering approach. The approach is applied to data from a benchmark structure from Los Alamos National Laboratory, USA. Two fuzzy clustering algorithms are compared: fuzzy c-means (FCM) and Gustafson-Kessel (GK) algorithms. It is shown that while both fuzzy clustering algorithms are effective, the GK algorithm marginally outperforms the FCM algorithm. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)