60 results for Capacitated clustering
Abstract:
A great part of the interest in complex networks has been motivated by the presence of structured, frequently nonuniform, connectivity. Because diverse connectivity patterns tend to result in distinct network dynamics, and also because they provide the means to identify and classify several types of complex networks, it is important to obtain meaningful measurements of the local network topology. In addition to traditional features such as the node degree, clustering coefficient, and shortest path length, motifs have been introduced in the literature to provide complementary descriptions of network connectivity. The current work proposes a different type of motif, namely chains of nodes, that is, sequences of connected nodes with degree 2. These chains are subdivided into cords, tails, rings, and handles, depending on the type of their extremities (e.g., open or connected). A theoretical analysis of the density of such motifs in random and scale-free networks is described, and an algorithm for identifying these motifs in general networks is presented. The potential of chains for network characterization is illustrated with respect to five categories of real-world networks comprising 16 cases. Several interesting findings were obtained, including the fact that several chains were observed in real-world networks, especially the World Wide Web, books, and the power grid. The possibility of chains resulting from incompletely sampled networks is also investigated.
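Identifying chains amounts to finding maximal connected groups of degree-2 nodes. The sketch below is a minimal illustration, not the authors' algorithm: it assumes an undirected simple graph given as an edge list, and the helper name `find_chains` is hypothetical. For each chain it reports only the outside nodes its extremities attach to; the cord/tail/ring/handle labels depend on the exact criteria in the paper and are left to the reader.

```python
from collections import defaultdict

def find_chains(edges):
    """Return (chain_nodes, attachments) for every maximal run of degree-2 nodes.

    attachments lists the non-chain neighbours of the chain's extremities
    (with repeats); an empty list means the chain closes on itself.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    deg2 = {n for n, nbrs in adj.items() if len(nbrs) == 2}
    seen, chains = set(), []
    for start in sorted(deg2):
        if start in seen:
            continue
        # depth-first search restricted to degree-2 nodes
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb in deg2 and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        attachments = sorted(nb for node in comp for nb in adj[node] if nb not in deg2)
        chains.append((sorted(comp), attachments))
    return chains
```

For example, in a graph made of a triangle 0-1-2, a path 0-3-4-5, and an isolated cycle 10-11-12, the chain `[3, 4]` ends in the degree-1 node 5 (an open extremity), `[1, 2]` has both ends on node 0, and `[10, 11, 12]` has no attachments at all.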
Abstract:
We numerically study the dynamics of a discrete spring-block model introduced by Olami, Feder, and Christensen (OFC) to mimic earthquakes and investigate to what extent this simple model is able to reproduce the observed spatiotemporal clustering of seismicity. Following a recently proposed method to characterize such clustering by networks of recurrent events [J. Davidsen, P. Grassberger, and M. Paczuski, Geophys. Res. Lett. 33, L11304 (2006)], we find that for synthetic catalogs generated by the OFC model these networks have many nontrivial statistical properties. These include characteristic degree distributions very similar to what has been observed for real seismicity. There are, however, also significant differences between the OFC model and earthquake catalogs, indicating that this simple model is insufficient to account for certain aspects of the spatiotemporal clustering of seismicity.
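For reference, the OFC dynamics reduce to a few update rules. The sketch below assumes the standard formulation: a square L x L lattice with open boundaries, uniform slow driving, a firing threshold of 1, and a conservation parameter alpha < 0.25 (the dissipative regime). The function name `ofc_avalanches` and its default parameters are illustrative choices, not those used in the study, and the construction of recurrence networks from the resulting catalog is not shown.

```python
import random

def ofc_avalanches(L=16, alpha=0.2, n_events=500, seed=1):
    """Simulate the OFC model; return avalanche sizes and the final force field."""
    rng = random.Random(seed)
    F = [[rng.random() for _ in range(L)] for _ in range(L)]
    sizes = []
    for _ in range(n_events):
        # Uniform drive: raise all forces until the largest reaches the threshold 1.
        rm, cm = max(((r, c) for r in range(L) for c in range(L)),
                     key=lambda p: F[p[0]][p[1]])
        delta = 1.0 - F[rm][cm]
        for row in F:
            for c in range(L):
                row[c] += delta
        F[rm][cm] = 1.0  # avoid floating-point round-off just below threshold
        unstable = [(r, c) for r in range(L) for c in range(L) if F[r][c] >= 1.0]
        size = 0
        while unstable:
            r, c = unstable.pop()
            f = F[r][c]
            if f < 1.0:  # already relaxed (duplicate entry in the stack)
                continue
            F[r][c] = 0.0
            size += 1
            # Each of the 4 neighbours receives alpha * f; shares that would
            # cross the open boundary are lost.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < L and 0 <= cc < L:
                    F[rr][cc] += alpha * f
                    if F[rr][cc] >= 1.0:
                        unstable.append((rr, cc))
        sizes.append(size)
    return sizes, F
```

Each driving step triggers one event whose size is the number of topplings; a synthetic catalog would additionally record the time and location of each event.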
Abstract:
Large-scale cortical networks exhibit characteristic topological properties that shape communication between brain regions and global cortical dynamics. Analysis of complex networks allows the description of connectedness, distance, clustering, and centrality that reveal different aspects of how the network's nodes communicate. Here, we focus on a novel analysis of complex walks in a series of mammalian cortical networks that model potential dynamics of information flow between individual brain regions. We introduce two new measures called absorption and driftness. Absorption is the average length of random walks between any two nodes, and takes into account all paths that may diffuse activity throughout the network. Driftness is the ratio between absorption and the corresponding shortest path length. For a given node of the network, we also define four related measurements, namely in- and out-absorption as well as in- and out-driftness, as the averages of the corresponding measures from all nodes to that node, and from that node to all nodes, respectively. We find that the cat thalamo-cortical system incorporates features of two classic network topologies, Erdős-Rényi graphs with respect to in-absorption and in-driftness, and configuration models with respect to out-absorption and out-driftness. Moreover, taken together these four measures separate the network nodes based on broad functional roles (visual, auditory, somatomotor, and frontolimbic).
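Absorption, as defined above, can be read as the mean first-passage time of a random walk. The following is a minimal sketch under that interpretation, assuming an unweighted, strongly connected directed graph given as an adjacency matrix and an unbiased walk that follows outgoing edges uniformly; all function names are hypothetical. For each target j, the expected hitting times satisfy h_j = 0 and h_i = 1 + sum_k P_ik h_k, a linear system solved below.

```python
import numpy as np
from collections import deque

def mean_first_passage(A):
    """H[i, j] = expected number of steps for a random walk from i to first reach j."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    H = np.zeros((n, n))
    for j in range(n):
        others = [i for i in range(n) if i != j]
        Q = P[np.ix_(others, others)]  # transitions that avoid the target j
        H[others, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return H

def shortest_paths(A):
    """Unweighted shortest-path lengths by breadth-first search from every node."""
    A = np.asarray(A)
    n = A.shape[0]
    S = np.full((n, n), np.inf)
    for s in range(n):
        S[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in np.nonzero(A[u])[0]:
                if S[s, v] == np.inf:
                    S[s, v] = S[s, u] + 1
                    queue.append(v)
    return S

def absorption_and_driftness(A):
    H, S = mean_first_passage(A), shortest_paths(A)
    n = H.shape[0]
    off = ~np.eye(n, dtype=bool)  # mask of off-diagonal pairs i != j
    D = np.zeros_like(H)
    D[off] = H[off] / S[off]  # driftness = absorption / shortest path length
    return {
        "in_absorption": np.array([H[off[:, j], j].mean() for j in range(n)]),
        "out_absorption": np.array([H[i, off[i]].mean() for i in range(n)]),
        "in_driftness": np.array([D[off[:, j], j].mean() for j in range(n)]),
        "out_driftness": np.array([D[i, off[i]].mean() for i in range(n)]),
        "H": H,
        "D": D,
    }
```

On the three-node path 0-1-2, for instance, a walk from 0 needs on average 4 steps to reach 2, twice the shortest path length, giving a driftness of 2 for that pair.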
Abstract:
Online music databases have grown significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single- and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a complex network representation of rhythm. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed using two multivariate statistical approaches: principal component analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied to identify the category of rhythms: a parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained with the kappa coefficient, together with the clusters obtained, corroborated the effectiveness of the proposed method.
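The Markov-model step can be illustrated by estimating a first-order transition matrix from a sequence of symbols; read as a weighted directed graph, such a matrix is one way to obtain a network representation of a rhythm. This is a minimal sketch assuming that each rhythmic notation event has already been encoded as a hashable symbol (for example a note-duration name); the encoding used in the paper is not specified here, and `transition_matrix` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def transition_matrix(events):
    """Estimate first-order probabilities P[a][b] = P(next = b | current = a)."""
    counts = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        counts[current][nxt] += 1
    # normalise each row of transition counts into probabilities
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }
```

The resulting probabilities (or statistics of the corresponding graph) could then be flattened into feature vectors for PCA, LDA, and the classifiers mentioned above.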
Abstract:
Biological neuronal networks constitute a special class of dynamical systems, as they are formed by individual geometrical components, namely the neurons. In the existing literature, relatively little attention has been given to the influence of neuron shape on the overall connectivity and dynamics of the emerging networks. The current work addresses this issue by considering simplified neuronal shapes consisting of circular regions (soma/axons) with spokes (dendrites). Networks are grown by placing these patterns randomly in the two-dimensional (2D) plane and establishing connections whenever a piece of dendrite falls inside an axon. Several topological and dynamical properties of the resulting graph are measured, including the degree distribution, clustering coefficients, symmetry of connections, size of the largest connected component, as well as three hierarchical measurements of the local topology. By varying the number of processes of the individual basic patterns, we can quantify relationships between the individual neuronal shape and the topological and dynamical features of the networks. Integrate-and-fire dynamics on these networks is also investigated with respect to transient activation from a source node, indicating that long-range connections play an important role in the propagation of avalanches.
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validating the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
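The construction of the graph on which the supremum becomes a max-flow problem is not detailed in the abstract, so the sketch below only illustrates the generic max-flow step: Ford-Fulkerson with breadth-first augmenting paths (the Edmonds-Karp variant), which runs in polynomial time. The capacity matrix in the test is a hypothetical toy example, not one derived from context trees.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dense capacity matrix (capacity[u][v] >= 0)."""
    n = len(capacity)
    residual = [row[:] for row in capacity]  # residual capacities, copied
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:  # no augmenting path left: flow is maximal
            return flow
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push flow along the path and update reverse edges.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

Choosing shortest augmenting paths is what bounds the number of iterations polynomially; plain Ford-Fulkerson with arbitrary path choice does not have that guarantee in general.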
Abstract:
The k(0)-method of instrumental neutron activation analysis (k(0)-INAA) was employed to determine chemical elements in bird feathers. A collection was obtained covering several bird species from wet ecosystems in diverse regions of Brazil. For comparison, feathers were actively sampled in a riparian forest of the Marins Stream, Piracicaba, Sao Paulo State, using mist nets designed for capturing birds. Biological certified reference materials were used to assess the quality of the analytical procedure. Quantification of chemical elements was performed using the k(0)-INAA Quantu Software. Sixteen chemical elements, including macro- and micronutrients and trace elements, were quantified in feathers, with analytical uncertainties ranging from 2% to 40% depending on the chemical element mass fraction. Results indicated high mass fractions of Br (max = 7.9 mg kg(-1)), Co (max = 0.47 mg kg(-1)), Cr (max = 68 mg kg(-1)), Hg (max = 2.79 mg kg(-1)), Sb (max = 0.20 mg kg(-1)), Se (max = 1.3 mg kg(-1)) and Zn (max = 192 mg kg(-1)) in bird feathers, probably associated with the degree of pollution of the areas evaluated. To corroborate the use of k(0)-INAA results in biomonitoring studies based on the avian community, different factor analysis methods were used to check chemical element source apportionment and locality clustering based on feather chemical composition. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Various molecular systems are available for epidemiological, genetic, evolutionary, taxonomic and systematic studies of innumerable fungal infections, especially those caused by the opportunistic pathogen C. albicans. A total of 75 independent oral isolates were selected in order to compare Multilocus Enzyme Electrophoresis (MLEE), Electrophoretic Karyotyping (EK) and Microsatellite Markers (Simple Sequence Repeats, SSRs) in their ability to differentiate and group C. albicans isolates (discriminatory power), and also to evaluate the concordance and similarity of the groups of strains determined by cluster analysis for each fingerprinting method. Isoenzyme typing was performed using eleven enzyme systems: Adh, Sdh, M1p, Mdh, Idh, Gdh, G6pdh, Asd, Cat, Po, and Lap (data previously published). The EK method consisted of chromosomal DNA separation by pulsed-field gel electrophoresis using a CHEF system. The microsatellite markers were investigated by PCR using three polymorphic loci: EF3, CDC3, and HIS3. Dendrograms were generated by the SAHN method and the UPGMA algorithm based on similarity matrices (S(SM)). The discriminatory power of the three methods was over 95%; however, a pairwise analysis among them showed an agreement of only 19.7-22.4% in the identification of strains. Weak correlation was also observed among the genetic similarity matrices (S(SM)(MLEE) x S(SM)(EK) x S(SM)(SSRs)). Clustering analyses showed a mean of 9 +/- 12.4 isolates per cluster (3.8 +/- 8 isolates/taxon) for MLEE, 6.2 +/- 4.9 isolates per cluster (4 +/- 4.5 isolates/taxon) for SSRs, and 4.1 +/- 2.3 isolates per cluster (2.6 +/- 2.3 isolates/taxon) for EK. Of 347 cluster pairs, 45 (13%), 39 (11.2%), 5 (1.4%) and 3 (0.9%) showed similarity (Si) of 0.1-10%, 10.1-20%, 20.1-30% and 30.1-40%, respectively. Clinical and molecular epidemiological correlations involving the opportunistic pathogen C. albicans may thus depend on the genotyping method used (i.e., MLEE, EK, or SSRs) and on the similarity and grouping analyses that supplement it. Therefore, the use of genotyping systems whose results show minimum disparity, or the combination of the results of these systems, can provide greater security and consistency in the determination of strains and their genetic relationships. (C) 2010 Elsevier B.V. All rights reserved.
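The abstract does not state which index underlies the reported discriminatory power. A common choice in microbial typing studies is the Hunter-Gaston discriminatory index (Simpson's index of diversity), D = 1 - [1/(N(N-1))] * sum_j n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates assigned to type j. The sketch below assumes that index; `hunter_gaston_index` is a hypothetical helper.

```python
from collections import Counter

def hunter_gaston_index(type_labels):
    """Probability that two isolates drawn without replacement belong to different types."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    # sum over types of n_j * (n_j - 1): ordered pairs of isolates sharing a type
    same = sum(k * (k - 1) for k in Counter(type_labels).values())
    return 1.0 - same / (n * (n - 1))
```

A value above 0.95 would correspond to the "over 95%" threshold quoted above; an index of 1.0 means every isolate was assigned a distinct type.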
Abstract:
Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Especially in image data applications, self-organizing methods for unsupervised classification have been successfully applied to cluster pixels or groups of pixels in order to perform segmentation tasks. The first important contribution of this paper is the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing modifications to the Independent Component Analysis Mixture Model (ICAMM). These improvements were derived by considering some of the limitations of the original model and by analyzing how it could be made more efficient. Moreover, a pre-processing methodology is also proposed, which combines Sparse Code Shrinkage (SCS) for image denoising with the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied to segment images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results for the proposals presented herein. (C) 2008 Published by Elsevier B.V.
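The Sobel edge detector used in the pre-processing step filters the image with two 3 x 3 kernels and combines the horizontal and vertical responses into a gradient magnitude. The following is a minimal pure-Python sketch for a grey-level image stored as a list of rows; computing only interior pixels is an assumption about border handling, and the SCS denoising stage and its combination with the edge map are not shown.

```python
import math

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to horizontal intensity changes
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to vertical intensity changes

def sobel_magnitude(img):
    """Gradient magnitude sqrt(gx^2 + gy^2) for every interior pixel of img.

    The kernels are applied as a correlation; flipping them (true convolution)
    only changes the sign of gx and gy, not the magnitude.
    """
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(1, rows - 1):
        line = []
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += GX[i][j] * p
                    gy += GY[i][j] * p
            line.append(math.hypot(gx, gy))
        out.append(line)
    return out
```

On an image whose left two columns are 0 and right three columns are 10, the magnitude is 40 on the two interior columns adjacent to the step and 0 in the flat region.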
Abstract:
The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained from the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. Stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions, with three restriction enzymes per region. The proposed method uses clustering algorithms with distances calculated by average-linkage clustering, and the stability analysis is carried out by introducing perturbations through sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, as well as sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
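The abstract does not give the exact stability measure. The sketch below assumes a simple one: cluster the full set with average linkage (UPGMA), re-cluster random sub-samples, and report the fraction of item pairs whose same-cluster or different-cluster status agrees with the full clustering. The names `average_linkage` and `subsample_stability`, and the 80% sampling fraction, are illustrative assumptions.

```python
import random

def average_linkage(dist, k):
    """Agglomerative clustering with average linkage (UPGMA) down to k clusters."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # mean pairwise distance between the members of two clusters
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(sorted(c) for c in clusters)

def subsample_stability(dist, k, frac=0.8, runs=50, seed=0):
    """Fraction of item pairs whose co-membership in sub-sample clusterings matches the full clustering."""
    rng = random.Random(seed)
    n = len(dist)
    full = {i: lab for lab, members in enumerate(average_linkage(dist, k)) for i in members}
    agree = total = 0
    for _ in range(runs):
        idx = sorted(rng.sample(range(n), max(k, int(frac * n))))
        sub = [[dist[i][j] for j in idx] for i in idx]
        labels = {}
        for lab, members in enumerate(average_linkage(sub, k)):
            for m in members:
                labels[idx[m]] = lab  # map sub-sample positions back to item ids
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                i, j = idx[a], idx[b]
                agree += (full[i] == full[j]) == (labels[i] == labels[j])
                total += 1
    return agree / total
```

Values near 1 indicate that the partition is robust to removing part of the data; lower values flag clusters that depend on a few particular strains.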
Abstract:
This paper analyses the presence of financial constraints in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a fully endogenous way. To classify the firms, we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques applied jointly. The parameters were estimated with a Bayesian approach: prior distributions were assumed for the parameters, yielding either random-effects or fixed-effects versions of the model. The ordinate predictive density criterion was used to select the model with the best predictive performance. Of the thirty models tested, the best prediction was obtained with two groups in the sample, assuming a fixed-effects model with a Student t error distribution with 20 degrees of freedom. The results indicate robustness in the identification of financial constraints when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
In this paper, a framework for detecting human skin in digital images is proposed. The framework is composed of a training phase and a detection phase. A skin class model is learned during the training phase by processing several training images in a hybrid and incremental fuzzy learning scheme. This scheme combines unsupervised and supervised learning: unsupervised, through fuzzy clustering, to obtain clusters of color groups from the training images; and supervised, to select the groups that represent skin color. At the end of the training phase, aggregation operators are used to combine the selected groups into a skin model. In the detection phase, the learned skin model is used to detect human skin efficiently. Experimental results show robust and accurate human skin detection by the proposed framework.
Abstract:
The canonical representation of speech constitutes a perfect reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period, and the voiced and unvoiced components of the excitation, represented as transform coefficients. Each set of parameters may be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. An improved time-frequency fit is obtained by using, for antialiasing cancellation, the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation by a factor of about 13 relative to its raw transform coefficient representation for wideband speech.
Abstract:
We study the spreading of contagious diseases in a population of constant size using susceptible-infective-recovered (SIR) models described in terms of ordinary differential equations (ODEs) and probabilistic cellular automata (PCA). In the PCA model, each individual (represented by a cell in the lattice) is connected mainly to its local neighbors. We investigate how the topological properties of the random network representing contacts among individuals influence the transient behavior and the steady state of the epidemiological system described by ODE and PCA. Our main conclusions are: (1) the basic reproduction number (commonly called R(0)) of a disease propagating in a population cannot be uniquely determined from features of the transient behavior of the infective group; (2) R(0) cannot be associated with a unique combination of clustering coefficient and average shortest path length characterizing the contact network. We discuss how these results can complicate the specification of control strategies for combating disease propagation. (C) 2009 Elsevier B.V. All rights reserved.
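For reference, the ODE version of the SIR model for a population of constant size, written in fractions with s + i + r = 1, is ds/dt = -beta s i, di/dt = beta s i - gamma i, dr/dt = gamma i; in this homogeneous-mixing form R(0) = beta/gamma. The sketch below integrates these equations with a classical fourth-order Runge-Kutta step under illustrative parameter values. It does not model the probabilistic cellular automaton or its contact network, which is where the study's conclusions about clustering and path length arise.

```python
def sir_rhs(s, i, beta, gamma):
    """Right-hand side (ds/dt, di/dt); dr/dt follows from r = 1 - s - i."""
    return -beta * s * i, beta * s * i - gamma * i

def simulate_sir(beta, gamma, s0, i0, t_max, dt=0.01):
    """Integrate the SIR ODEs with RK4; return a list of (t, s, i, r) tuples."""
    s, i = s0, i0
    out = [(0.0, s, i, 1.0 - s - i)]
    for k in range(1, int(round(t_max / dt)) + 1):
        k1 = sir_rhs(s, i, beta, gamma)
        k2 = sir_rhs(s + dt / 2 * k1[0], i + dt / 2 * k1[1], beta, gamma)
        k3 = sir_rhs(s + dt / 2 * k2[0], i + dt / 2 * k2[1], beta, gamma)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], beta, gamma)
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        out.append((k * dt, s, i, 1.0 - s - i))  # constant population: r closes the budget
    return out
```

With beta = 0.5 and gamma = 0.1 (R(0) = 5), an initial infective fraction of 0.01 grows into an outbreak before declining, while s decreases monotonically.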
Abstract:
This paper presents the design and implementation of an embedded soft sensor, i.e., a generic and autonomous hardware module that can be applied to many complex plants in which a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called "Limited Rules", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure, and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and runs online, estimating the target variable from other available variables. Tests were performed using a simulated pH neutralization plant, and the results of the embedded soft sensor were considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. (c) 2007, ISA. Published by Elsevier Ltd. All rights reserved.
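The premise parameters come from Fuzzy C-Means clustering. The sketch below shows standard FCM only: alternating updates of membership-weighted cluster centres and of memberships u_ik proportional to d_ik^(-2/(m-1)), with fuzzifier m. How the "Limited Rules" algorithm turns these clusters into Takagi-Sugeno-Kang premises is not reproduced, and the default parameters are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard FCM: returns cluster centres (c x d) and memberships U (n x c)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(max_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U
```

In a TSK identification setting, each resulting centre (and the spread of its memberships) would typically seed the antecedent membership function of one rule.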