853 resultados para Unsupervised clustering
Resumo:
A great part of the interest in complex networks has been motivated by the presence of structured, frequently nonuniform, connectivity. Because diverse connectivity patterns tend to result in distinct network dynamics, and also because they provide the means to identify and classify several types of complex network, it becomes important to obtain meaningful measurements of the local network topology. In addition to traditional features such as the node degree, clustering coefficient, and shortest path, motifs have been introduced in the literature in order to provide complementary descriptions of the network connectivity. The current work proposes a different type of motif, namely, chains of nodes, that is, sequences of connected nodes with degree 2. These chains have been subdivided into cords, tails, rings, and handles, depending on the type of their extremities (e.g., open or connected). A theoretical analysis of the density of such motifs in random and scale-free networks is described, and an algorithm for identifying these motifs in general networks is presented. The potential of considering chains for network characterization has been illustrated with respect to five categories of real-world networks including 16 cases. Several interesting findings were obtained, including the fact that several chains were observed in real-world networks, especially the world wide web, books, and the power grid. The possibility of chains resulting from incompletely sampled networks is also investigated.
Resumo:
We numerically study the dynamics of a discrete spring-block model introduced by Olami, Feder, and Christensen (OFC) to mimic earthquakes and investigate to what extent this simple model is able to reproduce the observed spatiotemporal clustering of seismicity. Following a recently proposed method to characterize such clustering by networks of recurrent events [J. Davidsen, P. Grassberger, and M. Paczuski, Geophys. Res. Lett. 33, L11304 (2006)], we find that for synthetic catalogs generated by the OFC model these networks have many nontrivial statistical properties. This includes characteristic degree distributions, very similar to what has been observed for real seismicity. There are, however, also significant differences between the OFC model and earthquake catalogs, indicating that this simple model is insufficient to account for certain aspects of the spatiotemporal clustering of seismicity.
Resumo:
Large-scale cortical networks exhibit characteristic topological properties that shape communication between brain regions and global cortical dynamics. Analysis of complex networks allows the description of connectedness, distance, clustering, and centrality that reveal different aspects of how the network's nodes communicate. Here, we focus on a novel analysis of complex walks in a series of mammalian cortical networks that model potential dynamics of information flow between individual brain regions. We introduce two new measures called absorption and driftness. Absorption is the average length of random walks between any two nodes, and takes into account all paths that may diffuse activity throughout the network. Driftness is the ratio between absorption and the corresponding shortest path length. For a given node of the network, we also define four related measurements, namely in-and out-absorption as well as in-and out-driftness, as the averages of the corresponding measures from all nodes to that node, and from that node to all nodes, respectively. We find that the cat thalamo-cortical system incorporates features of two classic network topologies, Erdos-Renyi graphs with respect to in-absorption and in-driftness, and configuration models with respect to out-absorption and out-driftness. Moreover, taken together these four measures separate the network nodes based on broad functional roles (visual, auditory, somatomotor, and frontolimbic).
Resumo:
Biological neuronal networks constitute a special class of dynamical systems, as they are formed by individual geometrical components, namely the neurons. In the existing literature, relatively little attention has been given to the influence of neuron shape on the overall connectivity and dynamics of the emerging networks. The current work addresses this issue by considering simplified neuronal shapes consisting of circular regions (soma/axons) with spokes (dendrites). Networks are grown by placing these patterns randomly in the two-dimensional (2D) plane and establishing connections whenever a piece of dendrite falls inside an axon. Several topological and dynamical properties of the resulting graph are measured, including the degree distribution, clustering coefficients, symmetry of connections, size of the largest connected component, as well as three hierarchical measurements of the local topology. By varying the number of processes of the individual basic patterns, we can quantify relationships between the individual neuronal shape and the topological and dynamical features of the networks. Integrate-and-fire dynamics on these networks is also investigated with respect to transient activation from a source node, indicating that long-range connections play an important role in the propagation of avalanches.
Resumo:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
Resumo:
The k(0)-method instrumental neutron activation analysis (k(0)-INAA) was employed for determining chemical elements in bird feathers. A collection was obtained taking into account several bird species from wet ecosystems in diverse regions of Brazil. For comparison reason, feathers were actively sampled in a riparian forest from the Marins Stream, Piracicaba, Sao Paulo State, using mist nets specific for capturing birds. Biological certified reference materials were used for assessing the quality of analytical procedure. Quantification of chemical elements was performed using the k(0)-INAA Quantu Software. Sixteen chemical elements, including macro and micronutrients, and trace elements, have been quantified in feathers, in which analytical uncertainties varied from 2% to 40% depending on the chemical element mass fraction. Results indicated high mass fractions of Br (max=7.9 mgkg(-1)), Co (max= 0.47 mg kg(-1)), Cr (max =68 mg kg(-1)), Hg (max =2.79 mg kg(-1)), Sb (max= 0.20 mg kg(-1)), Se (max=1.3 mg kg(-1)) and Zn (max =192 mg kg(-1)) in bird feathers, probably associated with the degree of pollution of the areas evaluated. In order to corroborate the use of k(0)-INAA results in biomonitoring studies using avian community, different factor analysis methods were used to check chemical element source apportionment and locality clustering based on feather chemical composition. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Various molecular systems are available for epidemiological, genetic, evolutionary, taxonomic and systematic studies of innumerable fungal infections, especially those caused by the opportunistic pathogen C. albicans. A total of 75 independent oral isolates were selected in order to compare Multilocus Enzyme Electrophoresis (MLEE), Electrophoretic Karyotyping (EK) and Microsatellite Markers (Simple Sequence Repeats - SSRs), in their abilities to differentiate and group C. albicans isolates (discriminatory power), and also, to evaluate the concordance and similarity of the groups of strains determined by cluster analysis for each fingerprinting method. Isoenzyme typing was performed using eleven enzyme systems: Adh, Sdh, M1p, Mdh, Idh, Gdh, G6pdh, Asd, Cat, Po, and Lap (data previously published). The EK method consisted of chromosomal DNA separation by pulsed-field gel electrophoresis using a CHEF system. The microsatellite markers were investigated by PCR using three polymorphic loci: EF3, CDC3, and HIS3. Dendrograms were generated by the SAHN method and UPGMA algorithm based on similarity matrices (S(SM)). The discriminatory power of the three methods was over 95%, however a paired analysis among them showed a parity of 19.7-22.4% in the identification of strains. Weak correlation was also observed among the genetic similarity matrices (S(SM)(MLEE) x S(SM)(EK) x S(SM)(SSRs)). Clustering analyses showed a mean of 9 +/- 12.4 isolates per cluster (3.8 +/- 8 isolates/taxon) for MLEE, 6.2 +/- 4.9 isolates per cluster (4 +/- 4.5 isolates/taxon) for SSRs, and 4.1 +/- 2.3 isolates per cluster (2.6 +/- 2.3 isolates/taxon) for EK. A total of 45 (13%), 39(11.2%), 5 (1.4%) and 3 (0.9%) clusters pairs from 347 showed similarity (Si) of 0.1-10%, 10.1-20%, 20.1-30% and 30.1-40%, respectively. Clinical and molecular epidemiological correlation involving the opportunistic pathogen C. albicans may be attributed dependently of each method of genotyping (i.e., MLEE, EK, and SSRs) supplemented with similarity and grouping analysis. Therefore, the use of genotyping systems that give results which offer minimum disparity, or the combination of the results of these systems, can provide greater security and consistency in the determination of strains and their genetic relationships. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradryrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses Clustering algorithms with distances calculated by average-linkage clustering. Introducing perturbations using sub-sampling techniques makes the stability analysis. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
This paper analyses the presence of financial constraint in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a totally endogenous form. In order to classify the firms we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques jointly. To estimate the parameters a Bayesian approach was considered. Prior distributions were assumed for the parameters, classifying the model in random or fixed effects. Ordinate predictive density criterion was used to select the model providing a better prediction. We tested thirty models and the better prediction considers the presence of 2 groups in the sample, assuming the fixed effect model with a Student t distribution with 20 degrees of freedom for the error. The results indicate robustness in the identification of financial constraint when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we propose an approach to the transient and steady-state analysis of the affine combination of one fast and one slow adaptive filters. The theoretical models are based on expressions for the excess mean-square error (EMSE) and cross-EMSE of the component filters, which allows their application to different combinations of algorithms, such as least mean-squares (LMS), normalized LMS (NLMS), and constant modulus algorithm (CMA), considering white or colored inputs and stationary or nonstationary environments. Since the desired universal behavior of the combination depends on the correct estimation of the mixing parameter at every instant, its adaptation is also taken into account in the transient analysis. Furthermore, we propose normalized algorithms for the adaptation of the mixing parameter that exhibit good performance. Good agreement between analysis and simulation results is always observed.
Resumo:
The canonical representation of speech constitutes a perfect reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period and the voiced and unvoiced components of the excitation represented as transform coefficients. Each set of parameters may be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. Improved time-frequency fit is obtained by using for antialiasing cancellation the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation about 13 times over its raw transform coefficient representation for wideband speech.
Resumo:
We derive an easy-to-compute approximate bound for the range of step-sizes for which the constant-modulus algorithm (CMA) will remain stable if initialized close to a minimum of the CM cost function. Our model highlights the influence, of the signal constellation used in the transmission system: for smaller variation in the modulus of the transmitted symbols, the algorithm will be more robust, and the steady-state misadjustment will be smaller. The theoretical results are validated through several simulations, for long and short filters and channels.
Resumo:
As is well known, Hessian-based adaptive filters (such as the recursive-least squares algorithm (RLS) for supervised adaptive filtering, or the Shalvi-Weinstein algorithm (SWA) for blind equalization) converge much faster than gradient-based algorithms [such as the least-mean-squares algorithm (LMS) or the constant-modulus algorithm (CMA)]. However, when the problem is tracking a time-variant filter, the issue is not so clear-cut: there are environments for which each family presents better performance. Given this, we propose the use of a convex combination of algorithms of different families to obtain an algorithm with superior tracking capability. We show the potential of this combination and provide a unified theoretical model for the steady-state excess mean-square error for convex combinations of gradient- and Hessian-based algorithms, assuming a random-walk model for the parameter variations. The proposed model is valid for algorithms of the same or different families, and for supervised (LMS and RLS) or blind (CMA and SWA) algorithms.
Resumo:
We study the spreading of contagious diseases in a population of constant size using susceptible-infective-recovered (SIR) models described in terms of ordinary differential equations (ODEs) and probabilistic cellular automata (PCA). In the PCA model, each individual (represented by a cell in the lattice) is mainly locally connected to others. We investigate how the topological properties of the random network representing contacts among individuals influence the transient behavior and the permanent regime of the epidemiological system described by ODE and PCA. Our main conclusions are: (1) the basic reproduction number (commonly called R(0)) related to a disease propagation in a population cannot be uniquely determined from some features of transient behavior of the infective group; (2) R(0) cannot be associated to a unique combination of clustering coefficient and average shortest path length characterizing the contact network. We discuss how these results can embarrass the specification of control strategies for combating disease propagations. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
This paper presents the design and implementation of an embedded soft sensor, i. e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called ""Limited Rules"", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. (c) 2007, ISA. Published by Elsevier Ltd. All rights reserved.