951 resultados para Complex networks. Magnetic system. Metropolis
Resumo:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
This work introduces the phenomenon of Collective Almost Synchronisation (CAS), which describes a universal way of how patterns can appear in complex networks for small coupling strengths. The CAS phenomenon appears due to the existence of an approximately constant local mean field and is characterised by having nodes with trajectories evolving around periodic stable orbits. Common notion based on statistical knowledge would lead one to interpret the appearance of a local constant mean field as a consequence of the fact that the behaviour of each node is not correlated to the behaviours of the others. Contrary to this common notion, we show that various well known weaker forms of synchronisation (almost, time-lag, phase synchronisation, and generalised synchronisation) appear as a result of the onset of an almost constant local mean field. If the memory is formed in a brain by minimising the coupling strength among neurons and maximising the number of possible patterns, then the CAS phenomenon is a plausible explanation for it.
Resumo:
Semi-supervised learning techniques have gained increasing attention in the machine learning community, as a result of two main factors: (1) the available data is exponentially increasing; (2) the task of data labeling is cumbersome and expensive, involving human experts in the process. In this paper, we propose a network-based semi-supervised learning method inspired by the modularity greedy algorithm, which was originally applied for unsupervised learning. Changes have been made in the process of modularity maximization in a way to adapt the model to propagate labels throughout the network. Furthermore, a network reduction technique is introduced, as well as an extensive analysis of its impact on the network. Computer simulations are performed for artificial and real-world databases, providing a numerical quantitative basis for the performance of the proposed method.
Resumo:
The use of statistical methods to analyze large databases of text has been useful in unveiling patterns of human behavior and establishing historical links between cultures and languages. In this study, we identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. The latter correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinctions between different literary styles was the average shortest path length, in particular the asymmetry of its distribution. Furthermore, over time there has emerged a trend toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and a more uniform use of the words reflected in a smaller power-law coefficient for the distribution of word frequency. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures.
Resumo:
In this paper we have quantified the consistency of word usage in written texts represented by complex networks, where words were taken as nodes, by measuring the degree of preservation of the node neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to the consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf's law that applies to the frequency of use. Consistency correlated positively with the familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied in other tasks, such as emotion recognition.
Resumo:
The measurement called accessibility has been proposed as a means to quantify the efficiency of the communication between nodes in complex networks. This article reports results regarding the properties of accessibility, including its relationship with the average minimal time to visit all nodes reachable after h steps along a random walk starting from a source, as well as the number of nodes that are visited after a finite period of time. We characterize the relationship between accessibility and the average number of walks required in order to visit all reachable nodes (the exploration time), conjecture that the maximum accessibility implies the minimal exploration time, and confirm the relationship between the accessibility values and the number of nodes visited after a basic time unit. The latter relationship is investigated with respect to three types of dynamics: traditional random walks, self-avoiding random walks, and preferential random walks.
Resumo:
Competitive learning is an important machine learning approach which is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles walking within the network and competing with each other to occupy as many nodes as possible, while attempting to reject intruder particles. The particle's walking rule is composed of a stochastic combination of random and preferential movements. The model has been applied to solve community detection and data clustering problems. Computer simulations reveal that the proposed technique presents high precision of community and cluster detections, as well as low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters by using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative way to the study of competitive learning.
Resumo:
The number of citations received by authors in scientific journals has become a major parameter to assess individual researchers and the journals themselves through the impact factor. A fair assessment therefore requires that the criteria for selecting references in a given manuscript should be unbiased with regard to the authors or journals cited. In this paper, we assess approaches for citations considering two recommendations for authors to follow while preparing a manuscript: (i) consider similarity of contents with the topics investigated, lest related work should be reproduced or ignored; (ii) perform a systematic search over the network of citations including seminal or very related papers. We use formalisms of complex networks for two datasets of papers from the arXiv and the Web of Science repositories to show that neither of these two criteria is fulfilled in practice. By representing the texts as complex networks we estimated a similarity index between pieces of texts and found that the list of references did not contain the most similar papers in the dataset. This was quantified by calculating a consistency index, whose maximum value is one if the references in a given paper are the most similar in the dataset. For the areas of "complex networks" and "graphenes", the consistency index was only 0.11-0.23 and 0.10-0.25, respectively. To simulate a systematic search in the citation network, we employed a traditional random walk search (i.e. diffusion) and a random walk whose probabilities of transition are proportional to the number of the ingoing edges of the neighbours. The frequency of visits to the nodes (papers) in the network had a very small correlation with either the actual list of references in the papers or with the number of downloads from the arXiv repository. Therefore, apparently the authors and users of the repository did not follow the criterion related to a systematic search over the network of citations. Based on these results, we propose an approach that we believe is fairer for evaluating and complementing citations of a given author, effectively leading to a virtual scientometry.
Resumo:
The field of complex systems is a growing body of knowledge, It can be applied to countless different topics, from physics to computer science, biology, information theory and sociology. The main focus of this work is the use of microscopic models to study the behavior of urban mobility, which characteristics make it a paradigmatic example of complexity. In particular, simulations are used to investigate phase changes in a finite size open Manhattan-like urban road network under different traffic conditions, in search for the parameters to identify phase transitions, equilibrium and non-equilibrium conditions . It is shown how the flow-density macroscopic fundamental diagram of the simulation shows,like real traffic, hysteresis behavior in the transition from the congested phase to the free flow phase, and how the different regimes can be identified studying the statistics of road occupancy.
Resumo:
Many diseases have a genetic origin, and a great effort is being made to detect the genes that are responsible for their insurgence. One of the most promising techniques is the analysis of genetic information through the use of complex networks theory. Yet, a practical problem of this approach is its computational cost, which scales as the square of the number of features included in the initial dataset. In this paper, we propose the use of an iterative feature selection strategy to identify reduced subsets of relevant features, and show an application to the analysis of congenital Obstructive Nephropathy. Results demonstrate that, besides achieving a drastic reduction of the computational cost, the topologies of the obtained networks still hold all the relevant information, and are thus able to fully characterize the severity of the disease.
Resumo:
In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.
Resumo:
We propose a novel measure to assess the presence of meso-scale structures in complex networks. This measure is based on the identi?cation of regular patterns in the adjacency matrix of the network, and on the calculation of the quantity of information lost when pairs of nodes are iteratively merged. We show how this measure is able to quantify several meso-scale structures, like the presence of modularity, bipartite and core-periphery con?gurations, or motifs. Results corresponding to a large set of real networks are used to validate its ability to detect non-trivial topological patterns.
Resumo:
We propose a new measure to characterize the dimension of complex networks based on the ergodic theory of dynamical systems. This measure is derived from the correlation sum of a trajectory generated by a random walker navigating the network, and extends the classical Grassberger-Procaccia algorithm to the context of complex networks. The method is validated with reliable results for both synthetic networks and real-world networks such as the world air-transportation network or urban networks, and provides a computationally fast way for estimating the dimensionality of networks which only relies on the local information provided by the walkers.
Resumo:
In this paper structural controllability of complex networks is anyzed. A new algorithm is proposed which constructs a structural control scheme for a given network by avoiding the absence of dilations and by guaranteeing the accessibility of all nodes. Such accessibility is solved via a wiring procedure; this procedure, based on determining the non-accessible regions of the network, has been improved in this new proposed algorithm. This way, the number of dedicated controllers is reduced with respect to the one provided by previous existing algorithms.
Resumo:
In this paper new results on personalized PageRank are shown. We consider directed graphs that may contain dangling nodes. The main result presented gives an analytical characterization of all the possible values of the personalized PageRank for any node.We use this result to give a theoretical justification of a recent model that uses the personalized PageRank to classify users of Social Networks Sites. We introduce new concepts concerning competitivity and leadership in complex networks. We also present some theoretical techniques to locate leaders and competitors which are valid for any personalization vector and by using only information related to the adjacency matrix of the graph and the distribution of its dangling nodes.