24 results for Networks analysis
Abstract:
Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification as co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex network metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with an increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. Copyright (c) EPLA, 2012
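The co-occurrence representation and two of the metrics named above (node strength and shortest paths) can be illustrated with a small sketch. It assumes the simplest construction: each word is linked to the word that immediately follows it, and edges are weighted by how often the pair occurs. The abstract does not state the window size, and preprocessing such as stopword removal or lemmatization is omitted here. Shortest paths are counted in hops, ignoring edge weights.

```python
from collections import defaultdict, deque


def cooccurrence_network(tokens):
    """Weighted undirected graph linking each word to the next one (window of 2)."""
    weights = defaultdict(int)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops such as "very very"
            weights[tuple(sorted((a, b)))] += 1
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    return adj


def strength(adj, node):
    """Node strength: sum of the weights of the edges incident to the node."""
    return sum(adj[node].values())


def average_shortest_path(adj):
    """Mean hop distance over all reachable ordered pairs (weights ignored)."""
    total, pairs = 0, 0
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


tokens = "the cat saw the dog and the dog saw the cat".split()
net = cooccurrence_network(tokens)
print(strength(net, "the"))                    # → 7
print(round(average_shortest_path(net), 3))    # → 1.3
```

In this toy sentence "the" is adjacent to every other word, so it has the highest strength and keeps all distances short.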
Abstract:
The links and interactions enabled by social networks make it possible to understand how information flows between individuals and institutions that join efforts in pursuit of common goals. The article presents conceptual aspects of networks and social networks, emphasizing that the structure of the network and the relationships of interaction and intermediation between its links drive changes in information flows. It describes the Social Network Analysis (SNA) methodology, indicating how it can be used in Information Science to understand the information flows that are configured and reconfigured in social networks on the basis of the structure of relationships.
Abstract:
The study of the formation dynamics of a network aims to identify which types of events occurred in the connections between nodes that led to the current structure of the network under analysis. Understanding these events means understanding the specific forms and connectivity strategies that gave rise to the network. This work aims to analyze these generating events, focusing specifically on scientific collaboration networks and considering co-authorship relations and participation in thesis and dissertation examination committees. Analyzing more than 11,000 documents specific to the field of Communication Sciences, we propose two characteristic types of events intended to explain the formation dynamics of the networks under analysis.
Abstract:
Visual analysis of social networks is usually based on graph drawing algorithms and tools. However, social networks are a special kind of graph in the sense that interpretation of displayed relationships is heavily dependent on context. Context, in turn, is given by attributes associated with graph elements, such as individual nodes, edges, and groups of edges, as well as by the nature of the connections between individuals. In most systems, attributes of individuals and communities are not taken into consideration during graph layout, except to derive weights for force-based placement strategies. This paper proposes a set of novel tools for displaying and exploring social networks based on attribute and connectivity mappings. These properties are employed to lay out nodes on the plane via multidimensional projection techniques. For the attribute mapping, we show that node proximity in the layout corresponds to similarity in attributes, making it easy to locate similar groups of nodes. The projection based on connectivity yields an initial placement that forgoes force-based or graph analysis algorithms, reaching a meaningful layout in one pass. When a force algorithm is then applied to this initial mapping, the final layout presents better properties than conventional force-based approaches. Numerical evaluations show a number of advantages of pre-mapping points via projections. User evaluation demonstrates that these tools promote ease of manipulation as well as fast identification of concepts and associations that cannot be easily expressed by conventional graph visualization alone. To allow better space usage for complex networks, a graph mapping on the surface of a sphere is also implemented.
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to analyze gene regulatory interactions using the Boolean network model and time-series data. The Boolean network considered here is restricted, in the sense that only a subset of all possible Boolean functions is considered. We explore some mathematical properties of these restricted Boolean networks in order to avoid a full search. The problem is modeled as a Constraint Satisfaction Problem (CSP), and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. The first is an artificial data set obtained from a model of the budding yeast cell cycle. The second is derived from experiments performed using HeLa cells. The results show that some interactions can be fully, or at least partially, determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step in detecting gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by available a priori knowledge.
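As a rough illustration of what it means for an interaction to be "determined" by time-series data, the sketch below checks which sets of candidate regulators are consistent with a binary expression series. A set is consistent if the target gene's next state is always the same whenever the regulators' current states repeat. This is a plain brute-force check over unrestricted Boolean functions, not the restricted function class or the CSP formulation described in the abstract, and the toy series is invented for the example.

```python
from itertools import combinations


def consistent(series, target, regulators):
    """True if the target's next state is a function of the regulators' current states."""
    seen = {}
    for now, nxt in zip(series, series[1:]):
        pattern = tuple(now[r] for r in regulators)
        # A repeated input pattern must always lead to the same next state.
        if seen.setdefault(pattern, nxt[target]) != nxt[target]:
            return False
    return True


def candidate_regulator_sets(series, target, max_size=2):
    """Exhaustively list regulator sets (up to max_size genes) consistent with the data."""
    n_genes = len(series[0])
    found = []
    for k in range(1, max_size + 1):
        for regs in combinations(range(n_genes), k):
            if consistent(series, target, regs):
                found.append(regs)
    return found


# Toy series of 3 genes in which gene 2 follows x0(t) AND x1(t).
series = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
print(candidate_regulator_sets(series, target=2))  # → [(0, 1), (1, 2)]
```

Both (0, 1) and (1, 2) fit this short series, so the regulators of gene 2 are only partially determined, which mirrors the situation the abstract describes.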
Abstract:
Data visualization techniques are powerful in the handling and analysis of multivariate systems. One such technique, known as parallel coordinates, was used to support the diagnosis of an event, detected by a neural network-based monitoring system, in a boiler at a Brazilian Kraft pulp mill. Its appeal lies in the ability to visualize several variables simultaneously. The diagnostic procedure was carried out step by step, going through exploratory, explanatory, confirmatory and communicative goals. This tool made the boiler dynamics easier to visualize than the commonly used univariate trend plots. In addition, it facilitated the analysis of other aspects, namely relationships among process variables, distinct modes of operation and discrepant data. The analysis as a whole revealed, first, that the period involving the detected event was associated with a transition between two distinct normal modes of operation and, second, that unusual changes in process variables occurred at that time.
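Parallel coordinates draw each observation as a polyline across one vertical axis per variable. A minimal sketch of the underlying mapping follows, assuming each axis is min-max scaled to [0, 1]. The variable names are hypothetical stand-ins for boiler process variables, and the actual drawing would be left to a plotting library.

```python
def parallel_coordinates(records, variables):
    """Map each record to polyline vertices (axis index, value scaled to [0, 1])."""
    lows = {v: min(r[v] for r in records) for v in variables}
    highs = {v: max(r[v] for r in records) for v in variables}
    lines = []
    for r in records:
        line = []
        for i, v in enumerate(variables):
            span = highs[v] - lows[v]
            # A constant variable is placed mid-axis to avoid dividing by zero.
            y = (r[v] - lows[v]) / span if span else 0.5
            line.append((i, y))
        lines.append(line)
    return lines


# Hypothetical process variables and readings, for illustration only.
variables = ["steam_flow", "drum_pressure", "o2"]
records = [
    {"steam_flow": 100, "drum_pressure": 60, "o2": 3.0},
    {"steam_flow": 120, "drum_pressure": 62, "o2": 2.0},
    {"steam_flow": 110, "drum_pressure": 64, "o2": 4.0},
]
print(parallel_coordinates(records, variables)[0])  # → [(0, 0.0), (1, 0.0), (2, 0.5)]
```

Scaling every axis to the same range is what lets variables with very different units share one plot. Crossing or parallel line segments between adjacent axes then hint at negative or positive relationships.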
Abstract:
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Notwithstanding, the Katz similarity, which combines semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
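The Katz similarity mentioned above is commonly defined between two nodes of a graph as a damped count of all walks connecting them, $S = \sum_{l \ge 1} \beta^l A^l$. Here $A$ is the adjacency matrix and $\beta$ down-weights longer walks. The series converges when $\beta$ is smaller than the reciprocal of the largest eigenvalue of $A$. The sketch below truncates the series after a fixed number of terms. The abstract does not describe how the study applies this index to compare whole texts, so only the node-level index is shown, and the parameter values are illustrative.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def katz(A, beta=0.1, max_len=10):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l."""
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    P = [row[:] for row in A]  # current power A**l, starting at l = 1
    for l in range(1, max_len + 1):
        w = beta ** l
        for i in range(n):
            for j in range(n):
                S[i][j] += w * P[i][j]  # walks of length l, damped by beta**l
        P = mat_mul(P, A)
    return S


# Path graph 0 - 1 - 2: nodes 0 and 2 are linked only through walks of even length.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = katz(A, beta=0.1, max_len=20)
print(round(S[0][2], 6))  # → 0.010204
```

For this path graph the full series has the closed form $\beta^2 / (1 - 2\beta^2)$ for the pair (0, 2), so the truncated value can be checked directly.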
Abstract:
Background: In analyzing the effects of cell treatments such as drug dosing, identifying changes in gene network structure between normal and treated cells is a key task. One possible way to identify these changes is to compare the structures of networks estimated separately from data on normal and treated cells. However, this approach usually fails to estimate accurate gene networks because of the limited length of time series data and measurement noise. Thus, approaches are needed that identify changes in regulation by using the time series data from both conditions efficiently. Methods: We propose a new statistical approach, based on the state space representation of the vector autoregressive model, that estimates gene networks under two different conditions in order to identify changes in regulation between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations under each condition. The use of these hidden binary variables enables efficient data usage: data from both conditions are used for commonly existing regulations, while only the corresponding data are used for condition-specific regulations. Also, the similarity of the networks under the two conditions is automatically taken into account through the design of the potential function for the hidden binary variables. To estimate the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables that maximizes the marginal likelihood. Results: For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations, as well as changes in regulation, with higher coverage and precision than other existing approaches in almost all the experimental settings.
For a real data application, our proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF receptors and dosing with an anticancer drug, Gefitinib. In the treated lung cells, a cancer cell condition is simulated by stimulating the EGF receptors, but this effect would be counteracted by the selective inhibition of EGF receptors by Gefitinib. However, the gene expression profiles actually differ between the conditions, and the genes related to the identified changes are considered possible off-targets of Gefitinib. Conclusions: On the synthetically generated time series data, our proposed approach identifies changes in regulation more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated human lung cells, candidate off-target genes of Gefitinib were found. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, which is a known side effect of Gefitinib.
Abstract:
As the amount of publicly available cerebral gene expression image data grows, so does the demand for automated methods to analyze such large amounts of data. An important study that can be carried out on these data concerns the spatial relationship between gene expressions. Genes with similar spatial density distributions of expression may be functionally correlated; thus, identifying these similarities is useful in suggesting directions of investigation for discovering gene interactions and their correlated functions. In this paper, we describe the use of a high-throughput methodology based on Voronoi diagrams to automatically analyze and search for possible local spatial density relationships between gene expression images. We tested this method using mouse brain section images from the Allen Mouse Brain Atlas public database. This methodology provided measurements capable of characterizing the similarity of the density distribution between gene expressions and allowed the results to be visualized through networks and Principal Component Analysis (PCA). These visualizations are useful for analyzing the level of similarity between gene expression patterns, as well as for comparing connection patterns between region networks. Some genes were found to have the same type of function and to lie near each other in the PCA visualizations. These results suggest cerebral density correlations between gene expressions that could be further explored. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
Semi-supervised learning techniques have gained increasing attention in the machine learning community as a result of two main factors: (1) the amount of available data is increasing exponentially; (2) data labeling is cumbersome and expensive, as it involves human experts. In this paper, we propose a network-based semi-supervised learning method inspired by the modularity greedy algorithm, which was originally applied to unsupervised learning. The modularity maximization process was modified so that the model propagates labels throughout the network. Furthermore, a network reduction technique is introduced, along with an extensive analysis of its impact on the network. Computer simulations are performed on artificial and real-world databases, providing a quantitative basis for assessing the performance of the proposed method.
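The method above builds on modularity maximization. As a reference point, the sketch below only computes the standard Newman-Girvan modularity Q of a given partition: the fraction of edges that fall inside communities minus the fraction expected at random for the same node degrees. The label-propagation changes and the network reduction technique proposed in the paper are not reproduced here.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges with both endpoints in the same community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction for random rewiring with fixed degrees: sum over c of (d_c / 2m)^2.
    totals = {}
    for node, d in degree.items():
        c = community[node]
        totals[c] = totals.get(c, 0) + d
    expected = sum((d_c / (2 * m)) ** 2 for d_c in totals.values())
    return inside - expected


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 3))  # → 0.357
```

A greedy modularity algorithm repeatedly merges the pair of communities whose merge increases Q the most. Putting all nodes in one community gives Q = 0, which is why the two-triangle split scores higher.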
Abstract:
Two novel coordination polymers with the formula $\{[\mathrm{Ln}_2(2,5\text{-tdc})_3(\mathrm{dmso})_2]\cdot\mathrm{H_2O}\}_n$ (Ln = Tb(III) for (1) and Dy(III) for (2); 2,5-tdc$^{2-}$ = 2,5-thiophenedicarboxylate and dmso = dimethylsulfoxide) have been synthesized by the diffusion method and characterized by thermal analysis, vibrational spectroscopy and single-crystal X-ray diffraction analysis. Structure analysis reveals that 2,5-tdc$^{2-}$ plays a versatile role toward different lanthanide ions to form three-dimensional metal-organic frameworks (MOFs) in which the lanthanide ions are heptacoordinated. Photophysical properties were studied using excitation and emission spectra; the photoluminescence data show high emission intensity for the characteristic $^{5}\mathrm{D}_{4} \rightarrow {}^{7}\mathrm{F}_{J}$ transitions ($J$ = 6, 5, 4 and 3) of (1) and $^{4}\mathrm{F}_{9/2} \rightarrow {}^{6}\mathrm{H}_{J}$ transitions ($J$ = 15/2, 13/2 and 11/2) of (2), indicating that 2,5-tdc$^{2-}$ is a good sensitizer. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The use of statistical methods to analyze large databases of text has been useful in unveiling patterns of human behavior and establishing historical links between cultures and languages. In this study, we identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. These clusters correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinctions between literary styles was the average shortest path length, in particular the asymmetry of its distribution. Furthermore, a trend has emerged over time toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and toward a more uniform use of words, reflected in a smaller power-law coefficient for the distribution of word frequency. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures.
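The power-law coefficient of the word frequency distribution can be estimated in several ways, and the abstract does not say which was used. A minimal sketch, assuming a least-squares fit of log frequency against log rank (a Zipf-style rank-frequency plot), is shown below. Maximum-likelihood estimators are generally preferred for careful power-law fitting, so this is only an illustration of the quantity being compared.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Negated slope of a least-squares line through (log rank, log frequency)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


# Frequencies 12, 6, 4, 3 follow 12 / rank exactly, so the exponent is 1.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(round(zipf_exponent(tokens), 6))  # → 1.0
```

Under this convention, a smaller exponent means frequencies decay more slowly with rank, that is, words are used more uniformly.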
Abstract:
In this paper, a novel method based on Independent Component Analysis (ICA) is proposed for power quality signal decomposition. The method aims to decompose the power system signal (voltage or current) into components that provide more specific information about the different disturbances occurring simultaneously in a multiple-disturbance situation. ICA is originally a multichannel technique; however, the method uses it to blindly separate the disturbances present in a single measured signal (single channel). To this end, a filter-bank preprocessing step for ICA is proposed. The proposed method was applied to synthetic data, simulated data and actual power system signals, showing very good performance. A comparison with the decomposition provided by the Discrete Wavelet Transform shows that the proposed method achieved better decoupling for the analyzed data. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
In this paper, we quantified the consistency of word usage in written texts represented by complex networks, in which words were taken as nodes, by measuring the degree of preservation of each node's neighborhood. Words were considered highly consistent if the authors used them with the same neighborhood. When ranked according to consistency of use, the words obeyed a log-normal distribution, in contrast to Zipf's law, which applies to frequency of use. Consistency correlated positively with familiarity and frequency of use, and negatively with ambiguity and age of acquisition. An inspection of some highly consistent words confirmed that they are used in very limited semantic contexts. A comparison of consistency indices for eight authors indicated that these indices may be employed for author recognition. Indeed, as expected, authors of novels could be distinguished from those who wrote scientific texts. Our analysis demonstrated the suitability of the consistency indices, which can now be applied to other tasks, such as emotion recognition.
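One way to make "preservation of the node neighborhood" concrete is sketched below, under illustrative assumptions not taken from the paper. The text is split into segments, and a word's neighborhood in each segment is the set of words immediately adjacent to it. Consistency is the mean pairwise Jaccard similarity of those sets. The function names and sample segments are hypothetical.

```python
from itertools import combinations


def neighbors(tokens, word):
    """Words immediately before or after `word` in a token list."""
    out = set()
    for i, t in enumerate(tokens):
        if t == word:
            if i > 0:
                out.add(tokens[i - 1])
            if i + 1 < len(tokens):
                out.add(tokens[i + 1])
    out.discard(word)  # ignore repetitions such as "very very"
    return out


def jaccard(a, b):
    """Overlap of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consistency(segments, word):
    """Mean pairwise Jaccard similarity of the word's neighborhoods across segments."""
    hoods = [neighbors(seg, word) for seg in segments if word in seg]
    pairs = list(combinations(hoods, 2))
    if not pairs:
        return None  # the word must occur in at least two segments
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


segments = [s.split() for s in ["strong black coffee", "hot black coffee", "black cat ran"]]
print(consistency(segments, "coffee"))           # → 1.0
print(round(consistency(segments, "black"), 3))  # → 0.111
```

In this toy example "coffee" always appears next to the same word, so it scores as fully consistent, while "black" is used in varied contexts and scores low.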
Abstract:
Congenital heart disease (CHD) occurs in ~1% of newborns. CHD arises from many distinct etiologies, ranging from genetic or genomic variation to exposure to teratogens, which elicit diverse cell and molecular responses during cardiac development. To systematically explore the relationships between CHD risk factors and responses, we compiled and integrated comprehensive datasets from studies of CHD in humans and model organisms. We examined two alternative models of potential functional relationships between genes in these datasets: direct convergence, in which CHD risk factors significantly and directly impact the same genes and molecules; and functional convergence, in which risk factors significantly impact different molecules that participate in a discrete heart development network. We observed no evidence for direct convergence. In contrast, we show that CHD risk factors functionally converge in protein networks driving the development of specific anatomical structures (e.g., outflow tract, ventricular septum, and atrial septum) that are malformed in CHD. This integrative analysis of CHD risk factors and responses suggests a complex pattern of functional interactions between genomic variation and environmental exposures that modulate critical biological systems during heart development.