Abstract:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that translations of different quality generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in- (ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
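The network representation underlying this line of work can be sketched roughly as follows: a minimal word co-occurrence network built with networkx, not the authors' pipeline, with an invented example sentence. Each word becomes a node and each adjacent word pair a directed edge; the metrics named above (ID, OD, CC, SP) then fall out of standard graph functions.

```python
# Minimal sketch (assumption, not the paper's code): build a directed
# word co-occurrence network and compute ID/OD/CC/SP with networkx.
import networkx as nx

def cooccurrence_network(text):
    """Directed graph with an edge from each word to the word that follows it."""
    words = text.lower().split()
    g = nx.DiGraph()
    g.add_edges_from(zip(words, words[1:]))
    return g

g = cooccurrence_network("the cat sat on the mat and the dog sat on the rug")

avg_out = sum(d for _, d in g.out_degree()) / g.number_of_nodes()  # average OD
avg_in = sum(d for _, d in g.in_degree()) / g.number_of_nodes()    # average ID
cc = nx.average_clustering(g.to_undirected())                      # CC
sp = nx.average_shortest_path_length(g.to_undirected())            # SP (undirected, so the graph is connected)
print(avg_out, avg_in, cc, sp)
```

Comparing these averages between a source text and its translations is the kind of contrast the paper reports (e.g. higher average OD in automatic translations).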
Abstract:
To have good data quality with high complexity is often seen as important. Intuition says that the higher the accuracy and complexity of the data, the better the analytic solutions become, provided the increasing computing time can be handled. However, for most practical computational problems, high-complexity data means that computation times become too long or that the heuristics used to solve the problem have difficulty reaching good solutions. This is stressed even further when the size of the combinatorial problem increases. Consequently, we often need simplified data to deal with complex combinatorial problems. In this study we address the question of how the complexity and accuracy of a network affect the quality of heuristic solutions for different sizes of the combinatorial problem. We evaluate this question by applying the commonly used p-median model, which finds the optimal locations in a network of p supply points that serve n demand points. To do so, we vary both the accuracy (the number of nodes) of the network and the size of the combinatorial problem (p). The investigation is conducted by means of a case study in Dalecarlia, a region in Sweden with an asymmetrically distributed population (15,000 weighted demand points). To locate 5 to 50 supply points we use the national transport administration's official road network (NVDB), which consists of 1.5 million nodes. To find the optimal locations we start with 500 candidate nodes in the network and increase the number of candidate nodes in steps up to 67,000 (aggregated from the 1.5 million nodes). To find the optimal solution we use a simulated annealing algorithm with adaptive tuning of the temperature. The results show that there is limited improvement in the optimal solutions when the accuracy of the road network increases and the combinatorial problem is simple (low p). When the combinatorial problem is complex (large p), the improvements from increasing the accuracy of the road network are much larger. The results also show that the choice of the best accuracy of the network depends on the complexity of the combinatorial problem (varying p).
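The p-median-with-simulated-annealing setup described above can be sketched on toy data. This is a hedged illustration only: a 1-D "network" with invented demand and candidate positions and a fixed geometric cooling schedule, whereas the study uses a 1.5-million-node road network and adaptive temperature tuning.

```python
# Toy p-median via simulated annealing (assumed sketch, not the study's code).
import math, random

random.seed(0)
demand = [random.uniform(0, 100) for _ in range(200)]      # demand point positions
candidates = [random.uniform(0, 100) for _ in range(40)]   # candidate supply nodes
p = 5

def cost(chosen):
    """Total distance from each demand point to its nearest chosen supply node."""
    return sum(min(abs(d - candidates[c]) for c in chosen) for d in demand)

# Annealing move: swap one chosen candidate for an unchosen one.
current = random.sample(range(len(candidates)), p)
best, best_cost = current[:], cost(current)
temp = 10.0
for step in range(2000):
    neighbour = current[:]
    neighbour[random.randrange(p)] = random.choice(
        [i for i in range(len(candidates)) if i not in current])
    delta = cost(neighbour) - cost(current)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        current = neighbour
        if cost(current) < best_cost:
            best, best_cost = current[:], cost(current)
    temp *= 0.995  # geometric cooling (the paper instead tunes temperature adaptively)

print(best, best_cost)
```

Increasing the number of candidate nodes enlarges the search space, which is exactly the accuracy/complexity trade-off the study measures.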
Abstract:
The idea for organizing a cooperative market on Waterville Main Street was proposed by Aime Schwartz in the fall of 2008. The Co-op would entail an open market located on Main Street to provide fresh, local produce and crafts to town locals. Through shorter delivery distances and agreements with local farmers, the co-op theoretically will offer consumers lower prices on produce than can be found in conventional grocery stores, as well as an opportunity to support local agriculture. One of the tasks involved with organizing the Co-op is to source all of the produce from among the hundreds of farmers located in Maine. The purpose of this project is to show how Geographic Information System (GIS) tools can be used to help the Co-op and other businesses a) site nearby farms that carry desired produce and products, and b) determine which farms are closest to the business site. Using GIS for this purpose will make it easier and more efficient to source produce suppliers, and reduce the workload on business planners. GIS Network Analyst is a tool that provides network-based spatial analysis, and can be used in conjunction with traditional GIS technologies to determine not only the geometric distance between points, but also distance over existing networks (like roads). We used Network Analyst to find the closest produce suppliers to the Co-op for specific produce items, and compute how far they are over existing roads. This will enable business planners to source potential suppliers by distance before contacting individual farmers, allowing for more efficient use of their time and a faster planning process.
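The core computation described above, ranking suppliers by distance over the road network rather than straight-line distance, can be illustrated with a weighted shortest-path query. This is a hedged sketch with an invented toy road graph and networkx, not ArcGIS Network Analyst itself.

```python
# Illustrative sketch (not Network Analyst): rank farms by road distance
# from the Co-op using weighted shortest paths in networkx.
import networkx as nx

road = nx.Graph()
# Edges are road segments with lengths in km (invented toy data).
road.add_weighted_edges_from([
    ("co-op", "junction1", 2.0),
    ("junction1", "farmA", 5.0),
    ("junction1", "junction2", 3.0),
    ("junction2", "farmB", 1.0),
    ("co-op", "farmC", 9.0),
])

farms = ["farmA", "farmB", "farmC"]
dist = {f: nx.shortest_path_length(road, "co-op", f, weight="weight") for f in farms}
for farm, km in sorted(dist.items(), key=lambda kv: kv[1]):
    print(farm, km)
```

Note that farmB ends up closest by road (6 km via two junctions) even though farmA has fewer intermediate nodes, which is why network distance, not geometric distance, is the right sourcing criterion.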
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with an increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. Copyright (c) EPLA, 2012
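The pattern-recognition step, representing each text as a vector of network metrics and then separating the classes, can be sketched in miniature. This is a hedged illustration with invented feature values and a simple nearest-centroid rule standing in for the authors' multivariate techniques.

```python
# Sketch of classifying texts by network-metric vectors (nearest centroid;
# the feature values below are invented for illustration).
# Toy feature vectors: (node strength, avg shortest path, diversity) per text.
originals = [(3.1, 2.9, 0.82), (3.3, 3.0, 0.85), (3.0, 2.8, 0.80)]
simplified = [(2.2, 2.1, 0.60), (2.0, 2.0, 0.58), (2.1, 2.2, 0.62)]

def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def classify(x, c_orig, c_simp):
    """Assign x to whichever class centroid is nearer (squared Euclidean)."""
    d = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return "original" if d(x, c_orig) < d(x, c_simp) else "simplified"

c_o, c_s = centroid(originals), centroid(simplified)
print(classify((3.2, 2.9, 0.83), c_o, c_s))
print(classify((2.1, 2.0, 0.59), c_o, c_s))
```

The paper's observation that strongly simplified versions are easier to distinguish corresponds, in this picture, to the class clusters being further apart in metric space.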
Abstract:
This thesis presents Bayesian solutions to inference problems for three types of social network data structures: a single observation of a social network, repeated observations on the same social network, and repeated observations on a social network developing through time. A social network is conceived as being a structure consisting of actors and their social interaction with each other. A common conceptualisation of social networks is to let the actors be represented by nodes in a graph with edges between pairs of nodes that are relationally tied to each other according to some definition. Statistical analysis of social networks is to a large extent concerned with modelling of these relational ties, which lends itself to empirical evaluation. The first paper deals with a family of statistical models for social networks called exponential random graphs that take various structural features of the network into account. In general, the likelihood functions of exponential random graphs are only known up to a constant of proportionality. A procedure for performing Bayesian inference using Markov chain Monte Carlo (MCMC) methods is presented. The algorithm consists of two basic steps, one in which an ordinary Metropolis-Hastings updating step is used, and another in which an importance sampling scheme is used to calculate the acceptance probability of the Metropolis-Hastings step. In the second paper a method for modelling reports given by actors (or other informants) on their social interaction with others is investigated in a Bayesian framework. The model contains two basic ingredients: the unknown network structure and functions that link this unknown network structure to the reports given by the actors. These functions take the form of probit link functions. 
An intrinsic problem is that the model is not identified, meaning that there are combinations of values on the unknown structure and the parameters in the probit link functions that are observationally equivalent. Instead of using restrictions for achieving identification, it is proposed that the different observationally equivalent combinations of parameters and unknown structure be investigated a posteriori. Estimation of parameters is carried out using Gibbs sampling with a switching device that enables transitions between posterior modal regions. The main goal of the procedures is to provide tools for comparisons of different model specifications. Papers 3 and 4 propose Bayesian methods for longitudinal social networks. The premise of the models investigated is that overall change in social networks occurs as a consequence of sequences of incremental changes. Models for the evolution of social networks using continuous-time Markov chains are meant to capture these dynamics. Paper 3 presents an MCMC algorithm for exploring the posteriors of parameters for such Markov chains. More specifically, the unobserved evolution of the network in-between observations is explicitly modelled, thereby avoiding the need to deal with explicit formulas for the transition probabilities. This enables likelihood based parameter inference in a wider class of network evolution models than has been available before. Paper 4 builds on the proposed inference procedure of Paper 3 and demonstrates how to perform model selection for a class of network evolution models.
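The "ordinary Metropolis-Hastings updating step" mentioned for the first paper can be sketched generically: sampling from a density known only up to a constant of proportionality. This is a minimal illustration on a one-dimensional target; the thesis's additional importance-sampling step for the intractable ERGM normalising constant is deliberately omitted.

```python
# Minimal Metropolis-Hastings sketch (generic illustration, not the thesis's
# algorithm): sample an unnormalised target with a random-walk proposal.
import math, random

random.seed(1)

def unnorm_log_density(x):
    """Unnormalised log target: standard normal with the constant dropped."""
    return -0.5 * x * x

samples, x = [], 0.0
for _ in range(20000):
    proposal = x + random.gauss(0, 1.0)
    # Acceptance ratio needs only the density ratio, so the unknown
    # normalising constant cancels.
    log_alpha = unnorm_log_density(proposal) - unnorm_log_density(x)
    if math.log(random.random()) < log_alpha:
        x = proposal
    samples.append(x)

burned = samples[5000:]
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(round(mean, 2), round(var, 2))
```

For exponential random graphs the ratio inside `log_alpha` still contains an intractable constant, which is precisely why the thesis adds the importance-sampling estimate of the acceptance probability.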
Abstract:
The study of animal sociality investigates the immediate and long-term consequences that a social structure has on its group members. Typically, social behavior is observed from interactions between two individuals at the dyadic level. However, a new framework for studying social behavior has emerged that allows the researcher to assess social complexity at multiple scales. Social Network Analysis has been recently applied in the field of ethology, and this novel tool enables an approach of focusing on social behavior in context of the global network rather than limited to dyadic interactions. This new technique was applied to a group of captive hamadryas baboons (Papio hamadryas hamadryas) in order to assess how overall network topology of the social group changes over time with the decline of an aging leader male. Observations on aggressive, grooming, and proximity spatial interactions were collected from three separate years in order to serve as 'snapshots' of the current state of the group. Data on social behavior were collected from the group when the male was in prime health, when the male was at an old age, and after the male's death. A set of metrics was obtained from each time period for each type of social behavior and quantified a change in the patterns of interactions. The results suggest that baboon social behavior varies across context, and changes with the attributes of its individual members. Possible mechanisms for adapting to a changing social environment were also explored.
Abstract:
Withdrawal reflexes of the mollusk Aplysia exhibit sensitization, a simple form of long-term memory (LTM). Sensitization is due, in part, to long-term facilitation (LTF) of sensorimotor neuron synapses. LTF is induced by the modulatory actions of serotonin (5-HT). Pettigrew et al. developed a computational model of the nonlinear intracellular signaling and gene network that underlies the induction of 5-HT-induced LTF. The model simulated empirical observations that repeated applications of 5-HT induce persistent activation of protein kinase A (PKA) and that this persistent activation requires a suprathreshold exposure of 5-HT. This study extends the analysis of the Pettigrew model by applying bifurcation analysis, singularity theory, and numerical simulation. Using singularity theory, classification diagrams of parameter space were constructed, identifying regions with qualitatively different steady-state behaviors. The graphical representation of these regions illustrates the robustness of these regions to changes in model parameters. Because persistent PKA activity correlates with Aplysia LTM, the analysis focuses on a positive feedback loop in the model that tends to maintain PKA activity. In this loop, PKA phosphorylates a transcription factor (TF-1), thereby increasing the expression of a ubiquitin hydrolase (Ap-Uch). Ap-Uch then acts to increase PKA activity, closing the loop. This positive feedback loop manifests multiple, coexisting steady states, or multiplicity, which provides a mechanism for a bistable switch in PKA activity. After the removal of 5-HT, the PKA activity either returns to its basal level (reversible switch) or remains at a high level (irreversible switch). Such an irreversible switch might be a mechanism that contributes to the persistence of LTM. The classification diagrams also identify parameters and processes that might be manipulated, perhaps pharmacologically, to enhance the induction of memory. 
Rational drug design, to affect complex processes such as memory formation, can benefit from this type of analysis.
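The bistability produced by such a positive feedback loop can be shown with a deliberately simplified toy model, not the Pettigrew equations: a single variable with basal production, a sigmoidal (Hill-type) self-activation standing in for the PKA → TF-1 → Ap-Uch → PKA loop, and linear decay. Depending on the initial condition, the system settles at either the low or the high steady state.

```python
# Hedged toy bistable switch (invented parameters, not the Pettigrew model):
# d[PKA]/dt = basal + v * PKA^n / (K^n + PKA^n) - k * PKA, Euler-integrated.
def simulate(pka0, basal=0.05, v=1.0, K=0.5, n=4, k=1.0, dt=0.01, steps=20000):
    pka = pka0
    for _ in range(steps):
        dpka = basal + v * pka**n / (K**n + pka**n) - k * pka
        pka += dt * dpka
    return pka

low = simulate(0.0)   # starts below threshold: settles at the basal state
high = simulate(1.0)  # starts above threshold: stays at the high state
print(round(low, 3), round(high, 3))
```

The two distinct outcomes for the same parameters are the "multiplicity" the abstract describes; weakening the feedback (smaller `v`) collapses the system back to a single steady state, which is the kind of parameter dependence the classification diagrams map out.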
Abstract:
Here, using the example of the transfer of cultivated plants within the correspondence networks of Albrecht von Haller and the Economic Society, a multi-level network analysis is proposed. In a multi-level procedure, the chronological dynamics, the social structure, the spatial distribution and the functional networking are analyzed one after the other. These four levels of network analysis do not compete with each other but are mutually supporting. The aim is a deeper understanding of how these networks contributed to an international transfer of knowledge in the 18th century.
Abstract:
Objective To determine the comparative effectiveness and safety of current maintenance strategies in preventing exacerbations of asthma. Design Systematic review and network meta-analysis using Bayesian statistics. Data sources Cochrane systematic reviews on chronic asthma, complemented by an updated search when appropriate. Eligibility criteria Trials of adults with asthma randomised to maintenance treatments of at least 24 weeks duration and that reported on asthma exacerbations in full text. Low dose inhaled corticosteroid treatment was the comparator strategy. The primary effectiveness outcome was the rate of severe exacerbations. The secondary outcome was the composite of moderate or severe exacerbations. The rate of withdrawal was analysed as a safety outcome. Results 64 trials with 59 622 patient years of follow-up comparing 15 strategies and placebo were included. For prevention of severe exacerbations, combined inhaled corticosteroids and long acting β agonists as maintenance and reliever treatment and combined inhaled corticosteroids and long acting β agonists in a fixed daily dose performed equally well and were ranked first for effectiveness. The rate ratios compared with low dose inhaled corticosteroids were 0.44 (95% credible interval 0.29 to 0.66) and 0.51 (0.35 to 0.77), respectively. Other combined strategies were not superior to inhaled corticosteroids and all single drug treatments were inferior to single low dose inhaled corticosteroids. Safety was best for conventional best (guideline based) practice and combined maintenance and reliever therapy. Conclusions Strategies with combined inhaled corticosteroids and long acting β agonists are most effective and safe in preventing severe exacerbations of asthma, although some heterogeneity was observed in this network meta-analysis of full text reports.
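One elementary building block of such a network meta-analysis can be illustrated numerically: the adjusted indirect comparison of two strategies through their common comparator on the log rate-ratio scale. This is a hedged sketch, not the paper's full Bayesian model over 15 strategies; the point estimates 0.44 and 0.51 are taken from the abstract, but the standard errors below are invented for illustration.

```python
# Bucher-style indirect comparison sketch (illustrative; the paper uses a
# full Bayesian network meta-analysis).
import math

# Log rate ratios vs the common comparator (low dose inhaled corticosteroids).
log_rr_A, se_A = math.log(0.44), 0.20  # strategy A vs ICS (SE invented)
log_rr_B, se_B = math.log(0.51), 0.21  # strategy B vs ICS (SE invented)

# Indirect comparison of A vs B through the common comparator:
# log RR(A vs B) = log RR(A vs ICS) - log RR(B vs ICS), variances add.
log_rr_AB = log_rr_A - log_rr_B
se_AB = math.sqrt(se_A**2 + se_B**2)

rr = math.exp(log_rr_AB)
ci = (math.exp(log_rr_AB - 1.96 * se_AB), math.exp(log_rr_AB + 1.96 * se_AB))
print(round(rr, 2), tuple(round(x, 2) for x in ci))
```

With these invented standard errors the interval spans 1, consistent with the abstract's finding that the two top-ranked strategies "performed equally well".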
Abstract:
Computational network analysis provides new methods to analyze the brain's structural organization based on diffusion imaging tractography data. Networks are characterized by global and local metrics that have recently given promising insights into diagnosis and the further understanding of psychiatric and neurologic disorders. Most of these metrics are based on the idea that information in a network flows along the shortest paths. In contrast to this notion, communicability is a broader measure of connectivity which assumes that information could flow along all possible paths between two nodes. In our work, the features of network metrics related to communicability were explored for the first time in the healthy structural brain network. In addition, the sensitivity of such metrics was analysed using simulated lesions to specific nodes and network connections. Results showed advantages of communicability over conventional metrics in detecting densely connected nodes as well as subsets of nodes vulnerable to lesions. In addition, communicability centrality was shown to be widely affected by the lesions and the changes were negatively correlated with the distance from lesion site. In summary, our analysis suggests that communicability metrics may provide an insight into the integrative properties of the structural brain network and that these metrics may be useful for the analysis of brain networks in the presence of lesions. Nevertheless, the interpretation of communicability is not straightforward; hence these metrics should be used as a supplement to the more standard connectivity network metrics.
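The contrast between shortest-path and communicability-based connectivity can be made concrete on a toy graph. Communicability sums contributions from walks of all lengths (entrywise, it is the matrix exponential of the adjacency matrix), so two nodes with no direct edge still have positive communicability; networkx computes it directly. The graph below is an invented stand-in for a parcellated brain network.

```python
# Sketch (toy graph, not brain data): shortest-path distance vs
# communicability, which weights walks of all lengths via e^A.
import networkx as nx

g = nx.Graph()
g.add_edges_from([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])

comm = nx.communicability(g)           # dict of dicts: entries of e^A
sp = dict(nx.shortest_path_length(g))  # hop-count shortest paths

# Nodes 1 and 3 are two hops apart, yet have nonzero communicability
# because many longer walks also connect them.
print(sp[1][3], round(comm[1][3], 3))
```

Simulated lesions in this framework amount to deleting nodes or edges and recomputing `comm`, which is how sensitivity to lesion site can be probed.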
Abstract:
This study focused on the relationship between social network size (number of friends and relatives), perceived sufficiency of the network and self-rated health, utilizing data from the National Survey of Personal Health Practices and Consequences, 1979. For men, neither perceived sufficiency nor number of relatives was associated with self-rated health status. The number of friends was positively associated with health status. For women, perceived network sufficiency was positively and significantly related to health status, independent of network size. The number of friends and relatives was not associated with self-rated health status. The sociodemographic variables accounted for most of the explained variance in health status for both males and females. Social networks may hold different meanings for women and men, and may require qualitative as well as quantitative analysis. There may have been insufficient variance in the major variables to produce meaningful results.
Abstract:
The genomic era brought by recent advances in next-generation sequencing technology makes genome-wide scans of natural selection a reality. Currently, almost all statistical tests and analytical methods for identifying genes under selection are performed on an individual gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power in discovering genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and rapidly growing completeness of many gene network and protein-protein interaction databases accompanying the genomic era open avenues for enhancing the power of discovering genes under natural selection. The aim of this thesis is to explore and develop normal mixture model based methods for leveraging gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover genes under moderate or weak selection that bridge the genes under strong selection, and it helps our understanding of the biology underlying complex diseases and related natural selection phenotypes.
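The combination step described, a gene's own posterior log odds plus a network-derived Guilt-By-Association score, might be sketched as follows. This is a hedged guess at the general shape of such a scheme, with invented gene names, scores, and a simple neighbour-average GBA score; the thesis's actual model is a normal mixture with a specific naïve Bayes formulation.

```python
# Assumed sketch (not the thesis's method): combine per-gene evidence with
# a neighbour-based Guilt-By-Association score as a sum of log odds.
import math

# Toy per-gene log odds of being under selection (mixture-model output).
log_odds = {"g1": 2.5, "g2": -0.5, "g3": -1.0, "g4": 1.5}
# Toy interaction network (hypothetical genes).
neighbours = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}

def gba_score(gene):
    """Guilt-By-Association: average log odds of the gene's neighbours."""
    nbrs = neighbours[gene]
    return sum(log_odds[n] for n in nbrs) / len(nbrs)

def combined_posterior(gene):
    """Naive-Bayes-style combination: sum the log-odds contributions."""
    total = log_odds[gene] + gba_score(gene)
    return 1 / (1 + math.exp(-total))  # map back to a probability

for gene in log_odds:
    print(gene, round(combined_posterior(gene), 3))
```

In this toy setup, `g2` has weak evidence on its own but is pulled above 0.5 by its strongly selected neighbour `g1`, which is the qualitative behaviour the thesis aims for: network information rescuing moderate/weak selection targets.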
Abstract:
The Biomolecular Interaction Network Database (BIND; http://binddb.org) is a database designed to store full descriptions of interactions, molecular complexes and pathways. Development of the BIND 2.0 data model has led to the incorporation of virtually all components of molecular mechanisms including interactions between any two molecules composed of proteins, nucleic acids and small molecules. Chemical reactions, photochemical activation and conformational changes can also be described. Everything from small molecule biochemistry to signal transduction is abstracted in such a way that graph theory methods may be applied for data mining. The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations. BIND anticipates the coming large influx of interaction information from high-throughput proteomics efforts including detailed information about post-translational modifications from mass spectrometry. Version 2.0 of the BIND data model is discussed as well as implementation, content and the open nature of the BIND project. The BIND data specification is available as ASN.1 and XML DTD.