The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
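As an illustration of how a text can be represented as a network and scored with one of the metrics named above, the following is a minimal sketch only. It assumes an undirected, unweighted word adjacency network and computes betweenness with Brandes' algorithm. The abstract does not specify the network construction or preprocessing, so the function names and the tokenization are hypothetical and this is not the authors' implementation.

```python
from collections import deque


def build_adjacency_network(tokens):
    """Link each pair of neighbouring words with an undirected edge (assumed construction)."""
    graph = {}
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set()).add(left)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack = []                          # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)   # dependency of source on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


tokens = "the cat sat on the mat".split()
scores = betweenness(build_adjacency_network(tokens))
top_word = max(scores, key=scores.get)
```

In an extractive summarizer, such node scores could be aggregated per sentence to rank sentences; how the paper aggregates them is not stated in the abstract.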
The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Nevertheless, the Katz similarity, which involves both semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, again the topological features were relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
With the development of information technology, the theory and methodology of complex networks have been introduced into language research, where the language system is modeled as a complex network of nodes and edges so that language structure can be analyzed quantitatively. The development of dependency grammar provides theoretical support for the construction of treebank corpora, making statistical analysis of such networks possible. This paper introduces the theory and methodology of complex networks and builds dependency syntactic networks based on a treebank of speeches from the EEE-4 oral test. By analyzing the overall characteristics of the networks, including the number of edges, the number of nodes, the average degree, the average path length, the network centrality and the degree distribution, it aims to find potential differences and similarities between the networks for various grades of speaking performance. Through cluster analysis, this research intends to demonstrate the discriminating power of the network parameters and to provide a potential reference for scoring speaking performance.
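To make the global parameters listed above concrete, here is a minimal sketch that computes the number of nodes, the number of edges, the average degree and the average shortest path length from a list of dependency pairs. It assumes that dependencies are treated as undirected, unweighted edges and that path lengths are averaged over reachable pairs only; the function name and the toy input are hypothetical and not taken from the paper.

```python
from collections import deque


def network_summary(dependencies):
    """Global statistics of an undirected network built from (head, dependent) pairs."""
    graph = {}
    for head, dependent in dependencies:
        if head == dependent:
            continue  # ignore self-loops
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)

    n_nodes = len(graph)
    n_edges = sum(len(neighbours) for neighbours in graph.values()) // 2
    avg_degree = 2 * n_edges / n_nodes if n_nodes else 0.0

    # Breadth-first search from every node to collect shortest path lengths.
    total_length, n_pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total_length += sum(dist.values())
        n_pairs += len(dist) - 1  # reachable nodes other than the source
    avg_path_length = total_length / n_pairs if n_pairs else 0.0

    return {
        "nodes": n_nodes,
        "edges": n_edges,
        "average_degree": avg_degree,
        "average_path_length": avg_path_length,
    }


# Toy dependency pairs for "she likes green tea" (illustrative only).
summary = network_summary([("likes", "she"), ("likes", "tea"), ("tea", "green")])
```

Feature vectors of this kind, one per speech sample, are the sort of input a clustering step could use to compare grades of speaking performance.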
This thesis proposes a novel graphical model for inference called the Affinity Network, which displays the closeness between pairs of variables and is an alternative to Bayesian Networks and Dependency Networks. The Affinity Network shares some similarities with Bayesian Networks and Dependency Networks but avoids their heuristic and stochastic graph construction algorithms by using a message passing scheme. A comparison with these two classes of graphical models is given for sparse discrete and continuous medical data and for data taken from the UCI machine learning repository. The experimental study reveals that, judged against an exhaustive search on the small datasets, the Affinity Network graphs tend to be more accurate. Moreover, its graph construction algorithm is faster than the other two methods on very large datasets. The Affinity Network is also applied to data produced by a synchronised system. A detailed analysis and numerical investigation of this dynamical system is provided, and it is shown that the Affinity Network can be used to characterise its emergent behaviour even in the presence of noise.
Grammar-checking software sometimes produces illegitimate detections (false alarms), which we call overdetections here. This study describes the experiments carried out to develop a system that identifies and silences the overdetections produced by the French grammar checker designed by Druide informatique. Several classifiers were trained in a supervised manner on 14 types of detections made by the checker, using features that cover a range of linguistic information (syntactic dependencies and categories, exploration of word context, etc.) extracted from sentences with and without overdetections. Eight of the 14 classifiers developed are now integrated into the new version of a very popular commercial grammar checker. Our experiments also showed that probabilistic language models, SVMs and word sense disambiguation improve the quality of these classifiers. This work is a successful example of deploying a machine learning approach in the service of a robust, consumer-facing language application.
This book argues for novel strategies to integrate engineering design procedures and structural analysis data into architectural design. Algorithmic procedures that have recently migrated into architectural practice are utilized to improve the interface between the two disciplines. Architectural design is predominantly conducted as a negotiation process among various factors but often lacks the rigor and data structures needed to link it to quantitative procedures. Numerical structural design, on the other hand, could act as a role model for handling data and robust optimization, but it often lacks the complexity of architectural design. The goal of this research is to bring together robust methods from structural design and complex dependency networks from architectural design processes. The book presents three case studies of tools and methods that are developed to exemplify, analyze and evaluate a collaborative workflow.
Machine Learning Applied to the Semantic Web: Statistical Relational Learning vs Tensor Factorization
The aim of this thesis is to analyze and test the main Machine Learning approaches applicable in semantic contexts, starting from Statistical Relational Learning algorithms such as Relational Probability Trees, Relational Bayesian Classifiers and Relational Dependency Networks, and then moving on to approaches based on tensor factorization, in particular CANDECOMP/PARAFAC, Tucker and RESCAL.
In this work, a Langevin dynamics model of the diffusion of water in articular cartilage was developed. Numerical simulations of the translational dynamics of water molecules and their interaction with collagen fibers were used to study the quantitative relationship between the organization of the collagen fiber network and the diffusion tensor of water in model cartilage. Langevin dynamics was used to simulate water diffusion in both ordered and partially disordered cartilage models. In addition, an analytical approach was developed to estimate the diffusion tensor for a network comprising a given distribution of fiber orientations. The key findings are that (1) an approximately linear relationship was observed between collagen volume fraction and the fractional anisotropy of the diffusion tensor in fiber networks of a given degree of alignment, (2) for any given fiber volume fraction, fractional anisotropy follows a fiber alignment dependency similar to the square of the second Legendre polynomial of cos(θ), with the minimum anisotropy occurring at approximately the magic angle (θMA), and (3) a decrease in the principal eigenvalue and an increase in the transverse eigenvalues are observed as the fiber orientation angle θ progresses from 0° to 90°. The corresponding diffusion ellipsoids are prolate for θ < θMA, spherical for θ ≈ θMA, and oblate for θ > θMA. Expansion of the model to include discrimination between the combined effects of alignment disorder and collagen fiber volume fraction on the diffusion tensor is discussed.
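The quantities named in these findings can be evaluated with a short sketch. It uses the standard definition of fractional anisotropy from the three eigenvalues of a diffusion tensor and the second Legendre polynomial P2(x) = (3x² − 1)/2, whose zero at cos²θ = 1/3 gives the magic angle of about 54.7°. The function names are hypothetical, and the code evaluates only the reference curve mentioned in the abstract, not the Langevin simulation or the authors' analytical model.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard fractional anisotropy of a diffusion tensor with eigenvalues l1, l2, l3."""
    mean = (l1 + l2 + l3) / 3
    spread = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    norm = l1 ** 2 + l2 ** 2 + l3 ** 2
    if norm == 0:
        return 0.0
    return math.sqrt(1.5 * spread / norm)


def legendre_p2(x):
    """Second Legendre polynomial P2(x) = (3x^2 - 1) / 2."""
    return 0.5 * (3 * x * x - 1)


def p2_squared(theta):
    """Square of P2(cos(theta)); theta in radians."""
    return legendre_p2(math.cos(theta)) ** 2


# The magic angle is the zero of P2(cos(theta)), i.e. cos^2(theta) = 1/3.
MAGIC_ANGLE = math.acos(1 / math.sqrt(3))

# Reference curve sampled every 15 degrees from 0 to 90 degrees.
curve = [(deg, p2_squared(math.radians(deg))) for deg in range(0, 91, 15)]
```

An isotropic tensor (three equal eigenvalues) gives a fractional anisotropy of 0, which corresponds to the spherical diffusion ellipsoid reported near the magic angle.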
Actin is the most abundantly distributed protein in living cells and plays critical roles in intracellular force generation and transmission. Understanding the fracture mechanism of microfilament networks, whose principal component is actin, would provide insight into the self-protective character of the cytoskeleton. In this study, molecular simulations are conducted to investigate the molecular mechanisms of disruption of microfilament networks from a biophysical viewpoint. Employing a coarse-grained (CG) model of actin filament networks, we focus on how the ultimate strength and crack growth mode of microfilament networks depend on crack length. We find that the fracture mechanism of a microfilament network depends on its structural properties. Structural flaws change the strength of microfilament networks only marginally, which would explain the self-protective character of the cytoskeleton.
A model of crosslinker unbinding is implemented in a highly coarse-grained granular model of the F-actin cytoskeleton. We employ this granular model to study the mechanisms of the compressive response of F-actin networks. It is found that the compressive response of the F-actin cytoskeleton depends on the strain rate. The evolution of deformation energy in the network indicates that crosslinker unbinding events can induce remodelling of the F-actin cytoskeleton in response to external loading. The internal stress in the F-actin cytoskeleton can dissipate efficiently with the help of crosslinker unbinding, which could lead to the spontaneous relaxation of living cells.
The quantity of fruit consumed by dispersers is highly variable among individuals within plant populations. The outcome of such selection exerted by frugivores has been examined mostly with respect to changing spatial contexts. The influence of varying temporal contexts on frugivore choice, and its possible demographic and evolutionary consequences, are poorly understood. We examined whether temporal variation in fruit availability across a hierarchy of nested temporal levels (interannual, intraseasonal, 120 h, 24 h) altered frugivore choice for a complex seed dispersal system in dry tropical forests of southern India. The interaction between Phyllanthus emblica and its primary disperser (ruminants) was mediated by another frugivore (a primate), which made large quantities of fruit available on the ground to ruminants. The direction and strength of crop size and neighborhood effects on this interaction varied with changing temporal contexts. Fruit availability was higher in the first of the two study years, and at the start of the season in both years. Fruit persistence on trees, determined by primate foraging, was influenced by crop size and conspecific neighborhood densities only in the high fruit availability year. Fruit removal by ruminants was influenced by crop size in both years and by neighborhood densities only in the high availability year. In both years, these effects were stronger at the start of the season. Intraseasonal reduction in fruit availability diminished inequalities in fruit removal by ruminants and the influence of crop size and fruiting neighborhoods. Not all trees in a P. emblica population were equally attractive to frugivores at all points in time. Temporal asymmetry in frugivore-mediated selection could reduce the potential for co-evolution between frugivores and plants by diluting selective pressures. Interdependencies formed between disparate animal consumers can add further levels of complexity to plant-frugivore mutualistic networks and have potential reproductive consequences for specific individuals within populations.
We outline the design and creation of a syntactically and morphologically annotated corpus of Finnish for use by the research community. We motivate a definitional, systematic “grammar definition corpus” as a first step in a three-year annotation effort to help create higher-quality, better-documented extensive parsebanks at a later stage. The syntactic representation, consisting of a dependency structure and a basic set of dependency functions, is outlined with examples. Reference is made to double-blind annotation experiments to measure the applicability of the new grammar definition corpus methodology.
Numerous linguistic operations have been assigned to cortical brain areas, but the contributions of subcortical structures to human language processing are still being discussed. Using simultaneous EEG recordings directly from deep brain structures and the scalp, we show that the human thalamus systematically reacts to syntactic and semantic parameters of auditorily presented language in a temporally interleaved manner in coordination with cortical regions. In contrast, two key structures of the basal ganglia, the globus pallidus internus and the subthalamic nucleus, were not found to be engaged in these processes. We therefore propose that syntactic and semantic language analysis is primarily realized within cortico-thalamic networks, whereas a cohesive basal ganglia network is not involved in these essential operations of language analysis.