838 resultados para Networks analysis
Resumo:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that different quality translations generated by NIT tools can be distinguished from their manual counterparts by means of metrics such as in-(ID) and out-degrees (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better NIT tools and automatic evaluation metrics.
Resumo:
Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. Copyright (c) EPLA, 2012
Resumo:
This morning Dr. Risser will introduce you to the basic ideas of social network analysis. You will learn some history behind the study of social networks. Dr. Risser will introduce you to mathematical measures of social networks including centrality measures and measures of spread and cohesion. You will also learn how to use a computer program to analyze social network data
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Complex Networks analysis turn out to be a very promising field of research, testified by many research projects and works that span different fields. Those analysis have been usually focused on characterize a single aspect of the system and a study that considers many informative axes along with a network evolve is lacking. We propose a new multidimensional analysis that is able to inspect networks in the two most important dimensions, space and time. To achieve this goal, we studied them singularly and investigated how the variation of the constituting parameters drives changes to the network as a whole. By focusing on space dimension, we characterized spatial alteration in terms of abstraction levels. We proposed a novel algorithm that, by applying a fuzziness function, can reconstruct networks under different level of details. We verified that statistical indicators depend strongly on the granularity with which a system is described and on the class of networks. We keep fixed the space axes and we isolated the dynamics behind networks evolution process. We detected new instincts that trigger social networks utilization and spread the adoption of novel communities. We formalized this enhanced social network evolution by adopting special nodes (called sirens) that, thanks to their ability to attract new links, were able to construct efficient connection patterns. We simulated the dynamics of the system by considering three well-known growth models. Applying this framework to real and synthetic networks, we showed that the sirens, even when used for a limited time span, effectively shrink the time needed to get a network in mature state. In order to provide a concrete context of our findings, we formalized the cost of setting up such enhancement and provided the best combinations of system's parameters, such as number of sirens, time span of utilization and attractiveness.
Resumo:
In this poster we presented our preliminary work on the study of spammer detection and analysis with 50 active honeypot profiles implemented on Weibo.com and QQ.com microblogging networks. We picked out spammers from legitimate users by manually checking every captured user's microblogs content. We built a spammer dataset for each social network community using these spammer accounts and a legitimate user dataset as well. We analyzed several features of the two user classes and made a comparison on these features, which were found to be useful to distinguish spammers from legitimate users. The followings are several initial observations from our analysis on the features of spammers captured on Weibo.com and QQ.com. ¦The following/follower ratio of spammers is usually higher than legitimate users. They tend to follow a large amount of users in order to gain popularity but always have relatively few followers. ¦There exists a big gap between the average numbers of microblogs posted per day from these two classes. On Weibo.com, spammers post quite a lot microblogs every day, which is much more than legitimate users do; while on QQ.com spammers post far less microblogs than legitimate users. This is mainly due to the different strategies taken by spammers on these two platforms. ¦More spammers choose a cautious spam posting pattern. They mix spam microblogs with ordinary ones so that they can avoid the anti-spam mechanisms taken by the service providers. ¦Aggressive spammers are more likely to be detected so they tend to have a shorter life while cautious spammers can live much longer and have a deeper influence on the network. The latter kind of spammers may become the trend of social network spammer. © 2012 IEEE.
Resumo:
A new relationship type of social networks - online dating - are gaining popularity. With a large member base, users of a dating network are overloaded with choices about their ideal partners. Recommendation methods can be utilized to overcome this problem. However, traditional recommendation methods do not work effectively for online dating networks where the dataset is sparse and large, and a two-way matching is required. This paper applies social networking concepts to solve the problem of developing a recommendation method for online dating networks. We propose a method by using clustering, SimRank and adapted SimRank algorithms to recommend matching candidates. Empirical results show that the proposed method can achieve nearly double the performance of the traditional collaborative filtering and common neighbor methods of recommendation.
Resumo:
The rapid growth in the number of users using social networks and the information that a social network requires about their users make the traditional matching systems insufficiently adept at matching users within social networks. This paper introduces the use of clustering to form communities of users and, then, uses these communities to generate matches. Forming communities within a social network helps to reduce the number of users that the matching system needs to consider, and helps to overcome other problems from which social networks suffer, such as the absence of user activities' information about a new user. The proposed system has been evaluated on a dataset obtained from an online dating website. Empirical analysis shows that accuracy of the matching process is increased using the community information.
Resumo:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: To what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms were used as baseline human data. Two corpus based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural networks analysis revealed that CANs do approximate the FANs to an encouraging degree.
Resumo:
[EN]Based on the theoretical tools of Complex Networks, this work provides a basic descriptive study of a synonyms dictionary, the Spanish Open Thesaurus represented as a graph. We study the main structural measures of the network compared with those of a random graph. Numerical results show that Open-Thesaurus is a graph whose topological properties approximate a scale-free network, but seems not to present the small-world property because of its sparse structure. We also found that the words of highest betweenness centrality are terms that suggest the vocabulary of psychoanalysis: placer (pleasure), ayudante (in the sense of assistant or worker), and regular (to regulate).
Resumo:
It is well known that the storage capacity may be large if all memory patterns are orthogonal to each other. In this paper, a clear description is given about the relation between the dimension N and the maximal number of orthogonal vectors with components +/-1, and also the conception of attractive index is proposed to estimate the basin of attraction. Theoretic analysis and computer simulation show that each memory pattern's basin of attraction contains at least one Hamming ball when the storage capacity is less than 0.33N which is better than usual 0.15 N.