43 results for PageRank


Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank ranks web pages from the link structure of the web; applied to malware analysis, it can likewise rank instructions from the structural information that connects them. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.
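The idea of using instruction ranks as features can be illustrated with a minimal sketch. It assumes that the "structural information" is a directed graph with one edge per pair of consecutive opcodes in a trace; the abstract does not specify the graph construction, so that choice, the function names `instruction_pagerank` and `feature_vector`, and the toy trace are all hypothetical.

```python
from collections import defaultdict


def instruction_pagerank(trace, damping=0.85, iters=100):
    """PageRank over an opcode-transition graph (assumed construction)."""
    nodes = set(trace)
    if not nodes:
        return {}
    # Assumption: one directed edge a -> b per consecutive opcode pair.
    out_edges = defaultdict(set)
    for a, b in zip(trace, trace[1:]):
        out_edges[a].add(b)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by opcodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                new[w] += damping * rank[v] / len(out_edges[v])
        rank = new
    return rank


def feature_vector(trace, vocabulary):
    """Fixed-length feature vector: one PageRank value per known opcode."""
    ranks = instruction_pagerank(trace)
    return [ranks.get(op, 0.0) for op in vocabulary]


trace = ["push", "mov", "call", "mov", "pop", "ret"]
vocabulary = ["mov", "push", "call", "pop", "ret", "xor"]
features = feature_vector(trace, vocabulary)
```

A vector like `features` could then be passed to any classifier; opcodes absent from a sample simply receive a rank of zero.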

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this thesis we analyze dictionary graphs and several other kinds of graphs using the PageRank algorithm. We calculated the correlation between the degree and the PageRank of all nodes for graphs obtained from the Merriam-Webster dictionary, a French dictionary, and the WordNet hypernym and synonym dictionaries. Our conclusion was that PageRank can be a good tool for comparing the quality of dictionaries. We also studied some artificial social and random graphs. When we omitted some random nodes from each of these graphs, we did not notice any significant changes in the ranking of the nodes according to their PageRank. We also discovered that some of the social graphs selected for our study were less resistant to changes in PageRank.
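The degree-versus-PageRank comparison can be sketched as follows. This is an illustration under assumptions, not the thesis's procedure: the graph links each headword to the words in its definition, "degree" is taken as in-degree plus out-degree, and the correlation is Pearson's. The toy dictionary and all function names are hypothetical.

```python
import math
from collections import defaultdict


def pagerank(graph, damping=0.85, iters=100):
    """Power-iteration PageRank on a dict {node: [successors]}."""
    nodes = sorted(set(graph) | {w for targets in graph.values() for w in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Nodes without successors spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v, targets in graph.items():
            for w in targets:
                new[w] += damping * rank[v] / len(targets)
        rank = new
    return rank


def degrees(graph):
    """Total degree (in-degree + out-degree); an assumed reading of 'degree'."""
    deg = defaultdict(int)
    for v, targets in graph.items():
        deg[v] += len(targets)
        for w in targets:
            deg[w] += 1
    return deg


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def degree_pagerank_correlation(graph):
    ranks = pagerank(graph)
    deg = degrees(graph)
    nodes = sorted(ranks)
    return pearson([deg[v] for v in nodes], [ranks[v] for v in nodes])


# Hypothetical toy dictionary: headword -> words used in its definition.
toy_dictionary = {
    "cat": ["animal", "small"],
    "dog": ["animal"],
    "animal": ["living", "thing"],
    "small": ["thing"],
}
correlation = degree_pagerank_correlation(toy_dictionary)
```

Repeating the computation after deleting a random subset of nodes would give a rough analogue of the robustness experiment described above.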

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Our reality is characterized by constant progress, and to keep up with it people need to stay up to date on events. In a world with a great deal of news, searching for the right items can be difficult, and the obstacles will only grow over time as the volume of data increases. Information Retrieval (IR), an interdisciplinary branch of computer science that deals with the management and retrieval of information, offers substantial help. An IR system searches a reference dataset for content considered relevant to the need expressed by a query. Most IR systems developed so far rely solely on textual similarity to identify relevant information, treating content as relevant when it includes one or more of the keywords in the query. The idea studied here is that this is not always sufficient, especially when large collections such as the web must be managed. Existing solutions may return low-quality responses that do not let users navigate the results effectively. The intuition for overcoming these limitations has been to define a new concept of relevance with which to rank the results differently. This led to Temporal PageRank, a new proposal for Web Information Retrieval that relies on a combination of several factors to increase the quality of web search. Temporal PageRank combines the advantages of a ranking algorithm, which prefers information reported by web pages considered important within the context in which they reside, with the potential of Temporal Information Retrieval techniques, which exploit the temporal aspects of data by describing their chronological contexts. In this thesis, the new proposal is discussed, its results are compared with those achieved by the best-known solutions, and its strengths and weaknesses are analyzed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper new results on personalized PageRank are shown. We consider directed graphs that may contain dangling nodes. The main result presented gives an analytical characterization of all the possible values of the personalized PageRank for any node. We use this result to give a theoretical justification of a recent model that uses the personalized PageRank to classify users of Social Network Sites. We introduce new concepts concerning competitivity and leadership in complex networks. We also present some theoretical techniques to locate leaders and competitors that are valid for any personalization vector and use only information related to the adjacency matrix of the graph and the distribution of its dangling nodes.
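For orientation, the quantities named in this abstract (adjacency matrix, personalization vector, distribution of dangling nodes) enter the standard iterative computation of personalized PageRank as sketched below. The sketch shows only that standard iteration, not the paper's analytical characterization; the function name, the damping value `alpha=0.85`, and the example graph are assumptions.

```python
def personalized_pagerank(adj, v, u, alpha=0.85, iters=200):
    """Iterate pi <- alpha * (pi P + dangling_mass * u) + (1 - alpha) * v.

    adj: adjacency matrix as a list of rows (adj[i][j] > 0 means edge i -> j)
    v:   personalization vector (non-negative, sums to 1)
    u:   distribution used to redistribute the rank of dangling nodes
    """
    n = len(adj)
    out = [sum(row) for row in adj]
    pi = list(v)
    for _ in range(iters):
        # Rank currently sitting on nodes without outgoing edges.
        dangling_mass = sum(pi[i] for i in range(n) if out[i] == 0)
        new = [(1.0 - alpha) * v[j] + alpha * dangling_mass * u[j] for j in range(n)]
        for i in range(n):
            if out[i]:
                for j in range(n):
                    if adj[i][j]:
                        new[j] += alpha * pi[i] * adj[i][j] / out[i]
        pi = new
    return pi


# Node 2 is dangling; all teleportation goes to node 0.
adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
v = [1.0, 0.0, 0.0]
u = [1.0 / 3, 1.0 / 3, 1.0 / 3]
pi = personalized_pagerank(adj, v, u)
```

Rerunning with different choices of `v` and `u` shows how the score of a given node moves as the personalization vector and the dangling distribution change, which is the range of values the abstract refers to.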

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using OpenMP, MPI, or a hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
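As a point of reference, a sequential Power method with a generic relaxation step can be sketched as below. The abstract does not give the update rules of the proposed Relaxed and Extrapolated variants, so the rule used here, `x_new = (1 - omega) * x + omega * G(x)`, is an assumed textbook form and not the paper's algorithm; with `omega = 1.0` it reduces to the plain Power method. The function name and the example graph are hypothetical, and no parallelism (OpenMP or MPI) is shown.

```python
def relaxed_power_pagerank(links, omega=1.0, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration with an assumed relaxation parameter omega.

    links: dict {node: [successors]}
    Returns the rank dict and the number of iterations performed.
    """
    nodes = sorted(set(links) | {w for targets in links.values() for w in targets})
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # One application of the Google operator G to the current vector.
        dangling = sum(x[v] for v in nodes if not links.get(v))
        gx = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v, targets in links.items():
            for w in targets:
                gx[w] += damping * x[v] / len(targets)
        # Relaxation: blend the previous iterate with the new one.
        new = {v: (1.0 - omega) * x[v] + omega * gx[v] for v in nodes}
        residual = sum(abs(new[v] - x[v]) for v in nodes)
        x = new
        if residual < tol:
            break
    return x, iteration


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
plain, plain_iterations = relaxed_power_pagerank(links, omega=1.0)
relaxed, relaxed_iterations = relaxed_power_pagerank(links, omega=0.8)
```

Both runs converge to the same ranking because the relaxation leaves the fixed point unchanged; only the number of iterations differs. The matrix-vector product inside the loop is the step that a parallel implementation would distribute across threads or processes.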

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Several technical indicators have been proposed to assess the impact of authors and institutions. Here, we combine the h-index and the PageRank algorithm to do away with some of the individual limitations of these two indices. Most importantly, we aim to take into account value differences between citations, evaluating the citation sources by defining the h-index using the PageRank score rather than citation counts. The resulting PR-index is then constructed by evaluating source popularity as well as the source publication authority. Extensive tests on available data collections (i.e., Microsoft Academic Search and benchmarks on the SIGKDD innovation award) show that the PR-index provides a more balanced impact measure than many existing indices. Due to its simplicity and similarity to the popular h-index, the PR-index may thus become a welcome addition to the technical indices already in use. Moreover, growth dynamics prior to the SIGKDD innovation award indicate that the PR-index might have notable predictive power.
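The substitution described here, computing an h-type index from PageRank scores instead of raw citation counts, can be sketched roughly as follows. The exact definition of the PR-index is not given in the abstract; in particular, how PageRank values (which are fractions) are rescaled before the h-type threshold is applied is an assumption, as are the names `pr_index` and `scale`.

```python
def h_index(scores):
    """Largest h such that at least h items have a score of at least h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for position, score in enumerate(ranked, start=1):
        if score >= position:
            h = position
        else:
            break
    return h


def pr_index(paper_pagerank, scale):
    """h-type index over rescaled PageRank scores (the rescaling is hypothetical)."""
    return h_index([score * scale for score in paper_pagerank])


# Classic h-index from citation counts.
classic = h_index([10, 8, 5, 4, 3])

# Same author, papers scored by illustrative PageRank values instead.
pagerank_based = pr_index([0.05, 0.03, 0.02, 0.001], scale=100)
```

The point of the substitution is that a paper cited by a few highly ranked sources can count for more than one cited by many obscure ones, which a raw citation count cannot express.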

Relevance:

10.00% 10.00%

Publisher:

Abstract:

As increasing numbers of Chinese language learners choose to learn English online (CNNIC, 2012), there is a need to investigate popular websites and their language learning designs. This paper reports on the first stage of a study that analysed the pedagogical, linguistic and content features of 25 Chinese English Language Learning (ELL) websites ranked according to their value and importance to users. The website ranking was undertaken using a system known as PageRank. The aim of the study was to identify the features characterising popular sites as opposed to those of less popular sites for the purpose of producing a framework for ELL website design in the Chinese context. The study found that a pedagogical focus with developmental instructional materials accommodating diverse proficiency levels was a major contributor to website popularity. Chinese language use for translations and teaching directives and intermediate level English for learning materials were also significant features. Content topics included Anglophone/Western and non-Anglophone/Eastern contexts. Overall, popular websites were distinguished by their mediation of access to and scaffolded support for ELL.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

For TREC Crowdsourcing 2011 (Stage 2) we propose a network-based approach for assigning an indicative measure of worker trustworthiness in crowdsourced labelling tasks. Workers, the gold standard and worker/gold standard agreements are modelled as a network. For the purpose of worker trustworthiness assignment, a variant of the PageRank algorithm, named TurkRank, is used to adaptively combine evidence that suggests worker trustworthiness, i.e., agreement with other trustworthy co-workers and agreement with the gold standard. A single parameter controls the importance of co-worker agreement versus gold standard agreement. The TurkRank score calculated for each worker is incorporated into a worker-weighted mean label aggregation.
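A rough sketch of how such a score could be computed and used is given below. It is not the published TurkRank definition: it assumes a personalized-PageRank-style update in which a single parameter `lam` weighs trust propagated along co-worker agreement edges against trust injected from gold-standard agreement. The function names, data layout, and numbers are hypothetical, and at least one worker is assumed to have positive gold agreement.

```python
def trust_scores(agreement, gold_agreement, lam=0.5, iters=100):
    """PageRank-like worker trust (assumed form, not the published TurkRank).

    agreement:      {worker: {co_worker: agreement_rate}}
    gold_agreement: {worker: agreement rate with the gold standard}
    lam:            weight of co-worker evidence versus gold evidence
    """
    workers = sorted(gold_agreement)
    total_gold = sum(gold_agreement.values())
    gold = {w: gold_agreement[w] / total_gold for w in workers}
    score = dict(gold)
    for _ in range(iters):
        new = {w: (1.0 - lam) * gold[w] for w in workers}
        for w in workers:
            weights = agreement.get(w, {})
            total = sum(weights.values())
            if total == 0:
                # No co-worker evidence: hand the score back via the gold distribution.
                for x in workers:
                    new[x] += lam * score[w] * gold[x]
            else:
                # A worker passes trust to co-workers in proportion to agreement.
                for x, a in weights.items():
                    new[x] += lam * score[w] * a / total
        score = new
    return score


def weighted_mean_label(labels, score):
    """Worker-weighted mean of numeric labels for a single item."""
    total = sum(score[w] for w in labels)
    return sum(score[w] * y for w, y in labels.items()) / total


agreement = {
    "w1": {"w2": 0.9, "w3": 0.1},
    "w2": {"w1": 0.9, "w3": 0.2},
    "w3": {"w1": 0.1, "w2": 0.2},
}
gold_agreement = {"w1": 0.9, "w2": 0.8, "w3": 0.2}
scores = trust_scores(agreement, gold_agreement, lam=0.5)
label = weighted_mean_label({"w1": 1, "w2": 1, "w3": 0}, scores)
```

Setting `lam` to 0 makes the score depend on gold-standard agreement alone, while values near 1 let co-worker agreement dominate.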

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper, we present a dynamic model to identify influential users of micro-blogging services. Micro-blogging services, such as Twitter, allow their users (twitterers) to publish tweets and to follow other users to receive their tweets. Previous work on user influence on Twitter focuses mainly on the following-link structure and on the content users publish, and seldom emphasizes the importance of interactions among users. We argue that user influence can be measured more accurately by emphasizing user actions on the micro-blogging platform. Since micro-blogging is a powerful social media and communication platform, identifying influential users according to user interactions has more practical meaning; for example, advertisers may care about how many actions (purchases, in this scenario) the influential users could initiate rather than how many advertisements they spread. Borrowing the idea of the PageRank algorithm, we propose a model built on an action-based network that captures the ability of influential users when they interact with the micro-blogging platform. Taking the evolving prosperity of micro-blogging into consideration, we extend our action-based user influence model into a dynamic one that can distinguish influential users in different time periods. Simulation results demonstrate that our models support and give reasonable explanations for the scenarios that we considered.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Users can rarely reveal their information need in full detail to a search engine within one or two words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

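Traditional bag-of-words text representations assume that the weight of a word depends only on its own frequency of occurrence and ignore contextual information. This paper proposes a context-based graph-model text representation that uses a PageRank-like graph model to establish mutual recommendation relationships between words. The method overcomes the shortcoming of traditional text representations, which treat words as mutually independent and ignore the context in which a word appears. Experiments on the Fudan Chinese text classification corpus and the 20newsgroup English text classification corpus show that our method can effectively improve text classification performance.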

Relevance:

10.00% 10.00%

Publisher:

Abstract:

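Information retrieval and querying is one of the network services most commonly used by Internet users. Information retrieval technology aims to help users find documents of interest within a limited time. In recent years, language-model-based information retrieval has attracted many researchers because of its good performance and relatively complete theoretical foundation. The estimation of the document model and of the query model are two important factors affecting the performance of language-model retrieval systems. Addressing the problems of existing methods, this thesis carries out research from both theoretical and practical perspectives. The main work is summarized as follows. First, a comprehensive and in-depth survey of current statistical language model information retrieval techniques is given. Current research on language-model information retrieval is analyzed in terms of relevance estimation methods, document models, and query models, and some representative work is introduced. The application of language-model information retrieval to personalized search is discussed. Finally, based on this analysis of the state of research, several directions worth future study are given. Second, a document smoothing method based on a local word graph is proposed. For each document, we obtain the K documents most similar to it and use these K documents together with the document itself to build a local document set L. On this local document set L we build a local word graph: the nodes are the terms in L, and the edges are the co-occurrence counts of two nodes in L. A PageRank-like ranking method is then applied to this word graph to compute node weights, and the node weights are used to estimate the document model. Experimental results on three TREC datasets verify the effectiveness of the method. Third, a graph-based iteratively reinforced personalized retrieval algorithm is proposed. It uses three mutually reinforcing relationships (document-document, document-word, and word-word) to compute the weights of words and the scores of retrieved documents. Query expansion is performed according to the word weights, and the query results are re-ranked according to the document scores. Query expansion enriches the result documents, and re-ranking recommends to the user the personalized documents that the user cares about. Experimental results show that the proposed personalized retrieval algorithm can effectively improve retrieval precision. Finally, based on this algorithm, we implemented GBAIR, a personalized retrieval tool in the form of an IE browser plug-in.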
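The local word graph step in the second contribution can be sketched as follows. Several details are assumptions rather than the thesis's definitions: two terms are taken to co-occur when they appear in the same document of L, the ranking is an ordinary weighted PageRank with damping 0.85, and the normalized node weights are read directly as a smoothed term distribution. The function names and the toy document set are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def local_word_graph(docs):
    """Undirected weighted graph: edge weight = number of docs containing both terms."""
    weights = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def rank_terms(docs, damping=0.85, iters=100):
    """Weighted PageRank over the local word graph; scores sum to 1."""
    weights = local_word_graph(docs)
    terms = sorted({t for doc in docs for t in doc})
    n = len(terms)
    score = {t: 1.0 / n for t in terms}
    for _ in range(iters):
        # Terms with no co-occurring neighbour spread their score uniformly.
        isolated = sum(score[t] for t in terms if not weights.get(t))
        new = {t: (1.0 - damping + damping * isolated) / n for t in terms}
        for t in terms:
            neighbours = weights.get(t)
            if not neighbours:
                continue
            total = sum(neighbours.values())
            for other, w in neighbours.items():
                new[other] += damping * score[t] * w / total
        score = new
    return score


# Hypothetical local document set L: a document plus its K = 2 nearest neighbours.
local_set = [
    ["page", "rank", "graph"],
    ["graph", "rank", "node"],
    ["graph", "model"],
]
term_weights = rank_terms(local_set)
```

Because the scores form a probability distribution over the terms of L, they can be interpolated with a document's own term frequencies in the usual smoothing fashion.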

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2012

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Complex networks have recently attracted a significant amount of research attention due to their ability to model real world phenomena. One important problem often encountered is to limit diffusive processes that spread over the network, for example mitigating pandemic disease or computer virus spread. A number of problem formulations have been proposed that aim to solve such problems based on desired network characteristics, such as maintaining the largest network component after node removal. The recently formulated critical node detection problem aims to remove a small subset of vertices from the network such that the residual network has minimum pairwise connectivity. Unfortunately, the problem is NP-hard and the number of constraints is cubic in the number of vertices, making very large scale problems impossible to solve with traditional mathematical programming techniques. Even many approximation strategies, such as dynamic programming and evolutionary algorithms, are unusable for networks that contain thousands to millions of vertices. A computationally efficient and simple approach is required in such circumstances, but none currently exists. In this thesis, such an algorithm is proposed. The methodology is based on a depth-first search traversal of the network, and a specially designed ranking function that considers information local to each vertex. Due to the variety of network structures, a number of characteristics must be taken into consideration and combined into a single rank that measures the utility of removing each vertex. Since removing vertices sequentially changes the network structure, an efficient post-processing algorithm is also proposed to quickly re-rank vertices. Experiments on a range of common complex network models with varying numbers of vertices are considered, in addition to real world networks. The proposed algorithm, DFSH, is shown to be highly competitive and often outperforms existing strategies such as Google PageRank for minimizing pairwise connectivity.
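The objective named in this abstract, pairwise connectivity of the residual network, can be computed with a depth-first traversal as in the sketch below. It evaluates a given removal set only; it does not implement DFSH or its ranking function, which the abstract does not specify. The graph is assumed to be undirected and stored as a symmetric adjacency dict, and the function name and example are hypothetical.

```python
def pairwise_connectivity(adj, removed=()):
    """Number of vertex pairs still connected after deleting `removed`.

    adj: symmetric adjacency dict {vertex: iterable of neighbours}
    Each connected component of size s contributes s * (s - 1) / 2 pairs.
    """
    removed = set(removed)
    seen = set()
    total = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Iterative depth-first search over one component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj.get(v, ()):
                if w not in removed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += size * (size - 1) // 2
    return total


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
before = pairwise_connectivity(path)
after_middle = pairwise_connectivity(path, removed=[2])
after_end = pairwise_connectivity(path, removed=[0])
```

On the path graph, removing the middle vertex leaves far fewer connected pairs than removing an endpoint, which is the kind of difference a critical node ranking is meant to detect.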

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Science is the search for the laws underlying natural phenomena. Engineering shapes nature as we wish. Interestingly, huge engineered infrastructure such as the World Wide Web has grown into such a complex structure that we need to examine the fundamental science behind the structure and behaviour of these networks. This talk covers the science behind complex networks such as the web, biological networks, and social networks. The talk aims to discuss the basic theories that govern the statics as well as the dynamics of such networks.