33 resultados para PageRank
Resumo:
In this thesis we are going to analyze the dictionary graphs and some other kinds of graphs using the PagerRank algorithm. We calculated the correlation between the degree and PageRank of all nodes for a graph obtained from Merriam-Webster dictionary, a French dictionary and WordNet hypernym and synonym dictionaries. Our conclusion was that PageRank can be a good tool to compare the quality of dictionaries. We studied some artificial social and random graphs. We found that when we omitted some random nodes from each of the graphs, we have not noticed any significant changes in the ranking of the nodes according to their PageRank. We also discovered that some social graphs selected for our study were less resistant to the changes of PageRank.
Resumo:
The our reality is characterized by a constant progress and, to follow that, people need to stay up to date on the events. In a world with a lot of existing news, search for the ideal ones may be difficult, because the obstacles that make it arduous will be expanded more and more over time, due to the enrichment of data. In response, a great help is given by Information Retrieval, an interdisciplinary branch of computer science that deals with the management and the retrieval of the information. An IR system is developed to search for contents, contained in a reference dataset, considered relevant with respect to the need expressed by an interrogative query. To satisfy these ambitions, we must consider that most of the developed IR systems rely solely on textual similarity to identify relevant information, defining them as such when they include one or more keywords expressed by the query. The idea studied here is that this is not always sufficient, especially when it's necessary to manage large databases, as is the web. The existing solutions may generate low quality responses not allowing, to the users, a valid navigation through them. The intuition, to overcome these limitations, has been to define a new concept of relevance, to differently rank the results. So, the light was given to Temporal PageRank, a new proposal for the Web Information Retrieval that relies on a combination of several factors to increase the quality of research on the web. Temporal PageRank incorporates the advantages of a ranking algorithm, to prefer the information reported by web pages considered important by the context itself in which they reside, and the potential of techniques belonging to the world of the Temporal Information Retrieval, exploiting the temporal aspects of data, describing their chronological contexts. In this thesis, the new proposal is discussed, comparing its results with those achieved by the best known solutions, analyzing its strengths and its weaknesses.
Resumo:
In this paper new results on personalized PageRank are shown. We consider directed graphs that may contain dangling nodes. The main result presented gives an analytical characterization of all the possible values of the personalized PageRank for any node.We use this result to give a theoretical justification of a recent model that uses the personalized PageRank to classify users of Social Networks Sites. We introduce new concepts concerning competitivity and leadership in complex networks. We also present some theoretical techniques to locate leaders and competitors which are valid for any personalization vector and by using only information related to the adjacency matrix of the graph and the distribution of its dangling nodes.
Resumo:
In this paper, parallel Relaxed and Extrapolated algorithms based on the Power method for accelerating the PageRank computation are presented. Different parallel implementations of the Power method and the proposed variants are analyzed using different data distribution strategies. The reported experiments show the behavior and effectiveness of the designed algorithms for realistic test data using either OpenMP, MPI or an hybrid OpenMP/MPI approach to exploit the benefits of shared memory inside the nodes of current SMP supercomputers.
Resumo:
In questa tesi abbiamo analizzato il non backtracking PageRank, un algoritmo di classificazione variante del PageRank che non considera il backtracking, cioè i cammini che tornano nel nodo da cui sono partiti al passo subito successivo. Lo scopo di questa variante è ottenere una classificazione migliore in tutti quei problemi in cui il backtracking viene evitato. Siamo partiti introducendo il PageRank standard, per poi spiegare nel dettaglio il non backtracking PageRank e quali fossero le analogie e differenze tra i due. Ci siamo poi chiesti come risolvere computazionalmente il problema, studiando il risolutore di sistemi lineari GMRES e facendo delle osservazioni su come si possano ridurre il numero di iterazioni e il tempo di calcolo tramite il precondizionamento. Infine, abbiamo eseguito degli esperimenti sulle reti stradali di alcune città e confrontato i risultati ottenuti tramite le diverse classificazioni.
Resumo:
In questo lavoro di tesi si analizzerà un metodo per risolvere il problema del PageRank alternativo rispetto al tradizionale metodo delle potenze. Verso la fine degli anni '90, con l’avvento del World Wide Web, fu necessario sviluppare degli algoritmi di classificazione in grado di ordinare i siti web in base alla loro rilevanza. Davanti a questa sfida i due matematici A.N.Langville e C.D.Meyer svilupparono il metodo SIAD, "special iterative aggregation/disaggregation method". Lo scopo di questa tesi è in primo luogo di ricostruire il metodo SIAD e analizzarne le proprietà. Seguendo le analisi in un articolo di I.C.Ipsen e S.Kirkland, si ricostruirà nel dettaglio il metodo SIAD, così da esplicitare la convergenza asintotica del metodo in relazione al complemento stocastico scelto. In secondo luogo si analizzerà il metodo SIAD applicato ad una matrice di Google che rispetta ipotesi determinate, le matrici di Google sono solitamente utilizzate per rappresentare il problema del PageRank. Successivamente, si dimostrerà un importante teorema che prova come per ogni matrice di Google si possa individuare un complemento stocastico per cui il metodo SIAD converge più velocemente del metodo delle potenze. Infine, nell’ultimo capitolo si implementerà con il inguaggio di programmazione Matlab il metodo SIAD per una matrice generica e per una matrice di Google. In particolare, si sfrutterà la struttura della matrice di Google per ridurre sensibilmente il costo computazionale del metodo quando applicato applicato ad una tale matrice.
Resumo:
En este proyecto se va a realizar una evaluación a Google para encontrar los puntos débiles de la aplicación y proponer soluciones y/o mejoras.Empezaremos introduciendo la historia de Google para tener referencias de cómo y dónde surgió, el algoritmo de PageRank que es el núcleo del motor de búsqueda y el hardware y software que ha desarrollado con su propia tecnología.Previamente se introducirán los requisitos que se necesitarán para entender cómo se van a evaluar los cuestionarios, es decir, se explicará la escalera Likert y las dos aplicaciones desarrolladas para realizar el análisis de las queries obtenidas.A continuación se detallará como se realizará la evaluación y se propondrá un cuestionario para este fin. Una vez enviado el cuestionario, obtendremos los datos necesarios para poder evaluar Google.Al concluir la evaluación, se propondrán 5 mejoras para dar más control al usuario y para poder evaluarlas se creará otro cuestionario. Con los datos que se obtendrán de este, se realizará una evaluación de las mejoras y se analizará si tienen una buena acogidas por parte de los usuarios.Para finalizar el proyecto, se realizarán unas conclusiones globales de todos los datos analizados y de las propuestas de mejora.
Resumo:
Complex networks have recently attracted a significant amount of research attention due to their ability to model real world phenomena. One important problem often encountered is to limit diffusive processes spread over the network, for example mitigating pandemic disease or computer virus spread. A number of problem formulations have been proposed that aim to solve such problems based on desired network characteristics, such as maintaining the largest network component after node removal. The recently formulated critical node detection problem aims to remove a small subset of vertices from the network such that the residual network has minimum pairwise connectivity. Unfortunately, the problem is NP-hard and also the number of constraints is cubic in number of vertices, making very large scale problems impossible to solve with traditional mathematical programming techniques. Even many approximation algorithm strategies such as dynamic programming, evolutionary algorithms, etc. all are unusable for networks that contain thousands to millions of vertices. A computationally efficient and simple approach is required in such circumstances, but none currently exist. In this thesis, such an algorithm is proposed. The methodology is based on a depth-first search traversal of the network, and a specially designed ranking function that considers information local to each vertex. Due to the variety of network structures, a number of characteristics must be taken into consideration and combined into a single rank that measures the utility of removing each vertex. Since removing a vertex in sequential fashion impacts the network structure, an efficient post-processing algorithm is also proposed to quickly re-rank vertices. Experiments on a range of common complex network models with varying number of vertices are considered, in addition to real world networks. The proposed algorithm, DFSH, is shown to be highly competitive and often outperforms existing strategies such as Google PageRank for minimizing pairwise connectivity.
Resumo:
Science is search for the laws of underlying phenomena of the nature. Engineering constructs the nature as we wish. Interestingly the huge engineering infrastructure like world wide web has grown in such a complex structure such that we need to see the fundamental science behind the structure and behaviour of these networks. This talk covers the science behind the complex networks like web, biological, social etc. The talk aim to discuss the basic theories that govern the static as well as the dynamics of such interesting networks
Resumo:
As the number of resources on the web exceeds by far the number of documents one can track, it becomes increasingly difficult to remain up to date on ones own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract some conceptual description of their contents. One way to overcome this problem are social bookmark tools, which are rapidly emerging on the web. In such systems, users are setting up lightweight conceptual structures called folksonomies, and overcome thus the knowledge acquisition bottleneck. As more and more people participate in the effort, the use of a common vocabulary becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach allows for any kind of data, as it does not rely on the internal structure of the documents. In particular, this allows to consider different data types in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system.
Resumo:
What are ways of searching in graphs? In this class, we will discuss basics of link analysis, including Google's PageRank algorithm as an example. Readings: The PageRank Citation Ranking: Bringing Order to the Web, L. Page and S. Brin and R. Motwani and T. Winograd (1998) Stanford Tecnical Report
Resumo:
Slides and an essay on the Web Graph, search engines and how Google calculates Page Rank
Resumo:
Purpose - The purpose of this paper is to identify the most popular techniques used to rank a web page highly in Google. Design/methodology/approach - The paper presents the results of a study into 50 highly optimized web pages that were created as part of a Search Engine Optimization competition. The study focuses on the most popular techniques that were used to rank highest in this competition, and includes an analysis on the use of PageRank, number of pages, number of in-links, domain age and the use of third party sites such as directories and social bookmarking sites. A separate study was made into 50 non-optimized web pages for comparison. Findings - The paper provides insight into the techniques that successful Search Engine Optimizers use to ensure a page ranks highly in Google. Recognizes the importance of PageRank and links as well as directories and social bookmarking sites. Research limitations/implications - Only the top 50 web sites for a specific query were analyzed. Analysing more web sites and comparing with similar studies in different competition would provide more concrete results. Practical implications - The paper offers a revealing insight into the techniques used by industry experts to rank highly in Google, and the success or other-wise of those techniques. Originality/value - This paper fulfils an identified need for web sites and e-commerce sites keen to attract a wider web audience.
Resumo:
In this article, we investigate how the choice of the attenuation factor in an extended version of Katz centrality influences the centrality of the nodes in evolving communication networks. For given snapshots of a network, observed over a period of time, recently developed communicability indices aim to identify the best broadcasters and listeners (receivers) in the network. Here we explore the attenuation factor constraint, in relation to the spectral radius (the largest eigenvalue) of the network at any point in time and its computation in the case of large networks. We compare three different communicability measures: standard, exponential, and relaxed (where the spectral radius bound on the attenuation factor is relaxed and the adjacency matrix is normalised, in order to maintain the convergence of the measure). Furthermore, using a vitality-based measure of both standard and relaxed communicability indices, we look at the ways of establishing the most important individuals for broadcasting and receiving of messages related to community bridging roles. We compare those measures with the scores produced by an iterative version of the PageRank algorithm and illustrate our findings with two examples of real-life evolving networks: the MIT reality mining data set, consisting of daily communications between 106 individuals over the period of one year, a UK Twitter mentions network, constructed from the direct \emph{tweets} between 12.4k individuals during one week, and a subset the Enron email data set.
Resumo:
Questa dissertazione esamina le sfide e i limiti che gli algoritmi di analisi di grafi incontrano in architetture distribuite costituite da personal computer. In particolare, analizza il comportamento dell'algoritmo del PageRank così come implementato in una popolare libreria C++ di analisi di grafi distribuiti, la Parallel Boost Graph Library (Parallel BGL). I risultati qui presentati mostrano che il modello di programmazione parallela Bulk Synchronous Parallel è inadatto all'implementazione efficiente del PageRank su cluster costituiti da personal computer. L'implementazione analizzata ha infatti evidenziato una scalabilità negativa, il tempo di esecuzione dell'algoritmo aumenta linearmente in funzione del numero di processori. Questi risultati sono stati ottenuti lanciando l'algoritmo del PageRank della Parallel BGL su un cluster di 43 PC dual-core con 2GB di RAM l'uno, usando diversi grafi scelti in modo da facilitare l'identificazione delle variabili che influenzano la scalabilità. Grafi rappresentanti modelli diversi hanno dato risultati differenti, mostrando che c'è una relazione tra il coefficiente di clustering e l'inclinazione della retta che rappresenta il tempo in funzione del numero di processori. Ad esempio, i grafi Erdős–Rényi, aventi un basso coefficiente di clustering, hanno rappresentato il caso peggiore nei test del PageRank, mentre i grafi Small-World, aventi un alto coefficiente di clustering, hanno rappresentato il caso migliore. Anche le dimensioni del grafo hanno mostrato un'influenza sul tempo di esecuzione particolarmente interessante. Infatti, si è mostrato che la relazione tra il numero di nodi e il numero di archi determina il tempo totale.