969 results for COMMUNITY DETECTION
Abstract:
Motivated by the observation that communities in real-world social networks form through the actions of rational individuals, we propose a novel game-theory-inspired algorithm for detecting communities in networks. The algorithm is decentralized and uses only local information at each node. We show the efficacy of the proposed algorithm through extensive experimentation on several real-world social network data sets.
Abstract:
In this thesis we review and analyse networks formed by nodes with several attributes. We suppose that different layers of communities are embedded in such networks, and that each layer is connected with the nodes' attributes. Consider, for example, an online social network: a user participates in many different groups or communities, such as schoolfellows, colleagues, clients, etc. We introduce a detection algorithm for these communities. Normally the result of detection is the community structure associated with only the most dominant attribute, disregarding the others. We propose an algorithm that bypasses the dominant communities and detects communities formed by the nodes' other attributes. We also review formation models of attributed networks and present a Human Communication Network (HCN) model. We introduce a High School Texting Network (HSTN) and apply our methods to that network.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Wireless Sensor Networks (WSNs) are a special kind of ad hoc network usually deployed in a monitoring field to detect some physical phenomenon. Because of the low dependability of individual nodes, small radio coverage and the large areas to be monitored, nodes are generally organized into small clusters. Moreover, a large number of WSN nodes is usually deployed in the monitoring area to increase WSN dependability. Good cluster head positioning is therefore a desirable characteristic in a WSN. In this paper, we propose a hybrid clustering algorithm based on community detection in complex networks and the traditional K-means clustering technique: the QK-Means algorithm. Simulation results show that QK-Means detects communities and sub-communities, so that the lost message rate decreases and WSN coverage increases. © 2012 IEEE.
Abstract:
Complex network analysis is a very popular topic in computer science. Unfortunately, these networks, extracted from different contexts, are usually very large, and computing metrics on them can be very expensive. Among all metrics, we analyse the extraction of subnetworks called communities: groups of nodes that probably play the same role within the whole structure. Community extraction is of interest in many different fields (biology, economics, ...). In this work we present a parallel community detection algorithm that can operate on networks with a huge number of nodes and edges. After an introduction to graph theory and high performance computing, we explain our design strategies and our implementation. We then present a performance evaluation carried out on a distributed memory architecture, the IBM BlueGene/Q supercomputer "Fermi" at the CINECA supercomputing center, Italy, and comment on our results.
Abstract:
Fuzzy community detection aims to identify fuzzy communities in a network: groups of vertices such that the membership of a vertex in each community lies in [0,1] and the memberships of each vertex across all communities sum to 1. Fuzzy communities are pervasive in social networks, but little work has been done on fuzzy community detection. Recently, a one-step extension of Newman's Modularity, the most popular quality function for disjoint community detection, resulted in the Generalized Modularity (GM), which performs well in finding well-known fuzzy communities. GM is therefore chosen as the quality function in our research. We first propose a generalized fuzzy t-norm modularity to investigate the effect of different fuzzy intersection operators on fuzzy community detection, since GM makes the introduction of a fuzzy intersection operation feasible. The experimental results show that the Yager operator with a proper parameter value performs better than the product operator in revealing community structure. We then focus on finding optimal fuzzy communities in a network by directly maximizing GM, which we call the Fuzzy Modularity Maximization (FMM) problem. Work on the FMM problem leads to the major contribution of this thesis: an efficient and effective GM-based fuzzy community detection method that automatically discovers a fuzzy partition of a network when that is appropriate, one much better than the fuzzy partitions found by existing fuzzy community detection methods, and a crisp partition when that is appropriate, one competitive with the partitions produced by the best disjoint community detection methods to date. We address the FMM problem by iteratively solving a sub-problem called One-Step Modularity Maximization (OSMM).
We present two approaches to this iterative procedure: a tree-based global optimizer called Find Best Leaf Node (FBLN) and a heuristic-based local optimizer. The OSMM problem is based on a simplified quadratic knapsack problem that can be solved in linear time; thus, a solution of OSMM can be found in linear time. Since the OSMM algorithm is called recursively within FBLN and the structure of the search tree is non-deterministic, the FMM/FBLN algorithm has a time complexity of at least O(n^2). We therefore also propose several highly efficient and very effective heuristic algorithms, the FMM/H algorithms. We compared the proposed FMM/H algorithms with two state-of-the-art community detection methods, modified MULTICUT Spectral Fuzzy c-Means (MSFCM) and Genetic Algorithm with a Local Search strategy (GALS), on 10 real-world data sets. The experimental results suggest that the H2 variant of FMM/H is the best performing version. The H2 algorithm is very competitive with GALS in producing maximum modularity partitions and performs much better than MSFCM. On all 10 data sets, H2 is also 2-3 orders of magnitude faster than GALS. Furthermore, by adopting a slightly modified version of the H2 algorithm as a mutation operator, we designed a genetic algorithm for fuzzy community detection, GAFCD, in which elite selection and early termination are applied. The crossover operator is designed to make GAFCD converge quickly and to improve its ability to escape local optima. Experimental results on all the data sets show that GAFCD uncovers better community structure than GALS.
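As an illustration of the kind of quality function discussed in this abstract, the following is a minimal sketch of a fuzzy modularity score. It assumes one common form, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] s_ij, where s_ij combines the memberships of vertices i and j in each community through a fuzzy intersection operator (the product operator by default). The abstract does not give the exact definition of GM, so the formula, the function name `fuzzy_modularity` and the `t_norm` parameter are assumptions made for illustration only, not the thesis's implementation.

```python
def fuzzy_modularity(adj, memberships, t_norm=lambda a, b: a * b):
    """Fuzzy modularity of an undirected, unweighted graph (illustrative form).

    adj         -- symmetric 0/1 adjacency matrix as a list of lists
    memberships -- memberships[i][c] in [0, 1]; each row is assumed to sum to 1
    t_norm      -- fuzzy intersection operator (assumption: product by default)
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # twice the number of edges
    n_comms = len(memberships[0])

    q = 0.0
    for i in range(n):
        for j in range(n):
            # Degree to which i and j share a community, summed over communities.
            shared = sum(t_norm(memberships[i][c], memberships[j][c])
                         for c in range(n_comms))
            q += (adj[i][j] - degrees[i] * degrees[j] / two_m) * shared
    return q / two_m


# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

crisp = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3   # one triangle per community
uniform = [[0.5, 0.5]] * 6                     # no community preference at all
q_crisp = fuzzy_modularity(adj, crisp)
q_uniform = fuzzy_modularity(adj, uniform)
```

With crisp 0/1 memberships this form reduces to Newman's modularity for a disjoint partition, which is why the same score can be used to compare fuzzy and crisp partitions.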
Abstract:
Peer reviewed
Abstract:
One of the main challenges of fuzzy community detection problems is to be able to measure the quality of a fuzzy partition. In this paper, we present an alternative way of measuring the quality of a fuzzy community detection output based on n-dimensional grouping and overlap functions. Moreover, the proposed modularity measure generalizes the classical Girvan–Newman (GN) modularity for crisp community detection problems and also for crisp overlapping community detection problems. Therefore, it can be used to compare partitions of different nature (i.e. those composed of classical, overlapping and fuzzy communities). Particularly, as is usually done with the GN modularity, the proposed measure may be used to identify the optimal number of communities to be obtained by any network clustering algorithm in a given network. We illustrate this usage by adapting in this way a well-known algorithm for fuzzy community detection problems, extending it to also deal with overlapping community detection problems and produce a ranking of the overlapping nodes. Some computational experiments show the feasibility of the proposed approach to modularity measures through n-dimensional overlap and grouping functions.
Abstract:
Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. © 2012 Springer-Verlag.
Abstract:
This thesis improves the process of recommending people to people in social networks using new clustering algorithms and ranking methods. The proposed system and methods are evaluated on data collected from a real-life social network. The empirical analysis confirms that the proposed system and methods improve the accuracy and efficiency of matching and recommending people, and overcome some of the problems from which social matching systems usually suffer.
Abstract:
This thesis elaborates on the problem of preprocessing a large graph so that single-pair shortest-path queries can be answered quickly at runtime. Computing shortest paths is a well studied problem, but exact algorithms do not scale well to real-world huge graphs in applications that require very short response time. The focus is on approximate methods for distance estimation, in particular in landmarks-based distance indexing. This approach involves choosing some nodes as landmarks and computing (offline), for each node in the graph its embedding, i.e., the vector of its distances from all the landmarks. At runtime, when the distance between a pair of nodes is queried, it can be quickly estimated by combining the embeddings of the two nodes. Choosing optimal landmarks is shown to be hard and thus heuristic solutions are employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the techniques presented in this thesis is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach which considers selecting landmarks at random. Finally, they are applied in two important problems arising naturally in large-scale graphs, namely social search and community detection.
Abstract:
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
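The landmark scheme described in this abstract can be sketched as follows. This is a minimal illustration for unweighted, undirected graphs, assuming highest-degree selection as a stand-in for the "central nodes" strategy and the triangle-inequality upper bound d(s,t) ≤ d(s,l) + d(l,t) as the way of combining precomputed distances. The function names are hypothetical, and the paper's actual selection strategies and estimators may differ.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_landmarks_by_degree(graph, k):
    """Assumed heuristic: pick the k highest-degree nodes (ties broken by id)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def build_index(graph, landmarks):
    """Offline step: one BFS per landmark, stored as landmark -> {node: distance}."""
    return {l: bfs_distances(graph, l) for l in landmarks}


def estimate_distance(index, s, t):
    """Query step: smallest d(s, l) + d(l, t) over all landmarks l (an upper bound)."""
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best


# Path 0-1-2-3-4 with an extra leaf 5 attached to node 2.
graph = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
index_one = build_index(graph, select_landmarks_by_degree(graph, 1))
index_two = build_index(graph, select_landmarks_by_degree(graph, 2))
```

The estimate is exact whenever some landmark lies on a shortest path between the two query nodes and is an overestimate otherwise, which is why the choice and number of landmarks affect accuracy so strongly for a fixed memory budget.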
Abstract:
In this paper we propose a graph stream clustering algorithm with a unified similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike other approaches, ours does not require the number of clusters as an input parameter; instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm's performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm, which shows that our streaming algorithm can achieve comparable or better average cluster purity.
Abstract:
H. pylori infection is one of the gastroduodenal infectious diseases, and it is estimated that over 50% of the world population is infected. The natural history of H. pylori infection is influenced by host genetics, strain type and bacterial virulence factors, together with the time of exposure to the infection and social and hygienic-sanitary conditions. H. pylori is also considered the main pathogen of gastroduodenal diseases. The main objective of this study was to characterize H. pylori infection in populations of Angola and to evaluate it as a public health problem. This is a prospective study of two population groups: Group I, apparently healthy individuals without specific gastric complaints in a community setting, and Group II, patients who attended the Gastroenterology Service of the Hospital Militar Principal de Luanda (HMP). In the community study, H. pylori was detected by an ELISA stool antigen test. At hospital level, the diagnostic methods were upper gastrointestinal endoscopy to obtain gastric mucosal biopsies for histology, culture and molecular methods; the labelled urea breath test was used as a non-invasive method.
Group I: diagnosis of H. pylori infection by stool antigen detection revealed a frequency of 69.6% in the study population. By region, Sambizanga had the highest frequency of positive cases, 81.2%, followed by Dinge with 79.5%, Funda with 78.7% and Capulo with 39.8%; the differences were statistically significant (p = 0.001). By age group, individuals younger than 15 years had an infection frequency of 63.5%, and individuals older than 15 years 76%. The study showed that the frequency of H. pylori infection in the regions studied was 70%, with the exception of Capulo, a coastal zone where, despite poor sanitation conditions, the frequency is lower.
Group II: of the 309 patients evaluated, 22 (7%) had a normal mucosa and 287 (93%) an altered mucosa. Histological evaluation of antrum biopsies in 270 samples according to the Sydney System revealed gastritis in 235 (87.0%), ulcer in 13 (4.8%) and a tumour in 9 (3.3%). Histological assessment of activity in 226 gastric antrum samples showed that 129 (57%) had activity and 97 (43%) did not. Evaluation of the 255 corpus biopsies showed gastritis lesions in 212 (83.1%), tumour lesions in 7 (2.7%) and ulcer in 2 (0.8%). Of the 263 patients histologically evaluated for H. pylori, 148 (58.2%) were positive for the bacterium and 106 (41.7%) were negative. Regarding susceptibility to macrolides, of the 158 H. pylori-positive patients, 125 (79.1%) had macrolide-susceptible strains and 33 (20.9%) resistant strains. Regarding the virulence factors studied (cagA and vacA), of the 11 patients with ulcer, 7 (63.6%) had a cagA-negative strain, of which 6 (85.7%) were vacA s1 and one vacA s2, and 4 (36.3%) had a cagA-positive, vacA s1 strain. In the 2 patients with tumour, both strains were cagA negative, one vacA s1 and the other vacA s2. Patients diagnosed with ulcer and tumour had cagA-negative, vacA s1 strains, while patients with gastritis had cagA-positive strains.
Given the prevalence found in populations without gastroenterological complaints, it is recommended that the study be replicated on a larger scale, for example with comparative prevalence studies between coastal and inland populations. Given the genotypic characteristics of H. pylori in relation to the lesions found, and after further, more comprehensive studies, it is recommended to evaluate a therapy that is more accessible to the patient and more effective. Given the scarcity of gastroenterology specialists and of diagnostic means in Angola, a broader study is recommended of the effectiveness of follow-up of the dyspeptic patient according to the protocol assessed by the College of the Gastroenterology Specialty of the Order of Doctors of Angola (Ordem dos Médicos de Angola), which is already in practice in some health institutions.
Abstract:
Along with the development and increasing availability of the Internet, the way information is provided and obtained has changed markedly. The former separation between publisher and consumer is dissolved by collaborative applications of the so-called Web 2.0, where every participant can both provide and consume information. In addition, entries of other participants can be extended, commented on or discussed. With the Social Web, finally, the social relationships and interactions of participants come to the fore. Thanks to mobile devices, messages can be sent and read at any time and almost anywhere, new acquaintances made, or one's current status shared with a virtual circle of friends. With every activity within such an application, a participant enters into a relationship with data objects and/or other participants. This can happen explicitly, e.g. when an article is written and sent to friends by e-mail. Relationships between data objects and users also arise implicitly, e.g. when another participant's profile page is visited or when different participants rate an article similarly. This thesis develops a formal approach to analysing and exploiting relationship structures that builds on such explicit and implicit data traces. The first part is devoted to analysing relationships between users in Social Web applications using methods of social network analysis. Within a typical social web application, users have various ways to interact. From each interaction pattern, relationship structures between users are derived. The advantage of implicit user interactions is that they occur frequently and arise as a by-product of operating the system.
However, it can be assumed that an explicitly stated friendship relation is more meaningful than corresponding implicit interactions. Accordingly, a first focus of this thesis is the comparison of different relationship structures within a social web application. The second part is devoted to the analysis of one of the most widespread profile attributes of users in social web applications, the given name. Here the methods and analyses presented in the first part are applied, i.e. relationship networks for names are derived from social web application data and examined with methods of social network analysis. Using external descriptions of given names, semantic similarities between names are determined and compared with the respective structural similarities in the different relationship networks. In a practical application, determining similar names corresponds to expectant parents searching for a suitable given name. The results of the analysis of name relationships are the basis for the implementation of the name search engine Nameling, which was developed as part of this thesis. More than 35,000 users accessed Nameling within the first six months after launch. The resulting usage data in turn shed light on users' individual given-name preferences. This thesis presents these usage data and uses them to determine and evaluate personalized given-name recommendations. Finally, approaches to diversifying personalized given-name recommendations are presented that combine static relationship networks for names with individual usage data.