3 results for streaming

in DRUM (Digital Repository at the University of Maryland)


Relevance:

10.00%

Abstract:

In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework, and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate, and whether to do eager or lazy replication in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhood. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries, and also allows partial pre-computation of the aggregates to minimize the query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs can be specified using both graph structure and activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.
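
To make the hybrid replication idea in the abstract above concrete, here is a minimal sketch of a per-node decision driven by read-write frequencies. The abstract does not give the actual policy, so everything here is an assumption for illustration: the `NodeStats` counters, the `min_reads` threshold, and the simple comparison rule are hypothetical and are not the dissertation's method.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Access counts for one graph node over a recent window (hypothetical bookkeeping)."""
    remote_reads: int = 0  # reads issued from partitions other than the node's home
    writes: int = 0        # updates applied to the node at its home partition


def choose_replication(stats: NodeStats, min_reads: int = 10) -> str:
    """Return 'none', 'eager', or 'lazy' for one node.

    Illustrative heuristic only (an assumption, not the author's policy):
    - rarely read remotely: do not replicate at all;
    - read-heavy: replicate eagerly, pushing each write so reads stay local;
    - write-heavy: replicate lazily, refreshing the copy only when it is read.
    """
    if stats.remote_reads < min_reads:
        return "none"
    if stats.remote_reads >= stats.writes:
        return "eager"
    return "lazy"


# Three nodes with different access patterns.
decisions = {
    "celebrity": choose_replication(NodeStats(remote_reads=100, writes=5)),
    "sensor": choose_replication(NodeStats(remote_reads=20, writes=500)),
    "inactive": choose_replication(NodeStats(remote_reads=2, writes=1)),
}
```

In this sketch the decision is recomputed from fresh counters whenever the window advances, which is one simple way to make the choice dynamic; a real system would also have to weigh message sizes and staleness tolerance.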

Relevance:

10.00%

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
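
The convexity mentioned in the abstract above comes from the form of the potentials: in a hinge-loss Markov random field, each ground rule contributes a weighted hinge of a linear function of truth values in [0, 1]. The sketch below evaluates that energy for a few ground rules, using the Lukasiewicz relaxation of logical implication. It is only an illustration under assumptions: the function names, weights, and truth values are hypothetical, and this is not the KGI implementation.

```python
def rule_distance(body, head):
    """Distance to satisfaction of (body_1 AND ... AND body_n) -> head.

    Truth values are floats in [0, 1]. The conjunction uses the Lukasiewicz
    t-norm max(0, sum(body) - (n - 1)); the rule is fully satisfied
    (distance 0) whenever the head is at least as true as the body.
    """
    body_truth = max(0.0, sum(body) - (len(body) - 1))
    return max(0.0, body_truth - head)


def energy(groundings, squared=False):
    """Weighted sum of hinge-loss potentials; lower energy means more probable.

    Each grounding is (weight, body_truth_values, head_truth_value).
    With squared=True every hinge is squared, which is also convex.
    """
    total = 0.0
    for weight, body, head in groundings:
        d = rule_distance(body, head)
        total += weight * (d * d if squared else d)
    return total


# Hypothetical ground rules with made-up truth values.
groundings = [
    (2.0, [0.9, 0.8], 0.3),  # strongly supported body, weakly believed head
    (1.0, [0.4], 0.9),       # head already truer than body: satisfied
]
linear_energy = energy(groundings)
squared_energy = energy(groundings, squared=True)
```

Because each potential is the maximum of zero and a linear function, the total energy is convex in the unknown truth values, so finding the most probable assignment is a convex optimization problem rather than a combinatorial search.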

Relevance:

10.00%

Abstract:

This thesis aims to understand how cells coordinate their motion during collective migration. As previously shown, the motion of individually migrating cells is governed by wave-like cell shape dynamics. The mechanisms that regulate these dynamic behaviors in response to the extracellular environment remain largely unclear. I applied shape dynamics analysis to Dictyostelium cells migrating in pairs and in multicellular streams and found that wave-like membrane protrusions are highly coupled between touching cells. I further characterized cell motion by using principal component analysis (PCA) to decompose complex cell shape changes into a series of shape change modes, from which I found that streaming cells exhibit localized anterior protrusion, termed front narrowing, to facilitate cell-cell coupling. I next explored cytoskeleton-based mechanisms of cell-cell coupling by measuring the dynamics of actin polymerization. Actin polymerization waves observed in individual cells were significantly suppressed in multicellular streams. Streaming cells exclusively produced F-actin at cell-cell contact regions, especially at cell fronts. I demonstrated that such restricted actin polymerization is associated with cell-cell coupling, as reducing actin polymerization with Latrunculin A leads to the assembly of F-actin at the side of streams, the decrease of front narrowing, and the decoupling of protrusion waves. My studies also suggest that collective migration is guided by cell-surface interactions. I examined the aggregation of Dictyostelium cells under distinct conditions and found that both chemical compositions of surfaces and surface-adhesion defects in cells result in altered collective migration patterns. I also investigated the shape dynamics of cells suspended on PEG-coated surfaces, which showed that coupling of protrusion waves disappears in touching suspended cells. These observations indicate that collective migration requires a balance between cell-cell and cell-surface adhesions. I hypothesized that such a balance is reached via the regulation of the cytoskeleton. Indeed, I found that cells actively regulate the cytoskeleton to retain optimal cell-surface adhesions on varying surfaces, and cells lacking the link between actin and surfaces (talin A) could not retain the optimal adhesions. On the other hand, suspended cells exhibited enhanced actin filament assembly on the periphery of cell groups instead of in cell-cell contact regions, which facilitates their aggregation in a clumping fashion.
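
As a rough illustration of the PCA step mentioned in the abstract above, the sketch below decomposes a time series of cell outlines into shape change modes. The shape representation is an assumption, since the abstract does not describe it: each outline is taken to be the boundary's distance from the centroid sampled at fixed angles, and the data are synthetic. The function name `shape_modes` is hypothetical.

```python
import numpy as np


def shape_modes(shapes, n_modes=2):
    """PCA of shape descriptors via SVD.

    shapes: array of shape (n_frames, n_angles), one outline per row.
    Returns the mean shape, the first n_modes principal modes (rows),
    the per-frame scores along those modes, and the fraction of total
    variance explained by each returned mode.
    """
    X = np.asarray(shapes, dtype=float)
    mean_shape = X.mean(axis=0)
    centered = X - mean_shape
    # Rows of Vt are orthonormal directions of decreasing variance.
    _, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained = variance / variance.sum()
    modes = Vt[:n_modes]
    scores = centered @ modes.T
    return mean_shape, modes, scores, explained[:n_modes]


# Synthetic outlines: a unit circle deformed by two oscillating patterns.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)  # boundary angles
t = np.linspace(0.0, 2.0 * np.pi, 200)                     # frame times
radii = (1.0
         + 0.3 * np.sin(t)[:, None] * np.cos(theta)[None, :]
         + 0.1 * np.cos(3 * t)[:, None] * np.cos(2 * theta)[None, :])

mean_shape, modes, scores, explained = shape_modes(radii, n_modes=2)
```

Here the two deformation patterns built into the synthetic data account for essentially all of the variance, so two modes suffice; real outlines would need more modes, and the number to keep is a judgment call based on the explained variance.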