Digital Library

3 results for clusters of galaxies

at Duke University

The Excess Burden of Cytomegalovirus in African American Communities: A Geospatial Analysis.

Relevance:

90.00%

Publisher:

Abstract:

Background. Cytomegalovirus (CMV) is a common cause of birth defects and hearing loss in infants and opportunistic infections in the immunocompromised. Previous studies have found higher CMV seroprevalence rates among minorities and among persons with lower socioeconomic status. No studies have investigated the geographic distribution of CMV and its relationship to age, race, and poverty in the community. Methods. We identified patients from 6 North Carolina counties who were tested in the Duke University Health System for CMV immunoglobulin G. We performed spatial statistical analyses to analyze the distributions of seropositive and seronegative individuals. Results. Of 1884 subjects, 90% were either white or African American. Cytomegalovirus seropositivity was significantly more common among African Americans (73% vs 42%; odds ratio, 3.31; 95% confidence interval, 2.7-4.1), and this disparity persisted across the life span. We identified clusters of high and low CMV odds, both of which were largely explained by race. Clusters of high CMV odds were found in communities with high proportions of African Americans. Conclusions. Cytomegalovirus seropositivity is geographically clustered, and its distribution is strongly determined by a community's racial composition. African American communities have high prevalence rates of CMV infection, and there may be a disparate burden of CMV-associated morbidity in these communities.

Studying Recommender Systems to Enhance Distributed Computing Schedulers

Relevance:

80.00%

Publisher:

Abstract:

Distributed computing frameworks belong to a class of programming models that allow developers to launch workloads on large clusters of machines. Due to the dramatic increase in the volume of data gathered by ubiquitous computing devices, data analytics workloads have become a common case among distributed computing applications, making Data Science an entire field of Computer Science. We argue that a data scientist's concerns lie in three main components: a dataset, a sequence of operations they wish to apply to this dataset, and any constraints related to their work (performance, QoS, budget, etc.). However, it is extremely difficult to perform data science without domain expertise. One needs to select the right amount and type of resources, pick a framework, and configure it. Also, users often run their applications in shared environments, ruled by schedulers that expect them to specify their resource needs precisely. Because of the distributed and concurrent nature of these frameworks, monitoring and profiling are hard, high-dimensional problems that prevent users from making the right configuration choices and determining the right amount of resources they need. Paradoxically, the system gathers a large amount of monitoring data at runtime, which remains unused.

In the ideal abstraction we envision for data scientists, the system is adaptive, able to exploit monitoring data to learn about workloads, and able to turn user requests into a tailored execution context. In this work, we study different techniques that have been used to make steps toward such system awareness, and explore a new way to do so by implementing machine learning techniques to recommend a specific subset of system configurations for Apache Spark applications. Furthermore, we present an in-depth study of Apache Spark executor configuration, which highlights the complexity of choosing the best one for a given workload.
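To give a sense of the executor configuration space the abstract refers to, the following is a minimal sketch that enumerates candidate executor layouts for a homogeneous cluster. It is not the method studied in the work. The helper name `executor_layouts` and its parameters are hypothetical; the three property names are standard Spark configuration keys. The sketch assumes every node is identical and ignores memory overhead and resources reserved for the driver or the operating system.

```python
def executor_layouts(nodes, cores_per_node, mem_gb_per_node):
    """Enumerate simple executor layouts for identical nodes (hypothetical helper).

    Assumption: no memory overhead and no cores reserved for the driver or OS.
    """
    layouts = []
    for cores in range(1, cores_per_node + 1):
        # How many executors of this size fit on one node.
        per_node = cores_per_node // cores
        # Split the node's memory evenly among those executors.
        mem_gb = mem_gb_per_node // per_node
        if mem_gb < 1:
            continue  # skip layouts with less than 1 GB per executor
        layouts.append({
            "spark.executor.instances": str(nodes * per_node),
            "spark.executor.cores": str(cores),
            "spark.executor.memory": f"{mem_gb}g",
        })
    return layouts


if __name__ == "__main__":
    for layout in executor_layouts(nodes=2, cores_per_node=4, mem_gb_per_node=16):
        print(layout)
```

Even this toy enumeration yields several valid layouts for a two-node cluster, and the number grows quickly with node size, which is one way to see why picking a single layout by hand is difficult.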

Algorithms for Geometric Matching, Clustering, and Covering

Relevance:

80.00%

Publisher:

Abstract:

With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.

The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.

The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an EV charging station placement problem. More specifically, a subset of vertices in the road network are selected to place charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state-of-the-art algorithms for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this shortest-path cover problem.
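For context on the dynamic time warping (DTW) model mentioned in the first part of this abstract, here is a minimal sketch of the standard quadratic-time dynamic programming baseline. It is not the thesis's faster algorithm. The function name `dtw_cost` is hypothetical, and the sketch assumes curves are given as lists of 2D points compared with Euclidean distance.

```python
from math import hypot, inf


def dtw_cost(p, q):
    """Return the DTW cost between two curves given as lists of (x, y) points.

    Textbook O(len(p) * len(q)) dynamic program, shown only as a baseline.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both curves need at least one point")
    # cost[i][j] is the cheapest warping of the first i points of p
    # onto the first j points of q.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = hypot(p[i - 1][0] - q[j - 1][0], p[i - 1][1] - q[j - 1][1])
            # Advance on p only, on q only, or on both curves at once.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    a = [(0, 0), (1, 0)]
    b = [(0, 0), (0, 0), (1, 0)]
    print(dtw_cost(a, b))
```

The table has one cell per pair of points, so time and memory grow with the product of the two curve lengths; this is the cost that more efficient matching algorithms aim to reduce.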
