5 resultados para Spira, Francesco, -1548.
em Boston University Digital Common
Resumo:
http://www.archive.org/details/spanishpioneersa009682mbp
Resumo:
The SIEGE (Smoking Induced Epithelial Gene Expression) database is a clinical resource for compiling and analyzing gene expression data from epithelial cells of the human intra-thoracic airway. This database supports a translational research study whose goal is to profile the changes in airway gene expression that are induced by cigarette smoke. RNA is isolated from airway epithelium obtained at bronchoscopy from current-, former- and never-smoker subjects, and hybridized to Affymetrix HG-U133A Genechips, which measure the level of expression of ~22 500 human transcripts. The microarray data generated along with relevant patient information is uploaded to SIEGE by study administrators using the database's web interface, found at http://pulm.bumc.bu.edu/siegeDB. PERL-coded scripts integrated with SIEGE perform various quality control functions including the processing, filtering and formatting of stored data. The R statistical package is used to import database expression values and execute a number of statistical analyses including t-tests, correlation coefficients and hierarchical clustering. Values from all statistical analyses can be queried through CGI-based tools and web forms found on the �Search� section of the database website. Query results are embedded with graphical capabilities as well as with links to other databases containing valuable gene resources, including Entrez Gene, GO, Biocarta, GeneCards, dbSNP and the NCBI Map Viewer.
Resumo:
We study the impact of heterogeneity of nodes, in terms of their energy, in wireless sensor networks that are hierarchically clustered. In these networks some of the nodes become cluster heads, aggregate the data of their cluster members and transmit it to the sink. We assume that a percentage of the population of sensor nodes is equipped with additional energy resources-this is a source of heterogeneity which may result from the initial setting or as the operation of the network evolves. We also assume that the sensors are randomly (uniformly) distributed and are not mobile, the coordinates of the sink and the dimensions of the sensor field are known. We show that the behavior of such sensor networks becomes very unstable once the first node dies, especially in the presence of node heterogeneity. Classical clustering protocols assume that all the nodes are equipped with the same amount of energy and as a result, they can not take full advantage of the presence of node heterogeneity. We propose SEP, a heterogeneous-aware protocol to prolong the time interval before the death of the first node (we refer to as stability period), which is crucial for many applications where the feedback from the sensor network must be reliable. SEP is based on weighted election probabilities of each node to become cluster head according to the remaining energy in each node. We show by simulation that SEP always prolongs the stability period compared to (and that the average throughput is greater than) the one obtained using current clustering protocols. We conclude by studying the sensitivity of our SEP protocol to heterogeneity parameters capturing energy imbalance in the network. We found that SEP yields longer stability region for higher values of extra energy brought by more powerful nodes.
Resumo:
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
Resumo:
Large probabilistic graphs arise in various domains spanning from social networks to biological and communication networks. An important query in these graphs is the k nearest-neighbor query, which involves finding and reporting the k closest nodes to a specific node. This query assumes the existence of a measure of the "proximity" or the "distance" between any two nodes in the graph. To that end, we propose various novel distance functions that extend well known notions of classical graph theory, such as shortest paths and random walks. We argue that many meaningful distance functions are computationally intractable to compute exactly. Thus, in order to process nearest-neighbor queries, we resort to Monte Carlo sampling and exploit novel graph-transformation ideas and pruning opportunities. In our extensive experimental analysis, we explore the trade-offs of our approximation algorithms and demonstrate that they scale well on real-world probabilistic graphs with tens of millions of edges.