Abstract:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges, and each edge is labeled with a semantic annotation. Hence, a single huge graph can express many different relationships between entities. The Semantic Web represents each fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with the predicate. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people who have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web, as well as the graph queries of other graph DBMSs, can also be viewed as subgraph matching over large graphs. Although subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a single query can return a large number of answers. These answers can be presented to the user according to an importance ranking. In this thesis proposal, we present four different scoring models, along with scalable algorithms that find the top-k answers via a suite of intelligent pruning techniques. The proposed models cover a practically important subset of the SPARQL query language, augmented with some additional useful features. The first model, called Substitution Importance Query (SIQ), identifies the top-k answers, whose scores are calculated from the properties of the matched vertices in each answer according to a user-specified notion of importance.
The second model, called Vertex Importance Query (VIQ), identifies important vertices according to a user-defined scoring method that builds on top of various subgraphs articulated by the user. Our third model, Approximate Importance Query (AIQ), allows partial and inexact matches and returns the top-k of them according to user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability of an answer is calculated from various aspects, such as the number of mapped blocks and the properties of the vertices in each block, and the k most probable answers are returned. An important distinguishing feature of our work is that we give the user a great deal of freedom in specifying: (i) which patterns and approximations they consider important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used to answer SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that they are far more efficient than popular triple stores.
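The general idea of answering a top-k query with pruning can be illustrated by a filter-and-refine loop. The sketch below is not the SIQ/VIQ/AIQ/PIQ algorithms of the thesis. It assumes that every candidate answer has a cheap upper bound on its score (the hypothetical `upper_bound` callable) and an expensive exact score (the hypothetical `exact_score` callable), and it stops as soon as no remaining candidate can enter the current top-k.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple


def top_k_with_pruning(
    candidates: Iterable[Hashable],
    upper_bound: Callable[[Hashable], float],
    exact_score: Callable[[Hashable], float],
    k: int,
) -> List[Tuple[float, Hashable]]:
    """Return up to k (score, candidate) pairs, highest score first.

    Assumes upper_bound(c) >= exact_score(c) for every candidate c.
    """
    if k <= 0:
        return []
    # Visit candidates from the most to the least promising bound.
    ordered = sorted(candidates, key=upper_bound, reverse=True)
    heap: List[Tuple[float, int, Hashable]] = []  # min-heap: (score, order, candidate)
    for order, cand in enumerate(ordered):
        # With k answers in hand, a bound that cannot beat the k-th best ends the scan,
        # because every later candidate has an equal or smaller bound.
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            break
        score = exact_score(cand)  # the expensive step we try to avoid
        if len(heap) < k:
            heapq.heappush(heap, (score, order, cand))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, order, cand))
    return [(score, cand) for score, _, cand in sorted(heap, reverse=True)]
```

Tighter bounds let the loop stop earlier. The `order` field only breaks ties so that candidates themselves never need to be comparable.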
Abstract:
Effective conservation and management of top predators requires a comprehensive understanding of their distributions and of the underlying biological and physical processes that affect these distributions. The Mid-Atlantic Bight shelf break system is a dynamic and productive region where at least 32 species of cetaceans have been recorded through various systematic and opportunistic marine mammal surveys from the 1970s through 2012. My dissertation characterizes the spatial distribution and habitat of cetaceans in the Mid-Atlantic Bight shelf break system by utilizing marine mammal line-transect survey data, synoptic multi-frequency active acoustic data, and fine-scale hydrographic data collected during the 2011 summer Atlantic Marine Assessment Program for Protected Species (AMAPPS) survey. Although studies describing cetacean habitat and distributions have been previously conducted in the Mid-Atlantic Bight, my research specifically focuses on the shelf break region to elucidate both the physical and biological processes that influence cetacean distribution patterns within this cetacean hotspot.
In Chapter One I review biologically important areas for cetaceans in the Atlantic waters of the United States. I describe the study area, the shelf break region of the Mid-Atlantic Bight, in terms of the general oceanography, productivity and biodiversity. According to recent habitat-based cetacean density models, the shelf break region is an area of high cetacean abundance and density, yet little research is directed at understanding the mechanisms that establish this region as a cetacean hotspot.
In Chapter Two I present the basic physical principles of sound in water and describe the methodology used to categorize opportunistically collected multi-frequency active acoustic data using frequency response techniques. Frequency response classification methods are usually employed in conjunction with net-tow data, but the logistics of the 2011 AMAPPS survey did not allow appropriate net-tow data to be collected. Biologically meaningful information can be extracted from acoustic scattering regions by comparing their frequency response curves to the theoretical curves of known scattering models. Using the five frequencies of the EK60 system (18, 38, 70, 120, and 200 kHz), three categories of scatterers were defined: fish-like (with swim bladder), nekton-like (e.g., euphausiids), and plankton-like (e.g., copepods). I also employed a multi-frequency acoustic categorization method using three frequencies (18, 38, and 120 kHz), previously used in the Gulf of Maine and on Georges Bank, which is based on the presence or absence of volume backscatter above a threshold. This method is more objective than the comparison of frequency response curves because it uses an established backscatter value as the threshold. Removing all data below the threshold retains only strong scattering information.
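The presence/absence step of the three-frequency threshold method can be sketched per echogram cell. This is a minimal illustration, assuming each cell carries volume backscattering strength Sv in dB at 18, 38 and 120 kHz; the threshold passed in is a hypothetical parameter, and the table that maps presence/absence patterns to scatterer categories is deliberately not reproduced because the text above does not give it.

```python
from typing import Dict, Optional, Tuple

# Frequencies (kHz) used by the three-frequency threshold method.
FREQUENCIES_KHZ: Tuple[int, ...] = (18, 38, 120)


def presence_pattern(sv_db: Dict[int, float], threshold_db: float) -> Tuple[int, ...]:
    """Return the frequencies at which the cell's Sv (dB) is at or above the threshold."""
    return tuple(f for f in FREQUENCIES_KHZ if sv_db[f] >= threshold_db)


def mask_below_threshold(
    sv_db: Dict[int, float], threshold_db: float
) -> Dict[int, Optional[float]]:
    """Discard weak scattering: values below the threshold become None."""
    return {f: (v if v >= threshold_db else None) for f, v in sv_db.items()}
```

Each distinct pattern, for example `(18, 120)`, would then be looked up in the category table defined by the method.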
In Chapter Three I analyze the distribution of the categorized acoustic regions of interest during the daytime cross-shelf transects. Over all transects, plankton-like acoustic regions of interest were detected most frequently, followed by fish-like and then nekton-like acoustic regions. Only plankton-like detections per kilometer were significantly different, although nekton-like detections narrowly missed significance. The threshold categorization method of Jech and Michaels (2006) provides a more conservative and discrete detection of acoustic scatterers and allows me to retrieve backscatter values along transects in areas that have been categorized. This yields continuous data values that can be integrated at discrete spatial increments for wavelet analysis. Wavelet analysis indicates that significant spatial scales of interest for fish-like and nekton-like acoustic backscatter range from one to four kilometers and vary among transects.
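Integrating backscatter at discrete spatial increments, as a preparation step for wavelet analysis, can be sketched as along-track binning. This is an illustrative sketch, assuming non-negative along-track distances in kilometres and backscatter already converted to the linear domain (summing dB values directly would be incorrect); the bin width is a hypothetical parameter, not the increment used in the dissertation.

```python
from typing import List, Sequence


def db_to_linear(sv_db: float) -> float:
    """Convert a backscatter value in dB to the linear domain before summing."""
    return 10.0 ** (sv_db / 10.0)


def integrate_by_distance(
    distance_km: Sequence[float],
    backscatter_linear: Sequence[float],
    bin_km: float,
) -> List[float]:
    """Sum linear-domain backscatter into consecutive along-track bins of width bin_km."""
    if bin_km <= 0:
        raise ValueError("bin_km must be positive")
    if len(distance_km) != len(backscatter_linear):
        raise ValueError("distance and backscatter sequences must have equal length")
    if not distance_km:
        return []
    n_bins = int(max(distance_km) // bin_km) + 1
    totals = [0.0] * n_bins  # empty bins stay 0.0, keeping the series evenly spaced
    for d, value in zip(distance_km, backscatter_linear):
        if d < 0:
            raise ValueError("distances must be non-negative")
        totals[int(d // bin_km)] += value
    return totals
```

Keeping empty bins as zeros matters here because wavelet transforms assume an evenly spaced series.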
In Chapter Four I analyze the fine-scale distribution of cetaceans in the shelf break system of the Mid-Atlantic Bight using corrected sightings per trackline region, classification trees, multidimensional scaling, and random forest analysis. I describe habitat for common dolphins, Risso’s dolphins and sperm whales. From the distribution of cetacean sightings, patterns of habitat start to emerge: within the shelf break region of the Mid-Atlantic Bight, common dolphins were sighted more often over the shelf, sperm whales were more frequently found in the deep offshore waters, and Risso’s dolphins were most prevalent at the shelf break. Multidimensional scaling shows clear environmental separation among common dolphins, Risso’s dolphins and sperm whales. The sperm whale random forest habitat model had the lowest misclassification error (0.30) and the Risso’s dolphin random forest habitat model had the greatest (0.37). Shallow water depth (less than 148 meters) was the primary variable selected in the classification model for common dolphin habitat. Distance to surface density fronts and to surface temperature fronts were the primary variables selected in the classification models to describe Risso’s dolphin habitat and sperm whale habitat, respectively. When mapped back into geographic space, these three cetacean species occupy different fine-scale habitats within the dynamic Mid-Atlantic Bight shelf break system.
In Chapter Five I summarize the previous chapters and present potential analytical steps to address ecological questions pertaining to the dynamic shelf break region. Taken together, the results of my dissertation demonstrate the use of opportunistically collected data in ecosystem studies, emphasize the need to incorporate middle trophic level data and oceanographic features into cetacean habitat models, and highlight the importance of developing a more mechanistic understanding of dynamic ecosystems.
Abstract:
Schinus terebinthifolius Raddi (Schinus) is one of the most widely found woody exotic species in South Florida. This exotic is distributed across environments with different hydrologic regimes, from upland pine forests to the edges of sawgrass marshes and into saline mangrove forests. To determine whether this invasive exotic had different physiological attributes compared to native species in a coastal habitat, we measured predawn xylem water potentials (Ψ), oxygen stable isotope signatures (δ18O), and sodium (Na+) and potassium (K+) contents of sap water from plants within: (1) a transition zone (between a mangrove forest and upland pineland) and (2) an upland pineland in Southwest Florida. Under dynamic salinity and hydrologic conditions, Ψ of Schinus appeared less subject to seasonal fluctuations than that of native species. Although stem water δ18O values could not be used to distinguish the water uptake depths of Schinus and native species in the transition zone, Ψ and sap Na+/K+ patterns showed that Schinus was less of a salt excluder than the native upland species during the dry season. This exotic also exhibited Na+/K+ ratios similar to those of the mangrove species, indicating some salinity tolerance. In the upland pineland, Schinus water uptake patterns were not significantly different from those of native species. Differences between Schinus and native upland species, however, may give this exotic an advantage over native species within mangrove transition zones.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Computational fluid dynamics (CFD) is becoming an essential tool for predicting the hydrodynamic efforts and flow characteristics of underwater vehicles in manoeuvring studies. However, when applied to the manoeuvrability of autonomous underwater vehicles (AUVs), most studies have focused on determining static coefficients without considering the effects of control surface deflection. This paper analyses the hydrodynamic efforts generated on an AUV under the combined effects of control surface deflection and angle of attack, using CFD software based on the Reynolds-averaged Navier–Stokes formulation. CFD simulations are also conducted independently for the AUV bare hull and the control surface, to better identify their individual and interference efforts and to validate the simulations against experimental results obtained in a towing tank. Several simulations of the bare hull case were conducted to select the k–ω SST turbulence model with the viscosity approach that best predicts its hydrodynamic efforts. Mesh sensitivity analyses were conducted for all simulations. For the flow around the control surfaces, the CFD results were analysed using two different methodologies, standard and nonlinear. The nonlinear regression methodology predicts the stall at the control surface better than the standard methodology does. The flow simulations show that the occurrence of control surface stall depends on a linear relationship between the angle of attack and the control surface deflection. This type of information can be used in designing the vehicle’s autopilot system.
Abstract:
Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis and road traffic accident analysis. It demands efficient solutions to many new, expensive and complicated problems. In this paper, we investigate the problem of evaluating the top-k distinguished “features” for a “cluster” based on weighted proximity relationships between the cluster and the features. We measure proximity in an average fashion to address possibly nonuniform data distribution within a cluster. Combining a standard multi-step paradigm with new lower and upper proximity bounds, we present an efficient algorithm to solve the problem. The algorithm is implemented in several different modes. Our experimental results not only compare these modes but also illustrate the efficiency of the algorithm.
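A multi-step (filter-and-refine) evaluation of average proximity can be sketched as follows. This is not the paper's algorithm or its bounds; it assumes a hypothetical scoring model in which features are weighted points, the score of a feature is its non-negative weight times the mean Euclidean distance from all cluster points to it (smaller is better), and it uses only one lower bound: because distance is convex, the mean distance to a feature is never smaller than the feature's distance to the cluster centroid.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def top_k_features(
    cluster: Sequence[Point],
    features: Dict[str, Tuple[Point, float]],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k features with the smallest weighted average distance to the cluster."""
    if k <= 0 or not cluster:
        return []
    n = len(cluster)
    centroid = (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)
    # Filter step: weight * distance-to-centroid is a lower bound on the exact score.
    bounds = sorted((w * _dist(loc, centroid), name) for name, (loc, w) in features.items())
    best: List[Tuple[float, str]] = []  # max-heap emulated with negated scores
    for lower, name in bounds:
        if len(best) == k and lower >= -best[0][0]:
            break  # no remaining feature can beat the current k-th best
        loc, w = features[name]
        # Refine step: exact average over all cluster points.
        score = w * sum(_dist(p, loc) for p in cluster) / n
        if len(best) < k:
            heapq.heappush(best, (-score, name))
        elif score < -best[0][0]:
            heapq.heapreplace(best, (-score, name))
    return sorted((-neg, name) for neg, name in best)
```

Averaging over every cluster point, rather than using only the centroid, is what makes the score robust to nonuniform point distributions; the centroid is used only to prune.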
Abstract:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has recently been receiving much attention. For example, the Web can naturally be viewed as a graph. Likewise, a relational database can be viewed as a graph in which tuples are modeled as vertices connected via foreign-key relationships. Keyword search has emerged as one of the most effective paradigms for information discovery, especially over HTML documents on the World Wide Web. One of the key advantages of keyword search is its simplicity: users do not have to learn a complex query language and can issue queries without any prior knowledge of the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality and efficient search of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers, where each answer is a substructure of the graph that contains all query keywords and illustrates the relationships among the keywords present in the graph. We applied keyword proximity search to the web and to the page graph of web documents to find top-k answers that satisfy users’ information needs and increase user satisfaction. Another effective ranking mechanism applied to data graphs is authority-flow-based ranking. Given a top-k keyword search query on a graph, an authority-flow-based search finds the top-k answers, where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improve authority-flow-based search on data graphs by creating a framework to explain and reformulate them, taking user preferences and feedback into consideration.
We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
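Keyword proximity search can be illustrated with a deliberately simplified scoring rule. The sketch below is an assumption, not the dissertation's method: it treats the data graph as unweighted and undirected (each adjacency list must contain both directions), runs one multi-source breadth-first search per keyword, and ranks each candidate root node by the sum of its distances to the nearest node containing each keyword, in the spirit of BANKS-style root scoring. The root together with those shortest paths forms the answer substructure.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def _multi_source_bfs(graph: Graph, sources: Set[str]) -> Dict[str, int]:
    """Hop distance from every reachable node to the nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def keyword_proximity_top_k(
    graph: Graph, labels: Dict[str, Set[str]], keywords: List[str], k: int
) -> List[Tuple[int, str]]:
    """Return up to k (total distance, root) pairs; smaller totals are better."""
    per_keyword = []
    for kw in keywords:
        sources = {node for node, words in labels.items() if kw in words}
        per_keyword.append(_multi_source_bfs(graph, sources))
    # A root qualifies only if it reaches every keyword.
    scored = [
        (sum(d[node] for d in per_keyword), node)
        for node in graph
        if all(node in d for d in per_keyword)
    ]
    scored.sort()
    return scored[:k]
```

Ties are broken by node name only to keep the output deterministic; a real ranking would also weigh node prestige and edge weights.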
Abstract:
We build a system to support search and visualization on heterogeneous information networks. We first build the system on a specialized heterogeneous information network, DBLP. The system aims to help people, especially computer science researchers, better understand and explore academic information networks. We then extend the system to the Web. Our results are much more intuitive and informative than the simple top-k blue links of traditional search engines, and they provide more meaningful structural results with correlated entities. We also investigate the ranking algorithm and show that personalized PageRank and the proposed Hetero-personalized PageRank outperform TF-IDF ranking or a mixture of TF-IDF and authority ranking. Our work opens several directions for future research.
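The personalized PageRank baseline mentioned above can be sketched by power iteration. This is the standard formulation, not the proposed Hetero-personalized PageRank, which is not reproduced here. It assumes every node appears as a key of the adjacency dictionary, that the seed nodes (the hypothetical `seeds` weights, e.g. entities matching the query) are graph nodes, and a teleport probability `alpha` of 0.15, a common default chosen here for illustration.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power iteration for r = alpha * p + (1 - alpha) * (transition of r)."""
    nodes = list(graph)
    unknown = set(seeds) - set(nodes)
    if unknown:
        raise ValueError(f"seed nodes not in graph: {sorted(unknown)}")
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seed weights must have a positive sum")
    pref = {n: seeds.get(n, 0.0) / total for n in nodes}  # personalization vector p
    rank = dict(pref)
    for _ in range(max_iter):
        nxt = {n: alpha * pref[n] for n in nodes}  # teleport back to the seeds
        dangling = 0.0
        for u in nodes:
            out = graph[u]
            if out:
                share = (1 - alpha) * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                dangling += (1 - alpha) * rank[u]
        # Mass from nodes without out-links is redistributed like a teleport.
        for n in nodes:
            nxt[n] += dangling * pref[n]
        diff = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if diff < tol:
            break
    return rank
```

Because the teleport always returns to the seed nodes, the resulting scores measure importance relative to the query rather than global importance.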
Abstract:
With the exponential growth in the use of web-based map services, web GIS applications have become increasingly popular. Spatial data indexing, search, analysis and visualization, as well as the resource management of such services, are becoming increasingly important for delivering the user-desired Quality of Service (QoS). First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-source online indexing and querying system for big geospatial data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze spatial data and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly is a set of techniques that predict the demand of map workloads online and optimize resource allocation, treating both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly predicts workload demands accurately (18.91% more accurate) and allocates resources efficiently to meet the QoS target, improving QoS by 26.19% and saving 20.83% of resource usage compared to traditional peak-load-based resource allocation.
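The semantics of a Top-k Spatial Boolean Query can be illustrated with a brute-force sketch: keep the objects whose keyword sets satisfy a Boolean predicate, then return the k nearest to the query location. This is not sksOpen's implementation; it assumes a purely conjunctive (AND) predicate, planar Euclidean distance, and a hypothetical `SpatialObject` record type. An indexed engine would avoid this full linear scan.

```python
import heapq
import math
from typing import FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class SpatialObject(NamedTuple):
    name: str
    x: float
    y: float
    keywords: FrozenSet[str]


def top_k_spatial_boolean(
    objects: Iterable[SpatialObject],
    qx: float,
    qy: float,
    required: Set[str],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k nearest (distance, name) pairs among objects containing all required keywords."""
    matches = (
        (math.hypot(o.x - qx, o.y - qy), o.name)
        for o in objects
        if required <= o.keywords  # Boolean AND over the keywords
    )
    return heapq.nsmallest(k, matches)
```

`heapq.nsmallest` keeps only k candidates in memory, so the scan stays cheap even when many objects match.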
Abstract:
In this ambitious book, Burgoon, Stern, and Dillman present the most comprehensive coverage of the literature on interpersonal adaptation that I have seen in recent years. Their mission is to make a critical examination of this whole area from both theoretical and methodological perspectives, and then to present their own synthetic theory (interpersonal adaptation theory, IAT) and research agenda. Such a mission produces very high expectations in readers, and inevitably some readers will feel that the authors do not achieve all of it. Personally, I was impressed by how much they do achieve, and I was intrigued by the questions they did not answer. One can ask no more than this of any single book.
Abstract:
Diseases and insect pests are major causes of low yields of common bean (Phaseolus vulgaris L.) in Latin America and Africa. Anthracnose, angular leaf spot and common bacterial blight are widespread foliar diseases of common bean that also infect pods and seeds. One thousand and eighty-two accessions from a common bean core collection from the primary centres of origin were investigated for their reaction to these three diseases. Angular leaf spot and common bacterial blight were evaluated in the field at Santander de Quilichao, Colombia, and anthracnose was evaluated in a screenhouse in Popayan, Colombia. Using the 15-group level of a hierarchical clustering procedure, 7 groups were found to consist mainly of Andean common bean accessions (Andean gene pool) and 7 groups mainly of Middle American accessions (Middle American gene pool), while 1 group contained mixed accessions. Consistent with the theory of co-evolution, accessions from the Andean gene pool were generally resistant to Middle American pathogen isolates causing anthracnose, while the Middle American accessions were resistant to pathogen isolates from the Andes. Different combinations of resistance patterns were found, and breeders can use this information to select a specific group of accessions according to their needs.
Abstract:
Four Trypanosoma cruzi strains from zymodemes A, B, C and D were successively cloned on BHI-LIT-agar-blood (BLAB). Twenty clones of the first generation (F1), 10 of the second (F2) and 4 of the third (F3) were isolated from strains A138, B147 and C23. The D150 strain provided 29 F1 and F2 clones. The isoenzyme and K-DNA patterns of the strains and clones were determined. The clones from strains A138, B147 and C231 presented isoenzyme and K-DNA patterns identical to one another and to their respective parental strains, showing the homogeneity and stability of isoenzyme and K-DNA patterns after successive cloning. The D150 strain from zymodeme D (ZD) showed heterogeneity. Twenty-eight of the 29 first-generation clones were of zymodeme A and only one was of zymodeme C, confirming previous reports that ZD strains consist of ZA and ZC parasite populations. The only D150 clone of zymodeme C showed a K-DNA pattern identical to its parental strain. The remaining clones, although similar among themselves, differed from the parental strain. Thus, the T. cruzi strains had either homogeneous or heterogeneous populations. Successive cloning produced genetically homogeneous populations, whose experimental use will make future results more reliable and reproducible.
Abstract:
In many fields, the spatial clustering of sampled data points has many consequences. Therefore, several indices have been proposed to assess the level of clustering affecting datasets (e.g., the Morisita index, Ripley's K-function and Rényi's generalized entropy). The classical Morisita index measures how many times more likely it is to select two measurement points from the same quadrat (the dataset is covered by a regular grid of changing size) than it would be for a random distribution generated by a Poisson process. The multipoint version (k-Morisita) takes into account k points, with k ≥ 2. The present research deals with a new development of the k-Morisita index for (1) monitoring network characterization and (2) detection of patterns in monitored phenomena. From a theoretical perspective, a connection between the k-Morisita index and multifractality has also been found and highlighted on a mathematical multifractal set.
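The k-Morisita index can be computed for one grid size from quadrat counts using its commonly cited definition, I_k(δ) = Q^(k−1) · Σ n_i(n_i−1)…(n_i−k+1) / [N(N−1)…(N−k+1)], where Q is the number of quadrats of size δ, n_i the number of points in quadrat i, and N the total number of points; k = 2 gives the classical Morisita index. The sketch below assumes points rescaled to the unit square and a grid of `cells_per_side` × `cells_per_side` quadrats (a hypothetical parameter name). It evaluates a single δ = 1/cells_per_side, whereas the analysis described above varies the grid size.

```python
from collections import Counter
from typing import Sequence, Tuple


def _falling(n: int, k: int) -> int:
    """Falling factorial n (n-1) ... (n-k+1); zero whenever 0 <= n < k."""
    result = 1
    for j in range(k):
        result *= n - j
    return result


def k_morisita(points: Sequence[Tuple[float, float]], cells_per_side: int, k: int = 2) -> float:
    """Multipoint Morisita index on a regular grid over the unit square [0, 1]^2."""
    n_total = len(points)
    if k < 2 or n_total < k:
        raise ValueError("need k >= 2 and at least k points")
    q = cells_per_side ** 2
    m = cells_per_side
    # Count points per quadrat; points on the upper edge fall into the last cell.
    counts = Counter(
        (min(int(x * m), m - 1), min(int(y * m), m - 1)) for x, y in points
    )
    numerator = sum(_falling(n_i, k) for n_i in counts.values())
    return q ** (k - 1) * numerator / _falling(n_total, k)
```

For a Poisson-like pattern the index stays near 1; values above 1 indicate clustering, and all points in a single quadrat give the maximum Q^(k−1).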
Abstract:
Voltage-gated K+ channels of the Kv3 subfamily have unusual electrophysiological properties, including activation at very depolarized voltages (positive to −10 mV) and very fast deactivation rates, suggesting special roles in neuronal excitability. In the brain, Kv3 channels are prominently expressed in select neuronal populations, which include fast-spiking (FS) GABAergic interneurons of the neocortex, hippocampus, and caudate, as well as other high-frequency firing neurons. Although evidence points to a key role in high-frequency firing, a definitive understanding of the function of these channels has been hampered by a lack of selective pharmacological tools. We therefore generated mouse lines in which one of the Kv3 genes, Kv3.2, was disrupted by gene-targeting methods. Whole-cell electrophysiological recording showed that the ability to fire spikes at high frequencies was impaired in immunocytochemically identified FS interneurons of deep cortical layers (5-6) in which Kv3.2 proteins are normally prominent. No such impairment was found for FS neurons of superficial layers (2-4) in which Kv3.2 proteins are normally only weakly expressed. These data directly support the hypothesis that Kv3 channels are necessary for high-frequency firing. Moreover, we found that Kv3.2 −/− mice showed specific alterations in their cortical EEG patterns and an increased susceptibility to epileptic seizures consistent with an impairment of cortical inhibitory mechanisms. This implies that, rather than producing hyperexcitability of the inhibitory interneurons, Kv3.2 channel elimination suppresses their activity. These data suggest that normal cortical operations depend on the ability of inhibitory interneurons to generate high-frequency firing.