979 results for Top-k retrieval
Abstract:
We investigate the problem of influence limitation in the presence of competing campaigns in a social network. Given a negative campaign that starts propagating from a specified source, and a positive/counter campaign that is initiated after a certain time delay to limit the influence or spread of misinformation by the negative campaign, we are interested in finding the top-k influential nodes at which the positive campaign may be triggered. This problem has numerous applications, such as limiting the propagation of rumors, arresting the spread of a virus through inoculation, and initiating a counter-campaign against malicious propaganda. The influence function for the generic influence limitation problem is non-submodular. Restricted versions of the influence limitation problem reported in the literature assume submodularity of the influence function and do not capture the problem in a realistic setting. In this paper, we propose a novel computational approach to the influence limitation problem based on the Shapley value, a solution concept in cooperative game theory. Our approach works equally effectively for both submodular and non-submodular influence functions. Experiments on standard real-world social network datasets reveal that the proposed approach outperforms existing heuristics in the literature. As a non-trivial extension, we also address the problem of influence limitation in the presence of multiple competing campaigns.
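Shapley values for influence games of this kind are typically approximated by permutation sampling, since exact computation is exponential in the number of nodes. A minimal sketch of that estimator, with a toy additive "influence saved" function standing in for the paper's (unspecified) non-submodular influence function:

```python
import random

def shapley_top_k(nodes, value, k, samples=200, rng=None):
    """Estimate Shapley values by permutation sampling and return the
    k nodes with the largest estimates.  `value` maps a set of seed
    nodes to the influence saved by triggering the campaign there."""
    rng = rng or random.Random(0)
    shapley = {v: 0.0 for v in nodes}
    for _ in range(samples):
        order = list(nodes)
        rng.shuffle(order)
        coalition, prev = set(), value(set())
        for v in order:
            coalition.add(v)
            cur = value(coalition)
            shapley[v] += cur - prev   # marginal contribution of v
            prev = cur
    for v in shapley:
        shapley[v] /= samples
    return sorted(nodes, key=lambda v: -shapley[v])[:k]

# Toy additive "influence saved" function, purely illustrative.
weights = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
top2 = shapley_top_k(list(weights), lambda s: sum(weights[v] for v in s), k=2)
```

For an additive value function the Shapley value of each node equals its weight, which makes the estimator easy to sanity-check; the paper's setting replaces the lambda with a (non-submodular) simulation of competing cascades.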
Abstract:
With the widespread application of information-processing technology in communications, finance, industrial production, and other fields, data are no longer confined to traditional forms such as files and tables. Large volumes of continuous, changing streaming data appear in more and more modern applications, such as military command, traffic control, sensor data processing, network monitoring, and financial data analysis. In these applications, data arrive continuously as streams, and the system must process them continuously and promptly. Although existing data-stream applications already collect large amounts of stream data, the events users care about are usually the abnormal ones, because abnormal events often conceal information that deserves closer attention. To distribute abnormal events detected on individual data streams to the composite-event detection module promptly and accurately, an event-notification model with QoS adaptivity is an ideal choice. Composite events meet the complex requirements of practical applications; a composite event is typically composed of atomic events connected by logical connectives and various operators. In addition, to respond in time, predefined actions should be triggered, such as raising an alarm or photographing the location where the abnormal event occurred, which requires the detection system to be proactive. ECA rules, the cornerstone of the "detect-respond" model, can satisfy this requirement. Most current research focuses on offline mining and analysis of normal events, whereas research on detecting atomic abnormal events over real-time data streams, on QoS-adaptive event-notification models, and on composite-event detection over out-of-order data streams is relatively scarce; this thesis therefore focuses on these three aspects.

First, this thesis systematically analyzes the requirements and characteristics of data-stream applications and proposes HAPS, a framework for abnormal-event detection over real-time data streams. HAPS consists of four layers: atomic abnormal-event detection over data streams, a QoS-adaptive real-time event-notification service, composite-event detection, and action execution. Second, the thesis comprehensively and thoroughly surveys existing work on these four aspects and analyzes its shortcomings. It then investigates each of these shortcomings in depth and obtains several innovative results. Finally, a prototype of the framework is implemented and evaluated with extensive simulated and real data streams; the experimental results show that the research in all four aspects achieves the expected goals.

The main contributions of this thesis are:
1. An incremental abnormal-event detection algorithm for data streams (incLOCI), based on the local correlation integral, with a time complexity of only O(N log N). It is proved that both the insertion of a new event and the deletion of an outdated one affect only a bounded number of its neighbors.
2. An "approximate" top-k real-time event-notification model that uses the degree of relevance between event content and subscriptions as the matching criterion and adaptively selects the "approximate" top-k relevant data within the deadline.
3. A composite-event detection model for data streams of arbitrary arrival order that supports three typical event contexts as well as aggregate functions; the data structures and algorithms used in the model are designed and implemented.
4. A prototype system for abnormal-event detection over real-time data streams. It not only detects atomic abnormal events in real-time streams but also distributes them to the composite-event layer through the QoS-adaptive real-time event-notification service to generate composite events, and finally uses RECA rules to respond promptly to abnormal situations.

Research on abnormal-event detection over real-time data streams has high practical value and broad application prospects. The results of this thesis provide a solid foundation for further study of atomic abnormal-event detection and composite-event detection over real-time data streams.
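Contribution 2 of the thesis selects approximately top-k relevant events under a deadline. A minimal sketch of that idea, with keyword overlap as a hypothetical stand-in for the thesis's relevance measure and a fixed scoring budget modeling the deadline:

```python
import heapq

def approx_top_k(events, subscription, k, budget):
    """Score at most `budget` events against the subscription and keep
    the k most relevant seen so far -- an 'approximate' top-k answer
    that respects a processing deadline (modeled here as a budget).
    Relevance is the fraction of subscription keywords an event covers,
    an illustrative stand-in for the thesis's relevance measure."""
    sub = set(subscription)
    def rel(ev):
        return len(sub & set(ev)) / len(sub)
    scored = [(rel(ev), i) for i, ev in enumerate(events[:budget])]
    return [events[i] for _, i in heapq.nlargest(k, scored)]

events = [["fire", "alarm"], ["smoke"], ["fire", "smoke", "alarm"], ["noise"]]
best = approx_top_k(events, ["fire", "smoke"], k=2, budget=3)
```

The fourth event is never scored because the budget is exhausted, which is exactly the "approximate under deadline" trade-off the model makes.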
Abstract:
Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching, which typically exploits different features of schemas; relying on any single feature is not sufficient. We propose an evidential approach to combining multiple matchers using the Dempster-Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence in the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to obtain a combined mass function that represents the overall level of confidence, taking into account the match results of the different matchers. Our combination mechanism does not require weighting parameters, so no setting and tuning of them is needed. Third, it selects the top-k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally, it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.
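Dempster's rule of combination, which underlies the evidence-combining step described above, can be sketched in a few lines. The focal elements and mass values below are illustrative, not the paper's actual matchers:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.  Focal elements
    are frozensets of candidate correspondences; mass on non-intersecting
    pairs is conflict, normalized away at the end."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule is undefined")
    norm = 1.0 - conflict
    return {s: m / norm for s, m in combined.items()}

# Two hypothetical matchers' evidence about which target attribute
# corresponds to a source attribute "author":
m_name = {frozenset({"writer"}): 0.6, frozenset({"writer", "creator"}): 0.4}
m_type = {frozenset({"writer"}): 0.5, frozenset({"writer", "creator"}): 0.5}
m = dempster_combine(m_name, m_type)
```

Because the rule multiplies and renormalizes masses, no per-matcher weights are needed, which is the property the abstract highlights.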
Abstract:
With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query.
We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords, the objects are nearest to the query location, and the inter-object distances are smallest. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds. In addition, we solve the problem of retrieving the top-k groups for the three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.
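A simple greedy heuristic conveys the flavor of the covering problem defined above; it is not one of the paper's bounded-approximation algorithms, just an illustration of trading keyword coverage against distance to the query:

```python
from math import hypot

def greedy_group(objects, q_loc, q_keywords):
    """Greedily build a group whose keywords cover the query keywords,
    preferring objects that cover many uncovered keywords per unit of
    distance to the query location.  Objects are (location, keywords)
    pairs; a heuristic sketch, not the paper's exact method."""
    remaining = set(q_keywords)
    group = []
    while remaining:
        cands = [o for o in objects if remaining & set(o[1])]
        if not cands:
            return None          # query keywords cannot be covered
        def score(o):
            loc, kws = o
            gain = len(remaining & set(kws))
            return gain / (1.0 + hypot(loc[0] - q_loc[0], loc[1] - q_loc[1]))
        best = max(cands, key=score)
        group.append(best)
        remaining -= set(best[1])
    return group

objs = [((1, 0), ["cafe"]), ((0, 2), ["wifi"]), ((5, 5), ["cafe", "wifi"])]
g = greedy_group(objs, (0, 0), ["cafe", "wifi"])
```

Here the two nearby single-keyword objects beat the distant object that covers both keywords, illustrating why group cost must balance coverage, query distance, and inter-object distance.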
Abstract:
Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. These geo-textual data cover a wide range of topics. Users are interested in receiving up-to-date geo-textual objects (e.g., geo-tagged Tweets) whose locations meet their needs and whose texts are interesting to them. For example, a user may want to be updated with tweets near her home on the topic "dengue fever headache." In this demonstration, we present SOPS, the Spatial-Keyword Publish/Subscribe System, which is capable of efficiently processing spatial keyword continuous queries. SOPS supports two types of queries: (1) the Boolean Range Continuous (BRC) query, which can be used to subscribe to geo-textual objects satisfying a boolean keyword expression and falling in a specified spatial region; (2) the Temporal Spatial-Keyword Top-k Continuous (TaSK) query, which continuously maintains up-to-date top-k most relevant results over a stream of geo-textual objects. SOPS enables users to formulate their queries and view the real-time results over a stream of geo-textual objects through browser-based user interfaces. On the server side, we propose solutions for efficiently processing a large number of BRC queries (tens of millions) and TaSK queries over a stream of geo-textual objects.
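A BRC subscription as described above reduces to a conjunction of a range predicate and a keyword-containment predicate. A minimal matcher for a single object/subscription pair (SOPS's contribution is the server-side indexing that avoids running this pairwise test against tens of millions of subscriptions):

```python
def matches_brc(obj, query):
    """Check a geo-textual object against a Boolean Range Continuous
    (BRC) subscription: every query keyword must appear in the object's
    text, and the object's location must fall inside the query rectangle.
    `obj` is ((x, y), words); `query` is ((x1, y1, x2, y2), keywords)."""
    (x, y), words = obj
    (x1, y1, x2, y2), keywords = query
    in_range = x1 <= x <= x2 and y1 <= y <= y2
    return in_range and set(keywords) <= set(words)

tweet = ((3.0, 4.0), ["dengue", "fever", "headache"])
sub = ((0.0, 0.0, 5.0, 5.0), ["dengue", "fever"])
ok = matches_brc(tweet, sub)
```

The TaSK query replaces the boolean predicate with a relevance score and a continuously maintained top-k buffer over the stream.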
Abstract:
Spatial data mining has recently emerged from a number of real applications, such as real-estate marketing, urban planning, weather forecasting, medical image analysis, and road traffic accident analysis. It demands efficient solutions for many new, expensive, and complicated problems. In this paper, we investigate the problem of evaluating the top-k distinguished "features" for a "cluster" based on weighted proximity relationships between the cluster and features. We measure proximity in an average fashion to address possibly nonuniform data distribution in a cluster. Combining a standard multi-step paradigm with new lower and upper proximity bounds, we present an efficient algorithm to solve the problem. The algorithm is implemented in several different modes. Our experimental results not only give a comparison among them but also illustrate the efficiency of the algorithm.
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become more and more popular. Spatial data indexing, search, analysis, visualization, and the resource management of such services are becoming increasingly important for delivering user-desired Quality of Service. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-source online indexing and querying system for big geospatial data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running upon the TerraFly map that efficiently supports many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share their analysis results. TerraFly GeoCloud also provides the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response-time targets of map requests while using resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly is a set of techniques for predicting the demand of map workloads online and optimizing resource allocations, considering both response time and data freshness as the QoS targets. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly predicts workload demands 18.91% more accurately and allocates resources efficiently to meet the QoS target, improving QoS by 26.19% and reducing resource usage by 20.83% compared to traditional peak-load-based resource allocation.
Abstract:
The subthreshold slope, transconductance, threshold voltage, and hysteresis of a carbon nanotube field-effect transistor (CNT FET) were examined as its configuration was changed from bottom-gate exposed channel and bottom-gate covered channel to top-gate FET. An individual single-wall CNT was grown by chemical vapor deposition, and its gate configuration was changed while determining its transistor characteristics to ensure that the measurements were not a function of CNTs with different chirality or diameter. The bottom-gate exposed CNT FET utilized 900 nm SiO2 as the gate insulator. This CNT FET was then covered with TiO2 to form the bottom-gate covered channel CNT FET. Finally, the top-gate CNT FET was fabricated; this device utilized TiO2 (K ∼ 80, equivalent oxide thickness = 0.25 nm) as the gate insulator. Of the three configurations investigated, the top-gate device exhibited the best subthreshold slope (67-70 mV/dec), the highest transconductance (1.3 μS), and negligible hysteresis in terms of threshold voltage shift. © 2006 American Institute of Physics.
Abstract:
The retrieval (estimation) of sea surface temperatures (SSTs) from space-based infrared observations is increasingly performed using retrieval coefficients derived from radiative transfer simulations of top-of-atmosphere brightness temperatures (BTs). Typically, an estimate of SST is formed from a weighted combination of BTs at a few wavelengths, plus an offset. This paper addresses two questions about the radiative transfer modeling approach to deriving these weighting and offset coefficients. How precisely specified do the coefficients need to be in order to obtain the required SST accuracy (e.g., scatter <0.3 K in week-average SST, bias <0.1 K)? And how precisely is it actually possible to specify them using current forward models? The conclusions are that weighting coefficients can be obtained with adequate precision, while the offset coefficient will often require an empirical adjustment of the order of a few tenths of a kelvin against validation data. Thus, a rational approach to defining retrieval coefficients is one of radiative transfer modeling followed by offset adjustment. The need for this approach is illustrated from experience in defining SST retrieval schemes for operational meteorological satellites. A strategy is described for obtaining the required offset adjustment, and the paper highlights some of the subtler aspects involved with reference to the example of SST retrievals from the imager on the geostationary satellite GOES-8.
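The retrieval form described above, a weighted combination of BTs plus an offset with the offset adjusted empirically against validation data, can be written down directly. The coefficients and brightness temperatures below are illustrative, not an operational scheme:

```python
def retrieve_sst(bts, weights, offset):
    """Linear SST retrieval: a weighted combination of brightness
    temperatures (kelvin) at a few wavelengths, plus an offset."""
    return sum(w * bt for w, bt in zip(weights, bts)) + offset

def adjusted_offset(offset, retrieved, in_situ):
    """Empirical offset adjustment: shift the offset by the mean bias
    of retrieved SSTs against matched validation (e.g. buoy) SSTs."""
    bias = sum(r - t for r, t in zip(retrieved, in_situ)) / len(retrieved)
    return offset - bias

# Illustrative two-channel example (made-up coefficients and BTs):
weights, offset = (2.2, -1.2), 0.5
sst = retrieve_sst((290.0, 288.5), weights, offset)
```

The paper's conclusion maps onto this split: the `weights` can be taken from radiative transfer modeling as-is, while `offset` typically needs the few-tenths-of-a-kelvin empirical correction that `adjusted_offset` sketches.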
Abstract:
YBaCuO and GdBaCuO + 15 wt% Ag large, single-grain, bulk superconductors have been fabricated via the top-seeded melt-growth (TSMG) process using a generic NdBCO seed. The mechanical behavior of both materials has been investigated by means of three-point bending (TPB) and transversal tensile tests at 77 and 300 K. The strength, fracture toughness, and hardness of the samples were studied for two directions of applied load to obtain comprehensive information about the effect of microstructural anisotropy on the macroscopic and microscopic mechanical properties of these technologically important materials. Splitting (Brazilian) tests were carried out on as-melt-processed cylindrical samples following a standard oxygenation process and with the load applied parallel to the growth-facet lines characteristic of the TSMG process. In addition, the elastic modulus of each material was measured by three different techniques and related to the microstructure of each sample using optical microscopy. The results show that both the mechanical properties and the elastic modulus of both YBCO and GdBCO/Ag are improved at 77 K. However, the GdBCO/Ag samples are less anisotropic and exhibit better mechanical behavior due to the presence of silver particles in the bulk superconducting matrix. The splitting tensile strength was determined at 77 K, and both materials were found to exhibit similar behavior, independently of their differences in microstructure.
Abstract:
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. Its tree structure allows for efficient disk-based implementations where space requirements exceed the capacity of main memory.
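K-tree approximates k-means by applying small-k clustering recursively at the nodes of a height-balanced tree. A plain Lloyd's k-means, the building block being approximated, as a minimal sketch (not the K-tree insertion algorithm itself):

```python
import random

def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means over points given as lists of floats.
    K-tree applies small-k clustering like this at every tree node,
    which is how it approximates a flat k-means at scale."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(points, k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute each center as the mean of its group
        centers = [[sum(d) / len(g) for d in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

pts = [[0.0], [0.2], [0.1], [10.0], [10.2], [9.9]]
centers, groups = kmeans(pts, 2)
```

Running flat k-means is O(nk) per pass; K-tree's hierarchy replaces the flat search with a root-to-leaf descent, which is where its low time complexity for large collections comes from.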
Abstract:
This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.
Abstract:
Random Indexing K-tree is the combination of two algorithms suited to large-scale document clustering.
Abstract:
This paper describes the approach taken to the clustering task at INEX 2009 by a group at the Queensland University of Technology. The Random Indexing (RI) K-tree has been used with a representation that is based on the semantic markup available in the INEX 2009 Wikipedia collection. The RI K-tree is a scalable approach to clustering large document collections. This approach has produced quality clustering when evaluated using two different methodologies.