879 results for 280103 Information Storage, Retrieval and Management


Relevance:

100.00%

Publisher:

Abstract:

Spatial data is now used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved due to performance issues related to the large size and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that server-side processing cost and network traffic can be reduced when the level of resolution required by an application is low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine, so the developer of spatial Web applications need not be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
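
As an illustration of the server-side selection described above, the sketch below picks the coarsest stored resolution level that satisfies an application's error tolerance, so coarse requests ship less data. This is a minimal sketch under assumed names (MultiresolutionGeometry, query, level_error); the paper's actual engine-level structures are not given in the abstract.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

class MultiresolutionGeometry:
    """One geometry stored pre-simplified at several resolution levels."""

    def __init__(self, levels: Dict[int, List[Point]],
                 level_error: Dict[int, float]):
        # `levels` maps a resolution level (0 = coarsest) to a polyline;
        # `level_error` gives the approximation error at each level.
        self.levels = levels
        self.level_error = level_error

    def query(self, tolerance: float) -> List[Point]:
        # Return the coarsest level whose error is within the requesting
        # application's tolerance, so the server ships as little data
        # (and does as little processing) as possible.
        for level in sorted(self.levels):
            if self.level_error[level] <= tolerance:
                return self.levels[level]
        return self.levels[max(self.levels)]  # fall back to full resolution
```

A low-zoom web map would call query() with a large tolerance and receive the coarse polyline; a detailed view would pass a small tolerance and fall through to the finer levels.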

Relevance:

100.00%

Publisher:

Abstract:

In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. In its natural form, a sliding window is defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time-constrained and filter-based sliding windows. Our algorithm makes one pass over the data stream and maintains an ε-approximate summary. It uses O((1/ε²) log²(εN)) space, where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding-window queries and prove that our technique is applicable to flexible window settings. Our performance study indicates that the space required in practice is much less than the theoretical bound and that the algorithm supports high-speed data streams.
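
A toy baseline to make the query semantics concrete: the class below stores the entire window (O(N) space) and answers exactly, whereas the paper's one-pass summary answers ε-approximately within the O((1/ε²) log²(εN)) space bound. All names here are illustrative, not the paper's API.

```python
from collections import deque
import math

class SlidingWindowQuantiles:
    """Exact baseline over the N most recent items (O(N) space)."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)

    def insert(self, value: float) -> None:
        self.window.append(value)  # the oldest item expires automatically

    def quantile(self, phi: float) -> float:
        # Exact phi-quantile of the current (non-empty) window; an
        # epsilon-approximate summary may return any element whose rank
        # differs from the true rank by at most epsilon * N.
        ordered = sorted(self.window)
        rank = max(0, min(len(ordered) - 1, math.ceil(phi * len(ordered)) - 1))
        return ordered[rank]
```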

Relevance:

100.00%

Publisher:

Abstract:

Terrain can be approximated by a triangular mesh consisting of millions of 3D points. Multiresolution triangular mesh (MTM) structures are designed to support applications that use terrain data at variable levels of detail (LOD). Typically, an MTM adopts a tree structure where a parent node represents a lower-resolution approximation of its descendants. Given a region of interest (ROI) and an LOD, retrieving the required terrain data from the database means traversing the MTM tree from the root to reach all the nodes satisfying the ROI and LOD conditions. This process, while commonly used for multiresolution terrain visualization, is inefficient, as it incurs either a large number of sequential I/O operations or the fetching of a large amount of extraneous data. Various spatial indexes have been proposed in the past to address this problem; however, level-by-level tree traversal remains common practice in order to obtain topological information among the retrieved terrain data. We propose a new MTM data structure called direct mesh and demonstrate that with direct mesh the amount of data retrieved can be substantially reduced. Compared with existing MTM indexing methods, a significant performance improvement has been observed for real-life terrain data.
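
To make the cost argument concrete, here is a hedged sketch of the baseline level-by-level traversal that direct mesh is designed to improve on: descend from the root, prune subtrees outside the ROI, and stop refining once a node is accurate enough for the requested LOD. Node layout and field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def intersects(a: BBox, b: BBox) -> bool:
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

@dataclass
class MTMNode:
    bbox: BBox
    error: float                   # approximation error at this node
    children: List["MTMNode"] = field(default_factory=list)

def retrieve(root: MTMNode, roi: BBox, max_error: float) -> List[MTMNode]:
    result, frontier = [], [root]
    while frontier:                # each level can cost a separate I/O pass
        node = frontier.pop()
        if not intersects(node.bbox, roi):
            continue               # prune subtrees outside the ROI
        if node.error <= max_error or not node.children:
            result.append(node)    # coarse enough for the requested LOD
        else:
            frontier.extend(node.children)
    return result
```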

Relevance:

100.00%

Publisher:

Abstract:

The vision presented in this paper and its technical content are the result of close collaboration between several researchers from the University of Queensland, Australia, and the SAP Corporate Research Center, Brisbane, Australia. In particular, Dr Wasim Sadiq (SAP), Dr Shazia Sadiq (UQ), and Dr Karsten Schultz (SAP) are the prime contributors to the ideas presented. Also, PhD students Mr Dat Ma Cao and Ms Belinda Carter are involved in the research program. Additionally, the Australian Research Council Discovery Project Scheme and the Australian Research Council Linkage Project Scheme support some aspects of the research work towards the HMT solution.

Relevance:

100.00%

Publisher:

Abstract:

A major task of traditional temporal event sequence mining is to predict the occurrences of a special type of event (called the target event) in a long temporal sequence. Our previous work defined a new type of pattern, called an event-oriented pattern, which can potentially predict the target event within a certain period of time. However, because the size of the interval used for prediction is pre-defined in event-oriented pattern discovery, the mining results can be inaccurate and carry misleading information. In this paper, we introduce a new concept, called the temporal feature, to rectify this shortcoming. For any event-oriented pattern discovered under a pre-given interval size, the temporal feature is the minimal interval size that makes the pattern interesting. Thus, by further investigating the temporal features of discovered event-oriented patterns, we can refine the knowledge used for target event prediction.
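
Under one plausible reading of the definition above, a pattern's temporal feature can be found by scanning interval sizes from small to large and returning the first at which the pattern meets the interestingness threshold. This sketch uses confidence as the interestingness measure, which is an assumption, as are all names and parameters.

```python
import bisect
from typing import List, Optional

def temporal_feature(pattern_times: List[float],
                     target_times: List[float],
                     max_window: float,
                     min_confidence: float,
                     step: float = 1.0) -> Optional[float]:
    """Smallest interval size (up to max_window) making the pattern interesting."""
    targets = sorted(target_times)

    def confidence(window: float) -> float:
        # Fraction of pattern occurrences followed by a target event
        # within (t, t + window].
        hits = 0
        for t in pattern_times:
            i = bisect.bisect_right(targets, t)
            if i < len(targets) and targets[i] <= t + window:
                hits += 1
        return hits / len(pattern_times) if pattern_times else 0.0

    # Confidence never decreases as the window grows, so the first
    # window meeting the threshold is the minimal one.
    window = step
    while window <= max_window:
        if confidence(window) >= min_confidence:
            return window
        window += step
    return None
```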

Relevance:

100.00%

Publisher:

Abstract:

Sharing data among organizations often leads to mutual benefit. Recent technology in data mining has enabled efficient extraction of knowledge from large databases. This, however, increases the risk of disclosing sensitive knowledge when the database is released to other parties. To address this privacy issue, one may sanitize the original database so that the sensitive knowledge is hidden. The challenge is to minimize the side effect on the quality of the sanitized database so that non-sensitive knowledge can still be mined. In this paper, we study this problem in the context of hiding sensitive frequent itemsets by judiciously modifying the transactions in the database. To preserve the non-sensitive frequent itemsets, we propose a border-based approach to efficiently evaluate the impact of any modification to the database during the hiding process. The quality of the database can be well maintained by greedily selecting the modifications with minimal side effect. Experimental results are also reported to show the effectiveness of the proposed approach.
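
The following is a minimal greedy sketch of the sanitization idea, not the paper's exact border-based algorithm: it lowers a sensitive itemset's support below the threshold one item deletion at a time, always choosing the deletion that breaks the fewest of the itemsets we wish to preserve (it assumes min_support >= 1; all names are illustrative).

```python
from typing import FrozenSet, List, Set

def support(itemset: FrozenSet[str], db: List[Set[str]]) -> int:
    return sum(1 for t in db if itemset <= t)

def hide_itemset(db: List[Set[str]],
                 sensitive: FrozenSet[str],
                 preserve: List[FrozenSet[str]],
                 min_support: int) -> None:
    while support(sensitive, db) >= min_support:
        best = None  # (side_effect, transaction_index, item)
        for idx, t in enumerate(db):
            if not sensitive <= t:
                continue           # only supporting transactions matter
            for item in sensitive:
                # Side effect: preserved itemsets that lose this
                # transaction's support if `item` is removed from it.
                cost = sum(1 for p in preserve if item in p and p <= t)
                if best is None or cost < best[0]:
                    best = (cost, idx, item)
        _, idx, item = best
        db[idx].discard(item)      # apply the least-damaging modification
```

Each deletion reduces the sensitive itemset's support by exactly one, so the loop terminates once the support drops below the threshold.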

Relevance:

100.00%

Publisher:

Abstract:

Collaborative filtering is regarded as one of the most promising recommendation algorithms. Item-based approaches to collaborative filtering identify the similarity between two items by comparing users' ratings on them. In these approaches, ratings produced at different times are weighted equally; that is, changes in user purchase interest are not taken into consideration. For example, an item that was rated recently by a user should have a bigger impact on the prediction of future user behaviour than an item that was rated a long time ago. In this paper, we present a novel algorithm that computes time weights for different items, assigning a decreasing weight to old data. Moreover, users' purchase habits vary, and even the same user has quite different attitudes towards different items. Our proposed algorithm therefore uses clustering to discriminate between different kinds of items. For each item cluster, we trace each user's change in purchase interest and introduce a personalized decay factor according to the user's own purchase behaviour. Empirical studies have shown that our new algorithm substantially improves the precision of item-based collaborative filtering without introducing higher-order computational complexity.
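
A minimal sketch of the time-weighting idea, assuming an exponential decay whose rate is the personalized decay factor for a user and item cluster; the paper's exact decay form, and how the factor is derived from purchase behaviour, are not given in the abstract.

```python
import math
from typing import List, Tuple

def time_weight(age_days: float, decay: float) -> float:
    # Weight in (0, 1]; `decay` is the personalized factor for this
    # user and item cluster (larger = faster forgetting).
    return math.exp(-decay * age_days)

def predict(neighbours: List[Tuple[float, float, float]], decay: float) -> float:
    """Time-weighted item-based prediction.

    neighbours: (similarity, rating, age_days) tuples for the items the
    user has rated that are similar to the target item.
    """
    num = den = 0.0
    for sim, rating, age in neighbours:
        w = sim * time_weight(age, decay)  # similarity scaled by recency
        num += w * rating
        den += abs(w)
    return num / den if den else 0.0
```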

Relevance:

100.00%

Publisher:

Abstract:

Conventionally, document classification research focuses on improving the learning capabilities of classifiers. Nevertheless, according to our observations, the effectiveness of classification is limited by the suitability of the document representation. Intuitively, the more features that are used in a representation, the more comprehensively documents are represented. However, if a representation contains too many irrelevant features, the classifier suffers not only from the curse of high dimensionality but also from overfitting. To address this problem of the suitability of document representations, we present a classifier-independent approach to measuring the effectiveness of document representations. Our approach utilises a labelled document corpus to estimate the distribution of documents in the feature space. By looking at documents in this way, we can clearly identify the contributions made by different features towards document classification. Experiments have been performed to show how the effectiveness is evaluated. Our approach can be used as a tool to assist feature selection, dimensionality reduction and document classification.
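
As a stand-in for the paper's effectiveness measure (which the abstract does not specify, so this is an assumption), the sketch below scores a representation by the average information gain its features achieve on a labelled corpus; irrelevant features contribute near-zero gain and dilute the score, mirroring the intuition above.

```python
import math
from collections import Counter
from typing import List, Set

def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def feature_gain(docs: List[Set[str]], labels: List[str], feature: str) -> float:
    # Split the corpus on feature presence and measure entropy reduction.
    with_f = [l for d, l in zip(docs, labels) if feature in d]
    without_f = [l for d, l in zip(docs, labels) if feature not in d]
    gain = entropy(labels)
    for part in (with_f, without_f):
        if part:
            gain -= (len(part) / len(labels)) * entropy(part)
    return gain

def representation_score(docs: List[Set[str]], labels: List[str],
                         features: List[str]) -> float:
    # Average per-feature contribution; low-gain features drag the
    # score down, flagging bloated representations.
    return sum(feature_gain(docs, labels, f) for f in features) / len(features)
```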