978 resultados para Index Structure
Resumo:
Since multimedia data, such as images and videos, are way more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching and retrieving information. But the very cause of the popularity of multimedia data, the huge and different types of information a single data object can carry, makes their management a challenging task. Multimedia data is commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alpha-numeric data. Thus, to try to manage them with frameworks and rationales designed for primitive alpha-numeric data, will be inefficient. An index structure is the backbone of any database management system. It has been seen that index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly from within one single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure, when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to present the usability and better performance of the proposed framework over current state-of-art approaches.
Resumo:
Formal Concept Analysis is an unsupervised machine learning technique that has successfully been applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular we present the results of applying spatial data structures to large datasets in formal concept analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea of a virtual filesystem [11,17,15]. In particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
Resumo:
An aggregated farm-level index, the Agri-environmental Footprint Index (AFI), based on multiple criteria methods and representing a harmonised approach to evaluation of EU agri-environmental schemes is described. The index uses a common framework for the design and evaluation of policy that can be customised to locally relevant agri-environmental issues and circumstances. Evaluation can be strictly policy-focused, or broader and more holistic in that context-relevant assessment criteria that are not necessarily considered in the evaluated policy can nevertheless be incorporated. The Index structure is flexible, and can respond to diverse local needs. The process of Index construction is interactive, engaging farmers and other relevant stakeholders in a transparent decision-making process that can ensure acceptance of the outcome, help to forge an improved understanding of local agri-environmental priorities and potentially increase awareness of the critical role of farmers in environmental management. The structure of the AFI facilitates post-evaluation analysis of relative performance in different dimensions of the agri-environment, permitting identification of current strengths and weaknesses, and enabling future improvement in policy design. Quantification of the environmental impact of agriculture beyond the stated aims of policy using an 'unweighted' form of the AFI has potential as the basis of an ongoing system of environmental audit within a specified agricultural context. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.
Resumo:
Moving objects database systems are the most challenging sub-category among Spatio-Temporal database systems. A database system that updates in real-time the location information of GPS-equipped moving vehicles has to meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. A total solution has been provided for the real-time update of the vehicles’ location and motion information, range queries on current and history data, and prediction of vehicles’ movement in the near future. To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed in this dissertation for building and manipulating the indexing structures for moving objects databases. Applying the STAPS approach, an indexing structure of associating a time interval tree to each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. The data update and query processing performance it provides is consistent without restrictions such as a time window or assuming linear moving trajectories. An application system design based on distributed system architecture with centralized organization was developed to maximally support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving in region-wide, main issues on the implementation of such a system were addressed.
Resumo:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.
Resumo:
Tecnologias da Web Semântica como RDF, OWL e SPARQL sofreram nos últimos anos um forte crescimento e aceitação. Projectos como a DBPedia e Open Street Map começam a evidenciar o verdadeiro potencial da Linked Open Data. No entanto os motores de pesquisa semânticos ainda estão atrasados neste crescendo de tecnologias semânticas. As soluções disponíveis baseiam-se mais em recursos de processamento de linguagem natural. Ferramentas poderosas da Web Semântica como ontologias, motores de inferência e linguagens de pesquisa semântica não são ainda comuns. Adicionalmente a esta realidade, existem certas dificuldades na implementação de um Motor de Pesquisa Semântico. Conforme demonstrado nesta dissertação, é necessária uma arquitectura federada de forma a aproveitar todo o potencial da Linked Open Data. No entanto um sistema federado nesse ambiente apresenta problemas de performance que devem ser resolvidos através de cooperação entre fontes de dados. O standard actual de linguagem de pesquisa na Web Semântica, o SPARQL, não oferece um mecanismo para cooperação entre fontes de dados. Esta dissertação propõe uma arquitectura federada que contém mecanismos que permitem cooperação entre fontes de dados. Aborda o problema da performance propondo um índice gerido de forma centralizada assim como mapeamentos entre os modelos de dados de cada fonte de dados. A arquitectura proposta é modular, permitindo um crescimento de repositórios e funcionalidades simples e de forma descentralizada, à semelhança da Linked Open Data e da própria World Wide Web. Esta arquitectura trabalha com pesquisas por termos em linguagem natural e também com inquéritos formais em linguagem SPARQL. No entanto os repositórios considerados contêm apenas dados em formato RDF. Esta dissertação baseia-se em múltiplas ontologias partilhadas e interligadas.
Resumo:
Contamination of weather radar echoes by anomalous propagation (anaprop) mechanisms remains a serious issue in quality control of radar precipitation estimates. Although significant progress has been made identifying clutter due to anaprop there is no unique method that solves the question of data reliability without removing genuine data. The work described here relates to the development of a software application that uses a numerical weather prediction (NWP) model to obtain the temperature, humidity and pressure fields to calculate the three dimensional structure of the atmospheric refractive index structure, from which a physically based prediction of the incidence of clutter can be made. This technique can be used in conjunction with existing methods for clutter removal by modifying parameters of detectors or filters according to the physical evidence for anomalous propagation conditions. The parabolic equation method (PEM) is a well established technique for solving the equations for beam propagation in a non-uniformly stratified atmosphere, but although intrinsically very efficient, is not sufficiently fast to be practicable for near real-time modelling of clutter over the entire area observed by a typical weather radar. We demonstrate a fast hybrid PEM technique that is capable of providing acceptable results in conjunction with a high-resolution terrain elevation model, using a standard desktop personal computer. We discuss the performance of the method and approaches for the improvement of the model profiles in the lowest levels of the troposphere.
Resumo:
In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) query in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for memory processing. To reduce the distance computation, we first propose a structure (BID) using BIt-Difference to answer approximate KNN query. The BID employs one bit to represent each feature vector of point and the number of bit-difference is used to prune the further points. To facilitate real dataset which is typically skewed, we enhance the BID mechanism with clustering, cluster adapted bitcoder and dimensional weight, named the BID⁺. Extensive experiments are conducted to show that our proposed method yields significant performance advantages over the existing index structures on both real life and synthetic high-dimensional datasets.
Resumo:
Tanto los Sistemas de Información Geográfica como la Recuperación de Información han sido campos de investigación muy importantes en las últimas décadas. Recientemente, un nuevo campo de investigación llamado Recuperación de Información Geográfica ha surgido fruto de la confluencia de estos dos campos. El objetivo principal de este campo es definir estructuras de indexación y técnicas para almacenar y recuperar documentos de manera eficiente empleando tanto las referencias textuales como las referencias geográficas contenidas en el texto. En este artículo presentamos la arquitectura de un sistema para recuperación de información geográfica y definimos el flujo de trabajo para la extracción de las referencias geográficas de los documentos. Presentamos además una nueva estructura de indexación que combina un índice invertido, un índice espacial y una ontología. Esta estructura mejora las capacidades de consulta de otras propuestas
Resumo:
The effects of a non-uniform wind field along the path of a scintillometer are investigated. Theoretical spectra are calculated for a range of scenarios where the crosswind varies in space or time and compared to the ‘ideal’ spectrum based on a constant uniform crosswind. It is verified that the refractive-index structure parameter relation with the scintillometer signal remains valid and invariant for both spatially and temporally-varying crosswinds. However, the spectral shape may change significantly preventing accurate estimation of the crosswind speed from the peak of the frequency spectrum and retrieval of the structure parameter from the plateau of the power spectrum. On comparison with experimental data, non-uniform crosswind conditions could be responsible for previously unexplained features sometimes seen in observed spectra. By accounting for the distribution of crosswind, theoretical spectra can be generated that closely replicate the observations, leading to a better understanding of the measurements. Spatial variability of wind speeds should be expected for paths other than those that are parallel to the surface and over flat, homogenous areas, whilst fluctuations in time are important for all sites.
Resumo:
Simultaneous scintillometer measurements at multiple wavelengths (pairing visible or infrared with millimetre or radio waves) have the potential to provide estimates of path-averaged surface fluxes of sensible and latent heat. Traditionally, the equations to deduce fluxes from measurements of the refractive index structure parameter at the two wavelengths have been formulated in terms of absolute humidity. Here, it is shown that formulation in terms of specific humidity has several advantages. Specific humidity satisfies the requirement for a conserved variable in similarity theory and inherently accounts for density effects misapportioned through the use of absolute humidity. The validity and interpretation of both formulations are assessed and the analogy with open-path infrared gas analyser density corrections is discussed. Original derivations using absolute humidity to represent the influence of water vapour are shown to misrepresent the latent heat flux. The errors in the flux, which depend on the Bowen ratio (larger for drier conditions), may be of the order of 10%. The sensible heat flux is shown to remain unchanged. It is also verified that use of a single scintillometer at optical wavelengths is essentially unaffected by these new formulations. Where it may not be possible to reprocess two-wavelength results, a density correction to the latent heat flux is proposed for scintillometry, which can be applied retrospectively to reduce the error.
Resumo:
In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: The first component is a 2D vector that reflects a distance range ( minimum and maximum values) of the f features with respect to a reference point ( the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: The first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.
Resumo:
Music similarity query based on acoustic content is becoming important with the ever-increasing growth of the music information from emerging applications such as digital libraries and WWW. However, relative techniques are still in their infancy and much less than satisfactory. In this paper, we present a novel index structure, called Composite Feature tree, CF-tree, to facilitate efficient content-based music search adopting multiple musical features. Before constructing the tree structure, we use PCA to transform the extracted features into a new space sorted by the importance of acoustic features. The CF-tree is a balanced multi-way tree structure where each level represents the data space at different dimensionalities. The PCA transformed data and reduced dimensions in the upper levels can alleviate suffering from dimensionality curse. To accurately mimic human perception, an extension, named CF+-tree, is proposed, which further applies multivariable regression to determine the weight of each individual feature. We conduct extensive experiments to evaluate the proposed structures against state-of-art techniques. The experimental results demonstrate superiority of our technique.
Resumo:
By allowing the estimation of forest structural and biophysical characteristics at different temporal and spatial scales, remote sensing may contribute to our understanding and monitoring of planted forests. Here, we studied 9-year time-series of the Normalized Difference Vegetation Index (NDVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) on a network of 16 stands in fast-growing Eucalyptus plantations in Sao Paulo State, Brazil. We aimed to examine the relationships between NDVI time-series spanning entire rotations and stand structural characteristics (volume, dominant height, mean annual increment) in these simple forest ecosystems. Our second objective was to examine spatial and temporal variations of light use efficiency for wood production, by comparing time-series of Absorbed Photosynthetically Active Radiation (APAR) with inventory data. Relationships were calibrated between the NDVI and the fractions of intercepted diffuse and direct radiation, using hemispherical photographs taken on the studied stands at two seasons. APAR was calculated from the NDVI time-series using these relationships. Stem volume and dominant height were strongly correlated with summed NDVI values between planting date and inventory date. Stand productivity was correlated with mean NDVI values. APAR during the first 2 years of growth was variable between stands and was well correlated with stem wood production (r(2) = 0.78). In contrast, APAR during the following years was less variable and not significantly correlated with stem biomass increments. Production of wood per unit of absorbed light varied with stand age and with site index. In our study, a better site index was accompanied both by increased APAR during the first 2 years of growth and by higher light use efficiency for stem wood production during the whole rotation. Implications for simple process-based modelling are discussed. (C) 2009 Elsevier B.V. All rights reserved.