Biblioteca Digital

949 resultados para Readability, Text pre-processing

Gaussian anamorphosis in the analysis step of the EnKF: a joint state-variable/observation approach

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The analysis step of the (ensemble) Kalman filter is optimal when (1) the distribution of the background is Gaussian, (2) state variables and observations are related via a linear operator, and (3) the observational error is of additive nature and has Gaussian distribution. When these conditions are largely violated, a pre-processing step known as Gaussian anamorphosis (GA) can be applied. The objective of this procedure is to obtain state variables and observations that better fulfil the Gaussianity conditions in some sense. In this work we analyse GA from a joint perspective, paying attention to the effects of transformations in the joint state variable/observation space. First, we study transformations for state variables and observations that are independent from each other. Then, we introduce a targeted joint transformation with the objective to obtain joint Gaussianity in the transformed space. We focus primarily in the univariate case, and briefly comment on the multivariate one. A key point of this paper is that, when (1)-(3) are violated, using the analysis step of the EnKF will not recover the exact posterior density in spite of any transformations one may perform. These transformations, however, provide approximations of different quality to the Bayesian solution of the problem. Using an example in which the Bayesian posterior can be analytically computed, we assess the quality of the analysis distributions generated after applying the EnKF analysis step in conjunction with different GA options. The value of the targeted joint transformation is particularly clear for the case when the prior is Gaussian, the marginal density for the observations is close to Gaussian, and the likelihood is a Gaussian mixture.

Development of a wind gust model to estimate gust speeds and their return periods

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spatially dense observations of gust speeds are necessary for various applications, but their availability is limited in space and time. This work presents an approach to help to overcome this problem. The main objective is the generation of synthetic wind gust velocities. With this aim, theoretical wind and gust distributions are estimated from 10 yr of hourly observations collected at 123 synoptic weather stations provided by the German Weather Service. As pre-processing, an exposure correction is applied on measurements of the mean wind velocity to reduce the influence of local urban and topographic effects. The wind gust model is built as a transfer function between distribution parameters of wind and gust velocities. The aim of this procedure is to estimate the parameters of gusts at stations where only wind speed data is available. These parameters can be used to generate synthetic gusts, which can improve the accuracy of return periods at test sites with a lack of observations. The second objective is to determine return periods much longer than the nominal length of the original time series by considering extreme value statistics. Estimates for both local maximum return periods and average return periods for single historical events are provided. The comparison of maximum and average return periods shows that even storms with short average return periods may lead to local wind gusts with return periods of several decades. Despite uncertainties caused by the short length of the observational records, the method leads to consistent results, enabling a wide range of possible applications.

A survey of data mining techniques for social media analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.

Hand Gesture Detection & Recognition System

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The project introduces an application using computer vision for Hand gesture recognition. A camera records a live video stream, from which a snapshot is taken with the help of interface. The system is trained for each type of count hand gestures (one, two, three, four, and five) at least once. After that a test gesture is given to it and the system tries to recognize it.A research was carried out on a number of algorithms that could best differentiate a hand gesture. It was found that the diagonal sum algorithm gave the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. After that the image is converted into a binary image and the sums of all diagonal elements of the picture are taken. This sum helps us in differentiating and classifying different hand gestures.Previous systems have used data gloves or markers for input in the system. I have no such constraints for using the system. The user can give hand gestures in view of the camera naturally. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extendible foundation for future work.

Real time video segmentation for recognising paint marks on bad wooden railway sleepers

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Wooden railway sleeper inspections in Sweden are currently performed manually by a human operator; such inspections are based on visual analysis. Machine vision based approach has been done to emulate the visual abilities of human operator to enable automation of the process. Through this process bad sleepers are identified, and a spot is marked on it with specific color (blue in the current case) on the rail so that the maintenance operators are able to identify the spot and replace the sleeper. The motive of this thesis is to help the operators to identify those sleepers which are marked by color (spots), using an “Intelligent Vehicle” which is capable of running on the track. Capturing video while running on the track and segmenting the object of interest (spot) through this vehicle; we can automate this work and minimize the human intuitions. The video acquisition process depends on camera position and source light to obtain fine brightness in acquisition, we have tested 4 different types of combinations (camera position and source light) here to record the video and test the validity of proposed method. A sequence of real time rail frames are extracted from these videos and further processing (depending upon the data acquisition process) is done to identify the spots. After identification of spot each frame is divided in to 9 regions to know the particular region where the spot lies to avoid overlapping with noise, and so on. The proposed method will generate the information regarding in which region the spot lies, based on nine regions in each frame. From the generated results we have made some classification regarding data collection techniques, efficiency, time and speed. In this report, extensive experiments using image sequences from particular camera are reported and the experiments were done using intelligent vehicle as well as test vehicle and the results shows that we have achieved 95% success in identifying the spots when we use video as it is, in other method were we can skip some frames in pre-processing to increase the speed of video but the segmentation results we reduced to 85% and the time was very less compared to previous one. This shows the validity of proposed method in identification of spots lying on wooden railway sleepers where we can compromise between time and efficiency to get the desired result.

Simulation of four stroke engine cycle for a 4 - valve pentroof engine in KIVA 3VR2

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main purpose of this project is to understand the process of engine simulation using the open source CFD code called KIVA. This report mainly discusses the simulation of the 4-valve Pentroof engine through KIVA 3VR2. KIVA is an open source FORTRAN code which is used to solve the fluid flow field in the engines with the transient 2D and 3D chemically reactive flow with spray. It also focuses on the complete procedure to simulate an engine cycle starting from pre- processing until the final results. This report will serve a handbook for the using the KIVA code.

Carbon dioxide sequestration in cement kiln dust through miner carbonation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The feasibility of carbon sequestration in cement kiln dust (CKD) was investigated in a series of batch and column experiments conducted under ambient temperature and pressure conditions. The significance of this work is the demonstration that alkaline wastes, such as CKD, are highly reactive with carbon dioxide (CO2). In the presence of water, CKD can sequester greater than 80% of its theoretical capacity for carbon without any amendments or modifications to the waste. Other mineral carbonation technologies for carbon sequestration rely on the use of mined mineral feedstocks as the source of oxides. The mining, pre-processing and reaction conditions needed to create favorable carbonation kinetics all require significant additions of energy to the system. Therefore, their actual net reduction in CO2 is uncertain. Many suitable alkaline wastes are produced at sites that also generate significant quantities of CO2. While independently, the reduction in CO2 emissions from mineral carbonation in CKD is small (~13% of process related emissions), when this technology is applied to similar wastes of other industries, the collective net reduction in emissions may be significant. The technical investigations presented in this dissertation progress from proof of feasibility through examination of the extent of sequestration in core samples taken from an aged CKD waste pile, to more fundamental batch and microscopy studies which analyze the rates and mechanisms controlling mineral carbonation reactions in a variety of fresh CKD types. Finally, the scale of the system was increased to assess the sequestration efficiency under more pilot or field-scale conditions and to clarify the importance of particle-scale processes under more dynamic (flowing gas) conditions. A comprehensive set of material characterization methods, including thermal analysis, Xray diffraction, and X-ray fluorescence, were used to confirm extents of carbonation and to better elucidate those compositional factors controlling the reactions. The results of these studies show that the rate of carbonation in CKD is controlled by the extent of carbonation. With increased degrees of conversion, particle-scale processes such as intraparticle diffusion and CaCO3 micropore precipitation patterns begin to limit the rate and possibly the extent of the reactions. Rates may also be influenced by the nature of the oxides participating in the reaction, slowing when the free or unbound oxides are consumed and reaction conditions shift towards the consumption of less reactive Ca species. While microscale processes and composition affects appear to be important at later times, the overall degrees of carbonation observed in the wastes were significant (> 80%), a majority of which occurs within the first 2 days of reaction. Under the operational conditions applied in this study, the degree of carbonation in CKD achieved in column-scale systems was comparable to those observed under ideal batch conditions. In addition, the similarity in sequestration performance among several different CKD waste types indicates that, aside from available oxide content, no compositional factors significantly hinder the ability of the waste to sequester CO2.

Ansätze zur Verbesserung von Qualität und Wirtschaftlichkeit bei generativen Verfahren durch Optimierung des Pre-Processes

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mittels generativer Fertigung ist es heute möglich die, Entwicklungszeit und Ferti-gungsdauer von Prototypen, Produkten und Werkzeugen zu verkürzen. Neben dieser Zeitersparnis sind die im Vergleich zu konventionellen Fertigungsverfahren unwe-sentlichen Geometriebeschränkungen für den Anwender von besonderem Interesse. Dieses Alleinstellungsmerkmal der generativen Fertigung macht es möglich auch komplexe Geometrie wirtschaftlich herzustellen. Voraussetzung für eine wirtschaftli-che und fehlerminimierte Fertigung ist hierbei eine möglichst optimale Prozessvorbe-reitung (Pre-Processing). Dabei sind insbesondere die Schritte der Bauteilorientie-rung, der Stützkonstruktionserzeugung, der Schichtzerlegung sowie der Bauraum-ausnutzung von Interesse. Auch wenn diese Punkte wesentlich zur Qualität und Wirtschaftlichkeit beitragen, sind die Erkenntnisse für den unerfahrenen Anwender nur unzureichend dokumentiert, wodurch eine möglichst effiziente Fertigung zu-nächst ausgeschlossen werden kann. Anhand unterschiedlicher Beispiele sollen dem Anwender hier die Möglichkeiten zur Optimierung dieser Pre-Processing Schritte er-läutert werden. In diesem Rahmen werden die aktuellen Forschungsergebnisse des Lehrstuhls Rechnereinsatz in der Konstruktion, Institut für Produkt Engineering der Universität Duisburg-Essen in Bezug auf die Optimierung der Bauteilorientierung, der variablen Schichtzerlegung und der Optimierung der Bauraumausnutzung vorgestellt.

Automatic quantitation of multiple sclerosis lesions on MR images

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A two-pronged approach for the automatic quantitation of multiple sclerosis (MS) lesions on magnetic resonance (MR) images has been developed. This method includes the design and use of a pulse sequence for improved lesion-to-tissue contrast (LTC) and seeks to identify and minimize the sources of false lesion classifications in segmented images. The new pulse sequence, referred to as AFFIRMATIVE (Attenuation of Fluid by Fast Inversion Recovery with MAgnetization Transfer Imaging with Variable Echoes), improves the LTC, relative to spin-echo images, by combining Fluid-Attenuated Inversion Recovery (FLAIR) and Magnetization Transfer Contrast (MTC). In addition to acquiring fast FLAIR/MTC images, the AFFIRMATIVE sequence simultaneously acquires fast spin-echo (FSE) images for spatial registration of images, which is necessary for accurate lesion quantitation. Flow has been found to be a primary source of false lesion classifications. Therefore, an imaging protocol and reconstruction methods are developed to generate "flow images" which depict both coherent (vascular) and incoherent (CSF) flow. An automatic technique is designed for the removal of extra-meningeal tissues, since these are known to be sources of false lesion classifications. A retrospective, three-dimensional (3D) registration algorithm is implemented to correct for patient movement which may have occurred between AFFIRMATIVE and flow imaging scans. Following application of these pre-processing steps, images are segmented into white matter, gray matter, cerebrospinal fluid, and MS lesions based on AFFIRMATIVE and flow images using an automatic algorithm. All algorithms are seamlessly integrated into a single MR image analysis software package. Lesion quantitation has been performed on images from 15 patient volunteers. The total processing time is less than two hours per patient on a SPARCstation 20. The automated nature of this approach should provide an objective means of monitoring the progression, stabilization, and/or regression of MS lesions in large-scale, multi-center clinical trials. ^

Multi-temporal mapping of seagrass cover, species and biomass of the Eastern Banks, Moreton Bay, Australia, with links to shapefiles

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The spatial and temporal dynamics of seagrasses have been studied from the leaf to patch (100 m**2) scales. However, landscape scale (> 100 km**2) seagrass population dynamics are unresolved in seagrass ecology. Previous remote sensing approaches have lacked the temporal or spatial resolution, or ecologically appropriate mapping, to fully address this issue. This paper presents a robust, semi-automated object-based image analysis approach for mapping dominant seagrass species, percentage cover and above ground biomass using a time series of field data and coincident high spatial resolution satellite imagery. The study area was a 142 km**2 shallow, clear water seagrass habitat (the Eastern Banks, Moreton Bay, Australia). Nine data sets acquired between 2004 and 2013 were used to create seagrass species and percentage cover maps through the integration of seagrass photo transect field data, and atmospherically and geometrically corrected high spatial resolution satellite image data (WorldView-2, IKONOS and Quickbird-2) using an object based image analysis approach. Biomass maps were derived using empirical models trained with in-situ above ground biomass data per seagrass species. Maps and summary plots identified inter- and intra-annual variation of seagrass species composition, percentage cover level and above ground biomass. The methods provide a rigorous approach for field and image data collection and pre-processing, a semi-automated approach to extract seagrass species and cover maps and assess accuracy, and the subsequent empirical modelling of seagrass biomass. The resultant maps provide a fundamental data set for understanding landscape scale seagrass dynamics in a shallow water environment. Our findings provide proof of concept for the use of time-series analysis of remotely sensed seagrass products for use in seagrass ecology and management.

Reflectances transect from coast to open sea, Baltic Sea

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, retrievals of the medium resolution imaging spectrometer (MERIS) reflectances and water quality products using 4 different coastal processing algorithms freely available are assessed by comparison against sea-truthing data. The study is based on a pair-wise comparison using processor-dependent quality flags for the retrieval of valid common macro-pixels. This assessment is required in order to ensure the reliability of monitoring systems based on MERIS data, such as the Swedish coastal and lake monitoring system (http.vattenkvalitet.se). The results show that the pre-processing with the Improved Contrast between Ocean and Land (ICOL) processor, correcting for adjacency effects, improve the retrieval of spectral reflectance for all processors, Therefore, it is recommended that the ICOL processor should be applied when Baltic coastal waters are investigated. Chlorophyll was retrieved best using the FUB (Free University of Berlin) processing algorithm, although overestimations in the range 18-26.5%, dependent on the compared pairs, were obtained. At low chlorophyll concentrations (< 2.5 mg/m**3), random errors dominated in the retrievals with the MEGS (MERIS ground segment processor) processor. The lowest bias and random errors were obtained with MEGS for suspended particulate matter, for which overestimations in te range of 8-16% were found. Only the FUB retrieved CDOM (Coloured Dissolved Organic Matter) correlate with in situ values. However, a large systematic underestimation appears in the estimates that nevertheless may be corrected for by using a~local correction factor. The MEGS has the potential to be used as an operational processing algorithm for the Himmerfjärden bay and adjacent areas, but it requires further improvement of the atmospheric correction for the blue bands and better definition at relatively low chlorophyll concentrations in presence of high CDOM attenuation.

Diseño, desarrollo y evaluación de sistemas de traducción automática para reducir las barreras de comunicación de las personas sordas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La principal aportación de esta tesis doctoral ha sido la propuesta y evaluación de un sistema de traducción automática que permite la comunicación entre personas oyentes y sordas. Este sistema está formado a su vez por dos sistemas: un traductor de habla en español a Lengua de Signos Española (LSE) escrita y que posteriormente se representa mediante un agente animado; y un generador de habla en español a partir de una secuencia de signos escritos mediante glosas. El primero de ellos consta de un reconocedor de habla, un módulo de traducción entre lenguas y un agente animado que representa los signos en LSE. El segundo sistema está formado por una interfaz gráfica donde se puede especificar una secuencia de signos mediante glosas (palabras en mayúscula que representan los signos), un módulo de traducción entre lenguas y un conversor texto-habla. Para el desarrollo del sistema de traducción, en primer lugar se ha generado un corpus paralelo de 7696 frases en español con sus correspondientes traducciones a LSE. Estas frases pertenecen a cuatro dominios de aplicación distintos: la renovación del Documento Nacional de Identidad, la renovación del permiso de conducir, un servicio de información de autobuses urbanos y la recepción de un hotel. Además, se ha generado una base de datos con más de 1000 signos almacenados en cuatro sistemas distintos de signo-escritura. En segundo lugar, se ha desarrollado un módulo de traducción automática que integra dos técnicas de traducción con una estructura jerárquica: la primera basada en memoria y la segunda estadística. Además, se ha implementado un módulo de pre-procesamiento de las frases en español que, mediante su incorporación al módulo de traducción estadística, permite mejorar significativamente la tasa de traducción. En esta tesis también se ha mejorado la versión de la interfaz de traducción de LSE a habla. Por un lado, se han incorporado nuevas características que mejoran su usabilidad y, por otro, se ha integrado un traductor de lenguaje SMS (Short Message Service – Servicio de Mensajes Cortos) a español, que permite especificar la secuencia a traducir en lenguaje SMS, además de mediante una secuencia de glosas. El sistema de traducción propuesto se ha evaluado con usuarios reales en dos dominios de aplicación: un servicio de información de autobuses de la Empresa Municipal de Transportes de Madrid y la recepción del Hotel Intur Palacio San Martín de Madrid. En la evaluación estuvieron implicadas personas sordas y empleados de los dos servicios. Se extrajeron medidas objetivas (obtenidas por el sistema automáticamente) y subjetivas (mediante cuestionarios a los usuarios). Los resultados fueron muy positivos gracias a la opinión de los usuarios de la evaluación, que validaron el funcionamiento del sistema de traducción y dieron información valiosa para futuras líneas de trabajo. Por otro lado, tras la integración de cada uno de los módulos de los dos sistemas de traducción (habla-LSE y LSE-habla), los resultados de la evaluación y la experiencia adquirida en todo el proceso, una aportación importante de esta tesis doctoral es la propuesta de metodología de desarrollo de sistemas de traducción de habla a lengua de signos en los dos sentidos de la comunicación. En esta metodología se detallan los pasos a seguir para desarrollar el sistema de traducción para un nuevo dominio de aplicación. Además, la metodología describe cómo diseñar cada uno de los módulos del sistema para mejorar su flexibilidad, de manera que resulte más sencillo adaptar el sistema desarrollado a un nuevo dominio de aplicación. Finalmente, en esta tesis se analizan algunas técnicas para seleccionar las frases de un corpus paralelo fuera de dominio para entrenar el modelo de traducción cuando se quieren traducir frases de un nuevo dominio de aplicación; así como técnicas para seleccionar qué frases del nuevo dominio resultan más interesantes que traduzcan los expertos en LSE para entrenar el modelo de traducción. El objetivo es conseguir una buena tasa de traducción con la menor cantidad posible de frases. ABSTRACT The main contribution of this thesis has been the proposal and evaluation of an automatic translation system for improving the communication between hearing and deaf people. This system is made up of two systems: a Spanish into Spanish Sign Language (LSE – Lengua de Signos Española) translator and a Spanish generator from LSE sign sequences. The first one consists of a speech recognizer, a language translation module and an avatar that represents the sign sequence. The second one is made up an interface for specifying the sign sequence, a language translation module and a text-to-speech conversor. For the translation system development, firstly, a parallel corpus has been generated with 7,696 Spanish sentences and their LSE translations. These sentences are related to four different application domains: the renewal of the Identity Document, the renewal of the driver license, a bus information service and a hotel reception. Moreover, a sign database has been generated with more than 1,000 signs described in four different signwriting systems. Secondly, it has been developed an automatic translation module that integrates two translation techniques in a hierarchical structure: the first one is a memory-based technique and the second one is statistical. Furthermore, a pre processing module for the Spanish sentences has been implemented. By incorporating this pre processing module into the statistical translation module, the accuracy of the translation module improves significantly. In this thesis, the LSE into speech translation interface has been improved. On the one hand, new characteristics that improve its usability have been incorporated and, on the other hand, a SMS language into Spanish translator has been integrated, that lets specifying in SMS language the sequence to translate, besides by specifying a sign sequence. The proposed translation system has been evaluated in two application domains: a bus information service of the Empresa Municipal de Transportes of Madrid and the Hotel Intur Palacio San Martín reception. This evaluation has involved both deaf people and services employees. Objective measurements (given automatically by the system) and subjective measurements (given by user questionnaires) were extracted during the evaluation. Results have been very positive, thanks to the user opinions during the evaluation that validated the system performance and gave important information for future work. Finally, after the integration of each module of the two translation systems (speech- LSE and LSE-speech), obtaining the evaluation results and considering the experience throughout the process, a methodology for developing speech into sign language (and vice versa) into a new domain has been proposed in this thesis. This methodology includes the steps to follow for developing the translation system in a new application domain. Moreover, this methodology proposes the way to improve the flexibility of each system module, so that the adaptation of the system to a new application domain can be easier. On the other hand, some techniques are analyzed for selecting the out-of-domain parallel corpus sentences in order to train the translation module in a new domain; as well as techniques for selecting which in-domain sentences are more interesting for translating them (by LSE experts) in order to train the translation model.

Integration of embedded data processing algorithms inside PAMELA devices

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PAMELA (Phased Array Monitoring for Enhanced Life Assessment) SHMTM System is an integrated embedded ultrasonic guided waves based system consisting of several electronic devices and one system manager controller. The data collected by all PAMELA devices in the system must be transmitted to the controller, who will be responsible for carrying out the advanced signal processing to obtain SHM maps. PAMELA devices consist of hardware based on a Virtex 5 FPGA with a PowerPC 440 running an embedded Linux distribution. Therefore, PAMELA devices, in addition to the capability of performing tests and transmitting the collected data to the controller, have the capability of perform local data processing or pre-processing (reduction, normalization, pattern recognition, feature extraction, etc.). Local data processing decreases the data traffic over the network and allows CPU load of the external computer to be reduced. Even it is possible that PAMELA devices are running autonomously performing scheduled tests, and only communicates with the controller in case of detection of structural damages or when programmed. Each PAMELA device integrates a software management application (SMA) that allows to the developer downloading his own algorithm code and adding the new data processing algorithm to the device. The development of the SMA is done in a virtual machine with an Ubuntu Linux distribution including all necessary software tools to perform the entire cycle of development. Eclipse IDE (Integrated Development Environment) is used to develop the SMA project and to write the code of each data processing algorithm. This paper presents the developed software architecture and describes the necessary steps to add new data processing algorithms to SMA in order to increase the processing capabilities of PAMELA devices.An example of basic damage index estimation using delay and sum algorithm is provided.

Distinctive Features of Mobile Messages Processing

Relevância:

100.00% 100.00%

Publicador:

Resumo:

World’s mobile market pushes past 2 billion lines in 2005. Success in these competitive markets requires operational excellence with product and service innovation to improve the mobile performance. Mobile users very often prefer to send a mobile instant message or text messages rather than talking on a mobile. Well developed “written speech analysis” does not work not only with “verbal speech” but also with “mobile text messages”. The main purpose of our paper is, firstly, to highlight the problems of mobile text messages processing and, secondly, to show the possible ways of solving these problems.

Scaling geospatial searches in large spatial databases

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on spatial and aspatial data from geodatabases provides GIS users the ability to perform more interesting spatial analyses, and for applications to support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries require combined indexing strategies of multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by nonspatial filter, or viceversa), which can incur large performance overheads. On the other hand, more recently, the amount of geolocation data has grown rapidly in databases due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data to objects or events. The latter poses potential data ingestion challenges of large data volumes for practical GIS databases. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce—a widely-adopted parallel programming model for data intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial with textual and numeric data are developed to that end. Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, the previous results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and the adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.

«
1
2
3
4
5
6
7
8
...
63
64
»