918 resultados para Image pre-processing
Resumo:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on spatial and aspatial data from geodatabases provides GIS users the ability to perform more interesting spatial analyses, and for applications to support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries require combined indexing strategies of multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by nonspatial filter, or viceversa), which can incur large performance overheads. On the other hand, more recently, the amount of geolocation data has grown rapidly in databases due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data to objects or events. The latter poses potential data ingestion challenges of large data volumes for practical GIS databases. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce—a widely-adopted parallel programming model for data intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial with textual and numeric data are developed to that end. Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, the previous results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and the adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
Resumo:
Shadows and illumination play an important role when generating a realistic scene in computer graphics. Most of the Augmented Reality (AR) systems track markers placed in a real scene and retrieve their position and orientation to serve as a frame of reference for added computer generated content, thereby producing an augmented scene. Realistic depiction of augmented content with coherent visual cues is a desired goal in many AR applications. However, rendering an augmented scene with realistic illumination is a complex task. Many existent approaches rely on a non automated pre-processing phase to retrieve illumination parameters from the scene. Other techniques rely on specific markers that contain light probes to perform environment lighting estimation. This study aims at designing a method to create AR applications with coherent illumination and shadows, using a textured cuboid marker, that does not require a training phase to provide lighting information. Such marker may be easily found in common environments: most of product packaging satisfies such characteristics. Thus, we propose a way to estimate a directional light configuration using multiple texture tracking to render AR scenes in a realistic fashion. We also propose a novel feature descriptor that is used to perform multiple texture tracking. Our descriptor is an extension of the binary descriptor, named discrete descriptor, and outperforms current state-of-the-art methods in speed, while maintaining their accuracy.
Resumo:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
Resumo:
A simple but efficient voice activity detector based on the Hilbert transform and a dynamic threshold is presented to be used on the pre-processing of audio signals -- The algorithm to define the dynamic threshold is a modification of a convex combination found in literature -- This scheme allows the detection of prosodic and silence segments on a speech in presence of non-ideal conditions like a spectral overlapped noise -- The present work shows preliminary results over a database built with some political speech -- The tests were performed adding artificial noise to natural noises over the audio signals, and some algorithms are compared -- Results will be extrapolated to the field of adaptive filtering on monophonic signals and the analysis of speech pathologies on futures works
Resumo:
SQL Injection Attack (SQLIA) remains a technique used by a computer network intruder to pilfer an organisation’s confidential data. This is done by an intruder re-crafting web form’s input and query strings used in web requests with malicious intent to compromise the security of an organisation’s confidential data stored at the back-end database. The database is the most valuable data source, and thus, intruders are unrelenting in constantly evolving new techniques to bypass the signature’s solutions currently provided in Web Application Firewalls (WAF) to mitigate SQLIA. There is therefore a need for an automated scalable methodology in the pre-processing of SQLIA features fit for a supervised learning model. However, obtaining a ready-made scalable dataset that is feature engineered with numerical attributes dataset items to train Artificial Neural Network (ANN) and Machine Leaning (ML) models is a known issue in applying artificial intelligence to effectively address ever evolving novel SQLIA signatures. This proposed approach applies numerical attributes encoding ontology to encode features (both legitimate web requests and SQLIA) to numerical data items as to extract scalable dataset for input to a supervised learning model in moving towards a ML SQLIA detection and prevention model. In numerical attributes encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA). This combined with proxy and SQL parser Application Programming Interface (API) to intercept and parse web requests in transition to the back-end database. In developing a solution to address SQLIA, this model allows processed web requests at the proxy deemed to contain injected query string to be excluded from reaching the target back-end database. This paper is intended for evaluating the performance metrics of a dataset obtained by numerical encoding of features ontology in Microsoft Azure Machine Learning (MAML) studio using Two-Class Support Vector Machines (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.
Resumo:
As introduced by Bentley et al. (2005), artificial immune systems (AIS) are lacking tissue, which is present in one form or another in all living multi-cellular organisms. Some have argued that this concept in the context of AIS brings little novelty to the already saturated field of the immune inspired computational research. This article aims to show that such a component of an AIS has the potential to bring an advantage to a data processing algorithm in terms of data pre-processing, clustering and extraction of features desired by the immune inspired system. The proposed tissue algorithm is based on self-organizing networks, such as self-organizing maps (SOM) developed by Kohonen (1996) and an analogy of the so called Toll-Like Receptors (TLR) affecting the activation function of the clusters developed by the SOM.
Resumo:
As introduced by Bentley et al. (2005), artificial immune systems (AIS) are lacking tissue, which is present in one form or another in all living multi-cellular organisms. Some have argued that this concept in the context of AIS brings little novelty to the already saturated field of the immune inspired computational research. This article aims to show that such a component of an AIS has the potential to bring an advantage to a data processing algorithm in terms of data pre-processing, clustering and extraction of features desired by the immune inspired system. The proposed tissue algorithm is based on self-organizing networks, such as self-organizing maps (SOM) developed by Kohonen (1996) and an analogy of the so called Toll-Like Receptors (TLR) affecting the activation function of the clusters developed by the SOM.
Resumo:
The Solar Intensity X-ray and particle Spectrometer (SIXS) on board BepiColombo's Mercury Planetary Orbiter (MPO) will study solar energetic particles moving towards Mercury and solar X-rays on the dayside of Mercury. The SIXS instrument consists of two detector sub-systems; X-ray detector SIXS-X and particle detector SIXS-P. The SIXS-P subdetector will detect solar energetic electrons and protons in a broad energy range using a particle telescope approach with five outer Si detectors around a central CsI(Tl) scintillator. The measurements made by the SIXS instrument are necessary for other instruments on board the spacecraft. SIXS data will be used to study the Solar X-ray corona, solar flares, solar energetic particles, the Hermean magnetosphere, and solar eruptions. The SIXS-P detector was calibrated by comparing experimental measurement data from the instrument with Geant4 simulation data. Calibration curves were produced for the different side detectors and the core scintillator for electrons and protons, respectively. The side detector energy response was found to be linear for both electrons and protons. The core scintillator energy response to protons was found to be non-linear. The core scintillator calibration for electrons was omitted due to insufficient experimental data. The electron and proton acceptance of the SIXS-P detector was determined with Geant4 simulations. Electron and proton energy channels are clean in the main energy range of the instrument. At higher energies, protons and electrons produce non-ideal response in the energy channels. Due to the limited bandwidth of the spacecraft's telemetry, the particle measurements made by SIXS-P have to be pre-processed in the data processing unit of the SIXS instrument. A lookup table was created for the pre-processing of data with Geant4 simulations, and the ability of the lookup table to provide spectral information from a simulated electron event was analysed. The lookup table produces clean electron and proton channels and is able to separate protons and electrons. Based on a simulated solar energetic electron event, the incident electron spectrum cannot be determined from channel particle counts with a standard analysis method.
Resumo:
En este trabajo se propone un nuevo sistema híbrido para el análisis de sentimientos en clase múltiple basado en el uso del diccionario General Inquirer (GI) y un enfoque jerárquico del clasificador Logistic Model Tree (LMT). Este nuevo sistema se compone de tres capas, la capa bipolar (BL) que consta de un LMT (LMT-1) para la clasificación de la polaridad de sentimientos, mientras que la segunda capa es la capa de la Intensidad (IL) y comprende dos LMTs (LMT-2 y LMT3) para detectar por separado tres intensidades de sentimientos positivos y tres intensidades de sentimientos negativos. Sólo en la fase de construcción, la capa de Agrupación (GL) se utiliza para agrupar las instancias positivas y negativas mediante el empleo de 2 k-means, respectivamente. En la fase de Pre-procesamiento, los textos son segmentados por palabras que son etiquetadas, reducidas a sus raíces y sometidas finalmente al diccionario GI con el objetivo de contar y etiquetar sólo los verbos, los sustantivos, los adjetivos y los adverbios con 24 marcadores que se utilizan luego para calcular los vectores de características. En la fase de Clasificación de Sentimientos, los vectores de características se introducen primero al LMT-1, a continuación, se agrupan en GL según la etiqueta de clase, después se etiquetan estos grupos de forma manual, y finalmente las instancias positivas son introducidas a LMT-2 y las instancias negativas a LMT-3. Los tres árboles están entrenados y evaluados usando las bases de datos Movie Review y SenTube con validación cruzada estratificada de 10-pliegues. LMT-1 produce un árbol de 48 hojas y 95 de tamaño, con 90,88% de exactitud, mientras que tanto LMT-2 y LMT-3 proporcionan dos árboles de una hoja y uno de tamaño, con 99,28% y 99,37% de exactitud,respectivamente. Los experimentos muestran que la metodología de clasificación jerárquica propuesta da un mejor rendimiento en comparación con otros enfoques prevalecientes.
Resumo:
Wind energy is one of the most promising and fast growing sector of energy production. Wind is ecologically friendly and relatively cheap energy resource available for development in practically all corners of the world (where only the wind blows). Today wind power gained broad development in the Scandinavian countries. Three important challenges concerning sustainable development, i.e. energy security, climate change and energy access make a compelling case for large-scale utilization of wind energy. In Finland, according to the climate and energy strategy, accepted in 2008, the total consumption of electricity generated by means of wind farms by 2020, should reach 6 - 7% of total consumption in the country [1]. The main challenges associated with wind energy production are harsh operational conditions that often accompany the turbine operation in the climatic conditions of the north and poor accessibility for maintenance and service. One of the major problems that require a solution is the icing of turbine structures. Icing reduces the performance of wind turbines, which in the conditions of a long cold period, can significantly affect the reliability of power supply. In order to predict and control power performance, the process of ice accretion has to be carefully tracked. There are two ways to detect icing – directly or indirectly. The first way applies to the special ice detection instruments. The second one is using indirect characteristics of turbine performance. One of such indirect methods for ice detection and power loss estimation has been proposed and used in this paper. The results were compared to the results directly gained from the ice sensors. The data used was measured in Muukko wind farm, southeast Finland during a project 'Wind power in cold climate and complex terrain'. The project was carried out in 9/2013 - 8/2015 with the partners Lappeenranta university of technology, Alstom renovables España S.L., TuuliMuukko, and TuuliSaimaa.
Resumo:
We propose an adaptive mesh refinement strategy based on exploiting a combination of a pre-processing mesh re-distribution algorithm employing a harmonic mapping technique, and standard (isotropic) mesh subdivision for discontinuous Galerkin approximations of advection-diffusion problems. Numerical experiments indicate that the resulting adaptive strategy can efficiently reduce the computed discretization error by clustering the nodes in the computational mesh where the analytical solution undergoes rapid variation.
Resumo:
One of the most exciting discoveries in astrophysics of the last last decade is of the sheer diversity of planetary systems. These include "hot Jupiters", giant planets so close to their host stars that they orbit once every few days; "Super-Earths", planets with sizes intermediate to those of Earth and Neptune, of which no analogs exist in our own solar system; multi-planet systems with planets smaller than Mars to larger than Jupiter; planets orbiting binary stars; free-floating planets flying through the emptiness of space without any star; even planets orbiting pulsars. Despite these remarkable discoveries, the field is still young, and there are many areas about which precious little is known. In particular, we don't know the planets orbiting Sun-like stars nearest to our own solar system, and we know very little about the compositions of extrasolar planets. This thesis provides developments in those directions, through two instrumentation projects.
The first chapter of this thesis concerns detecting planets in the Solar neighborhood using precision stellar radial velocities, also known as the Doppler technique. We present an analysis determining the most efficient way to detect planets considering factors such as spectral type, wavelengths of observation, spectrograph resolution, observing time, and instrumental sensitivity. We show that G and K dwarfs observed at 400-600 nm are the best targets for surveys complete down to a given planet mass and out to a specified orbital period. Overall we find that M dwarfs observed at 700-800 nm are the best targets for habitable-zone planets, particularly when including the effects of systematic noise floors caused by instrumental imperfections. Somewhat surprisingly, we demonstrate that a modestly sized observatory, with a dedicated observing program, is up to the task of discovering such planets.
We present just such an observatory in the second chapter, called the "MINiature Exoplanet Radial Velocity Array," or MINERVA. We describe the design, which uses a novel multi-aperture approach to increase stability and performance through lower system etendue, as well as keeping costs and time to deployment down. We present calculations of the expected planet yield, and data showing the system performance from our testing and development of the system at Caltech's campus. We also present the motivation, design, and performance of a fiber coupling system for the array, critical for efficiently and reliably bringing light from the telescopes to the spectrograph. We finish by presenting the current status of MINERVA, operational at Mt. Hopkins observatory in Arizona.
The second part of this thesis concerns a very different method of planet detection, direct imaging, which involves discovery and characterization of planets by collecting and analyzing their light. Directly analyzing planetary light is the most promising way to study their atmospheres, formation histories, and compositions. Direct imaging is extremely challenging, as it requires a high performance adaptive optics system to unblur the point-spread function of the parent star through the atmosphere, a coronagraph to suppress stellar diffraction, and image post-processing to remove non-common path "speckle" aberrations that can overwhelm any planetary companions.
To this end, we present the "Stellar Double Coronagraph," or SDC, a flexible coronagraphic platform for use with the 200" Hale telescope. It has two focal and pupil planes, allowing for a number of different observing modes, including multiple vortex phase masks in series for improved contrast and inner working angle behind the obscured aperture of the telescope. We present the motivation, design, performance, and data reduction pipeline of the instrument. In the following chapter, we present some early science results, including the first image of a companion to the star delta Andromeda, which had been previously hypothesized but never seen.
A further chapter presents a wavefront control code developed for the instrument, using the technique of "speckle nulling," which can remove optical aberrations from the system using the deformable mirror of the adaptive optics system. This code allows for improved contrast and inner working angles, and was written in a modular style so as to be portable to other high contrast imaging platforms. We present its performance on optical, near-infrared, and thermal infrared instruments on the Palomar and Keck telescopes, showing how it can improve contrasts by a factor of a few in less than ten iterations.
One of the large challenges in direct imaging is sensing and correcting the electric field in the focal plane to remove scattered light that can be much brighter than any planets. In the last chapter, we present a new method of focal-plane wavefront sensing, combining a coronagraph with a simple phase-shifting interferometer. We present its design and implementation on the Stellar Double Coronagraph, demonstrating its ability to create regions of high contrast by measuring and correcting for optical aberrations in the focal plane. Finally, we derive how it is possible to use the same hardware to distinguish companions from speckle errors using the principles of optical coherence. We present results observing the brown dwarf HD 49197b, demonstrating the ability to detect it despite it being buried in the speckle noise floor. We believe this is the first detection of a substellar companion using the coherence properties of light.
Resumo:
Image and video compression play a major role in the world today, allowing the storage and transmission of large multimedia content volumes. However, the processing of this information requires high computational resources, hence the improvement of the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia contents, namely images, achieving high compression ratios, maintaining good image quality, Rodrigues et al. [2008]. However, in comparison with other existing algorithms, this algorithm takes some time to execute. Therefore, two parallel implementations for GPUs were proposed by Ribeiro [2016] and Silva [2015] in CUDA and OpenCL-GPU, respectively. In this dissertation, to complement the referred work, we propose two parallel versions that run the MMP algorithm in CPU: one resorting to OpenMP and another that converts the existing OpenCL-GPU into OpenCL-CPU. The proposed solutions are able to improve the computational performance of MMP by 3 and 2:7 , respectively. The High Efficiency Video Coding (HEVC/H.265) is the most recent standard for compression of image and video. Its impressive compression performance, makes it a target for many adaptations, particularly for holoscopic image/video processing (or light field). Some of the proposed modifications to encode this new multimedia content are based on geometry-based disparity compensations (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These compression algorithms for holoscopic images based on HEVC present an implementation of specific search for similar micro-images that is more efficient than the one performed by HEVC, but its implementation is considerably slower than HEVC. In order to enable better execution times, we choose to use the OpenCL API as the GPU enabling language in order to increase the module performance. With its most costly setting, we are able to reduce the GT module execution time from 6.9 days to less then 4 hours, effectively attaining a speedup of 45 .
Resumo:
In order to fulfil European and Portuguese legal requirements, adequate alternatives to traditional municipal waste landfilling must be found namely concerning organic wastes and others susceptible of valorisation. According to the Portuguese Standard NP 4486:2008, refuse derived fuels (RDF) classification is based on three main parameters: lower heating value (considered as an economic parameter), chlorine content (considered as a technical parameter) and mercury content (considered as an environmental parameter). The purpose of this study was to characterize the rejected streams resulting from the mechanical treatment of unsorted municipal solid waste, from the plastic municipal selective collection and from the composting process, in order to evaluate their potential as RDF. To accomplish this purpose six sampling campaigns were performed. Chemical characterization comprised the proximate analysis – moisture content, volatile matter, ashes and fixed carbon, as well as trace elements. Physical characterization was also done. To evaluate their potential as RDF, the following parameters established in the Portuguese standard were also evaluated: heating value and chlorine content. As expected, results show that the refused stream from mechanical treatment is rather different from the selective collection rejected stream and from the rejected from the compost screening in terms of moisture, energetic matter and ashes, as well as heating value and chlorine. Preliminary data allows us to conclude that studied materials have a very interesting potential to be used as RDF. In fact, the rejected from selective collection and the one from composting have a heating value not very different from coal. Therefore, an important key factor may be the blending of these materials with others of higher heating values, after pre-processing, in order to get fuel pellets with good consistency, storage and handling characteristics and, therefore, combustion behavior.