931 results for Large Data Sets
Abstract:
The primary goal of this dissertation is to develop point-based rigid and non-rigid image registration methods that have better accuracy than existing methods. We first present point-based PoIRe, which provides the framework for point-based global rigid registrations. It allows a choice of different search strategies including (a) branch-and-bound, (b) probabilistic hill-climbing, and (c) a novel hybrid method that takes advantage of the best characteristics of the other two methods. We use a robust similarity measure that is insensitive to noise, which is often introduced during feature extraction. We demonstrate the robustness of PoIRe by using it to register images obtained with an electronic portal imaging device (EPID), which have large amounts of scatter and low contrast. To evaluate PoIRe we used (a) simulated images and (b) images with fiducial markers; PoIRe was extensively tested with 2D EPID images and images generated by 3D Computed Tomography (CT) and Magnetic Resonance (MR) images. PoIRe was also evaluated using benchmark data sets from the blind retrospective evaluation project (RIRE). We show that PoIRe is better than existing methods such as Iterative Closest Point (ICP) and methods based on mutual information. We also present a novel point-based local non-rigid shape registration algorithm. We extend the robust similarity measure used in PoIRe to non-rigid registrations, adapting it to a free form deformation (FFD) model and making it robust to local minima, which are a drawback common to existing non-rigid point-based methods. For non-rigid registrations we show that it performs better than existing methods and that it is less sensitive to starting conditions. We test our non-rigid registration method using available benchmark data sets for shape registration. Finally, we also explore the extraction of features invariant to changes in perspective and illumination, and explore how they can help improve the accuracy of multi-modal registration.
For multimodal registration of EPID-DRR images we present a method based on a local descriptor defined by a vector of complex responses to a circular Gabor filter.
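As a point of reference for what a point-based rigid registration computes, the following is a minimal sketch of the simplest case: a 2D least-squares fit of a rotation and translation when point correspondences are already known. This is not PoIRe, whose search strategies and robust similarity measure are not described in enough detail here to reproduce; the function name `fit_rigid_2d` and the sample points are hypothetical.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    Assumes src[i] corresponds to dst[i] (known correspondences).
    """
    n = len(src)
    # Centroids of both point sets.
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of the centered points.
    s_dot = 0.0
    s_cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - csx, ys - csy
        bx, by = xd - cdx, yd - cdy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    # The optimal angle maximizes cos(t) * s_dot + sin(t) * s_cross.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation aligns the rotated source centroid with the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

# Hypothetical example: rotate by 30 degrees, shift by (2, -1), then recover.
true_theta = math.radians(30.0)
src_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
dst_points = [
    (math.cos(true_theta) * x - math.sin(true_theta) * y + 2.0,
     math.sin(true_theta) * x + math.cos(true_theta) * y - 1.0)
    for x, y in src_points
]
theta, tx, ty = fit_rigid_2d(src_points, dst_points)
```

Methods such as ICP alternate a fit of this kind with a nearest-neighbor correspondence step, which is where sensitivity to noise and starting conditions enters.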
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in the handling of complex cases.
Abstract:
The uptake of anthropogenic CO2 by the oceans has led to a rise in the oceanic partial pressure of CO2, and to a decrease in pH and carbonate ion concentration. This modification of the marine carbonate system is referred to as ocean acidification. Numerous papers report the effects of ocean acidification on marine organisms and communities but few have provided details concerning full carbonate chemistry and complementary observations. Additionally, carbonate system variables are often reported in different units, calculated using different sets of dissociation constants and on different pH scales. Hence the direct comparison of experimental results has been problematic and often misleading. The need was identified to (1) gather data on carbonate chemistry, biological and biogeochemical properties, and other ancillary data from published experimental data, (2) transform the information into a common framework, and (3) make the data freely available. The present paper is the outcome of an effort to integrate ocean carbonate chemistry data from the literature, which has been supported by the European Network of Excellence for Ocean Ecosystems Analysis (EUR-OCEANS) and the European Project on Ocean Acidification (EPOCA). A total of 185 papers were identified, 100 contained enough information to readily compute carbonate chemistry variables, and 81 data sets were archived at PANGAEA - The Publishing Network for Geoscientific & Environmental Data. This data compilation is regularly updated as an ongoing mission of EPOCA.
Abstract:
Geo-referenced catch and fishing effort data of the bigeye tuna fisheries in the Indian Ocean over 1952-2014 were analysed and standardized to facilitate population dynamics modelling studies. During this sixty-two-year historical period of exploitation, many changes occurred both in the fishing techniques and in the monitoring of activity. This study includes a series of processing steps used for standardization of spatial resolution, conversion and standardization of catch and effort units, raising of geo-referenced catch to nominal catch level, screening and correction of outliers, and detection of major catchability changes over long time series of fishing data, i.e., the Japanese longline fleet operating in the tropical Indian Ocean. A total of thirty fisheries were finally determined from longline, purse seine and other-gears data sets, from which ten longline and four purse seine fisheries represented 96% of the whole historical catch. The geo-referenced records consist of catch, fishing effort and associated length frequency samples of all fisheries.
Abstract:
The objective of this study was to determine the seasonal and interannual variability of wind speed in the NEB and calculate its trends, and then to validate the mesoscale numerical model so that it could subsequently be coupled with the microscale numerical model to obtain the wind resource at some locations in the NEB. For this we use two data sets of wind speed (weather stations and anemometric towers) and two dynamic models, one mesoscale and the other microscale. We use statistical tools to evaluate and validate the data obtained. The simulations of the dynamic mesoscale model were made using data assimilation methods (Newtonian relaxation and Kalman filter). The main results show: (i) five homogeneous groups of wind speed in the NEB, with higher values in winter and spring and lower values in summer and fall; (ii) the interannual variability of the wind speed stood out in some groups with higher values; (iii) the large-scale circulation modified by El Niño and La Niña intensified wind speed for the groups with higher values; (iv) the trend analysis showed more significant negative values for G3, G4 and G5 in all seasons and in the annual average; (v) the dynamic mesoscale model showed smaller errors at the locations Paracuru and São João, while larger errors were observed in Triunfo; (vi) application of the Kalman filter significantly reduced the systematic errors shown in the simulations of the dynamic mesoscale model; (vii) the wind resource indicates that Paracuru and Triunfo are favorable areas for the generation of energy, and the coupling technique after validation showed better results for Paracuru. We conclude that the objective was achieved, making it possible to identify trends in homogeneous groups of wind behavior, and to evaluate the quality of the simulations with both the mesoscale and microscale dynamic models, in order to answer the questions that arise before planning research projects in the wind energy area in the NEB.
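To illustrate how a Kalman filter can reduce a systematic error in simulated wind speed, here is a minimal sketch of a scalar filter that tracks the model bias as a slowly varying quantity and subtracts it from later forecasts. The abstract does not specify how the filter was configured, so the random-walk bias model, the function name `kalman_bias_correction`, the noise variances `q` and `r`, and the sample values are all assumptions made for illustration.

```python
def kalman_bias_correction(forecasts, observations, q=0.01, r=1.0):
    """Correct forecasts with a bias estimated by a scalar Kalman filter.

    q: assumed variance of the random-walk bias process (hypothetical value)
    r: assumed variance of the observation noise (hypothetical value)
    """
    bias = 0.0      # current bias estimate (observation minus forecast)
    p = 1.0         # variance of the bias estimate
    corrected = []
    for f, o in zip(forecasts, observations):
        # Correct the forecast using only the bias learned from earlier steps.
        corrected.append(f + bias)
        # Predict: the bias follows a random walk, so its variance grows by q.
        p += q
        # Update: blend the newly observed error into the bias estimate.
        k = p / (p + r)                 # Kalman gain
        bias += k * ((o - f) - bias)
        p *= (1.0 - k)
    return corrected

# Hypothetical example: the model overestimates wind speed by 2 m/s throughout.
model_speed = [8.0] * 50
observed_speed = [6.0] * 50
corrected_speed = kalman_bias_correction(model_speed, observed_speed)
```

In this toy case the first forecast is left uncorrected and later ones converge toward the observed value as the bias estimate settles.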
Abstract:
We consider a class of initial data sets (Σ,h,K) for the Einstein constraint equations which we define to be generalized Brill (GB) data. This class of data is simply connected, U(1)²-invariant, maximal, and four-dimensional with two asymptotic ends. We study the properties of GB data and in particular the topology of Σ. The GB initial data sets have applications in geometric inequalities in general relativity. We construct a mass functional M for GB initial data sets and we show: (i) the mass of any GB data is greater than or equal to M, (ii) it is a non-negative functional for a broad subclass of GB data, (iii) it evaluates to the ADM mass of a reduced t − φi symmetric data set, (iv) its critical points are stationary U(1)²-invariant vacuum solutions to the Einstein equations. Then we use this mass functional to prove two geometric inequalities: (1) a positive mass theorem for a subclass of GB initial data which includes Myers-Perry black holes, (2) a class of local mass-angular momenta inequalities for U(1)²-invariant black holes. Finally, we construct a one-parameter family of initial data sets which we show can be seen as small deformations of the extreme Myers-Perry black hole which preserve the horizon geometry and angular momenta but have strictly greater energy.
Abstract:
This thesis stems from a project with the real-time environmental monitoring company EMSAT Corporation. They were looking for methods to automatically flag spikes and other anomalies in their environmental sensor data streams. The problem presents several challenges: near real-time anomaly detection, absence of labeled data and time-changing data streams. Here, we address this problem using both a statistical parametric approach and a non-parametric approach, Kernel Density Estimation (KDE). The main contribution of this thesis is extending KDE to work more effectively for evolving data streams, particularly in the presence of concept drift. To address that, we have developed a framework for integrating the Adaptive Windowing (ADWIN) change detection algorithm with KDE. We have tested this approach on several real-world data sets and received positive feedback from our industry collaborator. Some results appearing in this thesis have been presented at the ECML PKDD 2015 Doctoral Consortium.
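The idea of flagging spikes with KDE can be sketched as follows: estimate the density of each new reading from a window of recent readings and flag it when that density is very low. This is only an illustration under stated assumptions. It uses a fixed-size sliding window as a stand-in for ADWIN, which adapts the window length when it detects change, and the function names, bandwidth, threshold, and sample data are hypothetical rather than taken from the thesis.

```python
import math
from collections import deque

def gaussian_kde_density(x, sample, bandwidth):
    """Gaussian kernel density estimate at x from the values in sample."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

def flag_anomalies(stream, window_size=50, bandwidth=1.0, threshold=1e-3, warmup=10):
    """Return indices of readings whose density under the recent window is below threshold."""
    window = deque(maxlen=window_size)   # fixed-size window (not ADWIN)
    flagged = []
    for i, x in enumerate(stream):
        # Score the reading only once enough history has accumulated.
        if len(window) >= warmup and gaussian_kde_density(x, window, bandwidth) < threshold:
            flagged.append(i)
        # Every reading, flagged or not, becomes part of the history.
        window.append(x)
    return flagged

# Hypothetical sensor stream: values near 20 with one injected spike at index 60.
readings = [20.0 + 0.1 * (i % 5) for i in range(100)]
readings[60] = 35.0
anomalies = flag_anomalies(readings)
```

A fixed window reacts slowly when the underlying distribution shifts, which is the limitation that an adaptive change detector such as ADWIN is meant to address.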
Abstract:
This data set contains LPJ-LMfire dynamic global vegetation model output covering Europe and the Mediterranean for the Last Glacial Maximum (LGM; 21 ka) and for a preindustrial control simulation (20th century detrended climate). The netCDF data files are time averages of the final 30 years of the model simulation. Each netCDF file contains four or five variables: fractional cover of 9 plant functional types (PFTs; cover), total fractional coverage of trees (treecover), population density of hunter-gatherers (foragerPD; only for the "people" simulations), fraction of the gridcell burned on 30-year average (burnedf), and vegetation net primary productivity (NPP). The model spatial resolution is 0.5 degrees. For the LGM simulations, LPJ-LMfire was driven by the PMIP3 suite of eight GCMs for which LGM climate simulations were available. Also provided in this archive are the result of an LPJ-LMfire run that was forced by the average climate of all GCMs (the "GCM-mean" files), and the average of the individual LPJ-LMfire runs over the eight LGM scenarios (the "LPJ-mean" files). Model simulations are provided both with the influence of human presence on the landscape (the "people" files) and in a "world without humans" scenario (the "natural" files). Finally, this archive contains the preindustrial reference simulation with and without human influence ("PI_reference_people" and "PI_reference_nat", respectively). There are therefore 22 netCDF files in this archive: 8 each of LGM simulations with and without people (total 16), the "GCM mean" simulation (2 files), the "LPJ mean" aggregate (2 files), and finally the two preindustrial "control" simulations ("PI"), with and without humans (2 files).
In addition to the LPJ-LMfire model output (netCDF files), this archive also contains a table of arboreal pollen percent calculated from pollen samples dated to the LGM at sites throughout (lgmAP.txt), and a table containing the location of archaeological sites dated to the LGM (LGM_archaeological_site_locations.txt).
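The count of 22 netCDF files can be checked by enumerating the combinations described above. The sketch below is bookkeeping only: the GCM labels `GCM1` to `GCM8` are placeholders, since the abstract does not name the eight PMIP3 models, and the generated labels are not the actual file names in the archive.

```python
# Placeholder labels for the eight PMIP3 GCMs (actual model names are not given here).
gcms = [f"GCM{i}" for i in range(1, 9)]
scenarios = ["people", "natural"]

# One LGM run per GCM and per scenario: 8 x 2 = 16.
lgm_runs = [(gcm, scenario) for gcm in gcms for scenario in scenarios]
# One run forced by the mean GCM climate per scenario: 2.
gcm_mean_runs = [("GCM-mean", scenario) for scenario in scenarios]
# One average over the eight individual runs per scenario: 2.
lpj_mean_runs = [("LPJ-mean", scenario) for scenario in scenarios]
# Preindustrial reference with and without humans: 2.
pi_runs = [("PI_reference", scenario) for scenario in scenarios]

all_runs = lgm_runs + gcm_mean_runs + lpj_mean_runs + pi_runs
total_files = len(all_runs)
```

The total matches the 22 files stated in the description, and the two text tables (lgmAP.txt and LGM_archaeological_site_locations.txt) are additional to these.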
Abstract:
The present data compilation includes ciliate growth rate, grazing rate and gross growth efficiency determined either in the field or in laboratory experiments. From the existing literature, we synthesized all data that we could find on ciliates. Some sources might be missing but none were purposefully ignored. Field data on microzooplankton grazing consist mostly of grazing rates measured using the dilution technique with a 24 h incubation period. Laboratory grazing and growth data are focused on pelagic ciliates and heterotrophic dinoflagellates. The experiments measured grazing or growth as a function of prey concentration or at saturating prey concentration (maximal grazing rate). When considering every single data point available (each measured rate for a defined predator-prey pair and a certain prey concentration), there is a total of 1485 data points for the ciliates, counting experiments that measured growth and grazing simultaneously as 1 data point.