952 results for Digital data sets


Relevance:

100.00%

Publisher:

Abstract:

Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data have increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and application of practical software tools and efficient algorithms from the field of computer science, with the goal of enabling atmospheric scientists to analyse and gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation.

Three practical tools are presented in this thesis. Two of them are software systems (Insight and IWAL) for different types of processing and interactive visualization of data; the third is an efficient algorithm for data segmentation implemented as part of Insight.

Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining data from different sources at runtime, a variety of data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support have led to additional applications of the software, of which two examples are presented: the use of Insight as a WMS (web map service) server, and the automatic production of image sequences for the visualization of cyclone simulations.

The core application of Insight is the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. Data segmentation usually leads to a significant reduction in the size of the considered data. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and results of segmenting upper-tropospheric jet streams and cyclones as full 3D objects.

Finally, IWAL is presented, a web application providing easy interactive access to meteorological data visualizations, primarily aimed at students. As a web application, it avoids the need to retrieve all input data sets and to install and maintain complex visualization tools on a local machine. The main challenge in providing customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. Besides the implementation details, benchmarks and the results of a user survey are presented.
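
As an illustration of the kind of detection and tracking the abstract describes, the following is a minimal sketch (not Insight's actual code): threshold-based 3D feature labelling plus overlap-based matching between time steps, from which lysis and splitting events can be read off directly. All names, thresholds, and the synthetic field are assumptions for illustration.

```python
# Minimal sketch of threshold-based 3D segmentation and overlap tracking.
# All names and thresholds are illustrative; this is not Insight's actual code.
import numpy as np
from scipy import ndimage

def segment_features(field, threshold):
    """Label connected 3D regions where the field exceeds a threshold."""
    mask = field > threshold
    labels, n_features = ndimage.label(mask)  # 3D connected components
    return labels, n_features

def track_features(labels_t0, labels_t1):
    """Match features between two time steps by voxel overlap.

    Returns a dict mapping each label at t0 to the labels at t1 it
    overlaps with; an empty list indicates lysis, multiple entries a
    split. Genesis events would be read off the reverse mapping.
    """
    matches = {}
    for lab in range(1, labels_t0.max() + 1):
        overlap = labels_t1[labels_t0 == lab]
        matches[lab] = sorted(set(overlap[overlap > 0].tolist()))
    return matches

# Synthetic example: two time steps of a smooth 3D scalar field.
rng = np.random.default_rng(0)
field_t0 = ndimage.gaussian_filter(rng.random((40, 40, 20)), sigma=3)
field_t1 = np.roll(field_t0, shift=2, axis=0)      # features drift slightly
labels0, n0 = segment_features(field_t0, field_t0.mean() + field_t0.std())
labels1, n1 = segment_features(field_t1, field_t1.mean() + field_t1.std())
print(n0, "features at t0;", n1, "at t1;", track_features(labels0, labels1))
```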

Relevance:

100.00%

Publisher:

Abstract:

The present study was carried out to check whether classic osteometric parameters can be determined from 3D reconstructions of MSCT (multislice computed tomography) scans acquired in the context of the Virtopsy project. To this end, four isolated and macerated skulls were examined by six examiners. First, the skulls were conventionally (manually) measured using 32 internationally accepted linear measurements. Then the skulls were scanned by MSCT with slice thicknesses of 1.25 mm and 0.63 mm, and the same measurements were determined virtually on the digital 3D reconstructions of the skulls. The results of the traditional and the digital measurements were compared for each examiner to identify variations. Furthermore, several parameters were measured on the cranium and postcranium during an autopsy and compared to the values that had been measured on a 3D reconstruction from a previously acquired postmortem MSCT scan. The results indicate that osteometric values obtained from digital 3D reconstructions of MSCT scans with a slice thickness of 1.25 mm are equivalent to those obtained from conventional manual examination. The measurements taken from a corpse during an autopsy could also be validated with the methods used for the digital 3D reconstructions in the context of the Virtopsy project. Future aims are the assessment and biostatistical evaluation, with respect to sex, age and stature, of all data sets stored in the Virtopsy project so far, as well as of future data sets. Furthermore, the definition of new parameters that are only measurable with the aid of MSCT data would be conceivable.
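
For illustration only, here is a sketch of how a classic linear measurement might be taken virtually from landmark positions on an MSCT-based reconstruction. The landmark names, voxel indices, and voxel spacing below are invented, not taken from the study.

```python
# Hypothetical sketch: deriving a linear osteometric measurement from two
# landmark positions picked on a CT-based 3D reconstruction. The landmark
# names, voxel indices, and spacing below are invented for illustration.
import numpy as np

def landmark_distance(idx_a, idx_b, spacing):
    """Euclidean distance (mm) between two voxel indices, given the
    (z, y, x) voxel spacing of the MSCT volume in millimetres."""
    a = np.asarray(idx_a) * np.asarray(spacing)
    b = np.asarray(idx_b) * np.asarray(spacing)
    return float(np.linalg.norm(a - b))

spacing = (1.25, 0.5, 0.5)        # slice thickness 1.25 mm, in-plane 0.5 mm
glabella = (80, 210, 256)         # picked voxel indices (illustrative)
opisthocranion = (85, 60, 250)
print(f"maximum cranial length = "
      f"{landmark_distance(glabella, opisthocranion, spacing):.1f} mm")
```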

Relevance:

100.00%

Publisher:

Abstract:

PURPOSE Digital developments have led to the opportunity to compose simulated patient models based on three-dimensional (3D) skeletal, facial, and dental imaging. The aim of this systematic review is to provide an update on the current knowledge, to report on the technical progress in the field of 3D virtual patient science, and to identify further research needs to accomplish clinical translation. MATERIALS AND METHODS Searches were performed electronically (MEDLINE and OVID) and manually up to March 2014 for studies on 3D fusion imaging to create a virtual dental patient. Inclusion criteria were limited to human studies reporting on a technical protocol for superimposition of at least two different 3D data sets and on the medical field of interest. RESULTS Of the 403 titles originally retrieved, 51 abstracts and, subsequently, 21 full texts were selected for review. Of the 21 full texts, 18 studies were included in the systematic review. Most of the investigations were designed as feasibility studies. Three different types of 3D data were identified for simulation: facial skeleton, extraoral soft tissue, and dentition. A total of 112 patients were investigated in the development of 3D virtual models. CONCLUSION Superimposition of data on the facial skeleton, soft tissue, and/or dentition is a feasible technique to create a virtual patient under static conditions. Three-dimensional image fusion is of interest and importance in all fields of dental medicine. Future research should focus on the real-time replication of a human head, including dynamic movements, with data captured in a single step.
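
A common building block behind such superimposition is rigid registration of corresponding 3D landmarks. The following Kabsch-style sketch is a generic illustration under the simplifying assumption of known correspondences; it is not a protocol from any of the reviewed studies.

```python
# Minimal Kabsch-style rigid registration of corresponding 3D landmarks,
# a basic building block for superimposing two 3D data sets.
# Generic illustration; assumes correspondences are already known.
import numpy as np

def rigid_superimpose(src, dst):
    """Return rotation R and translation t minimizing ||R @ src_i + t - dst_i||."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

# Illustrative check: recover a known rotation and translation.
rng = np.random.default_rng(1)
src = rng.random((6, 3))                       # e.g. dental landmarks
theta = np.pi / 8
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = rigid_superimpose(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [1.0, -2.0, 0.5]))
```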

Relevance:

100.00%

Publisher:

Abstract:

A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose, but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives, such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop, to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want and desire. Thus, national initiatives typically depend on the cooperation of those already gathering spatial data, and those using GIS to meet specific needs, to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIS data producers why they are not currently involved in creating datasets in common or compatible formats, documenting their datasets in a standardized metadata format, or making their datasets more readily available to others through data clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIS data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, that supports user-friendly metadata creation, open access licenses, archival services and documentation of the parent lineage of the contributors and value-adders of digital spatial data sets.
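
Purely as an illustration of the "parent lineage" documentation the thesis proposes, here is a hypothetical sketch of a commons record that tracks which data sets a value-added data set derives from; all field names, titles, and license identifiers are invented.

```python
# Hypothetical sketch of a parent-lineage record such as a "Public Commons
# for Geospatial Data" might keep for contributed and value-added data sets.
# Field names, titles, and license identifiers are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class CommonsDataset:
    title: str
    contributor: str
    license: str                                 # open-access license id
    parents: list = field(default_factory=list)  # data sets this one derives from

    def lineage(self):
        """Walk the parent chain back to the original contributions."""
        chain = [self.title]
        for p in self.parents:
            chain.extend(p.lineage())
        return chain

roads = CommonsDataset("county_roads_2004", "County GIS Office", "CC-BY")
enriched = CommonsDataset("county_roads_with_traffic", "Volunteer Mapper",
                          "CC-BY", parents=[roads])
print(enriched.lineage())  # ['county_roads_with_traffic', 'county_roads_2004']
```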

Relevance:

100.00%

Publisher:

Abstract:

Much progress has been made since the Digital Earth notion was envisioned thirteen years ago. However, the mechanisms for integrating geographic information into the Digital Earth are still quite limited. In this context, we have developed a process to generate, integrate and publish geospatial Linked Data from several Spanish national data sets. These data sets are related to four Infrastructure for Spatial Information in the European Community (INSPIRE) themes, specifically Administrative units, Hydrography, Statistical units, and Meteorology. Our main goal is to combine different sources (heterogeneous, multidisciplinary, multitemporal, multiresolution, and multilingual) using Linked Data principles. This allows us to overcome current problems of information integration and drive geographical information toward the next decade's scenario, that is, the "Linked Digital Earth".
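
As a hedged illustration of the publishing step, the sketch below builds a tiny geospatial Linked Data graph with rdflib. The base URI, the example record, and the vocabulary choices are assumptions for illustration; they are not the actual Spanish national data-set URIs or the authors' pipeline.

```python
# A minimal sketch of publishing one geospatial record as Linked Data.
# URIs and vocabulary choices are illustrative assumptions only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
EX = Namespace("http://example.org/admin-units/")   # hypothetical base URI

g = Graph()
unit = EX["madrid"]
g.add((unit, RDF.type, EX["AdministrativeUnit"]))   # INSPIRE-style theme
g.add((unit, RDFS.label, Literal("Madrid", lang="es")))
g.add((unit, GEO["asWKT"],
       Literal("POINT(-3.70 40.42)", datatype=GEO["wktLiteral"])))
# Linking to another source is what makes this *Linked* Data:
g.add((unit, RDFS.seeAlso, URIRef("http://dbpedia.org/resource/Madrid")))

print(g.serialize(format="turtle"))
```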

Relevance:

100.00%

Publisher:

Abstract:

The Digital Observatory for Protected Areas (DOPA) has been developed to support the European Union's efforts in strengthening our capacity to mobilize and use biodiversity data, information and forecasts so that they are readily accessible to policymakers, managers, experts and other users. Conceived as a set of web-based services, DOPA provides a broad set of free and open-source tools to assess, monitor and even forecast the state of, and pressure on, protected areas at local, regional and global scales. DOPA Explorer 1.0 is a web-based interface available in four languages (EN, FR, ES, PT) that provides simple means to explore the nearly 16,000 protected areas that are at least 100 km² in size. Distinguishing between terrestrial, marine and mixed protected areas, DOPA Explorer 1.0 can help end users identify those with the most unique ecosystems and species, and assess the pressures they are exposed to because of human development. Recognized by the UN Convention on Biological Diversity (CBD) as a reference information system, DOPA Explorer is based on the best global data sets available and provides means to rank protected areas at the country and ecoregion levels. Inversely, DOPA Explorer indirectly highlights the protected areas for which information is incomplete. Finally, we invite the end users of DOPA to engage with us through the proposed communication platforms to help improve our work in support of the safeguarding of biodiversity.
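
Illustratively, the kind of size filtering and pressure ranking DOPA Explorer exposes might look like the sketch below; the records and "pressure" scores are invented, and this is not DOPA's actual implementation.

```python
# Illustrative sketch of filtering protected areas by minimum size and
# ranking them by a pressure indicator. All records are invented.
protected_areas = [
    {"name": "Area A", "km2": 250.0, "type": "terrestrial", "pressure": 0.61},
    {"name": "Area B", "km2":  80.0, "type": "marine",      "pressure": 0.12},
    {"name": "Area C", "km2": 900.0, "type": "mixed",       "pressure": 0.34},
]

# Keep only areas of at least 100 km2, then rank by descending pressure.
large = [pa for pa in protected_areas if pa["km2"] >= 100.0]
for pa in sorted(large, key=lambda pa: pa["pressure"], reverse=True):
    print(f'{pa["name"]:8s} {pa["type"]:12s} pressure={pa["pressure"]:.2f}')
```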

Relevance:

100.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework.

Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold "custom wrapper" approach for information retrieval: wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing.

Data Extractor is designed to act as a data retrieval server as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well.

This study confirms the feasibility of building custom wrappers for Web sites. The approach provides accurate data retrieval, along with power and flexibility in handling complex cases.
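
A toy sketch of the scripted-wrapper idea follows; the URL and the extraction pattern are hypothetical, and a real wrapper in the described system would be written in its own scripting language with site-specific parsing rules and error handling.

```python
# A toy "custom wrapper": treat a Web site as a data source by fetching a
# page and extracting records with a site-specific pattern. The URL and
# pattern below are hypothetical; this is not Data Extractor's code.
import re
import urllib.request

def wrap_site(url, row_pattern):
    """Fetch a page and return the tuples captured by a site-specific regex."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return re.findall(row_pattern, html)

# Hypothetical usage: a site listing products as <li>name - $price</li>.
# rows = wrap_site("http://example.org/products",
#                  r"<li>(.+?) - \$([0-9.]+)</li>")
```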

Relevance:

100.00%

Publisher:

Abstract:

Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data; thus, effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine the results of these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species; as a result, improved sets of cell cycle-regulated genes were identified.

The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
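
For illustration, the following sketch pools per-gene p-values from independent experiments with two of the named techniques, using SciPy's generic implementations; the p-values and weights below are made up.

```python
# Pooling per-gene p-values from independent experiments with Fisher's
# inverse chi-square method and the weighted (Liptak-)Stouffer Z-method.
# The p-values and weights are made up for illustration.
import numpy as np
from scipy import stats

# p-values for one gene from three independent cell-cycle experiments
pvals = np.array([0.03, 0.20, 0.07])
weights = np.array([40, 15, 25])      # e.g. per-experiment sample sizes

stat_f, p_fisher = stats.combine_pvalues(pvals, method="fisher")
stat_s, p_liptak = stats.combine_pvalues(pvals, method="stouffer",
                                         weights=weights)
print(f"Fisher pooled p = {p_fisher:.4f}, "
      f"weighted Stouffer p = {p_liptak:.4f}")
```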

Relevance:

100.00%

Publisher:

Abstract:

Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurements of topography over large areas. Airborne LIDAR systems usually return a three-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information about different kinds of geometrical objects, such as high-resolution digital terrain models (DTMs), buildings and trees. In the past decade, LIDAR has attracted more and more interest from researchers in the fields of remote sensing and GIS. Compared to traditional data sources such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow and relief displacement. However, the voluminous data pose a new challenge for automatically extracting geometrical information, because many raster image processing techniques cannot be directly applied to irregularly spaced LIDAR points.

In this dissertation, a framework is proposed to automatically extract information about different kinds of geometrical objects, such as terrain and buildings, from LIDAR data. These objects are essential to numerous applications such as flood modeling, landslide prediction and hurricane animation. The framework consists of several intuitive algorithms. First, a progressive morphological filter was developed to detect non-ground LIDAR measurements: by gradually increasing the window size and elevation difference threshold of the filter, the measurements of vehicles, vegetation, and buildings are removed, while ground data are preserved. Then, building measurements are identified among the non-ground measurements using a region-growing algorithm based on a plane-fitting technique. Raw footprints for the segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove the noise caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, a 2D snake algorithm is proposed for topology adjustment; it consists of newly defined energy functions for topology adjusting and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas including large institutional, commercial, and small residential buildings were employed to test the proposed framework. The results demonstrate that the proposed framework achieves very good performance.
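
A minimal sketch of a progressive morphological filter on a gridded minimum-elevation surface follows, in the spirit of the first step described above: open the surface with a growing window and flag cells that stick up above the opened surface by more than a growing threshold. The window sizes, thresholds, and synthetic DEM are illustrative assumptions, not the dissertation's parameters.

```python
# Sketch of a progressive morphological filter on a gridded surface.
# Window sizes, thresholds, and the synthetic DEM are illustrative only.
import numpy as np
from scipy import ndimage

def progressive_morphological_filter(dem, windows=(3, 7, 15),
                                     thresholds=(0.5, 1.0, 2.5)):
    """Return a boolean mask of non-ground cells in a gridded DEM."""
    nonground = np.zeros(dem.shape, dtype=bool)
    surface = dem.copy()
    for win, dh in zip(windows, thresholds):
        opened = ndimage.grey_opening(surface, size=(win, win))
        nonground |= (surface - opened) > dh  # sticks up too far: not ground
        surface = opened                      # progressively flattened surface
    return nonground

# Synthetic DEM: gently sloping ground plus a 10 m "building" block.
y, x = np.mgrid[0:60, 0:60]
dem = 0.02 * x + 0.01 * y
dem[20:30, 20:32] += 10.0
mask = progressive_morphological_filter(dem)
print("non-ground cells found:", int(mask.sum()))
```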

Relevance:

100.00%

Publisher:

Abstract:

As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand them. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, due both to the large size of the data sets and to their high dimensionality. This dissertation, in the same direction as other research based on MapReduce, develops effective techniques and applications in MapReduce that can help people solve large-scale problems. Three different problems are tackled. The first deals with processing terabytes of raster data in a spatial data management system; aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques for data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices with dimensions on the order of millions in MapReduce, based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
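
To make the matrix multiplication building block concrete, here is a pure-Python simulation of MapReduce-style sparse matrix multiplication, the kind of kernel on which scaled-up NMF variants rely; the in-process map/shuffle/reduce stands in for a real cluster run.

```python
# Pure-Python simulation of MapReduce-style sparse matrix multiplication
# (C = A @ B). On a real cluster each phase would run across many machines.
from collections import defaultdict

A = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}   # sparse A[i, j]
B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}   # sparse B[j, k]

# Map: for each pair sharing the inner index j, emit ((i, k), partial product)
emitted = []
for (i, j), a in A.items():
    for (j2, k), b in B.items():
        if j == j2:
            emitted.append(((i, k), a * b))

# Shuffle: group partial products by output cell (i, k)
groups = defaultdict(list)
for key, value in emitted:
    groups[key].append(value)

# Reduce: sum each group to get C[i, k]
C = {key: sum(values) for key, values in groups.items()}
print(C)  # {(0, 0): 14.0, (0, 1): 12.0, (1, 0): 15.0, (1, 1): 18.0}
```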

Relevance:

100.00%

Publisher:

Abstract:

Due to rapid advances in computing and sensing technologies, enormous amounts of data are being generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to present the discovered patterns in a comprehensive visual view. In this dissertation, four related problems are studied to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and the exploration of multiple clusterings. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high-dimensional datasets; 2) present DClusterE, which integrates cluster validation with user interaction and provides rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems that involve users' efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework that organizes different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.
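
As a generic illustration of mRMR-style selection (not the dissertation's reliefF + mRMR implementation), the sketch below greedily picks features by relevance to the class minus mean redundancy with already-selected features, approximating redundancy with absolute correlation, a common simplification.

```python
# Generic greedy mRMR-style feature selection sketch. Relevance uses mutual
# information; redundancy is approximated by absolute correlation. This is
# an illustrative simplification, not the dissertation's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)
corr = np.abs(np.corrcoef(X, rowvar=False))    # feature-feature redundancy

selected = [int(np.argmax(relevance))]         # start with the best feature
while len(selected) < 5:
    candidates = [f for f in range(X.shape[1]) if f not in selected]
    scores = [relevance[f] - corr[f, selected].mean() for f in candidates]
    selected.append(candidates[int(np.argmax(scores))])
print("selected features:", selected)
```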

Relevance:

100.00%

Publisher:

Abstract:

Extensive data sets on water quality and seagrass distributions in Florida Bay have been assembled under complementary, but independent, monitoring programs. This paper presents the landscape-scale results from these monitoring programs and outlines a method for exploring the relationships between two such data sets. Seagrass species occurrence and abundance data were used to define eight benthic habitat classes from 677 sampling locations in Florida Bay. Water quality data from 28 monitoring stations spread across the Bay were used to construct a discriminant function model that assigned a probability of a given benthic habitat class occurring for a given combination of water quality variables. Mean salinity, salinity variability, the amount of light reaching the benthos, sediment depth, and mean nutrient concentrations were important predictor variables in the discriminant function model. Using a cross-validated classification scheme, this discriminant function identified the most likely benthic habitat type as the actual habitat type in most cases. The model predicted that the distribution of benthic habitat types in Florida Bay would likely change if water quality and water delivery were changed by human engineering of freshwater discharge from the Everglades. Specifically, an increase in the seasonal delivery of freshwater to Florida Bay should cause an expansion of seagrass beds dominated by Ruppia maritima and Halodule wrightii at the expense of the Thalassia testudinum-dominated community that now occurs in northeast Florida Bay. These statistical techniques should prove useful for predicting landscape-scale changes in community composition in diverse systems where communities are in quasi-equilibrium with environmental drivers.
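
Schematically, the discriminant-function step could be reproduced along the lines below; the data are synthetic stand-ins, and the paper's actual model was built from the Florida Bay monitoring variables listed above.

```python
# Schematic of fitting a discriminant function that assigns probabilities
# of benthic habitat classes from water quality variables, with
# cross-validated classification. Data are synthetic stand-ins.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
# Columns stand in for: mean salinity, salinity variability, light reaching
# the benthos, sediment depth, mean nutrient concentration (all synthetic).
X = rng.normal(size=(n, 5))
habitat = (X[:, 0] + 0.5 * X[:, 2]
           + rng.normal(scale=0.8, size=n) > 0).astype(int)

lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, X, habitat, cv=5).mean()
lda.fit(X, habitat)
print(f"cross-validated accuracy: {acc:.2f}")
print("P(habitat class | water quality) for one station:",
      lda.predict_proba(X[:1]).round(2))
```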

Relevance:

100.00%

Publisher:

Abstract:

The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have an extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rate, and short generation time. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods that analyze serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from which viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another using a process called recombination.

A genealogy involving recombination is best described by a network structure. A more general approach was implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more thorough evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data, and it was shown that the results, represented within an evolutionary network structure, can be interpreted in biologically meaningful ways.
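
As a toy illustration of the serially-sampled idea (ignoring recombination detection entirely), the sketch below links each sequence to its minimum-distance relative among earlier samples only, so that sampling times constrain the reconstructed ancestor-descendant links; the sequences and time points are invented.

```python
# Toy sketch of the serially-sampled idea behind minimum-pairwise-distance
# linking: connect each sequence to its closest relative among *earlier*
# samples only. Sequences and times are invented; recombination is omitted.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

samples = [                      # (time point, sequence) -- illustrative
    (0, "ACGTACGTAC"),
    (0, "ACGTACGTTC"),
    (1, "ACGAACGTTC"),
    (2, "ACGAACGTTA"),
    (2, "TCGTACGTAC"),
]

links = []
for i, (t_i, seq_i) in enumerate(samples):
    earlier = [(j, hamming(seq_i, s)) for j, (t_j, s) in enumerate(samples)
               if t_j < t_i]
    if earlier:                  # minimum pairwise distance to an ancestor
        j, d = min(earlier, key=lambda e: e[1])
        links.append((j, i, d))
print(links)  # edges (ancestor index, descendant index, distance)
```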
