793 resultados para Web data
Resumo:
eHabitat is a Web Processing Service (WPS) designed to compute the likelihood of finding ecosystems with equal properties. Inputs to the WPS, typically thematic geospatial "layers", can be discovered using standardised catalogues, and the outputs tailored to specific end user needs. Because these layers can range from geophysical data captured through remote sensing to socio-economical indicators, eHabitat is exposed to a broad range of different types and levels of uncertainties. Potentially chained to other services to perform ecological forecasting, for example, eHabitat would be an additional component further propagating uncertainties from a potentially long chain of model services. This integration of complex resources increases the challenges in dealing with uncertainty. For such a system, as envisaged by initiatives such as the "Model Web" from the Group on Earth Observations, to be used for policy or decision making, users must be provided with information on the quality of the outputs since all system components will be subject to uncertainty. UncertWeb will create the Uncertainty-Enabled Model Web by promoting interoperability between data and models with quantified uncertainty, building on existing open, international standards. It is the objective of this paper to illustrate a few key ideas behind UncertWeb using eHabitat to discuss the main types of uncertainties the WPS has to deal with and to present the benefits of the use of the UncertWeb framework.
Resumo:
UncertWeb is a European research project running from 2010-2013 that will realize the uncertainty enabled model web. The assumption is that data services, in order to be useful, need to provide information about the accuracy or uncertainty of the data in a machine-readable form. Models taking these data as imput should understand this and propagate errors through model computations, and quantify and communicate errors or uncertainties generated by the model approximations. The project will develop technology to realize this and provide demonstration case studies.
Resumo:
In the years 2004 and 2005 we collected samples of phytoplankton, zooplankton and macroinvertebrates in an artificial small pond in Budapest. We set up a simulation model predicting the abundance of the cyclopoids, Eudiaptomus zachariasi and Ischnura pumilio by considering only temperature as it affects the abundance of population of the previous day. Phytoplankton abundance was simulated by considering not only temperature, but the abundance of the three mentioned groups. This discrete-deterministic model could generate similar patterns like the observed one and testing it on historical data was successful. However, because the model was overpredicting the abundances of Ischnura pumilio and Cyclopoida at the end of the year, these results were not considered. Running the model with the data series of climate change scenarios, we had an opportunity to predict the individual numbers for the period around 2050. If the model is run with the data series of the two scenarios UKHI and UKLO, which predict drastic global warming, then we can observe a decrease in abundance and shift in the date of the maximum abundance occurring (excluding Ischnura pumilio, where the maximum abundance increases and it occurs later), whereas under unchanged climatic conditions (BASE scenario) the change in abundance is negligible. According to the scenarios GFDL 2535, GFDL 5564 and UKTR, a transition could be noticed.
Resumo:
In the years 2004 and 2005, we collected samples of phytoplankton, zooplankton, and macroinvertebrates in an artificial small pond in Budapest (Hungary). We set up a simulation model predicting the abundances of the cyclopoids, Eudiaptomus zachariasi, and Ischnura pumilio by considering only temperature and the abundance of population of the previous day. Phytoplankton abundance was simulated by considering not only temperature but the abundances of the three mentioned groups. When we ran the model with the data series of internationally accepted climate change scenarios, the different outcomes were discussed. Comparative assessment of the alternative climate change scenarios was also carried out with statistical methods.
Resumo:
An implementation of Sem-ODB—a database management system based on the Semantic Binary Model is presented. A metaschema of Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency comparing to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by semantic model. A fixed predefined implementation is not enforced allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed. ^ A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, supports the definition of semantic userviews and the interoperability of semantic databases with other data sources such as World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases. ^ The analysis and high-level design of a system that exploits the superiority of the Semantic Database Model to other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others via a common query interface is presented. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system to provide an ODBC interface to the WWW as a data source is discussed. ^
Resumo:
This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore data partitioning and replication management problems need to be addresses in development of an efficient and scalable solution. ^ Part of the research is to study the patterns of geographical raster data processing and to propose the algorithms to improve availability of such data. These algorithms and approaches are targeting granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service(QoS) considering distributed data delivery and processing. To achieve this goal a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images upon demand and generates mosaicked tiles only for the required region according to user's requirements such as resolution, temporal range, and target bands to reduce redundancy in storage and to utilize available computing and storage resources more efficiently. ^ Another part of the research pursued methods for efficient acquiring of GIS data from external heterogeneous databases and Web services as well as end-user GIS data delivery enhancements, automation and 3D virtual reality presentation. ^ There are vast numbers of computing, network, and storage resources idling or not fully utilized available on the Internet. Proposed "Crawling Distributed Operating System "(CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in GIS database context. ^ The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in creation of TerraFly GIS database that is used by US government, researchers, and general public to facilitate Web access to remotely-sensed imagery and GIS vector information. ^
Resumo:
This dissertation established a software-hardware integrated design for a multisite data repository in pediatric epilepsy. A total of 16 institutions formed a consortium for this web-based application. This innovative fully operational web application allows users to upload and retrieve information through a unique human-computer graphical interface that is remotely accessible to all users of the consortium. A solution based on a Linux platform with My-SQL and Personal Home Page scripts (PHP) has been selected. Research was conducted to evaluate mechanisms to electronically transfer diverse datasets from different hospitals and collect the clinical data in concert with their related functional magnetic resonance imaging (fMRI). What was unique in the approach considered is that all pertinent clinical information about patients is synthesized with input from clinical experts into 4 different forms, which were: Clinical, fMRI scoring, Image information, and Neuropsychological data entry forms. A first contribution of this dissertation was in proposing an integrated processing platform that was site and scanner independent in order to uniformly process the varied fMRI datasets and to generate comparative brain activation patterns. The data collection from the consortium complied with the IRB requirements and provides all the safeguards for security and confidentiality requirements. An 1-MR1-based software library was used to perform data processing and statistical analysis to obtain the brain activation maps. Lateralization Index (LI) of healthy control (HC) subjects in contrast to localization-related epilepsy (LRE) subjects were evaluated. Over 110 activation maps were generated, and their respective LIs were computed yielding the following groups: (a) strong right lateralization: (HC=0%, LRE=18%), (b) right lateralization: (HC=2%, LRE=10%), (c) bilateral: (HC=20%, LRE=15%), (d) left lateralization: (HC=42%, LRE=26%), e) strong left lateralization: (HC=36%, LRE=31%). Moreover, nonlinear-multidimensional decision functions were used to seek an optimal separation between typical and atypical brain activations on the basis of the demographics as well as the extent and intensity of these brain activations. The intent was not to seek the highest output measures given the inherent overlap of the data, but rather to assess which of the many dimensions were critical in the overall assessment of typical and atypical language activations with the freedom to select any number of dimensions and impose any degree of complexity in the nonlinearity of the decision space.
Resumo:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity—users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high quality and efficient searching of graph structured databases. Several ranked search methods on data graphs have been studied in the recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers where each answer is a substructure of the graph containing all query keywords, which illustrates the relationship between the keyword present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is the authority flow based ranking mechanism. Given a top- k keyword search query on a graph, an authority-flow based search finds the top-k answers where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved the authority flow based search on data graphs by creating a framework to explain and reformulate them taking in to consideration user preferences and feedback. We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
Resumo:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Resumo:
With the exponential increasing demands and uses of GIS data visualization system, such as urban planning, environment and climate change monitoring, weather simulation, hydrographic gauge and so forth, the geospatial vector and raster data visualization research, application and technology has become prevalent. However, we observe that current web GIS techniques are merely suitable for static vector and raster data where no dynamic overlaying layers. While it is desirable to enable visual explorations of large-scale dynamic vector and raster geospatial data in a web environment, improving the performance between backend datasets and the vector and raster applications remains a challenging technical issue. This dissertation is to implement these challenging and unimplemented areas: how to provide a large-scale dynamic vector and raster data visualization service with dynamic overlaying layers accessible from various client devices through a standard web browser, and how to make the large-scale dynamic vector and raster data visualization service as rapid as the static one. To accomplish these, a large-scale dynamic vector and raster data visualization geographic information system based on parallel map tiling and a comprehensive performance improvement solution are proposed, designed and implemented. They include: the quadtree-based indexing and parallel map tiling, the Legend String, the vector data visualization with dynamic layers overlaying, the vector data time series visualization, the algorithm of vector data rendering, the algorithm of raster data re-projection, the algorithm for elimination of superfluous level of detail, the algorithm for vector data gridding and re-grouping and the cluster servers side vector and raster data caching.
Resumo:
Stable isotope analysis has emerged as one of the primary means for examining the structure and dynamics of food webs, and numerous analytical approaches are now commonly used in the field. Techniques range from simple, qualitative inferences based on the isotopic niche, to Bayesian mixing models that can be used to characterize food-web structure at multiple hierarchical levels. We provide a comprehensive review of these techniques, and thus a single reference source to help identify the most useful approaches to apply to a given data set. We structure the review around four general questions: (1) what is the trophic position of an organism in a food web?; (2) which resource pools support consumers?; (3) what additional information does relative position of consumers in isotopic space reveal about food-web structure?; and (4) what is the degree of trophic variability at the intrapopulation level? For each general question, we detail different approaches that have been applied, discussing the strengths and weaknesses of each. We conclude with a set of suggestions that transcend individual analytical approaches, and provide guidance for future applications in the field.
Resumo:
The purpose of this study was to explore the relationship between faculty perceptions, selected demographics, implementation of elements of transactional distance theory and online web-based course completion rates. This theory posits that the high transactional distance of online courses makes it difficult for students to complete these courses successfully; too often this is associated with low completion rates. Faculty members play an indispensable role in course design, whether online or face-to-face. They also influence course delivery format from design through implementation and ultimately to how students will experience the course. This study used transactional distance theory as the conceptual framework to examine the relationship between teaching and learning strategies used by faculty members to help students complete online courses. Faculty members' sex, number of years teaching online at the college, and their online course completion rates were considered. A researcher-developed survey was used to collect data from 348 faculty members who teach online at two prominent colleges in the southeastern part of United States. An exploratory factor analysis resulted in six factors related to transactional distance theory. The factors accounted for slightly over 65% of the variance of transactional distance scores as measured by the survey instrument. Results provided support for Moore's (1993) theory of transactional distance. Female faculty members scored higher in all the factors of transactional distance theory when compared to men. Faculty number of years teaching online at the college level correlated significantly with all the elements of transactional distance theory. Regression analysis was used to determine that two of the factors, instructor interface and instructor-learner interaction, accounted for 12% of the variance in student online course completion rates. In conclusion, of the six factors found, the two with the highest percentage scores were instructor interface and instructor-learner interaction. This finding, while in alignment with the literature concerning the dialogue element of transactional distance theory, brings a special interest to the importance of instructor interface as a factor. Surprisingly, based on the reviewed literature on transactional distance theory, faculty perceptions concerning learner-learner interaction was not an important factor and there was no learner-content interaction factor.
Resumo:
Stable isotope analysis has become a standard ecological tool for elucidating feeding relationships of organisms and determining food web structure and connectivity. There remain important questions concerning rates at which stable isotope values are incorporated into tissues (turnover rates) and the change in isotope value between a tissue and a food source (discrimination values). These gaps in our understanding necessitate experimental studies to adequately interpret field data. Tissue turnover rates and discrimination values vary among species and have been investigated in a broad array of taxa. However, little attention has been paid to ectothermic top predators in this regard. We quantified the turnover rates and discrimination values for three tissues (scutes, red blood cells, and plasma) in American alligators (Alligator mississippiensis). Plasma turned over faster than scutes or red blood cells, but turnover rates of all three tissues were very slow in comparison to those in endothermic species. Alligator δ15N discrimination values were surprisingly low in comparison to those of other top predators and varied between experimental and control alligators. The variability of δ15N discrimination values highlights the difficulties in using δ15N to assign absolute and possibly even relative trophic levels in field studies. Our results suggest that interpreting stable isotope data based on parameter estimates from other species can be problematic and that large ectothermic tetrapod tissues may be characterized by unique stable isotope dynamics relative to species occupying lower trophic levels and endothermic tetrapods.
Resumo:
The software product line engineering brings advantages when compared with the traditional software development regarding the mass customization of the system components. However, there are scenarios that to maintain separated clones of a software system seems to be an easier and more flexible approach to manage their variabilities of a software product line. This dissertation evaluates qualitatively an approach that aims to support the reconciliation of functionalities between cloned systems. The analyzed approach is based on mining data about the issues and source code of evolved cloned web systems. The next step is to process the merge conflicts collected by the approach and not indicated by traditional control version systems to identify potential integration problems from the cloned software systems. The results of the study show the feasibility of the approach to perform a systematic characterization and analysis of merge conflicts for large-scale web-based systems.
Resumo:
La tesi presenta uno studio della libreria grafica per web D3, sviluppata in javascript, e ne presenta una catalogazione dei grafici implementati e reperibili sul web. Lo scopo è quello di valutare la libreria e studiarne i pregi e difetti per capire se sia opportuno utilizzarla nell'ambito di un progetto Europeo. Per fare questo vengono studiati i metodi di classificazione dei grafici presenti in letteratura e viene esposto e descritto lo stato dell'arte del data visualization. Viene poi descritto il metodo di classificazione proposto dal team di progettazione e catalogata la galleria di grafici presente sul sito della libreria D3. Infine viene presentato e studiato in maniera formale un algoritmo per selezionare un grafico in base alle esigenze dell'utente.