863 results for Data sources detection
Abstract:
An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed. A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, and supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.
The analysis and high-level design are presented for a system that exploits the superiority of the Semantic Database Model to other data models in expressive power and ease of use to allow uniform access, via a common query interface, to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system to provide an ODBC interface to the WWW as a data source is discussed.
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in the handling of complex cases.
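As an illustration of the "Web site as data source" idea described in this abstract, the following is a minimal sketch and not the Data Extractor implementation, which the abstract does not detail. The names `WebSource`, `table_wrapper` and `_CellCollector` are hypothetical, and the wrapper here simply turns the rows of an HTML table into records; the `fetch` callable stands in for real Web access.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _CellCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td":
            if not self.rows:             # tolerate a <td> without a <tr>
                self.rows.append([])
            self.rows[-1].append("")      # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data     # accumulate text inside the cell


def table_wrapper(html: str, columns: List[str]) -> List[Dict[str, str]]:
    """Hypothetical wrapper: map each complete table row to a record."""
    parser = _CellCollector()
    parser.feed(html)
    return [
        dict(zip(columns, (cell.strip() for cell in row)))
        for row in parser.rows
        if len(row) == len(columns)       # skip malformed rows
    ]


@dataclass
class WebSource:
    """A site exposed as a queryable source: fetch a page, then wrap it."""
    name: str
    fetch: Callable[[str], str]                      # query -> raw HTML
    wrap: Callable[[str], List[Dict[str, str]]]      # raw HTML -> records

    def query(self, q: str) -> List[Dict[str, str]]:
        return self.wrap(self.fetch(q))


# Usage with a canned page instead of a network call.
PAGE = ("<table><tr><td>Miami</td><td>31</td></tr>"
        "<tr><td>Tampa</td><td>29</td></tr></table>")
demo = WebSource("demo", fetch=lambda q: PAGE,
                 wrap=lambda h: table_wrapper(h, ["city", "temp"]))
records = demo.query("temperature")
```

Keeping `fetch` and `wrap` separate mirrors the intermediary role described above: the application only sees records, while site-specific parsing stays inside the wrapper.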
Abstract:
The research presented in this dissertation comprises several parts that jointly attain the goal of Semantic Distributed Database Management with Applications to Internet Dissemination of Environmental Data. Part of the research into more effective and efficient data management has been pursued through enhancements to the Semantic Binary Object-Oriented database (Sem-ODB), such as more effective load balancing techniques for the database engine, and the use of Sem-ODB as a tool for integrating structured and unstructured heterogeneous data sources. Another part of the research in data management has pursued methods for optimizing queries in distributed databases through the intelligent use of network bandwidth; this has applications in networks that provide varying levels of Quality of Service or throughput. The application of the Semantic Binary database model as a tool for relational database modeling has also been pursued. This has resulted in database applications that are used by researchers at the Everglades National Park to store environmental data and remotely sensed imagery. The areas of research described above have contributed to the creation of TerraFly, which provides for the dissemination of geospatial data via the Internet. The TerraFly research presented herein ranges from the development of TerraFly's back-end database and interfaces, through the features that are presented to the public (such as the ability to provide autopilot scripts and on-demand data about a point), to applications of TerraFly in the areas of hazard mitigation, recreation, and aviation.
Abstract:
With the advent of peer-to-peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor-based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the validation process, the developed solution maps the requirements of the application onto a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast convergent adaptation process is deployed, data reduction rates are significantly improved.
Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.
Abstract:
Context: Due to a unique combination of factors, outdoor athletes in the Southeastern United States are at high risk of lightning deaths and injuries. Lightning detection methods are available to reduce the number of lightning strike victims. Objective: Becoming aware of the risk factors that predispose athletes to lightning strikes and determining the most reliable detection method against hazardous weather will enable Certified Athletic Trainers to develop protocols that protect athletes from injury. Data Sources: A comprehensive literature review of MEDLINE and PubMed was completed using the key words: lightning, lightning risk factors, lightning safety, lightning detection, and athletic trainers and lightning. Data Synthesis: Factors predisposing athletes to lightning death or injury include: time of year, time of day, the athlete's age, geographical location, physical location, sex, perspiration level, and lack of education and preparedness by athletes and staff. Although handheld lightning detectors have become widely accessible to detect lightning strikes, their performance has not been independently or objectively confirmed. There is evidence that these detectors inaccurately detect strike locations by recording false strikes and not recording actual strikes. Conclusions: Lightning education and preparation are two factors that can be controlled. Measures need to be taken by Certified Athletic Trainers to ensure the safety of athletes during outdoor athletics. It is critical for athletic trainers and supervising staff members to become fully aware of the risks of lightning strikes in order to most effectively protect everyone under their supervision. Even though lightning detectors have been manufactured in an attempt to minimize death and injuries due to lightning strikes, none of the detectors has been proven to be 100% effective.
Educating coaches, athletes, and parents on the risks of lightning and the detection methods available, while implementing an emergency action plan for lightning safety, is crucial to ensure the well-being of the student-athlete population.
Abstract:
The purpose of this project was to evaluate the use of remote sensing 1) to detect and map Everglades wetland plant communities at different scales; and 2) to compare map products delineated and resampled at various scales, with the intent to quantify and describe the quantitative and qualitative differences between such products. We evaluated data provided by DigitalGlobe's WorldView-2 (WV2) sensor with a spatial resolution of 2 m and data from Landsat's Thematic Mapper and Enhanced Thematic Mapper Plus (TM and ETM+) sensors with a spatial resolution of 30 m. We were also interested in the comparability and scalability of products derived from these data sources. The adequacy of each data set to map wetland plant communities was evaluated using two metrics: 1) model-based accuracy estimates of the classification procedures; and 2) design-based post-classification accuracy estimates of derived maps.
Abstract:
Kernel-level malware is one of the most dangerous threats to the security of users on the Internet, so there is an urgent need for its detection. The most popular detection approach is misuse-based detection. However, it cannot keep up with today's advanced malware, which increasingly applies polymorphism and obfuscation. In this thesis, we present our integrity-based detection for kernel-level malware, which does not rely on the specific features of malware. We have developed an integrity analysis system that can derive and monitor integrity properties for commodity operating system kernels. In our system, we focus on two classes of integrity properties: data invariants and integrity of Kernel Queue (KQ) requests. We adopt static analysis for data invariant detection and overcome several technical challenges: field-sensitivity, array-sensitivity, and pointer analysis. We identify data invariants that are critical to system runtime integrity from Linux kernel 2.4.32 and Windows Research Kernel (WRK) with a very low false positive rate and a very low false negative rate. We then develop an Invariant Monitor to guard these data invariants against real-world malware. In our experiment, we are able to use Invariant Monitor to detect ten real-world Linux rootkits, nine real-world Windows malware samples and one synthetic Windows malware sample. We leverage static and dynamic analysis of kernel and device drivers to learn the legitimate KQ requests. Based on the learned KQ requests, we build KQguard to protect KQs. At runtime, KQguard rejects all the unknown KQ requests that cannot be validated. We apply KQguard to WRK and the Linux kernel, and extensive experimental evaluation shows that KQguard is efficient (up to 5.6% overhead) and effective (capable of achieving zero false positives against representative benign workloads after appropriate training and very low false negatives against 125 real-world malware samples and nine synthetic attacks).
In our system, Invariant Monitor and KQguard cooperate to protect data invariants and KQs in the target kernel. By monitoring these integrity properties, we can detect malware by its violation of these integrity properties during execution.
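The learn-then-reject behaviour described for KQguard can be pictured with a small allowlist. This is only a sketch of the general idea under stated assumptions: the abstract does not say how requests are represented or validated, so the `(queue, callback)` pair, the `QueueGuard` class and the example names below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical representation of a kernel queue request:
# the queue it targets and an identifier for the callback it registers.
Request = Tuple[str, str]


@dataclass
class QueueGuard:
    """Allowlist of requests seen during training; everything else is rejected."""
    known: Set[Request] = field(default_factory=set)

    def learn(self, queue: str, callback: str) -> None:
        # Training phase: record a request observed from legitimate code.
        self.known.add((queue, callback))

    def validate(self, queue: str, callback: str) -> bool:
        # Runtime phase: accept only requests that were learned earlier.
        return (queue, callback) in self.known


guard = QueueGuard()
guard.learn("timer", "net_driver.tick")           # hypothetical benign request
accepted = guard.validate("timer", "net_driver.tick")
rejected = guard.validate("timer", "unknown_module.hook")
```

With this default-deny design, false positives depend on how complete the training set is, which is consistent with the abstract's remark that zero false positives were achieved "after appropriate training".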
Abstract:
Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. One of the main challenges in data integration is reconciling semantic differences among data sources. Approaches that have been used to solve this problem can be categorized as schema-based and attribute-based. Schema-based approaches use schema information to identify the semantic similarity in data; furthermore, they focus on reconciling types before reconciling attributes. In contrast, attribute-based approaches use statistical and structural information of attributes to identify the semantic similarity of data in different sources. This research examines an approach to semantic reconciliation based on integrating properties expressed at different levels of abstraction or granularity using the concept of property precedence. Property precedence reconciles the meaning of attributes by identifying similarities between attributes based on what these attributes represent in the real world. In order to use property precedence for semantic integration, we need to identify the precedence of attributes within and across data sources. The goal of this research is to develop and evaluate a method and algorithms that will identify precedence relations among attributes and build a property precedence graph (PPG) that can be used to support integration.
Abstract:
Site 1103 was one of a transect of three sites drilled across the Antarctic Peninsula continental shelf during Leg 178. The aim of drilling on the shelf was to determine the age of the sedimentary sequences and to ground truth previous interpretations of the depositional environment (i.e., topsets and foresets) of progradational seismostratigraphic sequences S1, S2, S3, and S4. The ultimate objective was to obtain a better understanding of the history of glacial advances and retreats in this west Antarctic margin. Drilling the topsets of the progradational wedge (0-247 m below seafloor [mbsf]), which consist of unsorted and unconsolidated materials of seismic Unit S1, was very unfavorable, resulting in very low (2.3%) core recovery. Recovery improved (34%) below 247 mbsf, corresponding to sediments of seismic Unit S3, which have a consolidated matrix. Logs were only obtained from the interval between 75 and 244 mbsf, and inconsistencies in the automatic analog picking of the signals received from the sonic log at the array and at the two other receivers prevented accurate shipboard time-depth conversions. This, in turn, limited the capacity for making seismic stratigraphic interpretations at this site and regionally. This study is an attempt to compile all available data sources, perform quality checks, and introduce nonstandard processing techniques for the logging data obtained to arrive at a reliable and continuous depth vs. velocity profile. We defined 13 data categories using differential traveltime information. Polynomial exclusion techniques with various orders and low-pass filtering reduced the noise of the initial data pool and produced a definite velocity-depth profile that is synchronous with the resistivity logging data. A comparison of the velocity profile produced with various other logs of Site 1103 further validates the presented data. All major logging units are expressed within the new velocity data.
A depth-migrated section with the new velocity data is presented together with the original time section and initial depth estimates published within the Leg 178 Initial Reports volume. The presented data confirm the location of the shelf unconformity at 222 ms two-way traveltime (TWT), or 243 mbsf, and allow its seismic identification as a strong negative and subsequent positive reflection.
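The relation between the two figures quoted for the shelf unconformity can be checked with the standard two-way traveltime formula, depth = v × TWT / 2. The sketch below assumes that the 222 ms TWT is measured from the seafloor and that a single average velocity applies over the interval; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
def average_velocity(depth_m: float, twt_s: float) -> float:
    """Average velocity (m/s) implied by a depth and a two-way traveltime."""
    if twt_s <= 0:
        raise ValueError("two-way traveltime must be positive")
    # The signal travels down and back, so the one-way time is twt_s / 2.
    return 2.0 * depth_m / twt_s


def depth_from_twt(twt_s: float, velocity_m_s: float) -> float:
    """Depth (m) reached in a given two-way traveltime at a constant velocity."""
    return velocity_m_s * twt_s / 2.0


# Values quoted in the abstract: 243 mbsf at 222 ms TWT.
v_avg = average_velocity(243.0, 0.222)      # about 2189 m/s
depth_check = depth_from_twt(0.222, v_avg)  # recovers the 243 m input
```

Under these assumptions the implied average velocity is roughly 2.19 km/s. The study itself derives a continuous velocity profile rather than a single average, so this is only a consistency check on the two quoted numbers.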
Abstract:
The creation of Causal Loop Diagrams (CLDs) is a major phase in the System Dynamics (SD) life-cycle, since the created CLDs express dependencies and feedback in the system under study and guide modellers in building meaningful simulation models. The creation of CLDs is still subject to the modeller's domain expertise (mental model) and her ability to abstract the system, because of the strong dependency on semantic knowledge. Since the beginning of SD, available system data sources (written and numerical models) have been sparse, very limited and imperfect, and thus of little benefit to the whole modelling process. However, in recent years, we have seen an explosion in generated data, especially in all business-related domains that are analysed via Business Dynamics (BD). In this paper, we introduce a systematic, tool-supported CLD creation approach, which analyses and utilises available disparate data sources within the business domain. We demonstrate the application of our methodology on a given business use-case and evaluate the resulting CLD. Finally, we propose directions for future research to further push the automation in CLD creation and increase confidence in the generated CLDs.
Abstract:
Real-time data on key performance enablers in logistics warehouses are of growing importance, as they permit decision-makers to react instantaneously to alerts, deviations and damages. Several technologies appear to be adequate data sources for collecting the information required to achieve this goal. In the present research paper, the load status of the fork of a forklift is to be recognized with the help of a sensor-based and a camera-based solution approach. The comparison of initial experimentation results yields a statement about which direction to pursue for promising further research.
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08