962 results for 0804 Data Format
Abstract:
Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.
Abstract:
Every Argo data file submitted by a DAC for distribution on the GDAC has its format and data consistency checked by the Argo FileChecker. Two types of checks are applied:
1. Format checks. These ensure that the file formats match the Argo standards precisely.
2. Data consistency checks. These additional checks are performed on a file after it passes the format checks. They do not duplicate any of the quality control checks performed elsewhere, and can be thought of as “sanity checks” to ensure that the data are consistent with each other.
The data consistency checks enforce data standards and ensure that certain data values are reasonable and/or consistent with other information in the files. Examples of the “data standard” checks are the “mandatory parameters” defined for meta-data files and the technical parameter names in technical data files. Files with format or consistency errors are rejected by the GDAC and are not distributed. Less serious problems generate warnings, and the file is still distributed on the GDAC.
Reference Tables and Data Standards: Many of the consistency checks involve comparing the data to the published reference tables and data standards. These tables are documented in the User’s Manual. (The FileChecker implements “text versions” of these tables.)
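The two-stage flow described in this abstract (format checks first, consistency checks only on files that pass, errors causing rejection and warnings not) can be sketched roughly as follows. This is a minimal illustration, not the FileChecker's actual implementation: the record layout, the field names, the `REQUIRED_FIELDS` and `REFERENCE_PARAM_NAMES` tables and all function names are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for a format specification and a reference table.
# The real Argo standards and tables are documented in the User's Manual.
REQUIRED_FIELDS = {"platform_id": str, "latitude": float, "param_names": list}
REFERENCE_PARAM_NAMES = {"TEMP", "PRES", "PSAL"}


@dataclass
class CheckResult:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        # Only errors cause rejection; warnings do not block distribution.
        return not self.errors


def check_format(record: dict, result: CheckResult) -> None:
    """Stage 1: every required field is present and has the expected type."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            result.errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            result.errors.append(f"wrong type for field: {name}")


def check_consistency(record: dict, result: CheckResult) -> None:
    """Stage 2: values are reasonable and agree with the reference table."""
    if not -90.0 <= record["latitude"] <= 90.0:
        result.errors.append("latitude out of range")
    for name in record["param_names"]:
        if name not in REFERENCE_PARAM_NAMES:
            result.errors.append(f"unknown parameter name: {name}")
    if not record["param_names"]:
        # Assumed example of a less serious problem.
        result.warnings.append("no parameters listed")


def check_file(record: dict) -> CheckResult:
    result = CheckResult()
    check_format(record, result)
    # Consistency checks run only after the format checks have passed.
    if not result.errors:
        check_consistency(record, result)
    return result
```

Running the consistency stage only on format-clean records lets it assume that every field exists with the right type, which mirrors the ordering stated in the abstract.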
Abstract:
Fault diagnosis has become an important component in intelligent systems, such as intelligent control systems and intelligent eLearning systems. Reiter's diagnosis theory, described by first-order sentences, has been attracting much attention in this field. However, descriptions and observations of most real-world situations involve fuzziness because of the incompleteness and uncertainty of knowledge, e.g., the fault diagnosis of student behaviors in eLearning processes. In this paper, an extension of Reiter's consistency-based diagnosis methodology, Fuzzy Diagnosis, is proposed, which is able to deal with incomplete or fuzzy knowledge. A number of important properties of the fuzzy diagnosis schemes are also established. The computation of fuzzy diagnoses is mapped to solving a system of inequalities. Some special cases, abstracted from real-world situations, are discussed. In particular, the fuzzy diagnosis problem in which fuzzy observations are represented by clause-style fuzzy theories is presented, together with a method for solving it. A student fault diagnosis problem abstracted from a simplified real-world eLearning case is described to demonstrate the application of our diagnostic framework.
Abstract:
Ontologies are increasingly becoming an essential tool for solving problems in many research areas. An ontology is a complex information object: it can contain millions of concepts in complex relationships. When we want to manage complex information objects, we generally turn to information systems technology. An information system intended to manage ontologies is called an ontology server. Ontology server technology is, at the time of writing, quite immature. Therefore, this paper reviews and compares the main ontology servers that have been reported in the literature. As a result, we point out several research questions related to server technology.
Abstract:
The paper provides evidence that spatial indexing structures offer faster resolution of Formal Concept Analysis queries than B-Tree/Hash methods. We show that many Formal Concept Analysis operations, such as computing the contingent and extent sizes and listing the matching objects, perform better with spatial indexing structures such as the RD-Tree. Depending on the data and query, the speedup can be as much as eighty times. The motivation for our study is the application of Formal Concept Analysis to Semantic File Systems, where millions of formal objects must be dealt with. Spatial indexing was also found to be an effective indexing technique for more general-purpose applications requiring scalability in Formal Concept Analysis systems. The coverage and benchmarking are presented with general applications in mind.
Abstract:
Supported file formats:
- CrossRef XML file(s)
- TRiDaS (Tree Ring Data Standard, http://www.tridas.org). Example: hdl:10013/epic.42747.d001
- IMMA (International Maritime Meteorological Archive). Used by the project CLIWOC (García-Herrera et al. 2007, http://doi.pangaea.de/10.1594/PANGAEA.743343)
- NOAA IOAS (International Ocean Atlas Series). Example: hdl:10013/epic.42747.d008
- SOCAT (Surface Ocean CO2 Atlas, Bakker et al. 2014, http://doi.pangaea.de/10.1594/PANGAEA.811776)
- CHUAN (Comprehensive Historical Upper-Air Network, Stickler et al. 2013, http://doi.pangaea.de/10.1594/PANGAEA.821222). Example: hdl:10013/epic.42747.d003
- Thermosalinograph (TSG) data. Format developed by Gerd Rohardt. Example: hdl:10013/epic.42747.d002
- Columbus GPS Data Logger V-900 format to KML or GPX. Example: hdl:10013/epic.42747.d006
Abstract:
QUT Library and the High Performance Computing and Research Support (HPC) Team have been collaborating on developing and delivering a range of research support services, including those designed to assist researchers to manage their data. QUT’s Management of Research Data policy has been available since 2010 and is complemented by the Data Management Guidelines and Checklist. QUT has partnered with the Australian National Data Service (ANDS) on a number of projects including Seeding the Commons, Metadata Hub (with Griffith University) and the Data Capture program. The HPC Team has also been developing the QUT Research Data Repository based on the Arcitecta Mediaflux system and has run several pilots with faculties. Library and HPC staff have been trained in the principles of research data management and are providing a range of research data management seminars and workshops for researchers and HDR students.
Abstract:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques have been used to derive this information. Because XML documents are semi-structured, mining them is affected by the model used to represent them. Hence, in this chapter we present an overview of the various models of XML documents, how these models have been used for mining, and some of the issues and challenges associated with them. In addition, this chapter provides some insights into future models of XML documents that effectively capture the two features important for mining, namely the structure and the content of XML documents.
Abstract:
It is a big challenge to acquire correct user profiles for personalized text classification, since users may be unsure when describing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases because of the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text and constrains specific knowledge by reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experiments conducted on Reuters Corpus Volume 1 and TREC topics show that the proposed technique achieves encouraging performance compared with state-of-the-art relevance feedback models.
Abstract:
Road networks are critical national infrastructure. Road assets need to be monitored and maintained efficiently as their condition deteriorates over time. The condition of one such asset, road pavement, plays a major role in road network maintenance programmes. Pavement condition depends on many factors, such as pavement type, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from the data and utilising the data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary across different road, traffic and environmental conditions. The generated data mining models map the TSD outputs to a set of classes and define correction factors for each class.
Abstract:
Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. The framework for Skewness Balancing has been built in this contribution with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrains. Accuracy assessment of the model is carried out using cross-validation with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that from the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards) which makes Skewness Balancing ideal to be integrated into geographic information system (GIS) software packages.
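The abstract does not spell out the procedure, so the following is only a rough sketch of skewness balancing as it is commonly described: the highest points are peeled off one at a time until the elevation distribution of the remaining points is no longer positively skewed, and what remains is treated as ground. The function names, the use of bare elevation values and the stopping rule `skewness > 0` are assumptions for illustration. The prediction model and the hilly-terrain extension mentioned in the abstract are not covered.

```python
from statistics import fmean


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return 0.0  # all values identical: treat as balanced
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_balancing(elevations):
    """Split elevations into (ground, objects) without a height threshold.

    Assumed procedure: while the remaining elevations are positively
    skewed, classify the current highest point as an object point.
    """
    ground = sorted(elevations)
    objects = []
    while len(ground) > 2 and skewness(ground) > 0.0:
        objects.append(ground.pop())  # remove the highest remaining point
    objects.reverse()  # return object points in ascending order
    return ground, objects
```

Because the loop is driven only by the sign of the skewness, no height threshold has to be chosen, which is the "threshold-freedom" property the abstract highlights. A call such as `skewness_balancing([10.0, 10.1, 25.0, 9.0, 30.0, 10.2])` returns the two lists as a tuple.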
Abstract:
Support systems for programming education are widespread, but common standards for exchanging general (learning) content and tests do not meet the special requirements of programming exercises, such as handling complex submissions consisting of multiple files or combining different (automatic) grading methods. As a result, exercises cannot be exchanged between systems, although this would be desirable given the high effort required to develop good exercises. This paper presents an extensible XML-based format for exchanging programming exercises, which is already being used in prototype form by several systems. The specification of the exchange format is available online [PFMA].