971 results for 080400 DATA FORMAT


Relevance: 100.00%

Abstract:

With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques are used to derive this information. Because XML documents are semi-structured, the model chosen to represent them affects how they can be mined. This chapter therefore presents an overview of the various models of XML documents, how these models have been used for mining, and some of their issues and challenges. It also offers some insights into future models of XML documents that effectively capture their two important features for mining, namely structure and content.
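As a rough illustration of the two features named above, the sketch below splits one XML document into a structural view (root-to-node tag paths) and a content view (term counts). It is not a model taken from the chapter; the function name `structure_and_content` and the sample document are assumptions made for this example, and text that follows a child element (tail text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def structure_and_content(xml_text):
    """Return (tag-path counts, term counts) for one XML document.

    Hypothetical helper: paths stand in for "structure" and
    lower-cased whitespace tokens stand in for "content".
    """
    root = ET.fromstring(xml_text)
    paths = Counter()
    terms = Counter()

    def walk(node, prefix):
        # Structure: record the path from the root to this element.
        path = f"{prefix}/{node.tag}"
        paths[path] += 1
        # Content: count the tokens of the element's own leading text.
        if node.text and node.text.strip():
            terms.update(node.text.lower().split())
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths, terms


# Made-up sample document for illustration only.
doc = (
    "<book><title>XML mining</title>"
    "<author>A. Writer</author><author>B. Writer</author></book>"
)
paths, terms = structure_and_content(doc)
```

A mining method that uses only `paths` groups documents by shape, one that uses only `terms` groups them by vocabulary, and a combined model would need both.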

Relevance: 100.00%

Abstract:

Acquiring correct user profiles for personalized text classification is a major challenge, since users may be unsure how to describe their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, in many cases the accuracy of ML-based methods cannot be significantly improved because of the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text, and constrains the specific knowledge with reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experimental results on Reuters Corpus Volume 1 and TREC topics indicate that the proposed technique achieves encouraging performance in comparison with state-of-the-art relevance feedback models.
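The abstract does not say how the Dempster-Shafer approach is applied, so the following is only a generic sketch of Dempster's rule of combination, the standard way two bodies of evidence are fused in DS theory. The hypothesis names, the mass values, and the function name `combine` are assumptions chosen for illustration, not the paper's data or method.

```python
from itertools import product


def combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection.
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            # Disjoint hypotheses contribute to the conflict mass.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


REL = frozenset({"relevant"})
NON = frozenset({"non-relevant"})
EITHER = REL | NON  # mass left on "don't know"

# Illustrative masses from two hypothetical evidence sources.
m_a = {REL: 0.6, EITHER: 0.4}
m_b = {REL: 0.5, NON: 0.3, EITHER: 0.2}
fused = combine(m_a, m_b)
```

With these made-up inputs the conflict mass is 0.18, so every surviving product is divided by 0.82 before being reported.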

Relevance: 100.00%

Abstract:

Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.
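To make the idea of a producer-consumer check concrete, here is a deliberately simplified sketch. It assumes a guard is just a concrete tag value and a layout is a mapping from field name to (offset, size in bytes); the paper's guarded layouts are inferred from programs and are richer than this. The function name `incompatibilities` and the sample formats are hypothetical.

```python
def incompatibilities(producer, consumer):
    """List the ways a consumer format fails to accept a producer format.

    Both arguments map a guard value (for example a header version tag)
    to a layout: a dict of field name -> (offset, size in bytes).
    """
    problems = []
    for guard, produced_layout in producer.items():
        consumed_layout = consumer.get(guard)
        if consumed_layout is None:
            # Logical incompatibility: a valid output is rejected.
            problems.append(f"consumer rejects valid output with guard {guard!r}")
        elif consumed_layout != produced_layout:
            # Type mismatch: same guard, different field placement or size.
            problems.append(f"layout mismatch under guard {guard!r}")
    return problems


# Hypothetical header formats keyed by a version tag.
producer_fmt = {
    1: {"length": (0, 2), "flags": (2, 1)},
    2: {"length": (0, 4), "flags": (4, 1)},
}
consumer_fmt = {
    1: {"length": (0, 2), "flags": (2, 1)},
}
report = incompatibilities(producer_fmt, consumer_fmt)
```

Checking backward compatibility of a new producer version could reuse the same function by comparing the new producer against every consumer the old producer was compatible with.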

Relevance: 100.00%

Abstract:

Every Argo data file submitted by a DAC for distribution on the GDAC has its format and data consistency checked by the Argo FileChecker. Two types of checks are applied: 1. Format checks. These ensure that the file format matches the Argo standards precisely. 2. Data consistency checks. These are performed on a file after it passes the format checks. They do not duplicate any of the quality control checks performed elsewhere, and can be thought of as “sanity checks” that ensure the data are consistent with each other. The data consistency checks enforce data standards and ensure that certain data values are reasonable and/or consistent with other information in the files. Examples of the “data standard” checks are the “mandatory parameters” defined for meta-data files and the technical parameter names in technical data files. Files with format or consistency errors are rejected by the GDAC and are not distributed. Less serious problems generate warnings, and the file is still distributed on the GDAC. Reference Tables and Data Standards: Many of the consistency checks involve comparing the data to the published reference tables and data standards. These tables are documented in the User’s Manual. (The FileChecker implements “text versions” of these tables.)
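The two-stage flow described above (format checks first, consistency checks only on files that pass, errors reject while warnings do not) can be sketched as follows. This is not the FileChecker's code: the record is a plain dictionary rather than a real Argo file, and the field names, the reference table contents, and the choice of which problem counts as a warning are all assumptions made for illustration.

```python
# Hypothetical format specification: required field -> expected type.
REQUIRED_FIELDS = {"platform_number": str, "data_type": str, "pressure": list}
# Hypothetical stand-in for a published reference table.
ALLOWED_DATA_TYPES = {"profile", "trajectory", "metadata", "technical"}


def check_format(record):
    """Stage 1: every required field is present with the expected type."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return errors


def check_consistency(record):
    """Stage 2: compare values with reference tables and sanity limits."""
    errors, warnings = [], []
    if record["data_type"] not in ALLOWED_DATA_TYPES:
        errors.append(f"data_type not in reference table: {record['data_type']}")
    # Assumed here to be a less serious problem that only warns.
    if any(p < 0 for p in record["pressure"]):
        warnings.append("negative pressure value")
    return errors, warnings


def check_file(record):
    """Run stage 2 only if stage 1 passes; any error rejects the file."""
    errors = check_format(record)
    warnings = []
    if not errors:
        errors, warnings = check_consistency(record)
    return {"accepted": not errors, "errors": errors, "warnings": warnings}


result = check_file(
    {"platform_number": "0000001", "data_type": "profile", "pressure": [5.0, 10.0]}
)
```

Running the consistency stage only after the format stage succeeds means the second stage can rely on every required field being present and correctly typed.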

Relevance: 100.00%

Abstract:

This is the first outdoor test of small-scale dye sensitized solar cells (DSC) powering a standalone nanosensor node. A solar cell test station (SCTS) has been developed using standard DSC to power a gas nanosensor, a radio transmitter, and the control electronics (CE) for battery charging. The station is remotely monitored through a wired (Ethernet cable) or wireless (radio transmitter) connection in order to evaluate in real time the performance of the solar cells powering a nanosensor and a transmitter under different weather conditions. We analyze trends of energy conversion efficiency after 60 days of operation. The module, with an active surface of 408 cm², produces enough energy to power a gas nanosensor and a radio transmitter during the day and part of the night. In addition, by using a variable programmable load we keep the system operating at the maximum power point (MPP) and quantify the total energy generated and stored in a battery. Although this technology is at an early stage of development, these experiments provide useful data for future outdoor applications such as nanosensor network nodes.

Relevance: 100.00%

Abstract:

As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized knowledge from only one source, either a global knowledge base or a user's local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.

Relevance: 100.00%

Abstract:

Despite the conventional wisdom that proactive security is superior to reactive security, we show that reactive security can be competitive with proactive security as long as the reactive defender learns from past attacks instead of myopically overreacting to the last attack. Our game-theoretic model follows common practice in the security literature by making worst-case assumptions about the attacker: we grant the attacker complete knowledge of the defender’s strategy and do not require the attacker to act rationally. In this model, we bound the competitive ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best fixed proactive defense. Additionally, we show that, unlike proactive defenses, this reactive strategy is robust to a lack of information about the attacker’s incentives and knowledge.

Relevance: 100.00%

Abstract:

Two-party key exchange (2PKE) protocols have been rigorously analyzed under various models considering different adversarial actions. However, the analysis of group key exchange (GKE) protocols has not been as extensive as that of 2PKE protocols. In particular, an important security attribute called key compromise impersonation (KCI) resilience has been completely ignored for the case of GKE protocols. Informally, a protocol is said to provide KCI resilience if the compromise of the long-term secret key of a protocol participant A does not allow the adversary to impersonate an honest participant B to A. In this paper, we argue that KCI resilience for GKE protocols is at least as important as it is for 2PKE protocols. Our first contribution is revised definitions of security for GKE protocols considering KCI attacks by both outsider and insider adversaries. We also give a new proof of security for an existing two-round GKE protocol under the revised security definitions assuming random oracles. We then show how to achieve insider KCI resilience in a generic way using a known compiler in the literature. As one may expect, this additional security assurance comes at the cost of an extra round of communication. Finally, we show that a few existing protocols are not secure against outsider KCI attacks. The attacks on these protocols illustrate the necessity of considering KCI resilience for GKE protocols.

Relevance: 100.00%

Abstract:

Collaborative question answering (cQA) portals such as Yahoo! Answers allow users, as askers or answer authors, to communicate and exchange information through the asking and answering of questions in the network. In their current set-up, answers to a question are arranged in chronological order. For effective information retrieval, it would be advantageous to have the users’ answers ranked according to their quality. This paper proposes a novel approach to evaluating and ranking the users’ answers and recommending the top-n quality answers to information seekers. The proposed approach is based on a user-reputation method which assigns a score to an answer reflecting its author’s reputation level in the network. The proposed approach is evaluated on a dataset collected from a live cQA portal, namely Yahoo! Answers. To compare against the results obtained by the non-content-based user-reputation method, experiments were also conducted with several content-based methods that assign a score to an answer reflecting its content quality. Various combinations of non-content-based and content-based scores were also used in comparing results. Empirical analysis shows that the proposed method is able to rank the users’ answers and recommend the top-n answers with good accuracy. The proposed method outperforms the content-based methods, the various combinations, and the popular link analysis method HITS.
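The abstract does not define how reputation is computed or how scores are combined, so the sketch below only illustrates the general shape of such a ranking: each answer gets a score from its author's reputation, optionally blended with a content-quality score, and the top-n answers are returned. The linear blend, the `weight` parameter, the function name `rank_answers`, and all values are assumptions for this example.

```python
def rank_answers(answers, reputation, weight=1.0, top_n=3):
    """Return the top_n answers by a blended score, best first.

    weight=1.0 uses only the author's reputation (non-content-based);
    weight=0.0 uses only the answer's content score (content-based).
    """
    def score(answer):
        author_reputation = reputation.get(answer["author"], 0.0)
        return weight * author_reputation + (1.0 - weight) * answer["content_score"]

    return sorted(answers, key=score, reverse=True)[:top_n]


# Hypothetical answers to one question, with made-up scores in [0, 1].
answers = [
    {"id": "a1", "author": "u1", "content_score": 0.2},
    {"id": "a2", "author": "u2", "content_score": 0.9},
    {"id": "a3", "author": "u3", "content_score": 0.5},
]
reputation = {"u1": 0.9, "u2": 0.1, "u3": 0.5}

by_reputation = [a["id"] for a in rank_answers(answers, reputation, weight=1.0, top_n=2)]
by_content = [a["id"] for a in rank_answers(answers, reputation, weight=0.0, top_n=2)]
```

Varying `weight` between 0 and 1 gives one simple way to produce the kind of combined rankings the abstract mentions comparing.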