969 results for Data Representations


Relevance: 70.00%

Publisher:

Abstract:

There is a requirement for better integration between design and analysis tools, which is difficult due to their different objectives, separate data representations and workflows. Currently, substantial effort is required to produce a suitable analysis model from design geometry. Robust links are required between these different representations to enable analysis attributes to be transferred between different design and analysis packages for models at various levels of fidelity.

This paper describes a novel approach for integrating design and analysis models by identifying and managing the relationships between the different representations. Three key technologies, Cellular Modeling, Virtual Topology and Equivalencing, have been employed to achieve effective simulation model management. These technologies and their implementation are discussed in detail. Prototype automated tools are introduced demonstrating how multiple simulation models can be linked and maintained to facilitate seamless integration throughout the design cycle.

Relevance: 70.00%

Publisher:

Abstract:

Future digital signal processing (DSP) systems must be robust, at both the algorithm and the application level, to the reliability issues that accompany implementations in modern semiconductor process technologies. In this paper, we address this issue by investigating the impact of unreliable memories on general DSP systems. In particular, we propose a novel framework to characterize the effects of unreliable memories, which enables us to devise novel methods to mitigate the associated performance loss. We propose to deploy specifically designed data representations, which can substantially improve system reliability compared with conventional data representations used in digital integrated circuits, such as two's-complement or sign-magnitude number formats. To demonstrate the efficacy of the proposed framework, we analyze the impact of unreliable memories on coded communication systems and show that deploying optimized data representations substantially improves the error-rate performance of such systems.
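As a minimal illustration of why the number format matters under memory faults (this is not the optimized representation proposed in the paper), the sketch below stores a signed integer in 8 bits and measures the decoding error caused by flipping one stored bit. All helper names (`to_twos`, `from_twos`, `to_sign_mag`, `from_sign_mag`, `flip_error`) are hypothetical.

```python
def to_twos(value: int, bits: int) -> int:
    """Two's-complement bit pattern of a signed integer."""
    return value & ((1 << bits) - 1)

def from_twos(pattern: int, bits: int) -> int:
    """Decode a two's-complement bit pattern: the MSB carries weight -2^(bits-1)."""
    sign = 1 << (bits - 1)
    return (pattern & (sign - 1)) - (pattern & sign)

def to_sign_mag(value: int, bits: int) -> int:
    """Sign-magnitude bit pattern: the MSB is the sign, the rest is |value|."""
    sign = 1 << (bits - 1)
    return (sign if value < 0 else 0) | abs(value)

def from_sign_mag(pattern: int, bits: int) -> int:
    """Decode a sign-magnitude bit pattern."""
    sign = 1 << (bits - 1)
    magnitude = pattern & (sign - 1)
    return -magnitude if pattern & sign else magnitude

def flip_error(value, bit, bits, encode, decode):
    """Absolute decoding error after one stored bit is flipped (a memory fault)."""
    return abs(decode(encode(value, bits) ^ (1 << bit), bits) - value)
```

For the small negative value -1, flipping the most significant bit gives an error of 128 in two's complement (`0xFF` becomes `0x7F`, i.e. 127) but only 2 in sign-magnitude (-1 becomes +1); the worst case over all eight bits is 128 versus 64. Small-magnitude values are common in many DSP signals, which hints at why tailoring the representation to the fault model can pay off.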

Relevance: 70.00%

Publisher:

Abstract:

Effective data summarization methods that use AI techniques can help humans understand large sets of data. In this paper, we describe a knowledge-based method for automatically generating summaries of geospatial and temporal data, i.e., data with geographical and temporal references. The method is useful for summarizing data streams, such as GPS traces and traffic information, that are becoming more prevalent with the increasing use of sensors in computing devices. The method presented here is an initial architecture for our ongoing research in this domain. In this paper we describe the data representations we have designed for our method and our implementations of the components that perform data abstraction and natural language generation. We also discuss evaluation results that show the ability of our method to generate certain types of geospatial and temporal descriptions.

Relevance: 70.00%

Publisher:

Abstract:

The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Quickly obtaining the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
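The three Halstead metrics named above have standard definitions in terms of distinct operators (n1), distinct operands (n2), total operators (N1), and total operands (N2). The sketch below computes them, plus a weighted average over a query sample. How the authors tokenized end-user queries into operators and operands is not stated in the abstract, so the counts and weights in the example are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class HalsteadCounts:
    distinct_operators: int  # n1
    distinct_operands: int   # n2
    total_operators: int     # N1
    total_operands: int      # N2

def halstead(c: HalsteadCounts) -> dict:
    length = c.total_operators + c.total_operands            # N = N1 + N2
    vocabulary = c.distinct_operators + c.distinct_operands  # n = n1 + n2
    volume = length * math.log2(vocabulary)                  # V = N * log2(n)
    # D = (n1 / 2) * (N2 / n2)
    difficulty = (c.distinct_operators / 2) * (c.total_operands / c.distinct_operands)
    effort = difficulty * volume                             # E = D * V
    return {"length": length, "difficulty": difficulty, "effort": effort}

def weighted_average(values, weights):
    """Weighted mean, e.g. weighting each query's complexity by its frequency."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For a hypothetical query with n1 = 4, n2 = 4, N1 = 6, N2 = 6, `halstead` gives length 12, difficulty 3.0 and effort 108.0 (volume 12 * log2 8 = 36). Averaging per-query scores of 10 and 20 with weights 3 and 1 gives 12.5.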

Relevance: 60.00%

Publisher:

Abstract:

This paper presents an overview of the experiments conducted using the Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the document structure to extract the content used for clustering the XML documents. It also presents an experimental study using several data representations for clustering: structure-only, content-only, and both the structure and the content of the XML documents. Unlike previous years, this year the XML documents were marked up using Wiki tags and contain categories derived from the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.
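To make the three representations concrete, the sketch below derives structure-only, content-only, and combined feature vectors from one XML document with Python's standard `xml.etree.ElementTree`. It uses root-to-element tag paths as a simplified stand-in for the frequent subtrees that HCXC mines, so it illustrates the representations only, not the HCXC method; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def structure_features(root):
    """Structure-only: counts of root-to-element tag paths."""
    paths = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths

def content_features(root):
    """Content-only: bag of lower-cased whitespace tokens from all text nodes."""
    words = Counter()
    for text in root.itertext():
        words.update(text.lower().split())
    return words

def combined_features(root):
    """Structure and content in one vector; prefixes keep the key spaces apart."""
    feats = Counter({f"S:{k}": v for k, v in structure_features(root).items()})
    feats.update({f"C:{k}": v for k, v in content_features(root).items()})
    return feats

doc = ET.fromstring(
    "<article><title>XML mining</title><body><p>clustering XML</p></body></article>"
)
```

Here the structure vector has four paths (`/article`, `/article/title`, `/article/body`, `/article/body/p`), the content vector counts `xml` twice, and the combined vector has seven entries. Any of the three can then be fed to the same clustering algorithm, which is what makes comparing representations straightforward.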

Relevance: 60.00%

Publisher:

Abstract:

The digital humanities are growing rapidly in response to a rise in Internet use. What humanists mostly work on, and which forms much of the contents of our growing repositories, are digital surrogates of originally analog artefacts. But is the data model upon which many of those surrogates are based – embedded markup – adequate for the task? Or does it in fact inhibit reusability and flexibility? To enhance interoperability of resources and tools, some changes to the standard markup model are needed. Markup could be removed from the text and stored in standoff form. The versions of which many cultural heritage texts are composed could also be represented externally, and computed automatically. These changes would not disrupt existing data representations, which could be imported without significant data loss. They would also enhance automation and ease the increasing burden on the modern digital humanist.
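The standoff idea can be sketched as follows: the base text stays free of tags, annotations refer to it by character offsets, and embedded markup is generated on demand. The `Annotation` record and `render_embedded` function below are hypothetical illustrations, not a standard such as TEI, and they assume properly nested, non-empty spans, since embedded markup cannot express overlapping ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset; assumed end > start
    tag: str

def render_embedded(text: str, annotations: List[Annotation]) -> str:
    """Serialize standoff annotations as embedded tags (nested spans only)."""
    opens: Dict[int, List[Tuple[int, Annotation]]] = {}
    closes: Dict[int, List[Tuple[int, Annotation]]] = {}
    for idx, a in enumerate(annotations):
        opens.setdefault(a.start, []).append((idx, a))
        closes.setdefault(a.end, []).append((idx, a))
    out: List[str] = []
    for i in range(len(text) + 1):
        # Close inner spans (later start) before outer ones.
        for _, a in sorted(closes.get(i, []), key=lambda p: (-p[1].start, -p[0])):
            out.append(f"</{a.tag}>")
        # Open outer spans (later end) before inner ones.
        for _, a in sorted(opens.get(i, []), key=lambda p: (-p[1].end, p[0])):
            out.append(f"<{a.tag}>")
        if i < len(text):
            out.append(text[i])
    return "".join(out)
```

For example, `render_embedded("Hello world", [Annotation(0, 11, "p"), Annotation(6, 11, "em")])` yields `<p>Hello <em>world</em></p>`, while the string `"Hello world"` itself is never modified. Other annotation layers can be kept in separate lists over the same text and rendered independently.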

Relevance: 60.00%

Publisher:

Abstract:

This chapter argues for the need to restructure children’s statistical experiences from the beginning years of formal schooling. The ability to understand and apply statistical reasoning is paramount across all walks of life, as seen in the variety of graphs, tables, diagrams, and other data representations requiring interpretation. Young children are immersed in our data-driven society, with early access to computer technology and daily exposure to the mass media. With the rate of data proliferation have come increased calls for advancing children’s statistical reasoning abilities, commencing with the earliest years of schooling (e.g., Langrall et al. 2008; Lehrer and Schauble 2005; Shaughnessy 2010; Whitin and Whitin 2011). Several articles (e.g., Franklin and Garfield 2006; Langrall et al. 2008) and policy documents (e.g., National Council of Teachers of Mathematics 2006) have highlighted the need for a renewed focus on this component of early mathematics learning, with children working mathematically and scientifically in dealing with real-world data. One approach to this component in the beginning school years is through data modelling (English 2010; Lehrer and Romberg 1996; Lehrer and Schauble 2000, 2007)...

Relevance: 60.00%

Publisher:

Abstract:

The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than are incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.
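Hash-consing, mentioned above as one way to keep duplicated type information tractable, stores each structurally distinct term exactly once, so repeated copies of a type share one node. The sketch below is a generic illustration, not CIL's implementation; the `TypeNode` and `HashConser` classes and the constructor names are hypothetical.

```python
from typing import Dict, Tuple

class TypeNode:
    """An immutable type term; create instances only through HashConser.mk."""
    __slots__ = ("constructor", "args", "uid")

    def __init__(self, constructor: str, args: Tuple["TypeNode", ...], uid: int):
        self.constructor = constructor
        self.args = args
        self.uid = uid  # unique id assigned when the node is first created

    def __repr__(self) -> str:
        if not self.args:
            return self.constructor
        return f"{self.constructor}({', '.join(map(repr, self.args))})"

class HashConser:
    def __init__(self) -> None:
        self._table: Dict[Tuple[str, Tuple[int, ...]], TypeNode] = {}

    def mk(self, constructor: str, *args: TypeNode) -> TypeNode:
        # Children are already canonical, so their uids identify them and the
        # lookup key is built in O(arity) without traversing whole subterms.
        key = (constructor, tuple(a.uid for a in args))
        node = self._table.get(key)
        if node is None:
            node = TypeNode(constructor, args, len(self._table))
            self._table[key] = node
        return node

    def __len__(self) -> int:
        return len(self._table)
```

Building `h.mk("->", h.mk("int"), h.mk("int"))` twice returns the same object, and the table holds only two nodes (`int` and `->(int, int)`). Equality checks between shared terms also reduce to identity checks, which is part of why hash-consing pairs well with a representation that duplicates program fragments.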

Relevance: 60.00%

Publisher:

Abstract:

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, such an approach generates complex constructs of data, which in turn require intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach, activities are modelled as collections of clusters built on different subsets of features. Classification is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Evaluation of the proposed methodology demonstrated that the cluster-based ensemble method is a viable option for activity recognition. Results on data collected from a range of activities indicated that the ensemble method achieved accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
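A minimal sketch of the classification step described above, assuming one centroid per activity in each feature subset and a simple majority vote across collections. The paper's clusters may be built differently (for example, several clusters per activity), and the sensor data, feature subsets, and activity labels below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(X, y, subsets):
    """Build one collection per feature subset: a centroid per activity label."""
    model = []
    for subset in subsets:
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append([row[i] for i in subset])
        model.append((tuple(subset), {lab: centroid(rows) for lab, rows in groups.items()}))
    return model

def predict(model, x):
    """Each collection votes for the activity of its nearest cluster; majority wins."""
    votes = Counter()
    for subset, centroids in model:
        proj = [x[i] for i in subset]
        nearest = min(centroids, key=lambda lab: math.dist(proj, centroids[lab]))
        votes[nearest] += 1
    return votes.most_common(1)[0][0]

# Hypothetical binary sensor readings (1 = sensor fired during the activity).
X = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 0, 0]]
y = ["cook", "cook", "sleep", "sleep"]
model = fit(X, y, subsets=[(0, 1), (2, 3), (0, 3)])
```

With this toy model, `predict(model, [1, 0, 1, 0])` returns `"cook"` and `predict(model, [0, 1, 0, 1])` returns `"sleep"`. Training each collection on a different feature subset gives the ensemble its diversity, so an unreliable sensor affects only some of the votes.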

Relevance: 60.00%

Publisher:

Abstract:

Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.
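The sketch below illustrates the general idea of incremental, unsupervised discretization of one feature: equal-width bins (a static procedure) are added one at a time until a relevance-style criterion is satisfied. The stopping rule used here, quantization error below a fraction `tol` of the feature's variance, is a hypothetical stand-in, not the criterion proposed in the paper, and all function names are illustrative.

```python
import bisect
from collections import defaultdict

def equal_width_edges(values, n_bins):
    """Interior cut points for n_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]

def quantization_mse(values, codes):
    """Mean squared error when each value is replaced by the mean of its bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, codes):
        sums[c] += v
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return sum((v - means[c]) ** 2 for v, c in zip(values, codes)) / len(values)

def incremental_discretize(values, max_bins=16, tol=0.01):
    """Add equal-width bins until the quantization error falls below
    tol * variance (hypothetical stopping rule); returns (n_bins, codes)."""
    variance = quantization_mse(values, [0] * len(values))  # one bin = variance
    best = (1, [0] * len(values))
    for n_bins in range(2, max_bins + 1):
        edges = equal_width_edges(values, n_bins)
        codes = [bisect.bisect_right(edges, v) for v in values]
        best = (n_bins, codes)
        if quantization_mse(values, codes) <= tol * variance:
            break
    return best
```

For the bimodal feature `[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]`, two bins already bring the error well below 1% of the variance, so the function returns `(2, [0, 0, 0, 1, 1, 1])`. A supervised variant would swap the error criterion for one that uses class labels, and a wrapper variant would score each candidate discretization with a classifier instead.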

Relevance: 60.00%

Publisher:

Abstract:

Graduate Program in Computer Science - IBILCE

Relevance: 60.00%

Publisher:

Abstract:

This thesis is based on five papers addressing variance reduction in different ways. The papers have in common that they all present new numerical methods. Paper I investigates quantitative structure-retention relationships from an image processing perspective, using an artificial neural network to preprocess three-dimensional structural descriptions of the studied steroid molecules. Paper II presents a new method for computing free energies. Free energy is the quantity that determines chemical equilibria and partition coefficients. The proposed method may be used for estimating, e.g., chromatographic retention without performing experiments. Two papers (III and IV) deal with correcting deviations from bilinearity by so-called peak alignment. Bilinearity is a theoretical assumption about the distribution of instrumental data that is often violated by measured data. Deviations from bilinearity lead to increased variance, both in the data and in inferences from the data, unless invariance to the deviations is built into the model, e.g., by the use of the method proposed in paper III and extended in paper IV. Paper V addresses a generic problem in classification, namely how to measure the goodness of different data representations so that the best classifier may be constructed. Variance reduction is one of the pillars on which analytical chemistry rests. This thesis considers two aspects of variance reduction: before and after experiments are performed. Before experimenting, theoretical predictions of experimental outcomes may be used to direct which experiments to perform and how to perform them (papers I and II). After experiments are performed, the variance of inferences from the measured data is affected by the method of data analysis (papers III-V).
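Peak alignment in its simplest form shifts a measured profile so that its peaks line up with a reference. The sketch below uses a generic technique, an integer shift that maximizes the cross-correlation with the reference; it is not the method proposed in papers III or IV, and the function names and signals are hypothetical.

```python
def shift(signal, k):
    """Shift a signal right by k samples (left if k < 0), padding with zeros."""
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        j = i - k
        if 0 <= j < n:
            out[i] = signal[j]
    return out

def cross_correlation(a, b):
    """Unnormalized cross-correlation at zero lag (inner product)."""
    return sum(x * y for x, y in zip(a, b))

def align(reference, signal, max_shift):
    """Return the shift in [-max_shift, max_shift] that best matches the reference,
    together with the shifted signal."""
    best = max(
        range(-max_shift, max_shift + 1),
        key=lambda k: cross_correlation(reference, shift(signal, k)),
    )
    return best, shift(signal, best)

reference = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
measured = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0]  # same peak, two samples late
```

Here `align(reference, measured, 3)` finds a shift of -2 and returns a profile identical to the reference. Real chromatographic data usually need non-uniform (warping) alignment, since retention-time deviations vary along the run, which a single global shift cannot capture.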

Relevance: 60.00%

Publisher:

Abstract:

© 2014 Cises. This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

Relevance: 40.00%

Publisher:

Abstract:

The aim of this paper is to demonstrate the validity of using Gaussian mixture models (GMMs) for representing probability distributions in a decentralised data fusion (DDF) framework. GMMs are a powerful and compact stochastic representation allowing efficient communication of feature properties in large-scale decentralised sensor networks. It will be shown that GMMs provide a basis for analytical solutions to the update and prediction operations of general Bayesian filtering. Furthermore, a variant of the Covariance Intersection algorithm for Gaussian mixtures will be presented, ensuring a conservative update for the fusion of correlated information between two nodes in the network. In addition, purely visual sensory data will be used to show that decentralised data fusion and tracking of non-Gaussian states observed by multiple autonomous vehicles is feasible.
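For reference, the sketch below implements standard Covariance Intersection for two single 2-D Gaussian estimates, the building block that the paper extends to Gaussian mixtures (the mixture variant itself is not shown). It fuses P^-1 = w Pa^-1 + (1 - w) Pb^-1 and x = P (w Pa^-1 xa + (1 - w) Pb^-1 xb), choosing w from a uniform grid to minimize trace(P). The trace criterion and the grid search are common choices assumed here, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def inv2(m: Matrix) -> Matrix:
    """Invert a 2x2 matrix (assumed non-singular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lincomb2(wa: float, ma: Matrix, wb: float, mb: Matrix) -> Matrix:
    """Return wa * ma + wb * mb for 2x2 matrices."""
    return [[wa * ma[i][j] + wb * mb[i][j] for j in range(2)] for i in range(2)]

def matvec2(m: Matrix, v: Vector) -> Vector:
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(xa, Pa, xb, Pb, steps=100):
    """Fuse two estimates with unknown cross-correlation; returns (w, x, P)."""
    Ia, Ib = inv2(Pa), inv2(Pb)                # information matrices
    ya, yb = matvec2(Ia, xa), matvec2(Ib, xb)  # information vectors
    best = None
    for k in range(steps + 1):
        w = k / steps
        P = inv2(lincomb2(w, Ia, 1 - w, Ib))
        trace = P[0][0] + P[1][1]
        if best is None or trace < best[0]:
            x = matvec2(P, [w * ya[0] + (1 - w) * yb[0], w * ya[1] + (1 - w) * yb[1]])
            best = (trace, w, x, P)
    _, w, x, P = best
    return w, x, P
```

Fusing xa = (0, 0) with Pa = diag(1, 4) and xb = (1, 1) with Pb = diag(4, 1) selects w = 0.5 and gives x = (0.2, 0.8) with P = diag(1.6, 1.6). Each fused variance is larger than the naive independent-fusion value of 0.8, which is the conservatism that makes the update safe when the nodes' estimates share information.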