275 resultados para Computer Science, Interdisciplinary Applications
Resumo:
Non-linear methods for estimating variability in time-series are currently of widespread use. Among such methods are approximate entropy (ApEn) and sample approximate entropy (SampEn). The applicability of ApEn and SampEn in analyzing data is evident and their use is increasing. However, consistency is a point of concern in these tools, i.e., the classification of the temporal organization of a data set might indicate a relative less ordered series in relation to another when the opposite is true. As highlighted by their proponents themselves, ApEn and SampEn might present incorrect results due to this lack of consistency. In this study, we present a method which gains consistency by using ApEn repeatedly in a wide range of combinations of window lengths and matching error tolerance. The tool is called volumetric approximate entropy, vApEn. We analyze nine artificially generated prototypical time-series with different degrees of temporal order (combinations of sine waves, logistic maps with different control parameter values, random noises). While ApEn/SampEn clearly fail to consistently identify the temporal order of the sequences, vApEn correctly do. In order to validate the tool we performed shuffled and surrogate data analysis. Statistical analysis confirmed the consistency of the method. (C) 2008 Elsevier Ltd. All rights reserved.
Resumo:
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N (2)) only if the complete set is simultaneously displayed.
Resumo:
While watching TV, viewers use the remote control to turn the TV set on and off, change channel and volume, to adjust the image and audio settings, etc. Worldwide, research institutes collect information about audience measurement, which can also be used to provide personalization and recommendation services, among others. The interactive digital TV offers viewers the opportunity to interact with interactive applications associated with the broadcast program. Interactive TV infrastructure supports the capture of the user-TV interaction at fine-grained levels. In this paper we propose the capture of all the user interaction with a TV remote control-including short term and instant interactions: we argue that the corresponding captured information can be used to create content pervasively and automatically, and that this content can be used by a wide variety of services, such as audience measurement, personalization and recommendation services. The capture of fine grained data about instant and interval-based interactions also allows the underlying infrastructure to offer services at the same scale, such as annotation services and adaptative applications. We present the main modules of an infrastructure for TV-based services, along with a detailed example of a document used to record the user-remote control interaction. Our approach is evaluated by means of a proof-of-concept prototype which uses the Brazilian Digital TV System, the Ginga-NCL middleware.
Resumo:
The evolution of commodity computing lead to the possibility of efficient usage of interconnected machines to solve computationally-intensive tasks, which were previously solvable only by using expensive supercomputers. This, however, required new methods for process scheduling and distribution, considering the network latency, communication cost, heterogeneous environments and distributed computing constraints. An efficient distribution of processes over such environments requires an adequate scheduling strategy, as the cost of inefficient process allocation is unacceptably high. Therefore, a knowledge and prediction of application behavior is essential to perform effective scheduling. In this paper, we overview the evolution of scheduling approaches, focusing on distributed environments. We also evaluate the current approaches for process behavior extraction and prediction, aiming at selecting an adequate technique for online prediction of application execution. Based on this evaluation, we propose a novel model for application behavior prediction, considering chaotic properties of such behavior and the automatic detection of critical execution points. The proposed model is applied and evaluated for process scheduling in cluster and grid computing environments. The obtained results demonstrate that prediction of the process behavior is essential for efficient scheduling in large-scale and heterogeneous distributed environments, outperforming conventional scheduling policies by a factor of 10, and even more in some cases. Furthermore, the proposed approach proves to be efficient for online predictions due to its low computational cost and good precision. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
This paper is about the use of natural language to communicate with computers. Most researches that have pursued this goal consider only requests expressed in English. A way to facilitate the use of several languages in natural language systems is by using an interlingua. An interlingua is an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua [universal networking language (UNL)] and to execute these requests using software components. In order to achieve this goal, we propose OntoMap, an ontology-based architecture to perform the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.
Resumo:
Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering: it is sponsored by ACM by means of the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering, which are directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.
Resumo:
The pervasive and ubiquitous computing has motivated researches on multimedia adaptation which aims at matching the video quality to the user needs and device restrictions. This technique has a high computational cost which needs to be studied and estimated when designing architectures and applications. This paper presents an analytical model to quantify these video transcoding costs in a hardware independent way. The model was used to analyze the impact of transcoding delays in end-to-end live-video transmissions over LANs, MANs and WANs. Experiments confirm that the proposed model helps to define the best transcoding architecture for different scenarios.
Resumo:
Modern database applications are increasingly employing database management systems (DBMS) to store multimedia and other complex data. To adequately support the queries required to retrieve these kinds of data, the DBMS need to answer similarity queries. However, the standard structured query language (SQL) does not provide effective support for such queries. This paper proposes an extension to SQL that seamlessly integrates syntactical constructions to express similarity predicates to the existing SQL syntax and describes the implementation of a similarity retrieval engine that allows posing similarity queries using the language extension in a relational DBM. The engine allows the evaluation of every aspect of the proposed extension, including the data definition language and data manipulation language statements, and employs metric access methods to accelerate the queries. Copyright (c) 2008 John Wiley & Sons, Ltd.
Resumo:
Techniques devoted to generating triangular meshes from intensity images either take as input a segmented image or generate a mesh without distinguishing individual structures contained in the image. These facts may cause difficulties in using such techniques in some applications, such as numerical simulations. In this work we reformulate a previously developed technique for mesh generation from intensity images called Imesh. This reformulation makes Imesh more versatile due to an unified framework that allows an easy change of refinement metric, rendering it effective for constructing meshes for applications with varied requirements, such as numerical simulation and image modeling. Furthermore, a deeper study about the point insertion problem and the development of geometrical criterion for segmentation is also reported in this paper. Meshes with theoretical guarantee of quality can also be obtained for each individual image structure as a post-processing step, a characteristic not usually found in other methods. The tests demonstrate the flexibility and the effectiveness of the approach.
Resumo:
This paper aims at identifying some of the key factors in adopting an organization-wide software reuse program. The factors are derived from practical experience reported by industry professionals, through a survey involving 57 Brazilian small, medium and large software organizations. Some of them produce software with commonality between applications, and have mature processes, while others successfully achieved reuse through isolated, ad hoe efforts. The paper compiles the answers from the survey participants, showing which factors were more associated with reuse success. Based on this relationship, a guide is presented, pointing out which factors should be more strongly considered by small, medium and large organizations attempting to establish a reuse program. (C) 2007 Elsevier Inc. All rights reserved.
Resumo:
The literature reports research efforts allowing the editing of interactive TV multimedia documents by end-users. In this article we propose complementary contributions relative to end-user generated interactive video, video tagging, and collaboration. In earlier work we proposed the watch-and-comment (WaC) paradigm as the seamless capture of an individual`s comments so that corresponding annotated interactive videos be automatically generated. As a proof of concept, we implemented a prototype application, the WACTOOL, that supports the capture of digital ink and voice comments over individual frames and segments of the video, producing a declarative document that specifies both: different media stream structure and synchronization. In this article, we extend the WaC paradigm in two ways. First, user-video interactions are associated with edit commands and digital ink operations. Second, focusing on collaboration and distribution issues, we employ annotations as simple containers for context information by using them as tags in order to organize, store and distribute information in a P2P-based multimedia capture platform. We highlight the design principles of the watch-and-comment paradigm, and demonstrate related results including the current version of the WACTOOL and its architecture. We also illustrate how an interactive video produced by the WACTOOL can be rendered in an interactive video environment, the Ginga-NCL player, and include results from a preliminary evaluation.
Resumo:
Species` potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species` potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Resumo:
Credit scoring modelling comprises one of the leading formal tools for supporting the granting of credit. Its core objective consists of the generation of a score by means of which potential clients can be listed in the order of the probability of default. A critical factor is whether a credit scoring model is accurate enough in order to provide correct classification of the client as a good or bad payer. In this context the concept of bootstraping aggregating (bagging) arises. The basic idea is to generate multiple classifiers by obtaining the predicted values from the fitted models to several replicated datasets and then combining them into a single predictive classification in order to improve the classification accuracy. In this paper we propose a new bagging-type variant procedure, which we call poly-bagging, consisting of combining predictors over a succession of resamplings. The study is derived by credit scoring modelling. The proposed poly-bagging procedure was applied to some different artificial datasets and to a real granting of credit dataset up to three successions of resamplings. We observed better classification accuracy for the two-bagged and the three-bagged models for all considered setups. These results lead to a strong indication that the poly-bagging approach may promote improvement on the modelling performance measures, while keeping a flexible and straightforward bagging-type structure easy to implement. (C) 2011 Elsevier Ltd. All rights reserved.