922 resultados para Tree data structures
Resumo:
Spatial data representation and compression has become a focus issue in computer graphics and image processing applications. Quadtrees, as one of hierarchical data structures, basing on the principle of recursive decomposition of space, always offer a compact and efficient representation of an image. For a given image, the choice of quadtree root node plays an important role in its quadtree representation and final data compression. The goal of this thesis is to present a heuristic algorithm for finding a root node of a region quadtree, which is able to reduce the number of leaf nodes when compared with the standard quadtree decomposition. The empirical results indicate that, this proposed algorithm has quadtree representation and data compression improvement when in comparison with the traditional method.
Resumo:
Abstract Big data nowadays is a fashionable topic, independently of what people mean when they use this term. But being big is just a matter of volume, although there is no clear agreement in the size threshold. On the other hand, it is easy to capture large amounts of data using a brute force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what is the right data and how much of it is needed. For some problems this would imply big data, but for the majority of the problems much less data will and is needed. In this talk we explore the trade-offs involved and the main problems that come with big data using the Web as case study: scalability, redundancy, bias, noise, spam, and privacy. Speaker Biography Ricardo Baeza-Yates Ricardo Baeza-Yates is VP of Research for Yahoo Labs leading teams in United States, Europe and Latin America since 2006 and based in Sunnyvale, California, since August 2014. During this time he has lead the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also part time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and before founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (in leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before he obtained two masters (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected for the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences and since 2010 is a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.
Resumo:
This paper presents a new technique and two algorithms to bulk-load data into multi-way dynamic metric access methods, based on the covering radius of representative elements employed to organize data in hierarchical data structures. The proposed algorithms are sample-based, and they always build a valid and height-balanced tree. We compare the proposed algorithm with existing ones, showing the behavior to bulk-load data into the Slim-tree metric access method. After having identified the worst case of our first algorithm, we describe adequate counteractions in an elegant way creating the second algorithm. Experiments performed to evaluate their performance show that our bulk-loading methods build trees faster than the sequential insertion method regarding construction time, and that it also significantly improves search performance. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Nach dem ersten, 2011 erschienenen Band versammelt der vorliegende zweite wiederum eine Auswahl von Beiträgen, die auf Analysen der Daten der Schweizer Längsschnittstudie TREE (Transitionen von der Erstausbildung ins Erwerbsleben) basieren. Die kritischen Übergänge im Jugend- und jungen Erwachsenenalter werden aus soziologischen, ökonomischen, psychologischen und erziehungswissenschaftlichen Blickwinkeln beleuchtet. Die Beiträge widerspiegeln damit eindrücklich das analytische und interdisziplinäre Potenzial der TREE-Daten. Thematisch steht der langfristige Einfluss der sozialen Herkunft auf Bildungs- und Erwerbsverläufe, insbesondere auf den Zugang zu höherer Bildung, im Zentrum.
Resumo:
Nowadays, Internet is a place where social networks have reached an important impact in collaboration among people over the world in different ways. This article proposes a new paradigm for building CSCW business tools following the novel ideas provided by the social web to collaborate and generate awareness. An implementation of these concepts is described, including the components we provide to collaborate in workspaces, (such as videoconference, chat, desktop sharing, forums or temporal events), and the way we generate awareness from these complex social data structures. Figures and validation results are also presented to stress that this architecture has been defined to support awareness generation via joining current and future social data from business and social networks worlds, based on the idea of using social data stored in the cloud.
Resumo:
Managing large medical image collections is an increasingly demanding important issue in many hospitals and other medical settings. A huge amount of this information is daily generated, which requires robust and agile systems. In this paper we present a distributed multi-agent system capable of managing very large medical image datasets. In this approach, agents extract low-level information from images and store them in a data structure implemented in a relational database. The data structure can also store semantic information related to images and particular regions. A distinctive aspect of our work is that a single image can be divided so that the resultant sub-images can be stored and managed separately by different agents to improve performance in data accessing and processing. The system also offers the possibility of applying some region-based operations and filters on images, facilitating image classification. These operations can be performed directly on data structures in the database.
Resumo:
Precise modeling of the program heap is fundamental for understanding the behavior of a program, and is thus of signiflcant interest for many optimization applications. One of the fundamental properties of the heap that can be used in a range of optimization techniques is the sharing relationships between the elements in an array or collection. If an analysis can determine that the memory locations pointed to by different entries of an array (or collection) are disjoint, then in many cases loops that traverse the array can be vectorized or transformed into a thread-parallel versión. This paper introduces several novel sharing properties over the concrete heap and corresponding abstractions to represent them. In conjunction with an existing shape analysis technique, these abstractions allow us to precisely resolve the sharing relations in a wide range of heap structures (arrays, collections, recursive data structures, composite heap structures) in a computationally efflcient manner. The effectiveness of the approach is evaluated on a set of challenge problems from the JOlden and SPECjvm98 suites. Sharing information obtained from the analysis is used to achieve substantial thread-level parallel speedups.
Resumo:
In this paper we describe Fénix, a data model for exchanging information between Natural Language Processing applications. The format proposed is intended to be flexible enough to cover both current and future data structures employed in the field of Computational Linguistics. The Fénix architecture is divided into four separate layers: conceptual, logical, persistence and physical. This division provides a simple interface to abstract the users from low-level implementation details, such as programming languages and data storage employed, allowing them to focus in the concepts and processes to be modelled. The Fénix architecture is accompanied by a set of programming libraries to facilitate the access and manipulation of the structures created in this framework. We will also show how this architecture has been already successfully applied in different research projects.
Resumo:
Thesis--University of Illinois at Urbana-Champaign.
Resumo:
"UIUCDCS-R-74-615"
Resumo:
A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area with higher resolution). A direct query, on the other side, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolutions. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real world datasets demonstrate that the query processing time based on the new multiresolution approaches is comparable and often better than multi-representation data structures for both types of queries.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
Resumo:
Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from either problems understanding of the data models that represent the real world systems, or their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map the information request and the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. The results indicate for the hypotheses associated with different representations but equivalent semantics that parsimonious data model participants performed better for component level tasks but that ontologically clearer data model participants performed better for record and aggregate level tasks.
Resumo:
Data refinements are refinement steps in which a program’s local data structures are changed. Data refinement proof obligations require the software designer to find an abstraction relation that relates the states of the original and new program. In this paper we describe an algorithm that helps a designer find an abstraction relation for a proposed refinement. Given sufficient time and space, the algorithm can find a minimal abstraction relation, and thus show that the refinement holds. As it executes, the algorithm displays mappings that cannot be in any abstraction relation. When the algorithm is not given sufficient resources to terminate, these mappings can help the designer find a suitable abstraction relation. The same algorithm can be used to test an abstraction relation supplied by the designer.
Resumo:
The increasing cost of developing complex software systems has created a need for tools which aid software construction. One area in which significant progress has been made is with the so-called Compiler Writing Tools (CWTs); these aim at automated generation of various components of a compiler and hence at expediting the construction of complete programming language translators. A number of CWTs are already in quite general use, but investigation reveals significant drawbacks with current CWTs, such as lex and yacc. The effective use of a CWT typically requires a detailed technical understanding of its operation and involves tedious and error-prone input preparation. Moreover, CWTs such as lex and yacc address only a limited aspect of the compilation process; for example, actions necessary to perform lexical symbol valuation and abstract syntax tree construction must be explicitly coded by the user. This thesis presents a new CWT called CORGI (COmpiler-compiler from Reference Grammar Input) which deals with the entire `front-end' component of a compiler; this includes the provision of necessary data structures and routines to manipulate them, both generated from a single input specification. Compared with earlier CWTs, CORGI has a higher-level and hence more convenient user interface, operating on a specification derived directly from a `reference manual' grammar for the source language. Rather than developing a compiler-compiler from first principles, CORGI has been implemented by building a further shell around two existing compiler construction tools, namely lex and yacc. CORGI has been demonstrated to perform efficiently in realistic tests, both in terms of speed and the effectiveness of its user interface and error-recovery mechanisms.