912 results for Database Query
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access to heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High-Performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity: that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution that investigates the extents of the meta-data constructs of component schemas; this methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Abstract:
Query processing is a commonly performed procedure and a vital and integral part of information processing. It is therefore important for information processing applications to continuously improve the accessibility of data sources as well as the ability to perform queries on those data sources. It is well known that the relational database model and the Structured Query Language (SQL) are currently the most popular tools for implementing and querying databases. However, a certain level of expertise is needed to use SQL and to access relational databases. This study presents a semantic modeling approach that enables the average user to access and query existing relational databases without concern for the database's structure or technicalities. The method includes an algorithm to represent relational database schemas in a more semantically rich way; the result is a semantic view of the relational database. The user performs queries using an adapted version of SQL, namely Semantic SQL. This method substantially reduces the size and complexity of queries. Additionally, it shortens the database application development cycle and improves maintenance and reliability by reducing the size of application programs. Furthermore, a Semantic Wrapper tool illustrating the semantic wrapping method is presented. I further extend the use of this semantic wrapping method to heterogeneous database management. Relational databases, object-oriented databases and Internet data sources are considered to be part of the heterogeneous database environment. Semantic schemas resulting from the algorithm were employed to describe the structure of these data sources in a uniform way, and Semantic SQL was utilized to query the various data sources. As a result, this method provides users with the ability to access and query heterogeneous database systems in a more natural way.
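To make the claimed size reduction concrete, here is an illustrative sketch: the first query spells out the joins that standard SQL requires, while the second shows how the same request might look against a semantic view, where relationships become path expressions. The table names and the exact Semantic SQL syntax are assumptions for illustration, not taken from the Semantic Wrapper described above.

```python
# Illustrative sketch only: the table names and the exact Semantic SQL
# syntax are assumptions, not taken from the Semantic Wrapper above.

# A conventional SQL query spanning three tables must spell out
# every join explicitly:
relational_sql = """
    SELECT s.name, c.title
    FROM student s
    JOIN enrollment e ON e.student_id = s.id
    JOIN course c     ON c.id = e.course_id
    WHERE c.term = 'Fall'
"""

# Against a semantic view of the same schema, the student-course
# relationship is a named attribute, so the query collapses into a
# short path expression -- the size reduction the abstract refers to:
semantic_sql = """
    SELECT name, enrolled_in.title
    FROM Student
    WHERE enrolled_in.term = 'Fall'
"""
```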
Abstract:
With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. In this dissertation, focusing on three of the essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed, with a video database and an image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from the three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap is to intelligently and automatically model mid-level representations and/or semi-semantic descriptors in addition to extracting low-level media features. The data organization challenge is mainly addressed by media indexing, where various levels of indexing are required to support diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support user interaction and to effectively model users' perception from feedback at both the image level and the object level.
Abstract:
In his discussion - Database As A Tool For Hospitality Management - William O'Brien, Assistant Professor, School of Hospitality Management at Florida International University, offers at the outset: "Database systems offer sweeping possibilities for better management of information in the hospitality industry. The author discusses what such systems are capable of accomplishing." The author opens with a bit of background on database system development, which also lends an impression as to the complexion of the rest of the article; it's a shade technical. "In early 1981, Ashton-Tate introduced dBase II. It was the first microcomputer database management processor to offer relational capabilities and a user-friendly query system combined with a fast, convenient report writer," O'Brien informs. "When 16-bit microcomputers such as the IBM PC series were introduced late the following year, more powerful database products followed: dBase III, Friday!, and Framework. The effect on the entire business community, and the hospitality industry in particular, has been remarkable," he further offers with his informed outlook. Professor O'Brien offers a few anecdotal situations to illustrate how much a comprehensive database system means to a hospitality operation, especially when billing is involved. Although attitudes about computer systems, as well as the systems themselves, have changed since this article was written, there is pertinent, fundamental information to be gleaned. Regarding the loss of the personal touch when a customer is engaged with a computer system, O'Brien says, "A modern data processing system should not force an employee to treat valued customers as numbers…" He also cautions, "Any computer system that decreases the availability of the personal touch is simply unacceptable." On a system's ability to process information, O'Brien suggests that in the past businesses were so enamored with just having an automated system that they failed to take full advantage of its capabilities; a lot of savings, in time and money, went unnoticed and/or under-appreciated. Today, everyone has an integrated system, and the wise business manager is the one who takes full advantage of all his resources. O'Brien invokes the 80/20 rule, and offers, "…the last 20 percent of results costs 80 percent of the effort. But times have changed. Everyone is automating data management, so that last 20 percent that could be ignored a short time ago represents a significant competitive differential." The evolution of data systems takes center stage for much of the article; pitfalls also emerge.
Abstract:
In their discussion - Database System for Alumni Tracking - Steven Moll, Associate Professor, and William O'Brien, Assistant Professor, School of Hospitality Management at Florida International University, initially state: "The authors describe a unique database program which was created to solve problems associated with tracking hospitality majors subsequent to graduation." "…and please, whatever you do, keep in touch with your school; join an alum' organization. It is a great way to engage the resources of your school to help further your career," says Professor Claudia Castillo in addressing a group of students attending her Life after College seminar on 9/18/2009. This is a very good point, and it is obviously germane to the article at hand. "One of the greatest strengths of a hospitality management school, a strength that grows with each passing year, is its body of alumni," say the authors. "Whether in recruiting new students or placing graduates, whether in fund raising or finding scholarship recipients, whatever the task, the network of loyal alumni stands ready to help." The caveat is that these resources are only available if students and school, faculty and alumni can keep track of each other, say Professors Moll and O'Brien. The authors want you to know that the practice is now considered essential to success, especially in the hospitality industry, where the fluid nature of the business makes networking de rigueur to accomplishment. "When the world was a smaller, slower place, it was fairly easy for graduates to keep track of each other; there weren't that many graduates and they didn't move that often," say the authors. "Now the hospitality graduate enters an international job market and may move five times in the first four years of employment," they continue. In the contemporary atmosphere, linking human resources from institution to marketplace is relatively easy to do. "How can an association keep track of its graduates? There are many techniques, but all of them depend upon adequate recordkeeping," Moll and O'Brien answer their own query. "A few years ago that would have meant a group of secretaries; today it means a database system," they say. Moll and O'Brien discuss the essentials of compiling and programming such a comprehensive database: the body of information to include, guidelines on the problems encountered, and how to avoid the pitfalls. They use the Florida International University hospitality database as the template for their example.
Abstract:
Over the past five years, XML has been embraced by both the research and industrial communities due to its promising prospects as a new data representation and exchange format on the Internet. The widespread popularity of XML creates an increasing need to store XML data in persistent storage systems and to enable sophisticated XML queries over the data. The currently available approaches to XML storage and retrieval are limited: native approaches are not yet mature, while non-native approaches such as the relational database approach suffer from inflexibility, heavy fragmentation and excessive join operations. In this dissertation, I studied the issue of storing and retrieving XML data using the Semantic Binary Object-Oriented Database System (Sem-ODB), to leverage the advanced Sem-ODB technology with the emerging XML data model. First, a meta-schema based approach was implemented to address the data model mismatch issue that is inherent in non-native approaches. The meta-schema based approach captures the meta-data of both Document Type Definitions (DTDs) and Sem-ODB Semantic Schemas, and thus enables a dynamic and flexible mapping scheme. Second, a formal framework was presented to ensure precise and concise mappings; in this framework, both the schemas and the conversions between them are formally defined and described. Third, after the major features of an XML query language, XQuery, were analyzed, a high-level XQuery to Semantic SQL (Sem-SQL) query translation scheme was described. This translation scheme takes advantage of the navigation-oriented query paradigm of Sem-SQL and thus avoids the excessive join problem of relational approaches. Finally, the modeling capability of the Semantic Binary Object-Oriented Data Model (Sem-ODM) was explored from the perspective of conceptually modeling an XML Schema using a Semantic Schema. It was revealed that the advanced features of the Sem-ODB, such as multi-valued attributes, surrogates and the navigation-oriented query paradigm, are indeed beneficial in coping with the XML storage and retrieval issue using a non-XML approach. Furthermore, extensions to the Sem-ODB to make it work more effectively with XML data were also proposed.
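As a rough illustration of the meta-schema idea (not the actual Sem-ODB implementation), the sketch below treats both a DTD fragment and a Semantic Schema as plain meta-data, so that the mapping between them is itself data; all names are invented for the example.

```python
# A rough sketch of the meta-schema idea; all names are invented and
# this is not the Sem-ODB implementation itself.

# Meta-data extracted from a DTD fragment:
#   <!ELEMENT book (title, author+)>
dtd_meta = {
    "book":   {"children": [("title", "1"), ("author", "+")]},
    "title":  {"content": "#PCDATA"},
    "author": {"content": "#PCDATA"},
}

# Meta-data for a target Semantic Schema: one category with a
# single-valued attribute and a multi-valued relation (matching '+').
semantic_meta = {
    "Book": {
        "attributes": {"title": "single"},
        "relations":  {"author": "multi"},
    },
}

# Because the element-to-category mapping is itself data, it can be
# revised at run time instead of being hard-wired into table layouts --
# the flexibility the abstract contrasts with relational shredding.
mapping = {"book": "Book", "title": "Book.title", "author": "Book.author"}
```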
Abstract:
The paper addresses issues related to the design of a graphical query mechanism that can act as an interface to any object-oriented database system (OODBS) in general, and to the object model of ODMG 2.0 in particular. The paper gives a brief survey of related work and proposes an analysis methodology that allows the evaluation of such languages. Moreover, the user's view level of a new graphical query language for ODMG 2.0, namely GOQL (Graphical Object Query Language), is presented. The user's view level provides a graphical schema that does not contain any of the perplexing details of an object-oriented database schema, and it also provides a foundation for a graphical interface that can support ad-hoc queries for object-oriented database applications. We illustrate the user's view level of GOQL using an example.
Abstract:
Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges, and each edge is labeled with a semantic annotation; hence, a single huge graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with the predicate. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and the graph queries of other graph DBMSs can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models cover a practically important subset of the SPARQL query language augmented with some additional useful features. The first model, called Substitution Importance Query (SIQ), identifies the top-k answers whose scores are calculated from the properties of the matched vertices in each answer, in accordance with a user-specified notion of importance. The second model, called Vertex Importance Query (VIQ), identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. The third model, Approximate Importance Query (AIQ), allows partial and inexact matchings and returns the top-k of them under user-specified approximation terms and scoring functions. In the fourth model, called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped, and other blocks that can be opportunistically mapped. A probability is calculated from various aspects of an answer, such as the number of mapped blocks and the vertices' properties in each block, and the top-k most probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers, irrespective of whether they are vertices or substitutions, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used to answer SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that they are far more efficient than popular triple stores.
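For readers unfamiliar with the baseline the first model generalizes, the sketch below shows a plain SPARQL top-k query with ORDER BY and LIMIT, run here via Python's rdflib; the data file, namespace and property names are invented for the example.

```python
# A hedged illustration of the baseline the SIQ model generalizes:
# standard SPARQL with ORDER BY and LIMIT, run via rdflib. The data
# file, namespace and property names are invented for the example.
from rdflib import Graph

g = Graph()
g.parse("social_network.ttl", format="turtle")  # hypothetical dataset

# Find pairs of people who know each other, ranked by the friend's
# follower count, keeping only the top k = 10 answers.
top_k = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person ?friend ?followers
    WHERE {
        ?person ex:knows ?friend .
        ?friend ex:followers ?followers .
    }
    ORDER BY DESC(?followers)
    LIMIT 10
""")

for row in top_k:
    print(row.person, row.friend, row.followers)
```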
Abstract:
The goals of this thesis are a comparative study of several non-relational DBMSs and a comparison of different logical and physical modeling solutions for non-relational databases. Using two non-relational document-based DBMSs, MongoDB and CouchDB, and one relational DBMS, Oracle, we analyze several solutions for the logical modeling of data in document databases and study the choice of the attributes on which to build indexes. First, a simple case study is defined on which to carry out the comparison, based on two entities in a 1:N relationship, over which a suitable workload is built. Non-relational databases are schema-less, with no fixed schema, so there is greater modeling freedom. In this thesis the data are modeled according to the Referencing and Embedding techniques, which consist, respectively, of inserting a key (reference) or an entire sub-document (embedding) inside a document in order to express the concept of a relationship between different entities (see the sketch below). To study whether indexing an attribute is worthwhile, each entity is then composed of two identical triplets of attributes defined with different levels of selectivity, the difference being that an index is built on each attribute of the second triplet. The workload consists of queries defined so as to test the different modeling choices, including join predicates, which are not usually contemplated in document models. For each database the queries are executed and the times recorded, so that the performance of the different DBMSs can be compared on the basis of CRUD operations.
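A minimal sketch of the two techniques, assuming a hypothetical author/post 1:N pair and using pymongo against a local MongoDB instance; the thesis's own case-study entities and workload are not reproduced here.

```python
# A minimal sketch of Referencing vs. Embedding with pymongo against a
# local MongoDB instance. The author/post 1:N pair and all field names
# are invented; the thesis's case-study entities are not reproduced.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["modeling_demo"]

# Referencing: each N-side document carries a key pointing at the
# 1-side document, so resolving the relation takes a second query.
author_id = db.authors.insert_one({"name": "Ada"}).inserted_id
db.posts.insert_many([
    {"title": "First post",  "author_id": author_id},
    {"title": "Second post", "author_id": author_id},
])
ada = db.authors.find_one({"name": "Ada"})
ada_posts = list(db.posts.find({"author_id": ada["_id"]}))

# Embedding: the N-side documents live inside the 1-side document, so
# a single read returns the whole relationship.
db.authors_embedded.insert_one({
    "name": "Ada",
    "posts": [{"title": "First post"}, {"title": "Second post"}],
})
```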
Abstract:
Different types of water bodies, including lakes, streams and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. Triplex PCR offers a fast and inexpensive approach, and here it enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites and subgroup B1 with domesticated-animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on the phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.
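As a sketch of how triplex-PCR results translate into subgroup assignments, the function below follows the widely used Clermont decision rules (presence or absence of the chuA and yjaA genes and the TspE4.C2 fragment); these rules are general background knowledge, not taken from the paper, whose exact conventions may differ.

```python
# A sketch of how triplex-PCR marker results map to phylogenetic
# subgroups, following the widely used Clermont decision rules
# (chuA, yjaA, TspE4.C2). These rules are general background, not
# taken from the paper, whose exact conventions may differ.
def classify_phylogroup(chuA: bool, yjaA: bool, tspE4: bool) -> str:
    """Assign an E. coli isolate to a phylogenetic subgroup."""
    if chuA:
        if yjaA:
            return "B23" if tspE4 else "B22"  # phylogroup B2
        return "D2" if tspE4 else "D1"        # phylogroup D
    if tspE4:
        return "B1"                           # phylogroup B1
    return "A1" if yjaA else "A0"             # phylogroup A

# Example: an isolate positive only for TspE4.C2 falls in subgroup B1,
# which the study associates with domesticated-animal sources.
print(classify_phylogroup(chuA=False, yjaA=False, tspE4=True))  # -> B1
```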
Abstract:
Despite a strong increase in research on the ecology and biogeography of seamounts and oceanic islands, many basic aspects of their biodiversity are still unknown. In the southwestern Atlantic, the Vitória-Trindade Seamount Chain (VTC) extends ca. 1,200 km offshore from the Brazilian continental shelf, from the Vitória seamount to the oceanic islands of Trindade and Martin Vaz. For a long time, most of the available biological information concerned its islands. Our study presents and analyzes an extensive database on VTC fish biodiversity, built on data compiled from the literature and from recent scientific expeditions that assessed both shallow and mesophotic environments. A total of 273 species were recorded, 211 of which occur on the seamounts and 173 at the islands. New records for seamounts or islands include 191 reef fish species and 64 depth-range extensions. The structure of fish assemblages was similar between islands and seamounts, not differing in species' geographic distribution, trophic composition or spawning strategies. The main differences were related to endemism, higher at the islands, and to the number of endangered species, higher at the seamounts. Since unregulated fishing activities are common in the region, and mining activities are expected to increase drastically in the near future (carbonates on seamount summits and metals on slopes), this unique biodiversity needs urgent attention and management.
Abstract:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been devoted to studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. First, we compare redundant and non-redundant GDW schemas and conclude that redundancy is associated with high performance losses. We then analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by an R-tree and the star-join aided by GiST indicate that the SB-index significantly reduces the elapsed time of query processing, from 25% up to 99%, for SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of increases in data volume on performance: the increase did not impair the performance of the SB-index, which still greatly reduced query processing time. Performance tests also show that the SB-index is far more compact than the star-join, requiring at most 0.20% of its volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy, which improved performance by 80% to 91% on redundant GDW schemas.
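For context, the sketch below shows the general shape of a SOLAP query of the kind such experiments time: a star-join with a spatial predicate feeding a roll-up. The schema and the PostGIS-style ST_Intersects call are assumptions for illustration, not the paper's benchmark setup.

```python
# A hedged sketch of the general shape of a SOLAP query: a star-join
# with a spatial intersection predicate feeding a roll-up. The schema
# and the PostGIS-style ST_Intersects call are assumptions for
# illustration, not the paper's benchmark setup.
solap_rollup = """
    SELECT d.year, c.city_name, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date d     ON d.date_key = f.date_key
    JOIN dim_customer c ON c.cust_key = f.cust_key
    WHERE ST_Intersects(c.city_geom, %(query_window)s)  -- spatial predicate
    GROUP BY d.year, c.city_name                         -- roll-up level
"""
```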
Abstract:
Considering the difficulty of finding good-quality images for the development and testing of computer-aided diagnosis (CAD) schemes, this paper presents a public online mammographic image database, free to all interested viewers and aimed at helping to develop and evaluate CAD schemes. The mammographic images are digitized with contrast and spatial resolution suitable for processing purposes. A broad retrieval system allows the user to search for different images, exams, or patient characteristics. Comparison with other currently available databases has shown that the presented database has a sufficient number of images, is of high quality, and is the only one to include a functional search system.