20 resultados para Database systems
em Digital Commons at Florida International University
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^
Resumo:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. ^ The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim to bridge the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques have been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve the problems in other multimedia related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) for general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection. ^ Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model. ^
Resumo:
With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of the multimedia data, the Multimedia Database management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and there exist many open issues. In this dissertation, with the focus of addressing three of the essential challenges in developing the MMDBMS, namely, semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed with video database and image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of a MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to address the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors besides the extraction of the low-level media features. The data organization challenge is mainly addressed by the aspect of media indexing where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate the high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from the feedback at both the image-level and object-level.
Resumo:
Satisfiability, implication and equivalence problems are important and widely-encountered database problems that need to be efficiently and effectively solved. We provide a comprehensive and systematic study of these problems. We consider three popular types of arithmetic inequalities, (X op C), (X op Y), and (X op Y + C), where X and Y are attributes, C is a constant of the domain of X, and op $\in\ \{{<},\ {\le},\ {=},\ {\not=},\ {>},\ {\ge}\}.$ These inequalities are most frequently used in a database system, since the first type of inequalities represents $\theta$-join, the second type represents selection, and the third type is popular in deductive databases. We study the problems under the integer domain and the real domain, as well as under two different operator sets.^ Our results show that solutions under different domains and/or different operator sets are quite different. In this dissertation, we either report the first necessary and sufficient conditions as well as their efficient algorithms with complexity analysis, or provide improved algorithms. ^
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Resumo:
Query processing is a commonly performed procedure and a vital and integral part of information processing. It is therefore important and necessary for information processing applications to continuously improve the accessibility of data sources as well as the ability to perform queries on those data sources. ^ It is well known that the relational database model and the Structured Query Language (SQL) are currently the most popular tools to implement and query databases. However, a certain level of expertise is needed to use SQL and to access relational databases. This study presents a semantic modeling approach that enables the average user to access and query existing relational databases without the concern of the database's structure or technicalities. This method includes an algorithm to represent relational database schemas in a more semantically rich way. The result of which is a semantic view of the relational database. The user performs queries using an adapted version of SQL, namely Semantic SQL. This method substantially reduces the size and complexity of queries. Additionally, it shortens the database application development cycle and improves maintenance and reliability by reducing the size of application programs. Furthermore, a Semantic Wrapper tool illustrating the semantic wrapping method is presented. ^ I further extend the use of this semantic wrapping method to heterogeneous database management. Relational, object-oriented databases and the Internet data sources are considered to be part of the heterogeneous database environment. Semantic schemas resulting from the algorithm presented in the method were employed to describe the structure of these data sources in a uniform way. Semantic SQL was utilized to query various data sources. As a result, this method provides users with the ability to access and perform queries on heterogeneous database systems in a more innate way. ^
Resumo:
In his discussion - Database As A Tool For Hospitality Management - William O'Brien, Assistant Professor, School of Hospitality Management at Florida International University, O’Brien offers at the outset, “Database systems offer sweeping possibilities for better management of information in the hospitality industry. The author discusses what such systems are capable of accomplishing.” The author opens with a bit of background on database system development, which also lends an impression as to the complexion of the rest of the article; uh, it’s a shade technical. “In early 1981, Ashton-Tate introduced dBase 11. It was the first microcomputer database management processor to offer relational capabilities and a user-friendly query system combined with a fast, convenient report writer,” O’Brien informs. “When 16-bit microcomputers such as the IBM PC series were introduced late the following year, more powerful database products followed: dBase 111, Friday!, and Framework. The effect on the entire business community, and the hospitality industry in particular, has been remarkable”, he further offers with his informed outlook. Professor O’Brien offers a few anecdotal situations to illustrate how much a comprehensive data-base system means to a hospitality operation, especially when billing is involved. Although attitudes about computer systems, as well as the systems themselves have changed since this article was written, there is pertinent, fundamental information to be gleaned. In regards to the digression of the personal touch when a customer is engaged with a computer system, O’Brien says, “A modern data processing system should not force an employee to treat valued customers as numbers…” He also cautions, “Any computer system that decreases the availability of the personal touch is simply unacceptable.” In a system’s ability to process information, O’Brien suggests that in the past businesses were so enamored with just having an automated system that they failed to take full advantage of its capabilities. O’Brien says that a lot of savings, in time and money, went un-noticed and/or under-appreciated. Today, everyone has an integrated system, and the wise business manager is the business manager who takes full advantage of all his resources. O’Brien invokes the 80/20 rule, and offers, “…the last 20 percent of results costs 80 percent of the effort. But times have changed. Everyone is automating data management, so that last 20 percent that could be ignored a short time ago represents a significant competitive differential.” The evolution of data systems takes center stage for much of the article; pitfalls also emerge.
Resumo:
Moving objects database systems are the most challenging sub-category among Spatio-Temporal database systems. A database system that updates in real-time the location information of GPS-equipped moving vehicles has to meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. A total solution has been provided for the real-time update of the vehicles' location and motion information, range queries on current and history data, and prediction of vehicles' movement in the near future. ^ To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed in this dissertation for building and manipulating the indexing structures for moving objects databases. ^ Applying the STAPS approach, an indexing structure of associating a time interval tree to each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. The data update and query processing performance it provides is consistent without restrictions such as a time window or assuming linear moving trajectories. ^ An application system design based on distributed system architecture with centralized organization was developed to maximally support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving in region-wide, main issues on the implementation of such a system were addressed. ^
Resumo:
Current technology permits connecting local networks via high-bandwidth telephone lines. Central coordinator nodes may use Intelligent Networks to manage data flow over dialed data lines, e.g. ISDN, and to establish connections between LANs. This dissertation focuses on cost minimization and on establishing operational policies for query distribution over heterogeneous, geographically distributed databases. Based on our study of query distribution strategies, public network tariff policies, and database interface standards we propose methods for communication cost estimation, strategies for the reduction of bandwidth allocation, and guidelines for central to node communication protocols. Our conclusion is that dialed data lines offer a cost effective alternative for the implementation of distributed database query systems, and that existing commercial software may be adapted to support query processing in heterogeneous distributed database systems. ^
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
Moving objects database systems are the most challenging sub-category among Spatio-Temporal database systems. A database system that updates in real-time the location information of GPS-equipped moving vehicles has to meet even stricter requirements. Currently existing data storage models and indexing mechanisms work well only when the number of moving objects in the system is relatively small. This dissertation research aimed at the real-time tracking and history retrieval of massive numbers of vehicles moving on road networks. A total solution has been provided for the real-time update of the vehicles’ location and motion information, range queries on current and history data, and prediction of vehicles’ movement in the near future. To achieve these goals, a new approach called Segmented Time Associated to Partitioned Space (STAPS) was first proposed in this dissertation for building and manipulating the indexing structures for moving objects databases. Applying the STAPS approach, an indexing structure of associating a time interval tree to each road segment was developed for real-time database systems of vehicles moving on road networks. The indexing structure uses affordable storage to support real-time data updates and efficient query processing. The data update and query processing performance it provides is consistent without restrictions such as a time window or assuming linear moving trajectories. An application system design based on distributed system architecture with centralized organization was developed to maximally support the proposed data and indexing structures. The suggested system architecture is highly scalable and flexible. Finally, based on a real-world application model of vehicles moving in region-wide, main issues on the implementation of such a system were addressed.
Resumo:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
Resumo:
The Semantic Binary Data Model (SBM) is a viable alternative to the now-dominant relational data model. SBM would be especially advantageous for applications dealing with complex interrelated networks of objects provided that a robust efficient implementation can be achieved. This dissertation presents an implementation design method for SBM, algorithms, and their analytical and empirical evaluation. Our method allows building a robust and flexible database engine with a wider applicability range and improved performance. ^ Extensions to SBM are introduced and an implementation of these extensions is proposed that allows the database engine to efficiently support applications with a predefined set of queries. A New Record data structure is proposed. Trade-offs of employing Fact, Record and Bitmap Data structures for storing information in a semantic database are analyzed. ^ A clustering ID distribution algorithm and an efficient algorithm for object ID encoding are proposed. Mapping to an XML data model is analyzed and a new XML-based XSDL language facilitating interoperability of the system is defined. Solutions to issues associated with making the database engine multi-platform are presented. An improvement to the atomic update algorithm suitable for certain scenarios of database recovery is proposed. ^ Specific guidelines are devised for implementing a robust and well-performing database engine based on the extended Semantic Data Model. ^
Resumo:
An electronic database support system for strategic planning activities can be built by providing conceptual and system specific information. The design and development of this type of system center around the information needs of strategy planners. Data that supply information on the organization's internal and external environments must be originated, evaluated, collected, organized, managed, and analyzed. Strategy planners may use the resulting information to improve their decision making.
Resumo:
Database design is a difficult problem for non-expert designers. It is desirable to assist such designers during the problem solving process by means of a knowledge based (KB) system. A number of prototype KB systems have been proposed, however there are many shortcomings. Few have incorporated sufficient expertise in modeling relationships, particularly higher order relationships. There has been no empirical study that experimentally tested the effectiveness of any of these KB tools. Problem solving behavior of non-experts, whom the systems were intended to assist, has not been one of the bases for system design. In this project a consulting system for conceptual database design that addresses the above short comings was developed and empirically validated.^ The system incorporates (a) findings on why non-experts commit errors and (b) heuristics for modeling relationships. Two approaches to knowledge base implementation--system restrictiveness and decisional guidance--were used and compared in this project. The Restrictive approach is proscriptive and limits the designer's choices at various design phases by forcing him/her to follow a specific design path. The Guidance system approach which is less restrictive, provides context specific, informative and suggestive guidance throughout the design process. The main objectives of the study are to evaluate (1) whether the knowledge-based system is more effective than a system without the knowledge-base and (2) which knowledge implementation--restrictive or guidance--strategy is more effective. To evaluate the effectiveness of the knowledge base itself, the two systems were compared with a system that does not incorporate the expertise (Control).^ The experimental procedure involved the student subjects solving a task without using the system (pre-treatment task) and another task using one of the three systems (experimental task). The experimental task scores of those subjects who performed satisfactorily in the pre-treatment task were analyzed. Results are (1) The knowledge based approach to database design support lead to more accurate solutions than the control system; (2) No significant difference between the two KB approaches; (3) Guidance approach led to best performance; and (4) The subjects perceived the Restrictive system easier to use than the Guidance system. ^