8 results for H.3.1 [Information Storage and Retrieval]
in Digital Commons at Florida International University
Abstract:
Over the past five years, XML has been embraced by both the research and industrial communities due to its promising prospects as a new data representation and exchange format on the Internet. The widespread popularity of XML creates an increasing need to store XML data in persistent storage systems and to enable sophisticated XML queries over the data. The currently available approaches to XML storage and retrieval are limited either by immaturity (e.g. native approaches) or by inflexibility, heavy fragmentation, and excessive join operations (e.g. non-native approaches such as the relational database approach). In this dissertation, I studied the issue of storing and retrieving XML data using the Semantic Binary Object-Oriented Database System (Sem-ODB) to leverage the advanced Sem-ODB technology with the emerging XML data model. First, a meta-schema based approach was implemented to address the data model mismatch issue that is inherent in the non-native approaches. The meta-schema based approach captures the meta-data of both Document Type Definitions (DTDs) and Sem-ODB Semantic Schemas, thus enabling a dynamic and flexible mapping scheme. Second, a formal framework was presented to ensure precise and concise mappings. In this framework, both schemas and the conversions between them are formally defined and described. Third, after major features of an XML query language, XQuery, were analyzed, a high-level XQuery to Semantic SQL (Sem-SQL) query translation scheme was described. This translation scheme takes advantage of the navigation-oriented query paradigm of Sem-SQL, thus avoiding the excessive join problem of relational approaches. Finally, the modeling capability of the Semantic Binary Object-Oriented Data Model (Sem-ODM) was explored from the perspective of conceptually modeling an XML Schema using a Semantic Schema.
It was revealed that the advanced features of the Sem-ODB, such as multi-valued attributes, surrogates, and the navigation-oriented query paradigm, among others, are indeed beneficial in coping with the XML storage and retrieval issue using a non-XML approach. Furthermore, extensions to the Sem-ODB to make it work more effectively with XML data were also proposed.
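The abstract above credits surrogates, multi-valued attributes, and navigation-oriented querying for avoiding excessive joins. The following is only a minimal sketch of that general idea in Python, not the dissertation's mapping scheme and not Sem-ODB's or Sem-SQL's actual interface: the fact layout, the `xml_to_facts` and `follow` helpers, and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import count


def xml_to_facts(xml_text):
    """Flatten an XML document into (subject, relation, object) binary facts.

    Hypothetical layout: every element gets a generated surrogate id, and
    repeated child elements simply become repeated facts (multi-valued).
    """
    ids = count(1)
    facts = []

    def visit(elem, parent_id=None):
        oid = f"{elem.tag}#{next(ids)}"  # surrogate identifier
        if parent_id is not None:
            facts.append((parent_id, elem.tag, oid))
        for name, value in elem.attrib.items():
            facts.append((oid, "@" + name, value))
        text = (elem.text or "").strip()
        if text:
            facts.append((oid, "text", text))
        for child in elem:
            visit(child, oid)

    visit(ET.fromstring(xml_text))
    return facts


def follow(facts, start, *path):
    """Navigate from one object along a chain of relation names."""
    current = [start]
    for relation in path:
        current = [o for (s, r, o) in facts if s in current and r == relation]
    return current


DOC = "<book id='b1'><title>XML Storage</title><author>Ann</author><author>Bob</author></book>"
FACTS = xml_to_facts(DOC)
# Path-style navigation, loosely analogous to book/author/text()
AUTHORS = follow(FACTS, "book#1", "author", "text")
```

The query is phrased as a path from a starting object rather than as explicit joins between per-element tables, which is the contrast with relational mappings that the abstract draws.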
Abstract:
With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. In this dissertation, which focuses on three of the essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed with video and image databases serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors in addition to extracting the low-level media features. The data organization challenge is mainly addressed through media indexing, where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from feedback at both the image level and the object level.
Abstract:
The outcome of this research is an Intelligent Retrieval System for Conditions of Contract Documents. The objective of the research is to improve the method of retrieving data from a computer version of a construction Conditions of Contract document. SmartDoc, a prototype computer system, has been developed for this purpose. The system provides recommendations to aid the user in the process of retrieving clauses from the construction Conditions of Contract document. The prototype system integrates two computer technologies: hypermedia and expert systems. Hypermedia is utilized to provide a dynamic way of retrieving data from the document. Expert systems technology is utilized to build a set of rules that activate the recommendations to aid the user during the retrieval of clauses. The rules are based on experts' knowledge. The prototype system helps the user retrieve related clauses that are not explicitly cross-referenced but, according to expert experience, are relevant to the topic that the user is interested in.
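To illustrate how expert rules might surface clauses that are relevant but not explicitly cross-referenced, here is a minimal Python sketch. It is not SmartDoc's rule base or implementation: the `Rule` structure, the topics, the clause names, and the reasons are all hypothetical examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    topic: str      # topic the user is currently reading about
    related: tuple  # clauses an expert would also consult
    reason: str     # short explanation shown with the recommendation


# Hypothetical rule base; a real one would be elicited from contract experts.
RULES = (
    Rule("delay", ("extension of time", "liquidated damages"),
         "delays usually raise time and damages questions"),
    Rule("variation", ("valuation", "extension of time"),
         "variations may change both price and programme"),
)


def recommend(topic, already_linked=()):
    """Return (clause, reason) pairs not already cross-referenced in the text."""
    seen = set(already_linked)
    suggestions = []
    for rule in RULES:
        if rule.topic != topic:
            continue
        for clause in rule.related:
            if clause not in seen:
                seen.add(clause)
                suggestions.append((clause, rule.reason))
    return suggestions


SUGGESTIONS = recommend("delay", already_linked=("liquidated damages",))
```

Passing the clauses the document already links to keeps the recommendations limited to connections that only the rules supply.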
Abstract:
The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. 
This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.
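The idea of using a domain ontology to improve information discovery can be sketched as query expansion. The Python below is an illustrative assumption, not the framework presented in the dissertation: the toy ontology, the sample documents, and the `expand` and `search` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology: each term maps to its narrower terms.
ONTOLOGY = {
    "cardiac disease": ["arrhythmia", "cardiomyopathy"],
    "arrhythmia": ["atrial fibrillation"],
}

# Hypothetical semistructured documents.
DOCS = {
    "d1": "<note><dx>Arrhythmia</dx></note>",
    "d2": "<note><dx>asthma</dx></note>",
    "d3": "<note><dx>cardiac disease</dx></note>",
    "d4": "<note><dx>atrial fibrillation</dx></note>",
}


def expand(term, ontology):
    """Return the term together with all of its narrower terms."""
    found, pending = set(), [term]
    while pending:
        current = pending.pop()
        if current not in found:
            found.add(current)
            pending.extend(ontology.get(current, []))
    return found


def search(term, docs, ontology=None):
    """Return ids of documents whose text mentions the term or an expansion."""
    terms = expand(term, ontology) if ontology else {term}
    hits = []
    for doc_id, xml_text in docs.items():
        text = " ".join(ET.fromstring(xml_text).itertext()).lower()
        if any(t in text for t in terms):
            hits.append(doc_id)
    return sorted(hits)


GENERIC = search("cardiac disease", DOCS)
DOMAIN_AWARE = search("cardiac disease", DOCS, ONTOLOGY)
```

In this toy setup the generic search matches only the literal phrase, while the ontology-aware search also returns documents that mention narrower concepts.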
Abstract:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in individual users' subjective perceptions of multimedia content. The objective of this dissertation is to develop an integrated multimedia indexing and retrieval framework that bridges the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques has been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable semantic search for images and videos, and can be tailored to solve problems in other multimedia-related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) for general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images; (2) in addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user can more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection.
Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model.
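The shot segmentation step mentioned above relies in part on comparing low-level visual features between frames. The Python sketch below shows only a generic histogram-difference detector, which is a common baseline and not the dissertation's method (which also uses object tracking and audio analysis). The frame representation as flat lists of grayscale values, the bin count, and the threshold are assumptions.

```python
def histogram(frame, bins=4):
    """Normalized intensity histogram of a frame (pixel values 0-255)."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(frames, threshold=0.5, bins=4):
    """Return indices i where frame i starts a new shot.

    The distance is half the L1 difference between consecutive histograms,
    so it ranges from 0 (identical) to 1 (disjoint intensity ranges).
    """
    hists = [histogram(f, bins) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        diff = 0.5 * sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts


# Toy "video": two dark frames, two bright frames, then a dark frame again.
DARK = [10, 20, 30, 40]
BRIGHT = [200, 210, 220, 230]
FRAMES = [DARK, DARK, BRIGHT, BRIGHT, DARK]
CUTS = shot_boundaries(FRAMES)
```

A fixed threshold like this is sensitive to gradual transitions and lighting changes, which is one reason to combine visual comparison with other cues as the abstract describes.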
Abstract:
The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model’s parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents.
This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.
Abstract:
A semi-arid mangrove estuary system on the northeast Brazilian coast (Ceará state) was selected for this study to (i) evaluate the impact of nutrient-rich shrimp farm wastewater effluents on soil geochemistry and organic carbon (OC) storage and (ii) estimate the total amount of OC stored in mangrove soils (0–40 cm). Wastewater-affected mangrove forests were referred to as WAM and undisturbed areas as Non-WAM. Redox conditions and OC content were statistically correlated (P < 0.05) with seasonality and type of land use (WAM vs. Non-WAM). Eh values ranged from anoxic to oxic conditions in the wet season (from −5 to 68 mV in WAM and from < 40 to > 400 mV in Non-WAM soils) and were significantly higher (from 66 to 411 mV) in the dry season (P < 0.01). OC contents (0–40 cm soil depth) were significantly higher (P < 0.01) in the wet season than in the dry season, and higher in Non-WAM soils than in WAM soils (values of 8.1 and 6.7 kg m−2 in the wet and dry seasons, respectively, for Non-WAM, and values of 3.8 and 2.9 kg m−2 in the wet and dry seasons, respectively, for WAM soils; P < 0.01). Iron partitioning was significantly dependent (P < 0.05) on type of land use, with a smaller degree of pyritization and lower Fe-pyrite presence in WAM soils compared to Non-WAM soils. Basal respiration of soil sediments was significantly influenced (P < 0.01) by type of land use, with the highest CO2 flux rates measured in the WAM soils (mean values of 0.20 mg CO2 h−1 g−1 C vs. 0.04 mg CO2 h−1 g−1 C). The OC storage reduction in WAM soils was potentially caused (i) by an increase in microbial activity induced by loading of nutrient-rich effluents and (ii) by an increase of strong electron acceptors (e.g., NO3−) that promote a decrease in pyrite concentration and hence a reduction in soil OC burial. The current estimated OC stored in mangrove soils (0–40 cm) in the state of Ceará is approximately 1 million t.
Abstract:
The work on CERP monitoring item 3.1.3.5 (Marl prairie/slough gradients) is being conducted by Florida International University (Dr. Michael Ross, Project Leader), with Everglades National Park (Dr. Craig Smith) providing administrative support and technical consultation. As of January 2006 the funds transferred by ACOE to ENP, and subsequently to FIU, have been entirely expended or encumbered in salaries or wages. The project work for 2005 started rather late in the fiscal year, but ultimately accomplished the Year 1 goals of securing a permit to conduct the research in Everglades National Park, finalizing a detailed scope of work, and sampling marsh sites which are most easily accessed during the wet season. Forty-six plots were sampled in detail, and a preliminary vegetation classification distinguished three groups among these sites (sawgrass marsh, sawgrass and other, and slough), which may be arranged roughly along a hydrologic gradient from least to most persistently inundated. We also made coarser observations of vegetation type at 5-m intervals along 2 transects totaling ~5 km. When these data were compared with similar observations made in 1998-99, it appeared that vegetation in the western portion of Northeast Shark Slough (immediately east of the L-67 extension) had shifted toward a more hydric type during the last 6 years, while vegetation further east was unchanged in this respect. Because this classification and trend analysis are based on a small fraction of the data set that will be available after the first cycle of sampling (3 years from now), the results should not be interpreted too expansively. However, they do demonstrate the potential for gaining a more comprehensive view of marsh vegetation structure and dynamics in the Everglades, and will provide a sound basis for adaptive management.