3 results for Cross-lingual document retrieval
in Digital Commons at Florida International University
Abstract:
The outcome of this research is an Intelligent Retrieval System for Conditions of Contract Documents. The objective of the research is to improve the method of retrieving data from a computerized version of a construction Conditions of Contract document. SmartDoc, a prototype computer system, has been developed for this purpose. The system provides recommendations to aid the user in retrieving clauses from the construction Conditions of Contract document. The prototype integrates two computer technologies: hypermedia and expert systems. Hypermedia provides a dynamic way of retrieving data from the document. Expert systems technology is used to build a set of rules that trigger the recommendations shown to the user during clause retrieval. The rules are based on expert knowledge. The prototype helps the user retrieve related clauses that are not explicitly cross-referenced but that, according to expert experience, are relevant to the topic of interest.
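The rule-driven recommendation step described in this abstract can be pictured with a minimal sketch. It is not SmartDoc's implementation: the rule table, the clause names, and the `recommend` function are all hypothetical and serve only to show how expert rules could surface clauses that the document itself does not cross-reference.

```python
# Hypothetical rule base: each rule maps a topic to clauses that an expert
# considers related, even if the contract text does not cross-reference them.
RULES = {
    "delay": ["extension_of_time", "liquidated_damages"],
    "payment": ["interim_certificates", "retention"],
}


def recommend(topic, viewed_clause=None):
    """Return clauses related to a topic, excluding the clause being viewed."""
    related = RULES.get(topic.lower(), [])
    return [clause for clause in related if clause != viewed_clause]


# A user reading the (hypothetical) extension-of-time clause under the
# topic "delay" is pointed to the other clause the rule lists.
print(recommend("delay", viewed_clause="extension_of_time"))
```

A lookup table keeps the expert knowledge separate from the retrieval code, so rules can be added without changing the logic.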
Abstract:
With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become necessary to understand documents semantically and deliver meaningful information to users. Research on these problems spans data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into groups that support efficient document browsing and navigation. One underexplored problem in document clustering is how to generate a meaningful interpretation for each document cluster produced by the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
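Summarization by sentence selection, as this abstract describes it, can be illustrated with a small sketch. This is a generic word-frequency baseline and not the dissertation's algorithm; the `summarize` function and its scoring rule are assumptions chosen for brevity.

```python
import re
from collections import Counter


def summarize(text, n=1):
    """Pick the n sentences whose words are most frequent in the text.

    Assumed scoring rule: mean corpus frequency of a sentence's words.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]


print(summarize("Cats chase mice. Cats sleep. Dogs bark.", n=2))
```

Averaging the frequencies rather than summing them keeps long sentences from winning on length alone.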
Abstract:
Over the past five years, XML has been embraced by both the research and industrial communities due to its promising prospects as a new data representation and exchange format on the Internet. The widespread popularity of XML creates an increasing need to store XML data in persistent storage systems and to enable sophisticated XML queries over the data. The currently available approaches to XML storage and retrieval are limited either by immaturity (e.g. native approaches) or by inflexibility, heavy fragmentation, and excessive join operations (e.g. non-native approaches such as the relational database approach). In this dissertation, I studied the issue of storing and retrieving XML data using the Semantic Binary Object-Oriented Database System (Sem-ODB), combining the advanced Sem-ODB technology with the emerging XML data model. First, a meta-schema based approach was implemented to address the data model mismatch issue that is inherent in the non-native approaches. The meta-schema based approach captures the meta-data of both Document Type Definitions (DTDs) and Sem-ODB Semantic Schemas, thus enabling a dynamic and flexible mapping scheme. Second, a formal framework was presented to ensure precise and concise mappings. In this framework, both schemas and the conversions between them are formally defined and described. Third, after major features of an XML query language, XQuery, were analyzed, a high-level XQuery to Semantic SQL (Sem-SQL) query translation scheme was described. This translation scheme takes advantage of the navigation-oriented query paradigm of Sem-SQL, thus avoiding the excessive join problem of relational approaches. Finally, the modeling capability of the Semantic Binary Object-Oriented Data Model (Sem-ODM) was explored from the perspective of conceptually modeling an XML Schema using a Semantic Schema.
It was revealed that the advanced features of the Sem-ODB, such as multi-valued attributes, surrogates, and the navigation-oriented query paradigm, among others, are indeed beneficial in coping with the XML storage and retrieval issue using a non-XML approach. Furthermore, extensions to the Sem-ODB that would make it work more effectively with XML data were also proposed.