2 resultados para Natural Language Queries, NLPX, Bricks, XML-IR, Users
em Repositório Institucional da Universidade de Aveiro - Portugal
Resumo:
For the actual existence of e-government it is necessary and crucial to provide public information and documentation, making its access simple to citizens. A portion, not necessarily small, of these documents is in an unstructured form and in natural language, and consequently outside of which the current search systems are generally able to cope and effectively handle. Thus, in thesis, it is possible to improve access to these contents using systems that process natural language and create structured information, particularly if supported in semantics. In order to put this thesis to test, this work was developed in three major phases: (1) design of a conceptual model integrating the creation of structured information and making it available to various actors, in line with the vision of e-government 2.0; (2) definition and development of a prototype instantiating the key modules of this conceptual model, including ontology based information extraction supported by examples of relevant information, knowledge management and access based on natural language; (3) assessment of the usability and acceptability of querying information as made possible by the prototype - and in consequence of the conceptual model - by users in a realistic scenario, that included comparison with existing forms of access. In addition to this evaluation, at another level more related to technology assessment and not to the model, evaluations were made on the performance of the subsystem responsible for information extraction. The evaluation results show that the proposed model was perceived as more effective and useful than the alternatives. Associated with the performance of the prototype to extract information from documents, comparable to the state of the art, results demonstrate the feasibility and advantages, with current technology, of using natural language processing and integration of semantic information to improve access to unstructured contents in natural language. The conceptual model and the prototype demonstrator intend to contribute to the future existence of more sophisticated search systems that are also more suitable for e-government. To have transparency in governance, active citizenship, greater agility in the interaction with the public administration, among others, it is necessary that citizens and businesses have quick and easy access to official information, even if it was originally created in natural language.
Resumo:
The electronic storage of medical patient data is becoming a daily experience in most of the practices and hospitals worldwide. However, much of the data available is in free-form text, a convenient way of expressing concepts and events, but especially challenging if one wants to perform automatic searches, summarization or statistical analysis. Information Extraction can relieve some of these problems by offering a semantically informed interpretation and abstraction of the texts. MedInX, the Medical Information eXtraction system presented in this document, is the first information extraction system developed to process textual clinical discharge records written in Portuguese. The main goal of the system is to improve access to the information locked up in unstructured text, and, consequently, the efficiency of the health care process, by allowing faster and reliable access to quality information on health, for both patient and health professionals. MedInX components are based on Natural Language Processing principles, and provide several mechanisms to read, process and utilize external resources, such as terminologies and ontologies, in the process of automatic mapping of free text reports onto a structured representation. However, the flexible and scalable architecture of the system, also allowed its application to the task of Named Entity Recognition on a shared evaluation contest focused on Portuguese general domain free-form texts. The evaluation of the system on a set of authentic hospital discharge letters indicates that the system performs with 95% F-measure, on the task of entity recognition, and 95% precision on the task of relation extraction. Example applications, demonstrating the use of MedInX capabilities in real applications in the hospital setting, are also presented in this document. These applications were designed to answer common clinical problems related with the automatic coding of diagnoses and other health-related conditions described in the documents, according to the international classification systems ICD-9-CM and ICF. The automatic review of the content and completeness of the documents is an example of another developed application, denominated MedInX Clinical Audit system.