900 results for Open Information Extraction


Relevance:

90.00%

Publisher:

Abstract:

The realization of the Semantic Web is constrained by a knowledge acquisition bottleneck, i.e., the problem of how to add RDF mark-up to the millions of ordinary web pages that already exist. Information Extraction (IE) has been proposed as a solution to this annotation bottleneck. In the task-based evaluation reported here, we compared the performance of users without access to annotation, users working with annotations produced from manually constructed knowledge bases, and users working with annotations augmented using IE. We looked at retrieval performance, overlap between retrieved items and the two sets of annotations, and usage of annotation options. Automatically generated annotations were found to add value to the browsing experience in the scenario investigated. Copyright 2005 ACM.

Relevance:

90.00%

Publisher:

Abstract:

The software architecture and development considerations for an open metadata extraction and processing framework are outlined. Special attention is paid to reliability and fault tolerance. A Grid infrastructure is shown to be a useful backend for general-purpose tasks.

Relevance:

90.00%

Publisher:

Abstract:

The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and to gather all information relevant to their research. In response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researchers has emerged, striving to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large scale to scan the whole publicly available biomedical literature, extract and aggregate the information found within it, and automatically normalize the variability of natural language statements. Among these tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction is the identification of biological processes and interactions described in biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at a large scale (the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since top-ranking event extraction systems are based on machine-learning approaches and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to the generation of incorrect biomolecular events, which are spotted by end-users. This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the general credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes and gene products. We cast the hypothesis generation problem as supervised network topology prediction, i.e., predicting new edges in the network, as well as the types and directions of these edges, utilizing a set of features that can be extracted from large biomedical event networks. Routine machine learning evaluation results, as well as manual evaluation results, suggest that the problem is indeed learnable. This work won the Best Paper Award at the 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
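
A minimal sketch of the post-filtering idea described in the abstract above: a supervised classifier, trained on events labelled as correct or incorrect, scores candidate events from a large event database, and low-scoring events are discarded. The classifier choice, feature representation and threshold are illustrative assumptions, not the thesis's actual design.

```python
# Illustrative post-filtering of extracted events; the feature vectors
# and labelled training data are assumed to be produced elsewhere.
from sklearn.ensemble import RandomForestClassifier

def filter_events(train_X, train_y, cand_X, cand_events, threshold=0.5):
    """Keep only candidate events whose predicted probability of being
    a correct extraction is at least `threshold`."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train_X, train_y)  # y: 1 = correct event, 0 = false positive
    proba = clf.predict_proba(cand_X)[:, 1]
    return [ev for ev, p in zip(cand_events, proba) if p >= threshold]
```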

Relevance:

80.00%

Publisher:

Abstract:

Purpose – The purpose of this study is to examine and extend Noer’s theoretical model of the new employment relationship. Design/methodology/approach – Case study methodology is used to scrutinise the model. The results of a literature-based survey on the elements underpinning the five values in the model are analysed from the dual perspectives of the individual and the organization using a multi-source assessment instrument. From an analysis of the survey data, a schema is developed to guide and inform a series of focus group discussions. Using content analysis, the transcripts from the focus group discussions are evaluated against the model’s values and their elements. The transcripts are also reviewed for implicit themes. The case studied is Flight Centre Limited, an Australian-based international retail travel company. Findings – Using this approach, some elements of the five values in Noer’s model are identified as characteristic of the company’s psychological contract. Specifically, to some extent, the model’s values of flexible deployment, customer focus, performance focus, project-based work, and human spirit and work can be applied in this case. A further analysis of the transcripts validates three additional values from the psychological contract literature: commitment; learning and development; and open information. As a result of the findings, Noer’s model is extended to eight values. Research limitations/implications – The study offers a research-based model of the new employment relationship. Since generalisations from the case study findings cannot be applied directly to other settings, the opportunity to test this model in a variety of contexts is open to other researchers. Originality/value – In practice, the methodology used is a unique process for benchmarking the psychological contract. The process may be applied in other business settings. By doing so, organization development professionals have a consulting framework for comparing an organization’s dominant psychological contract with the extended model presented here.

Relevance:

80.00%

Publisher:

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods to formulate web search queries that are capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, improving both the accuracy and the efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.
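
A minimal sketch of the greedy scheduling idea in the abstract above: at each step, impute the missing value whose best candidate query currently has the highest confidence, then re-derive queries for the remaining cells, since each newly filled value can strengthen later queries. The `build_queries`, `run_query` and `confidence` callables are placeholders for illustration, not WebPut's actual components.

```python
# Illustrative greedy imputation scheduling under the assumptions above.
def greedy_impute(missing_cells, build_queries, run_query, confidence):
    """missing_cells: set of (row, column) positions still missing.
    build_queries(cell, db): candidate web-search queries for a cell,
    given the values imputed so far; run_query(q): retrieved value;
    confidence(q): score in [0, 1] for a query."""
    db = {}
    while missing_cells:
        # Pick the (cell, query) pair with the highest current confidence.
        best = max(
            ((cell, q) for cell in missing_cells
                       for q in build_queries(cell, db)),
            key=lambda cq: confidence(cq[1]),
            default=None,
        )
        if best is None:
            break  # no query can be formulated for the remaining cells
        cell, query = best
        db[cell] = run_query(query)  # impute the value
        missing_cells.remove(cell)
    return db
```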

Relevance:

80.00%

Publisher:

Abstract:

Highly sensitive infrared (IR) cameras provide high-resolution diagnostic images of the temperature and vascular changes of breasts. These images can be processed to emphasize hot spots that exhibit early and subtle changes owing to pathology. The resulting images show clusters that appear random in shape and spatial distribution but carry class-dependent information in shape and texture. Automated pattern recognition techniques are challenged by changes in the location, size and orientation of these clusters. Higher-order spectral invariant features provide robustness to such transformations and are suited to texture- and shape-dependent information extraction from noisy images. In this work, the effectiveness of bispectral invariant features in the diagnostic classification of breast thermal images into malignant, benign and normal classes is evaluated, and a phase-only variant of these features is proposed. High-resolution IR images of breasts, captured with a measurement accuracy of ±0.4% (full scale) and a temperature resolution of 0.1 °C (black body), depicting malignant, benign and normal pathologies, are used in this study. Breast images are registered using their lower boundaries, automatically extracted using landmark points whose locations are learned during training. Boundaries are extracted using Canny edge detection and elimination of inner edges. Breast images are then segmented using fuzzy c-means clustering, and the hottest regions are selected for feature extraction. Bispectral invariant features are extracted from Radon projections of these images. An AdaBoost classifier is used to select and fuse the best features during training and then classify unseen test images into malignant, benign and normal classes. A data set comprising 9 malignant, 12 benign and 11 normal cases is used for evaluation of performance. Malignant cases are detected with 95% accuracy. A variant of the features using the normalized bispectrum, which discards all magnitude information, is shown to perform better for classification between benign and normal cases, with 83% accuracy compared to 66% for the original features.
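
A minimal sketch of the feature pipeline named in the abstract above: Radon projections of a (pre-segmented) image are taken, and a simple bispectrum-based quantity is computed per projection via B(f1, f2) = X(f1) X(f2) X*(f1 + f2). The specific invariant used in the paper is not reproduced here; the projection angles, frequency grid and summary statistic below are illustrative assumptions.

```python
# Illustrative bispectral features from Radon projections; angles,
# frequency grid and the summary statistic are assumed for the sketch.
import numpy as np
from skimage.transform import radon

def bispectrum_phase_feature(signal):
    """One illustrative value per 1-D projection: the mean bispectral
    phase over a low-frequency grid, B(f1, f2) = X(f1) X(f2) X*(f1+f2)."""
    X = np.fft.fft(signal)
    n = max(2, len(X) // 4)           # stay inside the valid (f1+f2) region
    f1, f2 = np.meshgrid(np.arange(1, n), np.arange(1, n))
    B = X[f1] * X[f2] * np.conj(X[f1 + f2])
    return float(np.angle(B).mean())  # phase-only: magnitude discarded

def radon_bispectral_features(image, angles=np.arange(0.0, 180.0, 10.0)):
    sinogram = radon(image, theta=angles)  # one projection per angle
    return [bispectrum_phase_feature(sinogram[:, i])
            for i in range(sinogram.shape[1])]
```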

Relevance:

80.00%

Publisher:

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods to formulate web search queries that are capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, improving both the accuracy and the efficiency of WebPut. Moreover, several optimization techniques are proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.

Relevance:

80.00%

Publisher:

Abstract:

This paper reports on the 2nd ShARe/CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this lab we focus on patients' information needs, as opposed to the more common campaign focus on the specialised information needs of physicians and other healthcare workers. The usage scenario of the lab is to ease patients' and next-of-kins' understanding of eHealth information, in particular clinical reports. The 1st ShARe/CLEFeHealth evaluation lab was held in 2013 and consisted of three tasks: Task 1 focused on named entity recognition and normalization of disorders; Task 2 on normalization of acronyms/abbreviations; and Task 3 on information retrieval to address questions patients may have when reading clinical reports. This year's lab introduces a new challenge in Task 1 on visual-interactive search and exploration of eHealth data. Its aim is to help patients (or their next-of-kin) with readability issues related to their hospital discharge documents and related information search on the Internet. Task 2 continues the information extraction work of the 2013 lab, specifically focusing on disorder attribute identification and normalization from clinical text. Finally, this year's Task 3 further extends the 2013 information retrieval task by cleaning the 2013 document collection and introducing a new query generation method and multilingual queries. The de-identified clinical reports used by the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Tasks 1 and 3 were from the Internet and originated from the Khresmoi project. Task 2 annotations originated from the ShARe annotations. For Tasks 1 and 3, new annotations, queries, and relevance assessments were created. In total, 50, 79, and 91 people registered their interest in Tasks 1, 2, and 3, respectively. Twenty-four unique teams participated, with 1, 10, and 14 teams in Tasks 1, 2, and 3, respectively. The teams were from Africa, Asia, Canada, Europe, and North America. The Task 1 submission, reviewed by 5 expert peers, related to the task evaluation category of effective use of interaction and targeted the needs of both expert and novice users. The best system had an Accuracy of 0.868 in Task 2a, an F1-score of 0.576 in Task 2b, and a Precision at 10 (P@10) of 0.756 in Task 3. The results demonstrate the substantial community interest in, and the capabilities of, these systems in making clinical reports easier for patients to understand. The organisers have made the data and tools available for future research and development.
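
A minimal sketch of Precision at 10 (P@10), the Task 3 metric quoted above: the fraction of the top ten ranked documents that are judged relevant. The argument names and example data are illustrative.

```python
# Illustrative P@10: fraction of the 10 top-ranked documents that are
# relevant according to (assumed) relevance assessments.
def precision_at_10(ranked_doc_ids, relevant_ids):
    top = ranked_doc_ids[:10]
    return sum(1 for doc in top if doc in relevant_ids) / 10.0

# Example: 3 of the first 10 retrieved documents are relevant -> 0.3.
print(precision_at_10(list(range(20)), {0, 5, 9, 15}))
```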

Relevance:

80.00%

Publisher:

Abstract:

Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness, by actively selecting informative instances during the learning phase. However, the effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically the stability and robustness, of active learning models built using conditional random fields (CRFs) for information extraction applications. Stability, defined as small variation in performance when small variations in the training data or in the parameters occur, is a major issue for machine learning models, but even more so in the active learning framework, which aims to minimise the amount of training data required. The factors we investigate are a) the choice of incremental vs. standard active learning, b) the feature set used as a representation of the text (i.e., morphological, syntactic, or semantic features) and c) the Gaussian prior variance, one of the important CRF parameters. Our empirical findings show that incremental learning and the Gaussian prior variance lead to more stable and robust models across iterations. Our study also demonstrates that orthographic, morphological and contextual features, as a group of basic features, play an important role in learning effective models across all iterations.
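
A minimal sketch of the pool-based active learning loop studied above. The `train_crf`, `uncertainty` and `oracle` callables are placeholders, not the paper's actual components; in common CRF toolkits the Gaussian prior appears as an L2 regulariser whose weight is the inverse of the prior variance.

```python
# Illustrative active learning loop under the assumptions above.
def active_learn(labelled, pool, train_crf, uncertainty, oracle,
                 batch_size=10, iterations=20):
    """labelled: list of (sequence, tags) pairs; pool: unlabelled sequences."""
    model = train_crf(labelled)
    for _ in range(iterations):
        if not pool:
            break
        # Query the instances the current model is least certain about.
        pool.sort(key=lambda seq: uncertainty(model, seq), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labelled.extend((seq, oracle(seq)) for seq in batch)
        # Standard variant: retrain from scratch on all labelled data.
        # The incremental variant would instead warm-start from `model`.
        model = train_crf(labelled)
    return model
```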

Relevance:

80.00%

Publisher:

Abstract:

This workshop comprised a diverse group of African construction experts, drawn from well beyond RSA. Each of the attendees had attended the annual ASOCSA conference and was additionally provided with a short workshop pre-brief. The aim was to develop a view of their 15-20 year vision of construction improvement in RSA and the steps necessary to get there. These included sociological, structural, technical and process changes. Whilst some suggestions are significantly challenging, none are impossible, given sufficient collaboration between government, industry, academia and NGOs. The highest priority projects (more properly, programmes) were identified and further explored. These are: 1. Information Hub (‘Open Africa’). Aim – to utilise emerging trends in Open Data to provide a force for African unity. 2. Workforce Development. Aim – to rebuild a competent, skilled construction industry for RSA projects and for export. 3. Modular DIY Building. Aim – to accelerate the development of sustainable, cost-efficient and desirable housing for African economic immigrants and others living in makeshift and slum dwellings. Open Data is a maturing theme in cities and governments around the world, and the workshop attendees were very keen to seize this possibility to help develop an environment where Africans can share information and foster collaboration. It is likely that NGOs would be keen to follow up such an initiative. Significant developments are currently taking place in the construction sector around the world, with comparatively large savings being made for taxpayers (20% plus in the UK). Not all of these changes would be easy to transplant to RSA (even less so to much of the rest of Africa). Workforce development was a keen plea amongst the attendees, who seemed concerned that expertise had leaked away and was not being replaced with sufficient intensity. It is possible today to design modular buildings in such a way that even unskilled residents can assist in their construction, and even in their design. These buildings can be sited almost independently of existing infrastructure, relieving the pressures on cities and townships whilst providing humane accommodation for the economically disadvantaged. Suitable solutions could either be developed jointly with other similarly stressed countries or developed in-country and the expertise exported. Finally, it should be pointed out that this was very much a first step. Any opportunity to collaborate from an Australian, QUT or CIB perspective would be welcomed, whilst acknowledging that the leading roles belong to RSA, CSIR, NRF, ASOCSA and the University of KwaZulu-Natal.

Relevance:

80.00%

Publisher:

Abstract:

The most difficult operation in flood inundation mapping using optical flood images is to separate fully inundated areas from ‘wet’ areas where trees and houses are partly covered by water. This is a typical instance of the mixed-pixel problem. A number of automatic information extraction image classification algorithms have been developed over the years for flood mapping using optical remote sensing images, and most of them assign each pixel to the class label with the greatest likelihood. However, these hard classification methods often fail to generate reliable flood inundation maps because of the presence of mixed pixels in the images. To solve the mixed-pixel problem, advanced image processing techniques are adopted; linear spectral unmixing is one of the most popular soft classification techniques used for mixed-pixel analysis. The good performance of linear spectral unmixing depends on two important issues: the method of selecting endmembers and the method of modelling the endmembers for unmixing. This paper presents an improvement in the adaptive selection of the endmember subset for each pixel in spectral unmixing, for reliable flood mapping. Using a fixed set of endmembers to unmix all pixels in an entire image may overestimate the endmember spectra residing in a mixed pixel and hence reduce the performance of spectral unmixing. In comparison, applying an adaptively estimated subset of endmembers for each pixel can decrease the residual error in the unmixing results and provide reliable output. It is also shown that the proposed method improves the accuracy of conventional linear unmixing methods and is easy to apply. Three different linear spectral unmixing methods were applied to test the improvement in unmixing results. Experiments were conducted on three sets of Landsat-5 TM images of three different flood events in Australia to examine the method under different flooding conditions, and satisfactory flood mapping outcomes were achieved.
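
A minimal sketch of per-pixel linear spectral unmixing with an adaptive endmember subset, in the spirit of the method above. The subset-selection rule here (keep the k endmembers most correlated with the pixel spectrum) and the non-negative least-squares solver are illustrative assumptions, not the paper's actual criterion.

```python
# Illustrative adaptive-subset linear unmixing under the assumptions above.
import numpy as np
from scipy.optimize import nnls

def unmix_pixel(pixel, endmembers, k=3):
    """pixel: (bands,) spectrum; endmembers: (bands, m) matrix.
    Returns abundances (non-negative, renormalised to sum to 1) over an
    adaptively chosen subset of k endmembers."""
    # Adaptive subset: rank endmembers by correlation with the pixel.
    scores = [np.corrcoef(pixel, endmembers[:, j])[0, 1]
              for j in range(endmembers.shape[1])]
    subset = np.argsort(scores)[-k:]
    abund, _ = nnls(endmembers[:, subset], pixel)  # non-negative least squares
    full = np.zeros(endmembers.shape[1])
    if abund.sum() > 0:
        full[subset] = abund / abund.sum()         # enforce sum-to-one
    return full
```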

Relevance:

80.00%

Publisher:

Abstract:

The aim of this master's thesis was to clarify employees' views on outsourcing. The main questions of the study were: 1) How did the employees deal with the change created by outsourcing, and what did they feel was important when implementing the change? 2) What organizational questions did the employees pay attention to when moving from one organization to another? 3) What management issues did the employees bring up in the outsourcing process? and 4) How did the employees reflect on the change while experiencing outsourcing? The theoretical foundations of the study were Jack Mezirow's theory of transformational learning and Yrjö Engeström's theory of expansive learning. The management of outsourcing was viewed through John P. Kotter's change management model. The research cast light on transformation and learning at four levels of analysis: the individual, organizational and management levels, and the level of reflection. The subjects of the study were outsourced employees who were moved from a Finnish public corporation to a private ITC organization along with the services they produce. The study material consisted of eleven interviews with the outsourced employees. The study was implemented as a phenomenographic theme analysis. The analysis revealed results at all four levels. At the individual level, the main results were the importance of systematic and open information, the significance of social and technical integration, and the employees' feeling of being in control. At the organizational level, the move from the public sector to the private sector, and all the accompanying changes in organizational culture and fringe benefits, were fundamental. Organizational learning was analyzed with expansive learning theory, and expansion was perceived along four dimensions: temporal, spatial, responsibility-moral, and developmental. At the management level, the actions of one's closest manager were vital, as were upper management's clear engagement and a shared view of the necessity of the change. The data revealed employees' reflective talk, which indicated the meaning of the change and was interpreted from a learning point of view. According to this study, it is possible to identify and analyze reflective talk and thereby gain information about employees' learning in an organizational change. It was striking how versatile and extensive reflection proved to be in the outsourcing process.

Relevance:

80.00%

Publisher:

Abstract:

Many networks, such as social networks and organizational networks in global companies, consist of self-interested agents. The topology of these networks often plays a crucial role in important tasks such as information diffusion and information extraction. Consequently, growing a stable network having a certain topology is of interest. Motivated by this, we study the following important problem: given a certain desired network topology, under what conditions would best-response (link addition/deletion) strategies played by self-interested agents lead to the formation of a stable network having that topology? We study this interesting reverse-engineering problem by proposing a natural model of recursive network formation and a utility model that captures many key features. Based on this model, we analyze relevant network topologies and derive a set of sufficient conditions under which these topologies emerge as pairwise stable networks, wherein no node wants to delete any of its links and no two nodes would want to create a link between them.
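
A minimal sketch of the pairwise-stability check underlying the analysis above: a network is pairwise stable if no node gains by deleting one of its links, and no pair of unlinked nodes both (weakly, one strictly) gain by adding a link. The `utility(node, edges)` callable is a placeholder for the paper's utility model.

```python
# Illustrative pairwise-stability check under the assumptions above.
from itertools import combinations

def is_pairwise_stable(nodes, edges, utility):
    edges = set(frozenset(e) for e in edges)
    # No profitable single-link deletion.
    for e in edges:
        u, v = tuple(e)
        without = edges - {e}
        if (utility(u, without) > utility(u, edges)
                or utility(v, without) > utility(v, edges)):
            return False
    # No profitable link addition.
    for u, v in combinations(nodes, 2):
        e = frozenset((u, v))
        if e in edges:
            continue
        with_e = edges | {e}
        gain_u = utility(u, with_e) - utility(u, edges)
        gain_v = utility(v, with_e) - utility(v, edges)
        if gain_u >= 0 and gain_v >= 0 and (gain_u > 0 or gain_v > 0):
            return False
    return True
```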

Relevance:

80.00%

Publisher:

Abstract:

In this article, we present position-indication functionality obtained using a retrodirective array, allowing extraction of the position of the remote transmitter with which the retrodirective array is cooperating. This is carried out using straightforward circuitry with no requirement for complex angle-of-arrival algorithms, giving a real-time result that enables tracking of fast-moving transmitters. We show, using a 10-element retrodirective array operating at 2.4 GHz, that far-field angle-of-arrival accuracies within ±1° over the array's ±30° azimuth field of view are possible, while in the near field, for angles of arrival of ±10°, it is possible to extract the position of a dipole source down to a resolution of 0.32 λ. © 2010 Wiley Periodicals, Inc. Microwave Opt Technol Lett 52: 1031-1034, 2010; published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/mop.25097
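
A minimal sketch of the classical relation behind far-field angle-of-arrival extraction in a linear array: for element spacing d and wavelength λ, the inter-element phase difference Δφ of an incoming plane wave satisfies Δφ = 2π(d/λ)sin θ. The paper's circuitry obtains the result without explicit signal processing; the formula below only illustrates the underlying geometry, and the example values are assumptions.

```python
# Illustrative angle-of-arrival from inter-element phase difference.
import math

def angle_of_arrival(dphi_rad, d_m, lam_m):
    """Invert dphi = 2*pi*(d/lam)*sin(theta); returns theta in degrees."""
    s = dphi_rad * lam_m / (2 * math.pi * d_m)
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))

# Example: half-wavelength spacing at 2.4 GHz (lam ~ 0.125 m); a 45-degree
# phase difference corresponds to roughly a 14.5-degree angle of arrival.
lam = 3e8 / 2.4e9
print(angle_of_arrival(math.radians(45), d_m=lam / 2, lam_m=lam))
```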