968 resultados para Knowledge extraction
Resumo:
With the increasing popularity and adoption of building information modeling (BIM), the amount of digital information available about a building is overwhelming. Enormous challenges remain however in identifying meaningful and required information from a complex BIM model to support a particular construction management (CM) task. Detailed specifications of information required by different construction domains and expressive and easy-to-use BIM reasoning mechanisms are seen as an important means in addressing these challenges. This paper analyzes some of the characteristics and requirements of component-specific construction knowledge in relation to the current work practice and BIM-based applications. It is argued that domain ontologies and information extraction approaches, such as queries could significantly bring much needed support for knowledge sharing and integration of information between design, construction and facility management.
Resumo:
Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular existing term-based feature selection methods suffer from noisy feature extraction, which is irrelevant to the user needs (noisy). One popular method is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. The method then uses an extended random set to accurately weight n-grams based on their distribution in the documents and their terms distribution in n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves the performance. The experimental results on Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms the state-of-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
Resumo:
We identify relation completion (RC) as one recurring problem that is central to the success of novel big data applications such as Entity Reconstruction and Data Enrichment. Given a semantic relation, RC attempts at linking entity pairs between two entity lists under the relation. To accomplish the RC goals, we propose to formulate search queries for each query entity α based on some auxiliary information, so that to detect its target entity β from the set of retrieved documents. For instance, a pattern-based method (PaRE) uses extracted patterns as the auxiliary information in formulating search queries. However, high-quality patterns may decrease the probability of finding suitable target entities. As an alternative, we propose CoRE method that uses context terms learned surrounding the expression of a relation as the auxiliary information in formulating queries. The experimental results based on several real-world web data collections demonstrate that CoRE reaches a much higher accuracy than PaRE for the purpose of RC.
Resumo:
It is well established that the traditional taxonomy and nomenclature of Chironomidae relies on adult males whose usually characteristic genitalia provide evidence of species distinction. In the early days some names were based on female adults of variable distinctiveness – but females are difficult to identify (Ekrem et al. 2010) and many of these names remain dubious. In Russia especially, a system based on larval morphology grew in parallel to the conventional adult-based system. The systems became reconciled with the studies that underlay the production of the Holarctic generic keys to Chironomidae, commencing notably with the larval volume (Wiederholm, 1983). Ever since Thienemann’s pioneering studies, it has been evident that the pupa, notably the cast skins (exuviae) provide a wealth of features that can aid in identification (e.g. Wiederholm, 1986). Furthermore, the pupae can be readily associated with name-bearing adults when a pharate (‘cloaked’) adult stage is visible within the pupa. Association of larvae with the name-bearing later stages has been much more difficult, time-consuming and fraught with risk of failure. Yet it is identification of the larval stage that is needed by most applied researchers due to the value of the immature stages of the family in aquatic monitoring for water quality, although the pupal stage also has advocates (reviewed by Sinclair & Gresens, 2008). Few use the adult stage for such purposes as their provenance and association with the water body can be verified only by emergence trapping, and sampling of adults lies outside regular aquatic monitoring protocols.
Resumo:
Currently we are facing an overburdening growth of the number of reliable information sources on the Internet. The quantity of information available to everyone via Internet is dramatically growing each year [15]. At the same time, temporal and cognitive resources of human users are not changing, therefore causing a phenomenon of information overload. World Wide Web is one of the main sources of information for decision makers (reference to my research). However our studies show that, at least in Poland, the decision makers see some important problems when turning to Internet as a source of decision information. One of the most common obstacles raised is distribution of relevant information among many sources, and therefore need to visit different Web sources in order to collect all important content and analyze it. A few research groups have recently turned to the problem of information extraction from the Web [13]. The most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (related to as data extraction or information extraction from the Web) and towards understanding natural language texts by means of fact, entity, and association recognition (related to as information extraction). Data extraction efforts show some interesting results, however proper integration of web databases is still beyond us. Information extraction field has been recently very successful in retrieving information from natural language texts, however it is still lacking abilities to understand more complex information, requiring use of common sense knowledge, discourse analysis and disambiguation techniques.
Resumo:
The quantification and characterisation of soil phosphorus (P) is of agricultural and environmental importance and different extraction methods are widely used to asses the bioavailability of P and to characterize soil P reserves. However, the large variety of extractants, pre-treatments and sample preparation procedures complicate the comparison of published results. In order to improve our understanding of the behaviour and cycling of P in soil, it is crucial to know the scientific relevance of the methods used for various purposes. The knowledge of the factors affecting the analytical outcome is a prerequisite for justified interpretation of the results. The aim of this thesis was to study the effects of sample preparation procedures on soil P and to determine the dependence of the recovered P pool on the chemical nature of extractants. Sampling is a critical step in soil testing and sampling strategy is dependent on the land-use history and the purpose of sampling. This study revealed that pre-treatments changed soil properties and air-drying was found to affect soil P, particularly extractable organic P, by disrupting organic matter. This was evidenced by an increase in the water-extractable small-sized (<0.2 µm) P that, at least partly, took place at the expense of the large-sized (>0.2 µm) P. However, freezing induced only insignificant changes and thus, freezing can be taken to be a suitable method for storing soils from the boreal zone that naturally undergo periodic freezing. The results demonstrated that chemical nature of the extractant affects its sensitivity to detect changes in soil P solubility. Buffered extractants obscured the alterations in P solubility induced by pH changes; however, water extraction, though sensitive to physicochemical changes, can be used to reveal short term changes in soil P solubility. As for the organic P, the analysis was found to be sensitive to the sample preparation procedures: filtering may leave a large proportion of extractable organic P undetected, whereas the outcome of centrifugation was found to be affected by the ionic strength of the extractant. Widely used sequential fractionation procedures proved to be able to detect land-use -derived differences in the distribution of P among fractions of different solubilities. However, interpretation of the results from extraction experiments requires better understanding of the biogeochemical function of the recovered P fraction in the P cycle in differently managed soils under dissimilar climatic conditions.
Resumo:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because as the amount of natural language text in digital format grows all the time, the need for novel methods for pinpointing important knowledge from the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is developed also. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of patterns and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The the new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
Resumo:
In this paper, pattern classification problem in tool wear monitoring is solved using nature inspired techniques such as Genetic Programming(GP) and Ant-Miner (AM). The main advantage of GP and AM is their ability to learn the underlying data relationships and express them in the form of mathematical equation or simple rules. The extraction of knowledge from the training data set using GP and AM are in the form of Genetic Programming Classifier Expression (GPCE) and rules respectively. The GPCE and AM extracted rules are then applied to set of data in the testing/validation set to obtain the classification accuracy. A major attraction in GP evolved GPCE and AM based classification is the possibility of obtaining an expert system like rules that can be directly applied subsequently by the user in his/her application. The performance of the data classification using GP and AM is as good as the classification accuracy obtained in the earlier study.
Resumo:
The increasing use of 3D modeling of Human Face in Face Recognition systems, User Interfaces, Graphics, Gaming and the like has made it an area of active study. Majority of the 3D sensors rely on color coded light projection for 3D estimation. Such systems fail to generate any response in regions covered by Facial Hair (like beard, mustache), and hence generate holes in the model which have to be filled manually later on. We propose the use of wavelet transform based analysis to extract the 3D model of Human Faces from a sinusoidal white light fringe projected image. Our method requires only a single image as input. The method is robust to texture variations on the face due to space-frequency localization property of the wavelet transform. It can generate models to pixel level refinement as the phase is estimated for each pixel by a continuous wavelet transform. In cases of sparse Facial Hair, the shape distortions due to hairs can be filtered out, yielding an estimate for the underlying face. We use a low-pass filtering approach to estimate the face texture from the same image. We demonstrate the method on several Human Faces both with and without Facial Hairs. Unseen views of the face are generated by texture mapping on different rotations of the obtained 3D structure. To the best of our knowledge, this is the first attempt to estimate 3D for Human Faces in presence of Facial hair structures like beard and mustache without generating holes in those areas.
Resumo:
This paper reviews the fingerprint classification literature looking at the problem from a double perspective. We first deal with feature extraction methods, including the different models considered for singular point detection and for orientation map extraction. Then, we focus on the different learning models considered to build the classifiers used to label new fingerprints. Taxonomies and classifications for the feature extraction, singular point detection, orientation extraction and learning methods are presented. A critical view of the existing literature have led us to present a discussion on the existing methods and their drawbacks such as difficulty in their reimplementation, lack of details or major differences in their evaluations procedures. On this account, an experimental analysis of the most relevant methods is carried out in the second part of this paper, and a new method based on their combination is presented.
Resumo:
This paper shows how knowledge, in the form of fuzzy rules, can be derived from a self-organizing supervised learning neural network called fuzzy ARTMAP. Rule extraction proceeds in two stages: pruning removes those recognition nodes whose confidence index falls below a selected threshold; and quantization of continuous learned weights allows the final system state to be translated into a usable set of rules. Simulations on a medical prediction problem, the Pima Indian Diabetes (PID) database, illustrate the method. In the simulations, pruned networks about 1/3 the size of the original actually show improved performance. Quantization yields comprehensible rules with only slight degradation in test set prediction performance.
Resumo:
Computer based mathematical models describing the aircraft evacuation process have a vital role to play in aviation safety. However such models have a heavy dependency on real evacuation data in order to (a) identify the key processes and factors associated with evacuation, (b) quantify variables and parameters associated with the identified factors/processes and finally (c) validate the models. The Fire Safety Engineering Group of the University of Greenwich is undertaking a large data extraction exercise from three major data sources in order to address these issues. This paper describes the extraction and application of data from one of these sources - aviation accident reports. To aid in the storage and analysis of the raw data, a computer database known as AASK (aircraft accident statistics and knowledge) is under development. AASK is being developed to store human observational and anecdotal data contained in accident reports and interview transcripts. AASK comprises four component sub-databases. These consist of the ACCIDENT (crash details), FLIGHT ATTENDANT (observations and actions of the flight attendants), FATALS (details concerning passenger fatalities) and PAX (observations and accounts from individual passengers) databases. AASK currently contains information from 25 survivable aviation accidents covering the period 4 April 1977 to 6 August 1995, involving some 2415 passengers, 2210 survivors, 205 fatalities and accounts from 669 people. In addition to aiding the development of aircraft evacuation models, AASK is also being used to challenge some of the myths which proliferate in the aviation safety industry such as, passenger exit selection during evacuation, nature and frequency of seat jumping, speed of passenger response and group dynamics. AASK can also be used to aid in the development of a more comprehensive approach to conducting post accident interviews, and will eventually be used to store the data directly.
Resumo:
The importance and use of text extraction from camera based coloured scene images is rapidly increasing with time. Text within a camera grabbed image can contain a huge amount of meta data about that scene. Such meta data can be useful for identification, indexing and retrieval purposes. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text is a new challenge for all camera based images. Common problems for text extraction from camera based images are the lack of prior knowledge of any kind of text features such as colour, font, size and orientation as well as the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used for any type of camera grabbed frame be it single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When the new text extraction technique was tested on a variety of camera based images it was found to out perform existing techniques (or something similar). The proposed technique also overcomes any problems that can arise due to an unconstraint complex background. The novelty in the works arises from the fact that this is the first time that colour and spatial information are used simultaneously for the purpose of text extraction.
Resumo:
Heterocyclic aromatic amines (HCA) are carcinogenic mutagens formed during cooking of proteinaceous foods, particularly meat. To assist in the ongoing search for biomarkers of HCA exposure in blood, a method is described for the extraction from human plasma of the most abundant HCAs: 2-Amino-1-methyl-6-phenylimidazo(4,5-b)pyridine (PhIP), 2-amino-3,8-dimethylimidazo[4,5-f]quinoxaline (MeIQx) and 2-amino-3,4,8-trimethylimidazo[4,5-f]quinoxaline (4,8-DiMeIQx) (and its isomer 7,8-DiMeIQx), using Hollow Fibre Membrane Liquid-Phase Microextraction. This technique employs 2.5 cm lengths of porous polypropylene fibres impregnated with organic solvent to facilitate simultaneous extraction from an alkaline aqueous sample into a low volume acidic acceptor phase. This low cost protocol is extensively optimised for fibre length, extraction time, sample pH and volume. Detection is by UPLC-MS/MS using positive mode electrospray ionisation with a 3.4 min runtime, with optimum peak shape, sensitivity and baseline separation being achieved at pH 9.5. To our knowledge this is the first description of HCA chromatography under alkaline conditions. Application of fixed ion ratio tolerances for confirmation of analyte identity is discussed. Assay precision is between 4.5 and 8.8% while lower limits of detection between 2 and 5 pg/mL are below the concentrations postulated for acid-labile HCA-protein adducts in blood.
Resumo:
This paper uses the history of rubber extraction to explore competing attempts to control the forest environments of Assam and beyond in the second half of the nineteenth century. Forest communities faced rival efforts at environmental control from both European and Indian traders, as well as from various centres of authority within the Raj. Government attempts to regulate rubber collection were undermined by the weak authority of the Raj in these regions, leading to widespread smuggling. Partly in response to the disruptive influence of rubber traders on the frontier, the Raj began to restrict the presence of outsiders in tribal regions, which came to be understood as distinct areas outside British control. When rubber yields from the forests nearest the Brahmaputra fell in the wake of intensive exploitation, India's scientific foresters demanded and from 1870 obtained the ability to regulate the Assamese forests, blaming indigenous rubber tapping strategies for the declining yields and arguing that Indian rubber could be ‘equal [to] if not better' than Amazonian rubber if only tappers would change their practices. The knowledge of the scientific foresters was fundamentally flawed, however, and their efforts to establish a new type of tapping practice failed. By 1880, the government had largely abandoned attempts to regulate wild Indian rubber, though wild sources continued to dominate the supply of global rubber until after 1910.