999 results for "Extraction automatique de connaissances"


Relevance: 20.00%

Abstract:

In this thesis we present and evaluate two pattern-matching-based methods for answer extraction in textual question answering systems. A textual question answering system seeks answers to natural language questions from unstructured text. Such systems are an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes ever more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns, and a new type of extraction pattern is also developed. The pattern-matching approach is attractive because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system, with publicly available English datasets used as training and evaluation data. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering, with a similarity metric based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question, together with the following information extracted from the answer context (plain words, part-of-speech tags, punctuation marks and capitalization patterns), can be used in the answer extraction module of a question answering system. These patterns, and the two new methods for generating them, give average results compared with other systems evaluated on the same dataset. However, most answer extraction methods in the question answering systems tested on that dataset are both hand-crafted and based on a system-specific, fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one, already provided in the publicly available data.
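The similarity metric the clustering relies on is plain edit distance over pattern tokens. A minimal sketch of that metric, applied to two answer-context patterns mixing plain words and part-of-speech tags (the token sequences below are illustrative, not taken from the thesis):

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences,
    computed with the standard dynamic-programming table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # cost of deleting i tokens
    for j in range(n + 1):
        d[0][j] = j          # cost of inserting j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

# Two hypothetical answer-context patterns: words plus POS tags.
p1 = ["born", "in", "NNP", ",", "CD"]
p2 = ["born", "in", "NNP", "in", "CD"]
print(edit_distance(p1, p2))  # 1
```

Hierarchical clustering over a matrix of such pairwise distances then groups similar contexts into candidate extraction patterns.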

Relevance: 20.00%

Abstract:

Recent investigations into plant tissues have indicated that the free form of the natural polyphenolic antioxidant ellagic acid (EA) is much more plentiful than first envisaged; consequently, a re-assessment of solvent systems for the extraction of this water-insoluble form is needed. As EA solubility and its UV-Vis spectrum, commonly used for detection and quantification, are both governed by pH, an understanding of this dependence is vital if accurate EA measurements are to be achieved. After evaluating the pH effects on the solubility and UV-Vis spectra of commercial EA, an extraction protocol was devised that promoted similar pH conditions for both standard solutions and plant tissue extracts. This extraction, followed by HPLC with photodiode-array detection (DAD), provided a simple, sensitive and validated methodology for determining free EA in a variety of plant extracts. 100% methanol and a triethanolamine-based mixture were the best choices of standard dissolving solvent, and these higher-pH-generating solvents were also more efficient in extracting EA from the plants tested, with the final choice tied to the plants' natural acidity. Two native Australian plants, anise myrtle (Syzygium anisatum) and Kakadu plum (Terminalia ferdinandiana), exhibited high concentrations of free EA. Furthermore, the dual approach to measuring EA UV-Vis spectra made it possible to assess the effect of acidified eluent on EA spectra when the DAD was employed.

Relevance: 20.00%

Abstract:

This paper presents two algorithms for smoothing and feature extraction for fingerprint classification. Deutsch's thinning algorithm (rectangular array) [2] is used for thinning the digitized binary fingerprint. A simple algorithm is also suggested for classifying the fingerprints, and experimental results obtained using these algorithms are presented.

Relevance: 20.00%

Abstract:

Organochlorine pesticides (OCPs) are ubiquitous environmental contaminants with adverse impacts on aquatic biota, wildlife and human health even at low concentrations. However, conventional methods for their determination in river sediments are resource-intensive. This paper presents a rapid and reliable approach for the detection of OCPs. Accelerated Solvent Extraction (ASE) with in-cell silica gel clean-up, followed by triple-quadrupole gas chromatography-tandem mass spectrometry (GC-MS/MS), was used to recover OCPs from sediment samples. Variables such as temperature, solvent ratio, adsorbent mass and number of extraction cycles were evaluated and optimised. With the exception of Aldrin, which was unaffected by any of the variables evaluated, the recovery of OCPs from sediment was largely influenced by solvent ratio and adsorbent mass and, to some extent, by the number of cycles and the temperature. The optimised conditions, giving good recoveries, were 4 cycles, 4.5 g of silica gel, 105 °C and a 4:3 v/v DCM:hexane mixture. Apart from two compounds (α-BHC and Aldrin) whose recoveries were low (59.73% and 47.66%, respectively), recoveries of the other pesticides were in the range 85.35-117.97% with precision < 10% RSD. The method developed significantly reduces sample preparation time, solvent consumption and matrix interference, and is highly sensitive and selective.

Relevance: 20.00%

Abstract:

This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak adopts a novel approach to feature extraction using edge detection and image compression, whose output feeds an Artificial Neural Network that recognizes the gesture. The technique also handles blurred images. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.

Relevance: 20.00%

Abstract:

In this paper, the pattern classification problem in tool wear monitoring is solved using nature-inspired techniques, namely Genetic Programming (GP) and Ant-Miner (AM). The main advantage of GP and AM is their ability to learn the underlying data relationships and express them as mathematical equations or simple rules. The knowledge extracted from the training data set takes the form of a Genetic Programming Classifier Expression (GPCE) for GP and of rules for AM. The GPCE and the AM-extracted rules are then applied to the testing/validation set to obtain the classification accuracy. A major attraction of GP-evolved GPCEs and AM-based classification is the possibility of obtaining expert-system-like rules that the user can subsequently apply directly in his or her application. The performance of data classification using GP and AM is as good as the classification accuracy obtained in the earlier study.
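What "expert-system-like rules" means in practice can be illustrated with a toy GPCE: an evolved arithmetic expression over sensor features whose sign decides the class. The feature names, weights and threshold below are invented for illustration and are not from the paper:

```python
# Hypothetical evolved classifier expression over two tool-wear features
# (normalized cutting force and vibration amplitude); the coefficients
# stand in for whatever the GP run actually evolves.
def gpce(force, vibration):
    return 0.8 * force + 1.3 * vibration - 1.0

def classify(sample):
    """Sign of the GPCE output assigns the wear class."""
    return "worn" if gpce(*sample) > 0 else "fresh"

print(classify((0.9, 0.7)))  # worn
print(classify((0.2, 0.1)))  # fresh
```

Because the decision boundary is a readable expression rather than an opaque model, a user can inspect and reapply it directly, which is the attraction the abstract highlights.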

Relevance: 20.00%

Abstract:

In this paper, an approach to automatic road extraction for an urban region using structural, spectral and geometric characteristics of roads is presented. Roads are extracted in two stages: pre-processing and road extraction. Initially, the image is pre-processed to improve tolerance by reducing clutter (mostly buildings, parking lots, vegetation regions and other open spaces). Road segments are then extracted using Texture Progressive Analysis (TPA) and the Normalized-cut algorithm. The TPA technique uses binary segmentation based on three levels of texture statistical evaluation to extract road segments, whereas the Normalized-cut method is a graph-based method that generates an optimal partition of road segments. The performance (quality measures) of road extraction using TPA and the Normalized-cut method is compared, and the experimental results show that the Normalized-cut method is efficient at extracting road segments in urban regions from high-resolution satellite imagery.
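The graph partition behind the Normalized-cut step can be sketched on a toy affinity matrix. Following Shi and Malik's formulation, a two-way cut thresholds the second-smallest generalized eigenvector of (D - W)x = λDx; the 6-node matrix below stands in for pixel-similarity weights and is purely illustrative:

```python
import numpy as np

def normalized_cut(W):
    """Two-way normalized cut of a weighted graph with symmetric
    affinity matrix W: threshold the Fiedler vector of the
    generalized eigenproblem (D - W) x = lambda D x."""
    d = W.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: D^{-1/2} (D - W) D^{-1/2}
    L_sym = D_isqrt @ (np.diag(d) - W) @ D_isqrt
    vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues ascending
    v = D_isqrt @ vecs[:, 1]                # map back to generalized problem
    return v > np.median(v)                 # boolean partition of the nodes

# Toy affinity: two tightly connected 3-node groups joined by one weak edge.
W = np.array([
    [0,    1, 1, 0.01, 0, 0],
    [1,    0, 1, 0,    0, 0],
    [1,    1, 0, 0,    0, 0],
    [0.01, 0, 0, 0,    1, 1],
    [0,    0, 0, 1,    0, 1],
    [0,    0, 0, 1,    1, 0],
], dtype=float)
labels = normalized_cut(W)
print(labels)  # nodes 0-2 and nodes 3-5 fall on opposite sides of the cut
```

In the road-extraction setting, W would hold similarity weights between image regions, and the cut separates road from non-road segments.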

Relevance: 20.00%

Abstract:

Assessment of heavy metal bioavailability in sediments is complex because of the number of partial extraction methods available and the general lack of certified reference materials. This study evaluates five different extraction methodologies to ascertain the relative strengths and weaknesses of each. The results are then compared with previously published work to identify the most effective partial extraction technique, which was established to be dilute (0.75-1 M) nitric acid solutions. These results imply that single-reagent, weak-acid extractions provide a better assessment of potentially bioavailable metals than the chelating agents used in sequential extraction methods.

Relevance: 20.00%

Abstract:

We report a measurement of the ratio of the tt̅ to Z/γ* production cross sections in √s = 1.96 TeV pp̅ collisions using data corresponding to an integrated luminosity of up to 4.6 fb⁻¹, collected by the CDF II detector. The tt̅ cross section ratio is measured using two complementary methods, a b-jet tagging measurement and a topological approach. By multiplying the ratios by the well-known theoretical Z/γ*→ll cross section predicted by the standard model, the extracted tt̅ cross sections are effectively insensitive to the uncertainty on luminosity. A best linear unbiased estimate is used to combine both measurements, with the result σ_tt̅ = 7.70 ± 0.52 pb for a top-quark mass of 172.5 GeV/c².
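Why the extracted cross section is insensitive to the luminosity uncertainty can be made explicit: both event yields are taken from the same dataset, so the integrated luminosity cancels in the ratio. Schematically (ignoring backgrounds; N are event yields and ε selection efficiencies):

```latex
R \;=\; \frac{\sigma_{t\bar t}}{\sigma_{Z/\gamma^*}}
  \;=\; \frac{N_{t\bar t}/(\epsilon_{t\bar t}\,\mathcal{L})}
             {N_{Z}/(\epsilon_{Z}\,\mathcal{L})}
  \;=\; \frac{N_{t\bar t}\,\epsilon_{Z}}{N_{Z}\,\epsilon_{t\bar t}},
\qquad
\sigma_{t\bar t} \;=\; R \times \sigma^{\mathrm{theory}}_{Z/\gamma^* \to \ell\ell}.
```

Only the ratio R is measured; the precisely predicted Z/γ* cross section then sets the absolute scale.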

Relevance: 20.00%

Abstract:

A method for the delipidation of egg yolk plasma using phospholipase C, n-heptane and 1-butanol is described. An aggregating protein fraction and a soluble protein fraction were separated by the action of phospholipase C. The aggregating protein fraction, freed of most of the lipids by treatment with n-heptane and 1-butanol, was shown to consist of the apolipoproteins of yolk plasma, whereas the soluble proteins were identified as the livetins. Carbohydrate and N-terminal amino acid analyses of these protein fractions are reported, and a comparison is made with the corresponding fractions obtained by formic acid delipidation of yolk plasma. The gelation of yolk plasma by the action of phospholipase C is interpreted as an aggregation of lipoproteins caused by ionic interactions, and the role of lecithin in maintaining the structural integrity of lipoproteins is discussed.

Relevance: 20.00%

Abstract:

We propose two texture-based approaches, one using Gabor filters and the other log-polar wavelets, for separating text from non-text elements in a document image. Both algorithms compute local energy at information-rich points marked by the Harris corner detector. The advantage of this approach is that local energy is calculated only at selected points rather than throughout the image, saving considerable computation time. The algorithms have been tested on a large set of scanned text pages, and the results are better than those from existing algorithms. Of the two proposed schemes, the Gabor-filter-based scheme marginally outperforms the wavelet-based one.
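The Gabor-filter variant of this local-energy idea can be sketched with plain NumPy. The kernel parameters and the two evaluation points below are illustrative (the points stand in for Harris corner detections, which are not recomputed here); a text-like striped patch should yield higher filter energy than a blank patch:

```python
import numpy as np

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real (even-symmetric) Gabor kernel oriented at angle theta:
    a Gaussian envelope times a cosine carrier with wavelength lam."""
    r = ksize // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def local_energy(img, pt, kernel):
    """Squared filter response in the window centred on one selected
    point, instead of convolving the whole image."""
    r = kernel.shape[0] // 2
    i, j = pt
    patch = img[i - r:i + r + 1, j - r:j + r + 1]
    return float((patch * kernel).sum() ** 2)

# Toy image: vertical stripes (text-like texture) on the left half only.
img = np.zeros((32, 32))
img[:, :16] = (np.arange(16) % 4 < 2).astype(float)[None, :]

k = gabor_kernel(theta=0.0)            # carrier matched to the stripe period
text_pt, blank_pt = (16, 8), (16, 24)  # stand-ins for Harris corner points
print(local_energy(img, text_pt, k) > local_energy(img, blank_pt, k))  # True
```

Thresholding such energies at each detected corner is what lets the scheme label regions as text or non-text without filtering every pixel.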