884 results for fact extraction
Abstract:
The paper presents an approach to extracting facts from document texts. The approach relies on knowledge about the subject domain, a specialized dictionary, and fact schemes that describe fact structures while taking into account both the semantic and the syntactic compatibility of fact elements. Each extracted fact combines into a single structure the dictionary lexical objects found in the text and matches them against concepts of the subject-domain ontology.
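A fact scheme of this kind can be pictured as a typed slot pattern matched against dictionary-tagged lexical objects. The sketch below is a minimal illustration of that idea; the dictionary entries, concept labels, and scheme are invented for the example and are not taken from the paper.

```python
# Illustrative sketch: matching a fact scheme (an ordered sequence of
# concept types) against dictionary-tagged lexical objects.
# All dictionary entries and scheme/concept names are hypothetical.

# Specialized dictionary: surface form -> ontology concept
DICTIONARY = {
    "acme corp": "Organization",
    "john smith": "Person",
    "acquired": "AcquisitionAction",
}

# Fact scheme: ordered concept types forming one fact structure
ACQUISITION_SCHEME = ("Person", "AcquisitionAction", "Organization")

def tag_tokens(tokens):
    """Map each token span found in the dictionary to its concept."""
    return [(t, DICTIONARY[t]) for t in tokens if t in DICTIONARY]

def match_scheme(tagged, scheme):
    """Return the lexical objects filling the scheme's slots, in order,
    or None if the scheme's concept sequence is not present."""
    concepts = [c for _, c in tagged]
    for i in range(len(concepts) - len(scheme) + 1):
        if tuple(concepts[i:i + len(scheme)]) == scheme:
            return [t for t, _ in tagged[i:i + len(scheme)]]
    return None

tokens = ["john smith", "acquired", "acme corp"]
fact = match_scheme(tag_tokens(tokens), ACQUISITION_SCHEME)
print(fact)  # ['john smith', 'acquired', 'acme corp']
```

The scheme encodes both constraints the abstract mentions: which concepts may fill each slot (semantic compatibility) and in what order they may appear (a crude stand-in for syntactic compatibility).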
Abstract:
We are currently facing an overwhelming growth in the number of reliable information sources on the Internet. The quantity of information available to everyone via the Internet grows dramatically each year [15]. At the same time, the temporal and cognitive resources of human users are not changing, causing the phenomenon of information overload. The World Wide Web is one of the main sources of information for decision makers. However, our studies show that, at least in Poland, decision makers face some important problems when turning to the Internet as a source of decision information. One of the most commonly raised obstacles is the distribution of relevant information among many sources, and therefore the need to visit different Web sources in order to collect and analyze all important content. Several research groups have recently turned to the problem of information extraction from the Web [13]. Most effort so far has been directed toward collecting data from dispersed databases accessible via web pages (referred to as data extraction, or information extraction from the Web) and toward understanding natural language texts by means of fact, entity, and association recognition (referred to as information extraction). Data extraction efforts show some interesting results, but proper integration of web databases is still beyond us. The information extraction field has recently been very successful in retrieving information from natural language texts, but it still lacks the ability to understand more complex information, which requires common-sense knowledge, discourse analysis, and disambiguation techniques.
Abstract:
Analysis of high-resolution satellite images has been an important research topic in urban analysis, and automatic road network extraction is one of its important tasks. Two approaches to road extraction, based on the Level Set and Mean Shift methods, are proposed. Extracting roads directly from an original image is difficult and computationally expensive because of the presence of other road-like features with straight edges. The image is therefore preprocessed to reduce noise (buildings, parking lots, vegetation regions, and other open spaces): roads are first extracted as elongated regions, and nonlinear noise segments are removed with a median filter (exploiting the fact that road networks consist of a large number of small linear structures). Road extraction is then performed using the Level Set and Mean Shift methods. Finally, the accuracy of the extracted road images is evaluated using quality measures. The experiments use 1 m resolution IKONOS data.
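The median-filter preprocessing step can be sketched in a few lines. This is a generic 3x3 median filter on an invented toy road mask, not the paper's implementation; isolated noise pixels are removed while regions thicker than half the window (like the toy "road" below) survive.

```python
def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2D list-of-lists image.
    Isolated noise pixels are suppressed; structures thicker than
    half the window (e.g. elongated road regions) are preserved."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out

# Toy binary mask: a thick horizontal "road" (rows 3-5) plus one
# isolated noise pixel (invented test image).
mask = [[0] * 7 for _ in range(7)]
for y in range(3, 6):
    for x in range(7):
        mask[y][x] = 1      # elongated road region
mask[0][5] = 1              # isolated noise pixel

filtered = median_filter_3x3(mask)
# The noise pixel at (0, 5) is removed; the road region survives.
```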
Abstract:
The kinetics of ytterbium(III) extraction with bis(2,4,4-trimethylpentyl)phosphinic acid (Cyanex 272, HA) in n-heptane have been studied using a constant interfacial area cell with laminar flow. The stoichiometry and the equilibrium constant of the complex formation reaction between Yb3+ and Cyanex 272 were determined. The extraction rate depends on the stirring rate. This fact, together with the Ea value, suggests that the mass transfer process is under mixed chemical reaction-diffusion control at lower temperatures, whereas it is entirely diffusion controlled at higher temperatures. The rate equations for ytterbium extraction with Cyanex 272 have been obtained. The rate-determining step was also identified by predictions derived from interfacial reaction models, and through approximate solutions of the flux equation the diffusion parameters and the thickness of the diffusion film have been calculated.
Abstract:
The kinetics and mechanism of yttrium(III) extraction with bis(2,4,4-trimethylpentyl)phosphinic acid (Cyanex 272, HA) dissolved in heptane have been investigated using a constant interfacial area cell with laminar flow. The data have been analyzed in terms of pseudo-first-order constants. Studies of the effects of stirring rate, temperature, aqueous-phase acidity, and extractant concentration on the extraction rate show that the extraction regime depends on the extraction conditions. The plot of rate against interfacial area is linear. This fact, together with the strong surface activity of Cyanex 272 at the heptane-water interface, makes the interface the most probable location of the chemical reactions. The forward and reverse rate equations and the extraction rate constant for yttrium extraction with Cyanex 272 have been obtained under the experimental conditions. The rate-determining step has also been predicted from interfacial reaction models. The predictions agree well with the rate equations obtained from the experimental data, confirming the basic assumption that the chemical reaction takes place at the liquid-liquid interface.
Abstract:
The extraction kinetics of ytterbium with sec-nonylphenoxy acetic acid (CA-100) in heptane have been investigated using a constant interfacial area cell with laminar flow. The influence of stirring speed and temperature on the rate indicated that the extraction rate was controlled by the experimental conditions. The plot of rate against interfacial area was linear. This fact, together with the low water solubility and strong surface activity of CA-100 at the heptane-water interface, made the interface the most probable location of the chemical reactions. The influences of extractant concentration and hydrogen ion concentration on the extraction rate were investigated, and the forward and reverse rate equations for ytterbium extraction with CA-100 were obtained. From the experimental data, the apparent forward extraction rate constant was calculated. Interfacial reaction models were proposed that agree well with the rate equations obtained from the experimental data.
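The pseudo-first-order treatment used in kinetic studies like these amounts to fitting ln(C0/Ct) = k_obs * t to the decay of the metal-ion concentration in the aqueous phase. The sketch below illustrates the fit with made-up concentration data, not the papers' measurements.

```python
import math

# Hypothetical aqueous-phase metal-ion concentrations over time (mol/L);
# these numbers are illustrative, not experimental data.
times = [0.0, 10.0, 20.0, 30.0, 40.0]          # minutes
conc = [1.00e-3, 8.19e-4, 6.70e-4, 5.49e-4, 4.49e-4]

# Pseudo-first-order model: ln(C0/Ct) = k_obs * t
y = [math.log(conc[0] / c) for c in conc]

# Least-squares slope through the origin gives the apparent rate constant
k_obs = sum(t * yi for t, yi in zip(times, y)) / sum(t * t for t in times)
print(f"k_obs = {k_obs:.4f} per minute")  # ~0.02 for these synthetic data
```

Plotting y against t and checking linearity is what justifies the "pseudo-first-order" description; curvature would indicate a change of regime (e.g. from chemical to diffusion control).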
Abstract:
In view of the growing interest in endohedral lanthanide fullerenes, Ce, a lanthanide element with a typical +4 oxidation state, has been studied systematically. The synthesis, extraction, and electronic structure of Ce@C-2n are investigated. Soot containing Ce@C-2n was synthesized in high yield by carbonizing CeO2-containing graphite rods and arc back-burning the CeC2-enriched cathode deposit in a DC arc plasma apparatus. Ce@C-2n, dominated by Ce@C-82, can be efficiently extracted from the insoluble part of the soot, after toluene Soxhlet extraction, by pyridine at high temperature and high pressure in a closed vessel. About 60% Ce@C-2n (2n = 82, 80, 78, 76) and 35% Ce@C-82 can be enriched in the pyridine extract, as identified by desorption electron impact mass spectrometry (DEI MS). The electronic structure of Ce@C-2n is analyzed using X-ray photoemission spectroscopy (XPS) of a pyridine-free film. It is suggested that the encapsulated Ce atom is in a charge state close to +3 and is effectively protected from reaction with water and oxygen by the enclosing fullerene cage. Contrary to theoretical expectation, the electronic state of Ce@C-82 is formally described as Ce3+@C-82(3-). (C) 1997 Elsevier Science Ltd.
Abstract:
The importance and use of text extraction from camera-based colour scene images is increasing rapidly. Text within a camera-grabbed image can contain a large amount of metadata about the scene; such metadata can be useful for identification, indexing, and retrieval. While segmentation and recognition of text from document images is quite successful, detection of colour scene text is a new challenge for camera-based images. Common problems for text extraction from camera-based images are the lack of prior knowledge of text features such as colour, font, size, and orientation, as well as of the location of probable text regions. In this paper, we document the development of a fully automatic and robust text segmentation technique that can be used on any type of camera-grabbed frame, whether single image or video. A new algorithm is proposed that overcomes current problems of text segmentation by exploiting text appearance in terms of colour and spatial distribution. When tested on a variety of camera-based images, the new text extraction technique was found to outperform existing techniques. The proposed technique also overcomes problems arising from unconstrained complex backgrounds. The novelty of the work is that this is the first time colour and spatial information are used simultaneously for the purpose of text extraction.
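The combination of colour and spatial distribution can be illustrated with a toy sketch: after colour quantization, pixels sharing a dominant text colour form a binary mask, and spatially coherent connected components of that mask become character candidates. This is a generic illustration of the idea, not the paper's algorithm; the test mask is invented.

```python
def connected_components(mask):
    """4-connected component labeling on a binary mask (list of lists).
    Returns a list of components, each a list of (row, col) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

# Toy mask of pixels sharing one quantized "text colour" (invented):
# two spatially separate groups become two character candidates.
text_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
chars = connected_components(text_mask)
print(len(chars), "candidate character regions")  # 2
```

In a full pipeline, components would then be filtered by size, aspect ratio, and alignment to separate text from background clutter.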
Abstract:
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2014
Abstract:
To study the complex formation of group 5 elements (Nb, Ta, Ha, and the pseudoanalog Pa) in aqueous HCl solutions of medium and high concentration, the electronic structures of the anionic complexes [MCl_6]^-, [MOCl_4]^-, [M(OH)_2Cl_4]^-, and [MOCl_5]^2- of these elements have been calculated using the relativistic Dirac-Slater Discrete-Variational Method. The charge density distribution analysis shows that tantalum occupies a specific position in the group, with the highest tendency to form the pure halide complex [TaCl_6]^-. This fact, along with the high covalency of this complex, explains its good extractability into aliphatic amines. Niobium shows equal trends to form the pure halide [NbCl_6]^- and oxyhalide [NbOCl_5]^2- species at medium and high acid concentrations. Protactinium has a slight preference under these conditions for the [PaOCl_5]^2- form, or for pure halide complexes with coordination number higher than 6. At high HCl concentrations, element 105 will prefer to form the oxyhalide anionic complex [HaOCl_5]^2- rather than [HaCl_6]^-. For the same sort of anionic oxychloride complexes, their partition between the organic and aqueous phases in extraction by aliphatic amines has been estimated, giving the following order of partition coefficients: P_Nb < P_Ha < P_Pa.
Abstract:
[EN] During maximal whole-body exercise, VO2 peak is limited by O2 delivery. In turn, it is thought that blood flow at near-maximal exercise must be restrained by the sympathetic nervous system to maintain mean arterial pressure. To determine whether enhancing vasodilation across the leg results in higher O2 delivery and leg VO2 during near-maximal and maximal exercise in humans, seven men performed two maximal incremental exercise tests on the cycle ergometer. In random order, one test was performed with and one without (control exercise) infusion of ATP (8 mg in 1 ml of isotonic saline solution) into the right femoral artery at a rate of 80 microg per kg body mass per min. During near-maximal exercise (92% of VO2 peak), the infusion of ATP increased leg vascular conductance (+43%, P<0.05), leg blood flow (+20%, 1.7 l/min, P<0.05), and leg O2 delivery (+20%, 0.3 l/min, P<0.05). No effects were observed on leg or systemic VO2. Leg O2 fractional extraction decreased from 85+/-3% (control) to 78+/-4% (ATP) in the infused leg (P<0.05), while it remained unchanged in the left leg (84+/-2% and 83+/-2%; control and ATP; n=3). ATP infusion at maximal exercise increased leg vascular conductance by 17% (P<0.05), while leg blood flow tended to be elevated by 0.8 l/min (P=0.08). However, neither systemic nor leg peak VO2 was enhanced, owing to a reduction of O2 extraction from 84+/-4% to 76+/-4% in the control and ATP conditions, respectively (P<0.05). In summary, the VO2 of the skeletal muscles of the lower extremities is not enhanced by limb vasodilation at near-maximal or maximal exercise in humans.
The fact that ATP infusion resulted in a reduction of O2 extraction across the exercising leg suggests a vasodilating effect of ATP on less-active muscle fibers and other noncontracting tissues and that under normal conditions these regions are under high vasoconstrictor influence to ensure the most efficient flow distribution of the available cardiac output to the most active muscle fibers of the exercising limb.
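The quantities in this abstract are related by the Fick principle, VO2 = blood flow x (CaO2 - CvO2), with fractional extraction defined as (CaO2 - CvO2) / CaO2. The worked sketch below uses round illustrative numbers (a typical arterial O2 content and an invented flow), not the study's raw data; only the 85% extraction value echoes the control condition reported above.

```python
def leg_vo2(flow_l_min, cao2_ml_l, cvo2_ml_l):
    """Fick principle: O2 uptake = blood flow x arteriovenous O2 difference.
    Flow in L/min, O2 contents in ml O2 per L blood; returns ml O2/min."""
    return flow_l_min * (cao2_ml_l - cvo2_ml_l)

def fractional_extraction(cao2_ml_l, cvo2_ml_l):
    """Fraction of delivered O2 taken up across the limb."""
    return (cao2_ml_l - cvo2_ml_l) / cao2_ml_l

cao2 = 200.0        # arterial O2 content, ml O2/L (typical textbook value)
flow = 8.5          # leg blood flow, L/min (illustrative)
cvo2 = 30.0         # venous O2 content chosen to give 85% extraction

vo2 = leg_vo2(flow, cao2, cvo2)
ext = fractional_extraction(cao2, cvo2)
print(f"leg VO2 = {vo2:.0f} ml/min, extraction = {ext:.0%}")
```

The relation makes the study's finding concrete: if flow rises but VO2 stays constant, fractional extraction must fall in proportion, which is what the ATP infusion produced.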
Abstract:
Ontology design and population (core aspects of semantic technologies) have recently become fields of great interest due to the increasing need for domain-specific knowledge bases that can boost the use of the Semantic Web. For building such knowledge resources, the state-of-the-art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so when the task consists in modelling knowledge at web scale. The primary aim of this work is to investigate a novel and flexible methodology for automatically learning ontologies from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, and speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automatically identifying facts in natural language and extracting frames of relations among recognized entities, to producing linked data with which to extend existing knowledge bases or create new ones. In the state of the art, automatic ontology learning systems are mainly based on plain pipelined linguistic classifiers performing tasks such as named entity recognition, entity resolution, taxonomy extraction, and relation extraction [11]. These approaches present some weaknesses, especially in capturing the structures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In the literature, these structures have been called Semantic Frames by Fillmore [20] or, more recently, Knowledge Patterns [23]. Some NLP studies have recently shown the possibility of performing more accurate deep parsing with the ability to logically understand the structure of discourse [7].
In this work, some of these technologies have been investigated and employed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds through an automated process, in order to identify and investigate the cognitive patterns used by humans to organize their knowledge.
Abstract:
The central objective of research in Information Retrieval (IR) is to discover new techniques to retrieve relevant information in order to satisfy an information need. The information need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept that has changed over time, from popular to personal: what was considered relevant before was information for the whole population, whereas what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his or her social context; from this need, an interdisciplinary field called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (the interconnected conditions in which an activity occurs) and individualization (the characteristics that distinguish an individual). This shift of focus to the individual's need undermines the rigid linearity of the classical model, which has been overtaken by the ``berry picking'' model: search terms change thanks to the informational feedback received from the search activity, introducing the concept of the evolution of search terms. The development of Information Foraging theory, which observed correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and recent advances in IR.
The search method developed exploits what is relevant for the user by radically changing the way in which an information need is expressed: it is now expressed through the generation of the query together with its own context. The method was in fact conceived to improve the quality of search by rewriting the query based on contexts automatically generated from a local knowledge base. Furthermore, the idea of optimizing any IR system led to developing the method as a middleware of interaction between the user and the IR system. The system therefore has just two possible actions: rewriting the query and reordering the results. Equivalent actions are described in the PS literature, which generally exploits information derived from the analysis of user behavior, whereas the proposed approach exploits knowledge provided by the user. The thesis goes further, presenting a novel assessment procedure, following the "Cranfield paradigm", for evaluating this type of IR system. The results achieved are interesting in terms of both the effectiveness obtained and the innovative approach undertaken, together with the several applications it inspired using a local knowledge base.
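The middleware's two actions, rewriting the query from a local knowledge base and reordering the results, can be sketched in a few lines. The knowledge-base contents, documents, and scoring below are invented for illustration and are not the thesis's actual model.

```python
# Hypothetical local knowledge base: query term -> related context terms
KNOWLEDGE_BASE = {
    "jaguar": ["wildlife", "felidae"],   # the user's context: the animal
}

def rewrite_query(query, kb):
    """Action 1: expand each query term with its context terms."""
    terms = []
    for term in query.split():
        terms.append(term)
        terms.extend(kb.get(term, []))
    return " ".join(terms)

def reorder(results, context_terms):
    """Action 2: move results mentioning more context terms to the front."""
    def score(r):
        return sum(1 for t in context_terms if t in r.lower())
    return sorted(results, key=score, reverse=True)

q = rewrite_query("jaguar habitat", KNOWLEDGE_BASE)
docs = ["Jaguar car dealership",
        "Jaguar habitat in Felidae wildlife reserves"]
ranked = reorder(docs, KNOWLEDGE_BASE["jaguar"])
print(q)          # jaguar wildlife felidae habitat
print(ranked[0])  # the wildlife document now ranks first
```

Because both actions wrap an unmodified underlying search engine, the same middleware can in principle personalize any IR system, which is the design choice the thesis argues for.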
Abstract:
Most present digital image processing methods are concerned with the objective characterization of external properties such as shape, form, or colour. This information concerns objective characteristics of different bodies and is used to extract details for several different tasks. On some occasions, however, another type of information is needed. This is the case when the image processing system is to be applied to some operation related to living bodies. In this case, another type of object information may be useful, as it may give additional knowledge about the object's subjective properties. Some of these properties are object symmetry, parallelism between lines, and the feeling of size. These types of properties relate more to the internal sensations of living beings interacting with their environment than to the objective information obtained by artificial systems. This paper presents an elementary system able to detect some of the above-mentioned parameters. A first mathematical model to analyze these situations is reported. This theoretical model will make it possible to implement a simple working system. The basis of the system is the use of optical logic cells, previously employed in optical computing.
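One of the subjective properties named above, mirror symmetry, can be checked with an elementary computation. The paper's system uses optical logic cells; the following is only a software analogue of the same idea, on an invented toy image.

```python
def vertical_symmetry_score(img):
    """Fraction of pixels equal to their mirror image across the
    vertical axis of a 2D list-of-lists image; 1.0 = perfect symmetry."""
    h, w = len(img), len(img[0])
    matches = sum(
        1 for row in img for x in range(w) if row[x] == row[w - 1 - x]
    )
    return matches / (h * w)

symmetric = [
    [0, 1, 0],
    [1, 1, 1],
]
askew = [
    [0, 1, 1],
    [1, 0, 0],
]
print(vertical_symmetry_score(symmetric))  # 1.0
print(vertical_symmetry_score(askew))      # well below 1.0
```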
Abstract:
In large organizations, the resources needed to solve challenging problems are typically dispersed over systems within and beyond the organization, and across different media. In knowledge environments, however, there is still a need for extraction methods able to combine evidence for a fact from across different media. In many cases the whole is more than the sum of its parts: only when the different media are considered simultaneously can enough evidence be obtained to derive facts otherwise inaccessible to the knowledge worker via traditional methods that work on each single medium separately. In this paper, we present a cross-media knowledge extraction framework specifically designed to handle large volumes of documents composed of three types of media (text, images, and raw data) and to exploit the evidence across the media. Our goal is to improve the quality and depth of automatically extracted knowledge.
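One simple way to see how combined media can exceed any single medium is a noisy-OR combination of per-medium confidence scores. The media names and probabilities below are illustrative; this is a generic evidence-combination sketch, not the framework's actual model.

```python
def combined_confidence(evidence):
    """Noisy-OR combination: the fact is missed only if every medium's
    independent evidence fails, so p = 1 - prod(1 - p_i)."""
    p_fail = 1.0
    for p in evidence.values():
        p_fail *= (1.0 - p)
    return 1.0 - p_fail

# Per-medium confidence that the same fact holds (invented numbers)
evidence = {"text": 0.6, "image": 0.5, "raw_data": 0.4}

p = combined_confidence(evidence)
print(f"combined confidence: {p:.2f}")  # 0.88, above any single medium
```

Under this model no single medium reaches the decision threshold alone, yet the combined score does, which is the "whole is more than the sum of its parts" effect the abstract describes.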