988 resultados para knowledge extraction
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Resumo:
This article presents an automatic methodology for extraction of road seeds from high-resolution aerial images. The method is based on a set of four road objects and another set of connection rules among road objects. Each road object is a local representation of an approximately straight road fragment and its construction is based on a combination of polygons describing all relevant image edges, according to some rules embodying road knowledge. Each one of the road seeds is composed by a sequence of connected road objects, in which each sequence of this type can be geometrically structured as a chain of contiguous quadrilaterals. Experiments carried out with high-resolution aerial images showed that the proposed methodology is very promising in extracting road seeds. This article presents the fundamentals of the method and the experimental results, as well.
Resumo:
This paper presents an automatic methodology for road network extraction from medium-and high-resolution aerial images. It is based on two steps. In the first step, the road seeds (i.e., road segments) are extracted using a set of four road objects and another set of connection rules among road objects. Each road object is a local representation of an approximately straight road fragment and its construction is based on a combination of polygons describing all relevant image edges, according to some rules embodying road knowledge. Each road seed is composed by a sequence of connected road objects in which each sequence of this type can be geometrically structured as a chain of contiguous quadrilaterals. In the second step, two strategies for road completion are applied in order to generate the complete road network. The first strategy is based on two basic perceptual grouping rules, i.e., proximity and collinearity rules, which allow the sequential reconstruction of gaps between every pair of disconnected road segments. This strategy does not allow the reconstruction of road crossings, but it allows the extraction of road centerlines from the contiguous quadrilaterals representing connected road segments. The second strategy for road completion aims at reconstructing road crossings. Firstly, the road centerlines are used to find reference points for road crossings, which are their approximate positions. Then these points are used to extract polygons representing the contours of road crossings. This paper presents the proposed methodology and experimental results. © Pleiades Publishing, Inc. 2006.
Resumo:
Landfarm soils are employed in industrial and petrochemical residue bioremediation. This process induces selective pressure directed towards microorganisms capable of degrading toxic compounds. Detailed description of taxa in these environments is difficult due to a lack of knowledge of culture conditions required for unknown microorganisms. A metagenomic approach permits identification of organisms without the need for culture. However, a DNA extraction step is first required, which can bias taxonomic representativeness and interfere with cloning steps by extracting interference substances. We developed a simplified DNA extraction procedure coupled with metagenomic DNA amplification in an effort to overcome these limitations. The amplified sequences were used to generate a metagenomic data set and the taxonomic and functional representativeness were evaluated in comparison with a data set built with DNA extracted by conventional methods. The simplified and optimized method of RAPD to access metagenomic information provides better representativeness of the taxonomical and metabolic aspects of the environmental samples.
Resumo:
Ontology design and population -core aspects of semantic technologies- re- cently have become fields of great interest due to the increasing need of domain-specific knowledge bases that can boost the use of Semantic Web. For building such knowledge resources, the state of the art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task. Even more if the task consists in modelling knowledge at a web scale. The primary aim of this work is to investigate a novel and flexible method- ology for automatically learning ontology from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automati- cally identifying facts from natural language and extracting frame of relations among recognized entities, to producing linked data with which extending existing knowledge bases or creating new ones. In the state of the art, automatic ontology learning systems are mainly based on plain-pipelined linguistics classifiers performing tasks such as Named Entity recognition, Entity resolution, Taxonomy and Relation extraction [11]. These approaches present some weaknesses, specially in capturing struc- tures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In literature, these structures have been called Semantic Frames by Fill- 6 Introduction more [20], or more recently as Knowledge Patterns [23]. Some NLP studies has recently shown the possibility of performing more accurate deep parsing with the ability of logically understanding the structure of discourse [7]. In this work, some of these technologies have been investigated and em- ployed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by human to organize their knowledge.
Resumo:
The central objective of research in Information Retrieval (IR) is to discover new techniques to retrieve relevant information in order to satisfy an Information Need. The Information Need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept which has changed over time, from popular to personal, i.e., what was considered relevant before was information for the whole population, but what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his social context; thereby an interdisciplinary sector called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to the Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (interconnected conditions that occur in an activity), and individualization (characteristics that distinguish an individual). This movement of focus to the individual's need undermines the rigid linearity of the classical model overtaken the ``berry picking'' model which explains that the terms change thanks to the informational feedback received from the search activity introducing the concept of evolution of search terms. The development of Information Foraging theory, which observed the correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and the recent advances in IR. The search method developed exploits what is relevant for the user by changing radically the way in which an Information Need is expressed, because now it is expressed through the generation of the query and its own context. As a matter of fact the method was born under the pretense to improve the quality of search by rewriting the query based on the contexts automatically generated from a local knowledge base. Furthermore, the idea of optimizing each IR system has led to develop it as a middleware of interaction between the user and the IR system. Thereby the system has just two possible actions: rewriting the query, and reordering the result. Equivalent actions to the approach was described from the PS that generally exploits information derived from analysis of user behavior, while the proposed approach exploits knowledge provided by the user. The thesis went further to generate a novel method for an assessment procedure, according to the "Cranfield paradigm", in order to evaluate this type of IR systems. The results achieved are interesting considering both the effectiveness achieved and the innovative approach undertaken together with the several applications inspired using a local knowledge base.
Resumo:
Recently developed computer applications provide tools for planning cranio-maxillofacial interventions based on 3-dimensional (3D) virtual models of the patient's skull obtained from computed-tomography (CT) scans. Precise knowledge of the location of the mid-facial plane is important for the assessment of deformities and for planning reconstructive procedures. In this work, a new method is presented to automatically compute the mid-facial plane on the basis of a surface model of the facial skeleton obtained from CT. The method matches homologous surface areas selected by the user on the left and right facial side using an iterative closest point optimization. The symmetry plane which best approximates this matching transformation is then computed. This new automatic method was evaluated in an experimental study. The study included experienced and inexperienced clinicians defining the symmetry plane by a selection of landmarks. This manual definition was systematically compared with the definition resulting from the new automatic method: Quality of the symmetry planes was evaluated by their ability to match homologous areas of the face. Results show that the new automatic method is reliable and leads to significantly higher accuracy than the manual method when performed by inexperienced clinicians. In addition, the method performs equally well in difficult trauma situations, where key landmarks are unreliable or absent.
Resumo:
The extraction of the finite temperature heavy quark potential from lattice QCD relies on a spectral analysis of the Wilson loop. General arguments tell us that the lowest lying spectral peak encodes, through its position and shape, the real and imaginary parts of this complex potential. Here we benchmark this extraction strategy using leading order hard-thermal loop (HTL) calculations. In other words, we analytically calculate the Wilson loop and determine the corresponding spectrum. By fitting its lowest lying peak we obtain the real and imaginary parts and confirm that the knowledge of the lowest peak alone is sufficient for obtaining the potential. Access to the full spectrum allows an investigation of spectral features that do not contribute to the potential but can pose a challenge to numerical attempts of an analytic continuation from imaginary time data. Differences in these contributions between the Wilson loop and gauge fixed Wilson line correlators are discussed. To better understand the difficulties in a numerical extraction we deploy the maximum entropy method with extended search space to HTL correlators in Euclidean time and observe how well the known spectral function and values for the real and imaginary parts are reproduced. Possible venues for improvement of the extraction strategy are discussed.
Resumo:
Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.
Resumo:
Clinical text understanding (CTU) is of interest to health informatics because critical clinical information frequently represented as unconstrained text in electronic health records are extensively used by human experts to guide clinical practice, decision making, and to document delivery of care, but are largely unusable by information systems for queries and computations. Recent initiatives advocating for translational research call for generation of technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in multidisciplinary and collaborative environment envisioned by CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability, and ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that needs to be accounted for when conceptualizing implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in use of terms and documentation practices, and without commitmentto grammatical or syntactic structure of the language (e.g. Triage notes, physician and nurse notes, chief complaints, etc). This hinders performance of natural language processing technologies which typically rely heavily on the syntax of language and grammatical structure of the text. This document introduces our method to transform unconstrained clinical text found in electronic health information systems to a formal (computationally understandable) representation that is suitable for querying, integration, contextualization and reuse, and is resilient to the grammatical and syntactic irregularities of the clinical text. We present our design rationale, method, and results of evaluation in processing chief complaints and triage notes from 8 different emergency departments in Houston Texas. At the end, we will discuss significance of our contribution in enabling use of clinical text in a practical bio-surveillance setting.
Resumo:
In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.
Resumo:
Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.
Resumo:
Most of the present digital images processing methods are related with objective characterization of external properties as shape, form or colour. This information concerns objective characteristics of different bodies and is applied to extract details to perform several different tasks. But in some occasions, some other type of information is needed. This is the case when the image processing system is going to be applied to some operation related with living bodies. In this case, some other type of object information may be useful. As a matter of fact, it may give additional knowledge about its subjective properties. Some of these properties are object symmetry, parallelism between lines and the feeling of size. These types of properties concerns more to internal sensations of living beings when they are related with their environment than to the objective information obtained by artificial systems. This paper presents an elemental system able to detect some of the above-mentioned parameters. A first mathematical model to analyze these situations is reported. This theoretical model will give the possibility to implement a simple working system. The basis of this system is the use of optical logic cells, previously employed in optical computing.
Resumo:
This paper presents a method for identifying concepts in microposts and classifying them into a predefined set of categories. The method relies on the DBpedia knowledge base to identify the types of the concepts detected in the messages. For those concepts that are not classified in the ontology we infer their types via the ontology properties which characterise the type.