988 results for knowledge extraction


Relevance:

30.00%

Publisher:

Abstract:

Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel RANSAC-based method called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud, using knowledge extracted from a scene (or an object) in order to reconstruct it accurately. The method comprises two steps: first, it clusters the data into planar faces that preserve constraints defined by knowledge about the object (e.g., the angles between faces); second, the plane models are estimated from these data using a novel multi-constraint RANSAC. Experiments on the clustering and RANSAC stages showed that the proposed method outperformed state-of-the-art methods.
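The abstract gives no implementation details, but the single-plane RANSAC step that the multi-constraint variant builds on can be sketched as follows; the function name, thresholds and the synthetic point cloud are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_plane_ransac(points, n_iters=500, inlier_tol=0.02, seed=None):
    """Fit one plane (normal n, offset d with n.p + d = 0) to a noisy point
    cloud with plain RANSAC; a sketch, not the paper's multi-constraint variant."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample three points and derive the candidate plane normal.
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(normal) < 1e-9:      # degenerate (collinear) sample
            continue
        normal /= np.linalg.norm(normal)
        d = -normal @ p0
        # Score the candidate by counting points within the tolerance band.
        inliers = np.abs(points @ normal + d) < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (normal, d), inliers
    return best_model, best_inliers

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(300, 3))
pts[:, 2] = 0.1 * pts[:, 0] + 0.01 * rng.normal(size=300)   # noisy tilted plane
model, inliers = fit_plane_ransac(pts, seed=0)
print(model, inliers.sum())
```

The multi-constraint variant described in the abstract would additionally enforce relations between the recovered planes (e.g., prescribed angles between faces) while estimating them jointly.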

Relevance:

30.00%

Publisher:

Abstract:

Solution-processed polymer films are used in multiple technological applications. The presence of residual solvent in the film, a consequence of the preparation method, affects the material properties, so films are typically subjected to post-deposition thermal annealing treatments aimed at its elimination. Monitoring the amount of solvent eliminated as a function of the annealing parameters is important in order to design a proper treatment that ensures complete solvent elimination, which is crucial to obtain reproducible and stable material properties and, therefore, device performance. Here we demonstrate, for the first time to our knowledge, the use of an organic distributed feedback (DFB) laser to monitor with high precision the amount of solvent extracted from a spin-coated polymer film as a function of the thermal annealing time. The polymer film of interest, polystyrene in the present work, is doped with a small amount of a laser dye so as to constitute the active layer of the laser device and is deposited over a reusable DFB resonator. It is shown that solvent elimination translates into shifts of the DFB laser wavelength as a consequence of changes in film thickness and refractive index. The proposed method is expected to be applicable to other types of annealing treatments, polymer-solvent combinations or film deposition methods, thus constituting a valuable tool to accurately control the quality and reproducibility of solution-processed polymer thin films.
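The wavelength shift exploited here follows from the standard DFB Bragg condition, m·λ = 2·n_eff·Λ; the short sketch below only illustrates that relation with made-up numbers (grating period, effective indices and diffraction order are assumptions, not the device parameters of the paper).

```python
def dfb_wavelength_nm(n_eff, grating_period_nm, order=2):
    """Bragg condition of a DFB laser: order * wavelength = 2 * n_eff * period."""
    return 2.0 * n_eff * grating_period_nm / order

period_nm = 400.0                                  # hypothetical grating period
wet = dfb_wavelength_nm(1.60, period_nm)           # film still containing solvent
dry = dfb_wavelength_nm(1.59, period_nm)           # drier film, lower effective index
print(f"laser line shifts by {wet - dry:.1f} nm")  # the shift tracks solvent loss
```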

Relevance:

30.00%

Publisher:

Abstract:

Automatic ontology building is a vital issue in many fields where ontologies are currently built manually. This paper presents a user-centred methodology for ontology construction based on the use of Machine Learning and Natural Language Processing. In our approach, the user selects a corpus of texts and sketches a preliminary ontology (or selects an existing one) for a domain, with a preliminary vocabulary associated with the elements in the ontology (lexicalisations). Examples of sentences involving such lexicalisations (e.g., of the ISA relation) are automatically retrieved from the corpus by the system. Retrieved examples are validated by the user and used by an adaptive Information Extraction system to generate patterns that discover other lexicalisations of the same objects in the ontology, possibly identifying new concepts or relations. New instances are added to the existing ontology or used to tune it. This process is repeated until a satisfactory ontology is obtained. The methodology largely automates the ontology construction process, and the output is an ontology with an associated trained learner to be used for further ontology modifications.
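As a toy illustration of the kind of pattern-based retrieval the methodology relies on (the adaptive IE system in the paper learns such patterns rather than hard-coding them), the sketch below matches one hand-written ISA lexicalisation against a two-sentence corpus; the pattern and corpus are invented.

```python
import re

# One hand-written ISA lexicalisation: "X such as Y" proposes (Y ISA X).
ISA_PATTERN = re.compile(r"(\w+)\s+such as\s+(\w+)")

corpus = [
    "The museum exhibits instruments such as violins.",
    "Polymers such as polystyrene are solution processed.",
]

for sentence in corpus:
    for hypernym, hyponym in ISA_PATTERN.findall(sentence):
        # Each match is a candidate (instance, concept) pair to be validated
        # by the user before it is added to the ontology.
        print(f"candidate ISA: {hyponym} -> {hypernym}")
```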

Relevance:

30.00%

Publisher:

Abstract:

With this paper, we propose a set of techniques to largely automate the process of KA by using technologies based on Information Extraction (IE), Information Retrieval and Natural Language Processing. We aim to reduce all the impeding factors mentioned above and thereby contribute to the wider utility of knowledge management tools. In particular, we intend to reduce the reliance on introspection by knowledge engineers and on extended elicitation of knowledge from experts through extensive textual analysis using a variety of methods and tools, as texts are widely available and in them, we believe, lies most of an organization's memory.

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods: it can be trained directly from unannotated data in a constrained way while being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from the PPI ontology through inference can be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
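A minimal sketch of the general idea of mixing local lexical evidence with a non-local, ontology-derived feature in a conditionally trained (log-linear) scorer; the feature names, weights and the tiny "ontology" below are illustrative assumptions, not the HVS-based model of the paper.

```python
import math

# Hypothetical ontology fragment: protein-class pairs known to interact.
ONTOLOGY_INTERACTING_CLASSES = {("kinase", "substrate"), ("receptor", "ligand")}

def score_candidate(features, weights):
    """Unnormalised log-linear score: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def extract_features(sentence, prot_a, prot_b, classes):
    return {
        # Local, lexical evidence from the sentence itself.
        "contains_interaction_verb": float(any(v in sentence for v in ("binds", "phosphorylates"))),
        # Non-local feature inferred from the ontology.
        "ontology_compatible": float((classes[prot_a], classes[prot_b]) in ONTOLOGY_INTERACTING_CLASSES),
    }

weights = {"contains_interaction_verb": 1.2, "ontology_compatible": 0.8}
classes = {"RAF1": "kinase", "MEK1": "substrate"}
sentence = "RAF1 phosphorylates MEK1 in vitro."
feats = extract_features(sentence, "RAF1", "MEK1", classes)
prob = 1.0 / (1.0 + math.exp(-score_candidate(feats, weights)))
print(f"P(interaction) approx. {prob:.2f}")
```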

Relevance:

30.00%

Publisher:

Abstract:

To date, more than 16 million citations of published articles in the biomedical domain are available in the MEDLINE database. These articles describe the new discoveries accompanying the tremendous development of biomedicine over the last decade. It is crucial for biomedical researchers to retrieve and mine specific knowledge from this huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in the biomedical domain, such as protein name recognition and discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured text are summarized. Current work in biomedical information extraction is categorized. Challenges in the field are also presented and possible solutions are discussed.

Relevance:

30.00%

Publisher:

Abstract:

We report statistical time-series analysis tools providing improvements in the rapid, precise extraction of discrete state dynamics from time traces of experimental observations of molecular machines. By building physical knowledge and statistical innovations into the analysis tools, we provide techniques for estimating discrete state transitions buried in highly correlated molecular noise. We demonstrate the effectiveness of our approach on simulated and real examples of steplike rotation of the bacterial flagellar motor and the F1-ATPase enzyme. We show that our method can clearly identify molecular steps, periodicities and cascaded processes that are too weak for existing algorithms to detect, and can do so much faster than existing algorithms. Our techniques represent a step toward automated analysis of high-sample-rate molecular-machine dynamics. Modular, open-source software that implements these techniques is provided.
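The abstract does not specify the estimator, so the sketch below only illustrates the simpler baseline problem of locating steps in a noisy piecewise-constant trace with a sliding-window mean comparison; the window size, threshold and simulated trace are assumptions for illustration, not the paper's method.

```python
import numpy as np

def detect_steps(trace, window=50, threshold=5.0):
    """Flag samples where the mean of the next window differs from the mean of
    the previous window by more than `threshold` pooled standard errors."""
    steps = []
    for i in range(window, len(trace) - window):
        left, right = trace[i - window:i], trace[i:i + window]
        se = np.sqrt(left.var(ddof=1) / window + right.var(ddof=1) / window)
        if se > 0 and abs(right.mean() - left.mean()) / se > threshold:
            steps.append(i)
    return steps

rng = np.random.default_rng(1)
# Simulated steplike rotation: three dwell levels plus Gaussian noise.
trace = np.concatenate([np.full(300, 0.0), np.full(300, 1.0), np.full(300, 2.0)])
trace += 0.2 * rng.normal(size=trace.size)
print(detect_steps(trace)[:5], "...")   # indices cluster near the true steps at 300 and 600
```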

Relevance:

30.00%

Publisher:

Abstract:

We present a new method for term extraction from a domain-relevant corpus using natural language processing for the purpose of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than to the document length in words, yields significantly different results.
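The contrast between the two normalizations can be shown with a toy example; here a "burst" is operationalised, purely for illustration, as the token span between a term's first and last occurrence, which is not necessarily the definition used in the paper.

```python
def burst_relative_frequency(tokens, term):
    """Frequency of `term` relative to the size of its burst (illustrative definition)."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if not positions:
        return 0.0
    burst_size = positions[-1] - positions[0] + 1   # tokens spanned by the burst
    return len(positions) / burst_size

def document_relative_frequency(tokens, term):
    """Frequency of `term` relative to the whole document length in words."""
    return tokens.count(term) / len(tokens)

tokens = ("the laser gain laser medium laser emits light " + "the " * 40).split()
print(document_relative_frequency(tokens, "laser"))  # diluted by the document length
print(burst_relative_frequency(tokens, "laser"))     # concentrated within its burst
```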

Relevance:

30.00%

Publisher:

Abstract:

The paper presents an approach to the extraction of facts from document texts. The approach is based on using knowledge about the subject domain, a specialized dictionary, and schemes of facts that describe fact structures while taking into consideration both the semantic and the syntactic compatibility of the elements of facts. The extracted facts combine into a single structure the dictionary lexical objects found in the text and match them against concepts of the subject-domain ontology.
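A toy sketch of the scheme-of-facts idea: lexical objects found via a specialised dictionary fill the slots of a fact scheme only when they satisfy its semantic-type constraints. The dictionary, scheme and sentence below are invented for illustration; the paper's schemes also encode syntactic compatibility, which is omitted here.

```python
DICTIONARY = {
    "Ivanov": "Person",
    "Gazprom": "Organization",
    "director": "Position",
}

APPOINTMENT_SCHEME = {
    "name": "Appointment",
    "slots": {"who": "Person", "position": "Position", "where": "Organization"},
}

def extract_fact(tokens, scheme):
    """Fill each slot of the scheme with the first lexical object of the
    required semantic type, or return None if any slot stays empty."""
    fact = {}
    for slot, required_type in scheme["slots"].items():
        match = next((t for t in tokens if DICTIONARY.get(t) == required_type), None)
        if match is None:
            return None
        fact[slot] = match
    return {scheme["name"]: fact}

text = "Ivanov was appointed director of Gazprom".split()
print(extract_fact(text, APPOINTMENT_SCHEME))
```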

Relevance:

30.00%

Publisher:

Abstract:

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems arising from “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed, and three different kinds of applications with respect to classification tasks are considered. A summary of the results obtained concerning the accuracy of the classification schemes is presented, with conclusions about the search for the most appropriate feature extraction method. The problem of how to discover the knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.
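For reference, the best-known eigenvector-based feature extraction approach, plain principal component analysis, can be sketched in a few lines; this is only one of the conventional candidates such a comparison would include, and the data below are random placeholders.

```python
import numpy as np

def pca_features(X, n_components=2):
    """Project the data onto the leading eigenvectors of its covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return X_centered @ top

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))                 # 100 samples, 10 original features
X_reduced = pca_features(X, n_components=3)    # reduced representation for the classifier
print(X_reduced.shape)                         # (100, 3)
```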

Relevance:

30.00%

Publisher:

Abstract:

Within the framework of heritage preservation, 3D scanning and modeling for heritage documentation has increased significantly in recent years, mainly due to the evolution of laser and image-based techniques, modeling software, powerful computers and virtual reality. 3D laser acquisition constitutes a real development opportunity for 3D modeling, which was previously based on theoretical data. The representation of the object information relies on knowledge of its historical and theoretical context in order to reconstitute its previous states a posteriori. This project proposes an approach to data extraction based on architectural knowledge and on laser surveys providing measurements, the whole leading to 3D reconstruction. The Khmer objects used in the experiments are exhibited at the Guimet Museum in Paris. This digital modeling meets the need for exploitable models for simulation projects, prototyping, exhibitions and the promotion of cultural tourism, and particularly for archiving against any likely disaster and as a supporting tool for the formulation of a virtual museum concept.

Relevance:

30.00%

Publisher:

Abstract:

With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. In this dissertation, focusing on three of the essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed, with video and image databases serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap is to intelligently and automatically model mid-level representations and/or semi-semantic descriptors in addition to extracting low-level media features. The data organization challenge is mainly addressed through media indexing, where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support user interaction and to effectively model users' perception from feedback at both the image and object levels.
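The dissertation's own perception model is not detailed in the abstract; as a stand-in, the classic Rocchio relevance feedback update shown below illustrates how user feedback can reshape a query in feature space (the weights and vectors are conventional textbook placeholders, not values from this work).

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward features the user marked relevant and away
    from those marked non-relevant (classic Rocchio refinement)."""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

query = np.array([0.2, 0.5, 0.1])                       # low-level feature vector of the query
relevant = np.array([[0.3, 0.6, 0.0], [0.4, 0.7, 0.1]]) # images the user accepted
nonrelevant = np.array([[0.9, 0.1, 0.8]])               # image the user rejected
print(rocchio_update(query, relevant, nonrelevant))
```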

Relevance:

30.00%

Publisher:

Abstract:

β-methylamino-L-alanine (BMAA) is a neurotoxin linked to neurodegeneration, which is manifested in the devastating human diseases amyotrophic lateral sclerosis, Alzheimer's disease and Parkinson's disease. This neurotoxin is known to be produced by almost all tested species within the cyanobacterial phylum, including free-living as well as symbiotic strains. The global distribution of BMAA producers ranges from a terrestrial ecosystem on the island of Guam in the Pacific Ocean to an aquatic ecosystem in Northern Europe, the Baltic Sea, where massive surface blooms occur annually. BMAA has been shown to accumulate in the Baltic Sea food web, with the highest levels in bottom-dwelling fish species as well as in mollusks.

One of the aims of this thesis was to test the bottom-dwelling bioaccumulation hypothesis by using a larger number of samples allowing a statistical evaluation. Hence, a large set of fish individuals from Lake Finjasjön were caught, and the BMAA concentrations in different tissues were related to the season of catching, fish gender, total weight and species. The results reveal that fish total weight and fish species were positively correlated with the BMAA concentration in the fish brain. Significantly higher concentrations of BMAA in the brain were therefore detected in plankti-benthivorous fish species and in heavier (potentially older) individuals.

Another goal was to investigate the potential production of BMAA by other phytoplankton organisms. Diatom cultures were therefore investigated and confirmed to produce BMAA, even in higher concentrations than cyanobacteria. All diatom cultures studied during this thesis work were shown to contain BMAA, as was one dinoflagellate species. This might imply that the environmental spread of BMAA in aquatic ecosystems is even higher than previously thought.

Earlier reports on the concentration of BMAA in different organisms have shown highly variable results, and the methods used for quantification have been intensively discussed in the scientific community. In the most recent studies, liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the instrument of choice, due to its high sensitivity and selectivity. Even so, different studies show quite variable concentrations of BMAA. In this thesis, three of the most common BMAA extraction protocols were evaluated in order to find out whether the extraction could be one of the sources of variability. It was found that the method involving precipitation of proteins using trichloroacetic acid gave the best performance, complying with all in-house validation criteria. However, extractions of diatom and cyanobacteria cultures with this validated method, quantified using LC-MS/MS, still resulted in variable BMAA concentrations, which suggests that biological factors also contribute to the discrepancies.

The current knowledge on the environmental factors that can induce or reduce BMAA production is still limited. In cyanobacteria, production of BMAA was earlier shown to be negatively correlated with nitrogen availability, both in laboratory cultures and in natural populations. Based on this observation, it was suggested that in unicellular non-diazotrophic cyanobacteria, BMAA might take part in nitrogen metabolism. In order to find out whether BMAA has a similar role in diatoms, BMAA was added to two diatom species in culture, in concentrations corresponding to those earlier found in the diatoms. The results suggest that BMAA might induce a nitrogen starvation signal in diatoms, as was earlier observed in cyanobacteria. However, the diatoms recover shortly, with excreted ammonia present extracellularly. Thus, also in diatoms, BMAA might be involved in the nitrogen balance of the cell.

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system, together with the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied in a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
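A toy illustration of the kind of objective used in knowledge graph identification with hinge-loss potentials: soft truth values in [0, 1] should stay close to noisy extraction confidences while an ontological mutual-exclusion constraint (here, that an entity is not both a Person and a City) is penalised by a hinge loss. The real system solves a convex program over millions of variables; in this sketch a coarse grid search stands in for the solver, and the extraction confidences and weight are invented.

```python
import itertools
import numpy as np

# Noisy extraction confidences for two mutually exclusive labels of one entity.
extractions = {"Person(Springfield)": 0.55, "City(Springfield)": 0.80}
constraint_weight = 2.0

def objective(person, city):
    # Stay close to the extraction confidences...
    fit = (person - extractions["Person(Springfield)"]) ** 2 \
        + (city - extractions["City(Springfield)"]) ** 2
    # ...while a (squared) hinge loss penalises violating mutual exclusion.
    mutual_exclusion = max(0.0, person + city - 1.0) ** 2
    return fit + constraint_weight * mutual_exclusion

grid = np.linspace(0.0, 1.0, 101)
best = min(itertools.product(grid, grid), key=lambda pc: objective(*pc))
print("inferred soft truth values (Person, City):", best)
```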