968 results for knowledge extraction
Abstract:
We report statistical time-series analysis tools providing improvements in the rapid, precise extraction of discrete state dynamics from time traces of experimental observations of molecular machines. By building physical knowledge and statistical innovations into the analysis tools, we provide techniques for estimating discrete state transitions buried in highly correlated molecular noise. We demonstrate the effectiveness of our approach on simulated and real examples of steplike rotation of the bacterial flagellar motor and the F1-ATPase enzyme. We show that our method can clearly identify molecular steps, periodicities and cascaded processes that are too weak for existing algorithms to detect, and can do so much faster than existing algorithms. Our techniques represent a step toward automated analysis of high-sample-rate molecular-machine dynamics. Modular, open-source software that implements these techniques is provided.
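As a rough illustration of the kind of step detection described above (a minimal sketch, not the authors' algorithm; the window size, threshold and synthetic data are assumptions), the following Python fragment flags candidate steps in a noisy one-dimensional trace by comparing the means of adjacent sliding windows:

```python
import numpy as np

def detect_steps(trace, window=50, threshold=4.0):
    """Flag indices where the mean shifts sharply between adjacent windows."""
    n = len(trace)
    scores = np.zeros(n)
    for i in range(window, n - window):
        left = trace[i - window:i]
        right = trace[i:i + window]
        noise = np.sqrt((left.var() + right.var()) / window + 1e-12)
        scores[i] = abs(right.mean() - left.mean()) / noise
    steps = []
    for idx in np.where(scores > threshold)[0]:
        if not steps or idx - steps[-1] > window:   # suppress nearby duplicates
            steps.append(int(idx))
    return steps

# Synthetic example: three discrete levels buried in noise.
rng = np.random.default_rng(0)
levels = np.concatenate([np.full(300, 0.0), np.full(300, 1.0), np.full(300, 2.5)])
trace = levels + rng.normal(0.0, 0.4, levels.size)
print(detect_steps(trace))   # expect candidates near indices 300 and 600
```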
Abstract:
We present a new method for term extraction from a domain-relevant corpus using natural language processing, for the purposes of semi-automatic ontology learning. The literature shows that topical words occur in bursts. We find that the ranking of extracted terms is insensitive to the choice of population model, but that calculating frequencies relative to the burst size, rather than to the document length in words, yields significantly different results.
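A hypothetical sketch of the burst-relative frequency idea mentioned above; defining the burst size as the span between a term's first and last occurrence is my assumption for illustration, not necessarily the paper's definition:

```python
def burst_relative_frequency(tokens, term):
    """Frequency of `term` over its burst span (first to last occurrence)."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if not positions:
        return 0.0
    burst_size = positions[-1] - positions[0] + 1   # assumed definition of burst size
    return len(positions) / burst_size

def document_relative_frequency(tokens, term):
    """Conventional relative frequency: occurrences divided by document length in words."""
    return tokens.count(term) / len(tokens)

tokens = ("the gearbox assembly holds the gearbox housing while the gearbox "
          "is aligned and the remaining text never mentions it again").split()
print(burst_relative_frequency(tokens, "gearbox"))    # 3 occurrences over a 9-token burst
print(document_relative_frequency(tokens, "gearbox")) # 3 occurrences over 20 tokens
```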
Abstract:
The paper presents an approach to extracting facts from document texts. The approach is based on knowledge about the subject domain, a specialized dictionary, and fact schemes that describe fact structures while taking into account both the semantic and syntactic compatibility of fact elements. Extracted facts combine into a single structure the dictionary lexical objects found in the text and match them against concepts of the subject-domain ontology.
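A hypothetical Python sketch of scheme-based fact assembly in the spirit of the abstract; the fact scheme, ontology concepts and greedy slot-filling strategy are illustrative assumptions, not the paper's actual structures:

```python
from dataclasses import dataclass

@dataclass
class LexicalObject:
    text: str
    concept: str        # ontology concept assigned from the specialized dictionary

# A fact scheme: named slots, each constrained to an ontology concept.
APPOINTMENT_SCHEME = {"subject": "Person", "action": "Appointment", "object": "Position"}

def match_scheme(objects, scheme):
    """Greedily fill scheme slots with concept-compatible lexical objects."""
    fact, remaining = {}, list(objects)
    for slot, concept in scheme.items():
        for obj in remaining:
            if obj.concept == concept:
                fact[slot] = obj.text
                remaining.remove(obj)
                break
    return fact if len(fact) == len(scheme) else None

objs = [LexicalObject("Ivanov", "Person"),
        LexicalObject("was appointed", "Appointment"),
        LexicalObject("director", "Position")]
print(match_scheme(objs, APPOINTMENT_SCHEME))
```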
Abstract:
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems arising from “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed, and three different kinds of applications with respect to classification tasks are considered. A summary of the results concerning the accuracy of the classification schemes is presented, along with conclusions about the search for the most appropriate feature extraction method. The problem of how to discover the knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in this integration is proposed; its goals, requirements and basic structure are defined, and the means of knowledge acquisition needed to build the proposed system are considered.
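Principal component analysis is one common eigenvector-based feature extraction method; the abstract does not name the three approaches it compares, so the following NumPy sketch of a reduce-then-classify pipeline is only a representative example, with data shapes and the number of components chosen arbitrarily:

```python
import numpy as np

def pca_features(X, n_components):
    """Project X onto the top eigenvectors of its covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                             # 100 samples, 20 original features
X_reduced = pca_features(X, n_components=3)                # input for a downstream classifier
print(X_reduced.shape)                                     # (100, 3)
```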
Abstract:
Within the framework of heritage preservation, 3D scanning and modeling for heritage documentation has increased significantly in recent years, mainly due to the evolution of laser and image-based techniques, modeling software, powerful computers and virtual reality. 3D laser acquisition constitutes a real development opportunity for 3D modeling, which was previously based on theoretical data. The representation of object information relies on knowledge of its historic and theoretical frame to reconstitute its previous states a posteriori. This project proposes an approach to data extraction based on architectural knowledge and laser survey measurements, together leading to 3D reconstruction. The Khmer objects studied are exhibited at the Guimet museum in Paris. This digital modeling meets the need for exploitable models for simulation projects, prototyping, exhibitions and the promotion of cultural tourism, and particularly for archiving against possible disaster and as a tool to aid the formulation of the virtual museum concept.
Abstract:
With the proliferation of multimedia data and ever-growing demand for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. This dissertation focuses on three essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, and proposes a systematic and integrated framework with video and image databases serving as the testbed. In particular, the framework addresses these challenges separately yet coherently through three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap is to intelligently and automatically model mid-level representations and/or semi-semantic descriptors in addition to extracting low-level media features. The data organization challenge is mainly addressed through media indexing, where various levels of indexing are required to support diverse query requirements. In particular, this study facilitates high-level video indexing by proposing a multimodal event mining framework combined with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support user interaction and to effectively model users' perception from feedback at both the image and object levels.
The neurotoxin β-N-methylamino-L-alanine (BMAA): Sources, bioaccumulation and extraction procedures
Abstract:
β-N-methylamino-L-alanine (BMAA) is a neurotoxin linked to neurodegeneration, manifested in the devastating human diseases amyotrophic lateral sclerosis, Alzheimer’s disease and Parkinson’s disease. This neurotoxin is known to be produced by almost all tested species within the cyanobacterial phylum, including free-living as well as symbiotic strains. The global distribution of BMAA producers ranges from a terrestrial ecosystem on the island of Guam in the Pacific Ocean to an aquatic ecosystem in Northern Europe, the Baltic Sea, where massive surface blooms occur annually. BMAA has been shown to accumulate in the Baltic Sea food web, with the highest levels in bottom-dwelling fish species as well as in mollusks. One of the aims of this thesis was to test the bottom-dwelling bioaccumulation hypothesis using a larger number of samples that allows statistical evaluation. Hence, a large set of fish from Lake Finjasjön was caught, and the BMAA concentrations in different tissues were related to the season of catch, fish gender, total weight and species. The results reveal that fish total weight and fish species were positively correlated with BMAA concentration in the fish brain: significantly higher concentrations of BMAA in the brain were detected in plankti-benthivorous fish species and in heavier (potentially older) individuals. Another goal was to investigate the potential production of BMAA by other phytoplankton organisms. Diatom cultures were therefore investigated and confirmed to produce BMAA, at even higher concentrations than cyanobacteria. All diatom cultures studied during this thesis work were shown to contain BMAA, as was one dinoflagellate species. This may imply that the environmental spread of BMAA in aquatic ecosystems is even greater than previously thought. Earlier reports on the concentration of BMAA in different organisms have shown highly variable results, and the methods used for quantification have been intensively discussed in the scientific community. In the most recent studies, liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the instrument of choice, due to its high sensitivity and selectivity. Even so, different studies report quite variable concentrations of BMAA. In this thesis, three of the most common BMAA extraction protocols were evaluated in order to determine whether extraction could be one of the sources of variability. The method involving precipitation of proteins using trichloroacetic acid gave the best performance, complying with all in-house validation criteria. However, extractions of diatom and cyanobacteria cultures with this validated method, quantified using LC-MS/MS, still resulted in variable BMAA concentrations, which suggests that biological factors also contribute to the discrepancies. Current knowledge of the environmental factors that can induce or reduce BMAA production is still limited. In cyanobacteria, BMAA production was earlier shown to be negatively correlated with nitrogen availability, both in laboratory cultures and in natural populations. Based on this observation, it was suggested that in unicellular non-diazotrophic cyanobacteria, BMAA might take part in nitrogen metabolism. To find out whether BMAA has a similar role in diatoms, BMAA was added to two diatom species in culture, in concentrations corresponding to those earlier found in the diatoms. The results suggest that BMAA might induce a nitrogen starvation signal in diatoms, as was earlier observed in cyanobacteria. However, the diatoms recover shortly, owing to the extracellular presence of excreted ammonia. Thus, in diatoms as well, BMAA might be involved in the nitrogen balance of the cell.
Abstract:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied to a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
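To give a flavor of the convex, hinge-loss-based inference the abstract describes (a toy sketch, not the dissertation's hinge-loss Markov random field implementation; the facts, confidences, weights and rules are invented for illustration), consider two candidate facts supported by noisy extractions but declared mutually exclusive by an ontological constraint:

```python
import numpy as np
from scipy.optimize import minimize

# Extractor confidences for two candidate facts A and B (invented values).
confidence = np.array([0.9, 0.6])
w_support, w_exclusive = 1.0, 2.0           # rule weights, chosen for illustration

def objective(x):
    # Squared hinge loss: penalize believing a fact less than its extraction confidence.
    support = np.sum(np.maximum(0.0, confidence - x) ** 2)
    # Squared hinge loss for the ontological constraint that A and B are mutually
    # exclusive, i.e. penalize truth(A) + truth(B) exceeding 1.
    exclusion = max(0.0, x[0] + x[1] - 1.0) ** 2
    return w_support * support + w_exclusive * exclusion

# MAP inference is a convex problem over soft truth values in [0, 1].
result = minimize(objective, x0=np.array([0.5, 0.5]),
                  bounds=[(0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
print(result.x)   # roughly [0.7, 0.4] for these toy weights
```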