987 resultados para Concept Extraction


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microposts are small fragments of social media content that have been published using a lightweight paradigm (e.g. Tweets, Facebook likes, foursquare check-ins). Microposts have been used for a variety of applications (e.g., sentiment analysis, opinion mining, trend analysis), by gleaning useful information, often using third-party concept extraction tools. There has been very large uptake of such tools in the last few years, along with the creation and adoption of new methods for concept extraction. However, the evaluation of such efforts has been largely consigned to document corpora (e.g. news articles), questioning the suitability of concept extraction tools and methods for Micropost data. This report describes the Making Sense of Microposts Workshop (#MSM2013) Concept Extraction Challenge, hosted in conjunction with the 2013 World Wide Web conference (WWW'13). The Challenge dataset comprised a manually annotated training corpus of Microposts and an unlabelled test corpus. Participants were set the task of engineering a concept extraction system for a defined set of concepts. Out of a total of 22 complete submissions 13 were accepted for presentation at the workshop; the submissions covered methods ranging from sequence mining algorithms for attribute extraction to part-of-speech tagging for Micropost cleaning and rule-based and discriminative models for token classification. In this report we describe the evaluation process and explain the performance of different approaches in different contexts.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

S’insérant dans les domaines de la Lecture et de l’Analyse de Textes Assistées par Ordinateur (LATAO), de la Gestion Électronique des Documents (GÉD), de la visualisation de l’information et, en partie, de l’anthropologie, cette recherche exploratoire propose l’expérimentation d’une méthodologie descriptive en fouille de textes afin de cartographier thématiquement un corpus de textes anthropologiques. Plus précisément, nous souhaitons éprouver la méthode de classification hiérarchique ascendante (CHA) pour extraire et analyser les thèmes issus de résumés de mémoires et de thèses octroyés de 1985 à 2009 (1240 résumés), par les départements d’anthropologie de l’Université de Montréal et de l’Université Laval, ainsi que le département d’histoire de l’Université Laval (pour les résumés archéologiques et ethnologiques). En première partie de mémoire, nous présentons notre cadre théorique, c'est-à-dire que nous expliquons ce qu’est la fouille de textes, ses origines, ses applications, les étapes méthodologiques puis, nous complétons avec une revue des principales publications. La deuxième partie est consacrée au cadre méthodologique et ainsi, nous abordons les différentes étapes par lesquelles ce projet fut conduit; la collecte des données, le filtrage linguistique, la classification automatique, pour en nommer que quelques-unes. Finalement, en dernière partie, nous présentons les résultats de notre recherche, en nous attardant plus particulièrement sur deux expérimentations. Nous abordons également la navigation thématique et les approches conceptuelles en thématisation, par exemple, en anthropologie, la dichotomie culture ̸ biologie. Nous terminons avec les limites de ce projet et les pistes d’intérêts pour de futures recherches.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper presents a method for identifying concepts in microposts and classifying them into a predefined set of categories. The method relies on the DBpedia knowledge base to identify the types of the concepts detected in the messages. For those concepts that are not classified in the ontology we infer their types via the ontology properties which characterise the type.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In search to increase the offer of liquid, clean, renewable and sustainable energy in the world energy matrix, the use of lignocellulosic materials (LCMs) for bioethanol production arises as a valuable alternative. The objective of this work was to analyze and compare the performance of Saccharomyces cerevisiae, Pichia stipitis and Zymomonas mobilis in the production of bioethanol from coconut fibre mature (CFM) using different strategies: simultaneous saccharification and fermentation (SSF) and semi-simultaneous saccharification and fermentation (SSSF). The CFM was pretreated by hydrothermal pretreatment catalyzed with sodium hydroxide (HPCSH). The pretreated CFM was characterized by X-ray diffractometry and SEM, and the lignin recovered in the liquid phase by FTIR and TGA. After the HPCSH pretreatment (2.5% (v/v) sodium hydroxide at 180 °C for 30 min), the cellulose content was 56.44%, while the hemicellulose and lignin were reduced 69.04% and 89.13%, respectively. Following pretreatment, the obtained cellulosic fraction was submitted to SSF and SSSF. Pichia stipitis allowed for the highest ethanol yield 90.18% in SSSF, 91.17% and 91.03% were obtained with Saccharomyces cerevisiae and Zymomonas mobilis, respectively. It may be concluded that the selection of the most efficient microorganism for the obtention of high bioethanol production yields from cellulose pretreated by HPCSH depends on the operational strategy used and this pretreatment is an interesting alternative for add value of coconut fibre mature compounds (lignin, phenolics) being in accordance with the biorefinery concept.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Extracts from malagueta pepper (Capsicum frutescens L.) were obtained using supercritical fluid extraction (SFE) assisted by ultrasound, with carbon dioxide as solvent at 15MPa and 40°C. The SFE global yield increased up to 77% when ultrasound waves were applied, and the best condition of ultrasound-assisted extraction was ultrasound power of 360W applied during 60min. Four capsaicinoids were identified in the extracts and quantified by high performance liquid chromatography. The use of ultrasonic waves did not influence significantly the capsaicinoid profiles and the phenolic content of the extracts. However, ultrasound has enhanced the SFE rate. A model based on the broken and intact cell concept was adequate to represent the extraction kinetics and estimate the mass transfer coefficients, which were increased with ultrasound. Images obtained by field emission scanning electron microscopy showed that the action of ultrasonic waves did not cause cracks on the cell wall surface. On the other hand, ultrasound promoted disturbances in the vegetable matrix, leading to the release of extractable material on the solid surface. The effects of ultrasound were more significant on SFE from larger solid particles.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents the development of a prototype of a tubular linear induction motor applied to onshore oil exploitation, named MAT AE OS (which is the Portuguese acronym for Tubular Asynchronous Motor for Onshore Oil Exploitation). The function of this motor is to directly drive the sucker-rod pump installed in the down hole of the oil well. Considering the drawbacks and operational costs of the conventional oil extraction method, which is based on the walking beam and rod, string system, the developed prototype is intended to become a feasible alternative from both technical and economic points of view. At the present time, the MAT AE OS prototype is installed in a test bench at the Applied Electromagnetism Laboratory at the Escola Politecnica da Universidade de Sao Paulo. The complete testing system is controlled and supervised by special software, enabling good flexibility in operation, data acquisition, and performance analysis. The test results indicate that the motor develops a constant lift force along the pumping cycle, as shown by the measured dynamometric charts. Also, the evaluated electromechanical performance seems to be superior to that obtained by the traditional method. The system utilizing the MAT AE OS prototype allows the complete elimination of the rod string sets required by the conventional equipment, indicating that the new system may advantageously replace the surface mechanical components presently utilized.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Based in internet growth, through semantic web, together with communication speed improvement and fast development of storage device sizes, data and information volume rises considerably every day. Because of this, in the last few years there has been a growing interest in structures for formal representation with suitable characteristics, such as the possibility to organize data and information, as well as the reuse of its contents aimed for the generation of new knowledge. Controlled Vocabulary, specifically Ontologies, present themselves in the lead as one of such structures of representation with high potential. Not only allow for data representation, as well as the reuse of such data for knowledge extraction, coupled with its subsequent storage through not so complex formalisms. However, for the purpose of assuring that ontology knowledge is always up to date, they need maintenance. Ontology Learning is an area which studies the details of update and maintenance of ontologies. It is worth noting that relevant literature already presents first results on automatic maintenance of ontologies, but still in a very early stage. Human-based processes are still the current way to update and maintain an ontology, which turns this into a cumbersome task. The generation of new knowledge aimed for ontology growth can be done based in Data Mining techniques, which is an area that studies techniques for data processing, pattern discovery and knowledge extraction in IT systems. This work aims at proposing a novel semi-automatic method for knowledge extraction from unstructured data sources, using Data Mining techniques, namely through pattern discovery, focused in improving the precision of concept and its semantic relations present in an ontology. In order to verify the applicability of the proposed method, a proof of concept was developed, presenting its results, which were applied in building and construction sector.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Marine ecosystem can be considered a rather exploited source of natural substances with enormous bioactive potential. In Mexico macro-algae study remain forgotten for research and economic purposes besides the high amount of this resource along the west and east coast. For that reason the Bioferinery Group of the Autonomous University of Coahuila, have been studying the biorefinery concept in order to recover high value byproducts of Mexican brown macro-algae including polysaccharides and enzymes to be applied in food, pharmaceutical and energy industry. Brown macroalgae are an important source of fucoidan, alginate and laminarin which comprise a complex group of macromolecules with a wide range of important biological properties such as anticoagulant, antioxidant, antitumoral and antiviral and also as rich source of fermentable sugars for enzymes production. Additionally, specific enzymes able to degrade algae matrix (fucosidases, sulfatases, aliginases, etc) are important tools to establish structural characteristics and biological functions of these polysaccharides. The aims of the present work were the integral study of bioprocess for macroalgae biomass exploitation by the use of green technologies as hydrothermal extraction and solid state fermentation in order to produce polysaccharides and enzymes (fucoidan and fucoidan hydrolytic enzymes). This work comprises the use of the different bioprocess phases in order to produce high value products with lower time and wastes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The growing population on earth along with diminishing fossil deposits and the climate change debate calls out for a better utilization of renewable, bio-based materials. In a biorefinery perspective, the renewable biomass is converted into many different products such as fuels, chemicals, and materials, quite similar to the petroleum refinery industry. Since forests cover about one third of the land surface on earth, ligno-cellulosic biomass is the most abundant renewable resource available. The natural first step in a biorefinery is separation and isolation of the different compounds the biomass is comprised of. The major components in wood are cellulose, hemicellulose, and lignin, all of which can be made into various end-products. Today, focus normally lies on utilizing only one component, e.g., the cellulose in the Kraft pulping process. It would be highly desirable to utilize all the different compounds, both from an economical and environmental point of view. The separation process should therefore be optimized. Hemicelluloses can partly be extracted with hot-water prior to pulping. Depending in the severity of the extraction, the hemicelluloses are degraded to various degrees. In order to be able to choose from a variety of different end-products, the hemicelluloses should be as intact as possible after the extraction. The main focus of this work has been on preserving the hemicellulose molar mass throughout the extraction at a high yield by actively controlling the extraction pH at the high temperatures used. Since it has not been possible to measure pH during an extraction due to the high temperatures, the extraction pH has remained a “black box”. Therefore, a high-temperature in-line pH measuring system was developed, validated, and tested for hot-water wood extractions. One crucial step in the measurements is calibration, therefore extensive efforts was put on developing a reliable calibration procedure. Initial extractions with wood showed that the actual extraction pH was ~0.35 pH units higher than previously believed. The measuring system was also equipped with a controller connected to a pump. With this addition it was possible to control the extraction to any desired pH set point. When the pH dropped below the set point, the controller started pumping in alkali and by that the desired set point was maintained very accurately. Analyses of the extracted hemicelluloses showed that less hemicelluloses were extracted at higher pH but with a higher molar-mass. Monomer formation could, at a certain pH level, be completely inhibited. Increasing the temperature, but maintaining a specific pH set point, would speed up the extraction without degrading the molar-mass of the hemicelluloses and thereby intensifying the extraction. The diffusion of the dissolved hemicelluloses from the wood particle is a major part of the extraction process. Therefore, a particle size study ranging from 0.5 mm wood particles to industrial size wood chips was conducted to investigate the internal mass transfer of the hemicelluloses. Unsurprisingly, it showed that hemicelluloses were extracted faster from smaller wood particles than larger although it did not seem to have a substantial effect on the average molar mass of the extracted hemicelluloses. However, smaller particle sizes require more energy to manufacture and thus increases the economic cost. Since bark comprises 10 – 15 % of a tree, it is important to also consider it in a biorefinery concept. Spruce inner and outer bark was hot-water extracted separately to investigate the possibility to isolate the bark hemicelluloses. It was showed that the bark hemicelluloses comprised mostly of pectic material and differed considerably from the wood hemicelluloses. The bark hemicelluloses, or pectins, could be extracted at lower temperatures than the wood hemicelluloses. A chemical characterization, done separately on inner and outer bark, showed that inner bark contained over 10 % stilbene glucosides that could be extracted already at 100 °C with aqueous acetone.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

L’apprentissage supervisé de réseaux hiérarchiques à grande échelle connaît présentement un succès fulgurant. Malgré cette effervescence, l’apprentissage non-supervisé représente toujours, selon plusieurs chercheurs, un élément clé de l’Intelligence Artificielle, où les agents doivent apprendre à partir d’un nombre potentiellement limité de données. Cette thèse s’inscrit dans cette pensée et aborde divers sujets de recherche liés au problème d’estimation de densité par l’entremise des machines de Boltzmann (BM), modèles graphiques probabilistes au coeur de l’apprentissage profond. Nos contributions touchent les domaines de l’échantillonnage, l’estimation de fonctions de partition, l’optimisation ainsi que l’apprentissage de représentations invariantes. Cette thèse débute par l’exposition d’un nouvel algorithme d'échantillonnage adaptatif, qui ajuste (de fa ̧con automatique) la température des chaînes de Markov sous simulation, afin de maintenir une vitesse de convergence élevée tout au long de l’apprentissage. Lorsqu’utilisé dans le contexte de l’apprentissage par maximum de vraisemblance stochastique (SML), notre algorithme engendre une robustesse accrue face à la sélection du taux d’apprentissage, ainsi qu’une meilleure vitesse de convergence. Nos résultats sont présent ́es dans le domaine des BMs, mais la méthode est générale et applicable à l’apprentissage de tout modèle probabiliste exploitant l’échantillonnage par chaînes de Markov. Tandis que le gradient du maximum de vraisemblance peut-être approximé par échantillonnage, l’évaluation de la log-vraisemblance nécessite un estimé de la fonction de partition. Contrairement aux approches traditionnelles qui considèrent un modèle donné comme une boîte noire, nous proposons plutôt d’exploiter la dynamique de l’apprentissage en estimant les changements successifs de log-partition encourus à chaque mise à jour des paramètres. Le problème d’estimation est reformulé comme un problème d’inférence similaire au filtre de Kalman, mais sur un graphe bi-dimensionnel, où les dimensions correspondent aux axes du temps et au paramètre de température. Sur le thème de l’optimisation, nous présentons également un algorithme permettant d’appliquer, de manière efficace, le gradient naturel à des machines de Boltzmann comportant des milliers d’unités. Jusqu’à présent, son adoption était limitée par son haut coût computationel ainsi que sa demande en mémoire. Notre algorithme, Metric-Free Natural Gradient (MFNG), permet d’éviter le calcul explicite de la matrice d’information de Fisher (et son inverse) en exploitant un solveur linéaire combiné à un produit matrice-vecteur efficace. L’algorithme est prometteur: en terme du nombre d’évaluations de fonctions, MFNG converge plus rapidement que SML. Son implémentation demeure malheureusement inefficace en temps de calcul. Ces travaux explorent également les mécanismes sous-jacents à l’apprentissage de représentations invariantes. À cette fin, nous utilisons la famille de machines de Boltzmann restreintes “spike & slab” (ssRBM), que nous modifions afin de pouvoir modéliser des distributions binaires et parcimonieuses. Les variables latentes binaires de la ssRBM peuvent être rendues invariantes à un sous-espace vectoriel, en associant à chacune d’elles, un vecteur de variables latentes continues (dénommées “slabs”). Ceci se traduit par une invariance accrue au niveau de la représentation et un meilleur taux de classification lorsque peu de données étiquetées sont disponibles. Nous terminons cette thèse sur un sujet ambitieux: l’apprentissage de représentations pouvant séparer les facteurs de variations présents dans le signal d’entrée. Nous proposons une solution à base de ssRBM bilinéaire (avec deux groupes de facteurs latents) et formulons le problème comme l’un de “pooling” dans des sous-espaces vectoriels complémentaires.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sharing of information with those in need of it has always been an idealistic goal of networked environments. With the proliferation of computer networks, information is so widely distributed among systems, that it is imperative to have well-organized schemes for retrieval and also discovery. This thesis attempts to investigate the problems associated with such schemes and suggests a software architecture, which is aimed towards achieving a meaningful discovery. Usage of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called infotron.The investigations are focused on distributed systems and their associated problems. The study was directed towards identifying suitable software architecture and incorporating the same in an environment where information growth is phenomenal and a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of constituencies distributed geographically, provided the insights required. This is manifested in the Election Counting and Reporting Software (ECRS) System. ECRS system is a software system, which is essentially distributed in nature designed to prepare reports to district administrators about the election counting process and to generate other miscellaneous statutory reports.Most of the distributed systems of the nature of ECRS normally will possess a "fragile architecture" which would make them amenable to collapse, with the occurrence of minor faults. This is resolved with the help of the penta-tier architecture proposed, that contained five different technologies at different tiers of the architecture.The results of experiment conducted and its analysis show that such an architecture would help to maintain different components of the software intact in an impermeable manner from any internal or external faults. The architecture thus evolved needed a mechanism to support information processing and discovery. This necessitated the introduction of the noveI concept of infotrons. Further, when a computing machine has to perform any meaningful extraction of information, it is guided by what is termed an infotron dictionary.The other empirical study was to find out which of the two prominent markup languages namely HTML and XML, is best suited for the incorporation of infotrons. A comparative study of 200 documents in HTML and XML was undertaken. The result was in favor ofXML.The concept of infotron and that of infotron dictionary, which were developed, was applied to implement an Information Discovery System (IDS). IDS is essentially, a system, that starts with the infotron(s) supplied as clue(s), and results in brewing the information required to satisfy the need of the information discoverer by utilizing the documents available at its disposal (as information space). The various components of the system and their interaction follows the penta-tier architectural model and therefore can be considered fault-tolerant. IDS is generic in nature and therefore the characteristics and the specifications were drawn up accordingly. Many subsystems interacted with multiple infotron dictionaries that were maintained in the system.In order to demonstrate the working of the IDS and to discover the information without modification of a typical Library Information System (LIS), an Information Discovery in Library Information System (lDLIS) application was developed. IDLIS is essentially a wrapper for the LIS, which maintains all the databases of the library. The purpose was to demonstrate that the functionality of a legacy system could be enhanced with the augmentation of IDS leading to information discovery service. IDLIS demonstrates IDS in action. IDLIS proves that any legacy system could be augmented with IDS effectively to provide the additional functionality of information discovery service.Possible applications of IDS and scope for further research in the field are covered.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Generalizing the notion of an eigenvector, invariant subspaces are frequently used in the context of linear eigenvalue problems, leading to conceptually elegant and numerically stable formulations in applications that require the computation of several eigenvalues and/or eigenvectors. Similar benefits can be expected for polynomial eigenvalue problems, for which the concept of an invariant subspace needs to be replaced by the concept of an invariant pair. Little has been known so far about numerical aspects of such invariant pairs. The aim of this paper is to fill this gap. The behavior of invariant pairs under perturbations of the matrix polynomial is studied and a first-order perturbation expansion is given. From a computational point of view, we investigate how to best extract invariant pairs from a linearization of the matrix polynomial. Moreover, we describe efficient refinement procedures directly based on the polynomial formulation. Numerical experiments with matrix polynomials from a number of applications demonstrate the effectiveness of our extraction and refinement procedures.