16 results for extraction procedures
Abstract:
The automatic acquisition of lexical associations from corpora is a crucial issue for Natural Language Processing. A lexical association is a recurrent combination of words that co-occur more often than expected by chance in a given domain. In fact, lexical associations underlie linguistic phenomena such as idioms, collocations or compound words. Because the meaning of a lexical association is not compositional, their identification is fundamental for analysis and synthesis that take into account all the subtleties of the language. In this report, we introduce a new statistically based architecture that extracts contiguous and non-contiguous lexical associations from naturally occurring texts. For that purpose, three new concepts have been defined: the positional N-gram models, the Mutual Expectation and the GenLocalMaxs algorithm. The initial text is first transformed into a set of positional N-grams, i.e. ordered vectors of simple lexical units. Then, an association measure, the Mutual Expectation, evaluates the degree of cohesion of each positional N-gram, and the GenLocalMaxs algorithm selects the candidate lexical associations based on the identification of local maximum values of Mutual Expectation. Considerable effort has also been devoted to evaluating our methodology. For that purpose, we have proposed the normalisation of five well-known association measures and shown that both the Mutual Expectation and the GenLocalMaxs algorithm yield significant improvements compared to existing methodologies.
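As an illustration of how such a pipeline fits together, the Python sketch below computes a simplified Mutual Expectation score over contiguous n-grams and applies a GenLocalMaxs-style filter (an n-gram is kept when its score is not exceeded by its sub-(n-1)-grams and strictly exceeds its super-(n+1)-grams). The exact normalisation used in the thesis, the handling of non-contiguous positional n-grams, and the helper names (`mutual_expectation`, `gen_local_maxs`) are assumptions made for this example, not the author's implementation.

```python
from collections import Counter
from itertools import islice

def ngrams(tokens, n):
    """Enumerate contiguous n-grams as tuples (the positional, non-contiguous
    n-grams of the thesis are left out of this simplified sketch)."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

def build_counts(tokens, max_n):
    """Frequency tables for every n-gram size from 1 to max_n."""
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}

def prob(counts, gram, total):
    """Crude relative frequency; total is the corpus token count."""
    return counts[len(gram)][gram] / total

def mutual_expectation(counts, gram, total):
    """Illustrative cohesion score: p(gram) weighted by how much more likely
    the gram is than the average of its (n-1)-gram subparts."""
    p = prob(counts, gram, total)
    subs = [gram[:i] + gram[i + 1:] for i in range(len(gram))]
    avg_sub = sum(prob(counts, s, total) for s in subs) / len(subs)
    return p * (p / avg_sub) if avg_sub > 0 else 0.0

def gen_local_maxs(counts, total, max_n):
    """Keep an n-gram when its score is a local maximum: at least as high as
    the scores of the (n-1)-grams it contains and strictly higher than the
    scores of the (n+1)-grams that contain it."""
    selected = []
    for n in range(2, max_n):            # n+1 counts are needed to look upward
        for gram in counts[n]:
            score = mutual_expectation(counts, gram, total)
            subs = [gram[:-1], gram[1:]] if n > 2 else []
            supers = [g for g in counts[n + 1] if g[:-1] == gram or g[1:] == gram]
            holds_down = all(score >= mutual_expectation(counts, s, total) for s in subs)
            holds_up = all(score > mutual_expectation(counts, s, total) for s in supers)
            if holds_down and holds_up:
                selected.append((gram, score))
    return sorted(selected, key=lambda pair: -pair[1])

if __name__ == "__main__":
    tokens = ("european union leaders met as the european union expanded "
              "and the european union grew").split()
    counts = build_counts(tokens, max_n=4)
    for gram, score in gen_local_maxs(counts, total=len(tokens), max_n=4):
        print(" ".join(gram), round(score, 4))
```

In principle the same local-maximum filter could be run with any of the five normalised association measures mentioned above in place of `mutual_expectation`.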
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa to obtain the degree of Master in Conservation and Restoration
Abstract:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
Abstract:
Dissertation to obtain a Master's degree in Biotechnology
Abstract:
Dissertation to obtain the degree of Doctor in Electrical and Computer Engineering
Abstract:
Dissertation to obtain the degree of Master in Biomedical Engineering
Abstract:
A Master's thesis, presented as part of the requirements for the award of a Research Master's degree in Economics from NOVA – School of Business and Economics
Abstract:
Dissertation to obtain the degree of Master in Chemical and Biochemical Engineering
Abstract:
Phenolic acids are aromatic secondary plant metabolites, widely spread throughout the plant kingdom. Due to their biological and pharmacological properties, they have been playing an important role in phytotherapy, and consequently techniques for their separation and purification are needed. This thesis aims at exploring new sustainable separation processes based on ionic liquids (ILs) for the extraction of biologically active phenolic acids. For that purpose, three phenolic acids with similar chemical structures were selected: cinnamic acid, p-coumaric acid and caffeic acid. In recent years, it has been shown that ionic-liquid-based aqueous biphasic systems (ABSs) are valid alternatives for the extraction, recovery and purification of biomolecules when compared to conventional ABSs or extractions carried out with organic solvents. In particular, cholinium-based ILs represent a clear step towards greener chemistry, while providing means for the implementation of efficient techniques for the separation and purification of biomolecules. In this work, ABSs were implemented using cholinium carboxylate ILs combined with either a high-charge-density inorganic salt (K3PO4) or polyethylene glycol (PEG) to promote the phase separation of aqueous solutions containing the three phenolic acids. These systems allow the effect of the chemical structure of the anion on the extraction efficiency to be evaluated. A single imidazolium-based IL was used in order to establish the effect of the cation's chemical structure. The selective extraction of a single acid was also investigated. Overall, it was observed that phenolic acids display very complex behaviour in aqueous solution: dimerization, polymerization and hetero-association are frequent phenomena, depending on the pH conditions. These phenomena greatly hinder the correct quantification of these acids in solution.
Abstract:
This project aims to prepare Worten Empresas (WE) to fulfill the increasing market demand through process changes, focusing on the Portuguese market, particularly on internal B2B clients. Several methods were used to measure the service level currently provided: process mapping, resources assessment, benchmarking and a survey. The results were then compared against the service level actually desired by WE's customers, in order to identify the performance gaps in response times and in the quality of follow-up during the sales process. To bridge the identified gaps, a set of recommendations and an implementation plan were suggested to improve and monitor the customer experience. This study concluded that it is possible to fulfill the increasing level of demand and at the same time improve customer satisfaction by implementing changes at the operations level.
Abstract:
The extraction of relevant terms from texts is an extensively researched task in Text Mining. Relevant terms have been applied in areas such as Information Retrieval or document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology which is explained in Part I of this thesis. In Part II, I show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-word expressions. In addition, since concepts can be semantically related to other concepts, this allows us to build implicit document descriptors. These applications led to published work. Finally, I present some work that, although not yet published, is briefly discussed in this document.
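As a concrete illustration of the keyword-extraction application mentioned above, the sketch below ranks already-extracted concepts by Tf-Idf and keeps the top k per document. The toy concept lists, the unsmoothed Idf and the `tf_idf_keywords` helper are illustrative assumptions; the ConceptExtractor itself is treated here as a black box that has already produced the concepts.

```python
import math
from collections import Counter

def tf_idf_keywords(doc_concepts, top_k=10):
    """doc_concepts: one list of concept strings (single- or multi-word) per
    document, as produced by some concept extractor.
    Returns, per document, the top_k concepts ranked by Tf-Idf."""
    n_docs = len(doc_concepts)
    # Document frequency of each concept.
    df = Counter()
    for concepts in doc_concepts:
        df.update(set(concepts))
    keywords = []
    for concepts in doc_concepts:
        tf = Counter(concepts)
        total = sum(tf.values()) or 1
        scores = {c: (count / total) * math.log(n_docs / df[c])
                  for c, count in tf.items()}
        keywords.append(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return keywords

# Illustrative usage with made-up concepts.
docs = [
    ["president", "republic", "prime minister", "republic"],
    ["prime minister", "budget", "parliament"],
    ["football", "champions league", "president"],
]
print(tf_idf_keywords(docs, top_k=2))
```

Because multi-word concepts are handled as ordinary dictionary keys, single-word and multi-word expressions are ranked with exactly the same machinery, which is the point the abstract makes about working on concepts rather than raw terms.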
Abstract:
The world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes such as entertainment, information, education or communication. The rapid growth of today's video archives, with sparsely available editorial data, creates a serious retrieval problem. Humans see a video as a complex interplay of cognitive concepts, so there is a need to build a bridge between numeric values and semantic concepts, a connection that will facilitate video retrieval by humans. The critical aspect of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is very tedious, subjective and expensive; therefore automatic annotation is being actively studied. In this thesis we focus on the automatic annotation of multimedia content, namely the use of analysis techniques for information retrieval that allow metadata to be automatically extracted from video in a videomail system, including the identification of text, people, actions, spaces and objects (including animals and plants). It will thus be possible to align the multimedia content with the text presented in the email message and to create applications for semantic video database indexing and retrieval.
Abstract:
The application of Experimental Design techniques has proven to be essential in various research fields, due to its statistical capability of processing the effect of interactions among independent variables, known as factors, on a system's response. The advantages of this methodology can be summarized as more resource- and time-efficient experimentation while providing more accurate results. This research focuses on the quantification of the extraction of four antioxidants, at two different concentrations, prepared according to an experimental procedure and measured with a Photodiode Array Detector. Experimental planning followed a Central Composite Design, a type of DoE that allows the quadratic component of Response Surfaces to be considered, a component that includes pure-curvature studies of the model produced. This work was executed with the intention of analyzing the responses, the peak areas obtained from the chromatograms plotted by the detector's system, and of understanding whether the factors considered, acquired from an extensive literature review, produced the expected effect on the response. Completion of this work will allow conclusions to be drawn regarding which factors should be considered in optimization studies of antioxidant extraction from an Oca (Oxalis tuberosa) matrix.
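To make the experimental planning concrete, the sketch below builds a generic two-factor Central Composite Design (factorial, axial and centre points in coded units) and fits the full second-order response-surface model by least squares. The number of factors, the simulated peak-area responses and the coefficient layout are assumptions for illustration only, not the design or data used in this work.

```python
import numpy as np

def central_composite_design(alpha=np.sqrt(2), n_center=3):
    """Two-factor CCD in coded units: 2^2 factorial points, 4 axial points
    at distance alpha, and replicated centre points."""
    factorial = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
    axial = np.array([[-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha]])
    center = np.zeros((n_center, 2))
    return np.vstack([factorial, axial, center])

def quadratic_model_matrix(X):
    """Columns of the second-order model: 1, x1, x2, x1*x2, x1^2, x2^2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1**2, x2**2])

# Hypothetical responses (standing in for chromatographic peak areas).
X = central_composite_design()
rng = np.random.default_rng(0)
y = 50 + 8*X[:, 0] - 3*X[:, 1] + 2*X[:, 0]*X[:, 1] - 5*X[:, 0]**2 - 4*X[:, 1]**2
y = y + rng.normal(scale=0.5, size=len(X))

# Least-squares fit of the second-order response surface.
beta, *_ = np.linalg.lstsq(quadratic_model_matrix(X), y, rcond=None)
print("fitted coefficients [b0, b1, b2, b12, b11, b22]:", np.round(beta, 2))
```

The quadratic terms (b11, b22) are what the pure-curvature study mentioned above examines, and the replicated centre points are what give the design the degrees of freedom to test for that curvature.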
Abstract:
World energy consumption is expected to increase strongly in the coming years because of the emerging economies. Biomass is the only renewable carbon resource that is abundant enough to be used as a source of energy. Grape pomace is one of the most abundant agro-industrial residues in the world, making it a good biomass resource. The aim of this work is the valorization of grape pomace from white grapes (WWGP) and from red grapes (RWGP), through the extraction of phenolic compounds with antioxidant activity, as well as through the extraction/hydrolysis of carbohydrates, using subcritical water, or hot compressed water (HCW). The main focus of this work is the optimization of the process for WWGP, while for RWGP only one set of parameters was tested. The temperatures used were 170, 190 and 210 °C for WWGP, and 180 °C for RWGP. The water flow rates were 5 and 10 mL/min, and the pressure was always kept at 100 bar. Before performing the HCW assays, both residues were characterized, revealing that WWGP is very rich in free sugars (around 40%), essentially glucose and fructose, while RWGP has higher contents of structural sugars, lignin, lipids and protein. For WWGP the best results were achieved at 210 °C and 10 mL/min: the highest yield of water-soluble compounds (69 wt.%), phenolics extraction (26.2 mg/g) and carbohydrate recovery (49.3 wt.% relative to the existing 57.8%). For RWGP the conditions were not optimized (180 °C and 5 mL/min), and the yield of water-soluble compounds (25 wt.%), phenolics extraction (19.5 mg/g) and carbohydrate recovery (11.4 wt.% relative to the existing 33.5%) were much lower. The antioxidant activity of the HCW extracts from each assay was determined, the best result being obtained for WWGP, namely for extracts obtained at 210 °C (EC50 = 20.8 μg/mL, where EC50 is the half-maximal effective concentration; for RWGP at 180 °C, EC50 = 22.1 μg/mL).
Abstract:
With the growth of the Internet and the Semantic Web, together with improvements in communication speed and the fast development of storage capacity, the volume of data and information rises considerably every day. Because of this, in the last few years there has been a growing interest in structures for formal representation with suitable characteristics, such as the possibility to organize data and information, as well as to reuse their contents for the generation of new knowledge. Controlled vocabularies, and specifically ontologies, stand out as representation structures with high potential: they not only allow data to be represented, but also allow such data to be reused for knowledge extraction and subsequently stored through relatively simple formalisms. However, to ensure that the knowledge in an ontology is always up to date, ontologies need maintenance. Ontology Learning is the area which studies the update and maintenance of ontologies. It is worth noting that the relevant literature already presents first results on the automatic maintenance of ontologies, but still at a very early stage. Human-based processes are still the current way to update and maintain an ontology, which makes this a cumbersome task. The generation of new knowledge for ontology growth can be based on Data Mining techniques, an area that studies techniques for data processing, pattern discovery and knowledge extraction in IT systems. This work proposes a novel semi-automatic method for knowledge extraction from unstructured data sources using Data Mining techniques, namely pattern discovery, focused on improving the precision of the concepts and the semantic relations present in an ontology. In order to verify the applicability of the proposed method, a proof of concept was developed and its results are presented; it was applied in the building and construction sector.