903 results for Sentence extraction
Abstract:
We present a hybrid method for text summarization that combines sentence extraction with syntactic pruning of the extracted sentences. Pruning relies on a complete dependency parse of each sentence, produced by the grammar developed within a commercial grammar-checking program, Correcteur 101. Subtrees of the parse are removed when they are identified by the targeted relations. The analysis is carried out on a corpus of varied texts. The reduction rate of the extracted sentences averages about 74%, while grammaticality or readability is preserved in more than 64% of cases. Given these initial results on a limited set of syntactic relations, the approach shows promise for automatic text summarization applications.
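The pruning step described above can be sketched as follows. This is a minimal illustration, not the Correcteur 101 grammar: the `Node` structure, the relation names ("time_modifier", "prep_modifier", and so on) and the example sentence are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One word of a dependency parse (hypothetical structure for illustration)."""
    word: str
    position: int                 # index of the word in the original sentence
    relation: str = "root"        # dependency relation linking the word to its head
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, targeted: Set[str]) -> Node:
    """Return a copy of the tree without subtrees attached by a targeted relation."""
    kept = [prune(c, targeted) for c in node.children if c.relation not in targeted]
    return Node(node.word, node.position, node.relation, kept)


def linearize(node: Node) -> str:
    """Rebuild the sentence by sorting the remaining words by original position."""
    words = []

    def walk(n: Node) -> None:
        words.append((n.position, n.word))
        for c in n.children:
            walk(c)

    walk(node)
    return " ".join(w for _, w in sorted(words))


# Hypothetical parse of "The committee approved the budget yesterday after long debate".
sentence = Node("approved", 2, "root", [
    Node("committee", 1, "subject", [Node("The", 0, "determiner")]),
    Node("budget", 4, "object", [Node("the", 3, "determiner")]),
    Node("yesterday", 5, "time_modifier"),
    Node("debate", 8, "prep_modifier",
         [Node("after", 6, "case"), Node("long", 7, "adjective")]),
])

# Removing the two modifier subtrees keeps the grammatical core of the sentence.
print(linearize(prune(sentence, {"time_modifier", "prep_modifier"})))
```

Pruning whole subtrees rather than individual words is what makes it plausible for the output to stay grammatical: a modifier is removed together with everything that depends on it.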
Abstract:
This thesis presents the results of several years of research on automatic summary generation. Its core consists of three major contributions, presented as articles published or submitted for publication. They trace a path from extractive summarization methods to abstractive ones. The HexTac experiment, the subject of the first article, was conducted to assess how well humans perform when writing summaries by sentence extraction. The results show a large gap between human performance under the constraint of extracting sentences from the source text and human performance when writing summaries without that constraint. This empirically observed limit on summarization by sentence extraction demonstrates the value of developing other automatic approaches to summarization. We then developed a first system following the Fully Abstractive Summarization approach, which falls into the category of semi-extractive approaches such as sentence compression and sentence fusion. Developing and evaluating the system, as described in the second article, revealed how difficult it is to generate an easy-to-read summary without extracting sentences. In this approach, as in sentence-extraction approaches, the level of understanding of the source text remains insufficient to guide content selection for the summary. Finally, a knowledge-based abstraction approach named K-BABS is proposed in a third article. Relevant information items are identified, leading directly to the generation of sentences for the summary. This approach was implemented in the ABSUM system, which produces very short but content-rich summaries. They were evaluated according to current standards, and the evaluation shows that hybrid summaries combining ABSUM output with extracted sentences have significantly higher informative content than a state-of-the-art sentence-extraction system.
Abstract:
Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an extractive summary. The graph or network representing one piece of text consists of nodes corresponding to sentences, while edges connect sentences that share common meaningful nouns. Because various metrics could be used, we developed a set of 14 summarizers, generically referred to as CN-Summ, employing network concepts such as node degree, length of shortest paths, d-rings and k-cores. An additional summarizer was created which selects the highest ranked sentences in the 14 systems, as in a voting system. When applied to a corpus of Brazilian Portuguese texts, some CN-Summ versions performed better than summarizers that do not employ deep linguistic knowledge, with results comparable to state-of-the-art summarizers based on expensive linguistic resources. The use of complex networks to represent texts appears therefore as suitable for automatic summarization, consistent with the belief that the metrics of such networks may capture important text features. (c) 2008 Elsevier Inc. All rights reserved.
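The network construction described above can be sketched as follows, using node degree as the only ranking metric. This is a simplified illustration rather than the CN-Summ implementation: it assumes the meaningful nouns of each sentence have already been extracted by some preprocessing step, and the function names and the tie-breaking rule are choices made for the example.

```python
from itertools import combinations
from typing import Dict, List, Set


def build_graph(noun_sets: List[Set[str]]) -> Dict[int, Set[int]]:
    """Connect two sentences (nodes) whenever they share at least one noun."""
    adjacency: Dict[int, Set[int]] = {i: set() for i in range(len(noun_sets))}
    for i, j in combinations(range(len(noun_sets)), 2):
        if noun_sets[i] & noun_sets[j]:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def summarize_by_degree(sentences: List[str],
                        noun_sets: List[Set[str]],
                        k: int) -> List[str]:
    """Select the k sentences with the highest node degree, in document order."""
    adjacency = build_graph(noun_sets)
    # Ties are broken by position in the text (an assumption of this sketch).
    ranked = sorted(adjacency, key=lambda i: (-len(adjacency[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]


sentences = [
    "Networks can represent a text.",
    "Network metrics rank the nodes.",
    "The weather was pleasant.",
    "A summary shortens the text.",
]
# Nouns per sentence, assumed to come from an external tagger or lemmatizer.
noun_sets = [{"network", "text"}, {"network", "metric", "node"},
             {"weather"}, {"summary", "text"}]

print(summarize_by_degree(sentences, noun_sets, k=2))
```

Other metrics named in the abstract, such as shortest paths or k-cores, would replace only the ranking key; the graph construction stays the same.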
Abstract:
Study based on a stay at Xerox Research Centre Europe in Grenoble, France, between June and December 2006. The project translates English technical terms into Norwegian. It is asymmetric because linguistic resources are available only for English, not for Norwegian. Methods that check contiguity ("local reordering" and selective permutation) were developed and implemented to improve the performance of an earlier tool. Contiguity means that when a word is translated into multiple words, those words must be adjacent in the sentence. In addition, a lookup table for the technical terms was built and integrated into a demonstration program.
Abstract:
Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system performs well, taking first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks and showed competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well but also generalizes to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
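The nested event structure CAUSE(A, BIND(B, C)) mentioned in the abstract can be represented with a small recursive data type. The sketch below is not the TEES graph format; the class names and the argument role labels ("Cause", "Theme") are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a type, a trigger word, and typed arguments.

    Each argument is a (role, target) pair; the target may be an entity
    or another event, which is what allows nesting.
    """
    type: str
    trigger: str
    args: Tuple[Tuple[str, Union["Entity", "Event"]], ...]


def render(item: Union[Entity, Event]) -> str:
    """Write an entity or event in the compact TYPE(arg, ...) notation."""
    if isinstance(item, Entity):
        return item.name
    return f"{item.type}({', '.join(render(target) for _, target in item.args)})"


def depth(item: Union[Entity, Event]) -> int:
    """Nesting depth: 0 for an entity, 1 for a flat event, and so on."""
    if isinstance(item, Entity):
        return 0
    return 1 + max((depth(target) for _, target in item.args), default=0)


# "Protein A causes protein B to bind protein C"
a, b, c = Entity("A"), Entity("B"), Entity("C")
bind = Event("BIND", "bind", (("Theme", b), ("Theme", c)))
cause = Event("CAUSE", "causes", (("Cause", a), ("Theme", bind)))

print(render(cause))
```

A pairwise relation is the special case of an event of depth 1 with two entity arguments, which is why a single representation can cover both binary relations and complex events.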
Abstract:
Using the eye-movement monitoring technique in two reading comprehension experiments, we investigated the timing of constraints on wh-dependencies (so-called ‘island’ constraints) in native and nonnative sentence processing. Our results show that both native and nonnative speakers of English are sensitive to extraction islands during processing, suggesting that memory storage limitations affect native and nonnative comprehenders in essentially the same way. Furthermore, our results show that the timing of island effects in native compared to nonnative sentence comprehension is affected differently by the type of cue (semantic fit versus filled gaps) signalling whether dependency formation is possible at a potential gap site. Whereas English native speakers showed immediate sensitivity to filled gaps but not to lack of semantic fit, proficient German-speaking learners of L2 English showed the opposite sensitivity pattern. This indicates that initial wh-dependency formation in nonnative processing is based on semantic feature-matching rather than being structurally mediated as in native comprehension.
Abstract:
Motivation: In molecular biology, molecular events describe observable alterations of biomolecules, such as binding of proteins or RNA production. These events might be responsible for drug reactions or the development of certain diseases. As such, biomedical event extraction, the process of automatically detecting descriptions of molecular interactions in research articles, has recently attracted substantial research interest. Event trigger identification, which detects the words describing the event types, is a crucial prerequisite step in the biomedical event extraction pipeline. Taking the event types as classes, event trigger identification can be viewed as a classification task. For each word in a sentence, a trained classifier predicts, based on context features, whether the word corresponds to an event type and, if so, which one. Therefore, a well-designed feature set with a good level of discrimination and generalization is crucial for the performance of event trigger identification. Results: In this article, we propose a novel framework for event trigger identification. In particular, we learn biomedical domain knowledge from a large text corpus built from Medline and embed it into word features using neural language modeling. The embedded features are then combined with the syntactic and semantic context features using the multiple kernel learning method. The combined feature set is used for training the event trigger classifier. Experimental results on the gold-standard corpus show that the proposed framework improves the F-score by more than 2.5% over the state-of-the-art approach, demonstrating its effectiveness. © 2014 The Author. The source code for the proposed framework is freely available and can be downloaded at http://cse.seu.edu.cn/people/zhoudeyu/ETI_Sourcecode.zip.
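Framing trigger identification as per-word classification can be sketched as follows. The code only builds the training instances (one feature dictionary and one class label per word) and does not reproduce the embedding or multiple kernel learning steps of the proposed framework; the feature names, the window size, the "None" label for non-trigger words and the event type names are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, float]:
    """Binary context features for the word at index i.

    Uses the word itself and its neighbours within the given window;
    positions outside the sentence are represented by a padding symbol.
    """
    features = {"word=" + tokens[i].lower(): 1.0}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        context = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        features[f"ctx[{offset:+d}]={context}"] = 1.0
    return features


def make_instances(tokens: List[str],
                   trigger_labels: Dict[int, str],
                   window: int = 1) -> List[Tuple[Dict[str, float], str]]:
    """Create one (features, label) pair per word.

    Words that are not event triggers receive the label "None", so every
    word in the sentence becomes a classification instance.
    """
    return [(token_features(tokens, i, window), trigger_labels.get(i, "None"))
            for i in range(len(tokens))]


tokens = ["Protein", "A", "causes", "protein", "B", "to", "bind", "protein", "C"]
# Hypothetical gold annotation: word index -> event type.
labels = {2: "CAUSE", 6: "BIND"}

for features, label in make_instances(tokens, labels):
    print(label, sorted(features))
```

Any multi-class classifier could then be trained on these pairs; the quality of the feature set is the part the abstract identifies as decisive.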
Abstract:
This study aimed to detect the presence of male DNA in vaginal samples collected from survivors of sexual violence and stored on filter paper. A pilot study was conducted to evaluate 10 vaginal samples spotted on sterile filter paper: 6 collected at random in April 2009 and 4 in October 2010. Time between sexual assault and sample collection was 4-48 hours. After drying at room temperature, the samples were placed in a sterile envelope and stored for 2-3 years until processing. DNA extraction was confirmed by polymerase chain reaction for human β-globin, and the presence of prostate-specific antigen (PSA) was quantified. The presence of the Y chromosome was detected using primers for sequences in the TSPY (Y7/Y8 and DYS14) and SRY genes. β-Globin was detected in all 10 samples, while 2 samples were positive for PSA. Half of the samples amplified the Y7/Y8 and DYS14 sequences of the TSPY gene and 30% amplified the SRY gene sequence of the Y chromosome. Four male samples and 1 female sample served as controls. Filter-paper spots stored for periods of up to 3 years proved adequate for preserving genetic material from vaginal samples collected following sexual violence.
Abstract:
In the current study, a new approach has been developed that uses an internal standard during extraction to correct the apparent increase in secoiridoid content caused by moisture reduction after virgin olive oil (VOO) filtration. Firstly, VOOs from two main Spanish varieties (Picual and Hojiblanca) were subjected to industrial filtration. Afterwards, the moisture content was determined in unfiltered and filtered VOOs, and liquid-liquid extraction of phenolic compounds was performed using different internal standards. The resulting extracts were analyzed by HPLC-ESI-TOF/MS, in order to gain maximum information concerning the phenolic profiles of the samples under study. The reduction effect of filtration on the moisture content, phenolic alcohols, and flavones was confirmed at the industrial scale. Oleuropein was chosen as internal standard and, for the first time, the apparent increase of secoiridoids in filtered VOO was corrected, using a correction coefficient (Cc) calculated from the variation of internal standard area in filtered and unfiltered VOO during extraction. This approach gave the real concentration of secoiridoids in filtered VOO, and clarified the effect of the filtration step on the phenolic fraction. This finding is of great importance for future studies that seek to quantify phenolic compounds in VOOs.
Abstract:
Originally from Asia, Dovyalis hebecarpa is a dark purple/red exotic berry now also produced in Brazil. However, no reports were found in the literature about phenolic extraction or characterisation of this berry. In this study we evaluate the extraction optimisation of anthocyanins and total phenolics in D. hebecarpa berries, aiming at the development of a simple and mild analytical technique. Multivariate analysis was used to optimise the extraction variables (ethanol:water:acetone solvent proportions, times, and acid concentrations) at different levels. Acetone/water (20/80 v/v) gave the highest anthocyanin extraction yield, but pure water and different proportions of acetone/water or acetone/ethanol/water (with >50% of water) were also effective. Neither acid concentration nor time had a significant effect on extraction efficiency, so the recommended parameters could be fixed at the lowest values tested (0.35% formic acid v/v, and 17.6 min). Under optimised conditions, extraction efficiencies increased by 31.5% and 11% for anthocyanins and total phenolics, respectively, compared with traditional methods that use more solvent and time. Thus, the optimised methodology increased yields while being less hazardous and less time-consuming than traditional methods. Finally, freeze-dried D. hebecarpa showed a high content of the target phytochemicals (319 mg/100 g and 1,421 mg/100 g of total anthocyanin and total phenolic content, respectively).
Abstract:
Extraction processes are widely used in the chemical, biotechnological and pharmaceutical industries to recover bioactive compounds from medicinal plants. To replace conventional extraction techniques, new techniques such as high-pressure extraction processes that use environmentally friendly solvents have been developed. However, these techniques are sometimes associated with low extraction rates. Ultrasound can be used effectively to improve the extraction rate by increasing mass transfer and possibly rupturing the cell wall through the formation of microcavities, leading to higher product yields with reduced processing time and solvent consumption. This review presents a brief survey of the mechanism of ultrasound-assisted extraction and the factors that affect it, focusing on the use of ultrasound irradiation to intensify high-pressure extraction processes.
Abstract:
Purified genomic DNA can be difficult to obtain from some plant species because of the presence of impurities such as polysaccharides, which are often co-extracted with DNA. In this study, we developed a fast, simple, and low-cost protocol for extracting DNA from plants containing high levels of secondary metabolites. This protocol does not require the use of volatile toxic reagents such as mercaptoethanol, chloroform, or phenol and allows the extraction of high-quality DNA from wild and cultivated tropical species.
Abstract:
Extracts from malagueta pepper (Capsicum frutescens L.) were obtained using supercritical fluid extraction (SFE) assisted by ultrasound, with carbon dioxide as solvent at 15 MPa and 40 °C. The SFE global yield increased by up to 77% when ultrasound waves were applied, and the best condition for ultrasound-assisted extraction was an ultrasound power of 360 W applied for 60 min. Four capsaicinoids were identified in the extracts and quantified by high-performance liquid chromatography. The use of ultrasonic waves did not significantly influence the capsaicinoid profiles or the phenolic content of the extracts. However, ultrasound enhanced the SFE rate. A model based on the broken and intact cell concept was adequate to represent the extraction kinetics and estimate the mass transfer coefficients, which increased with ultrasound. Images obtained by field emission scanning electron microscopy showed that the action of ultrasonic waves did not cause cracks on the cell wall surface. On the other hand, ultrasound promoted disturbances in the vegetable matrix, leading to the release of extractable material on the solid surface. The effects of ultrasound were more significant on SFE from larger solid particles.
Abstract:
This work presents a direct and coherent strategy to synthesise a molecularly imprinted polymer (MIP) capable of extracting fluconazole from its sample. The MIP was successfully prepared from methacrylic acid (functional monomer), ethyleneglycoldimethacrylate (crosslinker) and acetonitrile (porogenic solvent) in the presence of fluconazole as the template molecule through a non-covalent approach. The non-imprinted polymer (NIP) was prepared following the same synthetic scheme, but in the absence of the template. The data obtained from scanning electron microscopy, infrared spectroscopy, thermogravimetric analysis and nitrogen Brunauer-Emmett-Teller plots helped to elucidate the structural as well as the morphological characteristics of the MIP and NIP. The application of the MIP as a sorbent was demonstrated by packing it in solid phase extraction cartridges to extract fluconazole from commercial capsule samples through an offline analytical procedure. The quantification of fluconazole was accomplished through UPLC-MS, which resulted in LOD ≤ 1.63×10^-10 mM. Furthermore, a high percentage recovery of 91±10% (n=9) was obtained. The ability of the MIP to selectively recognise fluconazole was evaluated by comparison with the structural analogues miconazole, tioconazole and secnidazole, resulting in percentage recoveries of 51, 35 and 32%, respectively.
Abstract:
The aim of this work is to obtain, purify and biochemically characterize a peroxidase from Copaifera langsdorffii leaves (COP). COP was obtained by acetone precipitation followed by ion-exchange chromatography. Purification yielded 3.5% of peroxidase with a purification factor of 46.86. The optimum pH of COP is 6.0 and its optimum temperature is 35 °C. COP was stable in the pH range of 4.5 to 9.3 and at temperatures below 50.0 °C. The apparent Michaelis-Menten constants (Km) for guaiacol and H2O2 were 0.04 mM and 0.39 mM, respectively. Enzyme turnover was 0.075 s^-1 for guaiacol and 0.28 s^-1 for hydrogen peroxide. Copaifera langsdorffii leaves proved to be a rich source of active peroxidase (COP) throughout the year. COP could replace HRP, the most widely used peroxidase, in analytical determinations and treatment of industrial effluents at low cost.
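To illustrate what the reported apparent Km values mean, the sketch below evaluates the standard single-substrate Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]). Treating each substrate with this simple form is an assumption of the example, not a statement about the kinetic model fitted in the study, and the rate is expressed as a fraction of Vmax because no Vmax is reported in the abstract.

```python
def michaelis_menten_rate(substrate_mM: float, km_mM: float, vmax: float = 1.0) -> float:
    """Reaction rate from the Michaelis-Menten equation.

    With the default vmax of 1.0 the result is the fraction of the
    maximum rate reached at the given substrate concentration.
    """
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Apparent Km values reported in the abstract.
KM_GUAIACOL_MM = 0.04
KM_H2O2_MM = 0.39

# At a substrate concentration equal to Km the rate is half of Vmax.
print(michaelis_menten_rate(KM_GUAIACOL_MM, KM_GUAIACOL_MM))

# The same 0.39 mM concentration brings the guaiacol reaction much closer to
# saturation than the H2O2 reaction, because the Km for guaiacol is lower.
print(michaelis_menten_rate(0.39, KM_GUAIACOL_MM))
print(michaelis_menten_rate(0.39, KM_H2O2_MM))
```

A lower Km means the enzyme approaches its maximum rate at a lower substrate concentration, which is the practical reading of the two values given above.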