5 results for Natural language generation
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
With the increasing production of information from e-government initiatives, there is also a need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments around the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found on the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related work in this area. The results of this study consist of the architectural definition, major components, and requirements of the proposed framework. With this, it is possible to take advantage of the large volume of information generated from e-Government initiatives and use it to benefit society.
Abstract:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
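As an illustration of one of the metrics named in this abstract, the sketch below links adjacent words into an undirected network and computes betweenness with Brandes' algorithm. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify how the network is built or how the metrics rank sentences, and the function names `word_network` and `betweenness` are hypothetical.

```python
from collections import defaultdict, deque


def word_network(text):
    """Link each pair of adjacent words with an undirected edge.

    Assumption: whitespace tokenization and lowercasing; the abstract
    does not describe the preprocessing actually used.
    """
    words = text.lower().split()
    graph = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality for an unweighted, undirected graph."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from source
        dist = dict.fromkeys(graph, -1)   # -1 marks an unvisited node
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}
```

Words with high betweenness sit on many shortest paths between other words; how such scores would be turned into a sentence ranking for summarization is left open here.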
Abstract:
While the use of statistical physics methods to analyze large corpora has been useful to unveil many patterns in texts, no comprehensive investigation has been performed on the interdependence between syntactic and semantic factors. In this study we propose a framework for determining whether a text (e.g., written in an unknown alphabet) is compatible with a natural language and to which language it could belong. The approach is based on three types of statistical measurements, i.e. obtained from first-order statistics of word properties in a text, from the topology of complex networks representing texts, and from intermittency concepts where text is treated as a time series. Comparative experiments were performed with the New Testament in 15 different languages and with distinct books in English and Portuguese in order to quantify the dependency of the different measurements on the language and on the story being told in the book. The metrics found to be informative in distinguishing real texts from their shuffled versions include assortativity, degree and selectivity of words. As an illustration, we analyze an undeciphered medieval manuscript known as the Voynich Manuscript. We show that it is mostly compatible with natural languages and incompatible with random texts. We also obtain candidates for keywords of the Voynich Manuscript which could be helpful in the effort of deciphering it. Because we were able to identify statistical measurements that are more dependent on the syntax than on the semantics, the framework may also serve for text analysis in language-dependent applications.
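The comparison between a real text and its shuffled versions can be sketched as follows. This is an illustrative sketch only: it uses mean degree in a word adjacency network as a stand-in measurement, and the function names, the number of trials, and the seed are assumptions rather than details taken from the study.

```python
import random


def adjacency_degrees(words):
    """Degree of each word in an undirected word adjacency network."""
    neighbors = {}
    for left, right in zip(words, words[1:]):
        if left != right:  # ignore self-loops
            neighbors.setdefault(left, set()).add(right)
            neighbors.setdefault(right, set()).add(left)
    return {word: len(linked) for word, linked in neighbors.items()}


def mean_degree(words):
    """Average degree over all words that have at least one neighbor."""
    degrees = adjacency_degrees(words)
    return sum(degrees.values()) / len(degrees) if degrees else 0.0


def shuffled_baseline(words, trials=100, seed=0):
    """Mean degree averaged over randomly shuffled copies of the text.

    Shuffling keeps word frequencies but destroys word order, so a gap
    between the real value and this baseline reflects ordering structure.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (assumed)
    total = 0.0
    for _ in range(trials):
        sample = list(words)
        rng.shuffle(sample)
        total += mean_degree(sample)
    return total / trials
```

A measurement is informative in this sense when `mean_degree(words)` for the real text differs clearly from `shuffled_baseline(words)`; the same pattern would apply to any other measurement substituted for mean degree.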
Abstract:
Objective: In order to gain further insight into the function of the enteric adenovirus short fiber (SF), we have constructed a recombinant dodecahedron containing the SF protein of HAdV-41 and the HAdV-3 penton base. Methods: Recombinant baculoviruses expressing the HAdV-41 SF protein and HAdV-3 penton base were cloned and amplified in Sf9 insect cells. Recombinant dodecahedra were expressed by coinfection of High Five (TM) cells with both baculoviruses, 72 h post-infection. Cell lysate was centrifuged on a sucrose density gradient and the purified recombinant dodecahedra were recovered. Results: Analysis by negative staining electron microscopy demonstrated that chimeric dodecahedra made of the HAdV-3 penton base and decorated with the HAdV-41 SF were successfully generated. Next, recombinant dodecahedra were digested with pepsin and analyzed by Western blot. A 'site-specific' proteolysis of the HAdV-41 SF was observed, while the HAdV-3 penton base core was completely digested. Conclusion: These results show that, in vitro, the HAdV-41 SF likely undergoes proteolysis in the gastrointestinal tract, its natural environment, which may facilitate the recognition of receptors in intestinal cells. The results obtained in the present study may be the basis for the development of gene therapy vectors targeting the intestinal epithelium and of orally administered vaccine vectors, as well as for the identification of HAdV-41 SF partners. Copyright (C) 2011 S. Karger AG, Basel
Abstract:
Micro-gas turbines are a good alternative for on-site power generation, since their operation is very reliable. The possibility of operating with various fuels increases versatility and, as a result, the usage of these devices. Focusing on improving the performance of a tri-fuel low-cost micro-gas turbine, this work presents investigations of the inner flow of its combustion chamber. The aim of this analysis was to characterize the flame structure through the temperature field of the chamber inner flow. The chamber was fuelled with natural gas. In the current chamber, a swirler and a reversed flow configuration were utilized to provide flame stabilization. The inner flow was investigated with numerical simulations using the RSM turbulence model, and the results were compared to experimental data. A β-PDF equilibrium model was adopted to account for the turbulent combustion process. Different models of heat transfer were compared. Thermal radiation and especially heat conduction in the liner walls played significant roles in the results.