906 results for Text mining, Classificazione, Stemming, Text categorization


Relevance:

40.00% 40.00%

Publisher:

Abstract:

This dissertation aims to map the use of information systems theories, using information retrieval techniques and data and text mining methodologies. The theories addressed were Transaction Cost Economics (TCE), the Resource-Based View of the Firm (RBV), and Institutional Theory (IT), chosen because they are highly relevant to studies of investment allocation and implementation in information systems. The data set was the textual content (in English) of the abstract and the theoretical review of articles from the journals Information Systems Research (ISR), Management Information Systems Quarterly (MISQ), and Journal of Management Information Systems (JMIS) published between 2000 and 2008. The results of text mining combined with data mining were compared with those of the EBSCO advanced search tool and proved more effective at identifying content. Articles grounded in the three theories accounted for 10% of all articles in the three journals, and the most prolific publication years were 2001 and 2007. (AU)

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The HIV Reverse Transcriptase and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year 3500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The Mount Antero/White area is a popular prospecting area. Recent expansion of the recreation economy is drawing more visitors to the area, and these visitors may be placing unsustainable pressure on the landscape. To help address this, the legal, ecological, geologic, aesthetic, recreational, historic, social, and economic character of the Antero/White area has been identified. Four feasible management alternatives have also been recognized: a) take no new management actions, b) prohibit motorized activities in the area, c) develop a mineralogical park, and d) combine options b and c. Option c has been defended, as it best balances the desires of area users with the underlying ecological and geological character of the area.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The exponential increase of subjective, user-generated content since the birth of the Social Web has led to the need for automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task by proposing a unified framework composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, in which different configurations of the framework are suggested and analyzed in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as of the framework as a whole. By achieving an improvement of over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In this paper we address two issues. The first is whether the performance of a text summarization method depends on the topic of a document. The second is how certain linguistic properties of a text may affect the performance of a number of automatic text summarization methods. For this we consider semantic analysis methods, such as textual entailment and anaphora resolution, and we study how they are related to proper noun, pronoun and noun ratios calculated over original documents that are grouped into related topics. Given the obtained results, we can conclude that although our first hypothesis is not supported, since no evident relationship has been found between the topic of a document and the performance of the methods employed, adapting summarization systems to the linguistic properties of input documents benefits the process of summarization.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This article analyzes the appropriateness of a text summarization system, COMPENDIUM, for generating abstracts of biomedical papers. Two approaches are suggested: an extractive one (COMPENDIUM E), which only selects and extracts the most relevant sentences of the documents, and an abstractive-oriented one (COMPENDIUM E–A), which also faces the challenge of abstractive summarization. This novel strategy combines extractive information with some pieces of information from the article that have been previously compressed or fused. Specifically, in this article, we want to study: i) whether COMPENDIUM produces good summaries in the biomedical domain; ii) which summarization approach is more suitable; and iii) the opinion of real users towards automatic summaries. Therefore, two types of evaluation were performed, quantitative and qualitative, to evaluate both the information contained in the summaries and user satisfaction. Results show that extractive and abstractive-oriented summaries perform similarly with respect to the information they contain, so both approaches are able to keep the relevant information of the source documents, but the latter is more appropriate from a human perspective when a user satisfaction assessment is carried out. This also confirms the suitability of our suggested approach for generating summaries following an abstractive-oriented paradigm.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper describes a module for the prediction of emotions in text chats in Spanish, intended for use in domain-specific text-to-speech systems. A general overview of the system is given, and the results of some evaluations carried out with two corpora of real chat messages are described. These results seem to indicate that the system performs similarly to other systems described in the literature, while addressing a more complex task (identification of emotions and emotional intensity in the chat domain).

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In this paper, we present a Text Summarisation tool, compendium, capable of generating the most common types of summaries. Regarding the input, single- and multi-document summaries can be produced; as for the output, the summaries can be extractive or abstractive-oriented; and finally, concerning their purpose, the summaries can be generic, query-focused, or sentiment-based. The proposed architecture for compendium is divided into various stages, making a distinction between core and additional stages. The former constitute the backbone of the tool and are common to the generation of any type of summary, whereas the latter are used for enhancing the capabilities of the tool. The main contributions of compendium with respect to the state-of-the-art summarisation systems are that (i) it specifically deals with the problem of redundancy, by means of textual entailment; (ii) it combines statistical and cognitive-based techniques for determining relevant content; and (iii) it proposes an abstractive-oriented approach for facing the challenge of abstractive summarisation. The evaluation performed in different domains and textual genres, comprising traditional texts as well as texts extracted from the Web 2.0, shows that compendium is very competitive and appropriate to be used as a tool for generating summaries.
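
To make the idea of extractive summarisation with redundancy removal concrete, here is a minimal Python sketch. It is not the compendium implementation: the abstract says redundancy is handled by textual entailment, which is replaced here by a much simpler word-overlap (Jaccard) check, and relevance is approximated by average term frequency. The function names, the stopword list, and the threshold value are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the source).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "on", "it", "that", "this", "with", "as", "by"}


def tokenize(sentence):
    """Lowercase a sentence and return its non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def jaccard(a, b):
    """Word-overlap similarity; a crude stand-in for textual entailment."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(text, max_sentences=2, redundancy_threshold=0.5):
    """Pick high-scoring sentences, skipping ones that overlap too much
    with a sentence already selected."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)

    # Relevance score: average corpus frequency of the sentence's terms.
    scores = [sum(freq[w] for w in toks) / len(toks) if toks else 0.0
              for toks in tokens]

    # Highest score first; ties broken by original position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        if all(jaccard(tokens[i], tokens[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

Called on a short text containing two near-identical sentences, `summarize` keeps only one of them and fills the remaining slot with a less similar sentence. A real system would replace `jaccard` with an entailment model and use a richer relevance measure.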