836 results for Text retrieval
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
A Work Project, presented as part of the requirements for the Award of a Master's Degree in Management from the NOVA – School of Business and Economics.
Abstract:
The extraction of relevant terms from texts is an extensively researched task in Text Mining. Relevant terms have been applied in areas such as Information Retrieval and document clustering and classification. However, relevance has a rather fuzzy nature, since the classification of some terms as relevant or not relevant is not consensual. For instance, while words such as "president" and "republic" are generally considered relevant by human evaluators, and words like "the" and "or" are not, terms such as "read" and "finish" gather no consensus about their semantics and informativeness. Concepts, on the other hand, have a less fuzzy nature. Therefore, instead of deciding on the relevance of a term during the extraction phase, as most extractors do, I propose to first extract from texts what I have called generic concepts (all concepts) and to postpone the decision about relevance to downstream applications, according to their needs. For instance, a keyword extractor may assume that the most relevant keywords are the most frequent concepts in the documents. Moreover, most statistical extractors are incapable of extracting single-word and multi-word expressions using the same methodology. These factors led to the development of the ConceptExtractor, a statistical and language-independent methodology, which is explained in Part I of this thesis. In Part II, I show that the automatic extraction of concepts has great applicability. For instance, for the extraction of keywords from documents, using the Tf-Idf metric only on concepts yields better results than using Tf-Idf without concepts, especially for multi-words. In addition, since concepts can be semantically related to other concepts, it becomes possible to build implicit document descriptors. These applications led to published work. Finally, I present some work that, although not yet published, is briefly discussed in this document.
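As a rough illustration of the keyword-extraction application mentioned above, the following minimal sketch applies a plain Tf-Idf weighting to lists of already extracted concepts. It does not implement the ConceptExtractor itself: the concept lists, the function names `tf_idf` and `top_keywords`, and the specific Tf-Idf variant (relative frequency times the natural log of inverse document frequency) are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each concept in each document.

    docs: list of documents, each a list of concept strings
          (single-word or multi-word), assumed to be produced
          by some upstream concept extractor.
    Returns one dict (concept -> Tf-Idf score) per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each concept occurs.
    df = Counter()
    for concepts in docs:
        df.update(set(concepts))

    scores = []
    for concepts in docs:
        counts = Counter(concepts)
        total = len(concepts)
        doc_scores = {}
        for concept, k in counts.items():
            tf = k / total                        # relative frequency in this document
            idf = math.log(n_docs / df[concept])  # 0.0 when the concept is in every document
            doc_scores[concept] = tf * idf
        scores.append(doc_scores)
    return scores


def top_keywords(doc_scores, k=3):
    """Return the k best-scoring concepts, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [concept for concept, _ in ranked[:k]]


# Hypothetical concept lists for three documents.
docs = [
    ["text mining", "concept", "text mining", "keyword"],
    ["concept", "video annotation", "video annotation"],
    ["concept", "keyword", "sentiment lexicon"],
]
scores = tf_idf(docs)
print(top_keywords(scores[0], k=2))
```

Because the input units are concepts rather than raw tokens, single-word and multi-word expressions are ranked by the same formula, which is the property the abstract highlights.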
Abstract:
The world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes such as entertainment, information, education and communication. The rapid growth of today's video archives, combined with sparsely available editorial data, makes their retrieval a major problem. Humans perceive a video as a complex interplay of cognitive concepts. As a result, there is a need to build a bridge between numeric values and semantic concepts, establishing a connection that facilitates the retrieval of videos by humans. The critical aspect of this bridge is video annotation, which can be done manually or automatically. Manual annotation is very tedious, subjective and expensive; therefore, automatic annotation is being actively studied. In this thesis we focus on the automatic annotation of multimedia content, namely the use of information retrieval analysis techniques to automatically extract metadata from video in a videomail system. This includes the identification of text, people, actions, spaces and objects, including animals and plants. It thus becomes possible to align multimedia content with the text of the email message and to create applications for semantic video database indexing and retrieval.
Abstract:
In the State of Amazonas, Brazil, urban expansion together with precarious basic sanitation conditions and human settlement on river banks has contributed to the persistence of waterborne and intestinal parasitic diseases. Time series of the recorded cases of cholera, typhoid fever, hepatitis A and leptospirosis are described, using data from different levels of the surveillance systems. The sources for intestinal parasitosis prevalence data (non-compulsory reporting in Brazil) were Medical Literature Analysis and Retrieval System Online (MEDLINE), Literatura Latino-Americana (LILACS) and the annals of major scientific meetings. Relevant papers and abstracts in all languages were accessed by two independent reviewers. The references cited by each relevant paper were scrutinized to locate additional papers. Despite its initial dissemination across the entire State of Amazonas, cholera was controlled in 1998. The magnitude of typhoid fever has decreased; however, a pattern characterized by occasional outbreaks still remains. Leptospirosis is an increasing cause of concern in association with the annual floods. The overall prevalence of intestinal parasites is high regardless of the municipality and the characteristics of areas and populations. The incidence of hepatitis A has decreased over the past decade. A comparison of older and recent surveys shows that the prevalence of intestinal parasitic diseases has remained constant. The load of waterborne and intestinal parasitic diseases ranks high among the health problems present in the State of Amazonas. Interventions aiming at basic sanitation and vaccination for hepatitis A were formulated and implemented, but assessment of their effectiveness in the targeted populations is still needed.
Abstract:
An integrative literature review was conducted to synthesize available publications regarding the potential use of serological tests in leprosy programs. We searched the databases Literatura Latino-Americana e do Caribe em Ciências da Saúde, Índice Bibliográfico Espanhol em Ciências da Saúde, Acervo da Biblioteca da Organização Pan-Americana da Saúde, Medical Literature Analysis and Retrieval System Online, Hanseníase, National Library of Medicine, Scopus, Ovid, Cinahl, and Web of Science for articles investigating the use of serological tests for antibodies against phenolic glycolipid-I (PGL-I), ML0405, ML2331, leprosy IDRI diagnostic-1 (LID-1), and natural disaccharide octyl-leprosy IDRI diagnostic-1 (NDO-LID). From an initial pool of 3,514 articles, 40 full-length articles fulfilled our inclusion criteria. Based on these papers, we concluded that these antibodies can be used to assist in diagnosing leprosy, detecting neuritis, monitoring therapeutic efficacy, and monitoring household contacts or at-risk populations in leprosy-endemic areas. Thus, available data suggest that serological tests could contribute substantially to leprosy management.
Abstract:
In men with prior vasectomy, microsurgical reconstruction of the reproductive tract is more cost-effective than sperm retrieval with in vitro fertilization and intracytoplasmic sperm injection if the obstructive interval is less than 15 years and no female fertility risk factors are present. If epididymal obstruction is detected or advanced female age is present, the decision to use either microsurgical reconstruction or sperm retrieval with in vitro fertilization and intracytoplasmic sperm injection should be individualized. Sperm retrieval with in vitro fertilization and intracytoplasmic sperm injection is preferred to surgical treatment when female factors requiring in vitro fertilization are present or when the chance for success with sperm retrieval and intracytoplasmic sperm injection exceeds the chance for success with surgical treatment.
Abstract:
Search is now going beyond looking for factual information, and people wish to search for the opinions of others to help them in their own decision-making. Sentiment or opinion expressions are used by users to express their opinions and embody important pieces of information, particularly in online commerce. The main problem that the present dissertation addresses is how to model text to find meaningful words that express a sentiment. In this context, I investigate the viability of automatically generating a sentiment lexicon for opinion retrieval and sentiment classification applications. For this research objective we propose to capture sentiment words that are derived from online users' reviews. In this approach, we tackle a major challenge in sentiment analysis, which is the detection of words that express subjective preference and of domain-specific sentiment words such as jargon. To this end we present a fully generative method that automatically learns a domain-specific lexicon and is fully independent of external sources. Sentiment lexicons can be applied in a broad set of applications; however, popular recommendation algorithms have remained largely disconnected from sentiment analysis. Therefore, we present a study that explores the viability of applying sentiment analysis techniques to infer ratings in a recommendation algorithm. Furthermore, an entity's reputation is intrinsically associated with sentiment words that have a positive or negative relation with that entity. Hence, we provide a study that examines the viability of using a domain-specific lexicon to compute entities' reputation. Finally, a recommendation algorithm is improved with the use of sentiment-based ratings and entities' reputation.
Abstract:
Currently, with the widespread use of social networks, companies convey their message through their communication channels, but consumers give their opinion about it. They argue, comment and criticize (Nardi, Schiano, Gumbrecht, & Swartz, 2004), positively or negatively. In this context, Text Mining emerges as an interesting approach to meet the need to obtain knowledge from existing data. In this work we used a hierarchical Clustering algorithm to discover distinct themes in a set of tweets collected over a given period of time for the companies Burger King and McDonald's. To understand the sentiment associated with these themes, a sentiment analysis was performed on each theme found, using a Bag-of-Words algorithm. We concluded that the Clustering algorithm was able to find themes in the collected tweets, essentially related to products and services sold by the companies. The Sentiment Analysis algorithm assigned a sentiment to these themes, making it possible to understand which of the identified products/services obtained a positive or negative polarity, and thereby to flag potentially problematic situations in the companies' strategy, as well as positive situations that can point to successful operational decisions.
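The bag-of-words sentiment step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the themes are taken as already produced by a clustering step (which is not shown), the tiny `POSITIVE` and `NEGATIVE` word lists and the example tweets are invented for the example, and the scoring rule (positive count minus negative count) is one simple choice, not necessarily the one used in the work.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicon (assumption for this sketch).
POSITIVE = {"good", "great", "love", "tasty"}
NEGATIVE = {"bad", "cold", "slow", "hate"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(tweets):
    """Label a group of tweets by counting lexicon words in its bag of words."""
    bag = Counter(token for tweet in tweets for token in tokenize(tweet))
    pos = sum(bag[word] for word in POSITIVE)
    neg = sum(bag[word] for word in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Invented themes and tweets, standing in for the output of the clustering step.
themes = {
    "fries": ["Love the fries, so tasty", "great fries today"],
    "service": ["Service was slow", "cold burger and bad service"],
}
labels = {theme: polarity(tweets) for theme, tweets in themes.items()}
print(labels)
```

Scoring each theme as a whole, rather than each tweet, mirrors the abstract's description of assigning one sentiment per discovered theme.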
Abstract:
telligence applications for the banking industry. Searches were performed in relevant journals, resulting in 219 articles published between 2002 and 2013. To analyze such a large number of manuscripts, text mining techniques were used in pursuit of relevant terms in both the business intelligence and banking domains. Moreover, latent Dirichlet allocation modeling was used in order to group articles into several relevant topics. The analysis was conducted using a dictionary of terms belonging to both the banking and business intelligence domains. This procedure allowed for the identification of relationships between terms and the topics grouping the articles, enabling hypotheses regarding research directions to emerge. To confirm these hypotheses, relevant articles were collected and scrutinized, which served to validate the text mining procedure. The results show that credit in banking is clearly the main application trend, particularly predicting risk and thus supporting credit approval or denial. There is also a relevant interest in bankruptcy and fraud prediction. Customer retention seems to be associated, although weakly, with targeting, justifying bank offers to reduce churn. In addition, a large number of articles focused more on business intelligence techniques and their applications, using the banking industry just for evaluation and thus not clearly claiming benefits for the banking business. By identifying these current research topics, this study also highlights opportunities for future research.
Abstract:
Childhood protection is a subject of high value for society, but Child Abuse cases are difficult to identify. The process from suspicion to accusation is very difficult to complete, since it requires very strong evidence. Typically, Health Care services deal with these cases from the beginning, when there is evidence based on the diagnosis, but it is not enough to support an accusation. Besides that, this subject is highly sensitive because there are legal aspects to deal with, such as patient privacy, paternity issues and medical confidentiality, among others. We propose a model for a Child Abuse critical knowledge monitoring system that addresses this problem. This decision support system draws on multiple scientific domains: the capture of tokens from clinical documents from multiple sources; a topic model approach to identify the topics of the documents; and knowledge management through the use of ontologies to support the critical knowledge concepts and relations, such as symptoms and behaviors, among other evidence, in order to match them with the topics inferred from the clinical documents and then alert and log when clinical evidence is present. Based on these alerts, clinical personnel can analyze the situation and take the appropriate procedures.
Abstract:
Transcriptional Regulatory Networks (TRNs) are a powerful tool for representing several interactions that occur within a cell. Recent studies have provided information to help researchers in the tasks of building and understanding these networks. One of the major sources of information to build TRNs is the biomedical literature. However, due to the rapidly increasing number of scientific papers, it is quite difficult to analyse the large number of papers that have been published on this subject. This fact has heightened the importance of Biomedical Text Mining approaches in this task. Also, owing to the lack of adequate standards, as the number of databases increases, inconsistencies concerning gene and protein names and identifiers become common. In this work, we developed an integrated approach for the reconstruction of TRNs that retrieves the relevant information from important biological databases and inserts it into a single repository, named KREN. We also applied text mining techniques over this integrated repository to build TRNs. To do so, it was necessary to create a dictionary of names and synonyms associated with these entities and to develop an approach that retrieves all the abstracts of the related scientific papers stored in PubMed, in order to create a corpus of data about genes. Furthermore, these tasks were integrated into @Note, a software system that provides methods from the Biomedical Text Mining field, including algorithms for Named Entity Recognition (NER), the extraction of all relevant terms from publication abstracts, and the extraction of relationships between biological entities (genes, proteins and transcription factors). Finally, we extended this tool to allow the reconstruction of Transcriptional Regulatory Networks using the scientific literature.
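The role of the dictionary of names and synonyms can be illustrated with a minimal dictionary-based entity tagger. This is a sketch only and is not the @Note or KREN implementation: the identifiers `GENE_A` and `GENE_B`, their synonyms, the example sentence, and the function names `build_pattern` and `tag_entities` are all hypothetical.

```python
import re

# Hypothetical dictionary: canonical identifier -> known names and synonyms.
GENE_SYNONYMS = {
    "GENE_A": ["geneA", "gnA", "alpha regulator"],
    "GENE_B": ["geneB", "gnB"],
}


def build_pattern(synonyms):
    """Compile one case-insensitive regex covering every synonym."""
    lookup = {
        name.lower(): canonical
        for canonical, names in synonyms.items()
        for name in names
    }
    # Longest names first, so multi-word synonyms win over shorter overlaps.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag_entities(text, pattern, lookup):
    """Return (mention, canonical identifier, start offset) for each match."""
    return [
        (m.group(1), lookup[m.group(1).lower()], m.start())
        for m in pattern.finditer(text)
    ]


pattern, lookup = build_pattern(GENE_SYNONYMS)
abstract = "The alpha regulator represses gnB, while geneA is autoregulated."
mentions = tag_entities(abstract, pattern, lookup)
for mention in mentions:
    print(mention)
```

Mapping every synonym to one canonical identifier is what lets mentions from different abstracts and databases be merged into the same network node, despite the naming inconsistencies the abstract describes.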
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management