968 results for text vector space model
Abstract:
In this article we present a hybrid approach to the automatic summarization of Spanish medical texts. Many automatic summarization systems use either statistics or linguistics, but only a few combine both techniques. Our idea is that a good summary requires the linguistic aspects of texts while also benefiting from the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We compared these systems and then integrated them into a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD effectively and efficiently discovers other entities closely related to the target. A measure of relation strength between entities is defined to quantify this relatedness. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
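The idea of enhancing a vector space model with co-occurrence information can be sketched as follows. This is a minimal illustration, not the LRD method itself: the abstract does not give the relation strength formula, so the sketch substitutes a plain count of documents in which two terms appear together, and the function names and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cooccurrence(docs):
    """Count, for each unordered term pair, the documents containing both."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def expand(vec, counts, weight=0.5):
    """Add terms that co-occur with the vector's terms.

    Assumption: the co-occurrence count stands in for relation strength,
    scaled by a hypothetical weight.
    """
    expanded = Counter(vec)
    for (a, b), n in counts.items():
        if a in vec and b not in vec:
            expanded[b] += weight * n
        elif b in vec and a not in vec:
            expanded[a] += weight * n
    return expanded


docs = [
    "alice leads project apollo",
    "bob joins project apollo",
    "alice mentors bob",
]
counts = cooccurrence(docs)
query = tf_vector("alice")
target = tf_vector(docs[1])
print(cosine(query, target))                  # plain vector space model
print(cosine(expand(query, counts), target))  # after co-occurrence expansion
```

In this toy corpus the query "alice" shares no term with the second document, so the plain model scores it zero, while the expanded query picks up "project", "apollo" and "bob" and yields a positive score.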
Abstract:
This class introduces the basics of web mining and information retrieval, including an introduction to the Vector Space Model and Text Mining. Guest Lecturer: Dr. Michael Granitzer. Optional: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003 (Chapter 4, Text Analysis).
Abstract:
Includes index.
Abstract:
A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst retaining sufficient ability to capture hierarchical structure. When applied to extracting protein-protein interactions, it performed better than other established statistical methods and achieved an F-score of 61.5% with balanced recall and precision values. Moreover, the statistical nature of the purely data-driven HVS model makes it intrinsically robust and easy to adapt to other domains.
Abstract:
The automatic organization of e-mail messages is a current challenge in the field of machine learning. The excessive number of messages affects more and more users, especially those who use e-mail as a communication and work tool. This thesis addresses the problem of automatically organizing e-mail messages by proposing a solution aimed at the automatic tagging of messages. Automatic tagging relies on the e-mail folders previously created by users, treating them as tags, and on suggesting multiple tags for each message (top-N). Several learning techniques are studied, and the various fields that make up an e-mail message are analysed to determine their suitability as classification elements. The focus of this work is on the textual fields (the subject and body of the messages), for which different forms of representation, feature selection and classification algorithms are studied. The participant fields are also evaluated using classification algorithms that represent them with the vector space model or as a graph. The various fields are combined for classification using the Majority Voting classifier combination technique. The tests are carried out with a subset of Enron e-mail messages and a private dataset made available by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC). These datasets are analysed in order to understand the characteristics of the data. The system is evaluated by the accuracy of the classifiers. The results obtained show significant improvements compared with related work.
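Combining per-field classifiers by majority voting (Votação por Maioria) with a top-N suggestion can be sketched as below. This is only an illustration under stated assumptions: the thesis abstract does not describe its tie-breaking rule or data structures, so the function name, the alphabetical tie-break and the example folder labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions, top_n=1):
    """Return the top_n labels ranked by how many classifiers voted for them.

    predictions: one predicted label per field classifier
    (for example subject, body and participants).
    """
    votes = Counter(predictions)
    # Sort by vote count (descending), then by label, so that ties are
    # broken deterministically. The tie-break rule is an assumption.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_n]]


# Hypothetical folder labels predicted by three field classifiers.
field_predictions = ["work", "work", "family"]
print(majority_vote(field_predictions))           # single suggested tag
print(majority_vote(field_predictions, top_n=2))  # top-2 suggested tags
```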
Abstract:
Submitted in part fulfillment of the requirements for the degree of Master in Computer Science
Abstract:
This paper introduces a State Space approach to explain the dynamics of rent growth, expected returns and the Price-Rent ratio in housing markets. According to the present value model, movements in the price-to-rent ratio should be matched by movements in expected returns and expected rent growth. The state space framework assumes that both variables follow an autoregressive process of order one. The model is applied to the US and UK housing markets, which yields series of the latent variables given the behaviour of the Price-Rent ratio. Resampling techniques and bootstrapped likelihood ratios show that expected returns tend to be highly persistent compared to rent growth. The filtered expected returns are considered in a simple model of excess-return predictability, with high statistical predictability evidenced for the UK. Overall, it is found that the present value model tends to have strong statistical predictability in the UK housing markets.
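The assumption that a latent variable follows an autoregressive process of order one, and the notion of persistence, can be illustrated with a short simulation. This is a sketch only: the parameter values, the function names and the use of sample lag-1 autocorrelation as a persistence measure are assumptions for illustration and are not taken from the paper.

```python
import random


def simulate_ar1(mean, phi, sigma, n, seed=0):
    """Simulate x[t+1] = mean + phi * (x[t] - mean) + noise, starting at mean."""
    rng = random.Random(seed)
    x = mean
    path = []
    for _ in range(n):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation, used here as a rough persistence measure."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


# Hypothetical parameters: a highly persistent series and a weakly persistent one.
persistent = simulate_ar1(mean=0.05, phi=0.95, sigma=0.01, n=5000, seed=1)
transient = simulate_ar1(mean=0.02, phi=0.10, sigma=0.01, n=5000, seed=2)
print(lag1_autocorr(persistent), lag1_autocorr(transient))
```

A coefficient `phi` close to one produces a slowly mean-reverting series, which is the sense in which one latent variable can be more persistent than another.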
Abstract:
Viruses evolve rapidly, and HIV in particular is known to be one of the fastest-evolving human viruses. It is now commonly accepted that viral evolution is the cause of the intriguing dynamics exhibited during HIV infections and of the ultimate success of the virus in its struggle with the immune system. To study viral evolution, we use a simple mathematical model of the within-host dynamics of HIV which incorporates random mutations. In this model, we assume a continuous distribution of viral strains in a one-dimensional phenotype space where random mutations are modelled by diffusion. Numerical simulations show that random mutations combined with competition result in evolution towards higher Darwinian fitness: a stable traveling wave of evolution, moving towards higher levels of fitness, is formed in the phenotype space.
Abstract:
We model the large-scale fading of wireless THz communication links deployed in a metropolitan area, taking into account reception through direct line of sight, ground or wall reflection, and diffraction. The movement of the receiver in three dimensions is modelled by an autonomous dynamic linear system in state space, whereas the geometric relations involved in the attenuation and multi-path propagation of the electric field are described by a static non-linear mapping. A subspace algorithm in conjunction with polynomial regression is used to identify a Wiener model from time-domain measurements of the field intensity.