3 results for Text analysis

in Illinois Digital Environment for Access to Learning and Scholarship Repository


Relevance:

70.00%

Publisher:

Abstract:

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and location at which a blog article is written or the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context, such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how to improve the quality of a search engine by optimizing the search results for particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Because context information significantly affects the choices of topics and language made by authors, it is important to incorporate it when analyzing and mining text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, a new paradigm of text mining that treats context information as a "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text.
This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve text mining problems in many real-world applications. We further introduce general components of contextual topic analysis: adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models that incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are shown to be effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns that generates meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques quantitatively on large-scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
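The idea of a generative model in which context shapes topic choice can be illustrated with a toy sketch. This is not the thesis's model: the two topics, the sentiment contexts, and all probabilities below are hypothetical values chosen for illustration, and the sketch assumes the simplest possible form, in which a context selects a topic distribution and each topic selects words.

```python
import random

# Hypothetical toy parameters; none of these values come from the thesis.
# p(word | topic): each topic is a multinomial distribution over words.
TOPIC_WORDS = {
    "battery": {"battery": 0.5, "charge": 0.3, "life": 0.2},
    "screen": {"screen": 0.6, "bright": 0.3, "life": 0.1},
}
# p(topic | context): the context (here, review sentiment) shapes topic coverage.
CONTEXT_TOPICS = {
    "positive": {"battery": 0.3, "screen": 0.7},
    "negative": {"battery": 0.8, "screen": 0.2},
}


def word_distribution(context):
    """Compute p(w | c) = sum over topics z of p(z | c) * p(w | z)."""
    dist = {}
    for topic, p_topic in CONTEXT_TOPICS[context].items():
        for word, p_word in TOPIC_WORDS[topic].items():
            dist[word] = dist.get(word, 0.0) + p_topic * p_word
    return dist


def generate_document(context, length, rng):
    """Sample each word by drawing a topic from the context, then a word from the topic."""
    topics = CONTEXT_TOPICS[context]
    words = []
    for _ in range(length):
        topic = rng.choices(list(topics), weights=list(topics.values()))[0]
        topic_words = TOPIC_WORDS[topic]
        word = rng.choices(list(topic_words), weights=list(topic_words.values()))[0]
        words.append(word)
    return words
```

Calling `word_distribution("negative")` and `word_distribution("positive")` side by side shows how the same topics yield different word probabilities under different contexts, which is the kind of contextual pattern the abstract describes.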

Relevance:

40.00%

Publisher:

Abstract:

Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting "unexpected" values in the OLAP cube. By indicating to the analyst which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents accompanied by a small amount of structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions using the semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle datasets of reasonable size in real time, without any need for pre-computation.
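The notion of highlighting "unexpected" cube values can be sketched as follows. The abstract does not specify how expectation is computed, so this sketch assumes a simple independence model on a two-dimensional table (expected value = row total × column total / grand total) and flags cells whose standardized residual exceeds a threshold. The function name, the threshold, and the table layout are all hypothetical.

```python
import math


def surprising_cells(table, threshold=2.0):
    """Return (row, col, residual) for cells far from their expected value.

    `table` maps row label -> {column label -> count}. The expectation
    assumes rows and columns are independent; this is an illustrative
    choice, not the method used by the system described above.
    """
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_total = {r: sum(table[r].values()) for r in rows}
    col_total = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_total.values())

    flagged = []
    for r in rows:
        for c in cols:
            expected = row_total[r] * col_total[c] / grand_total
            if expected == 0:
                continue  # no basis for a standardized residual
            residual = (table[r][c] - expected) / math.sqrt(expected)
            if abs(residual) > threshold:
                flagged.append((r, c, residual))
    return flagged
```

In the two-stage pipeline described above, a table like this would come from the first stage (documents assigned to user-specified dimensions); here it would simply be passed in as document counts per year and topic, and the flagged cells tell the analyst where to drill down.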

Relevance:

40.00%

Publisher:

Abstract:

The Czech composer Petr Eben (1927-2007) wrote music in all genres except the symphony, but he is most highly recognized for his organ and choral compositions, his preferred genres. His vocal works include choral songs and vocal-instrumental works at a wide range of difficulty levels, from simple pedagogical songs to very advanced and technically challenging compositions. This study examines two of Eben’s vocal-instrumental compositions. The oratorio Apologia Sokratus (1967) is a three-movement work; its libretto is based on Plato’s Apology of Socrates. The ballet Curses and Blessings (1983) has a libretto compiled from numerous texts from the thirteenth to the twentieth centuries. The formal design of the ballet is unusual: a three-movement composition in which the first movement is choral, the second is orchestral, and the third combines the previous two played simultaneously. Eben assembled the libretti for both compositions, and both address the contrasting sides of the human soul, evil and good, and the everlasting fight between them. This unity and contrast is the philosophical foundation for both compositions. The dissertation discusses the multileveled meanings behind the text settings and musical style of the oratorio and ballet in analyses focusing on the text, melodic and harmonic construction, and symbolism. Additional brief analyses of other vocal and vocal-instrumental compositions by Eben establish the ground for the examination of the oratorio and ballet and for understanding features of the composer’s musical style. While the oratorio Apologia Sokratus was discussed in short articles in the 1970s, the ballet Curses and Blessings has never previously been addressed within Eben scholarship. The dissertation examines the significant features of Eben’s music. His melodic style incorporates influences as diverse as Gregorian chant and folk tunes on the one hand, and modern vocal techniques such as Sprechgesang and vocal aleatoricism on the other.
His harmonic language includes bitonality and polytonality, used to augment the tonal legacy of earlier times, together with elements of pitch collections and limited serial procedures, as well as various secundal and quartal harmonic sonorities derived from them. His music features the vibrant rhythms of folk music and incorporates other folk devices such as ostinato, repetitive patterns, and improvisation.