5 results for Fuzzy Clustering
in Helda - Digital Repository of the University of Helsinki
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion in MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
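To make the exponential sum concrete, here is a minimal sketch (not code from the dissertation) that computes the NML normalizing constant for the Bernoulli model class; grouping the 2^n binary sequences by their count of ones collapses the exponential sum into n + 1 terms, which is the kind of reduction that makes NML computable in practice:

    from math import comb

    def bernoulli_nml_normalizer(n: int) -> float:
        """Normalizing constant C(n) of the NML distribution for the
        Bernoulli model class: the sum of maximized likelihoods over
        all 2**n binary sequences of length n.

        Every sequence with k ones has the same maximized likelihood
        (k/n)**k * ((n-k)/n)**(n-k), so the sum needs only n + 1 terms.
        """
        total = 0.0
        for k in range(n + 1):
            p = k / n
            # Python's convention 0**0 == 1 handles the k = 0 and k = n terms.
            likelihood = (p ** k) * ((1 - p) ** (n - k))
            total += comb(n, k) * likelihood
        return total

    # The stochastic complexity of a sample x^n is then
    # -log P(x^n | theta_hat(x^n)) + log C(n).
    print(bernoulli_nml_normalizer(100))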
Abstract:
Online content services can greatly benefit from personalisation features that enable delivery of content suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purposes of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes the user's actions on content items to learn his or her preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic, with no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques, with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to current information needs that are not optimally served by traditional techniques.
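As an illustration of the profile-segmentation step, the sketch below clusters hypothetical user preference vectors (weights over taxonomy concepts) with k-means; the concept names and data are invented for the example, and the thesis does not necessarily use this particular algorithm:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical user profiles: each row is one user's learned
    # preference weights over four taxonomy concepts.
    concepts = ["politics", "sports", "technology", "culture"]
    profiles = np.array([
        [0.8, 0.1, 0.0, 0.1],
        [0.7, 0.2, 0.1, 0.0],
        [0.0, 0.1, 0.8, 0.1],
        [0.1, 0.0, 0.7, 0.2],
        [0.1, 0.8, 0.0, 0.1],
    ])

    # Segment the users into interest groups.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)
    for user, group in enumerate(kmeans.labels_):
        print(f"user {user} -> interest group {group}")

    # The cluster centroids summarize each group's dominant concepts,
    # which is the information offered to the service provider.
    print(kmeans.cluster_centers_.round(2))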
Abstract:
This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and Engle and Russell (1998) models provide the framework for the analysis. The descriptive and parametric analyses provide evidence that an important determinant of the well-known U-shaped pattern in the market is the rate of information arrival, as measured by large trading volumes and durations at the market open and close. Specifically: (1) the larger the trading volume, the greater the impact on prices in both the short and the long run, so prices differ across quantities; (2) large trading volume is a non-linear function of price changes in the long run; (3) arrival times are positively autocorrelated, indicating a clustering pattern; and (4) information arrivals, as approximated by durations, are negatively related to trading flow.
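To illustrate the duration dynamics behind findings (3) and (4), the following sketch simulates trade durations from a basic ACD(1,1) specification in the spirit of Engle and Russell (1998); the parameter values are illustrative, not estimates from the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative ACD(1,1) parameters (not estimates from the paper);
    # alpha + beta < 1 keeps the process stationary.
    omega, alpha, beta = 0.1, 0.2, 0.7

    n = 5000
    x = np.empty(n)                    # observed durations between trades
    psi = omega / (1 - alpha - beta)   # start at the unconditional mean

    for i in range(n):
        x[i] = psi * rng.exponential(1.0)        # duration = expected duration * iid noise
        psi = omega + alpha * x[i] + beta * psi  # update conditional expected duration

    # Positive first-order autocorrelation of durations reflects
    # the clustering of information arrivals.
    lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    print(f"lag-1 autocorrelation of durations: {lag1:.3f}")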
Abstract:
Hypertexts are digital texts characterized by interactive hyperlinking and a fragmented textual organization. Increasingly prominent since the early 1990s, hypertexts have become a common text type both on the Internet and in a variety of other digital contexts. Although hypertext has been studied widely in disciplines such as hypertext theory and media studies, formal linguistic approaches to it remain relatively rare. This study examines coherence negotiation in hypertext with particular reference to hypertext fiction. Coherence, or the quality of making sense, is a fundamental property of textness. Proceeding from the premise that coherence is a subjectively evaluated property rather than an objective quality arising directly from textual cues, the study focuses on the processes through which readers interact with hyperlinks and negotiate continuity between hypertextual fragments. The study begins with a typological discussion of textuality and an overview of the historical and technological precedents of modern hypertexts. Then, making use of text linguistic, discourse analytical, pragmatic, and narratological approaches to textual coherence, the study takes established models developed for analyzing and describing conventional texts and examines their applicability to hypertext. Primary data derived from a collection of hyperfictions is used throughout to illustrate the mechanisms in practice. Hypertextual coherence negotiation is shown to require the ability to cognitively operate between local and global coherence by means of processing lexical cohesion, discourse topical continuities, inferences and implications, and shifting cognitive frames. The main conclusion of the study is that the style of reading required by hypertextuality fosters a new paradigm of coherence. Defined as fuzzy coherence, this new approach to textual sensemaking is predicated on an acceptance of the coherence challenges readers experience when the act of reading comes to involve repeated encounters with referentially imprecise hyperlinks and discourse topical shifts. A practical application of fuzzy coherence is shown to be in effect in the way coherence is actively manipulated in hypertext narratives.