5 results for constrained clustering

in Helda - Digital Repository of University of Helsinki


Relevance:

20.00%

Publisher:

Abstract:

The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
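To make the "exponential sum" concrete, here is a minimal sketch for the simplest discrete case, a Bernoulli model over binary sequences. It is an illustration only and not the dissertation's method; the function names are hypothetical. The NML normalizer sums the maximized likelihood over all 2^n sequences of length n, and for this model the same sum can be grouped by the number of ones, leaving n + 1 terms.

```python
from itertools import product
from math import comb, log2


def max_likelihood(seq):
    """Maximized Bernoulli likelihood P(seq | theta_hat(seq))."""
    n, h = len(seq), sum(seq)
    # Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1.
    return (h / n) ** h * ((n - h) / n) ** (n - h)


def normalizer_bruteforce(n):
    """Sum the maximized likelihood over all 2**n binary sequences."""
    return sum(max_likelihood(seq) for seq in product((0, 1), repeat=n))


def normalizer_by_counts(n):
    """Same sum, grouped by the number of ones h (only n + 1 terms)."""
    return sum(
        comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
        for h in range(n + 1)
    )


def stochastic_complexity(seq):
    """Code length in bits: -log2 of the NML probability of seq."""
    return -log2(max_likelihood(seq) / normalizer_by_counts(len(seq)))
```

The brute-force version becomes impractical quickly as n grows, while the grouped version stays cheap; both return the same value up to floating-point rounding.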

Relevance:

20.00%

Publisher:

Abstract:

Online content services can greatly benefit from personalisation features that enable delivery of content suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes each user's actions towards content items to learn that user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic, and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques, with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some of the current information needs that are not optimally served by traditional techniques.

Relevance:

20.00%

Publisher:

Abstract:

Real-time scheduling algorithms, such as Rate Monotonic and Earliest Deadline First, guarantee that calculations are performed within a pre-defined time. As many real-time systems operate on limited battery power, these algorithms have been enhanced with power-aware properties. In this thesis, 13 power-aware real-time scheduling algorithms for processor, device and system-level use are explored.
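As a rough illustration of the kind of guarantee these two algorithms give, the sketch below applies the classical utilization-based schedulability tests for independent periodic tasks with deadlines equal to periods on a single processor. It does not cover the power-aware variants, which the abstract does not name; the task representation and function names are assumptions made for this example.

```python
def utilization(tasks):
    """Total processor utilization of (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_sufficient(tasks):
    """Liu-Layland bound for Rate Monotonic: U <= n * (2**(1/n) - 1).

    The test is sufficient only: a task set that fails it may still be
    schedulable under Rate Monotonic.
    """
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def edf_schedulable(tasks):
    """Earliest Deadline First: schedulable exactly when U <= 1."""
    return utilization(tasks) <= 1.0


# Hypothetical task set: (execution time, period) in the same time unit.
tasks = [(1, 4), (2, 6), (3, 12)]
print(round(utilization(tasks), 4), rm_sufficient(tasks), edf_schedulable(tasks))
```

For this task set the utilization is about 0.83, which exceeds the three-task Rate Monotonic bound of about 0.78 (so that test is inconclusive) but stays below 1, so the set is schedulable under Earliest Deadline First.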

Relevance:

20.00%

Publisher:

Abstract:

This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and the Engle and Russell (1998) models provide the framework for the analysis. The descriptive and the parametric analyses provide evidence that an important determinant of the famous U-shape pattern in the market is the rate of information arrivals, as measured by large trading volumes and durations at the market open and close. Specifically: 1) the larger the trading volume, the greater the impact on prices in both the short and the long run, so prices will differ across quantities; 2) large trading volume is a non-linear function of price changes in the long run; 3) arrival times are positively autocorrelated, indicating a clustering pattern; and 4) information arrivals, as approximated by durations, are negatively related to trading flow.