5 results for Model selection in Helda - Digital Repository of University of Helsinki


Relevance: 60.00%

Abstract:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into a given number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool for learning about the structure of a given sequence. The discussion in this thesis begins with basic questions of segmentation analysis, such as choosing the number of segments and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation to certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined, and algorithms for solving them are provided and analyzed. Practical applications of segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis, segmentation applications are demonstrated by analyzing genomic sequences.
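To make "the most likely segmentation into a given number of blocks" concrete, here is a minimal sketch. It assumes squared-error cost, which corresponds to Gaussian segments with a shared variance. The standard dynamic program finds the optimal piecewise-constant k-segmentation in O(kn²) time, and a BIC-style score then picks k. The function names, the 2k parameter count and the variance floor are illustrative assumptions, not the thesis's own method.

```python
import math
from typing import List, Tuple


def segment(xs: List[float], k: int) -> Tuple[float, List[int]]:
    """Optimal piecewise-constant k-segmentation under squared error.

    Returns (total squared error, list of segment start indices).
    Dynamic programming over prefixes: O(k * n^2) time, O(k * n) memory.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(xs)")
    # Prefix sums of x and x^2 give each segment's cost in O(1).
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i: int, j: int) -> float:
        # Squared error of xs[i:j] around its own mean (clamped against rounding).
        total = s[j] - s[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    # err[q][j]: smallest error for xs[:j] split into q segments.
    err = [[math.inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    err[0][0] = 0.0
    for q in range(1, k + 1):
        for j in range(q, n + 1):
            for i in range(q - 1, j):  # the last segment is xs[i:j]
                c = err[q - 1][i] + cost(i, j)
                if c < err[q][j]:
                    err[q][j], back[q][j] = c, i
    # Follow back-pointers from the end to recover the segment starts.
    starts, j = [], n
    for q in range(k, 0, -1):
        j = back[q][j]
        starts.append(j)
    return err[k][n], starts[::-1]


def choose_k_bic(
    xs: List[float], k_max: int, var_floor: float = 1e-12
) -> Tuple[int, List[int]]:
    """Pick the number of segments with a BIC-style score (illustrative choice)."""
    n = len(xs)
    best = None
    for k in range(1, min(k_max, n) + 1):
        rss, starts = segment(xs, k)
        var = max(rss / n, var_floor)  # floor avoids log(0) on noiseless data
        # k means + (k - 1) boundaries + 1 shared variance = 2k parameters
        bic = n * math.log(var) + 2 * k * math.log(n)
        if best is None or bic < best[0]:
            best = (bic, k, starts)
    return best[1], best[2]
```

For example, `choose_k_bic([1, 1, 1, 5, 5, 5, 2, 2, 2], 5)` returns `(3, [0, 3, 6])`. On short, noisy sequences the choice of penalty matters a great deal, so other model selection criteria, such as cross-validation, can be plugged in at the same place.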

Relevance: 60.00%

Abstract:

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Because of computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms relies on asymptotic analysis, and for the first two model classes we also discuss how to compute exact rational-number solutions.
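As an illustration of what "exact rational number solutions" can look like for the multinomial case, here is a minimal sketch that is not the thesis's own algorithms. The NML normalizer C_K(n) is the sum of maximized likelihoods over all length-n sequences over K symbols. It can be computed exactly with Python's `fractions.Fraction`, using the closed form for K = 2 and a linear recurrence in K from the NML literature, C_{K+2}(n) = C_{K+1}(n) + (n/K) C_K(n). Small cases such as C_3(2) = 9/2 can be checked by brute-force enumeration. The function names are hypothetical.

```python
import math
from fractions import Fraction
from math import comb
from typing import List


def binomial_complexity(n: int) -> Fraction:
    """C_2(n): sum of maximized likelihoods over all binary sequences of length n."""
    # Python defines 0 ** 0 == 1, matching the convention (0/n)^0 = 1.
    total = sum(comb(n, h) * h**h * (n - h) ** (n - h) for h in range(n + 1))
    return Fraction(total, n**n)


def multinomial_complexity(K: int, n: int) -> Fraction:
    """Exact NML normalizer C_K(n) via C_{k+2} = C_{k+1} + (n/k) * C_k."""
    if K < 1:
        raise ValueError("K must be >= 1")
    if K == 1:
        return Fraction(1)  # only one sequence, with ML probability 1
    c_prev, c_cur = Fraction(1), binomial_complexity(n)  # C_1(n), C_2(n)
    for k in range(1, K - 1):
        c_prev, c_cur = c_cur, c_cur + Fraction(n, k) * c_prev
    return c_cur


def nml_code_length_bits(counts: List[int]) -> float:
    """Stochastic complexity -log2 P_NML of data summarized by its symbol counts."""
    n = sum(counts)
    K = len(counts)
    # -log2 of the maximized multinomial likelihood prod_k (h_k / n)^h_k
    neg_log_ml = -sum(h * math.log2(h / n) for h in counts if h > 0)
    return neg_log_ml + math.log2(float(multinomial_complexity(K, n)))
```

Exact rationals grow quickly (n^n appears in the denominator), which is one reason efficient floating-point algorithms matter for large n.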

Relevance: 60.00%

Abstract:

The first quarter of the 20th century witnessed a rebirth of cosmology, the study of our Universe, as a field of scientific research with testable theoretical predictions. The amount of available cosmological data grew slowly, from a few galaxy redshift measurements, rotation curves and local light element abundances, to the first detection of the cosmic microwave background (CMB) in 1965. By the turn of the century the amount of data had exploded, incorporating new and exciting cosmological observables such as lensing, Lyman alpha forests, type Ia supernovae, baryon acoustic oscillations and Sunyaev-Zeldovich regions, to name a few.

The CMB, the ubiquitous afterglow of the Big Bang, carries a wealth of cosmological information. Unfortunately, that information, encoded in delicate intensity variations, turned out to be hard to extract from the overall temperature. After the first detection, it took nearly 30 years before the first evidence of fluctuations in the microwave background was presented. At present, high-precision cosmology rests on precise measurements of the CMB anisotropy, which make it possible to pin down cosmological parameters at the one-percent level. This progress has made it possible to build and test models of the Universe that differ in how the cosmos evolved during a fraction of the first second after the Big Bang.

This thesis is concerned with high-precision CMB observations. It presents three selected topics along a CMB experiment analysis pipeline. Map-making and residual noise estimation are studied using an approach called destriping. The approximate methods studied are invaluable for the large datasets of any modern CMB experiment and will undoubtedly become even more so when the next generation of experiments reaches the operational stage.

We begin with a brief overview of cosmological observations and describe general relativistic perturbation theory. Next, we discuss the map-making problem of a CMB experiment and the characterization of the residual noise present in the maps. Finally, modern cosmological data are used to study an extended cosmological model, the correlated isocurvature fluctuations. Currently available data are shown to indicate that future experiments are certainly needed to provide more information on these extra degrees of freedom. Any solid evidence of isocurvature modes would have a considerable impact because of their power in model selection.
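As a rough illustration of destriping, assume the time-ordered data are modelled as sky signal, plus a constant offset (baseline) for each fixed-length chunk, plus uniform white noise. The map and the offsets can then be fitted jointly by least squares. The sketch below alternates between two steps: binning a map from offset-cleaned samples, and re-estimating each offset from the map residuals. It fixes the constant degeneracy between map and offsets by giving the offsets zero mean. The function names, the uniform noise weights and the alternating solver are assumptions for illustration. Production destripers typically solve the equivalent linear system for the offsets with a conjugate-gradient method and proper noise weighting; this loop minimizes the same unweighted objective, only more slowly.

```python
from typing import Dict, List, Sequence, Tuple


def _bin_map(
    tod: Sequence[float], pixels: Sequence[int], offsets: List[float], baseline_len: int
) -> Dict[int, float]:
    """Average the offset-subtracted samples that fall in each pixel."""
    sums: Dict[int, float] = {}
    hits: Dict[int, int] = {}
    for t, (d, p) in enumerate(zip(tod, pixels)):
        sums[p] = sums.get(p, 0.0) + d - offsets[t // baseline_len]
        hits[p] = hits.get(p, 0) + 1
    return {p: sums[p] / hits[p] for p in sums}


def destripe(
    tod: Sequence[float],
    pixels: Sequence[int],
    baseline_len: int,
    n_iter: int = 200,
) -> Tuple[Dict[int, float], List[float]]:
    """Fit tod[t] ~ sky[pixels[t]] + offsets[t // baseline_len] by least squares."""
    n_base = (len(tod) + baseline_len - 1) // baseline_len
    offsets = [0.0] * n_base
    for _ in range(n_iter):
        sky = _bin_map(tod, pixels, offsets, baseline_len)
        # Each offset becomes the mean residual (data minus map) over its chunk.
        res = [0.0] * n_base
        cnt = [0] * n_base
        for t, (d, p) in enumerate(zip(tod, pixels)):
            res[t // baseline_len] += d - sky[p]
            cnt[t // baseline_len] += 1
        offsets = [r / c for r, c in zip(res, cnt)]
        # A common constant can move freely between map and offsets;
        # pin it down by forcing the offsets to have zero mean.
        mean = sum(offsets) / n_base
        offsets = [o - mean for o in offsets]
    # Final map, binned with the converged offsets.
    return _bin_map(tod, pixels, offsets, baseline_len), offsets
```

The offsets can only be separated from the sky if pixels are revisited in different chunks (cross-linking). Without it, the alternating updates converge slowly or not at all.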

Relevance: 30.00%

Abstract:

Earth's ecological systems are currently undergoing major changes due to human activities. A growing number of studies show that these changes affect natural and sexual selection, and thereby evolutionary processes. The aim of this work was to investigate the effects of environmental change on sexual selection, using the increasing eutrophication of the spawning areas of the three-spined stickleback, Gasterosteus aculeatus, as a model system. Sexual selection is an important evolutionary force with consequences at the population and species levels (Chapter I). The different parts of the thesis focus on the effects of eutrophication on the detection of mates, on the use of visual and olfactory signals in mate choice, and on the distribution of mating success among nest-building males. Chapters II and III simulate how turbidity caused by phytoplankton affects, through its effects on visibility, the rate at which potential mates are encountered. The results show that normal algal blooms in the Baltic Sea have a moderate effect on finding potential mates. This suggests that algal blooms are unlikely to reduce selective mating through increased search costs. Chapter IV shows that sticklebacks change their relative use of different signals when water turbidity increases: visual signals decrease in importance while olfactory signals increase in importance. At the same time, the use of olfactory signals is facilitated by changes in the chemical composition of the water as photosynthesis intensifies (Chapter V). Spawning in eutrophied waters may nevertheless be costly at both the individual and the population level, because parasitized males, which are probably poorly genetically adapted to their environment, manage to obtain more eggs in their nests than healthier males, which are probably of higher genetic quality (Chapter VI). Eutrophication thus affects mate choice and competition for mates by affecting the detection of potential mates, the evaluation of mates, and the distribution of mates within the spawning areas.
The consequences this may have for the evolution of sexually selected traits and for the dynamics and viability of populations, however, remain unclear. The thesis shows how difficult it is to predict the consequences of environmental change for sexual selection and the resulting effects at the individual and population levels.

Relevance: 30.00%

Abstract:

Research on unit cohesion has shown positive correlations between cohesion and valued outcomes such as strong performance, reduced stress, less indiscipline, and high re-enlistment intentions. However, the correlations have varied in strength and significance. The purpose of this study is to show that taking into account the multi-component nature of cohesion, and relating the most applicable components to specific outcomes, could resolve much of this inconsistency. Unit cohesion is understood as a process of social integration among the members of a primary group, with its leaders, and with the larger secondary groups of which they are a part. Correspondingly, the framework includes four bonding components: horizontal (peer) and vertical (subordinate and leader) bonding within the primary group, and organizational and institutional bonding with the secondary groups. The data were collected as part of a larger research project on cohesion, leadership, and personal adjustment to the military. In all, 1,534 conscripts responded to four questionnaires during their service in 2001-2002. In addition, sociometric questionnaires were given to 537 group members in 47 squads toward the end of their service. The results showed that platoons with strong primary-group cohesion differed from other platoons in terms of performance, training quality, secondary-group experiences, and attitudes toward refresher training. At the sociometric level, soldiers who were chosen as friends by others were more likely to have higher expected performance, better performance ratings, more positive attitudes toward military service, higher levels of well-being during conscript service, and fewer exemptions from duty. At the group level, whether respondents selected their own group leader rather than naming a leader from outside (i.e., leader bonding) had a bearing not only on cohesion and performance but also on the social, attitudinal, and behavioral criteria.
Overall, the aim of the study was to contribute to research on cohesion by introducing a model that takes into account the primary foci of bonding and their impact. The results imply that primary-group and secondary-group bonding processes are equally influential in explaining individual and group performance, whereas the secondary-group bonding components are far superior in explaining career intentions, personal growth, avoidance of duty, and attitudes toward refresher training and national defense. This should be taken into account when planning and conducting training. The main conclusion is that the different types of cohesion components have a unique, positive, significant, but varying impact on a wide range of criteria, confirming the need to match the components with the specific criteria.