19 results for Statistical peak moments

in Helda - Digital Repository of University of Helsinki


Relevance:

20.00%

Abstract:

My Master's thesis deals with Estonian advertisements and their target groups. The topic is particularly interesting because very little research on Estonian advertising has been conducted at the international level. In my thesis, I wanted to find out what the advertisements I analyzed are meant to communicate, whom they are aimed at, and by what linguistic and visual means they are produced. As research material I used six advertisements of the Reval Hotels chain that appeared in Estonian and/or English during the past two years in magazines, in brochures, or in the hotels as table or wall advertisements. The theoretical framework of my study was based on three different perspectives: the study of advertising texts, semiotics, and Pierre Bourdieu's theory of symbolic capital and differences between social classes. Using a sociological perspective was justified especially by its novelty, since Bourdieu's theory has not previously been applied in research on Estonian advertisements. The analysis of the images and texts of the advertisements showed that they were structurally constructed according to conventional rules. Numerous elements of symbolic capital were found in the advertisements, such as the English language, money, time, efficiency, and Europeanness. At the same time, the advertisements were rather conservative and unsurprising. One advertisement in the material, however, stood out: it broke the formula of traditional advertising and opened up many different meanings. Among other conclusions, it was found that Reval Hotels could gain new profitable target groups by developing its marketing strategy. The thesis could later be extended, for example, into a comparative follow-up study examining the advertising of several hotels operating in Estonia.

Abstract:

In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data, a conditional likelihood approach and a full likelihood approach, for diseases with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is, they never get the disease. A statistical model for variable age at onset with long-term survivors is proposed, using a latent variable approach, for studies of familial aggregation as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation for the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work comprises five studies with different statistical models, all of them concern data obtained from nationwide registries of T1D and genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed, and non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in a population-based analysis that combines T1D registry information, a reference sample of healthy subjects and birth cohort information of the Finnish population.
Finally, substantial familial variation in susceptibility to T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling for exploring risk factors of complex diseases.
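The long-term-survivor idea can be made concrete with a standard mixture cure model, in which the population's disease-free curve levels off at the non-susceptible fraction instead of falling to zero. The sketch below is a generic illustration, not the thesis's latent variable model; the Weibull onset distribution, the function names and all parameter values are assumptions chosen only for the example.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """P(onset after age t) for a susceptible subject: exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def cure_model_survival(t: float, p_susceptible: float,
                        shape: float, scale: float) -> float:
    """Probability of being disease-free at age t in a mixture cure model.

    A fraction (1 - p_susceptible) of subjects is non-susceptible and never
    develops the disease; the rest follow the (assumed) Weibull onset law.
    """
    if not 0.0 <= p_susceptible <= 1.0:
        raise ValueError("p_susceptible must lie in [0, 1]")
    return (1.0 - p_susceptible) + p_susceptible * weibull_survival(t, shape, scale)


if __name__ == "__main__":
    # Hypothetical parameters: 30% susceptible, Weibull(shape=2, scale=15 years).
    for age in (0, 5, 15, 30, 60):
        print(age, round(cure_model_survival(age, 0.3, 2.0, 15.0), 4))
```

As age grows, the curve approaches 1 - p_susceptible (0.7 with these hypothetical values) rather than 0, which is what separates a cure model from ordinary survival analysis.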

Abstract:

Remote sensing provides methods to infer land cover information over large geographical areas at a variety of spatial and temporal resolutions. Land cover is input data for a range of environmental models, and information on land cover dynamics is required for monitoring the implications of global change. Such data are also essential in support of environmental management and policymaking. Boreal forests are a key component of the global climate and a major sink of carbon. The northern latitudes are expected to experience disproportionate and rapid warming, which can have a major impact on vegetation at forest limits. This thesis examines the use of optical remote sensing for estimating aboveground biomass, leaf area index (LAI), tree cover and tree height in the boreal forests and the tundra–taiga transition zone in Finland. Continuous fields of forest attributes are required, for example, to improve the mapping of forest extent. The thesis focuses on studying the feasibility of satellite data at multiple spatial resolutions, assesses the potential of multispectral, multiangular and multitemporal information, and provides a regional evaluation of global land cover data. Preprocessed ASTER, MISR and MODIS products are the principal satellite data. The reference data consist of field measurements, forest inventory data and fine-resolution land cover maps. Fine-resolution studies demonstrate that statistical relationships between biomass and satellite data are relatively strong in single-species, low-biomass mountain birch forests compared with higher-biomass coniferous stands. The combination of forest stand data and fine-resolution ASTER images provides a method for biomass estimation using medium-resolution MODIS data. The multiangular data improve the accuracy of land cover mapping in the sparsely forested tundra–taiga transition zone, particularly in mires.
Similarly, multitemporal data improve the accuracy of coarse-resolution tree cover estimates compared with single-date data. Furthermore, the peak of the growing season is not necessarily the optimal time for land cover mapping in the northern boreal regions. The evaluated coarse-resolution land cover data sets have considerable shortcomings in northernmost Finland and should be used with caution in similar regions. Quantitative reference data and upscaling methods for integrating multiresolution data are required for calibrating statistical models and evaluating land cover data sets. The preprocessed image products have potential for wider use, as they can considerably reduce the time and effort spent on data processing.

Abstract:

Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities and the vast number of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments, and also within populations, should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method infers genetic population structure via an unsupervised clustering framework in which the dependence between loci is represented using graphical models. The population dynamics that generate such population stratification were investigated using a stochastic model in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step toward understanding the overall influence of environmental and ecological perturbations on bacterial diversity.
A common feature of T-RFLP and FAME data is zero-inflation: zero values are observed much more frequently than would be expected, for example, under a Poisson distribution in the discrete case or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated on both synthetic and complex empirical data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods that assume independent loci. Furthermore, computer algorithms that are efficient for large-scale data were adopted to meet increasing computational needs. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, is an initial step toward discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
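The discrete case of zero-inflation can be illustrated with the zero-inflated Poisson (ZIP) distribution, which mixes a point mass at zero with an ordinary Poisson count. This is a textbook sketch for intuition only; the two strategies used in the thesis's clustering framework are not reproduced here, and the function names and parameter values are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) draw (which can itself be zero)."""
    count_part = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + count_part if k == 0 else count_part


if __name__ == "__main__":
    lam, pi = 3.0, 0.4  # hypothetical values
    print("P(0) Poisson:", round(poisson_pmf(0, lam), 4))
    print("P(0) ZIP:    ", round(zip_pmf(0, lam, pi), 4))
```

The extra mass pi at zero is exactly the "more zeros than a Poisson model predicts" pattern described above; setting pi = 0 recovers the plain Poisson distribution.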

Abstract:

In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
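One well-known instance of the relationship between Bayesian network classifiers and logistic regression is the naive Bayes classifier with binary features: its posterior log-odds are linear in the features, so it can be rewritten exactly as a logistic regression model. The sketch below checks this numerically. It illustrates the general correspondence only and is not the specific analysis of the thesis; the function names and parameter values are arbitrary.

```python
import math
from itertools import product


def nb_posterior(x, prior1, theta1, theta0):
    """P(y=1 | x) from Bayes' rule for naive Bayes with binary features,
    where theta_c[j] = P(x_j = 1 | y = c)."""
    def likelihood(theta):
        p = 1.0
        for xj, t in zip(x, theta):
            p *= t if xj else 1.0 - t
        return p

    joint1 = prior1 * likelihood(theta1)
    joint0 = (1.0 - prior1) * likelihood(theta0)
    return joint1 / (joint1 + joint0)


def nb_to_logistic(prior1, theta1, theta0):
    """Weights w and bias b such that P(y=1 | x) = sigmoid(b + w . x)."""
    w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1)))
         for t1, t0 in zip(theta1, theta0)]
    b = math.log(prior1 / (1 - prior1)) + sum(
        math.log((1 - t1) / (1 - t0)) for t1, t0 in zip(theta1, theta0))
    return w, b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    prior1, theta1, theta0 = 0.3, [0.8, 0.6, 0.2], [0.3, 0.5, 0.4]
    w, b = nb_to_logistic(prior1, theta1, theta0)
    for x in product([0, 1], repeat=3):
        direct = nb_posterior(x, prior1, theta1, theta0)
        logistic = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
        print(x, round(direct, 6), round(logistic, 6))
```

Both models share the same functional form of P(y | x); they differ in how the parameters are estimated (joint likelihood for naive Bayes, conditional likelihood for logistic regression).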

Abstract:

Population dynamics are generally viewed as the result of intrinsic (purely density-dependent) and extrinsic (environmental) processes. Both components, and potential interactions between the two, have to be modelled in order to understand and predict the dynamics of natural populations, a topic of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and on how the effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density-dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II–IV constitute empirical studies that statistically model environmental effects on the population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine, another important food resource for the species, do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric, spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis) and points out effects of rainfall and habitat quality on population growth.
Because the skylark time series and some of the included environmental variables show strong positive autocorrelation, statistical significance is assessed using a Monte Carlo method in which random autocorrelated time series are generated. Chapter V is a simulation-based study showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty if the environmental variables are autocorrelated. It is concluded that state-space models are an effective way to obtain more accurate results. In summary, several biological assumptions and methodological issues can affect the inferential outcome when estimating environmental effects from time series data, and they therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density must be addressed. Other issues that should be considered are assumptions about density-dependent regulation, modelling of potential observation error and, when needed, accounting for spatial and/or temporal autocorrelation.
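The Monte Carlo idea for autocorrelated series can be sketched as a surrogate-data test: the observed correlation between population growth and an environmental variable is compared with correlations obtained from many random series that mimic the variable's autocorrelation. This is one simple variant, assuming an AR(1) surrogate model and a correlation statistic; the exact procedure used in the thesis may differ, and all names and parameter values are illustrative.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ar1_series(n, phi, rng):
    """Stationary AR(1) series z_t = phi * z_{t-1} + e_t with e_t ~ N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi ** 2)]
    for _ in range(n - 1):
        z.append(phi * z[-1] + rng.gauss(0.0, 1.0))
    return z


def monte_carlo_p_value(response, covariate, n_sim=1000, seed=1):
    """Two-sided p-value for corr(response, covariate), using null surrogates
    that share the covariate's lag-1 autocorrelation but are independent of
    the response."""
    rng = random.Random(seed)
    r_obs = abs(pearson(response, covariate))
    # Clamp the AR(1) coefficient so the surrogate process stays stationary.
    phi = max(-0.99, min(0.99, pearson(covariate[:-1], covariate[1:])))
    hits = sum(
        abs(pearson(response, ar1_series(len(covariate), phi, rng))) >= r_obs
        for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    rain = ar1_series(30, 0.7, rng)                          # hypothetical covariate
    growth = [0.5 * r + rng.gauss(0.0, 1.0) for r in rain]   # hypothetical response
    print(round(monte_carlo_p_value(growth, rain), 3))
```

A standard t-test for a correlation tends to overstate significance when both series are positively autocorrelated; comparing against autocorrelated surrogates is one way to avoid that.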

Abstract:

An efficient and statistically robust solution for the identification of asteroids among numerous sets of astrometry is presented. In particular, numerical methods have been developed for the short-term identification of asteroids at discovery, and for the long-term identification of scarcely observed asteroids across apparitions, a task that has lacked a robust method until now. The methods are based on the solid foundation of statistical orbital inversion, properly taking into account the observational uncertainties, which allows for the detection of practically all correct identifications. Through the use of dimensionality-reduction techniques and efficient data structures, the exact methods have loglinear, that is, O(n log n), computational complexity, where n is the number of included observation sets. The methods developed are thus suitable for future large-scale surveys, which anticipate a substantial increase in the astrometric data rate. Due to the discontinuous nature of asteroid astrometry, separate sets of astrometry must be linked to a common asteroid from the very first discovery detections onwards. The reason for the discontinuity in the observed positions is the rotation of the observer with the Earth as well as the motion of the asteroid and the observer about the Sun. Therefore, the aim of identification is to find a set of orbital elements that reproduces the observed positions with residuals similar to the inevitable observational uncertainty. Unless the astrometric observation sets are linked, the corresponding asteroid is eventually lost as the uncertainty of the predicted positions grows too large to allow successful follow-up.
Whereas the presented identification theory and the numerical comparison algorithm are generally applicable, that is, also in fields other than astronomy (e.g., in the identification of space debris), the numerical methods developed for asteroid identification can immediately be applied to all objects on heliocentric orbits with negligible effects due to non-gravitational forces in the time frame of the analysis. The methods developed have been successfully applied to various identification problems. Simulations have shown that the methods developed are able to find virtually all correct linkages despite challenges such as numerous scarce observation sets, astrometric uncertainty, numerous objects confined to a limited region on the celestial sphere, long linking intervals, and substantial parallaxes. Tens of previously unknown main-belt asteroids have been identified with the short-term method in a preliminary study to locate asteroids among numerous unidentified sets of single-night astrometry of moving objects, and scarce astrometry obtained nearly simultaneously with Earth-based and space-based telescopes has been successfully linked despite a substantial parallax. Using the long-term method, thousands of realistic 3-linkages typically spanning several apparitions have so far been found among designated observation sets each spanning less than 48 hours.

Abstract:

The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information or to learn the underlying concept. An important subfield of machine learning is multi-view learning, where the task is to learn from multiple data sets, or views, describing the same underlying concept. A typical example of such a scenario would be studying a biological concept using several biological measurements, such as gene expression, protein expression and metabolic profiles, or classifying web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach for exploratory data analysis, a new measure for evaluating different kinds of representations of textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples across the different views or data sets is not known in advance. To infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications, such as matching metabolites between humans and mice and matching sentences between documents in two languages.

Abstract:

One of the most fundamental and widely accepted ideas in finance is that investors are compensated through higher returns for taking on non-diversifiable risk. Hence the quantification, modeling and prediction of risk have been, and still are, among the most prolific research areas in financial economics. It was recognized early on that there are predictable patterns in the variance of speculative prices. Later research has shown that there may also be systematic variation in the skewness and kurtosis of financial returns. What the literature has lacked so far is an out-of-sample forecast evaluation of the potential benefits of these newer, more complicated models with time-varying higher moments. Such an evaluation is the topic of this dissertation. Essay 1 investigates the forecast performance of the GARCH(1,1) model when estimated with nine different error distributions on Standard and Poor’s 500 index futures returns. Using the theory of realized variance to construct an appropriate ex post measure of variance from intraday data, it is shown that allowing for a leptokurtic error distribution leads to significant improvements in variance forecasts compared with using the normal distribution. This result holds for daily, weekly and monthly forecast horizons. It is also found that allowing for skewness and time variation in the higher moments of the distribution does not further improve forecasts. In Essay 2, using 20 years of daily Standard and Poor’s 500 index returns, it is found that density forecasts are much improved by allowing for constant excess kurtosis but not by allowing for skewness. Allowing the kurtosis and skewness to be time-varying does not further improve the density forecasts but, on the contrary, makes them slightly worse. In Essay 3, a new model incorporating conditional variance, skewness and kurtosis based on the Normal Inverse Gaussian (NIG) distribution is proposed.
The new model and two previously used NIG models are evaluated by their Value at Risk (VaR) forecasts on a long series of daily Standard and Poor’s 500 returns. The results show that only the new model produces satisfactory VaR forecasts at both the 1% and 5% levels. Taken together, the results of the thesis show that kurtosis does not appear to exhibit predictable time variation, whereas some predictability is found in the skewness. However, the dynamic properties of the skewness are not completely captured by any of the models.
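The GARCH(1,1) variance recursion and a simple VaR exceedance check can be sketched in a few lines. This sketch assumes zero-mean returns, fixed (not estimated) parameters and a normal error distribution purely for simplicity; Essay 1 finds that leptokurtic error distributions improve variance forecasts, and the NIG-based models of Essay 3 are not shown. Function names and parameter values are illustrative.

```python
import math
from statistics import NormalDist


def garch11_variances(returns, omega, alpha, beta):
    """One-step-ahead conditional variances for a zero-mean GARCH(1,1):
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].

    sigma2[t] is the forecast for returns[t]; the extra final element is
    the forecast for the next, not yet observed, return.
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def normal_var(sigma2, level):
    """VaR at the given tail level (e.g. 0.01), reported as a positive loss."""
    z = NormalDist().inv_cdf(level)  # negative for level < 0.5
    return [-z * math.sqrt(s) for s in sigma2]


def exceedance_rate(returns, var_forecasts):
    """Share of days on which the realized loss exceeded the VaR forecast."""
    hits = sum(1 for r, v in zip(returns, var_forecasts) if -r > v)
    return hits / len(returns)


if __name__ == "__main__":
    rets = [0.5, -1.2, 0.3, -2.8, 0.9, -0.4]  # hypothetical daily % returns
    s2 = garch11_variances(rets, omega=0.05, alpha=0.08, beta=0.9)
    for level in (0.01, 0.05):
        print(level, exceedance_rate(rets, normal_var(s2, level)))
```

Over a long sample, a well-calibrated 1% VaR model should be exceeded on roughly 1% of days; formal backtests compare the observed exceedance count with that nominal rate.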