10 results for High-dimensional data visualization
in Helda - Digital Repository of University of Helsinki
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
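To make the "exponential sum" concrete, the following is a minimal sketch for the simplest discrete case, the Bernoulli model class. It is not the dissertation's own algorithm; the function names are hypothetical. The normalizer sums the maximized likelihood over all binary samples of length n; grouping the 2^n sequences by their number of ones reduces this to n + 1 terms.

```python
from math import comb, log2

def binomial_nml_normalizer(n):
    """Sum of maximized likelihoods over all binary samples of length n.

    Sequences with the same number of ones k share the same maximized
    likelihood, so the 2**n terms collapse into n + 1 terms.
    """
    total = 0.0
    for k in range(n + 1):
        p = k / n if n > 0 else 0.0
        # Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1.
        total += comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    return total

def stochastic_complexity_bits(k, n):
    """NML code length in bits for a binary sample with k ones out of n (n >= 1)."""
    p = k / n
    max_likelihood = (p ** k) * ((1 - p) ** (n - k))
    return -log2(max_likelihood) + log2(binomial_nml_normalizer(n))
```

For richer model families the direct sum grows quickly with the sample size and the number of outcomes, which is the computational obstacle the abstract describes.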
Abstract:
Volatility is central in options pricing and risk management. It reflects the uncertainty of investors and the inherent instability of the economy. Time series methods are among the most widely applied scientific methods to analyze and predict volatility. Very frequently sampled data contain much valuable information about the different elements of volatility and may ultimately reveal the reasons for time-varying volatility. The use of such ultra-high-frequency data is common to all three essays of the dissertation. The dissertation belongs to the field of financial econometrics. The first essay uses wavelet methods to study the time-varying behavior of scaling laws and long memory in the five-minute volatility series of Nokia on the Helsinki Stock Exchange around the burst of the IT bubble. The essay is motivated by earlier findings which suggest that different scaling laws may apply to intraday time scales and to larger time scales, implying that the so-called annualized volatility depends on the data sampling frequency. The empirical results confirm the appearance of time-varying long memory and different scaling laws that, for a significant part, can be attributed to investor irrationality and to an intraday volatility periodicity called the New York effect. The findings have potentially important consequences for options pricing and risk management, which commonly assume constant memory and scaling. The second essay investigates modelling the duration between trades in stock markets. Durations convey information about investor intentions and provide an alternative view of volatility. Generalizations of standard autoregressive conditional duration (ACD) models are developed to meet needs observed in previous applications of the standard models. 
According to the empirical results based on data of actively traded stocks on the New York Stock Exchange and the Helsinki Stock Exchange, the proposed generalization clearly outperforms the standard models and also performs well in comparison to another recently proposed alternative to the standard models. The distribution used to derive the generalization may also prove valuable in other areas of risk management. The third essay studies empirically the effect of decimalization on volatility and market microstructure noise. Decimalization refers to the change from fractional pricing to decimal pricing, which was carried out on the New York Stock Exchange in January 2001. The methods used here are more accurate than those in the earlier studies and put more weight on market microstructure. The main result is that decimalization decreased observed volatility by reducing noise variance, especially for the highly active stocks. The results help risk management and market mechanism design.
Abstract:
Dispersal is a highly important life history trait. In fragmented landscapes the long-term persistence of populations depends on dispersal. Evolution of dispersal is affected by costs and benefits and these may differ between different landscapes. This results in differences in the strength and direction of natural selection on dispersal in fragmented landscapes. Dispersal has been shown to be a nonrandom process that is associated with traits such as flight ability in insects. This thesis examines genetic and physiological traits affecting dispersal in the Glanville fritillary butterfly (Melitaea cinxia). Flight metabolic rate is a repeatable trait representing flight ability. Unlike in many vertebrates, resting metabolic rate cannot be used as a surrogate of maximum metabolic rate as no strong correlation between the two was found in the Glanville fritillary. Resting and flight metabolic rate are affected by environmental variables, most notably temperature. However, only flight metabolic rate has a strong genetic component. Molecular variation in the much-studied candidate locus phosphoglucose isomerase (Pgi), which encodes the glycolytic enzyme PGI, has an effect on carbohydrate metabolism in flight. This effect is temperature dependent: in low to moderate temperatures individuals with the heterozygous genotype at the single nucleotide polymorphism (SNP) AA111 have higher flight metabolic rate than the common homozygous genotype. At high temperatures the situation is reversed. This finding suggests that variation in enzyme properties is indeed translated to organismal performance. High-resolution data on individual female Glanville fritillaries moving freely in the field were recorded using harmonic radar. There was a strong positive correlation between flight metabolic rate and dispersal rate. Flight metabolic rate explained one third of the observed variation in the one-hour movement distance. 
A fine-scaled analysis of mobility showed that mobility peaked at intermediate ambient temperatures but the two common Pgi genotypes differed in their reaction norms to temperature. As with flight metabolic rate, heterozygotes at SNP AA111 were the most active genotype in low to moderate temperatures. The results show that molecular variation is associated with variation in dispersal rate through the link of flight physiology under the influence of environmental conditions. The evolutionary pressures for dispersal differ between males and females. The effect of flight metabolic rate on dispersal was examined in both sexes in field and laboratory conditions. The relationships between flight metabolic rate and dispersal rate in the field, and between flight metabolic rate and flight duration in the laboratory, were found to differ between the two sexes. In females the relationship was positive, but in males the longest distances and flight durations were recorded for individuals with low flight metabolic rate. These findings may reflect male investment in mate locating. Instead of dispersing, males with high flight metabolic rate may establish territories and follow a perching strategy when locating females and hence move less on the landscape level. Males with low metabolic rate may be forced to disperse due to low competitive success or may show adaptations to an alternative strategy: patrolling. In the light of life history trade-offs and the rate of living theory, having a high metabolic rate may carry a cost in the form of shortened lifespan. Experiments relating flight metabolic rate to longevity showed a clear correlation in the opposite direction: high flight metabolic rate was associated with long lifespan. This suggests that individuals with high metabolic rate do not pay an extra physiological cost for their high flight capacity; rather, there are positive correlations between different measures of fitness. These results highlight the importance of condition.
Abstract:
Throughout the history of Linnean taxonomy, species have been described with varying degrees of justification. Many descriptions have been based on only a few ambiguous morphological characters. Moreover, species have been considered natural, well-defined units whereas higher taxa have been treated as disparate, non-existent creations. In the present thesis a few such cases were studied in detail. Often the species-level descriptions were based on only a few specimens and the variation previously thought to be interspecific was found to be intraspecific. In some cases morphological characters were sufficient to resolve the evolutionary relationships between the taxa, but generally more resolution was gained by the addition of molecular evidence. However, both morphological and molecular data were found to be deceptive in some cases. The DNA sequences of morphologically similar specimens were found to differ distinctly in some cases, whereas in other closely related species the morphology of specimens with identical DNA sequences differed substantially. This study counsels caution when evolutionary relationships are being studied utilizing only one source of evidence or a very limited number of characters (e.g. barcoding). Moreover, it emphasizes the importance of high quality data as well as the utilization of proper methods when making scientific inferences. Properly conducted analyses produce robust results that can be utilized in numerous interesting ways. The present thesis considered two such extensions of systematics. A novel hypothesis on the origin of bioluminescence in Elateriformia beetles is presented, tying it to the development of the clicking mechanism in the ancestors of these animals. An entirely different type of extension of systematics is the proposed high value of the white sand forests in maintaining the diversity of beetles in the Peruvian Amazon. White sand forests are under growing pressure from human activities that lead to deforestation. 
They were found to harbor an extremely diverse beetle fauna and many taxa were specialists living only in this unique habitat. In comparison to the predominant clay soil forests, considerably more elateroid beetles belonging to all studied taxonomic levels (species, genus, tribus, and subfamily) were collected in white sand forests. This evolutionary diversity is hypothesized to be due to a combination of factors: (1) the forest structure, which favors the fungus-plant interactions important for the elateroid beetles, (2) the old age of the forest type favoring survival of many evolutionary lineages and (3) the widespread distribution and fragmentation of the forests in the Miocene, favoring speciation.
Abstract:
The magnetic field of the Earth is 99 % of internal origin and is generated in the liquid outer core by the dynamo principle. In the 19th century, Carl Friedrich Gauss proved that the field can be described by a sum of spherical harmonic terms. Presently, this theory is the basis of e.g. IGRF models (International Geomagnetic Reference Field), which are the most accurate description available for the geomagnetic field. On average, the dipole forms 3/4 and the non-dipolar terms 1/4 of the instantaneous field, but the temporal mean of the field is assumed to be a pure geocentric axial dipolar field. The validity of this GAD (Geocentric Axial Dipole) hypothesis has been estimated by using several methods. In this work, the testing rests on the frequency dependence of inclination with respect to latitude. Each combination of dipole (GAD), quadrupole (G2) and octupole (G3) produces a distinct inclination distribution. These theoretical distributions have been compared with those calculated from empirical observations from different continents, and finally from the entire globe. Only data from Precambrian rocks (over 542 million years old) have been used in this work. The basic assumption is that during the long-term course of drifting continents, the globe is sampled adequately. There were 2823 observations altogether in the paleomagnetic database of the University of Helsinki. The effect of the quality of observations, as well as the age and rock type, has been tested. For comparison between theoretical and empirical distributions, chi-square testing has been applied. In addition, spatiotemporal binning has been used effectively to remove the errors caused by multiple observations. The modelling from igneous rock data indicates that the average magnetic field of the Earth is best described by a combination of a geocentric dipole and a very weak octupole (less than 10 % of GAD). 
Filtering and binning gave the distributions a more GAD-like appearance, but deviation from GAD increased as a function of the age of the rocks. The distribution calculated from so-called keypoles, the most reliable determinations, behaves almost like GAD, having a zero quadrupole and an octupole of 1 % of GAD. In no earlier study have past-400-Ma rocks given a result so close to GAD, but low inclinations have been prominent especially in the sedimentary data. Despite these results, a greater amount of high-quality data and a proof of the long-term randomness of the Earth's continental motions are needed to make sure the dipole model holds true.
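The inclination test described above can be illustrated for the pure GAD case. The sketch below is illustrative only and is not the thesis code: it uses the standard dipole equation tan(I) = 2 tan(latitude), assumes sites are sampled uniformly over the sphere (the abstract's "basic assumption"), and omits the G2 and G3 terms. Function names and the 10-degree bin width are hypothetical choices.

```python
import math

def gad_inclination_deg(latitude_deg):
    """Inclination (degrees) predicted by the dipole equation tan(I) = 2 tan(latitude)."""
    return math.degrees(math.atan(2.0 * math.tan(math.radians(latitude_deg))))

def gad_latitude_deg(inclination_deg):
    """Inverse of the dipole equation: latitude (degrees) for a given inclination."""
    return math.degrees(math.atan(0.5 * math.tan(math.radians(inclination_deg))))

def gad_bin_fractions(bin_width_deg=10):
    """Expected fraction of |inclination| values per bin under GAD.

    Assumes uniform sampling of the sphere; bin_width_deg should divide 90.
    """
    edges = list(range(0, 90 + bin_width_deg, bin_width_deg))
    fractions = []
    for lo, hi in zip(edges, edges[1:]):
        lat_lo = math.radians(gad_latitude_deg(lo))
        lat_hi = math.radians(gad_latitude_deg(hi))
        # Area fraction of the sphere with |latitude| in [lat_lo, lat_hi].
        fractions.append(math.sin(lat_hi) - math.sin(lat_lo))
    return fractions

def chi_square(observed_counts, expected_fractions):
    """Pearson chi-square statistic between observed bin counts and expected fractions."""
    total = sum(observed_counts)
    return sum((o - total * f) ** 2 / (total * f)
               for o, f in zip(observed_counts, expected_fractions))
```

A large chi-square value for binned observed inclinations would indicate a departure from the pure dipole distribution; adding quadrupole or octupole contributions would change the expected fractions.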
Abstract:
The increased availability of high frequency data sets has led to important new insights in the understanding of financial markets. The use of high frequency data is interesting and persuasive, since it can reveal new information that cannot be seen at lower levels of data aggregation. This dissertation explores some of the many important issues connected with the use, analysis and application of high frequency data. These include the effects of intraday seasonality, the behaviour of time-varying volatility, the information content of various market data, and the issue of inter-market linkages, utilizing high frequency 5-minute observations from major European and U.S. stock indices, namely DAX30 of Germany, CAC40 of France, SMI of Switzerland, FTSE100 of the UK and SP500 of the U.S. The first essay in the dissertation shows that there are remarkable similarities in the intraday behaviour of conditional volatility across European equity markets. Moreover, U.S. macroeconomic news announcements have a significant cross-border effect on both European equity returns and volatilities. The second essay reports substantial intraday return and volatility linkages across the stock indices of the UK and Germany. This relationship appears virtually unchanged by the presence or absence of the U.S. stock market. However, the return correlation between the UK and German markets rises significantly following the U.S. stock market opening, which could largely be described as a contemporaneous effect. The third essay sheds light on market microstructure issues in which traders and market makers learn from watching market data, and it is this learning process that leads to price adjustments. This study concludes that trading volume plays an important role in explaining international return and volatility transmissions. The examination concerning asymmetry reveals that the impact of positive volume changes on foreign stock market volatility is larger than that of negative changes. 
The fourth and final essay documents a number of regularities in the pattern of intraday return volatility, trading volume and bid-ask spreads. This study also reports a contemporaneous and positive relationship between intraday return volatility, the bid-ask spread and unexpected trading volume. These results verify the role of trading volume and bid-ask quotes as proxies for information arrival in producing contemporaneous and subsequent intraday return volatility. Moreover, an asymmetric effect of trading volume on conditional volatility is also confirmed. Overall, this dissertation explores the role of information in explaining the intraday return and volatility dynamics in international stock markets. The process through which information is incorporated in stock prices is central to all information-based models. Intraday data facilitate the investigation of how information gets incorporated into security prices as a result of the trading behavior of informed and uninformed traders. Thus high frequency data appear critical in enhancing our understanding of the intraday behavior of various stock market variables, which has important implications for market participants, regulators and academic researchers.
Abstract:
In this thesis we deal with the concept of risk. The objective is to bring together and draw conclusions from some normative information regarding quantitative portfolio management and risk assessment. The first essay concentrates on return dependency. We propose an algorithm for classifying markets into rising and falling. Given the algorithm, we derive a statistic, the Trend Switch Probability, for detection of long-term return dependency in the first moment. The empirical results suggest that the Trend Switch Probability is robust over various volatility specifications. The serial dependency in bear and bull markets, however, behaves differently. It is strongly positive in rising markets, whereas in bear markets returns are closer to a random walk. Realized volatility, a technique for estimating volatility from high frequency data, is investigated in essays two and three. In the second essay we find, when measuring realized variance on a set of German stocks, that the second moment dependency structure is highly unstable and changes randomly. Results also suggest that volatility is non-stationary from time to time. In the third essay we examine the impact of market microstructure on the error between estimated realized volatility and the volatility of the underlying process. With simulation-based techniques we show that autocorrelation in returns leads to biased variance estimates and that lower sampling frequency and non-constant volatility increase the error variation between the estimated variance and the variance of the underlying process. From these essays we can conclude that volatility is not easily estimated, even from high frequency data. It is well behaved neither in terms of stability nor in terms of dependency over time. Based on these observations, we would recommend the use of simple, transparent methods that are likely to be more robust over differing volatility regimes than models with a complex parameter universe. 
In analyzing long-term return dependency in the first moment we find that the Trend Switch Probability is a robust estimator. This is an interesting area for further research, with important implications for active asset allocation.
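As an illustration of the realized volatility technique mentioned above, the following minimal sketch uses the standard definition of realized variance as the sum of squared intraday log returns. It is not the estimator specification used in the thesis, and the function names are hypothetical. The `subsample` helper only mimics a lower sampling frequency by keeping every `step`-th price.

```python
import math

def realized_variance(prices):
    """Sum of squared log returns between consecutive intraday prices."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)

def realized_volatility(prices):
    """Square root of the realized variance."""
    return math.sqrt(realized_variance(prices))

def subsample(prices, step):
    """Keep every step-th price, e.g. step=3 turns 5-minute data into 15-minute data."""
    return prices[::step]
```

Comparing `realized_variance(prices)` with `realized_variance(subsample(prices, 3))` on the same trading day gives a simple feel for how the estimate depends on the sampling frequency, the effect examined in the third essay.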
Abstract:
Using a data set consisting of three years of 5-minute intraday stock index returns for major European stock indices and U.S. macroeconomic surprises, the conditional mean and volatility behaviors in European markets were investigated. The findings suggested that the opening of the U.S. market significantly raised the level of volatility in Europe, and that all markets responded in an identical fashion. Furthermore, the U.S. macroeconomic surprises exerted an immediate and major impact on both the returns and the volatilities of European stock markets. Thus, high frequency data appear to be critical for the identification of news that impacted the markets.
Abstract:
The first-line medication for mild to moderate Alzheimer's disease (AD) is based on cholinesterase inhibitors, which prolong the effect of the neurotransmitter acetylcholine in cholinergic nerve synapses and thereby relieve the symptoms of the disease. Implications of cholinesterase involvement in disease-modifying processes have increased interest in this research area. Drug discovery and development is a long and expensive process that takes on average 13.5 years and costs approximately 0.9 billion US dollars. Drug attrition in the clinical phases is common for several reasons, e.g., poor bioavailability of compounds leading to low efficacy or toxic effects. Thus, improvements in the early drug discovery process are needed to create highly potent non-toxic compounds with predicted drug-like properties. Nature has been a good source for the discovery of new medicines, accounting for around half of the new drugs approved for the market during the last three decades. These compounds are direct isolates from nature, their synthetic derivatives or natural mimics. Synthetic chemistry is an alternative way to produce compounds for drug discovery purposes. Both sources have pros and cons. The screening of new bioactive compounds in vitro is based on assaying compound libraries against targets. The assay set-up has to be adapted and validated for each screen to produce high quality data. Depending on the size of the library, miniaturization and automation are often required to reduce solvent and compound amounts and to speed up the process. In this contribution, natural extract, natural pure compound and synthetic compound libraries were assessed as sources for new bioactive compounds. The libraries were screened primarily for acetylcholinesterase inhibitory effect and secondarily for butyrylcholinesterase inhibitory effect. 
To be able to screen the libraries, two assays were evaluated as screening tools and adapted to be compatible with the special features of each library. The assays were validated to create high quality data. Cholinesterase inhibitors with various potencies and selectivities were found in the natural product and synthetic compound libraries, which indicates that the two sources complement each other. It is acknowledged that natural compounds differ structurally from compounds in synthetic compound libraries, which further supports the view of complementation, especially if a high diversity of structures is the criterion for the selection of compounds in a library.