25 results for Knowledge discovery in databases


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this thesis the use of the Bayesian approach to statistical inference in fisheries stock assessment is studied. The work was conducted in collaboration with the Finnish Game and Fisheries Research Institute, using the problem of monitoring and predicting the juvenile salmon population in the River Tornionjoki as an example application. The River Tornionjoki is the largest salmon river flowing into the Baltic Sea. This thesis tackles the issues of model formulation and model checking as well as computational problems related to Bayesian modelling in the context of fisheries stock assessment. Each article of the thesis provides a novel method either for extracting information from data obtained via a particular type of sampling system or for integrating the information about the fish stock from multiple sources in terms of a population dynamics model. Mark-recapture and removal sampling schemes and a random catch sampling method are covered for the estimation of the population size. In addition, a method for estimating the stock composition of a salmon catch based on DNA samples is presented. For most of the articles, Markov chain Monte Carlo (MCMC) simulation has been used as a tool to approximate the posterior distribution. Problems arising from the sampling method are also briefly discussed and potential solutions for these problems are proposed. Special emphasis in the discussion is given to the philosophical foundation of the Bayesian approach in the context of fisheries stock assessment. It is argued that the role of subjective prior knowledge needed in practically all parts of a Bayesian model should be recognized and consequently fully utilised in the process of model formulation.
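
As a minimal illustration of Bayesian population-size estimation from mark-recapture data, the following Python sketch computes a posterior for the population size N on a grid. The hypergeometric likelihood, the uniform prior, the function name and all numbers are illustrative assumptions, not the models or data of the thesis, which uses MCMC for more complex models.

```python
from math import comb

def posterior_population_size(marked, caught, recaptured, n_max):
    """Grid posterior for population size N under a uniform prior on
    [n_min, n_max] and a hypergeometric recapture likelihood (assumed model)."""
    n_min = marked + caught - recaptured  # smallest N consistent with the data
    weights = {}
    for n in range(n_min, n_max + 1):
        # P(recaptured | N): `caught` fish drawn from N, of which `marked` are tagged
        weights[n] = (comb(marked, recaptured)
                      * comb(n - marked, caught - recaptured)
                      / comb(n, caught))
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

# Hypothetical numbers: 100 tagged fish, later catch of 80 with 20 tagged.
post = posterior_population_size(marked=100, caught=80, recaptured=20, n_max=2000)
posterior_mean = sum(n * p for n, p in post.items())
posterior_mode = max(post, key=post.get)
print(posterior_mode, round(posterior_mean))
```

The posterior is right-skewed, so its mean lies above its mode; a point estimate alone would hide that asymmetry.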

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, for example DNA sequences and amino acid sequences of proteins. The scale and quality of the data promise to answer various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main concern is modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. 
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
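
To illustrate why integrating out the parameters analytically helps, the sketch below scores a partition of aligned categorical sequences by a closed-form marginal likelihood. It assumes a multinomial model with a symmetric Dirichlet prior for each cluster and site. This is a common conjugate choice used here for illustration, not necessarily the exact model of the thesis, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_marginal_cluster(column, alphabet, alpha=1.0):
    """Log marginal likelihood of one site within one cluster, with the
    multinomial parameters integrated out under a symmetric Dirichlet(alpha)."""
    counts = Counter(column)
    n, k = len(column), len(alphabet)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for symbol in alphabet:
        result += lgamma(alpha + counts[symbol]) - lgamma(alpha)
    return result

def log_marginal_partition(sequences, partition, alphabet="ACGT", alpha=1.0):
    """Sum the per-cluster, per-site log marginals for a given partition."""
    total = 0.0
    for cluster in partition:
        members = [sequences[i] for i in cluster]
        for site in zip(*members):  # iterate over aligned positions
            total += log_marginal_cluster(site, alphabet, alpha)
    return total

# Toy aligned sequences (hypothetical data): two visibly distinct groups.
seqs = ["AAAA", "AAAT", "GGCC", "GGCC"]
good = log_marginal_partition(seqs, [[0, 1], [2, 3]])
bad = log_marginal_partition(seqs, [[0, 2], [1, 3]])
print(good > bad)
```

Because each candidate partition gets a score without sampling any cluster parameters, a stochastic search can compare many partitions cheaply, for example by proposing to move one item between clusters and recomputing only the affected terms.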

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Telecommunications network management is based on huge amounts of data that are continuously collected from elements and devices all around the network. The data is monitored and analysed to provide information for decision making in all operation functions. Knowledge discovery and data mining methods can support fast-paced decision making in network operations. In this thesis, I analyse decision making on different levels of network operations. I identify the requirements that decision making sets for knowledge discovery and data mining tools and methods, and I study the resources that are available to them. I then propose two methods for augmenting and applying frequent sets to support everyday decision making. The proposed methods are Comprehensive Log Compression for log data summarisation and Queryable Log Compression for semantic compression of log data. Finally, I suggest a model for a continuous knowledge discovery process and outline how it can be implemented and integrated into the existing network operations infrastructure.
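
The idea of using frequent sets to summarise log data can be sketched as follows. This is a simplified illustration rather than the Comprehensive Log Compression method itself. It treats each log entry as a set of field=value items, finds item sets that occur in at least `min_support` entries by brute-force enumeration, and replaces the entries covered by the largest frequent set with a single summary. All names and the sample log are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_sets(entries, min_support, max_size=3):
    """Count every item combination up to max_size and keep the frequent ones.
    Brute force for clarity; real miners prune candidates level by level."""
    counts = Counter()
    for entry in entries:
        items = sorted(entry)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

def summarise(entries, min_support):
    """Replace entries covered by the largest frequent set with one summary."""
    frequent = frequent_sets(entries, min_support)
    if not frequent:
        return None, list(entries)
    best = max(frequent, key=lambda s: (len(s), frequent[s]))
    remaining = [e for e in entries if not best <= e]
    return (best, frequent[best]), remaining

# Hypothetical alarm log: each entry is a set of field=value items.
log = [
    frozenset({"type=link_down", "node=bsc1", "sev=major"}),
    frozenset({"type=link_down", "node=bsc1", "sev=major"}),
    frozenset({"type=link_down", "node=bsc1", "sev=major"}),
    frozenset({"type=login", "node=msc2", "sev=info"}),
]
summary, rest = summarise(log, min_support=3)
print(summary, rest)
```

Here the three repeated alarms collapse into one pattern with a count, and the rare login entry stays visible in full, which is the kind of summary an operator can scan quickly.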

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Inherited retinal diseases are the most common cause of vision loss among the working population in Western countries. It is estimated that ~1 of the people worldwide suffer from vision loss due to inherited retinal diseases. The severity of these diseases varies from partial vision loss to total blindness, and at the moment no effective cure exists. To date, nearly 200 mapped loci, including 140 cloned genes, have been identified for inherited retinal diseases. By a rough estimate, 50% of the retinal dystrophy genes still await discovery. In this thesis we aimed to study the genetic background of two inherited retinal diseases, X-linked cone-rod dystrophy and Åland Island eye disease. X-linked cone-rod dystrophy (CORDX) is characterized by progressive loss of visual function in school age or early adulthood. Affected males show reduced visual acuity, photophobia, myopia, color vision defects, central scotomas, and variable changes in the fundus. The disease is genetically heterogeneous, and two disease loci, CORDX1 and CORDX2, were known prior to the present thesis work. CORDX1, located on Xp21.1-11.4, is caused by mutations in the RPGR gene. CORDX2 is located on Xq27-28, but the causative gene is still unknown. Åland Island eye disease (AIED), originally described in a family living in the Åland Islands, is a congenital retinal disease characterized by decreased visual acuity, fundus hypopigmentation, nystagmus, astigmatism, red color vision defect, myopia, and defective night vision. AIED shares similarities with another retinal disease, congenital stationary night blindness (CSNB2). Mutations in the L-type calcium channel α1F-subunit gene, CACNA1F, are known to cause CSNB2, as well as AIED-like disease. The disease locus of the original AIED family maps to the same genetic interval as the CACNA1F gene, but efforts to reveal CACNA1F mutations in patients of the original AIED family have been unsuccessful. 
The specific aims of this study were to map the disease gene in a large Finnish family with X-linked cone-rod dystrophy and to identify the disease-causing genes in the patients of the Finnish cone-rod dystrophy family and the original AIED family. With the help of linkage and haplotype analyses, we localized the disease gene of the Finnish cone-rod dystrophy family to the Xp11.4-Xq13.1 region, and thus established a new genetic X-linked cone-rod dystrophy locus, CORDX3. Mutation analyses of candidate genes revealed three novel CACNA1F gene mutations: IVS28-1 GCGTC>TGG in CORDX3 patients; a 425 bp deletion, comprising exon 30 and flanking intronic regions, in AIED patients; and IVS16+2T>C in an additional Finnish patient with a CSNB2-like phenotype. All three novel mutations altered splice sites of the CACNA1F gene and resulted in defective pre-mRNA splicing, suggesting altered or absent channel function as a disease mechanism. The analyses of CACNA1F mRNA also revealed novel alternative wt splice variants, which may enhance channel diversity or regulate the overall expression level of the channel. The results of our studies may be utilized in genetic counseling of the families, and they provide a basis for studies on the pathogenesis of these diseases. In the future, the knowledge of the genetic defects may be used in the identification of specific therapies for the patients.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Market microstructure is “the study of the trading mechanisms used for financial securities” (Hasbrouck, 2007). It seeks to understand the sources of value and reasons for trade, in a setting with different types of traders, and different private and public information sets. The actual mechanisms of trade are a continually changing object of study. These include continuous markets, auctions, limit order books, dealer markets, or combinations of these operating as a hybrid market. Microstructure also has to allow for the possibility of multiple prices. At any given time an investor may be faced with a multitude of different prices, depending on whether he or she is buying or selling, the quantity he or she wishes to trade, and the required speed for the trade. The price may also depend on the relationship that the trader has with potential counterparties. In this research, I touch upon all of the above issues. I do this by studying three specific areas, all of which have both practical and policy implications. First, I study the role of information in trading and pricing securities in markets with a heterogeneous population of traders, some of whom are informed and some not, and who trade for different private or public reasons. Second, I study the price discovery of stocks in a setting where they are simultaneously traded in more than one market. Third, I make a contribution to the ongoing discussion about market design, i.e. the question of which trading systems and ways of organizing trading are most efficient. A common characteristic throughout my thesis is the use of high frequency datasets, i.e. tick data. These datasets include all trades and quotes in a given security, rather than just the daily closing prices, as in traditional asset pricing literature. This thesis consists of four separate essays. In the first essay I study price discovery for European companies cross-listed in the United States. 
I also study explanatory variables for differences in price discovery. In my second essay I contribute to earlier research on two issues of broad interest in market microstructure: market transparency and informed trading. I examine the effects of a change to an anonymous market at the OMX Helsinki Stock Exchange. I broaden my focus slightly in the third essay, to include releases of macroeconomic data in the United States. I analyze the effect of these releases on European cross-listed stocks. The fourth and last essay examines the uses of standard methodologies of price discovery analysis in a novel way. Specifically, I study price discovery within one market, between local and foreign traders.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Wealthy individuals - business angels who invest a share of their net worth in entrepreneurial ventures - form an essential part of an informal venture capital market that can secure funding for entrepreneurial ventures. In Finland, business angels represent an untapped pool of capital that can contribute to fostering entrepreneurial development. In addition, business angels can bridge knowledge gaps in new business ventures by making their human capital available. This study has two objectives. The first is to gain an understanding of the characteristics and investment behaviour of Finnish business angels. The strongest focus here is on the due diligence procedures and their involvement post investment. The second objective is to assess whether agency theory and the incomplete contracting theory are useful theoretical lenses in the arena of business angels. To achieve the second objective, this study investigates i) how risk is mitigated in the investment process, ii) how uncertainty influences the comprehensiveness of due diligence, and iii) how control is allocated post investment. Research hypotheses are derived from assumptions underlying agency theory and the incomplete contracting theory. The data for this study comprise interviews with 53 business angels. In terms of sample size this is the largest study of Finnish business angels. The research hypotheses in this study are tested using regression analysis. This study suggests that the Finnish informal venture capital market appears to consist of a limited number of business angels whose style of investing closely resembles that of their formal counterparts. Much focus is placed on managing risks prior to making the investment by strong selectiveness and by a relatively comprehensive due diligence. The involvement is rarely on a day-to-day basis, and many business angels seem to see board membership as a more suitable alternative than involvement in the operations of an entrepreneurial venture. 
The uncertainty involved does not seem to drive an increase in due diligence. On the contrary, it would appear that due diligence is more rigorous in safer later stage investments and when the business angels have considerable previous experience as investors. Finnish business angels’ involvement post investment is best explained by their degree of ownership in the entrepreneurial venture. It seems that when investors feel they are sufficiently rewarded, in terms of an adequate equity stake, they are willing to involve themselves actively in their investments. The lack of support for a relationship between increased uncertainty and the comprehensiveness of due diligence may partly be explained by an increasing trend towards portfolio diversification. This is triggered by a taxation system that favours investments through investment companies rather than direct investments. Many business angels appear to have replaced a specialization strategy that builds on reducing uncertainty with a diversification strategy that builds on reducing firm specific (idiosyncratic) risk by holding shares in ventures whose returns are not expected to exhibit a strong positive correlation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped into supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated by the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the 'simple weighted graph compression problem'. We then propose a much wider class of tasks under the name of 'generalized weighted graph compression problem'. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with different balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
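
The decompression rule and the approximation error described in this abstract can be sketched as follows. The sketch assumes an undirected graph stored as a dictionary of edge weights, takes the superedge weight to be the mean weight over all node pairs it represents (missing edges count as weight 0), and measures the error as the Euclidean distance between original and decompressed weights. These choices and all names are assumptions for illustration; the paper's exact definitions may differ.

```python
from itertools import combinations
from math import sqrt

def weight(edges, u, v):
    """Weight of the undirected edge {u, v}; absent edges have weight 0."""
    return edges.get(frozenset((u, v)), 0.0)

def compress(edges, supernodes):
    """Superedge weight = mean original weight over the node pairs it covers."""
    superedges = {}
    for i, j in combinations(range(len(supernodes)), 2):
        pairs = [(u, v) for u in supernodes[i] for v in supernodes[j]]
        superedges[(i, j)] = sum(weight(edges, u, v) for u, v in pairs) / len(pairs)
    for i, group in enumerate(supernodes):  # pairs inside one supernode
        pairs = list(combinations(group, 2))
        if pairs:
            superedges[(i, i)] = sum(weight(edges, u, v) for u, v in pairs) / len(pairs)
    return superedges

def compression_error(edges, supernodes, superedges):
    """Euclidean distance between original and decompressed edge weights."""
    owner = {u: i for i, group in enumerate(supernodes) for u in group}
    nodes = sorted(owner)
    total = 0.0
    for u, v in combinations(nodes, 2):
        key = tuple(sorted((owner[u], owner[v])))
        total += (weight(edges, u, v) - superedges.get(key, 0.0)) ** 2
    return sqrt(total)

# Toy graph (hypothetical): a and b have near-identical neighbourhoods.
edges = {frozenset(("a", "c")): 1.0, frozenset(("b", "c")): 0.8,
         frozenset(("c", "d")): 0.5}
supernodes = [["a", "b"], ["c"], ["d"]]
superedges = compress(edges, supernodes)
print(superedges, compression_error(edges, supernodes, superedges))
```

Merging nodes with similar neighbourhoods keeps the error small, which is the trade-off the compression problem optimizes: fewer supernodes against a larger approximation error.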

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Lactobacillus rhamnosus GG is a probiotic bacterium that is known worldwide. Since its discovery in 1985, the health effects and biology of this health-promoting strain have been researched at an increasing rate. However, knowledge of the molecular biology responsible for these health effects is limited, even though research in this area has continued to grow since the publication of the whole genome sequence of L. rhamnosus GG in 2009. In this thesis, the molecular biology of L. rhamnosus GG was explored by mapping the changes in protein levels in response to diverse stress factors and environmental conditions. The proteomics data were supplemented with transcriptome level mapping of gene expression. The harsh conditions of the gastro-intestinal tract, which involve acidic conditions and detergent-like bile acids, are a notable challenge to the survival of probiotic bacteria. To simulate these conditions, L. rhamnosus GG was exposed to a sudden bile stress, and several stress response mechanisms were revealed, including various changes in the cell envelope properties. L. rhamnosus GG also responded in various ways to mild acid stress, which probiotic bacteria may face in dairy fermentations and product formulations. The acid stress response of L. rhamnosus GG included changes in central metabolism and specific responses related to the control of intracellular pH. Altogether, L. rhamnosus GG was shown to possess a large repertoire of mechanisms for responding to stress conditions, which is a beneficial characteristic for a probiotic organism. Adaptation to different growth conditions was studied by comparing the proteome level responses of L. rhamnosus GG to divergent growth media and to different phases of growth. Comparing different growth phases revealed that the metabolism of L. rhamnosus GG is modified markedly during the shift from the exponential to the stationary phase of growth. 
These changes were seen at both the proteome and transcriptome levels and in various cellular functions. When the growth of L. rhamnosus GG in a rich laboratory medium and in an industrial whey-based medium was compared, various differences in metabolism and in factors affecting the cell surface properties could be seen. These results led us to recommend that industrial-type media be used in laboratory studies of L. rhamnosus GG and other probiotic bacteria to achieve a physiological state similar to that found in industrial products, which would yield more relevant information about the bacteria. In addition, an interesting phenomenon of protein phosphorylation was observed in L. rhamnosus GG. Phosphorylation of several proteins of L. rhamnosus GG was detected, and there were indications that the degree of phosphorylation may depend on the growth pH.