15 results for pattern clustering
in Helda - Digital Repository of University of Helsinki
Abstract:
This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and the Engle and Russell (1998) models provide the framework for the analysis. The descriptive and parametric analyses provide evidence that an important determinant of the well-known U-shape pattern in the market is the rate of information arrivals, as measured by large trading volumes and durations at the market open and close. Specifically: 1) the larger the trading volume, the greater the impact on prices in both the short and the long run, so prices will differ across quantities; 2) large trading volume is a non-linear function of price changes in the long run; 3) arrival times are positively autocorrelated, indicating a clustering pattern; and 4) information arrivals, as approximated by durations, are negatively related to trading flow.
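Finding 3 in the abstract above (positively autocorrelated arrival times) can be illustrated with a minimal sketch. It computes trade durations from timestamps and their sample autocorrelation at lag 1. The function names and the timestamps are hypothetical and are not taken from the paper, which fits the Engle and Russell (1998) duration model rather than this simple statistic.

```python
from statistics import mean


def durations(timestamps):
    """Return the waiting times between consecutive trades."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    centre = mean(values)
    deviations = [v - centre for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(a * b for a, b in zip(deviations, deviations[lag:]))
    return numerator / denominator


# Hypothetical trade times in seconds: bursts of quick trades separated by a lull.
trade_times = [0, 1, 2, 3, 4, 34, 64, 94, 124, 125, 126, 127, 128]
gaps = durations(trade_times)
print(gaps)
print(autocorrelation(gaps))  # positive when short gaps follow short gaps
```

A positive value means that short durations tend to be followed by short ones and long by long, which is the clustering the abstract describes.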
Abstract:
Type 1 diabetes (T1D) is a common, multifactorial disease with strong familial clustering. In Finland, the incidence of T1D among children aged 14 years or under is the highest in the world. The increase in incidence has been approximately 2.4% per year. Although most new T1D cases are sporadic, first-degree relatives are at an increased risk of developing the same disease. This study was designed to examine the familial aggregation of T1D and one of its serious complications, diabetic nephropathy (DN). More specifically, the study aimed (1) to determine the concordance rates of T1D in monozygotic (MZ) and dizygotic (DZ) twins, to estimate the relative contributions of genetic and environmental factors to the variability in liability to T1D, and to study the age at onset of diabetes in twins; (2) to obtain long-term empirical estimates of the risk of T1D among siblings of T1D patients and the factors related to this risk, especially the effect of age at onset of diabetes in the proband and the birth cohort effect; (3) to establish whether DN aggregates in a Finnish population-based cohort of families with multiple cases of T1D, to assess the magnitude of this aggregation, and in particular to find out whether the risk of DN in siblings varies according to the severity of DN in the proband and/or the age at onset of T1D; (4) to assess the recurrence risk of T1D in the offspring of a Finnish population-based cohort of patients with childhood-onset T1D, to investigate potential sex-related effects in the transmission of T1D from diabetic parents to their offspring, and to study whether there is a temporal trend in the incidence. The study population comprised the Finnish Young Twin Cohort (22,650 twin pairs), a population-based cohort of patients with T1D diagnosed at the age of 17 years or earlier between 1965 and 1979 (n=5,144), and all their siblings (n=10,168) and offspring (n=5,291).
A polygenic, multifactorial liability model was fitted to the twin data. Kaplan-Meier analyses were used to provide the cumulative incidence for the development of T1D and DN. Cox's proportional hazards models were fitted to the data. Poisson regression analysis was used to evaluate temporal trends in incidence. Standardized incidence ratios (SIRs) between the first-degree relatives of T1D patients and the background population were determined. The twin study showed that the vast majority of affected MZ twin pairs remained discordant. Pairwise concordance for T1D was 27.3% in MZ and 3.8% in DZ twins. The probandwise concordance estimates were 42.9% and 7.4%, respectively. The model with additive genetic and individual environmental effects was the best-fitting liability model for T1D, with 88% of the phenotypic variance due to genetic factors. The second paper showed that the 50-year cumulative incidence of T1D in the siblings of diabetic probands was 6.9%. A young age at diagnosis in the probands considerably increased the risk. If the proband was diagnosed at the age of 0-4, 5-9, 10-14, or 15 or more, the corresponding 40-year cumulative risks were 13.2%, 7.8%, 4.7% and 3.4%. The cumulative incidence increased with increasing birth year. However, the SIR among children aged 14 years or under was approximately 12 throughout the follow-up. The third paper showed that diabetic siblings of probands with nephropathy had a 2.3 times higher risk of DN than siblings of probands free of nephropathy. The presence of end-stage renal disease (ESRD) in the proband increased the risk three-fold for diabetic siblings. Being diagnosed with diabetes during puberty (10-14) or a few years before (5-9) increased the susceptibility to DN in the siblings. The fourth paper revealed that 7.8% of the offspring of male probands were affected by the age of 20, compared with 5.3% of the offspring of female probands.
Offspring of fathers with T1D have a 1.7 times greater risk of being affected by T1D than the offspring of mothers with T1D. The excess risk in the offspring of fathers manifested itself as a higher risk the younger the father was when diagnosed with T1D. Young age at onset of diabetes in fathers greatly increased the risk of T1D in the offspring, but no such pattern was seen in the offspring of diabetic mothers. The SIR among offspring aged 14 years or under remained fairly constant throughout the follow-up, at approximately 10. The present study has provided new knowledge on T1D recurrence risk in first-degree relatives and on the risk factors modifying that risk. Twin data demonstrated high genetic liability for T1D and increased heritability. The vast majority of affected MZ twin pairs, however, remain discordant for T1D. This study confirmed the drastic impact of young age at onset of diabetes in the probands on the increased risk of T1D in first-degree relatives. The only exception was the absence of this pattern in the offspring of T1D mothers. Both the sibling and the offspring recurrence risk studies revealed dynamic changes in the cumulative incidence of T1D in first-degree relatives. SIRs among the first-degree relatives of T1D patients seem to remain fairly constant. The study demonstrates that the penetrance of the susceptibility genes for T1D may be low, although strongly influenced by environmental factors. The presence of familial aggregation of DN was confirmed for the first time in a population-based study. Although the majority of the sibling pairs with T1D were discordant for DN, its presence in one sibling doubles, and the presence of ESRD triples, the risk of DN in the other diabetic sibling. An encouraging observation was that although the proportion of children diagnosed with T1D at the age of 4 or under is increasing, they seem to have a decreased risk of DN or at least a delayed onset.
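The abstract above reports both pairwise and probandwise concordance without defining them. The sketch below shows the usual textbook definitions, assuming complete ascertainment (every affected twin counts as a proband); the thesis may have used a different correction. The function names and the pair counts are hypothetical, and only the 27.3% figure comes from the abstract.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of affected pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Share of probands whose co-twin is also affected.

    Assumes complete ascertainment, so each concordant pair
    contributes two probands.
    """
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


def probandwise_from_pairwise(pairwise_rate):
    """Convert a pairwise rate p to the probandwise rate 2p / (1 + p)."""
    return 2 * pairwise_rate / (1 + pairwise_rate)


# Hypothetical counts, for illustration only: 10 concordant and 30 discordant pairs.
print(pairwise_concordance(10, 30))
print(probandwise_concordance(10, 30))

# Reported MZ pairwise concordance of 27.3% from the abstract.
print(round(probandwise_from_pairwise(0.273), 3))
```

Under this assumption the reported MZ pairwise rate of 27.3% converts to about 42.9%, in line with the probandwise figure in the abstract. The DZ conversion gives about 7.3% against the reported 7.4%, a gap that would be consistent with rounding of the published pairwise rate.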
Abstract:
Germline mutations in fumarate hydratase (FH) cause hereditary leiomyomatosis and renal cell cancer (HLRCC). FH is a nuclear-encoded enzyme that functions in the Krebs tricarboxylic acid cycle, and homozygous mutations in FH lead to severe developmental defects. Both uterine and cutaneous leiomyomas are components of the HLRCC phenotype. Most of these tumours show loss of the wild-type allele, and the mutations also reduce FH enzyme activity, which indicates that FH is a tumour suppressor gene. The renal cell cancers associated with HLRCC are of the rare papillary type 2 histology. Other Krebs cycle genes also implicated in neoplasia are those encoding 3 of the 4 subunits of succinate dehydrogenase (SDH); mutations in SDHB, SDHC, and SDHD predispose to paraganglioma and phaeochromocytoma. Although uterine leiomyomas (or fibroids) are very common, with estimates of affected women ranging from 25% to 77%, not much is known about their genetic background. Cytogenetic studies have revealed that rearrangements involving chromosomes 6, 7, 12 and 14 are most commonly seen in fibroids. Deletions on the long arm of chromosome 7 have been reported in about 17 to 34% of leiomyomas, and the small commonly deleted region on 7q22 suggests that there might be an underlying tumour suppressor gene in that region. The purpose of this study was to investigate the genetic mechanisms behind the development of tumours associated with HLRCC, both renal cell cancer (RCC) and uterine fibroids. Firstly, a database search at the Finnish cancer registry was conducted in order to identify new families with early-onset RCC and to test whether the family history was compatible with HLRCC. Secondly, sporadic uterine fibroids were tested for deletions on 7q in order to define the minimal deleted 7q region, followed by mutation analysis of the candidate genes.
Thirdly, oligonucleotide chips were utilised to study the global gene expression profiles of uterine fibroids in order to test whether 7q-deletions and FH mutations significantly affected fibroid biology. In the screen for early-onset RCC, 214 families were identified. Subsequently, the pedigrees were constructed and clinical data obtained. One of the index cases (RCC at the age of 28) had a mother who had been diagnosed with a heart tumour, which on further investigation turned out to be a paraganglioma. This led to an alternative hypothesis that SDH, instead of FH, could be involved. SDHA, SDHB, SDHC and SDHD were sequenced from these individuals; a germline SDHB R27X mutation was detected, with loss of the wild-type allele in both tumours. These results suggest that germline mutations in the SDHB gene predispose to early-onset RCC, establishing a novel form of hereditary RCC. This has immediate clinical implications for the surveillance of patients suffering from early-onset RCC and phaeochromocytoma/paraganglioma. For the studies on sporadic uterine fibroids, a set of 166 fibroids from 51 individuals was collected. The 7q LOH mapping defined a commonly deleted region of about 3.2 megabases in 11 of the 166 tumours. The deletion was consistent with previously reported allelotyping studies of leiomyomas and therefore suggested the presence of a tumour suppressor gene in the deleted region. Furthermore, the high-resolution aCGH-chip analysis refined the deleted region to only 2.79 Mb. When combined with previous data, the commonly deleted region was only 2.3 Mb. The mutation screening of the known genes within the commonly deleted region did not, however, reveal pathogenic mutations. The expression microarray analysis revealed that FH-deficient fibroids, both sporadic and familial, had a distinct gene expression profile, as they formed their own group in the unsupervised clustering.
On the other hand, the presence or absence of 7q-deletions did not significantly alter the global gene expression pattern of fibroids, suggesting that these two groups do not have different biological backgrounds. Multiple differentially expressed genes were identified between FH wild-type and FH-mutant fibroids, and the most significant increase was seen in the expression of carbohydrate metabolism-related and hypoxia inducible factor (HIF) target genes.
Abstract:
Puumala virus (PUUV) is the causative agent of nephropathia epidemica (NE), a mild form of hemorrhagic fever with renal syndrome. Finland has the highest documented incidence of NE with around 1000 cases diagnosed annually. PUUV is also found in other Scandinavian countries, Central Europe and the European part of Russia. PUUV belongs to the genus Hantavirus in the family Bunyaviridae. Hantaviruses are rodent-borne viruses each carried by a specific host that is persistently and asymptomatically infected by the virus. PUUV is carried by the bank voles (Myodes glareolus, previously known as Clethrionomys glareolus). Hantaviruses have co-evolved with their carrier rodents for millions of years and these host animals are the evolutionary scene of hantaviruses. In this study, PUUV sequences were recovered from bank voles captured in Denmark and Russian Karelia to study the evolution of PUUV in Scandinavia. Phylogenetic analysis of these strains showed a geographical clustering of genetic variants following the presumable migration pattern of bank voles during the recolonization of Scandinavia after the last ice age approximately 10 000 years ago. The currently known PUUV genome sequences were subjected to in-depth phylogenetic analyses and the results showed that genetic drift seems to be the major mechanism of PUUV evolution. In general, PUUV seems to evolve quite slowly following a molecular clock. We also found evidence for recombination in the evolution of some genetic lineages of PUUV. Viral microevolution was studied in controlled virus transmission in colonized bank voles and changes in quasispecies dynamics were recorded as the virus was transmitted from one animal to another. We witnessed PUUV evolution in vivo, as one synonymous mutation became repeatedly fixed in the viral genome during the experiment. The detailed knowledge on the PUUV diversity was used to establish new sensitive and specific detection methods for this virus. 
Direct viral invasion of the hypophysis was demonstrated for the first time in a lethal case of NE. PUUV detection was done by immunohistochemistry, in situ hybridization and RT-nested-PCR of the autopsy tissue samples.
Abstract:
Acquiring sufficient information on the genetic variation, genetic differentiation, and the ecological and genetic relationships among individuals and populations is essential for establishing guidelines on conservation and utilization of the genetic resources of a species, particularly when biotic and abiotic stresses are considered. The aim of this study was to assess, in the context of conservation and utilization, the extent and pattern of genetic variation in date palm (Phoenix dactylifera L.) cultivars; the genetic diversity and structure of its populations occurring over geographical ranges; the variation in its economically and botanically important traits; and the variation in its drought-adaptive traits. In this study, the genetic diversity and relationships among selected cultivars from Sudan and Morocco were assessed using microsatellite markers. Microsatellite markers were also used to investigate the genetic diversity within and among populations collected from different geographic locations in Sudan. In a separate investigation, fruits of cultivars selected from Sudan were characterized morphologically and chemically, and the morphological and DNA polymorphism of the mother trees was also investigated. Morphological and photosynthetic adjustments to water stress were studied in the five most important date palm cultivars in Sudan, namely Gondaila, Barakawi, Bitamoda, Khateeb and Laggai, and the mechanism enhancing photosynthetic gas exchange in date palm under water stress was also investigated. Results showed a significant (p < 0.001, t-test) differentiation between the Sudan and Morocco groups of cultivars. However, the major feature of all tested cultivars was the complete lack of clustering and the absence of cultivars representing specific clones.
The results indicated high genetic as well as compositional and morphological diversity among cultivars, while compositional and morphological traits were found to be characteristic features that strongly differentiate cultivars as well as phenotypes. High genetic diversity was also observed in different populations. Slight but significant (p < 0.01, AMOVA) divergence was observed between soft and dry types; however, the genetic divergence among populations was relatively weak. The results showed complex genetic relationships between some of the tested populations, especially when isolation by distance was considered. The results of the study also revealed that date palm cultivars and phenotypes possess specific direct or interaction effects due to water availability on a range of morphological and physiological traits. Soft and dry phenotypes responded differently to different levels of water stress, and the dry phenotype was more sensitive and conservative. The results indicated that date palm has high fixation capacity to photosynthetic CO2 supply with interaction effect to water availability, which can be considered advantageous when coping with stresses that may arise with climate change. In conclusion, although a large amount of diversity exists among date palm germplasm, the findings of this study show that the role of the biological nature of the tree, isolation by distance and environmental effects in structuring the date palm genome was highly influenced by human impacts. The identity of date palm cultivars, as developed and manipulated by date palm growers in the absence of scientific breeding programmes, may continue to depend mainly on tree morphology and fruit characters. The pattern of genetic differentiation may cover specific morphological and physiological traits that contribute to adaptive mechanisms in each phenotype. These traits can be considered for further studies related to drought adaptation in date palm.
Abstract:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context (plain words, part-of-speech tags, punctuation marks and capitalization patterns) can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand-crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns.
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
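The combination named in the abstract above, an edit-distance similarity metric feeding hierarchical clustering, can be sketched as follows. This is a minimal illustration and not the thesis's method: the token-level Levenshtein distance, the single-linkage merge rule, the `max_distance` stopping threshold, and the example contexts with `QWORD` and `ANSWER` placeholders are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of tokens)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cluster(sequences, max_distance):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is further apart than max_distance.
    """
    clusters = [[s] for s in sequences]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(edit_distance(x, y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical answer contexts; QWORD and ANSWER are placeholder tokens.
contexts = [
    "QWORD was born in ANSWER".split(),
    "QWORD was born on ANSWER".split(),
    "QWORD , the capital of ANSWER".split(),
    "QWORD , capital of ANSWER".split(),
]
groups = cluster(contexts, max_distance=1)
for group in groups:
    print([" ".join(tokens) for tokens in group])
```

Each resulting group holds contexts that differ by at most a token or two, which is the kind of set from which a shared extraction pattern could then be derived by alignment.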
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
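The "exponential sum" mentioned in the abstract above can be made concrete for the simplest discrete case, the Bernoulli model. The sketch below computes the NML normalizer two ways: by enumerating all 2^n binary sequences, and by grouping sequences with the same number of ones, which needs only n + 1 terms. This grouping is a standard textbook simplification used here for illustration; it is not claimed to be one of the dissertation's techniques, and the function names are hypothetical.

```python
from itertools import product
from math import comb, log2


def max_likelihood(ones, n):
    """Maximised Bernoulli likelihood of a length-n sequence with `ones` ones."""
    p = ones / n
    return p ** ones * (1 - p) ** (n - ones)


def naive_normalizer(n):
    """Sum the maximised likelihood over all 2**n binary sequences."""
    return sum(max_likelihood(sum(seq), n) for seq in product((0, 1), repeat=n))


def grouped_normalizer(n):
    """Same sum, grouped by the number of ones: only n + 1 terms."""
    return sum(comb(n, k) * max_likelihood(k, n) for k in range(n + 1))


def stochastic_complexity_bits(sequence):
    """NML code length in bits: -log2(max likelihood) + log2(normalizer)."""
    n = len(sequence)
    return -log2(max_likelihood(sum(sequence), n)) + log2(grouped_normalizer(n))


print(naive_normalizer(10), grouped_normalizer(10))
print(stochastic_complexity_bits([1, 1, 0, 1, 1, 1, 0, 1]))
```

The two normalizers agree up to floating-point error, but the naive version becomes impractical well before n reaches 100, while the grouped version stays cheap.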
Abstract:
Online content services can greatly benefit from personalisation features that enable delivery of content suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes each user's actions towards content items to learn that user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic, and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques, with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some of the current information needs, which are not optimally served by traditional techniques.