28 resultados para random number generator

em Helda - Digital Repository of University of Helsinki


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study reports a diachronic corpus investigation of common-number pronouns used to convey unknown or otherwise unspecified reference. The study charts agreement patterns in these pronouns in various diachronic and synchronic corpora. The objective is to provide base-line data on variant frequencies and distributions in the history of English, as there are no previous systematic corpus-based observations on this topic. This study seeks to answer the questions of how pronoun use is linked with the overall typological development in English and how their diachronic evolution is embedded in the linguistic and social structures in which they are used. The theoretical framework draws on corpus linguistics and historical sociolinguistics, grammaticalisation, diachronic typology, and multivariate analysis of modelling sociolinguistic variation. The method employs quantitative corpus analyses from two main electronic corpora, one from Modern English and the other from Present-day English. The Modern English material is the Corpus of Early English Correspondence, and the time frame covered is 1500-1800. The written component of the British National Corpus is used in the Present-day English investigations. In addition, the study draws supplementary data from other electronic corpora. The material is used to compare the frequencies and distributions of common-number pronouns between these two time periods. The study limits the common-number uses to two subsystems, one anaphoric to grammatically singular antecedents and one cataphoric, in which the pronoun is followed by a relative clause. Various statistical tools are used to process the data, ranging from cross-tabulations to multivariate VARBRUL analyses in which the effects of sociolinguistic and systemic parameters are assessed to model their impact on the dependent variable. This study shows how one pronoun type has extended its uses in both subsystems, an increase linked with grammaticalisation and the changes in other pronouns in English through the centuries. The variationist sociolinguistic analysis charts how grammaticalisation in the subsystems is embedded in the linguistic and social structures in which the pronouns are used. The study suggests a scale of two statistical generalisations of various sociolinguistic factors which contribute to grammaticalisation and its embedding at various stages of the process.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Helicobacter pylori infection is a risk factor for gastric cancer, which is a major health issue worldwide. Gastric cancer has a poor prognosis due to the unnoticeable progression of the disease and surgery is the only available treatment in gastric cancer. Therefore, gastric cancer patients would greatly benefit from identifying biomarker genes that would improve diagnostic and prognostic prediction and provide targets for molecular therapies. DNA copy number amplifications are the hallmarks of cancers in various anatomical locations. Mechanisms of amplification predict that DNA double-strand breaks occur at the margins of the amplified region. The first objective of this thesis was to identify the genes that were differentially expressed in H. pylori infection as well as the transcription factors and signal transduction pathways that were associated with the gene expression changes. The second objective was to identify putative biomarker genes in gastric cancer with correlated expression and copy number, and the last objective was to characterize cancers based on DNA copy number amplifications. DNA microarrays, an in vitro model and real-time polymerase chain reaction were used to measure gene expression changes in H. pylori infected AGS cells. In order to identify the transcription factors and signal transduction pathways that were activated after H. pylori infection, gene expression profiling data from the H. pylori experiments and a bioinformatics approach accompanied by experimental validation were used. Genome-wide expression and copy number microarray analysis of clinical gastric cancer samples and immunohistochemistry on tissue microarray were used to identify putative gastric cancer genes. Data mining and machine learning techniques were applied to study amplifications in a cross-section of cancers. FOS and various stress response genes were regulated by H. pylori infection. H. pylori regulated genes were enriched in the chromosomal regions that are frequently changed in gastric cancer, suggesting that molecular pathways of gastric cancer and premalignant H. pylori infection that induces gastritis are interconnected. 16 transcription factors were identified as being associated with H. pylori infection induced changes in gene expression. NF-κB transcription factor and p50 and p65 subunits were verified using elecrophoretic mobility shift assays. ERBB2 and other genes located in 17q12- q21 were found to be up-regulated in association with copy number amplification in gastric cancer. Cancers with similar cell type and origin clustered together based on the genomic localization of the amplifications. Cancer genes and large genes were co-localized with amplified regions and fragile sites, telomeres, centromeres and light chromosome bands were enriched at the amplification boundaries. H. pylori activated transcription factors and signal transduction pathways function in cellular mechanisms that might be capable of promoting carcinogenesis of the stomach. Intestinal and diffuse type gastric cancers showed distinct molecular genetic profiles. Integration of gene expression and copy number microarray data allowed the identification of genes that might be involved in gastric carcinogenesis and have clinical relevance. Gene amplifications were demonstrated to be non-random genomic instabilities. Cell lineage, properties of precursor stem cells, tissue microenvironment and genomic map localization of specific oncogenes define the site specificity of DNA amplifications, whereas labile genomic features define the structures of amplicons. These conclusions suggest that the definition of genomic changes in cancer is based on the interplay between the cancer cell and the tumor microenvironment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Digital elevation models (DEMs) have been an important topic in geography and surveying sciences for decades due to their geomorphological importance as the reference surface for gravita-tion-driven material flow, as well as the wide range of uses and applications. When DEM is used in terrain analysis, for example in automatic drainage basin delineation, errors of the model collect in the analysis results. Investigation of this phenomenon is known as error propagation analysis, which has a direct influence on the decision-making process based on interpretations and applications of terrain analysis. Additionally, it may have an indirect influence on data acquisition and the DEM generation. The focus of the thesis was on the fine toposcale DEMs, which are typically represented in a 5-50m grid and used in the application scale 1:10 000-1:50 000. The thesis presents a three-step framework for investigating error propagation in DEM-based terrain analysis. The framework includes methods for visualising the morphological gross errors of DEMs, exploring the statistical and spatial characteristics of the DEM error, making analytical and simulation-based error propagation analysis and interpreting the error propagation analysis results. The DEM error model was built using geostatistical methods. The results show that appropriate and exhaustive reporting of various aspects of fine toposcale DEM error is a complex task. This is due to the high number of outliers in the error distribution and morphological gross errors, which are detectable with presented visualisation methods. In ad-dition, the use of global characterisation of DEM error is a gross generalisation of reality due to the small extent of the areas in which the decision of stationarity is not violated. This was shown using exhaustive high-quality reference DEM based on airborne laser scanning and local semivariogram analysis. The error propagation analysis revealed that, as expected, an increase in the DEM vertical error will increase the error in surface derivatives. However, contrary to expectations, the spatial au-tocorrelation of the model appears to have varying effects on the error propagation analysis depend-ing on the application. The use of a spatially uncorrelated DEM error model has been considered as a 'worst-case scenario', but this opinion is now challenged because none of the DEM derivatives investigated in the study had maximum variation with spatially uncorrelated random error. Sig-nificant performance improvement was achieved in simulation-based error propagation analysis by applying process convolution in generating realisations of the DEM error model. In addition, typology of uncertainty in drainage basin delineations is presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Planar curves arise naturally as interfaces between two regions of the plane. An important part of statistical physics is the study of lattice models. This thesis is about the interfaces of 2D lattice models. The scaling limit is an infinite system limit which is taken by letting the lattice mesh decrease to zero. At criticality, the scaling limit of an interface is one of the SLE curves (Schramm-Loewner evolution), introduced by Oded Schramm. This family of random curves is parametrized by a real variable, which determines the universality class of the model. The first and the second paper of this thesis study properties of SLEs. They contain two different methods to study the whole SLE curve, which is, in fact, the most interesting object from the statistical physics point of view. These methods are applied to study two symmetries of SLE: reversibility and duality. The first paper uses an algebraic method and a representation of the Virasoro algebra to find common martingales to different processes, and that way, to confirm the symmetries for polynomial expected values of natural SLE data. In the second paper, a recursion is obtained for the same kind of expected values. The recursion is based on stationarity of the law of the whole SLE curve under a SLE induced flow. The third paper deals with one of the most central questions of the field and provides a framework of estimates for describing 2D scaling limits by SLE curves. In particular, it is shown that a weak estimate on the probability of an annulus crossing implies that a random curve arising from a statistical physics model will have scaling limits and those will be well-described by Loewner evolutions with random driving forces.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genetic studies on phylogeography and adaptive divergence in Northern Hemisphere fish species such as three-spined stickleback (Gasterosteus aculeatus) provide an excellent opportunity to investigate genetic mechanisms underlying population differentiation. According to the theory, the process of population differentiation results from a complex interplay between random and deterministic processes as well historical factors. The main scope in this thesis was to study how historical factors like the Pleistocene ice ages have shaped the patterns molecular diversity in three-spined stickleback populations in Europe and how this information could be utilized in the conservation genetic context. Furthermore, identifying footprints of natural selection at the DNA level might be used in identifying genes involved in evolutionary change. Overall, the results from phylogeographic studies indicate that the three-spined stickleback has colonized the Atlantic basin relatively recently but constitutes three major evolutionary lineages in Europe. In addition, the colonization of freshwater appears to result from multiple and independent invasions by the marine conspecifics. Molecular data together with morphology suggest that the most divergent freshwater populations are located in the Balkan Peninsula and these populations deserve a special conservation genetic status without warranting further taxonomical classification. In order to investigate the adaptive divergence in Fennoscandian three-spined stickleback populations several approaches were used. First, sequence variability in the Eda-gene, coding for the number of lateral plates, was concordant with the previously observed global pattern. Full plated allele is in high frequencies among marine populations whereas low plated allele dominates in the freshwater populations. Second, a microsatellite based genome scan identified both indications of balancing and directional selection in the three-spined stickleback genome, i.e. loci with unusually similar or unusually different allele frequencies over populations. The directionally selected loci were mainly associated with the adaptation to freshwater. A follow up study conducting a more detailed analysis in a chromosome region containing a putatively selected gene locus identified a fairly large genomic region affected by natural selection. However, this region contained several gene predictions, all of which might be the actual target of natural selection. All in all, the phylogeographic and adaptive divergence studies indicate that most of the genetic divergence has occurred in the freshwater populations whereas the marine populations have remained relatively uniform.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mitochondrial diseases are caused by disturbances of the energy metabolism. The disorders range from severe childhood neurological diseases to muscle diseases of adults. Recently, mitochondrial dysfunction has also been found in Parkinson s disease, diabetes, certain types of cancer and premature aging. Mitochondria are the power plants of the cell but they also participate in the regulation of cell growth, signaling and cell death. Mitochondria have their own genetic material, mtDNA, which contains the genetic instructions for cellular respiration. Single cell may host thousands of mitochondria and several mtDNA molecules may reside inside single mitochondrion. All proteins needed for mtDNA maintenance are, however, encoded by the nuclear genome, and therefore, mutations of the corresponding genes can also cause mitochondrial disease. We have here studied the function of mitochondrial helicase Twinkle. Our research group has previously identified nuclear Twinkle gene mutations underlying an inherited adult-onset disorder, progressive external ophthalmoplegia (PEO). Characteristic for the PEO disease is the accumulation of multiple mtDNA deletions in tissues such as the muscle and brain. In this study, we have shown that Twinkle helicase is essential for mtDNA maintenance and that it is capable of regulating mtDNA copy number. Our results support the role of Twinkle as the mtDNA replication helicase. No cure is available for mitochondrial disease. Good disease models are needed for studies of the cause of disease and its progression and for treatment trials. Such disease model, which replicates the key features of the PEO disease, has been generated in this study. The model allows for careful inspection of how Twinkle mutations lead to mtDNA deletions and further causes the PEO disease. This model will be utilized in a range of studies addressing the delay of the disease onset and progression and in subsequent treatment trials. In conclusion, in this thesis fundamental knowledge of the function of the mitochondrial helicase Twinkle was gained. In addition, the first model for adult-onset mitochondrial disease was generated.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In order to predict the current state and future development of Earth s climate, detailed information on atmospheric aerosols and aerosol-cloud-interactions is required. Furthermore, these interactions need to be expressed in such a way that they can be represented in large-scale climate models. The largest uncertainties in the estimate of radiative forcing on the present day climate are related to the direct and indirect effects of aerosol. In this work aerosol properties were studied at Pallas and Utö in Finland, and at Mount Waliguan in Western China. Approximately two years of data from each site were analyzed. In addition to this, data from two intensive measurement campaigns at Pallas were used. The measurements at Mount Waliguan were the first long term aerosol particle number concentration and size distribution measurements conducted in this region. They revealed that the number concentration of aerosol particles at Mount Waliguan were much higher than those measured at similar altitudes in other parts of the world. The particles were concentrated in the Aitken size range indicating that they were produced within a couple of days prior to reaching the site, rather than being transported over thousands of kilometers. Aerosol partitioning between cloud droplets and cloud interstitial particles was studied at Pallas during the two measurement campaigns, First Pallas Cloud Experiment (First PaCE) and Second Pallas Cloud Experiment (Second PaCE). The method of using two differential mobility particle sizers (DMPS) to calculate the number concentration of activated particles was found to agree well with direct measurements of cloud droplet. Several parameters important in cloud droplet activation were found to depend strongly on the air mass history. The effects of these parameters partially cancelled out each other. Aerosol number-to-volume concentration ratio was studied at all three sites using data sets with long time-series. The ratio was found to vary more than in earlier studies, but less than either aerosol particle number concentration or volume concentration alone. Both air mass dependency and seasonal pattern were found at Pallas and Utö, but only seasonal pattern at Mount Waliguan. The number-to-volume concentration ratio was found to follow the seasonal temperature pattern well at all three sites. A new parameterization for partitioning between cloud droplets and cloud interstitial particles was developed. The parameterization uses aerosol particle number-to-volume concentration ratio and aerosol particle volume concentration as the only information on the aerosol number and size distribution. The new parameterization is computationally more efficient than the more detailed parameterizations currently in use, but the accuracy of the new parameterization was slightly lower. The new parameterization was also compared to directly observed cloud droplet number concentration data, and a good agreement was found.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Time-dependent backgrounds in string theory provide a natural testing ground for physics concerning dynamical phenomena which cannot be reliably addressed in usual quantum field theories and cosmology. A good, tractable example to study is the rolling tachyon background, which describes the decay of an unstable brane in bosonic and supersymmetric Type II string theories. In this thesis I use boundary conformal field theory along with random matrix theory and Coulomb gas thermodynamics techniques to study open and closed string scattering amplitudes off the decaying brane. The calculation of the simplest example, the tree-level amplitude of n open strings, would give us the emission rate of the open strings. However, even this has been unknown. I will organize the open string scattering computations in a more coherent manner and will argue how to make further progress.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A straightforward computation of the list of the words (the `tail words' of the list) that are distributionally most similar to a given word (the `head word' of the list) leads to the question: How semantically similar to the head word are the tail words; that is: how similar are their meanings to its meaning? And can we do better? The experiment was done on nearly 18,000 most frequent nouns in a Finnish newsgroup corpus. These nouns are considered to be distributionally similar to the extent that they occur in the same direct dependency relations with the same nouns, adjectives and verbs. The extent of the similarity of their computational representations is quantified with the information radius. The semantic classification of head-tail pairs is intuitive; some tail words seem to be semantically similar to the head word, some do not. Each such pair is also associated with a number of further distributional variables. Individually, their overlap for the semantic classes is large, but the trained classification-tree models have some success in using combinations to predict the semantic class. The training data consists of a random sample of 400 head-tail pairs with the tail word ranked among the 20 distributionally most similar to the head word, excluding names. The models are then tested on a random sample of another 100 such pairs. The best success rates range from 70% to 92% of the test pairs, where a success means that the model predicted my intuitive semantic class of the pair. This seems somewhat promising when distributional similarity is used to capture semantically similar words. This analysis also includes a general discussion of several different similarity formulas, arranged in three groups: those that apply to sets with graded membership, those that apply to the members of a vector space, and those that apply to probability mass functions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Defects in mitochondrial DNA (mtDNA) maintenance cause a range of human diseases, including autosomal dominant progressive external ophthalmoplegia (adPEO). This study aimed to clarify the molecular background of adPEO. We discovered that deoxynucleoside triphosphate (dNTP) metabolism plays a crucial in mtDNA maintenance and were thus prompted to search for therapeutic strategies based on the modulation of cellular dNTP pools or mtDNA copy number. Human mtDNA is a 16.6 kb circular molecule present in hundreds to thousands of copies per cell. mtDNA is compacted into nucleoprotein clusters called nucleoids. mtDNA maintenance diseases result from defects in nuclear encoded proteins that maintain the mtDNA. These syndromes typically afflict highly differentiated, post-mitotic tissues such as muscle and nerve, but virtually any organ can be affected. adPEO is a disease where mtDNA molecules with large-scale deletions accumulate in patients tissues, particularly in skeletal muscle. Mutations in five nuclear genes, encoding the proteins ANT1, Twinkle, POLG, POLG2 and OPA1, have previously been shown to cause adPEO. Here, we studied a large North American pedigree with adPEO, and identified a novel heterozygous mutation in the gene RRM2B, which encodes the p53R2 subunit of the enzyme ribonucleotide reductase (RNR). RNR is the rate-limiting enzyme in dNTP biosynthesis, and is required both for nuclear and mitochondrial DNA replication. The mutation results in the expression of a truncated form of p53R2, which is likely to compete with the wild-type allele. A change in enzyme function leads to defective mtDNA replication due to altered dNTP pools. Therefore, RRM2B is a novel adPEO disease gene. The importance of adequate dNTP pools and RNR function for mtDNA maintenance has been established in many organisms. In yeast, induction of RNR has previously been shown to increase mtDNA copy number, and to rescue the phenotype caused by mutations in the yeast mtDNA polymerase. To further study the role of RNR in mammalian mtDNA maintenance, we used mice that broadly overexpress the RNR subunits Rrm1, Rrm2 or p53R2. Active RNR is a heterotetramer consisting of two large subunits (Rrm1) and two small subunits (either Rrm2 or p53R2). We also created bitransgenic mice that overexpress Rrm1 together with either Rrm2 or p53R2. In contrast to the previous findings in yeast, bitransgenic RNR overexpression led to mtDNA depletion in mouse skeletal muscle, without mtDNA deletions or point mutations. The mtDNA depletion was associated with imbalanced dNTP pools. Furthermore, the mRNA expression levels of Rrm1 and p53R2 were found to correlate with mtDNA copy number in two independent mouse models, suggesting nuclear-mitochondrial cross talk with regard to mtDNA copy number. We conclude that tight regulation of RNR is needed to prevent harmful alterations in the dNTP pool balance, which can lead to disordered mtDNA maintenance. Increasing the copy number of wild-type mtDNA has been suggested as a strategy for treating PEO and other mitochondrial diseases. Only two proteins are known to cause a robust increase in mtDNA copy number when overexpressed in mice; the mitochondrial transcription factor A (TFAM), and the mitochondrial replicative helicase Twinkle. We studied the mechanisms by which Twinkle and TFAM elevate mtDNA levels, and showed that Twinkle specifically implements mtDNA synthesis. Furthermore, both Twinkle and TFAM were found to increase mtDNA content per nucleoid. Increased mtDNA content in mouse tissues correlated with an age-related accumulation of mtDNA deletions, depletion of mitochondrial transcripts, and progressive respiratory dysfunction. Simultaneous overexpression of Twinkle and TFAM led to a further increase in the mtDNA content of nucleoids, and aggravated the respiratory deficiency. These results suggested that high mtDNA levels have detrimental long-term effects in mice. These data have to be considered when developing and evaluating treatment strategies for elevating mtDNA copy number.