16 results for Inference Technique
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important because the direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify. Haplotypes are the key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability.
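As a minimal illustration of why haplotyping is nontrivial (a toy sketch, not part of any of the three methods above), the following code enumerates every haplotype pair consistent with a single multilocus genotype; with k heterozygous sites there are 2^(k-1) such pairs, and this is the ambiguity that HaploParser, HIT, and BACH resolve statistically:

```python
from itertools import product

def haplotype_pairs(genotype):
    """Enumerate all unordered haplotype pairs consistent with a genotype.
    Per-site coding: 0 = homozygous 0/0, 2 = homozygous 1/1,
    1 = heterozygous (phase unknown)."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [0 if g == 0 else 1 if g == 2 else None for g in genotype]
    seen = set()
    for bits in product((0, 1), repeat=len(het)):
        h1 = base[:]
        for i, b in zip(het, bits):
            h1[i] = b
        # The second haplotype is forced by the genotype: allele counts must sum.
        h2 = [g - a for g, a in zip(genotype, h1)]
        seen.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(seen)

# A genotype with 3 heterozygous sites admits 2**(3-1) = 4 phase configurations.
pairs = haplotype_pairs([1, 0, 1, 2, 1])
print(len(pairs))  # 4
```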
BACH is the most accurate method presented in this thesis and has performance comparable to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the sequences being related or just to chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability of picking two sequences from the null model whose best local alignment score is as good or better. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
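For contrast with the sampling-free framework described above, the baseline approach it improves upon can be sketched as a Smith-Waterman scorer plus a permutation null. This is a toy illustration; the scoring parameters and trial count are arbitrary assumptions:

```python
import random

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with linear gap penalties."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def empirical_p_value(a, b, trials=200, seed=1):
    """Estimate P(score >= observed) by shuffling both sequences."""
    rng = random.Random(seed)
    observed = smith_waterman(a, b)
    hits = 0
    for _ in range(trials):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)
        rng.shuffle(sb)
        if smith_waterman("".join(sa), "".join(sb)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)    # add-one rule avoids p = 0

print(smith_waterman("ACGTACGT", "TTACGTAA"))   # the shared ACGTA run scores 10
print(empirical_p_value("ACGTACGTGG", "TTACGTACGA"))
```

Note that this sampling scheme suffers from exactly the drawbacks the thesis framework avoids: it requires many alignment computations and gives only a noisy estimate rather than a bound.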
Abstract:
The present challenge in drug discovery is to synthesize new compounds efficiently in minimal time. The trend is towards carefully designed and well-characterized compound libraries, because fast and effective synthesis methods easily produce thousands of new compounds. The need for rapid and reliable analysis methods increases at the same time. Quality assessment, including identification and purity tests, is highly important, since false (negative or positive) results, for instance in tests of biological activity or determination of early-ADME parameters in vitro (the pharmacokinetic study of drug absorption, distribution, metabolism, and excretion), must be avoided. This thesis summarizes the principles of classical planar chromatographic separation combined with ultraviolet (UV) and mass spectrometric (MS) detection, and introduces powerful, rapid, easy, and low-cost alternative tools and techniques for qualitative and quantitative analysis of small drug or drug-like molecules. High performance thin-layer chromatography (HPTLC) was introduced and evaluated for fast semi-quantitative assessment of the purity of synthesis target compounds. The HPTLC methods were compared with liquid chromatography (LC) methods. Electrospray ionization mass spectrometry (ESI MS) and atmospheric pressure matrix-assisted laser desorption/ionization MS (AP MALDI MS) were used to identify and confirm the product zones on the plate. AP MALDI MS was rapid and easy to carry out directly on the plate without scraping. The PLC method was used to isolate target compounds from crude synthesized products and purify them for bioactivity and preliminary ADME tests. Ultra-thin-layer chromatography (UTLC) with AP MALDI MS and desorption electrospray ionization mass spectrometry (DESI MS) was introduced and studied for the first time. Because of the thinner adsorbent layer, the monolithic UTLC plate provided 10 to 100 times better sensitivity in MALDI analysis than did HPTLC plates.
The limits of detection (LODs) down to the low picomole range were demonstrated for UTLC AP MALDI and UTLC DESI MS. In a comparison of AP and vacuum MALDI MS detection for UTLC plates, desorption from the irregular surface of the plates with the combination of an external AP MALDI ion source and an ion trap instrument provided clearly less variation in mass accuracy than did the vacuum MALDI time-of-flight (TOF) instrument. The performance of two-dimensional (2D) UTLC separation with AP MALDI MS was studied for the first time. The influence of the urine matrix on the separation and the repeatability was evaluated with benzodiazepines as model substances in human urine. The applicability of 2D UTLC AP MALDI MS was demonstrated in the detection of metabolites in an authentic urine sample.
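An LOD such as those reported above can be estimated from a calibration line. The sketch below uses the common 3.3·s/slope convention with made-up UTLC-MS calibration points; both the data and the choice of convention are assumptions for illustration, not values from the thesis:

```python
import numpy as np

# Hypothetical UTLC-MS calibration: analyte amount (pmol) vs. peak signal.
amount = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
signal = np.array([120.0, 238.0, 610.0, 1190.0, 2420.0])

slope, intercept = np.polyfit(amount, signal, 1)
residuals = signal - (slope * amount + intercept)
s = residuals.std(ddof=2)       # residual standard deviation of the linear fit
lod = 3.3 * s / slope           # ICH-style limit-of-detection estimate
print(round(lod, 2), "pmol")
```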
Abstract:
This doctoral thesis describes the development of a miniaturized capillary electrochromatography (CEC) technique suitable for the study of interactions between various nanodomains of biological importance. The particular focus of the study was low-density lipoprotein (LDL) particles and their interaction with components of the extracellular matrix (ECM). LDL transports cholesterol to the tissues through the blood circulation, but when the LDL level becomes too high, the particles begin to permeate and accumulate in the arteries. Through binding sites on apolipoprotein B-100 (apoB-100), LDL interacts with components of the ECM, such as proteoglycans (PGs) and collagen, in what is considered the key mechanism in the retention of lipoproteins and onset of atherosclerosis. Hydrolytic enzymes and oxidizing agents in the ECM may later successively degrade the LDL surface. Metabolic diseases such as diabetes may provoke damage of the ECM structure through the non-enzymatic reaction of glucose with collagen. In this work, fused silica capillaries of 50 micrometer i.d. were successfully coated with LDL and collagen, and steroids and apoB-100 peptide fragments were introduced as model compounds for interaction studies. The LDL coating was modified with copper sulphate or hydrolytic enzymes, and the interactions of steroids with the native and oxidized lipoproteins were studied. Lipids were also removed from the LDL particle coating, leaving behind an apoB-100 surface for further studies. The development of collagen and collagen-decorin coatings was helpful in elucidating the interactions of apoB-100 peptide fragments with the primary ECM component, collagen. Furthermore, the collagen I coating provided a good platform for glycation studies and for clarification of LDL interactions with native and modified collagen. All methods developed are inexpensive, requiring just small amounts of biomaterial.
Moreover, the experimental conditions in CEC are easily modified, and the analyses can be carried out in a reasonable time frame. Other techniques were employed to support and complement the CEC studies. Scanning electron microscopy and atomic force microscopy provided crucial visual information about the native and modified coatings. Asymmetrical flow field-flow fractionation enabled size measurements of the modified lipoproteins. Finally, the CEC results were exploited to develop new sensor chips for a continuous flow quartz crystal microbalance technique, which provided complementary information about LDL-ECM interactions. This thesis demonstrates the potential of CEC as a valuable and flexible technique for surface interaction studies. Further, CEC can serve as a novel microreactor for the in situ modification of LDL and collagen coatings. The coatings developed in this study provide useful platforms for a diversity of future investigations on biological nanodomains.
Abstract:
The transfer from aluminum to copper metallization and the decreasing feature size of integrated circuit devices generated a need for a new diffusion barrier process. Copper metallization comprised an entirely new process flow with new materials, such as low-k insulators and etch stoppers, which made diffusion barrier integration demanding. The atomic layer deposition (ALD) technique was seen as one of the most promising techniques for depositing the copper diffusion barrier of future devices. ALD was utilized to deposit titanium nitride, tungsten nitride, and tungsten nitride carbide diffusion barriers. Titanium nitride was deposited with a conventional process, and also with a new in situ reduction process in which titanium metal was used as the reducing agent. Tungsten nitride was deposited with a well-known process from tungsten hexafluoride and ammonia, but tungsten nitride carbide as a new material required a new process chemistry. In addition to material properties, the process integration for copper metallization was studied through compatibility experiments on different surface materials. Based on these studies, the titanium nitride and tungsten nitride processes were found to be incompatible with copper metal. However, the tungsten nitride carbide film was compatible with copper and exhibited the most promising properties for integration into the copper metallization scheme. Scale-up of the process to 300 mm wafers comprised extensive film uniformity studies, which improved understanding of the sources of non-uniformity in ALD growth and of the process-specific requirements for ALD reactor design. Based on these studies, it was discovered that the TiN process from titanium tetrachloride and ammonia required a perpendicular-flow reactor design for successful scale-up. The copper metallization scheme also includes copper oxide reduction prior to the barrier deposition and copper seed deposition prior to the copper metal deposition.
An easy and simple copper oxide reduction process was developed, in which the substrate was exposed to a gaseous reducing agent under vacuum at elevated temperature. Because the reduction was observed to be efficient enough to reduce a thick copper oxide film, the process was also considered an alternative method of making the copper seed film via copper oxide reduction.
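As a minimal illustration of the film-thickness bookkeeping behind such uniformity studies, the sketch below computes a nominal ALD film thickness from an assumed growth per cycle and a common wafer non-uniformity figure, (max - min) / (2 * mean). All numbers are hypothetical, not values from the thesis:

```python
# ALD thickness bookkeeping: thickness = growth per cycle (GPC) x cycles,
# and wafer-scale non-uniformity reported as (max - min) / (2 * mean).
gpc_nm = 0.025                          # assumed growth per cycle, nm/cycle
cycles = 400
nominal_nm = gpc_nm * cycles            # 10 nm target film

# Thickness measured at five points across a 300 mm wafer (made-up map).
points_nm = [10.1, 9.9, 10.0, 9.8, 10.2]
mean_nm = sum(points_nm) / len(points_nm)
non_uniformity = (max(points_nm) - min(points_nm)) / (2 * mean_nm)
print(round(nominal_nm, 1), "nm,", round(100 * non_uniformity, 1), "%")
```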
Abstract:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or to base the inference on the likelihood function alone, may be a fundamental question in theory; in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning can also be applied under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example.
The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
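The claim above that maximum likelihood coincides with the posterior mode under a flat prior can be checked directly in the simplest case, a Bernoulli sample with a flat Beta(1, 1) prior:

```python
import math

# Bernoulli example: with a flat Beta(1, 1) prior, the posterior mode for
# the success probability equals the maximum likelihood estimate k / n.
k, n = 7, 10

mle = k / n

# The posterior is Beta(k + 1, n - k + 1); its mode is
# (alpha - 1) / (alpha + beta - 2).
alpha, beta = k + 1, n - k + 1
posterior_mode = (alpha - 1) / (alpha + beta - 2)

assert math.isclose(mle, posterior_mode)
print(mle)  # 0.7
```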
Abstract:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of the biological sciences such as evolution and cell functioning. The field of genetics is currently developing rapidly because of recent advances in technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on modeling the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, the relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals.
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
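A minimal sketch of the kind of MCMC machinery used throughout can be given for a far simpler target than the pedigree model: a random-walk Metropolis sampler for a single allele frequency estimated from pooled counts (all numbers and tuning choices below are illustrative assumptions):

```python
import math
import random

def metropolis_allele_freq(x, m, steps=20000, seed=7):
    """Random-walk Metropolis for an allele frequency p, given x copies of
    an allele among m sampled chromosomes (binomial likelihood, uniform
    prior). Returns the posterior draws collected after burn-in."""
    rng = random.Random(seed)

    def log_post(p):
        # Log posterior up to a constant: binomial likelihood, flat prior.
        return x * math.log(p) + (m - x) * math.log(1 - p)

    p, draws = 0.5, []
    for step in range(steps):
        q = p + rng.gauss(0, 0.05)          # symmetric proposal
        if 0 < q < 1 and math.log(rng.random()) < log_post(q) - log_post(p):
            p = q                           # accept; otherwise keep current p
        if step >= steps // 4:              # discard first quarter as burn-in
            draws.append(p)
    return draws

draws = metropolis_allele_freq(x=30, m=100)
print(round(sum(draws) / len(draws), 2))    # near the Beta(31, 71) mean, ~0.30
```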
Abstract:
Active mobilization after repair of a flexor tendon injury of the finger has been shown to lead to a better functional outcome than the dynamic mobilization in common use today. The problem with active mobilization is an increased risk of repair failure, because current suture techniques are not strong enough. The strength of tendon repair has been improved by developing multistrand suture techniques, in which several parallel core sutures are placed in the tendon. Their clinical use is limited, however, by the complicated and time-consuming technique. Non-absorbable suture materials are generally used in flexor tendon repair of the hand. The bioabsorbable sutures currently available lose their strength too quickly relative to tendon healing. The tensile-strength half-life and tissue properties of bioabsorbable lactide stereocopolymer (PLDLA) 96/4 suture have previously been found suitable for flexor tendon repair. The aim of this study was to develop a flexor tendon repair method for the hand that withstands immediate active mobilization and is simple to perform, using bioabsorbable PLDLA 96/4 material. The biomechanical properties of five commonly used flexor tendon sutures were analyzed in static tensile testing to determine the effect of the structural properties of the core suture, namely (1) the number of strands (threads), (2) the thickness of the suture, and (3) the suture configuration, on the failure and strength of the tendon repair. Visible gapping of the repairs was found to begin when the peripheral suture failed at the yield point of the force-elongation curve. Increasing the number of core suture strands improved the holding capacity of the suture in the tendon and increased the yield force of the repair. In contrast, using a thicker (stronger) suture or changing the suture configuration did not affect the yield force.
Based on these results, the possibility of increasing the holding capacity of the suture in the tendon was investigated with a simple multistrand suture, in which the core suture was made with a three-strand polyester suture or a three-strand polyester suture with a tape-like structure. The tape-like structure significantly increased the holding capacity of the suture in the tendon, improving both the yield force and the maximum force. The strength of the repair exceeded the load imposed on a tendon repair by active mobilization. The suitability of PLDLA 96/4 suture for flexor tendon repair was assessed by studying its biomechanical properties and knot-holding properties in static tensile testing, in comparison with the braided polyester suture most commonly used in tendon repair (Ticron®). The PLDLA suture was found to be well suited for flexor tendon repair, as it stretches less than polyester suture and its knot security is better. In the final phase, the durability of a tendon repair made with a three-strand, tape-like tendon repair device manufactured from PLDLA 96/4 was studied in static tensile testing and in cyclic loading, which simulates the repetitive loading of mobilization better than static testing does. The strength of the PLDLA repair exceeded the strength required for active mobilization in both static and cyclic loading. Tape-like flat suture material has not previously been studied or used in flexor tendon repair of the hand. In this study, the tape-like structure of the suture material significantly improved the strength of the tendon repair, which is thought to result from the increased contact area between the tendon and the suture material, preventing the suture from cutting through the tendon. The strength of a tendon repair made with a three-strand, tape-like suture manufactured from bioabsorbable PLDLA material reached the level required for active mobilization.
In addition, the new method is easy to use and avoids the problems associated with the technically demanding execution of traditional multistrand sutures.
Abstract:
The main method of modifying the properties of semiconductors is to introduce small amounts of impurities into the material. This is used to control the magnetic and optical properties of materials and to realize p- and n-type semiconductors out of intrinsic material in order to manufacture fundamental components such as diodes. As diffusion can be described as random mixing of material due to the thermal movement of atoms, it is essential to know the diffusion behavior of the impurities in order to manufacture working components. In the modified radiotracer technique, diffusion is studied using radioactive isotopes of elements as tracers. The technique is called modified because the atoms are deployed inside the material by ion beam implantation. With ion implantation, a distinct distribution of impurities can be deployed beneath the sample surface with good control over the amount of implanted atoms. As the electromagnetic radiation and other nuclear decay products emitted by radioactive materials can be easily detected, only very small amounts of impurities are needed. This makes it possible to study diffusion in pure materials without essentially modifying the initial properties by doping. In this thesis a modified radiotracer technique is used to study the diffusion of beryllium in GaN, ZnO, SiGe, and glassy carbon. GaN, ZnO, and SiGe are of great interest to the semiconductor industry, and beryllium, as a small and possibly rapid dopant, has not been studied previously using the technique. Glassy carbon was included to demonstrate the feasibility of the technique. In addition, the diffusion of the magnetic impurities Mn and Co has been studied in GaAs and ZnO, respectively, with spintronic applications in mind.
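The diffusion coefficient extracted in such experiments follows from the Gaussian broadening of the implanted profile: annealing for a time t at fixed temperature adds 2Dt to the profile variance, so D can be read off from profile widths before and after the anneal. A sketch with hypothetical widths (none of these numbers come from the thesis):

```python
# Gaussian broadening of an implanted tracer profile: the variance grows by
# 2*D*t during an anneal of duration t at fixed temperature.
sigma_before_nm = 20.0      # as-implanted straggle, nm (hypothetical)
sigma_after_nm = 50.0       # profile width after annealing, nm (hypothetical)
t_s = 3600.0                # anneal time, s

d_nm2_per_s = (sigma_after_nm**2 - sigma_before_nm**2) / (2.0 * t_s)
d_cm2_per_s = d_nm2_per_s * 1e-14       # 1 nm^2 = 1e-14 cm^2
print(d_cm2_per_s)
```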
Abstract:
We present a measurement of the $WW+WZ$ production cross section observed in a final state consisting of an identified electron or muon, two jets, and missing transverse energy. The measurement is carried out in a data sample corresponding to up to 4.6~fb$^{-1}$ of integrated luminosity at $\sqrt{s} = 1.96$ TeV collected by the CDF II detector. Matrix element calculations are used to separate the diboson signal from the large backgrounds. The $WW+WZ$ cross section is measured to be $17.4\pm3.3$~pb, in agreement with standard model predictions. A fit to the dijet invariant mass spectrum yields a compatible cross section measurement.
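Schematically, a counting-based cross section follows from sigma = N_signal / (efficiency x integrated luminosity). The numbers below are purely illustrative and are not CDF's actual event counts or efficiencies:

```python
# Counting-experiment cross section: sigma = N_signal / (efficiency x
# integrated luminosity). All inputs below are invented for illustration.
n_observed = 1500.0
n_background = 700.0
efficiency = 0.01           # assumed signal acceptance x selection efficiency
luminosity_fb = 4.6         # integrated luminosity, fb^-1

n_signal = n_observed - n_background
sigma_fb = n_signal / (efficiency * luminosity_fb)
sigma_pb = sigma_fb / 1000.0    # 1 pb = 1000 fb
print(round(sigma_pb, 1))       # 17.4
```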
Abstract:
We present a measurement of the top quark mass in the all-hadronic channel ($t\bar{t} \to b\bar{b}q_{1}\bar{q}_{2}q_{3}\bar{q}_{4}$) using 943 pb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV collected with the CDF II detector at Fermilab. We apply the standard model production and decay matrix element (ME) to $t\bar{t}$ candidate events. We calculate per-event probability densities according to the ME calculation and construct template models of signal and background. The scale of the jet energy is calibrated using additional templates formed with the invariant mass of pairs of jets. These templates form an overall likelihood function that depends on the top quark mass and on the jet energy scale (JES). We estimate both by maximizing this function. Given 72 observed events, we measure a top quark mass of 171.1 $\pm$ 3.7 (stat.+JES) $\pm$ 2.1 (syst.) GeV/$c^{2}$. The combined uncertainty on the top quark mass is 4.3 GeV/$c^{2}$.
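A toy version of the template-likelihood idea can be sketched with one parameter and a Gaussian resolution model (the actual analysis fits mass and JES simultaneously with ME-based templates; resolution and sample size below are illustrative assumptions):

```python
import random

# Toy template method: each event's reconstructed mass is modeled as
# Gaussian around the true top-quark mass; a likelihood is scanned over
# mass hypotheses.
rng = random.Random(42)
true_mass, resolution = 171.0, 12.0
events = [rng.gauss(true_mass, resolution) for _ in range(500)]

def log_likelihood(mass):
    # Gaussian log-likelihood, dropping mass-independent constants.
    return sum(-0.5 * ((e - mass) / resolution) ** 2 for e in events)

grid = [150.0 + 0.5 * i for i in range(101)]    # 150 to 200 GeV/c^2 in 0.5 steps
best = max(grid, key=log_likelihood)
print(best)     # lands near the true mass of 171
```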
Abstract:
In the thesis we consider inference for cointegration in vector autoregressive (VAR) models. The thesis consists of an introduction and four papers. The first paper proposes a new test for cointegration in VAR models that is directly based on the eigenvalues of the least squares (LS) estimate of the autoregressive matrix. In the second paper we compare a small-sample correction for the likelihood ratio (LR) test of cointegrating rank with the bootstrap. The simulation experiments show that the bootstrap works very well in practice and dominates the correction factor. The tests are applied to international stock price data, and the finite-sample performance of the tests is investigated by simulating the data. The third paper studies the demand for money in Sweden in 1970–2000 using the I(2) model. In the fourth paper we re-examine the evidence of cointegration between international stock prices. The paper shows that some of the previous empirical results can be explained by the small-sample bias and size distortion of Johansen's LR tests for cointegration. In all papers we work with two data sets. The first data set is a Swedish money demand data set with observations on the money stock, the consumer price index, gross domestic product (GDP), the short-term interest rate and the long-term interest rate. The data are quarterly and the sample period is 1970(1)–2000(1). The second data set consists of month-end stock market index observations for Finland, France, Germany, Sweden, the United Kingdom and the United States from 1980(1) to 1997(2). Both data sets are typical of the sample sizes encountered in economic data, and the applications illustrate the usefulness of the models and tests discussed in the thesis.
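The eigenvalue idea behind the first paper can be illustrated with a toy simulation (this is not the paper's test statistic, just the underlying intuition): fit a VAR(1) by least squares and inspect the moduli of the eigenvalues of the coefficient matrix; two cointegrated series share one stochastic trend, so one root sits near one and the other well below it.

```python
import numpy as np

# Two cointegrated series sharing a single stochastic trend.
rng = np.random.default_rng(0)
n_obs = 500
y1 = np.cumsum(rng.normal(size=n_obs))       # random walk (the shared trend)
y2 = 0.5 * y1 + rng.normal(size=n_obs)       # cointegrated with y1
y = np.column_stack([y1, y2])

# Least squares estimate of A in y_t = A y_{t-1} + e_t.
x, z = y[:-1], y[1:]
a_hat = np.linalg.lstsq(x, z, rcond=None)[0].T
moduli = np.sort(np.abs(np.linalg.eigvals(a_hat)))
print(moduli)   # one root near 1 (the unit root), one well below 1
```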