38 results for Images - Computational methods


Relevance:

80.00%

Publisher:

Abstract:

Protein conformations and dynamics can be studied by nuclear magnetic resonance spectroscopy using dilute liquid crystalline samples. This work clarifies the interpretation of the residual dipolar coupling data yielded by such experiments. It was discovered that unfolded proteins without any additional structure beyond that of a mere polypeptide chain exhibit residual dipolar couplings. It was also found that molecular dynamics induce fluctuations in the molecular alignment and thereby affect residual dipolar couplings; this finding clarified the origins of the low order parameter values observed earlier. The work required the development of new analytical and computational methods for predicting intrinsic residual dipolar coupling profiles of unfolded proteins. The presented characteristic chain model reproduces the general trend of experimental residual dipolar couplings for denatured proteins. The details of experimental residual dipolar coupling profiles are beyond the analytical model, but improvements are proposed to achieve greater accuracy. A computational method for the rapid prediction of unfolded protein residual dipolar couplings was also developed. Protein dynamics were shown to modulate the effective molecular alignment in a dilute liquid crystalline medium. The effects were investigated using experimental and molecular dynamics generated conformational ensembles of folded proteins. Dynamics-induced alignment was noted to be significant especially for the interpretation of molecular dynamics in small, globular proteins, and a method of correction was presented. Residual dipolar couplings offer an attractive possibility for the direct observation of protein conformational preferences and dynamics. The presented models and methods of analysis provide significant advances in the interpretation of residual dipolar coupling data from proteins.
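
As a toy illustration of the quantity involved (not the analytical chain model developed in the work), the ensemble-averaged residual dipolar coupling for a set of bond-vector orientations under a uniaxial alignment director can be sketched as follows. The function names and the nominal N-H static coupling value are illustrative assumptions:

```python
import numpy as np

def p2(cos_theta):
    """Second Legendre polynomial, the angular part of the dipolar coupling."""
    return 0.5 * (3.0 * cos_theta ** 2 - 1.0)

def ensemble_rdc(bond_vectors, director=(0.0, 0.0, 1.0), d_max=21700.0):
    """Ensemble-averaged RDC (Hz): D = d_max * <P2(cos theta)>, where theta
    is the angle between each bond vector and the alignment director, and
    d_max is a nominal static N-H dipolar coupling."""
    v = np.asarray(bond_vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    n = np.asarray(director, dtype=float)
    n = n / np.linalg.norm(n)
    return float(d_max * p2(v @ n).mean())
```

A bond parallel to the director gives +d_max, a perpendicular one gives -d_max/2, and an isotropic ensemble averages to zero, which is why RDCs report both on alignment and on the dynamics that modulate it.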

Relevance:

80.00%

Publisher:

Abstract:

For efficient fusion energy production, the plasma-facing wall materials of a fusion reactor should allow long-term operation. In the next-step fusion device, ITER, the first-wall region facing the highest heat and particle load, i.e. the divertor area, will mainly consist of tiles based on tungsten. During reactor operation, the tungsten material is slowly but inevitably saturated with tritium, the relatively short-lived hydrogen isotope used in the fusion reaction. The amount of tritium retained in the wall materials should be minimized and its recycling back to the plasma must be unrestrained; otherwise it cannot be used for fueling the plasma. Replacing the first walls frequently would be very expensive and thus economically not viable. A better solution is to heat the walls to temperatures at which tritium is released. Unfortunately, the exact mechanisms of hydrogen release from tungsten are not known. In this thesis both experimental and computational methods have been used to study the release and retention of hydrogen in tungsten. The experimental work consists of hydrogen implantations into pure polycrystalline tungsten, determination of the hydrogen concentrations using ion beam analyses (IBA), and monitoring of the out-diffusing hydrogen gas with thermodesorption spectrometry (TDS) as the tungsten samples are heated to elevated temperatures. Combining IBA methods with TDS yields both the retained amount of hydrogen and the temperatures needed for hydrogen release. With computational methods, the hydrogen-defect interactions and implantation-induced irradiation damage can be examined at the atomic level. Multiscale modelling combines the results obtained from computational methodologies applicable at different length and time scales.
Electron density functional theory calculations were used to determine the energetics of the elementary processes of hydrogen in tungsten, such as diffusion and trapping at vacancies and surfaces. Results on the energetics of pure tungsten defects were used in the development of a classical bond-order potential describing tungsten defects for use in molecular dynamics simulations. The developed potential was utilized to determine defect clustering and annihilation properties. These results were further employed in binary collision and rate theory calculations to determine the evolution of the large defect clusters that trap hydrogen in the course of implantation. The computational results for the defect and trapped-hydrogen concentrations compared well with the experimental results. With this multiscale analysis, the experimental results both within this thesis and found in the literature were explained quantitatively and qualitatively.
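
The TDS measurement described above can be illustrated with a minimal first-order desorption model (a toy sketch, not the rate-theory machinery used in the thesis): hydrogen escapes a single trap of energy `e_trap` with attempt frequency `nu` while the sample temperature is ramped linearly at `beta` K/s. All parameter values below are illustrative assumptions:

```python
import math

def tds_peak_temperature(e_trap=1.0, nu=1e13, beta=1.0,
                         t_start=300.0, t_end=500.0, d_temp=0.01):
    """Temperature (K) at which the desorption rate peaks for first-order
    release from a single trap during a linear temperature ramp."""
    k_b = 8.617e-5                  # Boltzmann constant, eV/K
    theta = 1.0                     # trapped-hydrogen occupancy
    t, best_t, best_rate = t_start, t_start, 0.0
    while t < t_end and theta > 1e-12:
        rate = nu * theta * math.exp(-e_trap / (k_b * t))  # per second
        if rate > best_rate:
            best_rate, best_t = rate, t
        theta = max(theta - rate * d_temp / beta, 0.0)     # dt = dT / beta
        t += d_temp
    return best_t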

Relevance:

80.00%

Publisher:

Abstract:

The 11β-hydroxysteroid dehydrogenase enzymes (11β-HSD) 1 and 2 regulate the amounts of cortisone and cortisol in tissues. An excess of 11β-HSD1, especially in visceral adipose tissue, causes the classic symptoms of the metabolic syndrome, which makes selective inhibition of 11β-HSD1 a potential treatment for the syndrome. Inhibition of 11β-HSD2, in contrast, causes cortisol-mediated activation of mineralocorticoid receptors, which leads to hypertensive side effects. Despite these side effects, inhibition of 11β-HSD2 may be useful when the aim is to raise cortisol levels in the body. Numerous selective 11β-HSD1 inhibitors have been developed, but fewer 11β-HSD2 inhibitors have been reported. The difference between the active sites of the two isozymes is also unknown, which complicates the development of inhibitors selective for either enzyme. This work had two aims: (1) to find a difference between the 11β-HSD enzymes, and (2) to develop a pharmacophore model that could be used for virtual screening of selective 11β-HSD2 inhibitors. The problem was approached computationally: by homology modelling, docking of small molecules into the proteins, ligand-based pharmacophore modelling, and virtual screening. The homology model was built with SwissModeler, and the resulting model superimposed well both on its template (17β-HSD1) and on 11β-HSD1. No difference between the enzymes was found by inspecting the superimposed structures. Seven compounds, six of which are 11β-HSD2-selective, were docked into both enzymes using GOLD. In 11β-HSD1 the compounds bound like most 11β-HSD1-selective or non-selective inhibitors, whereas in 11β-HSD2 all of the compounds docked in an inverted orientation.
This binding mode enables hydrogen bonds to Ser310 and Asn171, residues that are present only in 11β-HSD2. Pharmacophore modelling was carried out with LigandScout 3.0, which was also used to run the virtual screenings. The two pharmacophore models, based on the six 11β-HSD2-selective compounds also used in the docking, consisted of six features (hydrogen-bond acceptor, hydrogen-bond donor and hydrophobic) together with exclusion volumes. The features most important for 11β-HSD2 selectivity are a hydrogen-bond acceptor that can form a bond with Ser310 and a hydrogen-bond donor next to it. No interaction partner for this hydrogen-bond donor was found in the 11β-HSD2 model; however, a suitably oriented water molecule in the protein could account for the missing partner. Because both pharmacophore models retrieved 11β-HSD2-selective compounds and excluded non-selective ones in a test screening, both models were used to screen a database of 2700 compounds held at the University of Innsbruck. From the hits of the two screenings, a total of ten compounds were selected and sent for biological testing. The results of the biological tests will ultimately establish how well the created models actually capture 11β-HSD2 selectivity.

Relevance:

80.00%

Publisher:

Abstract:

Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements for studying the behaviour of cells. The scientific community has produced an enormous and constantly growing collection of gene expression data from various human cells, both in healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges, to construct an integrated database of the human transcriptome, and to demonstrate its usage. The methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded with systematic vocabularies. No single existing biomedical ontology or vocabulary was suitable for this purpose, so new annotation terminology was developed as part of this work. Second, mathematical methods were developed to correct the noise and the systematic differences in the data caused by the various array generations. Additionally, suitable computational methods had to be developed for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyze gene expression levels and putative functional associations of human genes using the integrated gene expression data.
A method was also developed to interpret individual gene expression profiles against all the healthy and pathological tissues of the reference database. As a result of this work, 9783 human gene expression samples measured on Affymetrix microarrays were integrated to form a unique human transcriptome resource, GeneSapiens. This makes it possible to analyse the expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Applying this resource to interpret individual gene expression measurements allowed identification of the tissue of origin with 92.0% accuracy among 44 healthy tissue types. A systematic analysis of the transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types, together with a genome-wide analysis of kinase gene co-expression networks. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to analyse individual patient samples and pinpoint gene- and pathway-specific changes in the test sample relative to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies demonstrate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets, as well as to facilitate the interpretation of new molecular profiling data from individual patients.
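
The tissue-of-origin idea can be illustrated with a minimal nearest-centroid sketch (not the actual GeneSapiens classifier): a sample is assigned to the tissue whose mean expression profile it correlates with best. The function name and the toy profiles are hypothetical:

```python
import numpy as np

def nearest_centroid_tissue(train_profiles, train_labels, sample):
    """Assign a sample to the tissue whose mean expression profile it
    correlates with best (Pearson correlation)."""
    best_label, best_r = None, -2.0
    for label in sorted(set(train_labels)):
        centroid = np.mean(
            [p for p, l in zip(train_profiles, train_labels) if l == label],
            axis=0)
        r = np.corrcoef(centroid, sample)[0, 1]
        if r > best_r:
            best_label, best_r = label, r
    return best_label
```

Correlation-based matching has the convenient property of being insensitive to overall scaling differences between arrays, one of the systematic effects such an integration effort must handle.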

Relevance:

80.00%

Publisher:

Abstract:

A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. 
Furthermore, in the case of directed rooted trees, reducing a k-colouring to a 3-colouring requires at least log* k + 1 rounds for some values of k and is possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable to capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.
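
The colour-reduction flavour of these results can be illustrated with one round of the classic Cole-Vishkin technique on a directed cycle, where each node replaces its colour by the position and value of the lowest bit distinguishing it from its predecessor's colour. This is a standard textbook sketch, not the ½(log* k + 3)-round algorithm of the thesis:

```python
def cv_step(colors):
    """One Cole-Vishkin colour-reduction round on a directed cycle.
    Each node keeps only the index and value of the lowest bit in which
    its colour differs from its predecessor's colour."""
    n = len(colors)
    new = []
    for v in range(n):
        own, pred = colors[v], colors[(v - 1) % n]
        diff = own ^ pred                    # nonzero for a proper colouring
        i = (diff & -diff).bit_length() - 1  # lowest differing bit position
        new.append(2 * i + ((own >> i) & 1))
    return new

def is_proper(colors):
    """Check that consecutive nodes on the cycle get different colours."""
    n = len(colors)
    return all(colors[v] != colors[(v + 1) % n] for v in range(n))
```

Each round roughly halves the bit length of the colours while keeping the colouring proper, which is where the log* k round complexity comes from; on a small even cycle a single round can already yield a proper 2-colouring.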

Relevance:

80.00%

Publisher:

Abstract:

The human gastrointestinal (GI) microbiota is a complex ecosystem that lives in symbiosis with its host. Growing awareness of the importance of the microbiota to the host, together with the development of culture-free laboratory techniques and computational methods, has enormously expanded our knowledge of this microbial community. Irritable bowel syndrome (IBS) is a common functional bowel disorder affecting up to a fifth of the Western population. To date, IBS diagnosis has been based on GI symptoms and the exclusion of organic diseases. The GI microbiota has been found to be altered in this syndrome, and probiotics can alleviate the symptoms, although clear links between the symptoms and the microbiota have not been demonstrated. The aim of the present work was to characterise IBS-related alterations in the intestinal microbiota, their relation to IBS symptoms, and their responsiveness to probiotic therapy. In this thesis research, the healthy human microbiota was characterised by cloning and sequencing 16S rRNA genes from a faecal microbial community DNA pool that was first profiled and fractionated according to its guanine and cytosine content (%G+C). The most noticeable finding was that the high G+C Gram-positive bacteria (the phylum Actinobacteria) were more abundant than in a corresponding library constructed from the unfractionated DNA pool. Previous molecular analyses of the gut microbiota have likewise shown comparatively low amounts of high G+C bacteria. Furthermore, the %G+C profiling approach was applied to a sample of pooled faecal DNA from diarrhea-predominant IBS (IBS-D) subjects. A phylogenetic comparison of the microbial communities in the healthy and IBS-D sequence libraries revealed that the IBS-D sample was rich in representatives of the phyla Firmicutes and Proteobacteria, whereas Actinobacteria and Bacteroidetes were abundant in the healthy subjects.
The family Lachnospiraceae within the Firmicutes was especially prevalent in the IBS-D sample. Moreover, associations of the GI microbiota with intestinal symptoms and quality of life (QOL) were investigated, as well as the effect of probiotics on these factors. The microbial targets analysed with quantitative real-time polymerase chain reaction (qPCR) in this study were phylotypes (species defined by 16S rRNA gene sequence similarity) previously associated with either health or IBS. In one set of samples, the presence and abundance of a phylotype with 94% 16S rRNA gene sequence similarity to Ruminococcus torques (R. torques 94%) was shown to be associated with the severity of IBS symptoms. The qPCR assays for selected phylotypes were also applied to samples from a six-month probiotic intervention with a mixture of Lactobacillus rhamnosus GG, L. rhamnosus Lc705, Propionibacterium freudenreichii ssp. shermanii JS and Bifidobacterium breve Bb99. The intervention had previously been reported to alleviate IBS symptoms, but no associations with the analysed microbiota representatives had been shown. With the phylotype-specific assays applied here, however, the abundance of the R. torques 94% phylotype was shown to be lowered in the probiotic-receiving group during the supplementation, whereas a Clostridium thermosuccinogenes 85% phylotype, previously associated with a healthy microbiota, was increased compared to the placebo group. To conclude, with the combination of methods applied, a higher abundance of Actinobacteria was detected in the healthy gut than in previous studies, and significant phylum-level microbiota alterations could be shown in IBS-D. The results of this study thus provide a detailed overview of the human GI microbiota in healthy subjects and in subjects with IBS.
Furthermore, the IBS symptoms were linked to a particular clostridial phylotype, and probiotic supplementation was demonstrated to alter the GI microbiota towards a healthier state with regard to this and an additional bacterial phylotype. For the first time, distinct phylotype-level alterations in the microbiota were linked to IBS symptoms and shown to respond to probiotic therapy.
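
The qPCR quantification used for such phylotype assays can be illustrated by the standard ΔCt calculation: assuming ideal two-fold amplification per cycle, a target that crosses the detection threshold ΔCt cycles after a universal (total-bacteria) assay is 2^-ΔCt as abundant. This is a generic sketch, and the function name is illustrative:

```python
def relative_abundance(ct_target, ct_universal):
    """Abundance of a target phylotype relative to total bacteria from qPCR
    threshold cycles, assuming ideal two-fold amplification per cycle."""
    return 2.0 ** -(ct_target - ct_universal)
```

For example, a phylotype detected 5 cycles later than the universal assay makes up roughly 1/32 of the community under this idealized efficiency assumption; real assays calibrate against standard curves instead.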

Relevance:

40.00%

Publisher:

Abstract:

Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs results that are interpretable, and what counts as interpretable in data mining can be very different from what counts as interpretable in linear algebra. The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability, since the factor matrices are of the same type as the original matrix, and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Several other decomposition methods are also described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
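
A minimal sketch of the Boolean matrix product underlying such decompositions (illustrative code, not the thesis algorithms): an entry of B ∘ C is 1 exactly when some k has B[i,k] = C[k,j] = 1.

```python
import numpy as np

def boolean_matmul(b, c):
    """Boolean matrix product: (B o C)[i, j] = OR_k (B[i, k] AND C[k, j])."""
    b = np.asarray(b, dtype=int)
    c = np.asarray(c, dtype=int)
    return ((b @ c) > 0).astype(int)

def reconstruction_error(a, b, c):
    """Number of entries where the Boolean product B o C differs from A."""
    return int(np.sum(np.asarray(a, dtype=int) != boolean_matmul(b, c)))
```

Notably, the Boolean rank of a binary matrix can be smaller than its real rank: for instance, the 3x3 matrix [[1,1,0],[1,1,1],[0,1,1]] has real rank 3 but factors exactly into 3x2 and 2x3 binary matrices under Boolean multiplication.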

Relevance:

40.00%

Publisher:

Abstract:

The paradigm of computational vision hypothesizes that any visual function, such as the recognition of your grandparent, can be replicated by computational processing of the visual input. What are the computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, in which the suitable computations are learned from natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and from the constraints and objectives specified for the learning process. This thesis consists of an introduction and seven peer-reviewed publications, where the purpose of the introduction is to present the area of study to a reader who is not familiar with computational vision research. In the introduction, we briefly overview the primary challenges of visual processing and recall some current views on visual processing in the early visual systems of animals. Next, we describe the methodology used in our research and discuss the presented results; this discussion includes some additional remarks, speculations and conclusions that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast are processed separately in natural systems because of their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence.
Further, we provide the first reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent-orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response-energy optimization. Then, we show that attempting to extract independent components of the nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable to priming, i.e. the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selected from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.
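
The sparseness objectives mentioned above can be given a one-line illustration (a toy sketch with a fixed random seed, unrelated to the publications' estimation procedures): for independent sparse sources, projections onto the source axes have higher excess kurtosis, a standard sparseness measure in projection pursuit, than oblique projections.

```python
import numpy as np

def kurtosis(u):
    """Excess kurtosis, a common sparseness measure in projection pursuit."""
    u = u - u.mean()
    return float(np.mean(u ** 4) / np.mean(u ** 2) ** 2 - 3.0)

rng = np.random.default_rng(0)
x = rng.laplace(size=(100_000, 2))        # two independent sparse "sources"
axis = x[:, 0]                            # projection onto a source axis
mixed = (x[:, 0] + x[:, 1]) / np.sqrt(2)  # oblique 45-degree projection
```

The axis projection keeps the heavy-tailed Laplacian distribution (excess kurtosis near 3), while the mixture is closer to Gaussian (near 1.5), so maximizing kurtosis over projection directions recovers the independent source axes.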

Relevance:

30.00%

Publisher:

Abstract:

Environmentally benign and economical methods for the preparation of industrially important hydroxy acids and diacids were developed. The carboxylic acids, used in polyesters, alkyd resins, and polyamides, were obtained by the oxidation of the corresponding alcohols with hydrogen peroxide or air, catalyzed by sodium tungstate or supported noble metals. These oxidations were carried out using water as a solvent. The alcohols are also a useful alternative to the conventional reactants, hydroxyaldehydes and cycloalkanes. The oxidation of 2,2-disubstituted propane-1,3-diols with hydrogen peroxide catalyzed by sodium tungstate afforded 2,2-disubstituted 3-hydroxypropanoic acids and 1,1-disubstituted ethane-1,2-diols as products. A computational study of the Baeyer-Villiger rearrangement of the intermediate 2,2-disubstituted 3-hydroxypropanals gave in-depth data on the mechanism of the reaction. Linear primary diols with chain lengths of at least six carbons were easily oxidized with hydrogen peroxide to linear dicarboxylic acids, catalyzed by sodium tungstate. The Pt/C catalyzed air oxidation of 2,2-disubstituted propane-1,3-diols and linear primary diols afforded the highest yield of the corresponding hydroxy acids, while the Pt, Bi/C catalyzed oxidation of the diols afforded the highest yield of the corresponding diacids. The mechanism of the promoted oxidation was best described by the ensemble effect, and by the formation of a complex of the hydroxy and carboxy groups of the hydroxy acids with bismuth atoms. The Pt, Bi/C catalyzed air oxidation of 2-substituted 2-hydroxymethylpropane-1,3-diols gave 2-substituted malonic acids by the decarboxylation of the corresponding triacids. Activated carbon was the best support and bismuth the most efficient promoter in the air oxidation of 2,2-dialkylpropane-1,3-diols to diacids. In oxidations carried out in organic solvents, barium sulfate could be a valuable alternative to activated carbon as a non-flammable support.
In the Pt/C catalyzed air oxidation of 2,2-disubstituted propane-1,3-diols to 2,2-disubstituted 3-hydroxypropanoic acids, the small size of the 2-substituents enhanced the rate of the oxidation. When the potential of the platinum catalyst was not controlled, the highest yield of the diacids in the Pt, Bi/C catalyzed air oxidation of 2,2-dialkylpropane-1,3-diols was obtained in the mass-transfer regime. The most favorable pH of the reaction mixture in the promoted oxidation was 10. A reaction temperature of 40°C prevented the decarboxylation of the diacids.

Relevance:

30.00%

Publisher:

Abstract:

The chemical and physical properties of bimetallic clusters have attracted considerable attention due to the potential technological applications of mixed-metal systems. Clusters are of fundamental interest because they form the link between atomic, surface, and bulk properties, and can thus reveal more about the metal-metal bond in small systems. The studies in this thesis focus on two different kinds of bimetallic clusters: clusters containing unusually shaped all-metal four-membered rings, and a series of sodium auride clusters. As described in most general organic chemistry textbooks, a group of compounds is classified as aromatic because of their remarkable stability and their particular geometric and energetic properties. The notion of aromaticity is essentially qualitative. More recently, connections have been made between aromaticity and energetic and magnetic properties, and discussions of the aromatic nature of molecular rings are no longer limited to organic compounds obeying Hückel's rule. In our research, we mainly applied the GIMIC method to several bimetallic clusters at the CCSD level and compared the results with those obtained using chemical-shift-based methods. The magnetically induced ring currents are readily obtained with the GIMIC method, and the nature of the aromaticity of each system can therefore be clarified. We also performed intensive quantum chemical calculations to explore the character of anionic sodium auride clusters and the corresponding neutral clusters, since molecules containing gold atoms are fascinating to investigate because of gold's distinctive physical and chemical properties. Like small gold clusters, the sodium auride clusters appear to form planar structures. With the addition of a negative charge, the gold atom in the anionic clusters prefers to carry the charge and orients itself away from other gold atoms.
As a result, the energetically lowest isomer of an anionic cluster differs from that of the corresponding neutral cluster. Most importantly, we presented a comprehensive strategy for ab initio calculations to computationally reproduce the experimental photoelectron spectra.

Relevance:

30.00%

Publisher:

Abstract:

In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases and regression analysis with partially observed data, with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts and to noise reduction in digital signals.
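
The connection between Bayesian network classifiers and logistic regression can be sketched for the simplest such classifier, a Bernoulli naive Bayes model: its posterior log-odds are linear in the inputs, so the posterior can be rewritten exactly as a logistic regression whose weights are derived from the naive Bayes parameters. The function and parameter names below are illustrative, not the Thesis's formulation:

```python
import math

def bernoulli_nb_posterior(prior1, p1, p0, x):
    """P(y=1|x) for a Bernoulli naive Bayes model, computed two ways:
    directly by Bayes' rule, and as sigmoid(w.x + b) with logistic-regression
    weights derived from the NB parameters. The two values coincide."""
    def likelihood(p):
        out = 1.0
        for pi, xi in zip(p, x):
            out *= pi if xi else (1.0 - pi)
        return out
    # direct Bayes' rule
    num = prior1 * likelihood(p1)
    posterior = num / (num + (1.0 - prior1) * likelihood(p0))
    # equivalent logistic-regression form: log-odds are linear in x
    b = math.log(prior1 / (1.0 - prior1)) + sum(
        math.log((1.0 - a) / (1.0 - c)) for a, c in zip(p1, p0))
    w = [math.log(a / c) - math.log((1.0 - a) / (1.0 - c))
         for a, c in zip(p1, p0)]
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    posterior_lr = 1.0 / (1.0 + math.exp(-z))
    return posterior, posterior_lr
```

The two models differ in how the weights are estimated (generatively from class-conditional frequencies versus discriminatively by maximizing conditional likelihood), but the functional form of the classifier is the same.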

Relevance:

30.00%

Publisher:

Abstract:

This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. These problems belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical for high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. The problem is important because direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify; haplotypes are the key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem: HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population; thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes, so it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability.
BACH is the most accurate method presented in this thesis, and its performance is comparable to the best available haplotype inference software. Local alignment significance is a computational problem where one is interested in whether the local similarities between two sequences are due to the sequences being related or just due to chance. The similarity of sequences is measured by their best local alignment score, from which a p-value is computed. This p-value is the probability of picking two sequences from the null model that have an equally good or better best local alignment score. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
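
As background for the terms used above, here is a minimal sketch of a best local alignment score (Smith-Waterman with a linear gap penalty) and a Karlin-Altschul style p-value; the scoring and statistical parameters are illustrative assumptions, not the thesis's framework:

```python
import math

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,                      # restart: local alignment
                          h[i - 1][j - 1] + s,    # match/mismatch
                          h[i - 1][j] + gap,      # deletion
                          h[i][j - 1] + gap)      # insertion
            best = max(best, h[i][j])
    return best

def gumbel_pvalue(score, m, n, k=0.1, lam=0.5):
    """Karlin-Altschul style tail: P(S >= s) = 1 - exp(-K m n e^(-lambda s)),
    with illustrative parameter values K and lambda."""
    return 1.0 - math.exp(-k * m * n * math.exp(-lam * score))
```

For ungapped alignments this extreme-value form is exact asymptotically; for gapped scores the parameters are usually estimated empirically, which is where the sampling and curve-fitting issues mentioned above arise.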