5 results for Database application, Cell biology, Image retrieval
in Helda - Digital Repository of the University of Helsinki
Abstract:
The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. The so-called query-by-humming systems are a famous instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then give a literature review of traditional (e.g., string-matching-based) MIR algorithms and of novel techniques based on geometry. We also introduce some concepts from digital image processing, namely mathematical morphology, which we use to develop and implement four algorithms for geometric music retrieval. The symbolic representation in the case of our algorithms is a binary 2-D image. We use various morphological pre- and post-processing operations on the query and the database images to perform template matching / pattern recognition on the images. The algorithms are essentially extensions of the classic image correlation and hit-or-miss transformation techniques widely used in template matching applications. They aim to be a future extension to the retrieval engine of C-BRAHMS, a research project of the Department of Computer Science at the University of Helsinki.
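As a rough illustration of the morphological approach described above, the sketch below encodes note events as points in a binary (pitch, time) image and finds translated occurrences of a query with plain binary erosion. It is a minimal stand-in for the correlation and hit-or-miss algorithms the abstract refers to, not an implementation of them; the toy data are invented.

```python
import numpy as np
from scipy.ndimage import binary_erosion

# Toy piano-roll images: rows = pitch, columns = time, True = note event.
database = np.zeros((12, 16), dtype=bool)
database[[2, 4, 5, 4, 2], [1, 2, 3, 4, 5]] = True   # a short melodic figure
database[[7, 9, 10], [9, 10, 11]] = True            # unrelated material

query = np.zeros((4, 5), dtype=bool)
query[[0, 2, 3, 2, 0], [0, 1, 2, 3, 4]] = True      # same contour as the figure

# Erosion by the query acts as template matching: an output pixel is set only
# where every note of the translated query is present in the database image.
# (A hit-or-miss variant would additionally forbid extra notes via a second
# structuring element for the background.)
matches = binary_erosion(database, structure=query)

for pitch, time in zip(*np.nonzero(matches)):
    print(f"match with query centred at pitch row {pitch}, time column {time}")
```

Because pitch transposition and time shift are both plain translations of the binary image, a single erosion enumerates all transposed and time-shifted occurrences at once.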
Abstract:
Drug Analysis without Primary Reference Standards: Application of LC-TOFMS and LC-CLND to Biofluids and Seized Material

Primary reference standards for new drugs, metabolites, designer drugs or rare substances may not be obtainable within a reasonable period of time, or their availability may be hindered by extensive administrative requirements. Standards are usually costly and may have a limited shelf life. Finally, many compounds are not commercially available, and some are not available at all. A new approach within forensic and clinical drug analysis involves substance identification based on accurate mass measurement by liquid chromatography coupled with time-of-flight mass spectrometry (LC-TOFMS) and quantification by LC coupled with chemiluminescence nitrogen detection (LC-CLND), which possesses an equimolar response to nitrogen. Formula-based identification relies on the fact that the accurate mass of an ion from a chemical compound corresponds to the elemental composition of that compound. Single-calibrant nitrogen-based quantification is feasible with a nitrogen-specific detector, since approximately 90% of drugs contain nitrogen. A method was developed for toxicological drug screening in 1 ml urine samples by LC-TOFMS. A large target database of exact monoisotopic masses was constructed, representing the elemental formulae of reference drugs and their metabolites. Identification was based on matching the sample component's measured parameters with those in the database, including accurate mass and retention time, if available. In addition, an algorithm for isotopic pattern matching (SigmaFit) was applied. Differences in ion abundance in urine extracts did not affect the mass accuracy or the SigmaFit values. For routine screening practice, a mass tolerance of 10 ppm and a SigmaFit tolerance of 0.03 were established. Seized street-drug samples were analysed directly by LC-TOFMS and LC-CLND using a dilute-and-shoot approach. In the quantitative analysis of amphetamine, heroin and cocaine findings, the mean relative difference between the results of LC-CLND and the reference methods was only 11%. In blood specimens, liquid-liquid extraction recoveries for basic lipophilic drugs were first established, and the validity of the generic extraction-recovery-corrected single-calibrant LC-CLND was then verified with proficiency test samples. The mean accuracy was 24% and 17% for plasma and whole blood samples, respectively, with all results falling within the confidence range of the reference concentrations. Further, metabolic ratios for the opioid drug tramadol were determined in a pharmacogenetic study setting. Extraction-recovery estimation, based on model compounds with similar physicochemical characteristics, produced clinically feasible results without reference standards.
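The sketch below illustrates the two quantitative ideas in this abstract: formula-based identification as a lookup of exact monoisotopic masses within a ppm tolerance, and single-calibrant CLND quantification via the equimolar nitrogen response. The compound list, masses and calibration constant are illustrative only; this is not the thesis's actual software.

```python
PROTON = 1.007276  # mass of a proton, Da

# Illustrative target database: name -> neutral monoisotopic mass (Da).
TARGETS = {
    "amphetamine (C9H13N)": 135.104799,
    "tramadol (C16H25NO2)": 263.188529,
    "cocaine (C17H21NO4)": 303.147058,
}

def identify(mz_measured, tol_ppm=10.0):
    """Return targets whose protonated molecule [M+H]+ lies within
    tol_ppm of the measured m/z (cf. the 10 ppm screening tolerance)."""
    hits = []
    for name, mono in TARGETS.items():
        mz_theory = mono + PROTON
        error_ppm = (mz_measured - mz_theory) / mz_theory * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((name, round(error_ppm, 1)))
    return hits

def clnd_amount(peak_area, k_cal, molar_mass, n_nitrogen):
    """Single-calibrant CLND quantification: detector response is
    proportional to moles of nitrogen, so one calibration factor k_cal
    (area per mol N) converts any N-containing analyte's area to mass."""
    mol_n = peak_area / k_cal
    return mol_n / n_nitrogen * molar_mass  # grams of analyte

print(identify(264.1955))  # close to tramadol [M+H]+ at ~264.1958
```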
Abstract:
Microarrays are high-throughput biological assays that allow the screening of thousands of genes for their expression. The main idea behind microarrays is to compute for each gene a unique signal that is directly proportional to the quantity of mRNA that was hybridized on the chip. The large number of processing steps, and the errors associated with each step, make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving the gene signal and further utilizing this improved signal for higher-level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment yields a signal with low noise, as the effect of unwanted variation is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of genes. Third, a novel image segmentation approach that segregates the fluorescent signal from undesired noise is developed using an additional dye, SYBR Green RNA II. This technique helps identify signal corresponding only to the hybridized DNA, so that signal arising from dust, scratches, dye spills and other noise sources is avoided. Fourth, an integrated statistical model is developed in which signal correction, systematic array effects, dye effects and differential expression are modelled jointly, as opposed to a sequential application of several methods of analysis. The methods described here have been tested only on cDNA microarrays, but can also, with some modifications, be applied to other high-throughput technologies.

Keywords: high-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
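The following is a deliberately simplified, non-Bayesian stand-in for the multiple-scan calibration idea: per-scan gains are point estimates taken from unsaturated spots, whereas the thesis's latent intensity model would infer the true signal with full posterior uncertainty. The data and the 16-bit saturation ceiling are conventional examples, not values from the thesis.

```python
import numpy as np

SATURATION = 65535  # 16-bit scanner ceiling

def combine_scans(scans):
    """Merge spot intensities from several scans of the same array taken
    at increasing scanner sensitivity into one calibrated signal.
    scans: array of shape (n_scans, n_spots), lowest sensitivity first."""
    scans = np.asarray(scans, dtype=float)
    ref = scans[0]                                   # rarely saturated
    estimates = []
    for scan in scans:
        ok = (scan < SATURATION) & (ref < SATURATION) & (ref > 0)
        gain = np.median(scan[ok] / ref[ok])         # per-scan gain vs. reference
        estimates.append(np.where(scan < SATURATION, scan / gain, np.nan))
    return np.nanmean(estimates, axis=0)             # use only unsaturated scans

true_signal = np.array([100.0, 5000.0, 40000.0])
scans = [np.minimum(true_signal * g, SATURATION) for g in (1.0, 4.0, 16.0)]
print(combine_scans(scans))  # recovers ~[100, 5000, 40000]
```

The point of scanning at several sensitivities is visible here: low-sensitivity scans keep bright spots below saturation, while high-sensitivity scans lift dim spots above the noise floor, and calibration puts them on a common scale.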
Abstract:
Gene expression is one of the most critical factors influencing the phenotype of a cell. As a result of several technological advances, measuring gene expression levels has become one of the most common molecular biological measurements for studying the behaviour of cells. The scientific community has produced an enormous and constantly growing collection of gene expression data from various human cells, from both healthy and pathological conditions. However, while each of these studies is informative and enlightening in its own context and research setup, diverging methods and terminologies make it very challenging to integrate existing gene expression data into a more comprehensive view of human transcriptome function. On the other hand, bioinformatic science advances only through data integration and synthesis. The aim of this study was to develop biological and mathematical methods to overcome these challenges, to construct an integrated database of the human transcriptome, and to demonstrate its usage. The methods developed in this study can be divided into two distinct parts. First, the biological and medical annotation of the existing gene expression measurements needed to be encoded with systematic vocabularies. No single existing biomedical ontology or vocabulary was suitable for this purpose, so a new annotation terminology was developed as part of this work. The second part was to develop mathematical methods for correcting the noise and the systematic differences and errors in the data caused by the various array generations. Additionally, suitable computational methods had to be developed for sample collection and archiving, unique sample identification, database structures, data retrieval and visualization. Bioinformatic methods were developed to analyse gene expression levels and putative functional associations of human genes using the integrated gene expression data. A method was also developed to interpret individual gene expression profiles across all the healthy and pathological tissues of the reference database. As a result of this work, 9783 human gene expression samples measured with Affymetrix microarrays were integrated to form a unique human transcriptome resource, GeneSapiens. This makes it possible to analyse the expression levels of 17330 genes across 175 types of healthy and pathological human tissues. Applying this resource to interpret individual gene expression measurements allowed identification of the tissue of origin with 92.0% accuracy among 44 healthy tissue types. A systematic analysis of the transcriptional activity levels of 459 kinase genes was performed across 44 healthy and 55 pathological tissue types, and a genome-wide analysis of kinase gene co-expression networks was carried out. This analysis revealed biologically and medically interesting data on putative kinase gene functions in health and disease. Finally, we developed a method for alignment of gene expression profiles (AGEP) to analyse individual patient samples, pinpointing gene- and pathway-specific changes in the test sample relative to the reference transcriptome database. We also showed how large-scale gene expression data resources can be used to quantitatively characterize changes in the transcriptomic program of differentiating stem cells. Taken together, these studies demonstrate the power of systematic bioinformatic analyses to infer biological and medical insights from existing published datasets, as well as to facilitate the interpretation of new molecular profiling data from individual patients.
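As a rough idea of how a reference transcriptome resource enables tissue-of-origin prediction, the sketch below assigns a new expression profile to the most correlated tissue centroid. This is a generic nearest-centroid sketch on invented data, not the published AGEP method or GeneSapiens itself; the correlation metric is an assumption.

```python
import numpy as np

def tissue_of_origin(sample, centroids):
    """Assign an expression profile (1-D array over shared genes) to the
    reference tissue whose mean profile it correlates with best."""
    scores = {t: np.corrcoef(sample, c)[0, 1] for t, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Invented reference: mean log-expression profiles for three tissue types.
rng = np.random.default_rng(0)
centroids = {t: rng.normal(size=50) for t in ("liver", "kidney", "brain")}

# A noisy sample drawn from the kidney profile is classified correctly.
sample = centroids["kidney"] + rng.normal(scale=0.3, size=50)
print(tissue_of_origin(sample, centroids))  # -> ('kidney', ~0.95)
```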
Abstract:
The use of adverse drug combinations, the abuse of medicinal drugs, and substance abuse are considerable social problems that are difficult to study. Prescription database studies may fail to capture factors such as the use of over-the-counter drugs and patient compliance, while spontaneous reporting databases suffer from under-reporting. Substance abuse and smoking studies can be hampered by poor participation rates and limited reliability. The Forensic Toxicology Unit at the University of Helsinki is the only laboratory in Finland that performs forensic toxicology related to cause-of-death investigations, analysing over 6000 medico-legal cases yearly. The analysis repertoire covers the most commonly used drugs and drugs of abuse, and the resulting database also contains background information as well as information extracted from the final death certificate. In this thesis, the data stored in this comprehensive post-mortem toxicology database were combined with additional metabolite and genotype analyses performed to complete the profile of selected cases. The incidence of drug combinations posing serious adverse drug interactions was generally low (0.71%), but it was notable for the two individually studied drugs: the common anticoagulant warfarin (33%) and the new-generation antidepressant venlafaxine (46%). Serotonin toxicity and adverse cardiovascular effects were the most prominent possible adverse outcomes. However, the specific role of the suspected adverse drug combinations was rarely recognized in the death certificates. The frequency of bleeds was found to be elevated when paracetamol and warfarin were used concomitantly. Pharmacogenetic factors did not play a major role in fatalities related to venlafaxine, but interacting drugs were more commonly present in cases showing high venlafaxine concentrations. Nicotine findings in deceased young adults were roughly three times more prevalent than smoking-frequency estimates for the living population. Contrary to previous studies, no difference in the proportion of suicides was observed between nicotine users and non-users. However, findings of abused substances, including abused prescription drugs, were more common in the nicotine user group than in the non-user group. The results of this thesis are important for forensic and clinical medicine, as well as for public health. The possibility of drug interactions and pharmacogenetic issues should be taken into account in cause-of-death investigations, especially in unclear cases, suspected medical malpractice, and cases where toxicological findings are scarce. Post-mortem toxicological epidemiology is a new field of research that can help reveal problems in drug use and prescription practices.
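At its core, a database study like the one described reduces to screening each case's toxicological findings against a table of known adverse combinations. The sketch below shows that screening step with an invented interaction table and invented cases, purely for illustration (the warfarin + paracetamol bleeding-risk pair comes from the text; the venlafaxine + tramadol serotonin-toxicity pair is a hypothetical example of the interaction class mentioned).

```python
# Invented table of known adverse drug combinations.
INTERACTING = {
    frozenset({"warfarin", "paracetamol"}),   # elevated bleeding risk
    frozenset({"venlafaxine", "tramadol"}),   # serotonin toxicity risk
}

# Invented cases: case id -> set of drug findings.
cases = {
    "case-001": {"warfarin", "paracetamol", "bisoprolol"},
    "case-002": {"venlafaxine"},
    "case-003": {"venlafaxine", "tramadol", "diazepam"},
}

flagged = {
    cid: [tuple(sorted(pair)) for pair in INTERACTING if pair <= findings]
    for cid, findings in cases.items()
}
flagged = {cid: hits for cid, hits in flagged.items() if hits}

print(flagged)
print(f"incidence: {len(flagged) / len(cases):.1%}")  # 2 of 3 toy cases
```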