11 resultados para Methods: Data Analysis
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle thesedata sets, reliable and efficient automated tools and methods for data processingand result interpretation are required. Bioinformatics, as the field of studying andprocessing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to studyand process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reportingand visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, coveringseveral aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus,robust data analysis workflows are also described, putting the developed tools andmethods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data setandthereforeguidelinesforchoosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examplesare included in the thesis. The first study focuses on spermatogenesis in murinetestis and the second one examines cell lineage specification in mouse embryonicstem cells.
Resumo:
Nowadays the used fuel variety in power boilers is widening and new boiler constructions and running models have to be developed. This research and development is done in small pilot plants where more faster analyse about the boiler mass and heat balance is needed to be able to find and do the right decisions already during the test run. The barrier on determining boiler balance during test runs is the long process of chemical analyses of collected input and outputmatter samples. The present work is concentrating on finding a way to determinethe boiler balance without chemical analyses and optimise the test rig to get the best possible accuracy for heat and mass balance of the boiler. The purpose of this work was to create an automatic boiler balance calculation method for 4 MW CFB/BFB pilot boiler of Kvaerner Pulping Oy located in Messukylä in Tampere. The calculation was created in the data management computer of pilot plants automation system. The calculation is made in Microsoft Excel environment, which gives a good base and functions for handling large databases and calculations without any delicate programming. The automation system in pilot plant was reconstructed und updated by Metso Automation Oy during year 2001 and the new system MetsoDNA has good data management properties, which is necessary for big calculations as boiler balance calculation. Two possible methods for calculating boiler balance during test run were found. Either the fuel flow is determined, which is usedto calculate the boiler's mass balance, or the unburned carbon loss is estimated and the mass balance of the boiler is calculated on the basis of boiler's heat balance. Both of the methods have their own weaknesses, so they were constructed parallel in the calculation and the decision of the used method was left to user. User also needs to define the used fuels and some solid mass flowsthat aren't measured automatically by the automation system. With sensitivity analysis was found that the most essential values for accurate boiler balance determination are flue gas oxygen content, the boiler's measured heat output and lower heating value of the fuel. The theoretical part of this work concentrates in the error management of these measurements and analyses and on measurement accuracy and boiler balance calculation in theory. The empirical part of this work concentrates on the creation of the balance calculation for the boiler in issue and on describing the work environment.
Resumo:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, that are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but has also potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
Resumo:
This research concerns the Urban Living Idea Contest conducted by Creator Space™ of BASF SE during its 150th anniversary in 2015. The main objectives of the thesis are to provide a comprehensive analysis of the Urban Living Idea Contest (ULIC) and propose a number of improvement suggestions for future years. More than 4,000 data points were collected and analyzed to investigate the functionality of different elements of the contest. Furthermore, a set of improvement suggestions were proposed to BASF SE. Novelty of this thesis lies in the data collection and the original analysis of the contest, which identified its critical elements, as well as the areas that could be improved. The author of this research was a member of the organizing team and involved in the decision making process from the beginning until the end of the ULIC.
Resumo:
Recent years have produced great advances in the instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without a proper analysis. This has been one of the reasons for the overgrowing success of multivariate handling of such data. Industrial data is commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This makes certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures or partial least squares) but there are also other methods that should be considered. The more advanced methods include multi block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different than in the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences in data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS. The third paper considers applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.
Resumo:
Väitöstutkimuksessa on tarkasteltuinfrapunaspektroskopian ja monimuuttujaisten aineistonkäsittelymenetelmien soveltamista kiteytysprosessin monitoroinnissa ja kidemäisen tuotteen analysoinnissa. Parhaillaan kiteytysprosessitutkimuksessa maailmanlaajuisesti tutkitaan intensiivisesti erilaisten mittausmenetelmien soveltamista kiteytysprosessin ilmiöidenjatkuvaan mittaamiseen niin nestefaasista kuin syntyvistä kiteistäkin. Lisäksi tuotteen karakterisointi on välttämätöntä tuotteen laadun varmistamiseksi. Erityisesti lääkeaineiden valmistuksessa kiinnostusta tämäntyyppiseen tutkimukseen edistää Yhdysvaltain elintarvike- ja lääkeaineviraston (FDA) prosessianalyyttisiintekniikoihin (PAT) liittyvä ohjeistus, jossa määritellään laajasti vaatimukset lääkeaineiden valmistuksessa ja tuotteen karakterisoinnissa tarvittaville mittauksille turvallisten valmistusprosessien takaamiseksi. Jäähdytyskiteytyson erityisesti lääketeollisuudessa paljon käytetty erotusmenetelmä kiinteän raakatuotteen puhdistuksessa. Menetelmässä puhdistettava kiinteä raaka-aine liuotetaan sopivaan liuottimeen suhteellisen korkeassa lämpötilassa. Puhdistettavan aineen liukoisuus käytettävään liuottimeen laskee lämpötilan laskiessa, joten systeemiä jäähdytettäessä liuenneen aineen konsentraatio prosessissa ylittää liukoisuuskonsentraation. Tällaiseen ylikylläiseen systeemiin pyrkii muodostumaan uusia kiteitä tai olemassa olevat kiteet kasvavat. Ylikylläisyys on yksi tärkeimmistä kidetuotteen laatuun vaikuttavista tekijöistä. Jäähdytyskiteytyksessä syntyvän tuotteen ominaisuuksiin voidaan vaikuttaa mm. liuottimen valinnalla, jäähdytyprofiililla ja sekoituksella. Lisäksi kiteytysprosessin käynnistymisvaihe eli ensimmäisten kiteiden muodostumishetki vaikuttaa tuotteen ominaisuuksiin. Kidemäisen tuotteen laatu määritellään kiteiden keskimääräisen koon, koko- ja muotojakaumansekä puhtauden perusteella. Lääketeollisuudessa on usein vaatimuksena, että tuote edustaa tiettyä polymorfimuotoa, mikä tarkoittaa molekyylien kykyä järjestäytyä kidehilassa usealla eri tavalla. Edellä mainitut ominaisuudet vaikuttavat tuotteen jatkokäsiteltävyyteen, kuten mm. suodattuvuuteen, jauhautuvuuteen ja tabletoitavuuteen. Lisäksi polymorfiamuodolla on vaikutusta moniin tuotteen käytettävyysominaisuuksiin, kuten esim. lääkeaineen liukenemisnopeuteen elimistössä. Väitöstyössä on tutkittu sulfatiatsolin jäähdytyskiteytystä käyttäen useita eri liuotinseoksia ja jäähdytysprofiileja sekä tarkasteltu näiden tekijöiden vaikutustatuotteen laatuominaisuuksiin. Infrapunaspektroskopia on laajalti kemian alan tutkimuksissa sovellettava menetelmä. Siinä mitataan tutkittavan näytteenmolekyylien värähtelyjen aiheuttamia spektrimuutoksia IR alueella. Tutkimuksessa prosessinaikaiset mittaukset toteutettiin in-situ reaktoriin sijoitettavalla uppoanturilla käyttäen vaimennettuun kokonaisheijastukseen (ATR) perustuvaa Fourier muunnettua infrapuna (FTIR) spektroskopiaa. Jauhemaiset näytteet mitattiin off-line diffuusioheijastukseen (DRIFT) perustuvalla FTIR spektroskopialla. Monimuuttujamenetelmillä (kemometria) voidaan useita satoja, jopa tuhansia muuttujia käsittävä spektridata jalostaa kvalitatiiviseksi (laadulliseksi) tai kvantitatiiviseksi (määrälliseksi) prosessia kuvaavaksi informaatioksi. Väitöstyössä tarkasteltiin laajasti erilaisten monimuuttujamenetelmien soveltamista mahdollisimman monipuolisen prosessia kuvaavan informaation saamiseksi mitatusta spektriaineistosta. Väitöstyön tuloksena on ehdotettu kalibrointirutiini liuenneen aineen konsentraation ja edelleen ylikylläisyystason mittaamiseksi kiteytysprosessin aikana. Kalibrointirutiinin kehittämiseen kuuluivat aineiston hyvyyden tarkastelumenetelmät, aineiston esikäsittelymenetelmät, varsinainen kalibrointimallinnus sekä mallin validointi. Näin saadaan reaaliaikaista informaatiota kiteytysprosessin ajavasta voimasta, mikä edelleen parantaa kyseisen prosessin tuntemusta ja hallittavuutta. Ylikylläisyystason vaikutuksia syntyvän kidetuotteen laatuun seurattiin usein kiteytyskokein. Työssä on esitetty myös monimuuttujaiseen tilastolliseen prosessinseurantaan perustuva menetelmä, jolla voidaan ennustaa spontaania primääristä ytimenmuodostumishetkeä mitatusta spektriaineistosta sekä mahdollisesti päätellä ydintymisessä syntyvä polymorfimuoto. Ehdotettua menetelmää hyödyntäen voidaan paitsi ennakoida kideytimien muodostumista myös havaita mahdolliset häiriötilanteet kiteytysprosessin alkuhetkillä. Syntyvää polymorfimuotoa ennustamalla voidaan havaita ei-toivotun polymorfin ydintyminen,ja mahdollisesti muuttaa kiteytyksen ohjausta halutun polymorfimuodon saavuttamiseksi. Monimuuttujamenetelmiä sovellettiin myös kiteytyspanosten välisen vaihtelun määrittämiseen mitatusta spektriaineistosta. Tämäntyyppisestä analyysistä saatua informaatiota voidaan hyödyntää kiteytysprosessien suunnittelussa ja optimoinnissa. Väitöstyössä testattiin IR spektroskopian ja erilaisten monimuuttujamenetelmien soveltuvuutta kidetuotteen polymorfikoostumuksen nopeaan määritykseen. Jauhemaisten näytteiden luokittelu eri polymorfeja sisältäviin näytteisiin voitiin tehdä käyttäen tarkoitukseen soveltuvia monimuuttujaisia luokittelumenetelmiä. Tämä tarjoaa nopean menetelmän jauhemaisen näytteen polymorfikoostumuksen karkeaan arviointiin, eli siihen mitä yksittäistä polymorfia kyseinen näyte pääasiassa sisältää. Varsinainen kvantitatiivinen analyysi, eli sen selvittäminen paljonko esim. painoprosentteina näyte sisältää eri polymorfeja, vaatii kaikki polymorfit kattavan fysikaalisen kalibrointisarjan, mikä voi olla puhtaiden polymorfien huonon saatavuuden takia hankalaa.
Resumo:
Statistical analyses of measurements that can be described by statistical models are of essence in astronomy and in scientific inquiry in general. The sensitivity of such analyses, modelling approaches, and the consequent predictions, is sometimes highly dependent on the exact techniques applied, and improvements therein can result in significantly better understanding of the observed system of interest. Particularly, optimising the sensitivity of statistical techniques in detecting the faint signatures of low-mass planets orbiting the nearby stars is, together with improvements in instrumentation, essential in estimating the properties of the population of such planets, and in the race to detect Earth-analogs, i.e. planets that could support liquid water and, perhaps, life on their surfaces. We review the developments in Bayesian statistical techniques applicable to detections planets orbiting nearby stars and astronomical data analysis problems in general. We also discuss these techniques and demonstrate their usefulness by using various examples and detailed descriptions of the respective mathematics involved. We demonstrate the practical aspects of Bayesian statistical techniques by describing several algorithms and numerical techniques, as well as theoretical constructions, in the estimation of model parameters and in hypothesis testing. We also apply these algorithms to Doppler measurements of nearby stars to show how they can be used in practice to obtain as much information from the noisy data as possible. Bayesian statistical techniques are powerful tools in analysing and interpreting noisy data and should be preferred in practice whenever computational limitations are not too restrictive.
Resumo:
Studying testis is complex, because the tissue has a very heterogeneous cell composition and its structure changes dynamically during development. In reproductive field, the cell composition is traditionally studied by morphometric methods such as immunohistochemistry and immunofluorescence. These techniques provide accurate quantitative information about cell composition, cell-cell association and localization of the cells of interest. However, the sample preparation, processing, staining and data analysis are laborious and may take several working days. Flow cytometry protocols coupled with DNA stains have played an important role in providing quantitative information of testicular cells populations ex vivo and in vitro studies. Nevertheless, the addition of specific cells markers such as intracellular antibodies would allow the more specific identification of cells of crucial interest during spermatogenesis. For this study, adult rat Sprague-Dawley rats were used for optimization of the flow cytometry protocol. Specific steps within the protocol were optimized to obtain a singlecell suspension representative of the cell composition of the starting material. Fixation and permeabilization procedure were optimized to be compatible with DNA stains and fluorescent intracellular antibodies. Optimization was achieved by quantitative analysis of specific parameters such as recovery of meiotic cells, amount of debris and comparison of the proportions of the various cell populations with already published data. As a result, a new and fast flow cytometry method coupled with DNA stain and intracellular antigen detection was developed. This new technique is suitable for analysis of population behavior and specific cells during postnatal testis development and spermatogenesis in rodents. This rapid protocol recapitulated the known vimentin and γH2AX protein expression patterns during rodent testis ontogenesis. Moreover, the assay was applicable for phenotype characterization of SCRbKO and E2F1KO mouse models.
Resumo:
Tutkimuksen tarkoituksena on selvittää, miten valtionomistajuus vaikuttaa yrityksen suorituskykyyn suomalaisissa pörssinoteeratuissa valtionyhtiöissä, joissa valtio toimii pää- tai osaomistajana. Suorituskykyä tutkitaan kandella eri menetelmällä. Ensin tutkitaan osaketuottoja Jensenin alfan avulla, jonka jälkeen suoritetaan tilinpäätöstunnuslukujen toimialavertailu. Tutkimuksen teoriaosuudessa esitetään yksityistämisen tuottamia etuja yrityksen taloudelliseen suorituskykyyn, sekä myöskin valtionomistajuuden tuottamia etuja. Lisäksi teoriaosuudessa käsitellään aikaisempien empiiristen tutkimusten tuloksia valtionomistajuuden vaikutuksista. Tämän tutkimuksen empiirisessä osiossa käytettävä data on saatu osakedatan osalta Datastreamista ja tilinpäätöstunnuslukujen osalta Balance Consulting Oy:ltä. Kokonaisosakedataa koskeva tutkimus Jensenin alfalla ei osoittanut valtionyhtiöiden toimivan tehottomasti, vaan osoitti yritysten kyenneen tuottamaan epänormaaleja tuottoja riskitasoonsa nähden. Vuositasolle pilkotun datan analysointi sen sijaan tuotti useita negatiivisia alfoja yrityksille eli merkkejä tehottomuudesta tiettyinä vuosina. Lisäksi tilinpäätöstunnuslukujen analysointi osoitti osan valtionyhtiöistä olleen pääosin omaa toimialaansa tehottomampia, kun taas osa kykenipäihittämään toimialansa.
Resumo:
Muokatun matriisi-geometrian tekniikan kehitys yleimmäksi jonoksi on esitelty tässä työssä. Jonotus systeemi koostuu useista jonoista joilla on rajatut kapasiteetit. Tässä työssä on myös tutkittu PH-tyypin jakautumista kun ne jaetaan. Rakenne joka vastaa lopullista Markovin ketjua jossa on itsenäisiä matriiseja joilla on QBD rakenne. Myös eräitä rajallisia olotiloja on käsitelty tässä työssä. Sen esitteleminen matriisi-geometrisessä muodossa, muokkaamalla matriisi-geometristä ratkaisua on tämän opinnäytetyön tulos.
Resumo:
The term proteome is used to define the complete set of proteins expressed in cells or tissues of an organism at a certain timepoint. Respectively, proteomics is used to describe the methods, which are used to study such proteomes. These methods include chromatographic and electrophoretic techniques for protein or peptide fractionation, mass spectrometry for their identification, and use of computational methods to assist the complicated data analysis. A primary aim in this Ph.D. thesis was to set-up, optimize, and develop proteomics methods for analysing proteins extracted from T-helper (Th) lymphocytes. First, high-throughput LC-MS/MS and ICAT labeling methods were set-up and optimized for analysing the microsomal fraction proteins extracted from Th lymphocytes. Later, iTRAQ method was optimized to study cytokine regulated protein expression in the nuclei of Th lymphocytes. High-throughput LC-MS/MS analyses, like ICAT and iTRAQ, produce large quantities of data and robust software and data analysis pipelines are needed. Therefore, different software programs used for analysing such data were evaluated. Moreover, a pre-filtering algorithm was developed to classify good-quality and bad-quality spectra prior to the database searches. Th-lymphocytes can differentiate into Th1 or Th2 cells based on surrounding antigens, co-stimulatory molecules, and cytokines. Both subsets have individual cytokine secretion profiles and specific functions. Th1 cells participate in the cellular immunity against intracellular pathogens, while Th2 cells have important role in the humoral immunity against extracellular parasites. An abnormal response of Th1 and Th2 cells and imbalance between the subsets are charasteristic of several diseases. Th1 specific reactions and cytokines have been detected in autoimmune diseases, while Th2 specific response and cytokine profile is common in allergy and asthma. In this Ph. D. thesis mass spectrometry-based proteomics was used to study the effects of Th1 and Th2 promoting cytokines IL-12 and IL-4 on the proteome of Th lymphocytes. Characterization of microsomal fraction proteome extracted from IL-12 treated lymphobasts and IL-4 stimulated cord blood CD4+ cells resulted in finding of cytokine regulated proteins. Galectin-1 and CD7 were down-regulated in IL-12 treated cells, while IL-4 stimulation decreased the expression of STAT1, MXA, GIMAP1, and GIMAP4. Interestingly, the transcription of both GIMAP genes was up-regulated in Th1 polarized cells and down-regulated in Th2 promoting conditions.