15 results for Functional data analysis
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Nowadays the variety of fuels used in power boilers is widening, and new boiler constructions and operating models have to be developed. This research and development is done in small pilot plants, where a faster analysis of the boiler mass and heat balance is needed so that the right decisions can be found and made already during the test run. The barrier to determining the boiler balance during test runs is the long process of chemical analyses of the collected input and output matter samples. The present work concentrates on finding a way to determine the boiler balance without chemical analyses and on optimising the test rig to obtain the best possible accuracy for the heat and mass balance of the boiler. The purpose of this work was to create an automatic boiler balance calculation method for the 4 MW CFB/BFB pilot boiler of Kvaerner Pulping Oy located in Messukylä in Tampere. The calculation was created on the data management computer of the pilot plant's automation system. The calculation is made in the Microsoft Excel environment, which provides a good basis and functions for handling large databases and calculations without any delicate programming. The automation system of the pilot plant was reconstructed and updated by Metso Automation Oy during 2001, and the new MetsoDNA system has good data management properties, which are necessary for large calculations such as the boiler balance calculation. Two possible methods for calculating the boiler balance during a test run were found: either the fuel flow is determined and used to calculate the boiler's mass balance, or the unburned carbon loss is estimated and the mass balance of the boiler is calculated on the basis of the boiler's heat balance. Both methods have their own weaknesses, so they were implemented in parallel in the calculation and the choice of method was left to the user. The user also needs to define the fuels used and some solid mass flows that are not measured automatically by the automation system. A sensitivity analysis showed that the most essential values for accurate boiler balance determination are the flue gas oxygen content, the boiler's measured heat output and the lower heating value of the fuel. The theoretical part of this work concentrates on the error management of these measurements and analyses, and on measurement accuracy and boiler balance calculation in theory. The empirical part concentrates on the creation of the balance calculation for the boiler in question and on describing the work environment.
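The second balance route named above (estimating the unburned carbon loss and backing the fuel mass flow out of the heat balance) can be illustrated with a minimal Python sketch. All names, the loss split and the example values are illustrative assumptions, not figures from the thesis:

```python
def fuel_flow_from_heat_balance(q_out_kw, lhv_kj_per_kg,
                                carbon_loss_frac, other_losses_frac=0.05):
    """Fuel mass flow [kg/s] implied by the boiler heat balance,
    given an estimated unburned-carbon loss (hypothetical loss split)."""
    efficiency = 1.0 - carbon_loss_frac - other_losses_frac
    return q_out_kw / (lhv_kj_per_kg * efficiency)

# e.g. a 4 MW heat output, fuel LHV of 20 MJ/kg, 1 % unburned-carbon loss
print(fuel_flow_from_heat_balance(4000.0, 20000.0, 0.01))  # ~0.21 kg/s
```

With the fuel flow in hand, the rest of the mass balance follows from the defined fuel composition and the measured flows; the sensitivity of the result to the measured heat output and the fuel's lower heating value is visible directly in the formula.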
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
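As a rough illustration of the ridge-projection idea, here is a subspace-constrained mean-shift style iteration in Python (a well-known simpler alternative, not the convergent trust region Newton method developed in the thesis): ascend the Gaussian kernel density by the mean-shift vector projected onto the minor eigendirections of the Hessian, until the update vanishes at a one-dimensional ridge point. All names and parameters below are assumptions for the sketch:

```python
import numpy as np

def kde_grad_hess(x, data, h):
    """Gradient and Hessian (up to a positive constant) of a Gaussian
    kernel density estimate with bandwidth h, evaluated at point x."""
    diffs = (data - x) / h                         # (n, d) scaled differences
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=1))  # per-sample kernel weights
    grad = (diffs * w[:, None]).sum(axis=0) / h
    hess = (np.einsum('ni,nj->ij', diffs * w[:, None], diffs)
            - np.eye(x.size) * w.sum()) / h ** 2
    return grad, hess, w.sum()

def project_to_ridge(x, data, h, iters=500, tol=1e-8):
    """Subspace-constrained mean-shift iteration: move x by the mean-shift
    vector projected onto the Hessian's minor eigendirections until the
    update vanishes, i.e. x lies on a one-dimensional density ridge."""
    for _ in range(iters):
        g, H, wsum = kde_grad_hess(x, data, h)
        _, vecs = np.linalg.eigh(H)        # eigenvalues in ascending order
        V = vecs[:, :-1]                   # drop the top eigendirection
        m = h ** 2 * g / wsum              # mean-shift vector
        move = V @ (V.T @ m)               # project onto the minor subspace
        x = x + move
        if np.linalg.norm(move) < tol:
            break
    return x

# Toy usage: noisy samples around a circle; project a point onto the ridge
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 400)
data = np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.1, size=(400, 2))
print(project_to_ridge(np.array([0.5, 0.0]), data, h=0.3))
```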
Abstract:
The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle these data sets, reliable and efficient automated tools and methods for data processing and result interpretation are required. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to study and process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and a strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed at gene expression and genotyping data, although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set, and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in murine testis and the second one examines cell lineage specification in mouse embryonic stem cells.
Abstract:
This research concerns the Urban Living Idea Contest conducted by Creator Space™ of BASF SE during its 150th anniversary in 2015. The main objectives of the thesis are to provide a comprehensive analysis of the Urban Living Idea Contest (ULIC) and to propose a number of improvement suggestions for future years. More than 4,000 data points were collected and analyzed to investigate the functionality of the different elements of the contest. Furthermore, a set of improvement suggestions was proposed to BASF SE. The novelty of this thesis lies in the data collection and the original analysis of the contest, which identified its critical elements as well as the areas that could be improved. The author of this research was a member of the organizing team and was involved in the decision-making process from the beginning until the end of the ULIC.
Abstract:
The study develops an approach for validating software functionality against work system needs in SMEs. The approach is constructed using a SaaS-based software product, a work collaboration service (WCS), and SMEs as the elements of study, where the WCS's functionality is qualified against the collaboration needs that exist in operational and project work within SMEs. For this research, a constructivist approach and the case study method were selected, because the nature of the study requires an in-depth examination of the work collaboration service as well as a detailed study of the work systems within different enterprises. Four different companies were selected, in which fourteen interviews were conducted to gather the pertinent data. The work systems method and framework are used as a central part of the approach to collect, analyze and interpret the enterprises' work system models and the underlying collaboration needs in operational and project work. On the other hand, the functional model of the WCS and its functionality are determined from functional model analysis, software testing, documentation and meetings with the service vendor. The enterprise work system model and the WCS model are compared to reveal how work progression differs between the two and to make visible the unaddressed stages of work progression. The WCS functionality is compared to the work systems' collaboration needs to ascertain whether the service will satisfy the needs of the project and operational work under study. The unaddressed needs provide opportunities to improve the functionality of the service for better conformity to the needs of the enterprise and the work. The results revealed that the functional models differed in how operational and project work progressed within the stages. The WCS shared similar stages of work progression, apart from the identification and acceptance stages, while the progress and completion stages were only partially addressed. The conclusion is that the identified unaddressed needs, such as a single point of reference and SLA and OLA inclusion, should be implemented or improved within the WCS at the appropriate stages of work to achieve better compliance of the service with the needs of the enterprise and the work itself. The developed approach can hence be used to carry out similar analyses of the conformance of pre-built software functionality to work system needs within SMEs.
Abstract:
Effects of counseling and guidance on health behavior, health, and functional abilities of coronary artery bypass (CAB) patients. Hospital stays of heart patients are brief and full of activity today, and for that reason the importance of counseling and guidance is emphasized. The present intervention study was started based on observations of staff members at the heart organization, according to which there were gaps in the counseling and guidance intended for coronary artery bypass (CAB) patients. The purpose of the present intervention study was to describe and evaluate the counseling and guidance program organized for patients who were referred to CAB operations. More specifically, the study assessed its short-term (3-month), intermediate (6-month), and long-term (12-month) effects on the health behavior, health, and functional abilities of CAB patients of any age on the one hand and of the elderly on the other, as well as on their mortality. The data consisted of those individuals having coronary heart disease (CHD) and living in Uusimaa (n = 365) who underwent their first CAB operation at the Helsinki University Hospital between May 7th, 1998 and December 31st, 2001. Based on the urgency of the operation, they were divided into two groups: 1) surgery with the regular referral procedure (non-acute) or 2) surgery in the acute phase of CHD. Randomization into an intervention and a control group was carried out separately within these two groups. A subgroup was formed by including those 65 years or older who were operated on with the regular referral procedure. Data on health behavior, health, and functional abilities were gathered with survey questionnaires. Times and causes of death were examined from January 1st, 1998 through December 31st, 2004. The intervention included counseling and guidance in small groups. The intervention of the non-acutely operated patients was implemented prior to and following surgery, whereas the intervention of the acutely operated patients was implemented after surgery alone. The control group received regular health care services. Counseling and guidance contributed positively to the frequency of alcohol use among non-acutely operated men and to the frequencies of exercise and functional ability among women. The intervention also had an effect on the exercise frequencies of elderly and acutely operated men. The present intervention did not have an effect on body mass index and had barely a slight effect on the health status of the CAB patients. The findings of the intervention and the generalizations resulting from them must be viewed critically, because the data analysis involved a multi-testing situation, many variables, and several subgroups. The study did not involve an intention-to-treat analysis. Additionally, the loss of patients was great, especially among the elderly and acutely operated patients.
Abstract:
Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without a proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data is commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but there are also other methods that should be considered. The more advanced methods include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences between data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries, and the results are compared to those from PLS and priority PLS. The third paper considers the applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.
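The contrast drawn here between exploratory and predictive modeling can be sketched in a few lines of Python with scikit-learn; the data and variable names below are hypothetical, not taken from the papers:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # hypothetical process variables
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)  # response

# PCA finds directions of maximum variance in X alone: useful for an
# overview of the variables and the process, regardless of any response.
scores = PCA(n_components=2).fit_transform(X)

# PLS chooses latent directions that covary with the response: the
# natural choice when the purpose of the model is accurate prediction.
pls = PLSRegression(n_components=2).fit(X, y)
print("PLS R^2:", pls.score(X, y))
```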
Abstract:
Raw measurement data does not always immediately convey useful information, but applying mathematical and statistical analysis tools to the data can improve the situation. Data analysis can offer benefits such as acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study, mathematical tools such as Qlucore Omics Explorer (QOE) and sparse Bayesian (SB) regression are used. Later on, linear regression is used to build a model based on a subset of variables that have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. For SB and linear regression, the results show that both models built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit the whole available dataset well, and therefore it is proposed as future work to build piecewise nonlinear regression models if the same dataset is used, or for the plant to provide another dataset collected in a more systematic fashion than the present data for further analysis.
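The two-stage procedure described here (sparse Bayesian weighting followed by ordinary linear regression on the strongest variables) can be sketched with scikit-learn's ARD implementation of sparse Bayesian regression; the data, the number of retained variables and all names are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import ARDRegression, LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))        # hypothetical averaged plant variables
y = 3.0 * X[:, 2] - X[:, 7] + rng.normal(scale=0.5, size=300)  # quality variable

# Stage 1: sparse Bayesian (ARD) regression shrinks irrelevant weights
ard = ARDRegression().fit(X, y)

# Stage 2: ordinary linear regression on the most significant variables
keep = np.argsort(np.abs(ard.coef_))[-5:]      # five largest |weights|
lin = LinearRegression().fit(X[:, keep], y)
print("kept variables:", keep, " R^2:", lin.score(X[:, keep], y))
```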
Abstract:
High-throughput screening of the cellular effects of RNA interference (RNAi) libraries is now being increasingly applied to explore the role of genes in specific cell biological processes and disease states. However, the technology is still limited to specialty laboratories, due to the requirements for robotic infrastructure, access to expensive reagent libraries, and expertise in high-throughput screening assay development, standardization, data analysis and applications. In the future, alternative screening platforms will be required to expand functional large-scale experiments to include more RNAi constructs, and to allow combinatorial loss-of-function analyses (e.g. gene-gene or gene-drug interaction), gain-of-function screens, multi-parametric phenotypic readouts or comparative analysis of many different cell types. Such comprehensive perturbation of gene networks in cells will require a major increase in the flexibility and throughput of the screening platforms and a reduction of costs. As an alternative to the conventional multi-well-based high-throughput screening platforms, the development of a novel cell spot microarray method for the production of high-density siRNA reverse transfection arrays is described here. The cell spot microarray platform is distinguished from the majority of other transfection cell microarray techniques by a spatially confined array layout that allows highly parallel screening of large-scale RNAi reagent libraries with assays otherwise difficult or not applicable to high-throughput screening. This study describes the development of the cell spot microarray method along with biological application examples of high-content immunofluorescence and phenotype-based cancer cell biology analyses focusing on the regulation of prostate cancer cell growth, the maintenance of genomic integrity in breast cancer cells, and the functional analysis of integrin protein-protein interactions in situ.
Abstract:
Prostate cancers form a heterogeneous group of diseases, and there is a need for novel biomarkers and for more efficient and targeted methods of treatment. In this thesis, the potential of microarray data, RNA interference (RNAi) and compound screens was utilized in order to identify novel biomarkers, drug targets and drugs for future personalized prostate cancer therapeutics. First, a bioinformatic mRNA expression analysis covering 9873 human tissue and cell samples, including 349 prostate cancer and 147 normal prostate samples, was used to identify in silico prevalidated putative prostate cancer biomarkers and drug targets. Second, RNAi-based high-throughput (HT) functional profiling of 295 prostate and prostate cancer tissue specific genes was performed in cultured prostate cancer cells. Third, a HT compound screen approach using a library of 4910 drugs and drug-like molecules was exploited to identify potential drugs inhibiting prostate cancer cell growth. Nine candidate drug targets with biomarker potential and one cancer-selective compound were validated in vitro and in vivo. In addition to androgen receptor (AR) signaling, endoplasmic reticulum (ER) function, the arachidonic acid (AA) pathway, redox homeostasis and mitosis were identified as vital processes in prostate cancer cells. ERG oncogene positive cancer cells exhibited sensitivity to the induction of oxidative and ER stress, whereas advanced and castration-resistant prostate cancer (CRPC) could potentially be targeted through AR signaling and mitosis. In conclusion, this thesis illustrates the power of systems biological data analysis in the discovery of potential vulnerabilities present in prostate cancer cells, as well as novel options for personalized cancer management.
Abstract:
Workshop at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Retaining players and re-attracting switching players has long been a central topic for SNG providers with regard to the post-adoption stage of playing an online game. However, there has not been much research exploring players' post-adoption behavior by incorporating both the continuance intention and the switching intention. In addition, traditional IS continuance theories were mainly developed to investigate users' continued use of utilitarian IS, and thus they may fall short when trying to explain the continued use of hedonic IS. Furthermore, compared to the richer literature on IS continuance, far too little attention has been paid to IS switching, leading to a dearth of knowledge on the subject despite the increased incidence of the switching phenomenon in the IS field. By addressing the limitations of prior literature, this study seeks to examine the determinants of SNG players' two different post-adoption behaviors: the continuance intention and the switching intention. This study takes a positivist approach and uses the survey research method to test five proposed research models based on the Unified Theory of Acceptance and Use of Technology 2, the Uses and Gratifications Theory, the Push-Pull-Mooring model, Cognitive Dissonance Theory, and a self-developed model, respectively, with empirical data collected from the SNG players of one of the biggest SNG providers in China. A total of 3919 valid responses and 541 valid responses were used to examine the continuance intention and the switching intention, respectively. Structural equation modeling (SEM) is utilized as the data analysis method. The proposed research models are supported by the empirical data. The continuance intention is determined by enjoyment, fantasy, escapism, social interaction, social presence, social influence, achievement and habit. The switching intention is determined by enjoyment, satisfaction, subjective norms, descriptive norms, alternative attractiveness, the need for variety, change experience, and adaptation cost. This study contributes to IS theories in three important ways. Firstly, it shows that IS switching should be included in IS post-adoption research together with IS continuance. Secondly, a modern IS is usually multi-functional and SNG players have multiple reasons for using a SNG; thus a player's beliefs about the hedonic, social and utilitarian aspects of their continued use of the SNG exert significant effects on the continuance intention. Thirdly, the determinants of the switching intention mainly exert push, pull, and mooring effects. Players' beliefs about their current SNG and the available alternatives, as well as their individual characteristics, are all significant determinants of the switching intention; SNG players combine these effects in order to formulate the switching intention. Finally, this study presents limitations and suggestions for future research.
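A minimal sketch of this kind of SEM analysis in Python, using the semopy package and lavaan-style model syntax. The constructs, indicators and synthetic data below are hypothetical stand-ins, not the study's actual measurement instrument or results:

```python
import numpy as np
import pandas as pd
import semopy

# Hypothetical synthetic data: two latent predictors and one outcome
rng = np.random.default_rng(0)
n = 500
enjoyment = rng.normal(size=n)
habit = rng.normal(size=n)
cont = 0.6 * enjoyment + 0.3 * habit + rng.normal(scale=0.5, size=n)
df = pd.DataFrame(
    {f"enj{i}": enjoyment + rng.normal(scale=0.4, size=n) for i in (1, 2, 3)}
    | {f"hab{i}": habit + rng.normal(scale=0.4, size=n) for i in (1, 2, 3)}
    | {f"ci{i}": cont + rng.normal(scale=0.4, size=n) for i in (1, 2, 3)}
)

# Measurement model (=~) and structural model (~) in lavaan-style syntax
desc = """
Enjoyment =~ enj1 + enj2 + enj3
Habit     =~ hab1 + hab2 + hab3
ContInt   =~ ci1 + ci2 + ci3
ContInt ~ Enjoyment + Habit
"""
model = semopy.Model(desc)
model.fit(df)
print(model.inspect())   # path coefficients, standard errors, p-values
```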
Abstract:
Studying the testis is complex, because the tissue has a very heterogeneous cell composition and its structure changes dynamically during development. In the reproductive field, the cell composition is traditionally studied by morphometric methods such as immunohistochemistry and immunofluorescence. These techniques provide accurate quantitative information about cell composition, cell-cell associations and the localization of the cells of interest. However, the sample preparation, processing, staining and data analysis are laborious and may take several working days. Flow cytometry protocols coupled with DNA stains have played an important role in providing quantitative information on testicular cell populations in ex vivo and in vitro studies. Nevertheless, the addition of specific cell markers such as intracellular antibodies would allow more specific identification of cells of crucial interest during spermatogenesis. For this study, adult Sprague-Dawley rats were used for optimization of the flow cytometry protocol. Specific steps within the protocol were optimized to obtain a single-cell suspension representative of the cell composition of the starting material. The fixation and permeabilization procedures were optimized to be compatible with DNA stains and fluorescent intracellular antibodies. Optimization was achieved by quantitative analysis of specific parameters such as the recovery of meiotic cells, the amount of debris, and comparison of the proportions of the various cell populations with already published data. As a result, a new and fast flow cytometry method coupled with DNA staining and intracellular antigen detection was developed. This new technique is suitable for the analysis of population behavior and specific cells during postnatal testis development and spermatogenesis in rodents. The rapid protocol recapitulated the known vimentin and γH2AX protein expression patterns during rodent testis ontogenesis. Moreover, the assay was applicable to the phenotype characterization of the SCRbKO and E2F1KO mouse models.
Abstract:
It has long been known that amino acids are the building blocks of proteins and govern their folding into specific three-dimensional structures. However, the details of this process are still unknown and represent one of the main problems in structural bioinformatics, which is a highly active research area focused on the prediction of three-dimensional structure and its relationship to protein function. The protein structure prediction procedure encompasses several different steps, from searches and analyses of sequences and structures, through sequence alignment, to the creation of the structural model. Careful evaluation and analysis ultimately result in a hypothetical structure, which can be used to study biological phenomena in, for example, research at the molecular level, biotechnology and especially drug discovery and development. In this thesis, the structures of five proteins were modeled with template-based methods, which use proteins with known structures (templates) to model related or structurally similar proteins. The resulting models were an important asset for the interpretation and explanation of biological phenomena, such as amino acids and interaction networks that are essential for the function and/or ligand specificity of the studied proteins. The five proteins represent different case studies with their own challenges, such as varying template availability, which resulted in different structure prediction processes. This thesis presents the techniques and considerations that should be taken into account in the modeling procedure to overcome limitations and produce a hypothetical and reliable three-dimensional structure. As each project shows, the reliability is highly dependent on the extensive incorporation of experimental data or known literature, and although experimental verification of in silico results is always desirable to increase reliability, the presented projects show that experimental studies can also greatly benefit from structural models. With the help of in silico studies, experiments can be targeted and precisely designed, thereby saving both money and time. As the programs used in structural bioinformatics are constantly improved and the range of templates increases through structural genomics efforts, the mutual benefits between in silico and experimental studies become even more prominent. Hence, reliable models of protein three-dimensional structures, achieved through careful planning and thoughtful execution, are and will continue to be valuable and indispensable sources of structural information to be combined with functional data.
Abstract:
In this thesis, the process of building software for transport accessibility analysis is described. The goal was to create software that is easy to distribute and simple to use for users without a particular background in geographical data analysis. It was shown that existing tools do not suit this particular task due to complex interfaces or significant rendering times. The goal was accomplished by applying modern approaches to building web applications, such as maps based on vector tiles, the FLUX architecture design pattern and module bundling. It was discovered that vector tiles have considerable advantages over image-based tiles, such as faster rendering and real-time styling.