24 results for Techniques of data analysis
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Nowadays the variety of fuels used in power boilers is widening, and new boiler constructions and operating models have to be developed. This research and development is done in small pilot plants, where faster analysis of the boiler mass and heat balance is needed in order to make the right decisions already during the test run. The obstacle to determining the boiler balance during test runs is the long process of chemically analysing the collected input and output material samples. The present work concentrates on finding a way to determine the boiler balance without chemical analyses and on optimising the test rig to get the best possible accuracy for the heat and mass balance of the boiler. The purpose of this work was to create an automatic boiler balance calculation method for the 4 MW CFB/BFB pilot boiler of Kvaerner Pulping Oy located in Messukylä, Tampere. The calculation was created in the data management computer of the pilot plant's automation system. The calculation is made in the Microsoft Excel environment, which provides a good base and functions for handling large databases and calculations without any delicate programming. The automation system in the pilot plant was reconstructed and updated by Metso Automation Oy during 2001, and the new system, MetsoDNA, has good data management properties, which are necessary for large calculations such as the boiler balance calculation. Two possible methods for calculating the boiler balance during a test run were found. Either the fuel flow is determined and used to calculate the boiler's mass balance, or the unburned carbon loss is estimated and the mass balance of the boiler is calculated on the basis of the boiler's heat balance. Both methods have their own weaknesses, so they were implemented in parallel in the calculation and the choice of method was left to the user. The user also needs to define the fuels used and some solid mass flows that are not measured automatically by the automation system.
A sensitivity analysis showed that the most essential values for accurate boiler balance determination are the flue gas oxygen content, the boiler's measured heat output and the lower heating value of the fuel. The theoretical part of this work concentrates on the error management of these measurements and analyses, and on measurement accuracy and boiler balance calculation in theory. The empirical part concentrates on creating the balance calculation for the boiler in question and on describing the work environment.
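To make the sensitivity idea concrete, the sketch below perturbs each input of a deliberately simplified, hypothetical balance model by 1% and reports the relative change in the derived flows. The model is an assumption and not the thesis's Excel calculation: fuel flow is taken as heat output divided by an assumed constant efficiency times the LHV, and the excess air ratio uses the common dry-flue-gas approximation 21 / (21 - O2). The efficiency and stoichiometric air demand defaults are placeholder values.

```python
"""One-at-a-time sensitivity sketch for a simplified boiler balance.

Hypothetical model (NOT the thesis's Excel calculation):
  fuel flow [kg/s]       = heat output [kW] / (efficiency * LHV [kJ/kg])
  excess air ratio [-]   = 21 / (21 - O2 [vol-%, dry flue gas])
  combustion air [kg/s]  = excess air ratio * stoich. air demand [kg/kg] * fuel flow
"""


def boiler_balance(heat_output_kw, lhv_kj_per_kg, o2_percent,
                   efficiency=0.88, stoich_air_kg_per_kg=6.0):
    """Return the derived flows; efficiency and air demand are placeholder values."""
    fuel_flow = heat_output_kw / (efficiency * lhv_kj_per_kg)
    excess_air_ratio = 21.0 / (21.0 - o2_percent)
    air_flow = excess_air_ratio * stoich_air_kg_per_kg * fuel_flow
    return {"fuel_flow": fuel_flow, "air_flow": air_flow}


def sensitivities(base_inputs, rel_step=0.01):
    """Relative change of each output per relative change of each input (finite differences)."""
    base = boiler_balance(**base_inputs)
    result = {}
    for name, value in base_inputs.items():
        perturbed = dict(base_inputs)
        perturbed[name] = value * (1.0 + rel_step)
        out = boiler_balance(**perturbed)
        result[name] = {key: (out[key] - base[key]) / base[key] / rel_step for key in base}
    return result


if __name__ == "__main__":
    # Illustrative operating point, not measured data from the pilot boiler.
    inputs = {"heat_output_kw": 3500.0, "lhv_kj_per_kg": 8000.0, "o2_percent": 4.0}
    for name, sens in sensitivities(inputs).items():
        print(name, {key: round(val, 3) for key, val in sens.items()})
```

In this toy model the heat output and the LHV propagate almost one-to-one into the fuel flow, while an O2 error acts only through the excess air ratio. A real balance also feeds O2 back into the flue gas heat loss, which this sketch ignores.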
Abstract:
The effective notch stress approach for the fatigue strength assessment of welded structures, as included in the Fatigue Design Recommendations of the IIW, requires the numerical analysis of the elastic notch stress at the weld toe and weld root, which are fictitiously rounded with a radius of 1 mm. The goal of this thesis work was to consider alternative meshing strategies when using the effective notch stress approach to assess the fatigue strength of load-carrying partial-penetration fillet-welded cruciform joints. In order to establish guidelines for modeling the joint and evaluating the results, various two-dimensional (2D) finite element analyses were carried out by systematically varying the thickness of the plates, the weld throat thickness, the degree of bending, and the shape and location of the modeled effective notch. To extend the scope of this work, studies were also carried out on the influence of
Abstract:
Visual data mining (VDM) tools employ information visualization techniques to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. Users look for outliers, patterns and models – in the form of clusters, classes, trends and relationships – in different categories of data, e.g., financial and business information. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. For this, we propose the use of existing clustering validity measures and illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem concerns evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct an evaluation of nine visualization techniques: Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of the quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM, using an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying them.
The thesis provides a systematic approach to the evaluation of various visualization techniques. First, we performed and described the evaluations systematically, highlighting the evaluation activities and their inputs and outputs. Second, we integrated the evaluation studies into the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems select appropriate visualization techniques in specific situations. They also contribute to understanding the strengths and limitations of the evaluated techniques and, thereby, to improving them.
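The first research problem, judging how well a projection preserves distances and cluster structure, can be illustrated with a small NumPy sketch. It projects labelled data with PCA (computed via SVD), measures the Pearson correlation between pairwise distances before and after projection, and compares the mean silhouette coefficient in both spaces. The silhouette coefficient is used here only as one example of an existing clustering validity measure; the abstract does not state which measures the thesis used, and the demo data are synthetic.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD of centred data)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_correlation(X, Y):
    """Pearson correlation of pairwise distances in the original (X) and projected (Y) space."""
    iu = np.triu_indices(len(X), k=1)
    return float(np.corrcoef(pairwise_distances(X)[iu], pairwise_distances(Y)[iu])[0, 1])


def silhouette(X, labels):
    """Mean silhouette coefficient, a standard clustering validity measure."""
    D = pairwise_distances(X)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=5.0, size=(3, 10))      # three synthetic clusters in 10-D
    labels = np.repeat(np.arange(3), 50)
    X = centers[labels] + rng.normal(size=(150, 10))
    Y = pca_project(X, 2)
    print("distance correlation:", round(distance_correlation(X, Y), 3))
    print("silhouette 10-D vs 2-D:", round(silhouette(X, labels), 3), round(silhouette(Y, labels), 3))
```

A projection that keeps both numbers close to their original-space values preserves the structure well; the same functions can be applied to any other 2-D layout, such as a Sammon's mapping or SOM output.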
Abstract:
This thesis examines the application of data envelopment analysis (DEA) as an equity portfolio selection criterion in the Finnish stock market during the period 2001-2011. A sample of publicly traded firms on the Helsinki Stock Exchange, covering the majority of the firms listed there, is examined. DEA is used to determine the efficiency of firms using a set of input and output financial parameters. The set of financial parameters consists of asset utilization, liquidity, capital structure, growth, valuation and profitability measures. The firms are divided into artificial industry categories because of the industry-specific nature of the input and output parameters. Comparable portfolios are formed within each industry category according to the efficiency scores given by DEA, and the performance of the portfolios is evaluated with several measures. The empirical evidence of this thesis suggests that, with certain limitations, DEA can successfully be used as a portfolio selection criterion in the Finnish stock market when the portfolios are rebalanced annually according to the DEA efficiency scores. However, when the portfolios were rebalanced every two or three years, the results were mixed and inconclusive.
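The efficiency scores that drive the portfolio formation come from solving one linear program per firm. The sketch below assumes the input-oriented, constant-returns-to-scale (CCR) envelopment model, which is one standard DEA formulation; the abstract does not say which model or orientation the thesis used. The helper `top_portfolio` and its 30% cut-off are hypothetical. Classical DEA also expects positive input and output values, so financial ratios that can turn negative would need a transformation first.

```python
import numpy as np
from scipy.optimize import linprog


def dea_ccr_input(X, Y):
    """Input-oriented CCR efficiency for each decision-making unit (firm).

    X: (n_firms, n_inputs) inputs, Y: (n_firms, n_outputs) outputs, all assumed positive.
    Returns scores in (0, 1]; a score of 1 means the firm lies on the efficient frontier.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.zeros(n + 1)
        c[0] = 1.0
        # sum_j lambda_j * x_ij - theta * x_io <= 0   for every input i
        a_inputs = np.hstack([-X[o][:, None], X.T])
        # -sum_j lambda_j * y_rj <= -y_ro             for every output r
        a_outputs = np.hstack([np.zeros((s, 1)), -Y.T])
        a_ub = np.vstack([a_inputs, a_outputs])
        b_ub = np.concatenate([np.zeros(m), -Y[o]])
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for firm {o}: {res.message}")
        scores[o] = res.fun
    return scores


def top_portfolio(tickers, scores, fraction=0.3):
    """Hypothetical rule: hold the most efficient `fraction` of firms within one industry category."""
    k = max(1, int(round(len(scores) * fraction)))
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [tickers[i] for i in order[:k]]
```

With a single input and a single output, the CCR score reduces to each firm's output/input ratio divided by the best ratio in the group, so `dea_ccr_input([[2.0], [4.0], [4.0]], [[2.0], [2.0], [4.0]])` gives scores of about 1, 0.5 and 1. Repeating the scoring and selection for each industry category at every rebalancing date mirrors the annual procedure described above.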
Abstract:
This thesis introduces heat demand forecasting models generated with data mining algorithms. The forecast spans one full day and can be used in regulating the heat consumption of buildings. Two years of heat consumption data from a case building and weather measurement data from the Finnish Meteorological Institute are used to train the data mining models. The thesis uses Microsoft SQL Server Analysis Services data mining tools to generate the models and the CRISP-DM process framework to implement the research. The results show that, at best, the models predict heat demand with mean absolute percentage errors of 3.8% for the 24-h profile and 5.9% for the full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
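A common way to compute percentage errors like those reported above is the mean absolute percentage error (MAPE). The sketch below assumes that definition and applies it to an hourly profile; how the thesis aggregated its "full day" figure is not specified in the abstract, so no attempt is made to reproduce it. The demand values are made up.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined when an actual value is zero."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    actual = [40.0] * 24                      # hypothetical hourly heat demand (kW)
    for hour in (6, 7, 8, 9, 17, 18, 19, 20, 21):
        actual[hour] += 10.0                  # hypothetical morning and evening peaks
    forecast = [a * 1.04 for a in actual]     # a forecast that is uniformly 4 % too high
    print(f"24-h profile MAPE: {mape(actual, forecast):.1f} %")
```

Because each hour's error is divided by that hour's actual demand, MAPE weights low-demand hours as heavily as peak hours, which is worth keeping in mind when comparing profile and daily figures.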
Abstract:
Performance measurements of computer system components and software provide information that can be used to improve performance and to support hardware procurement decisions. This work examines performance measurement and measurement programs, i.e. so-called benchmark software. Various types of freely available benchmark programs suitable for analyzing the performance of a Linux computing cluster were sought out and evaluated. The benchmarks were categorized and evaluated by testing their features on a Linux cluster. The work also discusses the challenges of performing measurements and of parallel computing. Benchmarks were found for many purposes, and they proved to vary in quality and scope. They have also been collected into software suites in order to give a broader picture of hardware performance than a single program can provide. It is essential to understand the speed at which data can be transferred to the processor from main memory, disk systems and other compute nodes. A typical benchmark program contains a computationally intensive mathematical algorithm of the kind used in scientific software. Depending on the benchmark, understanding and making use of the results can be challenging.
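The transfer speed from main memory to the processor can be probed with a STREAM-style "Copy" kernel. The sketch below times NumPy's `np.copyto` on arrays that should be much larger than the CPU caches and, following the STREAM convention, counts one read and one write of 8 bytes per element. It is an illustration under these assumptions, not a substitute for the compiled STREAM benchmark: interpreter and NumPy overheads, write-allocate traffic and the default array size (chosen arbitrarily here) all affect the figure.

```python
import time

import numpy as np


def copy_bandwidth_gbs(n_elements: int = 20_000_000, repeats: int = 5) -> float:
    """Best-of-`repeats` memory bandwidth (GB/s) of a STREAM-like copy kernel.

    Byte counting follows STREAM: 2 * 8 bytes per float64 element (read src, write dst).
    Keep the arrays well above the last-level cache size, or the result measures cache speed.
    """
    src = np.random.default_rng(0).random(n_elements)  # float64 source array
    dst = np.empty_like(src)
    np.copyto(dst, src)  # warm-up: touch every page of dst before timing
    bytes_moved = 2 * src.nbytes
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.copyto(dst, src)
        best = min(best, time.perf_counter() - start)
    return bytes_moved / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth ~ {copy_bandwidth_gbs():.1f} GB/s")
```

Reporting the best of several repetitions, as STREAM does, filters out one-off disturbances such as page faults or other processes on the node.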
Abstract:
To enable a mathematically and physically sound execution of fatigue tests and a correct interpretation of their results, statistical evaluation methods are used to assist in the analysis of fatigue testing data. The main objective of this work is to develop step-by-step instructions for the statistical analysis of laboratory fatigue data. The scope of this project is to provide practical cases that answer the questions raised in the treatment of test data, applying the methods and formulae of document IIW-XIII-2138-06 (Best Practice Guide on the Statistical Analysis of Fatigue Data). In general, the questions in the data sheets involve three aspects: estimation of the necessary sample size, verification of the statistical equivalence of the collated data sets, and determination of characteristic curves in different cases. The series of comprehensive examples given in this thesis demonstrates the various statistical methods for developing a sound procedure to create reliable calculation rules for fatigue analysis.
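One of the listed tasks, determining a characteristic curve, can be sketched as follows. The code fits the usual log-linear S-N relation log10 N = log10 C - m log10 S by least squares (optionally with a fixed slope m, e.g. m = 3 as often used for welded joints), estimates the residual standard deviation of log10 N, and lowers the mean curve by k standard deviations. The factor k is deliberately left as an input: in the Best Practice Guide it depends on the sample size and on the required survival probability and confidence level, and those tabulated values are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def fit_sn_curve(stress_ranges, cycles, fixed_slope=None):
    """Least-squares fit of log10(N) = log10(C) - m * log10(S), with log10(N) as dependent variable.

    Returns (m, log10_C_mean, s), where s is the residual standard deviation of log10(N).
    """
    x = np.log10(np.asarray(stress_ranges, dtype=float))
    y = np.log10(np.asarray(cycles, dtype=float))
    if fixed_slope is None:
        slope, intercept = np.polyfit(x, y, 1)
        m, dof = -slope, len(x) - 2          # two fitted parameters
    else:
        m = float(fixed_slope)
        intercept = np.mean(y + m * x)       # only the intercept is fitted
        dof = len(x) - 1
    if dof < 1:
        raise ValueError("not enough data points for the chosen fit")
    residuals = y - (intercept - m * x)
    s = np.sqrt(np.sum(residuals ** 2) / dof)
    return m, intercept, s


def characteristic_log_c(log_c_mean, s, k):
    """Shift the mean curve down by k residual standard deviations (k supplied by the user)."""
    return log_c_mean - k * s


def fat_class(m, log_c, n_ref=2e6):
    """Stress range at n_ref cycles (2 million cycles, the reference used for IIW FAT classes)."""
    return 10 ** ((log_c - np.log10(n_ref)) / m)
```

A typical call chain is `m, lc, s = fit_sn_curve(S, N, fixed_slope=3)`, then `fat_class(m, characteristic_log_c(lc, s, k))` with k taken from the guide for the actual number of specimens.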
Abstract:
This study is dedicated to search engine marketing (SEM). It aims to develop a business model for SEM firms and to provide explicit research on the trustworthy practices of virtual marketing companies. Optimization is a general term covering a variety of techniques and methods for promoting web pages. The research addresses optimization as a business activity and explains its role in online marketing. Additionally, it highlights marketers' use of unethical techniques, which has created a relatively negative attitude towards them on the Internet. The literature review brings together technical and economic scientific findings in order to highlight the technological and business attributes of SEM activities. Empirical data on search marketers were collected via e-mail questionnaires. Four representatives of SEM companies took part in this study to accomplish the business model design. Additionally, a fifth respondent, a representative of the search engine portal, provided insight into the relations between search engines and marketers. The information obtained from the respondents was processed qualitatively. The movement of commercial organizations to the online market increases the demand for promotional programs. SEM is the largest part of online marketing, and it is the prerogative of search engine portals. However, skilled users, or marketers, are able to implement long-term marketing programs by using web page optimization techniques, keyword consultancy or content optimization to increase a web site's visibility to search engines and, therefore, users' attention to the customer pages. SEM firms belong to the category of small knowledge-intensive businesses. The business model was constructed on the basis of the data analysis. The SEM model includes generalized constructs, although they represent a wider range of operational aspects.
The building blocks of the model include the fundamental parts of SEM commercial activity: value creation, customer, infrastructure and financial segments. Approaches to company differentiation and to the evaluation of competitive advantages are also provided. It is assumed that search marketers should make further efforts to differentiate their business from the large number of companies providing similar services. The findings indicate that SEM companies are interested in increasing their trustworthiness and building their reputation. The future of search marketing depends directly on the development of search engines.
Abstract:
Data is the most important asset of a company in the information age. Other assets, such as technology, facilities or products, can be copied or reverse-engineered, and employees can be hired away, but data remains unique to every company. As data management topics slowly move from unknown unknowns to known unknowns, tools to evaluate and manage data properly are being developed and refined. Many projects are in progress today to develop various maturity models for evaluating information and data management practices. These maturity models come in many shapes and sizes: from short and concise ones meant for a quick assessment to complex ones that call for an expert assessment by experienced consultants. In this paper, several of them, developed not only by external inter-organizational groups and authors but also internally at a Major Energy Provider Company (MEPC), are juxtaposed and thoroughly analyzed. Apart from analyzing the available maturity models related to Data Management, this paper also selects the one with the most merit and describes and analyzes its use in performing a maturity assessment at MEPC. The utility of maturity models is two-fold: descriptive and prescriptive. Besides recording the current maturity of Data Management practices through the assessment, the maturity model is also used to chart the way forward. Thus, after the current situation is presented, an analysis and recommendations for improving it, based on the definitions of the higher maturity levels, are given. Generally, the main trend observed was the widening of the Data Management field to include more business and “soft” areas (as opposed to technical ones) and a shift of focus towards the business value of data, while the underlying IT systems for managing data are assumed to be “ideal”, that is, left to the purely technical disciplines to design and maintain.
This trend is not only present in Data Management but in other technological areas as well, where more and more attention is given to innovative use of technology, while acknowledging that the strategic importance of IT as such is diminishing.
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve the solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model in which ridges of the density estimated from the data are considered relevant features. Finding ridges, which are generalized maxima, requires the development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically using Gaussian kernels. This allows ridge-based methods to be applied with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first is the extraction of curvilinear structures from noisy data mixed with background clutter. The second is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications in which most earlier approaches are inadequate. Examples include the identification of faults from seismic data and of filaments from cosmological data. The applicability of the nonlinear PCA to climate analysis and to the reconstruction of periodic patterns from noisy time series data is also demonstrated. Other contributions of the thesis include the development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space.
The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction but also has potential applications in graph theory and various areas of physics, chemistry and engineering. The asymptotic behaviour of ridges and maxima of Gaussian kernel densities as the kernel bandwidth approaches infinity is also investigated. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
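The idea of projecting a point onto a ridge of a Gaussian kernel density can be illustrated compactly. The thesis develops a trust region Newton method for this; the sketch below instead uses the simpler subspace-constrained mean shift (SCMS) iteration, a different and well-known technique, only to show what "projection onto a ridge" means. At each step it takes the eigenvectors of the Hessian of the log-density with the most negative eigenvalues (the directions across the ridge) and moves the point only along them by the mean-shift vector. The bandwidth `h`, the starting point and the demo data are assumptions, and the kernel weights underflow to zero if the point starts far from all data.

```python
import numpy as np


def kde_terms(x, data, h):
    """Unnormalised Gaussian KDE value, gradient and Hessian at x (constant factors cancel later)."""
    diff = data - x                                          # rows: x_i - x
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * h * h))  # kernel weights
    p = w.sum()
    grad = (w[:, None] * diff).sum(axis=0) / h ** 2
    hess = (diff.T * w) @ diff / h ** 4 - p * np.eye(x.size) / h ** 2
    return w, p, grad, hess


def scms_project(x0, data, h, ridge_dim=1, tol=1e-8, max_iter=1000):
    """Move x0 onto a ridge_dim-dimensional ridge of the KDE (ridge_dim=0 gives plain mean shift)."""
    x = np.asarray(x0, dtype=float).copy()
    d = data.shape[1]
    for _ in range(max_iter):
        w, p, grad, hess = kde_terms(x, data, h)
        hess_log = hess / p - np.outer(grad, grad) / p ** 2  # Hessian of the log-density
        _, vecs = np.linalg.eigh(hess_log)                   # eigenvalues in ascending order
        v = vecs[:, : d - ridge_dim]                         # most negative curvature directions
        shift = w @ data / p - x                             # ordinary mean-shift vector
        step = v @ (v.T @ shift)                             # keep only the across-ridge part
        x += step
        if np.linalg.norm(step) < tol:
            break
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(-3.0, 3.0, 200)
    data = np.column_stack([t, 0.1 * rng.standard_normal(t.size)])  # noisy samples of the line y = 0
    print(scms_project([0.5, 0.3], data, h=0.3))
```

With `ridge_dim=0` every direction is kept, the update becomes ordinary mean shift and the point converges to a density maximum, which is the kind of mode seeking used in visual object tracking.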
Abstract:
The recent rapid development of biotechnological approaches has enabled the production of large, whole-genome-level biological data sets. Handling these data sets requires reliable and efficient automated tools and methods for data processing and result interpretation. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches from computer science, statistics, mathematics and engineering to study and process biological data. There is also an increasing need for tools that can be used by biological researchers themselves, who may not have a strong statistical or computational background; this requires tools and pipelines with intuitive user interfaces, robust analysis workflows and a strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed at gene expression and genotyping data, although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set, and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied in several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in the murine testis and the second examines cell lineage specification in mouse embryonic stem cells.
Abstract:
This research concerns the Urban Living Idea Contest (ULIC) conducted by Creator Space™ of BASF SE during its 150th anniversary in 2015. The main objectives of the thesis are to provide a comprehensive analysis of the ULIC and to propose a number of improvements for future years. More than 4,000 data points were collected and analyzed to investigate the functionality of different elements of the contest, and a set of improvement suggestions was proposed to BASF SE. The novelty of this thesis lies in the data collection and the original analysis of the contest, which identified its critical elements as well as the areas that could be improved. The author of this research was a member of the organizing team and was involved in the decision-making process from the beginning to the end of the ULIC.
Abstract:
The purpose of the thesis was to explore the expectations of elderly people regarding the nurse-client relationship and interaction in home care. The aim is to improve the quality of care so that it better meets the needs of the clients. A qualitative approach was adopted. Semi-structured theme interviews were used for data collection. The interviews were conducted during spring 2006. Six elderly clients of a private home care company in Southern Finland acted as informants. Content analysis was used as the method of data analysis. The findings suggest that clients expect nurses to provide professional care with loving-kindness. Trust and mutual, active interaction were expected from the nurse-client relationship. Clients considered it important that the nurse recognizes each client's individual needs. The nurse was expected to perform duties efficiently, but in a calm and unrushed manner. A mechanical performance of tasks was viewed negatively. Humanity was viewed as a crucial element in the nurse-client relationship. Clients expressed their need to be seen as human beings, and seeing beyond the illness was considered important. A smiling nurse was described as being able to alleviate pain and anxiety. Clients hoped to have a close relationship with the nurse, which was considered more likely to develop if the nurse is familiar and genuine. Clients wished the nurses to have a more attending presence. Clients suggested that the nurses' work areas could be limited so that the nurses would have more time to travel from one place to another, which the clients felt would benefit them as well. The nurses were expected to be more considerate. Clients wished for more information about changes that affect their care, such as changes in schedules and plans. Clients hoped for continuity in the nurse-client relationship. Considering the expectations of clients promotes client satisfaction.
Home care providers have an opportunity to reflect on their own care behaviour in light of the findings. To better meet the needs of the clients, nurses could apply the concept of loving-kindness in their work and strive for a more attending presence.
Abstract:
Abstract