17 resultados para Automatic Analysis of Multivariate Categorical Data Sets
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
In this diploma work advantages of coherent anti-Stokes Raman scattering spectrometry (CARS) and various methods of the quantitative analysis of substance structure with its help are considered. The basic methods and concepts of the adaptive analysis are adduced. On the basis of these methods the algorithm of automatic measurement of a scattering strip size of a target component in CARS spectrum is developed. The algorithm uses known full spectrum of target substance and compares it with a CARS spectrum. The form of a differential spectrum is used as a feedback to control the accuracy of matching. To exclude the influence of a background in CARS spectra the differential spectrum is analysed by means of its second derivative. The algorithm is checked up on the simulated simple spectra and on the spectra of organic compounds received experimentally.
Resumo:
In this work we study the classification of forest types using mathematics based image analysis on satellite data. We are interested in improving classification of forest segments when a combination of information from two or more different satellites is used. The experimental part is based on real satellite data originating from Canada. This thesis gives summary of the mathematics basics of the image analysis and supervised learning , methods that are used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles) 2) variance 3) skewness and 4) kurtosis. Good overall performances were achieved when a combination of ASTERBAND and RADARSAT2 data sets was used.
Resumo:
The research of condition monitoring of electric motors has been wide for several decades. The research and development at universities and in industry has provided means for the predictive condition monitoring. Many different devices and systems are developed and are widely used in industry, transportation and in civil engineering. In addition, many methods are developed and reported in scientific arenas in order to improve existing methods for the automatic analysis of faults. The methods, however, are not widely used as a part of condition monitoring systems. The main reasons are, firstly, that many methods are presented in scientific papers but their performance in different conditions is not evaluated, secondly, the methods include parameters that are so case specific that the implementation of a systemusing such methods would be far from straightforward. In this thesis, some of these methods are evaluated theoretically and tested with simulations and with a drive in a laboratory. A new automatic analysis method for the bearing fault detection is introduced. In the first part of this work the generation of the bearing fault originating signal is explained and its influence into the stator current is concerned with qualitative and quantitative estimation. The verification of the feasibility of the stator current measurement as a bearing fault indicatoris experimentally tested with the running 15 kW induction motor. The second part of this work concentrates on the bearing fault analysis using the vibration measurement signal. The performance of the micromachined silicon accelerometer chip in conjunction with the envelope spectrum analysis of the cyclic bearing faultis experimentally tested. Furthermore, different methods for the creation of feature extractors for the bearing fault classification are researched and an automatic fault classifier using multivariate statistical discrimination and fuzzy logic is introduced. It is often important that the on-line condition monitoring system is integrated with the industrial communications infrastructure. Two types of a sensor solutions are tested in the thesis: the first one is a sensor withcalculation capacity for example for the production of the envelope spectra; the other one can collect the measurement data in memory and another device can read the data via field bus. The data communications requirements highly depend onthe type of the sensor solution selected. If the data is already analysed in the sensor the data communications are needed only for the results but in the other case, all measurement data need to be transferred. The complexity of the classification method can be great if the data is analysed at the management level computer, but if the analysis is made in sensor itself, the analyses must be simple due to the restricted calculation and memory capacity.
Resumo:
This work is devoted to the problem of reconstructing the basis weight structure at paper web with black{box techniques. The data that is analyzed comes from a real paper machine and is collected by an o®-line scanner. The principal mathematical tool used in this work is Autoregressive Moving Average (ARMA) modelling. When coupled with the Discrete Fourier Transform (DFT), it gives a very flexible and interesting tool for analyzing properties of the paper web. Both ARMA and DFT are independently used to represent the given signal in a simplified version of our algorithm, but the final goal is to combine the two together. Ljung-Box Q-statistic lack-of-fit test combined with the Root Mean Squared Error coefficient gives a tool to separate significant signals from noise.
Resumo:
To enable a mathematically and physically sound execution of the fatigue test and a correct interpretation of its results, statistical evaluation methods are used to assist in the analysis of fatigue testing data. The main objective of this work is to develop step-by-stepinstructions for statistical analysis of the laboratory fatigue data. The scopeof this project is to provide practical cases about answering the several questions raised in the treatment of test data with application of the methods and formulae in the document IIW-XIII-2138-06 (Best Practice Guide on the Statistical Analysis of Fatigue Data). Generally, the questions in the data sheets involve some aspects: estimation of necessary sample size, verification of the statistical equivalence of the collated sets of data, and determination of characteristic curves in different cases. The series of comprehensive examples which are given in this thesis serve as a demonstration of the various statistical methods to develop a sound procedure to create reliable calculation rules for the fatigue analysis.
Resumo:
This thesis was focussed on statistical analysis methods and proposes the use of Bayesian inference to extract information contained in experimental data by estimating Ebola model parameters. The model is a system of differential equations expressing the behavior and dynamics of Ebola. Two sets of data (onset and death data) were both used to estimate parameters, which has not been done by previous researchers in (Chowell, 2004). To be able to use both data, a new version of the model has been built. Model parameters have been estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. Estimates of the parameters were useful to determine how well the model fits the data and how good estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference can not be performed analytically, the Markov chain Monte Carlo approach has been used to generate samples from the posterior distribution over parameters. Samples have been used to check the accuracy of the model and other characteristics of the target posteriors.
Resumo:
The uncertainty of any analytical determination depends on analysis and sampling. Uncertainty arising from sampling is usually not controlled and methods for its evaluation are still little known. Pierre Gy’s sampling theory is currently the most complete theory about samplingwhich also takes the design of the sampling equipment into account. Guides dealing with the practical issues of sampling also exist, published by international organizations such as EURACHEM, IUPAC (International Union of Pure and Applied Chemistry) and ISO (International Organization for Standardization). In this work Gy’s sampling theory was applied to several cases, including the analysis of chromite concentration estimated on SEM (Scanning Electron Microscope) images and estimation of the total uncertainty of a drug dissolution procedure. The results clearly show that Gy’s sampling theory can be utilized in both of the above-mentioned cases and that the uncertainties achieved are reliable. Variographic experiments introduced in Gy’s sampling theory are beneficially applied in analyzing the uncertainty of auto-correlated data sets such as industrial process data and environmental discharges. The periodic behaviour of these kinds of processes can be observed by variographic analysis as well as with fast Fourier transformation and auto-correlation functions. With variographic analysis, the uncertainties are estimated as a function of the sampling interval. This is advantageous when environmental data or process data are analyzed as it can be easily estimated how the sampling interval is affecting the overall uncertainty. If the sampling frequency is too high, unnecessary resources will be used. On the other hand, if a frequency is too low, the uncertainty of the determination may be unacceptably high. Variographic methods can also be utilized to estimate the uncertainty of spectral data produced by modern instruments. Since spectral data are multivariate, methods such as Principal Component Analysis (PCA) are needed when the data are analyzed. Optimization of a sampling plan increases the reliability of the analytical process which might at the end have beneficial effects on the economics of chemical analysis,
Resumo:
Diabetes is a rapidly increasing worldwide problem which is characterised by defective metabolism of glucose that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), which is one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of the current DR screening capabilities for which the digital imaging of the eye fundus (retinal imaging), and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is statistically studied using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a certain diabetic lesion type from all other possible objects in eye fundus images by only estimating the probability density function of that certain lesion type. For the training and ground truth estimation, the algorithm combines manual annotations of several experts for which the best practices were experimentally selected. By assessing the algorithm’s performance while conducting experiments with the colour space selection, both illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is the benchmarking framework for eye fundus image analysis algorithms needed for the development of the automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on the standard receiver operating characteristics analysis and it follows the medical practice in the decision making providing protocols for image- and pixel-based evaluations. During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, DR databases and the final algorithm, are made public in the web to set the baseline results for automatic detection of diabetic retinopathy. Although deviating from the general context of the thesis, a simple and effective optic disc localisation method is presented. The optic disc localisation is discussed, since normal eye fundus structures are fundamental in the characterisation of DR.
Resumo:
The concept of open innovation has recently gained widespread attention, and is particularly relevant now as many firms endeavouring to implement open innovation, face different sets of challenges associated with managing it. Prior research on open innovation has focused on the internal processes dealing with open innovation implementation and the organizational changes, already taking place or yet required in companies order to succeed in the global open innovation market. Despite the intensive research on open innovation, the question of what influences its adoption by companies in different contexts has not received much attention in studies. To fill this gap, this thesis contribute to the discussion on open innovation influencing factors by bringing in the perspective of environmental impacts, i.e. gathering data on possible sources of external influences, classifying them and testing their systemic impact through conceptual system dynamics simulation model. The insights from data collection and conceptualization in modelling are used to answer the question of how the external environment affects the adoption of open innovation. The thesis research is presented through five research papers reflecting the method triangulation based study (conducted at initial stage as case study, later as quantitative analysis and finally as system dynamics simulation). This multitude of methods was used to collect the possible external influence factors and to assess their impact (on positive/negative scale rather than numerical). The results obtained throughout the thesis research bring valuable insights into understanding of open innovation influencing factors inside a firm’s operating environment, point out the balance required in the system for successful open innovation performance and discover the existence of tipping point of open innovation success when driven by market dynamics and structures. The practical implications on how firms and policy-makers can leverage environment for their potential benefits are offered in the conclusions.
Resumo:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Surveybased event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias caused by them in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include, whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility to use combined longitudinal survey register data. The Finnish subset of European Community Household Panel (FI ECHP) survey for waves 1–5 were linked at person-level with longitudinal register data. Unemployment spells were used as study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey register data can be used to analyse and compare the non-response and attrition processes, test the missingness mechanism type and estimate the size of bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions, and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon classification error in reported spell outcomes, was also found in the data. Neither the Missing At Random (MAR) assumption about non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Both measurement errors in spell durations and spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of baseline hazard most. The design-based estimates based on data from respondents to all waves of interest and weighted by the last wave weights displayed the largest bias. Using all the available data, including the spells by attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazard model estimators. The study discusses implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
Resumo:
Med prediktion avses att man skattar det framtida värdet på en observerbar storhet. Kännetecknande för det bayesianska paradigmet är att osäkerhet gällande okända storheter uttrycks i form av sannolikheter. En bayesiansk prediktiv modell är således en sannolikhetsfördelning över de möjliga värden som en observerbar, men ännu inte observerad storhet kan anta. I de artiklar som ingår i avhandlingen utvecklas metoder, vilka bl.a. tillämpas i analys av kromatografiska data i brottsutredningar. Med undantag för den första artikeln, bygger samtliga metoder på bayesiansk prediktiv modellering. I artiklarna betraktas i huvudsak tre olika typer av problem relaterade till kromatografiska data: kvantifiering, parvis matchning och klustring. I den första artikeln utvecklas en icke-parametrisk modell för mätfel av kromatografiska analyser av alkoholhalt i blodet. I den andra artikeln utvecklas en prediktiv inferensmetod för jämförelse av två stickprov. Metoden tillämpas i den tredje artik eln för jämförelse av oljeprover i syfte att kunna identifiera den förorenande källan i samband med oljeutsläpp. I den fjärde artikeln härleds en prediktiv modell för klustring av data av blandad diskret och kontinuerlig typ, vilken bl.a. tillämpas i klassificering av amfetaminprover med avseende på produktionsomgångar.
Resumo:
Wind power is a low-carbon energy production form that reduces the dependence of society on fossil fuels. Finland has adopted wind energy production into its climate change mitigation policy, and that has lead to changes in legislation, guidelines, regional wind power areas allocation and establishing a feed-in tariff. Wind power production has indeed boosted in Finland after two decades of relatively slow growth, for instance from 2010 to 2011 wind energy production increased with 64 %, but there is still a long way to the national goal of 6 TWh by 2020. This thesis introduces a GIS-based decision-support methodology for the preliminary identification of suitable areas for wind energy production including estimation of their level of risk. The goal of this study was to define the least risky places for wind energy development within Kemiönsaari municipality in Southwest Finland. Spatial multicriteria decision analysis (SMCDA) has been used for searching suitable wind power areas along with many other location-allocation problems. SMCDA scrutinizes complex ill-structured decision problems in GIS environment using constraints and evaluation criteria, which are aggregated using weighted linear combination (WLC). Weights for the evaluation criteria were acquired using analytic hierarchy process (AHP) with nine expert interviews. Subsequently, feasible alternatives were ranked in order to provide a recommendation and finally, a sensitivity analysis was conducted for the determination of recommendation robustness. The first study aim was to scrutinize the suitability and necessity of existing data for this SMCDA study. Most of the available data sets were of sufficient resolution and quality. Input data necessity was evaluated qualitatively for each data set based on e.g. constraint coverage and attribute weights. Attribute quality was estimated mainly qualitatively by attribute comprehensiveness, operationality, measurability, completeness, decomposability, minimality and redundancy. The most significant quality issue was redundancy as interdependencies are not tolerated by WLC and AHP does not include measures to detect them. The third aim was to define the least risky areas for wind power development within the study area. The two highest ranking areas were Nordanå-Lövböle and Påvalsby followed by Helgeboda, Degerdal, Pungböle, Björkboda, and Östanå-Labböle. The fourth aim was to assess the recommendation reliability, and the top-ranking two areas proved robust whereas the other ones were more sensitive.
Integration of marketing research data in new product development. Case study: Food industry company
Resumo:
The aim of this master’s thesis is to provide a real life example of how marketing research data is used by different functions in the NPD process. In order to achieve this goal, a case study in a company was implemented where gathering, analysis, distribution and synthesis of marketing research data in NPD were studied. The main research question was formulated as follows: How is marketing research data integrated and used by different company functions in the NPD process? The theory part of the master’s thesis was focused on the discussion of the marketing function role in NPD, use of marketing research particularly in the food industry, as well as issues related to the marketing/R&D interface during the NPD process. The empirical part of the master’s thesis was based on qualitative explanatory case study research. Individual in-depth interviews with company representatives, company documents and online research were used for data collection and analyzed through triangulation method. The empirical findings advocate that the most important marketing data sources at the concept generation stage of NPD are: global trends monitoring, retailing audit and consumers insights. These data sets are crucial for establishing the potential of the product on the market and defining the desired features for the new product to be developed. The findings also suggest the example of successful crossfunctional communication during the NPD process with formal and informal communication patterns. General managerial recommendations are given on the integration in NPD of a strategy, process, continuous improvement, and motivated cross-functional product development teams.