7 resultados para PRINCIPAL COMPONENTS-ANALYSIS

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the large number of characteristics, there is a need to extract the most relevant characteristicsfrom the input data, so that the amount of information lost in this way is minimal, and the classification realized with the projected data set is relevant with respect to the original data. In order to achieve this feature extraction, different statistical techniques, as well as the principal components analysis (PCA) may be used. This thesis describes an extension of principal components analysis (PCA) allowing the extraction ofa finite number of relevant features from high-dimensional fuzzy data and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The comparisonof the two proposed methods was produced by using postoperative patient data. Experiment results demonstrate the ability of using the proposed two methods in complex data. Fuzzy PCA was used in the classificationproblem. The classification was applied by using the similarity classifier algorithm where total similarity measures weights are optimized with differential evolution algorithm. This thesis presents the comparison of the classification results based on the obtained data from the fuzzy PCA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The uncertainty of any analytical determination depends on analysis and sampling. Uncertainty arising from sampling is usually not controlled and methods for its evaluation are still little known. Pierre Gy’s sampling theory is currently the most complete theory about samplingwhich also takes the design of the sampling equipment into account. Guides dealing with the practical issues of sampling also exist, published by international organizations such as EURACHEM, IUPAC (International Union of Pure and Applied Chemistry) and ISO (International Organization for Standardization). In this work Gy’s sampling theory was applied to several cases, including the analysis of chromite concentration estimated on SEM (Scanning Electron Microscope) images and estimation of the total uncertainty of a drug dissolution procedure. The results clearly show that Gy’s sampling theory can be utilized in both of the above-mentioned cases and that the uncertainties achieved are reliable. Variographic experiments introduced in Gy’s sampling theory are beneficially applied in analyzing the uncertainty of auto-correlated data sets such as industrial process data and environmental discharges. The periodic behaviour of these kinds of processes can be observed by variographic analysis as well as with fast Fourier transformation and auto-correlation functions. With variographic analysis, the uncertainties are estimated as a function of the sampling interval. This is advantageous when environmental data or process data are analyzed as it can be easily estimated how the sampling interval is affecting the overall uncertainty. If the sampling frequency is too high, unnecessary resources will be used. On the other hand, if a frequency is too low, the uncertainty of the determination may be unacceptably high. Variographic methods can also be utilized to estimate the uncertainty of spectral data produced by modern instruments. Since spectral data are multivariate, methods such as Principal Component Analysis (PCA) are needed when the data are analyzed. Optimization of a sampling plan increases the reliability of the analytical process which might at the end have beneficial effects on the economics of chemical analysis,

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Raw measurement data does not always immediately convey useful information, but applying mathematical statistical analysis tools into measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study mathematical tools like Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on a subset of variables that seem to have most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with other variables. For SB and linear regression, the results show that both SB and linear regression models built on 1-day averaged data seriously underestimate the variance of true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit well the whole available dataset and therefore, it is proposed for future work to make piecewise non linear regression models if the same available dataset is used, or the plant to provide another dataset that should be collected in a more systematic fashion than the present data for further analysis.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical pre- liminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their di erences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophytes species and lakes. The di erences between PLS model for ecological data and PLS for spectral data are noted and explained in this thesis. i

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, that are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but has also potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Finnish healthcare industry is currently facing significant challenges due to economic crises, aging population and major structural reforms, which have resulted in decreased job satisfaction and increased levels of turnover. This proposes that healthcare organizations need to come up with new, creative means to tackle these issues. Several researchers have argued that corporate entrepreneurship may be the necessary means to achieve this. As previous research has mainly focused on examining this concept from organizational perspective, this study looks at how it occurs on the level of individual employees. The purpose of this study is to examine how corporate entrepreneurship is manifested in individual behavior, and how this type of behavior is associated with the individual’s job satisfaction and turnover intention. Additionally, this study will examine the differences in corporate entrepreneurial behavior between private and public sector organizations, as previous research suggests that these two may be characterized differently. Data was collected with the help of a literature review as well as a survey study, which was sent out to a number of employees of four different healthcare organizations, out of which three were public and one was a private sector organization. Six distinct behavioral characteristics were recognized in previous research, which make up the measure for corporate entrepreneurial behavior. Principal components were formed from the different areas of the survey (corporate entrepreneurial behavior, job satisfaction, turnover intention), after which the association of these components were examined with linear regression analysis, which proved that corporate entrepreneurial behavior is positively correlated with both job satisfaction and intention to leave the organization. Differences between sectors were analyzed with analysis of variance and cross tabulation analysis, but neither of these suggested that any significant differences would occur. These results suggest that employees who behave entrepreneurially tend to be more satisfied with their jobs, but also consider leaving their current organizations more often than others. This may be due to the fact that healthcare organizations are not fertile for entrepreneurial behavior, which will drive entrepreneurial individuals looking for employers who may be more supportive of this type of behavior. With growing levels of dissatisfaction as well as little room for entrepreneurial behavior, the studied organizations may actually be in the process of losing those employees who have the ability and desire to behave in such manner, and who could very well be those who will eventually come up with solutions for the major challenges that these organizations are facing.