17 results for High-dimensional Data

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

100.00%

Publisher:

Abstract:

Visual data mining (VDM) tools employ information visualization techniques to represent large amounts of high-dimensional data graphically and to involve the user in exploring the data at different levels of detail. Users look for outliers, patterns and models – in the form of clusters, classes, trends and relationships – in different categories of data, e.g., financial and business information. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user's perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. For this purpose, we propose the use of existing clustering validity measures and illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon's Mapping, the Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem concerns evaluating different visualization techniques with respect to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and use it to evaluate nine visualization techniques: Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon's Mapping and the SOM. The third problem is the evaluation of the quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM; in this evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying them. The thesis provides a systematic approach to the evaluation of visualization techniques: first, the evaluations were performed and described in a systematic way, highlighting the evaluation activities and their inputs and outputs; secondly, the evaluation studies were integrated into the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems select appropriate visualization techniques in specific situations, and they also contribute to understanding the strengths and limitations of the evaluated techniques and, further, to their improvement.
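
For illustration, a minimal sketch of the kind of check the first research problem calls for: comparing a clustering validity index computed in the original space and in a 2-D projection. The synthetic data, the cluster count, the silhouette index and the scikit-learn calls are assumptions for illustration, not the specific validity measures used in the thesis.

```python
# Hedged sketch: compare a clustering validity index in the original
# high-dimensional space and in a 2-D PCA projection of the same data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))          # placeholder data, e.g. financial indicators
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

X2 = PCA(n_components=2).fit_transform(X)   # 2-D projection to be evaluated

# If the projection preserves the clustering structure, the validity index
# computed on the projected data should stay close to the original value.
print("validity (original):", silhouette_score(X, labels))
print("validity (projected):", silhouette_score(X2, labels))
```

A large drop in the projected index would signal that the projection distorts the cluster geometry.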

Relevance:

100.00%

Publisher:

Abstract:

The ongoing global financial crisis has demonstrated the importance of a system-wide, or macroprudential, approach to safeguarding financial stability. An essential part of macroprudential oversight concerns the tasks of early identification and assessment of risks and vulnerabilities that eventually may lead to a systemic financial crisis. Thriving tools are crucial as they allow early policy actions to decrease or prevent further build-up of risks or to otherwise enhance the shock absorption capacity of the financial system. In the literature, three types of systemic risk can be identified: i) build-up of widespread imbalances, ii) exogenous aggregate shocks, and iii) contagion. Accordingly, the systemic risks are matched by three categories of analytical methods for decision support: i) early-warning, ii) macro stress-testing, and iii) contagion models. Stimulated by the prolonged global financial crisis, today's toolbox of analytical methods includes a wide range of innovative solutions to the two tasks of risk identification and risk assessment. Yet, the literature lacks a focus on the task of risk communication. This thesis discusses macroprudential oversight from the viewpoint of all three tasks: within analytical tools for risk identification and risk assessment, the focus concerns a tight integration of means for risk communication. Data and dimension reduction methods, and their combinations, hold promise for representing multivariate data structures in easily understandable formats. The overall task of this thesis is to represent high-dimensional data concerning financial entities on low-dimensional displays. The low-dimensional representations have two subtasks: i) to function as a display for individual data concerning entities and their time series, and ii) to use the display as a basis to which additional information can be linked. The final nuance of the task is, however, set by the needs of the domain, data and methods. The following five questions comprise subsequent steps addressed in the process of this thesis: 1. What are the needs for macroprudential oversight? 2. What form do macroprudential data take? 3. Which data and dimension reduction methods hold most promise for the task? 4. How should the methods be extended and enhanced for the task? 5. How should the methods and their extensions be applied to the task? Based upon the Self-Organizing Map (SOM), this thesis not only creates the Self-Organizing Financial Stability Map (SOFSM), but also lays out a general framework for mapping the state of financial stability. This thesis also introduces three extensions to the standard SOM for enhancing the visualization and extraction of information: i) fuzzifications, ii) transition probabilities, and iii) network analysis. Thus, the SOFSM functions as a display for risk identification, on top of which risk assessments can be illustrated. In addition, this thesis puts forward the Self-Organizing Time Map (SOTM) to provide means for visual dynamic clustering, which in the context of macroprudential oversight concerns the identification of cross-sectional changes in risks and vulnerabilities over time. Rather than automated analysis, the aim of visual means for identifying and assessing risks is to support disciplined and structured judgmental analysis based upon policymakers' experience and domain intelligence, as well as external risk communication.
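
As a rough illustration of the dimension reduction underlying the SOFSM, a minimal Self-Organizing Map training loop is sketched below; the grid size, decay schedules and random data are assumptions, and none of the SOFSM extensions named above (fuzzifications, transition probabilities, network analysis) are included.

```python
# Hedged sketch: a minimal SOM update loop, the kind of dimension reduction
# the SOFSM builds on; not the thesis implementation.
import numpy as np

def train_som(X, grid=(8, 8), epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.normal(size=(rows, cols, X.shape[1]))        # codebook vectors
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)                    # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)              # shrinking neighbourhood
        for x in rng.permutation(X):
            # best-matching unit (BMU) for this indicator vector
            bmu = np.unravel_index(
                np.argmin(((W - x) ** 2).sum(axis=-1)), (rows, cols))
            d2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2 * sigma ** 2))[..., None]  # neighbourhood kernel
            W += lr * h * (x - W)                          # pull units towards x
    return W
```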

Relevance:

100.00%

Publisher:

Abstract:

The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data is rapidly growing, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is a method commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on eight publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as the Bayesian Principal Component Algorithm (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as gene microarray studies. Such networks are typically very large and highly connected, so there is a need for fast algorithms that produce visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
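
A minimal sketch of the imputation-then-clustering step discussed above, here using scikit-learn's k-NN imputer on synthetic data; the library, neighbour count, cluster count and data shape are assumptions, not the exact algorithms compared in the thesis.

```python
# Hedged sketch: k-NN missing-value imputation followed by clustering,
# roughly the kind of pipeline the thesis compares (parameters assumed).
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 50))                 # genes x arrays, placeholder values
mask = rng.random(expr.shape) < 0.05              # 5% of entries missing at random
expr[mask] = np.nan

imputed = KNNImputer(n_neighbors=10).fit_transform(expr)          # k-NN imputation
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(imputed)
```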

Relevance:

100.00%

Publisher:

Abstract:

The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle these data sets, reliable and efficient automated tools and methods for data processing and result interpretation are required. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to study and process biological data. The need is also increasing for tools that can be used by the biological researchers themselves, who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and a strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed at gene expression and genotyping data, although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set, and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in murine testis and the second one examines cell lineage specification in mouse embryonic stem cells.

Relevance:

90.00%

Publisher:

Abstract:

Due to the large number of characteristics, there is a need to extract the most relevant characteristics from the input data so that the amount of information lost is minimal and the classification performed with the projected data remains meaningful with respect to the original data. To achieve this feature extraction, different statistical techniques, including principal components analysis (PCA), may be used. This thesis describes an extension of PCA that allows the extraction of a finite number of relevant features from high-dimensional fuzzy and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The two proposed methods were compared using postoperative patient data, and the experimental results demonstrate their applicability to complex data. Fuzzy PCA was then used in a classification problem: classification was performed with the similarity classifier algorithm, in which the weights of the total similarity measure are optimized with a differential evolution algorithm. The thesis presents a comparison of the classification results based on the data obtained from the fuzzy PCA.
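
For reference, a minimal sketch of standard (non-fuzzy) PCA feature extraction, i.e. the projection step that the proposed fuzzy extension builds on; the implementation below is a textbook eigen-decomposition, not the thesis's method, and the number of components is an assumption.

```python
# Hedged sketch: classical PCA feature extraction via the covariance matrix.
import numpy as np

def pca_features(X, n_components=3):
    Xc = X - X.mean(axis=0)                        # centre the measurements
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)               # eigen-decomposition of covariance
    order = np.argsort(vals)[::-1][:n_components]  # keep directions of largest variance
    return Xc @ vecs[:, order]                     # projected (extracted) features
```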

Relevance:

90.00%

Publisher:

Abstract:

The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
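
As an illustration of score-based structure learning, a sketch of a Dirichlet-multinomial local score for one discrete variable given a candidate parent set is given below; it covers only plain conditional independence, not the context-specific structures introduced in the thesis, and the prior weight alpha is an assumed convention.

```python
# Hedged sketch: a Dirichlet-multinomial local score, the kind of Bayesian
# score that a structure search would maximise over candidate parent sets.
import numpy as np
from scipy.special import gammaln

def local_score(data, child, parents, r, alpha=1.0):
    """data: integer array (n_samples, n_vars); r: number of states per variable."""
    counts = {}
    for row in data:
        key = tuple(row[parents])                        # parent configuration
        counts.setdefault(key, np.zeros(r[child]))[row[child]] += 1
    score = 0.0
    for n_jk in counts.values():                         # one term per configuration
        n_j = n_jk.sum()
        score += gammaln(alpha) - gammaln(alpha + n_j)
        score += np.sum(gammaln(alpha / r[child] + n_jk) - gammaln(alpha / r[child]))
    return score
```

A structure search would then add or remove parents (or, in the context-specific case, local contexts) so as to maximise the sum of such local scores.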

Relevance:

80.00%

Publisher:

Abstract:

This work presents new, efficient Markov chain Monte Carlo (MCMC) simulation methods for statistical analysis in various modelling applications. When using MCMC methods, the model is simulated repeatedly to explore the probability distribution describing the uncertainties in model parameters and predictions. In adaptive MCMC methods based on the Metropolis-Hastings algorithm, the proposal distribution needed by the algorithm learns from the target distribution as the simulation proceeds. Adaptive MCMC methods have been the subject of intensive research lately, as they open a way for essentially easier use of the methodology; the lack of user-friendly computer programs has been a major obstacle to wider acceptance of the methods. This work provides two new adaptive MCMC methods: DRAM and AARJ. The DRAM method has been built especially to work in high-dimensional and non-linear problems. The AARJ method is an extension to DRAM for model selection problems, where the mathematical formulation of the model is uncertain and we want to fit several different models to the same observations simultaneously. The methods were developed while keeping in mind the needs of modelling applications typical in environmental sciences, and the development work has been pursued while working with several application projects. The applications presented in this work are: a wintertime oxygen concentration model for Lake Tuusulanjärvi and adaptive control of the aerator; a nutrition model for Lake Pyhäjärvi and lake management planning; and validation of the algorithms of the GOMOS ozone remote sensing instrument on board the Envisat satellite of the European Space Agency, together with a study of the effects of aerosol model selection on the GOMOS algorithm.
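
A minimal sketch of the covariance-adaptation idea behind such methods: a basic adaptive Metropolis sampler whose Gaussian proposal is tuned from the chain history. This is not the DRAM or AARJ algorithm itself; the delayed-rejection stage and the reversible-jump extension are omitted, and the scaling constants are standard conventions rather than values from the thesis.

```python
# Hedged sketch: adaptive Metropolis with proposal covariance learned from the chain.
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, adapt_start=500, seed=0):
    rng = np.random.default_rng(seed)
    d = len(x0)
    chain = np.empty((n_iter, d))
    x, lp = np.asarray(x0, float), log_post(x0)
    cov = np.eye(d) * 0.1                               # initial proposal covariance
    for i in range(n_iter):
        if i >= adapt_start:                            # adapt proposal to chain history
            cov = np.cov(chain[:i], rowvar=False) * 2.38**2 / d + 1e-8 * np.eye(d)
        prop = rng.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:         # Metropolis accept/reject
            x, lp = prop, lp_prop
        chain[i] = x
    return chain
```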

Relevance:

80.00%

Publisher:

Abstract:

This thesis addresses the problem of computing the minimal and maximal diameter of the Cayley graph of Coxeter groups. We first present relevant parts of polytope theory and the related Coxeter theory. After this, a method of constructing the orthogonal projections of a polytope from R^d onto R^2 and R^3, d ≥ 3, is presented. This method is the Equality Set Projection (ESP) algorithm, which requires a constant number of linear programming problems per facet of the projection in the absence of degeneracy. The ESP algorithm also allows us to compute projected geometric diameters of high-dimensional polytopes. A representative set of projected polytopes is presented to illustrate the methods adopted in this thesis.
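
For contrast, a minimal sketch of the much simpler vertex-based route: projecting a vertex-represented polytope from R^5 onto R^2 and taking the convex hull of the image. The ESP algorithm instead works on the facet (H-)representation via linear programs, which is not reproduced here; the data below are placeholders.

```python
# Hedged sketch: orthogonal projection of a V-represented polytope onto R^2.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(2)
V = rng.normal(size=(60, 5))                  # vertices of a 5-D polytope (assumed known)
P = np.zeros((2, 5)); P[0, 0] = P[1, 1] = 1   # orthogonal projection onto first two axes
V2 = V @ P.T                                  # projected points in R^2
hull = ConvexHull(V2)                         # boundary of the projected polytope
print("facets of the projection:", len(hull.simplices))
```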

Relevance:

80.00%

Publisher:

Abstract:

The aim of this work is to study the existing analytical calculation procedures found in the literature for calculating the eddy-current losses in surface-mounted permanent magnets in a PMSM application. The most promising algorithms are implemented in MATLAB using the dimensional data of the LUT prototype machine. In addition, a finite element analysis, carried out with the Flux 2D software from Cedrat Ltd, is applied to calculate the eddy-current losses in the permanent magnets. The results obtained from the analytical methods are compared with the numerical results.
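
As a point of reference only, a sketch of the classical thin-slab eddy-current loss estimate; whether and how the thesis's magnet-loss algorithms reduce to this form is not claimed, and the numerical values are placeholders rather than LUT prototype data.

```python
# Hedged sketch: classical thin-slab eddy-current loss density under sinusoidal flux.
import math

def eddy_loss_density(sigma, d, f, B_peak):
    """Loss per unit volume (W/m^3) for a thin conducting slab of thickness d (m),
    conductivity sigma (S/m), sinusoidal flux density of peak B_peak (T) at f (Hz)."""
    return math.pi**2 * sigma * d**2 * f**2 * B_peak**2 / 6.0

# Placeholder values, not machine data.
print(eddy_loss_density(sigma=6.7e5, d=5e-3, f=400.0, B_peak=0.05))
```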

Relevance:

80.00%

Publisher:

Abstract:

Online community services have become highly popular. They make it possible to create digital communities and a sense of community among users. Advanced network connections and content technologies have made it possible to implement versatile interaction tools. Electronic community building is a rising trend that complements everyday reality. The local development of networks, through broadband as well as various regional networks, has raised the idea of creating local communities alongside global community services. The expansion of fast regional networks has brought advanced network connections both to suburbs and beyond population centres. In the open regional network model, the network and the service layer are separated from each other. External service providers can offer their services directly to the users of the regional network, closer to the network level than in the traditional single-operator model. This enables the development of new, more innovative services. This Master's thesis studied ways to promote a sense of community in local regional networks and to exploit their local performance and resources in implementing services. The work first examined how the concepts of community and sense of community have formed. Based on this review, the technical methods available for creating communities and a sense of community were studied. The review resulted in a technological roadmap for network community building and a classification scheme for social service platforms, which together are intended to illustrate the service opportunities that support community building. In the final phase of the work, a community-oriented regional network service and a community surveillance system, a value-added service concept, were implemented; both aim to exploit the performance and resources offered by the local regional network.

Relevance:

80.00%

Publisher:

Abstract:

This thesis is concerned with state and parameter estimation in state space models. The estimation of states and parameters is an important task when mathematical modeling is applied to many different application areas such as global positioning systems, target tracking, navigation, brain imaging, the spread of infectious diseases, biological processes, telecommunications, audio signal processing, stochastic optimal control, machine learning, and physical systems. In Bayesian settings, the estimation of states or parameters amounts to computing the posterior probability density function. Except for a very restricted number of models, it is impossible to compute this density function in closed form, hence we need approximation methods. A state estimation problem involves estimating the states (latent variables) that are not directly observed in the output of the system. In this thesis, we use the Kalman filter, extended Kalman filter, Gauss–Hermite filters, and particle filters to estimate the states based on available measurements. Among these filters, particle filters are numerical methods for approximating the filtering distributions of non-linear non-Gaussian state space models via Monte Carlo. The performance of a particle filter depends heavily on the chosen importance distribution; for instance, an inappropriate choice of importance distribution can lead to the failure of convergence of the particle filter algorithm. In this thesis, we analyze the theoretical Lᵖ particle filter convergence with general importance distributions, where p ≥ 2 is an integer. A parameter estimation problem is concerned with inferring the model parameters from measurements. For high-dimensional complex models, parameter estimation can be done by Markov chain Monte Carlo (MCMC) methods. In its operation, the MCMC method requires the unnormalized posterior distribution of the parameters and a proposal distribution. In this thesis, we show how the posterior density function of the parameters of a state space model can be computed by filtering-based methods, where the states are integrated out. This type of computation is then applied to estimate the parameters of stochastic differential equations. Furthermore, we compute the partial derivatives of the log-posterior density function and use the hybrid Monte Carlo and scaled conjugate gradient methods to infer the parameters of stochastic differential equations. The computational efficiency of MCMC methods depends heavily on the chosen proposal distribution. A commonly used proposal distribution is Gaussian; in this kind of proposal, the covariance matrix must be well tuned, and adaptive MCMC methods can be used to tune it. In this thesis, we propose a new way of updating the covariance matrix using the variational Bayesian adaptive Kalman filter algorithm.
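
A minimal sketch of a bootstrap particle filter for a scalar state, to make the filtering step concrete; the model functions, resampling scheme and particle count are assumptions, not the specific filters or importance distributions analysed in the thesis.

```python
# Hedged sketch: bootstrap particle filter (scalar state) with multinomial resampling.
import numpy as np

def bootstrap_pf(y, transition, loglik, x0, n_particles=1000, seed=0):
    """y: sequence of measurements; transition(rng, x) propagates particles;
    loglik(yt, x) returns per-particle log-likelihoods."""
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, float(x0))
    est = []
    for yt in y:
        x = transition(rng, x)                         # propagate through the dynamics
        logw = loglik(yt, x)                           # weight by the measurement model
        w = np.exp(logw - logw.max())
        w /= w.sum()
        est.append(np.sum(w * x))                      # filtering mean estimate
        x = x[rng.choice(n_particles, n_particles, p=w)]   # multinomial resampling
    return np.array(est)
```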

Relevance:

80.00%

Publisher:

Abstract:

The objective of the thesis is to examine the market reaction of Finnish large-cap stocks to layoff announcements, using the event study methodology to gain insight into whether the reaction is positive or negative, and whether it has changed over the years since the last studies were conducted. Another aim is to examine whether the market reaction has changed during the financial crisis, when the number of layoffs in Finland has been unusually high. The data consist of 128 publicly made layoff announcements during the eight years from January 2006 to January 2014. The average market reaction to layoff announcements during different time periods within the overall sample was studied based on the abnormal returns indicated by the event study methodology. Earlier research suggests that the overall market reaction to layoff announcements is negative; an overwhelming majority of these studies were conducted in the 1990s based on data from the 1980s. The market reaction found in this study was slightly positive, although the result was not statistically significant. The market reaction has decreased during the years of the financial crisis, but this result, too, is not statistically significant.
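
A minimal sketch of the event-study quantity behind these results: market-model abnormal returns and their cumulative sum (CAR) around an announcement day. The estimation and event window lengths below are assumptions for illustration, not necessarily those used in the thesis.

```python
# Hedged sketch: cumulative abnormal return (CAR) from an OLS market model.
import numpy as np

def car(stock_ret, market_ret, event_idx, est_win=120, evt_win=3):
    """CAR over [-evt_win, +evt_win] trading days around event_idx, with the
    market model estimated on the preceding est_win days."""
    est = slice(event_idx - est_win - evt_win, event_idx - evt_win)
    beta, alpha = np.polyfit(market_ret[est], stock_ret[est], 1)   # market model
    evt = slice(event_idx - evt_win, event_idx + evt_win + 1)
    abnormal = stock_ret[evt] - (alpha + beta * market_ret[evt])   # AR_t
    return abnormal.sum()                                          # CAR
```

Averaging such CARs over the sample of announcements, and testing whether the mean differs from zero, gives the statistical significance statements quoted above.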

Relevance:

40.00%

Publisher:

Abstract:

This work is devoted to the problem of reconstructing the basis weight structure of a paper web with black-box techniques. The data analyzed come from a real paper machine and were collected by an off-line scanner. The principal mathematical tool used in this work is Autoregressive Moving Average (ARMA) modelling. When coupled with the Discrete Fourier Transform (DFT), it gives a very flexible and interesting tool for analyzing the properties of the paper web. Both ARMA and the DFT are used independently to represent the given signal in a simplified version of our algorithm, but the final goal is to combine the two. The Ljung-Box Q-statistic lack-of-fit test, combined with the root mean squared error coefficient, gives a tool to separate significant signals from noise.
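
A minimal sketch of the model-validation step referred to above: fitting an ARMA model to a scanner signal, checking the residuals with the Ljung-Box test and the RMSE, and taking the DFT of the signal. The model orders, lag choice and statsmodels calls are assumptions, not the thesis's algorithm.

```python
# Hedged sketch: ARMA fit, Ljung-Box residual check, RMSE, and DFT of a signal.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
signal = rng.normal(size=500)                      # placeholder basis-weight profile

fit = ARIMA(signal, order=(2, 0, 1)).fit()         # ARMA(2,1) via ARIMA with d = 0
lb = acorr_ljungbox(fit.resid, lags=[10])          # lack-of-fit test on the residuals
rmse = np.sqrt(np.mean(fit.resid ** 2))            # root mean squared error of the fit

spectrum = np.abs(np.fft.rfft(signal))             # DFT magnitude for frequency content
print(lb, rmse)
```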

Relevance:

40.00%

Publisher:

Abstract:

Cells of epithelial origin, e.g. from breast and prostate cancers, effectively differentiate into complex multicellular structures when cultured in three dimensions (3D) instead of on conventional two-dimensional (2D) adherent surfaces. The spectrum of different organotypic morphologies is highly dependent on the culture environment, which can be either non-adherent or scaffold-based. When embedded in physiological extracellular matrices (ECMs), such as laminin-rich basement membrane extracts, normal epithelial cells differentiate into acinar spheroids reminiscent of glandular ductal structures. Transformed cancer cells, in contrast, typically fail to undergo acinar morphogenic patterns, forming poorly differentiated or invasive multicellular structures. The 3D cancer spheroids are widely accepted to better recapitulate various tumorigenic processes and drug responses. So far, however, 3D models have been employed predominantly in academia, whereas the pharmaceutical industry has yet to adopt them for wider and more routine use. This is mainly due to poor characterisation of cell models, the lack of standardised workflows and high-throughput cell culture platforms, and the limited availability of proper readout and quantification tools. In this thesis, a complete workflow has been established, entailing well-characterised 3D cell culture models for prostate cancer, a standardised 3D cell culture routine based on a high-throughput-ready platform, automated image acquisition with concomitant morphometric image analysis, and data visualisation, in order to enable large-scale high-content screens. Our integrated suite of software and statistical analysis tools was optimised and validated using a comprehensive panel of prostate cancer cell lines and 3D models. The tools quantify multiple key cancer-relevant morphological features, ranging from cancer cell invasion through multicellular differentiation to growth, and detect dynamic changes both in morphology and function, such as cell death and apoptosis, in response to experimental perturbations including RNA interference and small molecule inhibitors. Our panel of cell lines included many non-transformed and most currently available classic prostate cancer cell lines, which were characterised for their morphogenetic properties in 3D laminin-rich ECM. The phenotypes and gene expression profiles were evaluated concerning their relevance for pre-clinical drug discovery, disease modelling and basic research. In addition, a spontaneous model for invasive transformation was discovered, displaying a high degree of epithelial plasticity. This plasticity is mediated by an abundant bioactive serum lipid, lysophosphatidic acid (LPA), and its receptor LPAR1. The invasive transformation was caused by abrupt cytoskeletal rearrangement through impaired G protein alpha 12/13 and RhoA/ROCK, and mediated by upregulated adenylyl cyclase/cyclic AMP (cAMP)/protein kinase A and Rac/PAK pathways. The spontaneous invasion model tangibly exemplifies the biological relevance of organotypic cell culture models. Overall, this thesis work underlines the power of novel morphometric screening tools in drug discovery.

Relevance:

40.00%

Publisher:

Abstract:

The paper is devoted to studying specific aspects of heat transfer in the combustion chamber of compression-ignition reciprocating internal combustion engines and the possibility of measuring the heat flux directly by means of Gradient Heat Flux Sensors (GHFS). A one-dimensional single-zone model proposed by Kyung Tae Yun et al. and implemented in Matlab was used to obtain an approximate picture of the heat flux behavior in the combustion chamber as a function of crank angle. The model's numerical output was compared with the experimental results. The experiment was carried out by A. Mityakov on a four-stroke Indenor XL4D diesel engine; local heat fluxes on the surface of the cylinder head were measured with fast-response, highly sensitive GHFS. The comparison of the numerical data with the experimental results revealed a small deviation in the obtained heat flux values throughout the cycle and a different behavior of the heat flux curve after top dead center.
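
For orientation, a sketch of a convective wall heat-flux estimate based on the widely used Woschni correlation; whether the cited single-zone model uses this exact correlation is an assumption, and all numbers below are placeholders rather than Indenor XL4D data.

```python
# Hedged sketch: convective heat transfer coefficient from the Woschni correlation
# and the resulting wall heat flux; inputs are illustrative placeholders.
def woschni_h(bore_m, p_kpa, T_k, w_ms):
    """Heat transfer coefficient in W/(m^2 K); pressure in kPa, bore in m,
    gas temperature in K, characteristic gas velocity in m/s."""
    return 3.26 * bore_m**-0.2 * p_kpa**0.8 * T_k**-0.55 * w_ms**0.8

h = woschni_h(bore_m=0.094, p_kpa=6000.0, T_k=1500.0, w_ms=15.0)
q_wall = h * (1500.0 - 450.0)        # q = h * (T_gas - T_wall), in W/m^2
print(h, q_wall)
```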