19 results for high dimensional secondary classifier
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Due to the large number of characteristics, there is a need to extract the most relevant characteristics from the input data, so that the amount of information lost in the projection is minimal and the classification performed with the projected data set remains faithful to the original data. To achieve this feature extraction, different statistical techniques, such as principal component analysis (PCA), may be used. This thesis describes an extension of principal component analysis (PCA) that allows a finite number of relevant features to be extracted from high-dimensional fuzzy and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The two proposed methods were compared using postoperative patient data, and the experimental results demonstrate their applicability to complex data. Fuzzy PCA was used in the classification problem. The classification was carried out with the similarity classifier algorithm, in which the weights of the total similarity measure are optimized with the differential evolution algorithm. This thesis presents a comparison of the classification results based on the data obtained from the fuzzy PCA.
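As a rough illustration of the baseline the thesis extends, the following minimal Python sketch performs standard PCA feature extraction via an eigendecomposition of the covariance matrix; the fuzzy PCA variant and the similarity classifier described above are not reproduced here, and the data in the example are synthetic.

```python
# A minimal sketch of standard PCA feature extraction, the baseline that the
# thesis extends to fuzzy/noisy data; the fuzzy PCA variant itself is not
# reproduced here.
import numpy as np

def pca_extract(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # covariance of centered data
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort descending by variance
    W = eigvecs[:, order[:n_components]]    # loading matrix
    return Xc @ W                           # projected (reduced) data

# Example: reduce 100 synthetic 50-dimensional samples to 5 features
X = np.random.randn(100, 50)
Z = pca_extract(X, 5)
print(Z.shape)  # (100, 5)
```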
Abstract:
This work presents new, efficient Markov chain Monte Carlo (MCMC) simulation methods for statistical analysis in various modelling applications. When using MCMC methods, the model is simulated repeatedly to explore the probability distribution describing the uncertainties in model parameters and predictions. In adaptive MCMC methods based on the Metropolis-Hastings algorithm, the proposal distribution needed by the algorithm learns from the target distribution as the simulation proceeds. Adaptive MCMC methods have been a subject of intensive research lately, as they open a way for essentially easier use of the methodology; the lack of user-friendly computer programs has been a major obstacle to wider acceptance of the methods. This work provides two new adaptive MCMC methods: DRAM and AARJ. The DRAM method has been built especially to work in high-dimensional and non-linear problems. The AARJ method is an extension to DRAM for model selection problems, where the mathematical formulation of the model is uncertain and we want to fit several different models to the same observations simultaneously. The methods were developed with the needs of modelling applications typical of environmental sciences in mind, and the development work was pursued alongside several application projects. The applications presented in this work are: a wintertime oxygen concentration model for Lake Tuusulanjärvi and adaptive control of the aerator; a nutrition model for Lake Pyhäjärvi and lake management planning; validation of the algorithms of the GOMOS ozone remote sensing instrument on board the Envisat satellite of the European Space Agency; and a study of the effects of aerosol model selection on the GOMOS algorithm.
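For orientation, the following is a minimal, hedged sketch of the adaptive Metropolis idea at the core of DRAM: the Gaussian proposal covariance is re-estimated from the chain history as the simulation proceeds. The delayed rejection stage of DRAM and the model-jumping machinery of AARJ are omitted, and the target below is a toy example.

```python
# A minimal adaptive Metropolis sketch: the proposal covariance is updated
# from the chain history, as in the adaptive component of DRAM. The delayed
# rejection stage is omitted for brevity.
import numpy as np

def adaptive_metropolis(log_post, x0, n_iter=5000, adapt_start=500):
    d = len(x0)
    sd = 2.4**2 / d                              # standard AM scaling factor
    chain = np.empty((n_iter, d))
    x, lp = np.array(x0, float), log_post(x0)
    cov = np.eye(d) * 0.1                        # initial proposal covariance
    for i in range(n_iter):
        if i >= adapt_start:                     # adapt from history so far
            cov = sd * np.cov(chain[:i], rowvar=False) + 1e-8 * np.eye(d)
        prop = np.random.multivariate_normal(x, cov)
        lp_prop = log_post(prop)
        if np.log(np.random.rand()) < lp_prop - lp:   # Metropolis accept step
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Example: sample a correlated 2-D Gaussian toy target
target = lambda x: -0.5 * (x[0]**2 + 4 * (x[1] - 0.5 * x[0])**2)
samples = adaptive_metropolis(target, [0.0, 0.0])
```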
Abstract:
This thesis addresses the problem of computing the minimal and maximal diameter of the Cayley graph of Coxeter groups. We first present the relevant parts of polytope theory and related Coxeter theory. After this, a method for constructing the orthogonal projections of a polytope from R^d onto R^2 and R^3, d ≥ 3, is presented. This method is the Equality Set Projection (ESP) algorithm, which requires a constant number of linear programming problems per facet of the projection in the absence of degeneracy. The ESP algorithm also allows us to compute projected geometric diameters of high-dimensional polytopes. A representative set of projected polytopes is presented to illustrate the methods adopted in this thesis.
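As a point of contrast, the naive way to project a polytope given by its vertices is to project the vertices and take the convex hull of the shadow, as in the hedged Python sketch below; ESP instead works facet-by-facet on a halfspace (H-) representation with a constant number of linear programs per projected facet, which this sketch does not attempt to reproduce.

```python
# A naive illustration of projecting a polytope onto a 2-D subspace: project
# the vertices and take their convex hull. This brute-force V-representation
# approach is what ESP avoids by working directly on the facets of an
# H-representation.
import numpy as np
from scipy.spatial import ConvexHull

def project_polytope(vertices, basis):
    """vertices: (n, d) array; basis: (d, 2) orthonormal projection matrix."""
    projected = np.unique(vertices @ basis, axis=0)  # shadow of each vertex
    hull = ConvexHull(projected)                     # 2-D hull of the shadow
    return projected[hull.vertices]                  # projected polytope

# Example: project the 5-dimensional hypercube onto its first two coordinates
d = 5
cube = np.array(np.meshgrid(*[[0, 1]] * d)).reshape(d, -1).T  # 2^5 vertices
basis = np.eye(d)[:, :2]
print(project_polytope(cube, basis))                 # unit square corners
```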
Abstract:
Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, e.g., financial and business data. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user's perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon's Mapping, the Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem concerns evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques: Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon's Mapping and the SOM. The third problem is the evaluation of the quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying them. The thesis provides a systematic approach to the evaluation of various visualization techniques: first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities and their inputs and outputs; second, we integrated the evaluation studies into the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems select appropriate visualization techniques in specific situations. They also contribute to the understanding of the strengths and limitations of the evaluated visualization techniques and thereby to their further improvement.
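As a hedged illustration of the first evaluation idea, the sketch below applies a standard clustering validity index (the silhouette score) in both the original and the projected space, together with a Sammon-style stress on pairwise distances; the data set, the choice of PCA as the projection, and these particular indices are assumptions for the example, not the thesis's exact protocol.

```python
# A hedged sketch of evaluating how well a 2-D projection preserves cluster
# structure, in the spirit of using clustering validity measures: compare the
# silhouette score in the original and projected spaces, plus a Sammon-like
# stress on pairwise distances.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from scipy.spatial.distance import pdist

X, y = load_iris(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)      # projection under evaluation

# Validity measure in both spaces: a large drop signals lost cluster structure
print("silhouette original :", silhouette_score(X, y))
print("silhouette projected:", silhouette_score(X2, y))

# Sammon-like stress: relative distortion of the pairwise distances
d_hi, d_lo = pdist(X), pdist(X2)
stress = np.sum((d_hi - d_lo) ** 2 / d_hi) / np.sum(d_hi)
print("stress:", stress)
```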
Abstract:
The ongoing global financial crisis has demonstrated the importance of a systemwide, or macroprudential, approach to safeguarding financial stability. An essential part of macroprudential oversight concerns the tasks of early identification and assessment of risks and vulnerabilities that eventually may lead to a systemic financial crisis. Effective tools are crucial as they allow early policy actions to decrease or prevent further build-up of risks or to otherwise enhance the shock absorption capacity of the financial system. In the literature, three types of systemic risk can be identified: i) build-up of widespread imbalances, ii) exogenous aggregate shocks, and iii) contagion. Accordingly, the systemic risks are matched by three categories of analytical methods for decision support: i) early-warning, ii) macro stress-testing, and iii) contagion models. Stimulated by the prolonged global financial crisis, today's toolbox of analytical methods includes a wide range of innovative solutions to the two tasks of risk identification and risk assessment. Yet, the literature lacks a focus on the task of risk communication. This thesis discusses macroprudential oversight from the viewpoint of all three tasks: within analytical tools for risk identification and risk assessment, the focus is on a tight integration of means for risk communication. Data and dimension reduction methods, and their combinations, hold promise for representing multivariate data structures in easily understandable formats. The overall task of this thesis is to represent high-dimensional data concerning financial entities on low-dimensional displays. The low-dimensional representations have two subtasks: i) to function as a display for individual data concerning entities and their time series, and ii) to serve as a basis to which additional information can be linked. The final nuance of the task is, however, set by the needs of the domain, data and methods. The following five questions comprise subsequent steps addressed in the process of this thesis: 1. What are the needs for macroprudential oversight? 2. What form do macroprudential data take? 3. Which data and dimension reduction methods hold most promise for the task? 4. How should the methods be extended and enhanced for the task? 5. How should the methods and their extensions be applied to the task? Based upon the Self-Organizing Map (SOM), this thesis not only creates the Self-Organizing Financial Stability Map (SOFSM), but also lays out a general framework for mapping the state of financial stability. The thesis also introduces three extensions to the standard SOM for enhancing the visualization and extraction of information: i) fuzzifications, ii) transition probabilities, and iii) network analysis. Thus, the SOFSM functions as a display for risk identification, on top of which risk assessments can be illustrated. In addition, this thesis puts forward the Self-Organizing Time Map (SOTM) to provide means for visual dynamic clustering, which in the context of macroprudential oversight concerns the identification of cross-sectional changes in risks and vulnerabilities over time. Rather than automating analysis, the aim of these visual means for identifying and assessing risks is to support disciplined and structured judgmental analysis based upon policymakers' experience and domain intelligence, as well as external risk communication.
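To make the underlying method concrete, the following is a minimal sketch of the online SOM update rule on which the SOFSM builds: each sample pulls its best-matching unit and that unit's lattice neighbours toward itself. The SOFSM's financial-stability data and its fuzzification, transition-probability and network extensions are not reproduced; the grid size and decay schedules are illustrative assumptions.

```python
# A minimal online SOM sketch showing the update rule underlying the SOFSM:
# each sample pulls its best-matching unit (BMU) and the BMU's map
# neighbours toward itself.
import numpy as np

def train_som(X, rows=6, cols=6, n_epochs=20, lr0=0.5, sigma0=2.0):
    rng = np.random.default_rng(0)
    d = X.shape[1]
    W = rng.normal(size=(rows * cols, d))               # codebook vectors
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
    t, t_max = 0, n_epochs * len(X)
    for _ in range(n_epochs):
        for x in rng.permutation(X):
            lr = lr0 * (1 - t / t_max)                  # decaying rate
            sigma = sigma0 * (1 - t / t_max) + 0.5      # decaying radius
            bmu = np.argmin(((W - x) ** 2).sum(axis=1)) # best-matching unit
            # Gaussian neighbourhood on the 2-D map lattice
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)              # pull toward sample
            t += 1
    return W.reshape(rows, cols, d)

# Example: map 200 samples of 10-dimensional data onto a 6x6 grid
W = train_som(np.random.randn(200, 10))
```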
Abstract:
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data rapidly grows, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially that obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Second, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on eight publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as Bayesian principal component analysis (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as gene microarray studies. Such networks are typically very large and highly connected, so fast algorithms for producing visually pleasing layouts are needed. A computationally efficient way to produce layouts of large biological interaction networks was developed; the algorithm uses multilevel optimization within the regular force-directed graph layout algorithm.
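As a hedged sketch of the simple baseline mentioned above, the snippet below applies plain k-NN missing value imputation using scikit-learn's KNNImputer; the thesis's knowledge-guided imputation and BPCA are not reproduced, and the toy matrix is invented for illustration.

```python
# A hedged sketch of plain k-NN missing value imputation, the simple baseline
# the thesis found sufficient on most data sets; the knowledge-guided variant
# and BPCA are not reproduced here.
import numpy as np
from sklearn.impute import KNNImputer

# Toy expression matrix (genes x arrays) with missing entries as NaN
X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.1],
              [0.9, np.nan, 2.9],
              [5.0, 6.0, 7.0]])

# Each missing entry is replaced by the mean of that feature over the
# k nearest rows (here k=2), measured on the jointly observed features.
imputer = KNNImputer(n_neighbors=2)
X_complete = imputer.fit_transform(X)
print(X_complete)
```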
Abstract:
This thesis is concerned with state and parameter estimation in state space models. The estimation of states and parameters is an important task when mathematical modeling is applied to many different application areas such as global positioning systems, target tracking, navigation, brain imaging, spread of infectious diseases, biological processes, telecommunications, audio signal processing, stochastic optimal control, machine learning, and physical systems. In Bayesian settings, the estimation of states or parameters amounts to computation of the posterior probability density function. Except for a very restricted number of models, it is impossible to compute this density function in closed form, and hence approximation methods are needed. A state estimation problem involves estimating the states (latent variables) that are not directly observed in the output of the system. In this thesis, we use the Kalman filter, extended Kalman filter, Gauss–Hermite filters, and particle filters to estimate the states based on available measurements. Among these filters, particle filters are numerical methods for approximating the filtering distributions of non-linear, non-Gaussian state space models via Monte Carlo. The performance of a particle filter depends heavily on the chosen importance distribution; an inappropriate choice can lead to failure of convergence of the particle filter algorithm. In this thesis, we analyze the theoretical Lᵖ particle filter convergence with general importance distributions, where p ≥ 2 is an integer. A parameter estimation problem is concerned with inferring the model parameters from measurements. For high-dimensional complex models, parameter estimation can be done by Markov chain Monte Carlo (MCMC) methods. In its operation, an MCMC method requires the unnormalized posterior distribution of the parameters and a proposal distribution. In this thesis, we show how the posterior density function of the parameters of a state space model can be computed by filtering-based methods, where the states are integrated out. This type of computation is then applied to estimate the parameters of stochastic differential equations. Furthermore, we compute the partial derivatives of the log-posterior density function and use the hybrid Monte Carlo and scaled conjugate gradient methods to infer the parameters of stochastic differential equations. The computational efficiency of MCMC methods depends heavily on the chosen proposal distribution. A commonly used proposal distribution is Gaussian, in which case the covariance matrix must be well tuned; to tune it, adaptive MCMC methods can be used. In this thesis, we propose a new way of updating the covariance matrix using the variational Bayesian adaptive Kalman filter algorithm.
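For concreteness, here is a minimal sketch of the linear Kalman filter, the simplest of the state estimators listed above; the extended, Gauss–Hermite and particle filters generalize this recursion to non-linear and non-Gaussian models. The model matrices and data below are toy assumptions.

```python
# A minimal linear Kalman filter sketch for the model
#   x_k = A x_{k-1} + noise(Q),  y_k = H x_k + noise(R).
import numpy as np

def kalman_filter(ys, A, H, Q, R, m0, P0):
    """Filter a sequence of measurements ys; returns the filtered means."""
    m, P = m0, P0
    means = []
    for y in ys:
        # Predict: propagate the mean and covariance through the dynamics
        m, P = A @ m, A @ P @ A.T + Q
        # Update: correct with the measurement via the Kalman gain
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        m = m + K @ (y - H @ m)
        P = P - K @ S @ K.T
        means.append(m)
    return np.array(means)

# Example: track a noisy 1-D random walk
A = H = I = np.eye(1)
ys = np.cumsum(np.random.randn(50, 1), axis=0) + np.random.randn(50, 1)
est = kalman_filter(ys, A, H, 0.1 * I, I, np.zeros(1), np.eye(1))
```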
Abstract:
The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models, and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
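As a hedged illustration of the kind of Bayesian score involved, the sketch below computes a Bayesian–Dirichlet log marginal likelihood for one variable: a product over parent configurations that, in the context-specific case, may be merged into shared contexts. The thesis's exact score functions and search algorithms are not reproduced; the counts and hyperparameter are invented for the example.

```python
# A hedged sketch of the Bayesian-Dirichlet marginal likelihood underlying
# structure scores of this kind: the local score of one variable is a product
# over parent (or merged-context) configurations.
import numpy as np
from scipy.special import gammaln

def local_bd_score(counts, alpha=1.0):
    """counts: (n_configs, n_states) observed counts of the child variable
    under each parent/context configuration; alpha: Dirichlet hyperparameter
    per cell. Returns the log marginal likelihood."""
    counts = np.asarray(counts, float)
    a = np.full_like(counts, alpha)
    per_config = (gammaln(a.sum(1)) - gammaln(a.sum(1) + counts.sum(1))
                  + (gammaln(a + counts) - gammaln(a)).sum(1))
    return per_config.sum()

# Merging two parent configurations into one context (row-summing the counts)
# changes the score; a structure search keeps the merge when it improves.
counts = np.array([[10, 2], [9, 3], [1, 8]])
merged = np.vstack([counts[:2].sum(0), counts[2]])
print(local_bd_score(counts), local_bd_score(merged))
```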
Abstract:
Turbomachines, and steam turbines in particular, are usually designed and optimized to operate at a specific design point where losses are minimized and efficiency is maximized. In some cases, however, the turbine must be operated outside this design point; the mass flow through the turbine then changes, which usually reduces efficiency. The performance of turbomachines can be improved by using three-dimensionally shaped blades. In this work, two moderately shaped nozzles (compound lean and controlled flow) were compared computationally outside their design point. A third nozzle, without three-dimensional shaping, was included as a reference. The performance of the nozzles was studied by means of computational fluid dynamics under off-design conditions. Changes in the flow were examined in terms of total pressure loss, isentropic efficiency and the uniformity of the flow field. The flow fields were compared using the distributions of the exit flow angle, mass flow and secondary flow vectors. The differences in nozzle performance are emphasized at overload: at the largest increase in mass flow, the efficiency of the compound lean nozzle decreases less than that of the controlled flow nozzle. At partial load, when the mass flow is reduced, the differences in nozzle performance diminish and the exit flows of the studied nozzles are similar.
Abstract:
A method for the analysis of high-speed solid-rotor induction motors is presented. The analysis is based on a new combination of the three-dimensional linear method and the transfer matrix method. Both saturation and finite-length effects are taken into account. The active region of the solid rotor is divided into saturated and unsaturated parts. The time dependence is assumed to be sinusoidal, and phasor quantities are used in the solution. The method is applied to the calculation of smooth solid rotors manufactured of different materials. Six rotor materials are tested: three construction steels, pure iron, a cobalt-iron alloy and an aluminium alloy. The results obtained by the method agree fairly well with the measured quantities.
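As a generic, hedged illustration of the transfer matrix idea (not the thesis's specific electromagnetic formulation), the sketch below chains per-layer 2x2 matrices relating phasor quantities at successive layer boundaries; the layer matrices are invented placeholders.

```python
# A hedged illustration of the generic transfer matrix idea: quantities at
# the two boundaries of a layered region are related by the ordered product
# of per-layer 2x2 matrices. The thesis's actual formulation for the solid
# rotor (saturated/unsaturated layers) is not reproduced here.
import numpy as np

def chain_transfer(layer_matrices):
    """Multiply per-layer 2x2 transfer matrices in order."""
    total = np.eye(2, dtype=complex)
    for M in layer_matrices:
        total = M @ total        # later layers act on the accumulated state
    return total

# Example: three hypothetical layers with phasor (complex) entries
layers = [np.array([[1.0, 0.1j], [0.2j, 1.0]]),
          np.array([[0.9, 0.05j], [0.1j, 1.1]]),
          np.array([[1.0, 0.2j], [0.3j, 0.8]])]
print(chain_transfer(layers))
```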
Abstract:
The results and discussions in this thesis are based on my studies of self-assembled thiol layers on gold, platinum, silver and copper surfaces. These layers are two-dimensional, one molecule thick and covalently organized at the surface, and they are an easy way to modify surface properties. Self-assembly is today an intensive research field because of the promise it holds for producing new technology at the nanoscale, the scale of atoms and molecules. Such films have applications, for example, in the fields of physics, biology, engineering, chemistry and computer science. Compared to the extensive literature concerning self-assembled monolayers (SAMs) on gold, little is known about the structure and properties of thiol-based SAMs on other metals. In this thesis I have focused on thiol layers on gold, platinum, silver and copper substrates. These studies can be regarded as basic research on SAMs. Nevertheless, an understanding of the physical and chemical nature of SAMs allows the correlation between atomic structure and macroscopic properties, and the results can be used as a starting point for many practical applications. X-ray photoelectron spectroscopy (XPS) and synchrotron radiation excited high resolution photoelectron spectroscopy (HR-XPS), together with time-of-flight secondary ion mass spectrometry (ToF-SIMS), were applied to investigate thin organic films formed by the spontaneous adsorption of molecules on metal surfaces. Photoelectron spectroscopy was the main method used in these studies. In photoelectron spectroscopy, the sample is irradiated with photons and the emitted photoelectrons are energy-analyzed. The obtained spectra give information about the atomic composition of the surface and about the chemical state of the detected elements; the technique is widely used in the study of thin layers and is a very powerful tool for this purpose. Some XPS results were complemented with ToF-SIMS measurements, which provide information on the chemical composition and molecular structure of the samples. A thiol (1-dodecanethiol, CH3(CH2)11SH) solution was used to create SAMs on the metal substrates. Uniform layers were formed on most of the studied metal surfaces; on platinum, surface-aligned molecules were also detected in the XPS and ToF-SIMS investigations. The influence of radiation on the layer structure was studied, leading to the conclusion that parts of the hydrocarbon chains break off due to radiation and the rest of the layer is deformed; the results showed differences depending on the substrate material. The influence of oxygen on layer formation was also studied, and thiol molecules were found to replace some of the oxygen on the metal surfaces.
Abstract:
The purpose of this thesis was to investigate the compression of filter cakes at high filtration pressures with five different test materials and to compare the energy consumption of high-pressure compression with that of thermal drying. The secondary target of this study was to investigate the particle deformation of the test materials during filtration and compression. The literature part consists of the basic theory of filtration and compression and of the basic parameters that influence the filtration process. There is also a brief description of all the test materials, including their properties and their industrial production and processing. Theoretical equations for calculating the energy consumption of filtration at different conditions are also presented. At the beginning of the experimental part, basic filtration tests were done with all five test materials. Filtration tests were made at eight different pressures, from 6 bar up to 100 bar, using a piston press pressure filter. The filtration tests were then repeated using a cylinder with a smaller slurry volume than in the first series. Separate filtration tests were also done to investigate the deformation of solid particles during filtration and to find the optimal curve for raising the filtration pressure. The energy consumption differences between high-pressure filtration and an ideal thermal drying process were determined partly experimentally and partly with the theoretical equations. By comparing these two water removal methods, the optimal ranges for their use were found with respect to energy efficiency. The results of the measurements show that the filtration rate increased and the moisture content of the filter cakes decreased as the filtration pressure was increased. The porosity of the filter cakes also mainly decreased when the filtration pressure was increased. Particle deformation during filtration was observed only with coal particles.
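As a hedged back-of-envelope version of the kind of energy comparison described, the sketch below contrasts the ideal mechanical work of pressing out a volume of filtrate (pressure times volume) with the latent heat required to evaporate the same water thermally; all numbers are illustrative assumptions, not the thesis's measured values.

```python
# A hedged back-of-envelope sketch of the type of energy comparison made in
# the thesis: ideal mechanical energy of pressing out filtrate (p * V)
# against the latent heat needed to evaporate the same water thermally
# (m * h_vap). All numbers below are illustrative assumptions.
P_FILTRATION = 100e5        # filtration pressure, Pa (100 bar)
V_FILTRATE = 1e-3           # removed water volume, m^3 (1 litre)
RHO_WATER = 1000.0          # water density, kg/m^3
H_VAP = 2.26e6              # latent heat of vaporization, J/kg

e_press = P_FILTRATION * V_FILTRATE        # ideal compression work, J
e_dry = RHO_WATER * V_FILTRATE * H_VAP     # ideal evaporation energy, J

print(f"pressing : {e_press / 1e3:8.1f} kJ")   # ~10 kJ
print(f"drying   : {e_dry / 1e3:8.1f} kJ")     # ~2260 kJ
```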
Abstract:
Cells of epithelial origin, e.g. from breast and prostate cancers, effectively differentiate into complex multicellular structures when cultured in three dimensions (3D) instead of on conventional two-dimensional (2D) adherent surfaces. The spectrum of different organotypic morphologies is highly dependent on the culture environment, which can be either non-adherent or scaffold-based. When embedded in physiological extracellular matrices (ECMs), such as laminin-rich basement membrane extracts, normal epithelial cells differentiate into acinar spheroids reminiscent of glandular ductal structures. Transformed cancer cells, in contrast, typically fail to undergo acinar morphogenic patterns, forming poorly differentiated or invasive multicellular structures. The 3D cancer spheroids are widely accepted to better recapitulate various tumorigenic processes and drug responses. So far, however, 3D models have been employed predominantly in academia, whereas the pharmaceutical industry has yet to adopt them for wider and more routine use. This is mainly due to poor characterisation of cell models, the lack of standardised workflows and high-throughput cell culture platforms, and the limited availability of proper readout and quantification tools. In this thesis, a complete workflow has been established, entailing well-characterised 3D cell culture models for prostate cancer, a standardised 3D cell culture routine based on a high-throughput-ready platform, automated image acquisition with concomitant morphometric image analysis, and data visualisation, in order to enable large-scale high-content screens. Our integrated suite of software and statistical analysis tools was optimised and validated using a comprehensive panel of prostate cancer cell lines and 3D models. The tools quantify multiple key cancer-relevant morphological features, ranging from cancer cell invasion through multicellular differentiation to growth, and detect dynamic changes both in morphology and function, such as cell death and apoptosis, in response to experimental perturbations including RNA interference and small molecule inhibitors. Our panel of cell lines included many non-transformed and most currently available classic prostate cancer cell lines, which were characterised for their morphogenetic properties in 3D laminin-rich ECM. The phenotypes and gene expression profiles were evaluated concerning their relevance for pre-clinical drug discovery, disease modelling and basic research. In addition, a spontaneous model for invasive transformation was discovered, displaying a high degree of epithelial plasticity. This plasticity is mediated by an abundant bioactive serum lipid, lysophosphatidic acid (LPA), and its receptor LPAR1. The invasive transformation was caused by abrupt cytoskeletal rearrangement through impaired G protein alpha 12/13 and RhoA/ROCK, and mediated by upregulated adenylyl cyclase/cyclic AMP (cAMP)/protein kinase A and Rac/PAK pathways. The spontaneous invasion model tangibly exemplifies the biological relevance of organotypic cell culture models. Overall, this thesis work underlines the power of novel morphometric screening tools in drug discovery.
Abstract:
Integrins are heterodimeric, signaling transmembrane adhesion receptors that connect the intracellular actin microfilaments to the extracellular matrix composed of collagens and other matrix molecules. Bidirectional signaling is mediated via drastic conformational changes in integrins. These changes also occur in the integrin αI domains, which are responsible for ligand binding by the collagen receptor and leukocyte-specific integrins. Like intact integrins, soluble αI domains exist in a closed, low-affinity form and in an open, high-affinity form, so isolated αI domains can be used to study the factors and mechanisms involved in integrin activation and deactivation. Integrins are found in all mammalian tissues and cells, where they play crucial roles in growth, migration, defense mechanisms and apoptosis. Integrins are involved in many human diseases, such as inflammatory, cardiovascular and metastatic diseases, and considerable effort has therefore been invested in developing integrin-specific drugs. Humans have 24 different integrins, of which four are collagen receptors (α1β1, α2β1, α10β1, α11β1) and five are leukocyte-specific integrins (αLβ2, αMβ2, αXβ2, αDβ2, αEβ7). These two integrin groups are quite unselective, having both primary and secondary ligands. This work presents the first systematic studies performed on these integrin groups to find out how integrin activation affects ligand binding and selectivity. Such studies are important not only for understanding the partially overlapping functions of integrins, but also for drug development. In general, our results indicated that selectivity in ligand recognition is greatly reduced upon integrin activation. Interestingly, in some cases the ligand binding properties of integrins have been shown to be cell type specific. The reason for this is not known, but our observations suggest that cell types with a higher integrin activation state have lower ligand selectivity, and vice versa. Furthermore, we solved the three-dimensional structure of the activated form of the collagen receptor α1I domain. This structure revealed a novel intermediate conformation not previously seen with any other integrin αI domain, and it is the first 3D structure of an activated collagen receptor αI domain without ligand. Based on the differences between the open and closed conformations of the αI domain, we defined structural criteria for the search for effective collagen receptor drugs. By docking a large number of molecules into the closed conformation of the α2I domain, we discovered two polyketides that best fulfilled the structural criteria, and by cell adhesion studies we showed them to be specific inhibitors of the collagen receptor integrins.