850 results for High-dimensional data visualization
Abstract:
Experimental characterization of high-dimensional dynamic systems sometimes uses the proper orthogonal decomposition (POD). If there are many measurement locations and relatively few sensors, steady-state behavior can still be studied by sequentially taking several sets of simultaneous measurements. The number of such measurement sets required can be minimized by solving a combinatorial optimization problem. We aim to bring this problem to the attention of engineering audiences, summarize some known mathematical results about it, and present a heuristic (suboptimal) calculation that gives reasonable, if not stellar, results.
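For readers unfamiliar with POD itself, the decomposition is essentially a singular value decomposition of a mean-subtracted snapshot matrix. The short Python sketch below illustrates only that step, not the sensor-scheduling optimization discussed in the abstract; the snapshot matrix X and mode count r are hypothetical.

```python
import numpy as np

def pod_modes(X, r):
    """Proper orthogonal decomposition of a snapshot matrix.

    X : (n_locations, n_snapshots) array of mean-subtracted measurements.
    r : number of dominant modes to retain.
    Returns the r spatial modes and their energies (squared singular values).
    """
    # Thin SVD: columns of U are the POD (spatial) modes,
    # ordered by decreasing captured energy.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], s[:r] ** 2

# Hypothetical usage: 500 measurement locations, 200 snapshots.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 200))
X -= X.mean(axis=1, keepdims=True)    # remove the temporal mean
modes, energy = pod_modes(X, r=5)
print(modes.shape, energy)
```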
Abstract:
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used sequence alignment programs with which similarity searches, for both protein and nucleic acid sequences, can be performed against large databases at high speed. A large number of tools exist for processing BLAST output, but none of them provide three-dimensional structure visualization. This shortcoming has been addressed in the proposed tool BLAST Server for Structural Biologists (BSSB), which maps a BLAST output onto the three-dimensional structure of the subject protein. The three-dimensional structure of the subject protein is represented using a three-color coding scheme (identical: red; similar: yellow; and mismatch: white) based on the pairwise alignment obtained. Thus, the user will be able to visualize a possible three-dimensional structure for the query protein sequence. This information can be used to gain a deeper insight into the sequence-structure correlation. Furthermore, the additional structure-level information enables the user to make coherent and logical decisions regarding the type of input model structure or fragment that can be used for molecular replacement calculations. This tool is freely available to all users at http://bioserver1.physics.iisc.ernet.in/bssb/.
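As a concrete illustration of the three-color coding described above, the following Python sketch assigns a color to each aligned subject residue. The similarity groups and the example alignment strings are hypothetical; BSSB's actual similarity rules may differ.

```python
# Minimal sketch of the three-color scheme (identical: red, similar: yellow,
# mismatch: white) applied along a pairwise alignment.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]

def residue_colors(query_aln, subject_aln):
    """Return one color per aligned subject residue (subject gaps are skipped)."""
    colors = []
    for q, s in zip(query_aln, subject_aln):
        if s == "-":
            continue                          # no subject residue to color
        if q == s:
            colors.append("red")              # identical
        elif any(q in g and s in g for g in SIMILAR_GROUPS):
            colors.append("yellow")           # similar
        else:
            colors.append("white")            # mismatch (or a gap in the query)
    return colors

print(residue_colors("MKV-LDE", "MRVALNE"))
```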
Abstract:
The Morse-Smale complex is a useful topological data structure for the analysis and visualization of scalar data. This paper describes an algorithm that processes all mesh elements of the domain in parallel to compute the Morse-Smale complex of large two-dimensional data sets at interactive speeds. We employ a reformulation of the Morse-Smale complex using Forman's Discrete Morse Theory and achieve scalability by computing the discrete gradient using local accesses only. We also introduce a novel approach to merge gradient paths that ensures accurate geometry of the computed complex. We demonstrate that our algorithm performs well on both multicore environments and on massively parallel architectures such as the GPU.
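As a much-simplified illustration of where critical cells come from in such an analysis, the sketch below merely flags local minima and maxima of a 2D scalar grid by neighbor comparison; it is not the Discrete Morse Theory algorithm of the paper, and the test field is synthetic.

```python
import numpy as np

def local_extrema(f):
    """Flag interior local minima and maxima of a 2D scalar field by comparing
    each sample with its 8 neighbours. A crude stand-in for critical-point
    detection, not a discrete-gradient computation."""
    minima, maxima = [], []
    for i in range(1, f.shape[0] - 1):
        for j in range(1, f.shape[1] - 1):
            nb = f[i - 1:i + 2, j - 1:j + 2].ravel()
            centre = f[i, j]
            others = np.delete(nb, 4)          # drop the centre value
            if centre < others.min():
                minima.append((i, j))
            elif centre > others.max():
                maxima.append((i, j))
    return minima, maxima

field = np.fromfunction(lambda i, j: np.sin(i / 5.0) * np.cos(j / 5.0), (40, 40))
print(local_extrema(field))
```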
Abstract:
Structural Support Vector Machines (SSVMs) have recently gained wide prominence for classifying structured and complex objects such as parse trees, image segments and Part-of-Speech (POS) tags. Typical learning algorithms used in training SSVMs produce model parameter vectors residing in a high-dimensional feature space. Such a high-dimensional parameter vector contains many non-zero components, which often leads to slow prediction and storage issues. Hence there is a need for sparse parameter vectors containing very few non-zero components. The L1 regularizer and the elastic net regularizer have traditionally been used to obtain sparse model parameters. Though L1-regularized structural SVMs have been studied in the past, the use of the elastic net regularizer for structural SVMs has not yet been explored. In this work, we formulate the elastic net SSVM and propose a sequential alternating proximal algorithm to solve the dual formulation. We compare the proposed method with existing methods for L1-regularized structural SVMs. Experiments on large-scale benchmark datasets show that the proposed dual elastic net SSVM trained using the sequential alternating proximal algorithm scales well and yields highly sparse model parameters while achieving comparable generalization performance. The proposed algorithm is therefore a competitive way to obtain sparse model parameters with elastic-net-regularized structural SVMs on very large datasets.
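For reference, a generic (not paper-specific) form of an elastic-net-regularized structural SVM objective with margin rescaling, where the penalty weights lambda_1 and lambda_2 are placeholders, is:

```latex
\min_{\mathbf{w}} \;
\lambda_1 \|\mathbf{w}\|_1
+ \frac{\lambda_2}{2} \|\mathbf{w}\|_2^2
+ \frac{1}{N} \sum_{i=1}^{N}
\max_{\mathbf{y} \in \mathcal{Y}}
\Big[ \Delta(\mathbf{y}_i, \mathbf{y})
+ \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y})
- \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}_i, \mathbf{y}_i) \Big]
```

Setting lambda_1 = 0 recovers the usual L2-regularized SSVM, while lambda_2 = 0 gives the L1-regularized variant mentioned above.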
Abstract:
In this paper, we present a methodology for identifying the best features from a large feature space. In a high-dimensional feature space, nearest neighbor search becomes less meaningful and suffers from both quality and performance problems, yet many data mining algorithms rely on it. Instead of performing nearest neighbor search over all the features, we therefore need to select the relevant ones. We propose feature selection using Non-negative Matrix Factorization (NMF) and apply it to nearest neighbor search. A recent clustering algorithm based on Locally Consistent Concept Factorization (LCCF) achieves better document clustering quality by exploiting the local geometrical and discriminating structure of the data. Using our feature selection method, we show a further improvement in clustering performance.
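A minimal sketch of the general idea, NMF-based feature selection followed by nearest-neighbor search, is given below. The scoring rule (rank features by their largest loading across NMF components) and the random data are illustrative assumptions, not necessarily the paper's exact criterion.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((100, 500))           # 100 documents, 500 non-negative features

nmf = NMF(n_components=10, init="nndsvda", max_iter=400, random_state=0)
W = nmf.fit_transform(X)             # document-by-component weights
H = nmf.components_                  # component-by-feature loadings, shape (10, 500)

scores = H.max(axis=0)               # importance score for each feature
top = np.argsort(scores)[::-1][:50]  # keep the 50 highest-scoring features

# Nearest-neighbor search restricted to the selected features.
nn = NearestNeighbors(n_neighbors=5).fit(X[:, top])
distances, neighbours = nn.kneighbors(X[:1, top])
print(neighbours)
```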
Abstract:
The design and development of a Bottom Pressure Recorder for a Tsunami Early Warning System is described here. The special requirements it must satisfy for the specific application, deployment on the ocean bed and pressure monitoring of the water column above, are dealt with; high-resolution data digitization and low circuit power consumption are typical examples. The implementation details of the data sensing and acquisition subsystem that meet these requirements are also brought out. The data processing part typically encompasses a tsunami detection algorithm that must detect an event of significance against a background of periodic and aperiodic noise signals. Such an algorithm and its simulation are presented. Further, the results of sea trials carried out on the system off the Chennai coast are presented. The high quality and fidelity of the data show that the system design is robust despite its low cost and, with suitable augmentations, is ready for full-fledged deployment on the ocean bed. (C) 2013 Elsevier Ltd. All rights reserved.
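One common style of such a detection algorithm predicts the tidal background from the recent record and flags large residuals; the rough Python sketch below follows that pattern. The window length, polynomial order and 3 cm threshold are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def detect_events(pressure_m, window=240, order=2, threshold_m=0.03):
    """pressure_m: 1D array of water-column height samples (metres)."""
    alarms = []
    for t in range(window, len(pressure_m)):
        past = pressure_m[t - window:t]
        # Fit a low-order polynomial to the recent record as a tide prediction...
        coeffs = np.polyfit(np.arange(window), past, order)
        predicted = np.polyval(coeffs, window)       # ...and extrapolate one step ahead.
        if abs(pressure_m[t] - predicted) > threshold_m:
            alarms.append(t)
    return alarms

t = np.arange(3000)
tide = 0.5 * np.sin(2 * np.pi * t / 2880.0)          # synthetic semidiurnal tide (15 s samples)
signal = tide.copy()
signal[2000:2060] += 0.08                            # injected 8 cm step "event"
print(detect_events(signal)[:5])                     # alarms should start near t = 2000
```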
Abstract:
We propose a simulation-based algorithm for computing the optimal pricing policy for a product under uncertain demand dynamics. We consider a parameterized stochastic differential equation (SDE) model for the uncertain demand dynamics of the product over the planning horizon; in particular, a dynamic model that extends the Bass model. The performance of our algorithm is compared to that of a myopic pricing policy and is shown to give better results. Two significant advantages of our algorithm are as follows: (a) it does not require information on the system model parameters, provided the SDE system state is available either from a simulation device or from real data, and (b) because it uses the efficient smoothed functional gradient estimator, it works well even for high-dimensional parameters.
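A generic two-sided smoothed functional gradient estimator, of the kind referred to above, can be sketched as follows; the quadratic stand-in cost and the step sizes are hypothetical, not the Bass-model SDE of the paper.

```python
import numpy as np

def sf_gradient(cost, theta, beta=0.1, n_samples=50, rng=None):
    """Two-sided smoothed functional gradient estimate: perturb the parameter
    with Gaussian noise and average the resulting finite differences."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eta = rng.standard_normal(theta.shape)
        grad += eta * (cost(theta + beta * eta) - cost(theta - beta * eta)) / (2 * beta)
    return grad / n_samples

# Hypothetical smooth cost standing in for simulated negative revenue.
cost = lambda p: np.sum((p - 3.0) ** 2)

theta = np.zeros(5)                     # 5-dimensional pricing parameter
for k in range(200):                    # plain gradient descent with SF estimates
    theta -= 0.05 * sf_gradient(cost, theta, rng=np.random.default_rng(k))
print(np.round(theta, 2))               # should approach [3, 3, 3, 3, 3]
```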
Abstract:
We consider the problem of optimizing the workforce of a service system. Adapting the staffing levels in such systems is non-trivial because of large variations in workload, and the large number of system parameters rules out a brute-force search. Further, because these parameters change on a weekly basis, the optimization should not take longer than a few hours. Our aim is to find the optimum staffing levels from a discrete high-dimensional parameter set that minimize the long-run average of the single-stage cost function, while adhering to constraints on queue stability and service-level agreement (SLA) compliance. The single-stage cost function balances the conflicting objectives of utilizing workers efficiently and attaining the target SLAs. We formulate this problem as a constrained Markov cost process parameterized by the (discrete) staffing levels. We propose novel simultaneous perturbation stochastic approximation (SPSA)-based algorithms for solving this problem. The algorithms include both first-order and second-order methods and incorporate SPSA-based gradient/Hessian estimates for primal descent, while performing dual ascent for the Lagrange multipliers. Both algorithms are online and update the staffing levels incrementally. Further, they involve a certain generalized smooth projection operator, which is essential for projecting the continuous-valued worker parameter tuned by our algorithms onto the discrete set. The smoothness is necessary to ensure that the underlying transition dynamics of the constrained Markov cost process are themselves smooth (as a function of the continuous-valued parameter): a critical requirement for proving the convergence of both algorithms. We validate our algorithms via performance simulations based on data from five real-life service systems. For comparison, we also implement a scatter-search-based algorithm using the state-of-the-art optimization toolkit OptQuest. From the experiments, we observe that both our algorithms converge empirically and consistently outperform OptQuest in most of the settings considered. This finding, coupled with the computational advantage of our algorithms, makes them amenable for adaptive labor staffing in real-life service systems.
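For orientation, a bare-bones SPSA iteration on a toy staffing problem is sketched below: two simulations per step yield a full gradient estimate regardless of dimension. The cost function and the simple round-and-clip projection are illustrative; the paper uses a generalized smooth projection and Lagrangian terms for the SLA constraints.

```python
import numpy as np

def spsa_step(cost, theta, c=1.0, a=0.5, rng=None):
    """One SPSA update: a simultaneous +/-1 perturbation gives a gradient
    estimate from only two cost evaluations."""
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=theta.shape)       # Rademacher perturbation
    g_hat = (cost(theta + c * delta) - cost(theta - c * delta)) / (2 * c * delta)
    return theta - a * g_hat

def project(theta, lo=1, hi=50):
    return np.clip(np.rint(theta), lo, hi)                  # crude discrete projection

# Hypothetical single-stage cost: workers cost money, understaffing costs more.
demand = np.array([12, 16, 7, 18])
cost = lambda w: np.sum(w) + 5.0 * np.sum(np.maximum(demand - w, 0.0))

theta = np.full(4, 10.0)                                    # continuous-valued staffing
for k in range(300):
    theta = spsa_step(cost, theta, c=1.0, a=2.0 / (k + 10), rng=np.random.default_rng(k))
print(project(theta))                                       # should roughly track demand
```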
Abstract:
The NMR-based approach to metabolomics typically involves the collection of two-dimensional (2D) heteronuclear correlation spectra for identification and assignment of metabolites. In cases of spectral overlap, a 3D spectrum becomes necessary, but acquiring one with sufficient resolution is hampered by slow data acquisition. We describe here a method to simultaneously acquire three spectra (one 3D and two 2D) in a single data set, based on a combination of fast data acquisition techniques: G-matrix Fourier transform (GFT) NMR spectroscopy, parallel data acquisition and non-uniform sampling. The following spectra are acquired simultaneously: (1) C-13 multiplicity-edited GFT (3,2)D HSQC-TOCSY, (2) 2D [H-1, H-1] TOCSY and (3) 2D [C-13, H-1] HETCOR. The spectra are obtained at high resolution and provide high-dimensional spectral information for resolving ambiguities. While the GFT spectrum has previously been shown to provide good resolution, the editing of spin systems based on their CH multiplicities further resolves ambiguities in resonance assignments. The experiment is demonstrated on a mixture of 21 metabolites commonly observed in metabolomics. The spectra were acquired at natural abundance of C-13. This is the first application of a combination of three fast NMR methods to small molecules, and it opens up new avenues for high-throughput approaches to NMR-based metabolomics.
Abstract:
This article presents a new method for acquiring three-dimensional (3-D) volumes of ultrasonic axial strain data. The method uses a mechanically-swept probe to sweep out a single volume while applying a continuously varying axial compression. Acquisition of a volume takes 15-20 s. A strain volume is then calculated by comparing frame pairs throughout the sequence. The method uses strain quality estimates to automatically pick out high quality frame pairs, and so does not require careful control of the axial compression. In a series of in vitro and in vivo experiments, we quantify the image quality of the new method and also assess its ease of use. Results are compared with those for the current best alternative, which calculates strain between two complete volumes. The volume pair approach can produce high quality data, but skillful scanning is required to acquire two volumes with appropriate relative strain. In the new method, the automatic quality-weighted selection of image pairs overcomes this difficulty and the method produces superior quality images with a relatively relaxed scanning technique.
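A simplified sketch of quality-weighted frame-pair selection is given below: score every candidate pair of nearby frames and keep the best ones for strain estimation. Using a plain normalized cross-correlation as the quality measure, and random arrays as stand-in frames, are assumptions for illustration only.

```python
import numpy as np

def pair_quality(frame_a, frame_b):
    """Normalized cross-correlation between two frames (higher = better match)."""
    a = (frame_a - frame_a.mean()) / (frame_a.std() + 1e-12)
    b = (frame_b - frame_b.mean()) / (frame_b.std() + 1e-12)
    return float(np.mean(a * b))

def select_pairs(frames, max_lag=3, keep=10):
    """Score all frame pairs up to max_lag apart and keep the highest-quality ones."""
    scored = []
    for i in range(len(frames)):
        for j in range(i + 1, min(i + 1 + max_lag, len(frames))):
            scored.append((pair_quality(frames[i], frames[j]), i, j))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:keep]]

rng = np.random.default_rng(0)
frames = [rng.standard_normal((64, 64)) for _ in range(30)]   # stand-in frames
print(select_pairs(frames))
```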
Abstract:
Our recent progress in numerical studies of bluff-body flow structures and a new method for the numerical analysis of the near-wake flow field at high Reynolds number are introduced. The paper consists of three parts. In part one, the evolution of the wake vortex structure and the variation of forces on a flat plate in harmonic oscillatory flows and in in-line steady-harmonic combined flows are presented using an improved discrete vortex method, as the Keulegan-Carpenter number (KC) varies from 2 to 40 and the ratios of U_m to U_0 are of O(10^-1), O(10) and O(10), respectively. In part two, a domain decomposition hybrid method, combining finite-difference and vortex methods for the numerical simulation of unsteady viscous separated flow around a bluff body, is introduced. With the new method, high-resolution numerical visualizations of the near-wake evolution behind a circular cylinder at Re = 10^2, 10^3 and 3 x 10^3 are shown. In part three, the mechanism and the dynamic process of the three-dimensional evolution of the Karman vortex and of the vortex filaments in the braid regions, as well as the early features of the turbulent structure in the wake behind a circular cylinder, are presented numerically using the vortex dynamics method.
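To give a flavor of the discrete vortex idea underlying part one, the sketch below advects a cloud of 2D point vortices with the velocity each induces on the others via the Biot-Savart law; the vortex core radius and explicit Euler time stepping are simplifying assumptions, and the paper's improved method is considerably more involved.

```python
import numpy as np

def induced_velocity(pos, gamma, core=0.05):
    """pos: (N, 2) vortex positions; gamma: (N,) circulations.
    Returns the velocity induced at each vortex by all the others."""
    vel = np.zeros_like(pos)
    for i in range(len(pos)):
        r = pos[i] - pos                  # vectors from every vortex to vortex i
        r2 = np.sum(r * r, axis=1) + core**2
        r2[i] = np.inf                    # no self-induction
        # 2D Biot-Savart: u = Gamma/(2*pi) * (-dy, dx)/|r|^2
        vel[i, 0] = np.sum(-gamma * r[:, 1] / (2 * np.pi * r2))
        vel[i, 1] = np.sum(gamma * r[:, 0] / (2 * np.pi * r2))
    return vel

rng = np.random.default_rng(1)
pos = rng.standard_normal((50, 2)) * 0.3
gamma = rng.choice([-1.0, 1.0], size=50) * 0.1
for _ in range(100):                      # explicit Euler advection
    pos += 0.01 * induced_velocity(pos, gamma)
print(pos[:3])
```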
Abstract:
The Olympic Coast National Marine Sanctuary (OCNMS) continues to invest significant resources in seafloor mapping activities along Washington's outer coast (Intelmann and Cochrane 2006; Intelmann et al. 2006; Intelmann 2006). Results from these annual mapping efforts offer a snapshot of current ground conditions, help to guide research and management activities, and provide a baseline for assessing the impacts of various threats to important habitat. During August 2004 and May and July 2005, we used side scan sonar to image several regions of the sea floor in the northern OCNMS, and the data were mosaicked at 1-meter pixel resolution. Video from a towed camera sled, bathymetry data, sediment samples and side scan sonar mapping were integrated to describe geological and biological aspects of habitat. Polygon features were created and attributed with a hierarchical deep-water marine benthic classification scheme (Greene et al. 1999). For three small areas mapped with both side scan sonar and multibeam echosounder, we compared the output of the classified images and found little difference in results between the two methods. With these considerations, backscatter derived from multibeam bathymetry is currently a cost-efficient and safe method for seabed imaging in the shallow (<30 meters) rocky waters of OCNMS: the image quality is sufficient for classification purposes, the associated depths provide further descriptive value, and risks to gear are minimized. In shallow waters (<30 meters) that do not have a high incidence of dangerous rock pinnacles, a towed multi-beam side scan sonar could provide a better option for obtaining seafloor imagery because of its high acquisition speed and image quality; however, the high probability of losing or damaging such a costly system when deployed in a towed configuration in the extremely rugose nearshore zones within OCNMS makes it a financially risky proposition. Newer technologies such as interferometric multibeam systems and bathymetric side scan systems also hold great potential for mapping these nearshore rocky areas, as they allow high-speed data acquisition, produce side scan imagery precisely geo-referenced to bathymetry, and do not suffer the angular depth dependency associated with multibeam echosounders, allowing larger range scales to be used in shallower water. As such, further investigation of these systems is needed to assess their efficiency and utility in these environments compared with traditional side scan sonar and multibeam bathymetry. (PDF contains 43 pages.)
Abstract:
The learning of probability distributions from data is a ubiquitous problem in the fields of Statistics and Artificial Intelligence. During the last decades several learning algorithms have been proposed to learn probability distributions based on decomposable models, due to their advantageous theoretical properties. Some of these algorithms can be used to search for a maximum likelihood decomposable model with a given maximum clique size, k, which controls the complexity of the model. Unfortunately, the problem of learning a maximum likelihood decomposable model with a given maximum clique size is NP-hard for k > 2. In this work, we propose a family of algorithms which approximates this problem with a computational complexity of O(k · n^2 log n) in the worst case, where n is the number of random variables involved. The structures of the decomposable models that solve the maximum likelihood problem are called maximal k-order decomposable graphs. Our proposals, called fractal trees, construct a sequence of maximal i-order decomposable graphs, for i = 2, ..., k, in k − 1 steps. At each step, the algorithms follow a divide-and-conquer strategy based on the particular features of this type of structure. Additionally, we propose a prune-and-graft procedure which transforms a maximal k-order decomposable graph into another one with higher likelihood. We have implemented two particular fractal tree algorithms, called parallel fractal tree and sequential fractal tree. These algorithms can be considered a natural extension of Chow and Liu's algorithm from k = 2 to arbitrary values of k. Both algorithms have been compared against other efficient approaches in artificial and real domains, and they have shown competitive behavior in dealing with the maximum likelihood problem. Due to their low computational complexity they are especially recommended for high-dimensional domains.
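The k = 2 base case that the fractal-tree algorithms extend is Chow and Liu's algorithm: estimate pairwise mutual information from data and keep a maximum-weight spanning tree. The sketch below implements that base case on synthetic discrete data; it is not the fractal-tree construction itself.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1D arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])))

def chow_liu_edges(data):
    """data: (n_samples, n_vars) array of small non-negative integers."""
    n = data.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = mutual_information(data[:, i], data[:, j])
    tree = minimum_spanning_tree(-W)       # negate to get a maximum-MI spanning tree
    return sorted((int(i), int(j)) for i, j in zip(*tree.nonzero()))

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=1000)
noisy = np.where(rng.random(1000) < 0.1, 1 - z, z)   # noisy copy of z
other = rng.integers(0, 3, size=1000)                # independent variable
print(chow_liu_edges(np.column_stack([z, noisy, other])))   # strong edge (0, 1) plus one weak edge
```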
Abstract:
Optical Coherence Tomography (OCT) is a popular, rapidly growing imaging technique with an increasing number of biomedical applications due to its noninvasive nature. However, there are three major challenges in understanding and improving an OCT system. (1) Obtaining an OCT image is not easy: it either takes a real medical experiment or requires days of computer simulation. Without much data, it is difficult to study the physical processes underlying OCT imaging of different objects, simply because there aren't many imaged objects. (2) Interpretation of an OCT image is also hard. This challenge is more profound than it appears. For instance, it would require a trained expert to tell from an OCT image of human skin whether there is a lesion or not. This is expensive in its own right, but even the expert cannot be sure about the exact size of the lesion or the width of the various skin layers. The take-away message is that analyzing an OCT image even at a high level usually requires a trained expert, and pixel-level interpretation is simply unrealistic. The reason is simple: we have OCT images but not their underlying ground-truth structure, so there is nothing to learn from. (3) The imaging depth of OCT is very limited (a millimeter or less in human tissue). While OCT uses infrared light for illumination to stay noninvasive, the downside is that photons at such long wavelengths can only penetrate a limited depth into the tissue before being back-scattered. To image a particular region of a tissue, photons first need to reach that region. As a result, OCT signals from deeper regions of the tissue are both weak (since few photons reach there) and distorted (due to multiple scattering of the contributing photons). This fact alone makes OCT images very hard to interpret.
This thesis addresses the above challenges by successfully developing an advanced Monte Carlo simulation platform which is 10,000 times faster than the state-of-the-art simulator in the literature, bringing the simulation time down from 360 hours to a single minute. This powerful simulation tool not only enables us to efficiently generate as many OCT images of objects with arbitrary structure and shape as we want on a common desktop computer, but also provides the underlying ground truth of the simulated images at the same time, because we specify it at the beginning of the simulation. This is one of the key contributions of this thesis. What allows us to build such a powerful simulation tool includes a thorough understanding of the signal formation process, a clever implementation of the importance sampling/photon splitting procedure, efficient use of a voxel-based mesh system in determining photon-mesh interception, and parallel computation of the different A-scans that constitute a full OCT image, among other programming and mathematical tricks, which are explained in detail later in the thesis.
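A heavily simplified Monte Carlo photon-transport sketch, in the spirit of (but far cruder than) the simulator described above, is given below: photons take exponentially distributed steps in a homogeneous scattering slab and we record the maximum depth reached by photons that back-scatter out. Isotropic scattering, the chosen scattering coefficient, and the absence of importance sampling, photon splitting, voxel meshes and coherence gating are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_s, slab_depth = 10.0, 1.0          # scattering coefficient (1/mm), slab depth (mm)
exit_depths = []

for _ in range(20000):                # launch photons straight down into the slab
    z, uz, max_depth = 0.0, 1.0, 0.0
    for _ in range(1000):             # cap the number of scattering events
        z += uz * rng.exponential(1.0 / mu_s)
        max_depth = max(max_depth, z)
        if z < 0.0:                   # photon back-scattered out of the tissue
            exit_depths.append(max_depth)
            break
        if z > slab_depth:            # photon transmitted through the slab
            break
        uz = rng.uniform(-1.0, 1.0)   # isotropic scattering (new direction cosine)

print(f"detected {len(exit_depths)} photons, "
      f"median probing depth {np.median(exit_depths):.3f} mm")
```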
Next we aim at the inverse problem: given an OCT image, predict/reconstruct its ground-truth structure at the pixel level. By solving this problem we would be able to interpret an OCT image completely and precisely without help from a trained expert. It turns out that we can do this very well: for simple structures we are able to reconstruct the ground truth of an OCT image more than 98% correctly, and for more complicated structures (e.g., a multi-layered brain structure) we reach 93%. We achieved this through extensive use of Machine Learning. The success of the Monte Carlo simulation already puts us in a great position by providing a great deal of data (effectively unlimited) in the form of (image, truth) pairs. Through a transformation of the high-dimensional response variable, we convert the learning task into a multi-output multi-class classification problem and a multi-output regression problem. We then build a hierarchical architecture of machine learning models (a committee of experts) and train different parts of the architecture with specifically designed data sets. In prediction, an unseen OCT image first goes through a classification model to determine its structure (e.g., the number and types of layers present in the image); then the image is handed to a regression model trained specifically for that particular structure, which predicts the lengths of the different layers and thereby reconstructs the ground truth of the image. We also demonstrate that ideas from Deep Learning can be used to further improve the performance.
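A toy version of the classify-then-regress hierarchy described above can be sketched as follows; the random features, class labels and layer lengths are synthetic stand-ins, and random forests are used here purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 2000, 64
X = rng.standard_normal((n, d))                     # stand-in A-scan features
struct = rng.integers(0, 3, size=n)                 # structure class (e.g. number of layers)
thick = rng.random((n, 2)) * (struct[:, None] + 1)  # stand-in layer lengths

# Stage 1: one classifier for the structure type.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, struct)

# Stage 2: one regressor per structure type (the "committee of experts").
regs = {c: RandomForestRegressor(n_estimators=100, random_state=0).fit(
            X[struct == c], thick[struct == c])
        for c in np.unique(struct)}

def reconstruct(x):
    c = int(clf.predict(x.reshape(1, -1))[0])       # which structure is present?
    return c, regs[c].predict(x.reshape(1, -1))[0]  # class-specific layer estimates

print(reconstruct(X[0]))
```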
It is worth pointing out that solving the inverse problem automatically improves the imaging depth, since the lower half of an OCT image (i.e., at greater depth), which previously could hardly be seen, now becomes fully resolved. Interestingly, although the OCT signals constituting the lower half of the image are weak, messy, and uninterpretable to human eyes, they still carry enough information that, when fed into a well-trained machine learning model, yields precisely the true structure of the object being imaged. This is another case where Artificial Intelligence (AI) outperforms humans. To the best of the author's knowledge, this thesis is not only a successful attempt but also the first attempt to reconstruct an OCT image at the pixel level. Even attempting this kind of task requires fully annotated OCT images, and a lot of them (hundreds or even thousands), which is clearly impossible without a powerful simulation tool like the one developed in this thesis.
Abstract:
This paper describes results obtained using the modified Kanerva model to perform word recognition in continuous speech after being trained on the multi-speaker Alvey 'Hotel' speech corpus. Theoretical discoveries have recently enabled us to increase the speed of execution of part of the model by two orders of magnitude over that previously reported by Prager & Fallside. The memory required for the operation of the model has been similarly reduced. The recognition accuracy reaches 95% without syntactic constraints when tested on different data from seven trained speakers. Real-time simulation of a model with 9,734 active units is now possible in both training and recognition modes using the Alvey PARSIFAL transputer array. The modified Kanerva model is a static network consisting of a fixed nonlinear mapping (location matching) followed by a single layer of conventional adaptive links. A section of preprocessed speech is transformed by the nonlinear mapping to a high-dimensional representation. From this intermediate representation, a simple linear mapping is able to perform complex pattern discrimination to form the output, indicating the nature of the speech features present in the input window.
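A small sketch of that two-stage structure, a fixed nonlinear location-matching layer followed by a single trainable linear layer, is given below. The k-nearest-location activation rule, the synthetic "speech" frames and the least-squares readout are illustrative assumptions, not Prager & Fallside's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_locations, d, n_classes = 2000, 40, 10
locations = rng.standard_normal((n_locations, d))     # fixed locations, never trained

def location_match(x, k=50):
    """Non-linear stage: activate the k locations nearest to the input,
    giving a sparse, high-dimensional binary code."""
    dist = np.linalg.norm(locations - x, axis=1)
    code = np.zeros(n_locations)
    code[np.argsort(dist)[:k]] = 1.0
    return code

# Synthetic stand-ins for preprocessed speech frames with class-dependent means.
labels = rng.integers(0, n_classes, size=500)
class_means = 3.0 * rng.standard_normal((n_classes, d))
X = class_means[labels] + rng.standard_normal((500, d))

H = np.stack([location_match(x) for x in X])          # intermediate representation
Y = np.eye(n_classes)[labels]                         # one-hot targets

W, *_ = np.linalg.lstsq(H, Y, rcond=None)             # train the single adaptive layer
accuracy = np.mean(np.argmax(H @ W, axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```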