896 results for High-dimensional data visualization
Abstract:
BACKGROUND: For free-breathing cardiovascular magnetic resonance (CMR), the self-navigation technique has recently emerged and is expected to deliver high-quality data with a high success rate. The purpose of this study was to test the hypothesis that self-navigated 3D-CMR enables the reliable assessment of cardiovascular anatomy in patients with congenital heart disease (CHD) and to define factors that affect image quality. METHODS: CHD patients ≥2 years old referred for CMR for initial assessment or for a follow-up study were included and underwent free-breathing self-navigated 3D CMR at 1.5T. Performance criteria were: correct description of cardiac segmental anatomy, overall image quality, coronary artery visibility, and reproducibility of great vessel diameter measurements. Factors associated with insufficient image quality were identified using multivariate logistic regression. RESULTS: Self-navigated CMR was performed in 105 patients (55% male, 23 ± 12y). Correct segmental description was achieved in 93% and 96% for observers 1 and 2, respectively. Diagnostic quality was obtained in 90% of examinations, increasing to 94% in contrast-enhanced examinations. The left anterior descending, circumflex, and right coronary arteries were visualized in 93%, 87% and 98%, respectively. Younger age, higher heart rate, lower ejection fraction, and lack of contrast medium were independently associated with reduced image quality. However, a similar rate of diagnostic image quality was obtained in children and adults. CONCLUSION: In patients with CHD, self-navigated free-breathing CMR provides high-resolution 3D visualization of the heart and great vessels with excellent robustness.
Abstract:
The self-organizing map (Kohonen 1997) is a type of artificial neural network developed to explore patterns in high-dimensional multivariate data. The conventional version of the algorithm uses the Euclidean metric in the adaptation of the model vectors, in theory rendering the whole methodology incompatible with non-Euclidean geometries. In this contribution we explore the two main aspects of the problem: 1. Whether the conventional approach using the Euclidean metric can yield valid results with compositional data. 2. Whether a modification of the conventional approach, replacing vectorial sum and scalar multiplication by the canonical operators in the simplex (i.e. perturbation and powering), can converge to an adequate solution. Preliminary tests showed that both methodologies can be used on compositional data. However, the modified version of the algorithm performs worse than the conventional version, in particular when the data are pathological. Moreover, the conventional approach converges faster to a solution when the data are "well-behaved". Key words: Self-Organizing Map; Artificial Neural Networks; Compositional Data
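To make the contrast between the two update rules concrete, below is a minimal Python sketch, assuming the standard Aitchison operations on the simplex (closure, perturbation, powering); the simplex-based SOM update shown is an illustrative interpretation of the modified algorithm, not the authors' exact implementation.

```python
import numpy as np

def closure(x):
    """Normalize a positive composition so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(x * y)

def power(alpha, x):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(x ** alpha)

def som_update_euclidean(m, x, lr):
    """Conventional SOM update of a model vector m toward sample x."""
    return m + lr * (x - m)

def som_update_simplex(m, x, lr):
    """Illustrative simplex-based update: move m toward x along an
    Aitchison-geometry segment using perturbation and powering."""
    diff = closure(x / m)           # x 'minus' m in the simplex
    return perturb(m, power(lr, diff))

# Toy 3-part composition example
x = closure([0.2, 0.5, 0.3])
m = closure([0.4, 0.4, 0.2])
print(som_update_euclidean(m, x, 0.1))
print(som_update_simplex(m, x, 0.1))
```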
Abstract:
The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis: the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any subset of the analysed data. In this paper, the corresponding concepts have been derived in the context of linear statistical data assimilation in numerical weather prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the four-dimensional variational system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the boreal spring 2003 operational system, 15% of the global influence is due to the assimilated observations in any one analysis, and the complementary 85% is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 25% of the observational information is currently provided by surface-based observing systems, and 75% by satellite systems. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background-error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). Incorrect specifications of background and observation-error covariance matrices can be identified, interpreted and better understood by the use of influence-matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system.
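For readers unfamiliar with the ordinary least-squares analogue mentioned in the opening sentence, the following minimal numpy sketch computes the hat (influence) matrix, its diagonal self-sensitivities (leverages), and the leave-one-out change in a fitted value; it illustrates the OLS concepts only and is unrelated to the ECMWF 4D-Var implementation described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Hat (influence) matrix: maps observations y to fitted values y_hat
H = X @ np.linalg.solve(X.T @ X, X.T)

leverage = np.diag(H)        # self-sensitivity of each observation
dfs = leverage.sum()         # "degrees of freedom for signal" = trace(H) = p here

# Change in the fitted value at point i when observation i is left out,
# obtained from the leverage without refitting the model
residuals = y - H @ y
loo_change = leverage * residuals / (1.0 - leverage)

print(f"trace(H) = {dfs:.2f} (equals the number of parameters p = {p})")
print("largest self-sensitivity:", leverage.max())
print("largest leave-one-out change in a fitted value:", np.abs(loo_change).max())
```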
Abstract:
The identification and visualization of clusters formed by motor unit action potentials (MUAPs) is an essential step in investigations seeking to explain the control of the neuromuscular system. This work introduces the generative topographic mapping (GTM), a novel machine learning tool, for clustering of MUAPs, and also extends the GTM technique to provide a way of visualizing MUAPs. The performance of GTM was compared to that of three other clustering methods: the self-organizing map (SOM), a Gaussian mixture model (GMM), and the neural-gas network (NGN). The results, based on the study of experimental MUAPs, showed that the success rates of both GTM and SOM exceeded those of GMM and NGN, and also that GTM may in practice be used as a principled alternative to the SOM in the study of MUAPs. A visualization tool, which we called GTM grid, was devised for visualization of MUAPs lying in a high-dimensional space. The visualization provided by the GTM grid was compared to that obtained from principal component analysis (PCA).
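As a rough illustration of the kind of clustering-plus-visualization pipeline being compared (GTM and NGN are not available in common libraries, so a Gaussian mixture model and PCA, two of the methods named in the abstract, stand in), here is a hedged sketch on synthetic waveform data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)

# Synthetic "MUAP-like" waveforms: three templates plus noise,
# each waveform sampled at 50 time points (a 50-dimensional vector)
t = np.linspace(0, 1, 50)
templates = [np.sin(2 * np.pi * f * t) * np.exp(-5 * t) for f in (3, 5, 8)]
X = np.vstack([tpl + 0.1 * rng.normal(size=(60, 50)) for tpl in templates])

# Cluster with a Gaussian mixture model (one of the compared baselines)
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# 2-D PCA projection for visual inspection of the clusters
proj = PCA(n_components=2).fit_transform(X)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} waveforms, "
          f"centre {proj[labels == k].mean(axis=0).round(2)}")
```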
Abstract:
A common problem in many data-based modelling algorithms, such as associative memory networks, is the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new, simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental-design-based criteria are used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, while in the later stage the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error and penalises the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high-dimensional inputs.
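The A-optimality design criterion referred to here is, in its standard form, the trace of (XᵀX)⁻¹ for a candidate design matrix X; the sketch below evaluates an illustrative composite cost (prediction error plus a weighted A-optimality penalty) for two candidate regressor subsets. The weighting and the exact cost form are assumptions for illustration, not the NeuDeC cost function.

```python
import numpy as np

def a_optimality(X):
    """A-optimality criterion: trace of (X^T X)^{-1};
    smaller means lower overall parameter variance."""
    return np.trace(np.linalg.inv(X.T @ X))

def composite_cost(X, y, lam=0.1):
    """Illustrative composite cost: mean squared prediction error plus
    a weighted A-optimality penalty (lam is a made-up weight)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    mse = np.mean((y - X @ beta) ** 2)
    return mse + lam * a_optimality(X)

# Toy example: compare two candidate regressor subsets
rng = np.random.default_rng(1)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.2, size=n)

X_a = np.column_stack([x1, x2])                       # informative subset
X_b = np.column_stack([x1, 0.99 * x1 + 0.01 * x3])    # nearly collinear subset

print("cost A:", composite_cost(X_a, y))
print("cost B:", composite_cost(X_b, y))  # larger A-optimality (variance) term
```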
Abstract:
A potential problem with the Ensemble Kalman Filter is the implicit Gaussian assumption at analysis times. Here we explore the performance of a recently proposed fully nonlinear particle filter on a high-dimensional but simplified ocean model, in which the Gaussian assumption is not made. The model simulates the evolution of the vorticity field in time, described by the barotropic vorticity equation, in a highly nonlinear flow regime. While common knowledge is that particle filters are inefficient and need large numbers of model runs to avoid degeneracy, the newly developed particle filter needs only of the order of 10-100 particles on large-scale problems. The crucial new ingredient is that the proposal density can be used not only to ensure all particles end up in high-probability regions of state space as defined by the observations, but also to ensure that most of the particles have similar weights. Using identical-twin experiments we found that the ensemble mean follows the truth reliably, and the difference from the truth is captured by the ensemble spread. A rank histogram is used to show that the truth run is indistinguishable from any of the particles, showing statistical consistency of the method.
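For context, the following is a minimal bootstrap (sequential importance resampling) particle filter on a standard 1-D nonlinear toy model; it illustrates the propagate-weight-resample cycle, but it is not the equivalent-weights, proposal-density filter developed in the abstract, and the toy model is only a stand-in for the barotropic vorticity model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D nonlinear state-space model (stand-in for the vorticity field)
def step(x):
    return 0.5 * x + 25.0 * x / (1.0 + x ** 2) + rng.normal(0, 1.0, size=x.shape)

def observe(x):
    return x ** 2 / 20.0

n_steps, n_particles, obs_std = 50, 100, 1.0

# Truth run and noisy observations (identical-twin setup)
truth = np.zeros(n_steps)
obs = np.zeros(n_steps)
for t in range(1, n_steps):
    truth[t] = step(truth[t - 1:t])[0]
    obs[t] = observe(truth[t]) + rng.normal(0, obs_std)

# Bootstrap particle filter: propagate, weight by likelihood, resample
particles = rng.normal(0, 1, n_particles)
means = []
for t in range(1, n_steps):
    particles = step(particles)
    logw = -0.5 * ((obs[t] - observe(particles)) / obs_std) ** 2
    w = np.exp(logw - logw.max())        # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)   # resampling step
    particles = particles[idx]
    means.append(particles.mean())

print("final ensemble mean vs truth:", means[-1], truth[-1])
```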
Abstract:
Learning a low-dimensional manifold from highly nonlinear, high-dimensional data has become increasingly important for discovering intrinsic representations that can be utilized for data visualization and preprocessing. The autoencoder is a powerful dimensionality reduction technique based on minimizing reconstruction error, and it has regained popularity because it has been efficiently used for greedy pretraining of deep neural networks. Compared to Neural Networks (NN), the superiority of Gaussian Processes (GP) has been shown in model inference, optimization, and performance. GP has been successfully applied in nonlinear Dimensionality Reduction (DR) algorithms, such as the Gaussian Process Latent Variable Model (GPLVM). In this paper we propose the Gaussian Processes Autoencoder Model (GPAM) for dimensionality reduction by extending the classic NN-based autoencoder to a GP-based autoencoder. More interestingly, the novel model can also be viewed as a back-constrained GPLVM (BC-GPLVM) in which the back-constraint smooth function is represented by a GP. Experiments verify the performance of the newly proposed model.
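As background for the model being extended, here is a minimal numpy sketch of a classic one-hidden-layer NN autoencoder trained on squared reconstruction error; GPAM replaces these NN encoder/decoder mappings with Gaussian processes, which is not shown here, and the toy data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy high-dimensional data lying near a low-dimensional manifold
z_true = rng.normal(size=(200, 2))
W_gen = rng.normal(size=(2, 20))
X = np.tanh(z_true @ W_gen) + 0.05 * rng.normal(size=(200, 20))

# One-hidden-layer autoencoder: tanh encoder, linear decoder,
# trained by full-batch gradient descent on the reconstruction error
d_in, d_code, lr = X.shape[1], 2, 0.01
W1 = 0.1 * rng.normal(size=(d_in, d_code)); b1 = np.zeros(d_code)
W2 = 0.1 * rng.normal(size=(d_code, d_in)); b2 = np.zeros(d_in)

for epoch in range(2000):
    H = np.tanh(X @ W1 + b1)          # encoder (latent codes)
    X_hat = H @ W2 + b2               # decoder (reconstruction)
    err = X_hat - X
    # Backpropagation of the mean squared reconstruction error
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = err @ W2.T * (1 - H ** 2)
    dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

codes = np.tanh(X @ W1 + b1)          # 2-D representation for visualization
print("final reconstruction MSE:", np.mean(err ** 2))
```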
Abstract:
The integration of nanostructured films containing biomolecules and silicon-based technologies is a promising direction for reaching miniaturized biosensors that exhibit high sensitivity and selectivity. A challenge, however, is to avoid cross talk among sensing units in an array with multiple sensors located on a small area. In this letter, we describe an array of 16 sensing units of a light-addressable potentiometric sensor (LAPS), made with layer-by-layer (LbL) films of a poly(amidoamine) dendrimer (PAMAM) and single-walled carbon nanotubes (SWNTs), coated with a layer of the enzyme penicillinase. A visual inspection of the data from constant-current measurements with liquid samples containing distinct concentrations of penicillin, glucose, or a buffer indicated possible cross talk between units that contained penicillinase and those that did not. With the use of multidimensional data projection techniques, normally employed in information visualization methods, we managed to distinguish the results from the modified LAPS, even in cases where the units were adjacent to each other. Furthermore, the plots generated with the interactive document map (IDMAP) projection technique enabled the distinction of the different concentrations of penicillin, from 5 mmol L(-1) down to 0.5 mmol L(-1). Data visualization also confirmed the enhanced performance of the sensing units containing carbon nanotubes, consistent with the analysis of results for LAPS sensors. The use of visual analytics, as with projection methods, may be essential to handle the large amount of data generated in multiple sensor arrays to achieve high performance in miniaturized systems.
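To illustrate the idea of projecting multi-unit sensor responses onto a plane so that samples at different analyte concentrations separate visually, here is a hedged sketch using classical multidimensional scaling from scikit-learn as a stand-in (IDMAP itself is not available in common libraries); the sensor responses and concentrations below are invented for the example.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(4)

# Hypothetical responses of 16 sensing units for samples at three
# penicillin concentrations (rows: samples, columns: sensing units)
concentrations = np.repeat([0.5, 2.0, 5.0], 10)            # mmol/L, illustrative
signals = (concentrations[:, None] * rng.uniform(0.8, 1.2, size=(30, 16))
           + rng.normal(scale=0.2, size=(30, 16)))

# Project the 16-dimensional sensor responses onto a 2-D plane so that
# samples with similar response patterns land close together
proj = MDS(n_components=2, random_state=0).fit_transform(signals)

for c in [0.5, 2.0, 5.0]:
    pts = proj[concentrations == c]
    print(f"{c} mmol/L cluster centre:", pts.mean(axis=0).round(2))
```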
Abstract:
Objective: To investigate whether advanced visualizations of spirography-based objective measures are useful in differentiating drug-related motor dysfunctions between Off and dyskinesia in Parkinson's disease (PD). Background: During the course of a 3-year longitudinal clinical study, in total 65 patients (43 males and 22 females with a mean age of 65) with advanced PD and 10 healthy elderly (HE) subjects (5 males and 5 females with a mean age of 61) were assessed. Both patients and HE subjects performed repeated, time-stamped assessments of their objective health indicators using a test battery implemented on a telemetry touch-screen handheld computer, in their home environment settings. Among other tasks, the subjects were asked to trace a pre-drawn Archimedes spiral using the dominant hand and to repeat the test three times per test occasion. Methods: A web-based framework was developed to enable visual exploration of relevant spirography-based kinematic features by clinicians so they can in turn evaluate the motor states of the patients, i.e. Off and dyskinesia. The system uses different visualization techniques such as time-series plots, animation, and interaction, and organizes them into different views to aid clinicians in measuring spatial and time-dependent irregularities that could be associated with the motor states. Along with the animation view, the system displays two time-series plots representing drawing speed (blue line) and displacement from the ideal trajectory (orange line). The views are coordinated and linked, i.e. user interactions in one of the views are reflected in the other views. For instance, when the user points at one of the pixels in the spiral view, the circle size of the underlying pixel increases and a vertical line appears in the time-series views to depict the corresponding position. In addition, in order to enable clinicians to observe erratic movements more clearly and thus improve the detection of irregularities, the system displays a color map that conveys the time course of the spirography task. Figure 2 shows single, randomly selected spirals drawn by: A) a patient who experienced dyskinesias, B) a HE subject, and C) a patient in the Off state. Results: According to a domain expert (DN), the spirals drawn in the Off and dyskinesia motor states are characterized by different spatial and temporal features. For instance, the spiral shown in Fig. 2A was drawn by a patient who showed symptoms of dyskinesia; the drawing speed was relatively high (cf. the blue-colored time-series plot and the short timestamp scale on the x axis) and the spatial displacement was high (cf. the orange-colored time-series plot), associated with smooth deviations resulting from uncontrollable movements. The patient also exhibited little hesitation, which is reflected both in the animation of the spiral and in the time-series plots. In contrast, the patient who was in the Off state exhibited different kinematic features, as shown in Fig. 2C. In the case of spirals drawn by a HE subject, there was great precision during the drawing process as well as unchanging levels of time-dependent features over the test trial, as seen in Fig. 2B. Conclusions: Visualizing spirography-based objective measures enables identification of trends and patterns of drug-related motor dysfunctions at the individual patient level.
Dynamic access to visualized motor tests may be useful during the evaluation of drug-related complications such as under- and over-medication, providing decision support to clinicians during evaluation of treatment effects and potentially improving the quality of life of patients and their caregivers. In the future, we plan to evaluate the proposed approach by assessing within- and between-clinician variability in ratings in order to determine its actual usefulness, and then use these ratings as target outcomes in supervised machine learning, similar to what was previously done in the study performed by Memedi et al. (2013).
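The two time-series features described above (drawing speed and displacement from the ideal trajectory) can be reconstructed from a time-stamped spiral trace roughly as in the sketch below; the trace, the ideal Archimedes spiral parameters, and the noise level are invented for illustration and do not reproduce the study's processing pipeline.

```python
import numpy as np

# Hypothetical time-stamped trace of a spiral drawing: t (s), x, y (screen units)
rng = np.random.default_rng(5)
t = np.linspace(0, 20, 400)
theta = t * (4 * np.pi / 20)                 # about two turns over the test
r_ideal = 5.0 * theta                        # ideal Archimedes spiral r = a * theta
x = r_ideal * np.cos(theta) + rng.normal(scale=2.0, size=t.size)
y = r_ideal * np.sin(theta) + rng.normal(scale=2.0, size=t.size)

# Drawing speed (the "blue line" in the described view): point-to-point velocity
speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)

# Displacement from the ideal trajectory (the "orange line"): radial error
r_drawn = np.hypot(x, y)
displacement = np.abs(r_drawn - r_ideal)

print("mean speed:", speed.mean(), "mean displacement:", displacement.mean())
```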
Abstract:
This paper develops a framework to test whether discrete-valued, irregularly spaced financial transactions data follow a subordinated Markov process. For that purpose, we consider a specific optional sampling in which a continuous-time Markov process is observed only when it crosses some discrete level. This framework is convenient because it accommodates not only the irregular spacing of transactions data, but also price discreteness. Further, it turns out that, under such an observation rule, the current price duration is independent of previous price durations given the current price realization. A simple nonparametric test then follows by examining whether this conditional independence property holds. Finally, we investigate whether or not bid-ask spreads follow Markov processes using transactions data from the New York Stock Exchange. The motivation lies in the fact that asymmetric information models of market microstructure predict that the Markov property does not hold for the bid-ask spread. The results are mixed in the sense that the Markov assumption is rejected for three out of the five stocks we have analyzed.
Abstract:
Aiming at empirical findings, this work focuses on applying the HEAVY model to daily volatility using financial data from the Brazilian market. Quite similar to GARCH, this model seeks to harness high-frequency data in order to achieve its objectives. Four variations of it were implemented and their fit compared to GARCH equivalents, using metrics from the literature. Results suggest that, in such a market, HEAVY does seem to specify daily volatility better, but does not necessarily produce better predictions for it, which is normally the ultimate goal. The dataset used in this work consists of intraday trades of U.S. Dollar and Ibovespa futures contracts from BM&FBovespa.
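For readers unfamiliar with the two model families, the sketch below contrasts the textbook GARCH(1,1) recursion, driven by squared daily returns, with a HEAVY-style variance recursion driven by a realized measure built from intraday data; the parameters and simulated series are illustrative, and parameter estimation (quasi-maximum likelihood) is not shown.

```python
import numpy as np

def garch_variance(returns, omega, alpha, beta):
    """GARCH(1,1): conditional variance driven by squared daily returns."""
    h = np.empty(len(returns))
    h[0] = returns.var()
    for t in range(1, len(returns)):
        h[t] = omega + alpha * returns[t - 1] ** 2 + beta * h[t - 1]
    return h

def heavy_variance(returns, realized, omega, alpha, beta):
    """HEAVY-style recursion: the variance equation is driven by a
    realized measure computed from intraday (high-frequency) data."""
    h = np.empty(len(returns))
    h[0] = returns.var()
    for t in range(1, len(returns)):
        h[t] = omega + alpha * realized[t - 1] + beta * h[t - 1]
    return h

# Toy daily returns and realized variances (all parameters are illustrative)
rng = np.random.default_rng(6)
realized = 0.0001 * rng.gamma(shape=2.0, scale=1.0, size=500)
returns = rng.normal(scale=np.sqrt(realized))

h_garch = garch_variance(returns, 1e-6, 0.05, 0.90)
h_heavy = heavy_variance(returns, realized, 1e-6, 0.30, 0.65)
print("last-day variances (GARCH, HEAVY):", h_garch[-1], h_heavy[-1])
```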
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The Assimilation in the Unstable Subspace (AUS) was introduced by Trevisan and Uboldi in 2004, and developed by Trevisan, Uboldi and Carrassi, to minimize the analysis and forecast errors by exploiting the flow-dependent instabilities of the forecast-analysis cycle system, which may be thought of as a system forced by observations. In the AUS scheme the assimilation is obtained by confining the analysis increment to the unstable subspace of the forecast-analysis cycle system, so that it has the same structure as the dominant instabilities of the system. The unstable subspace is estimated by Breeding on the Data Assimilation System (BDAS). AUS-BDAS has already been tested in realistic models and observational configurations, including a Quasi-Geostrophic model and a high-dimensional, primitive equation ocean model; the experiments include both fixed and “adaptive” observations. In these contexts, the AUS-BDAS approach greatly reduces the analysis error, with reasonable computational costs for data assimilation compared, for example, to a prohibitive full Extended Kalman Filter. This is a follow-up study in which we revisit the AUS-BDAS approach in the more basic, highly nonlinear Lorenz 1963 convective model. We run observation system simulation experiments in a perfect model setting, and with two types of model error as well: random and systematic. In the different configurations examined, and in a perfect model setting, AUS once again shows better efficiency than other advanced data assimilation schemes. In the present study, we develop an iterative scheme that leads to a significant improvement of the overall assimilation performance with respect also to standard AUS. In particular, it boosts the efficiency of tracking regime changes, at a low computational cost. Other data assimilation schemes need estimates of ad hoc parameters, which have to be tuned for the specific model at hand. In Numerical Weather Prediction models, tuning of parameters — and in particular an estimate of the model error covariance matrix — may turn out to be quite difficult. Our proposed approach, instead, may be easier to implement in operational models.
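The Lorenz 1963 convective model used in the follow-up experiments, together with an identical-twin observation setup, can be sketched as follows; the integration scheme, observation frequency, and noise level are illustrative choices, and the AUS-BDAS assimilation step itself is not shown.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) convective model."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate(state, dt=0.01, n_steps=1000):
    """Simple 4th-order Runge-Kutta integration of the model."""
    traj = [state]
    for _ in range(n_steps):
        k1 = lorenz63(state)
        k2 = lorenz63(state + 0.5 * dt * k1)
        k3 = lorenz63(state + 0.5 * dt * k2)
        k4 = lorenz63(state + dt * k3)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj.append(state)
    return np.array(traj)

# Identical-twin setup: a "truth" run plus noisy observations of it
rng = np.random.default_rng(7)
truth = integrate(np.array([1.0, 1.0, 1.0]))
obs = truth[::25] + rng.normal(scale=1.0, size=truth[::25].shape)  # every 25 steps
print("truth trajectory shape:", truth.shape, "observations shape:", obs.shape)
```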
Abstract:
In many applications the observed data can be viewed as a censored, high-dimensional full-data random variable X. By the curse of dimensionality it is typically not possible to construct estimators that are asymptotically efficient at every probability distribution in a semiparametric censored data model of such a high-dimensional censored data structure. We provide a general method for construction of one-step estimators that are efficient at a chosen submodel of the full-data model, are still well behaved off this submodel, and can be chosen to always improve on a given initial estimator. These one-step estimators rely on good estimators of the censoring mechanism and thus will require a parametric or semiparametric model for the censoring mechanism. We present a general theorem that provides a template for proving the desired asymptotic results. We illustrate the general one-step estimation methods by constructing locally efficient one-step estimators of marginal distributions and regression parameters with right-censored data, current status data and bivariate right-censored data, in all models allowing the presence of time-dependent covariates. The conditions of the asymptotic theorem are rigorously verified in one of the examples and the key condition of the general theorem is verified for all examples.
Abstract:
Use of microarray technology often leads to high-dimensional and low-sample-size data settings. Over the past several years, a variety of novel approaches have been proposed for variable selection in this context. However, only a small number of these have been adapted for time-to-event data where censoring is present. Among the standard variable selection methods shown both to have good predictive accuracy and to be computationally efficient is the elastic net penalization approach. In this paper, an adaptation of the elastic net approach is presented for variable selection both under the Cox proportional hazards model and under an accelerated failure time (AFT) model. Assessment of the two methods is conducted through simulation studies and through analysis of microarray data obtained from a set of patients with diffuse large B-cell lymphoma where time to survival is of interest. The approaches are shown to match or exceed the predictive performance of a Cox-based and an AFT-based variable selection method. The methods are moreover shown to be much more computationally efficient than their respective Cox- and AFT-based counterparts.
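As a minimal illustration of the elastic net penalty in a high-dimensional, low-sample-size setting, the sketch below fits scikit-learn's ElasticNet to a toy "log survival time" outcome; note that it ignores censoring entirely, so it only illustrates the penalty's variable-selection behaviour, while the Cox and AFT adaptations discussed in the abstract require specialized solvers.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(8)

# Toy "microarray-like" setting: many genes, few samples, few true signals
n_samples, n_genes = 40, 500
X = rng.normal(size=(n_samples, n_genes))
true_coef = np.zeros(n_genes)
true_coef[:5] = [2.0, -1.5, 1.0, -1.0, 0.5]
log_time = X @ true_coef + rng.normal(scale=0.5, size=n_samples)  # AFT-like outcome

# Elastic net mixes an L1 penalty (sparsity / variable selection)
# with an L2 penalty (stability when predictors are correlated)
model = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X, log_time)
selected = np.flatnonzero(model.coef_)
print("number of genes selected:", selected.size)
print("first selected indices:", selected[:10])
```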