60 results for High-dimensional data visualization
Abstract:
Principal component analysis (PCA) is well established for dimensionality reduction, and kernel PCA (KPCA) has been proposed for statistical data analysis. However, KPCA fails to capture the nonlinear structure of data well when outliers are present. To address this problem, this paper presents a novel algorithm named iterative robust KPCA (IRKPCA). IRKPCA deals well with outliers and can be carried out iteratively, which makes it suitable for processing incremental input data. As in traditional robust PCA (RPCA), a binary field is employed to characterize the outlier process, and the optimization problem is formulated as maximizing the marginal distribution of a Gibbs distribution. In this paper, the optimization problem is solved by stochastic gradient descent techniques. In IRKPCA, the outlier process lives in a high-dimensional feature space, so the kernel trick is used. IRKPCA can be regarded as a kernelized version of RPCA and as a robust form of the kernel Hebbian algorithm. Experimental results on synthetic data demonstrate the effectiveness of IRKPCA. © 2010 Taylor & Francis.
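IRKPCA itself is not given as code in the abstract; as a point of reference, here is a minimal NumPy sketch of standard (non-robust) kernel PCA, the formulation IRKPCA robustifies, assuming a Gaussian (RBF) kernel:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    """Standard (non-robust) kernel PCA; IRKPCA adds an outlier
    process on top of this formulation and solves it iteratively."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    # Centre the kernel matrix in feature space.
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition; keep the leading components.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training data onto the kernel principal components.
    return vecs * np.sqrt(np.maximum(vals, 0.0))

X = np.random.default_rng(0).normal(size=(100, 5))
Z = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (100, 2)
```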
Abstract:
A novel interrogation technique for fully distributed linearly chirped fiber Bragg grating (LCFBG) strain sensors with simultaneously high temporal and spatial resolution, based on optical time-stretch frequency-domain reflectometry (OTS-FDR), is proposed and experimentally demonstrated. LCFBGs are promising candidates for fully distributed sensors thanks to their longer grating length and broader reflection bandwidth compared to normal uniform FBGs. In the proposed system, two identical LCFBGs are employed in a Michelson interferometer setup, with one grating serving as the reference and the other as the sensing element. A broadband spectral interferogram is formed, and the strain information is encoded into the wavelength-dependent free spectral range (FSR). Ultrafast interrogation is achieved through dispersion-induced time stretch, such that the target spectral interferogram is mapped to a temporal interference waveform that can be captured in real time using a single-pixel photodetector. The distributed strain along the sensing grating can be reconstructed from the instantaneous RF frequency of the captured waveform. High spatial resolution is also obtained thanks to the high-speed data acquisition. In a proof-of-concept experiment, ultrafast real-time interrogation of fully distributed grating sensors with various strain distributions is demonstrated. An ultrarapid measurement speed of 50 MHz, a high spatial resolution of 31.5 μm over a gauge length of 25 mm, and a strain resolution of 9.1 με have been achieved.
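The reconstruction step, recovering the instantaneous RF frequency of the captured waveform, can be illustrated in software. Below is a hedged sketch using the analytic signal (scipy.signal.hilbert) on a synthetic chirped waveform; the sample rate and frequency sweep are made-up stand-ins for the real time-stretched interferogram:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1e9                        # sample rate (illustrative, 1 GS/s)
t = np.arange(0, 5e-6, 1 / fs)  # 5 us record
# Synthetic captured waveform: frequency drifts from 50 to 80 MHz,
# standing in for a time-stretched spectral interferogram.
f_inst_true = 50e6 + (80e6 - 50e6) * t / t[-1]
phase = 2 * np.pi * np.cumsum(f_inst_true) / fs
x = np.cos(phase)

# Instantaneous frequency from the analytic signal.
analytic = hilbert(x)
inst_phase = np.unwrap(np.angle(analytic))
f_inst = np.diff(inst_phase) * fs / (2 * np.pi)

print(f_inst[100], f_inst[-100])  # ~50 MHz at the start, ~80 MHz at the end
```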
Abstract:
Using techniques from statistical physics, the annealed VC entropy for hyperplanes in high-dimensional spaces is calculated as a function of the margin for a spherical Gaussian distribution of inputs.
Abstract:
In data visualization, characterizing the local geometric properties of non-linear projection manifolds provides the user with valuable additional information that can influence further steps in the data analysis. We take advantage of the smooth character of the GTM projection manifold and analytically calculate its local directional curvatures. Curvature plots are useful for detecting regions where the geometry is distorted, for changing the amount of regularization in non-linear projection manifolds, and for choosing regions of interest when constructing detailed lower-level visualization plots.
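The paper derives these curvatures analytically from the GTM mapping; as a generic stand-in, here is a hedged NumPy sketch that estimates the directional curvature of any smooth latent-to-data-space mapping by central finite differences (the toy paraboloid manifold is purely illustrative):

```python
import numpy as np

def directional_curvature(f, x, u, h=1e-4):
    """Curvature of the curve t -> f(x + t*u) at t = 0, estimated by
    central finite differences; a numerical stand-in for the analytic
    curvatures available for the smooth GTM manifold."""
    u = u / np.linalg.norm(u)
    f0, fp, fm = f(x), f(x + h * u), f(x - h * u)
    d1 = (fp - fm) / (2 * h)           # first derivative along u
    d2 = (fp - 2 * f0 + fm) / h**2     # second derivative along u
    # Space-curve curvature |g' x g''| / |g'|^3, written without the
    # cross product so it works in any embedding dimension.
    n1, n2, dot = np.dot(d1, d1), np.dot(d2, d2), np.dot(d1, d2)
    return np.sqrt(max(n1 * n2 - dot**2, 0.0)) / n1**1.5

# Toy projection manifold: a paraboloid embedded in R^3.
f = lambda x: np.array([x[0], x[1], x[0]**2 + x[1]**2])
print(directional_curvature(f, np.array([0.5, 0.0]), np.array([1.0, 0.0])))
# ~0.707, matching the exact curvature 2 / (1 + 4*0.25)**1.5
```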
Abstract:
We propose a generative topographic mapping (GTM) based data visualization with simultaneous feature selection (GTM-FS) approach which not only provides better visualization by modeling irrelevant features ("noise") using a separate shared distribution, but also gives a saliency value for each feature, helping the user to assess its significance. This technical report presents a variant of the Expectation-Maximization (EM) algorithm for GTM-FS.
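The GTM-FS updates themselves are in the report; to fix the E-step/M-step structure they build on, here is a minimal EM sketch for an ordinary two-component 1D Gaussian mixture (synthetic data, not the GTM-FS algorithm itself):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

# Initial parameters: means, variances, mixing weights.
mu, var, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = np.exp(-(x[:, None] - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = w * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu)**2).sum(axis=0) / nk
    w = nk / len(x)

print(mu, var, w)  # roughly (-2, 3), (1, 0.25), (0.6, 0.4)
```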
Abstract:
Are the learning procedures of genetic algorithms (GAs) able to generate optimal architectures for artificial neural networks (ANNs) on high frequency data? In this experimental study, GAs are used to identify the best architecture for ANNs. Additional learning is undertaken by the ANNs to forecast daily excess stock returns. No ANN architecture was able to outperform a random walk, despite the finding of non-linearity in the excess returns. This failure is attributed to the absence of suitable ANN structures and further implies that researchers need to be cautious when making inferences from ANN results that use high frequency data.
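A toy sketch of the GA-over-architectures loop the study describes, with the hidden-layer size as the genome and validation R² as fitness; scikit-learn's MLPRegressor stands in for the ANNs, and all data and parameters are synthetic illustrations:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=400)   # synthetic "returns"
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

def fitness(n_hidden):
    # Validation R^2 of an ANN with the candidate architecture.
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=500,
                       random_state=0).fit(Xtr, ytr)
    return net.score(Xva, yva)

pop = list(rng.integers(2, 33, size=6))            # genomes: hidden sizes
for gen in range(5):
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:3]                           # selection
    children = [max(2, p + int(rng.integers(-4, 5))) for p in parents]
    pop = parents + children                       # mutation-only breeding

print("best architecture:", max(pop, key=fitness), "hidden units")
```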
Abstract:
Purpose – The purpose of this paper is to investigate the impact of foreign exchange and interest rate changes on US banks’ stock returns. Design/methodology/approach – The approach employs an EGARCH model to account for the ARCH effects in daily returns. Most prior studies have used standard OLS estimation methods with the result that the presence of ARCH effects would have affected estimation efficiency. For comparative purposes, the standard OLS estimation method is also used to measure sensitivity. Findings – The findings are as follows: under the conditional t-distributional assumption, the EGARCH model generated a much better fit to the data although the goodness-of-fit of the model is not entirely satisfactory; the market index return accounts for most of the variation in stock returns at both the individual bank and portfolio levels; and the degree of sensitivity of the stock returns to interest rate and FX rate changes is not very pronounced despite the use of high frequency data. Earlier results had indicated that daily data provided greater evidence of exposure sensitivity. Practical implications – Assuming that banks do not hedge perfectly, these findings have important financial implications as they suggest that the hedging policies of the banks are not reflected in their stock prices. Alternatively, it is possible that different GARCH-type models might be more appropriate when modelling high frequency returns. Originality/value – The paper contributes to existing knowledge in the area by showing that ARCH effects do impact on measures of sensitivity.
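A sketch of the kind of volatility model the paper fits, using the `arch` package's EGARCH(1,1) with conditional Student-t errors on synthetic returns; the factor mean equation (market index, interest rate, and FX returns) is reduced to a constant here for brevity:

```python
import numpy as np
from arch import arch_model

# Synthetic daily bank stock returns in percent, for illustration only.
rng = np.random.default_rng(0)
returns = 0.05 + rng.standard_t(df=5, size=2000)

# EGARCH(1,1) with conditional Student-t errors, as in the paper's setup.
am = arch_model(returns, vol='EGARCH', p=1, o=1, q=1, dist='t')
res = am.fit(disp='off')
print(res.summary())
```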
Abstract:
This thesis examines options for high-capacity all-optical networks. Specifically, optical time division multiplexed (OTDM) networks based on electro-optic modulators are investigated experimentally, while comparisons with alternative approaches are carried out. The thesis is intended to form a basis of comparison between optical time division multiplexed networks and the more mature approach of wavelength division multiplexed networks. Following an introduction to optical networking concepts, the required component technologies are discussed. In particular, various optical pulse sources are described with the demanding restrictions of optical multiplexing in mind. This is followed by a discussion of the construction of multiplexers and demultiplexers, including favoured techniques for high-speed clock recovery. Theoretical treatments of the performance of Mach-Zehnder and electroabsorption modulators support the design criteria established for the construction of simple optical time division multiplexed systems. Having established appropriate end terminals for an optical network, the thesis examines transmission issues associated with high-speed RZ data signals. Propagation of RZ signals over both installed (standard fibre) and newly commissioned fibre routes is considered in turn. For standard fibre systems, the use of dispersion compensation is summarised and the application of mid-span spectral inversion is experimentally investigated. For green-field sites, soliton-like propagation of high-speed data signals is demonstrated. The particular restrictions of high-speed soliton systems are discussed and experimentally investigated, namely the increasing impact of timing jitter and the downward pressure on repeater spacings imposed by the constraint of the average soliton model. These issues are addressed through investigations of active soliton control for OTDM systems and of novel fibre types, respectively. Finally, the remarkable networking potential of optical time division multiplexed systems is established, and infinite node cascadability using soliton control is demonstrated. A final comparison of the various technologies for optical multiplexing is presented in the conclusions, where the relative merits of the technologies for optical networking emerge as the key differentiator.
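Of the transmission issues discussed, soliton propagation is the easiest to illustrate numerically. Below is a hedged split-step Fourier sketch of the normalised scalar nonlinear Schrödinger equation, in which a fundamental soliton should hold its shape; units and step sizes are illustrative, not the thesis's system parameters:

```python
import numpy as np

# Normalised scalar NLSE, split-step Fourier: u_z = (i/2) u_tt + i |u|^2 u.
N, T = 1024, 40.0
t = np.linspace(-T / 2, T / 2, N, endpoint=False)
w = 2 * np.pi * np.fft.fftfreq(N, d=T / N)       # angular frequencies
u = 1 / np.cosh(t)                               # fundamental soliton

dz, nz = 0.01, 2000                              # ~13 soliton periods
half_disp = np.exp(-0.25j * w**2 * dz)           # half-step of dispersion
for _ in range(nz):
    u = np.fft.ifft(half_disp * np.fft.fft(u))   # dispersion, half step
    u = u * np.exp(1j * np.abs(u)**2 * dz)       # nonlinearity, full step
    u = np.fft.ifft(half_disp * np.fft.fft(u))   # dispersion, half step

# A fundamental soliton should emerge essentially unchanged.
print(np.max(np.abs(u)))  # ~1.0
```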
Abstract:
A study of vapour-liquid equilibria is presented together with current developments. The theory of vapour-liquid equilibria is discussed, and both experimental and predictive methods for obtaining vapour-liquid equilibrium data are critically reviewed. The development of a new family of equilibrium stills to measure experimental VLE data from sub-atmospheric pressure to 35 bar is described. Existing experimental techniques are reviewed to highlight the need for these new apparatuses and their major attributes. Details are provided of how the apparatus may be further improved and how computer control may be implemented. To provide a rigorous test of the apparatus, the stills have been commissioned using an acetic acid-water mixture at one atmosphere pressure. A Barker-type consistency test computer program, which allows for association in both phases, has been applied to the data generated and clearly shows that the stills produce data of very high quality. Two high-quality data sets for the acetone-chloroform mixture have been generated at one atmosphere and 64.3 °C. These data are used to investigate the ability of a novel technique, based on molecular parameters, to predict VLE data for highly polar mixtures. Eight vapour-liquid equilibrium data sets have been produced for the cyclohexane-ethanol mixture at one atmosphere, 2, 4, 6, 8 and 11 bar, 90.9 °C and 132.8 °C. These data sets have been tested for thermodynamic consistency using a Barker-type fitting package and shown to be of high quality. The data have been used to investigate the temperature dependence of UNIQUAC parameters, and additionally to compare directly the performance of the predictive methods: original UNIFAC, a modified version of UNIFAC, and the novel technique based on molecular parameters developed from generalised London's potential (GLP) theory.
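To make the VLE quantities concrete, here is a hedged sketch of a modified-Raoult bubble-point calculation with Antoine vapour pressures and a one-parameter Margules activity model; the constants are illustrative placeholders, not the thesis's fitted values:

```python
import numpy as np

def p_sat(A, B, C, T):
    # Antoine equation: log10(P / mmHg) = A - B / (C + T / degC).
    return 10 ** (A - B / (C + T))

# Illustrative Antoine constants (mmHg, degC), not the thesis's values.
ANTOINE = {"acetone": (7.117, 1210.6, 229.7),
           "chloroform": (6.938, 1171.2, 227.0)}

def bubble_pressure(x1, T, A12=-0.8):
    """Modified Raoult's law with a one-parameter Margules model:
    ln(gamma1) = A12*x2^2, ln(gamma2) = A12*x1^2. The negative A12
    mimics the negative deviation of acetone-chloroform."""
    x2 = 1.0 - x1
    g1, g2 = np.exp(A12 * x2**2), np.exp(A12 * x1**2)
    p1 = g1 * x1 * p_sat(*ANTOINE["acetone"], T)
    p2 = g2 * x2 * p_sat(*ANTOINE["chloroform"], T)
    return p1 + p2, p1 / (p1 + p2)   # total pressure (mmHg), vapour y1

P, y1 = bubble_pressure(x1=0.4, T=64.3)
print(f"P = {P:.1f} mmHg, y1 = {y1:.3f}")
```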
Abstract:
Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGLs), small selected subsets of genes drawn from the very large pools of candidates available in DNA microarray experiments, is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high-dimensional spaces. In this work we outline a different approach based on an unsupervised, patient-specific, nonlinear topographic projection of predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques are used to construct two-dimensional projective visualisation plots of 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections, and we investigate whether the two prognosis groups are separable a posteriori on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping, and comparative projections across different PGLs give similar results. Conclusion: The random correlation with an arbitrary outcome induced by selecting small subsets from very high-dimensional, interrelated gene expression profiles leads to outcomes with associated uncertainty. This continuum and uncertainty preclude any attempt at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of the provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
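The projection step is standard; a sketch of projecting 70-dimensional per-patient PGL vectors to 2D with scikit-learn's t-SNE (the modern descendant of SNE) and LLE, on synthetic stand-in data:

```python
import numpy as np
from sklearn.manifold import TSNE, LocallyLinearEmbedding

# Synthetic stand-in: 100 "patients", 70-gene expression signatures.
rng = np.random.default_rng(0)
pgl = rng.normal(size=(100, 70))

# Nonlinear topographic projections to 2D, as in the paper's pipeline
# (the paper also uses Neuroscale, which has no sklearn equivalent).
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(pgl)
lle_2d = LocallyLinearEmbedding(n_components=2,
                                n_neighbors=10).fit_transform(pgl)
print(tsne_2d.shape, lle_2d.shape)  # (100, 2) (100, 2)
```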
Abstract:
This study examines the selectivity and timing performance of 218 UK investment trusts over the period July 1981 to June 2009. We estimate the Treynor and Mazuy (1966) and Henriksson and Merton (1981) models augmented with the size, value, and momentum factors, either under the OLS method adjusted with the Newey-West procedure or under the GARCH(1,1)-in-mean method following the specification of Glosten et al. (1993; hereafter GJR-GARCH-M). We find that the OLS method provides little evidence in favour of selectivity and timing ability, consistent with previous studies. Interestingly, the GJR-GARCH-M method reverses this result, showing relatively strong evidence of favourable selectivity ability, particularly for international funds, as well as favourable timing ability, particularly for domestic funds. We conclude that the GJR-GARCH-M method performs better than the OLS method and the non-parametric approach in evaluating fund performance, as it accounts for the time-varying character of the factor loadings and hence obtains more reliable results, in particular when high-frequency data, such as daily returns, are used in the analysis. Our results are robust to various in-sample and out-of-sample tests and have valuable implications for practitioners in making asset allocation decisions across different fund styles. © 2012 Elsevier B.V.
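A sketch of the baseline OLS estimation with Newey-West standard errors for the Treynor-Mazuy specification (the GJR-GARCH-M variant needs a dedicated GARCH package); the data are synthetic and the factor augmentation is omitted:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
mkt = rng.normal(0.0003, 0.01, 2000)                # market excess returns
fund = 0.0001 + 0.9 * mkt + 0.5 * mkt**2 \
       + rng.normal(0, 0.005, 2000)                 # synthetic fund returns

# Treynor-Mazuy: excess return on market excess return and its square;
# gamma > 0 indicates market-timing ability.
X = sm.add_constant(np.column_stack([mkt, mkt**2]))
res = sm.OLS(fund, X).fit(cov_type='HAC', cov_kwds={'maxlags': 5})
print(res.params)   # [alpha, beta, gamma]
print(res.tvalues)  # Newey-West t-statistics
```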
Abstract:
Mutual fund (MF) efficiency is one of the issues that has attracted investors in countries with advanced financial markets for many years. Because MF efficiency must be studied frequently over short-term periods, investors need a method that offers not only high accuracy but also high speed. Data envelopment analysis (DEA) is one of the most widely used methods for measuring the efficiency and productivity of decision-making units (DMUs), but DEA for a large dataset with many inputs and outputs requires huge computer resources in terms of memory and CPU time. This paper uses neural network back-propagation DEA to measure mutual fund efficiency and shows that the proposed method's requirements for computer memory and CPU time are far lower than those of conventional DEA methods; it can therefore be a useful tool for measuring the efficiency of a large set of MFs. Copyright © 2014 Inderscience Enterprises Ltd.
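Conventional DEA solves one linear program per DMU, which is the cost the neural-network surrogate avoids; a sketch of the input-oriented CCR envelopment LP with scipy.optimize.linprog on synthetic fund inputs/outputs:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """Input-oriented CCR efficiency of DMU k.
    X: (n_dmus, n_inputs), Y: (n_dmus, n_outputs).
    Solves: min theta  s.t.  X^T lam <= theta * x_k,  Y^T lam >= y_k,
    lam >= 0; decision vector is [theta, lam_1, ..., lam_n]."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                  # minimise theta
    # Input constraints:  sum_j lam_j * X[j,i] - theta * X[k,i] <= 0
    A_in = np.c_[-X[k], X.T]
    # Output constraints: -sum_j lam_j * Y[j,r] <= -Y[k,r]
    A_out = np.c_[np.zeros(s), -Y.T]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[k]],
                  bounds=[(0, None)] * (n + 1), method='highs')
    return res.fun

rng = np.random.default_rng(0)
X, Y = rng.uniform(1, 10, (20, 2)), rng.uniform(1, 10, (20, 3))  # 20 funds
print([round(ccr_efficiency(X, Y, k), 3) for k in range(5)])
```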
Abstract:
The simulated classical dynamics of a small molecule exhibiting self-organizing behavior via a fast transition between two states is analyzed by calculating the statistical complexity of the system. It is shown that the complexity of molecular descriptors such as atom coordinates and dihedral angles takes different values before and after the transition. This provides a new tool for identifying metastable states during molecular self-organization. The highly concerted collective motion of the molecule is revealed, and the dynamics of low-dimensional subspaces is found to be sensitive to processes in the whole, high-dimensional phase space of the system. © 2004 Wiley Periodicals, Inc.
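The paper's statistical-complexity measure is not spelled out in the abstract; as a hedged stand-in, here is a sketch computing windowed Shannon entropy of a discretized dihedral-angle trajectory, a simpler descriptor statistic that likewise changes value across a two-state transition:

```python
import numpy as np

def windowed_entropy(series, n_bins=24, window=500):
    """Shannon entropy of a discretised angle series in sliding windows;
    a simple stand-in for the paper's statistical complexity measure."""
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    symbols = np.digitize(series, bins)
    out = []
    for start in range(0, len(symbols) - window, window):
        _, counts = np.unique(symbols[start:start + window],
                              return_counts=True)
        p = counts / counts.sum()
        out.append(-np.sum(p * np.log2(p)))
    return np.array(out)

# Synthetic dihedral trajectory: a tight state, then a broader state.
rng = np.random.default_rng(0)
traj = np.r_[rng.normal(-1.0, 0.1, 5000), rng.normal(1.5, 0.6, 5000)]
traj = (traj + np.pi) % (2 * np.pi) - np.pi      # wrap into [-pi, pi)
print(windowed_entropy(traj))  # entropy jumps after the transition
```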
Abstract:
This paper studies the key aspects of an optical link that transmits a broadband microwave filter bank multicarrier (FBMC) signal. The study is presented in the context of creating an all-analogue, real-time, multigigabit orthogonal frequency division multiplexing electro-optical transceiver for short-range, high-capacity data center networks. Passive microwave filters are used to perform the pulse shaping of the bit streams, allowing orthogonal transmission without the need for digital signal processing (DSP). Accordingly, a cyclic prefix, which would reduce the net data rate, is not required. An experiment consisting of three orthogonally spaced 2.7 Gbaud quadrature phase shift keyed subchannels demonstrates that the spectral efficiency of traditional DSP-less subcarrier multiplexed links can potentially be doubled. A sensitivity of -29.5 dBm is achieved over a 1-km link.
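The orthogonality that lets the subchannels overlap spectrally without a cyclic prefix can be checked numerically. A sketch with rectangular symbol pulses and subcarriers spaced at exactly the baud rate; the paper's system uses microwave filter pulse shaping, so this only illustrates the orthogonality principle:

```python
import numpy as np

baud = 2.7e9                  # symbol rate of each subchannel
fs = 64 * baud                # simulation sample rate
T = 1 / baud                  # symbol period
t = np.arange(0, T, 1 / fs)

# Complex subcarriers spaced at the baud rate over one symbol period.
sc = [np.exp(2j * np.pi * k * baud * t) for k in range(3)]

# Inner products over one symbol: ~0 off-diagonal => orthogonal,
# which is what removes the need for a cyclic prefix at the receiver.
G = np.array([[np.vdot(a, b) / len(t) for b in sc] for a in sc])
print(np.round(np.abs(G), 6))   # ~identity matrix
```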
Abstract:
We propose a novel template matching approach for discriminating between handwritten and machine-printed text. We first pre-process the scanned document images by performing denoising, circle/line exclusion and word-block level segmentation. We then align and match characters in a flexibly sized gallery with the segmented regions, using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show the algorithm to be remarkably robust in classifying cluttered, occluded and noisy samples, as well as those with significant amounts of missing data. The algorithm, which achieves an 84.0% classification rate with a false positive rate of 0.16 on the dataset, requires no training samples and generates compelling results compared with training-based approaches that have used the same benchmark.
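The matching core is standard normalised cross-correlation; a sketch with scikit-image's match_template on a synthetic page, where a "gallery" glyph is planted and then located (the paper parallelises this over a flexibly sized character gallery):

```python
import numpy as np
from skimage.feature import match_template

# Synthetic document patch and a character "template" from the gallery.
rng = np.random.default_rng(0)
page = rng.random((200, 200))
glyph = page[60:80, 100:115].copy()          # plant the template in the page

# Normalised cross-correlation of the gallery template against the page.
score = match_template(page, glyph)
ij = np.unravel_index(np.argmax(score), score.shape)
print(ij, score.max())   # recovers (60, 100) with score ~1.0
```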