75 resultados para Dunkl Kernel
Resumo:
Financial processes may possess long memory and their probability densities may display heavy tails. Many models have been developed to deal with this tail behaviour, which reflects the jumps in the sample paths. On the other hand, the presence of long memory, which contradicts the efficient market hypothesis, is still an issue for further debates. These difficulties present challenges with the problems of memory detection and modelling the co-presence of long memory and heavy tails. This PhD project aims to respond to these challenges. The first part aims to detect memory in a large number of financial time series on stock prices and exchange rates using their scaling properties. Since financial time series often exhibit stochastic trends, a common form of nonstationarity, strong trends in the data can lead to false detection of memory. We will take advantage of a technique known as multifractal detrended fluctuation analysis (MF-DFA) that can systematically eliminate trends of different orders. This method is based on the identification of scaling of the q-th-order moments and is a generalisation of the standard detrended fluctuation analysis (DFA) which uses only the second moment; that is, q = 2. We also consider the rescaled range R/S analysis and the periodogram method to detect memory in financial time series and compare their results with the MF-DFA. An interesting finding is that short memory is detected for stock prices of the American Stock Exchange (AMEX) and long memory is found present in the time series of two exchange rates, namely the French franc and the Deutsche mark. Electricity price series of the five states of Australia are also found to possess long memory. For these electricity price series, heavy tails are also pronounced in their probability densities. The second part of the thesis develops models to represent short-memory and longmemory financial processes as detected in Part I. These models take the form of continuous-time AR(∞) -type equations whose kernel is the Laplace transform of a finite Borel measure. By imposing appropriate conditions on this measure, short memory or long memory in the dynamics of the solution will result. A specific form of the models, which has a good MA(∞) -type representation, is presented for the short memory case. Parameter estimation of this type of models is performed via least squares, and the models are applied to the stock prices in the AMEX, which have been established in Part I to possess short memory. By selecting the kernel in the continuous-time AR(∞) -type equations to have the form of Riemann-Liouville fractional derivative, we obtain a fractional stochastic differential equation driven by Brownian motion. This type of equations is used to represent financial processes with long memory, whose dynamics is described by the fractional derivative in the equation. These models are estimated via quasi-likelihood, namely via a continuoustime version of the Gauss-Whittle method. The models are applied to the exchange rates and the electricity prices of Part I with the aim of confirming their possible long-range dependence established by MF-DFA. The third part of the thesis provides an application of the results established in Parts I and II to characterise and classify financial markets. We will pay attention to the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the NASDAQ Stock Exchange (NASDAQ) and the Toronto Stock Exchange (TSX). The parameters from MF-DFA and those of the short-memory AR(∞) -type models will be employed in this classification. We propose the Fisher discriminant algorithm to find a classifier in the two and three-dimensional spaces of data sets and then provide cross-validation to verify discriminant accuracies. This classification is useful for understanding and predicting the behaviour of different processes within the same market. The fourth part of the thesis investigates the heavy-tailed behaviour of financial processes which may also possess long memory. We consider fractional stochastic differential equations driven by stable noise to model financial processes such as electricity prices. The long memory of electricity prices is represented by a fractional derivative, while the stable noise input models their non-Gaussianity via the tails of their probability density. A method using the empirical densities and MF-DFA will be provided to estimate all the parameters of the model and simulate sample paths of the equation. The method is then applied to analyse daily spot prices for five states of Australia. Comparison with the results obtained from the R/S analysis, periodogram method and MF-DFA are provided. The results from fractional SDEs agree with those from MF-DFA, which are based on multifractal scaling, while those from the periodograms, which are based on the second order, seem to underestimate the long memory dynamics of the process. This highlights the need and usefulness of fractal methods in modelling non-Gaussian financial processes with long memory.
Resumo:
The refractive error of a human eye varies across the pupil and therefore may be treated as a random variable. The probability distribution of this random variable provides a means for assessing the main refractive properties of the eye without the necessity of traditional functional representation of wavefront aberrations. To demonstrate this approach, the statistical properties of refractive error maps are investigated. Closed-form expressions are derived for the probability density function (PDF) and its statistical moments for the general case of rotationally-symmetric aberrations. A closed-form expression for a PDF for a general non-rotationally symmetric wavefront aberration is difficult to derive. However, for specific cases, such as astigmatism, a closed-form expression of the PDF can be obtained. Further, interpretation of the distribution of the refractive error map as well as its moments is provided for a range of wavefront aberrations measured in real eyes. These are evaluated using a kernel density and sample moments estimators. It is concluded that the refractive error domain allows non-functional analysis of wavefront aberrations based on simple statistics in the form of its sample moments. Clinicians may find this approach to wavefront analysis easier to interpret due to the clinical familiarity and intuitive appeal of refractive error maps.
Resumo:
The recently proposed data-driven background dataset refinement technique provides a means of selecting an informative background for support vector machine (SVM)-based speaker verification systems. This paper investigates the characteristics of the impostor examples in such highly-informative background datasets. Data-driven dataset refinement individually evaluates the suitability of candidate impostor examples for the SVM background prior to selecting the highest-ranking examples as a refined background dataset. Further, the characteristics of the refined dataset were analysed to investigate the desired traits of an informative SVM background. The most informative examples of the refined dataset were found to consist of large amounts of active speech and distinctive language characteristics. The data-driven refinement technique was shown to filter the set of candidate impostor examples to produce a more disperse representation of the impostor population in the SVM kernel space, thereby reducing the number of redundant and less-informative examples in the background dataset. Furthermore, data-driven refinement was shown to provide performance gains when applied to the difficult task of refining a small candidate dataset that was mis-matched to the evaluation conditions.
Resumo:
We explore the empirical usefulness of conditional coskewness to explain the cross-section of equity returns. We find that coskewness is an important determinant of the returns to equity, and that the pricing relationship varies through time. In particular we find that when the conditional market skewness is positive investors are willing to sacrifice 7.87% annually per unit of gamma (a standardized measure of coskewness risk) while they only demand a premium of 1.80% when the market is negatively skewed. A similar picture emerges from the coskewness factor of Harvey and Siddique (Harvey, C., Siddique, A., 2000a. Conditional skewness in asset pricing models tests. Journal of Finance 65, 1263–1295.) (a portfolio that is long stocks with small coskewness with the market and short high coskewness stocks) which earns 5.00% annually when the market is positively skewed but only 2.81% when the market is negatively skewed. The conditional two-moment CAPM and a conditional Fama and French (Fama, E., French, K., 1992. The cross-section of expected returns. Journal of Finance 47,427465.) three-factor model are rejected, but a model which includes coskewness is not rejected by the data. The model also passes a structural break test which many existing asset pricing models fail.
Resumo:
This thesis deals with the problem of the instantaneous frequency (IF) estimation of sinusoidal signals. This topic plays significant role in signal processing and communications. Depending on the type of the signal, two major approaches are considered. For IF estimation of single-tone or digitally-modulated sinusoidal signals (like frequency shift keying signals) the approach of digital phase-locked loops (DPLLs) is considered, and this is Part-I of this thesis. For FM signals the approach of time-frequency analysis is considered, and this is Part-II of the thesis. In part-I we have utilized sinusoidal DPLLs with non-uniform sampling scheme as this type is widely used in communication systems. The digital tanlock loop (DTL) has introduced significant advantages over other existing DPLLs. In the last 10 years many efforts have been made to improve DTL performance. However, this loop and all of its modifications utilizes Hilbert transformer (HT) to produce a signal-independent 90-degree phase-shifted version of the input signal. Hilbert transformer can be realized approximately using a finite impulse response (FIR) digital filter. This realization introduces further complexity in the loop in addition to approximations and frequency limitations on the input signal. We have tried to avoid practical difficulties associated with the conventional tanlock scheme while keeping its advantages. A time-delay is utilized in the tanlock scheme of DTL to produce a signal-dependent phase shift. This gave rise to the time-delay digital tanlock loop (TDTL). Fixed point theorems are used to analyze the behavior of the new loop. As such TDTL combines the two major approaches in DPLLs: the non-linear approach of sinusoidal DPLL based on fixed point analysis, and the linear tanlock approach based on the arctan phase detection. TDTL preserves the main advantages of the DTL despite its reduced structure. An application of TDTL in FSK demodulation is also considered. This idea of replacing HT by a time-delay may be of interest in other signal processing systems. Hence we have analyzed and compared the behaviors of the HT and the time-delay in the presence of additive Gaussian noise. Based on the above analysis, the behavior of the first and second-order TDTLs has been analyzed in additive Gaussian noise. Since DPLLs need time for locking, they are normally not efficient in tracking the continuously changing frequencies of non-stationary signals, i.e. signals with time-varying spectra. Nonstationary signals are of importance in synthetic and real life applications. An example is the frequency-modulated (FM) signals widely used in communication systems. Part-II of this thesis is dedicated for the IF estimation of non-stationary signals. For such signals the classical spectral techniques break down, due to the time-varying nature of their spectra, and more advanced techniques should be utilized. For the purpose of instantaneous frequency estimation of non-stationary signals there are two major approaches: parametric and non-parametric. We chose the non-parametric approach which is based on time-frequency analysis. This approach is computationally less expensive and more effective in dealing with multicomponent signals, which are the main aim of this part of the thesis. A time-frequency distribution (TFD) of a signal is a two-dimensional transformation of the signal to the time-frequency domain. Multicomponent signals can be identified by multiple energy peaks in the time-frequency domain. Many real life and synthetic signals are of multicomponent nature and there is little in the literature concerning IF estimation of such signals. This is why we have concentrated on multicomponent signals in Part-H. An adaptive algorithm for IF estimation using the quadratic time-frequency distributions has been analyzed. A class of time-frequency distributions that are more suitable for this purpose has been proposed. The kernels of this class are time-only or one-dimensional, rather than the time-lag (two-dimensional) kernels. Hence this class has been named as the T -class. If the parameters of these TFDs are properly chosen, they are more efficient than the existing fixed-kernel TFDs in terms of resolution (energy concentration around the IF) and artifacts reduction. The T-distributions has been used in the IF adaptive algorithm and proved to be efficient in tracking rapidly changing frequencies. They also enables direct amplitude estimation for the components of a multicomponent
Resumo:
Biodiesel is a renewable fuel that has been shown to reduce many exhaust emissions, except oxides of nitrogen (NOx), in diesel engine cars. This is of special concern in inner urban areas that are subject to strict environmental regulations, such as EURO norms. Also, the use of pure biodiesel (B100) is inhibited because of its higher NOx emissions compared to petroleum diesel fuel. The aim of this present work is to investigate the effect of the iodine value and cetane number of various biodiesel fuels obtained from different feed stocks on the combustion and NOx emission characteristics of a direct injection (DI) diesel engine. The biodiesel fuels were chosen from various feed stocks such as coconut, palm kernel, mahua (Madhuca indica), pongamia pinnata, jatropha curcas, rice bran, and sesame seed oils. The experimental results show an approximately linear relationship between iodine value and NOx emissions. The biodiesels obtained from coconut and palm kernel showed lower NOx levels than diesel, but other biodiesels showed an increase in NOx. It was observed that the nature of the fatty acids of the biodiesel fuels had a significant influence on the NOx emissions. Also, the cetane numbers of the biodiesel fuels are affected both premixed combustion and the combustion rate, which further affected the amount of NOx formation. It was concluded that NOx emissions are influenced by many parameters of biodiesel fuels, particularly the iodine value and cetane number.
Resumo:
This paper presents an extended study on the implementation of support vector machine(SVM) based speaker verification in systems that employ continuous progressive model adaptation using the weight-based factor analysis model. The weight-based factor analysis model compensates for session variations in unsupervised scenarios by incorporating trial confidence measures in the general statistics used in the inter-session variability modelling process. Employing weight-based factor analysis in Gaussian mixture models (GMM) was recently found to provide significant performance gains to unsupervised classification. Further improvements in performance were found through the integration of SVM-based classification in the system by means of GMM supervectors. This study focuses particularly on the way in which a client is represented in the SVM kernel space using single and multiple target supervectors. Experimental results indicate that training client SVMs using a single target supervector maximises performance while exhibiting a certain robustness to the inclusion of impostor training data in the model. Furthermore, the inclusion of low-scoring target trials in the adaptation process is investigated where they were found to significantly aid performance.
Resumo:
This paper derives from research-in-progress intending both Design Research (DR) and Design Science (DS) outputs; the former a management decision tool based in IS-Impact (Gable et al. 2008) kernel theory; the latter being methodological learnings deriving from synthesis of the literature and reflection on the DR ‘case study’ experience. The paper introduces a generic, detailed and pragmatic DS ‘Research Roadmap’ or methodology, deriving at this stage primarily from synthesis and harmonization of relevant concepts identified through systematic archival analysis of related literature. The scope of the Roadmap too has been influenced by the parallel study aim to undertake DR applying and further evolving the Roadmap. The Roadmap is presented in attention to the dearth of detailed guidance available to novice Researchers in Design Science Research (DSR), and though preliminary, is expected to evolve and gradually be substantiated through experience of its application. A key distinction of the Roadmap from other DSR methods is its breadth of coverage of published DSR concepts and activities; its detail and scope. It represents a useful synthesis and integration of otherwise highly disparate DSR-related concepts.
Resumo:
In semisupervised learning (SSL), a predictive model is learn from a collection of labeled data and a typically much larger collection of unlabeled data. These paper presented a framework called multi-view point cloud regularization (MVPCR), which unifies and generalizes several semisupervised kernel methods that are based on data-dependent regularization in reproducing kernel Hilbert spaces (RKHSs). Special cases of MVPCR include coregularized least squares (CoRLS), manifold regularization (MR), and graph-based SSL. An accompanying theorem shows how to reduce any MVPCR problem to standard supervised learning with a new multi-view kernel.
Resumo:
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
Resumo:
One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
Resumo:
Background The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of Prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. Results In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-Ray diffraction method using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance, the prediction accuracy increased from 62.8% with single sequence to 69.8% and Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the predicted secondary structure information by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the singe sequence information, respectively. Conclusion A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.
Resumo:
The conventional manual power line corridor inspection processes that are used by most energy utilities are labor-intensive, time consuming and expensive. Remote sensing technologies represent an attractive and cost-effective alternative approach to these monitoring activities. This paper presents a comprehensive investigation into automated remote sensing based power line corridor monitoring, focusing on recent innovations in the area of increased automation of fixed-wing platforms for aerial data collection, and automated data processing for object recognition using a feature fusion process. Airborne automation is achieved by using a novel approach that provides improved lateral control for tracking corridors and automatic real-time dynamic turning for flying between corridor segments, we call this approach PTAGS. Improved object recognition is achieved by fusing information from multi-sensor (LiDAR and imagery) data and multiple visual feature descriptors (color and texture). The results from our experiments and field survey illustrate the effectiveness of the proposed aircraft control and feature fusion approaches.
Resumo:
Two decades after its inception, Latent Semantic Analysis(LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, and the gist of which can be summarized as follows: (1) that LSA recovers latent semantic factors underlying the document space, (2) that such can be accomplished through lossy compression of the document space by eliminating lexical noise, and (3) that the latter can best be achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example may show that LSA does not recover the optimal semantic factors as intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.
Resumo:
Historically, determining the country of origin of a published work presented few challenges, because works were generally published physically – whether in print or otherwise – in a distinct location or few locations. However, publishing opportunities presented by new technologies mean that we now live in a world of simultaneous publication – works that are first published online are published simultaneously to every country in world in which there is Internet connectivity. While this is certainly advantageous for the dissemination and impact of information and creative works, it creates potential complications under the Berne Convention for the Protection of Literary and Artistic Works (“Berne Convention”), an international intellectual property agreement to which most countries in the world now subscribe. Under the Berne Convention’s national treatment provisions, rights accorded to foreign copyright works may not be subject to any formality, such as registration requirements (although member countries are free to impose formalities in relation to domestic copyright works). In Kernel Records Oy v. Timothy Mosley p/k/a Timbaland, et al. however, the Florida Southern District Court of the United States ruled that first publication of a work on the Internet via an Australian website constituted “simultaneous publication all over the world,” and therefore rendered the work a “United States work” under the definition in section 101 of the U.S. Copyright Act, subjecting the work to registration formality under section 411. This ruling is in sharp contrast with an earlier decision delivered by the Delaware District Court in Håkan Moberg v. 33T LLC, et al. which arrived at an opposite conclusion. The conflicting rulings of the U.S. courts reveal the problems posed by new forms of publishing online and demonstrate a compelling need for further harmonization between the Berne Convention, domestic laws and the practical realities of digital publishing. In this article, we argue that even if a work first published online can be considered to be simultaneously published all over the world it does not follow that any country can assert itself as the “country of origin” of the work for the purpose of imposing domestic copyright formalities. More specifically, we argue that the meaning of “United States work” under the U.S. Copyright Act should be interpreted in line with the presumption against extraterritorial application of domestic law to limit its application to only those works with a real and substantial connection to the United States. There are gaps in the Berne Convention’s articulation of “country of origin” which provide scope for judicial interpretation, at a national level, of the most pragmatic way forward in reconciling the goals of the Berne Convention with the practical requirements of domestic law. We believe that the uncertainties arising under the Berne Convention created by new forms of online publishing can be resolved at a national level by the sensible application of principles of statutory interpretation by the courts. While at the international level we may need a clearer consensus on what amounts to “simultaneous publication” in the digital age, state practice may mean that we do not yet need to explore textual changes to the Berne Convention.