51 resultados para Classification error rate
Resumo:
This paper uses appropriately modified information criteria to select models from the GARCH family, which are subsequently used for predicting US dollar exchange rate return volatility. The out of sample forecast accuracy of models chosen in this manner compares favourably on mean absolute error grounds, although less favourably on mean squared error grounds, with those generated by the commonly used GARCH(1, 1) model. An examination of the orders of models selected by the criteria reveals that (1, 1) models are typically selected less than 20% of the time.
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
We extend extreme learning machine (ELM) classifiers to complex Reproducing Kernel Hilbert Spaces (RKHS) where the input/output variables as well as the optimization variables are complex-valued. A new family of classifiers, called complex-valued ELM (CELM) suitable for complex-valued multiple-input–multiple-output processing is introduced. In the proposed method, the associated Lagrangian is computed using induced RKHS kernels, adopting a Wirtinger calculus approach formulated as a constrained optimization problem similarly to the conventional ELM classifier formulation. When training the CELM, the Karush–Khun–Tuker (KKT) theorem is used to solve the dual optimization problem that consists of satisfying simultaneously smallest training error as well as smallest norm of output weights criteria. The proposed formulation also addresses aspects of quaternary classification within a Clifford algebra context. For 2D complex-valued inputs, user-defined complex-coupled hyper-planes divide the classifier input space into four partitions. For 3D complex-valued inputs, the formulation generates three pairs of complex-coupled hyper-planes through orthogonal projections. The six hyper-planes then divide the 3D space into eight partitions. It is shown that the CELM problem formulation is equivalent to solving six real-valued ELM tasks, which are induced by projecting the chosen complex kernel across the different user-defined coordinate planes. A classification example of powdered samples on the basis of their terahertz spectral signatures is used to demonstrate the advantages of the CELM classifiers compared to their SVM counterparts. The proposed classifiers retain the advantages of their ELM counterparts, in that they can perform multiclass classification with lower computational complexity than SVM classifiers. Furthermore, because of their ability to perform classification tasks fast, the proposed formulations are of interest to real-time applications.
Resumo:
Accurate and reliable rain rate estimates are important for various hydrometeorological applications. Consequently, rain sensors of different types have been deployed in many regions. In this work, measurements from different instruments, namely, rain gauge, weather radar, and microwave link, are combined for the first time to estimate with greater accuracy the spatial distribution and intensity of rainfall. The objective is to retrieve the rain rate that is consistent with all these measurements while incorporating the uncertainty associated with the different sources of information. Assuming the problem is not strongly nonlinear, a variational approach is implemented and the Gauss–Newton method is used to minimize the cost function containing proper error estimates from all sensors. Furthermore, the method can be flexibly adapted to additional data sources. The proposed approach is tested using data from 14 rain gauges and 14 operational microwave links located in the Zürich area (Switzerland) to correct the prior rain rate provided by the operational radar rain product from the Swiss meteorological service (MeteoSwiss). A cross-validation approach demonstrates the improvement of rain rate estimates when assimilating rain gauge and microwave link information.
Resumo:
The co-polar correlation coefficient (ρhv) has many applications, including hydrometeor classification, ground clutter and melting layer identification, interpretation of ice microphysics and the retrieval of rain drop size distributions (DSDs). However, we currently lack the quantitative error estimates that are necessary if these applications are to be fully exploited. Previous error estimates of ρhv rely on knowledge of the unknown "true" ρhv and implicitly assume a Gaussian probability distribution function of ρhv samples. We show that frequency distributions of ρhv estimates are in fact highly negatively skewed. A new variable: L = -log10(1 - ρhv) is defined, which does have Gaussian error statistics, and a standard deviation depending only on the number of independent radar pulses. This is verified using observations of spherical drizzle drops, allowing, for the first time, the construction of rigorous confidence intervals in estimates of ρhv. In addition, we demonstrate how the imperfect co-location of the horizontal and vertical polarisation sample volumes may be accounted for. The possibility of using L to estimate the dispersion parameter (µ) in the gamma drop size distribution is investigated. We find that including drop oscillations is essential for this application, otherwise there could be biases in retrieved µ of up to ~8. Preliminary results in rainfall are presented. In a convective rain case study, our estimates show µ to be substantially larger than 0 (an exponential DSD). In this particular rain event, rain rate would be overestimated by up to 50% if a simple exponential DSD is assumed.
Resumo:
Atmosphere only and ocean only variational data assimilation (DA) schemes are able to use window lengths that are optimal for the error growth rate, non-linearity and observation density of the respective systems. Typical window lengths are 6-12 hours for the atmosphere and 2-10 days for the ocean. However, in the implementation of coupled DA schemes it has been necessary to match the window length of the ocean to that of the atmosphere, which may potentially sacrifice the accuracy of the ocean analysis in order to provide a more balanced coupled state. This paper investigates how extending the window length in the presence of model error affects both the analysis of the coupled state and the initialized forecast when using coupled DA with differing degrees of coupling. Results are illustrated using an idealized single column model of the coupled atmosphere-ocean system. It is found that the analysis error from an uncoupled DA scheme can be smaller than that from a coupled analysis at the initial time, due to faster error growth in the coupled system. However, this does not necessarily lead to a more accurate forecast due to imbalances in the coupled state. Instead coupled DA is more able to update the initial state to reduce the impact of the model error on the accuracy of the forecast. The effect of model error is potentially most detrimental in the weakly coupled formulation due to the inconsistency between the coupled model used in the outer loop and uncoupled models used in the inner loop.