33 results for matrix data analytics
in CentAUR: Central Archive University of Reading - UK
Abstract:
This paper discusses how global financial institutions are using big data analytics within their compliance operations. A lot of previous research has focused on the strategic implications of big data, but not much research has considered how such tools are entwined with regulatory breaches and investigations in financial services. Our work covers two in-depth qualitative case studies, each addressing a distinct type of analytics. The first case focuses on analytics which manage everyday compliance breaches and so are expected by managers. The second case focuses on analytics which facilitate investigation and litigation where serious unexpected breaches may have occurred. In doing so, the study focuses on the micro/data to understand how these tools are influencing operational risks and practices. The paper draws from two bodies of literature, the social studies of information systems and finance to guide our analysis and practitioner recommendations. The cases illustrate how technologies are implicated in multijurisdictional challenges and regulatory conflicts at each end of the operational risk spectrum. We find that compliance analytics are both shaping and reporting regulatory matters yet often firms may have difficulties in recruiting individuals with relevant but diverse skill sets. The cases also underscore the increasing need for financial organizations to adopt robust information governance policies and processes to ease future remediation efforts.
Abstract:
In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used as complements to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining relies on silicon hardware, visualization techniques also aim to exploit the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Abstract:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data poses unique challenges for data mining algorithms, such as concept drift, the need to analyse data on the fly because streams are unbounded, and the need for scalable algorithms because data throughput is potentially high. Fast real-time classification algorithms that adapt to concept drift exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary built from Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. MC-NN is also competitive with existing data stream classifiers in terms of accuracy and speed.
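As a rough illustration of the micro-cluster idea described in this abstract, the Python sketch below keeps one running summary (a count and a per-feature sum) per micro-cluster, predicts the label of the nearest centroid, and absorbs each labelled example into a micro-cluster of its own class. It is a simplified sketch, not the published MC-NN algorithm: the names `MicroCluster` and `MicroClusterClassifier`, the `max_error` parameter, and the rule of spawning a new micro-cluster after repeated mistakes are all assumptions made for illustration.

```python
import math


class MicroCluster:
    """Running summary of the examples absorbed so far (hypothetical helper)."""

    def __init__(self, x, label):
        self.n = 1
        self.linear_sum = list(x)
        self.label = label
        self.errors = 0

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: no past examples need to be stored.
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]


class MicroClusterClassifier:
    """Nearest-centroid stream classifier; an assumption-laden sketch."""

    def __init__(self, max_error=2):
        self.max_error = max_error  # assumed threshold, not from the paper
        self.clusters = []

    def _nearest(self, x, label=None):
        candidates = [c for c in self.clusters
                      if label is None or c.label == label]
        if not candidates:
            return None
        return min(candidates, key=lambda c: math.dist(c.centroid(), x))

    def predict(self, x):
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn(self, x, label):
        nearest = self._nearest(x)
        same_class = self._nearest(x, label)
        if same_class is None:
            # First example of this class: start a new micro-cluster.
            self.clusters.append(MicroCluster(x, label))
        elif nearest.label == label:
            # Correctly classified: update the winning summary.
            nearest.absorb(x)
        else:
            # Misclassified: count the error against the same-class cluster
            # and spawn a new cluster once the threshold is reached.
            same_class.errors += 1
            if same_class.errors >= self.max_error:
                self.clusters.append(MicroCluster(x, label))
                same_class.errors = 0
            else:
                same_class.absorb(x)


stream = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"),
          ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
model = MicroClusterClassifier()
for features, target in stream:
    model.learn(features, target)

prediction_near_a = model.predict((0.1, 0.0))
prediction_near_b = model.predict((4.8, 5.2))
```

Because each micro-cluster is only a count and a sum, the summaries could in principle be updated independently of one another, which is the property the abstract associates with parallelism.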
Abstract:
Automatic generation of classification rules has been an increasingly popular technique in commercial applications such as Big Data analytics, rule based expert systems and decision making systems. However, a principal problem that arises with most methods for generation of classification rules is the overfitting of training data. When Big Data is dealt with, this may result in the generation of a large number of complex rules. This may not only increase computational cost but also lower the accuracy in predicting further unseen instances. This has led to the necessity of developing pruning methods for the simplification of rules. In addition, classification rules are used further to make predictions after the completion of their generation. As far as efficiency is concerned, it is expected to find the first rule that fires as soon as possible by searching through a rule set. Thus a suitable structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for construction of rule based classification systems consisting of three operations on Big Data: rule generation, rule simplification and rule representation. The authors also review some existing methods and techniques used for each of the three operations and highlight their limitations. They introduce some novel methods and techniques developed by them recently. These methods and techniques are also discussed in comparison to existing ones with respect to efficient processing of Big Data.
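The notion of "the first rule that fires" can be sketched in Python as an ordered rule list searched from the top. This is only a minimal illustration using a plain list with linear search; it is not one of the rule representations the chapter proposes, and the `Rule` class, the `classify` function, and the example attributes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    # Maps an attribute name to a test on that attribute's value.
    conditions: Dict[str, Callable[[float], bool]]
    label: str

    def fires(self, instance: Dict[str, float]) -> bool:
        # A rule fires only if every one of its conditions holds.
        return all(attr in instance and test(instance[attr])
                   for attr, test in self.conditions.items())


def classify(rules: List[Rule], instance: Dict[str, float],
             default: Optional[str] = None) -> Optional[str]:
    # Linear search: return the label of the first rule that fires.
    for rule in rules:
        if rule.fires(instance):
            return rule.label
    return default


rules = [
    Rule({"temperature": lambda t: t > 30.0,
          "humidity": lambda h: h > 0.7}, "storm"),
    Rule({"temperature": lambda t: t > 30.0}, "hot"),
    Rule({}, "mild"),  # empty condition set acts as a default rule
]

first = classify(rules, {"temperature": 35.0, "humidity": 0.9})
second = classify(rules, {"temperature": 35.0, "humidity": 0.2})
third = classify(rules, {"temperature": 10.0, "humidity": 0.2})
```

With a list, the search cost grows with the number of rules checked before one fires, which is the efficiency concern the abstract raises when motivating a more suitable representation.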
Abstract:
The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis - the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any sub-set of the analysed data. In this paper, the corresponding concepts have been derived in the context of linear statistical data assimilation in numerical weather prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the four-dimensional variational system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the boreal spring 2003 operational system, 15% of the global influence is due to the assimilated observations in any one analysis, and the complementary 85% is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 25% of the observational information is currently provided by surface-based observing systems, and 75% by satellite systems. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background-error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). Incorrect specifications of background and observation-error covariance matrices can be identified, interpreted and better understood by the use of influence-matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system. 
Copyright © 2004 Royal Meteorological Society
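For the ordinary least-squares case mentioned at the start of this abstract, the influence matrix and its diagnostics can be sketched in a few lines of NumPy. The sketch below uses synthetic data and covers only the textbook regression setting, not the variational data assimilation system of the paper; the variable names and the data are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
n_obs, n_par = 20, 3
X = np.column_stack([np.ones(n_obs), rng.normal(size=(n_obs, n_par - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.normal(size=n_obs)

# Influence (hat) matrix S maps observations to fitted values: y_hat = S y.
S = X @ np.linalg.solve(X.T @ X, X.T)
self_sensitivity = np.diag(S)  # influence of each observation on its own fit
dfs = np.trace(S)              # degrees of freedom for signal

# Change in the fitted value at point i if observation i were left out,
# obtained from the diagonal of S without refitting.
y_hat = S @ y
i = 0
loo_shortcut = S[i, i] / (1.0 - S[i, i]) * (y[i] - y_hat[i])

# Explicit check: refit without observation i and compare.
keep = np.arange(n_obs) != i
beta_without_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
loo_explicit = y_hat[i] - X[i] @ beta_without_i
```

In this setting the trace of the influence matrix equals the number of fitted parameters, and each self-sensitivity lies between 0 and 1, so the leave-one-out change follows from quantities already available after a single fit.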
Abstract:
Two wavelet-based control variable transform schemes are described and are used to model some important features of forecast error statistics for use in variational data assimilation. The first is a conventional wavelet scheme and the other is an approximation of it. Their ability to capture the position and scale-dependent aspects of covariance structures is tested in a two-dimensional latitude-height context. This is done by comparing the covariance structures implied by the wavelet schemes with those found from the explicit forecast error covariance matrix, and with a non-wavelet-based covariance scheme used currently in an operational assimilation scheme. Qualitatively, the wavelet-based schemes show potential at modeling forecast error statistics well without giving preference to either position or scale-dependent aspects. The degree of spectral representation can be controlled by changing the number of spectral bands in the schemes, and the least number of bands that achieves adequate results is found for the model domain used. Evidence is found of a trade-off between the localization of features in positional and spectral spaces when the number of bands is changed. By examining implied covariance diagnostics, the wavelet-based schemes are found, on the whole, to give results that are closer to diagnostics found from the explicit matrix than those from the non-wavelet scheme. Even though the nature of the covariances has the right qualities in spectral space, variances are found to be too low at some wavenumbers and vertical correlation length scales are found to be too long at most scales. The wavelet schemes are found to be good at resolving variations in position and scale-dependent horizontal length scales, although the length scales reproduced are usually too short. The second of the wavelet-based schemes is often found to be better than the first in some important respects, but, unlike the first, it has no exact inverse transform.
Abstract:
The complexity inherent in climate data makes it necessary to introduce more than one statistical tool to the researcher to gain insight into the climate system. Empirical orthogonal function (EOF) analysis is one of the most widely used methods to analyze weather/climate modes of variability and to reduce the dimensionality of the system. Simple structure rotation of EOFs can enhance interpretability of the obtained patterns but cannot provide anything more than temporal uncorrelatedness. In this paper, an alternative rotation method based on independent component analysis (ICA) is considered. The ICA is viewed here as a method of EOF rotation. Starting from an initial EOF solution rather than rotating the loadings toward simplicity, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain. If the underlying climate signals have an independent forcing, one can expect to find loadings with interpretable patterns whose time coefficients have properties that go beyond simple noncorrelation observed in EOFs. The methodology is presented and an application to the monthly mean sea level pressure (SLP) field is discussed. Among the rotated (to independence) EOFs, the North Atlantic Oscillation (NAO) pattern, an Arctic Oscillation–like pattern, and a Scandinavian-like pattern have been identified. There is the suggestion that the NAO is an intrinsic mode of variability independent of the Pacific.
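The initial EOF solution that the abstract takes as its starting point can be sketched with a singular value decomposition of the anomaly matrix. The NumPy example below uses a synthetic time-by-space field rather than SLP data, and it stops at the unrotated EOFs: the ICA rotation itself is not implemented here. All variable names and the synthetic field are assumptions for illustration.

```python
import numpy as np

# Synthetic field: one oscillating spatial pattern plus noise (assumed data).
rng = np.random.default_rng(1)
n_time, n_space = 120, 8
pattern = np.linspace(-1.0, 1.0, n_space)
signal = np.sin(np.linspace(0.0, 12.0 * np.pi, n_time))
field = 3.0 * np.outer(signal, pattern) \
    + 0.1 * rng.normal(size=(n_time, n_space))

# Remove the time mean at each grid point to obtain anomalies.
anomalies = field - field.mean(axis=0)

# SVD of the anomaly matrix: rows of Vt are the EOFs (spatial patterns),
# and U * s gives the principal-component time series.
U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
eofs = Vt
pcs = U * s
explained = s**2 / np.sum(s**2)  # fraction of variance per mode

# The time series are mutually uncorrelated, which is the property that
# an ICA-based rotation would try to strengthen to independence.
pc_cov = pcs.T @ pcs / (n_time - 1)
```

The covariance matrix of the time series is diagonal by construction, which illustrates the "simple noncorrelation" that the abstract contrasts with the stronger requirement of independence.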
Abstract:
Cross-hole anisotropic electrical and seismic tomograms of fractured metamorphic rock have been obtained at a test site where extensive hydrological data were available. A strong correlation between electrical resistivity anisotropy and seismic compressional-wave velocity anisotropy has been observed. Analysis of core samples from the site reveals that the shale-rich rocks have fabric-related average velocity anisotropy of between 10% and 30%. The cross-hole seismic data are consistent with these values, indicating that observed anisotropy might be principally due to the inherent rock fabric rather than to the aligned sets of open fractures. One region with velocity anisotropy greater than 30% has been modelled as aligned open fractures within an anisotropic rock matrix and this model is consistent with available fracture density and hydraulic transmissivity data from the boreholes and the cross-hole resistivity tomography data. However, in general the study highlights the uncertainties that can arise, due to the relative influence of rock fabric and fluid-filled fractures, when using geophysical techniques for hydrological investigations.
Abstract:
With its highly fluctuating ion production, matrix-assisted laser desorption/ionization (MALDI) poses many practical challenges for its application in mass spectrometry. Instrument tuning and quantitative ion abundance measurements using ion signal alone depend on a stable ion beam. Liquid MALDI matrices have been shown to be a promising alternative to the commonly used solid matrices. Their application in areas where a stable ion current is essential has been discussed, but only limited data have been provided to demonstrate their practical use and advantages in the formation of stable MALDI ion beams. In this article we present experimental data showing high MALDI ion beam stability over more than two orders of magnitude at high analytical sensitivity (low femtomole amount prepared) for quantitative peptide abundance measurements and instrument tuning in a MALDI Q-TOF mass spectrometer. Samples were deposited on an inexpensive conductive hydrophobic surface and shrunk to droplets <10 nL in size. By using a sample droplet <10 nL it was possible to acquire data from a single irradiated spot for roughly 10,000 shots with little variation in ion signal intensity at a laser repetition rate of 5-20 Hz.
Abstract:
We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet matrix-assisted laser desorption/ionisation mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. The low-femtomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition, the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydroxybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and low-mass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.
Abstract:
We have combined several key sample preparation steps for the use of a liquid matrix system to provide high analytical sensitivity in automated ultraviolet matrix-assisted laser desorption/ionisation mass spectrometry (UV-MALDI-MS). This new sample preparation protocol employs a matrix-mixture which is based on the glycerol matrix-mixture described by Sze et al. (J. Am. Soc. Mass Spectrom. 1998, 9, 166-174). The low-femtomole sensitivity that is achievable with this new preparation protocol enables proteomic analysis of protein digests comparable to solid-state matrix systems. For automated data acquisition and analysis, the MALDI performance of this liquid matrix surpasses the conventional solid-state MALDI matrices. Besides the inherent general advantages of liquid samples for automated sample preparation and data acquisition, the use of the presented liquid matrix significantly reduces the extent of unspecific ion signals in peptide mass fingerprints compared to typically used solid matrices, such as 2,5-dihydroxybenzoic acid (DHB) or alpha-cyano-hydroxycinnamic acid (CHCA). In particular, matrix and low-mass ion signals and ion signals resulting from cation adduct formation are dramatically reduced. Consequently, the confidence level of protein identification by peptide mass mapping of in-solution and in-gel digests is generally higher.
Abstract:
The molecular structures of NbOBr3, NbSCl3, and NbSBr3 have been determined by gas-phase electron diffraction (GED) at nozzle-tip temperatures of 250 °C, taking into account the possible presence of NbOCl3 as a contaminant in the NbSCl3 sample and NbOBr3 in the NbSBr3 sample. The experimental data are consistent with trigonal-pyramidal molecules having C3v symmetry. Infrared spectra of molecules trapped in argon or nitrogen matrices were recorded and exhibit the characteristic fundamental stretching modes for C3v species. Well-resolved isotopic fine structure (35Cl and 37Cl) was observed for NbSCl3, and for NbOCl3, which occurred as an impurity in the NbSCl3 spectra. Quantum mechanical calculations of the structures and vibrational frequencies of the four YNbX3 molecules (Y = O, S; X = Cl, Br) were carried out at several levels of theory, most importantly B3LYP DFT with either the Stuttgart RSC ECP or Hay-Wadt (n + 1) ECP VDZ basis set for Nb and the 6-311G* basis set for the nonmetal atoms. Theoretical values for the bond lengths are 0.01-0.04 Å longer than the experimental ones of type r(a), in accord with general experience, but the bond angles, with theoretical minus experimental differences of only 1.0-1.5°, are notably accurate. Symmetrized force fields were also calculated. The experimental bond lengths (r(g)/Å) and angles (∠α/deg) with estimated 2σ uncertainties from GED are as follows. NbOBr3: r(Nb=O) = 1.694(7), r(Nb-Br) = 2.429(2), ∠(O=Nb-Br) = 107.3(5), ∠(Br-Nb-Br) = 111.5(5). NbSBr3: r(Nb=S) = 2.134(10), r(Nb-Br) = 2.408(4), ∠(S=Nb-Br) = 106.6(7), ∠(Br-Nb-Br) = 112.2(6). NbSCl3: r(Nb=S) = 2.120(10), r(Nb-Cl) = 2.271(6), ∠(S=Nb-Cl) = 107.8(12), ∠(Cl-Nb-Cl) = 111.1(11).
Abstract:
Event-related functional magnetic resonance imaging (efMRI) has emerged as a powerful technique for detecting brains' responses to presented stimuli. A primary goal in efMRI data analysis is to estimate the Hemodynamic Response Function (HRF) and to locate activated regions in human brains when specific tasks are performed. This paper develops new methodologies that are important improvements not only to parametric but also to nonparametric estimation and hypothesis testing of the HRF. First, an effective and computationally fast scheme for estimating the error covariance matrix for efMRI is proposed. Second, methodologies for estimation and hypothesis testing of the HRF are developed. Simulations support the effectiveness of our proposed methods. When applied to an efMRI dataset from an emotional control study, our method reveals more meaningful findings than the popular methods offered by AFNI and FSL. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least square algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications.