74 results for Data Driven Clustering


Relevance:

80.00%

Publisher:

Abstract:

This contribution introduces a new digital predistorter to compensate for the severe distortions caused by high power amplifiers (HPAs) with memory which exhibit output saturation characteristics. The proposed design is based on direct learning using a data-driven B-spline Wiener system modeling approach. The nonlinear HPA with memory is first identified based on the B-spline neural network model using the Gauss-Newton algorithm, which incorporates the efficient De Boor algorithm with both B-spline curve and first-derivative recursions. The estimated Wiener HPA model is then used to design the Hammerstein predistorter. In particular, the inverse of the amplitude distortion of the HPA's static nonlinearity can be calculated effectively using the Newton-Raphson formula based on the inverse De Boor algorithm. A major advantage of this approach is that both the Wiener HPA identification and the Hammerstein predistorter inversion can be achieved very efficiently and accurately. Simulation results are presented to demonstrate the effectiveness of this novel digital predistorter design.
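The inversion step described here can be illustrated with a generic Newton-Raphson root finder. The following is a minimal sketch, not the paper's B-spline/De Boor implementation: the saturating amplitude curve `g` and its derivative `dg` are stand-in assumptions for the identified static nonlinearity and its recursion-based derivative.

```python
import numpy as np

def invert_static_nonlinearity(g, dg, y, x0=0.0, tol=1e-10, max_iter=50):
    """Solve g(x) = y for x with Newton-Raphson, given g and its derivative dg."""
    x = x0
    for _ in range(max_iter):
        step = (g(x) - y) / dg(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: a saturating amplitude nonlinearity (an assumed stand-in for the
# B-spline curve and its De Boor-based derivative).
g = lambda x: x / (1.0 + x**2) ** 0.5          # monotone AM/AM curve
dg = lambda x: 1.0 / (1.0 + x**2) ** 1.5       # its analytic derivative

y_target = 0.6
x_pre = invert_static_nonlinearity(g, dg, y_target, x0=y_target)
print(x_pre, g(x_pre))   # x such that g(x) ~= 0.6
```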

Relevance:

80.00%

Publisher:

Abstract:

In this paper, various types of fault detection methods for fuel cells are compared, for example those that use a model-based approach, a data-driven approach, or a combination of the two. The potential advantages and drawbacks of each method are discussed and comparisons between methods are made. In particular, classification algorithms are investigated, which separate a data set into classes or clusters based on some prior knowledge or measure of similarity. The application of classification methods to vectors of currents reconstructed by magnetic tomography, or directly to vectors of magnetic field measurements, is explored. Bases are simulated using the finite integration technique (FIT) and regularization techniques are employed to overcome ill-posedness. Fisher's linear discriminant is used to illustrate these concepts. Numerical experiments show that the ill-posedness of the magnetic tomography problem is also part of the classification problem on magnetic field measurements. This is independent of the particular working mode of the cell but influenced by the type of faulty behavior that is studied. The numerical results demonstrate the ill-posedness through the exponential decay of the singular values for three examples of fault classes.
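Since Fisher's linear discriminant is named as the illustrating classifier, a minimal sketch of it on synthetic two-class measurement vectors may help. The data, dimensionality, and the small ridge term used to tame ill-conditioning are assumptions, not the FIT-simulated magnetic field data of the paper.

```python
import numpy as np

def fisher_discriminant(X0, X1, reg=1e-6):
    """Fisher's linear discriminant direction for two classes (rows = samples)."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, with a small ridge term as a simple guard against
    # ill-conditioning (in the spirit of the regularization discussed above).
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) \
       + np.cov(X1, rowvar=False) * (len(X1) - 1) \
       + reg * np.eye(X0.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)          # discriminant direction
    threshold = w @ (m0 + m1) / 2.0           # midpoint decision threshold
    return w, threshold

# Synthetic "healthy" vs "faulty" measurement vectors (assumed stand-ins).
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(200, 16))
faulty = rng.normal(0.5, 1.0, size=(200, 16))
w, t = fisher_discriminant(healthy, faulty)
print(f"fraction of faulty vectors classified as faulty: {(faulty @ w > t).mean():.2f}")
```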

Relevance:

80.00%

Publisher:

Abstract:

Effective public policy to mitigate climate change footprints should build on data-driven analysis of firm-level strategies. This article’s conceptual approach augments the resource-based view (RBV) of the firm and identifies investments in four firm-level resource domains (Governance, Information management, Systems, and Technology [GISTe]) to develop capabilities in climate change impact mitigation. The authors denote the resulting framework as the GISTe model, which frames their analysis and public policy recommendations. This research uses the 2008 Carbon Disclosure Project (CDP) database, with high-quality information on firm-level climate change strategies for 552 companies from North America and Europe. In contrast to the widely accepted myth that European firms are performing better than North American ones, the authors find a different result. Many firms, whether European or North American, do not just “talk” about climate change impact mitigation, but actually do “walk the talk.” European firms appear to be better than their North American counterparts in “walk I,” denoting attention to governance, information management, and systems. But when it comes down to “walk II,” meaning actual Technology-related investments, North American firms’ performance is equal or superior to that of the European companies. The authors formulate public policy recommendations to accelerate firm-level, sector-level, and cluster-level implementation of climate change strategies.

Relevance:

80.00%

Publisher:

Abstract:

Empirical mode decomposition (EMD) is a data-driven method used to decompose data into oscillatory components. This paper examines to what extent the EMD algorithm might be susceptible to the numerical data format. Two key issues with EMD are its stability and computational speed. This paper shows that, for a given signal, there is no significant difference between results obtained with single (binary32) and double (binary64) floating-point precision. This implies that there is no benefit in increasing floating-point precision when performing EMD on devices optimised for the single floating-point format, such as graphics processing units (GPUs).
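As a rough illustration of this kind of precision comparison, the sketch below runs a single EMD sifting step (cubic-spline envelopes of the extrema) on the same signal stored in float32 and float64 and reports the difference. It is a toy stand-in under assumed signal and envelope choices, not the paper's full EMD implementation.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift_once(x, t):
    """One EMD sifting step: subtract the mean of the upper/lower extrema envelopes."""
    imax = argrelextrema(x, np.greater)[0]
    imin = argrelextrema(x, np.less)[0]
    upper = CubicSpline(t[imax], x[imax])(t)
    lower = CubicSpline(t[imin], x[imin])(t)
    # Keep the result in the input dtype so the two runs differ only in format.
    return (x - (upper + lower) / 2.0).astype(x.dtype)

t = np.linspace(0, 1, 4096)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

h64 = sift_once(signal.astype(np.float64), t.astype(np.float64))
h32 = sift_once(signal.astype(np.float32), t.astype(np.float32))
print("max |float64 - float32| after one sift:",
      np.max(np.abs(h64 - h32.astype(np.float64))))
```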

Relevance:

80.00%

Publisher:

Abstract:

Empirical Mode Decomposition (EMD) is a data-driven technique for the extraction of oscillatory components from data. Although it was introduced over 15 years ago, its mathematical foundations are still missing, which also implies a lack of objective metrics for evaluating the decomposed set. The most common technique for assessing the results of EMD is visual inspection, which is very subjective. This article provides objective measures for assessing EMD results based on the original definition of oscillatory components.
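One widely used objective check of this general kind is an index of orthogonality between the extracted components. The sketch below computes it for a toy decomposition; it is offered as an example of such a measure, not as the specific metrics proposed in the article.

```python
import numpy as np

def orthogonality_index(imfs):
    """Index of orthogonality for a set of IMFs (rows); smaller is better.

    For an exact decomposition the cross-terms between distinct IMFs should be
    small relative to the total energy of the reconstructed signal.
    """
    imfs = np.asarray(imfs, dtype=float)
    total = np.sum(np.sum(imfs, axis=0) ** 2)
    cross = 0.0
    for i in range(len(imfs)):
        for j in range(len(imfs)):
            if i != j:
                cross += np.sum(imfs[i] * imfs[j])
    return cross / total

# Two nearly orthogonal oscillatory components as a toy "decomposition".
t = np.linspace(0, 1, 2048)
imfs = np.vstack([np.sin(2 * np.pi * 30 * t), np.sin(2 * np.pi * 3 * t)])
print(f"orthogonality index: {orthogonality_index(imfs):.2e}")
```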

Relevance:

80.00%

Publisher:

Abstract:

We present a data-driven mathematical model of a key initiating step in platelet activation, a central process in the prevention of bleeding following injury. In vascular disease, this process is activated inappropriately and causes thrombosis, heart attacks and stroke. The collagen receptor GPVI is the primary trigger for platelet activation at sites of injury. Understanding the complex molecular mechanisms initiated by this receptor is important for the development of more effective antithrombotic medicines. In this work we developed a series of nonlinear ordinary differential equation models that are direct representations of biological hypotheses surrounding the initial steps in GPVI-stimulated signal transduction. At each stage, model simulations were compared to our own quantitative, high-temporal-resolution experimental data, which guided further experimental design, data collection and model refinement. Much is known about the linear forward reactions within platelet signalling pathways, but the roles of putative reverse reactions are poorly understood. An initial model, which includes a simple constitutively active phosphatase, was unable to explain the experimental data. Model revisions, incorporating a complex pathway of interactions (and specifically the phosphatase TULA-2), provided a good description of the experimental data, both for observations of phosphorylation in samples from one donor and in those of a wider population. Our model was used to investigate the levels of proteins involved in regulating the pathway and the effect of the low GPVI levels that have been associated with disease. Results indicate a clear separation between healthy and GPVI-deficient states with respect to the signalling cascade dynamics associated with Syk tyrosine phosphorylation and activation. Our approach reveals the central importance of this negative feedback pathway, in which the temporal regulation of a specific class of protein tyrosine phosphatases controls the rate, and therefore the extent, of GPVI-stimulated platelet activation.
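To make the modelling approach concrete, here is a minimal sketch of a two-state activation/feedback ODE model integrated with SciPy. The states, rate constants and feedback term are hypothetical illustrations in the spirit of the kinase-phosphatase interactions described (e.g. Syk and a TULA-2-like activity), not the authors' fitted model.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative kinase-phosphatase toy model (hypothetical parameters):
# S = phosphorylated substrate (e.g. Syk), P = feedback-induced phosphatase.
def rhs(t, y, k_act=1.0, k_deph=2.0, k_fb=0.8, k_decay=0.3):
    S, P = y
    dS = k_act * (1.0 - S) - k_deph * P * S   # activation minus phosphatase action
    dP = k_fb * S - k_decay * P               # feedback production of phosphatase
    return [dS, dP]

sol = solve_ivp(rhs, (0.0, 20.0), y0=[0.0, 0.0], dense_output=True)
t = np.linspace(0, 20, 200)
S, P = sol.sol(t)
print(f"peak phosphorylation {S.max():.2f} at t = {t[S.argmax()]:.1f}")
```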

Relevance:

80.00%

Publisher:

Abstract:

Nonlinear data assimilation is high on the agenda in all fields of the geosciences: with ever-increasing model resolution, the inclusion of more physical (biological, etc.) processes, and more complex observation operators, the data-assimilation problem becomes more and more nonlinear. The suitability of particle filters to solve the nonlinear data assimilation problem in high-dimensional geophysical problems will be discussed. Several existing and new schemes will be presented, and it is shown that at least one of them, the Equivalent-Weights Particle Filter, does indeed beat the curse of dimensionality and provides a way forward to solve the problem of nonlinear data assimilation in high-dimensional systems.
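For orientation, the sketch below implements one cycle of the standard bootstrap particle filter (forecast, weighting, resampling) on a toy one-dimensional model. It is this baseline scheme whose weights collapse in high dimensions; it is shown only as a reference point, not as the Equivalent-Weights scheme itself, and the model and observation operator are assumptions.

```python
import numpy as np

def bootstrap_pf_step(particles, observation, model_step, obs_operator, obs_std, rng):
    """One cycle of a basic bootstrap particle filter: forecast, weight, resample."""
    # Forecast step: propagate each particle through the (stochastic) model.
    particles = np.array([model_step(p, rng) for p in particles])
    # Weighting step: Gaussian observation likelihood.
    innov = observation - np.array([obs_operator(p) for p in particles])
    logw = -0.5 * np.sum(innov**2, axis=-1) / obs_std**2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resampling step.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Toy 1-D random-walk example (assumed model and observation operator).
rng = np.random.default_rng(1)
model_step = lambda x, rng: x + rng.normal(0.0, 0.1, size=x.shape)
obs_operator = lambda x: x
particles = rng.normal(0.0, 1.0, size=(500, 1))
particles = bootstrap_pf_step(particles, np.array([0.7]), model_step, obs_operator, 0.2, rng)
print("posterior mean estimate:", particles.mean())
```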

Relevance:

40.00%

Publisher:

Abstract:

This dissertation deals with aspects of sequential data assimilation (in particular ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and does not incur further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reverted by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble in different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests both at the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time-stepping scheme; hence, no retuning of the parameterizations is required. It is found that the accuracy of the medium-term forecasts is increased by using the RAW filter.
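The RAW filter mentioned in the last part is simple enough to sketch: the snippet below applies leapfrog time stepping with the RAW correction split between the middle and newest time levels (alpha = 1 recovers the classic Robert-Asselin filter) to a toy linear oscillator. The filter parameters and test problem are illustrative assumptions, not the SPEEDY configuration.

```python
import numpy as np

def leapfrog_raw(f, x0, dt, nsteps, nu=0.2, alpha=0.53):
    """Leapfrog time stepping with the Robert-Asselin-Williams (RAW) filter.

    d is the usual Robert-Asselin displacement; the RAW filter applies alpha*d
    to the middle time level and (alpha - 1)*d to the newest level, so that
    alpha = 1 recovers the classic Robert-Asselin filter. nu and alpha are
    illustrative values.
    """
    x_prev = x0
    x_curr = x0 + dt * f(x0)          # simple Euler start-up step
    history = [x_prev, x_curr]
    for _ in range(nsteps):
        x_next = x_prev + 2.0 * dt * f(x_curr)
        d = 0.5 * nu * (x_prev - 2.0 * x_curr + x_next)
        x_curr_f = x_curr + alpha * d            # filtered middle level
        x_next_f = x_next + (alpha - 1.0) * d    # correction to the newest level
        x_prev, x_curr = x_curr_f, x_next_f
        history.append(x_curr)
    return np.array(history)

# Toy linear oscillator dx/dt = i*omega*x, integrated with leapfrog + RAW.
omega = 1.0
traj = leapfrog_raw(lambda x: 1j * omega * x, x0=1.0 + 0j, dt=0.1, nsteps=500)
# With alpha near 0.5 the amplitude is preserved far better than with alpha = 1.
print("final amplitude:", abs(traj[-1]))
```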

Relevance:

40.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation, which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm in which the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
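As a reference point for the discussion above, here is a minimal sketch of one iteration of the straightforward data-parallel k-means, with the per-iteration summation across nodes marked where an MPI all-reduce would sit. The partitioning and data are simulated in a single process and are assumptions for illustration; this is the baseline formulation, not the paper's non-uniform one.

```python
import numpy as np

def parallel_kmeans_step(partitions, centroids):
    """One iteration of the straightforward data-parallel k-means.

    Each "node" holds a partition of the data and accumulates local sums; the
    final summation across nodes is the per-iteration global reduction whose
    cost the non-uniform formulation discussed above removes.
    """
    k, d = centroids.shape
    local_sums, local_counts = [], []
    for X in partitions:                      # each loop body runs on one node
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        sums = np.zeros((k, d))
        counts = np.zeros(k)
        for j in range(k):
            sums[j] = X[labels == j].sum(axis=0)
            counts[j] = (labels == j).sum()
        local_sums.append(sums)
        local_counts.append(counts)
    # --- global reduction step (an MPI all-reduce in a distributed setting) ---
    total_sums = np.sum(local_sums, axis=0)
    total_counts = np.sum(local_counts, axis=0)
    return total_sums / np.maximum(total_counts, 1)[:, None]

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2)) + rng.choice([-3, 0, 3], size=(1000, 1))
partitions = np.array_split(data, 4)          # 4 simulated processing nodes
centroids = data[rng.choice(len(data), 3, replace=False)]
for _ in range(10):
    centroids = parallel_kmeans_step(partitions, centroids)
print(np.round(centroids, 2))
```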

Relevance:

40.00%

Publisher:

Abstract:

Clustering methods are increasingly being applied to residential smart meter data, providing a number of important opportunities for distribution network operators (DNOs) to manage and plan the low voltage networks. Clustering has a number of potential advantages for DNOs, including identifying suitable candidates for demand response and improving energy profile modelling. However, due to the high stochasticity and irregularity of household-level demand, detailed analytics are required to define appropriate attributes to cluster on. In this paper we present an in-depth analysis of customer smart meter data to better understand peak demand and the major sources of variability in customer behaviour. We find four key time periods in which the data should be analysed and use these to form relevant attributes for our clustering. We present a finite mixture model based clustering in which we discover 10 distinct behaviour groups describing customers based on their demand and their variability. Finally, using an existing bootstrapping technique, we show that the clustering is reliable. To the authors' knowledge, this is the first time in the power systems literature that the sample robustness of the clustering has been tested.
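A minimal sketch of the finite-mixture step might look like the following, using scikit-learn's GaussianMixture on hypothetical per-customer demand attributes. The attribute definitions and the choice of 10 components are illustrative assumptions, not the paper's derived attributes or model-selection procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical per-customer attributes: mean and variability of demand in a few
# key time periods (stand-ins for the attributes derived in the paper).
rng = np.random.default_rng(42)
n_customers = 500
attributes = np.column_stack([
    rng.gamma(2.0, 0.3, n_customers),   # mean evening demand (kWh)
    rng.gamma(1.5, 0.2, n_customers),   # mean overnight demand (kWh)
    rng.gamma(2.5, 0.4, n_customers),   # variability of evening demand
    rng.gamma(1.0, 0.3, n_customers),   # variability of overnight demand
])

# Finite mixture model clustering; 10 components mirrors the number of
# behaviour groups reported, but here it is purely illustrative.
gmm = GaussianMixture(n_components=10, covariance_type="full", random_state=0)
labels = gmm.fit_predict(attributes)
print("customers per behaviour group:", np.bincount(labels, minlength=10))
```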

Relevance:

40.00%

Publisher:

Abstract:

ISO 19156 Observations and Measurements (O&M) provides a standardised framework for organising information about the collection of observations of the environment. Here we describe the implementation of a specialisation of O&M for environmental data, the Metadata Objects for Linking Environmental Sciences (MOLES3). MOLES3 provides support for organising information about data, and for user navigation around data holdings. The implementation described here, “CEDA-MOLES”, also supports data management functions for the Centre for Environmental Data Archival (CEDA). The previous iteration of MOLES (MOLES2) saw active use over five years, being replaced by CEDA-MOLES in late 2014. During that period important lessons were learnt, both about the information needed and about how to design and maintain the necessary information systems. In this paper we review the problems encountered in MOLES2; how and why CEDA-MOLES was developed and engineered; the migration of information holdings from MOLES2 to CEDA-MOLES; and, finally, provide an early assessment of MOLES3 (as implemented in CEDA-MOLES) and its limitations. Key drivers for the MOLES3 development included the necessity for improved data provenance, for further structured information to support ISO 19115 discovery metadata export (for EU INSPIRE compliance), and to provide appropriate fixed landing pages for Digital Object Identifiers (DOIs) in the presence of evolving datasets. Key lessons learned included the importance of minimising information structure in free text fields, and the necessity to support as much agility in the information infrastructure as possible without compromising maintainability, both for those using the systems internally and externally (e.g. citing into the information infrastructure) and for those responsible for the systems themselves. The migration itself needed to ensure continuity of service and traceability of archived assets.

Relevance:

30.00%

Publisher:

Abstract:

Several global quantities are computed from the ERA40 reanalysis for the period 1958-2001 and explored for trends. These are discussed in the context of changes to the global observing system. Temperature, integrated water vapor (IWV), and kinetic energy are considered. The ERA40 global mean temperature in the lower troposphere has a trend of +0.11 K per decade over the period 1979-2001, which is slightly higher than the MSU measurements but within the estimated error limit. For the period 1958-2001 the warming trend is 0.14 K per decade, but this is likely to be an artifact of changes in the observing system. When this is corrected for, the warming trend is reduced to 0.10 K per decade. The global trend in IWV for the period 1979-2001 is +0.36 mm per decade. This is about twice as high as the trend determined from the Clausius-Clapeyron relation assuming conservation of relative humidity. It is also larger than results from free climate model integrations driven by the same observed sea surface temperatures as used in ERA40. It is suggested that the large trend in IWV does not represent a genuine climate trend but is an artifact caused by changes in the global observing system, such as the use of SSM/I and more satellite soundings in later years. Recent results are in good agreement with GPS measurements. The IWV trend for the period 1958-2001 is still higher, but is reduced to +0.16 mm per decade when corrected for changes in the observing systems. Total kinetic energy shows an increasing global trend. Results from data assimilation experiments strongly suggest that this trend is also incorrect and mainly caused by the huge changes in the global observing system in 1979. When this is corrected for, no significant change in global kinetic energy from 1958 onward can be found.
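The Clausius-Clapeyron comparison can be made concrete with a back-of-the-envelope check: under conserved relative humidity the fractional IWV change per kelvin is roughly L_v/(R_v T^2), about 6-7% per K. The sketch below evaluates this against the quoted 0.11 K per decade warming, using an assumed global-mean IWV and a representative temperature (neither value is from the paper).

```python
# Rough Clausius-Clapeyron consistency check for the IWV trend quoted above.
L_v = 2.5e6        # latent heat of vaporisation, J kg^-1
R_v = 461.5        # gas constant for water vapour, J kg^-1 K^-1
T = 288.0          # assumed representative temperature, K
iwv_mean = 25.0    # assumed global-mean IWV, mm
dT_per_decade = 0.11

# Fractional change of saturation vapour pressure per kelvin (C-C relation),
# which equals the fractional IWV change if relative humidity is conserved.
cc_rate = L_v / (R_v * T**2)                         # ~ 0.065 per K
expected_trend = cc_rate * dT_per_decade * iwv_mean  # ~ 0.18 mm per decade
print(f"C-C expected IWV trend: {expected_trend:.2f} mm per decade")
# The quoted ERA40 value of +0.36 mm per decade is roughly twice this estimate.
```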

Relevance:

30.00%

Publisher:

Abstract:

The Global Ocean Data Assimilation Experiment (GODAE [http://www.godae.org]) has spanned a decade of rapid technological development. The ever-increasing volume and diversity of oceanographic data produced by in situ instruments, remote-sensing platforms, and computer simulations have driven the development of a number of innovative technologies that are essential for connecting scientists with the data that they need. This paper gives an overview of the technologies that have been developed and applied in the course of GODAE, which now provide users of oceanographic data with the capability to discover, evaluate, visualize, download, and analyze data from all over the world. The key to this capability is the ability to reduce the inherent complexity of oceanographic data by providing a consistent, harmonized view of the various data products. The challenges of data serving have been addressed over the last 10 years through the cooperative skills and energies of many individuals.

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, the amount of data in the biological field has grown larger and larger; bio-techniques for the analysis of biological data have been developed and new tools have been introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes including clustering and visualization, such as the Self-Organizing Map (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of quality of result and learning speed is strongly dependent on the initialization of the neuron weights. In this paper we present a new initialization technique based on a totally connected undirected graph that represents relations among interesting features of the input data. Results of experimental tests, in which the proposed algorithm is compared to the original initialization techniques, show that our technique assures faster learning and better performance in terms of quantization error.
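For context, the sketch below is a minimal SOM training loop with conventional random weight initialization, followed by the quantization error used as the comparison metric. The grid size, learning-rate schedule and data are illustrative assumptions; the graph-based initialization proposed in the paper is not reproduced.

```python
import numpy as np

def train_som(data, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Self-Organizing Map training with random weight initialization."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.uniform(data.min(0), data.max(0), size=(rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    n_iter = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_iter
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighbourhood width
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best matching unit
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma**2))                  # neighbourhood function
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its best matching unit."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(1)
data = rng.normal(size=(400, 8))
w = train_som(data)
print(f"quantization error: {quantization_error(data, w):.3f}")
```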

Relevance:

30.00%

Publisher:

Abstract:

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
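The evolutionary-distance input that such a topographic map is fitted to can be illustrated with a simple pairwise p-distance matrix over aligned sequences. The toy fragments below are hypothetical stand-ins for 16S rRNA alignments, and this simple identity-based distance is not one of the two alignment algorithms used in the work.

```python
import numpy as np

def p_distance_matrix(aligned_seqs):
    """Pairwise p-distances (fraction of mismatched positions) between
    equal-length aligned sequences, ignoring gap-gap columns."""
    n = len(aligned_seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a, b = aligned_seqs[i], aligned_seqs[j]
            valid = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
            mismatches = sum(x != y for x, y in valid)
            D[i, j] = D[j, i] = mismatches / len(valid)
    return D

# Toy aligned fragments (hypothetical, standing in for 16S rRNA alignments).
seqs = ["ACGT-ACGTT", "ACGTTACGTA", "ACGA-ACGTT", "TCGT-ACCTT"]
print(np.round(p_distance_matrix(seqs), 2))
```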