15 results for Kolmogorov-Smirnov

in Deakin Research Online - Australia


Relevance:

60.00%

Publisher:

Abstract:

This article describes the utilisation of an unsupervised machine learning technique and statistical approaches (e.g., the Kolmogorov-Smirnov test) that assist cycling experts in the crucial decision-making processes for athlete selection, training, and strategic planning in the track cycling Omnium. The Omnium is a multi-event competition that will be included in the Summer Olympic Games for the first time in 2012. Presently, selectors and cycling coaches make decisions based on experience and intuition; they rarely have access to objective data. We analysed both the old five-event (first raced internationally in 2007) and new six-event (first raced internationally in 2011) Omniums and found that the addition of the elimination race component to the Omnium has, contrary to expectations, not favoured track endurance riders. We analysed the Omnium data and also determined the inter-relationships between different individual events as well as between those events and the final standings of riders. In further analysis, we found that there is no maximum ranking (poorest performance) in each individual event that riders can afford whilst still winning a medal. We also identified the times within which riders need to finish the timed components in order to win a medal. The results of this study take into account the scoring system of the Omnium and inform decision-making toward successful participation in future major Omnium competitions.
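
For readers unfamiliar with the statistical machinery mentioned above, the following is a minimal sketch of how a two-sample Kolmogorov-Smirnov test can compare two groups of event rankings; the numbers and group sizes are invented for illustration and are not the article's dataset.

```python
# Hypothetical illustration: compare the finishing-position distributions of
# medallists and non-medallists in one Omnium event with a two-sample KS test.
# The data below are made up; they are not the data analysed in the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
medallist_ranks = rng.integers(1, 10, size=30)       # hypothetical event rankings
non_medallist_ranks = rng.integers(3, 19, size=60)   # hypothetical event rankings

result = stats.ks_2samp(medallist_ranks, non_medallist_ranks)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
# A small p-value would suggest the two groups' rank distributions differ,
# i.e. performance in this event is associated with winning a medal.
```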

Relevance:

60.00%

Publisher:

Abstract:

In this research, we study the effect of feature selection on spike detection and sorting accuracy. We introduce a new feature representation for neural spikes from multichannel recordings. Feature selection plays a significant role in analyzing the response of brain neurons: more precise selection of features leads to more accurate spike sorting, which groups spikes into clusters based on their similarity. Proper spike sorting enables the association between spikes and neurons. Unlike other threshold-based methods, our method employs the cepstrum of the spike signals to select candidate spike features. To choose the best features among the candidates, the Kolmogorov-Smirnov (KS) test is utilized. We then rely on the superparamagnetic method to cluster the neural spikes based on the KS features. Simulation results demonstrate that the proposed method not only achieves more accurate clustering results but also reduces the computational burden, which implies that it can be applied to real-time spike analysis.
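
As a rough sketch of the kind of pipeline described (cepstral coefficients ranked by the KS statistic), assuming synthetic spike waveforms and omitting the superparamagnetic clustering step:

```python
# Minimal sketch (not the authors' implementation): compute the real cepstrum of
# spike waveforms and rank cepstral coefficients by the Kolmogorov-Smirnov
# statistic between two synthetic spike classes.
import numpy as np
from scipy import stats

def real_cepstrum(waveform):
    """Inverse FFT of the log magnitude spectrum of a spike waveform."""
    spectrum = np.fft.fft(waveform)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)))

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 64)
# Two synthetic spike classes with slightly different shapes plus noise.
class_a = np.array([np.exp(-((t - 0.30) ** 2) / 0.005) + 0.05 * rng.standard_normal(64)
                    for _ in range(100)])
class_b = np.array([np.exp(-((t - 0.35) ** 2) / 0.008) + 0.05 * rng.standard_normal(64)
                    for _ in range(100)])

ceps_a = np.array([real_cepstrum(w) for w in class_a])
ceps_b = np.array([real_cepstrum(w) for w in class_b])

# KS statistic per coefficient: larger values indicate more discriminative features.
ks_scores = [stats.ks_2samp(ceps_a[:, i], ceps_b[:, i]).statistic for i in range(64)]
best = np.argsort(ks_scores)[::-1][:5]
print("Most discriminative cepstral coefficients:", best)
```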

Relevance:

60.00%

Publisher:

Abstract:

In neuroscience, the extracellular action potentials of neurons, called spikes, are the most important signals. However, a single extracellular electrode can capture spikes from more than one neuron, so spike sorting is an important task in the analysis of neural activity. The better we understand neurons, the better we can treat neural diseases. Sorting these spikes typically proceeds in several steps: detection, feature extraction and clustering. In this paper we propose using Mel-frequency cepstral coefficients (MFCC) to extract spike features, combined with a hidden Markov model (HMM) in the clustering step. Our results show that MFCC features differentiate between spikes more clearly than the other feature extraction methods, and that using an HMM as the clustering algorithm also yields better sorting accuracy.
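
A hedged sketch of the general idea, not the authors' pipeline: MFCC-style features are extracted per spike and a Gaussian HMM's hidden states act as cluster labels. It assumes the third-party packages librosa and hmmlearn are available, and all waveforms and parameters below are invented.

```python
# Illustration only: MFCC features per synthetic spike, clustered by the hidden
# states of a Gaussian HMM. Parameters are hypothetical, not from the paper.
import numpy as np
import librosa
from hmmlearn import hmm

rng = np.random.default_rng(2)
sampling_rate = 24000                    # hypothetical extracellular sampling rate (Hz)
n_samples = 256                          # samples per aligned spike window
t = np.arange(n_samples) / sampling_rate

def synthetic_spike(centre, width):
    """A Gaussian bump standing in for a detected, aligned spike waveform."""
    return np.exp(-((t - centre) ** 2) / width) + 0.05 * rng.standard_normal(n_samples)

spikes = [synthetic_spike(3e-3, 2e-7) for _ in range(150)] + \
         [synthetic_spike(5e-3, 6e-7) for _ in range(150)]

# One MFCC feature vector per spike (averaged over the short time axis).
features = np.array([
    librosa.feature.mfcc(y=s, sr=sampling_rate, n_mfcc=8,
                         n_fft=128, hop_length=32, n_mels=12).mean(axis=1)
    for s in spikes
])

# Two hidden states ~ two putative neurons; state assignments serve as cluster labels.
model = hmm.GaussianHMM(n_components=2, covariance_type="diag",
                        n_iter=50, random_state=0)
model.fit(features)
labels = model.predict(features)
print("Cluster sizes:", np.bincount(labels))
```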

Relevance:

60.00%

Publisher:

Abstract:

Industrial producers face the task of optimizing the production process in an attempt to achieve the desired quality, such as mechanical properties, with the lowest energy consumption. In industrial carbon fiber production, the fibers are processed in bundles (batches) containing several thousand filaments, and consequently energy optimization is a stochastic process as it involves uncertainty, imprecision or randomness. This paper presents a stochastic optimization model to reduce energy consumption for a given range of desired mechanical properties. Several sets of processing conditions are developed and, for each set of conditions, 50 fiber samples are analyzed for their tensile strength and modulus. The energy consumption during production of the samples is carefully monitored on the processing equipment. Five standard distribution functions are then examined to determine which best describe the distribution of the mechanical properties of the filaments. To verify the goodness of fit and correlation statistics of the distributions, the Kolmogorov-Smirnov test is used. To estimate the parameters of the selected distribution (Weibull), the maximum likelihood, least squares and genetic algorithm methods are compared. A set of factors including the sample size, the confidence level and the relative error of the estimated parameters is used for evaluating the tensile strength and modulus properties. The energy consumption and N2 gas cost are modeled by the convex hull method. Finally, in order to optimize carbon fiber production quality, energy consumption and total cost, mixed integer linear programming is utilized. The results show that, using the stochastic optimization models, we are able to predict the production quality within a given range and minimize the energy consumption of the industrial process.
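
As an illustration of the distribution-fitting step (a maximum-likelihood Weibull fit checked with a Kolmogorov-Smirnov test), on synthetic strength values rather than the measured fiber samples:

```python
# Illustrative sketch only (synthetic data, not the measured samples): fit a
# Weibull distribution to tensile-strength values by maximum likelihood and
# check the goodness of fit with the Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical tensile strengths (GPa) for 50 filaments from one condition set.
strength = stats.weibull_min.rvs(c=5.0, scale=4.2, size=50, random_state=rng)

# Maximum-likelihood fit; the location parameter is fixed at zero.
shape, loc, scale = stats.weibull_min.fit(strength, floc=0)
print(f"Weibull shape = {shape:.2f}, scale = {scale:.2f}")

# KS test of the sample against the fitted distribution. Note: testing against
# parameters estimated from the same sample makes the p-value optimistic.
res = stats.kstest(strength, "weibull_min", args=(shape, loc, scale))
print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")
```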

Relevance:

10.00%

Publisher:

Abstract:

In the last 30 to 40 years, many researchers have combined to build the knowledge base of theory and solution techniques that can be applied to differential equations which include the effects of noise. This class of "noisy" differential equations is now known as stochastic differential equations (SDEs). Markov diffusion processes are included within the field of SDEs through the drift and diffusion components of the Itô form of an SDE. When these drift and diffusion components are moderately smooth functions, the processes' transition probability densities satisfy the Fokker-Planck-Kolmogorov (FPK) equation, a deterministic partial differential equation (PDE). Thus there is a mathematical inter-relationship that allows solutions of SDEs to be determined from the solution of a noise-free differential equation which has been extensively studied since the 1920s. The main numerical solution technique employed to solve the FPK equation is the classical Finite Element Method (FEM). The FEM is of particular importance to engineers when used to solve FPK systems that describe noisy oscillators. The FEM is a powerful tool but is limited in that it is cumbersome when applied to multidimensional systems and can lead to large and complex matrix systems with their inherent solution and storage problems. I show in this thesis that the stochastic Taylor series (TS) based time discretisation approach to the solution of SDEs is an efficient and accurate technique that provides transition and steady-state solutions to the associated FPK equation. The TS approach to the solution of SDEs has certain advantages over the classical techniques. These advantages include the ability to effectively tackle stiff systems, simplicity of derivation, and ease of implementation and re-use. Unlike the FEM approach, which is difficult to apply in even only two dimensions, the simplicity of the TS approach is independent of the dimension of the system under investigation. Its main disadvantage, that of requiring a large number of simulations and the associated CPU time, is countered by its underlying structure, which makes it perfectly suited for use on the now prevalent parallel or distributed processing systems. In summary, I compare the TS solution of SDEs with the solution of the associated FPK equations using the classical FEM technique. One-, two- and three-dimensional FPK systems that describe noisy oscillators have been chosen for the analysis. As higher-dimensional FPK systems are rarely mentioned in the literature, the TS approach is extended to essentially infinite-dimensional systems through the solution of stochastic PDEs. In making these comparisons, the advantages of modern computing tools such as computer algebra systems and simulation software, when used as an adjunct to the solution of SDEs or their associated FPK equations, are demonstrated.
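
The following minimal sketch illustrates the core idea with the simplest stochastic Taylor scheme (Euler-Maruyama) applied to an Ornstein-Uhlenbeck process, whose stationary FPK solution is known in closed form. It is an illustration of the approach, not the thesis code, and the parameter values are arbitrary.

```python
# Euler-Maruyama (the lowest-order stochastic Taylor discretisation) simulation
# of the Ornstein-Uhlenbeck SDE  dX = -a*X dt + sigma dW.  The empirical
# steady-state variance is compared with the exact stationary FPK solution,
# a Gaussian with variance sigma^2 / (2a).
import numpy as np

a, sigma = 1.0, 0.8
dt, n_steps, n_paths = 1e-3, 20000, 2000
rng = np.random.default_rng(4)

x = np.zeros(n_paths)
for _ in range(n_steps):
    dw = rng.standard_normal(n_paths) * np.sqrt(dt)
    x += -a * x * dt + sigma * dw            # Euler-Maruyama update

empirical_var = x.var()
fpk_var = sigma ** 2 / (2 * a)               # variance of the stationary FPK density
print(f"simulated variance      = {empirical_var:.4f}")
print(f"stationary FPK variance = {fpk_var:.4f}")
```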

Relevance:

10.00%

Publisher:

Abstract:

A critical question in data mining is: can we always trust what is discovered by a data mining system unconditionally? The answer is obviously no. If not, when can we trust the discovery? What are the factors that affect the reliability of the discovery, and how do they affect it? These are some of the interesting questions to be investigated. In this chapter we first provide a definition and measurements of reliability and analyse the factors that affect it. We then examine the impact of model complexity, weak links, varying sample sizes and the ability of different learners on the reliability of graphical model discovery. The experimental results reveal that (1) the larger the sample size used for discovery, the higher the reliability we obtain; (2) the stronger a graph link is, the easier the discovery will be and thus the higher the reliability it can achieve; (3) the complexity of a graph also plays an important role in the discovery: the higher the complexity of a graph, the more difficult it is to induce the graph and the lower the reliability. We also examined the performance differences between discovery algorithms, which reveals the impact of the discovery process. The experimental results show the superior reliability and robustness of the MML method over standard significance tests in the recovery of graph links with small samples and weak links.
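
The sample-size and link-strength effects can be illustrated with a toy simulation. Note that the sketch below uses a plain Pearson correlation test as a stand-in for link discovery, not the MML method evaluated in the chapter, and the data are synthetic.

```python
# Hedged stand-in for the experiments described above: estimate how often a
# single (weak or strong) graph link is recovered at different sample sizes,
# using an ordinary correlation significance test purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

def recovery_rate(link_strength, sample_size, trials=500, alpha=0.05):
    hits = 0
    for _ in range(trials):
        x = rng.standard_normal(sample_size)
        y = link_strength * x + rng.standard_normal(sample_size)   # X -> Y link
        r, p = stats.pearsonr(x, y)
        if p < alpha:                                               # link detected
            hits += 1
    return hits / trials

for strength in (0.1, 0.5):              # weak vs strong link
    for n in (20, 100, 500):             # small vs large sample
        print(f"strength={strength}, n={n}: "
              f"recovery rate = {recovery_rate(strength, n):.2f}")
```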

Relevance:

10.00%

Publisher:

Abstract:

Regression lies at the heart of statistics; it is one of the most important branches of the multivariate techniques available for extracting knowledge in almost every field of study and research. Nowadays it has drawn huge interest from fields such as machine learning, pattern recognition and data mining. Investigating outliers (exceptional observations) is a century-long problem for data analysts and researchers. Blind application of data could have dangerous consequences, leading to the discovery of meaningless patterns and to imperfect knowledge. As a result of the digital revolution and the growth of the Internet and intranets, data continue to be accumulated at an exponential rate, and the importance of detecting outliers and studying their costs and benefits as a tool for reliable knowledge discovery therefore demands full attention. Investigating outliers in regression has received great attention over the last few decades within two schools of thought: robust regression and regression diagnostics. Robust regression first fits a regression to the majority of the data and then discovers outliers as those points that have large residuals from the robust fit, whereas in regression diagnostics one first finds the outliers, deletes or corrects them, and then fits the remaining data by classical (usual) methods. At the beginning there was much confusion, but researchers have now reached a consensus: robustness and diagnostics are two complementary approaches to the analysis of data, and neither alone is good enough. In this chapter, we discuss both of them under the single umbrella of regression diagnostics. The chapter explains the necessity of and views on regression diagnostics, and presents several contemporary methods through numerical examples in linear regression within each of the aforesaid categories, together with current challenges and possible future research directions. Our aim is to make the chapter self-contained while maintaining its general accessibility.
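
A small, hypothetical example of the two schools of thought: ordinary least squares residuals for diagnostics and a Huber estimator as one possible robust fit (not necessarily the methods presented in the chapter), on made-up data with a single injected outlier.

```python
# "Diagnostics" flags points with large OLS residuals; robust regression
# (a Huber estimator here, chosen as a simple example) down-weights them.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[35] += 25.0                                  # inject a single gross outlier
X = x.reshape(-1, 1)

ols = LinearRegression().fit(X, y)
residuals = y - ols.predict(X)
flagged = np.where(np.abs(residuals) > 3 * residuals.std())[0]
print("OLS slope:", round(ols.coef_[0], 3), "| flagged outliers:", flagged)

robust = HuberRegressor().fit(X, y)
print("Huber slope:", round(robust.coef_[0], 3), "(closer to the true slope 2.0)")
```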

Relevance:

10.00%

Publisher:

Abstract:

The reliability of an induced classifier can be affected by several factors, including data-oriented factors and algorithm-oriented factors [3]. In some cases, the reliability can also be affected by knowledge-oriented factors. In this chapter, we analyze three special cases to examine the reliability of the discovered knowledge. Our case study results show that (1) when mining from low-quality data, the rough classification approach, which in general tolerates low-quality data, is more reliable than the exact approach; (2) without a sufficiently large data set, the reliability of the discovered knowledge decreases accordingly; (3) the point learning approach can easily be misled by noisy data: it will in most cases generate an unreliable interval and thus affect the reliability of the discovered knowledge. The analysis also reveals that inexact field learning is a good strategy that can model the potentials and improve discovery reliability.

Relevance:

10.00%

Publisher:

Abstract:

Class imbalance in textual data is one important factor that affects the reliability of text mining. On imbalanced textual data, conventional classifiers tend to have a strong performance bias, which results in a high accuracy rate on the majority class but a very low rate on the minority classes. An extreme strategy for unbalanced learning is to discard the majority instances and apply one-class classification to the minority class. However, this can easily cause another type of bias, which increases the accuracy rate on the minority classes by sacrificing the majority class. This chapter aims to investigate approaches that reduce these two types of performance bias and improve the reliability of the discovered classification rules. Experimental results show that the inexact field learning method and parameter-optimized one-class classifiers achieve more balanced performance than the standard approaches.
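
The performance bias described above can be reproduced on synthetic data. The sketch below uses plain logistic regression with and without class weighting as a stand-in; it does not reproduce the chapter's inexact field learning or parameter-optimized one-class methods.

```python
# Illustration of imbalance-induced bias and one standard mitigation
# (class weighting) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in [("plain", LogisticRegression(max_iter=1000)),
                  ("class-weighted", LogisticRegression(max_iter=1000,
                                                        class_weight="balanced"))]:
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    print(f"{name:>14}: majority recall = "
          f"{recall_score(y_test, pred, pos_label=0):.2f}, "
          f"minority recall = {recall_score(y_test, pred, pos_label=1):.2f}")
```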

Relevance:

10.00%

Publisher:

Abstract:

Subsequence frequency measurement is a basic and essential problem in knowledge discovery in single sequences. Frequency-based knowledge discovery in single sequences tends to be unreliable, since different result sets may be obtained from the same sequence when different frequency metrics are adopted. In this chapter, we investigate subsequence frequency measurement and its impact on the reliability of knowledge discovery in single sequences. We analyse seven existing frequency metrics, identify their inherent inaccuracies, and explore their impact on two kinds of knowledge discovered from single sequences: frequent episodes and episode rules. We further give three suggestions for frequency metrics and introduce a new frequency metric in order to improve reliability. An empirical evaluation reveals the inaccuracies and verifies our findings.
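
A toy example of why the choice of frequency metric matters: two simple, hypothetical metrics for the serial episode A -> B (not the seven metrics analysed in the chapter) give different counts on the same event sequence.

```python
# Two hypothetical ways of counting the serial episode A -> B in one sequence.
def window_frequency(sequence, window):
    """Number of sliding windows of the given width containing A before B."""
    count = 0
    for start in range(len(sequence) - window + 1):
        chunk = sequence[start:start + window]
        if "A" in chunk and "B" in chunk[chunk.index("A") + 1:]:
            count += 1
    return count

def non_overlapped_frequency(sequence):
    """Greedy count of non-overlapping occurrences of A followed by B."""
    count, i = 0, 0
    while i < len(sequence):
        if sequence[i] == "A":
            j = sequence.find("B", i + 1)
            if j == -1:
                break
            count, i = count + 1, j + 1
        else:
            i += 1
    return count

events = "ACABBCAACB"
print("window-based frequency  :", window_frequency(events, window=4))   # 4
print("non-overlapped frequency:", non_overlapped_frequency(events))     # 2
```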

Relevance:

10.00%

Publisher:

Abstract:

The web is a rich resource for information discovery; as a result, web mining is a hot topic. However, a reliable mining result depends on the reliability of the data set. Every single second, the web generates a huge amount of data, such as web page requests and file transfers. These data reflect human behavior in cyberspace and are therefore valuable for analysis in various disciplines, e.g. social science and network security. How to store the data is a challenge. A usual strategy is to save a summary of the data, for example by using aggregation functions to preserve the features of the original data in much less space. A key problem, however, is that such information can be distorted by the presence of illegitimate traffic, e.g. botnet recruitment scanning, DDoS attack traffic, etc. An important consideration in web-related knowledge discovery is therefore the robustness of the aggregation method, which in turn may be affected by the reliability of the network traffic data. In this chapter, we first present methods of aggregation functions, and then we employ information distances to filter out anomalous data in preparation for web data mining.
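
A minimal sketch of the aggregate-then-filter idea, assuming synthetic traffic and choosing the Jensen-Shannon distance as one possible information distance; the chapter's specific distance measures and data are not reproduced here.

```python
# Aggregate requests per interval into a distribution over request types and
# flag intervals that deviate from a baseline before mining.
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(7)
baseline = np.array([0.70, 0.20, 0.08, 0.02])      # e.g. GET, POST, HEAD, other

def interval_distribution(weights, n_requests=1000):
    counts = rng.multinomial(n_requests, weights)   # aggregated request counts
    return counts / counts.sum()

intervals = [interval_distribution(baseline) for _ in range(5)]
intervals.append(interval_distribution([0.10, 0.05, 0.05, 0.80]))  # scan-like burst

for idx, dist in enumerate(intervals):
    d = jensenshannon(baseline, dist)
    flag = "ANOMALOUS, exclude from mining" if d > 0.1 else "ok"
    print(f"interval {idx}: JS distance = {d:.3f} -> {flag}")
```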

Relevance:

10.00%

Publisher:

Abstract:

This paper considers GSD projects as designed artefacts and proposes the application of an Extended Axiomatic Design theory to reduce their complexity in order to increase the probability of project success. Using an upper-bound estimate of the Kolmogorov complexity of the so-called ‘design matrix’ (as a proxy for Information Content as a complexity measure), we demonstrate on two hypothetical examples how good and bad designs of GSD planning compare in terms of complexity. We also demonstrate how to measure and calculate the ‘structural’ complexity of GSD projects and show that, by satisfying all design axioms, this ‘structural’ complexity can be minimised.
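
A hedged illustration of the upper-bound idea only: the compressed size of a string is a standard upper-bound proxy for its Kolmogorov complexity, so an uncoupled (diagonal) design matrix compresses far better than a densely coupled one. The matrices below are hypothetical, and the paper's Information Content calculation is not reproduced.

```python
# Compressed length as an upper-bound proxy for Kolmogorov complexity:
# K(x) <= len(compress(x)) + c for some constant c.
import zlib
import numpy as np

rng = np.random.default_rng(8)
n = 32
uncoupled = np.eye(n, dtype=np.uint8)                   # ideal: diagonal design matrix
coupled = (rng.random((n, n)) < 0.5).astype(np.uint8)   # heavily coupled design

def complexity_upper_bound(matrix):
    """Length in bytes of the zlib-compressed matrix."""
    return len(zlib.compress(matrix.tobytes(), 9))

print("uncoupled design:", complexity_upper_bound(uncoupled), "bytes")
print("coupled design  :", complexity_upper_bound(coupled), "bytes")
```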

Relevance:

10.00%

Publisher:

Abstract:

RIKD 2010 is the Third International Workshop on Reliability Issues in Knowledge Discovery. This paper provides an introduction to the workshop. It summarizes the main workshop features and provides a formulation of the field of reliable knowledge discovery.