43 results for Cross-validation

in Deakin Research Online - Australia


Relevance:

100.00%

Publisher:

Abstract:

This paper presents an efficient evaluation algorithm for cross-validating the two-stage approach of KFD classifiers. The proposed algorithm has the same complexity level as the existing indirect efficient cross-validation methods, but it is more reliable because it is direct and constitutes exact cross-validation for the KFD classifier formulation. Simulations demonstrate that the proposed algorithm is almost as fast as the existing fast indirect evaluation algorithm and that the two-stage cross-validation selects better models on most of the thirteen benchmark data sets.

Relevance:

100.00%

Publisher:

Abstract:

Given n training examples, the training of a Kernel Fisher Discriminant (KFD) classifier corresponds to solving a linear system of dimension n. In cross-validating KFD, the training examples are split into two distinct subsets a number of times (L), wherein a subset of m examples is used for validation and the other subset of (n - m) examples is used for training the classifier. In this case, L linear systems of dimension (n - m) need to be solved. We propose a novel method for cross-validation of KFD in which, instead of solving L linear systems of dimension (n - m), we compute the inverse of an n × n matrix and solve L linear systems of dimension 2m, thereby reducing the complexity when L is large and/or m is small. For typical 10-fold and leave-one-out cross-validations, the proposed algorithm is approximately 4 and (4/9n) times respectively as efficient as the naive implementations. Simulations are provided to demonstrate the efficiency of the proposed algorithms.
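The splitting scheme described in this abstract can be sketched as follows. This is only an illustration of how n examples are divided L times into a validation subset of m examples and a training subset of (n - m) examples; it is not the authors' efficient algorithm, and the function name `k_fold_splits` and its `seed` parameter are hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed so the split is repeatable
    # Fold sizes differ by at most one when k does not divide n.
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        start += size
        yield train_idx, val_idx


# 10-fold cross-validation over n = 20 examples: L = 10 splits, m = 2.
splits = list(k_fold_splits(n=20, k=10))

# Leave-one-out is the special case k = n: L = n splits, m = 1.
loo_splits = list(k_fold_splits(n=20, k=20))
```

In a naive implementation, each of the L training subsets would require its own linear system of dimension (n - m) to be solved, which is the cost the abstract says the proposed method avoids.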

Relevance:

100.00%

Publisher:

Abstract:

Background: The Neighborhood Environment Walkability Scale (NEWS) and its abbreviated form (NEWS-A) assess perceived environmental attributes believed to influence physical activity. A multilevel confirmatory factor analysis (MCFA) conducted on a sample from Seattle, WA showed that, at the respondent level, the factor-analyzable items of the NEWS and NEWS-A measured 11 and 10 constructs of perceived neighborhood environment, respectively. At the census blockgroup (used by the US Census Bureau as a subunit of census tracts) level, the MCFA yielded five factors for both NEWS and NEWS-A. The aim of this study was to cross-validate the individual- and blockgroup-level measurement models of the NEWS and NEWS-A in a geographical location and population different from those used in the original validation study.

Methods: A sample of 912 adults was recruited from 16 selected neighborhoods (116 census blockgroups) in the Baltimore, MD region. Neighborhoods were stratified according to their socio-economic status and transport-related walkability level measured using Geographic Information Systems. Participants self-completed the NEWS. MCFA was used to cross-validate the individual- and blockgroup-level measurement models of the NEWS and NEWS-A.

Results: The data provided sufficient support for the factorial validity of the original individual-level measurement models, which consisted of 11 (NEWS) and 10 (NEWS-A) correlated factors. The original blockgroup-level measurement model of the NEWS and NEWS-A showed poor fit to the data and required substantial modifications. These included the combining of aspects of building aesthetics with safety from crime into one factor; the separation of natural aesthetics and building aesthetics into two factors; and for the NEWS-A, the separation of presence of sidewalks/walking routes from other infrastructure for walking.

Conclusion: This study provided support for the generalizability of the individual-level measurement models of the NEWS and NEWS-A to different urban geographical locations in the USA. It is recommended that the NEWS and NEWS-A be scored according to their individual-level measurement models, which are relatively stable and correspond to constructs commonly used in the urban planning and transportation fields. However, prior to using these instruments in international and multi-cultural studies, further validation work across diverse non-English speaking countries and populations is needed.

Relevance:

70.00%

Publisher:

Abstract:

From a future history of 2025: Continuous development is common for build/test (continuous integration) and operations (devOps). This trend continues through the lifecycle, into what we call 'devUsage': continuous usage validation. In addition to ensuring systems meet user needs, organisations continuously validate their legal and ethical use. The rise of end-user programming and multi-sided platforms exacerbates validation challenges. A separate trend is the specialisation of software engineering for technical domains, including data analytics. This domain has specific validation challenges. We must validate the accuracy of statistical models, but also whether they have illegal or unethical biases. Usage needs addressed by machine learning are sometimes not specifiable in the traditional sense, and statistical models are often 'black boxes'. We describe future research to investigate solutions to these devUsage challenges for data analytics systems. We will adapt risk management and governance frameworks previously used for software product qualities, use social network communities for input from aligned stakeholder groups, and perform cross-validation using autonomic experimentation, cyber-physical data streams, and online discursive feedback.

Relevance:

60.00%

Publisher:

Abstract:

This paper describes a new method of monotone interpolation and smoothing of multivariate scattered data. It is based on the assumption that the function to be approximated is Lipschitz continuous. The method provides the optimal approximation in the worst case scenario and tight error bounds. Smoothing of noisy data subject to monotonicity constraints is converted into a quadratic programming problem. Estimation of the unknown Lipschitz constant from the data by sample splitting and cross-validation is described. Extension of the method for locally Lipschitz functions is presented.
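A minimal sketch of Lipschitz-based interpolation follows, assuming the standard construction in which the tightest upper and lower bounds consistent with a Lipschitz constant M are averaged. The abstract does not give the paper's formulas, and the monotonicity constraints, the smoothing step and the cross-validated estimate of M are all omitted here. The function names, the Euclidean distance and the sample data are assumptions made for illustration.

```python
import math


def lipschitz_bounds(x, data, M):
    """Tightest bounds on f(x) given samples (xi, yi) and Lipschitz constant M."""
    upper = min(y + M * math.dist(x, xi) for xi, y in data)
    lower = max(y - M * math.dist(x, xi) for xi, y in data)
    return lower, upper


def lipschitz_interpolate(x, data, M):
    """Midpoint of the bounds, with half their gap as a worst-case error bound."""
    lower, upper = lipschitz_bounds(x, data, M)
    return 0.5 * (lower + upper), 0.5 * (upper - lower)


# Hypothetical scattered data in two dimensions: ((x1, x2), y) pairs.
data = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 0.5)]

value, error_bound = lipschitz_interpolate((0.5, 0.0), data, M=1.0)
at_sample, sample_error = lipschitz_interpolate((0.0, 1.0), data, M=1.0)
```

The midpoint reproduces the sample values whenever M is at least as large as the steepest slope between any two data points, which is why choosing M from the data matters.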

Relevance:

60.00%

Publisher:

Abstract:

Appropriate choice of a kernel is the most important ingredient of kernel-based learning methods such as the support vector machine (SVM). Automatic kernel selection is a key issue given the number of kernels available, and the current trial-and-error nature of selecting the best kernel for a given problem. This paper introduces a new method for automatic kernel selection, with empirical results based on classification. The empirical study has been conducted among five kernels with 112 different classification problems, using the popular kernel-based statistical learning algorithm SVM. We evaluate the kernels’ performance in terms of accuracy measures. We then focus on answering the question: which kernel is best suited to which type of classification problem? Our meta-learning methodology involves measuring the problem characteristics using classical, distance and distribution-based statistical information. We then combine these measures with the empirical results to present a rule-based method to select the most appropriate kernel for a classification problem. The rules are generated by the decision tree algorithm C5.0 and are evaluated with 10-fold cross-validation. All generated rules offer high accuracy ratings.

Relevance:

60.00%

Publisher:

Abstract:

Classifying malware correctly is an important research issue for anti-malware software producers. This paper presents an effective and efficient malware classification technique based on string information using several well-known classification algorithms. In our testing we extracted the printable strings from 1367 samples, including unpacked trojans and viruses and clean files. Information describing the printable strings contained in each sample was input to various classification algorithms, including tree-based classifiers, a nearest neighbour algorithm, statistical algorithms and AdaBoost. Using k-fold cross-validation on the unpacked malware and clean files, we achieved a classification accuracy of 97%. Our results reveal that strings from library code (rather than malicious code itself) can be utilised to distinguish different malware families.

Relevance:

60.00%

Publisher:

Abstract:

Anti-malware software producers are continually challenged to identify and counter new malware as it is released into the wild. A dramatic increase in malware production in recent years has rendered the conventional method of manually determining a signature for each new malware sample untenable. This paper presents a scalable, automated approach for detecting and classifying malware by using pattern recognition algorithms and statistical methods at various stages of the malware analysis life cycle. Our framework combines the static features of function length and printable string information extracted from malware samples into a single test which gives classification results better than those achieved by using either feature individually. In our testing we input feature information from close to 1400 unpacked malware samples to a number of different classification algorithms. Using k-fold cross validation on the malware, which includes Trojans and viruses, along with 151 clean files, we achieve an overall classification accuracy of over 98%.

Relevance:

60.00%

Publisher:

Abstract:

As a result of selective pressures faced during lactation, vocal recognition may play a crucial role in maintaining the phocid mother–pup bond during the period of dependence. To investigate this possibility, we examined whether Weddell seal (Leptonychotes weddellii) pups produce individually distinctive “primary” calls. One temporal, nine fundamental frequency features, and two spectral characteristics were measured. A discriminant function analysis (DFA) of 15 Vestfold Hills pups correctly classified 52% of calls, while the cross-validation procedure classified 29% of calls to the correct pup. A second DFA of 10 known-age McMurdo Sound pups correctly classified 44% of “test” calls. For novel calls, the probabilities of attaining such classification rates by chance are low. The relationship between age and call stereotypy indicated that pups 2 wk and older may be more vocally distinctive. Overall, findings suggest that Weddell seal pup “primary” calls are moderately distinctive and only exhibit sufficient stereotypy to aid maternal recognition by approximately two weeks of age.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a human daily activity classification approach based on the sensory data collected from a single tri-axial accelerometer worn on a waist belt. The classification algorithm was designed to distinguish six different activities (standing, jumping, sitting-down, walking, running and falling) through three major steps: wavelet transformation, Principal Component Analysis (PCA)-based dimensionality reduction, and a radial basis function (RBF) kernel Support Vector Machine (SVM) classifier. Two trials were conducted to evaluate different aspects of the classification scheme. In the first trial, the classifier was trained and evaluated on a dataset of 420 samples collected from seven subjects, using a k-fold cross-validation method. The parameters σ and c of the RBF kernel were optimized through automatic searching to yield the highest recognition accuracy and robustness. In the second trial, the generalization capability of the classifier was validated using a dataset collected from six new subjects. Average classification rates of 95% and 93% were obtained in trials 1 and 2, respectively. The results in trial 2 show that the system is also good at classifying activity signals of new subjects. It can be concluded that the combination of single-accelerometer sensing, the accelerometer placement and an efficient classifier would make this wearable sensing system more realistic and more comfortable for long-term human activity monitoring and classification in ambulatory environments, and therefore more acceptable to users.

Relevance:

60.00%

Publisher:

Abstract:

In this paper, we present novel ridge regression (RR) and kernel ridge regression (KRR) techniques for multivariate labels and apply the methods to the problem of face recognition. Motivated by the fact that the regular simplex vertices are separate points with highest degree of symmetry, we choose such vertices as the targets for the distinct individuals in recognition and apply RR or KRR to map the training face images into a face subspace where the training images from each individual will locate near their individual targets. We identify the new face image by mapping it into this face subspace and comparing its distance to all individual targets. An efficient cross-validation algorithm is also provided for selecting the regularization and kernel parameters. Experiments were conducted on two face databases and the results demonstrate that the proposed algorithm significantly outperforms the three popular linear face recognition techniques (Eigenfaces, Fisherfaces and Laplacianfaces) and also performs comparably with the recently developed Orthogonal Laplacianfaces with the advantage of computational speed. Experimental results also demonstrate that KRR outperforms RR as expected since KRR can utilize the nonlinear structure of the face images. Although we concentrate on face recognition in this paper, the proposed method is general and may be applied for general multi-category classification problems.

Relevance:

60.00%

Publisher:

Abstract:

Purpose

To test a field-based protocol using intermittent activities representative of children's physical activity behaviours, to generate behaviourally valid, population-specific accelerometer cut-points for sedentary behaviour, moderate, and vigorous physical activity.

Methods

Twenty-eight children (46% boys) aged 10–11 years wore a hip-mounted uniaxial GT1M ActiGraph and engaged in 6 activities representative of children's play. A validated direct observation protocol was used as the criterion measure of physical activity. Receiver Operating Characteristics (ROC) curve analyses were conducted with four semi-structured activities to determine the accelerometer cut-points. To examine classification differences, cut-points were cross-validated with free-play and DVD viewing activities.

Results

Cut-points of ≤372, >2160 and >4806 counts·min⁻¹ representing sedentary, moderate and vigorous intensity thresholds, respectively, provided the optimal balance between the related needs for sensitivity (accurately detecting activity) and specificity (limiting misclassification of the activity). Cross-validation data demonstrated that these values yielded the best overall kappa scores (0.97; 0.71; 0.62), and a high classification agreement (98.6%; 89.0%; 87.2%), respectively. Specificity values of 96–97% showed that the developed cut-points accurately detected physical activity, and sensitivity values (89–99%) indicated that minutes of activity were seldom incorrectly classified as inactivity.

Conclusion

The development of an inexpensive and replicable field-based protocol to generate behaviourally valid and population-specific accelerometer cut-points may improve the classification of physical activity levels in children, which could enhance subsequent intervention and observational studies.
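The agreement statistics named in the Results above (sensitivity, specificity, classification agreement and kappa) can be computed from a two-by-two table of accelerometer classifications against the criterion measure. The sketch below uses the standard definitions, with Cohen's kappa assumed as the kappa statistic. The counts are invented for illustration and are not data from the study, and the function name `binary_agreement` is hypothetical.

```python
def binary_agreement(tp, fp, fn, tn):
    """Agreement statistics for a 2x2 table of predicted vs. criterion labels.

    tp: predicted positive, criterion positive
    fp: predicted positive, criterion negative
    fn: predicted negative, criterion positive
    tn: predicted negative, criterion negative
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of criterion positives detected
    specificity = tn / (tn + fp)   # share of criterion negatives detected
    observed = (tp + tn) / total   # raw classification agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "agreement": observed,
        "kappa": kappa,
    }


# Hypothetical counts of one-minute epochs, not taken from the study.
stats = binary_agreement(tp=40, fp=5, fn=10, tn=45)
```

Kappa is lower than raw agreement because it discounts the matches that would occur by chance given how often each class appears.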

Relevance:

60.00%

Publisher:

Abstract:

Sleep stage identification is the first step in the modern sleep disorder diagnostic process. The K-complex is an indicator of sleep stage 2. However, due to the ambiguity in translating the medical standards into a computer-based procedure, the reliability of automated K-complex detection from the EEG wave is still far from expectations. More specifically, there are some significant barriers to research on automatic K-complex detection. First, there is no adequate description of the K-complex, which makes it difficult to develop an automatic detection algorithm. Second, human experts only provide a label for whether a whole EEG segment contains a K-complex or not, rather than individual labels for each subsegment. These barriers render most pattern recognition algorithms inapplicable to detecting K-complexes. In this paper, we attempt to address these two challenges by designing a new feature extraction method that can transform visual features of an EEG wave of any length into a mathematical representation, and by proposing a hybrid-synergic machine learning method to build a K-complex classifier. The tenfold cross-validation results indicate that both the accuracy and the precision of the proposed model are at least as good as those of a human expert in K-complex detection.