8 results for Pruning.

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance: 10.00%

Abstract:

Computationally efficient sequential learning algorithms are developed for direct-link resource-allocating networks (DRANs). These are achieved by decomposing existing recursive training algorithms on a layer-by-layer and neuron-by-neuron basis. This allows network weights to be updated in an efficient parallel manner and facilitates the implementation of minimal update extensions that yield a significant reduction in computational load per iteration compared to existing sequential learning methods employed in resource-allocation network (RAN) and minimal RAN (MRAN) approaches. The new algorithms, which also incorporate a pruning strategy to control network growth, are evaluated on three different system identification benchmark problems and shown to outperform existing methods both in terms of training error convergence and computational efficiency. (c) 2005 Elsevier B.V. All rights reserved.

Relevance: 10.00%

Abstract:

There is considerable interest in creating embedded speech recognition hardware using the weighted finite state transducer (WFST) technique, but there are performance and memory usage challenges. Two system optimization techniques are presented to address these: one improves token propagation by removing the WFST epsilon input arcs; the other, a one-pass adaptive pruning algorithm, gives a dramatic reduction in the number of active nodes to be computed. Results for memory and bandwidth are given for a 5,000-word vocabulary, giving better practical performance than conventional WFST; this is then exploited in an adaptive pruning algorithm that reduces the active nodes from 30,000 to 4,000 with only a 2 percent sacrifice in speech recognition accuracy. These optimizations lead to a simpler design with deterministic performance.
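The idea of limiting the active nodes in each decoding step can be sketched as follows. This is a generic beam-pruning illustration, not the algorithm from the paper: the function name `prune_tokens`, the parameters `beam` and `max_active`, and the example costs are all assumptions made for this sketch.

```python
def prune_tokens(tokens, beam, max_active):
    """Keep tokens whose cost is within `beam` of the best, capped at `max_active`.

    `tokens` maps a state id to its accumulated path cost (lower is better).
    All names here are hypothetical; the abstract does not give the real rule.
    """
    if not tokens:
        return {}
    best = min(tokens.values())
    # Fixed-beam step: discard anything too far behind the best hypothesis.
    survivors = {s: c for s, c in tokens.items() if c <= best + beam}
    # Adaptive step: if too many tokens survive, keep only the cheapest ones.
    if len(survivors) > max_active:
        cheapest = sorted(survivors.items(), key=lambda item: item[1])[:max_active]
        survivors = dict(cheapest)
    return survivors


# Five active states; state 3 falls outside the beam, state 2 is cut by the cap.
active = {0: 10.0, 1: 10.5, 2: 12.0, 3: 19.0, 4: 11.0}
kept = prune_tokens(active, beam=5.0, max_active=3)
print(sorted(kept))  # [0, 1, 4]
```

Capping the number of survivors is one simple way to bound the work per frame, which is the kind of property a hardware design with deterministic performance would need.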

Relevance: 10.00%

Abstract:

Many graph datasets are labelled with discrete and numeric attributes. Most frequent substructure discovery algorithms ignore numeric attributes; in this paper we show how they can be used to improve search performance and discrimination. Our thesis is that the most descriptive substructures are those which are normative both in terms of their structure and in terms of their numeric values. We explore the relationship between graph structure and the distribution of attribute values and propose an outlier-detection step, which is used as a constraint during substructure discovery. By pruning anomalous vertices and edges, more weight is given to the most descriptive substructures. Our method is applicable to multi-dimensional numeric attributes; we outline how it can be extended for high-dimensional data. We support our findings with experiments on transaction graphs and single large graphs from the domains of physical building security and digital forensics, measuring the effect on runtime, memory requirements and coverage of discovered patterns, relative to the unconstrained approach.

Relevance: 10.00%

Abstract:

There is growing interest in integrating health monitoring applications into portable devices, which necessitates methods that improve the energy efficiency of such systems. In this paper, we present a systematic approach that enables energy-quality trade-offs in spectral analysis systems for bio-signals, which are useful in monitoring various health conditions such as those associated with heart rate. To enable such trade-offs, the processed signals are first expressed in a basis in which the significant components, which carry most of the relevant information, can be easily distinguished from the parts that influence the output to a lesser extent. This classification allows the operations associated with the less significant signal components to be pruned, leading to power savings with minor quality loss, since only the less useful parts are pruned under the given requirements. To exploit the attributes of the modified spectral analysis system, thresholding rules are determined and adopted at design time and run time, allowing static or dynamic pruning of less useful operations based on the accuracy and energy requirements. The proposed algorithm is implemented on a typical sensor node simulator, and results show up to 82% energy savings when static pruning is combined with voltage and frequency scaling, compared to the conventional algorithm in which such trade-offs were not available. In addition, experiments with numerous cardiac samples from various patients show that these energy savings come with a 4.9% average accuracy loss, which does not affect the system's ability to detect sinus arrhythmia, the test case used.
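A minimal sketch of threshold-based pruning of less significant components is given below. It assumes the signal is already expressed as a list of coefficients in some basis and uses a retained-energy target as the thresholding rule; both the rule and the names `prune_components` and `keep_energy` are assumptions for illustration, not the rules derived in the paper.

```python
def prune_components(coeffs, keep_energy=0.95):
    """Zero the smallest coefficients while retaining `keep_energy` of the energy.

    Returns the pruned coefficient list and the number of components kept.
    The energy-based rule is a stand-in for the paper's thresholding rules.
    """
    total = sum(c * c for c in coeffs)
    if total == 0:
        return list(coeffs), 0
    # Visit components from most to least significant.
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    pruned = [0.0] * len(coeffs)
    kept_energy = 0.0
    kept = 0
    for i in order:
        if kept_energy >= keep_energy * total:
            break  # the remaining components are treated as prunable
        pruned[i] = coeffs[i]
        kept_energy += coeffs[i] ** 2
        kept += 1
    return pruned, kept


# Two dominant components carry almost all of the energy.
coeffs = [8.0, 0.5, -6.0, 0.2, 0.1]
pruned, kept = prune_components(coeffs, keep_energy=0.95)
print(pruned, kept)  # [8.0, 0.0, -6.0, 0.0, 0.0] 2
```

In a static scheme the set of kept indices would be fixed at design time; a dynamic scheme would recompute it at run time as accuracy or energy requirements change.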

Relevance: 10.00%

Abstract:

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
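The baseline that LDpred is compared against, pruning followed by thresholding, can be sketched roughly as below. This is a simplified illustration and not LDpred itself: the pruning pass here is the p-value-informed variant (most significant marker wins an LD conflict), and the marker ids, effect sizes, cutoffs `r2_max` and `p_max`, and the LD table are all made-up assumptions.

```python
def prune_and_threshold(markers, ld_r2, r2_max=0.2, p_max=0.05):
    """Return ids of markers kept after LD pruning and p-value thresholding.

    markers: list of (marker_id, effect_size, p_value)
    ld_r2:   dict mapping frozenset({id_a, id_b}) to their pairwise r^2
             (pairs missing from the dict are treated as uncorrelated)
    """
    kept = []
    # Visit the most significant markers first so they win LD conflicts.
    for marker_id, _beta, _p in sorted(markers, key=lambda m: m[2]):
        in_ld = any(ld_r2.get(frozenset((marker_id, k)), 0.0) > r2_max for k in kept)
        if not in_ld:
            kept.append(marker_id)
    p_values = {m[0]: m[2] for m in markers}
    return [m for m in kept if p_values[m] <= p_max]


def risk_score(genotype, markers, selected):
    """Sum effect size times allele count over the selected markers."""
    betas = {m[0]: m[1] for m in markers}
    return sum(betas[m] * genotype[m] for m in selected)


# Hypothetical markers: m2 is in strong LD with m1, m4 is not significant.
markers = [("m1", 0.30, 1e-6), ("m2", 0.25, 1e-4), ("m3", -0.10, 0.03), ("m4", 0.05, 0.40)]
ld_r2 = {frozenset(("m1", "m2")): 0.8}
selected = prune_and_threshold(markers, ld_r2)
print(selected)  # ['m1', 'm3']

genotype = {"m1": 2, "m2": 1, "m3": 1, "m4": 0}  # allele counts for one individual
score = risk_score(genotype, markers, selected)
```

The sketch makes the information loss visible: m2 and m4 contribute nothing to the score even though their estimated effects are non-zero, which is the limitation the abstract says LDpred addresses.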

Relevance: 10.00%

Abstract:

Quantifying the similarity between two trajectories is a fundamental operation in the analysis of spatio-temporal databases. While a number of distance functions exist, the recent shift in the dynamics of the trajectory generation procedure violates one of their core assumptions: a consistent and uniform sampling rate. In this paper, we formulate a robust distance function called Edit Distance with Projections (EDwP) to match trajectories under inconsistent and variable sampling rates through dynamic interpolation. This is achieved by deploying the idea of projections, which goes beyond matching only the sampled points while aligning trajectories. To enable efficient trajectory retrievals using EDwP, we design an index structure called TrajTree. TrajTree derives its pruning power from a unique combination of bounding boxes with Lipschitz embedding. Extensive experiments on real trajectory databases demonstrate EDwP to be up to 5 times more accurate than the state-of-the-art distance functions. Additionally, TrajTree increases the efficiency of trajectory retrievals by up to an order of magnitude over existing techniques.

Relevance: 10.00%

Abstract:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy (which included only one semi-structured report predicted as unstructured) was produced by the layout classifier with the k-nearest neighbour algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83, while for unstructured reports precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured ones.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.

Relevance: 10.00%

Abstract:

Algorithms for concept drift handling are important for various applications including video analysis and smart grids. In this paper we present a decision tree ensemble classification method based on the Random Forest algorithm for concept drift. The weighted majority voting ensemble aggregation rule is employed, based on the ideas of the Accuracy Weighted Ensemble (AWE) method. The base learner weight in our case is computed for each sample evaluation using the base learner's accuracy and the intrinsic proximity measure of Random Forest. Our algorithm exploits both temporal weighting of samples and ensemble pruning as a forgetting strategy. We present results of an empirical comparison of our method with the original random forest with incorporated replace-the-loser forgetting and other state-of-the-art concept-drift classifiers like AWE2.
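Weighted majority voting and weight-based ensemble pruning can be sketched as below. The weights are taken as given inputs; the paper computes them from base learner accuracy and the Random Forest proximity measure, which this sketch does not attempt to reproduce. The function names and the example labels, weights, and member names are assumptions.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across base learners."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def prune_ensemble(members, weights, max_size):
    """Keep the `max_size` members with the highest weights, in original order."""
    ranked = sorted(zip(weights, range(len(members))), reverse=True)[:max_size]
    keep = sorted(i for _w, i in ranked)
    return [members[i] for i in keep], [weights[i] for i in keep]


# Three hypothetical trees: one confident vote for "a" outweighs two for "b".
predictions = ["a", "b", "b"]
weights = [0.9, 0.3, 0.4]
label = weighted_vote(predictions, weights)
print(label)  # a

# Forgetting by pruning: drop the lowest-weight tree.
members, kept_weights = prune_ensemble(["t0", "t1", "t2"], weights, max_size=2)
print(members)  # ['t0', 't2']
```

Under concept drift, learners trained on outdated data tend to receive low weights, so removing the lowest-weighted members acts as a forgetting mechanism.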