32 results for Euler discretization

in Deakin Research Online - Australia


Relevance: 20.00%

Abstract:

This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances and thus seeks an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, compared with its alternatives, PKID gives naive-Bayes classifiers competitive classification performance on smaller datasets and better classification performance on larger datasets.
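
A minimal sketch of the proportional sizing idea, assuming the common choice of roughly sqrt(n) intervals holding roughly sqrt(n) instances each (the paper defines the exact rule; the function and variable names here are illustrative):

```python
import math

def pkid_discretize(values):
    """Proportional k-Interval Discretization (sketch): both the
    number of intervals and the instances per interval grow with
    the training set size, roughly as sqrt(n)."""
    n = len(values)
    k = max(1, int(math.sqrt(n)))   # number of intervals ~ sqrt(n)
    size = max(1, n // k)           # instances per interval ~ sqrt(n)
    ordered = sorted(values)
    # cut points taken between consecutive equal-frequency blocks
    return [ordered[i * size] for i in range(1, k)]

def to_interval(x, cuts):
    """Map a raw value to its interval index via binary search."""
    lo, hi = 0, len(cuts)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < cuts[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

values = [0.3, 1.2, 0.7, 2.5, 1.9, 0.1, 3.3, 2.2, 1.1]
cuts = pkid_discretize(values)
print(cuts, to_interval(1.5, cuts))
```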

Relevance: 20.00%

Abstract:

Biological sequence assembly is an essential step in sequencing the genomes of organisms. Sequence assembly is very computing-intensive, especially at large scale, and parallel computing is an effective way to reduce computing time and to support the assembly of large numbers of biological fragments. The Euler sequence assembly algorithm is an innovative algorithm proposed recently; its advantages are that its computing complexity is polynomial and that it offers a better solution to the notorious "repeat" problem. This paper introduces a parallelization of the Euler sequence assembly algorithm. All genome fragments generated by whole genome shotgun (WGS) sequencing are assembled as a whole, rather than being divided into groups, which can introduce errors due to inaccurate group partitioning. The implemented system can run on supercomputers, networks of workstations or even networks of PCs. The experimental results demonstrate the performance of our system.
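
As a toy illustration of the sequential core of Euler-style assembly (not the parallel system described above): distinct k-mers become edges of a de Bruijn graph, and an Eulerian path through the graph spells a candidate reconstruction.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """de Bruijn graph used by Euler-style assemblers: nodes are
    (k-1)-mers and every distinct k-mer is an edge from its prefix
    to its suffix (repeated k-mers collapse onto the same edge)."""
    kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm: walking every edge exactly once
    spells out a reconstruction of the underlying sequence."""
    graph = {u: list(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    # prefer a node with one more out-edge than in-edges as the start
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph.get(u):
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]

reads = ["TACG", "ACGG", "CGGA"]
path = eulerian_path(de_bruijn(reads, 3))
print(path[0] + "".join(node[-1] for node in path[1:]))  # -> TACGGA
```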

Relevance: 20.00%

Abstract:

Investigates what can go wrong when dynamical systems are modelled on a computer. Number-theoretic techniques were used to detail the effects that "discretization" errors caused by computer round-off have on the characteristics of a system. In particular, a relationship was established between the occurrence of long cycles in a discretized system and the classical result known as Artin's conjecture. Algorithms were then developed which eliminate these discretization errors.
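
A small sketch of the connection to Artin's conjecture: for the discretized linear map x -> 2x (mod p) on nonzero residues, the cycle length is the multiplicative order of 2, and the conjecture concerns how often that order is maximal, i.e. how often the "long cycles" mentioned above occur.

```python
def mult_order(a, p):
    """Cycle length of the map x -> a*x (mod p) on nonzero residues,
    i.e. the multiplicative order of a modulo the prime p."""
    x, k = a % p, 1
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

# Artin's conjecture concerns how often a base such as 2 is a
# primitive root, i.e. how often the cycle has maximal length p - 1.
for p in [5, 7, 11, 13, 19, 29]:
    print(p, mult_order(2, p), p - 1)   # order equals p - 1 except at p = 7
```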

Relevance: 10.00%

Abstract:

Large-scale sequence assembly and alignment are fundamental parts of biological computing. However, most large-scale sequence assembly and alignment tasks require intensive computing power and normally take a very long time to complete. To speed up the assembly and alignment process, this paper parallelizes Euler sequence assembly and pair-wise/multiple sequence alignment, two important operations, and takes advantage of a computing Grid, whose colossal computing capacity can meet large-scale biological computing demands.
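
A minimal sketch of the task-level parallelism involved, using Python's multiprocessing as a stand-in for grid scheduling; the scoring function here is a toy placeholder, not a real alignment kernel:

```python
from itertools import combinations
from multiprocessing import Pool

def score(pair):
    """Toy pairwise score: length of the longest common prefix.
    A real grid deployment would run a full alignment on each
    worker node; the point is only that the pairwise tasks are
    independent and embarrassingly parallel."""
    a, b = pair
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return (a, b, n)

if __name__ == "__main__":
    seqs = ["ACGTAC", "ACGGTA", "ACGTTT", "TTGACA"]
    with Pool() as pool:                 # farm tasks out to workers
        for a, b, n in pool.map(score, combinations(seqs, 2)):
            print(a, b, n)
```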

Relevance: 10.00%

Abstract:

This work describes an error correction method based on the Euler Superpath problem. Sequence data are mapped to an Euler Superpath dynamically by a Merging Transformation. With restriction and guiding rules, data consistency is maintained and error paths are separated from correct data: error edges are mapped to the correct ones, and after substituting the error edges with the right paths, the corresponding errors in the sequencing data are eliminated.
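
The following sketch shows the general flavour of this family of corrections using a k-mer spectrum; it is not the Merging Transformation itself, but illustrates how rare ("error") k-mers can be substituted with frequent ("correct") ones:

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Multiplicity of every k-mer across all reads."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))

def correct_read(read, spectrum, k, threshold=2):
    """If a k-mer is rare (a likely error path), try single-base
    substitutions that map it onto a frequent k-mer, mirroring the
    substitution of error edges described above."""
    read = list(read)
    for i in range(len(read) - k + 1):
        kmer = "".join(read[i:i + k])
        if spectrum[kmer] >= threshold:
            continue
        for j in range(k):
            for base in "ACGT":
                cand = kmer[:j] + base + kmer[j + 1:]
                if spectrum[cand] >= threshold:
                    read[i:i + k] = cand
                    break
            else:
                continue
            break
    return "".join(read)

reads = ["ACGTACGT", "ACGTACGT", "ACGAACGT"]   # third read has one error
spec = kmer_spectrum(reads, 4)
print(correct_read("ACGAACGT", spec, 4))        # -> ACGTACGT
```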

Relevance: 10.00%

Abstract:

This paper further develops Aumann and Lindell's [3] proposal for a variant of association rules in which the consequent is a numeric variable. It is argued that these rules can discover useful interactions with numeric data that cannot be discovered directly using traditional association rules with discretization. Alternative measures for identifying interesting rules are proposed. Efficient algorithms are presented that enable these rules to be discovered for dense data sets for which the application of Aumann and Lindell's algorithm is infeasible.
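
A minimal sketch of rules with a numeric consequent in Aumann and Lindell's style, reporting the mean of the target among transactions matching an itemset; the data and the interestingness filter (a bare support threshold) are illustrative only:

```python
from itertools import combinations
from statistics import mean

# Each transaction: a set of items plus a numeric consequent,
# e.g. a market basket together with the profit it generated.
data = [
    ({"bread", "milk"}, 3.0),
    ({"bread", "beer"}, 7.5),
    ({"beer", "chips"}, 9.0),
    ({"bread", "milk", "beer"}, 8.0),
    ({"milk"}, 2.0),
]

overall = mean(v for _, v in data)

def rules(data, min_support=2):
    """Enumerate small itemsets and report the mean of the numeric
    target among matching transactions ('itemset -> mean(target)')."""
    items = sorted(set().union(*(t for t, _ in data)))
    for r in (1, 2):
        for itemset in combinations(items, r):
            matched = [v for t, v in data if set(itemset) <= t]
            if len(matched) >= min_support:
                yield itemset, len(matched), mean(matched)

for itemset, n, m in rules(data):
    print(itemset, n, round(m, 2), "vs overall", round(overall, 2))
```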

Relevance: 10.00%

Abstract:

Rough set theory is a new mathematical approach to imprecision, vagueness and uncertainty, and the rough-set concept of reduction of the decision table is very useful for feature selection. This paper describes an application of the rough sets method to feature selection and reduction in texture image recognition. The methods applied include continuous data discretization based on fuzzy c-means and a rough set method for feature selection and reduction. The methods were applied to tree extraction from aerial images, and the experiments show that they are practical and effective.
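
A plain sketch of discretizing one continuous feature with fuzzy c-means, taking cut points midway between the learned cluster centres (the conversion of centres into intervals is an assumption for illustration):

```python
import random

def fuzzy_cmeans_1d(xs, c=3, m=2.0, iters=50):
    """Plain 1-D fuzzy c-means: cut points between sorted cluster
    centres give a discretization of a continuous feature."""
    centres = random.sample(xs, c)
    for _ in range(iters):
        # membership of each point in each cluster (fuzzifier m)
        u = []
        for x in xs:
            d = [abs(x - v) or 1e-12 for v in centres]
            u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1))
                                for j in range(c)) for i in range(c)])
        # update centres as membership-weighted means
        centres = [sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
                   / sum(u[k][i] ** m for k in range(len(xs)))
                   for i in range(c)]
    centres.sort()
    # discretization cut points midway between adjacent centres
    return [(a + b) / 2 for a, b in zip(centres, centres[1:])]

random.seed(0)
xs = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95, 2.0, 2.1, 1.9]
print(fuzzy_cmeans_1d(xs, c=3))
```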

Relevance: 10.00%

Abstract:

Fragment assembly is among the core problems in genome research. Although many assembly tools based on the "overlap-layout-consensus" paradigm are in wide use, for example in the Human Genome Project, they still cannot resolve the "repeat" problem in DNA sequencing. To resolve this problem, Pevzner et al. put forward a new Euler Superpath assembly algorithm, but it needs a big and complex de Bruijn graph, which consumes large amounts of memory and so becomes the performance bottleneck. We present a parallel DNA fragment assembly algorithm based on Eulerian Superpath theory that removes this bottleneck in the current assembly program. The experimental results demonstrate that our approach has good scalability and can be used in the DNA assembly of medium- and large-sized eukaryotic genomes.
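
A sketch of one way to attack the memory bottleneck: hash-partitioning the de Bruijn edges so that each worker holds only a slice of the graph. This is illustrative; the paper's actual partitioning scheme may differ.

```python
import zlib
from collections import defaultdict

def partition_kmers(reads, k, n_parts):
    """Hash-partition de Bruijn edges across n_parts workers.
    The owner is chosen by hashing the edge's source (k-1)-mer,
    so all out-edges of a node land in the same partition."""
    parts = [defaultdict(list) for _ in range(n_parts)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            owner = zlib.crc32(kmer[:-1].encode()) % n_parts
            parts[owner][kmer[:-1]].append(kmer[1:])
    return parts

reads = ["TACGGA", "ACGGAT", "CGGATC"]
for idx, part in enumerate(partition_kmers(reads, 3, 2)):
    print(idx, dict(part))
```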

Relevance: 10.00%

Abstract:

The method of Fields and Backofen has been commonly used to reduce data obtained from the hot torsion test into flow curves. The method, however, is most suitable for materials with monotonic strain-hardening behaviour, and other methods, such as Stüwe's method, tubular specimens, differential testing and the inverse method, each suffer from similar drawbacks. It is shown in the current work that, for materials with multiple regimes of hardening, any method based on an assumption of constant hardening indices introduces errors into the flow curve obtained from the hot torsion test. Such methods therefore do not enable accurate prediction of the onset of recrystallisation, where slow softening occurs. A new method is presented for converting hot torsion test results into flow curves while taking into account the variation of the constitutive parameters during deformation. The method represents the torque-twist data by a parametric linear least-squares model in which Euler and hyperbolic coefficients are used as the parameters; a closed-form relationship obtained from this mathematical representation of the data is then employed for flow stress determination. Two solution strategies, the method of normal equations and singular value decomposition, were used for parametric modelling of the data with hyperbolic basis functions, and the performance of the two is compared. Experimental data obtained with FHTTM, a flexible hot torsion test machine developed at IROST, for a C–Mn austenitic steel were used to demonstrate the method, and the results were compared with those obtained using constant strain and strain-rate hardening characteristics.
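
A small numerical sketch of the two solution strategies named above, fitting a toy torque-twist curve with hyperbolic basis functions; the basis and data are invented for illustration and are not the paper's model:

```python
import numpy as np

# Toy torque-twist data modelled with hyperbolic basis functions;
# the design matrix columns play the role of the parametric
# linear least-squares model described in the abstract.
theta = np.linspace(0.1, 5.0, 50)                 # twist
torque = 3.0 * np.tanh(theta) + 0.5 * np.sinh(0.3 * theta)
torque += 0.01 * np.random.default_rng(0).standard_normal(theta.size)

A = np.column_stack([np.tanh(theta), np.sinh(0.3 * theta)])

# Strategy 1: normal equations (fast, but squares the condition number)
coef_ne = np.linalg.solve(A.T @ A, A.T @ torque)

# Strategy 2: SVD (numerically robust; what np.linalg.lstsq uses)
coef_svd, *_ = np.linalg.lstsq(A, torque, rcond=None)

print(coef_ne, coef_svd)
```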

Relevance: 10.00%

Abstract:

Selecting a set of features that is optimal for a given task is a problem which plays an important role in a wide variety of contexts, including pattern recognition, image understanding and machine learning. This paper describes an application of the rough sets method to feature selection and reduction in texture image recognition. The proposed methods include continuous data discretization based on a Kohonen neural network and maximum covariance, and rough set algorithms for feature selection and reduction. Experiments on tree extraction from aerial images show that the methods presented in this paper are practical and effective.
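
To complement the abstract, here is a minimal sketch of the rough-set dependency degree that drives feature selection and reduction: a block of indiscernible rows counts toward the positive region when it is consistent on the decision attribute. The toy decision table is invented for illustration.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group rows by their values on the given attributes
    (the indiscernibility relation of rough set theory)."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Fraction of rows whose decision is determined by attrs
    (size of the positive region over the number of rows)."""
    pos = 0
    for block in partition(rows, attrs):
        decisions = {rows[i][decision] for i in block}
        if len(decisions) == 1:          # block is consistent
            pos += len(block)
    return pos / len(rows)

# toy decision table: columns 0..2 are condition attributes, 3 is decision
rows = [
    (0, 1, 0, "yes"), (0, 1, 1, "yes"), (1, 0, 0, "no"),
    (1, 1, 0, "no"),  (0, 0, 1, "yes"),
]
for attrs in [(0,), (1,), (0, 1), (0, 1, 2)]:
    print(attrs, dependency(rows, attrs, 3))   # attribute 0 alone is a reduct
```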

Relevance: 10.00%

Abstract:

This paper introduces a new technique for object classification and illustrates its potential for the analysis of a range of biological data, using avian morphometric data as an example. The nascent variable precision rough sets (VPRS) model is introduced and compared with the decision tree method ID3 (through a 'leave n out' approach), using the same dataset of morphometric measures of European barn swallows (Hirundo rustica) and assessing the accuracy of gender classification based on these measures. The results demonstrate that the VPRS model, allied with a modern method of data discretization, is comparable with the more traditional non-parametric ID3 decision tree method. We show that, particularly in small samples, the VPRS model can improve classification and, to a lesser extent, prediction relative to ID3. Furthermore, the 'leave n out' approach gives some indication of the relative importance of the different morphometric measures used in this problem. In this case we suggest that VPRS has advantages over ID3, as it intelligently uses more of the available morphometric data for classification whilst placing less emphasis on variables with low reliability. In biological terms, the results suggest that the gender of swallows can be determined with reasonable accuracy from morphometric data, and they highlight the most important variables in this process. We suggest that both analysis techniques are potentially useful for the analysis of a range of different types of biological datasets, and that VPRS in particular has potential for application in a range of biological circumstances.
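
A minimal sketch of the VPRS relaxation, assuming the standard beta-lower approximation; the blocks and gender labels below are hypothetical:

```python
def vprs_lower(blocks, target, beta=0.8):
    """Variable precision rough sets relax strict inclusion: a block
    of indiscernible objects joins the beta-lower approximation of
    `target` when at least a fraction beta of it lies in target, so
    a few misclassified objects no longer destroy the approximation."""
    lower = set()
    for block in blocks:
        overlap = len(block & target) / len(block)
        if overlap >= beta:
            lower |= block
    return lower

# equivalence classes of birds indiscernible on the chosen measures
blocks = [frozenset({1, 2, 3}), frozenset({4, 5}),
          frozenset({6, 7, 8, 9, 10})]
males = {1, 2, 3, 4, 6, 7, 8, 9}     # hypothetical gender labels

print(sorted(vprs_lower(blocks, males, beta=1.0)))  # classical lower approx.
print(sorted(vprs_lower(blocks, males, beta=0.8)))  # VPRS tolerates noise
```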

Relevance: 10.00%

Abstract:

An optimization problem arising in the analysis of the controllability and stabilization of cycles in discrete-time chaotic systems is considered.