905 results for Algorithms
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been put into practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include associative classification (Liu et al., 1998) and ensemble classification (Tumer, 1996).
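Of the categories listed above, the example-based methods are the simplest to illustrate. Below is a minimal k-nearest-neighbors sketch in Python; the data and the choice of Euclidean distance are purely illustrative and are not drawn from any of the cited works.

```python
# A minimal k-nearest-neighbors sketch (one of the example-based methods
# cited above); the data here are synthetic and purely illustrative.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # majority label

X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.8, 0.9])))       # -> 1
```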
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet appeared in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. In this paper, feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A brief introduction to the classification techniques is given for completeness. Two algorithms, the support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.
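As a rough illustration of the kind of spike-occurrence predictor described, here is a hedged sketch using a support vector machine from scikit-learn. The two features, the synthetic spike rule, and all parameter choices are assumptions for demonstration, not the authors' actual market attributes or model.

```python
# A hedged sketch of a spike-occurrence classifier; the feature names and
# synthetic data below are assumptions, not the authors' market attributes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical features: [demand level, reserve margin]; label 1 = price spike.
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 1.0).astype(int)            # synthetic spike rule

clf = SVC(kernel="rbf", probability=True).fit(X, y)
print(clf.predict_proba([[2.0, -0.5]]))              # estimated spike probability
```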
Abstract:
We present a fast method for finding optimal parameters for a low-resolution (threading) force field intended to distinguish correct from incorrect folds for a given protein sequence. In contrast to other methods, the parameterization uses information from >10^7 misfolded structures as well as a set of native sequence-structure pairs. In addition to testing the resulting force field's performance on the protein sequence threading problem, results are shown that characterize the number of parameters necessary for effective structure recognition.
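One simple way such parameters can be fitted, sketched below under the assumption of a score function linear in its parameters, is a perceptron-style update that pushes each native structure to score lower (better) than its decoys. This illustrates learning from misfolded structures in general; it is not the paper's actual optimization method.

```python
# A perceptron-style sketch of fitting linear score-function parameters so
# that each native structure scores lower than its decoys; illustrative only,
# not the paper's method.
import numpy as np

def fit_weights(native_feats, decoy_feats, lr=0.1, epochs=50):
    """native_feats[i], decoy_feats[i][j]: feature vectors (e.g. contact counts)."""
    w = np.zeros(native_feats[0].shape)
    for _ in range(epochs):
        for nat, decoys in zip(native_feats, decoy_feats):
            for dec in decoys:
                if w @ nat >= w @ dec:       # native not yet scored better (lower)
                    w += lr * (dec - nat)    # shift weights to penalize the decoy
    return w

nat = [np.array([1.0, 0.0])]
dec = [[np.array([0.0, 1.0])]]
print(fit_weights(nat, dec))                 # weights now rank the native lower
```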
Abstract:
Evolution strategies are a class of general optimisation algorithms which are applicable to functions that are multimodal, nondifferentiable, or even discontinuous. Although recombination operators have been introduced into evolution strategies, the primary search operator is still mutation. Classical evolution strategies rely on Gaussian mutations. A new mutation operator based on the Cauchy distribution is proposed in this paper. It is shown empirically that the new evolution strategy based on Cauchy mutation outperforms the classical evolution strategy on most of the 23 benchmark problems tested in this paper. The paper also shows empirically that changing the order of mutating the objective variables and mutating the strategy parameters does not alter the previous conclusion significantly, and that Cauchy mutations with different scaling parameters still outperform the Gaussian mutation with self-adaptation. However, the advantage of Cauchy mutations disappears when recombination is used in evolution strategies. It is argued that the search step size plays an important role in determining the performance of evolution strategies, and that the large step size of recombination plays a role similar to that of Cauchy mutation.
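A minimal sketch of the two mutation operators being compared, using NumPy's standard Gaussian and Cauchy generators; the vector length and scale parameters are illustrative only.

```python
# A minimal sketch of Gaussian versus Cauchy mutation; parameters illustrative.
import numpy as np

rng = np.random.default_rng(1)

def mutate_gaussian(x, sigma=1.0):
    return x + sigma * rng.standard_normal(x.shape)

def mutate_cauchy(x, scale=1.0):
    # Cauchy deviates have heavy tails, producing occasional long jumps
    # that help escape local optima.
    return x + scale * rng.standard_cauchy(x.shape)

x = np.zeros(5)
print(mutate_gaussian(x))
print(mutate_cauchy(x))
```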
Abstract:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
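The filter-and-refine strategy mentioned above can be sketched as follows: a cheap bounding-box test filters candidate pairs, and only the survivors reach the expensive exact geometry test. The object representation and nested-loop pairing below are assumptions; a parallel implementation would partition the objects across processors by spatial locality.

```python
# A sketch of the filter-and-refine strategy; object layout is an assumption.
def mbr_overlap(a, b):
    """a, b: minimum bounding rectangles as (xmin, ymin, xmax, ymax)."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def spatial_join(r_objs, s_objs, exact_test):
    results = []
    for r in r_objs:                              # a parallel system would
        for s in s_objs:                          # partition by spatial locality
            if mbr_overlap(r["mbr"], s["mbr"]):   # filter step (cheap)
                if exact_test(r, s):              # refine step (expensive)
                    results.append((r["id"], s["id"]))
    return results

r = [{"id": 1, "mbr": (0, 0, 2, 2)}]
s = [{"id": 7, "mbr": (1, 1, 3, 3)}]
print(spatial_join(r, s, lambda a, b: True))      # -> [(1, 7)]
```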
Abstract:
Subcycling algorithms which employ multiple timesteps have been previously proposed for explicit direct integration of first- and second-order systems of equations arising in finite element analysis, as well as for integration using explicit/implicit partitions of a model. The author has recently extended this work to implicit/implicit multi-timestep partitions of both first- and second-order systems. In this paper, improved algorithms for multi-timestep implicit integration are introduced that overcome some weaknesses of those proposed previously. In particular, in the second-order case, improved stability is obtained. Some of the energy conservation properties of the Newmark family of algorithms are shown to be preserved in the new multi-timestep extensions of the Newmark method. In the first-order case, the generalized trapezoidal rule is extended to multiple timesteps in a simple way that permits an implicit/implicit partition. Explicit special cases of the present algorithms exist, and these are compared to algorithms proposed previously. (C) 1998 John Wiley & Sons, Ltd.
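For reference, a scalar sketch of the generalized trapezoidal rule (the theta-method) on the model problem du/dt = -ku is given below; theta = 0 gives forward Euler, theta = 1 backward Euler, and theta = 0.5 the trapezoidal rule. The single-rate form is an assumption for brevity and does not show the multi-timestep partitioning that is the paper's contribution.

```python
# A scalar sketch of the generalized trapezoidal rule for du/dt = -k*u.
def theta_step(u, k, dt, theta):
    # (1 + theta*k*dt) * u_new = (1 - (1-theta)*k*dt) * u_old; solved exactly
    # here because the model problem is linear.
    return u * (1.0 - (1.0 - theta) * k * dt) / (1.0 + theta * k * dt)

u, k, dt = 1.0, 2.0, 0.1
for _ in range(10):                       # integrate to t = 1.0
    u = theta_step(u, k, dt, theta=0.5)
print(u)                                  # close to exp(-2) ~ 0.135
```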
Abstract:
Power system small signal stability analysis aims to explore different small signal stability conditions and controls, namely: (1) exploring the power system security domains and boundaries in the space of power system parameters of interest, including load flow feasibility, saddle-node bifurcation and Hopf bifurcation boundaries; (2) finding the maximum and minimum damping conditions; and (3) determining control actions to provide and increase small signal stability. These problems are presented in this paper as different modifications of a general optimization problem, whose solution converges to a minimum or maximum depending on the initial guesses of variables and the numerical methods used. In the considered problems, all the extreme points are of interest. Additionally, there are difficulties in finding the derivatives of the objective functions with respect to parameters, and numerical computation of derivatives in traditional optimization procedures is time consuming. In this paper, we propose a new black-box genetic optimization technique for comprehensive small signal stability analysis, which can effectively cope with highly nonlinear objective functions with multiple minima and maxima, and with derivatives that cannot be expressed analytically. The optimization result can then be used to provide such important information as optimal system control decisions, assessment of the network's maximum transmission capacity, etc. (C) 1998 Elsevier Science S.A. All rights reserved.
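A hedged, mutation-only sketch of black-box genetic optimization of the general kind described is shown below, applied to the Rastrigin test function; the test function, population size, and operators are assumptions, since the paper's actual objectives are power-system stability measures.

```python
# A mutation-only black-box genetic-optimization sketch on a multimodal
# test function; all choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def rastrigin(x):
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def genetic_minimize(f, dim=2, pop=40, gens=200, sigma=0.3):
    P = rng.uniform(-5.12, 5.12, size=(pop, dim))
    for _ in range(gens):
        fit = np.array([f(x) for x in P])
        parents = P[np.argsort(fit)[: pop // 2]]        # truncation selection
        children = parents + sigma * rng.standard_normal(parents.shape)
        P = np.vstack([parents, children])              # elitist replacement
    return min(P, key=f)

print(genetic_minimize(rastrigin))                      # near the optimum at 0
```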
Abstract:
This is the first paper in a study on the influence of the environment on the crack tip strain field for AISI 4340. A stressing stage for the environmental scanning electron microscope (ESEM) was constructed which was capable of applying loads up to 60 kN to fracture-mechanics samples. The measurement of the crack tip strain field required preparation (by electron lithography or chemical etching) of a system of reference points spaced at ~5 µm intervals on the sample surface, loading the sample inside the electron microscope, image processing procedures to measure the displacement at each reference point, and calculation of the strain field. Two algorithms for calculating strain were evaluated. Possible sources of error were calculation errors due to the algorithm, errors inherent in the image processing procedure, and errors due to the limited precision of the displacement measurements. The contribution of each source of error was estimated. The technique allows measurement of the crack tip strain field over an area of 50 × 40 µm with a strain precision better than ±0.02 at distances larger than 5 µm from the crack tip. (C) 1999 Kluwer Academic Publishers.
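The core computation, recovering a small-strain field from displacements measured at a regular grid of reference points, can be sketched as below; the grid spacing and the synthetic displacement field are assumptions for illustration.

```python
# A sketch of recovering an in-plane small-strain field from displacements
# on a regular grid of reference points; the displacement field is synthetic.
import numpy as np

h = 5.0                                   # grid spacing in micrometres
y, x = np.mgrid[0:40:h, 0:50:h]           # 50 x 40 um field of view
ux = 0.01 * x                             # synthetic displacements (um)
uy = 0.005 * y

dux_dy, dux_dx = np.gradient(ux, h)       # numerical derivatives
duy_dy, duy_dx = np.gradient(uy, h)

exx = dux_dx                              # small-strain components
eyy = duy_dy
exy = 0.5 * (dux_dy + duy_dx)
print(exx.mean(), eyy.mean(), exy.mean()) # -> ~0.01, ~0.005, ~0.0
```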
Abstract:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence-to-structure alignments using a more conventional pairwise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from the problem of nonconvergence that can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
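A Needleman-Wunsch-style sketch of the dynamic programming alignment that such a modified score function enables is given below; the generic score(i, j) callback and linear gap penalty stand in for the paper's actual sequence-to-structure terms.

```python
# A Needleman-Wunsch-style dynamic programming alignment sketch; score(i, j)
# and the gap penalty are placeholders for sequence-to-structure terms.
import numpy as np

def dp_align(n, m, score, gap=-1.0):
    """Globally align sequence positions 0..n-1 to structure positions 0..m-1."""
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = gap * np.arange(1, n + 1)
    D[0, 1:] = gap * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = max(D[i-1, j-1] + score(i-1, j-1),  # match residue to site
                          D[i-1, j] + gap,                # skip a sequence residue
                          D[i, j-1] + gap)                # skip a structure site
    return D[n, m]

print(dp_align(3, 3, lambda i, j: 1.0 if i == j else -1.0))  # -> 3.0
```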
Abstract:
In order to use the finite element method effectively and efficiently for solving fluid-rock interaction problems in pore-fluid saturated hydrothermal/sedimentary basins, we present in this paper new concepts and numerical algorithms to deal with the fundamental issues associated with these problems, issues that are often overlooked by some purely numerical modelers. (1) Since the fluid-rock interaction problem involves heterogeneous chemical reactions between reactive aqueous chemical species in the pore-fluid and solid minerals in the rock masses, it is necessary to develop the new concept of the generalized concentration of a solid mineral, so that two types of reactive mass transport equations, namely the conventional mass transport equation for the aqueous chemical species in the pore-fluid and the degenerated mass transport equation for the solid minerals in the rock mass, can be solved simultaneously in computation. (2) Since the reaction area between the pore-fluid and mineral surfaces is basically a function of the generalized concentration of the solid mineral, there is a definite need to appropriately consider the dependence of the dissolution rate of a dissolving mineral on its generalized concentration in the numerical analysis. (3) Considering the direct consequence of the porosity evolution with time in the transient analysis of fluid-rock interaction problems, we have proposed the term splitting algorithm and the concept of equivalent source/sink terms in mass transport equations, so that the problem of variable mesh Peclet and Courant numbers is converted into one of constant mesh Peclet and Courant numbers. The numerical results from an application example demonstrate the usefulness of the proposed concepts and the robustness of the proposed numerical algorithms in dealing with fluid-rock interaction problems in pore-fluid saturated hydrothermal/sedimentary basins. (C) 2001 Elsevier Science B.V. All rights reserved.
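For reference, the two mesh quantities the splitting algorithm keeps constant can be computed as below; the definitions follow one common convention, and the numerical values are illustrative, not taken from the application example.

```python
# Mesh Peclet and Courant numbers under one common convention; values are
# illustrative assumptions, not from the paper's application example.
def mesh_peclet(velocity, dx, dispersion):
    return velocity * dx / (2.0 * dispersion)

def courant(velocity, dt, dx):
    return velocity * dt / dx

v, dx, dt, D = 1e-8, 100.0, 1e9, 1e-6             # velocity (m/s), mesh size (m),
print(mesh_peclet(v, dx, D), courant(v, dt, dx))  # timestep (s), dispersion (m^2/s)
```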
Abstract:
The acquisition of HI Parkes All Sky Survey (HIPASS) southern sky data commenced at the Australia Telescope National Facility's Parkes 64-m telescope in 1997 February, and was completed in 2000 March. HIPASS is the deepest HI survey yet of the sky south of declination +2 degrees, and is sensitive to emission out to 170 h_75^-1 Mpc. The characteristic root mean square noise in the survey images is 13.3 mJy. This paper describes the survey observations, which comprise 23 020 eight-degree scans of 9-min duration, and details the techniques used to calibrate and image the data. The processing algorithms are designed to be statistically robust to the presence of interference signals, and are tailored to imaging point (or nearly point) sources. Specifically, a major improvement in image quality is obtained by a median-gridding algorithm which uses the median estimator in place of the mean estimator.
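The median-gridding idea can be sketched in a few lines: each image pixel takes the median, rather than the mean, of the scan samples falling on it, which makes the estimate robust to interference spikes. The sample values below are synthetic.

```python
# A sketch of why median gridding is robust: a few interference-corrupted
# samples bias the mean but barely move the median. Data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
samples = rng.normal(13.3, 1.0, size=50)  # repeated flux samples for one pixel (mJy)
samples[:3] = 500.0                       # a few interference-corrupted samples

print(np.mean(samples))                   # badly biased by the interference
print(np.median(samples))                 # robust estimate, close to 13.3
```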
Abstract:
Realistic time frames in which management decisions are made often preclude the completion of the detailed analyses necessary for conservation planning. Under these circumstances, efficient alternatives may assist in approximating the results of more thorough studies that require extensive resources and time. We outline a set of concepts and formulas that may be used in lieu of detailed population viability analyses and habitat modeling exercises to estimate the protected areas required to provide desirable conservation outcomes for a suite of threatened plant species. We used expert judgment of parameters and assessment of a population size that results in a specified quasi-extinction risk based on simple dynamic models. The area required to support a population of this size is adjusted to take into account deterministic and stochastic human influences, including small-scale disturbance, deterministic trends such as habitat loss, and changes in population density through processes such as predation and competition. We set targets for different disturbance regimes and geographic regions. We applied our methods to Banksia cuneata, Boronia keysii, and Parsonsia dorrigoensis, resulting in target areas for conservation of 1102, 733, and 1084 ha, respectively. These results provide guidance on target areas and priorities for conservation strategies.
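An illustrative back-of-envelope version of such an area calculation is sketched below; the minimum viable population, plant density, and adjustment factors are hypothetical placeholders, not the values behind the three species' targets.

```python
# An illustrative area calculation of the kind sketched above; every number
# here is a hypothetical placeholder, not a value from the study.
def target_area(min_viable_pop, plants_per_ha, disturbance_factor, loss_factor):
    base = min_viable_pop / plants_per_ha          # area holding the population
    return base * disturbance_factor * loss_factor # inflate for human influences

print(target_area(5000, 10, 1.5, 1.4))             # -> 1050.0 ha
```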
Abstract:
This paper is concerned with the use of scientific visualization methods for the analysis of feedforward neural networks (NNs). Inevitably, the kinds of data associated with the design and implementation of neural networks are of very high dimensionality, presenting a major challenge for visualization. A method is described using the well-known statistical technique of principal component analysis (PCA). This is found to be an effective and useful method of visualizing the learning trajectories of many learning algorithms such as back-propagation and can also be used to provide insight into the learning process and the nature of the error surface.
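A minimal sketch of the visualization step: snapshots of the network's weight vector collected during training are centred and projected onto their first two principal components via an SVD. The random-walk "training" below is a stand-in for actual back-propagation.

```python
# PCA projection of a learning trajectory; the random-walk weight history
# is a stand-in for snapshots taken during back-propagation training.
import numpy as np

rng = np.random.default_rng(4)
W = np.cumsum(rng.standard_normal((100, 50)), axis=0)  # 100 epochs x 50 weights

Wc = W - W.mean(axis=0)                    # centre the snapshots
_, _, Vt = np.linalg.svd(Wc, full_matrices=False)
traj2d = Wc @ Vt[:2].T                     # project onto first two PCs
print(traj2d.shape)                        # -> (100, 2), ready to plot
```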
Abstract:
Allergies are a major cause of chronic ill health in industrialised countries, with the incidence of reported cases steadily increasing. This Research Focus details how bioinformatics is transforming the field of allergy by providing databases for the management of allergen data, algorithms for characterising allergic cross-reactivity, structural motifs and B- and T-cell epitopes, tools for predicting allergenicity, and techniques for genomic and proteomic analysis of allergens.
Abstract:
Spatial data is now used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved because of performance issues related to the large sizes and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic can be reduced when the level of resolution required by applications is low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine, so the developer of spatial Web applications need not be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
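A hedged sketch of the server-side idea: the engine keeps each object at several resolution levels and answers a query with the coarsest level whose error bound satisfies the application's tolerance. The function name and data layout below are assumptions.

```python
# A sketch of resolution-level selection; names and structures are assumptions.
def choose_level(levels, required_tolerance):
    """levels: list of (max_error, geometry) pairs, coarsest first."""
    for max_error, geometry in levels:
        if max_error <= required_tolerance:
            return geometry                 # coarsest acceptable representation
    return levels[-1][1]                    # fall back to full resolution

coastline = [(1000.0, "coarse"), (100.0, "medium"), (1.0, "full")]
print(choose_level(coastline, 150.0))       # -> "medium"
```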