839 resultados para Polynomial Classifier
Resumo:
Age-related changes in running kinematics have been reported in the literature using classical inferential statistics. However, this approach has been hampered by the increased number of biomechanical gait variables reported and subsequently the lack of differences presented in these studies. Data mining techniques have been applied in recent biomedical studies to solve this problem using a more general approach. In the present work, we re-analyzed lower extremity running kinematic data of 17 young and 17 elderly male runners using the Support Vector Machine (SVM) classification approach. In total, 31 kinematic variables were extracted to train the classification algorithm and test the generalized performance. The results revealed different accuracy rates across three different kernel methods adopted in the classifier, with the linear kernel performing the best. A subsequent forward feature selection algorithm demonstrated that with only six features, the linear kernel SVM achieved 100% classification performance rate, showing that these features provided powerful combined information to distinguish age groups. The results of the present work demonstrate potential in applying this approach to improve knowledge about the age-related differences in running gait biomechanics and encourages the use of the SVM in other clinical contexts. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Sugarcane bagasse hemicellulose was isolated in a one-step chemical extraction using hydrogen peroxide in alkaline media. The polysaccharide containing 80.9% xylose and small amounts of L-arabinose, 4-O-methyl-D-glucuronic acid and glucose, was hydrolyzed by crude enzymatic extracts from Thermoascus aurantiacus at 50 degrees C. Conditions of enzymatic hydrolysis leading to the best yields of xylose and xylooligosaccharides (DP 2-5) were investigated using substrate concentration in the range 0.5-3.5% (w/v), enzyme load 40-80 U/g of the substrate, and reaction time from 3 to 96 h, applying a 22 factorial design. The maximum conversion to xylooligosaccharides (37.1%) was obtained with 2.6% of substrate and xylanase load of 60 U/g. The predicted maximum yield of xylobiose by a polynomial model was 41.6%. Crude enzymatic extract of T. aurantiacus generate from sugarcane bagasse hemicellulose 39% of xylose, 59% of xylobiose, and 2% of other xylooligosaccharides.
Resumo:
A hybrid system to automatically detect, locate and classify disturbances affecting power quality in an electrical power system is presented in this paper. The disturbances characterized are events from an actual power distribution system simulated by the ATP (Alternative Transients Program) software. The hybrid approach introduced consists of two stages. In the first stage, the wavelet transform (WT) is used to detect disturbances in the system and to locate the time of their occurrence. When such an event is flagged, the second stage is triggered and various artificial neural networks (ANNs) are applied to classify the data measured during the disturbance(s). A computational logic using WTs and ANNs together with a graphical user interface (GU) between the algorithm and its end user is then implemented. The results obtained so far are promising and suggest that this approach could lead to a useful application in an actual distribution system. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
This paper describes the modeling of a weed infestation risk inference system that implements a collaborative inference scheme based on rules extracted from two Bayesian network classifiers. The first Bayesian classifier infers a categorical variable value for the weed-crop competitiveness using as input categorical variables for the total density of weeds and corresponding proportions of narrow and broad-leaved weeds. The inferred categorical variable values for the weed-crop competitiveness along with three other categorical variables extracted from estimated maps for the weed seed production and weed coverage are then used as input for a second Bayesian network classifier to infer categorical variables values for the risk of infestation. Weed biomass and yield loss data samples are used to learn the probability relationship among the nodes of the first and second Bayesian classifiers in a supervised fashion, respectively. For comparison purposes, two types of Bayesian network structures are considered, namely an expert-based Bayesian classifier and a naive Bayes classifier. The inference system focused on the knowledge interpretation by translating a Bayesian classifier into a set of classification rules. The results obtained for the risk inference in a corn-crop field are presented and discussed. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
In this paper, we address the problem of scheduling jobs in a no-wait flowshop with the objective of minimising the total completion time. This problem is well-known for being nondeterministic polynomial-time hard, and therefore, most contributions to the topic focus on developing algorithms able to obtain good approximate solutions for the problem in a short CPU time. More specifically, there are various constructive heuristics available for the problem [such as the ones by Rajendran and Chaudhuri (Nav Res Logist 37: 695-705, 1990); Bertolissi (J Mater Process Technol 107: 459-465, 2000), Aldowaisan and Allahverdi (Omega 32: 345-352, 2004) and the Chins heuristic by Fink and Voa (Eur J Operat Res 151: 400-414, 2003)], as well as a successful local search procedure (Pilot-1-Chins). We propose a new constructive heuristic based on an analogy with the two-machine problem in order to select the candidate to be appended in the partial schedule. The myopic behaviour of the heuristic is tempered by exploring the neighbourhood of the so-obtained partial schedules. The computational results indicate that the proposed heuristic outperforms existing ones in terms of quality of the solution obtained and equals the performance of the time-consuming Pilot-1-Chins.
Resumo:
In this paper, the method of Galerkin and the Askey-Wiener scheme are used to obtain approximate solutions to the stochastic displacement response of Kirchhoff plates with uncertain parameters. Theoretical and numerical results are presented. The Lax-Milgram lemma is used to express the conditions for existence and uniqueness of the solution. Uncertainties in plate and foundation stiffness are modeled by respecting these conditions, hence using Legendre polynomials indexed in uniform random variables. The space of approximate solutions is built using results of density between the space of continuous functions and Sobolev spaces. Approximate Galerkin solutions are compared with results of Monte Carlo simulation, in terms of first and second order moments and in terms of histograms of the displacement response. Numerical results for two example problems show very fast convergence to the exact solution, at excellent accuracies. The Askey-Wiener Galerkin scheme developed herein is able to reproduce the histogram of the displacement response. The scheme is shown to be a theoretically sound and efficient method for the solution of stochastic problems in engineering. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
This paper presents a family of algorithms for approximate inference in credal networks (that is, models based on directed acyclic graphs and set-valued probabilities) that contain only binary variables. Such networks can represent incomplete or vague beliefs, lack of data, and disagreements among experts; they can also encode models based on belief functions and possibilistic measures. All algorithms for approximate inference in this paper rely on exact inferences in credal networks based on polytrees with binary variables, as these inferences have polynomial complexity. We are inspired by approximate algorithms for Bayesian networks; thus the Loopy 2U algorithm resembles Loopy Belief Propagation, while the Iterated Partial Evaluation and Structured Variational 2U algorithms are, respectively, based on Localized Partial Evaluation and variational techniques. (C) 2007 Elsevier Inc. All rights reserved.
Resumo:
Starting from the Durbin algorithm in polynomial space with an inner product defined by the signal autocorrelation matrix, an isometric transformation is defined that maps this vector space into another one where the Levinson algorithm is performed. Alternatively, for iterative algorithms such as discrete all-pole (DAP), an efficient implementation of a Gohberg-Semencul (GS) relation is developed for the inversion of the autocorrelation matrix which considers its centrosymmetry. In the solution of the autocorrelation equations, the Levinson algorithm is found to be less complex operationally than the procedures based on GS inversion for up to a minimum of five iterations at various linear prediction (LP) orders.
Resumo:
The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
Resumo:
A total of 152,145 weekly test-day milk yield records from 7317 first lactations of Holstein cows distributed in 93 herds in southeastern Brazil were analyzed. Test-day milk yields were classified into 44 weekly classes of DIM. The contemporary groups were defined as herd-year-week of test-day. The model included direct additive genetic, permanent environmental and residual effects as random and fixed effects of contemporary group and age of cow at calving as covariable, linear and quadratic effects. Mean trends were modeled by a cubic regression on orthogonal polynomials of DIM. Additive genetic and permanent environmental random effects were estimated by random regression on orthogonal Legendre polynomials. Residual variances were modeled using third to seventh-order variance functions or a step function with 1, 6,13,17 and 44 variance classes. Results from Akaike`s and Schwarz`s Bayesian information criterion suggested that a model considering a 7th-order Legendre polynomial for additive effect, a 12th-order polynomial for permanent environment effect and a step function with 6 classes for residual variances, fitted best. However, a parsimonious model, with a 6th-order Legendre polynomial for additive effects and a 7th-order polynomial for permanent environmental effects, yielded very similar genetic parameter estimates. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
Despite the increase in the use of natural compounds in place of synthetic derivatives as antioxidants in food products, the extent of this substitution is limited by cost constraints. Thus, the objective of this study was to explore the synergism on the antioxidant activity of natural compounds, for further application in food products. Three hydrosoluble compounds (x(1) = caffeic acid, x(2) = carnosic acid, and x(3) = glutathione) and three liposoluble compounds (x(1) = quercetin, x(2) = rutin, and x(3) = genistein) were mixed according to a ""centroid simplex design"". The antioxidant activity of the mixtures was analyzed by the ferric reducing antioxidant power (FRAP) and oxygen radical absorbance capacity (ORAL) methodologies, and activity was also evaluated in an oxidized mixed micelle prepared with linoleic acid (LAOX). Cubic polynomial models with predictive capacity were obtained when the mixtures were submitted to the LAOX methodology ((y) over cap = 0.56 x(1) + 0.59 x(2) + 0.04 x(3) + 0.41 x(1)x(2) - 0.41 x(1)x(3) - 1.12 x(2)x(3) - 4.01 x(1)x(2)x(3)) for the hydrosoluble compounds, and to FRAP methodology ((y) over cap = 3.26 x(1) + 2.39 x(2) + 0.04 x(3) + 1.51 x(1)x(2) + 1.03 x(1)x(3) + 0.29 x(1)x(3) + 3.20 x(1)x(2)x(3)) for the liposoluble compounds. Optimization of the models suggested that a mixture containing 47% caffeic acid + 53% carnosic acid and a mixture containing 67% quercetin + 33% rutin were potential synergistic combinations for further evaluation using a food matrix.
Resumo:
Recently, we have built a classification model that is capable of assigning a given sesquiterpene lactone (STL) into exactly one tribe of the plant family Asteraceae from which the STL has been isolated. Although many plant species are able to biosynthesize a set of peculiar compounds, the occurrence of the same secondary metabolites in more than one tribe of Asteraceae is frequent. Building on our previous work, in this paper, we explore the possibility of assigning an STL to more than one tribe (class) simultaneously. When an object may belong to more than one class simultaneously, it is called multilabeled. In this work, we present a general overview of the techniques available to examine multilabeled data. The problem of evaluating the performance of a multilabeled classifier is discussed. Two particular multilabeled classification methods-cross-training with support vector machines (ct-SVM) and multilabeled k-nearest neighbors (M-L-kNN)were applied to the classification of the STLs into seven tribes from the plant family Asteraceae. The results are compared to a single-label classification and are analyzed from a chemotaxonomic point of view. The multilabeled approach allowed us to (1) model the reality as closely as possible, (2) improve our understanding of the relationship between the secondary metabolite profiles of different Asteraceae tribes, and (3) significantly decrease the number of plant sources to be considered for finding a certain STL. The presented classification models are useful for the targeted collection of plants with the objective of finding plant sources of natural compounds that are biologically active or possess other specific properties of interest.
Resumo:
This paper examines the article system in interlanguage grammar focusing on Japanese learners of English, whose native language lacks articles. It will be demonstrated that for the acquisition of the English article system, count/mass distinctions and definiteness are the crucial factors. Although Japanese does not employ the article system to encode these aspects, it will be argued that they are nevertheless syntactically encoded through its classifier system. Hence, the problem for these learners must be to map these features onto the appropriate surface forms as the Missing Surface Inflection Hypothesis predicts (Prévost & White 2000). This suggestion will further be supported empirically by a fill-in-the article task. It will be concluded that these Japanese learners understand the English article system fairly well, possibly due to their native language, yet have problems with realizing the relevant features (i.e. count/mass distinctions and definiteness) in the target language.
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Resumo:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been observed in the literature so far. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. In this paper, feature selection techniques are firstly described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms: support vector machine and probability classifier are chosen to be the spike occurrence predictors and are discussed in details. Realistic market data are used to test the proposed model with promising results.