126 results for Evolutionary Polynomial Regression (EPR) for HydroSystems
Abstract:
Immune evasion by Plasmodium falciparum is favored by extensive allelic diversity of surface antigens. Some of them, most notably the vaccine-candidate merozoite surface protein (MSP)-1, exhibit a poorly understood pattern of allelic dimorphism, in which all observed alleles group into two highly diverged allelic families with few or no inter-family recombinants. Here we describe contrasting levels and patterns of sequence diversity in genes encoding three MSP-1-associated surface antigens of P. falciparum, ranging from an ancient allelic dimorphism in the Msp-6 gene to a near lack of allelic divergence in Msp-9 to a more classical multi-allele polymorphism in Msp-7. Other members of the Msp-7 gene family exhibit very little polymorphism in non-repetitive regions. A comparison of P. falciparum Msp-6 sequences to an orthologous sequence from P. reichenowi provided evidence for distinct evolutionary histories of the 5' and 3' segments of the dimorphic region in PfMsp-6, consistent with one dimorphic lineage having arisen from recombination between now-extinct ancestral alleles. In addition, we uncovered two surprising patterns of evolution in repetitive sequence. First, in Msp-6, large deletions are associated with (nearly) identical sequence motifs at their borders. Second, a comparison of PfMsp-9 with the P. reichenowi ortholog indicated retention of significant inter-unit diversity within an 18-base pair repeat within the coding region of P. falciparum, but homogenization in P. reichenowi. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
In this study, using a combined data set of SSU rDNA and gGAPDH gene sequences, we provide phylogenetic evidence that supports clustering of crocodilian trypanosomes from the Brazilian Caiman yacare (Alligatoridae) and Trypanosoma grayi, a species that circulates between African crocodiles (Crocodylidae) and tsetse flies. In a survey of trypanosomes in Caiman yacare from the Brazilian Pantanal, the prevalence of trypanosome infection was 35% as determined by microhaematocrit and haemoculture, and 9 cultures were obtained. The morphology of trypomastigotes from caiman blood and tissue imprints was compared with those described for other crocodilian trypanosomes. Differences in morphology and growth behaviour of caiman trypanosomes were corroborated by molecular polymorphism that revealed 2 genotypes. Eight isolates were ascribed to genotype Cay01 and 1 to genotype Cay02. Phylogenetic inferences based on concatenated SSU rDNA and gGAPDH sequences showed that caiman isolates are closely related to T. grayi, constituting a well-supported monophyletic assemblage (clade T. grayi). Divergence time estimates based on clade composition, and biogeographical and geological events were used to discuss the relationships between the evolutionary histories of crocodilian trypanosomes and their hosts.
Abstract:
One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires that the number of clusters be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around the above-mentioned drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and initial positions of prototypes. To do so, a modified version of a (k-means based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses for the systematic and evolutionary algorithms of interest are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets. (C) 2010 Elsevier B.V. All rights reserved.
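The systematic (repetitive) baseline described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how competing partitions are scored, so the simplified silhouette used here is an assumption, and the names `kmeans`, `simplified_silhouette`, `systematic_search`, and `make_blobs` are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest prototype for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, rng, max_iter=100):
    """Plain k-means started from k prototypes drawn at random from the data."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # an empty cluster keeps its prototype
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def simplified_silhouette(points, centers, labels):
    """Assumed validity criterion (requires k >= 2): mean of (b - a) / max(a, b),
    with a the distance to the own prototype and b to the nearest other one."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = dist2(p, centers[lab]) ** 0.5
        b = min(dist2(p, c) for j, c in enumerate(centers) if j != lab) ** 0.5
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def systematic_search(points, k_values, restarts, seed=0):
    """Run k-means for every k and every restart; keep the best-scoring k."""
    rng = random.Random(seed)
    best_k, best_score, runs = None, float("-inf"), 0
    for k in k_values:
        for _ in range(restarts):
            centers, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centers, labels)
            runs += 1
            if score > best_score:
                best_k, best_score = k, score
    return best_k, best_score, runs

def make_blobs(blob_centers, per_blob, spread, seed=1):
    """Synthetic 2-D test data: uniform jitter around each given center."""
    rng = random.Random(seed)
    return [(cx + rng.uniform(-spread, spread), cy + rng.uniform(-spread, spread))
            for cx, cy in blob_centers for _ in range(per_blob)]
```

The number of k-means runs grows as the number of candidate k values times the number of restarts, which is the cost the abstract says an evolutionary search can reduce.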
Abstract:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
In this article, we generalize the Bayesian methodology introduced by Cepeda and Gamerman (2001) for modeling variance heterogeneity in normal regression models, where the mean and variance parameters are orthogonal, to the general case covering both linear and highly nonlinear regression models. Under the Bayesian paradigm, we use MCMC methods to simulate samples from the joint posterior distribution. We illustrate this algorithm with a simulated data set and with a real data set on school attendance rates for children in Colombia. Finally, we present some extensions of the proposed MCMC algorithm.
Abstract:
In this paper, we compare the performance of two statistical approaches for the analysis of data obtained from the social research area. In the first approach, we use normal models with joint regression modelling for the mean and for the variance heterogeneity. In the second approach, we use hierarchical models. In the first case, individual and social variables are included as explanatory variables in the regression modelling for the mean and for the variance, while in the second case, the variance at level 1 of the hierarchical model depends on the individuals (age of the individuals), and at level 2 of the hierarchical model, the variance is assumed to change according to socioeconomic stratum. Applying these methodologies, we analyze a Colombian tallness data set to find differences that can be explained by socioeconomic conditions. We also present some theoretical and empirical results concerning the two models. From this comparative study, we conclude that it is better to jointly model the mean and variance heterogeneity in all cases. We also observe that the Gibbs sampling chain used in the Markov chain Monte Carlo method for jointly modelling the mean and variance heterogeneity converges quickly.
Abstract:
This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e. repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses for both the systematic and evolutionary algorithms of interest are provided. Examples with computational experiments and statistical analyses are also presented.
Abstract:
Support vector machines (SVMs) were originally formulated for the solution of binary classification problems. In multiclass problems, a decomposition approach is often employed, in which the multiclass problem is divided into multiple binary subproblems, whose results are combined. Generally, the performance of SVM classifiers is affected by the selection of values for their parameters. This paper investigates the use of genetic algorithms (GAs) to tune the parameters of the binary SVMs in common multiclass decompositions. The developed GA may search for a set of parameter values common to all binary classifiers or for differentiated values for each binary classifier. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
In this article, we evaluate different parameter estimation strategies for a multiple linear regression model. To estimate the model parameters, we used data from a clinical trial whose aim was to verify whether the mechanical test of the maximum force property (EM-FM) is associated with femoral mass, femoral diameter, and experimental group in ovariectomized rats of the breed Rattus norvegicus albinus, Wistar variety. Three methodologies for estimating the model parameters are compared: the classical methodology, based on the method of least squares; the Bayesian methodology, based on Bayes' theorem; and the Bootstrap method, based on resampling procedures.
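Two of the three estimation strategies named in this abstract, least squares and the bootstrap, can be contrasted in a short sketch. It makes several assumptions: the data are synthetic (not the clinical trial data), the bootstrap resamples whole cases, the interval is a plain percentile interval, and all function names are hypothetical. The Bayesian strategy is left out because the abstract gives no priors.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def bootstrap_ols(X, y, n_boot, rng):
    """Refit the model on n_boot samples of cases drawn with replacement."""
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(ols([X[i] for i in idx], [y[i] for i in idx]))
    return estimates

def percentile_interval(values, alpha=0.05):
    """Empirical (alpha/2, 1 - alpha/2) percentile interval of the replicates."""
    s = sorted(values)
    lo = s[int((alpha / 2) * (len(s) - 1))]
    hi = s[int((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi

if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic design: intercept plus two continuous predictors.
    X = [[1.0, rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(60)]
    y = [2.0 + 1.5 * r[1] - 0.7 * r[2] + rng.gauss(0, 0.5) for r in X]
    print("least squares:", ols(X, y))
    slopes = [b[1] for b in bootstrap_ols(X, y, 200, rng)]
    print("bootstrap 95% interval for the first slope:", percentile_interval(slopes))
```

Least squares returns a single point estimate, while the bootstrap replicates describe its sampling variability without relying on a normality assumption.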
Abstract:
There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined, creating a pool of rules, and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristic (ROC) curve.
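The data preparation step in this abstract (several balanced data sets, each holding every minority case plus a random sample of majority cases) might look like the sketch below. It assumes a two-class problem, a majority sample equal in size to the minority class, and sampling without replacement within each subset; the abstract specifies none of these. The function name `balanced_subsets` is hypothetical, and the rule induction and evolutionary selection stages are not shown.

```python
import random
from collections import Counter

def balanced_subsets(examples, labels, n_subsets, rng):
    """Build n_subsets balanced data sets from an imbalanced two-class one.

    Each subset contains all minority-class cases and an equally sized
    random sample (without replacement) of majority-class cases.
    """
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    majority_idx = [i for i, lab in enumerate(labels) if lab != minority]
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority_idx, len(minority_idx))
        idx = minority_idx + sampled
        rng.shuffle(idx)  # avoid a class-sorted order in the subset
        subsets.append([(examples[i], labels[i]) for i in idx])
    return subsets

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced data: 10 positive cases and 90 negative cases.
    examples = list(range(100))
    labels = ["pos"] * 10 + ["neg"] * 90
    for subset in balanced_subsets(examples, labels, 5, rng):
        print(Counter(lab for _, lab in subset))
```

Because each subset draws a fresh majority sample, the subsets together cover more majority cases than a single undersampled data set, which matches the abstract's point about discarding less information.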
Abstract:
Let $Y = (f, g, h): \mathbb{R}^3 \to \mathbb{R}^3$ be a $C^2$ map and let $\operatorname{Spec}(Y)$ denote the set of eigenvalues of the derivative $DY(p)$, when $p$ varies in $\mathbb{R}^3$. We begin by proving that if, for some $\epsilon > 0$, $\operatorname{Spec}(Y) \cap (-\epsilon, \epsilon) = \emptyset$, then the foliation $\mathcal{F}(k)$, with $k \in \{f, g, h\}$, made up of the level surfaces $\{k = \text{constant}\}$, consists just of planes. As a consequence, we prove a bijectivity result related to the three-dimensional case of Jelonek's Jacobian Conjecture for polynomial maps of $\mathbb{R}^n$.
Abstract:
In this paper, we classify all the global phase portraits of the quadratic polynomial vector fields having a rational first integral of degree 3. (C) 2008 Elsevier Ltd. All rights reserved.
Abstract:
A positive summability trigonometric kernel $\{K_n(\theta)\}_{n=1}^{\infty}$ is generated through a sequence of univalent polynomials constructed by Suffridge. We prove that the convolution $\{K_n * f\}$ approximates every continuous $2\pi$-periodic function $f$ with the rate $\omega(f, 1/n)$, where $\omega(f, \delta)$ denotes the modulus of continuity, and this provides a new proof of the classical Jackson's theorem. Although the $K_n(\theta)$ turn out to coincide with positive cosine polynomials generated by Fejér, our proof differs from others known in the literature.
Abstract:
The purpose of this paper is to develop a Bayesian analysis for nonlinear regression models under scale mixtures of skew-normal distributions. This novel class of models provides a useful generalization of the symmetrical nonlinear regression models since the error distributions cover both skewness and heavy-tailed distributions such as the skew-t, skew-slash and the skew-contaminated normal distributions. The main advantage of this class of distributions is that they have a nice hierarchical representation that allows the implementation of Markov chain Monte Carlo (MCMC) methods to simulate samples from the joint posterior distribution. In order to examine the robust aspects of this flexible class against outlying and influential observations, we present Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. Further, some discussions on the model selection criteria are given. The newly developed procedures are illustrated with two simulation studies and a real data set previously analyzed under normal and skew-normal nonlinear regression models. (C) 2010 Elsevier B.V. All rights reserved.
A bivariate regression model for matched paired survival data: local influence and residual analysis
Abstract:
The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.