867 results for Genetic Algorithm for Rule-Set Prediction (GARP)


Relevance:

40.00%

Abstract:

A new robust neurofuzzy model construction algorithm has been introduced for modeling a priori unknown dynamical systems from finite observed data sets in the form of a set of fuzzy rules. Based on a Takagi-Sugeno (T-S) inference mechanism, a one-to-one mapping between a fuzzy rule base and a model matrix feature subspace is established. This link enables rule-based knowledge to be extracted from the matrix subspace to enhance model transparency. To maximize model robustness and sparsity, a new robust extended Gram-Schmidt (G-S) method has been introduced via two effective and complementary approaches: regularization and D-optimality experimental design. Model rule bases are decomposed into orthogonal subspaces, which enhances model transparency by making it possible to interpret the energy level of the derived rule base. A locally regularized orthogonal least squares algorithm, combined with D-optimality for subspace-based rule selection, has been extended for fuzzy rule regularization and subspace-based information extraction. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. Numerical examples are included to demonstrate the effectiveness of the proposed algorithm.

Relevance:

40.00%

Abstract:

We present an efficient graph-based algorithm for quantifying the similarity of household-level energy use profiles, using a notion of similarity that allows for small time shifts when comparing profiles. Experimental results on a real smart meter data set demonstrate that in cases of practical interest our technique is far faster than the existing method for computing the same similarity measure. Having a fast algorithm for measuring profile similarity improves the efficiency of tasks such as clustering of customers and cross-validation of forecasting methods using historical data. Furthermore, we apply a generalisation of our algorithm to produce substantially better household-level energy use forecasts from historical smart meter data.

Relevance:

40.00%

Abstract:

Whole-genome sequencing (WGS) could potentially provide a single platform for extracting all the information required to predict an organism’s phenotype. However, its ability to provide accurate predictions has not yet been demonstrated in large independent studies of specific organisms. In this study, we aimed to develop a genotypic prediction method for antimicrobial susceptibilities. The whole genomes of 501 unrelated Staphylococcus aureus isolates were sequenced, and the assembled genomes were interrogated using BLASTn for a panel of known resistance determinants (chromosomal mutations and genes carried on plasmids). Results were compared with phenotypic susceptibility testing for 12 commonly used antimicrobial agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline, ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampin, and mupirocin) performed by the routine clinical laboratory. We investigated discrepancies by repeat susceptibility testing and manual inspection of the sequences and used this information to optimize the resistance determinant panel and BLASTn algorithm. We then tested performance of the optimized tool in an independent validation set of 491 unrelated isolates, with phenotypic results obtained in duplicate by automated broth dilution (BD Phoenix) and disc diffusion. In the validation set, the overall sensitivity and specificity of the genomic prediction method were 0.97 (95% confidence interval [95% CI], 0.95 to 0.98) and 0.99 (95% CI, 0.99 to 1), respectively, compared to standard susceptibility testing methods. The very major error rate was 0.5%, and the major error rate was 0.7%. WGS was as sensitive and specific as routine antimicrobial susceptibility testing methods. WGS is a promising alternative to culture methods for resistance prediction in S. aureus and ultimately other major bacterial pathogens.
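As a minimal sketch of how the sensitivity and specificity reported above are defined, the following Python function computes both from confusion-matrix counts, treating "resistant" as the positive class. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    "Positive" here means phenotypically resistant:
      tp: resistant isolates predicted resistant
      fn: resistant isolates predicted susceptible
      tn: susceptible isolates predicted susceptible
      fp: susceptible isolates predicted resistant
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, NOT the study's data.
sens, spec = sensitivity_specificity(tp=97, fp=5, tn=495, fn=3)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Note that the denominators used for the very major and major error rates vary between reporting conventions, so they are deliberately left out of this sketch.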

Relevance:

40.00%

Abstract:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms often contains a large number of partitions, making it difficult for an expert to analyze all of them manually. To deal with this problem, we present two selection strategies, based on the corrected Rand index, to choose a subset of solutions. To test them, we apply them to the sets of solutions produced by MOCK and MOCLE on several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both proposed selection strategies are very effective: they can significantly reduce the number of solutions while preserving the quality and diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
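The corrected (adjusted) Rand index that the selection strategies rely on measures agreement between two partitions, corrected for chance. The sketch below is a plain-Python implementation of the standard pair-counting formula; it does not reproduce the paper's selection strategies, and the function name and example labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Corrected Rand index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: partitions agree trivially

    # Pairs of objects placed together in both partitions, and in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical partitions of four objects.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # relabelled copy
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # no shared pairs
```

A value near 1 means two partitions are nearly identical, so a selection strategy could, for example, treat such a pair as redundant; how the paper actually uses the index is not specified in the abstract.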

Relevance:

40.00%

Abstract:

This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision.  Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes.  The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).

Relevance:

40.00%

Abstract:

Background: The sensitivity to microenvironmental changes varies among animals and may be under genetic control. It is essential to take this element into account when aiming at breeding robust farm animals. Here, linear mixed models with genetic effects in the residual variance part of the model can be used. Such models have previously been fitted using EM and MCMC algorithms. Results: We propose the use of double hierarchical generalized linear models (DHGLM), where the squared residuals are assumed to be gamma distributed and the residual variance is fitted using a generalized linear model. The algorithm iterates between two sets of mixed model equations, one on the level of observations and one on the level of variances. The method was validated using simulations and also by re-analyzing a data set on pig litter size that was previously analyzed using a Bayesian approach. The pig litter size data contained 10,060 records from 4,149 sows. The DHGLM was implemented using the ASReml software and the algorithm converged within three minutes on a Linux server. The estimates were similar to those previously obtained using Bayesian methodology, especially the variance components in the residual variance part of the model. Conclusions: We have shown that variance components in the residual variance part of a linear mixed model can be estimated using a DHGLM approach. The method enables analyses of animal models with large numbers of observations. An important future development of the DHGLM methodology is to include the genetic correlation between the random effects in the mean and residual variance parts of the model as a parameter of the DHGLM.

Relevance:

40.00%

Abstract:

Genomewide marker information can improve the reliability of breeding value predictions for young selection candidates in genomic selection. However, the cost of genotyping limits its use to elite animals, and how such selective genotyping affects the predictive ability of genomic selection models is an open question. We performed a simulation study to evaluate the quality of breeding value predictions for selection candidates based on different selective genotyping strategies in a population undergoing selection. The genome consisted of 10 chromosomes of 100 cM each. After 5,000 generations of random mating with a population size of 100 (50 males and 50 females), generation G(0) (reference population) was produced via a full factorial mating between the 50 males and 50 females from generation 5,000. Different levels of selection intensity (animals with the largest yield deviation values) in G(0), or random sampling (no selection), were used to produce the offspring of generation G(0) (generation G(1)). Five genotyping strategies were used to choose 500 animals in G(0) to be genotyped: 1) Random: randomly selected animals, 2) Top: animals with the largest yield deviation values, 3) Bottom: animals with the lowest yield deviation values, 4) Extreme: animals with the 250 largest and the 250 lowest yield deviation values, and 5) Less Related: less genetically related animals. The number of individuals in G(0) and G(1) was fixed at 2,500 each, and different levels of heritability were considered (0.10, 0.25, and 0.50). Additionally, all 5 selective genotyping strategies (Random, Top, Bottom, Extreme, and Less Related) were applied to an indicator trait in generation G(0), and the results were evaluated for the target trait in generation G(1), with the genetic correlation between the 2 traits set to 0.50.
The 5 genotyping strategies applied to individuals in G(0) (reference population) were compared in terms of their ability to predict the genetic values of the animals in G(1) (selection candidates). Lower correlations between genomic-based estimates of breeding values (GEBV) and true breeding values (TBV) were obtained when using the Bottom strategy. For the Random, Extreme, and Less Related strategies, the correlation between GEBV and TBV became slightly larger as selection intensity decreased and was largest when no selection occurred. These 3 strategies were better than the Top approach. In addition, the Extreme, Random, and Less Related strategies had the smallest predictive mean squared errors (PMSE), followed by the Top and Bottom methods. Overall, the Extreme genotyping strategy led to the best predictive ability of breeding values, indicating that animals with extreme yield deviation values in a reference population are the most informative when training genomic selection models.

Relevance:

40.00%

Abstract:

The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes. Finally, the decision-tree classifier generated is applied to the set of genes of this organism to estimate essentiality for each gene. We applied the NTPGE approach for discovering the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.

Relevance:

40.00%

Abstract:

Genetic gains predicted for selection, based on both individual performance and progeny testing, were compared to provide information to be used in implementation of progeny testing for a Nelore cattle breeding program. The prediction of genetic gain based on progeny testing was obtained from a formula, derived from the methodology of Young and Weiler (J. Genetics 57: 329-338, 1960) for two-stage selection, which allows prediction of genetic gain per generation when the individuals under test have been pre-selected on the basis of their own performance. The application of this formula also allowed determination of the number of progeny per tested bull needed to maximize genetic gain, when the total number of tested progeny is limited.

Relevance:

40.00%

Abstract:

Predictability is related to the uncertainty in the outcome of future events during the evolution of the state of a system. Cluster weighted modeling (CWM) is interpreted as a tool to detect such uncertainty and is applied to spatially distributed systems. As such, the simple prediction algorithm in conjunction with CWM forms a powerful set of methods to relate predictability and dimension.

Relevance:

40.00%

Abstract:

Additive and nonadditive genetic effects on preweaning weight gain (PWG) of a commercial crossbred population were estimated using different genetic models and estimation methods. The data set consisted of 103,445 records on purebred and crossbred Nelore-Hereford calves raised under pasture conditions on farms located in the south, southeast, and midwest regions of Brazil. In addition to breed additive and dominance effects, models including different epistasis covariables were tested. Models considering joint additive and environment (latitude) by genetic effects interactions were also applied. In a first step, analyses were carried out under animal models. In a second step, preadjusted records were analyzed using ordinary least squares (OLS) and ridge regression (RR). The results reinforced evidence that breed additive and dominance effects are not sufficient to explain the observed variability in preweaning traits of Bos taurus x Bos indicus calves, and that genotype x environment interaction plays an important role in the evaluation of crossbred calves. Data were ill-conditioned to estimate the effects of genotype x environment interactions. Models including these effects presented multicollinearity problems. In this case, RR seemed to be a powerful tool for obtaining more plausible and stable estimates. Estimated prediction error variances and variance inflation factors were drastically reduced, and many effects that were not significant under OLS became significant under RR. Predictions of PWG based on RR estimates were more acceptable from a biological perspective. In temperate and subtropical regions, calves with intermediate genetic compositions (close to 1/2 Nelore) exhibited greater predicted PWG. In the tropics, predicted PWG increased linearly as genotype got closer to Nelore. ©2006 American Society of Animal Science. All rights reserved.
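To illustrate why ridge regression can give more stable estimates than OLS when covariables are nearly collinear, here is a minimal Python sketch that solves the ridge normal equations (X'X + lam*I) b = X'y for two predictors without an intercept. The function name, the penalty value, and the toy data are assumptions for illustration; they are unrelated to the study's models or records.

```python
def ridge_two_predictors(rows, y, lam):
    """Solve (X'X + lam*I) b = X'y for two predictors, no intercept.

    rows: list of (x1, x2) pairs; y: list of responses.
    lam = 0 gives ordinary least squares.
    """
    s11 = sum(x1 * x1 for x1, _ in rows) + lam
    s22 = sum(x2 * x2 for _, x2 in rows) + lam
    s12 = sum(x1 * x2 for x1, x2 in rows)
    t1 = sum(x1 * yi for (x1, _), yi in zip(rows, y))
    t2 = sum(x2 * yi for (_, x2), yi in zip(rows, y))
    det = s11 * s22 - s12 * s12  # near zero when predictors are collinear
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Toy data with two almost identical predictors (hypothetical values).
rows = [(1.0, 1.01), (2.0, 1.99), (3.0, 3.02), (4.0, 3.98)]
y = [2.0, 4.1, 5.9, 8.0]

ols = ridge_two_predictors(rows, y, lam=0.0)
ridge = ridge_two_predictors(rows, y, lam=1.0)
print("OLS:  ", ols)
print("ridge:", ridge)
```

The penalty keeps the determinant away from zero, so the ridge coefficients have a smaller norm than the OLS coefficients and react less to small changes in the data. In practice the penalty would be chosen by a criterion such as cross-validation.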

Relevance:

40.00%

Abstract:

The prediction of traffic behavior could help in making decisions about the routing process, as well as enable gains in effectiveness and productivity in physical distribution. This need motivated the search for technological improvements in routing performance in metropolitan areas. The purpose of this paper is to present computational evidence that an Artificial Neural Network (ANN) could be used to predict traffic behavior in a metropolitan area such as São Paulo (around 16 million inhabitants). The proposed methodology involves the application of Rough-Fuzzy Sets to define an inference morphology for inserting the behavior of Dynamic Routing into a structured rule basis, without human expert aid. The dynamics of the traffic parameters are described through membership functions. Rough Sets Theory identifies the attributes that are important and suggests Fuzzy relations to be inserted into a Rough Neuro Fuzzy Network (RNFN) of the Multilayer Perceptron (MLP) type and of the Radial Basis Function (RBF) type, in order to get an optimal surface response. To measure the performance of the proposed RNFN, the responses of the unreduced rule basis are compared with those of the reduced one. The results show that, by making use of Feature Reduction through RNFN, it is possible to reduce the need for a human expert in the construction of the Fuzzy inference mechanism for flow processes such as traffic breakdown. © 2011 IEEE.

Relevance:

40.00%

Abstract:

A data set based on 50 studies including feed intake and utilization traits was used to perform a meta-analysis to obtain pooled estimates using the variance between studies of genetic parameters for average daily gain (ADG); residual feed intake (RFI); metabolic body weight (MBW); feed conversion ratio (FCR); and daily dry matter intake (DMI) in beef cattle. The total data set included 128 heritability and 122 genetic correlation estimates published in the literature from 1961 to 2012. The meta-analysis was performed using a random effects model where the restricted maximum likelihood estimator was used to evaluate variances among clusters. Also, a meta-analysis using the method of cluster analysis was used to group the heritability estimates. Two clusters were obtained for each trait by different variables. It was observed, for all traits, that the heterogeneity of variance was significant between clusters and studies for genetic correlation estimates. The pooled estimates, adding the variance between clusters, for direct heritability estimates for ADG, DMI, RFI, MBW and FCR were 0.32 +/- 0.04, 0.39 +/- 0.03, 0.31 +/- 0.02, 0.31 +/- 0.03 and 0.26 +/- 0.03, respectively. Pooled genetic correlation estimates ranged from -0.15 to 0.67 among ADG, DMI, RFI, MBW and FCR. These pooled estimates of genetic parameters could be used to solve genetic prediction equations in populations where data is insufficient for variance component estimation. Cluster analysis is recommended as a statistical procedure to combine results from different studies to account for heterogeneity.
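The pooling step of a random effects meta-analysis can be sketched as follows. The abstract states that a restricted maximum likelihood estimator was used for the between-study variance; the sketch below swaps in the simpler, non-iterative DerSimonian-Laird moment estimator, so it is an illustration of the general idea and not the authors' procedure. The function name and the input values are hypothetical.

```python
def pool_random_effects(estimates, variances):
    """Pool study estimates with DerSimonian-Laird random-effects weights.

    Returns (pooled estimate, standard error, between-study variance tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sw_star
    return pooled, (1.0 / sw_star) ** 0.5, tau2


# Hypothetical heritability estimates and sampling variances from two studies.
pooled, se, tau2 = pool_random_effects([0.2, 0.4], [0.01, 0.01])
print(f"pooled={pooled:.2f} +/- {se:.2f}, tau2={tau2:.3f}")
```

When the studies disagree more than their sampling variances explain, tau2 becomes positive and the pooled standard error widens, which is how heterogeneity between studies enters the pooled estimate.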