998 results for "validação cruzada" (cross-validation)


Relevance: 60.00%

Abstract:

One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences in worldwide databases. Gene expression in prokaryotes begins when the RNA polymerase enzyme interacts with DNA regions called promoters, where the main regulatory elements of the transcription process are located. Despite improvements in in vitro techniques for molecular biology analysis, characterizing and identifying a large number of promoters in a genome is a complex task. The main drawback is the absence of a large set of promoters from which to identify conserved patterns among species. Hence, an in silico method to predict promoters in any species remains a challenge. Improved promoter prediction methods can be one step towards more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques, namely Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART, k-NN and ensemble approaches (Bagging and Boosting), on the task of predicting Bacillus subtilis promoters. To do so, we first built two data sets of promoter and non-promoter sequences, for B. subtilis and a hybrid one. The ML methods were evaluated with a cross-validation procedure. Good results were obtained with ML methods such as SVM and Naïve Bayes on the B. subtilis data. However, we did not reach good results on the hybrid database.
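The abstract does not say how many folds were used or which software ran the evaluation, so the following is only a minimal sketch of a generic k-fold cross-validation loop. The names `k_fold_indices`, `cross_validate`, `train_fn` and `predict_fn` are hypothetical and are not taken from the work.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k disjoint test folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, k, train_fn, predict_fn, seed=0):
    """Return the classification error rate measured on each held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        # Train only on the samples outside the current test fold.
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        wrong = sum(predict_fn(model, samples[i]) != labels[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return errors


# Toy stand-in classifier (an assumption): always predicts the majority class.
def train_majority(train_samples, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, sample):
    return model
```

Averaging the returned list gives the cross-validated error estimate for one classifier; any real learner can be plugged in through the two callbacks.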

Relevance: 60.00%

Abstract:

Classifying proteins into structural classes, which involves inferring patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory, and the problem has therefore drawn the attention of many researchers in Bioinformatics. Given the large gap between the number of known protein sequences and the number of three-dimensional structures determined experimentally, the demand for automated techniques for the structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential. In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work suffers from imbalanced classes, class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize this problem. To evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured by the mean classification error rate on independent test sets. These means are compared pairwise with a hypothesis test to determine whether the difference between them is statistically significant.
Among the individual classifiers, Support Vector Machine achieved the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that of the individual classifiers, especially Boosting with Decision Tree and StackingC with Linear Regression as the meta-classifier. The Voting method, despite its simplicity, proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, they did improve the classification error for the minority class, and in this respect the NCL technique proved the most appropriate.
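The abstract does not name the hypothesis test used for the pairwise comparisons. As one common possibility, and purely as an assumption, the sketch below computes a paired t statistic from the per-fold error rates of two classifiers evaluated on the same folds. The function name and the error values are illustrative only and are not results from the work.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for per-fold error rates of two classifiers.

    Both lists must come from the same folds, in the same order.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally long lists with at least 2 folds")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up per-fold error rates for two hypothetical classifiers (5 folds).
errors_a = [0.10, 0.12, 0.11, 0.09, 0.13]
errors_b = [0.15, 0.16, 0.14, 0.15, 0.17]

t = paired_t_statistic(errors_a, errors_b)
# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
significant = abs(t) > 2.776
```

A negative statistic here means classifier A has the lower mean error; the difference is declared significant only when the magnitude exceeds the critical value for the chosen level and number of folds.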

Relevance: 60.00%

Abstract:

One of the current major concerns in engineering is the development of aircraft with low power consumption and high performance. To that end, airfoils with a high Lift Coefficient and a low Drag Coefficient, and therefore a high Efficiency, are studied and designed. As the Efficiency increases, the aircraft's fuel consumption decreases, improving its performance. This work therefore aims to develop a tool for designing airfoils from desired characteristics, such as the Lift and Drag coefficients and the maximum Efficiency, using an algorithm based on an Artificial Neural Network (ANN). First, a database of aerodynamic characteristics for a total of 300 airfoils was collected with the software XFoil. Then several network architectures, both modular and hierarchical, were trained in MATLAB using the Back-propagation algorithm and the Momentum rule. Cross-validation was used for data analysis, selecting the network with the lowest Root Mean Square (RMS) value. The best result was obtained for a hierarchical architecture with two modules and one layer of hidden neurons. The airfoils generated by that network, in the regions of lowest RMS, were compared with the same airfoils imported into XFoil.

Relevance: 60.00%

Abstract:

Expanded Bed Adsorption (EBA) is an integrative process that combines concepts of chromatography and fluidization of solids. The many parameters involved and their synergistic effects complicate the optimization of the process. Fortunately, some mathematical tools have been developed to guide the investigation of EBA systems. In this work, the application of experimental design, phenomenological modeling and artificial neural networks (ANN) to understanding the adsorption of chitosanases on the ion exchange resin Streamline® DEAE was investigated. The strain Paenibacillus ehimensis NRRL B-23118 was used for chitosanase production. EBA experiments were carried out in a column of 2.6 cm inner diameter and 30.0 cm height coupled to a peristaltic pump. At the bottom of the column there was a distributor of glass beads 3.0 cm high. Residence time distribution (RTD) assays revealed a high degree of mixing; however, the Richardson-Zaki coefficients showed that the column was on the threshold of stability. Isotherm models fitted the adsorption equilibrium data in the presence of lyotropic salts. The experimental design results indicated that ionic strength and superficial velocity are important to the recovery and purity of the chitosanases. The molecular masses of the two chitosanases were approximately 23 kDa and 52 kDa, as estimated by SDS-PAGE. The phenomenological modeling aimed to describe batch and column chromatography operations. The simulations were performed in Microsoft Visual Studio. The kinetic rate constant model fitted the kinetic curves well at initial enzyme activities of 0.232, 0.142 and 0.079 UA/mL. The simulated breakthrough curves showed some differences from the experimental data, especially in the slope. Sensitivity tests of the model with respect to superficial velocity, axial dispersion and initial concentration agreed with the literature.
The neural network was built in MATLAB with the Neural Network Toolbox. Cross-validation was used to improve the network's ability to generalize. The ANN parameters were tuned, yielding the configurations 6-6 (enzyme activity) and 9-6 (total protein), with the tansig transfer function and the Levenberg-Marquardt training algorithm. The neural network simulations, covering all steps of the cycle, showed good agreement with the experimental data, with a correlation coefficient of approximately 0.974. The effects of the input variables on the profiles of the loading, washing and elution stages were consistent with the literature. (Author: Carlos Eduardo de Araújo Padilha, December 2013.)

Relevance: 60.00%

Abstract:

This work combines the potential of near-infrared (NIR) spectroscopy with chemometrics to determine the diclofenac content of tablets without destroying the sample, using ultraviolet spectroscopy, one of the official methods, as the reference method. In building the multivariate calibration models, several types of pre-processing of the NIR spectral data were studied, such as scatter correction and first derivative. The regression method used to build the calibration models was partial least squares (PLS), applied to NIR spectroscopic data from a set of 90 tablets divided into two sets (calibration and prediction). Of these, 54 samples were used for calibration and 36 for prediction, since the calibration used full cross-validation, which eliminates the need for a separate validation set. The models were evaluated by the correlation coefficient R² and the root mean square errors of calibration (RMSEC) and prediction (RMSEP). The values predicted for the remaining 36 samples were consistent with those obtained by UV spectroscopy.
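The abstract reports RMSEC and RMSEP without giving their formulas. A minimal sketch, assuming the plain root-mean-square definition with the number of samples in the denominator (some chemometrics texts subtract degrees of freedom for RMSEC), is shown below. The function name `rmse` is hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference values and model predictions."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Under this definition, RMSEC is `rmse` evaluated on the calibration samples and RMSEP is the same function evaluated on the prediction samples, with the UV spectroscopy values as the reference in both cases.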

Relevance: 60.00%

Abstract:

OBJECTIVE: To perform the cross-cultural adaptation of the Portuguese version of the Maslach Burnout Inventory for students and to investigate its reliability, validity and cross-cultural invariance. METHODS: Face validation involved a multidisciplinary team. Content validation was performed. The Portuguese version was completed online in 2009 by 958 Brazilian and 556 Portuguese university students from urban areas. Confirmatory factor analysis was carried out using χ²/df, the comparative fit index (CFI), the goodness of fit index (GFI) and the root mean square error of approximation (RMSEA) as fit indices. To verify the stability of the factor solution according to the original English version, cross-validation was performed on 2/3 of the total sample and replicated on the remaining 1/3. Convergent validity was estimated by the average variance extracted (AVE) and composite reliability (CR). Discriminant validity was assessed, and internal consistency was estimated by Cronbach's alpha coefficient. Concurrent validity was estimated by correlational analysis of the Portuguese version and the mean scores of the Copenhagen Burnout Inventory; divergent validity was assessed against the Beck Depression Inventory. The invariance of the model between the Brazilian and Portuguese samples was evaluated. RESULTS: The three-factor model of Exhaustion, Cynicism and Efficacy showed adequate fit (χ²/df = 8.498; CFI = 0.916; GFI = 0.902; RMSEA = 0.086). The factor structure was stable (λ: χ²dif = 11.383, p = 0.50; Cov: χ²dif = 6.479, p = 0.372; Residuals: χ²dif = 21.514, p = 0.121). Adequate convergent validity (AVE = 0.45; 0.64, CR = 0.82; 0.88), discriminant validity (ρ² = 0.06; 0.33) and internal consistency (α = 0.83; 0.88) were observed. The concurrent validity of the Portuguese version with the Copenhagen Inventory was adequate (r = 0.21; 0.74).
The assessment of the instrument's divergent validity was impaired by the conceptual closeness of the Exhaustion and Cynicism dimensions of the Portuguese version to the Beck Depression Inventory. The instrument was not invariant between the Brazilian and Portuguese samples (λ: χ²dif = 84.768, p < 0.001; Cov: χ²dif = 129.206, p < 0.001; Residuals: χ²dif = 518.760, p < 0.001). CONCLUSIONS: The Portuguese version of the Maslach Burnout Inventory for students showed adequate reliability and validity, but its factor structure was not invariant between the countries, indicating a lack of cross-cultural stability.
