931 results for Bioinformatics


Abstract:

E-Science experiments typically involve many distributed services maintained by different organisations. After an experiment has been executed, it is useful for a scientist to verify that the execution was performed correctly or is compatible with some existing experimental criteria or standards, not necessarily anticipated prior to execution. Scientists may also want to review and verify experiments performed by their colleagues. There are no existing frameworks for validating such experiments in today's e-Science systems. Users therefore have to rely on error checking performed by the services, or adopt other ad hoc methods. This paper introduces a platform-independent framework for validating workflow executions. The validation relies on reasoning over the documented provenance of experiment results and semantic descriptions of services advertised in a registry. This validation process ensures experiments are performed correctly, and thus results generated are meaningful. The framework is tested in a bioinformatics application that performs protein compressibility analysis.
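The abstract describes validation as reasoning over documented provenance and over semantic service descriptions from a registry. The following is a minimal sketch of that idea, not the paper's framework. It assumes a hypothetical record format (`ProvenanceRecord`), a hypothetical registry entry (`ServiceDescription`) and a toy single-inheritance ontology (`parent_of`). Each documented step is checked so that the data it consumed and produced are subtypes of what the service advertises.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical semantic description of a service advertised in a registry."""
    name: str
    input_type: str   # ontology term the service accepts
    output_type: str  # ontology term the service promises to produce


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical documentation of one service invocation during a run."""
    service: str
    input_type: str   # ontology term of the data actually passed in
    output_type: str  # ontology term of the data actually produced


def is_subtype(term: str, expected: str, parent_of: Dict[str, Optional[str]]) -> bool:
    """True if `term` equals `expected` or descends from it.

    Walks up a single-inheritance hierarchy (child -> parent), which is
    assumed to be acyclic.
    """
    current: Optional[str] = term
    while current is not None:
        if current == expected:
            return True
        current = parent_of.get(current)
    return False


def validate_execution(records: List[ProvenanceRecord],
                       registry: Dict[str, ServiceDescription],
                       parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Check each documented step against its advertised description.

    Returns a list of problems; an empty list means every step was compatible.
    """
    problems = []
    for step, rec in enumerate(records):
        desc = registry.get(rec.service)
        if desc is None:
            problems.append(f"step {step}: {rec.service!r} is not in the registry")
            continue
        if not is_subtype(rec.input_type, desc.input_type, parent_of):
            problems.append(f"step {step}: {rec.service} got {rec.input_type}, "
                            f"expects {desc.input_type}")
        if not is_subtype(rec.output_type, desc.output_type, parent_of):
            problems.append(f"step {step}: {rec.service} produced {rec.output_type}, "
                            f"advertises {desc.output_type}")
    return problems
```

Because the check only reads records after the run, it can apply criteria that were not anticipated before execution. A real system would query an ontology reasoner rather than a hand-written parent map.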

Abstract:

Provenance refers to the past processes that brought about a given (version of an) object, item or entity. By knowing the provenance of data, users can often better understand, trust, reproduce, and validate it. A provenance-aware application has the functionality to answer questions regarding the provenance of the data it produces, by using documentation of past processes. PrIMe is a software engineering technique for adapting application designs to enable them to interact with a provenance middleware layer, thereby making them provenance-aware. In this article, we specify the steps involved in applying PrIMe, analyse its effectiveness, and illustrate its use with two case studies, in bioinformatics and medicine.
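PrIMe itself is a design methodology, not code. The sketch below only illustrates what it means for an application to "answer questions regarding the provenance of the data it produces, by using documentation of past processes." It assumes a hypothetical in-memory `ProvenanceStore`. The store records which process turned which inputs into which outputs, and it answers lineage queries by walking that documentation backwards.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


class ProvenanceStore:
    """Hypothetical store of process documentation (not a PrIMe API)."""

    def __init__(self) -> None:
        # data item -> (process that produced it, items that process consumed)
        self._produced_by: Dict[str, Tuple[str, Tuple[str, ...]]] = {}

    def record(self, process: str, inputs: Iterable[str], outputs: Iterable[str]) -> None:
        """Document one past process execution."""
        consumed = tuple(inputs)
        for out in outputs:
            self._produced_by[out] = (process, consumed)

    def produced_by(self, item: str) -> Optional[str]:
        """Name of the process that produced `item`, or None for raw inputs."""
        entry = self._produced_by.get(item)
        return None if entry is None else entry[0]

    def ancestors(self, item: str) -> Set[str]:
        """All data items that `item` was (transitively) derived from."""
        seen: Set[str] = set()
        queue = deque([item])
        while queue:
            entry = self._produced_by.get(queue.popleft())
            if entry is None:
                continue  # raw input: no documented process produced it
            for parent in entry[1]:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen
```

For example, after recording `align` (sequences to alignment) and `build_tree` (alignment to tree), `ancestors("tree")` returns the alignment and both input sequences.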

Abstract:

Microorganisms play very important roles in maintaining ecosystems, which explains the enormous interest in understanding the relationships among these organisms and between them and their environment. The total number of prokaryotic cells on Earth is estimated at between 4 × 10^30 and 6 × 10^30, an enormous biological and genetic pool to be explored. Although only about 1% of this wealth can currently be cultivated with standard laboratory techniques, metagenomic tools give culture-independent access to the genomic potential of environmental samples, and when combined with third-generation sequencing technologies, sample coverage becomes even greater. Soils, in particular, are the major reservoirs of this diversity, and many important environments, such as the Brazilian Caatinga and Atlantic Forest biomes, are poorly studied. Genetic material from soil samples of the Caatinga and Atlantic Forest biomes was therefore extracted by direct techniques and pyrosequenced, and the resulting sequences were analyzed with bioinformatics programs (MEGAN, MG-RAST and WebCARMA). Comparative taxonomic profiles of the samples showed that the phyla Proteobacteria, Actinobacteria, Acidobacteria and Planctomycetes were the most represented. In addition, fungi of the phylum Ascomycota were identified predominantly in the Atlantic Forest soil sample. Metabolic profiles showed that, despite environmental differences, sequences from both samples were distributed similarly across the functional subsystems, indicating no habitat-specific functions. This work, a pioneering comparative taxonomic and metabolic analysis of soil samples from Brazilian biomes, contributes to the knowledge of these complex and so far little-explored environmental systems.
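The comparative taxonomic profiles in the study come from MEGAN, MG-RAST and WebCARMA. The sketch below does not reproduce those tools. It only shows the basic arithmetic behind such a profile, under an assumed input: one phylum name per read, with `None` for reads that could not be assigned. The sketch computes relative abundances, ranks the most represented phyla, and takes per-phylum differences between two samples.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def phylum_profile(assignments: Iterable[Optional[str]]) -> Dict[str, float]:
    """Relative abundance of each phylum among assigned reads (unassigned reads excluded)."""
    counts = Counter(a for a in assignments if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phylum: n / total for phylum, n in counts.items()}


def most_represented(profile: Dict[str, float], n: int) -> List[str]:
    """Top-n phyla by relative abundance; ties broken alphabetically."""
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phylum for phylum, _ in ranked[:n]]


def compare_profiles(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Per-phylum difference in relative abundance (a minus b) over the union of phyla."""
    return {p: a.get(p, 0.0) - b.get(p, 0.0) for p in sorted(set(a) | set(b))}
```

Normalizing by assigned reads, rather than raw counts, keeps samples with different sequencing depths comparable.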

Abstract:

Knowledge of the native prokaryotes in hazardous locations favors the application of biotechnology to bioremediation. Culture-independent strategies and metagenomics extend microbiological knowledge, enabling studies of non-cultivable organisms that clarify the native microbiological status and its potential role in bioremediation, for example of polycyclic aromatic hydrocarbons (PAHs). Because the mangrove biome is a fragile and critical interface bordering the ocean, this study characterizes the PAH biodegradation potential of the native mangrove microbiota, using a molecular biomarker for detection and assessing bacterial diversity by PCR in areas under the influence of oil companies in the Potiguar Petroleum Basin (BPP). We chose PcaF, a metabolic enzyme, as the molecular biomarker for PCR-DGGE detection of PAH-degrading prokaryotes. The PCR-DGGE fingerprints obtained from the Paracuru-CE, Fortim-CE and Areia Branca-RN samples revealed fluctuations in the microbial communities according to the sampling period and in response to the impact of oil. The analysis of the oil industry's interference with the microbial communities in Areia Branca-RN and Paracuru-CE showed that oil is a determinant of microbial diversity. Fortim-CE is probably not directly influenced by oil activity. To better understand PAH transport and biodegradation, in silico modeling and simulation studies were conducted, producing 3-D models of proteins involved in phenanthrene degradation and PAH transport, as well as a 3-D model of PcaF, the enzyme used as the molecular marker in this study. Docking studies with substrates and products were also performed to better understand the mechanisms of PAH transport and catalysis.

Abstract:

The venom of Crotalus durissus terrificus contains various substances, including a thrombin-like serine protease called gyroxin, which clots plasma fibrinogen and promotes fibrin formation. The aim of this study was to purify and structurally characterize gyroxin from Crotalus durissus terrificus venom. The following methods were employed: gel filtration on a Sephadex G-75 column and affinity chromatography on benzamidine Sepharose 6B for isolation and purification; 12% SDS-PAGE under reducing conditions; N-terminal sequence analysis; cDNA cloning and expression by RT-PCR; and crystallization trials. Theoretical molecular modeling was performed with bioinformatics tools, based on comparative analysis of other serine proteases deposited in the NCBI (National Center for Biotechnology Information) database. N-terminal sequencing showed a single chain with a molecular mass of approximately 30 kDa, while the full-length cDNA had 714 bp and encoded a mature protein of 238 amino acids. Crystals were obtained from solutions 2 and 5 of the Crystal Screen Kit® (two and one crystals, respectively), revealing the protein constitution of the sample. Multiple sequence alignment of gyroxin-like B2.1 with six other snake venom serine proteases (SVSPs) indicated the conservation of the cysteine residues and of the main structural elements (alpha-helices, beta-barrels and loops). The catalytic triad was located at His57, Asp102 and Ser198, and the S1 and S2 specificity sites at Thr193 and Gly215. In the modeled gyroxin B2.1 sequence, the SVSP fibrinogen recognition and cleavage region was located at residues Arg60, Arg72, Gln75, Arg81, Arg82, Lys85, Glu86 and Lys87.
Theoretical modeling of the gyroxin fraction produced a classical structure consisting of two alpha-helices, two beta-barrel structures, five disulfide bridges, and loops at positions 37, 60, 70, 99, 148, 174 and 218. These results provide information about the functional structure of gyroxin, allowing its use in the design of new drugs.

Abstract:

One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences stored in worldwide databases. Gene expression in prokaryotes begins when the enzyme RNA polymerase interacts with DNA regions called promoters, which contain the main regulatory elements of transcription. Despite improvements in in vitro molecular biology techniques, characterizing and identifying large numbers of promoters in a genome remains a complex task. The main drawback, however, is the lack of a large set of known promoters from which conserved patterns across species can be identified. Hence, an in silico method that predicts promoters in any species remains a challenge. Improved promoter prediction methods can be one step towards more reliable ab initio gene prediction. In this work, we present an empirical comparison of Machine Learning (ML) techniques, namely Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART, k-NN and ensemble approaches (Bagging and Boosting), applied to predicting Bacillus subtilis promoters. To do so, we first built two datasets of promoter and non-promoter sequences: one for B. subtilis and a hybrid one. The ML methods were evaluated with a cross-validation procedure. Methods such as SVM and Naïve Bayes achieved good results on the B. subtilis dataset. However, good results were not reached on the hybrid dataset.
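To make the evaluation procedure concrete, here is a minimal sketch of k-fold cross-validation for one of the compared classifiers, k-NN. It makes several assumptions that are not from the paper. Sequences are fixed-length DNA windows compared by Hamming distance. Labels are 1 for promoter and 0 for non-promoter. Folds are plain (non-stratified) slices of a shuffled list. The authors' actual feature encoding and tooling are not stated in the abstract.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (DNA window, label): 1 = promoter, 0 = non-promoter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; windows are assumed to have equal length."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: Sequence[Example], query: str, k: int) -> int:
    """Label `query` by strict majority among its k nearest training windows.

    Use an odd k for this binary problem so that votes cannot tie.
    """
    nearest = sorted(train, key=lambda ex: hamming(ex[0], query))[:k]
    promoter_votes = sum(label for _, label in nearest)
    return 1 if 2 * promoter_votes > len(nearest) else 0


def k_fold_accuracy(data: Sequence[Example], folds: int, k: int, seed: int = 0) -> float:
    """Plain k-fold cross-validation accuracy of the k-NN classifier."""
    items: List[Example] = list(data)
    random.Random(seed).shuffle(items)
    correct = 0
    for f in range(folds):
        # Every `folds`-th shuffled example forms held-out fold f.
        test = items[f::folds]
        train = [ex for i, ex in enumerate(items) if i % folds != f]
        correct += sum(knn_predict(train, seq, k) == label for seq, label in test)
    return correct / len(items)
```

Swapping `knn_predict` for another classifier with the same signature would let the same loop compare methods, which is the shape of the comparison the abstract describes.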

Abstract:

Classifying proteins into structural classes, which involves inferring patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. This problem has therefore drawn the attention of many researchers in Bioinformatics. Given the large gap between the number of known protein sequences and the number of experimentally determined three-dimensional structures, the demand for automated techniques for protein structural classification is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential for dealing with this problem. In this work, the following ML techniques are used to recognize protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. To improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classifier systems are used. Moreover, since the protein database used in this work has imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to mitigate this problem. The ML methods are evaluated with a cross-validation procedure in which classifier accuracy is measured as the mean classification error rate on independent test sets. These means are compared pairwise with hypothesis tests to evaluate whether there is a statistically significant difference between them.
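Tomek Links, one of the class-balancing techniques listed, can be sketched in a few lines. A Tomek link is a pair of examples from opposite classes that are each other's nearest neighbour. Undersampling then removes the majority-class member of each link. This sketch assumes numeric feature vectors and Euclidean distance, and breaks nearest-neighbour ties by taking the first index found. It is not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_neighbor(points: Sequence[Point], i: int) -> Optional[int]:
    """Index of the point closest to points[i] (first one wins on ties)."""
    best, best_d = None, math.inf
    for j, p in enumerate(points):
        if j != i:
            d = math.dist(points[i], p)
            if d < best_d:
                best, best_d = j, d
    return best


def tomek_links(points: Sequence[Point], labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    links = set()
    for i, j in enumerate(nn):
        if j is not None and nn[j] == i and labels[i] != labels[j]:
            links.add((min(i, j), max(i, j)))
    return sorted(links)


def undersample_tomek(points: Sequence[Point], labels: Sequence[str],
                      majority: str) -> Tuple[List[Point], List[str]]:
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(points, labels):
        drop.update(k for k in (i, j) if labels[k] == majority)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]
```

The quadratic nearest-neighbour search is fine for illustration. Larger datasets would use a spatial index or an existing library implementation.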
Among the individual classifiers, Support Vector Machine achieved the best accuracy. The multi-classifier systems (homogeneous and heterogeneous) generally performed as well as or better than the individual classifiers, especially Boosting with Decision Trees and StackingC with Linear Regression as the meta-classifier. Despite its simplicity, the Voting method proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not significantly improve the overall classification error, although they did improve the classification error for the minority class. In this respect, the NCL technique proved the most appropriate.
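The abstract notes that Voting performed adequately despite its simplicity. The sketch below shows why it is simple: plain (hard) majority voting over per-sample predictions. The assumptions here are not from the paper. Each base classifier emits one class label per sample, and ties go to the label predicted by the earliest-listed classifier. The Voting configuration the authors used, for example averaged probabilities, is not stated.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Plurality label among one prediction per classifier.

    Ties go to the label that appears first in `predictions`, i.e. the
    earliest-listed classifier, an arbitrary but deterministic rule.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label
    raise ValueError("no predictions to vote on")


def vote_all(per_classifier: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine aligned prediction lists (one list per classifier) sample by sample."""
    return [majority_vote(sample) for sample in zip(*per_classifier)]
```

With three classifiers predicting hypothetical structural-class labels, the ensemble keeps whichever label at least two of them agree on for each protein.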