931 results for Bioinformatics
Abstract:
Breast cancer is the most common form of cancer among women and the identification of markers to discriminate tumorigenic from normal cells, as well as the different stages of this pathology, is of critical importance. Two-dimensional electrophoresis has been used before for studying breast cancer, but the progressive completion of human genomic sequencing and the introduction of mass spectrometry, combined with advanced bioinformatics for protein identification, have considerably increased the possibilities for characterizing new markers and therapeutic targets. Breast cancer proteomics has already identified markers of potential clinical interest (such as the molecular chaperone 14-3-3 sigma) and technological innovations such as large scale and high throughput analysis are now driving the field. Methods in functional proteomics have also been developed to study the intracellular signaling pathways that underlie the development of breast cancer. As illustrated with fibroblast growth factor-2, a mitogen and motogen factor for breast cancer cells, proteomics is a powerful approach to identify signaling proteins and to decipher the complex signaling circuitry involved in tumor growth. Together with genomics, proteomics is well on the way to molecularly characterizing the different types of breast tumor, and thus defining new therapeutic targets for future treatment.
Abstract:
Porphyromonas gingivalis is a key periodontal pathogen that has been implicated in the etiology of chronic adult periodontitis. Our aim was to develop a protein-based vaccine for the prevention and/or treatment of this disease. We used a whole-genome sequencing approach to identify potential vaccine candidates. From a genomic sequence, we selected 120 genes using a series of bioinformatics methods. The selected genes were cloned for expression in Escherichia coli and screened with P. gingivalis antisera before purification and testing in an animal model. Two of these recombinant proteins (PG32 and PG33) demonstrated significant protection in the animal model, while a number were reactive with various antisera. This process allows the rapid identification of vaccine candidates from genomic data. (C) 2001 Elsevier Science Ltd. All rights reserved.
Abstract:
The three-dimensional structures of leucine-rich repeat (LRR)-containing proteins from five different families were previously predicted based on the crystal structure of the ribonuclease inhibitor, using an approach that combined homology-based modeling, structure-based sequence alignment of LRRs, and several rational assumptions. The structural models have been produced based on very limited sequence similarity, which, in general, cannot yield trustworthy predictions. Recently, the protein structures from three of these five families have been determined. In this report we estimate the quality of the modeling approach by comparing the models with the experimentally determined structures. The comparison suggests that the general architecture, curvature, interior/exterior orientations of side chains, and backbone conformation of the LRR structures can be predicted correctly. On the other hand, the analysis revealed that, in some cases, it is difficult to predict correctly the twist of the overall super-helical structure. Taking into consideration the conclusions from these comparisons, we identified a new family of bacterial LRR proteins and present its structural model. The reliability of the LRR protein modeling suggests that it would be informative to apply similar modeling approaches to other classes of solenoid proteins.
Abstract:
Human cytomegalovirus (HCMV) can establish both nonproductive (latent) and productive (lytic) infections. Many of the proteins expressed during these phases of infection could be expected to be targets of the immune response; however, our understanding of the CD8(+)-T-cell response to HCMV is based mainly on the pp65 antigen. Very little is known about T-cell control over other antigens expressed during the different stages of virus infection; this imbalance in our understanding undermines the importance of these antigens in several aspects of HCMV disease pathogenesis. In the present study, an efficient and rapid strategy based on predictive bioinformatics and ex vivo functional T-cell assays was adopted to profile CD8(+)-T-cell responses to a large panel of HCMV antigens expressed during different phases of replication. These studies revealed that CD8(+)-T-cell responses to HCMV often contained multiple antigen-specific reactivities, which were not confined to the previously identified pp65 or IE-1 antigens. Unexpectedly, a number of viral proteins including structural, early/late antigens and HCMV-encoded immunomodulators (pp28, pp50, gH, gB, US2, US3, US6, and UL18) were also identified as potential targets for HCMV-specific CD8(+)-T-cell immunity. Based on this extensive analysis, numerous novel HCMV peptide epitopes and their HLA-restricting determinants recognized by these T cells have been defined. These observations contrast with previous findings that viral interference with the antigen-processing pathway during lytic infection would render immediate-early and early/late proteins less immunogenic. This work strongly suggests that successful HCMV-specific immune control in healthy virus carriers is dependent on a strong T-cell response towards a broad repertoire of antigens.
Abstract:
We present a novel data analysis strategy which, combined with subcellular fractionation and liquid chromatography-mass spectrometry (LC-MS) based proteomics, provides a simple and effective workflow for global drug profiling. Five subcellular fractions were obtained by differential centrifugation followed by high-resolution LC-MS and complete functional regulation analysis. The methodology combines functional regulation and enrichment analysis into a single visual summary. The workflow enables improved insight into perturbations caused by drugs. We provide a statistical argument to demonstrate that even crude subcellular fractions lead to improved functional characterization. We demonstrate this data analysis strategy on data obtained in an MS-based global drug profiling study. However, this strategy can also be applied to other types of large-scale biological data.
Abstract:
Object-oriented programming languages are presently the dominant paradigm of application development (e.g., Java, .NET). Lately, more and more Java applications have long (or very long) execution times and manipulate large amounts of data/information, gaining relevance in fields related to e-Science (with Grid and Cloud computing). Significant examples include Chemistry, Computational Biology and Bioinformatics, with many available Java-based APIs (e.g., Neobio). Often, when the execution of such an application is terminated abruptly because of a failure (regardless of whether the cause is a hardware or software fault, lack of available resources, etc.), all the work it has already performed is simply lost, and when the application is later re-initiated it has to restart all its work from scratch, wasting resources and time, while remaining prone to another failure that may delay its completion, with no deadline guarantees. Our proposed solution to these issues is to incorporate mechanisms for checkpointing and migration in a JVM. These make applications more robust and flexible by enabling them to move to other nodes without any intervention from the programmer. This article provides a solution for Java applications with long execution times by extending a JVM (Jikes research virtual machine) with such mechanisms. Copyright (C) 2011 John Wiley & Sons, Ltd.
Abstract:
Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole-genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
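As a rough illustration of the Minimum Spanning Tree idea mentioned in this abstract, the sketch below links isolates by the number of loci at which their allelic profiles differ. It is a plain Kruskal minimum spanning tree, not the goeBURST algorithm or its tie-break rules, and it is not taken from PHYLOViZ; the function names and the four 7-locus profiles are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm over the complete graph of isolates.

    profiles: dict mapping an isolate name to a tuple of allele numbers.
    Returns a list of (distance, isolate_a, isolate_b) edges.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the set representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (hamming(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    )
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((dist, a, b))
    return tree


# Hypothetical 7-locus profiles, invented for illustration only.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 3, 2),
    "ST4": (4, 4, 4, 4, 1, 1, 1),
}
tree = minimum_spanning_tree(profiles)
```

Ties between equally distant edges are broken here only by isolate name, which is an arbitrary choice made for the sketch.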
Abstract:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms using fixed length DNA sequences. After creating the DNA-related histograms, a correlation between pairs of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
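The histogram-and-correlation step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: words are counted in overlapping windows, the correlation is Pearson's, and the three short sequences are invented; the abstract does not specify any of these choices.

```python
from itertools import product
from math import sqrt


def kmer_histogram(sequence, k=3):
    """Count every overlapping DNA word of length k, in a fixed word order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
    return [counts[w] for w in words]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(sequences, k=3):
    """Correlate the k-mer histograms of every pair of named sequences."""
    hists = {name: kmer_histogram(seq, k) for name, seq in sequences.items()}
    return {(a, b): pearson(hists[a], hists[b]) for a in hists for b in hists}


# Toy sequences, invented for illustration only.
sequences = {
    "chr_a": "ACGTACGTTTGACAACGT",
    "chr_b": "ACGTACGATTGACAACGA",
    "chr_c": "GGGCCCGGGCCCGGGCCC",
}
matrix = correlation_matrix(sequences, k=2)
```

The resulting matrix is symmetric with ones on the diagonal, and it could serve as input to clustering or visualization methods of the kind the abstract mentions.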
Abstract:
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples subjected to experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, there can be genes differentially expressed in sample subgroups which are missed if the usual statistical approaches are used. In this paper we propose a new graphical tool which identifies not only up- and down-regulated genes, but also genes with differential expression in different subclasses, which are usually missed by current statistical methods. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC).
On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
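The two measures named in this abstract can be sketched for a single gene as follows. This is not the authors' R implementation: the OVL here is a crude histogram estimate on shared bins (an assumption made for brevity), the AUC is the Mann-Whitney estimate, and the expression values are invented.

```python
def auc(x, y):
    """Mann-Whitney estimate of the AUC: P(Y > X) + 0.5 * P(Y == X)."""
    wins = sum((b > a) + 0.5 * (b == a) for a in x for b in y)
    return wins / (len(x) * len(y))


def ovl(x, y, bins=10):
    """Histogram estimate of the overlapping coefficient of two samples."""
    pooled = list(x) + list(y)
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / bins or 1.0  # avoid a zero width if all values match

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    # Overlap is the shared mass of the two binned distributions.
    return sum(min(p, q) for p, q in zip(proportions(x), proportions(y)))


# Invented expression values for one gene: the second group is bimodal,
# with one subgroup below and one above the first group.
control = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
tumour = [0.2, 0.3, 0.1, 1.9, 2.0, 1.8]

gene_auc = auc(control, tumour)
gene_ovl = ovl(control, tumour)
```

For this toy gene the AUC is 0.5, which a rank-based test would read as no difference, while the OVL is 0, meaning the two distributions do not overlap at all. Pairing the two measures is what lets such subgroup-specific genes stand out.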
Abstract:
β-lactamases are hydrolytic enzymes that inactivate the β-lactam ring of antibiotics such as penicillins and cephalosporins. Most studies carried out so far have focused on the characterization of β-lactamases recovered among clinical isolates of Gram-positive staphylococci and Gram-negative enterobacteria, amongst others. However, only some studies refer to the detection and development of β-lactamase carriers in healthy humans, sick animals, or even in strains isolated from environmental stocks such as food, water, or soils. Considering this, we proposed a 10-week laboratory programme for the Biochemistry and Molecular Biology laboratory for majors in the health, environmental, and agronomical sciences. During those weeks, students would work with basic techniques such as DNA extraction, bacterial transformation, polymerase chain reaction (PCR), gel electrophoresis, and the use of several bioinformatics tools. These laboratory exercises would be conducted as a mini research project in which each class builds on the previous ones. This curriculum was compared in an experiment involving two groups of students from two different majors. The new curriculum, with classes linked together as a mini research project, was taught to Pharmacy majors, and the old curriculum was taught to students from environmental health. The results showed that students who were enrolled in the new curriculum obtained better results in the final exam than the students who were enrolled in the former curriculum. Likewise, these students were found to be more enthusiastic during the laboratory classes than those from the former curriculum.
Abstract:
In the cell, the correct folding of many proteins depends on the function of preexisting ones known as Molecular Chaperones (for a review see Hartl and Hayer-Hartl 2009). These were defined as proteins that bind to and stabilize an otherwise unstable conformation of another protein and, by controlling binding and release, facilitate its correct fate in vivo, be it folding, oligomeric assembly, transport to a particular subcellular compartment, or disposal by degradation. Molecular chaperones do not convey steric information specifying correct folding: instead, they prevent incorrect interactions within and between nonnative peptides, thus typically increasing the yield but not the rate of folding reactions. Molecular chaperones are ubiquitous and comprise several protein families that are structurally unrelated (Hartl and Hayer-Hartl 2009). The Hsp70s and the Chaperonin families have been extensively studied.
Abstract:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. Such a representation can be applied in the automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave similar results.
Supervised learning techniques improved the results: correct assignments rose to 77% when an ensemble of ten FFNNs was used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and to automatically assign EC numbers to reactions not yet officially classified.
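The definition of a reaction descriptor as the difference between the maps of the products and the reactants can be sketched as follows. This is a simplified illustration, not the thesis implementation: it assumes each bond has already been assigned to a neuron of a trained SOM, uses plain bond counts per neuron, and the neuron indices and molecules are hypothetical.

```python
from collections import Counter


def molecule_map(bond_neurons, n_neurons):
    """Count how many bonds of one molecule fall on each SOM neuron."""
    counts = Counter(bond_neurons)
    return [counts.get(i, 0) for i in range(n_neurons)]


def side_map(molecules, n_neurons):
    """Sum the maps of all molecules on one side of a reaction equation."""
    total = [0] * n_neurons
    for bonds in molecules:
        for i, count in enumerate(molecule_map(bonds, n_neurons)):
            total[i] += count
    return total


def reaction_descriptor(reactants, products, n_neurons):
    """Products map minus reactants map, one value per neuron."""
    before = side_map(reactants, n_neurons)
    after = side_map(products, n_neurons)
    return [p - r for r, p in zip(before, after)]


# Hypothetical reaction on a 4-neuron map: each inner list holds the
# neuron index assigned to each bond of one molecule.
reactants = [[0, 0, 1], [2]]
products = [[0, 0, 3], [2]]
descriptor = reaction_descriptor(reactants, products, n_neurons=4)
```

In this toy case the descriptor is negative at neuron 1 (a bond type that disappears) and positive at neuron 3 (a bond type that appears), while bonds untouched by the reaction cancel out.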
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs: EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways.
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, that is, a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials to be used in molecular simulations. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the degree of Master in Informatics Engineering (Mestre em Engenharia Informática)
Abstract:
Radiotherapy is one of the main treatments used against cancer. It uses radiation to destroy cancerous cells while trying to minimize damage to healthy tissues. The planning of a radiotherapy treatment is patient dependent, resulting in a lengthy trial-and-error procedure until a treatment complying as closely as possible with the medical prescription is found. Intensity Modulated Radiation Therapy (IMRT) is one technique of radiation treatment that allows the achievement of a high degree of conformity between the area to be treated and the dose absorbed by healthy tissues. Nevertheless, it is still not possible to completely eliminate the potential side effects of treatment. In this retrospective study we use the clinical data from patients with head-and-neck cancer treated at the Portuguese Institute of Oncology of Coimbra and explore the possibility of classifying new and untreated patients according to the probability of xerostomia 12 months after the beginning of IMRT treatments by using a logistic regression approach. The results obtained show that the classifier presents a high discriminative ability in predicting the binary response “at risk for xerostomia at 12 months”.
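The logistic regression approach mentioned in this abstract can be sketched in its simplest form, with one predictor and a binary outcome. Everything below is an assumption made for illustration: the single standardized predictor, the synthetic data points, and the gradient-descent fit are hypothetical and do not reproduce the study's variables, data, or fitting procedure.

```python
from math import exp


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean negative log-likelihood.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data, invented for illustration: a standardized predictor
# and a binary label (1 = at risk at 12 months).
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ys = [0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
risk_low = sigmoid(w * -1.5 + b)   # predicted probability at a low value
risk_high = sigmoid(w * 1.5 + b)   # predicted probability at a high value
```

A new patient would be classified as at risk when the predicted probability exceeds a chosen threshold, for example 0.5. The threshold is a design choice that trades sensitivity against specificity.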