124 results for bioinformatics


Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: Supporting the functionality of recent duplicate gene copies is usually difficult, owing to high sequence similarity between duplicate counterparts and shallow phylogenies, which hamper both the statistical and experimental inference. RESULTS: We developed an integrated evolutionary approach to identify functional duplicate gene copies and other lineage-specific genes. By repeatedly simulating neutral evolution, our method estimates the probability that an ORF was selectively conserved and is therefore likely to represent a bona fide coding region. In parallel, our method tests whether the accumulation of non-synonymous substitutions reveals signatures of selective constraint. We show that our approach has high power to identify functional lineage-specific genes using simulated and real data. For example, a coding region of average length (approximately 1400 bp), restricted to hominoids, can be predicted to be functional in approximately 94-100% of cases. Notably, the method may support functionality for instances where classical selection tests based on the ratio of non-synonymous to synonymous substitutions fail to reveal signatures of selection. Our method is available as an automated tool, ReEVOLVER, which will also be useful to systematically detect functional lineage-specific genes of closely related species on a large scale. AVAILABILITY: ReEVOLVER is available at http://www.unil.ch/cig/page7858.html.
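The simulation idea in this abstract (estimating how likely an ORF is to stay intact if it evolves neutrally) can be illustrated with a small Monte Carlo sketch. This is not ReEVOLVER's actual model: the uniform substitution scheme, the absence of indels, and the names `prob_orf_intact`, `mutate` and `has_premature_stop` are all assumptions made for illustration.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def has_premature_stop(seq):
    """Return True if any in-frame codon before the final codon is a stop."""
    # Assumes len(seq) is a multiple of 3; the final codon is skipped on purpose.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return any(codon in STOP_CODONS for codon in codons)


def mutate(seq, n_subs, rng):
    """Apply n_subs point substitutions, uniform over sites and bases (assumed model)."""
    bases = list(seq)
    for _ in range(n_subs):
        pos = rng.randrange(len(bases))
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


def prob_orf_intact(seq, n_subs, n_sims=2000, seed=0):
    """Fraction of neutral simulations in which no premature stop codon appears."""
    rng = random.Random(seed)
    intact = sum(
        not has_premature_stop(mutate(seq, n_subs, rng)) for _ in range(n_sims)
    )
    return intact / n_sims


# Hypothetical ORF: every internal codon (CAA) is one substitution away from a stop.
orf = "ATG" + "CAA" * 100 + "TAA"
p_intact = prob_orf_intact(orf, n_subs=30, n_sims=500)
print(f"Estimated P(ORF intact under neutrality) = {p_intact:.3f}")
```

A low estimated probability means an intact ORF would be surprising under neutrality, which is the kind of evidence the abstract describes as support for selective conservation.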

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. 
It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database at http://www.catma.org.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Protein α-helical coiled coil structures that elicit antibody responses, which block critical functions of medically important microorganisms, represent a means for vaccine development. By using bioinformatics algorithms, a total of 50 antigens with α-helical coiled coil motifs orthologous to Plasmodium falciparum were identified in the P. vivax genome. The peptides identified in silico were chemically synthesized; circular dichroism studies indicated partial or high α-helical content. Antigenicity was evaluated using human serum samples from malaria-endemic areas of Colombia and Papua New Guinea. Eight of these fragments were selected and used to assess immunogenicity in BALB/c mice. ELISA assays indicated strong reactivity of serum samples from individuals residing in malaria-endemic regions, and of sera from immunized mice, with the α-helical coiled coil structures. In addition, ex vivo production of IFN-γ by murine mononuclear cells confirmed the immunogenicity of these structures and the presence of T-cell epitopes in the peptide sequences. Moreover, sera of mice immunized with four of the eight antigens recognized native proteins on blood-stage P. vivax parasites, and antigenic cross-reactivity with three of the peptides was observed when reacted with both the P. falciparum orthologous fragments and whole parasites. The results point to the α-helical coiled coil peptides as possible P. vivax malaria vaccine candidates, as was observed for P. falciparum. The fragments selected here warrant further study in humans and non-human primate models to assess their protective efficacy as single components or assembled as hybrid linear epitopes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: Lateral gene transfer is a major mechanism contributing to bacterial genome dynamics and pathovar emergence via pathogenicity island (PAI) spreading. However, since few of these genomic exchanges are experimentally reproducible, it is difficult to establish evolutionary scenarios for the successive PAI transmissions between bacterial genera. Methods initially developed at the gene and/or nucleotide level for genomics, i.e. comparisons of concatenated sequences, ortholog frequency, gene order or dinucleotide usage, were combined and applied here to homologous PAIs: we call this approach comparative PAI genometrics. RESULTS: YAPI, a Yersinia PAI, and related islands were compared to measure evolutionary relationships between related modules. Through use of our genometric approach designed for tracking codon usage adaptation and gene phylogeny, an ancient inter-genus PAI transfer was oriented for the first time by characterizing the genomic environment in which the ancestral island emerged and its subsequent transfers to other bacterial genera.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The SwissBioisostere database (http://www.swissbioisostere.ch) contains information on molecular replacements and their performance in biochemical assays. It is meant to provide researchers in drug discovery projects with ideas for bioisosteric modifications of their current lead molecule, as well as to give interested scientists access to the details on particular molecular replacements. As of August 2012, the database contains 21 293 355 datapoints corresponding to 5 586 462 unique replacements that have been measured in 35 039 assays against 1948 molecular targets representing 30 target classes. The accessible data were created through detection of matched molecular pairs and mining bioactivity data in the ChEMBL database. The SwissBioisostere database is hosted by the Swiss Institute of Bioinformatics and available via a web-based interface.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Background: Microarray data is frequently used to characterize the expression profile of a whole genome and to compare the characteristics of that genome under several conditions. Geneset analysis methods have been described previously to analyze the expression values of several genes related by known biological criteria (metabolic pathway, pathology signature, co-regulation by a common factor, etc.) at the same time and the cost of these methods allows for the use of more values to help discover the underlying biological mechanisms. Results: As several methods assume different null hypotheses, we propose to reformulate the main question that biologists seek to answer. To determine which genesets are associated with expression values that differ between two experiments, we focused on three ad hoc criteria: expression levels, the direction of individual gene expression changes (up or down regulation), and correlations between genes. We introduce the FAERI methodology, tailored from a two-way ANOVA to examine these criteria. The significance of the results was evaluated according to the self-contained null hypothesis, using label sampling or by inferring the null distribution from normally distributed random data. Evaluations performed on simulated data revealed that FAERI outperforms currently available methods for each type of set tested. We then applied the FAERI method to analyze three real-world datasets on hypoxia response. FAERI was able to detect more genesets than other methodologies, and the genesets selected were coherent with current knowledge of cellular response to hypoxia. Moreover, the genesets selected by FAERI were confirmed when the analysis was repeated on two additional related datasets. Conclusions: The expression values of genesets are associated with several biological effects. The underlying mathematical structure of the genesets allows for analysis of data from several genes at the same time. 
Focusing on expression levels, the direction of the expression changes, and correlations, we showed that two-step data reduction allowed us to significantly improve the performance of geneset analysis using a modified two-way ANOVA procedure, and to detect genesets that current methods fail to detect.
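The "label sampling" evaluation under a self-contained null hypothesis can be sketched as a permutation test that uses only the genes in the set and shuffles the sample labels. This is not the FAERI statistic, which the abstract describes as derived from a two-way ANOVA; the simpler statistic below (sum of squared mean differences) and the function names are assumptions chosen to keep the example short.

```python
import random


def set_statistic(expr, labels):
    """Sum over genes of the squared difference in group means (illustrative statistic)."""
    total = 0.0
    for values in expr:
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        total += (sum(group1) / len(group1) - sum(group0) / len(group0)) ** 2
    return total


def label_permutation_pvalue(expr, labels, n_perm=1000, seed=0):
    """Self-contained test: permute sample labels, keep the geneset fixed."""
    rng = random.Random(seed)
    observed = set_statistic(expr, labels)
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)
        if set_statistic(expr, permuted) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical geneset: 3 genes measured in 5 control and 5 treated samples.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
geneset_expr = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 6.0, 6.1, 5.9, 6.2, 6.0],
    [2.0, 2.1, 1.9, 2.2, 2.0, 7.0, 7.2, 6.9, 7.1, 7.0],
    [0.5, 0.6, 0.4, 0.5, 0.5, 5.5, 5.4, 5.6, 5.5, 5.5],
]
p_value = label_permutation_pvalue(geneset_expr, labels)
print(f"Permutation p-value for the geneset: {p_value:.4f}")
```

Because only the labels are shuffled, correlations between the genes in the set are preserved in every permutation, which is one reason label sampling is a common choice for this kind of test.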

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
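Coding ambiguous bases with IUPAC symbols can be illustrated as follows. The abstract's method relies on model-based clustering of fluorescence intensities; the sketch below skips that step and assumes per-base probabilities are already available. The thresholding rule (`ratio`) and the function names are assumptions for illustration only. The IUPAC nucleotide codes themselves are standard.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of possible bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(probs, ratio=0.5):
    """Keep every base at least `ratio` times as probable as the best base (assumed rule)."""
    best = max(probs.values())
    kept = frozenset(base for base, p in probs.items() if p >= ratio * best)
    return IUPAC[kept]


def call_read(per_cycle_probs, ratio=0.5):
    """Call one symbol per sequencing cycle and join them into a tag."""
    return "".join(call_base(probs, ratio) for probs in per_cycle_probs)


# Hypothetical probabilities for a three-cycle read.
cycles = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},  # clear call
    {"A": 0.48, "C": 0.03, "G": 0.46, "T": 0.03},  # A or G
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
]
tag = call_read(cycles)
print(tag)
```

Keeping an ambiguity symbol instead of forcing a single base lets a downstream aligner match the tag against any compatible reference base, which is how ambiguous tags can remain usable rather than being discarded.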

Relevance:

10.00% 10.00%

Publisher:

Abstract:

MOTIVATION: Most anatomical ontologies are species-specific, whereas a framework for comparative studies is needed. We describe the vertebrate Homologous Organs Groups ontology, vHOG, used to compare expression patterns between species. RESULTS: vHOG is a multispecies anatomical ontology for the vertebrate lineage. It is based on the HOGs used in the Bgee database of gene expression evolution. vHOG version 1.4 includes 1184 terms, follows OBO principles and is based on the Common Anatomy Reference Ontology (CARO). vHOG only describes structures with historical homology relations between model vertebrate species. The mapping to species-specific anatomical ontologies is provided as a separate file, so that no homology hypothesis is stated within the ontology itself. Each mapping has been manually reviewed, and we provide support codes and references when available. Availability and implementation: vHOG is available from the Bgee download site (http://bgee.unil.ch/), as well as from the OBO Foundry and the NCBO Bioportal websites. CONTACT: bgee@isb-sib.ch; frederic.bastian@unil.ch.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The primary mission of Universal Protein Resource (UniProt) is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments. 
Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
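The cross-species probe mapping through flat files described in this abstract can be sketched as a chain of table lookups. The example is written in Python rather than R and does not use the annotationTools API; the helper names (`load_table`, `map_ids`), the column names and every identifier in the tables are hypothetical.

```python
import csv
import io


def load_table(text, key_col, value_col):
    """Read a tab-delimited flat file into a dict: key -> list of values."""
    table = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row[key_col] and row[value_col]:  # skip rows with missing annotation
            table.setdefault(row[key_col], []).append(row[value_col])
    return table


def map_ids(ids, *tables):
    """Chain lookups through successive tables; unmapped IDs give an empty list."""
    result = {}
    for start in ids:
        current = [start]
        for table in tables:
            mapped = []
            for item in current:
                for value in table.get(item, []):
                    if value not in mapped:
                        mapped.append(value)
            current = mapped
        result[start] = current
    return result


# Hypothetical flat files: human array annotation, orthology table, mouse array annotation.
HUMAN_ANNOT = "probe\tgene\nhs_p1\tHS_G1\nhs_p2\tHS_G2\nhs_p3\t\n"
ORTHOLOGS = "human_gene\tmouse_gene\nHS_G1\tMM_G1\n"
MOUSE_ANNOT = "probe\tgene\nmm_p1\tMM_G1\nmm_p2\tMM_G1\n"

probe_to_gene = load_table(HUMAN_ANNOT, "probe", "gene")
human_to_mouse = load_table(ORTHOLOGS, "human_gene", "mouse_gene")
gene_to_mouse_probe = load_table(MOUSE_ANNOT, "gene", "probe")

mapping = map_ids(
    ["hs_p1", "hs_p2", "hs_p3"], probe_to_gene, human_to_mouse, gene_to_mouse_probe
)
print(mapping)
```

Returning an empty list for probes without annotation or without an ortholog keeps the output aligned with the input list, so unmapped probes stay visible instead of silently disappearing.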

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We investigate the benefits and experimental feasibility of approaches enabling the shift from short (1.7 kDa on average) peptides in bottom-up proteomics to peptides about twice as long (~3.2 kDa on average) in the so-called extended bottom-up proteomics. Candida albicans secreted aspartic protease Sap9 has been selected for evaluation as an extended bottom-up proteomic-grade enzyme due to its suggested dibasic cleavage specificity and ease of production. We report the extensive characterization of Sap9 specificity and selectivity, revealing that protein cleavage by Sap9 most often occurs in the vicinity of proximal basic amino acids, and in select cases also at basic and hydrophobic residues. Sap9 is found to cleave a large variety of proteins in a relatively short, ~1 h, period of time and it is efficient in a broad pH range, including slightly acidic, e.g., pH 5.5, conditions. Importantly, the resulting peptide mixtures contain representative peptides primarily in the target 3-7 kDa range. The utility and advantages of this enzyme in routine analysis of protein mixtures are demonstrated and the limitations are discussed. Overall, Sap9 has the potential to become an enzyme of choice in extended bottom-up proteomics, which is technically ready to complement traditional bottom-up proteomics for improved targeted protein structural analysis and expanded proteome coverage. BIOLOGICAL SIGNIFICANCE: Advances in biological applications of mass spectrometry-based bottom-up proteomics are oftentimes limited by the extreme complexity of biological samples, e.g., proteomes or protein complexes. One reason is the complexity of the mixtures of enzymatically (most often using trypsin) produced short (<3 kDa) peptides, which may exceed the analytical capabilities of liquid chromatography and mass spectrometry. Information on localization of protein modifications may also be affected by the small size of typically produced peptides. 
On the other hand, advances in high-resolution mass spectrometry and liquid chromatography have created an intriguing opportunity of improving proteome analysis by gradually increasing the size of enzymatically-derived peptides in MS-based bottom-up proteomics. Bioinformatics has already confirmed the envisioned advantages of such an approach. The remaining bottleneck is an enzyme that could produce longer peptides. Here, we report on the characterization of a possible candidate enzyme, Sap9, which may be considered for producing longer, e.g., 3-7 kDa, peptides and lead to the development of extended bottom-up proteomics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Initiation and progression of most colorectal cancers (CRCs) are driven by hyper-activation of the canonical Wnt/β-catenin/TCF signaling pathway. However, a basal level of activation of this pathway is necessary for intestinal cell homeostasis; thus only CRC-specific effectors of this pathway could be exploited as potential clinical targets. PROX1 is an evolutionarily conserved transcription factor with multiple roles in several tissues in embryogenesis, and increasing relevance in cancer. PROX1 is a colon cancer-specific Wnt target in the intestine, thus it might represent a therapeutic target. The role of PROX1 in promoting the transition from early to highly-dysplastic adenoma was previously described [1]. Importantly, tumor metastasis is a leading cause of cancer-related mortality. Frequently, micrometastases are already present in patients at the time of diagnosis; therefore, better understanding of the mechanisms regulating growth of macrometastatic lesions is important for the development of novel treatment approaches. In this study we showed that PROX1 is expressed in colon cancer stem cells and promotes the outgrowth of metastatic lesions. Firstly, we analyzed the expression of PROX1 in advanced CRCs and their metastases. We found that PROX1 over-expression is a feature of microsatellite stable tumors (~85% of microsatellite stable (MSS) CRCs), which generally have worse prognosis in comparison to microsatellite unstable CRCs. Analysis of primary CRCs and corresponding metastatic lesions showed that PROX1 expression is conserved, or increased, in metastases. Further bioinformatics analysis of tumor and metastases gene expression profiles showed that PROX1 is co-expressed with stem cell and progenitor markers. Moreover, in an inducible Apcm Lgr5-EGFP-IRES-CreERT2 model, Prox1+ cells marked a sub-population of Lgr5+ stem cells and the subsequent transient amplifying cell population. 
An orthotopic model of CRC and lung colonization assays in mice demonstrated that PROX1 promotes tumor cell outgrowth in metastatic lesions, while it has no effect on primary tumor growth, invasion, and survival in circulation or cell extravasation. In vitro, PROX1-expressing tumor cells demonstrated strongly increased capacity to form spheroids, and increased survival and proliferation under hypoxic or nutrient-deprivation conditions. By monitoring cellular respiration under these conditions, we found that PROX1-expressing cells exhibit a better metabolic adaptation to changes in fuel source. Autophagy inhibitors prevented growth of PROX1-expressing cells both in vitro and in vivo. Importantly, conditional inactivation of PROX1 after the establishment of metastases prevented further growth of macroscopic lesions, resulting in stable disease. In summary, we identified a novel mechanism underlying the ability of metastatic colon cancer stem and progenitor cells to survive and grow in target organs through metabolic adaptation. Our results establish PROX1 as a key factor of CRC metastatic disease, where it promotes survival of metastatic colon cancer stem-like cells through their metabolic adaptation in sub-optimal microenvironments.