40 results for Bioinformatics

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance: 20.00%

Abstract:

Cell surface proteins are excellent targets for diagnostic and therapeutic interventions. By using bioinformatics tools, we generated a catalog of 3,702 transmembrane proteins located at the surface of human cells (human cell surfaceome). We explored the genetic diversity of the human cell surfaceome at different levels, including the distribution of polymorphisms, conservation among eukaryotic species, and patterns of gene expression. By integrating expression information from a variety of sources, we were able to identify surfaceome genes with a restricted expression in normal tissues and/or differential expression in tumors, important characteristics for putative tumor targets. A high-throughput and efficient quantitative real-time PCR approach was used to validate 593 surfaceome genes selected on the basis of their expression pattern in normal and tumor samples. A number of candidates were identified as potential diagnostic and therapeutic targets for colorectal tumors and glioblastoma. Several candidate genes were also identified as coding for cell surface cancer/testis antigens. The human cell surfaceome will serve as a reference for further studies aimed at characterizing tumor targets at the surface of human cells.

Relevance: 20.00%

Abstract:

The sciarid DNA puff C4 BhC4-1 gene is amplified and transcribed in salivary glands at the end of the larval stage. In transgenic Drosophila, the BhC4-1 promoter drives transcription in prepupal salivary glands and in the ring gland of late embryos. A bioinformatics analysis has identified 162 sequences similar to distinct regions of the BhC4-1 proximal promoter, which are predominantly located in 5' or 3' regions or in introns in the Drosophila melanogaster genome. A significant number of the identified sequences are found in the regulatory regions of Drosophila genes that are expressed in the salivary gland. Functional assays in Drosophila reveal that the BhC4-1 proximal promoter contains both a 129 bp (-186/-58) salivary gland enhancer and a 67 bp (-253/-187) ring gland enhancer that drive tissue-specific patterns of developmentally regulated gene expression, irrespective of their orientation.

Relevance: 10.00%

Abstract:

Though introduced recently, complex networks research has grown steadily because of its potential to represent, characterize and model a wide range of intricate natural systems and phenomena. Because of the intrinsic complexity and systemic organization of life, complex networks provide an especially promising framework for systems biology investigation. The current article is an up-to-date review of the major developments related to the application of complex networks in biology, with special attention focused on the more recent literature. The main concepts and models of complex networks are presented and illustrated in an accessible fashion. Three main types of networks are covered: transcriptional regulatory networks, protein-protein interaction networks and metabolic networks. The key role of complex networks for systems biology is extensively illustrated by several of the papers reviewed.

Relevance: 10.00%

Abstract:

Background: Protein-protein interactions (PPIs) constitute one of the most crucial conditions to sustain life in living organisms. To study PPIs in Arabidopsis thaliana we have developed AtPIN, a database and web interface for searching and building interaction networks based on publicly available protein-protein interaction datasets. Description: All interactions were classified as either experimentally demonstrated or predicted. The PPIs in the AtPIN database carry a cellular compartment classification (C(3)) which divides the PPIs into four classes according to their interaction evidence and subcellular localization. It has been shown in the literature that a pair of genuinely interacting proteins is generally expected to have a common cellular role, and that proteins with common interaction partners have a high chance of sharing a common function. In AtPIN, due to its integrative profile, the reliability index for a reported PPI can be postulated in terms of the proportion of interaction partners that two proteins have in common. For this, we implemented the Functional Similarity Weight (FSW) calculation for all first-level interactions present in the AtPIN database. In order to identify target proteins of cytosolic glutamyl-tRNA synthetase (Cyt-gluRS) (AT5G26710) we combined two approaches, AtPIN search and yeast two-hybrid screening. Interestingly, the proteins glutamine synthetase (AT5G35630), a disease resistance protein (AT3G50950) and a zinc finger protein (AT5G24930), which had been predicted as target proteins for Cyt-gluRS by AtPIN, were also detected in the experimental screening. Conclusions: AtPIN is a user-friendly, easy-to-use tool that aggregates information on Arabidopsis thaliana PPIs, ontology, and subcellular localization, and might be a useful and reliable strategy to map protein-protein interactions in Arabidopsis. AtPIN can be accessed at http://bioinfo.esalq.usp.br/atpin.
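The idea of scoring a PPI by the proportion of shared interaction partners can be illustrated with a small sketch. It uses a simplified form of the Functional Similarity Weight as it is commonly described, without any correction term for proteins with few known partners; whether AtPIN computes exactly this form is an assumption. The protein names `P1` to `P4` and the function names are hypothetical.

```python
from collections import defaultdict


def build_neighbors(interactions):
    """Map each protein to its first-level partners (the protein itself included)."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].update((a, b))
        neighbors[b].update((a, b))
    return neighbors


def fsw(neighbors, u, v):
    """Simplified Functional Similarity Weight between proteins u and v.

    The score is the product of two ratios, each comparing the shared
    partners (counted twice) with the partners unique to one protein.
    """
    nu, nv = neighbors[u], neighbors[v]
    shared = len(nu & nv)
    if shared == 0:
        return 0.0
    only_u = len(nu - nv)
    only_v = len(nv - nu)
    left = 2 * shared / (only_u + 2 * shared)
    right = 2 * shared / (only_v + 2 * shared)
    return left * right


# Hypothetical toy network, not taken from AtPIN.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P2", "P4")]
neighbors = build_neighbors(interactions)
score_p1_p2 = fsw(neighbors, "P1", "P2")
score_p1_p4 = fsw(neighbors, "P1", "P4")
```

In this toy network, P1 and P2 share most of their partners and score higher than P1 and P4, which share only one.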

Relevance: 10.00%

Abstract:

Background: Hexamerins are hemocyanin-derived proteins that have lost the ability to bind copper ions and transport oxygen; instead, they became storage proteins. The current study aimed to broaden our knowledge on the hexamerin genes found in the honey bee genome by exploring their structural characteristics, expression profiles, evolution, and functions in the life cycle of workers, drones and queens. Results: The hexamerin genes of the honey bee (hex 70a, hex 70b, hex 70c and hex 110) diverge considerably in structure, so that the overall amino acid identity shared among their deduced protein subunits varies from 30 to 42%. A bioinformatics search for motifs in the respective upstream control regions (UCRs) revealed six overrepresented motifs, including a potential binding site for Ultraspiracle (Usp), a target of juvenile hormone (JH). The expression of these genes was induced by topical application of JH on worker larvae. The four genes are highly transcribed by the larval fat body, although with significant differences in transcript levels, but only hex 110 and hex 70a are re-induced in the adult fat body in a caste- and sex-specific fashion, with workers showing the highest expression. Transcripts for hex 110, hex 70a and hex 70b were detected in developing ovaries and testes, and hex 110 was highly transcribed in the ovaries of egg-laying queens. A phylogenetic analysis revealed that HEX 110 is located at the most basal position among the holometabolan hexamerins and, like HEX 70a and HEX 70c, shares a potential orthology relationship with hexamerins from other hymenopteran species. Conclusions: Striking differences were found in the structure and developmental expression of the four hexamerin genes in the honey bee. The presence of a potential binding site for Usp in the respective 5' UCRs, and the results of experiments on JH level manipulation in vivo, support the hypothesis of regulation by JH. Transcript levels and patterns in the fat body and gonads suggest that, in addition to their primary role in supplying amino acids for metamorphosis, hexamerins serve as storage proteins for gonad development and egg production, and support foraging activity. A phylogenetic analysis including the four deduced hexamerins and related proteins revealed a complex pattern of evolution, with independent radiation in insect orders.

Relevance: 10.00%

Abstract:

Background: The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges are centered on the meaning of differential gene regulation under distinct biological conditions and can be addressed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently, MDE analyses are limited to the usual methods of differential expression, which were initially designed for paired analysis. Results: We propose a web platform named ProbFAST for MDE analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulation study revealed that our method performs better than other approaches, and when it was applied to public expression data, we demonstrated its flexibility in obtaining relevant genes biologically associated with normal and abnormal biological processes. Conclusions: ProbFAST is a freely accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off, related to functional information, during the evolution of a tumor or tissue differentiation. The ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.

Relevance: 10.00%

Abstract:

Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The utilization of systems biology improves the biologist's ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.

Relevance: 10.00%

Abstract:

We report the first quantitative and qualitative analysis of the poly(A)+ transcriptome of two human mammary cell lines differentially expressing ERBB2 (human epidermal growth factor receptor 2), an oncogene over-expressed in approximately 25% of human breast tumors. Full-length cDNA populations from the two cell lines were digested enzymatically, individually tagged according to a customized method for library construction, and simultaneously sequenced using the 454 Roche Titanium platform. Comprehensive bioinformatics analysis followed by experimental validation confirmed novel genes, splicing variants, single nucleotide polymorphisms, and gene fusions indicated by RNA-seq data from both samples. Moreover, comparative analysis showed enrichment in alternative events, especially in the exon usage category, in ERBB2 over-expressing cells, indicating regulation of alternative splicing mediated by the oncogene. Alterations in expression levels of genes such as LOX, ATP5L, GALNT3, and MME revealed by large-scale sequencing were confirmed between cell lines as well as in tumor specimens with different ERBB2 backgrounds. This approach was shown to be suitable for structural, quantitative, and qualitative assessment of complex transcriptomes and revealed new events mediated by ERBB2 overexpression, in addition to potential molecular targets for breast cancer that are driven by this oncogene.

Relevance: 10.00%

Abstract:

Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of these data are critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to their coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
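The notion of genomic coordinates as a common key can be sketched as an interval join: measurements are attached to annotated features whenever their positions fall inside the feature's range. This is an illustrative sketch only, not the Gaggle Genome Browser's implementation; the tuple layouts, the inclusive end coordinates and all names are assumptions.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def index_by_chromosome(points):
    """Group (chrom, position, value) points by chromosome, sorted by position."""
    by_chrom = defaultdict(list)
    for chrom, pos, value in points:
        by_chrom[chrom].append((pos, value))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        positions = [pos for pos, _ in entries]
        values = [value for _, value in entries]
        index[chrom] = (positions, values)
    return index


def join_on_coordinates(features, points):
    """Attach to each (name, chrom, start, end) feature the values located in it.

    Start and end are treated as inclusive, which is an assumption of this sketch.
    """
    index = index_by_chromosome(points)
    joined = {}
    for name, chrom, start, end in features:
        positions, values = index.get(chrom, ([], []))
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        joined[name] = values[lo:hi]
    return joined


# Hypothetical annotations and measurements.
features = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 300, 400), ("geneC", "chr2", 50, 150)]
points = [("chr1", 150, 1.5), ("chr1", 200, 2.0), ("chr1", 250, 0.3), ("chr2", 60, 4.0), ("chr1", 100, 0.5)]
joined = join_on_coordinates(features, points)
```

Sorting once per chromosome and using binary search keeps each lookup logarithmic in the number of points, which matters for dense tiling-array data.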

Relevance: 10.00%

Abstract:

Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T (Score System for Sequence Tags) to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.

Relevance: 10.00%

Abstract:

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point to the best solution for each application. Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in different pattern recognition applications, although its main focus is bioinformatics tasks.

Relevance: 10.00%

Abstract:

Background: DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes. Results: Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools. Conclusions: DAPfinder is a new user-friendly tool for reconstruction and comparison of biological networks.
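To make the idea of a Differentially Associated Pair concrete, the sketch below compares the Pearson correlation of one gene pair in two phenotypes using Fisher's z-transformation, a standard textbook test for the difference between two independent correlations. Whether DAPfinder offers this particular metric and test is an assumption, and the expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_association(x1, y1, x2, y2):
    """Two-sided p-value for a difference in correlation between two phenotypes.

    (x1, y1) holds the expression of genes X and Y in phenotype 1 and
    (x2, y2) the same genes in phenotype 2. Each group needs more than
    three samples for the standard error to be defined.
    """
    def clip(r):
        # Keep atanh finite when a correlation is exactly +1 or -1.
        return max(-0.999999, min(0.999999, r))

    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    z1, z2 = math.atanh(clip(r1)), math.atanh(clip(r2))
    se = math.sqrt(1.0 / (len(x1) - 3) + 1.0 / (len(x2) - 3))
    z = (z1 - z2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return r1, r2, p_value


# Invented expression values: the pair is co-expressed in phenotype 1
# and anti-correlated in phenotype 2.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_y_pheno1 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
gene_y_pheno2 = [8.1, 6.8, 6.2, 5.1, 3.8, 3.2, 1.9, 1.1]
r1, r2, p_value = differential_association(gene_x, gene_y_pheno1, gene_x, gene_y_pheno2)
```

In a genome-wide setting this test would be repeated for every gene pair, which is why a multiple comparison adjustment of the resulting p-values is needed.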

Relevance: 10.00%

Abstract:

Background: There are several studies in the literature depicting measurement error in gene expression data, and several others about regulatory network models. However, only a small fraction describe a combination of measurement error in mathematical regulatory networks and show how to identify these networks under different rates of noise. Results: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, the measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the rate of false positives is not controlled under the null hypothesis. In order to overcome these problems, we present an improved version of the Ordinary Least Squares estimator for independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Moreover, measurement error estimation procedures for microarrays are also described. Simulation results also show that both corrected methods perform better than the standard ones (i.e., those ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. Conclusions: Measurement error dangerously affects the identification of regulatory network models; thus, it must be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for the high biological false positive rates identified in actual regulatory network models.
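The bias described above can be reproduced with a small simulation. The sketch fits a one-predictor regression in which the predictor is observed with additive noise, then applies the classical method-of-moments correction that subtracts the noise variance from the observed predictor variance. This is a textbook errors-in-variables correction, not necessarily the estimator proposed in the article, and it assumes the measurement error variance is known; all parameter values are arbitrary.

```python
import random


def moments(x, y):
    """Return the sample variance of x and the sample covariance of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return var_x, cov_xy


def naive_slope(x_obs, y):
    """Ordinary least squares slope that ignores measurement error."""
    var_x, cov_xy = moments(x_obs, y)
    return cov_xy / var_x


def corrected_slope(x_obs, y, noise_var):
    """Slope corrected by removing the (assumed known) noise variance."""
    var_x, cov_xy = moments(x_obs, y)
    return cov_xy / (var_x - noise_var)


def simulate(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Simulate y = beta * x + error, with x observed as x + measurement noise."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 0.5) for x in x_true]
    x_obs = [x + rng.gauss(0.0, noise_sd) for x in x_true]
    return naive_slope(x_obs, y), corrected_slope(x_obs, y, noise_sd ** 2)


naive, corrected = simulate()
```

With unit variance for both the true predictor and the measurement noise, theory predicts that the naive slope shrinks to about half of the true value of 2.0, while the corrected slope should lie close to 2.0.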

Relevance: 10.00%

Abstract:

Sequencing technologies and new bioinformatics tools have led to the complete sequencing of various genomes. However, information regarding the human transcriptome and its annotation is yet to be completed. The Human Cancer Genome Project, using the ORESTES (open reading frame EST sequences) methodology, contributed to this objective by generating data from about 1.2 million expressed sequence tags. Approximately 30% of these sequences did not align to ESTs in the public databases and were considered no-match ORESTES. On the basis that a set of these ESTs could represent new transcripts, we constructed a cDNA microarray. This platform was hybridized against 12 different normal or tumor tissues. We identified 3421 transcribed regions not associated with annotated transcripts, representing 83.3% of the platform. The total number of differentially expressed sequences was 1007. Also, 28% of the analyzed sequences could represent noncoding RNAs. Our data reinforce the knowledge that the human genome is pervasively transcribed, and point to molecular marker candidates for different cancers. To reinforce our data, we confirmed, by real-time PCR, the differential expression of three out of eight potential tumor markers in prostate tissues. Lists of the 1007 differentially expressed sequences and the 291 potentially noncoding tumor markers are provided.