954 results for linked open data
Abstract:
Background: Several studies in Drosophila have shown excessive movement of retrogenes from the X chromosome to autosomes, and that these genes are frequently expressed in the testis. This phenomenon has led to several hypotheses invoking natural selection as the process driving male-biased genes to the autosomes. Metta and Schlotterer (BMC Evol Biol 2010, 10:114) analyzed a set of retrogenes where the parental gene has been subsequently lost. They assumed that this class of retrogenes replaced the ancestral functions of the parental gene, and reported that these retrogenes, although mostly originating from movement out of the X chromosome, showed female-biased or unbiased expression. These observations led the authors to suggest that selective forces (such as meiotic sex chromosome inactivation and sexual antagonism) were not responsible for the observed pattern of retrogene movement out of the X chromosome. Results: We reanalyzed the dataset published by Metta and Schlotterer and found several issues that led us to a different conclusion. In particular, Metta and Schlotterer used a dataset combined with expression data in which significant sex-biased expression is not detectable. First, the authors used a segmental dataset where the genes selected for analysis were less testis-biased in expression than those that were excluded from the study. Second, sex-biased expression was defined by comparing male and female whole-body data and not the expression of these genes in gonadal tissues. This approach significantly reduces the probability of detecting genes with sex-biased expression, which explains why the vast majority of the genes analyzed (parental genes and retrogenes) were equally expressed in both males and females. Third, the female-biased expression observed by Metta and Schlotterer is mostly found for parental genes located on the X chromosome, which is known to be enriched with genes with female-biased expression.
Fourth, using additional gonad expression data, we found that the autosomal genes analyzed by Metta and Schlotterer are less upregulated in ovaries and have a higher chance of being expressed in meiotic cells of spermatogenesis when compared to X-linked genes. Conclusions: The criteria used to select retrogenes and the sex-biased expression data based on whole adult flies generated a segmental dataset of female-biased and unbiased expressed genes that was unable to detect the higher propensity of autosomal retrogenes to be expressed in males. Thus, there is no support for the authors' view that the movement of new retrogenes, which originated from X-linked parental genes, was not driven by selection. Therefore, selection-based genetic models remain the most parsimonious explanations for the observed chromosomal distribution of retrogenes.
Abstract:
Background: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration. Results: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion: In light of our results, Support Vector Regression is favored for microarray normalization because it estimates the normalization curve more robustly than the other methods.
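The idea of fitting and removing an intensity-dependent normalization curve can be illustrated with the simplest of the methods named above, kernel smoothing. The sketch below is only an illustration under stated assumptions: it uses a Nadaraya-Watson estimator with a Gaussian kernel, the function names and the bandwidth are hypothetical, and the intensities are made up. It is not the implementation evaluated in the abstract.

```python
import math

def ma_transform(red, green):
    """Return (A, M) per spot: A is the mean log2 intensity, M the log2 ratio."""
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    return a, m

def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate of y at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

def normalize(red, green, bandwidth=1.0):
    """Subtract the fitted intensity-dependent trend from each log-ratio."""
    a, m = ma_transform(red, green)
    return [mi - kernel_smooth(a, m, ai, bandwidth) for ai, mi in zip(a, m)]

# Hypothetical two-channel intensities with a constant dye bias (red = 2 * green).
green = [100.0, 200.0, 400.0, 800.0, 1600.0]
red = [2.0 * g for g in green]
normalized = normalize(red, green)  # log-ratios after removing the fitted trend
```

Because every spot here carries the same bias, the fitted curve is flat and the normalized log-ratios collapse to zero. A single outlying spot would pull the kernel estimate toward itself, which is the kind of sensitivity the abstract reports for this method.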
Abstract:
Background: The search for enriched (aka over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure tries to summarize the information by focusing on classification designs such as Gene Ontology, KEGG pathways, and so on, instead of focusing on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter approach has been used to deal with the ontology term enrichment problem. Results: BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source code is freely available at http://blasto.iq.usp.br/~tkoide/BayGO in three versions: Linux, which can be easily incorporated into pre-existing pipelines; Windows, to be controlled interactively; and a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. Conclusion: The Bayesian model accounts for the fact that not all the genes from a given category may be observable in microarray data, due to low intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of working only with the common significance analysis.
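For contrast with the Bayesian approach, the common significance analysis mentioned above is often carried out as a one-sided hypergeometric test. The sketch below shows that baseline only. It is not BayGO's model, and the function name and the gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_pvalue(n_universe, n_category, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_category carry the ontology term."""
    upper = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 2000 genes on the array, 40 annotated with the term,
# 100 differentially expressed, 8 of which carry the term.
p_value = hypergeom_enrichment_pvalue(2000, 40, 100, 8)
```

The p-value says how surprising the overlap is under random sampling, but not how strong the association is. That is the distinction between significance and association drawn in the abstract.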
Abstract:
Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results: A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score that takes into account the credibility and the bolstered error values is proposed to rank the groups of considered genes. Results obtained using SAGE data from gliomas are presented and corroborate the introduced methodology. Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. The additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable to identify signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful to generate classifiers.
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
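To see why the simplex calls for its own notion of distance, the sketch below computes the Aitchison distance between two tag-count libraries through the centered log-ratio (clr) transform, a standard tool of compositional data analysis. This is a generic illustration, not necessarily the procedure Simcluster applies. The pseudocount used to avoid log(0) and all names are assumptions.

```python
import math

def closure(counts, pseudocount=0.5):
    """Rescale counts to proportions summing to 1 (a point on the simplex).
    The pseudocount is an assumed workaround for zero counts."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [s / total for s in shifted]

def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(counts_a, counts_b, pseudocount=0.5):
    """Euclidean distance between the clr-transformed compositions."""
    ca = clr(closure(counts_a, pseudocount))
    cb = clr(closure(counts_b, pseudocount))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Hypothetical tag counts: library_b is library_a sequenced ten times deeper.
library_a = [10, 20, 30]
library_b = [100, 200, 300]
same_profile = aitchison_distance(library_a, library_b, pseudocount=0.0)
```

With the pseudocount set to zero, the two libraries are at distance zero because they have identical proportions, whereas a plain Euclidean distance on raw counts would treat them as far apart.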
Abstract:
Background: Several mathematical and statistical methods have been proposed in the last few years to analyze microarray data. Most of those methods involve complicated formulas and software implementations that require advanced computer programming skills. Researchers from other areas may experience difficulties when attempting to use those methods in their research. Here we present a user-friendly toolbox which allows large-scale gene expression analysis to be carried out by biomedical researchers with limited programming skills. Results: Here, we introduce a user-friendly toolbox called GEDI (Gene Expression Data Interpreter), an extensible, open-source, and freely available tool that we believe will be useful to a wide range of laboratories, and to researchers with no background in Mathematics and Computer Science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al. Conclusion: GEDI is an integrated user-friendly viewer that combines the state-of-the-art SVR, DVAR and SVAR algorithms previously developed by us. It facilitates the application of SVR, DVAR and SVAR beyond the mathematical formulas present in the corresponding publications, and allows one to better understand the results by means of the available visualizations. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, rendering these algorithms accessible to the broad community of researchers in Molecular Biology.
Abstract:
Background: Sugarcane is an increasingly economically and environmentally important C4 grass, used for the production of sugar and bioethanol, a low-carbon emission fuel. Sugarcane originated from crosses of Saccharum species and is noted for its unique capacity to accumulate high amounts of sucrose in its stems. Environmental stresses greatly limit sugarcane productivity worldwide. To investigate transcriptome changes in response to environmental inputs that alter yield, we used cDNA microarrays to profile expression of 1,545 genes in plants subjected to drought, phosphate starvation, herbivory and N2-fixing endophytic bacteria. We also investigated the response to phytohormones (abscisic acid and methyl jasmonate). The arrayed elements correspond mostly to genes involved in signal transduction, hormone biosynthesis, transcription factors, novel genes and genes corresponding to unknown proteins. Results: Using an outlier-search method, 179 genes with strikingly different expression levels were identified as differentially expressed in at least one of the treatments analysed. Self Organizing Maps were used to cluster the expression profiles of 695 genes that showed a highly correlated expression pattern among replicates. The expression data for 22 genes was evaluated for 36 experimental data points by quantitative RT-PCR, indicating a validation rate of 80.5% using three biological experimental replicates. The SUCAST Database was created to provide public access to the data described in this work, linked to tissue expression profiling and the SUCAST gene category and sequence analysis. The SUCAST database also includes a categorization of the sugarcane kinome based on a phylogenetic grouping that included 182 undefined kinases. Conclusion: An extensive study on the sugarcane transcriptome was performed. Sugarcane genes responsive to phytohormones and to challenges sugarcane commonly deals with in the field were identified.
Additionally, the protein kinases were annotated based on a phylogenetic approach. The experimental design and statistical analysis applied proved robust to unravel genes associated with a diverse array of conditions attributing novel functions to previously unknown or undefined genes. The data consolidated in the SUCAST database resource can guide further studies and be useful for the development of improved sugarcane varieties.
Abstract:
Background: Clinical and experimental data suggest that the inflammatory response is impaired in diabetics and can be modulated by insulin. The present study was undertaken to investigate the role of insulin in the early phase of allergic airway inflammation. Methods: Diabetic male Wistar rats (alloxan, 42 mg/Kg, i.v., 10 days) and controls were sensitized by s.c. injection of ovalbumin (OA) in aluminium hydroxide 14 days before OA (1 mg/0.4 mL) or saline intratracheal challenge. The following analyses were performed 6 hours thereafter: a) quantification of interleukin (IL)-1β, tumor necrosis factor (TNF)-α and cytokine-induced neutrophil chemoattractant (CINC)-1 in the bronchoalveolar lavage fluid (BALF) by Enzyme-Linked Immunosorbent Assay, b) expression of E- and P-selectins on lung vessels by immunohistochemistry, and c) inflammatory cell infiltration into the airways and lung parenchyma. NPH insulin (4 IU, s.c.) was given i.v. 2 hours before antigen challenge. Results: Diabetic rats exhibited a significant reduction in the BALF concentrations of IL-1β (30%) and TNF-α (45%), and in the lung expression of P-selectin (30%) compared to non-diabetic animals. This was accompanied by a reduced number of neutrophils in the airways and around bronchi and blood vessels. There were no differences in CINC-1 levels in BALF or in E-selectin expression. Treatment of diabetic rats with NPH insulin, 2 hours before antigen challenge, restored the reduced levels of IL-1β, TNF-α and P-selectin, and neutrophil migration. Conclusion: The data presented suggest that insulin modulates the production/release of TNF-α and IL-1β, the expression of P- and E-selectin, and the associated neutrophil migration into the lungs during the early phase of the allergic inflammatory reaction.
Abstract:
Abstract Background Extra-Amazonian autochthonous Plasmodium vivax infections have been reported in mountainous regions surrounded by the Atlantic Forest in Espírito Santo state, Brazil. Methods Sixty-five patients and 1,777 residents were surveyed between April 2001 and March 2004. Laboratory methods included thin and thick smears, multiplex-PCR, immunofluorescent assay (IFA) against P. vivax and Plasmodium malariae crude blood-stage antigens and enzyme-linked immunosorbent assay (ELISA) for antibodies against the P. vivax-complex (P. vivax and variants) and P. malariae/Plasmodium brasilianum circumsporozoite-protein (CSP) antigens. Results Average patient age was 35.1 years. Most (78.5%) were males; 64.6% lived in rural areas; 35.4% were farmers; and 12.3% students. There was no relevant history of travel. Ninety-five per cent of the patients were experiencing their first episode of malaria. Laboratory data from 51 patients were consistent with P. vivax infection, which was determined by thin smear. Of these samples, 48 were assayed by multiplex-PCR. Forty-five were positive for P. vivax, confirming the parasitological results, while P. malariae was detected in one sample and two gave negative results. Fifty percent of the 50 patients tested had IgG antibodies against the P. vivax-complex or P. malariae CSP as determined by ELISA. The percentages of residents with IgM and IgG antibodies detected by IFA for P. malariae, P. vivax and Plasmodium falciparum who did not complain of malaria symptoms at the time blood was collected were 30.1% and 56.5%, 6.2% and 37.7%, and 13.5% and 13%, respectively. The same sera that reacted to P. vivax also reacted to P. malariae. The following numbers of samples were positive in multiplex-PCR: 23 for P. vivax; 15 for P. malariae; 9 for P. falciparum and only one for P. falciparum and P. malariae. All thin and thick smears were negative. 
ELISA against CSP antigens was positive in 25.4%, 6.3%, 10.7% and 15.1% of the samples tested for "classical" P. vivax (VK210), VK247, P. vivax-like and P. malariae, respectively. Anopheline captures in the transmission area revealed only zoophilic and exophilic species. Conclusion The low incidence of malaria cases, the finding of asymptomatic inhabitants and the geographic separation of patients allied to serological and molecular results raise the possibility of the existence of a simian reservoir in these areas.
Abstract:
Background: Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand the survival of smallpox-infected macaques, using systems biology approaches. Our goal is to improve knowledge about the progression of this disease. Results: We used KEGG pathway annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights about the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results suggest that these gene groups may contribute to an effective host response to smallpox. Conclusion: Macaques with higher survival times clearly express some specific pathways previously unidentified using regular gene-by-gene approaches. Our work also shows how third-party analysis of public datasets can be important to support new hypotheses for relevant biological problems.
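One way to compare a gene module with survival times is to score each animal by the module's mean expression and rank-correlate that score with survival. The abstract does not state which statistic was used, so the Spearman correlation below is an assumption, and the gene names and values are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def module_scores(expression, module_genes):
    """Mean expression of the module's genes for each animal."""
    n_animals = len(next(iter(expression.values())))
    return [
        sum(expression[g][i] for g in module_genes) / len(module_genes)
        for i in range(n_animals)
    ]

# Hypothetical log-expression per gene (one value per animal) and survival in days.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 2.5, 3.5, 5.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
survival_days = [3, 5, 8, 12]
rho = spearman(module_scores(expression, ["geneA", "geneB"]), survival_days)
```

Scoring whole modules rather than single genes can reveal a consistent pathway-level trend even when no individual gene stands out, which is the advantage over gene-by-gene approaches claimed in the abstract.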
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Specifically, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions is considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid a full search. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. First, we used an artificial dataset obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by available a priori knowledge.
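The constraint-satisfaction view can be made concrete with a toy example. The sketch below assumes one particular restricted family, threshold functions with regulator weights in {-1, 0, +1}, where a gene keeps its value when the weighted input sums to zero. The abstract does not specify its restriction, so this choice is an assumption. The code also enumerates every assignment exhaustively, which is exactly the full search the paper sets out to avoid, so it illustrates the constraints rather than the proposed algorithm.

```python
from itertools import product

def next_value(weights, state, current):
    """Threshold rule: on if the weighted input is positive, off if negative,
    unchanged if it is zero."""
    total = sum(w * s for w, s in zip(weights, state))
    if total > 0:
        return 1
    if total < 0:
        return 0
    return current

def consistent_weights(series, target):
    """All weight vectors that reproduce every observed transition of `target`."""
    n_genes = len(series[0])
    solutions = []
    for weights in product((-1, 0, 1), repeat=n_genes):
        if all(
            next_value(weights, before, before[target]) == after[target]
            for before, after in zip(series, series[1:])
        ):
            solutions.append(weights)
    return solutions

def determined_interactions(solutions):
    """Map regulator index to its sign when every solution agrees on it."""
    if not solutions:
        return {}
    return {
        j: solutions[0][j]
        for j in range(len(solutions[0]))
        if all(sol[j] == solutions[0][j] for sol in solutions)
    }

# Hypothetical binarized time series for two genes; gene 1 is the target.
full_series = [(1, 0), (1, 1), (0, 1), (0, 0)]
short_series = full_series[:2]

fully = determined_interactions(consistent_weights(full_series, 1))
partially = determined_interactions(consistent_weights(short_series, 1))
```

The full series leaves a single consistent assignment (gene 0 activates gene 1, and gene 1 represses itself), while the shorter series fixes only the activation by gene 0. This mirrors the distinction between fully and partially determined interactions in the Results.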
Abstract:
Background: The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make it feasible, we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with head and neck tumors.
We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are also not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. Experiments based on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
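The Entity-Attribute-Value model mentioned in the Results stores each clinical fact as one row (entity, attribute, value), so new attributes need no schema change. The sketch below shows the idea with Python's sqlite3 module. The table and column names are hypothetical and do not reproduce the Chado schema or the Clinical Module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE patient_value (
    patient_id INTEGER REFERENCES patient(patient_id),
    attribute_id INTEGER REFERENCES attribute(attribute_id),
    value TEXT,
    PRIMARY KEY (patient_id, attribute_id)
);
""")

def set_value(conn, patient_code, attribute_name, value):
    """Store one fact, creating the patient and attribute rows on first use."""
    conn.execute("INSERT OR IGNORE INTO patient (code) VALUES (?)", (patient_code,))
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attribute_name,))
    conn.execute(
        """INSERT OR REPLACE INTO patient_value (patient_id, attribute_id, value)
           SELECT p.patient_id, a.attribute_id, ?
           FROM patient p, attribute a
           WHERE p.code = ? AND a.name = ?""",
        (value, patient_code, attribute_name),
    )

def get_record(conn, patient_code):
    """Pivot the patient's EAV rows back into an attribute -> value mapping."""
    rows = conn.execute(
        """SELECT a.name, v.value
           FROM patient_value v
           JOIN patient p ON p.patient_id = v.patient_id
           JOIN attribute a ON a.attribute_id = v.attribute_id
           WHERE p.code = ?""",
        (patient_code,),
    )
    return dict(rows.fetchall())

# Hypothetical patient data; a new attribute needs no ALTER TABLE.
set_value(conn, "P001", "age", "63")
set_value(conn, "P001", "tumor_site", "larynx")
record = get_record(conn, "P001")
```

The trade-off is that values lose their native types and every query has to pivot rows back into records. In an ontology-driven design, the attribute rows would typically point to ontology terms rather than free-text names.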
Abstract:
Background: Dizziness is a common complaint among older adults and has been linked to a wide range of health conditions and psychological and social characteristics in this population. However, the profile of dizziness remains uncertain, which hampers clinical decision-making. We therefore sought to explore the relationship between dizziness and a comprehensive range of demographic data, diseases, health and geriatric conditions, and geriatric syndromes in a representative sample of community-dwelling older people. Methods: This is a cross-sectional, population-based study derived from FIBRA (Network for the Study of Frailty in Brazilian Elderly Adults), with 391 elderly adults, both men and women, aged 65 years and older. Elderly participants living at home in an urban area were enrolled through a process of random cluster sampling of census regions. The outcome variable was the self-report of dizziness in the last year. Several feelings of dizziness were investigated, including vertigo, spinning, light or heavy headedness, floating, fuzziness, giddiness and instability. A multivariate logistic regression analysis was conducted to estimate the adjusted odds ratios and build the probability model for dizziness. Results: The complaint of dizziness was reported by 45% of elderly adults, of whom 71.6% were women (p = 0.004). The multivariate regression analysis revealed that dizziness is associated with depressive symptoms (OR = 2.08; 95% CI 1.29–3.35), perceived fatigue (OR = 1.93; 95% CI 1.21–3.10), recurring falls (OR = 2.01; 95% CI 1.11–3.62) and excessive drowsiness (OR = 1.91; 95% CI 1.11–3.29). The discrimination of the final model was AUC = 0.673 (95% CI 0.619–0.727) (p < 0.001). Conclusions: The prevalence of dizziness in community-dwelling elderly adults is substantial. It is associated with other common geriatric conditions usually neglected in elderly adults, such as fatigue and drowsiness, supporting its possible multifactorial manifestation.
Our findings demonstrate the need to expand the design in future studies, aiming to estimate risk and identify possible causal relations.
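As a reading aid for the odds ratios and confidence intervals reported above, the sketch below computes an unadjusted odds ratio with a Woolf (log-based) 95% interval from a 2x2 table. The study reports adjusted odds ratios from a multivariate logistic regression, so this simpler calculation will not reproduce its figures, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval for a 2x2 table."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases
    )
    se_log = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases
        + 1 / unexposed_cases + 1 / unexposed_noncases
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: dizziness (cases) by presence of depressive symptoms (exposure).
estimate, lower, upper = odds_ratio_ci(40, 30, 60, 90)
```

An interval that excludes 1, as all four reported intervals do, indicates an association that is statistically significant at the 5% level.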
Abstract:
Background: Vitamin D transcriptional effects have been linked to tumor growth control; however, the hormone targets were determined in cell cultures exposed to supraphysiological concentrations of 1,25(OH)2D3 (50-100nM). Our aim was to evaluate the transcriptional effects of 1,25(OH)2D3 in a more physiological model of breast cancer, consisting of fresh tumor slices exposed to 1,25(OH)2D3 at concentrations that can be attained in vivo. Methods: Tumor samples from post-menopausal breast cancer patients were sliced and cultured for 24 hours with or without 1,25(OH)2D3 0.5nM or 100nM. Gene expression was analyzed by microarray (SAM paired analysis, FDR≤0.1) or RT-qPCR (p≤0.05, Friedman/Wilcoxon test). Expression of candidate genes was then evaluated in mammary epithelial/breast cancer lineages and cancer associated fibroblasts (CAFs), exposed or not to 1,25(OH)2D3 0.5nM, using RT-qPCR, western blot or immunocytochemistry. Results: 1,25(OH)2D3 0.5nM or 100nM effects were evaluated in five tumor samples by microarray, and seven and 136 genes, respectively, were up-regulated. There was an enrichment of genes containing transcription factor binding sites for the vitamin D receptor (VDR) in samples exposed to 1,25(OH)2D3 at a near-physiological concentration. Genes up-modulated by both 1,25(OH)2D3 concentrations were CYP24A1, DPP4, CA2, EFTUD1, TKTL1 and KCNK3. Expression of candidate genes was subsequently evaluated in another 16 samples by RT-qPCR, and up-regulation of CYP24A1, DPP4 and CA2 by 1,25(OH)2D3 was confirmed. To evaluate whether the transcriptional targets of 1,25(OH)2D3 0.5nM were restricted to the epithelial or stromal compartments, gene expression was examined in HB4A, C5.4, SKBR3, MDA-MB231 and MCF-7 lineages and CAFs, using RT-qPCR. In epithelial cells, there was a clear induction of CYP24A1, CA2, CD14 and IL1RL1. In fibroblasts, in addition to CYP24A1 induction, there was a trend towards up-regulation of CA2, IL1RL1, and DPP4.
Higher protein expression of CD14 in epithelial cells and of CA2 and DPP4 in CAFs exposed to 1,25(OH)2D3 0.5nM was detected. Conclusions: In breast cancer specimens, a short period of 1,25(OH)2D3 exposure at a near-physiological concentration modestly activates the hormone transcriptional pathway. Induction of CYP24A1, CA2, DPP4 and IL1RL1 expression appears to reflect 1,25(OH)2D3 effects in epithelial as well as stromal cells; however, induction of CD14 expression is likely restricted to the epithelial compartment.
Abstract:
Background: HCV is prevalent throughout the world. It is a major cause of chronic liver disease. There is no effective vaccine and the most common therapy, based on Peginterferon, has a success rate of ~50%. The mechanisms underlying viral resistance have not been elucidated, but it has been suggested that both host and virus contribute to therapy outcome. Non-structural 5A (NS5A) protein, a critical virus component, is involved in cellular and viral processes. Methods: The present study analyzed structural and functional features of 345 sequences of HCV-NS5A genotypes 1 or 3, using in silico tools. Results: There were differences in residue type composition and secondary structure between the genotypes. In addition, secondary structure variance was statistically different for each response group in genotype 3. A motif search indicated conserved glycosylation, phosphorylation and myristoylation sites that could be important in structural stabilization and function. Furthermore, a highly conserved integrin ligation site was identified, and could be linked to nuclear forms of NS5A. ProtFun indicated that NS5A has diverse enzymatic and nonenzymatic activities, participating in a great range of cell functions, with statistically significant differences between genotypes. Conclusion: This study presents new insights into HCV-NS5A. It is the first study that, using bioinformatics tools, suggests differences between genotypes and response to therapy that can be related to NS5A protein features. Therefore, it emphasizes the importance of using bioinformatics tools in viral studies. The data acquired herein will aid in clarifying the structure/function of this protein and in the development of antiviral agents.
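A motif search of the kind described in the Results can be sketched with a regular expression. The example below scans for the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} and measures how often a residue is conserved at a given alignment column. The abstract does not name the tool or patterns it used, so this pattern is a stand-in, and the sequences are hypothetical rather than real NS5A sequences.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# The lookahead lets overlapping hits be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(sequence, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

def conservation(sequences, position, residue):
    """Fraction of aligned sequences carrying `residue` at a 1-based position."""
    hits = sum(1 for s in sequences if s[position - 1] == residue)
    return hits / len(sequences)

# Hypothetical peptide: NPTQ is rejected because Pro follows the Asn.
hits = find_motif("MKNGSAPNPTQNNTSA")

# Hypothetical aligned fragments: how conserved is Asn at column 3?
aligned = ["MKNGSA", "MRNGSA", "MKDGSA"]
asn_fraction = conservation(aligned, 3, "N")
```

Short patterns like this one match by chance fairly often, so hits are usually treated as candidate sites. Conservation across many sequences, as reported in the abstract, is one way to prioritize them.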