19 results for Publicly Available Online Service
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background: Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS. Methods: We developed the Adjusting Association Priors with Text (AdAPT) method, which searches PubMed abstracts for pre-assigned keywords and key concepts and uses this information to assign each single nucleotide polymorphism (SNP) a prior probability of association with the phenotype of interest. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer. Results: Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log-additive p-value, p_trend = 2.5 × 10^-3). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76-0.90), and this association was independent of previously identified susceptibility SNPs that are associated with overall upper aerodigestive tract (UADT) cancer in this gene region. We also investigated whether rs991316 was associated with other cancers of the UADT, but no additional association signal was found.
Conclusion: This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
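To make the ranking step concrete, the following is a minimal sketch of how a BFDP could be computed for one SNP from its effect estimate and a text-derived prior. It assumes Wakefield's approximate Bayes factor with a normal prior on the log odds ratio; the function names, the prior standard deviation, and every numeric input are illustrative assumptions and are not taken from the AdAPT implementation.

```python
import math


def approximate_bayes_factor(beta_hat, se, prior_sd):
    """Approximate Bayes factor for H0 (no association) versus H1.

    beta_hat: estimated log odds ratio, se: its standard error,
    prior_sd: standard deviation of the normal prior on the effect under H1.
    """
    v = se ** 2        # sampling variance of the estimate
    w = prior_sd ** 2  # prior variance of the effect under H1
    z = beta_hat / se
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))


def bfdp(beta_hat, se, prior_sd, prior_assoc):
    """Posterior probability of the null, given a per-SNP prior of association."""
    prior_odds_null = (1.0 - prior_assoc) / prior_assoc
    abf = approximate_bayes_factor(beta_hat, se, prior_sd)
    return abf * prior_odds_null / (1.0 + abf * prior_odds_null)


# Illustrative inputs only: same association estimate, two different priors.
# A SNP whose gene is well supported by the literature gets the larger prior.
estimate, std_err, effect_sd = -0.19, 0.045, 0.2
print("BFDP, weak literature support:  ", bfdp(estimate, std_err, effect_sd, 1e-4))
print("BFDP, strong literature support:", bfdp(estimate, std_err, effect_sd, 1e-2))
```

With identical data, the SNP with the larger prior probability of association receives the smaller BFDP and therefore ranks higher, which is the mechanism by which text-derived priors can reorder GWAS results.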
Abstract:
The broad goals of verifiable visualization rely on correct algorithmic implementations. We extend a framework for verification of isosurfacing implementations to check topological properties. Specifically, we use stratified Morse theory and digital topology to design algorithms which verify topological invariants. Our extended framework reveals unexpected behavior and coding mistakes in popular publicly available isosurface codes.
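As a simple illustration of checking a topological invariant on isosurface output, the sketch below computes the Euler characteristic of a triangle mesh and compares it with an expected value. This is a much weaker check than the stratified Morse theory and digital topology machinery described in the abstract; the function name and the indexed-triangle input format are assumptions made for the example.

```python
def euler_characteristic(triangles):
    """Return V - E + F for a mesh given as triples of vertex indices."""
    vertices = set()
    edges = set()
    for a, b, c in triangles:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge once, regardless of orientation.
            edges.add((min(u, v), max(u, v)))
    return len(vertices) - len(edges) + len(triangles)


# The boundary of a tetrahedron is topologically a sphere, so the
# expected Euler characteristic is 2.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
expected = 2
computed = euler_characteristic(tetrahedron)
print("invariant preserved" if computed == expected else "topology mismatch")
```

A verification harness along these lines would generate scalar fields whose level sets have a known topology, run the isosurface code under test, and flag any mesh whose invariant differs from the expected value.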
Abstract:
Coccidiosis of the domestic fowl is a worldwide disease caused by seven species of protozoan parasites of the genus Eimeria. The genome of the model species, Eimeria tenella, presents a complexity of 55-60 Mb distributed in 14 chromosomes. Relatively few studies have been undertaken to unravel the complexity of the transcriptome of Eimeria parasites. We report here the generation of more than 45,000 open reading frame expressed sequence tag (ORESTES) cDNA reads of E. tenella, Eimeria maxima and Eimeria acervulina, covering several developmental stages: unsporulated oocysts, sporoblastic oocysts, sporulated oocysts, sporozoites and second-generation merozoites. All reads were assembled to constitute gene indices and submitted to a comprehensive functional annotation pipeline. In the case of E. tenella, we also incorporated publicly available ESTs to generate an integrated body of information. Orthology analyses identified genes conserved across different apicomplexan parasites, as well as genes restricted to the genus Eimeria. Digital expression profiles obtained from ORESTES/EST counts and submitted to clustering analyses revealed a highly conserved pattern across the three Eimeria spp. Distance trees showed that unsporulated and sporoblastic oocysts constitute a distinct clade in all species, with sporulated oocysts forming a more external branch. This latter stage also shows a close relationship with sporozoites, whereas first- and second-generation merozoites are more closely related to each other than to sporozoites. The profiles were unambiguously associated with the distinct developmental stages and strongly correlated with the order of the stages in the parasite life cycle. Finally, we present The Eimeria Transcript Database (http://www.coccidia.icb.usp.br/eimeriatdb), a website that provides open access to all sequencing data, annotation and comparative analysis.
We expect this repository to represent a useful resource to the Eimeria scientific community, helping to define potential candidates for the development of new strategies to control coccidiosis of the domestic fowl. (C) 2011 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.
Abstract:
While fewer in number than the dominant rotation-powered radio pulsar population, peculiar classes of isolated neutron stars (INSs), which include magnetars, the ROSAT-discovered "Magnificent Seven" (M7), rotating radio transients (RRATs), and central compact objects in supernova remnants (CCOs), represent a key element in understanding the neutron star phenomenology. We report the results of an observational campaign to study the properties of the source 2XMM J104608.7-594306, a newly discovered thermally emitting INS. The evolutionary state of the neutron star is investigated by means of deep dedicated observations obtained with the XMM-Newton Observatory and the ESO Very Large Telescope, as well as publicly available gamma-ray data from the Fermi Space Telescope and the AGILE Mission. The observations confirm previous expectations and reveal a unique type of object. The source, which is likely within the Carina Nebula (N_H = 2.6 × 10^21 cm^-2), has a spectrum that is both thermal and soft, with kT_∞ = 135 eV. Non-thermal (magnetospheric) emission is not detected down to 1% (3 sigma, 0.1-12 keV) of the source luminosity. Significant deviations (absorption features) from a simple blackbody model are identified in the spectrum of the source around energies of 0.6 keV and 1.35 keV. While the former deviation is likely related to a local oxygen overabundance in the Carina Nebula, the latter can only be accounted for by an additional spectral component, which is modelled as a Gaussian line in absorption with EW = 91 eV and sigma = 0.14 keV (1 sigma). Furthermore, the optical counterpart is fainter than m_V = 27 (2 sigma) and no gamma-ray emission is significantly detected by either the Fermi or AGILE missions.
Very interestingly, while these characteristics are remarkably similar to those of the M7 or the only RRAT so far detected in X-rays, which all have spin periods of a few seconds, we found intriguing evidence of very rapid rotation, P = 18.6 ms, at the 4 sigma confidence level. We interpret these new results in the light of the observed properties of the currently known neutron star population, in particular those of standard rotation-powered pulsars, recycled objects, and CCOs. We find that none of these scenarios can satisfactorily explain the collective properties of 2XMM J104608.7-594306, although it may be related to the still poorly known class of Galactic anti-magnetars. Future XMM-Newton data, granted for the next cycle of observations (AO11), will help us to improve our current observational interpretation of the source, enabling us to significantly constrain the rate of pulsar spin down.
Abstract:
A common goal in gene expression data analysis is to identify, from a large pool of candidate genes, those whose expression levels change significantly between a treatment and a control biological condition. This is usually done by computing a test statistic and applying a cutoff value that separates differentially from nondifferentially expressed genes. In this paper, we propose a Bayesian approach that identifies differentially expressed genes by sequentially calculating credibility intervals from predictive densities. These densities are constructed using the sampled mean treatment effect from all genes in the study, excluding the treatment effect of genes previously identified with statistical evidence of a difference. We compare our Bayesian approach with standard ones based on the t-test and modified t-tests via a simulation study, using the small sample sizes that are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially in cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a well-known publicly available data set on the bacterium Escherichia coli.
Abstract:
Background: The CACTA (also called En/Spm) superfamily of DNA-only transposons contains the core sequence CACTA in its Terminal Inverted Repeats (TIRs) and has so far been described only in plants. Large transcriptome and genome sequence data have recently become publicly available for Schistosoma mansoni, a digenetic blood fluke that is a major causative agent of schistosomiasis in humans, and have provided a comprehensive repository for the discovery of novel genes and repetitive elements. Despite the extensive description of retroelements in S. mansoni, just a single DNA-only transposon belonging to the Merlin family has so far been reported in this organism. Results: We describe a novel S. mansoni transposon named SmTRC1, for S. mansoni Transposon Related to CACTA 1, an element that shares several characteristics with plant CACTA transposons. Southern blotting indicates approximately 30–300 copies of SmTRC1 in the S. mansoni genome. Using genomic PCR followed by cloning and sequencing, we amplified and characterized a full-length and a truncated copy of this element. RT-PCR using S. mansoni mRNA followed by cloning and sequencing revealed several alternatively spliced transcripts of this transposon, resulting in distinct ORFs coding for different proteins. Interestingly, a survey of complete genomes from animals and fungi revealed several other novel TRC elements, indicating new families of DNA transposons belonging to the CACTA superfamily that have not previously been reported in these kingdoms. The first three bases in the S. mansoni TIR are CCC and they are identical to those in the TIRs of the insects Aedes aegypti and Tribolium castaneum, suggesting that animal TRCs may display a CCC core sequence. Conclusion: The DNA-only transposable element SmTRC1 from S. mansoni exhibits various characteristics, such as generation of multiple alternatively spliced transcripts, the presence of terminal inverted repeats at the extremities of the elements flanked by direct repeats, and the presence of a Transposase_21 domain, that suggest a distant relationship to CACTA transposons from Magnoliophyta. Several sequences from other Metazoa and Fungi code for proteins similar to those encoded by SmTRC1, suggesting that such elements have a common ancestry, and indicating inheritance through vertical transmission before separation of the Eumetazoa, Fungi and Plants.
Abstract:
Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, in which the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can be used as the alternative model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, which include the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study that evaluates the impact of the choice of null model on the final classification result. In particular, we are interested in minimizing the number of false predictions in a classification, a crucial issue for reducing the costs of biological validation. Results: For all the tests, the target null model presented the lowest number of false positives when random sequences were used as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies.
In these cases the target model is more dependable for biological validation due to its higher specificity.
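The effect of the null model choice can be sketched with a log-odds score. The example below is a deliberate simplification: it scores a DNA sequence under a position-independent family model rather than a profile HMM, and the GC-rich family distribution, the candidate sequence, the pseudocount, and all function names are hypothetical values chosen for illustration. It assumes sequences contain only the letters A, C, G and T.

```python
import math
from collections import Counter

UNIFORM = {base: 0.25 for base in "ACGT"}


def base_frequencies(seq, pseudocount=1.0):
    """Target null model: residue frequencies of the scored sequence itself."""
    counts = Counter(seq)
    total = len(seq) + 4 * pseudocount
    return {b: (counts.get(b, 0) + pseudocount) / total for b in "ACGT"}


def log_odds(seq, model, null):
    """Log-odds score in bits of the family model against a null model."""
    return sum(math.log2(model[b] / null[b]) for b in seq)


# Hypothetical GC-rich family model and a GC-rich candidate that is
# compositionally biased but otherwise unrelated to the family.
family = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
candidate = "GCGCGGCCGC"

print("uniform null:", log_odds(candidate, family, UNIFORM))
print("target null: ", log_odds(candidate, family, base_frequencies(candidate)))
```

Against the uniform null the candidate scores positive purely because of its GC content, whereas the target null absorbs that composition and the score drops below zero. This mirrors the reported behaviour that the uniform model can produce extra false positives for sequences with extreme compositional bias.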
Abstract:
Background: The thymus is a central lymphoid organ, in which bone marrow-derived T cell precursors undergo a complex process of maturation. Developing thymocytes interact with the thymic microenvironment in a defined spatial order. A component of the thymic microenvironment, the thymic epithelial cells (TEC), is crucial for the maturation of T-lymphocytes through cell-cell contact, cell-matrix interactions and secretion of cytokines/chemokines. There is evidence that extracellular matrix molecules play a fundamental role in guiding differentiating thymocytes in both cortical and medullary regions of the thymic lobules. The interaction between the integrin α5β1 (CD49e/CD29; VLA-5) and fibronectin is relevant for thymocyte adhesion and migration within the thymic tissue. Our previous results have shown that adhesion of thymocytes to a cultured TEC line is enhanced in the presence of fibronectin, and can be blocked with anti-VLA-5 antibody. Results: Herein, we studied the role of CD49e expressed by the human thymic epithelium. For this purpose we knocked down CD49e by means of RNA interference. This procedure resulted in the modulation of more than 100 genes, some of them coding for other proteins also involved in adhesion of thymocytes, others related to signaling pathways triggered after integrin activation, or even involved in the control of F-actin stress fiber formation. Functionally, we demonstrated that disruption of VLA-5 in human TEC by CD49e-siRNA-induced gene knockdown decreased the ability of TEC to promote thymocyte adhesion. This decrease affected all CD4/CD8-defined thymocyte subsets. Conclusion: Conceptually, our findings unravel the complexity of gene regulation as regards key genes involved in the heterocellular adhesion between developing thymocytes and the major component of the thymic microenvironment, an interaction that is a mandatory event for proper intrathymic T cell differentiation.
Abstract:
Background: MicroRNAs (miRNAs) are small regulatory RNAs, some of which are conserved in diverse plant genomes. Therefore, computational identification and further experimental validation of miRNAs from non-model organisms is both feasible and instrumental for addressing miRNA-based gene regulation and evolution. Sugarcane (Saccharum spp.) is an important biofuel crop with publicly available expressed sequence tag and genomic survey sequence databases, but little is known about miRNAs and their targets in this highly polyploid species. Results: In this study, we computationally identified 19 distinct sugarcane miRNA precursors, of which several are highly similar to their sorghum homologs at both nucleotide and secondary structure levels. The accumulation pattern of mature miRNAs varies in organs/tissues from the commercial sugarcane hybrid as well as in its corresponding founder species S. officinarum and S. spontaneum. Using sugarcane MIR827 as a query, we found a novel MIR827 precursor in the sorghum genome. Using our computational tool, a total of 46 potential targets were identified for the 19 sugarcane miRNAs. Several targets for highly conserved miRNAs are transcription factors that play important roles in plant development. Conversely, target genes of lineage-specific miRNAs, such as SsCBP1, seem to play roles in diverse physiological processes. SsCBP1 was experimentally confirmed to be a target for the monocot-specific miR528. Our findings support the notion that the regulation of SsCBP1 by miR528 is shared at least within graminaceous monocots, and that this miRNA-based post-transcriptional regulation evolved exclusively within the monocot lineage after the divergence from eudicots. Conclusions: Using publicly available nucleotide databases, 19 sugarcane miRNA precursors and one new sorghum miRNA precursor were identified and classified into 14 families.
Comparative analyses between sugarcane and sorghum suggest that these two species retain homologous miRNAs and targets in their genomes. Such conservation may help to clarify specific aspects of miRNA regulation and evolution in the polyploid sugarcane. Finally, our dataset provides a framework for future studies on sugarcane RNAi-dependent regulatory mechanisms.
Abstract:
The President of Brazil established an Interministerial Work Group in order to “evaluate the model of classification and valuation of disabilities used in Brazil and to define the elaboration and adoption of a unique model for all the country”. Eight Ministries and/or Secretaries participated in the discussion over a period of 10 months, concluding that a proposed model should be based on the United Nations Convention on the Rights of Persons with Disabilities, the International Classification of Functioning, Disability and Health, and the ‘support theory’, and organizing a list of recommendations and necessary actions for a Classification, Evaluation and Certification Network with national coverage.
Abstract:
Background: Smallpox is a lethal disease that was endemic in many parts of the world until eradicated by massive immunization. Due to its lethality, there are serious concerns about its use as a bioweapon. Here we analyze publicly available microarray data to further understand survival of smallpox-infected macaques, using systems biology approaches. Our goal is to improve the knowledge about the progression of this disease. Results: We used KEGG pathway annotations to define groups of genes (or modules), and subsequently compared them to macaque survival times. This technique provided additional insights about the host response to this disease, such as increased expression of the cytokines and ECM receptors in the individuals with higher survival times. These results suggest that these gene groups may influence an effective response from the host to smallpox. Conclusion: Macaques with higher survival times clearly express some specific pathways previously unidentified using regular gene-by-gene approaches. Our work also shows how third-party analysis of public datasets can be important in supporting new hypotheses on relevant biological problems.
Abstract:
Background: Recently, it was realized that the functional connectivity networks estimated from current brain-imaging technologies (MEG, fMRI and EEG) can be analyzed by means of graph theory, a mathematical representation of a network that is essentially reduced to nodes and the connections between them. Methods: We used high-resolution EEG technology to enhance the poor spatial information of the EEG activity on the scalp and to obtain a measure of the electrical activity on the cortical surface. Afterwards, we used the Directed Transfer Function (DTF), a multivariate spectral measure for the estimation of the directional influences between any given pair of channels in a multivariate dataset. Finally, a graph theoretical approach was used to model the brain networks as graphs. These methods were used to analyze the structure of cortical connectivity during the attempt to move a paralyzed limb in a group (N=5) of spinal cord injured (SCI) patients and during the movement execution in a group (N=5) of healthy subjects. Results: Analysis performed on the cortical networks estimated from the groups of healthy subjects and SCI patients revealed that both groups present few nodes with a high out-degree value (i.e. outgoing links). This property is valid in the networks estimated for all the frequency bands investigated. In particular, cingulate motor area (CMA) ROIs act as “hubs” for the outflow of information in both groups, SCI and healthy. Results also suggest that spinal cord injuries affect the functional architecture of the cortical network sub-serving the volition of motor acts mainly in its local feature property. In particular, a higher local efficiency E_l can be observed in the SCI patients for three frequency bands, theta (3-6 Hz), alpha (7-12 Hz) and beta (13-29 Hz). By taking into account all the possible pathways between different ROI couples, we were able to separate clearly the network properties of the SCI group from those of the control (CTRL) group.
In particular, we report a sort of compensatory mechanism in the SCI patients for the theta (3-6 Hz) frequency band, indicating a higher level of “activation” Ω within the cortical network during the motor task. The activation index is directly related to diffusion, a type of dynamics that underlies several biological systems, including the possible spreading of neuronal activation across several cortical regions. Conclusions: The present study aims at demonstrating the possible applications of graph theoretical approaches in the analysis of brain functional connectivity from EEG signals. In particular, the methodological aspects of (i) cortical activity from scalp EEG signals, (ii) functional connectivity estimations, and (iii) graph theoretical indexes are emphasized in the present paper to show their impact in a real application.
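The graph indexes mentioned in this abstract can be illustrated on a small binary directed network. The sketch below computes each node's out-degree and the global efficiency (the average inverse shortest-path length over ordered node pairs); the local efficiency reported in the abstract is commonly obtained by applying the same efficiency measure to the neighbourhood subgraph of each node, which is not implemented here. The adjacency convention (adj[i][j] = 1 means a link from node i to node j), the toy network, and the function names are assumptions for the example, and the thresholding of DTF values into binary links is taken as already done.

```python
from collections import deque


def out_degrees(adj):
    """Number of outgoing links per node in a binary adjacency matrix."""
    return [sum(row) for row in adj]


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs; unreachable pairs contribute 0."""
    n = len(adj)
    total = 0.0
    for src in range(n):
        # Breadth-first search gives shortest path lengths from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


# Toy network: node 0 sends information to every other node, a "hub"
# in the out-degree sense.
adj = [
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
degrees = out_degrees(adj)
hub = degrees.index(max(degrees))
print("out-degrees:", degrees, "hub:", hub)
print("global efficiency:", global_efficiency(adj))
```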
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Specifically, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions is considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results: We applied the proposed algorithm to two data sets. First, we used an artificial data set obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by available a priori knowledge.
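The idea of determining interactions from time-series data under a restricted Boolean model can be sketched as follows. The abstract does not specify the restricted function class, so this example assumes a threshold rule with weights in {-1, 0, 1} (inhibition, no interaction, activation) in which a gene keeps its current value when its inputs balance. It also uses exhaustive enumeration, which is exactly the full search that the proposed CSP formulation is meant to avoid. The toy time series and all names are hypothetical.

```python
from itertools import product


def next_state(weights, state, current):
    """Threshold update: on if net input > 0, off if < 0, unchanged if 0."""
    total = sum(w * s for w, s in zip(weights, state))
    if total > 0:
        return 1
    if total < 0:
        return 0
    return current


def consistent_weights(series, target):
    """All weight vectors for `target` that reproduce every observed transition."""
    n = len(series[0])
    solutions = []
    for weights in product((-1, 0, 1), repeat=n):
        if all(
            next_state(weights, before, before[target]) == after[target]
            for before, after in zip(series, series[1:])
        ):
            solutions.append(weights)
    return solutions


# Toy time series for two genes; each tuple is (gene 0, gene 1) at one time.
series = [(1, 0), (1, 1), (0, 1), (0, 1)]
solutions = consistent_weights(series, target=1)
print(solutions)

# An interaction is fully determined when every solution agrees on its weight.
for gene in range(len(series[0])):
    values = {w[gene] for w in solutions}
    print(f"gene {gene} -> gene 1: possible weights {sorted(values)}")
```

In this toy case the influence of gene 0 on gene 1 is fully determined (activation), while the self-interaction of gene 1 is only partially determined: inhibition is ruled out, but both "no interaction" and "activation" remain consistent with the data.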
Abstract:
Background: Oral squamous cell carcinoma (OSCC) is a frequent neoplasm, which is usually aggressive and has unpredictable biological behavior and unfavorable prognosis. The comprehension of the molecular basis of this variability should lead to the development of targeted therapies as well as to improvements in specificity and sensitivity of diagnosis. Results: Samples of primary OSCCs and their corresponding surgical margins were obtained from male patients during surgery and their gene expression profiles were screened using whole-genome microarray technology. Hierarchical clustering and Principal Components Analysis were used for data visualization and One-way Analysis of Variance was used to identify differentially expressed genes. Samples clustered mostly according to disease subsite, suggesting molecular heterogeneity within tumor stages. In order to corroborate our results, two publicly available datasets of microarray experiments were assessed. We found significant molecular differences between OSCC anatomic subsites concerning groups of genes presently or potentially important for drug development, including mRNA processing, cytoskeleton organization and biogenesis, metabolic process, cell cycle and apoptosis. Conclusion: Our results corroborate literature data on molecular heterogeneity of OSCCs. Differences between disease subsites and among samples belonging to the same TNM class highlight the importance of gene expression-based classification and challenge the development of targeted therapies.
Abstract:
Background: Recent advances in medical and biological technology have stimulated the development of new testing systems that provide huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of a genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. Results: This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to a human genome laboratory. We introduced the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients with updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through relational database implementation, and 2) correctness of processes using process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge about business process notation or process algebra. Conclusions: This paper presents the CEGH information system, a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have demonstrated the feasibility and shown the usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end-user interfaces.