928 results for genomics
Abstract:
Introduction: Advances in genomics technologies are providing a very large amount of data on genome-wide gene expression profiles, protein molecules and their interactions with other macromolecules and metabolites. Molecular interaction networks provide a useful way to capture this complex data and comprehend it. Networks are beginning to be used in drug discovery, in many steps of the modern discovery pipeline, with large-scale molecular networks being particularly useful for the understanding of the molecular basis of the disease. Areas covered: The authors discuss network approaches used for drug target discovery and lead identification in the drug discovery pipeline. By reconstructing networks of targets, drugs and drug candidates as well as gene expression profiles under normal and disease conditions, the paper illustrates how it is possible to find relationships between different diseases, find biomarkers, explore drug repurposing and study emergence of drug resistance. Furthermore, the authors also look at networks which address particular important aspects such as off-target effects, combination-targets, mechanism of drug action and drug safety. Expert opinion: The network approach represents another paradigm shift in drug discovery science. A network approach provides a fresh perspective of understanding important proteins in the context of their cellular environments, providing a rational basis for deriving useful strategies in drug design. Besides drug target identification and inferring mechanism of action, networks enable us to address new ideas that could prove to be extremely useful for new drug discovery, such as drug repositioning, drug synergy, polypharmacology and personalized medicine.
Abstract:
Genomic sequences are far from random; they are made up of systematically ordered, information-rich patterns. These repeated sequence patterns have been widely used because of their fundamental importance in understanding genome function and organization. To this end, a comprehensive toolkit, RepEx, has been developed which extracts repeat (inverted, everted and mirror) patterns from the given genome sequence(s) without any constraints. The toolkit can also be used to fetch the inverted repeats present in protein sequence(s). Further, it is capable of extracting exact and degenerate repeats with user-defined spacer intervals. It is remarkably more precise and sensitive when compared to the existing tools. An example with comprehensive case studies and a performance evaluation of the proposed toolkit has been presented to authenticate its efficiency and accuracy. (C) 2013 Elsevier Inc. All rights reserved.
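The kind of search described in this abstract can be illustrated with a minimal Python sketch. This is not RepEx's implementation; the function names, the brute-force scan, and the working definitions (an inverted repeat is an arm followed downstream by its reverse complement, a mirror repeat is an arm followed by its plain reversal) are assumptions made for illustration. Only exact repeats are handled, and everted and degenerate repeats are left out.

```python
# Hypothetical helper names; a brute-force sketch, not the RepEx algorithm.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeats(seq, arm_len, max_spacer, kind="inverted"):
    """Find exact inverted or mirror repeats with a bounded spacer.

    Returns a list of (left_start, right_start, left_arm) tuples using
    0-based positions. `kind` is "inverted" (right arm is the reverse
    complement of the left arm) or "mirror" (right arm is the reversed
    left arm).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_len + 1):
        arm = seq[i:i + arm_len]
        target = reverse_complement(arm) if kind == "inverted" else arm[::-1]
        for spacer in range(max_spacer + 1):
            j = i + arm_len + spacer
            if j + arm_len > len(seq):
                break  # right arm would run past the end of the sequence
            if seq[j:j + arm_len] == target:
                hits.append((i, j, arm))
    return hits
```

For example, `find_repeats("GAACCTTC", 3, 2)` reports the arm `GAA` at position 0 paired with `TTC` at position 5, separated by a two-base spacer. The scan costs on the order of sequence length times spacer range times arm length, so a genome-scale tool would need indexing rather than this direct comparison.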
Abstract:
In March 2012, the authors met at the National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina, USA, to discuss approaches and cooperative ventures in Indo-Pacific phylogeography. The group emerged with a series of findings: (1) Marine population structure is complex, but single-locus mtDNA studies continue to provide a powerful first assessment of phylogeographic patterns. (2) These patterns gain greater significance/power when resolved in a diversity of taxa. New analytical tools are emerging to address these analyses with multi-taxon approaches. (3) Genome-wide analyses are warranted if selection is indicated by surveys of standard markers. Such indicators can include discordance between genetic loci, or between genetic loci and morphology. Phylogeographic information provides a valuable context for studies of selection and adaptation. (4) Phylogeographic inferences are greatly enhanced by an understanding of the biology and ecology of study organisms. (5) Thorough, range-wide sampling of taxa is the foundation for robust phylogeographic inference. (6) Congruent geographic and taxonomic sampling by the Indo-Pacific community of scientists would facilitate better comparative analyses. The group concluded that at this stage of technology and software development, judicious rather than wholesale application of genomics appears to be the most robust course for marine phylogeographic studies. Therefore, our group intends to affirm the value of traditional ("unplugged") approaches, such as those based on mtDNA sequencing and microsatellites, along with essential field studies, in an era with increasing emphasis on genomic approaches.
Abstract:
In China, the recent outbreak of novel influenza A/H7N9 virus has been regarded as severe, and it may worsen in the near future. In order to develop highly protective vaccines and drugs for the A/H7N9 virus, it is critical to determine the selection pressure on each amino acid site. In the present study, six different statistical methods, consisting of four independent codon-based maximum likelihood (CML) methods, one hierarchical Bayesian (HB) method and one branch-site (BS) method, were employed to determine whether each amino acid site of A/H7N9 virus is under natural selection pressure. Functions for both positively and negatively selected sites were inferred by annotating these sites with experimentally verified amino acid sites. Overall, the single amino acid site 627 of the PB2 protein was inferred as positively selected and its function was identified as a T-cell epitope (TCE). Among the 26 negatively selected amino acid sites of the PB2, PB1, PA, HA, NP, NA, M1 and NS2 proteins, only 16 amino acid sites were identified to be involved in TCEs. In addition, 7 amino acid sites, including 608 and 609 of PA, 480 of NP, and 24, 25, 109 and 205 of M1, were identified to be involved in both B-cell epitopes (BCEs) and TCEs. Conversely, the functions of position 62 of PA and positions 43 and 113 of HA were unknown. In conclusion, the seven amino acid sites engaged in both BCEs and TCEs were identified as highly suitable targets, as these sites are predicted to play a principal role in inducing strong humoral and cellular immune responses against A/H7N9 virus. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
Streptococcus pneumoniae causes pneumonia, septicemia and meningitis. S. pneumoniae is responsible for significant mortality both in children and in the elderly. In recent years, whole genome sequencing of various S. pneumoniae strains has increased manifold and there is an urgent need to provide organism-specific annotations to the scientific community. This prompted us to develop the Streptococcus pneumoniae Genome Database (SPGDB) to integrate and analyze the completely sequenced and available S. pneumoniae genome sequences. Further, links to several tools are provided to compare the pool of gene and protein sequences, and protein structures, across different strains of S. pneumoniae. SPGDB aids in the analysis of phenotypic variations and supports extensive genomics and evolutionary studies with reference to S. pneumoniae. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
Background: In the post-genomic era where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to "Domains of Unknown Function" (DUF) or "Uncharacterized Protein Family" (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search of both structure and function. Results: We applied a "computational structural genomics" approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Amongst these five methods, searches in the SCOP-NrichD database have been applied for the first time. Predictions were classified into high, medium and low confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene Ontology terms. 614 uncharacterized DUFs could be associated with a known structural domain, of which high-confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed on-line at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659. Conclusions: This study provides insights into the structure and potential function for nearly 20% of the DUFs. Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment, because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still "non-trivial", with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners. Reviewers: This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.
Abstract:
Background: Candida auris is a multidrug-resistant, emerging agent of fungemia in humans. Its actual global distribution remains obscure as the current commercial methods of clinical diagnosis misidentify it as C. haemulonii. Here we report the first draft genome of C. auris to explore the genomic basis of virulence and unique differences that could be employed for differential diagnosis. Results: More than 99.5% of the C. auris genomic reads did not align to the current whole (or draft) genome sequences of Candida albicans, Candida lusitaniae, Candida glabrata and Saccharomyces cerevisiae, thereby indicating its divergence from the active Candida clade. The genome spans around 12.49 Mb with 8527 predicted genes. Functional annotation revealed that among the sequenced Candida species, it is closest to the hemiascomycete species Clavispora lusitaniae. Comparison with the well-studied species Candida albicans showed that it shares significant virulence attributes, such as oligopeptide transporters, mannosyl transferases, secreted proteases and genes involved in biofilm formation, with other pathogenic Candida species. We also identified a plethora of transporters belonging to the ABC and major facilitator superfamilies, along with known MDR transcription factors, which explained its high tolerance to antifungal drugs. Conclusions: Our study emphasizes an urgent need for accurate fungal screening methods such as PCR and electrophoretic karyotyping to ensure proper management of fungemia. Our work highlights the potential genetic mechanisms involved in virulence and pathogenicity of an important emerging human pathogen, namely C. auris. Owing to its diversity at the genomic scale, we expect the genome sequence to be a useful resource to map species-specific differences that will help develop accurate diagnostic markers and better drug targets.
Abstract:
The Asian elephant Elephas maximus and the African elephant Loxodonta africana, which diverged 5-7 million years ago, exhibit differences in their physiology, behaviour and morphology. A comparative genomics approach would be useful and necessary for evolutionary and functional genetic studies of elephants. We sequenced E. maximus and mapped the reads to L. africana at ~15X coverage. Through comparative sequence analyses, we have identified Asian elephant-specific homozygous, non-synonymous single nucleotide variants (SNVs) that map to 1514 protein-coding genes, many of which are involved in olfaction. We also present the first report of a high-coverage transcriptome sequence in E. maximus from peripheral blood lymphocytes. We have identified 103 novel protein-coding transcripts and 66 long non-coding RNAs (lncRNAs). We also report the presence of 181 protein domains unique to elephants when compared to other Afrotheria species. Each of these findings can be further investigated to gain a better understanding of functional differences unique to elephant species, as well as those unique to elephantids in comparison with other mammals. This work therefore provides a valuable resource to explore the immense research potential of comparative analyses of transcriptome and genome sequences in the Asian elephant.
Abstract:
Background: Colorectal cancer (CRC) is a disease of complex aetiology, with much of the expected inherited risk being due to several common low-risk variants. Genome-Wide Association Studies (GWAS) have identified 20 CRC risk variants. Nevertheless, these have only been able to explain part of the missing heritability. Moreover, these signals have only been inspected in populations of Northern European origin. Results: Thus, we followed the same approach in a Spanish cohort of 881 cases and 667 controls. Sixty-four variants at 24 loci were found to be associated with CRC at p-values < 10^-5. We therefore evaluated the 24 loci in another Spanish replication cohort (1481 cases and 1850 controls). Two of these SNPs, rs12080929 at 1p33 (P-replication=0.042; P-pooled=5.523×10^-3; OR (CI95%)=0.866 (0.782-0.959)) and rs11987193 at 8p12 (P-replication=0.039; P-pooled=6.985×10^-5; OR (CI95%)=0.786 (0.705-0.878)), were replicated in the second phase, although they did not reach genome-wide statistical significance. Conclusions: We have performed the first CRC GWAS in a Southern European population and by these means we were able to identify two new susceptibility variants at the 1p33 and 8p12 loci. These two SNPs are located near the SLC5A9 and DUSP4 loci, respectively, which could be good functional candidates for the association signals. We therefore believe that these two markers constitute good candidates for CRC susceptibility loci and should be further evaluated in other, larger datasets. Moreover, we highlight that, were these two SNPs true susceptibility variants, they would constitute a decrease in the CRC missing heritability fraction.
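For readers unfamiliar with the "OR (CI95%)" notation used in this abstract, the following minimal Python sketch shows one common way to obtain an odds ratio and its 95% confidence interval from a 2x2 table of counts (the log-odds, or Woolf, approximation). The function name and the counts in the usage note are hypothetical, and the study itself may have derived its estimates from a different model, such as logistic regression.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval from a 2x2 table.

    a: cases with the allele       b: cases without it
    c: controls with the allele    d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Calling `odds_ratio_ci(10, 20, 20, 10)` with these made-up counts gives an odds ratio of 0.25. An interval that lies entirely below 1, as reported for both SNPs in the abstract, indicates that the tested allele is associated with lower odds of disease.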
Abstract:
A central objective in signal processing is to infer meaningful information from a set of measurements or data. While most signal models have an overdetermined structure (fewer unknowns than equations), traditionally very few statistical estimation problems have considered a data model which is underdetermined (more unknowns than equations). However, in recent times, an explosion of theoretical and computational methods has been developed, primarily to study underdetermined systems by imposing sparsity on the unknown variables. This is motivated by the observation that in spite of the huge volume of data that arises in sensor networks, genomics, imaging, particle physics, web search etc., their information content is often much smaller than the number of raw measurements. This has given rise to the possibility of reducing the number of measurements by downsampling the data, which automatically gives rise to underdetermined systems.
In this thesis, we provide new directions for estimation in an underdetermined system, both for a class of parameter estimation problems and also for the problem of sparse recovery in compressive sensing. There are two main contributions of the thesis: design of new sampling and statistical estimation algorithms for array processing, and development of improved guarantees for sparse reconstruction by introducing a statistical framework to the recovery problem.
We consider underdetermined observation models in array processing where the number of unknown sources simultaneously received by the array can be considerably larger than the number of physical sensors. We study new sparse spatial sampling schemes (array geometries) as well as propose new recovery algorithms that can exploit priors on the unknown signals and unambiguously identify all the sources. The proposed sampling structure is generic enough to be extended to multiple dimensions as well as to exploit different kinds of priors in the model such as correlation, higher order moments, etc.
Recognizing the role of correlation priors and suitable sampling schemes for underdetermined estimation in array processing, we introduce a correlation aware framework for recovering sparse support in compressive sensing. We show that it is possible to strictly increase the size of the recoverable sparse support using this framework provided the measurement matrix is suitably designed. The proposed nested and coprime arrays are shown to be appropriate candidates in this regard. We also provide new guarantees for convex and greedy formulations of the support recovery problem and demonstrate that it is possible to strictly improve upon existing guarantees.
This new paradigm of underdetermined estimation that explicitly establishes the fundamental interplay between sampling, statistical priors and the underlying sparsity, leads to exciting future research directions in a variety of application areas, and also gives rise to new questions that can lead to stand-alone theoretical results in their own right.
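A small Python sketch may help show why sparse array geometries such as the nested arrays named in this abstract allow more sources than sensors to be identified. It assumes the common two-level nested-array definition (an inner uniform segment at unit spacing plus an outer segment at spacing n1 + 1, in units of the base spacing); the abstract itself does not give the geometry, and the function names are hypothetical. The code only enumerates the difference co-array, that is, the set of pairwise position differences at which correlations can be estimated. It does not implement any of the thesis's recovery algorithms.

```python
def nested_array(n1, n2):
    """Sensor positions of a two-level nested array (assumed definition).

    Inner level: n1 sensors at 1, 2, ..., n1.
    Outer level: n2 sensors at (n1 + 1), 2 * (n1 + 1), ..., n2 * (n1 + 1).
    """
    inner = list(range(1, n1 + 1))
    outer = [(n1 + 1) * k for k in range(1, n2 + 1)]
    return inner + outer


def difference_coarray(positions):
    """Sorted distinct pairwise differences p - q between sensor positions."""
    return sorted({p - q for p in positions for q in positions})
```

With `n1 = n2 = 3`, the six sensors sit at positions 1, 2, 3, 4, 8 and 12, and `difference_coarray` returns every integer lag from -11 to 11. Six physical sensors therefore yield 23 distinct correlation lags, which is the kind of gain in degrees of freedom that correlation-aware processing can exploit.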
Abstract:
The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.
The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also brings forward a number of methodological considerations that are critical to the practice of single-cell RNA-seq measurements.
The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.
The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq (in which these metrics also played a major role) is presented.
The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).
The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some so far insufficiently explored research areas, the future study of which will, in the opinion of the author, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.
Abstract:
A long-standing, yet-to-be-accomplished task in understanding behavior is to dissect the function of each gene involved in the development and function of a neuron. The C. elegans ALA neuron was chosen in this study for its known function in sleep, an ancient but less understood animal behavior. Single-cell transcriptome profiling identified 8,133 protein-coding genes in the ALA neuron, of which 57 are neuropeptide-coding genes. The most enriched genes are also neuropeptides. In combination with gain-of-function and loss-of-function assays, here I showed that the ALA-enriched FMRFamide neuropeptides FLP-7, FLP-13, and FLP-24 are sufficient and necessary for inducing C. elegans sleep. These neuropeptides act as neuromodulators through the GPCRs NPR-7 and NPR-22. Further investigation in zebrafish indicates that FMRFamide neuropeptides are sleep-promoting molecules in animals. To correlate the behavioral outputs with genomic context, I constructed a gene regulatory network of the relevant genes controlling C. elegans sleep behavior through EGFR signaling in the ALA neuron. First, I identified an ALA cell-specific motif to conduct a genome-wide search for possible ALA-expressed genes. I then filtered out non-ALA-expressed genes by comparing the motif-search genes with ALA transcriptomes from single-cell profiling. By corroborating these results with ChIP-seq data from modENCODE, I identified direct interactions between ALA-expressed transcription factors and differentiation genes in the EGFR sleep regulation pathway. This approach provides a network reference for the molecular regulation of C. elegans sleep behavior, and serves as an entry point for the understanding of functional genomics in animal behaviors.
Abstract:
Background: The impact of nano-scaled materials on photosynthetic organisms needs to be evaluated. Plants represent the largest interface between the environment and biosphere, so understanding how nanoparticles affect them is especially relevant for environmental assessments. Nanotoxicology studies in plants allude to quantum size effects and other properties specific to the nano-stage to explain increased toxicity with respect to bulk compounds. However, gene expression profiles after exposure to nanoparticles and other sources of environmental stress have not been compared and the impact on plant defence has not been analysed. Results: Arabidopsis plants were exposed to TiO2 nanoparticles, Ag nanoparticles, and multi-walled carbon nanotubes as well as different sources of biotic (microbial pathogens) or abiotic (saline, drought, or wounding) stresses. Changes in gene expression profiles and plant phenotypic responses were evaluated. Transcriptome analysis shows similarity of expression patterns for all plants exposed to nanoparticles and a low impact on gene expression compared to other stress inducers. Nanoparticle exposure repressed transcriptional responses to microbial pathogens, resulting in increased bacterial colonization during an experimental infection. Inhibition of root hair development and transcriptional patterns characteristic of the phosphate starvation response were also observed. The exogenous addition of salicylic acid prevented some nano-specific transcriptional and phenotypic effects, including the reduction in root hair formation and the colonization of distal leaves by bacteria. Conclusions: This study integrates the effect of nanoparticles on gene expression with plant responses to major sources of environmental stress and paves the way to remediate the impact of these potentially damaging compounds through hormonal priming.
Abstract:
Great advances have been, and are being, made in our knowledge of genetics and molecular biology (including genomics, proteomics and structural biology). Global molecular profiling technologies such as microarrays using DNA or oligonucleotide chips, and protein and lipid chips, are being developed. The application of such biotechnological advances is inevitable in aquaculture, in the improvement of aquaculture stocks, where many molecular markers such as RFLPs, AFLPs and RAPD are now available for genome analysis, fingerprinting and genetic linkage mapping. Transgenic technology has been developed in a number of fish species and research is being pursued to produce transgenic fish carrying genes that encode antimicrobial peptides such as lysozyme, thereby achieving disease resistance in fish. It is also a shortcut to achieving genetic change for fast growth and other desirable traits like early sexual maturity, temperature tolerance and feed conversion efficiency. KEYWORDS: Fish genetics, transgenesis, monoploidy, diploidy, polyploidy, gynogenesis, androgenesis, cryopreservation.