944 results for Human Genes
Abstract:
Synonymous codon bias has been examined in 78 human genes (19967 codons) and measured by relative synonymous codon usage (RSCU). Relative frequencies of all kinds of dinucleotides in 2,3 or 3,4 codon positions have been calculated, and codon-anticodon bin
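The RSCU measure named in this abstract is usually defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal Python sketch of that standard definition follows; the function name `rscu` and the example sequence are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in the coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, skipping stop codons.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


example = rscu("CTGCTGCTGCTT")  # hypothetical sequence: four leucine codons
```

A value above 1 means the codon is used more often than expected under uniform synonymous usage; a value below 1 means it is used less often.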
Abstract:
A total of 728 human genes were divided into four groups according to the GC content of their coding sequences (from GC<0.43 to GC>0.58). Examination of synonymous-codon bias in the four groups shows that NTG (N represents any of the bases T, A, C, G) is most favored and NCG
Abstract:
Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach that could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the discovery of causal relationships between genes and diseases as well as the determination of the druggability of gene products. Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid or druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability. Among these rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively. Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
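The two figures reported per classifier (fraction of known genes recovered, and precision) correspond to what is commonly called recall and precision. The sketch below shows how both are computed from confusion-matrix counts; the counts in the example are hypothetical and are not the paper's data.

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of known positives that were recovered
    return precision, recall


# Hypothetical counts chosen only to illustrate the arithmetic.
p, r = precision_recall(tp=65, fp=35, fn=35)
```

Reporting both values matters because a classifier can recover many known genes (high recall) while also flagging many unrelated ones (low precision), or the reverse.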
Abstract:
From the late 1980s, the automation of sequencing techniques and the spread of computers gave rise to a flourishing number of new molecular structures and sequences and to a proliferation of new databases in which to store them. Three computational approaches are presented here that analyse the massive amount of publicly available data in order to answer important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA) due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 human loci, out of all known human mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2, and the consequences for functional studies are discussed. The second approach analyses codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here the "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species show a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is useful for studying the expression of genes implicated in human autism spectrum disorder (ASD). ASD-implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD-affected individuals. The differential expression of a recently identified long non-coding RNA, which has four MREs for the same microRNA, could disrupt the equilibrium in this network, but further analyses and experiments are needed.
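One plausible reading of the "codonome value" described in this abstract is a codon count weighted by the expression level of each mRNA. The Python sketch below follows that reading; the function names, the input layout (dictionaries keyed by gene) and the toy data are assumptions for illustration and are not the author's implementation.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codonome(cds_by_gene, expression):
    """Sum codon counts over all genes, each weighted by that gene's expression level."""
    totals = Counter()
    for gene, cds in cds_by_gene.items():
        level = expression.get(gene, 0.0)  # genes without a measurement contribute nothing
        for codon, n in codon_counts(cds).items():
            totals[codon] += n * level
    return totals


# Toy data: two hypothetical genes and hypothetical expression values.
toy_cds = {"geneA": "ATGAAA", "geneB": "ATGTTT"}
toy_expression = {"geneA": 2.0, "geneB": 1.0}
toy_codonome = codonome(toy_cds, toy_expression)
```

Comparing these expression-weighted totals with the unweighted counts from `codon_counts` is one way to contrast codon bias in the genome with codon usage in the expressed transcripts.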
Abstract:
Somatic-cell hybrids have been shown to maintain the correct epigenetic chromatin states to study developmental globin gene expression as well as gene expression on the active and inactive X chromosomes. This suggests the potential use of somatic-cell hybrids containing either a maternal or a paternal human chromosome as a model system to study known imprinted genes and to identify as-yet-unknown imprinted genes. Testing gene expression by using reverse transcription followed by PCR, we show that functional imprints are maintained at four previously characterized 15q11–q13 loci in hybrids containing a single human chromosome 15 and at two chromosome 11p15 loci in hybrids containing a single chromosome 11. In contrast, three γ-aminobutyric acid type A receptor subunit genes in 15q12–q13 are nonimprinted. Furthermore, we have found that differential DNA methylation imprints at the SNRPN promoter and at a CpG island in 11p15 are also maintained in somatic-cell hybrids. Somatic-cell hybrids therefore are a valid and powerful system for studying known imprinted genes as well as for rapidly identifying new imprinted genes.
Abstract:
The sequencing of the human genome has led to the identification of many genes whose functions remain to be determined. Because of conservation of genetic function, microbial systems have often been used for identification and characterization of human genes. We have investigated the use of the Escherichia coli SOS induction assay as a screen for yeast and human genes that might play a role in DNA metabolism and/or in genome stability. The SOS system has previously been used to analyze bacterial and viral genes that directly modify DNA. An initial screen of meiotically expressed yeast genes revealed several genes associated with chromosome metabolism (e.g., RAD51 and HHT1 as well as others). The SOS induction assay was then extended to the isolation of human genes. Several known human genes involved in DNA metabolism, such as the Ku70 end-binding protein and DNA ligase IV, were identified, as well as a large number of previously unknown genes. Thus, the SOS assay can be used to identify and characterize human genes, many of which may participate in chromosome metabolism.
Abstract:
Of the rules used by the splicing machinery to precisely determine intron–exon boundaries only a fraction is known. Recent evidence suggests that specific short sequences within exons help in defining these boundaries. Such sequences are known as exonic splicing enhancers (ESE). A possible bioinformatical approach to studying ESE sequences is to compare genes that harbor introns with genes that do not. For this purpose two non-redundant samples of 719 intron-containing and 63 intron-lacking human genes were created. We performed a statistical analysis on these datasets of intron-containing and intron-lacking human coding sequences and found a statistically significant difference (P = 0.01) between these samples in terms of 5–6mer oligonucleotide distributions. The difference is not created by a few strong signals present in the majority of exons, but rather by the accumulation of multiple weak signals through small variations in codon frequencies, codon biases and context-dependent codon biases between the samples. A list of putative novel human splicing regulation sequences has been elucidated by our analysis.
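The comparison of 5–6mer oligonucleotide distributions between two gene samples can be illustrated with a simple k-mer frequency count. The sketch below only computes relative frequencies and their per-k-mer differences; the abstract does not say which statistical test was used, so none is implemented, and all names and sequences here are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer (A/C/G/T only) pooled over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing ambiguous bases
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def frequency_differences(set_a, set_b, k):
    """Per-k-mer difference in relative frequency between two sequence sets (A minus B)."""
    fa = kmer_frequencies(set_a, k)
    fb = kmer_frequencies(set_b, k)
    return {kmer: fa.get(kmer, 0.0) - fb.get(kmer, 0.0) for kmer in set(fa) | set(fb)}


# Hypothetical stand-ins for intron-containing and intron-lacking coding sequences.
diffs = frequency_differences(["AAAAAA"], ["AAAAAC"], k=5)
```

In line with the abstract's finding, a signal built from many weak contributions would show up as many small differences spread across k-mers, not as a few large ones.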
Abstract:
The base composition pattern (BCP) in the putative promoter regions (PPRs), up to 5 kb in length, of 682 human genes on Chromosome 22 (Chr22) was examined. Two-dimensional (2D) and three-dimensional (3D) functions were designed to delineate the DNA base composition, and four major patterns were identified. It was found that 17.6% of genes include a TATA box, 28.0% a GC box, 18.9% a CAAT box and 38.4% CpG islands, and approximately 10% of genes have one of four putative initiator (Inr) motifs. The occurrence of the promoter elements is tightly associated with the base composition features in the promoter regions, and these associations mediate tissue-wide expression of the genes in humans. The occurrence of two or more promoter elements in the promoter regions is required for the medium- and wide-range expression profiles of the human genes on Chr22. Thus, the reported data shed light on the characteristics of the PPRs of the human genes on Chr22, which may improve our understanding of the regulatory roles of the PPRs and their promoter elements in gene expression.
Abstract:
Background: RNAs transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression. However, the complement of human genes in which introns are transcribed, and the number of intronic transcriptional units and their tissue expression patterns are not known. Results: A survey of mRNA and EST public databases revealed more than 55,000 totally intronic noncoding (TIN) RNAs transcribed from the introns of 74% of all unique RefSeq genes. Guided by this information, we designed an oligoarray platform containing sense and antisense probes for each of 7,135 randomly selected TIN transcripts plus the corresponding protein-coding genes. We identified exonic and intronic tissue-specific expression signatures for human liver, prostate and kidney. The most highly expressed antisense TIN RNAs were transcribed from introns of protein-coding genes significantly enriched (p = 0.002 to 0.022) in the 'Regulation of transcription' Gene Ontology category. RNA polymerase II inhibition resulted in increased expression of a fraction of intronic RNAs in cell cultures, suggesting that other RNA polymerases may be involved in their biosynthesis. Members of a subset of intronic and protein-coding signatures transcribed from the same genomic loci have correlated expression patterns, suggesting that intronic RNAs regulate the abundance or the pattern of exon usage in protein-coding messages. Conclusion: We have identified diverse intronic RNA expression patterns, pointing to distinct regulatory roles. This gene-oriented approach, using a combined intron-exon oligoarray, should permit further comparative analysis of intronic transcription under various physiological and pathological conditions, thus advancing current knowledge about the biological functions of these noncoding RNAs.
Abstract:
Human x rodent somatic cell hybrids have played an important role in human genetics research. They have been especially useful for assigning genes to chromosomes and isolating DNA markers from specific regions of the human genome. By employing a combination of somatic cell genetic, recombinant DNA, and cytogenetic techniques, the human DNA excision repair gene ERCC4 was mapped regionally to human 16p13.13-13.2, even though the gene has not been cloned. Human x Chinese hamster ovary (CHO) cell hybrids selected for human ERCC4 activity and containing 16p13.1-p13.3 as the only human genetic material were identified. These hybrids were used to order DNA markers located in 16p13.1-p13.3. New DNA markers physically close to ERCC4 were isolated from such hybrids. Using amplified human DNA from the hybrids as a probe in fluorescent in situ hybridization, the short arm breakpoint in the chromosome 16 inversion associated with acute myelomonocytic leukemia (AMML) was found to be physically close to the ERCC4 gene. The physical mapping, and eventually the cloning, of the ERCC4 gene will benefit the understanding of the DNA repair system and the study of other important biomedical problems such as tumorigenesis.
To facilitate the cloning of the ERCC4 gene and, in general, the cloning of genes from any defined region of the human genome, a method was developed for the direct isolation of human transcribed genes from somatic cell hybrids. cDNA was prepared from a human x rodent hybrid by using consensus 5′ splice site sequences as primers. These primers were designed to select immature, unspliced messenger RNA (still retaining species-specific repeat sequences) as templates. Screening of a derived cDNA library for human repeat sequences resulted in the isolation of human clones at the anticipated frequency with characteristics expected of exons of transcribed human genes. The usefulness of the splice site-specific primers was analyzed and the cDNA synthesis conditions with these primers were optimized. The procedure was shown to be sensitive enough to clone weakly expressed genes. Studying the expression of the represented genes with the isolated clones was shown to be feasible. Such region-specific human gene fragments will be very valuable for many human genetic studies, such as the search for inherited disease genes and the construction of a cDNA map of the human genome.
Abstract:
The MMS19 gene of the yeast Saccharomyces cerevisiae encodes a polypeptide of unknown function which is required for both nucleotide excision repair (NER) and RNA polymerase II (RNAP II) transcription. Here we report the molecular cloning of human and mouse orthologs of the yeast MMS19 gene. Both human and Drosophila MMS19 cDNAs correct thermosensitive growth and sensitivity to killing by UV radiation in a yeast mutant deleted for the MMS19 gene, indicating functional conservation between the yeast and mammalian gene products. Alignment of the translated sequences of MMS19 from multiple eukaryotes, including mouse and human, revealed the presence of several conserved regions, including a HEAT repeat domain near the C-terminus. The presence of HEAT repeats, coupled with functional complementation of yeast mutant phenotypes by the orthologous protein from higher eukaryotes, suggests a role of Mms19 protein in the assembly of a multiprotein complex(es) required for NER and RNAP II transcription. Both the mouse and human genes are ubiquitously expressed as multiple transcripts, some of which appear to derive from alternative splicing. The ratio of different transcripts varies in several different tissue types.
Abstract:
Microarrays containing 1046 human cDNAs of unknown sequence were printed on glass with high-speed robotics. These 1.0-cm² DNA "chips" were used to quantitatively monitor differential expression of the cognate human genes using a highly sensitive two-color hybridization assay. Array elements that displayed differential expression patterns under given experimental conditions were characterized by sequencing. The identification of known and novel heat shock and phorbol ester-regulated genes in human T cells demonstrates the sensitivity of the assay. Parallel gene analysis with microarrays provides a rapid and efficient method for large-scale human gene discovery.
Abstract:
Five human diseases are due to an excessive number of CAG repeats in the coding regions of five different genes. We have analyzed the repeat regions in four of these genes from nonhuman primates, which are not known to suffer from the diseases. These primates have CAG repeats at the same sites as in human alleles, and there is similar polymorphism of repeat number, but this number is smaller than in the human genes. In some of the genes, the segment of poly(CAG) has expanded in nonhuman primates, but the process has advanced further in the human lineage than in other primate lineages, thereby predisposing to diseases of CAG reiteration. Adjacent to stretches of homogeneous present-day codon repeats, previously existing codons of the same kind have undergone nucleotide substitutions with high frequency. Where these lead to amino acid substitutions, the effect will be to reduce the length of the original homopolymeric stretch in the protein.
Abstract:
Prior to the completion of the human genome project, the human genome was thought to contain a greater number of genes than that of simpler organisms because it seemed structurally and functionally more complex. This assumption, along with the belief in “one gene, one protein”, was demonstrated to be incorrect. The inequality in the ratio of genes to proteins gave rise to the theory of alternative splicing (AS). AS is a mechanism by which one gene gives rise to multiple protein products. Numerous databases and online bioinformatic tools are available for the detection and analysis of AS. Bioinformatics provides an important approach to studying mRNA and protein diversity through various tools, such as expressed sequence tag (EST) sequences obtained from completely processed mRNA. Microarrays and deep sequencing approaches also aid in the detection of splicing events. Initially it was postulated that AS occurred in only about 5% of all genes, but it was later found to be more abundant. Using bioinformatic approaches, the level of AS in human genes was found to be fairly high, with 35-59% of genes having at least one AS form. Our ability to determine and predict AS is important, as disorders in splicing patterns may lead to abnormal splice variants resulting in genetic diseases. In addition, the diversity of proteins produced by AS poses a challenge for successful drug discovery, and therefore a greater understanding of AS would be beneficial.
Abstract:
Enterovirus 71 (EV71) is one of the main etiological agents of Hand, Foot and Mouth Disease (HFMD) and has been shown to be associated with severe clinical manifestations. Currently, there is no antiviral therapeutic for the treatment of HFMD patients, owing to a lack of understanding of EV71 pathogenesis. This study seeks to elucidate the transcriptomic changes that result from EV71 infection. A human whole-genome microarray was employed to monitor changes in genomic profiles between infected and uninfected cells. The results reveal altered expression of human genes involved in critical pathways including the immune response and the stress response. Together, data from this study provide valuable insights into the host–pathogen interaction between human colorectal cells and EV71.