59 results for Human Genome Project
Abstract:
Familial hyperaldosteronism type II (FH-II) is caused by adrenocortical hyperplasia, aldosteronoma, or both, and is frequently transmitted in an autosomal dominant fashion. Unlike FH type I (FH-I), which results from fusion of the CYP11B1 and CYP11B2 genes, hyperaldosteronism in FH-II is not glucocorticoid remediable. A large family with FH-II was used for a genome-wide search, and its members were evaluated by measuring the aldosterone:renin ratio. In those with an increased ratio, FH-II was confirmed by fludrocortisone suppression testing. After most of the genome had been excluded, genetic linkage was identified between FH-II in this family and the polymorphic markers D7S511, D7S517, and GATA24F03 on chromosome 7, with a maximum two-point LOD score of 3.26 at theta = 0; this region corresponds to cytogenetic band 7p22. This is the first identified locus for FH-II; its molecular elucidation may provide further insight into the aetiology of primary aldosteronism.
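For readers unfamiliar with the LOD score quoted above, the following Python sketch computes a two-point LOD in the simplest textbook case only: phase-known, fully informative meioses with r recombinants out of n, where LOD(theta) = log10[theta^r (1 - theta)^(n - r) / 0.5^n]. This is an idealized assumption; a real pedigree analysis such as the one described would use full-pedigree likelihoods with a penetrance model. The function name `two_point_lod` is hypothetical.

```python
import math


def two_point_lod(theta: float, n_meioses: int, n_recombinants: int) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^r * (1 - theta)^(n - r) / 0.5^n],
    where r recombinants are observed among n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("need 0 <= n_recombinants <= n_meioses")
    r, n = n_recombinants, n_meioses
    if theta == 0.0 and r > 0:
        # A single recombinant makes theta = 0 impossible (likelihood 0).
        return float("-inf")
    # Work in log10 space; the theta term vanishes when r == 0.
    log_l = (r * math.log10(theta) if r else 0.0) + (n - r) * math.log10(1.0 - theta)
    return log_l - n * math.log10(0.5)
```

In this idealized setting each non-recombinant meiosis contributes log10(2) ≈ 0.301 at theta = 0, so 10 such meioses give a LOD of about 3.01.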
Abstract:
Pheochromocytomas are tumors of the adrenal medulla that originate in chromaffin cells derived from the neural crest. Ten percent of these tumors are associated with the familial cancer syndromes multiple endocrine neoplasia type 2, von Hippel-Lindau disease (VHL), and, rarely, neurofibromatosis type 1, in which germ-line mutations have been identified in RET, VHL, and NF1, respectively. In both the sporadic and familial forms of pheochromocytoma, allelic loss at 1p, 3p, 17p, and 22q has been reported, yet the molecular pathogenesis of these tumors is largely unknown. Allelic loss at chromosome 1p has also been reported in other endocrine tumors, such as medullary thyroid cancer and tumors of the parathyroid gland, as well as in tumors of neural crest origin, including neuroblastoma and malignant melanoma. In this study, we performed fine-structure mapping of deletions at chromosome 1p in familial and sporadic pheochromocytomas to identify discrete regions likely to harbor tumor suppressor genes involved in the development of these tumors. Ten microsatellite markers spanning a region of ~70 cM (1pter to 1p34.3) were used to screen 20 pheochromocytomas from 19 unrelated patients for loss of heterozygosity (LOH). LOH was detected at five or more loci in 8 of 13 (61%) sporadic samples and in four of five (80%) tumor samples from patients with multiple endocrine neoplasia type 2. No LOH at 1p was detected in pheochromocytomas from two VHL patients. Analysis of the combined sporadic and familial tumor data suggested three possible regions of common somatic loss, designated PC1 (D1S243 to D1S244), PC2 (D1S228 to D1S507), and PC3 (D1S507 toward the centromere). We propose that chromosome 1p may be the site of at least three putative tumor suppressor loci involved in the tumorigenesis of pheochromocytomas. At least one of these loci, PC2, which spans an interval of <3.8 cM, is likely to have a broader role in the development of endocrine malignancies.
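The idea of a "region of common somatic loss" can be illustrated with a deliberately simplified shortest-region-of-overlap rule, sketched below in Python. It assumes per-marker LOH calls ordered along the chromosome, with uninformative markers left undecided. It is stricter than the analysis in the abstract, which reports three separate regions, and it is not the authors' procedure. The marker names, tumour data, and function name are hypothetical.

```python
from typing import Optional

# One LOH call per marker: True = LOH, False = heterozygosity retained,
# None = marker not informative (constitutionally homozygous) in that patient.
Call = Optional[bool]


def common_loss_runs(markers: list[str], tumours: dict[str, list[Call]]) -> list[list[str]]:
    """Runs of adjacent markers that no LOH-positive tumour contradicts.

    A marker is kept when at least one tumour shows LOH there and no tumour
    that shows LOH anywhere retains heterozygosity there. Tumours without any
    LOH carry no boundary information and are skipped.
    """
    with_loss = [calls for calls in tumours.values() if any(c is True for c in calls)]
    runs: list[list[str]] = []
    current: list[str] = []
    for i, marker in enumerate(markers):
        lost = any(calls[i] is True for calls in with_loss)
        retained = any(calls[i] is False for calls in with_loss)
        if lost and not retained:
            current.append(marker)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Hypothetical example: markers ordered from the telomere toward the centromere.
markers = ["M1", "M2", "M3", "M4", "M5", "M6"]
tumours = {
    "T1": [True, True, True, False, False, None],
    "T2": [None, True, True, True, False, True],
    "T3": [False] * 6,  # no 1p loss, so it is ignored
}
```

With these made-up calls the function returns two candidate regions, `[["M1", "M2", "M3"], ["M6"]]`. A tumour with a partial deletion narrows every region it overlaps, which is why real studies weigh individual breakpoints rather than applying a single global intersection.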
Abstract:
Fragile sites appear as nonstaining gaps on chromosomes and are inducible by specific cell culture conditions. Expansion of CGG/CCG repeats has been shown to be the molecular basis of all five folate-sensitive fragile sites characterized molecularly so far, i.e., FRAXA, FRAXE, FRAXF, FRA11B, and FRA16A. In the present study we refined the localization of the FRA10A folate-sensitive fragile site by fluorescence in situ hybridization. Sequence analysis of a BAC clone spanning FRA10A identified a single, imperfect, but polymorphic CGG repeat that is part of a CpG island in the 5' UTR of a novel gene named FRA10AC1. The number of CGG repeats in the population varied from 8 to 13. Expansions exceeding 200 repeat units were methylated in all FRA10A fragile site carriers tested. The FRA10AC1 gene consists of 19 exons and is transcribed in the centromeric direction from the FRA10A repeat. The major transcript of ~1450 nt is ubiquitously expressed and codes for a highly conserved protein, FRA10AC1, of unknown function. Several splice variants leading to alternative 3' ends were identified, particularly in testis. These give rise to FRA10AC1 proteins with altered COOH-termini. Immunofluorescence analysis of full-length, recombinant EGFP-tagged FRA10AC1 protein showed that it was present exclusively in the nucleoplasm. We show that the expression of FRA10A, in parallel with the other cloned folate-sensitive fragile sites, is caused by an expansion and subsequent methylation of an unstable CGG trinucleotide repeat. Taking advantage of three cSNPs within the FRA10AC1 gene, we demonstrate that one allele of the gene is not transcribed in a FRA10A carrier. Our data also suggest that, in the heterozygous state, FRA10A is likely a benign folate-sensitive fragile site. (C) 2004 Elsevier Inc. All rights reserved.
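Repeat sizing of the kind described above starts from counting tandem units. The Python sketch below returns the longest uninterrupted run of a repeat unit in a sequence. Because the FRA10A repeat is described as imperfect, an interrupting triplet (such as the AGG in the test sequence) splits the count, so this measures the longest pure stretch rather than total allele length. The function name and example sequence are hypothetical.

```python
import re


def longest_repeat_run(seq: str, unit: str = "CGG") -> int:
    """Number of units in the longest uninterrupted tandem run of `unit` in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # For a unit like CGG, runs in different phases cannot overlap, so
    # non-overlapping leftmost matches find every maximal run.
    runs = (len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper()))
    return max(runs, default=0)
```

For example, `longest_repeat_run("AACGGCGGCGGAGGCGGCGGTT")` returns 3: the AGG interruption separates a run of three units from a run of two.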
Abstract:
In this paper we describe the assembly and restriction map of a 1.05-Mb cosmid contig spanning the candidate region for familial Mediterranean fever (FMF), a recessively inherited disorder of inflammation localized to 16p13.3. Using a combination of cosmid walking and screening for P1, PAC, BAC, and YAC clones, we generated a contig of genomic clones spanning ~1050 kb that contains the FMF critical region. The map consists of 179 cosmid, 15 P1, 10 PAC, 3 BAC, and 17 YAC clones, anchored by 27 STS markers. Eight additional STSs were developed from the ~700 kb immediately centromeric to this genomic region. Five of the 35 STSs are microsatellites that have not been reported previously. NotI and EcoRI mapping of the overlapping cosmids, hybridization of restriction fragments from cosmids to one another, and STS analyses were used to validate the assembly of the contig. Our contig fully encompasses the 250-kb interval recently reported, by founder haplotype analysis, to contain the FMF gene. Thus, our high-resolution clone map provides an ideal resource for transcriptional mapping toward the eventual identification of this disease gene. (C) 1997 Academic Press.
Abstract:
The identification of genes responsible for the rare cases of familial leukemia may afford insight into the mechanism underlying the more common sporadic cases. Here we test a single family with 11 relevant meioses transmitting autosomal dominant acute myelogenous leukemia (AML) and myelodysplasia for linkage to three potential candidate loci. In a different family with inherited AML, linkage to chromosome 21q22.1-22.2 was recently reported; we exclude linkage to 21q22.1-22.2 in our family, demonstrating that familial AML is a heterogeneous disease. After reviewing familial leukemia and observing anticipation in the form of a declining age of onset with each generation, we had proposed 9p21-22 and 16q22 as additional candidate loci. Whereas linkage to 9p21-22 can be excluded, a maximum two-point LOD score of 2.82 with the microsatellite marker D16S522 at a recombination fraction theta = 0 provides evidence supporting linkage to 16q22. Haplotype analysis reveals a 23.5-cM (17.9-Mb) region inherited in common by all affected family members, extending from D16S451 to D16S289. To extract maximum linkage information despite missing individuals, incomplete informativeness of individual markers in this interval, and possible deviation from strict autosomal dominant inheritance, we performed nonparametric linkage (NPL) analysis and found a maximum NPL statistic corresponding to a P-value of 0.00098, close to the maximum conditional probability of linkage expected for a pedigree with this structure. Mutational analysis in this region specifically excludes expansion of the AT-rich minisatellite repeat at the FRA16B fragile site and of the CAG trinucleotide repeat in the E2F-4 transcription factor. The "repeat expansion detection" method, which can detect dynamic mutations associated with anticipation, more generally excludes large CAG repeat expansion as a cause of leukemia in this family.
Abstract:
The enormous amount of information generated by sequencing the human genome has increased demand for more economical and flexible alternatives in genomics, proteomics and drug discovery. Many companies and institutions have recognised the potential of increasing the size and complexity of chemical libraries by producing them on colloidal support beads. Because colloid-based compounds in a suspension are randomly located, an encoding system such as optical barcoding is required to permit rapid elucidation of the compound structures. In this article we describe innovative methods for optical barcoding of colloids for use as support beads in both combinatorial and non-combinatorial libraries. We focus in particular on the difficult problem of barcoding extremely large libraries, which, if solved, will transform the way genomics, proteomics and drug discovery research is currently performed.
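Why barcoding very large libraries is hard can be seen from a back-of-envelope capacity count. The abstract does not say which encoding scheme the authors use, so the following Python sketch assumes a common colour-and-intensity model: n spectrally distinct dyes, each set to one of m distinguishable intensity levels (counting "absent" as a level), giving m^n - 1 codes once the unlabelled bead is excluded. The function names are hypothetical.

```python
def barcode_capacity(intensity_levels: int, colours: int) -> int:
    """Distinct codes from `colours` dyes, each at one of `intensity_levels` levels.

    The level count includes 'dye absent'; the all-absent combination is
    excluded because an unlabelled bead carries no code.
    """
    if intensity_levels < 2 or colours < 0:
        raise ValueError("need at least 2 intensity levels and a non-negative colour count")
    return intensity_levels ** colours - 1


def min_colours(intensity_levels: int, library_size: int) -> int:
    """Smallest number of dyes whose capacity covers `library_size` members."""
    colours = 0
    while barcode_capacity(intensity_levels, colours) < library_size:
        colours += 1
    return colours
```

Under this model, 10 levels and 6 dyes give 999,999 codes, so a one-million-member library would need a seventh dye. In practice, spectral overlap and intensity noise reduce the number of levels that can be resolved reliably.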
Abstract:
The choice between genotyping families and unrelated individuals is a critical factor in any large-scale linkage disequilibrium (LD) study. The use of unrelated individuals for such studies is promising, but, in contrast to family designs, unrelated samples do not facilitate detection of genotyping errors, which have been shown to be of great importance for LD and linkage studies and may be even more important in genotyping collaborations across laboratories. Here we employ some of the most commonly used analysis methods to examine the relative accuracy of haplotype estimation using families vs. unrelated individuals in the presence of genotyping error. The results suggest that even slight amounts of genotyping error can significantly decrease the accuracy of haplotype frequency estimation and haplotype reconstruction, and that the ability to detect such errors in large families is essential when the number and complexity of haplotypes are high (low LD/common alleles). In contrast, in situations of low haplotype complexity (high LD and/or many rare alleles), unrelated individuals offer such a high degree of accuracy that there is little reason for less efficient family designs. Moreover, parent-child trios, which are the most popular family design and the most efficient in terms of the number of founder chromosomes per genotype but which contain little information for error detection, offer little or no gain over unrelated samples in nearly all cases, and thus do not seem to be a useful sampling compromise between unrelated individuals and large families. The implications of these results are discussed in the context of large-scale LD mapping projects such as the proposed genome-wide haplotype map.
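The claim that trios carry little information for error detection can be illustrated with the basic Mendelian-consistency check at a single locus, sketched below in Python. Genotypes are treated as unordered allele pairs. This illustrates the general principle, not one of the specific analysis methods used in the study, and the function name is hypothetical.

```python
from itertools import product

# An unordered genotype at one locus, e.g. ("A", "G") for a SNP or
# ("152", "160") for microsatellite allele sizes.
Genotype = tuple[str, str]


def mendel_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype can be formed from one paternal and one maternal allele."""
    target = sorted(child)
    return any(sorted((p, m)) == target for p, m in product(father, mother))
```

The check flags a child typed AG when both parents are AA. However, if the parents are AA and AG and a true AG child is mistyped as AA, the result is still consistent, so the error passes undetected. Many errors in trios are silent in this way.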
Abstract:
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
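Grouping cDNAs into "transcriptional units" can be pictured as merging transcripts that overlap on the same chromosome and strand. The Python sketch below uses that simple genomic-span definition. The abstract does not give the consortium's exact criteria, so this rule, the `Transcript` type, and the function name are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def transcriptional_units(transcripts: list[Transcript]) -> list[list[str]]:
    """Merge same-strand transcripts whose genomic spans overlap into one unit."""
    units: list[list[str]] = []
    last_key = None
    last_end = -1
    for t in sorted(transcripts, key=lambda t: (t.chrom, t.strand, t.start)):
        key = (t.chrom, t.strand)
        if key == last_key and t.start < last_end:
            units[-1].append(t.name)          # overlaps the current unit
            last_end = max(last_end, t.end)
        else:
            units.append([t.name])            # start a new unit
            last_key, last_end = key, t.end
    return units


# Hypothetical transcripts on one chromosome.
example = [
    Transcript("a", "chr1", "+", 100, 500),
    Transcript("b", "chr1", "+", 400, 900),
    Transcript("c", "chr1", "-", 450, 600),
    Transcript("d", "chr1", "+", 1000, 1200),
]
```

Here `a` and `b` merge into one unit, while `c` stays separate because it lies on the opposite strand. Overlapping opposite-strand units like `a`/`c` are the kind of candidates that a sense-antisense pair search would examine.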
Abstract:
We have developed a computational strategy to identify the set of soluble proteins secreted into the extracellular environment of a cell. Within the protein sequences predominantly derived from the RIKEN representative transcript and protein set, we identified 2033 unique soluble proteins that are potentially secreted from the cell. These proteins contain a signal peptide required for entry into the secretory pathway and lack any transmembrane domains or intracellular localization signals. This class of proteins, which we have termed the mouse secretome, included >500 novel proteins and 92 proteins
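The selection criteria listed above (signal peptide present, no transmembrane domains, no intracellular localization signal) amount to a conjunctive filter. The Python sketch below assumes those features have already been predicted by upstream sequence tools, which the abstract does not name. The `ProteinFeatures` type, the function name, and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinFeatures:
    """Per-protein predictions assumed to come from upstream sequence tools."""
    protein_id: str
    has_signal_peptide: bool
    transmembrane_domains: int        # predicted outside the signal peptide
    has_intracellular_signal: bool    # e.g. an organelle-targeting or retention signal


def predicted_secretome(proteins: list[ProteinFeatures]) -> list[str]:
    """IDs of proteins that enter the secretory pathway and appear soluble and secreted."""
    secreted = (
        p.protein_id
        for p in proteins
        if p.has_signal_peptide
        and p.transmembrane_domains == 0
        and not p.has_intracellular_signal
    )
    # dict.fromkeys removes duplicate IDs while preserving first-seen order.
    return list(dict.fromkeys(secreted))


proteins = [
    ProteinFeatures("P1", True, 0, False),   # kept: soluble, secreted
    ProteinFeatures("P2", True, 1, False),   # dropped: membrane protein
    ProteinFeatures("P3", False, 0, False),  # dropped: no signal peptide
    ProteinFeatures("P4", True, 0, True),    # dropped: retained intracellularly
    ProteinFeatures("P1", True, 0, False),   # duplicate of P1
]
```

Deduplicating by identifier mirrors the abstract's count of unique proteins. The quality of the result depends entirely on the upstream predictions.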
Abstract:
The central dogma of biology holds that genetic information normally flows from DNA to RNA to protein. As a consequence, it has generally been assumed that genes code for proteins, and that proteins fulfil not only most structural and catalytic functions but also most regulatory functions in all cells, from microbes to mammals. However, the latter may not be the case in complex organisms. A number of startling observations about the extent of non-protein-coding RNA (ncRNA) transcription in higher eukaryotes, and about the range of genetic and epigenetic phenomena that are RNA-directed, suggest that the traditional view of the structure of genetic regulatory systems in animals and plants may be incorrect. ncRNA dominates the genomic output of higher organisms and has been shown to control chromosome architecture, mRNA turnover and the developmental timing of protein expression, and may also regulate transcription and alternative splicing. This paper re-examines the available evidence and suggests a new framework for considering and understanding the genomic programming of biological complexity, autopoietic development and phenotypic variation. BioEssays 25:930-939, 2003. (C) 2003 Wiley Periodicals, Inc.
Abstract:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatin and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, and these families correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different chromodomain subclasses were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding chromodomain-containing proteins, including 17 novel loci, were identified. Six of these loci (three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseudogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
Abstract:
The C2 domain is one of the most frequent and widely distributed calcium-binding motifs. Its structure comprises an eight-stranded beta-sandwich that occurs in two structural types related as if by circular permutation. Combining sequence, structural and modelling information, we have explored, at different levels of granularity, the functional characteristics of several families of C2 domains. At the coarsest level, the similarity correlates with key structural determinants of the C2 domain fold and, at the finest level, with the domain architecture of the proteins containing them, highlighting the functional diversity between the various subfamilies. This functional diversity appears as different conserved surface patches throughout the common fold. In some cases, these patches are related to substrate-binding sites, whereas in others they correspond to interfaces of presumably permanent interaction with other domains within the same polypeptide chain. For those related to substrate-binding sites, the predictions overlap with biochemical data and also provide some novel observations. For those acting as protein-protein interfaces, our modelling analysis suggests that slight variations between families result not only from complementary adaptations in the interfaces involved but also from different domain architectures. In the light of the sequence and structural genomics projects, the work presented here shows that modelling approaches, together with careful sub-typing of protein families, will be a powerful combination for broader coverage in proteomics. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
With the sequencing and annotation of the genomes and transcriptomes of several eukaryotes, the importance of noncoding RNA (ncRNA), that is, RNA molecules that are not translated into protein products, has become more evident. A subclass of ncRNA transcripts is encoded by highly regulated, multi-exon transcriptional units; these transcripts are processed like typical protein-coding mRNAs and are increasingly implicated in the regulation of many cellular functions in eukaryotes. This study describes the identification of candidate functional ncRNAs from the RIKEN mouse full-length cDNA collection, which contains 60,770 sequences, using a systematic computational filtering approach. We initially searched for previously reported ncRNAs and found nine murine ncRNAs and homologs of several previously described non-mouse ncRNAs. Using our computational approach to filter for artifact-free clones that lack protein-coding potential, we extracted 4280 transcripts as the largest candidate set. Many clones in the set had EST hits, potential CpG islands surrounding the transcription start sites, and homology with the human genome, implying that many candidates are indeed transcribed in a regulated manner. Our results demonstrate that ncRNAs are a major functional subclass of processed transcripts in mammals.
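One ingredient of "lacking protein-coding potential" is the absence of a long open reading frame. The Python sketch below finds the longest ATG-to-stop ORF on the sense strand, assuming the full-length cDNA is correctly oriented, and flags a transcript as a candidate when that ORF is shorter than a threshold. The 100-amino-acid cutoff and the function names are hypothetical. The abstract's pipeline also removed artifacts and may have used other criteria.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def longest_orf_aa(seq: str) -> int:
    """Length in amino acids of the longest ATG...stop ORF on the given (sense) strand.

    ORFs that run off the 3' end without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


def is_candidate_ncrna(seq: str, max_orf_aa: int = 100) -> bool:
    """Hypothetical coding-potential filter: no ORF of max_orf_aa residues or longer."""
    return longest_orf_aa(seq) < max_orf_aa
```

ORF length alone is a crude signal. Short genuine proteins fall below any fixed cutoff, and long ncRNAs can contain long ORFs by chance, which is one reason such filters are combined with other evidence.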
Abstract:
The current RIKEN transcript set represents a significant proportion of the mouse transcriptome, but transcripts expressed in the innate and acquired immune systems are poorly represented. In the present study we assessed the complexity of the transcriptome expressed in mouse macrophages before and after treatment with lipopolysaccharide (LPS), a global regulator of macrophage gene expression, using existing RIKEN 19K arrays. By comparison with array profiles of other cells and tissues, we identify a large set of macrophage-enriched genes, many of which have obvious functions in endocytosis and phagocytosis. In addition, a significant number of LPS-inducible genes were identified. The data suggest that macrophages are a complex source of mRNA for transcriptome studies. To assess complexity and identify additional macrophage-expressed genes, cDNA libraries were created from purified populations of macrophages and dendritic cells, a functionally related cell type. Sequence analysis revealed a high incidence of novel mRNAs within these cDNA libraries. These studies provide insights into the depth of transcriptional complexity still untapped among the products of inducible genes, and identify macrophage and dendritic cell populations as a starting point for sampling the inducible mammalian transcriptome.
Abstract:
We analyzed the FANTOM2 clone set of 60,770 RIKEN full-length mouse cDNA sequences together with 44,122 public mRNA sequences. We developed a new computational procedure to identify and classify the forms of splice variation evident in this data set and organized the results into a publicly accessible database that can be used for future expression array construction, structural genomics, and analyses of the mechanism and regulation of alternative splicing. Statistical analysis shows that at least 41%, and possibly as much as 60%, of multiexon genes in mouse have multiple splice forms. Of the transcription units with multiple splice forms, 49% contain transcripts in which the apparent use of an alternative transcription start (stop) is accompanied by alternative splicing of the initial (terminal) exon. This implies that alternative transcription may frequently induce alternative splicing. The fact that 73% of all exons with splice variation fall within the annotated coding region indicates that most splice variation is likely to affect the protein form. Finally, we compared the set of constitutive exons (present in all transcripts) with the set of cryptic exons (present only in some transcripts) and found statistically significant differences in their length distributions, the nucleotide distributions around their splice junctions, and the frequencies of occurrence of several short sequence motifs.
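The constitutive/cryptic distinction defined above can be computed directly once each transcript of a unit is represented as a set of exons. The Python sketch below assumes exons are identified by exact (start, end) coordinates, so an alternative splice site yields a distinct exon. This simplification, the `Exon` alias, and the function name are hypothetical and may differ from the authors' procedure.

```python
from collections import Counter

Exon = tuple[int, int]  # (start, end) genomic coordinates of one exon


def classify_exons(transcripts: dict[str, set[Exon]]) -> dict[Exon, str]:
    """Label each exon 'constitutive' (in every transcript of the unit) or 'cryptic'."""
    n = len(transcripts)
    # Sets guarantee each exon is counted at most once per transcript.
    counts = Counter(exon for exons in transcripts.values() for exon in exons)
    return {exon: "constitutive" if c == n else "cryptic" for exon, c in counts.items()}


# Hypothetical transcription unit with three transcripts.
unit = {
    "t1": {(100, 200), (300, 400), (500, 600)},
    "t2": {(100, 200), (500, 600)},               # skips the middle exon
    "t3": {(100, 200), (300, 400), (500, 650)},   # alternative 3' end of the last exon
}
```

In this example only (100, 200) is constitutive. Exon skipping in `t2` and the alternative terminal exon boundary in `t3` make the other three exons cryptic. Comparing length or sequence features between the two resulting sets is then a standard two-sample comparison.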