959 results for GENOME SEQUENCING


Relevance: 60.00%

Abstract:

Herein we provide a detailed molecular analysis of the spatial heterogeneity of clinically localized, multifocal prostate cancer to delineate new oncogenes or tumor suppressors. We initially determined the copy number aberration (CNA) profiles of 74 patients with index tumors of Gleason score 7. Of these, 5 patients were subjected to whole-genome sequencing using DNA quantities achievable in diagnostic biopsies, with detailed spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity in focal genomics. Multifocal tumors are highly heterogeneous for single-nucleotide variants (SNVs), CNAs and genomic rearrangements. We identified and validated a new recurrent amplification of MYCL, which is associated with TP53 deletion and unique profiles of DNA damage and transcriptional dysregulation. Moreover, we demonstrate divergent tumor evolution in multifocal cancer and, in some cases, tumors of independent clonal origin. These data represent the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcome and inform the development of novel biomarkers that reflect individual prognosis.

Relevance: 60.00%

Abstract:

Cholesterol deficiency, a new autosomal recessive inherited genetic defect in Holstein cattle, has been recently reported to have an influence on the rearing success of calves. The affected animals show unresponsive diarrhea accompanied by hypocholesterolemia and usually die within the first weeks or months of life. Here, we show that whole genome sequencing combined with the knowledge about the pedigree and inbreeding status of a livestock population facilitates the identification of the causative mutation. We resequenced the entire genomes of an affected calf and a healthy partially inbred male carrying one copy of the critical 2.24-Mb chromosome 11 segment in its ancestral state and one copy of the same segment with the cholesterol deficiency mutation. We detected a single structural variant, homozygous in the affected case and heterozygous in the non-affected carrier male. The genetic makeup of this key animal provides extremely strong support for the causality of this mutation. The mutation represents a 1.3-kb insertion of a transposable LTR element (ERV2-1) in the coding sequence of the APOB gene, which leads to truncated transcripts and aberrant splicing. This finding was further supported by RNA sequencing of the liver transcriptome of an affected calf. The encoded apolipoprotein B is an essential apolipoprotein on chylomicrons and low-density lipoproteins, and therefore, the mutation represents a loss-of-function mutation similar to autosomal recessive inherited familial hypobetalipoproteinemia-1 (FHBL1) in humans. Our findings provide a direct gene test to improve selection against this deleterious mutation in Holstein cattle.

Relevance: 60.00%

Abstract:

Mycobacterium tuberculosis strains of the Beijing lineage are globally distributed and are associated with the massive spread of multidrug-resistant (MDR) tuberculosis in Eurasia. Here we reconstructed the biogeographical structure and evolutionary history of this lineage by genetic analysis of 4,987 isolates from 99 countries and whole-genome sequencing of 110 representative isolates. We show that this lineage initially originated in the Far East, from where it radiated worldwide in several waves. We detected successive increases in population size for this pathogen over the last 200 years, practically coinciding with the Industrial Revolution, the First World War and HIV epidemics. Two MDR clones of this lineage started to spread throughout central Asia and Russia concomitantly with the collapse of the public health system in the former Soviet Union. Mutations identified in genes putatively under positive selection and associated with virulence might have favored the expansion of the most successful branches of the lineage.

Relevance: 60.00%

Abstract:

Immigrants from high tuberculosis (TB) incidence regions are a risk group for TB in low-incidence countries such as Switzerland. In a previous analysis of a nationwide collection of 520 Mycobacterium tuberculosis isolates from 2000-2008, we identified 35 clusters comprising 90 patients based on standard genotyping (24-loci MIRU-VNTR and spoligotyping). Here, we used whole genome sequencing (WGS) to revisit these transmission clusters. Genome-based transmission clusters were defined as isolate pairs separated by ≤12 single nucleotide polymorphisms (SNPs). WGS confirmed 17/35 (49%) MIRU-VNTR clusters; the other 18 clusters contained pairs separated by >12 SNPs. Most transmission clusters (3/4) of Swiss-born patients were confirmed by WGS, as opposed to 25% (4/16) of clusters involving only foreign-born patients. The overall clustering proportion using standard genotyping was 17% (90 patients, 95% confidence interval [CI]: 14-21%), but only 8% (43 patients, 95% CI: 6-11%) using WGS. The clustering proportion was 17% (67/401, 95% CI: 13-21%) using standard genotyping and 7% (26/401, 95% CI: 4-9%) using WGS among foreign-born patients, and 19% (23/119, 95% CI: 13-28%) and 14% (17/119, 95% CI: 9-22%), respectively, among Swiss-born patients. Using weighted logistic regression, we found weak evidence for an association between birth origin and transmission (adjusted odds ratio [aOR] 2.2, 95% CI: 0.9-5.5, comparing Swiss-born patients to others). In conclusion, standard genotyping overestimated recent TB transmission in Switzerland when compared to WGS, particularly among immigrants from high TB incidence regions, where genetically closely related strains often predominate. We recommend the use of WGS to identify transmission clusters in low TB incidence settings.
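The genome-based cluster definition and the clustering proportions above can be illustrated with a short sketch. It assumes pairwise SNP distances are already available as a dictionary keyed by isolate-index pairs (a hypothetical input format) and, as an assumption, chains pairs at ≤12 SNPs into clusters by single linkage. The clustering proportion is taken as clustered patients divided by all patients (90/520 ≈ 17%). The abstract does not say which interval method was used; the Wilson score interval here is an assumption, although it does reproduce the reported 14-21% and 6-11% intervals for 90/520 and 43/520.

```python
import math


def snp_clusters(pair_distances, n_isolates, threshold=12):
    """Single-linkage clusters of isolates 0..n_isolates-1.

    pair_distances maps (i, j) -> pairwise SNP distance (hypothetical format).
    Two isolates are linked when their distance is <= threshold.
    """
    parent = list(range(n_isolates))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), d in pair_distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n_isolates):
        groups.setdefault(find(i), []).append(i)
    # Singletons are unclustered; only groups of two or more count as clusters
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def wilson_ci(k, n, z=1.96):
    """Wilson score interval for the proportion k/n (95% by default)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Toy distances between 5 isolates (made-up values)
toy = {(0, 1): 3, (1, 2): 40, (2, 3): 12, (3, 4): 13}
clusters = snp_clusters(toy, 5)
clustered = sum(len(c) for c in clusters)
print(clusters, clustered / 5)  # [[0, 1], [2, 3]] 0.8

# Clustering proportion reported for WGS: 43 of 520 patients
lo, hi = wilson_ci(43, 520)
print(f"{43 / 520:.0%} ({lo:.0%}-{hi:.0%})")
```

Single linkage means a chain of isolates can form one cluster even if its two ends differ by more than 12 SNPs; a stricter complete-linkage rule would split such chains.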

Relevance: 60.00%

Abstract:

Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases.

Relevance: 60.00%

Abstract:

Millions of people worldwide suffer from nutritional imbalances of essential metals like zinc. These same metals, along with pollutants like cadmium and lead, contaminate soils at many sites around the world. In addition to posing a threat to human health, these metals can poison plants, livestock, and wildlife. Deciphering how metals are absorbed, transported, and incorporated as protein cofactors may help solve both of these problems. For example, edible plants could be engineered to serve as better dietary sources of metal nutrients, and other plant species could be tailored to remove metal ions from contaminated soils. We report here the cloning of the first zinc transporter genes from plants, the ZIP1, ZIP2, and ZIP3 genes of Arabidopsis thaliana. Expression in yeast of these closely related genes confers zinc uptake activities. In the plant, ZIP1 and ZIP3 are expressed in roots in response to zinc deficiency, suggesting that they transport zinc from the soil into the plant. Although expression of ZIP2 has not been detected, a fourth related Arabidopsis gene identified by genome sequencing, ZIP4, is induced in both shoots and roots of zinc-limited plants. Thus, ZIP4 may transport zinc intracellularly or between plant tissues. These ZIP proteins define a family of metal ion transporters that are found in plants, protozoa, fungi, invertebrates, and vertebrates, making it now possible to address questions of metal ion accumulation and homeostasis in diverse organisms.

Relevance: 60.00%

Abstract:

Taking advantage of the ongoing Dictyostelium genome sequencing project, we have assembled >73 kb of genomic DNA in 15 contigs harbouring 15 genes and one pseudogene of Rho-related proteins. Comparison with EST sequences revealed that every gene is interrupted by at least one and up to four introns. For racC, extensive alternative splicing was identified. Northern blot analysis showed that mRNAs for racA, racE, racG, racH and racI were present at all stages of development, whereas racJ and racL were expressed only at late stages. Amino acid sequences have been analysed in the context of Rho-related proteins of other organisms. Rac1a/1b/1c, RacF1/F2 and to a lesser extent RacB and the GTPase domain of RacA can be grouped in the Rac subfamily. None of the additional Dictyostelium Rho-related proteins belongs to any of the well-defined subfamilies, like Rac, Cdc42 or Rho. RacD and RacA are unique in that they lack the prenylation motif characteristic of Rho proteins. RacD possesses a 50 residue C-terminal extension and RacA a 400 residue C-terminal extension that contains a proline-rich region, two BTB domains and a novel C-terminal domain. We have also identified homologues for RacA in Drosophila and mammals, thus defining a new subfamily of Rho proteins, RhoBTB.

Relevance: 60.00%

Abstract:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

Relevance: 60.00%

Abstract:

FULL-malaria is a database for a full-length-enriched cDNA library from the human malaria parasite Plasmodium falciparum (http://133.11.149.55/). Because of its medical importance, this organism is the first target for genome sequencing of a eukaryotic pathogen; the sequences of two of its 14 chromosomes have already been determined. However, for the full exploitation of this rapidly accumulating information, correct identification of the genes and study of their expression are essential. Using the oligo-capping method, we have produced a full-length-enriched cDNA library from erythrocytic stage parasites and performed one-pass reading. The database consists of nucleotide sequences of 2490 random clones, 390 (16%) of which correspond to known malaria genes according to BLASTN analysis of the nr-nt database in GenBank; these represent 98 genes, and the clones for 48 of these genes (49%) contain the complete protein-coding sequence. On the other hand, comparisons with the complete chromosome 2 sequence revealed that 35 of 210 predicted genes are expressed, and in addition led to the detection of three previously unknown gene candidates. In total, 19 of these 38 clones (50%) were full-length. From these observations, it is expected that the database contains ∼1000 genes, including 500 full-length clones. It should be an invaluable resource for the development of vaccines and novel drugs.

Relevance: 60.00%

Abstract:

In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.

Relevance: 60.00%

Abstract:

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.
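The cluster-then-assemble idea behind the Gene Indices can be sketched in a few lines. This is not the TIGR protocol: as a simplifying assumption, two ESTs are placed in the same cluster whenever they share an exact k-mer (k = 6 here only so the toy reads stay short), and clusters are merged transitively with union-find. The later step of assembling each cluster into a Tentative Consensus sequence is omitted, and the EST names and sequences are made up.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=6):
    """Group ESTs (name -> sequence) that share at least one exact k-mer, transitively."""
    parent = {name: name for name in ests}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> first EST name that contained it
    for name, seq in ests.items():
        for km in kmers(seq, k):
            if km in first_seen:
                parent[find(name)] = find(first_seen[km])
            else:
                first_seen[km] = name

    groups = {}
    for name in ests:
        groups.setdefault(find(name), []).append(name)
    # Each group, singletons included, is one candidate transcript to assemble
    return sorted(sorted(g) for g in groups.values())


toy_ests = {
    "est1": "ACGTACGTTTGA",
    "est2": "GTTTGACCAGTA",  # overlaps the 3' end of est1
    "est3": "TTTTTTCCCCCC",  # unrelated read
}
print(cluster_ests(toy_ests))  # [['est1', 'est2'], ['est3']]
```

Indexing each k-mer once keeps the grouping roughly linear in total sequence length; a real pipeline would confirm candidate pairs with an alignment before merging, since short exact matches also arise from repeats.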

Relevance: 60.00%

Abstract:

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT–AG junctions (22 199 entries) and 0.56% have non-canonical GC–AG splice site pairs. The remainder (0.73%) is spread across many small groups, the largest of which accounts for 0.05%. We paid particular attention to non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. Because EST alignments allowed us to verify only the exonic part of splice sites, we checked the conserved dinucleotides by comparing sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Of 171 human non-canonical, EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. After sequence analysis, they can be classified as: 79 GC–AG pairs (one of which was an error corrected to GC–AG), 61 errors corrected to canonical GT–AG pairs, six AT–AC pairs (two of which were errors corrected to AT–AC), one case arising from a non-existent intron, and seven cases found in HTG entries that had been deposited in GenBank; only two other supported non-canonical splice pairs remained. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
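The abstract mentions weight matrices built for the major splice groups but does not describe how they were constructed. A common form is a log-odds position weight matrix; the sketch below assumes a pseudocount of 1 per base and a uniform 0.25 background, and uses made-up 9-nt donor sites (3 exonic + 6 intronic bases) purely for illustration. Scoring a GC donor against a matrix trained on GT donors shows why non-canonical sites score poorly under a canonical model.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites (A/C/G/T only)."""
    length = len(sites[0])
    assert all(len(s) == length for s in sites), "sites must be aligned"
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        # log2(observed frequency / background frequency) for each base
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm, seq):
    """Sum of per-position log-odds; higher means more donor-like."""
    return sum(column[base] for column, base in zip(pwm, seq))


# Hypothetical GT-AG donor sites: exon positions 0-2, intron positions 3-8
donors = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGG"]
pwm = build_pwm(donors)

canonical = score(pwm, "CAGGTAAGT")      # GT donor
non_canonical = score(pwm, "CAGGCAAGT")  # GC donor, as in GC-AG introns
print(round(canonical, 2), round(non_canonical, 2))
```

Separate matrices per splice group (GT–AG, GC–AG, AT–AC), as the abstract describes, avoid penalizing genuine non-canonical sites in this way.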

Relevance: 60.00%

Abstract:

TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term ‘equivalog’ to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.

Relevance: 60.00%

Abstract:

One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.

Relevance: 60.00%

Abstract:

The opportunistic pathogenic bacterium Pseudomonas aeruginosa uses quorum-sensing signaling systems as global regulators of virulence genes. There are two quorum-sensing signal receptor and signal generator pairs, LasR–LasI and RhlR–RhlI. The recently completed P. aeruginosa genome-sequencing project revealed a gene coding for a homolog of the signal receptors, LasR and RhlR. Here we describe a role for this gene, which we call qscR. The qscR gene product governs the timing of quorum-sensing-controlled gene expression and it dampens virulence in an insect model. We present evidence that suggests the primary role of QscR is repression of lasI. A qscR mutant produces the LasI-generated signal prematurely, and this results in premature transcription of a number of quorum-sensing-regulated genes. When fed to Drosophila melanogaster, the qscR mutant kills the animals more rapidly than the parental P. aeruginosa. The repression of lasI by QscR could serve to ensure that quorum-sensing-controlled genes are not activated in environments where they are not useful.