995 results for SEQUENCE DATABASES


Relevance:

100.00% 100.00%

Publisher:

Abstract:

NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences, augmented into natural sequence databases that can perform hops in protein sequence space to assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or natural 'intermediately related sequences' is a challenging task. Recently, we have demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment we have demonstrated that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since it is anticipated that such data sets will be employed to establish protein relationships, two databases that have already captured these relationships at the structural and functional domain level, namely, the SCOP database and the Pfam database, have been 'enriched' with these artificial intermediary sequences. The NrichD database currently contains 3 611 010 artificial sequences that have been generated between 27 882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

There is no control over the information provided with sequences when they are deposited in the sequence databases. Consequently, mistakes can seed the incorrect annotation of other sequences. Grouping genes into families and applying controlled annotation overcomes the problems of incorrect annotation associated with individual sequences. Two databases (http://www.mendel.ac.uk) were created to apply controlled annotation to plant genes and plant ESTs. Mendel-GFDb is a database of plant protein (gene) families based on gapped-BLAST analysis of all sequences in the SWISS-PROT family of databases. Sequences are aligned (ClustalW) and identical and similar residues are shaded. The families are visually curated to ensure that one or more criteria, for example overall relatedness and/or domain similarity, relate all sequences within a family. Sequence families are assigned a ‘Gene Family Number’ and a unified description is developed which best describes the family and its members. If authority exists, the gene family is assigned a ‘Gene Family Name’. This information is placed in Mendel-GFDb. Mendel-ESTS is primarily a database of plant ESTs, which have been compared to Mendel-GFDb, completely sequenced genomes and domain databases. This approach associates ESTs with individual sequences and with the controlled annotation of gene families and protein domains; this information is placed in Mendel-ESTS. The controlled annotation applied to genes and ESTs provides a basis from which a plant transcription database can be developed.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

SBASE 8.0 is the eighth release of the SBASE library of protein domain sequences that contains 294 898 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to most major sequence databases and sequence pattern collections. The entries are clustered into over 2005 statistically validated domain groups (SBASE-A) and 595 non-validated groups (SBASE-B), provided with several WWW-based search and browsing facilities for online use. A domain-search facility was developed, based on non-parametric pattern recognition methods, including artificial neural networks. SBASE 8.0 is freely available by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. The other way round, it is possible to switch from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
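
The in silico digestion step described above can be illustrated with a minimal sketch. This is not the GenEST program: it assumes the enzyme pair EcoRI (GAATTC) and MseI (TTAA) as an example, measures each fragment from the start of the first recognition site to the end of the second, and ignores exact cut positions, adaptor lengths and selective primer extensions. The function name `virtual_tdfs` is hypothetical.

```python
import re


def virtual_tdfs(cdna, site_a="GAATTC", site_b="TTAA"):
    """Return (start, end, length) for fragments bounded by two different sites.

    Simplifying assumption: a fragment runs from the start of one recognition
    site to the end of the next site in the sequence, and only fragments whose
    two flanking sites belong to different enzymes are kept.
    """
    cdna = cdna.upper()
    # Collect every recognition-site occurrence as (position, enzyme, site length).
    hits = []
    for label, site in (("A", site_a), ("B", site_b)):
        for match in re.finditer(f"(?={site})", cdna):
            hits.append((match.start(), label, len(site)))
    hits.sort()

    fragments = []
    # Adjacent sites have no other site of either enzyme between them.
    for (pos1, enz1, _), (pos2, enz2, len2) in zip(hits, hits[1:]):
        if enz1 != enz2:
            end = pos2 + len2
            fragments.append((pos1, end, end - pos1))
    return fragments


if __name__ == "__main__":
    example = "CC" + "GAATTC" + "A" * 8 + "TTAA" + "GG"
    print(virtual_tdfs(example))
```

Predicted sizes from a real tool would need the true cut offsets and adaptor lengths added before they could be compared with bands on a gel.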

Relevance:

70.00% 70.00%

Publisher:

Abstract:

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins--86%--shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For > 90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters--namely, permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Light plays a unique role for plants as it is both a source of energy for growth and a signal for development. Light captured by the pigments in the light harvesting complexes is used to drive the synthesis of the chemical energy required for carbon assimilation. The light perceived by photoreceptors activates effectors, such as transcription factors (TFs), which modulate the expression of light-responsive genes. Recently, it has been speculated that increasing the photosynthetic rate could further improve the yield potential of three carbon (C3) crops such as wheat. However, little is currently known about the transcriptional regulation of photosynthesis genes, particularly in crop species. Nuclear factor Y (NF-Y) TF is a functionally diverse regulator of growth and development in the model plant species, with demonstrated roles in embryo development, stress response, flowering time and chloroplast biogenesis. Furthermore, a light-responsive NF-Y binding site (CCAAT-box) is present in the promoter of a spinach photosynthesis gene. As photosynthesis genes are co-regulated by light and co-regulated genes typically have similar regulatory elements in their promoters, it seems likely that other photosynthesis genes would also have light-responsive CCAAT-boxes. This provided the impetus to investigate the NF-Y TF in bread wheat. This thesis is focussed on wheat NF-Y members that have roles in light-mediated gene regulation with an emphasis on their involvement in the regulation of photosynthesis genes. NF-Y is a heterotrimeric complex, comprised of the three subunits NF-YA, NF-YB and NF-YC. Unlike the mammalian and yeast counterparts, each of the three subunits is encoded by multiple genes in Arabidopsis. The initial step taken in this study was the identification of the wheat NF-Y family (Chapter 3). A search of the current wheat nucleotide sequence databases identified 37 NF-Y genes (10 NF-YA, 11 NF-YB, 14 NF-YC & 2 Dr1). 
Phylogenetic analysis revealed that each of the three wheat NF-Y (TaNF-Y) subunit families could be divided into 4-5 clades based on their conserved core regions. Outside of the core regions, eleven motifs were identified to be conserved between Arabidopsis, rice and wheat NF-Y subunit members. The expression profiles of TaNF-Y genes were constructed using quantitative real-time polymerase chain reaction (RT-PCR). Some TaNF-Y subunit members had little variation in their transcript levels among the organs, while others displayed organ-predominant expression profiles, including those expressed mainly in the photosynthetic organs. To investigate their potential role in light-mediated gene regulation, the light responsiveness of the TaNF-Y genes was examined (Chapters 4 and 5). Two TaNF-YB and five TaNF-YC members were markedly upregulated by light in both the wheat leaves and seedling shoots. To identify the potential target genes of the light-upregulated NF-Y subunit members, a gene expression correlation analysis was conducted using publicly available Affymetrix Wheat Genome Array datasets. This analysis revealed that the transcript expression levels of TaNF-YB3 and TaNF-YC11 were significantly correlated with those of photosynthesis genes. These correlated expression profiles were also observed in the quantitative RT-PCR dataset from wheat plants grown under light and dark conditions. Sequence analysis of the promoters of these wheat photosynthesis genes revealed that they were enriched with potential NF-Y binding sites (CCAAT-box). The potential role of TaNF-YB3 in the regulation of photosynthetic genes was further investigated using a transgenic approach (Chapter 5). 
Transgenic wheat lines constitutively expressing TaNF-YB3 were found to have significantly increased expression levels of photosynthesis genes, including those encoding light harvesting chlorophyll a/b-binding proteins, photosystem I reaction centre subunits, a chloroplast ATP synthase subunit and glutamyl-tRNA reductase (GluTR). GluTR is a rate-limiting enzyme in the chlorophyll biosynthesis pathway. In association with the increased expression of the photosynthesis genes, the transgenic lines had a higher leaf chlorophyll content, an increased photosynthetic rate and a more rapid early growth rate than the wild-type wheat. In addition to its role in the regulation of photosynthesis genes, TaNF-YB3 overexpression lines flowered on average 2 days earlier than the wild-type (Chapter 6). Quantitative RT-PCR analysis showed that there was a 13-fold increase in the expression level of the floral integrator, TaFT. The transcript levels of other downstream genes (TaFT2 and TaVRN1) were also increased in the transgenic lines. Furthermore, the transcript levels of TaNF-YB3 were significantly correlated with those of constans (CO), constans-like (COL) and timing of chlorophyll a/b-binding (CAB) expression 1 [TOC1; (CCT)] domain-containing proteins known to be involved in the regulation of flowering time. To summarise the key findings of this study, 37 NF-Y genes were identified in the crop species wheat. An in-depth analysis of TaNF-Y gene expression profiles revealed that the potential role of some light-upregulated members was in the regulation of photosynthetic genes. The involvement of TaNF-YB3 in the regulation of photosynthesis genes was supported by data obtained from transgenic wheat lines with increased constitutive expression of TaNF-YB3. The overexpression of TaNF-YB3 in the transgenic lines revealed that this NF-YB member is also involved in the fine-tuning of flowering time. 
These data suggest that the NF-Y TF plays an important role in light-mediated gene regulation in wheat.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The 3′ UTRs of eukaryotic genes participate in a variety of post-transcriptional (and some transcriptional) regulatory interactions. Some of these interactions are well characterised, but an undetermined number remain to be discovered. While some regulatory sequences in 3′ UTRs may be conserved over long evolutionary time scales, others may have only ephemeral functional significance as regulatory profiles respond to changing selective pressures. Here we propose a sensitive segmentation methodology for investigating patterns of composition and conservation in 3′ UTRs based on comparison of closely related species. We describe encodings of pairwise and three-way alignments integrating information about conservation, GC content and transition/transversion ratios and apply the method to three closely related Drosophila species: D. melanogaster, D. simulans and D. yakuba. Incorporating multiple data types greatly increased the number of segment classes identified compared to similar methods based on conservation or GC content alone. We propose that the number of segments and number of types of segment identified by the method can be used as proxies for functional complexity. Our main finding is that the number of segments and segment classes identified in 3′ UTRs is greater than in the same length of protein-coding sequence, suggesting greater functional complexity in 3′ UTRs. There is thus a need for sustained and extensive efforts by bioinformaticians to delineate functional elements in this important genomic fraction. C code, data and results are available upon request.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Sequence-structure correlation studies are important in deciphering the relationships between various structural aspects, which may shed light on the protein-folding problem. The first step of this process is the prediction of secondary structure for a protein sequence of unknown three-dimensional structure. To this end, a web server has been created to predict the consensus secondary structure using well known algorithms from the literature. Furthermore, the server allows users to see the occurrence of predicted secondary structural elements in other structure and sequence databases and to visualize predicted helices as a helical wheel plot. The web server is accessible at http://bioserver1.physics.iisc.ernet.in/cssp/.
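
One common way to combine several secondary structure predictions into a consensus is a per-residue majority vote. The sketch below assumes that scheme; the abstract does not state how the server actually builds its consensus. It also assumes three-state strings (H, E, C) of equal length, and the function name and the coil tie-break are illustrative choices.

```python
from collections import Counter


def consensus_secondary_structure(predictions, tie_state="C"):
    """Majority vote over equal-length prediction strings (H/E/C per residue).

    Assumption: when the two most frequent states tie at a position,
    the position is assigned `tie_state` (coil by default).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            consensus.append(tie_state)  # no clear majority
        else:
            consensus.append(counts[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    print(consensus_secondary_structure(["HHHEC", "HHEEC", "HCEEC"]))
```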

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Large-scale gene discovery has been performed for the grass fungal endophytes Neotyphodium coenophialum, Neotyphodium lolii, and Epichloë festucae. The resulting sequences have been annotated by comparison with public DNA and protein sequence databases and using intermediate gene ontology annotation tools. Endophyte sequences have also been analysed for the presence of simple sequence repeat and single nucleotide polymorphism molecular genetic markers. Sequences and annotation are maintained within a MySQL database that may be queried using a custom web interface. Two cDNA-based microarrays have been generated from this genome resource. They permit the interrogation of 3806 Neotyphodium genes (Nchip™ microarray), and 4195 Neotyphodium and 920 Epichloë genes (EndoChip™ microarray), respectively. These microarrays provide tools for high-throughput transcriptome analysis, including genome-specific gene expression studies, profiling of novel endophyte genes, and investigation of the host grass–symbiont interaction. Comparative transcriptome analysis in Neotyphodium and Epichloë was performed

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
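
The efficient computation of a piecewise-constant segmentation mentioned above is usually done with dynamic programming. The sketch below shows the classic formulation, assuming squared error as the cost, so that the best description of each segment is its mean (the maximum-likelihood value under Gaussian noise). It is a generic textbook version, not the thesis's own code, and the function name is hypothetical.

```python
def segment(values, k):
    """Split `values` into k contiguous blocks minimising total squared error.

    Returns (bounds, cost), where bounds is a list of half-open (start, end)
    index pairs and each block is described by its mean.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums let the cost of any block be computed in O(1).
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Squared error of values[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):          # number of blocks used so far
        for j in range(m, n + 1):      # prefix length covered
            for i in range(m - 1, j):  # start of the last block
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover the block boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds, best[k][n]


if __name__ == "__main__":
    print(segment([1, 1, 1, 5, 5, 5, 9, 9], 3))
```

This version runs in O(k n^2) time. Choosing k itself is the model selection question the abstract raises and is not addressed here.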

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Sequence motifs occurring in a particular order in proteins or DNA have been proved to be of biological interest. In this paper, a new method to locate the occurrences of up to five user-defined motifs in a specified order in large proteins and in nucleotide sequence databases is proposed. It has been designed using the concept of quantifiers in regular expressions and linked lists for data storage. The application of this method includes the extraction of relevant consensus regions from biological sequences. This might be useful in clustering of protein families as well as to study the correlation between positions of motifs and their functional sites in DNA sequences.
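
The idea of using regular-expression quantifiers to find motifs in a fixed order can be sketched as follows. This is an illustration only, not the paper's implementation: it uses Python's `re` module instead of linked lists, treats each motif as a regex fragment, and the function name and the optional `max_gap` parameter are assumptions made for the example.

```python
import re


def find_ordered_motifs(sequence, motifs, max_gap=None):
    """Find non-overlapping occurrences of up to five motifs in the given order.

    Each motif is a regex fragment (e.g. "GKS" or "G.{4}GK[ST]").
    Returns, for every occurrence, the (start, end) span of each motif.
    """
    if not 1 <= len(motifs) <= 5:
        raise ValueError("expected between one and five motifs")
    # A lazy quantifier keeps each gap between consecutive motifs minimal.
    gap = ".*?" if max_gap is None else ".{0,%d}?" % max_gap
    # Named groups keep the spans retrievable even if a motif has its own groups.
    pattern = gap.join(
        "(?P<m%d>%s)" % (i, motif) for i, motif in enumerate(motifs)
    )
    compiled = re.compile(pattern)
    return [
        [match.span("m%d" % i) for i in range(len(motifs))]
        for match in compiled.finditer(sequence)
    ]


if __name__ == "__main__":
    print(find_ordered_motifs("MKTAGKSTLLQWERDEADAAAHTH", ["GKS", "DEAD"]))
```

Scanning a whole database would apply the same compiled pattern to each record in turn.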

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Repeats are two or more contiguous segments of amino acid residues that are believed to have arisen as a result of intragenic duplication, recombination and mutation events. These repeats can be utilized for protein structure prediction and can provide insights into the protein evolution and phylogenetic relationship. Therefore, to aid structural biologists and phylogeneticists in their research, a computing resource (a web server and a database), Repeats in Protein Sequences (RPS), has been created. Using RPS, users can obtain useful information regarding identical, similar and distant repeats (of varying lengths) in protein sequences. In addition, users can check the frequency of occurrence of the repeats in sequence databases such as the Genome Database, PIR and SWISS-PROT and among the protein sequences available in the Protein Data Bank archive. Furthermore, users can view the three-dimensional structure of the repeats using the Java visualization plug-in Jmol. The proposed computing resource can be accessed over the World Wide Web at http://bioserver1.physics.iisc.ernet.in/rps/.
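
As a rough illustration of the simplest case, identical repeats of a fixed length can be located by indexing every window of the sequence. The sketch below assumes that simplification; it does not cover similar or distant repeats, it may report overlapping occurrences, and it is not the RPS algorithm. The function name is hypothetical.

```python
from collections import defaultdict


def identical_repeats(sequence, length, min_count=2):
    """Map each segment of `length` residues seen at least `min_count` times
    to the list of its start positions (0-based)."""
    positions = defaultdict(list)
    for start in range(len(sequence) - length + 1):
        positions[sequence[start:start + length]].append(start)
    return {seg: pos for seg, pos in positions.items() if len(pos) >= min_count}


if __name__ == "__main__":
    print(identical_repeats("MGAPTLKKMGAPTW", 4))
```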

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Human apolipoprotein A5 (APOA5) is a recently discovered member of the apolipoprotein family. Its plasma concentration is 1-2 orders of magnitude lower than that of other apolipoproteins, yet it significantly affects plasma triglyceride (TG) levels and is important in lipid metabolism, which makes it a strong potential target for hypolipidemic drugs. Because of its low plasma concentration, APOA5 is very difficult to isolate and purify directly from plasma, and no simple, reliable purification method has been reported in China. Large amounts of APOA5 protein and antibody are needed for basic research into the biological characteristics of APOA5, its relationship with other key components of TG metabolism, and its role in diseases related to lipid metabolism. In this study, we first used genetic engineering to express and purify recombinant APOA5 and used the purified protein to immunize animals and raise a polyclonal antibody, laying the foundation for studying APOA5-interacting proteins and APOA5 function in human liver cells. To investigate how APOA5 functions in the liver, we used a bacterial two-hybrid system to search for proteins that interact with APOA5. Pull-down, immunofluorescence and co-immunoprecipitation experiments were then used to confirm the interaction in vitro and in vivo, providing new clues to the physiological function of APOA5 in vivo.
Part I: Cloning, prokaryotic expression and purification of APOA5 and preparation of its polyclonal antibody. The 1.1 kb full-length APOA5 coding sequence was amplified from cDNA of the human hepatoma cell line SMMC-7721 and cloned into the expression vector pThioHisD to give the prokaryotic expression vector pTH-APOA5. The recombinant plasmid was transformed into E. coli BL21(DE3), the cells were cultured, and expression was induced with IPTG. The fusion protein was expressed at high levels in the form of inclusion bodies. The inclusion bodies were dissolved in phosphate-buffered saline containing 8 M urea and 40 mM imidazole, applied to a Ni2+ affinity column (binding through the histidine sequence of the fusion protein), and eluted in a buffer containing 4 M urea and 200 mM imidazole. Fractions containing the APOA5 protein were pooled and dialyzed against a buffer containing phosphate-buffered saline, yielding human APOA5 fusion protein of relatively high purity. New Zealand white rabbits were immunized with the fusion protein to obtain a high-titre rabbit anti-human APOA5 polyclonal antibody, and Western blotting showed that this antibody binds APOA5 specifically.
Part II: Bacterial two-hybrid screening for proteins that interact with APOA5. The coding sequence of human APOA5 was amplified from the pTH-APOA5 vector with synthetic oligonucleotide primers and subcloned into the pBT plasmid to yield the bait vector pBT-APOA5. Double restriction digestion, PCR and DNA sequencing confirmed that the bait plasmid was constructed correctly and that no unwanted mutations had occurred. Before screening, we verified expression of the recombinant protein and tested pBT-APOA5 for self-activation. Western blotting confirmed expression in the reporter strain of a fusion protein of about 68 kD, consistent with the predicted molecular weight (APOA5, 41 kD; lambda cI, 27 kD). The self-activation test showed that the bait protein alone cannot activate the reporter genes, so pBT-APOA5 could be used to screen a human liver cDNA library. After dual-resistance selection and re-screening, 10 positive clones were isolated. The nucleotide sequences of the positive clones were determined and compared with the NCBI nucleotide sequence databases. This analysis identified 7 proteins that interact with APOA5: BI1 (an apoptosis regulator); ATP6, CYTB, ND2 and COX-1 (mitochondrially expressed proteins); and ALB and TTR (serum proteins).
Part III: Confirmation of the interaction between APOA5 and BI1. The prokaryotic expression vector pGEX-5X-3-BI1 was constructed first, and pull-down experiments detected an interaction between APOA5 and BI1 in vitro. The eukaryotic expression vectors pCDNA3.1-HA-BI1 and pCDNA3.1-APOA5 were then constructed and their expression was verified. Immunofluorescence co-localization showed that APOA5 is distributed mainly in the cytoplasm and co-localizes with BI1 in HEK293 cells, indicating that the two proteins may interact. Finally, co-immunoprecipitation in HEK293 cells confirmed the interaction between APOA5 and BI1 in vivo. These results provide new directions for studying the biological function of APOA5 in vivo.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Terminal restriction fragment length polymorphism (T-RFLP) analysis is a polymerase chain reaction (PCR)-fingerprinting method that is commonly used for comparative microbial community analysis. The method can be used to analyze communities of bacteria, archaea, fungi, other phylogenetic groups or subgroups, as well as functional genes. The method is rapid, highly reproducible, and often yields a higher number of operational taxonomic units than other, commonly used PCR-fingerprinting methods. Sizing of terminal restriction fragments (T-RFs) can now be done using capillary sequencing technology allowing samples contained in 96- or 384-well plates to be sized in an overnight run. Many multivariate statistical approaches have been used to interpret and compare T-RFLP fingerprints derived from different communities. Detrended correspondence analysis and the additive main effects with multiplicative interaction model are particularly useful for revealing trends in T-RFLP data. Due to biases inherent in the method, linking the size of T-RFs derived from complex communities to existing sequence databases to infer their taxonomic position is not very robust. This approach has been used successfully, however, to identify and follow the dynamics of members within very simple or model communities. The T-RFLP approach has been used successfully to analyze the composition of microbial communities in soil, water, marine, and lacustrine sediments, biofilms, feces, in and on plant tissues, and in the digestive tracts of insects and mammals. The T-RFLP method is a user-friendly molecular approach to microbial community analysis that is adding significant information to studies of microbial populations in many environments.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three possible approaches: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
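
The kind of comparison described above can be sketched as simple set arithmetic over variant calls. The example assumes that the whole-genome calls are treated as the reference set and that a variant is identified by a (chromosome, position, ref, alt) tuple. The function name and the toy data are hypothetical and do not come from the study.

```python
def compare_calls(wgs_variants, rnaseq_variants):
    """Compare RNA-Seq variant calls against whole-genome calls.

    Assumption: whole-genome calls are the reference set. Returns the
    sensitivity (fraction of WGS variants recovered by RNA-Seq) and the
    false discovery rate (fraction of RNA-Seq calls absent from WGS).
    """
    wgs, rna = set(wgs_variants), set(rnaseq_variants)
    shared = wgs & rna
    sensitivity = len(shared) / len(wgs) if wgs else float("nan")
    false_discovery = len(rna - wgs) / len(rna) if rna else float("nan")
    return {"sensitivity": sensitivity, "false_discovery_rate": false_discovery}


if __name__ == "__main__":
    # Toy data for illustration only.
    wgs = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
           ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C"),
           ("chr3", 10, "A", "T")]
    rna = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
           ("chr4", 5, "G", "C")]
    print(compare_calls(wgs, rna))
```

Restricting both sets to variants in well-expressed genes before the comparison would mirror the stratified analysis the abstract reports.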