16 results for DNA sequence

at Duke University


Relevance:

60.00%

Abstract:

BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating that known interferon-inducing stimuli induce transcription of some of the inferred genes. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
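
To illustrate how base-calling quality scores can feed a probabilistic error model, here is a minimal Python sketch. It is not the authors' trace assembler. It uses only the standard Phred relation Q = -10 log10(p), and it assumes a uniform prior over bases, independent reads, and errors spread evenly over the three other bases. The function names are hypothetical.

```python
import math

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def base_posterior(calls):
    """Posterior probability of each true base at one alignment position.

    calls: list of (called_base, phred_quality) pairs from overlapping reads.
    Assumptions (illustrative only): uniform prior, independent reads,
    and miscalls equally likely to be any of the other three bases.
    """
    log_like = {b: 0.0 for b in BASES}
    for called, q in calls:
        # Cap the error at 0.75 so that Q = 0 means "no information".
        e = min(phred_to_error(q), 0.75)
        for b in BASES:
            log_like[b] += math.log(1 - e if b == called else e / 3)
    # Normalise in a numerically stable way.
    top = max(log_like.values())
    weights = {b: math.exp(v - top) for b, v in log_like.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}
```

Calling `base_posterior([("A", 30), ("A", 20), ("C", 10)])` weights the two higher-quality "A" calls far more heavily than the single low-quality "C" call, which is the basic idea behind using quality scores rather than a simple majority vote.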

Relevance:

60.00%

Abstract:

BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost-effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
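
The kind of comparison described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's pipeline: whole-genome calls are treated as the reference set, a variant is identified by a `(chromosome, position, ref, alt)` tuple, and the function name `compare_calls` is hypothetical.

```python
def compare_calls(wgs_variants, rnaseq_variants):
    """Compare RNA-Seq variant calls against whole-genome calls.

    Both arguments are iterables of hashable variant keys, for example
    (chromosome, position, ref_allele, alt_allele) tuples. The
    whole-genome set is treated as the reference (an assumption).
    """
    wgs = set(wgs_variants)
    rna = set(rnaseq_variants)
    shared = wgs & rna
    # Fraction of reference variants that RNA-Seq recovered.
    sensitivity = len(shared) / len(wgs) if wgs else float("nan")
    # Fraction of RNA-Seq calls with no support in the reference set.
    false_discovery = len(rna - wgs) / len(rna) if rna else float("nan")
    return {
        "shared": len(shared),
        "sensitivity": sensitivity,
        "false_discovery": false_discovery,
    }
```

Restricting both input sets to variants in genes expressed in the source tissue before calling the function would mirror the abstract's observation that recovery improves for well-expressed genes.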

Relevance:

60.00%

Abstract:

New applications of genetic data to questions of historical biogeography have revolutionized our understanding of how organisms have come to occupy their present distributions. Phylogenetic methods in combination with divergence time estimation can reveal biogeographical centres of origin, differentiate between hypotheses of vicariance and dispersal, and reveal the directionality of dispersal events. Despite their power, however, phylogenetic methods can sometimes yield patterns that are compatible with multiple, equally well-supported biogeographical hypotheses. In such cases, additional approaches must be integrated to differentiate among conflicting dispersal hypotheses. Here, we use a synthetic approach that draws upon the analytical strengths of coalescent and population genetic methods to augment phylogenetic analyses in order to assess the biogeographical history of Madagascar's Triaenops bats (Chiroptera: Hipposideridae). Phylogenetic analyses of mitochondrial DNA sequence data for Malagasy and east African Triaenops reveal a pattern that equally supports two competing hypotheses. While the phylogeny cannot determine whether Africa or Madagascar was the centre of origin for the species investigated, it serves as the essential backbone for the application of coalescent and population genetic methods. From the application of these methods, we conclude that a hypothesis of two independent but unidirectional dispersal events from Africa to Madagascar is best supported by the data.

Relevance:

60.00%

Abstract:

The centromere is the chromosomal locus essential for chromosome inheritance and genome stability. Human centromeres are located at repetitive alpha satellite DNA arrays that compose approximately 5% of the genome. Contiguous alpha satellite DNA sequence is absent from the assembled reference genome, limiting current understanding of centromere organization and function. Here, we review the progress in centromere genomics spanning the discovery of the sequence to its molecular characterization and the work done during the Human Genome Project era to elucidate alpha satellite structure and sequence variation. We discuss exciting recent advances in alpha satellite sequence assembly that have provided important insight into the abundance and complex organization of this sequence on human chromosomes. In light of these new findings, we offer perspectives for future studies of human centromere assembly and function.

Relevance:

60.00%

Abstract:

Centromeres are chromosomal loci essential for genome stability. Their malfunction can cause chromosome instability associated with cancer, infertility, and birth defects. This study focused on an intriguing centromere on human chromosome 17, which displays normal functional variation. Centromere identity can be found on either of two large arrays of repetitive DNA. We investigated inter-individual sequence variation on these two arrays and found association between array size, array variation, and centromere function. Our data suggest a functional influence of DNA sequence at this critical epigenetic locus.

Relevance:

60.00%

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, could be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate vector with a Gaussian distribution. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
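
The view of DNA sequence as categorical data, with each nucleotide taking one of four categories, is commonly expressed as a one-hot encoding. The sketch below is a generic illustration, not code from the dissertation. Mapping ambiguous symbols such as `N` to an all-zero row is an assumption made here for simplicity.

```python
NUCLEOTIDES = "ACGT"

def one_hot(sequence):
    """Encode a DNA string as a list of 4-element indicator rows.

    Columns follow the order A, C, G, T. Symbols outside that
    alphabet (for example N) become all-zero rows (an assumption).
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows
```

For instance, `one_hot("ACGT")` yields the 4 x 4 identity pattern, one row per position.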

Relevance:

60.00%

Abstract:

A previous genome-wide association study (GWAS) of more than 100,000 individuals identified molecular-genetic predictors of educational attainment. We undertook in-depth life-course investigation of the polygenic score derived from this GWAS using the four-decade Dunedin Study (N = 918). There were five main findings. First, polygenic scores predicted adult economic outcomes even after accounting for educational attainments. Second, genes and environments were correlated: Children with higher polygenic scores were born into better-off homes. Third, children's polygenic scores predicted their adult outcomes even when analyses accounted for their social-class origins; social-mobility analysis showed that children with higher polygenic scores were more upwardly mobile than children with lower scores. Fourth, polygenic scores predicted behavior across the life course, from early acquisition of speech and reading skills through geographic mobility and mate choice and on to financial planning for retirement. Fifth, polygenic-score associations were mediated by psychological characteristics, including intelligence, self-control, and interpersonal skill. Effect sizes were small. Factors connecting DNA sequence with life outcomes may provide targets for interventions to promote population-wide positive development.
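
A polygenic score is conventionally a weighted sum of allele counts, with the weights taken from GWAS effect-size estimates. The following minimal sketch shows that arithmetic only. The variant identifiers and weights in the usage note are made up, the function name is hypothetical, and counting missing genotypes as zero is a simplifying assumption rather than the study's procedure.

```python
def polygenic_score(genotype, weights):
    """Weighted sum of effect-allele counts.

    genotype: dict mapping variant ID -> effect-allele count (0, 1, or 2).
    weights:  dict mapping variant ID -> GWAS effect-size estimate.
    Variants missing from `genotype` contribute 0 (an assumption).
    """
    return sum(w * genotype.get(variant, 0) for variant, w in weights.items())
```

With hypothetical weights `{"variant_a": 0.02, "variant_b": -0.01}` and genotype `{"variant_a": 2, "variant_b": 1}`, the score is 2 × 0.02 + 1 × (−0.01) = 0.03.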

Relevance:

60.00%

Abstract:

Email exchange in 2013 between Kathryn Maxson (Duke) and Kris Wetterstrand (NHGRI), regarding country funding and other data for the HGP sequencing centers. Also includes the email request for such information, from NHGRI to the centers, in 2000, and the aggregate data collected.

Relevance:

60.00%

Abstract:

Jean Weissenbach, telephone interview by Kathryn Maxson and Robert Cook-Deegan, conducted from Durham, NC, 09 February 2012. Jean Weissenbach, a leader in French genetic mapping, directed the French national sequencing center, Généthon, during the HGP and was instrumental in helping to build agreement to the Bermuda Principles in France.

Relevance:

60.00%

Abstract:

Mark Guyer and Jane Peterson, in-person interview with Kathryn Maxson and Robert Cook-Deegan, conducted in Rockville, MD (NIH campus), 18 August 2011. Mark Guyer and Jane Peterson were grants program officers at the NIH during the HGP, and were some of the longest-standing employees in the HGP administrative structure. Both witnessed the transformation of the Office of Genome Research into the National Center for Human Genome Research and, finally, the National Human Genome Research Institute. They were close participants in the history of the Bermuda Principles within the NIH.

Relevance:

60.00%

Abstract:

Photographs from the February 1997 Bermuda meeting. Courtesy of Gert-Jan van Ommen.

Relevance:

30.00%

Abstract:

Thermoplastic materials such as cyclic-olefin copolymers (COC) provide a versatile and cost-effective alternative to the traditional glass or silicon substrate for rapid prototyping and industrial scale fabrication of microdevices. To extend the utility of COC as an effective microarray substrate, we developed a new method that enabled, for the first time, in situ synthesis of DNA oligonucleotide microarrays on the COC substrate. To achieve high-quality DNA synthesis, a SiO2 thin film array was prepatterned on the inert and hydrophobic COC surface using an RF sputtering technique. The subsequent in situ DNA synthesis was confined to the surface of the prepatterned hydrophilic SiO2 thin film features by precision delivery of the phosphoramidite chemistry using an inkjet DNA synthesizer. The in situ SiO2-COC DNA microarray demonstrated superior quality and stability in hybridization assays and thermal cycling reactions. Furthermore, we demonstrate that pools of high-quality mixed-oligos could be cleaved off the SiO2-COC microarrays and used directly for construction of DNA origami nanostructures. It is believed that this method will not only enable synthesis of high-quality and low-cost COC DNA microarrays but also provide a basis for further development of integrated microfluidics microarrays for a broad range of bioanalytical and biofabrication applications.

Relevance:

30.00%

Abstract:

We have used analytical ultracentrifugation to characterize the binding of the methionine repressor protein, MetJ, to synthetic oligonucleotides containing zero to five specific recognition sites, called metboxes. For all lengths of DNA studied, MetJ binds more tightly to repeats of the consensus sequence than to naturally occurring metboxes, which exhibit a variable number of deviations from the consensus. Strong cooperative binding occurs only in the presence of two or more tandem metboxes, which facilitate protein-protein contacts between adjacent MetJ dimers, but weak affinity is detected even with DNA containing zero or one metbox. The affinity of MetJ for all of the DNA sequences studied is enhanced by the addition of SAM, the known cofactor for MetJ in the cell. This effect extends to oligos containing zero or one metbox, both of which bind two MetJ dimers. In the presence of a large excess concentration of metbox DNA, the effect of cooperativity is to favor populations of DNA oligos bound by two or more MetJ dimers rather than a stochastic redistribution of the repressor onto all available metboxes. These results illustrate the dynamic range of binding affinity and repressor assembly that MetJ can exhibit with DNA and the effect of the corepressor SAM on binding to both specific and nonspecific DNA.

Relevance:

30.00%

Abstract:

Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
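
The three per-gene quantities named above (expression level, variability between tissues, and variability between replicates) can be sketched as follows. The abstract does not give exact formulas, so the definitions here are assumptions: level is the median of tissue means, tissue variability is the coefficient of variation of tissue means, and noise is the average within-tissue coefficient of variation. The function name is hypothetical, and expression values are assumed to be positive.

```python
from statistics import mean, median, pstdev

def expression_summary(tissue_replicates):
    """Summarise one gene's expression across tissues and replicates.

    tissue_replicates: dict mapping tissue name -> list of replicate
    expression values (assumed positive). The three statistics are
    illustrative definitions, not those of the study.
    """
    tissue_means = [mean(values) for values in tissue_replicates.values()]
    # Expression level: median of the per-tissue means.
    level = median(tissue_means)
    # Between-tissue variability: coefficient of variation of tissue means.
    tissue_cv = pstdev(tissue_means) / mean(tissue_means)
    # Expression noise: average within-tissue coefficient of variation.
    noise = mean(pstdev(v) / mean(v) for v in tissue_replicates.values())
    return {"level": level, "tissue_cv": tissue_cv, "noise": noise}
```

Each statistic could then be correlated against intergenic or genic sequence length across genes, which is the kind of relationship the abstract reports.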

Relevance:

30.00%

Abstract:

Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase I hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p < 10^-7) and correlated with stronger p53RE sequences (p < 10^-110) relative to nonTE-associated p53REs, particularly for MLT1H, LTR10B, and Mer61 TEs. However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels.
These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.
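
As a rough illustration of "similarity to the p53RE consensus", the sketch below counts how many positions of a candidate 10-bp half-site agree with the commonly cited p53 half-site consensus RRRCWWGYYY, using IUPAC codes (R = A/G, W = A/T, Y = C/T). The abstract does not specify how similarity was scored, so this simple match fraction is an assumption, and the function name is hypothetical. Position weight matrices are a more usual choice in practice.

```python
# IUPAC nucleotide codes needed for the half-site consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT",
}

def consensus_match_fraction(site, consensus="RRRCWWGYYY"):
    """Fraction of positions in `site` compatible with an IUPAC consensus.

    The default consensus is the commonly cited p53 half-site motif.
    A plain match fraction is an illustrative choice, not the
    scoring used in the study.
    """
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    hits = sum(
        base in IUPAC[code] for base, code in zip(site.upper(), consensus)
    )
    return hits / len(consensus)
```

A full response element consists of two such half-sites, so a complete score under this sketch would combine the fractions for both halves.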