991 results for sequence database
Abstract:
Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.
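The three per-gene summaries named in this abstract (expression level, expression pattern uniformity, expression noise) can be illustrated with a small sketch. The data layout (one list of replicate values per tissue) and the use of the coefficient of variation as the variability measure are assumptions made here for illustration; the abstract does not say which statistics the authors used.

```python
from statistics import mean, median, stdev

def expression_summaries(expr):
    """Summarize one gene's expression.

    expr: list of tissues, each a list of replicate measurements
          (hypothetical layout, at least two replicates per tissue).
    Returns (level, tissue_variability, noise).
    """
    tissue_means = [mean(reps) for reps in expr]
    # Expression level: median of the per-tissue means.
    level = median(tissue_means)
    # Variability between tissues: coefficient of variation of tissue means
    # (an assumed measure of how non-uniform the expression pattern is).
    tissue_variability = stdev(tissue_means) / mean(tissue_means)
    # Noise: average coefficient of variation between replicates of a tissue.
    noise = mean([stdev(reps) / m for reps, m in zip(expr, tissue_means)])
    return level, tissue_variability, noise

# Toy gene measured in three tissues with two replicates each.
level, variability, noise = expression_summaries([[10, 12], [20, 22], [30, 32]])
```

Each summary could then be correlated against intergenic or genic sequence length across genes, which is the kind of comparison the abstract describes.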
Abstract:
OBJECTIVE: To investigate the effect of statin use after radical prostatectomy (RP) on biochemical recurrence (BCR) in patients with prostate cancer who never received statins before RP. PATIENTS AND METHODS: We conducted a retrospective analysis of 1146 RP patients within the Shared Equal Access Regional Cancer Hospital (SEARCH) database. Multivariable Cox proportional hazards analyses were used to examine differences in risk of BCR between post-RP statin users vs nonusers. To account for varying start dates and duration of statin use during follow-up, post-RP statin use was treated as a time-dependent variable. In a secondary analysis, models were stratified by race to examine the association of post-RP statin use with BCR among black and non-black men. RESULTS: After adjusting for clinical and pathological characteristics, post-RP statin use was significantly associated with 36% reduced risk of BCR (hazard ratio [HR] 0.64, 95% confidence interval [CI] 0.47-0.87; P = 0.004). Post-RP statin use remained associated with reduced risk of BCR after adjusting for preoperative serum cholesterol levels. In secondary analysis, after stratification by race, this protective association was significant in non-black (HR 0.49, 95% CI 0.32-0.75; P = 0.001) but not black men (HR 0.82, 95% CI 0.53-1.28; P = 0.384). CONCLUSION: In this retrospective cohort of men undergoing RP, post-RP statin use was significantly associated with reduced risk of BCR. Whether the association between post-RP statin use and BCR differs by race requires further study. Given these findings, coupled with other studies suggesting that statins may reduce risk of advanced prostate cancer, randomised controlled trials are warranted to formally test the hypothesis that statins slow prostate cancer progression.
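As a reading aid, the arithmetic linking the reported hazard ratio, its confidence interval, and the P value can be sketched as follows. This assumes a Wald-type interval that is symmetric on the log scale with a normal quantile of 1.96, which is the usual convention for Cox models but is not stated in the abstract; the function names are illustrative.

```python
import math

def percent_risk_reduction(hr):
    """Relative risk reduction implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0

def log_hr_standard_error(ci_low, ci_high, z=1.96):
    """Standard error of log(HR), recovered from a 95% CI (assumed Wald-type)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)

def two_sided_p(hr, ci_low, ci_high):
    """Approximate two-sided P value for the null hypothesis HR = 1."""
    z_stat = math.log(hr) / log_hr_standard_error(ci_low, ci_high)
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z_stat) / math.sqrt(2.0))

reduction = percent_risk_reduction(0.64)   # the "36% reduced risk" in the text
p_overall = two_sided_p(0.64, 0.47, 0.87)  # should land close to the reported 0.004
```

Small differences from the published P values are expected, because the hazard ratio and interval bounds are rounded to two decimals.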
Abstract:
BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
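The idea of turning functional annotations into a prior probability of association, and then combining that prior with study evidence, can be sketched in a few lines. The logistic link, the annotation names, the coefficient values, and the use of a Bayes factor for the study evidence are all assumptions for illustration; the abstract says only that the model is a binary regression fitted by penalized likelihood.

```python
import math

def prior_probability(annotations, coefficients, intercept):
    """Prior probability of association from a logistic (assumed) regression.

    annotations:  dict of annotation name -> value for one SNP (hypothetical names)
    coefficients: dict of annotation name -> fitted coefficient (hypothetical values)
    """
    linear = intercept + sum(coefficients[name] * value
                             for name, value in annotations.items())
    return 1.0 / (1.0 + math.exp(-linear))

def posterior_probability(prior, bayes_factor):
    """Combine a prior probability with a Bayes factor from the association test."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative numbers only: a SNP in a conserved region near a regulatory element.
coefs = {"conserved": 0.8, "regulatory": 1.1}
prior = prior_probability({"conserved": 1, "regulatory": 1}, coefs, intercept=-9.0)
posterior = posterior_probability(prior, bayes_factor=1e4)
```

A penalized fit would shrink the coefficients of uninformative annotations toward zero, so those annotations would leave the prior close to its baseline value.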
Abstract:
The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.
Abstract:
X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.
Abstract:
Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase I hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p < 10^-7) and correlated with stronger p53RE sequences (p < 10^-110) relative to non-TE-associated p53REs, particularly for MLT1H, LTR10B, and MER61 TEs. However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels.
These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.
Abstract:
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
Abstract:
Associating genetic variation with quantitative measures of gene regulation offers a way to bridge the gap between genotype and complex phenotypes. In order to identify quantitative trait loci (QTLs) that influence the binding of a transcription factor in humans, we measured binding of the multifunctional transcription and chromatin factor CTCF in 51 HapMap cell lines. We identified thousands of QTLs in which genotype differences were associated with differences in CTCF binding strength, hundreds of them confirmed by directly observable allele-specific binding bias. The majority of QTLs were either within 1 kb of the CTCF binding motif, or in linkage disequilibrium with a variant within 1 kb of the motif. On the X chromosome we observed three classes of binding sites: a minority class bound only to the active copy of the X chromosome, the majority class bound to both the active and inactive X, and a small set of female-specific CTCF sites associated with two non-coding RNA genes. In sum, our data reveal extensive genetic effects on CTCF binding, both direct and indirect, and identify a diversity of patterns of CTCF binding on the X chromosome.
Abstract:
This article documents the addition of 220 microsatellite marker loci to the Molecular Ecology Resources Database. Loci were developed for the following species: Allanblackia floribunda, Amblyraja radiata, Bactrocera cucurbitae, Brachycaudus helichrysi, Calopogonium mucunoides, Dissodactylus primitivus, Elodea canadensis, Ephydatia fluviatilis, Galapaganus howdenae howdenae, Hoplostethus atlanticus, Ischnura elegans, Larimichthys polyactis, Opheodrys vernalis, Pelteobagrus fulvidraco, Phragmidium violaceum, Pistacia vera, and Thunnus thynnus. These loci were cross-tested on the following species: Allanblackia gabonensis, Allanblackia stanerana, Neoceratitis cyanescens, Dacus ciliatus, Dacus demmerezi, Bactrocera zonata, Ceratitis capitata, Ceratitis rosa, Ceratitis catoirii, Dacus punctatifrons, Ephydatia mülleri, Spongilla lacustris, Geodia cydonium, Axinella sp., Ischnura graellsii, Ischnura ramburii, Ischnura pumilio, Pistacia integerrima and Pistacia terebinthus. © 2010 Blackwell Publishing Ltd.
Abstract:
This paper describes the status of the 2008 edition of the HITRAN molecular spectroscopic database. The new edition is the first official public release since the 2004 edition, although a number of crucial updates had been made available online since 2004. The HITRAN compilation consists of several components that serve as input for radiative-transfer calculation codes: individual line parameters for the microwave through visible spectra of molecules in the gas phase; absorption cross-sections for molecules having dense spectral features, i.e. spectra in which the individual lines are not resolved; individual line parameters and absorption cross-sections for bands in the ultraviolet; refractive indices of aerosols; tables and files of general properties associated with the database; and database management software. The line-by-line portion of the database contains spectroscopic parameters for 42 molecules including many of their isotopologues. © 2009 Elsevier Ltd.
Abstract:
This paper describes the status, circa 2001, of the HITRAN compilation, which comprises the public edition available through 2001. The HITRAN compilation consists of several components useful for radiative transfer calculation codes: high-resolution spectroscopic parameters of molecules in the gas phase, absorption cross-sections for molecules with very dense spectral features, aerosol refractive indices, ultraviolet line-by-line parameters and absorption cross-sections, and associated database management software. The line-by-line portion of the database contains spectroscopic parameters for 38 molecules and their isotopologues and isotopomers suitable for calculating atmospheric transmission and radiance properties. Many more molecular species are presented in the infrared cross-section data than in the previous edition, especially the chlorofluorocarbons and their replacement gases. There is now sufficient representation so that quasi-quantitative simulations can be obtained with the standard radiance codes. In addition to the description and justification of new or modified data that have been incorporated since the last edition of HITRAN (1996), future modifications are indicated for cases considered to have a significant impact on remote-sensing experiments. © 2003 Elsevier Ltd. All rights reserved.
Abstract:
The paper considers the open shop scheduling problem to minimize the makespan, provided that one of the machines has to process the jobs according to a given sequence. We show that in the preemptive case the problem is polynomially solvable for an arbitrary number of machines. If preemption is not allowed, the problem is NP-hard in the strong sense if the number of machines is variable, and is NP-hard in the ordinary sense in the case of two machines. For the latter case we give a heuristic algorithm that runs in linear time and produces a schedule whose makespan is at most 5/4 times the optimal value. We also show that the two-machine problem in the nonpreemptive case is solvable in pseudopolynomial time by a dynamic programming algorithm, and that the algorithm can be converted into a fully polynomial approximation scheme. © 1998 John Wiley & Sons, Inc. Naval Research Logistics 45: 705–731, 1998
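To make the makespan objective concrete, the sketch below computes the standard lower bound that holds for any open shop schedule: no schedule can finish before the busiest machine has done all its work, nor before the longest job has visited every machine. This is not the paper's heuristic or its dynamic program, neither of which is described in the abstract; the matrix layout `p[i][j]` is an assumption for illustration.

```python
def makespan_lower_bound(p):
    """Lower bound on the makespan of any open shop schedule.

    p[i][j] is the processing time of job j on machine i (hypothetical layout).
    """
    machine_loads = [sum(row) for row in p]        # total work per machine
    job_lengths = [sum(col) for col in zip(*p)]    # total work per job
    return max(max(machine_loads), max(job_lengths))

# Two machines, two jobs: machine loads are 5 and 5, job lengths are 4 and 6.
bound = makespan_lower_bound([[3, 2], [1, 4]])  # -> 6
```

Since the optimal makespan is at least this bound, a two-machine schedule whose makespan is within 5/4 of the bound is also within the 5/4 guarantee stated above. The converse need not hold, because the fixed sequence on one machine can push the optimum above the bound.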
Abstract:
Computer-based mathematical models describing the aircraft evacuation process have a vital role to play in aviation safety. However, such models have a heavy dependency on real evacuation data in order to (a) identify the key processes and factors associated with evacuation, (b) quantify variables and parameters associated with the identified factors/processes and finally (c) validate the models. The Fire Safety Engineering Group of the University of Greenwich is undertaking a large data extraction exercise from three major data sources in order to address these issues. This paper describes the extraction and application of data from one of these sources: aviation accident reports. To aid in the storage and analysis of the raw data, a computer database known as AASK (Aircraft Accident Statistics and Knowledge) is under development. AASK is being developed to store human observational and anecdotal data contained in accident reports and interview transcripts. AASK comprises four component sub-databases: ACCIDENT (crash details), FLIGHT ATTENDANT (observations and actions of the flight attendants), FATALS (details concerning passenger fatalities) and PAX (observations and accounts from individual passengers). AASK currently contains information from 25 survivable aviation accidents covering the period 4 April 1977 to 6 August 1995, involving some 2415 passengers, 2210 survivors, 205 fatalities and accounts from 669 people. In addition to aiding the development of aircraft evacuation models, AASK is also being used to challenge some of the myths that proliferate in the aviation safety industry, such as those concerning passenger exit selection during evacuation, the nature and frequency of seat jumping, speed of passenger response and group dynamics. AASK can also be used to aid in the development of a more comprehensive approach to conducting post-accident interviews, and will eventually be used to store the data directly.
Abstract:
Computer-based mathematical models describing the aircraft evacuation process have a vital role to play in aviation safety. However, such models have a heavy dependency on real evacuation data. The Fire Safety Engineering Group of the University of Greenwich is undertaking a large data extraction exercise in order to address this issue. This paper describes the extraction and application of data from aviation accident reports. To aid in the storage and analysis of the raw data, a computer database known as AASK (Aircraft Accident Statistics and Knowledge) is under development. AASK is being developed to store human observational and anecdotal data contained in accident reports and interview transcripts. AASK currently contains information from 25 survivable aviation accidents covering the period 04/04/77 to 06/08/95, involving some 2415 passengers, 2210 survivors, 205 fatalities and accounts from 669 people. Copyright © 1999 John Wiley & Sons, Ltd.