891 results for sequence database
Abstract:
• Examine current pile design and construction procedures used by the Iowa Department of Transportation (DOT).
• Recommend changes and improvements to these procedures that are consistent with available pile load test data, soils information, and bridge design practice recommended by the Load and Resistance Factor Design (LRFD) approach.
Abstract:
HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant associations between human SNPs and a total of 48 HIV-1 amino acid variants (p < 2.4 × 10⁻¹²). All associated SNPs mapped to the HLA class I region. The clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than for VL, reflecting the 'intermediate phenotype' nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host-pathogen interaction. DOI: http://dx.doi.org/10.7554/eLife.01123.001.
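As a hedged illustration of the kind of per-site test run genome-wide above (a minimal sketch, not the authors' pipeline, which additionally models human and viral population structure; the data and names below are invented):

```python
# Minimal sketch: one host-SNP vs. one viral amino-acid-variant test,
# of the kind run millions of times in a genome-to-genome scan.
# The toy data, coupling strength and variable names are assumptions.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)
n = 1071                                   # individuals, as in the study
snp = rng.integers(0, 2, n)                # host SNP carrier status (0/1)
# Viral variant weakly coupled to the SNP, purely for illustration:
variant = (rng.random(n) < np.where(snp == 1, 0.45, 0.25)).astype(int)

table = [[np.sum((snp == 1) & (variant == 1)), np.sum((snp == 1) & (variant == 0))],
         [np.sum((snp == 0) & (variant == 1)), np.sum((snp == 0) & (variant == 0))]]
odds_ratio, p = fisher_exact(table)

# A Bonferroni-style correction over >3000 scans, each across ~10^6 SNPs,
# motivates a stringent cutoff of the order quoted above (2.4e-12).
print(f"OR = {odds_ratio:.2f}, p = {p:.2e}, significant = {p < 2.4e-12}")
```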
Abstract:
The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of the desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a degree of accuracy not achieved previously, and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
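As a toy illustration of the profile idea, a simple position weight matrix stands in here for the full generalized profile / HMM training named above; the aligned sites, pseudocount and background are invented:

```python
# Build a log-odds position weight matrix from aligned selected sites,
# then score candidate sequences against it. All inputs are toy values.
import math
from collections import Counter

sites = ["TTGGCA", "TTGGCT", "TAGGCA", "TTGCCA"]   # aligned SELEX ligands (toy)
background = {b: 0.25 for b in "ACGT"}             # uniform background
pseudocount = 0.5

pwm = []
for pos in range(len(sites[0])):
    counts = Counter(site[pos] for site in sites)
    pwm.append({
        base: math.log2(((counts[base] + pseudocount)
                         / (len(sites) + 4 * pseudocount)) / background[base])
        for base in "ACGT"
    })

def score(seq):
    """Sum of per-position log-odds; higher means closer to the learned motif."""
    return sum(col[base] for col, base in zip(pwm, seq))

print(f"{score('TTGGCA'):.2f} vs {score('ACGTAC'):.2f}")
```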
Abstract:
Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.
Abstract:
M.C. Addor is included in the Eurocat Working Group
Abstract:
Despite data favouring a role of dietary fat in colonic carcinogenesis, no study has focused on tissue n3 and n6 fatty acid (FA) status in the human colon adenoma-carcinoma sequence. Thus, the FA profile was measured in plasma phospholipids of patients with colorectal cancer (n = 22), sporadic adenoma (n = 27), and normal colon (n = 12) (control group). Additionally, mucosal FAs were assessed in both diseased and normal mucosa of cancer (n = 15) and adenoma (n = 21) patients, and in normal mucosa of controls (n = 8). There were no differences in the FA profiles of either plasma phospholipids or normal mucosa between adenoma patients and controls. There were considerable differences, however, in FAs between diseased and paired normal mucosa of adenoma patients, with increases of linoleic (p = 0.02), dihomogammalinolenic (p = 0.014), and eicosapentaenoic (p = 0.012) acids, and decreases of alpha-linolenic (p = 0.001) and arachidonic (p = 0.02) acids in diseased mucosa. A stepwise reduction of eicosapentaenoic acid concentrations in diseased mucosa from benign adenoma to the most advanced colon cancer was seen (p = 0.009). Cancer patients showed lower alpha-linolenate (p = 0.002) and higher dihomogammalinolenate (p = 0.003) in diseased than in paired normal mucosa. In conclusion, changes in tissue n3 and n6 FA status might participate in the early phases of human colorectal carcinogenesis.
Abstract:
In oviparous vertebrates, vitellogenin, the precursor of the major yolk proteins, is synthesized in the liver of mature females under the control of estrogen. We have established the organization and primary structure of the 5' end region of the Xenopus laevis vitellogenin A2 gene and of the major chicken vitellogenin gene. The first three homologous exons have exactly the same length in both species, namely 53, 21 and 152 nucleotides, and show an overall sequence homology of 60%. In both species, the 5'-non-coding region of the vitellogenin mRNA measures only 13 nucleotides, nine of which are conserved. In contrast, the corresponding introns of the Xenopus and chicken vitellogenin genes show no significant sequence homology. Within the 500 nucleotides preceding the 5' end of the genes, at least six blocks with sequence homologies of greater than 70% were detected. It remains to be demonstrated which of these conserved sequences, if any, are involved in the hormone-regulated expression of the vitellogenin genes.
Abstract:
The aim of the Permanent.Plot.ch project is the conservation of historical data about permanent plots in Switzerland and the monitoring of vegetation in a context of environmental changes (mainly climate and land use). Permanent plots are increasingly recognized as valuable tools for monitoring the long-term effects of environmental changes on vegetation. Often used in short studies (3 to 5 years), they are generally abandoned at the end of a project. However, their full potential may only be revealed after 10 or more years, by which time their location is often lost. For instance, some of the oldest permanent plots in Switzerland (from the first half of the 20th century) were nearly lost, although they now constitute very valuable data. By storing historical and recent data, the Permanent.Plot.ch national database (GIVD ID EU-CH-001) will ensure future access to data from permanent vegetation plots. As the database contains some private data, it is not directly available on the internet, but an overview of the data can be downloaded (http://www.unil.ch/ppch) and precise data are available on request.
Abstract:
During my PhD, my aim was to provide new tools to increase our capacity to analyse gene expression patterns, and to study the evolution of gene expression in animals on a large scale. Gene expression patterns (when and where a gene is expressed) are a key feature in understanding gene function, notably in development. It now appears clear that the evolution of developmental processes and of phenotypes is shaped both by evolution at the coding sequence level and by evolution at the gene expression level.

Studying gene expression evolution in animals, with complex expression patterns over tissues and developmental time, is still challenging. No tools are available to routinely compare expression patterns between different species, with precision, and on a large scale. Studies of gene expression evolution are therefore performed only on small gene datasets, or using imprecise descriptions of expression patterns.

The aim of my PhD was thus to develop and use novel bioinformatics resources to study the evolution of gene expression. To this end, I developed the database Bgee (Base for Gene Expression Evolution). The approach of Bgee is to transform heterogeneous expression data (ESTs, microarrays, and in situ hybridizations) into present/absent calls, and to annotate them to standard representations of the anatomy and development of different species (anatomical ontologies). An extensive mapping between the anatomies of species is then developed based on hypotheses of homology. These precise annotations to anatomies, and this extensive mapping between species, are the major assets of Bgee, and have required the involvement of many co-workers over the years. My main personal contribution is the development and management of both the Bgee database and the web application.

Bgee is now on its ninth release, and includes a substantial gene expression dataset for 5 species (human, mouse, Drosophila, zebrafish, Xenopus), with the most data from mouse, human and zebrafish. Using these three species, I conducted an analysis of gene expression evolution after duplication in vertebrates.

Gene duplication is thought to be a major source of novelty in evolution, and to contribute to speciation. It has been suggested that the evolution of gene expression patterns might participate in the retention of duplicate genes. I performed a large-scale comparison of the expression patterns of hundreds of duplicated genes to those of their singleton ortholog in an outgroup, including both small- and large-scale duplicates, in three vertebrate species (human, mouse and zebrafish), using highly accurate descriptions of expression patterns. My results showed unexpectedly high rates of de novo acquisition of expression domains after duplication (neofunctionalization), at least as high as, or higher than, rates of partitioning of expression domains (subfunctionalization). I found differences in the evolution of expression of small- and large-scale duplicates, with small-scale duplicates more prone to neofunctionalization. Duplicates with neofunctionalization seemed to evolve under more relaxed selective pressure on the coding sequence. Finally, even with abundant and precise expression data, the majority fate I recovered was neither neo- nor subfunctionalization of expression domains, suggesting a major role for other mechanisms in duplicate gene retention.
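As a loose sketch of the expression-domain comparison described above (the classification rules and anatomical terms are simplified assumptions, not Bgee's actual procedure):

```python
# Classify a duplicate gene pair against its singleton ortholog from
# present/absent expression calls per anatomical term (toy rules).
def classify(ancestral, dup1, dup2):
    union = dup1 | dup2
    if union - ancestral:                # a de novo expression domain appeared
        return "neofunctionalization"
    if union == ancestral and (ancestral - dup1) and (ancestral - dup2):
        return "subfunctionalization"    # ancestral domains were partitioned
    return "other (e.g., redundancy or domain loss)"

ancestral = {"brain", "liver", "heart"}              # singleton ortholog calls
print(classify(ancestral, {"brain", "liver"}, {"heart", "liver"}))        # sub
print(classify(ancestral, {"brain", "liver", "eye"}, {"heart"}))          # neo
```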
Abstract:
This paper analyses and discusses arguments that emerge from a recent discussion about the proper assessment of the evidential value of correspondences observed between the characteristics of a crime stain and those of a sample from a suspect when (i) the suspect is found as a result of a database search and (ii) the remaining database members are excluded as potential sources (because of different analytical characteristics). Using a graphical probability approach (i.e., Bayesian networks), the paper intends to clarify that there is no need to (i) introduce a correction factor equal to the size of the searched database (i.e., to reduce a likelihood ratio), nor to (ii) adopt a propositional level not directly related to the suspect matching the crime stain (i.e., a proposition of the kind 'some person in (outside) the database is the source of the crime stain' rather than 'the suspect (some other person) is the source of the crime stain'). The present research thus confirms the existing literature on the topic, which has repeatedly demonstrated that neither requirement (i) nor (ii) is a cause for concern.
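In the standard notation of this literature (ours, not necessarily the paper's), the position the paper confirms can be summarized as follows, where E denotes the observed correspondence and γ the random-match probability:

```latex
% Two source-level propositions: H_p, the suspect is the source of the
% crime stain; H_d, some other person is. The likelihood ratio is
\[
\mathrm{LR} \;=\; \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} \;=\; \frac{1}{\gamma},
\]
% and the n - 1 observed exclusions in a database of size n, if anything,
% strengthen this value slightly rather than divide it by n:
\[
\mathrm{LR}_{\text{search}} \;\geq\; \frac{1}{\gamma}.
\]
```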
Abstract:
Drilled shafts have been used in the US for more than 100 years in bridges and buildings as a deep foundation alternative. For many of these applications, the drilled shafts were designed using the Working Stress Design (WSD) approach. Even though WSD has been used successfully in the past, a move toward Load and Resistance Factor Design (LRFD) for foundation applications began when the Federal Highway Administration (FHWA) issued a policy memorandum on June 28, 2000. The policy memorandum requires all new bridges initiated after October 1, 2007, to be designed according to the LRFD approach. This ensures compatibility between the superstructure and substructure designs, and provides a means of consistently incorporating sources of uncertainty into each load and resistance component. Regionally calibrated LRFD resistance factors are permitted by the American Association of State Highway and Transportation Officials (AASHTO) to improve the economy and competitiveness of drilled shafts. To achieve this goal, a database for Drilled SHAft Foundation Testing (DSHAFT) has been developed. DSHAFT is aimed at assimilating high-quality drilled shaft test data from Iowa and the surrounding regions, and at identifying the need for further tests in suitable soil profiles. This report introduces DSHAFT and demonstrates its features and capabilities, such as an easy-to-use storage and sharing tool for providing access to key information (e.g., soil classification details and cross-hole sonic logging reports). DSHAFT embodies a model for effective, regional LRFD calibration procedures consistent with the PIle LOad Test (PILOT) database, which contains driven pile load tests accumulated from the state of Iowa. PILOT is now available for broader use at the project website: http://srg.cce.iastate.edu/lrfd/. DSHAFT, available in electronic form at http://srg.cce.iastate.edu/dshaft/, currently comprises 32 separate load tests provided by the Illinois, Iowa, Minnesota, Missouri and Nebraska state departments of transportation and/or roads. In addition to serving as a manual for DSHAFT and providing a summary of the available data, this report provides a preliminary analysis of the load test data from Iowa, and will open up opportunities for others to share their data through this quality-assured process, thereby providing a platform to improve the LRFD approach to drilled shafts, especially in the Midwest region.
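For context, the basic LRFD checking inequality that such regional calibrations target is conventionally written as follows (common AASHTO-style notation, not quoted from this report):

```latex
\[
\sum_i \gamma_i \, Q_i \;\leq\; \varphi \, R_n
\]
% gamma_i: load factors on the load effects Q_i; R_n: nominal resistance;
% varphi: the resistance factor that databases such as DSHAFT and PILOT
% supply the load-test statistics to calibrate regionally.
```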
Abstract:
MR structural T1-weighted imaging using high-field systems (>3 T) is severely hampered by large transmit field inhomogeneities. New sequences have been developed to better cope with such nuisances. In this work we show the potential of a recently proposed sequence, the MP2RAGE, to obtain improved grey-white matter contrast with respect to conventional T1-weighted protocols, allowing for better visualization of thalamic nuclei and different white matter bundles in the brain stem. Furthermore, we demonstrate the possibility of obtaining high spatial resolution (0.65 mm isotropic) R1 maps, fully independent of the transmit field inhomogeneities, in a clinically acceptable time. In these high-resolution R1 maps it was possible to clearly observe varying properties of cortical grey matter throughout the cortex, and to observe different hippocampal subfields with intensity variations that correlate with known variations in myelin concentration.
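For reference, the MP2RAGE literature commonly combines the two gradient-echo images S₁ and S₂, acquired at the sequence's two inversion times, into a 'uniform' image (general MP2RAGE background, not quoted from this abstract):

```latex
\[
S_{\mathrm{UNI}} \;=\; \frac{\operatorname{Re}\!\left(S_1 S_2^{*}\right)}{\lvert S_1 \rvert^{2} + \lvert S_2 \rvert^{2}}
\]
% In this ratio the receive-field, proton-density and T2* contributions
% largely cancel, leaving predominantly T1 contrast, which is what makes
% the derived R1 = 1/T1 maps robust to field-related nuisances.
```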
Abstract:
One major methodological problem in analysis of sequence data is the determination of costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it has some similarity with problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly promote one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme in comparison with solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while it alleviates the justification problem of cost schemes.
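A minimal sketch of the dynamic program that turns a substitution-cost scheme into sequence distances; the flat costs below are placeholders for precisely the quantities the article proposes to optimize:

```python
# Optimal-matching distance between two state sequences under an
# arbitrary substitution-cost function and a constant indel cost.
def om_distance(a, b, sub_cost, indel_cost):
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution
                d[i - 1][j] + indel_cost,                        # deletion
                d[i][j - 1] + indel_cost,                        # insertion
            )
    return d[n][m]

# Flat placeholder scheme; the article's contribution is choosing these costs.
sub = lambda x, y: 0.0 if x == y else 2.0
print(om_distance("AABBC", "ABBC", sub, 1.0))   # -> 1.0 (one deletion)
```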
Abstract:
Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. The lack of standardized ways of exploring different data layouts requires solving the problem from scratch each time. The possibility of accessing data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file sizes grow. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns; therefore, performance on datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates, and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach (data stay mostly in the CSV files); "zero configuration" (no need to specify a database schema); a small, installation-free C++ implementation built with boost [1], SQLite [2] and Qt [3]; query rewriting, dynamic creation of indices for appropriate columns, and static data retrieval directly from CSV files, which together ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes mixed text/number data easy to use; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware, along with educational videos, on its website [4]. It does not need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.
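As a rough analogue in a few lines (not the author's C++ system: this copies the data into SQLite, exactly the step the "no copy" design avoids, and the file name and layout are assumptions):

```python
# Load a small CSV into an in-memory SQLite table and query it with SQL.
import csv
import sqlite3

with open("data.csv", newline="") as f:        # hypothetical input file
    rows = list(csv.reader(f))
header, body = rows[0], rows[1:]

con = sqlite3.connect(":memory:")
columns = ", ".join(f'"{name}" TEXT' for name in header)
con.execute(f"CREATE TABLE csv ({columns})")
placeholders = ", ".join("?" for _ in header)
con.executemany(f"INSERT INTO csv VALUES ({placeholders})", body)

print(con.execute("SELECT COUNT(*) FROM csv").fetchone()[0])
```

A full import like this breaks down at thousands of columns (SQLite's default column limit is 2000), which is the regime the abstract's query-rewriting, index-on-demand approach is designed for.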
Abstract:
A T₂ magnetization-preparation (T₂ Prep) sequence is proposed that is insensitive to B₁ field variations and simultaneously provides fat suppression without any further increase in specific absorption rate (SAR). Increased B₁ inhomogeneity at higher magnetic field strength (B₀ ≥ 3 T) necessitates a preparation sequence that is less sensitive to B₁ variations. For the proposed technique, T₂ weighting in the image is achieved using a segmented B₁-insensitive rotation (BIR-4) adiabatic pulse by inserting two equally long delays, one after the initial reverse adiabatic half passage (AHP), and the other before the final AHP segment of a BIR-4 pulse. This sequence yields T₂ weighting with both B₁ and B₀ insensitivity. To simultaneously suppress fat signal (at the cost of B₀ insensitivity), the second delay is prolonged so that fat accumulates additional phase due to its chemical shift. Numerical simulations as well as phantom and in vivo image acquisitions were performed to show the efficacy of the proposed technique.
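As a back-of-the-envelope for the prolonged second delay (generic numbers, not taken from the paper): with a fat-water chemical shift of about 3.5 ppm,

```latex
\[
\Delta f_{\text{fat}} \;\approx\; 3.5\times10^{-6} \cdot \frac{\gamma B_0}{2\pi}
\;\approx\; 3.5\times10^{-6} \cdot 127.7\,\mathrm{MHz} \;\approx\; 450\,\mathrm{Hz}
\quad (B_0 = 3\,\mathrm{T}),
\]
% so the extra phase fat accumulates during a delay tau is
\[
\Delta\phi \;=\; 2\pi\,\Delta f_{\text{fat}}\,\tau ,
\]
% and tau = 1/(2 * 450 Hz) ~ 1.1 ms would put fat pi radians out of
% phase with water.
```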