800 results for Distributed database


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Although research on influenza has continued for more than 100 years, influenza is still one of the most prominent diseases, causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data from more than 27,000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools including multiple sequence alignment, phylogenetic analysis and sequence similarity maps enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360,000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and it now supplies greater information content, enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We examined sequence variation in the mitochondrial cytochrome b gene (1140 bp, n = 73) and control region (842-851 bp, n = 74) in the Eurasian harvest mouse (Micromys minutus (Pallas, 1771)), with samples drawn from across its range, from Western Europe to Japan. Phylogeographic analyses revealed region-specific haplotype groupings combined with overall low levels of inter-regional genetic divergence. Despite the enormous intervening distance, European and East Asian samples showed a net nucleotide divergence of only 0.36%. Based on an evolutionary rate for the cytochrome b gene of 2.4% per site per lineage per million years, the initial divergence time of these populations is estimated at around 80 000 years before present. Our findings are consistent with available fossil evidence that has recorded repeated cycles of extinction and recolonization of Europe by M. minutus through the Quaternary. The molecular data further suggest that recolonization occurred from refugia in the Central to East Asian region. Japanese haplotypes of M. minutus, with the exception of those from Tsushima Is., show limited nucleotide diversity (0.15%) compared with those found on the adjacent Korean Peninsula. This finding suggests recent colonization of the Japanese Archipelago, probably around the last glacial period, followed by rapid population growth.
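
The divergence-time figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a strict molecular clock and that the quoted rate applies to each lineage separately, so that net divergence d = 2 × rate × t. The abstract does not state the authors' exact procedure, so this is only a consistency check; the function name is illustrative.

```python
def divergence_time_years(net_divergence, rate_per_lineage_per_myr):
    """Time since two lineages split under a strict molecular clock.

    Both lineages accumulate substitutions independently, so the net
    divergence between them is d = 2 * rate * t (t in million years).
    """
    t_myr = net_divergence / (2.0 * rate_per_lineage_per_myr)
    return t_myr * 1_000_000


# Values from the abstract: 0.36% net divergence, 2.4% per lineage per Myr.
t = divergence_time_years(net_divergence=0.0036, rate_per_lineage_per_myr=0.024)
print(round(t))  # 75000
```

Under these assumptions the result is about 75 000 years, which agrees with the "around 80 000 years" quoted in the abstract.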

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper applies probability and decision theory in the graphical interface of an influence diagram to study the formal requirements of rationality which justify the individualization of a person found through a database search. The decision-theoretic part of the analysis studies the parameters that a rational decision maker would use to individualize the selected person. The modeling part (in the form of an influence diagram) clarifies the relationships between this decision and the ingredients that make up the database search problem, i.e., the results of the database search and the different pairs of propositions describing whether an individual is at the source of the crime stain. These analyses evaluate the desirability associated with the decision of 'individualizing' (and 'not individualizing'). They point out that this decision is a function of (i) the probability that the individual in question is, in fact, at the source of the crime stain (i.e., the state of nature), and (ii) the decision maker's preferences among the possible consequences of the decision (i.e., the decision maker's loss function). We discuss the relevance and argumentative implications of these insights with respect to recent comments in specialized literature, which suggest points of view that are opposed to the results of our study.
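
The dependence of the decision on (i) the probability and (ii) the loss function can be illustrated with a minimal expected-loss comparison. This sketch assumes a two-action, two-state problem in which correct decisions carry zero loss. The loss values and function names are hypothetical and are not taken from the paper, which uses an influence diagram rather than this direct calculation.

```python
def expected_losses(p_source, loss_false_individualization, loss_missed_individualization):
    """Expected loss of each action, assuming correct decisions cost nothing."""
    # Individualizing is wrong only if the person is NOT the source.
    el_individualize = (1.0 - p_source) * loss_false_individualization
    # Not individualizing is wrong only if the person IS the source.
    el_not_individualize = p_source * loss_missed_individualization
    return el_individualize, el_not_individualize


def decide(p_source, loss_false_individualization, loss_missed_individualization):
    """Choose the action with the smaller expected loss."""
    el_ind, el_not = expected_losses(
        p_source, loss_false_individualization, loss_missed_individualization
    )
    return "individualize" if el_ind < el_not else "do not individualize"


# Hypothetical losses: a false individualization is 100 times worse
# than a missed one, so the threshold is p > 100 / 101 (about 0.990).
print(decide(0.999, 100.0, 1.0))  # individualize
print(decide(0.95, 100.0, 1.0))   # do not individualize
```

With these losses, a probability of 0.95 is not enough to individualize, which shows that the probability alone does not settle the decision.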

Relevance:

20.00% 20.00%

Publisher:

Abstract:

BACKGROUND: Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION: The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via Interproscan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION: Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The Complete Arabidopsis Transcriptome Micro Array (CATMA) database contains gene sequence tag (GST) and gene model sequences for over 70% of the predicted genes in the Arabidopsis thaliana genome as well as primer sequences for GST amplification and a wide range of supplementary information. All CATMA GST sequences are specific to the gene for which they were designed, and all gene models were predicted from a complete reannotation of the genome using uniform parameters. The database is searchable by sequence name, sequence homology or direct SQL query, and is available through the CATMA website at http://www.catma.org/.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

• Examine current pile design and construction procedures used by the Iowa Department of Transportation (DOT). • Recommend changes and improvements to these procedures that are consistent with available pile load test data, soils information, and bridge design practice recommended by the Load and Resistance Factor Design (LRFD) approach.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The aim of the Permanent.Plot.ch project is the conservation of historical data about permanent plots in Switzerland and the monitoring of vegetation in a context of environmental changes (mainly climate and land use). Permanent plots are currently being recognized as valuable tools to monitor long-term effects of environmental changes on vegetation. Often used in short studies (3 to 5 years), they are generally abandoned at the end of projects. However, their full potential might only be revealed after 10 or more years, by which time their location has often been lost. For instance, some of the oldest permanent plots in Switzerland (first half of the 20th century) were nearly lost, although they are now a source of very valuable data. The Permanent.Plot.ch national database (GIVD ID EU-CH-001), by storing historical and recent data, will ensure future access to data from permanent vegetation plots. As the database contains some private data, it is not directly available on the internet, but an overview of the data can be downloaded (http://www.unil.ch/ppch) and precise data are available on request.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper analyses and discusses arguments that emerge from a recent discussion about the proper assessment of the evidential value of correspondences observed between the characteristics of a crime stain and those of a sample from a suspect when (i) this latter individual is found as a result of a database search and (ii) remaining database members are excluded as potential sources (because of different analytical characteristics). Using a graphical probability approach (i.e., Bayesian networks), the paper here intends to clarify that there is no need to (i) introduce a correction factor equal to the size of the searched database (i.e., to reduce a likelihood ratio), nor to (ii) adopt a propositional level not directly related to the suspect matching the crime stain (i.e., a proposition of the kind 'some person in (outside) the database is the source of the crime stain' rather than 'the suspect (some other person) is the source of the crime stain'). The present research thus confirms existing literature on the topic that has repeatedly demonstrated that the latter two requirements (i) and (ii) should not be a cause of concern.
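
Why no database-size correction is needed can be seen from a direct Bayes computation. The sketch below assumes a closed population of potential sources with equal prior probabilities, error-free typing, and a single random match probability for every non-source. These modeling assumptions and all names are illustrative; the paper itself works with Bayesian networks.

```python
def posterior_suspect_is_source(population_size, database_size, match_prob):
    """Posterior that the single matching database member is the source.

    Evidence: the suspect matches and the other (database_size - 1)
    database members are excluded. Equal priors cancel out.
    """
    n_untested = population_size - database_size
    n_excluded = database_size - 1
    # If the suspect is the source, he matches with certainty and the
    # excluded members each fail to match by chance.
    like_suspect = (1.0 - match_prob) ** n_excluded
    # If an untested person is the source, the suspect matches by chance.
    like_untested = match_prob * (1.0 - match_prob) ** n_excluded
    # An excluded database member cannot be the source (likelihood 0).
    return like_suspect / (like_suspect + n_untested * like_untested)


N, gamma = 1_000_000, 1e-6
single_suspect = posterior_suspect_is_source(N, 1, gamma)    # no search
after_search = posterior_suspect_is_source(N, 100_000, gamma)
print(single_suspect < after_search)  # True
```

The expression simplifies to 1 / (1 + (N - n) × γ), so excluding other database members raises the posterior slightly rather than weakening the evidence against the suspect.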

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Drilled shafts have been used in the US for more than 100 years in bridges and buildings as a deep foundation alternative. For many of these applications, the drilled shafts were designed using the Working Stress Design (WSD) approach. Even though WSD has been used successfully in the past, a move toward Load and Resistance Factor Design (LRFD) for foundation applications began when the Federal Highway Administration (FHWA) issued a policy memorandum on June 28, 2000. The policy memorandum requires all new bridges initiated after October 1, 2007, to be designed according to the LRFD approach. This ensures compatibility between the superstructure and substructure designs, and provides a means of consistently incorporating sources of uncertainty into each load and resistance component. Regionally calibrated LRFD resistance factors are permitted by the American Association of State Highway and Transportation Officials (AASHTO) to improve the economy and competitiveness of drilled shafts. To achieve this goal, a database for Drilled SHAft Foundation Testing (DSHAFT) has been developed. DSHAFT is aimed at assimilating high-quality drilled shaft test data from Iowa and the surrounding regions, and identifying the need for further tests in suitable soil profiles. This report introduces DSHAFT and demonstrates its features and capabilities, such as an easy-to-use storage and sharing tool for providing access to key information (e.g., soil classification details and cross-hole sonic logging reports). DSHAFT embodies a model for effective, regional LRFD calibration procedures consistent with the PIle LOad Test (PILOT) database, which contains driven pile load tests accumulated from the state of Iowa. PILOT is now available for broader use at the project website: http://srg.cce.iastate.edu/lrfd/.
DSHAFT, available in electronic form at http://srg.cce.iastate.edu/dshaft/, currently comprises 32 separate load tests provided by the Illinois, Iowa, Minnesota, Missouri and Nebraska state departments of transportation and/or departments of roads. In addition to serving as a manual for DSHAFT and providing a summary of the available data, this report provides a preliminary analysis of the load test data from Iowa, and will open up opportunities for others to share their data through this quality-assured process, thereby providing a platform to improve the LRFD approach to drilled shafts, especially in the Midwest region.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. The lack of standardized ways of exploring different data layouts means the problem has to be solved from scratch each time. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSV files into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling large numbers of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; an implementation in C++ with boost [1], SQLite [2] and Qt [3] that requires no installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which ensure efficient plan execution; effortless support for millions of columns; per-value typing, which makes mixed text/number data easy to use; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It needs no prerequisites to run, as all of the libraries are included in the distribution package.
I test it against existing database solutions using a battery of benchmarks and discuss the results.
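
Two of the listed ideas, "zero configuration" and per-value typing, can be illustrated with Python's standard `csv` and `sqlite3` modules. This is only a sketch and not the system described: it copies the data into an in-memory SQLite database, which is the opposite of the "no copy" approach, and the table name, sample data and helper names are hypothetical.

```python
import csv
import io
import sqlite3


def to_value(cell):
    """Per-value typing: try int, then float, otherwise keep the text."""
    for convert in (int, float):
        try:
            return convert(cell)
        except ValueError:
            pass
    return cell


def load_csv(conn, table, text):
    """Create a table from the CSV header; no schema is supplied by the user."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    # Columns are declared without types, so SQLite stores each value
    # with whatever type it is given (assumes header names contain no quotes).
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        ([to_value(cell) for cell in row] for row in rows),
    )


sample = "id,reading\n1,3.5\n2,n/a\n3,4.5\n"
conn = sqlite3.connect(":memory:")
load_csv(conn, "data", sample)

types = [r[0] for r in conn.execute("SELECT typeof(reading) FROM data ORDER BY id")]
avg = conn.execute(
    "SELECT AVG(reading) FROM data WHERE typeof(reading) = 'real'"
).fetchone()[0]
print(types, avg)  # ['real', 'text', 'real'] 4.0
```

The mixed column keeps "n/a" as text next to numeric readings, and the query filters on the stored type instead of requiring a cleaned import.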

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An Actively Heated Fiber Optics (AHFO) method to estimate soil moisture is tested and the analysis technique improved. The measurements were performed in a lysimeter uniformly packed with loam soil with variable water content profiles. In the first meter of the soil profile, 30 m of fiber optic cable were installed in a 12-loop coil. The metal sheath armoring the fiber cable was used as an electrical resistance heater to generate a heat pulse, and the soil response was monitored with a Distributed Temperature Sensing (DTS) system. We study the cooling following three continuous heat pulses of 120 s at 36 W m^-1 by means of the long-time approximation of radial heat conduction. The soil volumetric water contents were then inferred from the estimated thermal conductivities through a specifically calibrated model relating thermal conductivity and volumetric water content. To use the pre-asymptotic data we employed a time correction that allowed the volumetric water content to be estimated with a precision of 0.01-0.035 m^3 m^-3. A comparison of the AHFO measurements with soil-moisture measurements obtained with calibrated capacitance-based probes gave good agreement for wetter soils [discrepancy between the two methods was less than 0.04 m^3 m^-3]. In the shallow drier soils, the AHFO method underestimated the volumetric water content due to the longer time required for the temperature increment to become asymptotic in less thermally conductive media [discrepancy between the two methods was larger than 0.1 m^3 m^-3]. The present work suggests that future applications of the AHFO method should include longer heat pulses, that longer heating and cooling events should be analyzed, and that temperature increments should ideally be measured at a higher frequency.
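
The long-time approximation mentioned above can be sketched as a straight-line fit. For an idealized infinite line source, the temperature rise during cooling after a pulse of duration t_h is approximately ΔT = q / (4πλ) × ln(t / (t - t_h)), so the slope of ΔT against ln(t / (t - t_h)) gives the thermal conductivity λ. The code below uses synthetic, noise-free data with a hypothetical conductivity of 1.2 W m^-1 K^-1. It does not reproduce the authors' time correction or their calibrated conductivity-to-water-content model, neither of which is given in the abstract.

```python
import math


def cooling_rise(t, heat_rate, conductivity, pulse_duration):
    """Line-source temperature rise (K) at time t after heating started."""
    return heat_rate / (4.0 * math.pi * conductivity) * math.log(t / (t - pulse_duration))


def fit_conductivity(times, rises, heat_rate, pulse_duration):
    """Least-squares slope of rise vs ln(t / (t - t_h)), converted to conductivity."""
    xs = [math.log(t / (t - pulse_duration)) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(rises) / len(rises)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rises)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return heat_rate / (4.0 * math.pi * slope)


Q = 36.0        # W per meter of cable, from the abstract
PULSE = 120.0   # s, from the abstract
TRUE_LAMBDA = 1.2  # hypothetical value used to generate the synthetic data

times = [float(t) for t in range(150, 601, 10)]  # cooling phase only
rises = [cooling_rise(t, Q, TRUE_LAMBDA, PULSE) for t in times]
estimate = fit_conductivity(times, rises, Q, PULSE)
print(round(estimate, 6))  # 1.2
```

With real DTS data the early cooling samples deviate from this straight line, which is the pre-asymptotic behavior the abstract's time correction addresses.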

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Applying semi-distributed hydrological models to large, heterogeneous watersheds raises several problems. On one hand, the spatial and temporal variability in catchment features should be adequately represented in the model parameterization, while keeping the model complexity at an acceptable level to take advantage of state-of-the-art calibration techniques. On the other hand, model complexity enhances uncertainty in adjusted model parameter values, therefore increasing uncertainty in the water routing across the watershed. This is critical for water quality applications, where not only streamflow, but also a reliable estimation of the surface versus subsurface contributions to the runoff is needed. In this study, we show how a regularized inversion procedure combined with a multiobjective function calibration strategy successfully solves the parameterization of a complex application of a water quality-oriented hydrological model. The final values of several optimized parameters showed significant and consistent differences across geological and landscape features. Although the number of optimized parameters was significantly increased by the spatial and temporal discretization of adjustable parameters, the uncertainty in water routing results remained at reasonable values. In addition, a stepwise numerical analysis showed that the effects on calibration performance due to inclusion of different data types in the objective function could be inextricably linked. Thus caution should be taken when adding or removing data from an aggregated objective function.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.
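
A likelihood test of this kind compares the fit of a model that allows dN/dS > 1 on the tested branch with a null model that does not. The sketch below computes the likelihood ratio statistic and a p-value from a chi-square distribution with one degree of freedom, which is a commonly used reference distribution for the branch-site test but is an assumption here, since the abstract does not state which one Selectome uses. The log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (statistic, p-value) for nested models differing by one parameter."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The alternative fits no better than the null.
        return statistic, 1.0
    # Chi-square survival function for 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one branch of one gene tree.
stat, p = likelihood_ratio_test(lnl_null=-10234.6, lnl_alt=-10230.1)
print(round(stat, 3), round(p, 4))  # 9.0 0.0027
```

Because a database of this kind tests many branches across thousands of gene trees, the per-test p-values would normally also need a multiple-testing correction, which is not shown here.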