63 resultados para relational database


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. Possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL) would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, with growing file size handling it becomes non-trivial. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if its horizontal dimension reaches thousands of columns. Most databases are optimized for handling large number of rows rather than columns, therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: "no copy" approach - data stay mostly in the CSV files; "zero configuration" - no need to specify database schema; written in C++, with boost [1], SQLite [2] and Qt [3], doesn't require installation and has very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/numbers data is easy; very simple network protocol provides efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to provide minimum false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main goal of CleanEx is to provide access to public gene expression data via unique gene names. A second objective is to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-data set comparisons. A consistent and up-to-date gene nomenclature is achieved by associating each single experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used. These targets are then mapped at regular intervals to the growing and evolving catalogues of human genes and genes from model organisms. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a weekly built gene index containing cross-references to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resource, such as cDNA clones or Affymetrix probe sets. The web-based query interfaces offer access to individual entries via text string searches or quantitative expression criteria. CleanEx is accessible at: http://www.cleanex.isb-sib.ch/.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the context of recent attempts to redefine the 'skin notation' concept, a position paper summarizing an international workshop on the topic stated that the skin notation should be a hazard indicator related to the degree of toxicity and the potential for transdermal exposure of a chemical. Within the framework of developing a web-based tool integrating this concept, we constructed a database of 7101 agents for which a percutaneous permeation constant can be estimated (using molecular weight and octanol-water partition constant), and for which at least one of the following toxicity indices could be retrieved: Inhalation occupational exposure limit (n=644), Oral lethal dose 50 (LD50, n=6708), cutaneous LD50 (n=1801), Oral no observed adverse effect level (NOAEL, n=1600), and cutaneous NOAEL (n=187). Data sources included the Registry of toxic effects of chemical substances (RTECS, MDL information systems, Inc.), PHYSPROP (Syracuse Research Corp.) and safety cards from the International Programme on Chemical Safety (IPCS). A hazard index, which corresponds to the product of exposure duration and skin surface exposed that would yield an internal dose equal to a toxic reference dose was calculated. This presentation provides a descriptive summary of the database, correlations between toxicity indices, and an example of how the web tool will help industrial hygienist decide on the possibility of a dermal risk using the hazard index.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: Several European HIV observational data bases have, over the last decade, accumulated a substantial number of resistance test results and developed large sample repositories, There is a need to link these efforts together, We here describe the development of such a novel tool that allows to bind these data bases together in a distributed fashion for which the control and data remains with the cohorts rather than classic data mergers.METHODS: As proof-of-concept we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01, and between 180 and 195 cm of height, and how many samples or resistance tests there would be available for these patients, The queries were uploaded with the tool to a central web server from which each participating cohort downloaded the queries with the tool and ran them against their database, The numbers gathered were then submitted back to the server and we could accumulate the number of available samples and resistance tests.RESULTS: We obtained the following results from the cohorts on available samples/resistance test: EuResist: not availableI11,194; EuroSIDA: 20,71611,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,78311,485, In total, 78,552 samples and 15,473 resistance tests were available amongst these five cohorts. Once these data items have been identified, it is trivial to generate lists of relevant samples that would be usefuI for ultra deep sequencing in addition to the already available resistance tests, Saon the tool will include small analysis packages that allow each cohort to pull a report on their cohort profile and also survey emerging resistance trends in their own cohort,CONCLUSIONS: We plan on providing this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) and will provide the tool free of charge to others for any non-commercial use, The potential of this tool is to ease collaborations, that is, in projects requiring data to speed up identification of novel resistance mutations by increasing the number of observations across multiple cohorts instead of awaiting single cohorts or studies to reach the critical number needed to address such issues.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Résumé: L'automatisation du séquençage et de l'annotation des génomes, ainsi que l'application à large échelle de méthodes de mesure de l'expression génique, génèrent une quantité phénoménale de données pour des organismes modèles tels que l'homme ou la souris. Dans ce déluge de données, il devient très difficile d'obtenir des informations spécifiques à un organisme ou à un gène, et une telle recherche aboutit fréquemment à des réponses fragmentées, voir incomplètes. La création d'une base de données capable de gérer et d'intégrer aussi bien les données génomiques que les données transcriptomiques peut grandement améliorer la vitesse de recherche ainsi que la qualité des résultats obtenus, en permettant une comparaison directe de mesures d'expression des gènes provenant d'expériences réalisées grâce à des techniques différentes. L'objectif principal de ce projet, appelé CleanEx, est de fournir un accès direct aux données d'expression publiques par le biais de noms de gènes officiels, et de représenter des données d'expression produites selon des protocoles différents de manière à faciliter une analyse générale et une comparaison entre plusieurs jeux de données. Une mise à jour cohérente et régulière de la nomenclature des gènes est assurée en associant chaque expérience d'expression de gène à un identificateur permanent de la séquence-cible, donnant une description physique de la population d'ARN visée par l'expérience. Ces identificateurs sont ensuite associés à intervalles réguliers aux catalogues, en constante évolution, des gènes d'organismes modèles. Cette procédure automatique de traçage se fonde en partie sur des ressources externes d'information génomique, telles que UniGene et RefSeq. La partie centrale de CleanEx consiste en un index de gènes établi de manière hebdomadaire et qui contient les liens à toutes les données publiques d'expression déjà incorporées au système. En outre, la base de données des séquences-cible fournit un lien sur le gène correspondant ainsi qu'un contrôle de qualité de ce lien pour différents types de ressources expérimentales, telles que des clones ou des sondes Affymetrix. Le système de recherche en ligne de CleanEx offre un accès aux entrées individuelles ainsi qu'à des outils d'analyse croisée de jeux de donnnées. Ces outils se sont avérés très efficaces dans le cadre de la comparaison de l'expression de gènes, ainsi que, dans une certaine mesure, dans la détection d'une variation de cette expression liée au phénomène d'épissage alternatif. Les fichiers et les outils de CleanEx sont accessibles en ligne (http://www.cleanex.isb-sib.ch/). Abstract: The automatic genome sequencing and annotation, as well as the large-scale gene expression measurements methods, generate a massive amount of data for model organisms. Searching for genespecific or organism-specific information througout all the different databases has become a very difficult task, and often results in fragmented and unrelated answers. The generation of a database which will federate and integrate genomic and transcriptomic data together will greatly improve the search speed as well as the quality of the results by allowing a direct comparison of expression results obtained by different techniques. The main goal of this project, called the CleanEx database, is thus to provide access to public gene expression data via unique gene names and to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and crossdataset comparisons. A consistent and uptodate gene nomenclature is achieved by associating each single gene expression experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used. These targets are then mapped at regular intervals to the growing and evolving catalogues of genes from model organisms, such as human and mouse. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a weekly built gene index containing crossreferences to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resources, such as cDNA clones or Affymetrix probe sets. The Affymetrix mapping files are accessible as text files, for further use in external applications, and as individual entries, via the webbased interfaces . The CleanEx webbased query interfaces offer access to individual entries via text string searches or quantitative expression criteria, as well as crossdataset analysis tools, and crosschip gene comparison. These tools have proven to be very efficient in expression data comparison and even, to a certain extent, in detection of differentially expressed splice variants. The CleanEx flat files and tools are available online at: http://www.cleanex.isbsib. ch/.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Analyzing the type and frequency of patient-specific mutations that give rise to Duchenne muscular dystrophy (DMD) is an invaluable tool for diagnostics, basic scientific research, trial planning, and improved clinical care. Locus-specific databases allow for the collection, organization, storage, and analysis of genetic variants of disease. Here, we describe the development and analysis of the TREAT-NMD DMD Global database (http://umd.be/TREAT_DMD/). We analyzed genetic data for 7,149 DMD mutations held within the database. A total of 5,682 large mutations were observed (80% of total mutations), of which 4,894 (86%) were deletions (1 exon or larger) and 784 (14%) were duplications (1 exon or larger). There were 1,445 small mutations (smaller than 1 exon, 20% of all mutations), of which 358 (25%) were small deletions and 132 (9%) small insertions and 199 (14%) affected the splice sites. Point mutations totalled 756 (52% of small mutations) with 726 (50%) nonsense mutations and 30 (2%) missense mutations. Finally, 22 (0.3%) mid-intronic mutations were observed. In addition, mutations were identified within the database that would potentially benefit from novel genetic therapies for DMD including stop codon read-through therapies (10% of total mutations) and exon skipping therapy (80% of deletions and 55% of total mutations).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: Radial maze tasks have been used to assess optimal foraging and spatial abilities in rodents. The spatial performance was based on a capacity to rely on a configuration of local and distant cues. We adapted maze procedures assessing the relative weight of local cues and distant landmarks for arm choice in humans. NEW METHOD: The procedure allowed testing memory of places in four experimental setups: a fingertip texture-groove maze, a tactile screen maze, a virtual radial maze and a walking size maze. During training, the four reinforced positions remained fixed relative to local and distal cues. During subsequent conflict trials, these frameworks were made conflictive in the prediction of reward locations. RESULTS: Three experiments showed that the relative weight of local and distal relational cues is affected by different factors such as cues' nature, visual access to the environment, real vs. virtual environment, and gender. A fourth experiment illustrated how a walking maze can be used with people suffering intellectual disability. COMPARISON WITH EXISTING METHODS: In our procedure, long-term (reference) and short-term (working) memory can be assessed. It is the first radial task adapted to human that enables dissociating local and distal cues, to provides an indication as to their relative salience. Our mazes are moveable and easily used in limited spaces. Tasks are performed with realistic and spontaneous though controlled exploratory movements. CONCLUSION: Our tasks enabled highlighting the use of different strategies. In a clinical perspective, considering the use of compensatory strategies should orient towards adapted behavioural rehabilitation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: Physical activity and sedentary behaviour in youth have been reported to vary by sex, age, weight status and country. However, supporting data are often self-reported and/or do not encompass a wide range of ages or geographical locations. This study aimed to describe objectively-measured physical activity and sedentary time patterns in youth. METHODS: The International Children's Accelerometry Database (ICAD) consists of ActiGraph accelerometer data from 20 studies in ten countries, processed using common data reduction procedures. Analyses were conducted on 27,637 participants (2.8-18.4 years) who provided at least three days of valid accelerometer data. Linear regression was used to examine associations between age, sex, weight status, country and physical activity outcomes. RESULTS: Boys were less sedentary and more active than girls at all ages. After 5 years of age there was an average cross-sectional decrease of 4.2 % in total physical activity with each additional year of age, due mainly to lower levels of light-intensity physical activity and greater time spent sedentary. Physical activity did not differ by weight status in the youngest children, but from age seven onwards, overweight/obese participants were less active than their normal weight counterparts. Physical activity varied between samples from different countries, with a 15-20 % difference between the highest and lowest countries at age 9-10 and a 26-28 % difference at age 12-13. CONCLUSIONS: Physical activity differed between samples from different countries, but the associations between demographic characteristics and physical activity were consistently observed. Further research is needed to explore environmental and sociocultural explanations for these differences.