83 resultados para database solution

em Université de Lausanne, Switzerland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. Possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL) would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, with growing file size handling it becomes non-trivial. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if its horizontal dimension reaches thousands of columns. Most databases are optimized for handling large number of rows rather than columns, therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: "no copy" approach - data stay mostly in the CSV files; "zero configuration" - no need to specify database schema; written in C++, with boost [1], SQLite [2] and Qt [3], doesn't require installation and has very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/numbers data is easy; very simple network protocol provides efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background The 'database search problem', that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method's graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants with additional modes of interaction, concise representation, and coherent communication.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In traditional criminal investigation, uncertainties are often dealt with using a combination of common sense, practical considerations and experience, but rarely with tailored statistical models. For example, in some countries, in order to search for a given profile in the national DNA database, it must have allelic information for six or more of the ten SGM Plus loci for a simple trace. If the profile does not have this amount of information then it cannot be searched in the national DNA database (NDNAD). This requirement (of a result at six or more loci) is not based on a statistical approach, but rather on the feeling that six or more would be sufficient. A statistical approach, however, could be more rigorous and objective and would take into consideration factors such as the probability of adventitious matches relative to the actual database size and/or investigator's requirements in a sensible way. Therefore, this research was undertaken to establish scientific foundations pertaining to the use of partial SGM Plus loci profiles (or similar) for investigation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND AND AIMS: Inflammatory bowel disease (IBD) frequently manifests during childhood and adolescence. For providing and understanding a comprehensive picture of a patients' health status, health-related quality of life (HRQoL) instruments are an essential complement to clinical symptoms and functional limitations. Currently, the IMPACT-III questionnaire is one of the most frequently used disease-specific HRQoL instrument among patients with IBD. However, there is a lack of studies examining the validation and reliability of this instrument. METHODS: 146 paediatric IBD patients from the multicenter Swiss IBD paediatric cohort study database were included in the study. Medical and laboratory data were extracted from the hospital records. HRQoL data were assessed by means of standardized questionnaires filled out by the patients in a face-to-face interview. RESULTS: The original six IMPACT-III domain scales could not be replicated in the current sample. A principal component analysis with the extraction of four factor scores revealed the most robust solution. The four factors indicated good internal reliability (Cronbach's alpha=.64-.86), good concurrent validity measured by correlations with the generic KIDSCREEN-27 scales and excellent discriminant validity for the dimension of physical functioning measured by HRQoL differences for active and inactive severity groups (p<.001, d=1.04). CONCLUSIONS: This study with Swiss children with IBD indicates good validity and reliability for the IMPACT-III questionnaire. However, our findings suggest a slightly different factor structure than originally proposed. The IMPACT-III questionnaire can be recommended for its use in clinical practice. The factor structure should be further examined in other samples.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

As part of a project to use the long-lived (T(1/2)=1200a) (166m)Ho as reference source in its reference ionisation chamber, IRA standardised a commercially acquired solution of this nuclide using the 4pibeta-gamma coincidence and 4pigamma (NaI) methods. The (166m)Ho solution supplied by Isotope Product Laboratories was measured to have about 5% Europium impurities (3% (154)Eu, 0.94% (152)Eu and 0.9% (155)Eu). Holmium had therefore to be separated from europium, and this was carried out by means of ion-exchange chromatography. The holmium fractions were collected without europium contamination: 162h long HPGe gamma measurements indicated no europium impurity (detection limits of 0.01% for (152)Eu and (154)Eu, and 0.03% for (155)Eu). The primary measurement of the purified (166m)Ho solution with the 4pi (PC) beta-gamma coincidence technique was carried out at three gamma energy settings: a window around the 184.4keV peak and gamma thresholds at 121.8 and 637.3keV. The results show very good self-consistency, and the activity concentration of the solution was evaluated to be 45.640+/-0.098kBq/g (0.21% with k=1). The activity concentration of this solution was also measured by integral counting with a well-type 5''x5'' NaI(Tl) detector and efficiencies computed by Monte Carlo simulations using the GEANT code. These measurements were mutually consistent, while the resulting weighted average of the 4pi NaI(Tl) method was found to agree within 0.15% with the result of the 4pibeta-gamma coincidence technique. An ampoule of this solution and the measured value of the concentration were submitted to the BIPM as a contribution to the Système International de Référence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The aim of this study is to perform a thorough comparison of quantitative susceptibility mapping (QSM) techniques and their dependence on the assumptions made. The compared methodologies were: two iterative single orientation methodologies minimizing the l2, l1TV norm of the prior knowledge of the edges of the object, one over-determined multiple orientation method (COSMOS) and anewly proposed modulated closed-form solution (MCF). The performance of these methods was compared using a numerical phantom and in-vivo high resolution (0.65mm isotropic) brain data acquired at 7T using a new coil combination method. For all QSM methods, the relevant regularization and prior-knowledge parameters were systematically changed in order to evaluate the optimal reconstruction in the presence and absence of a ground truth. Additionally, the QSM contrast was compared to conventional gradient recalled echo (GRE) magnitude and R2* maps obtained from the same dataset. The QSM reconstruction results of the single orientation methods show comparable performance. The MCF method has the highest correlation (corrMCF=0.95, r(2)MCF =0.97) with the state of the art method (COSMOS) with additional advantage of extreme fast computation time. The l-curve method gave the visually most satisfactory balance between reduction of streaking artifacts and over-regularization with the latter being overemphasized when the using the COSMOS susceptibility maps as ground-truth. R2* and susceptibility maps, when calculated from the same datasets, although based on distinct features of the data, have a comparable ability to distinguish deep gray matter structures.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1,000,000 hits from 462,500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

http://resfranco.cochrane.org/sites/resfranco.cochrane.org/files/uploads/Arrettabac2009.pdf

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

As part of a collaborative project on the epidemiology of craniofacial anomalies, funded by the National Institutes for Dental and Craniofacial Research and channeled through the Human Genetics Programme of the World Health Organization, the International Perinatal Database of Typical Orofacial Clefts (IPDTOC) was established in 2003. IPDTOC is collecting case-by-case information on cleft lip with or without cleft palate and on cleft palate alone from birth defects registries contributing to at least one of three collaborative organizations: European Surveillance Systems of Congenital Anomalies (EUROCAT) in Europe, National Birth Defects Prevention Network (NBDPN) in the United States, and International Clearinghouse for Birth Defects Surveillance and Research (ICBDSR) worldwide. Analysis of the collected information is performed centrally at the ICBDSR Centre in Rome, Italy, to maximize the comparability of results. The present paper, the first of a series, reports data on the prevalence of cleft lip with or without cleft palate from 54 registries in 30 countries over at least 1 complete year during the period 2000 to 2005. Thus, the denominator comprises more than 7.5 million births. A total of 7704 cases of cleft lip with or without cleft palate (7141 livebirths, 237 stillbirths, 301 terminations of pregnancy, and 25 with pregnancy outcome unknown) were available. The overall prevalence of cleft lip with or without cleft palate was 9.92 per 10,000. The prevalence of cleft lip was 3.28 per 10,000, and that of cleft lip and palate was 6.64 per 10,000. There were 5918 cases (76.8%) that were isolated, 1224 (15.9%) had malformations in other systems, and 562 (7.3%) occurred as part of recognized syndromes. Cases with greater dysmorphological severity of cleft lip with or without cleft palate were more likely to include malformations of other systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Amino acids form the building blocks of all proteins. Naturally occurring amino acids are restricted to a few tens of sidechains, even when considering post-translational modifications and rare amino acids such as selenocysteine and pyrrolysine. However, the potential chemical diversity of amino acid sidechains is nearly infinite. Exploiting this diversity by using non-natural sidechains to expand the building blocks of proteins and peptides has recently found widespread applications in biochemistry, protein engineering and drug design. Despite these applications, there is currently no unified online bioinformatics resource for non-natural sidechains. With the SwissSidechain database (http://www.swisssidechain.ch), we offer a central and curated platform about non-natural sidechains for researchers in biochemistry, medicinal chemistry, protein engineering and molecular modeling. SwissSidechain provides biophysical, structural and molecular data for hundreds of commercially available non-natural amino acid sidechains, both in l- and d-configurations. The database can be easily browsed by sidechain names, families or physico-chemical properties. We also provide plugins to seamlessly insert non-natural sidechains into peptides and proteins using molecular visualization software, as well as topologies and parameters compatible with molecular mechanics software.