109 results for data compression
Abstract:
Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION: All such materials are available at http://questfororthologs.org. CONTACT: erik.sonnhammer@scilifelab.se or c.dessimoz@ucl.ac.uk.
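The quadratic scaling noted in this abstract follows from all-against-all comparison. With n proteomes there are n(n - 1)/2 unordered pairs, so doubling n roughly quadruples the work. The Python sketch below only illustrates that arithmetic. It assumes that every proteome is compared exhaustively with every other one, which is not necessarily how any particular ortholog database operates. The proteome size used in the demo is a hypothetical figure.

```python
def proteome_pairs(n_proteomes: int) -> int:
    """Unordered proteome pairs in an all-against-all comparison: n(n-1)/2."""
    return n_proteomes * (n_proteomes - 1) // 2


def protein_comparisons(n_proteomes: int, proteins_per_proteome: int) -> int:
    """Rough count of protein-vs-protein comparisons, assuming (hypothetically)
    that every proteome pair is compared exhaustively and that all proteomes
    contain the same number of proteins."""
    return proteome_pairs(n_proteomes) * proteins_per_proteome ** 2


if __name__ == "__main__":
    # 20,000 proteins per proteome is a hypothetical, illustrative size.
    for n in (500, 1000, 2000):
        print(f"{n:>5} proteomes: {proteome_pairs(n):>9,} pairs, "
              f"{protein_comparisons(n, 20_000):.2e} protein comparisons")
```

Each doubling of the number of proteomes multiplies the pair count by slightly more than four. This is why the growth in complete proteomes strains database providers.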
Abstract:
This study presents classification criteria for distinguishing two classes of Cannabis seedlings. Because the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine the chemotype of seized cannabis plants in order to ascertain whether a plantation is legal. In this study, the classification analysis is based on the relative proportions of three major leaf compounds measured by gas chromatography coupled with mass spectrometry (GC-MS). The aim is to discriminate between drug-type (illegal) and fiber-type (legal) cannabis at an early stage of growth. A Bayesian procedure is proposed: a Bayes factor is computed, and classification is performed on the basis of the decision maker's specifications (i.e., prior probability distributions on cannabis type and the consequences of classification, measured by losses). Classification rates are computed with two statistical models, and the results are compared. A sensitivity analysis is then performed to assess the robustness of the classification criteria.
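The Bayes-factor decision described in this abstract can be sketched as follows. Let l_fd be the loss of calling a fiber-type plant drug type, and l_df the loss of the opposite error. Under this two-action loss, "drug type" is the preferred decision when BF = p(x | drug)/p(x | fiber) exceeds (l_fd / l_df) × P(fiber)/P(drug). The abstract does not describe the study's two statistical models, so this Python sketch makes two simplifying assumptions. It uses a single hypothetical feature, and it gives that feature normal likelihoods whose parameter values are invented purely for illustration.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor(x: float, drug_model: tuple, fiber_model: tuple) -> float:
    """BF = p(x | drug type) / p(x | fiber type); models are (mean, sd) pairs.
    The normal likelihood is a hypothetical stand-in for the study's models."""
    return normal_pdf(x, *drug_model) / normal_pdf(x, *fiber_model)


def classify(bf: float, prior_drug: float,
             loss_fiber_called_drug: float, loss_drug_called_fiber: float) -> str:
    """Bayes decision under a two-action loss.

    Deciding 'drug' has expected loss loss_fiber_called_drug * P(fiber | x);
    deciding 'fiber' has expected loss loss_drug_called_fiber * P(drug | x).
    Choosing the smaller expected loss is equivalent to comparing BF with
    the threshold computed below.
    """
    prior_odds = prior_drug / (1.0 - prior_drug)
    threshold = (loss_fiber_called_drug / loss_drug_called_fiber) / prior_odds
    return "drug" if bf > threshold else "fiber"


if __name__ == "__main__":
    # Hypothetical feature value and model parameters, for illustration only.
    bf = bayes_factor(0.8, drug_model=(1.0, 0.2), fiber_model=(0.3, 0.2))
    print(round(bf, 2))
    print(classify(bf, prior_drug=0.5, loss_fiber_called_drug=1, loss_drug_called_fiber=1))
    print(classify(bf, prior_drug=0.5, loss_fiber_called_drug=20, loss_drug_called_fiber=1))
```

With equal priors and equal losses, the hypothetical BF of about 13.8 leads to "drug". Raising the loss of wrongly calling a fiber-type plant illegal to 20 lifts the threshold above the BF, and the decision switches to "fiber". A sensitivity analysis probes exactly this kind of dependence on the decision maker's specifications.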
Advanced mapping of environmental data: Geostatistics, Machine Learning and Bayesian Maximum Entropy
Abstract:
This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model dependent (geostatistics) and data driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.
Abstract:
Indirect calorimetry based on respiratory exchange measurements has been used successfully since the beginning of the twentieth century to estimate heat production (energy expenditure) in human subjects and animals. The errors inherent in this classical technique can stem from various sources: 1) the calculation model and its assumptions, 2) the calorimetric factors used, 3) technical factors, and 4) human factors. The physiological and biochemical factors influencing the interpretation of calorimetric data include changes in the size of the bicarbonate and urea pools and the accumulation or loss (via breath, urine or sweat) of intermediary metabolites (gluconeogenesis, ketogenesis). More recently, respiratory gas exchange data have been used to estimate substrate utilization rates in various physiological and metabolic situations (fasting, post-prandial state, etc.). It should be recalled that indirect calorimetry provides an index of overall substrate disappearance rates, which is incorrectly assumed to be equivalent to substrate "oxidation" rates. Unfortunately, there is no adequate gold standard to validate whole-body substrate "oxidation" rates. This contrasts with heat production, whose estimate by indirect calorimetry can be "validated" against direct calorimetry under strict thermal equilibrium conditions. Tracer techniques using stable (or radioactive) isotopes represent an independent way of assessing substrate utilization rates. When carbohydrate metabolism is measured with both techniques, indirect calorimetry generally provides glucose "oxidation" rates consistent with those obtained from isotopic tracers. This agreement holds only when certain metabolic processes (such as gluconeogenesis and lipogenesis) are minimal and/or when the respiratory quotients are not at the extremes of the physiological range.
However, the tracer techniques are believed to underestimate true glucose "oxidation" rates. They fail to account for glycogenolysis within the glucose-storing tissues, because this glucose escapes the systemic circulation. A major advantage of isotopic techniques is that they can estimate (given certain assumptions) various metabolic processes, such as gluconeogenesis, noninvasively. Furthermore, when a fourth substrate (such as ethanol) is administered in addition to the three macronutrients, isotopic quantification of substrate "oxidation" allows one to eliminate the assumptions inherent in indirect calorimetry. In conclusion, isotopic tracer techniques and indirect calorimetry should be considered complementary, particularly since the tracer techniques require the measurement of carbon dioxide production obtained by indirect calorimetry. However, it should be kept in mind that the assessment of substrate oxidation by indirect calorimetry may involve large errors, particularly over short periods of time. Indirect calorimetry yields energy expenditure (heat production) with substantially less error than it yields substrate oxidation rates.
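The closing point of this abstract can be illustrated numerically: energy expenditure is estimated with much less error than substrate oxidation. The abstract does not say which equations it relies on, so this Python sketch swaps in two commonly used sets. The abbreviated Weir equation gives energy expenditure, and Frayn's (1983) equations give net carbohydrate and fat "oxidation". The gas-exchange values in the demo are hypothetical.

```python
MINUTES_PER_DAY = 1440


def respiratory_quotient(vo2: float, vco2: float) -> float:
    """RQ = VCO2 / VO2 (both in L/min)."""
    return vco2 / vo2


def energy_expenditure_kcal_per_day(vo2: float, vco2: float) -> float:
    """Abbreviated Weir equation (urinary nitrogen term omitted); gases in L/min."""
    return (3.941 * vo2 + 1.106 * vco2) * MINUTES_PER_DAY


def substrate_oxidation_g_per_min(vo2: float, vco2: float, n: float = 0.0):
    """Frayn (1983) equations; n = urinary nitrogen excretion in g/min.

    Returns (carbohydrate, fat) in g/min. As the text stresses, these are
    net substrate disappearance rates, not true oxidation rates.
    """
    carbohydrate = 4.55 * vco2 - 3.21 * vo2 - 2.87 * n
    fat = 1.67 * vo2 - 1.67 * vco2 - 1.92 * n
    return carbohydrate, fat


if __name__ == "__main__":
    vo2, vco2 = 0.25, 0.20  # hypothetical resting values, L/min
    ee = energy_expenditure_kcal_per_day(vo2, vco2)
    cho, _ = substrate_oxidation_g_per_min(vo2, vco2)
    # Perturb VCO2 by +2% to mimic a small measurement error.
    ee_err = energy_expenditure_kcal_per_day(vo2, vco2 * 1.02)
    cho_err, _ = substrate_oxidation_g_per_min(vo2, vco2 * 1.02)
    print(f"RQ = {respiratory_quotient(vo2, vco2):.2f}")
    print(f"EE changes by {100 * (ee_err - ee) / ee:.1f}%")
    print(f"CHO changes by {100 * (cho_err - cho) / cho:.1f}%")
```

With these inputs the RQ is 0.80. A 2% error in VCO2 shifts the energy expenditure estimate by less than 0.5% but shifts the carbohydrate estimate by roughly 17%. The carbohydrate figure is sensitive because it is a small difference between two large terms.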
Abstract:
The assessment of medical technologies has to answer questions ranging from safety and effectiveness to complex economic, social, and health policy issues. The type of data needed to carry out such an evaluation depends on the specific questions to be answered, as well as on the stage of development of the technology. Two basic types of data can be distinguished: (a) general demographic, administrative, or financial data that were not collected specifically for technology assessment; and (b) data collected with respect to either a specific technology or a disease or medical problem. On the basis of a pilot inquiry in Europe and bibliographic research, the following categories of type (b) databases have been identified: registries, clinical databases, banks of factual and bibliographic knowledge, and expert systems. Examples of each category are discussed briefly. The following research aims and practical goals are proposed: criteria for the minimal data set required, improvements to registries and clinical data banks, and the development of an international clearinghouse. Such a clearinghouse would improve the dissemination of information on both existing databases and available reports on medical technology assessments.
Abstract:
The present review will briefly summarize the interplay between coagulation and inflammation, highlighting possible effects of direct inhibition of factor Xa and thrombin beyond anticoagulation. Additionally, the rationale for using the new direct oral anticoagulants (DOACs) for indications such as cancer-associated venous thromboembolism (CAT), mechanical heart valves, thrombotic anti-phospholipid syndrome (APS), and heparin-induced thrombocytopenia (HIT) will be explored. Published data on patients with cancer or mechanical heart valves treated with DOACs will be discussed, as well as planned studies in APS and HIT. Although published evidence is currently insufficient to recommend DOACs for the above-mentioned indications, there are good arguments in favor of clinical trials investigating their efficacy in these contexts. Direct inhibition of factor Xa or thrombin may also reveal interesting effects beyond anticoagulation.
Abstract:
As part of a collaborative project on the epidemiology of craniofacial anomalies, the International Perinatal Database of Typical Orofacial Clefts (IPDTOC) was established in 2003. The project was funded by the National Institute of Dental and Craniofacial Research and channeled through the Human Genetics Programme of the World Health Organization. IPDTOC collects case-by-case information on cleft lip with or without cleft palate and on cleft palate alone. This information comes from birth defects registries contributing to at least one of three collaborative organizations: the European Surveillance of Congenital Anomalies (EUROCAT) in Europe, the National Birth Defects Prevention Network (NBDPN) in the United States, and the International Clearinghouse for Birth Defects Surveillance and Research (ICBDSR) worldwide. Analysis of the collected information is performed centrally at the ICBDSR Centre in Rome, Italy, to maximize the comparability of results. The present paper, the first of a series, reports data on the prevalence of cleft lip with or without cleft palate from 54 registries in 30 countries. Each registry contributed at least 1 complete year during the period 2000 to 2005. The denominator thus comprises more than 7.5 million births. A total of 7704 cases of cleft lip with or without cleft palate were available: 7141 live births, 237 stillbirths, 301 terminations of pregnancy, and 25 with unknown pregnancy outcome. The overall prevalence of cleft lip with or without cleft palate was 9.92 per 10,000 births. The prevalence of cleft lip alone was 3.28 per 10,000, and that of cleft lip and palate was 6.64 per 10,000. Of all cases, 5918 (76.8%) were isolated, 1224 (15.9%) had malformations in other systems, and 562 (7.3%) occurred as part of recognized syndromes. Cases with greater dysmorphological severity of cleft lip with or without cleft palate were more likely to include malformations of other systems.
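The reported figures can be cross-checked with simple arithmetic. The outcome counts and the three case categories each sum to 7704. The two subtype prevalences add up to the overall 9.92 per 10,000. The percentages follow directly from the counts. The abstract does not give the exact denominator, so this Python sketch only back-calculates an approximate number of births implied by the rounded prevalence. That derived value is not a figure reported in the paper.

```python
# Counts and rates as reported in the abstract.
OUTCOMES = {"live births": 7141, "stillbirths": 237,
            "terminations": 301, "unknown outcome": 25}
CATEGORIES = {"isolated": 5918, "other-system malformations": 1224,
              "recognized syndromes": 562}
TOTAL_CASES = 7704
PREVALENCE = {"overall": 9.92, "cleft lip": 3.28, "cleft lip and palate": 6.64}


def percent_breakdown(counts: dict, total: int) -> dict:
    """Share of each category in percent, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def implied_births(cases: int, prevalence_per_10000: float) -> float:
    """Denominator implied by a prevalence per 10,000 births (approximate,
    because the reported prevalence is itself rounded)."""
    return cases / prevalence_per_10000 * 10_000


if __name__ == "__main__":
    assert sum(OUTCOMES.values()) == TOTAL_CASES
    assert sum(CATEGORIES.values()) == TOTAL_CASES
    assert abs(PREVALENCE["cleft lip"] + PREVALENCE["cleft lip and palate"]
               - PREVALENCE["overall"]) < 1e-9
    print(percent_breakdown(CATEGORIES, TOTAL_CASES))
    print(f"implied births: about {implied_births(TOTAL_CASES, PREVALENCE['overall']):,.0f}")
```

The implied denominator comes to roughly 7.77 million births, which is consistent with the abstract's "more than 7.5 million".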
Abstract:
Within the framework of a retrospective study of the incidence of hip fractures in the canton of Vaud (Switzerland), all cases of hip fracture were identified using five different information sources. The study covered fractures that occurred among the resident population in 1986 and were treated in the hospitals of the canton. Relevant data were then extracted from the medical records. At least two sources of information were used to identify cases in each hospital, among them the statistics of the Swiss Hospital Association (VESKA). These statistics were available for 9 of the 18 hospitals in the canton that participated in the study. For each hospital, the number of cases identified from the VESKA statistics was compared with the total number of cases. For these 9 hospitals, the VESKA statistics listed 407 cases. After excluding duplicate entries and diagnoses that were actually "status after fracture", the total for these hospitals was 392, that is, 4% less than the VESKA statistics indicate. It is concluded that the VESKA statistics provide a good approximation of the actual number of cases treated in these hospitals, although they tend to overestimate it. To use these statistics for calculating incidence figures, however, a greater proportion of all hospitals (currently 50% in the canton and 35% nationwide) must participate in them.
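The reconciliation step can be sketched in Python: remove "status after fracture" diagnoses and double entries, then compare the counts. The record fields are hypothetical assumptions, not the study's actual procedure, and so is the rule that double entries share a patient identifier and admission date. Only the counts 407 and 392 come from the abstract.

```python
from typing import Iterable, List, NamedTuple


class CaseRecord(NamedTuple):
    """Hypothetical fields for one entry in hospital discharge statistics."""
    patient_id: str
    admission_date: str  # ISO date string, e.g. "1986-03-14"
    status_after_fracture: bool  # True if the entry is a follow-up, not a new fracture


def reconcile(records: Iterable[CaseRecord]) -> List[CaseRecord]:
    """Remove 'status after fracture' entries and double entries.

    Assumption (hypothetical): a double entry repeats the same patient_id
    and admission_date.
    """
    seen = set()
    kept = []
    for rec in records:
        if rec.status_after_fracture:
            continue
        key = (rec.patient_id, rec.admission_date)
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def percent_below(reference: int, verified: int) -> float:
    """How far the verified count falls below the reference count, in percent."""
    return 100 * (reference - verified) / reference


if __name__ == "__main__":
    demo = [
        CaseRecord("A", "1986-01-05", False),
        CaseRecord("A", "1986-01-05", False),  # double entry
        CaseRecord("B", "1986-02-11", True),   # status after fracture
        CaseRecord("C", "1986-02-20", False),
    ]
    print(len(reconcile(demo)))               # entries kept from the toy list
    print(round(percent_below(407, 392), 1))  # counts reported in the abstract
```

Applied to the reported counts, `percent_below(407, 392)` is about 3.7, which rounds to the 4% stated in the abstract.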
Abstract:
The HUPO Proteomics Standards Initiative has developed several standardized data formats to facilitate data sharing in mass spectrometry (MS)-based proteomics. These allow researchers to report their complete results in a unified way. However, there is currently no format that describes the final qualitative and quantitative results of proteomics and metabolomics experiments in a simple tabular layout. Many downstream analysis use cases are concerned only with the final results of an experiment and require an easily accessible format, compatible with tools such as Microsoft Excel or R. We developed the mzTab file format for MS-based proteomics and metabolomics results to meet this need. mzTab is intended as a lightweight supplement to the existing standard XML-based file formats (mzML, mzIdentML, mzQuantML). It provides a comprehensive summary, similar in concept to the supplemental material of a scientific publication. mzTab files can contain protein, peptide, and small molecule identifications together with experimental metadata and basic quantitative information. The format is not intended to store the complete experimental evidence, but it provides mechanisms to report results at different levels of detail. These range from a simple summary of the final results to a representation of the results that includes the experimental design. This format is ideally suited to making MS-based proteomics and metabolomics results available to a wider biological community outside the field of MS. Several software tools for proteomics and metabolomics have already adopted mzTab as an output format. The comprehensive mzTab specification document and extensive additional documentation can be found online.
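mzTab is a tab-separated text format in which each line starts with a three-letter prefix:

- MTD for metadata
- PRH and PRT for the protein table header and rows
- PEH/PEP for peptides
- PSH/PSM for peptide-spectrum matches
- SMH/SML for small molecules
- COM for comments

The minimal Python reader below is a hedged sketch of how that layout maps onto tables. It performs no validation against the specification. The sample content is hypothetical and omits mandatory metadata, so it is not a valid mzTab file.

```python
from collections import defaultdict

# Table header prefix -> row prefix, following the mzTab line-prefix layout.
SECTION_ROWS = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}
ROW_HEADERS = {row: header for header, row in SECTION_ROWS.items()}


def read_mztab(text: str):
    """Split mzTab-style text into (metadata dict, {row prefix: list of row dicts}).

    Minimal sketch: no validation, no type conversion, no special handling
    of repeated or indexed metadata keys.
    """
    metadata = {}
    columns = {}
    tables = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # blank separator line
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "COM":
            continue  # comment line
        if prefix == "MTD":
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in SECTION_ROWS:
            columns[SECTION_ROWS[prefix]] = fields[1:]
        elif prefix in ROW_HEADERS:
            if prefix not in columns:
                raise ValueError(f"{prefix} row appears before its {ROW_HEADERS[prefix]} header")
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        else:
            raise ValueError(f"unrecognized line prefix: {prefix!r}")
    return metadata, dict(tables)


# Hypothetical, abbreviated content (not a complete, valid mzTab file).
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["COM", "hypothetical example"],
    ["MTD", "mzTab-version", "1.0.0"],
    ["MTD", "mzTab-mode", "Summary"],
    ["MTD", "mzTab-type", "Identification"],
    [],
    ["PRH", "accession", "description"],
    ["PRT", "ACC0001", "hypothetical protein A"],
    ["PRT", "ACC0002", "hypothetical protein B"],
])

if __name__ == "__main__":
    meta, tables = read_mztab(SAMPLE)
    print(meta["mzTab-version"], [row["accession"] for row in tables["PRT"]])
```

Each section is a plain header-plus-rows table. This simple structure is what makes the format easy to load into R or a spreadsheet.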