102 results for Data sets storage
Abstract:
One major methodological problem in the analysis of sequence data is the determination of the costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it bears some similarity to problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly favor one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme against solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while alleviating the problem of justifying cost schemes.
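The cost-based distance at the heart of this approach can be illustrated with a standard dynamic-programming edit distance over substitution and indel costs. The sketch below is a minimal illustration of that general technique, not the authors' optimization procedure; the state alphabet, cost table, and sequences are invented placeholders.

```python
# Minimal sketch of an optimal-matching (edit) distance between two state
# sequences, given a substitution-cost table and an indel cost. The cost
# values below are illustrative placeholders, not those derived in the article.

def om_distance(a, b, sub_cost, indel):
    """Dynamic-programming edit distance with custom costs."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i-1] == b[j-1] else sub_cost[(a[i-1], b[j-1])]
            d[i][j] = min(d[i-1][j-1] + sub,   # substitution (or match)
                          d[i-1][j] + indel,   # deletion
                          d[i][j-1] + indel)   # insertion
    return d[n][m]

# Hypothetical career-state sequences (E/U/S) and a flat symmetric cost table.
costs = {(x, y): 2.0 for x in "EUS" for y in "EUS" if x != y}
print(om_distance("EEUUS", "EUUSS", costs, indel=1.0))
```

Pairwise distances computed this way are the input to the clustering step the abstract evaluates.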
Abstract:
The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.
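To make the data side of this model concrete, here is a minimal sketch of the preservation process the framework accounts for: a lineage's true lifespan versus the shorter range spanned by its first and last fossil appearances. The constant rates, units, and single-lineage setup are illustrative assumptions, not the article's time-varying Bayesian model.

```python
import random

# Sketch of the data-generating process assumed by fossil birth-death
# models: a lineage's true lifespan is exponential under a constant
# extinction rate mu, and fossil occurrences form a Poisson process
# (preservation rate q) within it, so observed first/last appearances
# truncate the true range. All parameter values are illustrative.

def observed_range(mu=0.1, q=0.3):
    """Return (true_lifespan, observed_range) for one simulated lineage."""
    true_span = random.expovariate(mu)           # speciation to extinction
    t, fossils = 0.0, []
    while True:                                   # Poisson fossil record
        t += random.expovariate(q)
        if t > true_span:
            break
        fossils.append(t)
    if len(fossils) < 2:
        return true_span, 0.0                     # unobserved or single find
    return true_span, fossils[-1] - fossils[0]    # last minus first appearance

spans = [observed_range() for _ in range(10000)]
bias = sum(t - o for t, o in spans) / len(spans)
print(f"mean truncation of observed ranges: {bias:.2f} time units")
```

The systematic gap between true and observed spans is exactly what explicit modeling of preservation and sampling is meant to correct.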
Abstract:
Predictive groundwater modeling requires accurate information about aquifer characteristics. Geophysical imaging is a powerful tool for delineating aquifer properties at an appropriate scale and resolution, but it suffers from problems of ambiguity. One way to overcome such limitations is to adopt a simultaneous multitechnique inversion strategy. We have developed a methodology for aquifer characterization based on structural joint inversion of multiple geophysical data sets, followed by clustering to form zones and subsequent inversion for zonal parameters. Joint inversions based on cross-gradient structural constraints require less restrictive assumptions than, say, applying predefined petrophysical relationships and generally yield superior results. This approach has, for the first time, been applied to three geophysical data types in three dimensions. A classification scheme using maximum likelihood estimation is used to determine the parameters of a Gaussian mixture model that defines zonal geometries from joint-inversion tomograms. The resulting zones are used to estimate representative geophysical parameters of each zone, which are then used for field-scale petrophysical analysis. A synthetic study demonstrated how joint inversion of seismic and radar traveltimes and electrical resistance tomography (ERT) data greatly reduces misclassification of zones (down from 21.3% to 3.7%) and the error of retrieved zonal parameters (from 1.8% to 0.3%) compared to individual inversions. We applied our scheme to a data set collected in northeastern Switzerland to delineate lithologic subunits within a gravel aquifer. The inversion models resolve three principal subhorizontal units along with some important 3D heterogeneity. Petrophysical analysis of the zonal parameters indicated approximately 30% variation in porosity within the gravel aquifer and an increasing fraction of finer sediments with depth.
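The zonation step can be sketched as follows: each model cell is treated as a point in the space of co-located geophysical parameters, a Gaussian mixture is fitted by maximum likelihood (EM), and each cell is assigned a zone label. This is a minimal sketch on synthetic stand-in data, not the study's tomograms or workflow.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch of zone classification from joint-inversion tomograms: fit a
# Gaussian mixture by maximum likelihood to per-cell parameter triples
# (e.g. seismic slowness, radar slowness, log resistivity). The three
# synthetic clusters below stand in for real tomogram values.

rng = np.random.default_rng(0)
cells = np.vstack([
    rng.normal([2.0, 1.5, 1.8], 0.1, size=(500, 3)),   # zone 1
    rng.normal([2.5, 1.2, 2.4], 0.1, size=(500, 3)),   # zone 2
    rng.normal([1.8, 1.9, 1.2], 0.1, size=(500, 3)),   # zone 3
])

gmm = GaussianMixture(n_components=3, covariance_type="full").fit(cells)
zones = gmm.predict(cells)       # zone label per model cell
print(gmm.means_)                # representative parameters per zone
```

The fitted component means play the role of the representative zonal parameters that feed the subsequent petrophysical analysis.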
Abstract:
Genotypic frequencies at codominant marker loci in population samples convey information on mating systems. A classical way to extract this information is to measure heterozygote deficiencies (FIS) and obtain the selfing rate s from FIS = s/(2 - s), assuming inbreeding equilibrium. A major drawback is that heterozygote deficiencies are often present without selfing, owing largely to technical artefacts such as null alleles or partial dominance. We show here that, in the absence of gametic disequilibrium, the multilocus structure can be used to derive estimates of s independent of FIS and free of technical biases. Their statistical power and precision are comparable to those of FIS, although they are sensitive to certain types of gametic disequilibria, a bias shared with progeny-array methods but not with FIS. We analyse four real data sets spanning a range of mating systems. In two examples, we obtain s = 0 despite positive FIS, strongly suggesting that the latter are artefactual. In the remaining examples, all estimates are consistent. All the computations have been implemented in an open-access and user-friendly software package called rmes (robust multilocus estimate of selfing), available at http://ftp.cefe.cnrs.fr, which can be used on any multilocus data. By extracting the reliable information from imperfect data, our method opens the way to using the ever-growing number of published population genetic studies, in addition to the more demanding progeny-array approaches, to investigate selfing rates.
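For reference, the classical single-locus estimator quoted above can be inverted in closed form: FIS = s/(2 - s) gives s = 2 FIS/(1 + FIS). A one-line sketch of that inversion (this is the classical estimator, not the multilocus method implemented in rmes):

```python
# Invert FIS = s / (2 - s) to recover the selfing rate at inbreeding
# equilibrium: s = 2 * FIS / (1 + FIS).

def selfing_rate_from_fis(fis):
    return 2.0 * fis / (1.0 + fis)

print(selfing_rate_from_fis(0.2))   # FIS = 0.2 -> s = 1/3
```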
Abstract:
Geophysical data may provide crucial information about hydrological properties, states, and processes that are difficult to obtain by other means. Large data sets can be acquired over widely different scales in a minimally invasive manner and at comparatively low costs, but their effective use in hydrology makes it necessary to understand the fidelity of geophysical models, the assumptions made in their construction, and the links between geophysical and hydrological properties. Geophysics has been applied to groundwater prospecting for almost a century, but it is only in the last 20 years that it has been regularly used together with classical hydrological data to build predictive hydrological models. A largely unexplored avenue for future work is to use geophysical data to falsify or rank competing conceptual hydrological models. A promising cornerstone for such a model selection strategy is the Bayes factor, but it can be calculated reliably only when the main sources of uncertainty are considered throughout the hydrogeophysical parameter estimation process. Most classical geophysical imaging tools tend to favor models with smoothly varying property fields that are at odds with most conceptual hydrological models of interest. It is thus necessary to account for this bias or use alternative approaches in which proposed conceptual models are honored at all steps in the model building process.
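To make the Bayes factor concrete: it is the ratio of marginal likelihoods of two competing models, BF12 = p(D|M1)/p(D|M2). The sketch below approximates each marginal likelihood by naive Monte Carlo averaging of the likelihood over the prior; the Gaussian toy models and data are invented for illustration, and real hydrogeophysical applications need far more careful estimators.

```python
import math, random

# Toy Bayes factor: two models for the mean of Gaussian data, differing
# only in their prior on mu. Marginal likelihoods are approximated by
# naive Monte Carlo over each prior (adequate only for toy problems).

data = [1.1, 0.9, 1.3, 0.7, 1.0]

def log_lik(mu, data, sigma=0.5):
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for x in data)

def marginal_likelihood(prior_sampler, n=20000):
    return sum(math.exp(log_lik(prior_sampler(), data)) for _ in range(n)) / n

m1 = marginal_likelihood(lambda: random.gauss(1.0, 0.2))   # model 1 prior
m2 = marginal_likelihood(lambda: random.gauss(0.0, 0.2))   # model 2 prior
print("Bayes factor BF12 ~", m1 / m2)
```

A BF12 well above 1 favors model 1; the point of the abstract is that such comparisons are only trustworthy when all major uncertainty sources are propagated into the likelihoods.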
Abstract:
PURPOSE: To develop a consensus opinion regarding capturing diagnosis-timing in coded hospital data. METHODS: As part of the World Health Organization International Classification of Diseases-11th Revision initiative, the Quality and Safety Topic Advisory Group is charged with enhancing the capture of quality and patient safety information in morbidity data sets. One such feature is a diagnosis-timing flag. The Group has undertaken a narrative literature review, scanned national experiences focusing on countries currently using timing flags, and held a series of meetings to derive formal recommendations regarding diagnosis-timing reporting. RESULTS: The completeness of diagnosis-timing reporting continues to improve with experience and use; studies indicate that it enhances risk adjustment and may have a substantial impact on hospital performance estimates, especially for conditions/procedures that involve acutely ill patients. However, studies suggest that its reliability varies, is better for surgical than for medical patients (kappa in hip fracture patients of 0.7-1.0 versus kappa in pneumonia of 0.2-0.6), and depends on coder training and setting. It may allow simpler and more precise specification of quality indicators. CONCLUSIONS: As the evidence indicates that a diagnosis-timing flag improves the ability of routinely collected, coded hospital data to support outcomes research and the development of quality and safety indicators, the Group recommends a classification of 'arising after admission' (yes/no), with a permitted designation of 'unknown or clinically undetermined'; this will facilitate coding while providing flexibility when there is uncertainty. Clear coding standards and guidelines, with ongoing coder education, will be necessary to ensure the reliability of the diagnosis-timing flag.
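As an illustration only, the recommended classification could be represented in a coded record roughly as follows; the class, field, and value names are hypothetical and not part of any ICD coding standard.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical representation of the recommended diagnosis-timing flag:
# 'arising after admission' yes/no, plus the permitted designation
# 'unknown or clinically undetermined'. Names are illustrative only.

class DiagnosisTiming(Enum):
    AROSE_AFTER_ADMISSION = "yes"
    PRESENT_ON_ADMISSION = "no"
    UNKNOWN_OR_CLINICALLY_UNDETERMINED = "unknown"

@dataclass
class CodedDiagnosis:
    icd_code: str
    timing: DiagnosisTiming

dx = CodedDiagnosis("J18.9", DiagnosisTiming.AROSE_AFTER_ADMISSION)
print(dx)
```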
Abstract:
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations, including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
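A hedged sketch of querying the registry programmatically is shown below; the endpoint path, query parameters, and response fields are assumptions based on the registry's public JSON API and should be checked against the current documentation at https://bio.tools before use.

```python
import json
import urllib.request

# Assumed query against the bio.tools registry API: search for tools
# matching a keyword and print a few names and homepages. Endpoint path
# and JSON field names ("list", "name", "homepage") are assumptions.

url = "https://bio.tools/api/tool/?q=sequence+alignment&format=json"
with urllib.request.urlopen(url) as resp:
    results = json.load(resp)

for tool in results.get("list", [])[:5]:
    print(tool.get("name"), "-", tool.get("homepage"))
```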
Impact of preoperative risk factors on morbidity after esophagectomy: is there room for improvement?
Abstract:
BACKGROUND: Despite progress in the multidisciplinary treatment of esophageal cancer, oncologic esophagectomy remains the cornerstone of therapeutic strategies. Several scoring systems are used to predict postoperative morbidity, but in most cases they identify nonmodifiable parameters. The aim of this study was to identify potentially modifiable risk factors associated with complications after oncologic esophagectomy. METHODS: All consecutive patients with complete data sets undergoing oncologic esophagectomy in our department during 2001-2011 were included in this study. As potentially modifiable risk factors we assessed nutritional status, depicted by body mass index (BMI) and preoperative serum albumin levels, excessive alcohol consumption, and active smoking. Postoperative complications were graded according to a validated 5-grade system. Univariate and multivariate analyses were used to identify preoperative risk factors associated with the occurrence and severity of complications. RESULTS: Our series included 93 patients. The overall morbidity rate was 81% (n = 75), with 56% (n = 52) minor complications and 18% (n = 17) major complications. Active smoking and excessive alcohol consumption were associated with the occurrence of severe complications, whereas BMI and low preoperative albumin levels were not. The simultaneous presence of two or more of these risk factors significantly increased the risk of postoperative complications. CONCLUSIONS: A combination of malnutrition, active smoking and alcohol consumption was found to have a negative impact on postoperative morbidity rates. Therefore, preoperative smoking and alcohol cessation counseling, together with monitoring and improving nutritional status, are strongly recommended.
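As a rough illustration of the kind of multivariate analysis described (not the study's actual model, cohort, or effect sizes), a multivariable logistic regression of severe complications on such risk factors might look like this:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative multivariable logistic regression: severe complications
# regressed on binary risk factors. All data below are simulated
# stand-ins; coefficients are invented for the simulation only.

rng = np.random.default_rng(1)
n = 93
smoking = rng.integers(0, 2, n)
alcohol = rng.integers(0, 2, n)
low_albumin = rng.integers(0, 2, n)
logit_p = -1.5 + 1.0 * smoking + 0.9 * alcohol        # assumed true effects
severe = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))  # simulated outcome

X = sm.add_constant(np.column_stack([smoking, alcohol, low_albumin]))
model = sm.Logit(severe, X).fit(disp=False)
print(model.summary(xname=["const", "smoking", "alcohol", "low_albumin"]))
```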
Abstract:
1. Statistical modelling is often used to relate sparse biological survey data to remotely derived environmental predictors, thereby providing a basis for predictively mapping biodiversity across an entire region of interest. The most popular strategy for such modelling has been to model distributions of individual species one at a time. Spatial modelling of biodiversity at the community level may, however, confer significant benefits for applications involving very large numbers of species, particularly if many of these species are recorded infrequently. 2. Community-level modelling combines data from multiple species and produces information on spatial pattern in the distribution of biodiversity at a collective community level instead of, or in addition to, the level of individual species. Spatial outputs from community-level modelling include predictive mapping of community types (groups of locations with similar species composition), species groups (groups of species with similar distributions), axes or gradients of compositional variation, levels of compositional dissimilarity between pairs of locations, and various macro-ecological properties (e.g. species richness). 3. Three broad modelling strategies can be used to generate these outputs: (i) 'assemble first, predict later', in which biological survey data are first classified, ordinated or aggregated to produce community-level entities or attributes that are then modelled in relation to environmental predictors; (ii) 'predict first, assemble later', in which individual species are modelled one at a time as a function of environmental variables, to produce a stack of species distribution maps that is then subjected to classification, ordination or aggregation; and (iii) 'assemble and predict together', in which all species are modelled simultaneously, within a single integrated modelling process. These strategies each have particular strengths and weaknesses, depending on the intended purpose of modelling and the type, quality and quantity of data involved. 4. Synthesis and applications. The potential benefits of modelling large multispecies data sets using community-level, as opposed to species-level, approaches include faster processing, increased power to detect shared patterns of environmental response across rarely recorded species, and enhanced capacity to synthesize complex data into a form more readily interpretable by scientists and decision-makers. Community-level modelling therefore deserves to be considered more often, and more widely, as a potential alternative or supplement to modelling individual species.
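A minimal sketch of strategy (ii), 'predict first, assemble later', is given below: per-species models are fitted against environmental predictors, their predictions stacked, and sites classified into community types. The models, synthetic data, and cluster count are toy stand-ins, not a recommended workflow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# 'Predict first, assemble later': fit one model per species, stack the
# predicted occurrence probabilities, then classify sites into community
# types. Environmental predictors and presences are synthetic stand-ins.

rng = np.random.default_rng(2)
env = rng.normal(size=(300, 4))                   # sites x predictors
presence = (rng.random((300, 20)) <               # sites x species (toy)
            1 / (1 + np.exp(-env[:, :1])))        # crude env dependence

stack = np.column_stack([
    LogisticRegression().fit(env, presence[:, s]).predict_proba(env)[:, 1]
    for s in range(presence.shape[1])
])                                                # stacked species maps

community_types = KMeans(n_clusters=4, n_init=10).fit_predict(stack)
print(np.bincount(community_types))               # sites per community type
```

Strategy (i) would instead cluster the raw survey data first and model the resulting classes; strategy (iii) would fit all species within one joint model.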
Abstract:
We evaluated a new pulse oximeter designed to monitor beat-to-beat arterial oxygen saturation (SaO2) and compared the monitored SaO2 with arterial samples measured by co-oximetry. In 40 critically ill children (112 data sets) with a mean age of 3.9 years (range 1 day to 19 years), SaO2 ranged from 57% to 100%, PaO2 from 27 to 128 mm Hg, heart rates from 85 to 210 beats per minute, hematocrit from 20% to 67%, and fetal hemoglobin levels from 1.3% to 60%; peripheral temperatures varied between 26.5 and 36.5 degrees C. Linear correlation analysis revealed good agreement between simultaneous pulse oximeter values and both directly measured SaO2 (r = 0.95) and that calculated from measured arterial PaO2 (r = 0.95). The device detected several otherwise unrecognized drops in SaO2 but failed to function in four patients with poor peripheral perfusion secondary to low cardiac output. Simultaneous measurements with a tcPO2 electrode showed a similarly good correlation with PaO2 (r = 0.91), but the differences between the two measurements were much wider (mean 7.1 +/- 10.3 mm Hg, range -14 to +49 mm Hg) and less predictable than the differences between pulse oximeter SaO2 and measured SaO2 (1.5% +/- 3.5%, range -7.5% to +9%). We conclude that the pulse oximeter is a reliable and accurate noninvasive device for measuring oxygen saturation; because of its rapid response time, it may be an important advance in monitoring changes in oxygenation and guiding oxygen therapy.
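The agreement statistics reported here (correlation, mean paired difference, and its range) can be computed as follows; the sketch runs on simulated stand-in readings, not the study's measurements.

```python
import numpy as np

# Agreement between monitor and reference readings: Pearson r plus the
# mean, spread, and range of paired differences. The 112 simulated pairs
# below stand in for the study's co-oximetry comparisons.

rng = np.random.default_rng(3)
sao2_ref = rng.uniform(57, 100, size=112)            # co-oximetry SaO2 (%)
sao2_pulse = sao2_ref + rng.normal(1.5, 3.5, 112)    # pulse oximeter (%)

diff = sao2_pulse - sao2_ref
r = np.corrcoef(sao2_pulse, sao2_ref)[0, 1]
print(f"r = {r:.2f}, bias = {diff.mean():.1f}% +/- {diff.std(ddof=1):.1f}%")
print(f"range of differences: {diff.min():.1f}% to {diff.max():.1f}%")
```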
Abstract:
Somatic copy number aberrations (CNA) represent a mutation type encountered in the majority of cancer genomes. Here, we present the 2014 edition of arrayMap (http://www.arraymap.org), a publicly accessible collection of pre-processed oncogenomic array data sets and CNA profiles, representing a vast range of human malignancies. Since the initial release, we have enhanced this resource both in content and especially with regard to data mining support. The 2014 release of arrayMap contains more than 64,000 genomic array data sets, representing about 250 tumor diagnoses. Data sets included in arrayMap have been assembled from public repositories as well as additional resources, and integrated by applying custom processing pipelines. Online tools have been upgraded for more flexible array data visualization, including options for processing user-provided, non-public data sets. Data integration has been improved by mapping to multiple editions of the human reference genome, with the majority of the data now being available for the UCSC hg18 as well as GRCh37 versions. The large amount of tumor CNA data in arrayMap can be freely downloaded by users to promote data mining projects, and to explore special events such as chromothripsis-like genome patterns.
Abstract:
In several colour polymorphic species, morphs differ in thermoregulation either because dark and pale surfaces absorb solar radiation to a different extent and/or because morphs differ in key metabolic processes. Morph-specific thermoregulation may potentially account for the observation that differently coloured individuals are frequently not randomly distributed among habitats, and differ in many respects, including behaviour, morphology, survival and reproductive success. In a wild population of the colour polymorphic tawny owl Strix aluco, a recent cross-fostering experiment showed that offspring born to and raised by red mothers were heavier than those from grey mothers. In the present study, we tested in the same individuals whether these morph-specific offspring growth patterns were associated with a difference in metabolic rate between offspring of red and grey mothers. For this purpose, we measured nestling oxygen consumption under two different temperatures (laboratory measurements: 4 and 20 degrees C), and examined the relationships between these data sets and the colour morph of foster and biological mothers. After controlling for nestling body mass, oxygen consumption at 20 degrees C was greater in foster offspring raised by grey foster mothers. No relationship was found between nestling oxygen consumption and the coloration of their biological mother. Therefore, our study indicates that in our experiment offspring raised by grey foster mothers not only showed a lower body mass than offspring raised by red foster mothers, but also consumed more oxygen at the warmer temperature. This further indicates that rearing conditions in nests of grey mothers were more stressful than in nests of red mothers.
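The 'controlling for nestling body mass' step can be sketched as an ANCOVA-style regression: oxygen consumption on foster-morph with body mass as a covariate. The variable names, effect sizes, and data below are invented stand-ins, not the study's measurements.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# ANCOVA-style sketch: morph effect on oxygen consumption net of body
# mass. All values are simulated for illustration only.

rng = np.random.default_rng(4)
n = 80
df = pd.DataFrame({
    "mass": rng.normal(400, 40, n),              # nestling mass (g), toy
    "morph": rng.choice(["red", "grey"], n),     # foster-mother morph
})
df["vo2"] = (0.01 * df["mass"]
             + (df["morph"] == "grey") * 0.5     # assumed morph effect
             + rng.normal(0, 0.3, n))

fit = smf.ols("vo2 ~ mass + morph", data=df).fit()
print(fit.params)   # morph coefficient is the effect net of body mass
```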
Abstract:
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of the environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions, to test whether: (1) a 10-fold coarsening of resolution affects the predictive performance of SDMs, and (2) any observed effects depend on the type of region, modelling technique, or species considered. Results show that a 10-fold change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with the highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. The effect of grain change was only noticeable for models reaching sufficient performance and/or with initial data whose intrinsic error is smaller than the coarser grain size.
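The 10-fold grain coarsening tested here amounts to block-aggregating the environmental raster layers; below is a minimal sketch under the assumption of simple block averaging, with a random grid standing in for a real predictor layer.

```python
import numpy as np

# Coarsen an environmental raster by a factor of 10 via block averaging,
# e.g. 100 m cells to 1 km cells. The random grid is a stand-in layer.

def coarsen(grid, factor=10):
    """Block-average a 2-D array by `factor` (must divide both dimensions)."""
    r, c = grid.shape
    return grid.reshape(r // factor, factor,
                        c // factor, factor).mean(axis=(1, 3))

fine = np.random.default_rng(5).normal(size=(100, 100))
coarse = coarsen(fine)
print(fine.shape, "->", coarse.shape)   # (100, 100) -> (10, 10)
```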
Abstract:
X-ray microtomography has become a new tool in the earth sciences for obtaining non-destructive 3D image data from geological objects in which variations in mineralogy, chemical composition and/or porosity create sufficient X-ray density contrasts. We present here first, preliminary results of an application to the external and internal morphology of Permian to Recent Larger Foraminifera. We use a SkyScan-1072 high-resolution desktop micro-CT system. The system has a conical X-ray source with a spot size of about 5 µm that runs at 20-100 kV and 0-250 µA, resulting in a maximal resolution of 5 µm. X-ray transmission images are captured by a scintillator coupled via fibre optics to a 1024x1024 pixel 12-bit CCD. The object is placed between the X-ray source and the scintillator on a stub that rotates 360° around its vertical axis in steps as small as 0.24 degrees. Sample size is limited to 2 cm owing to the X-ray absorption of geologic material. The transmission images are back-projected using a Feldkamp algorithm into a vertical stack of up to 1000 1Kx1K images that represent horizontal cuts of the object. This calculation takes two to several hours on a dual-processor 2.4 GHz PC. The stack of images (.bmp) can be visualized with any 3D-imaging software and used to produce cuts of Larger Foraminifera. Among other applications, the 3D-imaging software furnished by SkyScan can produce 3D models by defining a threshold density value to distinguish "solid" from "void". Several models with variable threshold values and colors can be imbricated, rotated and cut together. The best results were obtained with microfossils devoid of chamber-filling cements (Permian, Eocene, Recent). However, even slight differences in cement mineralogy/composition can result in surprisingly good X-ray density contrasts. X-ray microtomography may develop into a powerful tool for larger microfossils with a complex internal structure, because it is non-destructive, requires no preparation of the specimens, and produces a true 3D image data set. We will use these data sets in the future to produce cuts in any direction and compare them with arbitrary cuts of complex microfossils in thin sections. In this way, many groups of benthic and planktonic foraminifera may become more easily determinable in thin section.
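The threshold-based segmentation described above can be sketched on a synthetic volume as follows; the grey values and cutoff are invented placeholders and this is not the SkyScan software's implementation.

```python
import numpy as np

# Threshold segmentation of a reconstructed micro-CT stack: pick a
# grey-value cutoff to separate "solid" (test wall) from "void"
# (chambers). The small synthetic volume stands in for the ~1000-slice
# 1K x 1K reconstruction.

rng = np.random.default_rng(6)
volume = rng.normal(100, 20, size=(64, 64, 64))   # reconstructed grey values
volume[20:40, 20:40, 20:40] += 80                 # denser "wall" region

threshold = 150                                    # chosen grey-value cutoff
solid = volume > threshold                         # boolean 3-D model
print(f"solid fraction: {solid.mean():.1%}")
```

Varying the threshold and rendering each boolean volume separately corresponds to the imbricated multi-threshold models mentioned in the abstract.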
Abstract:
Profiling miRNA levels in cells with miRNA microarrays is becoming a widely used technique. Although normalization methods for mRNA gene expression arrays are well established, miRNA array normalization has so far not been investigated in detail. In this study we investigate the impact of normalization on data generated with the Agilent miRNA array platform. We have developed a method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters. We compared the invariants normalization to normalization by scaling, quantile, and VSN with default parameters, as well as to no normalization, using samples with strong differential expression of miRNAs (heart-brain comparison) and samples where only a few miRNAs are affected (p53 overexpression in squamous carcinoma cells versus control). All normalization methods performed better than no normalization. Normalization procedures based on the set of invariants and quantile were the most robust over all experimental conditions tested. Our method of invariant selection and normalization is not limited to Agilent miRNA arrays and can be applied to other data sets, including those from one-color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis using quantitative PCR.
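A minimal sketch of the invariant idea follows: rank-stable miRNAs are taken as invariants, each array is regressed on a reference over those invariants, and the fitted linear correction is applied to all probes. The selection rule and data below are simplified stand-ins, not the published procedure's exact criteria.

```python
import numpy as np

# Invariant-based linear normalization sketch: (1) rank probes within
# each array, (2) pick the miRNAs whose ranks vary least across arrays
# as "invariants", (3) regress each array on a reference over the
# invariants and apply the linear correction to all probes. Toy data.

rng = np.random.default_rng(7)
expr = rng.lognormal(5, 1, size=(200, 8))            # miRNAs x arrays
expr[:, 4:] *= 1.4                                    # simulated scale bias

ranks = np.argsort(np.argsort(expr, axis=0), axis=0)  # per-array probe ranks
invariants = np.argsort(ranks.std(axis=1))[:50]       # most rank-stable probes

reference = expr[invariants].mean(axis=1)
normalized = np.empty_like(expr)
for j in range(expr.shape[1]):
    slope, intercept = np.polyfit(expr[invariants, j], reference, deg=1)
    normalized[:, j] = slope * expr[:, j] + intercept

print(normalized[invariants].std(axis=1).mean())      # residual spread
```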