956 results for Data Sets
Abstract:
1. Statistical modelling is often used to relate sparse biological survey data to remotely derived environmental predictors, thereby providing a basis for predictively mapping biodiversity across an entire region of interest. The most popular strategy for such modelling has been to model distributions of individual species one at a time. Spatial modelling of biodiversity at the community level may, however, confer significant benefits for applications involving very large numbers of species, particularly if many of these species are recorded infrequently. 2. Community-level modelling combines data from multiple species and produces information on spatial pattern in the distribution of biodiversity at a collective community level instead of, or in addition to, the level of individual species. Spatial outputs from community-level modelling include predictive mapping of community types (groups of locations with similar species composition), species groups (groups of species with similar distributions), axes or gradients of compositional variation, levels of compositional dissimilarity between pairs of locations, and various macro-ecological properties (e.g. species richness). 3. Three broad modelling strategies can be used to generate these outputs: (i) 'assemble first, predict later', in which biological survey data are first classified, ordinated or aggregated to produce community-level entities or attributes that are then modelled in relation to environmental predictors; (ii) 'predict first, assemble later', in which individual species are modelled one at a time as a function of environmental variables, to produce a stack of species distribution maps that is then subjected to classification, ordination or aggregation; and (iii) 'assemble and predict together', in which all species are modelled simultaneously, within a single integrated modelling process. 
These strategies each have particular strengths and weaknesses, depending on the intended purpose of modelling and the type, quality and quantity of data involved. 4. Synthesis and applications. The potential benefits of modelling large multispecies data sets using community-level, as opposed to species-level, approaches include faster processing, increased power to detect shared patterns of environmental response across rarely recorded species, and enhanced capacity to synthesize complex data into a form more readily interpretable by scientists and decision-makers. Community-level modelling therefore deserves to be considered more often, and more widely, as a potential alternative or supplement to modelling individual species.
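The 'predict first, assemble later' strategy can be illustrated with a minimal, hypothetical sketch: a stack of per-species predicted occurrence probabilities (random placeholders here, standing in for fitted species distribution model outputs) is clustered into community types. The tiny k-means below is illustrative only and is not the authors' method.

```python
import numpy as np

# 'Predict first, assemble later' sketch: stack per-species predicted
# occurrence probabilities, then group grid cells into community types.
rng = np.random.default_rng(42)
stack = rng.random((200, 30))          # 200 cells x 30 species probabilities

def kmeans(X, k=4, iters=20, seed=0):
    """Tiny Lloyd's-algorithm k-means; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

community_types = kmeans(stack, k=4)   # one community type per cell
print(len(np.unique(community_types)))  # up to 4 community types
```

With real data, the rows of `stack` would be the predicted maps from the individual species models rather than random numbers.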
Abstract:
In this paper we propose a novel empirical extension of the standard market microstructure order flow model. The main idea is that heterogeneity of beliefs in the foreign exchange market can cause model instability, and such instability has not been fully accounted for in the existing empirical literature. We investigate this issue using two different data sets and focusing on out-of-sample forecasts. Forecasting power is measured using standard statistical tests and, additionally, using an alternative approach based on measuring the economic value of forecasts after building a portfolio of assets. We find there is substantial economic value in conditioning on the proposed models.
Abstract:
This report summarizes a final-year project for the degree in Computer Engineering (Enginyeria Superior d'Informàtica). It explains the main reasons that motivated the project, together with examples that illustrate the resulting application. In this case, the software aims to address the current need for Ground Truth data for text segmentation algorithms operating on complex colour images. All the processes are explained in the different chapters, from the definition of the problem, the planning, the requirements and the design, through to an illustration of the program's results and the resulting Ground Truth data.
Abstract:
When dealing with sustainability we are concerned with the biophysical as well as the monetary aspects of economic and ecological interactions. This multidimensional approach requires that special attention is given to dimensional issues in relation to curve fitting practice in economics. Unfortunately, many empirical and theoretical studies in economics, as well as in ecological economics, apply dimensional numbers in exponential or logarithmic functions. We show that it is an analytical error to put a dimensional unit x into exponential functions (a^x) and logarithmic functions (log_a x). Secondly, we investigate the conditions of data sets under which a particular logarithmic specification is superior to the usual regression specification. This analysis shows that logarithmic specification superiority in terms of least-squares norm is heavily dependent on the available data set. The last section deals with economists' "curve fitting fetishism". We propose that a distinction be made between curve fitting over past observations and the development of a theoretical or empirical law capable of maintaining its fitting power for any future observations. Finally, we conclude this paper with several epistemological issues in relation to dimensions and curve fitting practice in economics.
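The dimensional argument can be made explicit with the series expansion of the exponential; this is a sketch of the standard reasoning, not the paper's own derivation:

```latex
a^{x} = e^{x \ln a} = \sum_{n=0}^{\infty} \frac{(x \ln a)^{n}}{n!}
      = 1 + x \ln a + \frac{(x \ln a)^{2}}{2!} + \cdots
```

If x carried a physical unit (say, metres), this sum would add a pure number to metres, to square metres, and so on, which is dimensionally incoherent; hence x must be dimensionless. The same restriction carries over to the logarithm, since log_a x = ln x / ln a is the inverse of the exponential.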
Abstract:
In this paper we describe an open learning object repository on Statistics based on DSpace which contains true learning objects, that is, exercises, equations, data sets, etc. This repository is part of a larger project intended to promote the use of learning object repositories as part of the learning process in virtual learning environments. This involves the creation of a new user interface that provides users with additional services such as resource rating and commenting. Both aspects make traditional metadata schemes such as Dublin Core inadequate, as there are resources with no title or author, for instance, because those fields are not used by learners to browse and search for learning resources in the repository. Therefore, exporting OAI-PMH compliant records using OAI-DC is not possible, thus limiting the visibility of the learning objects in the repository outside the institution. We propose an architecture based on ontologies and the use of extended metadata records for both storing and refactoring such descriptions.
Abstract:
We evaluated a new pulse oximeter designed to monitor beat-to-beat arterial oxygen saturation (SaO2) and compared the monitored SaO2 with arterial samples measured by co-oximetry. In 40 critically ill children (112 data sets) with a mean age of 3.9 years (range 1 day to 19 years), SaO2 ranged from 57% to 100%, PaO2 from 27 to 128 mm Hg, heart rates from 85 to 210 beats per minute, hematocrit from 20% to 67%, and fetal hemoglobin levels from 1.3% to 60%; peripheral temperatures varied between 26.5 and 36.5 degrees C. Linear correlation analysis revealed good agreement between simultaneous pulse oximeter values and both directly measured SaO2 (r = 0.95) and that calculated from measured arterial PaO2 (r = 0.95). The device detected several otherwise unrecognized drops in SaO2 but failed to function in four patients with poor peripheral perfusion secondary to low cardiac output. Simultaneous measurements with a tcPO2 electrode showed a similarly good correlation with PaO2 (r = 0.91), but the differences between the two measurements were much wider (mean 7.1 +/- 10.3 mm Hg, range -14 to +49 mm Hg) than the differences between pulse oximeter SaO2 and measured SaO2 (1.5% +/- 3.5%, range -7.5% to +9%) and were not predictable. We conclude that pulse oximetry is a reliable and accurate noninvasive device for measuring saturation, which because of its rapid response time may be an important advance in monitoring changes in oxygenation and guiding oxygen therapy.
Abstract:
The availability of rich firm-level data sets has recently led researchers to uncover new evidence on the effects of trade liberalization. First, trade openness forces the least productive firms to exit the market. Secondly, it induces surviving firms to increase their innovation efforts, and thirdly, it increases the degree of product market competition. In this paper we propose a model aimed at providing a coherent interpretation of these findings. We introduce firm heterogeneity into an innovation-driven growth model in which incumbent firms operating in oligopolistic industries perform cost-reducing innovations. In this framework, trade liberalization leads to higher product market competition, lower markups and higher quantity produced. These changes in markups and quantities, in turn, promote innovation and productivity growth through a direct competition effect, based on the increase in the size of the market, and a selection effect, produced by the reallocation of resources towards more productive firms. Calibrated to match US aggregate and firm-level statistics, the model predicts that a 10 percent reduction in variable trade costs reduces markups by 1.15 percent and firm survival probabilities by 1 percent, and induces an increase in productivity growth of about 13 percent. More than 90 percent of the trade-induced growth increase can be attributed to the selection effect.
Abstract:
Somatic copy number aberrations (CNA) represent a mutation type encountered in the majority of cancer genomes. Here, we present the 2014 edition of arrayMap (http://www.arraymap.org), a publicly accessible collection of pre-processed oncogenomic array data sets and CNA profiles, representing a vast range of human malignancies. Since the initial release, we have enhanced this resource both in content and especially with regard to data mining support. The 2014 release of arrayMap contains more than 64,000 genomic array data sets, representing about 250 tumor diagnoses. Data sets included in arrayMap have been assembled from public repositories as well as additional resources, and integrated by applying custom processing pipelines. Online tools have been upgraded for more flexible array data visualization, including options for processing user-provided, non-public data sets. Data integration has been improved by mapping to multiple editions of the human reference genome, with the majority of the data now being available for the UCSC hg18 as well as GRCh37 versions. The large amount of tumor CNA data in arrayMap can be freely downloaded by users to promote data mining projects, and to explore special events such as chromothripsis-like genome patterns.
Abstract:
A new phylogenetic analysis of the Nyssorhynchus subgenus (Danoff-Burg and Conn, unpub. data) using six data sets {morphological (all life stages); scanning electron micrographs of eggs; nuclear ITS2 sequences; mitochondrial COII, ND2 and ND6 sequences} revealed different topologies when each data set was analyzed separately but no heterogeneity between the data sets using the arn test. Consequently, the most accurate estimate of the phylogeny was obtained when all the data were combined. This new phylogeny supports a monophyletic Nyssorhynchus subgenus but both previously recognized sections in the subgenus (Albimanus and Argyritarsis) were demonstrated to be paraphyletic relative to each other and four of the seven clades included species previously placed in both sections. One of these clades includes both Anopheles darlingi and An. albimanus, suggesting that the ability to vector malaria effectively may have originated once in this subgenus. Both a conserved (315 bp) and a variable (425 bp) region of the mitochondrial COI gene from 15 populations of An. darlingi from Belize, Bolivia, Brazil, French Guiana, Peru and Venezuela were used to examine the evolutionary history of this species and to test several analytical assumptions. Results demonstrated (1) parsimony analysis is equally informative compared to distance analysis using NJ; (2) clades or clusters are more strongly supported when these two regions are combined compared to either region separately; (3) evidence (in the form of remnants of older haplotype lineages) for two colonization events; and (4) significant genetic divergence within the population from Peixoto de Azevedo (State of Mato Grosso, Brazil). The oldest lineage includes populations from Peixoto, Boa Vista (State of Roraima) and Dourado (State of São Paulo).
Abstract:
In several colour polymorphic species, morphs differ in thermoregulation either because dark and pale surfaces absorb solar radiation to a different extent and/or because morphs differ in key metabolic processes. Morph-specific thermoregulation may potentially account for the observation that differently coloured individuals are frequently not randomly distributed among habitats, and differ in many respects, including behaviour, morphology, survival and reproductive success. In a wild population of the colour polymorphic tawny owl Strix aluco, a recent cross-fostering experiment showed that offspring raised and born from red mothers were heavier than those from grey mothers. In the present study, we tested in the same individuals whether these morph-specific offspring growth patterns were associated with a difference in metabolic rate between offspring of red and grey mothers. For this purpose, we measured nestling oxygen consumption under two different temperatures (laboratory measurements: 4 and 20 degrees C), and examined the relationships between these data sets and the colour morph of foster and biological mothers. After controlling for nestling body mass, oxygen consumption at 20 degrees C was greater in foster offspring raised by grey foster mothers. No relationship was found between nestling oxygen consumption and coloration of their biological mother. Therefore, our study indicates that in our experiment offspring raised by grey foster mothers showed not only a lower body mass than offspring raised by red foster mothers, but also a higher oxygen consumption under warm temperature. This further indicates that rearing conditions in nests of grey mothers were more stressful than in nests of red mothers.
Abstract:
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions, to test whether: (1) a 10-fold coarsening of resolution affects predictive performance of SDMs, and (2) any observed effects are dependent on the type of region, modelling technique, or species considered. Results show that a 10 times change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. The effect of grain change was noticeable only for models reaching sufficient performance and/or with initial data whose intrinsic error was smaller than the coarser grain size.
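As a concrete illustration of what a 10-fold coarsening of grain size means for an environmental layer, here is a minimal sketch using block averaging; the aggregation method is an assumption, since the abstract does not state how the layers were coarsened.

```python
import numpy as np

# Hypothetical illustration: coarsen a fine-resolution environmental
# raster by a factor of 10 via block averaging (one common aggregation
# method; not necessarily the one used in the study).
def coarsen(raster, factor=10):
    """Aggregate a 2D array by averaging each `factor` x `factor` block."""
    rows, cols = raster.shape
    assert rows % factor == 0 and cols % factor == 0
    return raster.reshape(rows // factor, factor,
                          cols // factor, factor).mean(axis=(1, 3))

fine = np.arange(100.0).reshape(10, 10)   # toy 10x10 "raster"
coarse = coarsen(fine, 10)                # collapses to one aggregated cell
print(coarse.shape)  # (1, 1)
```

Each coarse cell is simply the mean of the 100 fine cells it covers, so fine-scale environmental variation within the block is lost, which is the mechanism behind the degradation in model performance the study reports.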
Abstract:
X-ray microtomography has become a new tool in earth sciences to obtain non-destructive 3D-image data from geological objects in which variations in mineralogy, chemical composition and/or porosity create sufficient x-ray density contrasts. We present here first, preliminary results of an application to the external and internal morphology of Permian to Recent Larger Foraminifera. We use a SkyScan-1072 high-resolution desk-top micro-CT system. The system has a conical x-ray source with a spot size of about 5 µm that runs at 20-100 kV, 0-250 µA, resulting in a maximal resolution of 5 µm. X-ray transmission images are captured by a scintillator coupled via fibre optics to a 1024x1024 pixel 12-bit CCD. The object is placed between the x-ray source and the scintillator on a stub that rotates 360° around its vertical axis in steps as small as 0.24 degrees. Sample size is limited to 2 cm due to the absorption of x-rays by geologic material. The transmission images are back-projected using a Feldkamp algorithm into a vertical stack of up to 1000 1Kx1K images that represent horizontal cuts of the object. This calculation takes 2 to several hours on a double-processor 2.4 GHz PC. The stack of images (.bmp) can be visualized with any 3D-imaging software and used to produce cuts of Larger Foraminifera. Among other applications, the 3D-imaging software furnished by SkyScan can produce 3D-models by defining a threshold density value to distinguish "solid" from "void". Several models with variable threshold values and colors can be imbricated, rotated and cut together. The best results were obtained with microfossils devoid of chamber-filling cements (Permian, Eocene, Recent).
However, even slight differences in cement mineralogy/composition can result in surprisingly good x-ray density contrasts. X-ray microtomography may develop into a powerful tool for larger microfossils with a complex internal structure, because it is non-destructive, requires no preparation of the specimens, and produces a true 3D-image data set. We will use these data sets in the future to produce cuts in any direction to compare them with arbitrary cuts of complex microfossils in thin sections. Many groups of benthic and planktonic foraminifera may become more easily determinable in thin section in this way.
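The threshold-based separation of "solid" from "void" described above can be sketched as follows; this is a generic illustration on synthetic data, not the SkyScan software's implementation.

```python
import numpy as np

# Minimal sketch: segment a reconstructed micro-CT stack into "solid"
# vs "void" voxels with a single density threshold. Array layout is
# assumed to be (slices, y, x), one slice per horizontal cut.
def segment(stack, threshold):
    """Return a boolean 3D mask: True where voxel density >= threshold."""
    return stack >= threshold

rng = np.random.default_rng(0)
stack = rng.uniform(0.0, 255.0, size=(50, 64, 64))  # synthetic image stack
solid = segment(stack, 128.0)
porosity = 1.0 - solid.mean()   # fraction of "void" voxels
print(round(float(porosity), 2))
```

Varying the threshold (as in the imbricated models mentioned above) shifts the solid/void boundary, so a sweep over threshold values is a quick way to test the sensitivity of a reconstructed 3D model.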
Abstract:
Profiling miRNA levels in cells with miRNA microarrays is becoming a widely used technique. Although normalization methods for mRNA gene expression arrays are well established, miRNA array normalization has so far not been investigated in detail. In this study we investigate the impact of normalization on data generated with the Agilent miRNA array platform. We have developed a method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters. We compared the invariants normalization to normalization by scaling, quantile, and VSN with default parameters as well as to no normalization, using samples with strong differential expression of miRNAs (heart-brain comparison) and samples where only a few miRNAs are affected (by p53 overexpression in squamous carcinoma cells versus control). All normalization methods performed better than no normalization. Normalization procedures based on the set of invariants and quantile were the most robust over all experimental conditions tested. Our method of invariant selection and normalization is not limited to Agilent miRNA arrays and can be applied to other data sets including those from one-color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis using quantitative PCR.
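A hedged sketch of invariant-based linear regression normalization: the abstract does not give the exact invariant-selection criterion, so here probes with the smallest rank variance across arrays stand in for the invariant set, and each array is regressed onto a reference array over those probes.

```python
import numpy as np

# Illustrative invariant-based normalization (not the authors' exact
# procedure): pick probes whose rank is most stable across arrays,
# then linearly map every array onto the first (reference) array.
def invariant_normalize(X, n_invariants=5):
    """X: probes x arrays matrix of intensities. Returns normalized copy."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # per-array ranks
    stable = np.argsort(ranks.var(axis=1))[:n_invariants]  # invariant set
    ref = X[:, 0]                                       # reference array
    Xn = X.copy()
    for j in range(1, X.shape[1]):
        # least-squares fit over invariants: ref ~ a * X[:, j] + b
        a, b = np.polyfit(X[stable, j], ref[stable], 1)
        Xn[:, j] = a * X[:, j] + b
    return Xn

X0 = np.linspace(1.0, 20.0, 20)
X = np.column_stack([X0, 2 * X0 + 3])   # array 2 = scaled/shifted array 1
Xn = invariant_normalize(X, n_invariants=5)
print(np.allclose(Xn[:, 1], Xn[:, 0]))  # True: the two arrays now agree
```

Because the toy second array is an exact linear transform of the reference, the fitted coefficients invert it perfectly; real array data would agree only approximately on the invariant probes.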
Abstract:
This report reviews developments in health inequalities over the last 10 years across government - from the publication of the Acheson report on health inequalities in November 1998 to the announcement of the post-2010 strategic review of health inequalities in November 2008. It covers developments across government on the wider social determinants of health, and the role of the NHS. It provides an assessment of developments against the Acheson report, reviews a range of key data sets covering social, economic, health and environmental indicators, and considers lessons learned and challenges for the future.
Abstract:
Synchronization behavior of electroencephalographic (EEG) signals is important for decoding information processing in the human brain. Modern multichannel EEG allows a transition from traditional measurements of synchronization in pairs of EEG signals to whole-brain synchronization maps. The latter can be based on bivariate measures (BM) via averaging over pair-wise values or, alternatively, on multivariate measures (MM), which directly ascribe a single value to the synchronization in a group. In order to compare BM versus MM, we applied nine different estimators to simulated multivariate time series with known parameters and to real EEGs. We found widespread correlations between BM and MM, which were almost frequency-independent for all the measures except coherence. The analysis of the behavior of synchronization measures in simulated settings with variable coupling strength, connection probability, and parameter mismatch showed that some of them, including the S-estimator, S-Renyi, omega, and coherence, are more sensitive to linear interdependences, while others, like mutual information and phase locking value, are more responsive to nonlinear effects. One must consider these properties together with the fact that MM are computationally less expensive and, therefore, more efficient for large-scale data sets than BM when choosing a synchronization measure for EEG analysis.
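The phase locking value (PLV), one of the bivariate measures mentioned, can be sketched as follows; the FFT-based analytic signal stands in for a Hilbert transform, and the test signals are synthetic rather than real EEG.

```python
import numpy as np

# Sketch of the phase locking value (PLV) between two signals: extract
# instantaneous phases via an FFT-based analytic signal (equivalent to
# a Hilbert transform), then measure the constancy of the phase difference.
def analytic_signal(x):
    """FFT construction of the analytic signal of a real 1D array."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    h[1:(n + 1) // 2] = 2          # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1              # Nyquist bin for even-length signals
    return np.fft.ifft(X * h)

def plv(x, y):
    """Phase locking value in [0, 1]; 1 = perfect phase synchronization."""
    dphi = np.angle(analytic_signal(x)) - np.angle(analytic_signal(y))
    return float(np.abs(np.mean(np.exp(1j * dphi))))

t = np.linspace(0, 1, 1000, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t + 0.5)   # same frequency, fixed phase lag
print(round(plv(x, y), 3))             # close to 1.0
```

A whole-brain BM map would average such pair-wise values over all channel pairs, which is exactly the quadratic cost in channel count that makes MM attractive for large-scale data sets.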