952 results for Data sets storage
Abstract:
The availability of rich firm-level data sets has recently led researchers to uncover new evidence on the effects of trade liberalization. First, trade openness forces the least productive firms to exit the market. Second, it induces surviving firms to increase their innovation efforts, and third, it increases the degree of product market competition. In this paper we propose a model aimed at providing a coherent interpretation of these findings. We introduce firm heterogeneity into an innovation-driven growth model in which incumbent firms operating in oligopolistic industries perform cost-reducing innovations. In this framework, trade liberalization leads to higher product market competition, lower markups and higher quantities produced. These changes in markups and quantities, in turn, promote innovation and productivity growth through a direct competition effect, based on the increase in the size of the market, and a selection effect, produced by the reallocation of resources towards more productive firms. Calibrated to match US aggregate and firm-level statistics, the model predicts that a 10 percent reduction in variable trade costs reduces markups by 1.15 percent and firm survival probabilities by 1 percent, and induces an increase in productivity growth of about 13 percent. More than 90 percent of the trade-induced growth increase can be attributed to the selection effect.
Abstract:
Somatic copy number aberrations (CNA) represent a mutation type encountered in the majority of cancer genomes. Here, we present the 2014 edition of arrayMap (http://www.arraymap.org), a publicly accessible collection of pre-processed oncogenomic array data sets and CNA profiles, representing a vast range of human malignancies. Since the initial release, we have enhanced this resource both in content and especially with regard to data mining support. The 2014 release of arrayMap contains more than 64,000 genomic array data sets, representing about 250 tumor diagnoses. Data sets included in arrayMap have been assembled from public repositories as well as additional resources, and integrated by applying custom processing pipelines. Online tools have been upgraded for a more flexible array data visualization, including options for processing user provided, non-public data sets. Data integration has been improved by mapping to multiple editions of the human reference genome, with the majority of the data now being available for the UCSC hg18 as well as GRCh37 versions. The large amount of tumor CNA data in arrayMap can be freely downloaded by users to promote data mining projects, and to explore special events such as chromothripsis-like genome patterns.
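The segment-level CNA profiles described above lend themselves to simple downstream mining. Below is a minimal sketch of summarizing a profile into per-chromosome gain/loss fractions; the tab-separated layout with columns `chro`, `start`, `end` and `value` (a log2 ratio) and the ±0.15 calling cutoff are illustrative assumptions, not the actual arrayMap download schema.

```python
import pandas as pd

def cna_gain_loss_fractions(seg_file: str) -> pd.DataFrame:
    """Fraction of covered bases called as loss or gain, per chromosome."""
    segs = pd.read_csv(seg_file, sep="\t")  # assumed columns: chro, start, end, value
    segs["length"] = segs["end"] - segs["start"]
    # call states from the log2 ratio; +/-0.15 is an arbitrary toy cutoff
    segs["state"] = pd.cut(segs["value"], bins=[-10, -0.15, 0.15, 10],
                           labels=["loss", "neutral", "gain"])
    totals = segs.groupby("chro")["length"].sum()
    by_state = (segs.pivot_table(index="chro", columns="state", values="length",
                                 aggfunc="sum", observed=False)
                    .fillna(0))
    return by_state.div(totals, axis=0)[["loss", "gain"]]
```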
Abstract:
A new phylogenetic analysis of the Nyssorhynchus subgenus (Danoff-Burg and Conn, unpub. data) using six data sets {morphological (all life stages); scanning electron micrographs of eggs; nuclear ITS2 sequences; mitochondrial COII, ND2 and ND6 sequences} revealed different topologies when each data set was analyzed separately but no heterogeneity between the data sets using the arn test. Consequently, the most accurate estimate of the phylogeny was obtained when all the data were combined. This new phylogeny supports a monophyletic Nyssorhynchus subgenus but both previously recognized sections in the subgenus (Albimanus and Argyritarsis) were demonstrated to be paraphyletic relative to each other and four of the seven clades included species previously placed in both sections. One of these clades includes both Anopheles darlingi and An. albimanus, suggesting that the ability to vector malaria effectively may have originated once in this subgenus. Both a conserved (315 bp) and a variable (425 bp) region of the mitochondrial COI gene from 15 populations of An. darlingi from Belize, Bolivia, Brazil, French Guiana, Peru and Venezuela were used to examine the evolutionary history of this species and to test several analytical assumptions. Results demonstrated (1) parsimony analysis is equally informative compared to distance analysis using NJ; (2) clades or clusters are more strongly supported when these two regions are combined compared to either region separately; (3) evidence (in the form of remnants of older haplotype lineages) for two colonization events; and (4) significant genetic divergence within the population from Peixoto de Azevedo (State of Mato Grosso, Brazil). The oldest lineage includes populations from Peixoto, Boa Vista (State of Roraima) and Dourado (State of São Paulo).
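The comparison of conserved and variable COI regions, separately and combined, rests on simple sequence-distance bookkeeping. A toy sketch of uncorrected p-distances follows, with invented placeholder haplotypes standing in for the real alignments; the NJ and parsimony machinery of the study is not reproduced.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguity codes."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)

# invented placeholder haplotypes for three populations
conserved = {"popA": "ACGTACGTAC", "popB": "ACGTACGTTC", "popC": "ACGAACGTTC"}
variable  = {"popA": "TTGACA",     "popB": "TTGGCA",     "popC": "CTGGCA"}
combined  = {k: conserved[k] + variable[k] for k in conserved}

for name, region in [("conserved", conserved), ("variable", variable),
                     ("combined", combined)]:
    for s1, s2 in combinations(region, 2):
        print(f"{name:9s} {s1}-{s2}: {p_distance(region[s1], region[s2]):.3f}")
```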
Abstract:
In several colour polymorphic species, morphs differ in thermoregulation, either because dark and pale surfaces absorb solar radiation to a different extent or because morphs differ in key metabolic processes. Morph-specific thermoregulation may potentially account for the observation that differently coloured individuals are frequently not randomly distributed among habitats and differ in many respects, including behaviour, morphology, survival and reproductive success. In a wild population of the colour polymorphic tawny owl Strix aluco, a recent cross-fostering experiment showed that offspring born to and raised by red mothers were heavier than those from grey mothers. In the present study, we tested in the same individuals whether these morph-specific offspring growth patterns were associated with a difference in metabolic rate between offspring of red and grey mothers. For this purpose, we measured nestling oxygen consumption at two different temperatures (laboratory measurements: 4 and 20 degrees C) and examined the relationships between these data sets and the colour morph of foster and biological mothers. After controlling for nestling body mass, oxygen consumption at 20 degrees C was greater in foster offspring raised by grey foster mothers. No relationship was found between nestling oxygen consumption and the coloration of their biological mother. Our study therefore indicates that offspring raised by grey foster mothers not only showed a lower body mass than offspring raised by red foster mothers, but also consumed more oxygen at warm temperature. This further indicates that rearing conditions in nests of grey mothers were more stressful than in nests of red mothers.
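Controlling oxygen consumption for body mass amounts to an ANCOVA-style linear model. Here is a minimal sketch on simulated data, with a hypothetical foster-morph indicator as the effect of interest; the original analysis was on real nestlings and may have used a different model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
mass = rng.normal(350, 40, n)                   # nestling body mass (g), simulated
grey = rng.integers(0, 2, n)                    # 1 = raised by a grey foster mother
vo2 = 0.004 * mass + 0.25 * grey + rng.normal(0, 0.1, n)  # simulated O2 consumption

# design matrix: intercept, body-mass covariate, morph indicator
X = np.column_stack([np.ones(n), mass, grey])
beta, *_ = np.linalg.lstsq(X, vo2, rcond=None)
resid = vo2 - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
print(f"mass-adjusted morph effect: {beta[2]:.3f} (SE {se:.3f})")
```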
Abstract:
Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of the environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions to test whether: (1) a 10-fold coarsening of resolution affects the predictive performance of SDMs, and (2) any observed effects depend on the type of region, modelling technique, or species considered. Results show that a 10-fold change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with the highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. The effect of grain change was noticeable only for models reaching sufficient performance and/or fitted to initial data with an intrinsic error smaller than the coarser grain size.
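The 10-fold coarsening tested here can be reproduced by block-aggregating environmental rasters. A minimal sketch, assuming mean aggregation over 10 x 10 cell blocks; the array shape and the aggregation rule are illustrative assumptions.

```python
import numpy as np

def coarsen(grid: np.ndarray, factor: int = 10) -> np.ndarray:
    """Aggregate a 2D raster into blocks of factor x factor cells by averaging."""
    r = (grid.shape[0] // factor) * factor
    c = (grid.shape[1] // factor) * factor
    trimmed = grid[:r, :c]  # drop edge cells that do not fill a whole block
    return trimmed.reshape(r // factor, factor, c // factor, factor).mean(axis=(1, 3))

fine = np.random.rand(1000, 1000)   # e.g. an environmental layer at 100 m cells
coarse = coarsen(fine, 10)          # -> 1 km cells, shape (100, 100)
```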
Abstract:
X-ray microtomography has become a new tool in the earth sciences to obtain non-destructive 3D image data from geological objects in which variations in mineralogy, chemical composition and/or porosity create sufficient x-ray density contrasts. We present here the first, preliminary results of an application to the external and internal morphology of Permian to Recent Larger Foraminifera. We use a SkyScan-1072 high-resolution desktop micro-CT system. The system has a conical x-ray source with a spot size of about 5 µm that runs at 20-100 kV, 0-250 µA, resulting in a maximal resolution of 5 µm. X-ray transmission images are captured by a scintillator coupled via fibre optics to a 1024x1024 pixel 12-bit CCD. The object is placed between the x-ray source and the scintillator on a stub that rotates 360° around its vertical axis in steps as small as 0.24 degrees. Sample size is limited to 2 cm because of the absorption of x-rays by geological material. The transmission images are back-projected using a Feldkamp algorithm into a vertical stack of up to 1000 1Kx1K images that represent horizontal cuts of the object. This calculation takes two to several hours on a dual-processor 2.4 GHz PC. The stack of images (.bmp) can be visualized with any 3D-imaging software and used to produce cuts of Larger Foraminifera. Among other applications, the 3D-imaging software furnished by SkyScan can produce 3D models by defining a threshold density value to distinguish "solid" from "void". Several models with variable threshold values and colors can be imbricated, rotated and cut together. The best results were obtained with microfossils devoid of chamber-filling cements (Permian, Eocene, Recent). However, even slight differences in cement mineralogy/composition can result in surprisingly good x-ray density contrasts. X-ray microtomography may develop into a powerful tool for larger microfossils with a complex internal structure, because it is non-destructive, requires no preparation of the specimens, and produces a true 3D image data set. We will use these data sets in the future to produce cuts in any direction to compare them with arbitrary cuts of complex microfossils in thin sections. Many groups of benthic and planktonic foraminifera may thus become more easily determinable in thin section.
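Working with the reconstructed slice stack is simple in practice. A sketch of loading the horizontal cuts and applying a threshold to separate "solid" from "void", as done when building the 3D models; the file-naming pattern and the gray-value threshold are illustrative assumptions.

```python
import glob
import numpy as np
from PIL import Image

# reconstructed horizontal cuts written by the scanner software (assumed names)
slices = sorted(glob.glob("recon/slice_*.bmp"))
volume = np.stack([np.asarray(Image.open(f).convert("L")) for f in slices])

threshold = 90               # illustrative gray value separating void from test wall
solid = volume > threshold   # boolean volume: True = "solid"
print(f"solid fraction of the scanned volume: {solid.mean():.3f}")
```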
Abstract:
Profiling miRNA levels in cells with miRNA microarrays is becoming a widely used technique. Although normalization methods for mRNA gene expression arrays are well established, miRNA array normalization has so far not been investigated in detail. In this study we investigate the impact of normalization on data generated with the Agilent miRNA array platform. We have developed a method to select nonchanging miRNAs (invariants) and use them to compute linear regression normalization coefficients or variance stabilizing normalization (VSN) parameters. We compared the invariants normalization to normalization by scaling, quantile, and VSN with default parameters, as well as to no normalization, using samples with strong differential expression of miRNAs (heart-brain comparison) and samples where only a few miRNAs are affected (p53 overexpression in squamous carcinoma cells versus control). All normalization methods performed better than no normalization. Normalization procedures based on the set of invariants and quantile were the most robust over all experimental conditions tested. Our method of invariant selection and normalization is not limited to Agilent miRNA arrays and can be applied to other data sets, including those from one-color miRNA microarray platforms, focused gene expression arrays, and gene expression analysis using quantitative PCR.
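The invariant idea can be illustrated compactly: select probes whose ranks barely move between a sample and a reference, then fit a linear regression on them and rescale the sample. A minimal sketch, with the rank-shift threshold and the reference choice as simplifying assumptions rather than the authors' exact procedure.

```python
import numpy as np

def invariant_normalize(sample: np.ndarray, reference: np.ndarray,
                        max_rank_shift: float = 0.05) -> np.ndarray:
    """Rescale `sample` onto `reference` using rank-invariant probes."""
    def ranks(v):
        return np.argsort(np.argsort(v))

    n = len(sample)
    shift = np.abs(ranks(sample) - ranks(reference)) / n
    invariants = shift < max_rank_shift  # nearly unchanged rank => assumed nonchanging
    slope, intercept = np.polyfit(sample[invariants], reference[invariants], deg=1)
    return slope * sample + intercept
```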
Abstract:
This report reviews developments in health inequalities over the last 10 years across government - from the publication of the Acheson report on health inequalities in November 1998 to the announcement of the post-2010 strategic review of health inequalities in November 2008. It covers developments across government on the wider social determinants of health, and the role of the NHS. It provides an assessment of developments against the Acheson report, reviews a range of key data sets covering social, economic, health and environmental indicators, and considers lessons learned and challenges for the future.
Abstract:
The synchronization behavior of electroencephalographic (EEG) signals is important for decoding information processing in the human brain. Modern multichannel EEG allows a transition from traditional measurements of synchronization in pairs of EEG signals to whole-brain synchronization maps. The latter can be based on bivariate measures (BM) via averaging over pair-wise values or, alternatively, on multivariate measures (MM), which directly ascribe a single value to the synchronization in a group. In order to compare BM versus MM, we applied nine different estimators to simulated multivariate time series with known parameters and to real EEGs. We found widespread correlations between BM and MM, which were almost frequency-independent for all the measures except coherence. The analysis of the behavior of synchronization measures in simulated settings with variable coupling strength, connection probability, and parameter mismatch showed that some of them, including the S-estimator, S-Renyi, omega, and coherence, are more sensitive to linear interdependences, while others, like mutual information and phase locking value, are more responsive to nonlinear effects. One must consider these properties, together with the fact that MM are computationally less expensive and therefore more efficient for large-scale data sets than BM, when choosing a synchronization measure for EEG analysis.
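The BM-versus-MM distinction is concrete: a BM such as the phase locking value is computed per channel pair and then averaged, while an MM such as the S-estimator assigns one value to the whole group. A toy sketch of both on a channels x samples array; real EEG work would add band-pass filtering and windowing.

```python
from itertools import combinations

import numpy as np
from scipy.signal import hilbert

def mean_plv(x: np.ndarray) -> float:
    """Average pairwise phase locking value (a bivariate measure, BM)."""
    phase = np.angle(hilbert(x, axis=1))
    plvs = [np.abs(np.mean(np.exp(1j * (phase[i] - phase[j]))))
            for i, j in combinations(range(x.shape[0]), 2)]
    return float(np.mean(plvs))

def s_estimator(x: np.ndarray) -> float:
    """S-estimator (a multivariate measure, MM): one minus the normalized
    entropy of the eigenvalue spectrum of the channel correlation matrix."""
    n = x.shape[0]
    lam = np.linalg.eigvalsh(np.corrcoef(x)) / n
    lam = lam[lam > 1e-12]
    return 1 + float(np.sum(lam * np.log(lam))) / np.log(n)

eeg = np.random.randn(8, 1000)   # toy 8-channel recording (pure noise)
print(mean_plv(eeg), s_estimator(eeg))
```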
Abstract:
Adolescent health surveys, like those for other segments of the population, tend to remain in the hands of researchers, where they can have no real impact on the way critical health issues are dealt with by policy makers or other professionals directly connected to young people in their everyday work. This paper reviews important issues concerning the dissemination of survey results among professionals from various fields. The content, length and wording of the messages should be tailored to the audience one wants to reach, as should the type of channels used for their diffusion. Survey data sets can be used to select priorities for interventions: ad hoc presentations, attractive summaries and brochures, or even films expressing young people's opinions have been used by European public health professionals to make data sets usable in various local, regional and national contexts. CONCLUSION: The impact of these diffusion strategies is, however, difficult to assess and needs to be refined. The adequate delivery of survey findings, as well as advocacy and lobbying activities, requires specific skills which can be taken on by specialized professionals. Ultimately, it is the researchers' responsibility to ensure that such tasks are effectively performed.
Abstract:
Aim and Location: Although the alpine mouse Apodemus alpicola has been given species status since 1989, no distribution map has ever been constructed for this endemic alpine rodent in Switzerland. Based on redetermined museum material and using Ecological-Niche Factor Analysis (ENFA), habitat-suitability maps were computed for A. alpicola, and also for the co-occurring A. flavicollis and A. sylvaticus. Methods: In the particular case of habitat-suitability models, classical approaches (GLMs, GAMs, discriminant analysis, etc.) generally require presence and absence data. The presence records provided by museums can clearly give useful information about species distribution and ecology and have already been used for knowledge-based mapping. In this paper, we apply the ENFA, which requires only presence data, to build a habitat-suitability map of three species of Apodemus on the basis of museum skull collections. Results: Interspecific niche comparisons showed that A. alpicola is very specialized in its habitat selection, meaning that its habitat differs unequivocally from the average conditions in Switzerland, while both A. flavicollis and A. sylvaticus can be considered 'generalists' in the study area. Main conclusions: Although an adequate sampling design is the best way to collect ecological data for predictive modelling, this is a time- and money-consuming process, and there are cases where time is simply not available, as for instance in endangered species conservation. On the other hand, museums, herbaria and other similar institutions hold huge presence data sets. By applying the ENFA to such data it is possible to construct a habitat-suitability model rapidly. The ENFA method not only provides two key measurements regarding the niche of a species (i.e. marginality and specialization), but also has ecological meaning, and allows the scientist to compare directly the niches of different species.
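The marginality measurement at the heart of ENFA is easy to state: how far the species' mean sits from the global mean on each environmental variable, in global standard-deviation units. A toy sketch with simulated data; the 1.96 scaling follows the ENFA convention, and the factor-extraction step of the full method is omitted.

```python
import numpy as np

def marginality(global_env: np.ndarray, presence_env: np.ndarray) -> float:
    """Global marginality from presence-only data
    (rows = cells/records, columns = environmental variables)."""
    m_per_var = (presence_env.mean(axis=0) - global_env.mean(axis=0)) / \
                (1.96 * global_env.std(axis=0))
    return float(np.sqrt(np.sum(m_per_var ** 2)))

rng = np.random.default_rng(1)
env_all = rng.normal(0, 1, size=(5000, 4))   # e.g. elevation, slope, forest cover, temperature
env_presence = env_all[env_all[:, 0] > 1.2]  # a specialist confined to high elevations
print(f"marginality: {marginality(env_all, env_presence):.2f}")
```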
Abstract:
The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients' steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US Presidential Election, where interest may be in how the three-part composition, the percentage division among the three candidates - Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots.
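The machinery behind a compositional biplot is the centred log-ratio (clr) transform followed by a singular value decomposition of the double-centred matrix. A minimal sketch with invented three-part voting compositions; the conditional extension developed in the paper is not reproduced.

```python
import numpy as np

def clr(comp: np.ndarray) -> np.ndarray:
    """Centred log-ratio transform of compositions (rows sum to 1)."""
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)

votes = np.array([[0.43, 0.37, 0.20],   # hypothetical state: Bush, Clinton, Perot shares
                  [0.35, 0.45, 0.20],
                  [0.40, 0.34, 0.26]])
Z = clr(votes)
Z -= Z.mean(axis=0)                      # double-centre before the SVD
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
row_scores = U * s                       # state markers in the biplot
col_loadings = Vt.T                      # candidate rays
```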
Abstract:
PURPOSE: There is growing evidence that interaction between stromal and tumor cells is pivotal in breast cancer progression and response to therapy. Based on earlier research suggesting that during breast cancer progression, striking changes occur in CD10(+) stromal cells, we aimed to better characterize this cell population and its clinical relevance. EXPERIMENTAL DESIGN: We developed a CD10(+) stroma gene expression signature (using HG U133 Plus 2.0) on the basis of the comparison of CD10 cells isolated from tumoral (n = 28) and normal (n = 3) breast tissue. We further characterized the CD10(+) cells by coculture experiments of representative breast cancer cell lines with the different CD10(+) stromal cell types (fibroblasts, myoepithelial, and mesenchymal stem cells). We then evaluated its clinical relevance in terms of in situ to invasive progression, invasive breast cancer prognosis, and prediction of efficacy of chemotherapy using publicly available data sets. RESULTS: This 12-gene CD10(+) stroma signature includes, among others, genes involved in matrix remodeling (MMP11, MMP13, and COL10A1) and genes related to osteoblast differentiation (periostin). The coculture experiments showed that all 3 CD10(+) cell types contribute to the CD10(+) stroma signature, although mesenchymal stem cells have the highest CD10(+) stroma signature score. Of interest, this signature showed an important role in differentiating in situ from invasive breast cancer, in prognosis of the HER2(+) subpopulation of breast cancer only, and potentially in nonresponse to chemotherapy for those patients. CONCLUSIONS: Our results highlight the importance of CD10(+) cells in breast cancer prognosis and efficacy of chemotherapy, particularly within the HER2(+) breast cancer disease.
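A common simplification for applying such a signature to public data sets is to score each sample by the mean standardized expression of the signature genes. A sketch under that assumption; only the gene symbols named in the abstract are used (POSTN = periostin), and the authors' exact scoring scheme may differ.

```python
import pandas as pd

signature = ["MMP11", "MMP13", "COL10A1", "POSTN"]  # genes named in the abstract

def signature_score(expr: pd.DataFrame, genes: list[str]) -> pd.Series:
    """Mean standardized expression of the signature genes per sample
    (`expr`: genes x samples, assumed already log-scale)."""
    sub = expr.loc[expr.index.intersection(genes)]
    z = sub.sub(sub.mean(axis=1), axis=0).div(sub.std(axis=1), axis=0)
    return z.mean(axis=0)
```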
Abstract:
We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual words representation, in all cases using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
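The pipeline is: quantize local descriptors into a visual vocabulary, build a bag-of-visual-words histogram per image, reduce histograms to topic mixtures, and train a classifier on those mixtures. A sketch using sklearn's NMF with Kullback-Leibler loss as a stand-in for pLSA (the two optimize the same objective), on synthetic stand-ins for dense colour SIFT descriptors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
descriptors = rng.random((5000, 128))           # stand-in for dense SIFT descriptors
vocab = KMeans(n_clusters=100, n_init=3, random_state=0).fit(descriptors)

def bovw_histogram(image_desc: np.ndarray) -> np.ndarray:
    """Quantize one image's descriptors and count visual-word occurrences."""
    words = vocab.predict(image_desc)
    return np.bincount(words, minlength=100).astype(float)

hists = np.array([bovw_histogram(rng.random((200, 128))) for _ in range(60)])
labels = rng.integers(0, 4, 60)                 # e.g. coast/forest/city/river

# topic mixtures via KL-NMF (a pLSA surrogate), then a kNN classifier on them
topics = NMF(n_components=10, beta_loss="kullback-leibler",
             solver="mu", max_iter=500).fit_transform(hists)
clf = KNeighborsClassifier(n_neighbors=5).fit(topics, labels)
```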
Abstract:
The aim of this study was to determine the impact of the learning curve on the diagnostic performance of CT colonography. Two blinded teams, each comprising a radiologist and a gastroenterologist, prospectively examined 50 patients using helical CT scanning followed by colonoscopy. An intermediate data evaluation was performed after 24 data sets (group 1) and compared with data from the 26 subsequent patients (group 2). Parameters evaluated included sensitivity, specificity, false-positive and false-negative findings, and times of data acquisition and interpretation. Using colonoscopy as the gold standard, the sensitivity of CT colonography for lesions >5 mm in group 1 was 63% for both teams; in group 2 it was 45% for team 1 and 64% for team 2. Per-patient specificity in group 1 was 42% for team 1 and 58% for team 2; in group 2 it was 79% for both teams (p=0.04 for team 1; p=0.2 for team 2). Comparing group 1 with group 2, the number of false-positive findings decreased significantly (p=0.02). Furthermore, the mean time of data evaluation decreased from 45 to 17 min (p=0.002) and the mean time of data acquisition from 19 to 17 min. With increasing experience, specificity and the time required for data interpretation improved and false positives decreased. There was no significant change in sensitivity, false-negative findings, or time of data acquisition. A minimum level of reader experience is required for interpretation of CT colonography data.
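The reported accuracy figures come from standard per-patient bookkeeping. A sketch with invented counts (the study's raw 2x2 tables are not given in the abstract), including a Fisher exact test of the drop in false positives between the early and late patient groups.

```python
from scipy.stats import fisher_exact

def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity and specificity from per-patient counts."""
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sens_spec(tp=10, fn=6, tn=14, fp=10)   # hypothetical group 1 counts
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")

# Did false positives drop between groups? 2x2 table:
# rows = group 1 / group 2, columns = [FP patients, non-FP patients] (invented)
odds, p = fisher_exact([[10, 14], [3, 23]])
print(f"Fisher exact p = {p:.3f}")
```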