946 resultados para Legacy datasets
Resumo:
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS) represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, denominated sequence tags. These techniques have improved our understanding about the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is made through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags, considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed in samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidences suggesting that it is possible to find more congruous clusters after using S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution for determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/.S3T source code and datasets can also be downloaded from the aforementioned website.
Resumo:
We introduce the Coupled Aerosol and Tracer Transport model to the Brazilian developments on the Regional Atmospheric Modeling System (CATT-BRAMS). CATT-BRAMS is an on-line transport model fully consistent with the simulated atmospheric dynamics. Emission sources from biomass burning and urban-industrial-vehicular activities for trace gases and from biomass burning aerosol particles are obtained from several published datasets and remote sensing information. The tracer and aerosol mass concentration prognostics include the effects of sub-grid scale turbulence in the planetary boundary layer, convective transport by shallow and deep moist convection, wet and dry deposition, and plume rise associated with vegetation fires in addition to the grid scale transport. The radiation parameterization takes into account the interaction between the simulated biomass burning aerosol particles and short and long wave radiation. The atmospheric model BRAMS is based on the Regional Atmospheric Modeling System (RAMS), with several improvements associated with cumulus convection representation, soil moisture initialization and surface scheme tuned for the tropics, among others. In this paper the CATT-BRAMS model is used to simulate carbon monoxide and particulate material (PM(2.5)) surface fluxes and atmospheric transport during the 2002 LBA field campaigns, conducted during the transition from the dry to wet season in the southwest Amazon Basin. Model evaluation is addressed with comparisons between model results and near surface, radiosondes and airborne measurements performed during the field campaign, as well as remote sensing derived products. We show the matching of emissions strengths to observed carbon monoxide in the LBA campaign. A relatively good comparison to the MOPITT data, in spite of the fact that MOPITT a priori assumptions imply several difficulties, is also obtained.
Resumo:
This paper presents a new statistical algorithm to estimate rainfall over the Amazon Basin region using the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). The algorithm relies on empirical relationships derived for different raining-type systems between coincident measurements of surface rainfall rate and 85-GHz polarization-corrected brightness temperature as observed by the precipitation radar (PR) and TMI on board the TRMM satellite. The scheme includes rain/no-rain area delineation (screening) and system-type classification routines for rain retrieval. The algorithm is validated against independent measurements of the TRMM-PR and S-band dual-polarization Doppler radar (S-Pol) surface rainfall data for two different periods. Moreover, the performance of this rainfall estimation technique is evaluated against well-known methods, namely, the TRMM-2A12 [ the Goddard profiling algorithm (GPROF)], the Goddard scattering algorithm (GSCAT), and the National Environmental Satellite, Data, and Information Service (NESDIS) algorithms. The proposed algorithm shows a normalized bias of approximately 23% for both PR and S-Pol ground truth datasets and a mean error of 0.244 mm h(-1) ( PR) and -0.157 mm h(-1)(S-Pol). For rain volume estimates using PR as reference, a correlation coefficient of 0.939 and a normalized bias of 0.039 were found. With respect to rainfall distributions and rain area comparisons, the results showed that the formulation proposed is efficient and compatible with the physics and dynamics of the observed systems over the area of interest. The performance of the other algorithms showed that GSCAT presented low normalized bias for rain areas and rain volume [0.346 ( PR) and 0.361 (S-Pol)], and GPROF showed rainfall distribution similar to that of the PR and S-Pol but with a bimodal distribution. Last, the five algorithms were evaluated during the TRMM-Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) 1999 field campaign to verify the precipitation characteristics observed during the easterly and westerly Amazon wind flow regimes. The proposed algorithm presented a cumulative rainfall distribution similar to the observations during the easterly regime, but it underestimated for the westerly period for rainfall rates above 5 mm h(-1). NESDIS(1) overestimated for both wind regimes but presented the best westerly representation. NESDIS(2), GSCAT, and GPROF underestimated in both regimes, but GPROF was closer to the observations during the easterly flow.
Resumo:
In geophysics and seismology, raw data need to be processed to generate useful information that can be turned into knowledge by researchers. The number of sensors that are acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent in querying and preparing datasets for analyses than in acquiring raw data. Also, a lot of good quality data acquired at great effort can be lost forever if they are not correctly stored. Local and international cooperation will probably be reduced, and a lot of data will never become scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of Sao Paulo (IAG-USP) has concentrated fully on its data management system. This report describes the efforts of the IAG-USP to set up a seismology data management system to facilitate local and international cooperation.
Resumo:
Mutualistic networks are crucial to the maintenance of ecosystem services. Unfortunately, what we know about seed dispersal networks is based only on bird-fruit interactions. Therefore, we aimed at filling part of this gap by investigating bat-fruit networks. It is known from population studies that: (i) some bat species depend more on fruits than others, and (ii) that some specialized frugivorous bats prefer particular plant genera. We tested whether those preferences affected the structure and robustness of the whole network and the functional roles of species. Nine bat-fruit datasets from the literature were analyzed and all networks showed lower complementary specialization (H(2)' = 0.3760.10, mean 6 SD) and similar nestedness (NODF = 0.5660.12) than pollination networks. All networks were modular (M=0.32 +/- 0.07), and had on average four cohesive subgroups (modules) of tightly connected bats and plants. The composition of those modules followed the genus-genus associations observed at population level (Artibeus-Ficus, Carollia-Piper, and Sturnira-Solanum), although a few of those plant genera were dispersed also by other bats. Bat-fruit networks showed high robustness to simulated cumulative removals of both bats (R = 0.55 +/- 0.10) and plants (R = 0.68 +/- 0.09). Primary frugivores interacted with a larger proportion of the plants available and also occupied more central positions; furthermore, their extinction caused larger changes in network structure. We conclude that bat-fruit networks are highly cohesive and robust mutualistic systems, in which redundancy is high within modules, although modules are complementary to each other. Dietary specialization seems to be an important structuring factor that affects the topology, the guild structure and functional roles in bat-fruit networks.
Resumo:
Science education is under revision. Recent changes in society require changes in education to respond to new demands. Scientific literacy can be considered a new goal of science education and the epistemological gap between natural sciences and literacy disciplines must be overcome. The history of science is a possible bridge to link these `two cultures` and to foster an interdisciplinary approach in the classroom. This paper acknowledges Darwin`s legacy and proposes the use of cartoons and narrative expositions to put this interesting chapter of science into its historical context. A five-lesson didactic sequence was developed to tell part of the story of Darwin`s expedition through South America for students from 10 to 12 years of age. Beyond geological and biological perspectives, the inclusion of historical, social and geographical facts demonstrated the beauty and complexity of the findings that Darwin employed to propose the theory of evolution.
Resumo:
In the last decades, the air traffic system has been changing to adapt itself to new social demands, mainly the safe growth of worldwide traffic capacity. Those changes are ruled by the Communication, Navigation, Surveillance/Air Traffic Management (CNS/ATM) paradigm, based on digital communication technologies (mainly satellites) as a way of improving communication, surveillance, navigation and air traffic management services. However, CNS/ATM poses new challenges and needs, mainly related to the safety assessment process. In face of these new challenges, and considering the main characteristics of the CNS/ATM, a methodology is proposed at this work by combining ""absolute"" and ""relative"" safety assessment methods adopted by the International Civil Aviation Organization (ICAO) in ICAO Doc.9689 [14], using Fluid Stochastic Petri Nets (FSPN) as the modeling formalism, and compares the safety metrics estimated from the simulation of both the proposed (in analysis) and the legacy system models. To demonstrate its usefulness, the proposed methodology was applied to the ""Automatic Dependent Surveillance-Broadcasting"" (ADS-B) based air traffic control system. As conclusions, the proposed methodology assured to assess CNS/ATM system safety properties, in which FSPN formalism provides important modeling capabilities, and discrete event simulation allowing the estimation of the desired safety metric. (C) 2011 Elsevier Ltd. All rights reserved.
Resumo:
Among several process variability sources, valve friction and inadequate controller tuning are supposed to be two of the most prevalent. Friction quantification methods can be applied to the development of model-based compensators or to diagnose valves that need repair, whereas accurate process models can be used in controller retuning. This paper extends existing methods that jointly estimate the friction and process parameters, so that a nonlinear structure is adopted to represent the process model. The developed estimation algorithm is tested with three different data sources: a simulated first order plus dead time process, a hybrid setup (composed of a real valve and a simulated pH neutralization process) and from three industrial datasets corresponding to real control loops. The results demonstrate that the friction is accurately quantified, as well as ""good"" process models are estimated in several situations. Furthermore, when a nonlinear process model is considered, the proposed extension presents significant advantages: (i) greater accuracy for friction quantification and (ii) reasonable estimates of the nonlinear steady-state characteristics of the process. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
We describe in this paper a new genus and species of cricetid rodent from the Atlantic Forest of Brazil, one of the most endangered eco-regions of the world. The new form displays some but not all synapomorphies of the tribe Oryzomyini, but a suite of unique characteristics is also observed. This new forest rat possesses anatomical characteristics of arboreal taxa, such as very developed plantar pads, but was collected almost exclusively in pitfall traps. Phylogenetic analyses of morphological (integument, soft tissue, cranial, and dental characters) and molecular [nuclear - Interphotoreceptor retinoid binding protein (Irbp) - and mitochondrial - cytochrome b - genes] datasets using maximum likelihood and cladistic parsimony approaches corroborate the inclusion of the new taxon within oryzomyines. The analyses also place the new form as sister species to Eremoryzomys polius, an Andean rat endemic to the Maranon valley. This biogeographical pattern is unusual amongst small terrestrial vertebrates, as a review of the literature points to few other similar examples of Andean-Atlantic Forest pairings, in hylid frogs, Pionus parrots, and other sigmodontine rodents. (C) 2011 The Linnean Society of London, Zoological Journal of the Linnean Society, 2011, 161, 357-390. doi:10.1111/j.1096-3642.2010.00643.x
Resumo:
The DSSAT/CANEGRO model was parameterized and its predictions evaluated using data from five sugarcane (Sacchetrum spp.) experiments conducted in southern Brazil. The data used are from two of the most important Brazilian cultivars. Some parameters whose values were either directly measured or considered to be well known were not adjusted. Ten of the 20 parameters were optimized using a Generalized Likelihood Uncertainty Estimation (GLUE) algorithm using the leave-one-out cross-validation technique. Model predictions were evaluated using measured data of leaf area index (LA!), stalk and aerial dry mass, sucrose content, and soil water content, using bias, root mean squared error (RMSE), modeling efficiency (Eff), correlation coefficient, and agreement index. The Decision Support System for Agrotechnology Transfer (DSSAT)/CANEGRO model simulated the sugarcane crop in southern Brazil well, using the parameterization reported here. The soil water content predictions were better for rainfed (mean RMSE = 0.122mm) than for irrigated treatment (mean RMSE = 0.214mm). Predictions were best for aerial dry mass (Eff = 0.850), followed by stalk dry mass (Eff = 0.765) and then sucrose mass (Eff = 0.170). Number of green leaves showed the worst fit (Eff = -2.300). The cross-validation technique permits using multiple datasets that would have limited use if used independently because of the heterogeneity of measures and measurement strategies.
Resumo:
The use of remote sensing is necessary for monitoring forest carbon stocks at large scales. Optical remote sensing, although not the most suitable technique for the direct estimation of stand biomass, offers the advantage of providing large temporal and spatial datasets. In particular, information on canopy structure is encompassed in stand reflectance time series. This study focused on the example of Eucalyptus forest plantations, which have recently attracted much attention as a result of their high expansion rate in many tropical countries. Stand scale time-series of Normalized Difference Vegetation Index (NDVI) were obtained from MODIS satellite data after a procedure involving un-mixing and interpolation, on about 15,000 ha of plantations in southern Brazil. The comparison of the planting date of the current rotation (and therefore the age of the stands) estimated from these time series with real values provided by the company showed that the root mean square error was 35.5 days. Age alone explained more than 82% of stand wood volume variability and 87% of stand dominant height variability. Age variables were combined with other variables derived from the NDVI time series and simple bioclimatic data by means of linear (Stepwise) or nonlinear (Random Forest) regressions. The nonlinear regressions gave r-square values of 0.90 for volume and 0.92 for dominant height, and an accuracy of about 25 m(3)/ha for volume (15% of the volume average value) and about 1.6 m for dominant height (8% of the height average value). The improvement including NDVI and bioclimatic data comes from the fact that the cumulative NDVI since planting date integrates the interannual variability of leaf area index (LAI), light interception by the foliage and growth due for example to variations of seasonal water stress. The accuracy of biomass and height predictions was strongly improved by using the NDVI integrated over the two first years after planting, which are critical for stand establishment. These results open perspectives for cost-effective monitoring of biomass at large scales in intensively-managed plantation forests. (C) 2011 Elsevier Inc. All rights reserved.
Resumo:
Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach in delineating breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.
Resumo:
This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace (http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not currently supported in Fez was created so that programming of such development could be planned for and implemented.
Resumo:
Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We previously evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach to delineate breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.
Resumo:
One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using an associated “Floor-Plan” (meta data); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the meta data, which tag the data blocks, can be propagated and applied consistently. This is possible at the disk level, in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.