934 results for Data anonymization and sanitization
Abstract:
Phylogenetic analyses of chloroplast DNA sequences, morphology, and combined data have provided consistent support for many of the major branches within the angiosperm clade Dipsacales. Here we use sequences from three mitochondrial loci to test the existing broad-scale phylogeny and to attempt to resolve several relationships that have remained uncertain. Parsimony, maximum likelihood, and Bayesian analyses of a combined mitochondrial data set recover trees broadly consistent with previous studies, although resolution and support are lower than in the largest chloroplast analyses. Combining chloroplast and mitochondrial data results in a generally well-resolved and very strongly supported topology, but the previously recognized problem areas remain. To investigate why these relationships have been difficult to resolve, we conducted a series of experiments using different data partitions and heterogeneous substitution models. More complex modeling schemes are usually favored regardless of the partitions recognized, but model choice had little effect on topology or support values. In contrast, there are consistent but weakly supported differences in the topologies recovered from coding and non-coding matrices. These conflicts directly correspond to relationships that were poorly resolved in analyses of the full combined chloroplast-mitochondrial data set. We suggest that incongruent signal has contributed to our inability to confidently resolve these problem areas. (c) 2007 Elsevier Inc. All rights reserved.
Abstract:
The aim of this study is to evaluate the variation of solar radiation data between different data sources that will be free and available at the Solar Energy Research Center (SERC). The comparison between data sources is carried out for two locations: Stockholm, Sweden and Athens, Greece. For these locations, data are gathered for different tilt angles: 0°, 30°, 45°, and 60°, facing south. The full dataset is available in two Excel files: “Stockholm annual irradiation” and “Athens annual irradiation”. The World Radiation Data Center (WRDC) is taken as the reference for comparison with the other datasets because it has the longest recorded time span for Stockholm (1964–2010) and Athens (1964–1986), in the form of average monthly irradiation expressed in kWh/m². The indicator defined for the data comparison is the estimated standard deviation. The mean bias error (MBE) and the root mean square error (RMSE) were also used as statistical indicators for the horizontal solar irradiation data. The variation in solar irradiation data falls into three categories: natural or inter-annual variability, variation due to different data sources, and variation due to different calculation models. The inter-annual variation is 140.4 kWh/m² (14.4%) for Stockholm and 124.3 kWh/m² (8.0%) for Athens. The estimated deviation for horizontal solar irradiation is 3.7% for Stockholm and 4.4% for Athens. This estimated deviation is respectively equal to 4.5% and 3.6% for Stockholm and Athens at 30° tilt, 5.2% and 4.5% at 45° tilt, and 5.9% and 7.0% at 60° tilt. NASA’s SSE, SAM and RETScreen exhibited the highest deviation from WRDC’s data for Stockholm, and Satel-light did so for Athens. The main source of variation is the difference in horizontal solar irradiation. The variation increases by 1-2% per degree of tilt when different calculation models are used, such as those in PVSYST and Meteonorm.
The location and altitude of the data source did not directly influence the variation with respect to the WRDC data. Further work is suggested on: improving the methodology for selecting the location; examining the functional dependence of ground-reflected radiation on ambient temperature; the variation of ambient temperature and its impact on different solar energy systems; and the impact of variation in solar irradiation and ambient temperature on system output.
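The two error indicators named in this abstract, MBE and RMSE, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's own code: the function names are hypothetical, the monthly values are invented placeholders, and the `reference` series merely stands in for the WRDC data.

```python
import math


def mean_bias_error(estimates, reference):
    """Average signed difference between a data source and the reference."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    return sum(e - r for e, r in zip(estimates, reference)) / len(reference)


def root_mean_square_error(estimates, reference):
    """Square root of the mean squared difference to the reference."""
    if len(estimates) != len(reference) or not reference:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((e - r) ** 2 for e, r in zip(estimates, reference))
    return math.sqrt(squared / len(reference))


# Hypothetical monthly horizontal irradiation values (kWh/m2), illustration only.
reference = [10.0, 30.0, 70.0, 120.0]   # stands in for the reference source
candidate = [12.0, 28.0, 73.0, 119.0]   # stands in for another data source

mbe = mean_bias_error(candidate, reference)
rmse = root_mean_square_error(candidate, reference)
print(f"MBE = {mbe:.2f} kWh/m2, RMSE = {rmse:.2f} kWh/m2")
```

A positive MBE means the candidate source overestimates the reference on average, while RMSE penalizes large individual deviations regardless of sign, which is why the two are usually reported together.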
Abstract:
HydroShare is an online, collaborative system being developed for open sharing of hydrologic data and models. The goal of HydroShare is to enable scientists to easily discover and access hydrologic data and models, retrieve them to their desktop or perform analyses in a distributed computing environment that may include grid, cloud or high performance computing model instances as necessary. Scientists may also publish outcomes (data, results or models) into HydroShare, using the system as a collaboration platform for sharing data, models and analyses. HydroShare is expanding the data sharing capability of the CUAHSI Hydrologic Information System by broadening the classes of data accommodated, creating new capability to share models and model components, and taking advantage of emerging social media functionality to enhance information about and collaboration around hydrologic data and models. One of the fundamental concepts in HydroShare is that of a Resource. All content is represented using a Resource Data Model that separates system and science metadata and has elements common to all resources as well as elements specific to the types of resources HydroShare will support. These will include different data types used in the hydrology community and models and workflows that require metadata on execution functionality. The HydroShare web interface and social media functions are being developed using the Drupal content management system. A geospatial visualization and analysis component enables searching, visualizing, and analyzing geographic datasets. The integrated Rule-Oriented Data System (iRODS) is being used to manage federated data content and perform rule-based background actions on data and model resources, including parsing to generate metadata catalog information and the execution of models and workflows. 
This presentation will introduce the HydroShare functionality developed to date, describe key elements of the Resource Data Model and outline the roadmap for future development.
Abstract:
The Short-term Water Information and Forecasting Tools (SWIFT) is a suite of tools for flood and short-term streamflow forecasting, consisting of a collection of hydrologic model components and utilities. Catchments are modeled using conceptual subareas and a node-link structure for channel routing. The tools comprise modules for calibration, model state updating, output error correction, ensemble runs and data assimilation. Given the combinatorial nature of the modelling experiments and the sub-daily time steps typically used for simulations, the volume of model configurations and time series data is substantial and its management is not trivial. SWIFT is currently used mostly for research purposes but has also been used operationally, with intersecting but significantly different requirements. Early versions of SWIFT used mostly ad-hoc text files handled via Fortran code, with limited use of netCDF for time series data. The configuration and data handling modules have since been redesigned. The model configuration now follows a design where the data model is decoupled from the on-disk persistence mechanism. For research purposes the preferred on-disk format is JSON, to leverage numerous software libraries in a variety of languages, while retaining the legacy option of custom tab-separated text formats when it is a preferred access arrangement for the researcher. By decoupling data model and data persistence, it is much easier to interchangeably use for instance relational databases to provide stricter provenance and audit trail capabilities in an operational flood forecasting context. For the time series data, given the volume and required throughput, text based formats are usually inadequate. A schema derived from CF conventions has been designed to efficiently handle time series for SWIFT.
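The idea of decoupling the data model from its on-disk persistence can be illustrated with a small sketch. Everything named here is hypothetical (`SubareaConfig`, its fields, and the two store classes are not SWIFT's actual API); the point is only that the same in-memory object can be written through interchangeable back ends, one JSON and one tab-separated.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass
class SubareaConfig:
    """Hypothetical in-memory model of one subarea; fields are illustrative."""
    name: str
    area_km2: float
    time_step_minutes: int


class ConfigStore(Protocol):
    """Any persistence back end only needs these two methods."""
    def save(self, config: SubareaConfig, path: str) -> None: ...
    def load(self, path: str) -> SubareaConfig: ...


class JsonStore:
    def save(self, config: SubareaConfig, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(config), fh, indent=2)

    def load(self, path: str) -> SubareaConfig:
        with open(path, encoding="utf-8") as fh:
            return SubareaConfig(**json.load(fh))


class TabSeparatedStore:
    """Legacy-style text file with one key<TAB>value pair per line."""
    def save(self, config: SubareaConfig, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            for key, value in asdict(config).items():
                fh.write(f"{key}\t{value}\n")

    def load(self, path: str) -> SubareaConfig:
        with open(path, encoding="utf-8") as fh:
            raw = dict(line.rstrip("\n").split("\t", 1) for line in fh)
        return SubareaConfig(
            name=raw["name"],
            area_km2=float(raw["area_km2"]),
            time_step_minutes=int(raw["time_step_minutes"]),
        )


def round_trip(store: ConfigStore, config: SubareaConfig) -> SubareaConfig:
    """Write the config with the given store and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.dat")
        store.save(config, path)
        return store.load(path)


original = SubareaConfig("upper_catchment", 152.5, 60)
restored_json = round_trip(JsonStore(), original)
restored_tsv = round_trip(TabSeparatedStore(), original)
print(restored_json == original, restored_tsv == original)
```

Because calling code depends only on the `ConfigStore` interface, a relational-database store could in principle be added for operational use without touching the data model.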
Abstract:
In livestock genetic resource conservation, decision making about conservation priorities is based on the simultaneous analysis of several different criteria that may contribute to long-term sustainable breeding conditions, such as genetic and demographic characteristics, environmental conditions, and role of the breed in the local or regional economy. Here we address methods to integrate different data sets and highlight problems related to interdisciplinary comparisons. Data integration is based on the use of geographic coordinates and Geographic Information Systems (GIS). In addition to technical problems related to projection systems, GIS have to face the challenging issue of the non-homogeneous scale of their data sets. We give examples of the successful use of GIS for data integration and examine the risk of obtaining biased results when integrating datasets that have been captured at different scales.
Abstract:
Ties among event times are often recorded in survival studies. For example, in a two-week laboratory study where event times are measured in days, ties are very likely to occur. The proportional hazards model might be used in this setting with an approximated partial likelihood function. This approximation works well when the number of ties is small. On the other hand, discrete regression models are suggested when the data are heavily tied. However, in many situations it is not clear which approach should be used in practice. In this work, empirical guidelines based on Monte Carlo simulations are provided. These recommendations are based on a measure of the amount of tied data present and the mean square error. An example illustrates the proposed criterion.
Abstract:
The integration of outcrop and subsurface information, including micropaleontological data, facies and sequence stratigraphic studies, and oxygen isotope analysis, allows us to present a new stratigraphic model for the Cretaceous continental deposits of the Bauru Group, Brazil. Thirty-eight fossil taxa were recovered from these deposits, including 29 species of ostracodes and 9 species of charophytes. Seven of these ostracode species and three subspecies are new and formally described here. The associations of Chara barbosai - Ilyocypris cf. riograndensis, found in the Adamantina Formation, and Amblyochara sp. - Neuquenocypris minor mineira nov. subsp., found in the Marília Formation, Ponte Alta Member, represent two distinct groups that are respectively Turonian-Santonian and Maastrichtian (probably Late Maastrichtian) in age. Therefore, a hiatus encompassing more than 11 Ma separates those two formations. From bottom to top, four depositional cycles were recognized in the Bauru Group in western São Paulo: cycles 1 and 2 belong to the Caiuá Formation (fluvio-lacustrine and lacustrine deposits in the Presidente Prudente region), cycle 3 to the Santo Anastácio and lower Adamantina Formations (respectively fluvial and lacustrine deposits), and cycle 4 to the upper Adamantina Formation (fluvio-lacustrine facies). An erosional unconformity separates the Caiuá and Santo Anastácio Formations (between cycles 2 and 3). The Marília Formation is a distinct unit from the underlying succession; it does not occur in western São Paulo, but is found in restricted areas of São Paulo, Minas Gerais, Mato Grosso do Sul and Goiás States. During the deposition of the Bauru Group (Aptian? to Maastrichtian) the climate was hot and arid to semiarid. Shallow lakes underwent fluctuations in expansion (wet phases) and contraction (dry phases), as well as variations in salinity.
During the deposition of the Adamantina Formation (Turonian-Santonian) there were long dry periods that caused segmentation of large lakes (due to topographic irregularities in the basaltic substrate) and sometimes exposure of the lake floors; when flooded, these lake floors were colonized by extensive meadows of single species of charophytes. Small ephemeral ponds that were hydrochemically unstable and colonized by multiple species of charophytes were the depositional sites for the marls and mudstones of the Ponte Alta Member (Maastrichtian, Late Maastrichtian?). Our micropaleontological age control, combined with the Late Cretaceous ages of volcanic ashes found in the southeastern Brazil coastal basins and the stratigraphic position of analcimites from the Jaboticabal-SP region, suggests a Late Coniacian-Santonian age for important magmatic events that occurred in the interior of Brazil (north-central São Paulo State, Triângulo Mineiro, and southwestern Goiás State).
Abstract:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.
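The coordination requirement described above, where a user action on one representation propagates to others only if desired and links can be changed at run time, resembles a publish/subscribe arrangement. The sketch below is one possible reading, not the paper's model: `Coordinator`, `View`, and the `"select"` action name are all hypothetical.

```python
from typing import Callable, Dict, List, Set

Handler = Callable[[Set[int]], None]


class Coordinator:
    """Routes a named user action to every view currently linked to it."""

    def __init__(self) -> None:
        self._links: Dict[str, List[Handler]] = {}

    def link(self, action: str, handler: Handler) -> None:
        self._links.setdefault(action, []).append(handler)

    def unlink(self, action: str, handler: Handler) -> None:
        self._links[action].remove(handler)

    def publish(self, action: str, selection: Set[int]) -> None:
        # Iterate over a copy so handlers may relink during dispatch.
        for handler in list(self._links.get(action, [])):
            handler(selection)


class View:
    """Stand-in for an independently developed visualization component."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.highlighted: Set[int] = set()

    def highlight(self, selection: Set[int]) -> None:
        self.highlighted = set(selection)


scatter = View("scatter plot")
map_view = View("map")
table = View("table")

coordinator = Coordinator()
coordinator.link("select", map_view.highlight)
coordinator.link("select", table.highlight)

# A selection made in one view reaches both linked views.
coordinator.publish("select", {3, 7})

# Run-time reconfiguration: the table stops following selections.
coordinator.unlink("select", table.highlight)
coordinator.publish("select", {1})

print(map_view.highlighted, table.highlighted, scatter.highlighted)
```

Keeping the links in a separate coordinator, rather than inside each view, is what lets components be developed independently and wired together later by the user.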
Abstract:
In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence for soil surveys. In Brazil, most of the soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work include: (i) to compare two sampling systems based on the conditioned Latin hypercube for mapping soil classes and soil properties; (ii) to retrieve information from a detailed-scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and to validate the new soil map; and (iii) to create and validate a soil map of a much larger and similar area by extrapolating information extracted from the existing soil map. Two sampling systems were created, one by the conditioned Latin hypercube and one by the cost-constrained conditioned Latin hypercube. At each prospection site, soil classification and measurement of the A horizon thickness were performed. Maps were generated and validated for each sampling system, comparing the efficiency of these methods. The conditioned Latin hypercube captured greater variability of soils and properties than the cost-constrained conditioned Latin hypercube, although the former made field work more difficult. The conditioned Latin hypercube can capture greater soil variability, and the cost-constrained conditioned Latin hypercube presents great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed-scale soil map of a pilot watershed, topographical information for each soil class was extracted from a Digital Elevation Model and its derivatives by two data mining tools. Maps were generated using each tool. The more accurate of these tools was used to extrapolate soil information to a much larger and similar area, and the generated map was validated.
It was possible to retrieve the existing soil map information and apply it to a larger area containing similar soil-forming factors, at much lower financial cost. The KnowledgeMiner tool for data mining, together with ArcSIE, used to create the soil map, presented better results and enabled the use of the existing soil map to extract soil information and apply it to similar larger areas at reduced cost, which is especially important in developing countries with limited financial resources for such activities, such as Brazil.
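For readers unfamiliar with the stratification idea behind these sampling systems, the sketch below shows plain Latin hypercube sampling on the unit cube. It is deliberately not the conditioned or cost-constrained variant used in the study: those additionally restrict the choice to sites that exist in the covariate data (and, in the cost-constrained case, weigh access cost), and that selection step is omitted here. The function name is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with exactly one point
    in each of the n_samples equal-width strata of every dimension."""
    columns = []
    for _ in range(n_dims):
        # One random position inside each stratum [i/n, (i+1)/n).
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
points = latin_hypercube(5, 2, rng)
for point in points:
    print(tuple(round(value, 3) for value in point))
```

In a soil survey setting each dimension would correspond to an environmental covariate (for example, a terrain attribute), so that the chosen sites cover the full range of every covariate with few samples.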
Abstract:
The use of markers distributed all along the genome may increase the accuracy of the predicted additive genetic value of young animals that are candidates for selection as breeding animals. In commercial herds, due to the cost of genotyping, only some animals are genotyped, and procedures divided into two or three steps are used to include these genomic data in genetic evaluation. However, genomic evaluation may be calculated using one unified step that combines phenotypic data, pedigree and genomics. The aim of the study was to compare a multiple-trait model using only pedigree information with another using pedigree and genomic data. In this study, 9,318 lactations from 3,061 buffaloes were used; 384 buffaloes were genotyped using an Illumina bovine chip (Illumina Infinium (R) bovineHD BeadChip). Seven traits were analyzed: milk yield (MY), fat yield (FY), protein yield (PY), lactose yield (LY), fat percentage (F%), protein percentage (P%) and somatic cell score (SCSt). Two analyses were done: one using phenotypic and pedigree information (matrix A) and the other using a matrix based on pedigree and genomic information (one step, matrix H). The (co)variance components were estimated using multiple-trait analysis by the Bayesian inference method, applying an animal model, through Gibbs sampling. The model included the fixed effects of contemporary groups (herd-year-calving season) and number of milkings (2 levels), and age of buffalo at calving as a covariable (quadratic and linear effects). The additive genetic, permanent environmental, and residual effects were included as random effects in the model. The heritability estimates using matrix A were 0.25, 0.22, 0.26, 0.17, 0.37, 0.42 and 0.26 and using matrix H were 0.25, 0.24, 0.26, 0.18, 0.38, 0.46 and 0.26 for MY, FY, PY, LY, F%, P% and SCSt, respectively.
The estimates of the additive genetic effect for the traits were similar in both analyses, but the accuracies were higher using matrix H (more than 15% higher for the traits studied). The heritability estimates were moderate, indicating that genetic gain is possible under selection. The use of genomic information in the analyses increases the accuracy and permits a better estimation of the additive genetic value of the animals.
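As a reminder of how a heritability estimate follows from the variance components of a model with additive genetic, permanent environmental, and residual random effects, the sketch below uses the standard ratio of additive to total phenotypic variance. The abstract does not spell out its formula, so this is an assumption, and the variance values are invented placeholders rather than results from the study.

```python
def heritability(var_additive, var_permanent_env, var_residual):
    """Narrow-sense heritability: additive variance over phenotypic variance."""
    phenotypic = var_additive + var_permanent_env + var_residual
    if phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_additive / phenotypic


# Hypothetical variance components for one trait, illustration only.
h2 = heritability(var_additive=25.0, var_permanent_env=15.0, var_residual=60.0)
print(f"h2 = {h2:.2f}")
```

In a Bayesian analysis via Gibbs sampling, this ratio would typically be computed for each retained sample of the variance components and then summarized, for example by its posterior mean.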
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Prior studies of phylogenetic relationships among phocoenids based on morphology and molecular sequence data conflict and yield unresolved relationships among species. This study evaluates a comprehensive set of cranial, postcranial, and soft anatomical characters to infer interrelationships among extant species and several well-known fossil phocoenids, using two different methods to analyze polymorphic data: polymorphic coding and frequency step matrix. Our phylogenetic results confirmed phocoenid monophyly. The previously proposed division of Phocoenidae into two subfamilies was rejected, as was the alliance of the two extinct genera Salumiphocaena and Piscolithax with Phocoena dioptrica and Phocoenoides dalli. Extinct phocoenids are basal to all extant species. We also examined the origin and distribution of porpoises within the context of this phylogenetic framework. Phocoenid phylogeny together with available geologic evidence suggests that the early history of phocoenids was centered in the North Pacific during the middle Miocene, with subsequent dispersal into the southern hemisphere in the middle Pliocene. A cooling period in the Pleistocene allowed dispersal of the southern ancestor of Phocoena sinus into the North Pacific (Gulf of California).
Abstract:
Last Glacial Maximum simulated sea surface temperatures from the Paleo-Climate version of the National Center for Atmospheric Research Coupled Climate Model (NCAR-CCSM) are compared with available reconstructions and data-based products in the tropical and South Atlantic region. Model results are compared to data proxies based on the Multiproxy Approach for the Reconstruction of the Glacial Ocean surface product (MARGO). Results show that the model sea surface temperature is not consistent with the proxy data throughout the region of interest. Discrepancies are found in the eastern, equatorial and high-latitude South Atlantic. The model overestimates the cooling in the southern South Atlantic (near 50°S) shown by the proxy data. Near the equator, model and proxies are in better agreement. In the eastern part of the equatorial basin the model underestimates the cooling shown by all proxies. A northward shift in the position of the subtropical convergence zone in the simulation suggests a compression and/or an equatorward shift of the subtropical gyre at the surface, consistent with what is observed in the proxy reconstruction. (C) 2008 Elsevier B.V. All rights reserved.