963 results for Scientific data
Abstract:
Presentation from the MARAC conference in Pittsburgh, PA on April 14–16, 2016. S13 - Student Poster Session; Analysis of Federal Policy on Public Access to Scientific Research Data
Abstract:
In geophysics and seismology, raw data must be processed to generate useful information that researchers can turn into knowledge. The number of sensors acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent querying and preparing datasets for analysis than acquiring raw data, and good-quality data acquired at great effort can be lost forever if they are not correctly stored. As a consequence, local and international cooperation is likely to be reduced, and much of the data will never become scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of Sao Paulo (IAG-USP) has focused strongly on its data management system. This report describes the efforts of the IAG-USP to set up a seismology data management system that facilitates local and international cooperation.
Abstract:
Brazilian science has grown rapidly over the last decades; one example is the increase in the country's share of the world's scientific publications indexed in the main international databases. But what is the actual weight of international publications within Brazilian productivity as a whole? To answer this question, we developed a new indicator, the International Publication Ratio (IPR). The data source was the Lattes Database, organized by one of the main Brazilian S&T funding agencies, which encompasses publication data from 1997 to 2004 for about 51,000 Brazilian researchers. The influence of distinct parameters, such as sector, field, career age and gender, is analyzed. We hope the data presented may help S&T managers and other S&T stakeholders to better understand the complexity underlying the concept of scientific productivity, especially in scientifically peripheral countries such as Brazil.
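The abstract does not spell out the exact IPR formula; a natural reading is the share of publications that appear in international databases relative to total output, aggregated by field, career age or other grouping. A minimal sketch under that assumption, with hypothetical field names, might be:

```python
# Illustrative sketch only: the abstract does not give the exact IPR definition.
# Here IPR = international publications / total publications, aggregated per group
# (e.g., field or career-age cohort). Record field names are hypothetical.
from collections import defaultdict

def international_publication_ratio(records):
    """records: iterable of dicts with keys 'group', 'total_pubs', 'intl_pubs'."""
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        totals[r["group"]][0] += r["intl_pubs"]
        totals[r["group"]][1] += r["total_pubs"]
    return {g: (intl / tot if tot else 0.0) for g, (intl, tot) in totals.items()}

if __name__ == "__main__":
    sample = [
        {"group": "chemistry", "total_pubs": 12, "intl_pubs": 9},
        {"group": "chemistry", "total_pubs": 5, "intl_pubs": 1},
        {"group": "education", "total_pubs": 8, "intl_pubs": 2},
    ]
    print(international_publication_ratio(sample))  # {'chemistry': 0.588..., 'education': 0.25}
```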
Abstract:
Thermodynamic properties of bread dough (fusion enthalpy, apparent specific heat, initial freezing point and unfreezable water) were measured at temperatures from -40 °C to 35 °C using differential scanning calorimetry. The initial freezing point was also calculated based on the water activity of the dough. The apparent specific heat varied as a function of temperature: in the freezing region it ranged from 1.7 to 23.1 J g⁻¹ °C⁻¹, and it was constant at temperatures above freezing (2.7 J g⁻¹ °C⁻¹). Unfreezable water content varied from 0.174 to 0.182 g/g of total product. Values of heat capacity as a function of temperature were correlated using thermodynamic models. A modification for low-moisture foodstuffs (such as bread dough) was successfully applied to the experimental data. © 2010 Elsevier Ltd. All rights reserved.
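The abstract states that the initial freezing point was also calculated from the water activity of the dough; the exact formulation used in the paper is not given here, but a commonly used Clausius-Clapeyron-type relation for this calculation is shown below, where T_0 = 273.15 K is the freezing point of pure water, ΔH_f ≈ 6.01 kJ/mol the latent heat of fusion of water, and R the gas constant.

```latex
\ln a_w = \frac{\Delta H_f}{R}\left(\frac{1}{T_0} - \frac{1}{T_f}\right)
\qquad\Longrightarrow\qquad
T_f = \left(\frac{1}{T_0} - \frac{R \ln a_w}{\Delta H_f}\right)^{-1}
```

Since ln a_w is negative for a_w < 1, the predicted freezing point T_f lies below T_0, consistent with the depressed initial freezing points measured for dough.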
Abstract:
Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes whose variability can be explained by factors and/or covariates. When such factors operate, the usual normal regression models, which inherently assume constant variance, under-represent the variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models then underestimates the variability and, consequently, incorrectly indicates significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion is modeled using a random effect that depends on some noise factors. The joint posterior density was sampled using Markov chain Monte Carlo (MCMC) algorithms, allowing inference on the model parameters. An application to a data set on apple tissue culture is presented, showing that the Bayesian approach is quite feasible even when limited prior information is available, and generating valuable insight for the researcher about the experimental results.
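The abstract does not give the model's exact specification, but as an illustration of the general idea (a proportion outcome whose overdispersion is absorbed by a random effect, with the posterior explored by MCMC), a minimal random-walk Metropolis sketch might look like the following; the data, priors and proposal scale are all hypothetical and not taken from the paper.

```python
# Minimal random-walk Metropolis sketch (not the authors' model or code) for an
# overdispersed binomial: y_i ~ Binomial(n_i, p_i), logit(p_i) = beta0 + u_i,
# u_i ~ N(0, sigma^2), with vague priors on beta0 and log(sigma).
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 20 experimental units, 30 trials each.
n = np.full(20, 30)
y = rng.binomial(n, 0.35)

def log_post(theta, y, n):
    beta0, log_sigma, u = theta[0], theta[1], theta[2:]
    sigma = np.exp(log_sigma)
    p = 1.0 / (1.0 + np.exp(-(beta0 + u)))
    loglik = np.sum(y * np.log(p) + (n - y) * np.log1p(-p))
    logprior = (-0.5 * (beta0 / 10.0) ** 2              # beta0 ~ N(0, 10^2)
                - 0.5 * np.sum((u / sigma) ** 2) - u.size * log_sigma
                - 0.5 * (log_sigma / 2.0) ** 2)         # log(sigma) ~ N(0, 2^2)
    return loglik + logprior

theta = np.zeros(2 + y.size)
current = log_post(theta, y, n)
samples = []
for it in range(20000):
    prop = theta + rng.normal(scale=0.05, size=theta.size)
    cand = log_post(prop, y, n)
    if np.log(rng.uniform()) < cand - current:          # Metropolis accept/reject
        theta, current = prop, cand
    if it >= 10000 and it % 10 == 0:                    # thin after burn-in
        samples.append(theta[:2].copy())

samples = np.array(samples)
print("posterior mean beta0:", samples[:, 0].mean())
print("posterior mean sigma:", np.exp(samples[:, 1]).mean())
```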
Abstract:
Mitochondrial DNA (mtDNA) population data for forensic purposes are still scarce for some populations, which may limit the evaluation of forensic evidence, especially when the rarity of a haplotype needs to be determined through a database search. In order to improve the collection of mtDNA lineages from the Iberian and South American subcontinents, we report here the results of a collaborative study involving nine laboratories from the Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) and EMPOP. The individual laboratories contributed population data that were generated over the past 10 years but, in the majority of cases, had not been made available to the scientific community. A total of 1019 haplotypes from Iberia (Basque Country, 2 general Spanish populations, 2 North and 1 Central Portugal populations) and Latin America (3 populations from Sao Paulo) were collected, reviewed and harmonized according to defined EMPOP criteria. The majority of the data ambiguities found during the review process (41 in total) were transcription errors, confirming that documentation is still the most error-prone stage in reporting mtDNA population data, especially when performed manually. This GHEP-EMPOP collaboration has significantly improved the quality of the individual mtDNA datasets and adds mtDNA population data as a valuable resource to the EMPOP database (www.empop.org). © 2010 Elsevier Ireland Ltd. All rights reserved.
Abstract:
In microarray studies, clustering techniques are often applied to derive meaningful insights from the data. In the past, hierarchical methods have been the primary clustering tool employed for this task, and they have mainly been applied heuristically to these cluster analysis problems. A further major limitation of these methods is their inability to determine the number of clusters; there is thus a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data studied recently by van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes to a more computationally manageable size. The results from this analysis also emphasise the difficulty of separating two tissue groups on the basis of a particular subset of genes, and they shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
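EMMIX-GENE itself is separate software; as a rough illustration of the general workflow described (screen the genes down to a manageable subset, then fit a mixture model to cluster the tissue samples), a sketch using scikit-learn's GaussianMixture could look like the following. The variance-based gene screening here is a deliberate simplification of EMMIX-GENE's model-based gene selection, and the data are synthetic.

```python
# Illustrative sketch only -- not EMMIX-GENE. Reduce a gene set by variance,
# then cluster tissue samples with a two-component Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical expression matrix: 80 tissue samples x 5000 genes.
X = rng.normal(size=(80, 5000))

# Keep the 50 most variable genes (a crude stand-in for EMMIX-GENE's gene screening).
top = np.argsort(X.var(axis=0))[-50:]
X_reduced = X[:, top]

# Cluster the samples (rows) on the reduced gene set.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(X_reduced)
print("samples per cluster:", np.bincount(labels))
```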
Abstract:
This paper aims to cast some light on the dynamics of knowledge networks in developing countries by analyzing the scientific production of the largest university in the Northeast of Brazil and its influence on some of the other regional research institutions in the state of Bahia. Using a methodology being tested for use in a larger project, the scientific production of the Universidade Federal da Bahia (UFBA) (Federal University of Bahia), the Universidade do Estado da Bahia (Uneb) (State of Bahia University) and the Universidade Estadual de Santa Cruz (Uesc) (Santa Cruz State University) is examined in one of their traditionally most expressive areas of academic output, the field of chemistry. Social network analysis of co-authorship networks is used to investigate the existence of small-world phenomena and the importance of these phenomena for research performance at the three universities. The results obtained so far bring to light data of considerable interest concerning scientific production in unconsolidated research universities. They show the important participation of the UFBA network in the composition of the research networks of the other two public universities, indicate a possible occurrence of small-world phenomena in the UFBA and Uesc networks, and highlight the importance of individual researchers in consolidating research networks at peripheral universities. The article also suggests that the methodology employed is adequate insofar as scientific production may be used as a proxy for scientific knowledge.
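As an informal illustration of the kind of small-world check described (not the authors' actual procedure), one can compare a co-authorship graph's clustering coefficient and average path length with those of a random graph of the same size and density; the edge list below is hypothetical.

```python
# Rough small-world check on a toy co-authorship graph: high clustering combined
# with short path lengths relative to an Erdos-Renyi graph of equal density
# suggests small-world structure.
import networkx as nx

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("D", "F"), ("F", "G"), ("G", "A")]
G = nx.Graph(edges)

n, m = G.number_of_nodes(), G.number_of_edges()
p = 2 * m / (n * (n - 1))                      # same edge density for the null model
R = nx.erdos_renyi_graph(n, p, seed=1)

print("co-authorship: C =", nx.average_clustering(G),
      " L =", nx.average_shortest_path_length(G))
if nx.is_connected(R):
    print("random graph:  C =", nx.average_clustering(R),
          " L =", nx.average_shortest_path_length(R))
```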
Abstract:
The paper discusses the difficulties in judging the quality of scientific manuscripts and describes some common pitfalls that should be avoided when preparing a paper for submission to a peer-reviewed journal. Peer review is an imperfect system, with less than optimal reliability and uncertain validity. However, as it is likely to remain the principal process for screening papers for publication, authors should avoid some common mistakes when preparing a report based on empirical findings of human research. Among these are excessively long abstracts, extensive use of abbreviations, failure to report the results of parsimonious data analyses, and misinterpretation of statistical associations identified in observational studies as causal. Another common problem in many manuscripts is excessive length, which makes them more difficult to evaluate and, if published, to read. The evaluation of papers after their publication, with a view to their inclusion in a systematic review, is also discussed, and the limitations of the impact factor as a criterion for judging the quality of a paper are reviewed.
Abstract:
Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. To develop an application using MapReduce, one needs to install, configure and access specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, composing the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
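The abstract notes that the end user only needs to define the map and reduce functions; a minimal word-count pair in the usual MapReduce style is sketched below. This is purely illustrative and does not reproduce AWARD's actual task interface, which is Java-based; the sequential driver stands in for the distributed runtime.

```python
# Minimal word-count map/reduce pair in the generic MapReduce style.
from itertools import groupby
from operator import itemgetter

def map_fn(_, line):
    # Emit (word, 1) for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Sum all partial counts for one word.
    yield word, sum(counts)

def run_local(lines):
    # Sequential stand-in for the distributed runtime: map, shuffle (sort), reduce.
    pairs = [kv for i, line in enumerate(lines) for kv in map_fn(i, line)]
    pairs.sort(key=itemgetter(0))
    return [out for key, group in groupby(pairs, key=itemgetter(0))
            for out in reduce_fn(key, (v for _, v in group))]

print(run_local(["the quick brown fox", "the lazy dog", "The fox"]))
```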
Abstract:
Workflows have been successfully applied to express the decomposition of complex scientific applications. However, existing tools still lack adequate support for important aspects, namely decoupling the enactment engine from task specification, decentralizing the control of workflow activities so that their tasks can run on distributed infrastructures, and supporting dynamic workflow reconfiguration. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on Process Networks, in which the workflow activities (AWAs) are autonomic processes with independent control that can run in parallel on distributed infrastructures. Each AWA executes a task developed as a Java class with a generic interface, allowing end-users to code their applications without low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables dynamic workflow reconfiguration. For evaluation we describe experimental results of AWARD workflow executions in several application scenarios, mapped to the Amazon EC2 (Elastic Compute Cloud).
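The tuple-space coordination mentioned above can be illustrated with a minimal in-process sketch: activities communicate only by putting tuples into a shared store and taking tuples that match a pattern. This is loosely inspired by the description in the abstract and is not AWARD's actual (distributed) implementation.

```python
# Minimal thread-safe tuple-space sketch with blocking, pattern-matched take().
import threading

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def put(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def take(self, pattern):
        # pattern is a tuple of values and None wildcards, e.g. ("result", None).
        def matches(t):
            return len(t) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, t))
        with self._cond:
            while True:
                for t in self._tuples:
                    if matches(t):
                        self._tuples.remove(t)
                        return t
                self._cond.wait()

space = TupleSpace()

def producer():
    space.put(("result", 42))          # an upstream activity publishes its output

threading.Thread(target=producer).start()
print(space.take(("result", None)))    # a downstream activity blocks until it arrives
```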
Abstract:
Harnessing the idle CPU cycles, storage space and other resources of networked computers for collaborative work is a central goal of all major grid computing research projects. Most university computer labs are nowadays equipped with powerful desktop PCs, and much of the time these machines sit idle, their computing power going unused. Yet complex problems and the analysis of very large amounts of data require substantial computational resources. For such problems, one may run the analysis algorithms on very powerful and expensive computers, which limits the number of users who can afford such data analysis tasks. Instead of using single expensive machines, distributed computing systems offer the possibility of using a set of much less expensive machines to perform the same task. The BOINC and Condor projects have been used successfully for real scientific research around the world at low cost. The main goal of this work is to explore both distributed computing platforms, Condor and BOINC, and to use them to harness idle PC resources for academic researchers to use in their research. In this thesis, data mining tasks are performed by implementing several machine learning algorithms on the distributed computing environment.
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
The ethical aspects of Brazilian publications on human Chagas disease (CD) produced between 1996 and 2010 and the policies adopted by Brazilian medical journals were analyzed. Articles were selected from the SciELO Brazil database, and the evaluation of ethical aspects was based on the normative content on ethics in research involving human subjects set out in Brazilian National Health Council (NHC) Resolution no. 196/1996. The editorial policies in the "Instructions to authors" sections were also analyzed. In the period 1996-2012, 58.9% of the articles on human Chagas disease did not report compliance with the ethical requirements for research with human beings. In 80% of the journals, no requirement for, or confirmation of, information about ethical aspects in studies of human CD was observed. Although failures of this kind are still observed, awareness has been raised among federal agencies, educational and research institutions and publishing groups of the need to standardize procedures and ethical requirements for Brazilian journals, reinforcing compliance with the ethical parameters of NHC Resolution no. 196/1996.