939 results for Scientific Data
Abstract:
Diagnostic test sensitivity and specificity are probabilistic estimates with far-reaching implications for disease control, management and genetic studies. In the absence of 'gold standard' tests, traditional Bayesian latent class models may be used to assess diagnostic test accuracies through the comparison of two or more tests performed on the same groups of individuals. The aim of this study was to extend such models to estimate diagnostic test parameters and true cohort-specific prevalence using disease surveillance data. The traditional Hui-Walter latent class methodology was extended to allow for features seen in such data, including (i) unrecorded data (i.e. data for a second test available only on a subset of the sampled population) and (ii) cohort-specific sensitivities and specificities. The model was applied with and without the modelling of conditional dependence between tests. The utility of the extended model was demonstrated through application to bovine tuberculosis surveillance data from Northern Ireland and the Republic of Ireland. Simulation, coupled with re-sampling techniques, demonstrated that the extended model has good predictive power to estimate the diagnostic parameters and true herd-level prevalence from surveillance data. Our methodology can aid in the interpretation of disease surveillance data, and the results can potentially refine disease control strategies.
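The core of the Hui-Walter setting can be sketched numerically: under conditional independence, the probability of each joint outcome of two tests is a mixture over the latent disease state. The sensitivity, specificity and prevalence values below are illustrative assumptions, not estimates from the paper.

```python
# Minimal sketch of the two-test latent class (Hui-Walter) cell
# probabilities under conditional independence. All numeric inputs
# are invented for illustration.

def cell_probs(pi, se1, sp1, se2, sp2):
    """Joint probabilities of the four (test1, test2) outcomes."""
    d, h = pi, 1.0 - pi  # diseased / healthy fractions
    return {
        ("+", "+"): d * se1 * se2       + h * (1 - sp1) * (1 - sp2),
        ("+", "-"): d * se1 * (1 - se2) + h * (1 - sp1) * sp2,
        ("-", "+"): d * (1 - se1) * se2 + h * sp1 * (1 - sp2),
        ("-", "-"): d * (1 - se1) * (1 - se2) + h * sp1 * sp2,
    }

probs = cell_probs(pi=0.10, se1=0.80, sp1=0.99, se2=0.70, sp2=0.95)
# Apparent prevalence of test 1 reduces to pi*Se1 + (1-pi)*(1-Sp1).
apparent_prev_test1 = probs[("+", "+")] + probs[("+", "-")]
print(round(sum(probs.values()), 10))  # cell probabilities sum to 1
print(round(apparent_prev_test1, 4))
```

A Bayesian model of this kind places priors on the five parameters and fits the four observed cell counts; the extension described above additionally indexes sensitivities and specificities by cohort.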
Abstract:
This paper presents a current and turbulence measurement campaign conducted at a test site in an energetic tidal channel known as Strangford Narrows, Northern Ireland. The data were collected as part of the MaRINET project, funded by the EU under the FP7 framework, in a collaborative effort between Queen’s University Belfast, SCHOTTEL and Fraunhofer IWES. The site is highly turbulent with a strong shear flow. Longer-term measurements of the flow regime were made using a bottom-mounted Acoustic Doppler Profiler (ADP). During a specific turbulence measurement campaign, two collocated instruments were used to measure incoming flow characteristics: an ADP (Aquadopp, Nortek) and a turbulence profiler (MicroRider, Rockland Scientific International). The instruments recorded the same incoming flow, so that direct comparisons between the data can be made. In this study the methodology adopted to deploy the instruments is presented. The resulting turbulence measurements using the different types of instrumentation are compared and the usefulness of each instrument for the relevant range of applications is discussed. The paper shows the ranges of the frequency spectra obtained using the different instruments, with the combined measurements providing insight into the structure of the turbulence across a wide range of scales.
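The kind of frequency spectrum compared between the two instruments can be sketched with a periodogram of a velocity time series. This is a generic illustration on synthetic data, not the campaign's actual processing chain; the sampling rate and signal parameters are assumptions.

```python
import numpy as np

# Illustrative sketch: one-sided power spectral density of a synthetic
# velocity record (mean flow + low-frequency oscillation + noise).
rng = np.random.default_rng(0)
fs = 8.0                       # sampling frequency in Hz (assumed)
t = np.arange(0, 600, 1 / fs)  # ten minutes of data
u = 1.5 + 0.2 * np.sin(2 * np.pi * 0.1 * t) + 0.05 * rng.standard_normal(t.size)

u = u - u.mean()               # remove the mean flow before transforming
spec = np.abs(np.fft.rfft(u)) ** 2 / (fs * u.size)  # one-sided periodogram
freqs = np.fft.rfftfreq(u.size, d=1 / fs)

peak = freqs[np.argmax(spec[1:]) + 1]  # skip the zero-frequency bin
print(round(peak, 2))  # the injected 0.1 Hz oscillation dominates
```

In practice a Welch-style average over overlapping segments would be used to reduce the variance of the estimate before comparing spectra across instruments.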
Abstract:
The Virtual Atomic and Molecular Data Centre (VAMDC) Consortium is a worldwide consortium that federates atomic and molecular databases through an e-science infrastructure and an organisation that supports this activity. About 90% of the inter-connected databases handle data that are used for the interpretation of astronomical spectra and for modelling in many fields of astrophysics. Recently the VAMDC Consortium has connected databases from the radiation damage and plasma communities, and has promoted the publication of data from Indian institutes. This paper describes how the VAMDC Consortium is organised for the optimal distribution of atomic and molecular data for scientific research. It is noted that the VAMDC Consortium strongly advocates that authors of research papers using data cite the original experimental and theoretical papers as well as the relevant databases.
Abstract:
The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. This approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large amount of processing modules optimized for the biomedical domain. To offer on-demand heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget.
We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
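The recognition-and-normalization task addressed by the tools above can be illustrated with a minimal dictionary matcher that maps surface forms to knowledge-base identifiers. This is a generic, hypothetical sketch of the task, not the actual implementation of Gimli or Neji; the lexicon and identifiers are invented.

```python
# Minimal, hypothetical illustration of dictionary-based biomedical
# concept recognition and normalization. Lexicon entries and the
# identifier scheme are invented for the example.
lexicon = {
    "breast cancer": "DISEASE:D001943",
    "p53": "GENE:TP53",
    "aspirin": "CHEM:D001241",
}

def recognize(text):
    """Return (surface form, identifier, offset) for each lexicon hit."""
    found, lowered = [], text.lower()
    for term, ident in lexicon.items():
        start = lowered.find(term)
        while start != -1:
            found.append((text[start:start + len(term)], ident, start))
            start = lowered.find(term, start + 1)
    return sorted(found, key=lambda hit: hit[2])

hits = recognize("P53 mutations are frequent in breast cancer.")
print(hits)
```

Production systems replace the linear scan with trie- or automaton-based matching and add machine-learned recognizers for forms absent from the dictionary, which is where the performance gains described above come from.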
Abstract:
The analysis of seabed structure is important in a wide variety of scientific and industrial applications. In this paper, underwater acoustic data produced by bottom-penetrating sonar (Topas) are analyzed using unsupervised volumetric segmentation, based on a three-dimensional Gibbs-Markov model. The result is a concise and accurate description of the seabed, in which key structures are emphasized. This description is also very well suited to further operations, such as the enhancement and automatic recognition of important structures. Experimental results demonstrating the effectiveness of this approach are shown, using Topas data gathered in the North Sea off Horten, Norway.
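The Gibbs-Markov idea behind such segmentation can be shown on a toy example: each site is relabelled to balance fit to the observed value against agreement with its neighbours. This is a two-class, two-dimensional iterated conditional modes (ICM) sketch with invented data, far simpler than the paper's three-dimensional volumetric model.

```python
import copy

# Toy ICM relabelling under a Gibbs-Markov (Potts-style) prior.
# Observations, class means and the smoothness weight are invented.
obs = [
    [0.1, 0.2, 0.9, 0.8],
    [0.2, 0.1, 0.7, 0.9],
    [0.1, 0.3, 0.8, 0.7],
]
means = {0: 0.15, 1: 0.8}  # assumed class means for two "layers"
beta = 0.5                 # weight of the neighbour-agreement term

labels = [[0 if v < 0.5 else 1 for v in row] for row in obs]
for _ in range(5):  # a few ICM sweeps
    new = copy.deepcopy(labels)
    for i, row in enumerate(obs):
        for j, v in enumerate(row):
            def energy(k):
                data = (v - means[k]) ** 2  # data-fit term
                nbrs = [labels[a][b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < len(obs) and 0 <= b < len(row)]
                return data + beta * sum(k != n for n in nbrs)
            new[i][j] = min(means, key=energy)
    labels = new
print(labels)  # two coherent regions, left and right
```

The volumetric case extends the neighbourhood to six (or more) adjacent voxels and typically estimates the class parameters jointly rather than fixing them in advance.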
Automatic classification of scientific records using the German Subject Heading Authority File (SWD)
Abstract:
The following paper deals with an automatic text classification method which does not require training documents. For this method the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). In consequence it became possible to utilize the subject headings as textual representations for the notations of the DDC. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.
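The paper's core idea - deriving a document's DDC ranking from the notations attached to its words, evaluated by mean reciprocal rank - can be sketched as a simple voting scheme. The tiny word-to-DDC thesaurus below is invented for illustration and bears no relation to the actual SWD content.

```python
from collections import Counter

# Hedged sketch: words vote for the DDC notations they are mapped to,
# and classes are ranked by vote count. The mapping is invented.
word_to_ddc = {
    "algebra": "510", "geometry": "510",
    "photosynthesis": "580", "chlorophyll": "580",
    "grammar": "430",
}

def rank_classes(text):
    votes = Counter(word_to_ddc[w] for w in text.lower().split()
                    if w in word_to_ddc)
    return [ddc for ddc, _ in votes.most_common()]

def reciprocal_rank(ranking, gold):
    """1/rank of the gold class, or 0 if it is missing (for MRR)."""
    return 1.0 / (ranking.index(gold) + 1) if gold in ranking else 0.0

ranking = rank_classes("chlorophyll drives photosynthesis algebra")
print(ranking)                          # ['580', '510']
print(reciprocal_rank(ranking, "510"))  # 0.5
```

Mean reciprocal rank, as used in the evaluation above, is simply this per-document value averaged over all test records.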
Abstract:
Master's thesis in Bioinformatics and Computational Biology (Bioinformatics), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2014
Abstract:
Doctoral thesis in Biology (Marine Biology and Aquaculture), Universidade de Lisboa, Faculdade de Ciências, 2015
Abstract:
Researchers want to analyse health care data, which may require large pools of compute and data resources. Access to Distributed Computing Infrastructures (DCIs) can provide these resources, but using them requires expertise that researchers may not have. Workflows can hide the underlying infrastructures; however, the many existing workflow systems are not interoperable, and learning a workflow system and creating workflows in it can require significant effort. Given this effort, it is not reasonable to expect researchers to learn a new workflow system just to run workflows built in another one. As a result, the lack of interoperability prevents workflow sharing, and a vast amount of research effort is wasted. The FP7 Sharing Interoperable Workflow for Large-Scale Scientific Simulation on Available DCIs (SHIWA) project developed the Coarse-Grained Interoperability (CGI) approach to enable workflow sharing, and created the SHIWA Simulation Platform (SSP) to support CGI as a production-level service. The paper describes how the CGI approach can be used for analysis and simulation in health care.
Abstract:
This user manual for the data visualization software “Ocean Data View” (ODV) describes the exploration, analysis and visualization of oceanographic data in the format of the global ocean data collection “World Ocean Database” (WOD). The manual comprises six practical exercises that describe, step by step, the creation of metavariables, the import of data, and their visualization through latitude/longitude maps, scatter plots, vertical sections and time series. Extensive use of ODV for the visualization of oceanographic data by the scientific staff of IMARPE is recommended.
Abstract:
This mixed-methods research study sought to determine the impact of an informal science camp—the Youth Science Inquiry Development Camp (YSIDC)—on participants’ science inquiry skills, through self-assessment, as well as their views and attitudes towards science and scientific inquiry. Pre- and post-camp data were collected using quantitative surveys (SPSI, CARS), a qualitative survey (VOSI-E), interviews, and researcher’s observations. Paired sample t-tests from the quantitative surveys revealed that the YSIDC positively impacted participants’ science inquiry skills and attitudes towards science. Interviews supported these findings and provided contextual reasons for these impacts. Implications from this research would suggest that informal and formal educational institutions can increase science inquiry skills and promote positive views and attitudes towards science and scientific inquiry by using non-competitive cooperative learning strategies with a mixture of guided and open inquiry. Suggested directions for further research include measuring science inquiry skills directly and conducting longitudinal studies to determine the lasting effects of informal and formal science programs.
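The paired-sample t-test used on the pre/post survey scores can be sketched directly from its definition: the statistic is the mean of the per-participant differences divided by its standard error. The score vectors below are synthetic, not the study's data.

```python
import math

# Paired-sample t statistic on synthetic pre/post scores
# (illustrative values, not the YSIDC survey data).
pre  = [3.1, 2.8, 3.5, 3.0, 2.9, 3.2]
post = [3.6, 3.1, 3.9, 3.4, 3.3, 3.5]

diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
t_stat = mean / math.sqrt(var / n)  # compare to t distribution, df = n - 1
print(round(mean, 3), round(t_stat, 2))
```

The resulting statistic is referred to a t distribution with n - 1 degrees of freedom to obtain the p-value reported in such analyses.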
Abstract:
The attached file was created with Scientific WorkPlace (LaTeX).
Abstract:
Background/Aims: There are compelling reasons to ensure participation of ethnic minorities and populations of all ages worldwide in nutrigenetics clinical research. If findings in such research are valid for some individuals, groups, or communities, and not for others, then ethical questions of justice – and not only issues of methodology and external validity – arise. This paper aims to examine inclusion in nutrigenetics clinical research and its scientific and ethical challenges. Methods: 173 publications were identified through a systematic review of clinical studies in nutrigenetics published between 1998 and 2007 inclusively. Data such as participants’ demographics as well as eligibility criteria were extracted. Results: There is no consistency in the way participants’ origins (ancestry, ethnicity or race) and ages are described in publications. The vast majority of the studies identified were conducted in North America and Europe and focused on “white” participants. Our results show that pregnant women (and fetuses), minors and the elderly (≥75 years old) remain underrepresented. Conclusion: Representativeness in nutrigenetics research is a challenging ethical and scientific issue. Yet, if nutrigenetics is to benefit whole populations and be used in public and global health agendas, fair representation, as well as clear descriptions of participants in publications, are crucial.
Abstract:
Various research fields, like organic agricultural research, are dedicated to solving real-world problems and contributing to sustainable development. Therefore, systems research and the application of interdisciplinary and transdisciplinary approaches are increasingly endorsed. However, research performance depends not only on self-conception, but also on framework conditions of the scientific system, which are not always of benefit to such research fields. Recently, science and its framework conditions have been under increasing scrutiny as regards their ability to serve societal benefit. This provides opportunities for (organic) agricultural research to engage in the development of a research system that will serve its needs. This article focuses on possible strategies for facilitating a balanced research evaluation that recognises scientific quality as well as societal relevance and applicability. These strategies are (a) to strengthen the general support for evaluation beyond scientific impact, and (b) to provide accessible data for such evaluations. Synergies of interest are found between open access movements and research communities focusing on global challenges and sustainability. As both are committed to increasing the societal benefit of science, they may support evaluation criteria such as knowledge production and dissemination tailored to societal needs, and the use of open access. Additional synergies exist between all those who scrutinise current research evaluation systems for their ability to serve scientific quality, which is also a precondition for societal benefit. Here, digital communication technologies provide opportunities to increase effectiveness, transparency, fairness and plurality in the dissemination of scientific results, quality assurance and reputation. Furthermore, funders may support transdisciplinary approaches and open access and improve data availability for evaluation beyond scientific impact. 
If funders adopt current research information systems that include societal impact data, while reducing the requirements for narrative reports, documentation burdens on researchers may be relieved, with the funders themselves acting as data providers for researchers, institutions and tailored dissemination beyond academia.
Abstract:
Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little-used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed.
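The log-ratio methods mentioned above rest on transforms such as the centred log-ratio (clr), which maps a unit-sum composition into an unconstrained space where standard multivariate methods apply. The sketch below is in Python for illustration (the library described in the paper is in R), and the composition is invented.

```python
import math

# Centred log-ratio (clr) transform: log of each part relative to the
# geometric mean of all parts. The composition here is invented.
def clr(parts):
    logs = [math.log(p) for p in parts]
    gmean_log = sum(logs) / len(logs)
    return [l - gmean_log for l in logs]

composition = [0.60, 0.25, 0.10, 0.05]  # parts of a unit-sum composition
transformed = clr(composition)
print([round(x, 3) for x in transformed])
print(round(sum(transformed), 10))  # clr coordinates sum to zero
```

Because zero parts have no logarithm, the zero-replacement routines mentioned in the abstract are applied before any such transform.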