846 resultados para data science
Resumo:
Mode of access: Internet.
Resumo:
Overview of the key aspects and approaches to open access, open data and open science, emphasizing on sharing scientific knowledge for sustainable progress and development.
Resumo:
Overview of the growth of policies and a critical appraisal of the issues affecting open access, open data and open science policies. Example policies and a roadmap for open access, open research data and open science are included.
Resumo:
Cloud computing offers massive scalability and elasticity required by many scien-tific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new oppor-tunities for application developers. This paper investigates how workflow sys-tems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure aware workflows is suggested and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.
Resumo:
Responsible Research Data Management (RDM) is a pillar of quality research. In practice good RDM requires the support of a well-functioning Research Data Infrastructure (RDI). One of the challenges the research community is facing is how to fund the management of research data and the required infrastructure. Knowledge Exchange and Science Europe have both defined activities to explore how RDM/RDI are, or can be, funded. Independently they each planned to survey users and providers of data services and on becoming aware of the similar objectives and approaches, the Science Europe Working Group on Research Data and the Knowledge Exchange Research Data expert group joined forces and devised a joint activity to to inform the discussion on the funding of RDM/RDI in Europe.
Resumo:
This dissertation proposes an analysis of the governance of the European scientific research, focusing on the emergence of the Open Science paradigm: a new way of doing science, oriented towards the openness of every phase of the scientific research process, able to take full advantage of the digital ICTs. The emergence of this paradigm is relatively recent, but in the last years it has become increasingly relevant. The European institutions expressed a clear intention to embrace the Open Science paradigm (eg., think about the European Open Science Cloud, EOSC; or the establishment of the Horizon Europe programme). This dissertation provides a conceptual framework for the multiple interventions of the European institutions in the field of Open Science, addressing the major legal challenges of its implementation. The study investigates the notion of Open Science, proposing a definition that takes into account all its dimensions related to the human and fundamental rights framework in which Open Science is grounded. The inquiry addresses the legal challenges related to the openness of research data, in light of the European Open Data framework and the impact of the GDPR on the context of Open Science. The last part of the study is devoted to the infrastructural dimension of the Open Science paradigm, exploring the e-infrastructures. The focus is on a specific type of computational infrastructure: the High Performance Computing (HPC) facility. The adoption of HPC for research is analysed from the European perspective, investigating the EuroHPC project, and the local perspective, proposing the case study of the HPC facility of the University of Luxembourg, the ULHPC. This dissertation intends to underline the relevance of the legal coordination approach, between all actors and phases of the process, in order to develop and implement the Open Science paradigm, adhering to the underlying human and fundamental rights.
Resumo:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations leads to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, both in the experimental and computational realms. Scientists strive to enforce cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management, proliferation of custom data formats and storage procedures, both in experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limit the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it talks about developing features for specific classes of advanced materials and use them to train machine learning models and accelerate computational predictions for molecular compounds; developing method for organizing non homogeneous materials data; automate the process of using devices simulations to train machine learning models; dealing with scattered experimental data and use them to discover new patterns.
Resumo:
Diagnostic methods have been an important tool in regression analysis to detect anomalies, such as departures from error assumptions and the presence of outliers and influential observations with the fitted models. Assuming censored data, we considered a classical analysis and Bayesian analysis assuming no informative priors for the parameters of the model with a cure fraction. A Bayesian approach was considered by using Markov Chain Monte Carlo Methods with Metropolis-Hasting algorithms steps to obtain the posterior summaries of interest. Some influence methods, such as the local influence, total local influence of an individual, local influence on predictions and generalized leverage were derived, analyzed and discussed in survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that, by removing the most influential observations, the decision about which model best fits the data is changed.
Resumo:
Background: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of ""what if'' situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Resumo:
Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
Resumo:
Melanoma is a highly aggressive and therapy resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. Concluding, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.
Resumo:
The A1763 superstructure at z = 0.23 contains the first galaxy filament to be directly detected using mid-infrared observations. Our previous work has shown that the frequency of starbursting galaxies, as characterized by 24 mu m emission is much higher within the filament than at either the center of the rich galaxy cluster, or the field surrounding the system. New Very Large Array and XMM-Newton data are presented here. We use the radio and X-ray data to examine the fraction and location of active galaxies, both active galactic nuclei (AGNs) and starbursts (SBs). The radio far-infrared correlation, X-ray point source location, IRAC colors, and quasar positions are all used to gain an understanding of the presence of dominant AGNs. We find very few MIPS-selected galaxies that are clearly dominated by AGN activity. Most radio-selected members within the filament are SBs. Within the supercluster, three of eight spectroscopic members detected both in the radio and in the mid-infrared are radio-bright AGNs. They are found at or near the core of A1763. The five SBs are located further along the filament. We calculate the physical properties of the known wide angle tail (WAT) source which is the brightest cluster galaxy of A1763. A second double lobe source is found along the filament well outside of the virial radius of either cluster. The velocity offset of the WAT from the X-ray centroid and the bend of the WAT in the intracluster medium are both consistent with ram pressure stripping, indicative of streaming motions along the direction of the filament. We consider this as further evidence of the cluster-feeding nature of the galaxy filament.
Resumo:
The HR Del nova remnant was observed with the IFU-GMOS at Gemini North. The spatially resolved spectral data cube was used in the kinematic, morphological, and abundance analysis of the ejecta. The line maps show a very clumpy shell with two main symmetric structures. The first one is the outer part of the shell seen in H alpha, which forms two rings projected in the sky plane. These ring structures correspond to a closed hourglass shape, first proposed by Harman & O'Brien. The equatorial emission enhancement is caused by the superimposed hourglass structures in the line of sight. The second structure seen only in the [O III] and [N II] maps is located along the polar directions inside the hourglass structure. Abundance gradients between the polar caps and equatorial region were not found. However, the outer part of the shell seems to be less abundant in oxygen and nitrogen than the inner regions. Detailed 2.5-dimensional photoionization modeling of the three-dimensional shell was performed using the mass distribution inferred from the observations and the presence of mass clumps. The resulting model grids are used to constrain the physical properties of the shell as well as the central ionizing source. A sequence of three-dimensional clumpy models including a disk-shaped ionization source is able to reproduce the ionization gradients between polar and equatorial regions of the shell. Differences between shell axial ratios in different lines can also be explained by aspherical illumination. A total shell mass of 9 x 10(-4) M(circle dot) is derived from these models. We estimate that 50%-70% of the shell mass is contained in neutral clumps with density contrast up to a factor of 30.
Resumo:
Using the published KTeV samples of K(L) -> pi(+/-)e(-/+)nu and K(L) -> pi(+/-)mu(-/+)nu decays, we perform a reanalysis of the scalar and vector form factors based on the dispersive parametrization. We obtain phase-space integrals I(K)(e) = 0.15446 +/- 0.00025 and I(K)(mu) = 0.10219 +/- 0.00025. For the scalar form factor parametrization, the only free parameter is the normalized form factor value at the Callan-Treiman point (C); our best-fit results in InC = 0.1915 +/- 0.0122. We also study the sensitivity of C to different parametrizations of the vector form factor. The results for the phase-space integrals and C are then used to make tests of the standard model. Finally, we compare our results with lattice QCD calculations of F(K)/F(pi) and f(+)(0).