925 results for research data repository
Abstract:
Scientific research is increasingly data-intensive, relying more and more upon advanced computational resources to answer the questions most pressing to society at large. This report presents findings from a brief descriptive survey sent to a sample of 342 leading researchers at the University of Washington (UW), Seattle, Washington in 2010 and 2011 as the first stage of the larger National Science Foundation project “Interacting with Cyberinfrastructure in the Face of Changing Science.” The survey assesses these researchers’ use of advanced computational resources, data, and software in their research. We present high-level findings that describe UW researchers’ demographics, interdisciplinarity, research groups, data use, and software and computational use, including software development, data storage and transfer activities, collaboration tools, and computing resources. These findings offer insights into the state of computational resources in use during this period, as well as a look at the data intensiveness of UW researchers’ work.
Abstract:
This article presents a systematic review of the literature on Digital Scholarship, aimed at better understanding the position of this research area at the crossroads of several disciplines and strands of research. The authors analysed 45 articles in order to draw a picture of research in this area. In the first phase, the articles were classified, and relevant quantitative and qualitative data were analysed. Results showed that three clear strands of research exist: Digital Libraries, Networked Scholarship and Digital Humanities. Moreover, researchers involved in this area tackle the problems related to technological uptake in the scholar's profession from different points of view, and define the field in different – often complementary – ways, thus generating the perception of a research area still in need of a unifying vision. In the second phase, the authors searched for evidence of the disciplinary contributions and interdisciplinary cohesion of research carried out in this area through the use of bibliometric maps. Results suggest that the area of Digital Scholarship, still in its infancy, is advancing in a rather fragmented way, shaping itself around the above-mentioned strands, each with its own research agenda. However, results from the cross-citation analysis suggest that the Networked Scholarship strand is more cohesive than the others in terms of cross-citations.
Abstract:
Information and Communication Technology (ICT) has become a necessary element of academic practice in higher education today. Under normal circumstances, PhD students from all disciplines have to use ICT in some form throughout the process of their research, including the preparation, fieldwork, analysis and writing phases of their studies. Nevertheless, there has been little research to date that explores PhD students’ first-hand experiences of using various ICT to support their research practices. This paper brings together the findings and key points from a review of significant parts of the existing literature on the role played by ICT in the processes PhD students use in doctoral research. The review is based on 27 papers appearing in international peer-reviewed journals published from 2005 to 2014. The study seeks to address an under-researched area in the current literature: how ICT plays a role in the processes of doctoral research. While there are many contributions taking the ‘institutional’ or ‘teaching’ perspectives, papers focusing on the ‘student’ perspective, or the viewpoint of engaging ICT in the daily study routine, are relatively few. As far as research methodology is concerned, this review found that most of the papers examined were based on perception data such as surveys or interviews, while actual practice data were rarely present. With their ready access to technologies, PhD students are well positioned to take advantage of a range of technologies in order to carry out their research efficiently (in terms of means to an end) and effectively (in terms of reaching goals within a task). This review reveals that in the literature, this important area is under-represented.
Abstract:
The project addresses the following central research question: ‘How would a moral duty of patients to transfer (health) data for the benefit of health care improvement, research, and public health in the eHealth sector sit within the existing confidentiality, privacy, and data protection legislation?’. The improvement of healthcare services, research, and public health relies on patient data, which raises the question of a potential moral responsibility of patients to transfer data concerning health. Such a responsibility would logically have consequences for care providers concerning the onward transfer of health data to other healthcare providers, researchers, and other organisations (who may in turn transfer the data further). Otherwise, the purpose of the patients’ moral duty, i.e. to improve the care system and research, would be undermined. Despite the arguments that may exist in favour of a moral responsibility of patients to share health-related data, there are also some moral hurdles that come with such a responsibility. Furthermore, the existing European and national confidentiality, privacy and data protection legislation appears to hamper such a possible moral duty, and it may need to be reconsidered to unlock the full use of data for healthcare and research.
Abstract:
Big data and AI are paving the way to promising scenarios in clinical practice and research. However, the use of such technologies might clash with GDPR requirements. Today, two forces are driving EU policies in this domain. The first is the necessity to protect individuals’ safety and fundamental rights. The second is to incentivize the deployment of innovative technologies. The first objective is pursued by legislative acts such as the GDPR or the AIA; the second is supported by the new data strategy recently launched by the European Commission. Against this background, the thesis analyses the issue of GDPR compliance when big data and AI systems are implemented in the health domain. The thesis focuses on the use of co-regulatory tools for compliance with the GDPR. This work argues that there are two levels of co-regulation in the EU legal system. The first, more general, is the approach pursued by the EU legislator when shaping legislative measures that deal with fast-evolving technologies. The GDPR can be deemed a co-regulatory solution since it mainly introduces general requirements, whose implementation must then be interpreted by the addressees of the law following a risk-based approach. This approach, although useful, is costly and sometimes burdensome for organisations. The second co-regulatory level is represented by specific co-regulatory tools, such as codes of conduct and certification mechanisms. These tools are meant to guide and support the interpretive effort of the addressees of the law. The thesis argues that the lack of co-regulatory tools meant to implement data protection law in specific situations could be an obstacle to the deployment of innovative solutions in complex scenarios such as the health ecosystem. The thesis advances theoretical hypotheses about the reasons for this lack of co-regulatory solutions.
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus/AIDS (HIV-AIDS) study and several simulation studies.
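The damped exponential correlation (DEC) structure mentioned in this abstract can be sketched in a few lines; this is a generic illustration of the DEC form r_ij = phi1^(|t_i - t_j|^phi2), not the authors' implementation, and the visit times and parameter values below are invented for the example.

```python
import numpy as np

def damped_exponential_corr(times, phi1, phi2):
    """Damped exponential correlation (DEC) matrix for irregular times.

    r_ij = phi1 ** (|t_i - t_j| ** phi2), with 0 < phi1 < 1 and phi2 >= 0.
    phi2 = 1 recovers the continuous-time AR(1) structure, while larger
    phi2 damps the correlation faster as the time lag grows.
    """
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])   # pairwise |t_i - t_j|
    corr = phi1 ** (lags ** phi2)
    np.fill_diagonal(corr, 1.0)              # unit correlation with itself
    return corr

# Three irregularly spaced visits (hypothetical times, in months)
R = damped_exponential_corr([0.0, 1.0, 3.5], phi1=0.8, phi2=1.0)
```

With phi2 = 1, adjacent visits one month apart correlate at phi1 = 0.8, while visits 3.5 months apart correlate at 0.8 ** 3.5, illustrating how the structure handles irregular spacing.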
Abstract:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one, quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, these are deemed inappropriate due to the presence of zeros and/or ones. To circumvent this, we introduce a class of general proportion densities, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.
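A minimal sketch of the zero-and-one augmentation idea described above, using a beta density as a stand-in for the authors' general proportion density (the mixture weights and beta parameters below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def sample_zoab(n, p0, p1, a, b, rng):
    """Draw from a zero-and-one-augmented beta mixture.

    With probability p0 the response is exactly 0, with probability p1
    it is exactly 1, and otherwise it follows Beta(a, b) on (0, 1).
    Requires p0 + p1 < 1.
    """
    u = rng.random(n)
    y = rng.beta(a, b, size=n)     # continuous component on (0, 1)
    y[u < p0] = 0.0                # point mass at zero
    y[u >= 1.0 - p1] = 1.0         # point mass at one
    return y

rng = np.random.default_rng(42)
y = sample_zoab(10_000, p0=0.15, p1=0.05, a=2.0, b=5.0, rng=rng)
```

The point masses at 0 and 1 capture the disease-free and highly diseased subjects that a plain beta regression cannot accommodate.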
Abstract:
The aim of this study was to analyze the reasons for missed appointments in dental Family Health Units (FHU) and implement strategies to reduce them through action research. This is a study conducted in 12 FHUs in Piracicaba, in the State of São Paulo, from January 1 to December 31, 2010. The sample was composed of 385 users of these health units, who were interviewed over the phone and asked about the reasons for missing dental appointments, as well as 12 dentists and 12 nurses. Two workshops were staged with professionals: the first to assess the data collected in interviews and develop strategy, and the second for evaluation after 4 months. The primary cause for missed appointments was the opening hours of the units coinciding with the work schedule of the users. Among the strategies suggested were lectures on oral health, ongoing education in team meetings, training of Community Health Agents, participation in therapeutic groups and partnerships between Oral Health Teams and the social infrastructure of the community. The adoption of the single medical record was the strategy proposed by professionals. The strategies implemented led to a 66.6% reduction in missed appointments at the units, and the motivating nature of the workshops elicited critical reflection to redirect health practices.
Abstract:
The syndrome of resistance to thyroid hormone (RTH β) is an inherited disorder characterized by variable tissue hyposensitivity to 3,5,3′-l-triiodothyronine (T3), with persistent elevation of free-circulating T3 (FT3) and free thyroxine (FT4) levels in association with nonsuppressed serum thyrotropin (TSH). Clinical presentation is variable, and molecular analysis of the THRB gene provides a shortcut to diagnosis. Here, we describe 2 cases in which RTH β was suspected on the basis of laboratory findings. The diagnosis was confirmed by direct THRB sequencing, which revealed 2 novel mutations: the heterozygous p.Ala317Ser in subject 1 and the heterozygous p.Arg438Pro in subject 2. Both mutations were shown to be deleterious by the SIFT, PolyPhen, and Align GV-GD predictive methods.
Abstract:
Advances in diagnostic research are moving towards methods whereby periodontal risk can be identified and quantified by objective measures using biomarkers. Patients with periodontitis may have elevated circulating levels of specific inflammatory markers that can be correlated to the severity of the disease. The purpose of this study was to evaluate whether serum levels of inflammatory biomarkers are differentially expressed in healthy and periodontitis patients. Twenty-five patients (8 healthy patients and 17 chronic periodontitis patients) were enrolled in the study. A 15 mL blood sample was used for identification of the inflammatory markers, with a human inflammatory flow cytometry multiplex assay. Among 24 assessed cytokines, only 3 (RANTES, MIG and Eotaxin) were statistically different between groups (p<0.05). In conclusion, some of the selected markers of inflammation are differentially expressed in healthy and periodontitis patients. Cytokine profile analysis may be further explored to distinguish periodontitis patients from those free of disease, and also to be used as a measure of risk. The present data, however, are limited, and studies with larger sample sizes are required to validate the findings for the specific biomarkers.
Abstract:
Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case with gene expression data, where the equipment and tools currently used frequently produce noisy data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
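One common distance-based noise filter, in the spirit of the pre-processing techniques this abstract refers to (though not necessarily the exact techniques the paper investigates), flags instances whose class label disagrees with the majority of their k nearest neighbours. The toy "expression profiles" below are invented for illustration:

```python
import numpy as np

def flag_noisy_instances(X, y, k=3):
    """Flag instances whose label disagrees with the majority of their
    k nearest neighbours (Euclidean distance) -- a simple distance-based
    noise filter in the spirit of Wilson's edited nearest-neighbour rule."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    # pairwise Euclidean distances between all instances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)        # an instance is not its own neighbour
    noisy = np.zeros(n, dtype=bool)
    for i in range(n):
        nn = np.argsort(d[i])[:k]      # indices of the k nearest neighbours
        agreement = np.mean(y[nn] == y[i])
        noisy[i] = agreement < 0.5     # minority label among neighbours
    return noisy

# Toy data: two tight clusters; sample 3 carries the wrong label
X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1, 1])    # index 3 is mislabelled
noisy = flag_noisy_instances(X, y, k=3)
```

Instances flagged this way would be removed from the training set before fitting the classifiers.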
Abstract:
Information on fruit and vegetable consumption in Brazil at the three levels of dietary data was analyzed and compared. Data on national supply came from Food Balance Sheets compiled by the FAO; household availability information was obtained from the Brazilian National Household Budget Survey (HBS); and actual intake information came from a large individual dietary intake survey representative of the adult population of São Paulo city. All sources of information were collected between 2002 and 2003. A subset of the HBS, representative of São Paulo city, was used in our analysis in order to improve the quality of the comparison with actual intake data. The ratio of national supply to household availability of fruits and vegetables was 2.6, while the ratio of national supply to actual intake was 4.0. The discrepancy ratio in the comparison between household availability and actual intake was smaller, at 1.6. While the use of supply and availability data has advantages, such as lower cost, it must be taken into account that these sources tend to overestimate the actual intake of fruits and vegetables.
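The three reported ratios can be checked for mutual consistency with a couple of lines of arithmetic, since (supply/availability) × (availability/intake) should equal supply/intake up to rounding:

```python
# Discrepancy ratios between the three levels of dietary data, as reported
supply_to_availability = 2.6    # national supply / household availability
supply_to_intake = 4.0          # national supply / actual intake
availability_to_intake = 1.6    # household availability / actual intake

# Chaining the first and third ratios should recover the second:
# (supply/availability) * (availability/intake) = supply/intake
implied_supply_to_intake = supply_to_availability * availability_to_intake
# 2.6 * 1.6 = 4.16, consistent with the reported 4.0 up to rounding
```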
Abstract:
Diagnostic methods have been an important tool in regression analysis to detect anomalies, such as departures from error assumptions and the presence of outliers and influential observations, in fitted models. Assuming censored data, we considered both a classical analysis and a Bayesian analysis assuming noninformative priors for the parameters of a model with a cure fraction. The Bayesian approach used Markov chain Monte Carlo methods with Metropolis-Hastings algorithm steps to obtain the posterior summaries of interest. Influence methods, such as local influence, total local influence of an individual, local influence on predictions, and generalized leverage, were derived, analyzed and discussed for survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that, by removing the most influential observations, the decision about which model best fits the data changes.
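The Metropolis-Hastings step used to obtain posterior summaries can be sketched generically as follows; this is a one-dimensional random-walk example with a standard normal target, not the cure-fraction survival model of the paper:

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_iter=20_000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D log-posterior.

    The Gaussian proposal is symmetric, so the acceptance probability
    reduces to the Metropolis ratio min(1, post(x_new) / post(x))."""
    rng = np.random.default_rng(seed)
    x = x0
    lp = log_post(x)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        x_new = x + step * rng.standard_normal()   # propose a move
        lp_new = log_post(x_new)
        if np.log(rng.random()) < lp_new - lp:     # accept/reject step
            x, lp = x_new, lp_new
        chain[i] = x                               # record current state
    return chain

# Target: standard normal log-density (up to an additive constant)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0)
sample = chain[5_000:]   # discard burn-in draws
```

Posterior summaries (means, credible intervals, q-divergence diagnostics) are then computed from the retained draws.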
Abstract:
Background: Microarray techniques have become an important tool for investigating genetic relationships and assigning different phenotypes. Since microarrays are still very expensive, most experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance, revealing characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The use of systems biology improves biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.
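For series with few sample points, one generic way to attach a confidence level to a dependency measure is a permutation test; the sketch below uses Pearson correlation and illustrates the general idea only, not the paper's own dependency measure (the six-point "expression profiles" are invented):

```python
import numpy as np

def perm_corr_pvalue(x, y, n_perm=10_000, seed=0):
    """Permutation p-value for the Pearson correlation of two short series.

    With few sample points, asymptotic tests are unreliable, so the null
    distribution is built empirically by shuffling one of the series."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_perm):
        r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r) >= abs(r_obs):      # two-sided: as extreme as observed
            count += 1
    # add-one correction keeps the p-value strictly positive
    return r_obs, (count + 1) / (n_perm + 1)

# Two strongly co-varying profiles measured at six time points
r, p = perm_corr_pvalue([1, 2, 3, 4, 5, 6],
                        [1.1, 1.9, 3.2, 3.8, 5.1, 6.0])
```

Edges whose p-value falls below a chosen significance threshold would then be retained in the co-expression subnetwork.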