955 resultados para Data Repository
Resumo:
Social Computing Data Repository hosts data from a collection of many different social media sites, most of which have blogging capacity. Some of the prominent social media sites included in this repository are BlogCatalog, Twitter, MyBlogLog, Digg, StumbleUpon, del.icio.us, MySpace, LiveJournal, The Unofficial Apple Weblog (TUAW), Reddit, etc. The repository contains various facets of blog data including blog site metadata like, user defined tags, predefined categories, blog site description; blog post level metadata like, user defined tags, date and time of posting; blog posts; blog post mood (which is defined as the blogger's emotions when (s)he wrote the blog post); blogger name; blog post comments; and blogger social network.
Resumo:
Trabajo realizado por Antonio Machado Carrillo, Juan Antonio Bermejo e Ignacio Lorenzo
Resumo:
This dissertation established a software-hardware integrated design for a multisite data repository in pediatric epilepsy. A total of 16 institutions formed a consortium for this web-based application. This innovative fully operational web application allows users to upload and retrieve information through a unique human-computer graphical interface that is remotely accessible to all users of the consortium. A solution based on a Linux platform with My-SQL and Personal Home Page scripts (PHP) has been selected. Research was conducted to evaluate mechanisms to electronically transfer diverse datasets from different hospitals and collect the clinical data in concert with their related functional magnetic resonance imaging (fMRI). What was unique in the approach considered is that all pertinent clinical information about patients is synthesized with input from clinical experts into 4 different forms, which were: Clinical, fMRI scoring, Image information, and Neuropsychological data entry forms. A first contribution of this dissertation was in proposing an integrated processing platform that was site and scanner independent in order to uniformly process the varied fMRI datasets and to generate comparative brain activation patterns. The data collection from the consortium complied with the IRB requirements and provides all the safeguards for security and confidentiality requirements. An 1-MR1-based software library was used to perform data processing and statistical analysis to obtain the brain activation maps. Lateralization Index (LI) of healthy control (HC) subjects in contrast to localization-related epilepsy (LRE) subjects were evaluated. Over 110 activation maps were generated, and their respective LIs were computed yielding the following groups: (a) strong right lateralization: (HC=0%, LRE=18%), (b) right lateralization: (HC=2%, LRE=10%), (c) bilateral: (HC=20%, LRE=15%), (d) left lateralization: (HC=42%, LRE=26%), e) strong left lateralization: (HC=36%, LRE=31%). Moreover, nonlinear-multidimensional decision functions were used to seek an optimal separation between typical and atypical brain activations on the basis of the demographics as well as the extent and intensity of these brain activations. The intent was not to seek the highest output measures given the inherent overlap of the data, but rather to assess which of the many dimensions were critical in the overall assessment of typical and atypical language activations with the freedom to select any number of dimensions and impose any degree of complexity in the nonlinearity of the decision space.
Resumo:
Queensland University of Technology (QUT) completed an Australian National Data Service (ANDS) funded “Seeding the Commons Project” to contribute metadata to Research Data Australia. The project employed two Research Data Librarians from October 2009 through to July 2010. Technical support for the project was provided by QUT’s High Performance Computing and Research Support Specialists. ---------- The project identified and described QUT’s category 1 (ARC / NHMRC) research datasets. Metadata for the research datasets was stored in QUT’s Research Data Repository (Architecta Mediaflux). Metadata which was suitable for inclusion in Research Data Australia was made available to the Australian Research Data Commons (ARDC) in RIF-CS format. ---------- Several workflows and processes were developed during the project. 195 data interviews took place in connection with 424 separate research activities which resulted in the identification of 492 datasets. ---------- The project had a high level of technical support from QUT High Performance Computing and Research Support Specialists who developed the Research Data Librarian interface to the data repository that enabled manual entry of interview data and dataset metadata, creation of relationships between repository objects. The Research Data Librarians mapped the QUT metadata repository fields to RIF-CS and an application was created by the HPC and Research Support Specialists to generate RIF-CS files for harvest by the Australian Research Data Commons (ARDC). ---------- This poster will focus on the workflows and processes established for the project including: ---------- • Interview processes and instruments • Data Ingest from existing systems (including mapping to RIF-CS) • Data entry and the Data Librarian interface to Mediaflux • Verification processes • Mapping and creation of RIF-CS for the ARDC
Resumo:
Queensland University of Technology’s Institutional Repository, QUT ePrints (http://eprints.qut.edu.au/), was established in 2003. With the help of an institutional mandate (endorsed in 2004) the repository now holds over 11,000 open access publications. The repository’s success is celebrated within the University and acknowledged nationally and internationally. QUT ePrints was built on GNU EPrints open source repository software (currently running v.3.1.3) and was originally configured to accommodate open access versions of the traditional range of research publications (journal articles, conference papers, books, book chapters and working papers). However, in 2009, the repository’s scope, content and systems were broadened and the ‘QUT Digital repository’ is now a service encompassing a range of digital collections, services and systems. For a work to be accepted in to the institutional repository, at least one of the authors/creators must have a current affiliation with QUT. However, the success of QUT ePrints in terms of its capacity to increase the visibility and accessibility of our researchers' scholarly works resulted in requests to accept digital collections of works which were out of scope. To address this need, a number of parallel digital collections have been developed. These collections include, OZcase, a collection of legal research materials and ‘The Sugar Industry Collection’; a digitsed collection of books and articles on sugar cane production and processing. Additionally, the Library has responded to requests from academics for a service to support the publication of new, and existing, peer reviewed open access journals. A project is currently underway to help a group of senior QUT academics publish a new international peer reviewed journal. The QUT Digital Repository website will be a portal for access to a range of resources to support copyright management. It is likely that it will provide an access point for the institution’s data repository. The data repository, provisionally named the ‘QUT Data Commons’, is currently a work-in-progress. The metadata for some QUT datasets will also be harvested by and discoverable via ‘Research Data Australia’, the dataset discovery service managed by the Australian National Data Service (ANDS). QUT Digital repository will integrate a range of technologies and services related to scholarly communication. This paper will discuss the development of the QUT Digital Repository, its strategic functions, the stakeholders involved and lessons learned.
Resumo:
At QUT research data refers to information that is generated or collected to be used as primary sources in the production of original research results, and which would be required to validate or replicate research findings (Callan, De Vine, & Baker, 2010). Making publicly funded research data discoverable by the broader research community and the public is a key aim of the Australian National Data Service (ANDS). Queensland University of Technology (QUT) has been innovating in this space by undertaking mutually dependant technical and content (metadata) focused projects funded by ANDS. Research Data Librarians identified and described datasets generated from Category 1 funded research at QUT, by interviewing researchers, collecting metadata and fashioning metadata records for upload to the Australian Research Data commons (ARDC) and exposure through the Research Data Australia interface. In parallel to this project, a Research Data Management Service and Metadata hub project were being undertaken by QUT High Performance Computing & Research Support specialists. These projects will collectively store and aggregate QUT’s metadata and research data from multiple repositories and administration systems and contribute metadata directly by OAI-PMH compliant feed to RDA. The pioneering nature of the work has resulted in a collaborative project dynamic where good data management practices and the discoverability and sharing of research data were the shared drivers for all activity. Each project’s development and progress was dependent on feedback from the other. The metadata structure evolved in tandem with the development of the repository and the development of the repository interface responded to meet the needs of the data interview process. The project environment was one of bottom-up collaborative approaches to process and system development which matched top-down strategic alliances crossing organisational boundaries in order to provide the deliverables required by ANDS. This paper showcases the work undertaken at QUT, focusing on the Seeding the Commons project as a case study, and illustrates how the data management projects are interconnected. It describes the processes and systems being established to make QUT research data more visible and the nature of the collaborations between organisational areas required to achieve this. The paper concludes with the Seeding the Commons project outcomes and the contribution this project made to getting more research data ‘out there’.
Resumo:
QUT Library and the High Performance Computing and Research Support (HPC) Team have been collaborating on developing and delivering a range of research support services, including those designed to assist researchers to manage their data. QUT’s Management of Research Data policy has been available since 2010 and is complemented by the Data Management Guidelines and Checklist. QUT has partnered with the Australian Research Data Service (ANDS) on a number of projects including Seeding the Commons, Metadata Hub (with Griffith University) and the Data Capture program. The HPC Team has also been developing the QUT Research Data Repository based on the Architecta Mediaflux system and have run several pilots with faculties. Library and HPC staff have been trained in the principles of research data management and are providing a range of research data management seminars and workshops for researchers and HDR students.
Resumo:
This poster presents key features of how QUT’s integrated research data storage and management services work with researchers through their own individual or team research life cycle. By understanding the characteristics of research data, and the long-term need to store this data, QUT has provided resources and tools that support QUT’s goal of being a research intensive institute. Key to successful delivery and operation has been the focus upon researchers’ individual needs and the collaboration between providers, in particular, Information Technology Services, High Performance Computing and Research Support, and QUT Library. QUT’s Research Data Storage service provides all QUT researchers (staff and Higher Degree Research students (HDRs)) with a secure data repository throughout the research data lifecycle. Three distinct storage areas provide for raw research data to be acquired, project data to be worked on, and published data to be archived. Since the service was launched in late 2014, it has provided research project teams from all QUT faculties with acquisition, working or archival data space. Feedback indicates that the storage suits the unique needs of researchers and their data. As part of the workflow to establish storage space for researchers, Research Support Specialists and Research Data Librarians consult with researchers and HDRs to identify data storage requirements for projects and individual researchers, and to select and implement the most suitable data storage services and facilities. While research can be a journey into the unknown[1], a plan can help navigate through the uncertainty. Intertwined in the storage provision is QUT’s Research Data Management Planning tool. Launched in March 2015, it has already attracted 273 QUT staff and 352 HDR student registrations, and over 620 plans have been created (2/10/2015). Developed in collaboration with Office of Research Ethics and Integrity (OREI), uptake of the plan has exceeded expectations.
Resumo:
Scientific research revolves around the production, analysis, storage, management, and re-use of data. Data sharing offers important benefits for scientific progress and advancement of knowledge. However, several limitations and barriers in the general adoption of data sharing are still in place. Probably the most important challenge is that data sharing is not yet very common among scholars and is not yet seen as a regular activity among scientists, although important efforts are being invested in promoting data sharing. In addition, there is a relatively low commitment of scholars to cite data. The most important problems and challenges regarding data metrics are closely tied to the more general problems related to data sharing. The development of data metrics is dependent on the growth of data sharing practices, after all it is nothing more than the registration of researchers’ behaviour. At the same time, the availability of proper metrics can help researchers to make their data work more visible. This may subsequently act as an incentive for more data sharing and in this way a virtuous circle may be set in motion. This report seeks to further explore the possibilities of metrics for datasets (i.e. the creation of reliable data metrics) and an effective reward system that aligns the main interests of the main stakeholders involved in the process. The report reviews the current literature on data sharing and data metrics. It presents interviews with the main stakeholders on data sharing and data metrics. It also analyses the existing repositories and tools in the field of data sharing that have special relevance for the promotion and development of data metrics. On the basis of these three pillars, the report presents a number of solutions and necessary developments, as well as a set of recommendations regarding data metrics. The most important recommendations include the general adoption of data sharing and data publication among scholars; the development of a reward system for scientists that includes data metrics; reducing the costs of data publication; reducing existing negative cultural perceptions of researchers regarding data publication; developing standards for preservation, publication, identification and citation of datasets; more coordination of data repository initiatives; and further development of interoperability protocols across different actors.
Resumo:
Background: Popular approaches in human tissue-based biomarker discovery include tissue microarrays (TMAs) and DNA Microarrays (DMAs) for protein and gene expression profiling respectively. The data generated by these analytic platforms, together with associated image, clinical and pathological data currently reside on widely different information platforms, making searching and cross-platform analysis difficult. Consequently, there is a strong need to develop a single coherent database capable of correlating all available data types.
Method: This study presents TMAX, a database system to facilitate biomarker discovery tasks. TMAX organises a variety of biomarker discovery-related data into the database. Both TMA and DMA experimental data are integrated in TMAX and connected through common DNA/protein biomarkers. Patient clinical data (including tissue pathological data), computer assisted tissue image and associated analytic data are also included in TMAX to enable the truly high throughput processing of ultra-large digital slides for both TMAs and whole slide tissue digital slides. A comprehensive web front-end was built with embedded XML parser software and predefined SQL queries to enable rapid data exchange in the form of standard XML files.
Results & Conclusion: TMAX represents one of the first attempts to integrate TMA data with public gene expression experiment data. Experiments suggest that TMAX is robust in managing large quantities of data from different sources (clinical, TMA, DMA and image analysis). Its web front-end is user friendly, easy to use, and most importantly allows the rapid and easy data exchange of biomarker discovery related data. In conclusion, TMAX is a robust biomarker discovery data repository and research tool, which opens up the opportunities for biomarker discovery and further integromics research.
Resumo:
OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.
Resumo:
Problem: Dental radiographs generally display one or more findings/diagnoses, and are linked to a unique set of patient demographics, medical history and other findings not represented by the image. However, this information is not associated with radiographs in any type of meta format, and images are not searchable based on any clinical criteria (1,2). The purpose of this pilot study is to create an online, searchable data repository of dental radiographs to be used for patient care, teaching and research. [See PDF for complete abstract]
Resumo:
The oceans play a critical role in the Earth's climate, but unfortunately, the extent of this role is only partially understood. One major obstacle is the difficulty associated with making high-quality, globally distributed observations, a feat that is nearly impossible using only ships and other ocean-based platforms. The data collected by satellite-borne ocean color instruments, however, provide environmental scientists a synoptic look at the productivity and variability of the Earth's oceans and atmosphere, respectively, on high-resolution temporal and spatial scales. Three such instruments, the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) onboard ORBIMAGE's OrbView-2 satellite, and two Moderate Resolution Imaging Spectroradiometers (MODIS) onboard the National Aeronautic and Space Administration's (NASA) Terra and Aqua satellites, have been in continuous operation since September 1997, February 2000, and June 2002, respectively. To facilitate the assembly of a suitably accurate data set for climate research, members of the NASA Sensor Intercomparison and Merger for Biological and Interdisciplinary Oceanic Studies (SIMBIOS) Project and SeaWiFS Project Offices devote significant attention to the calibration and validation of these and other ocean color instruments. This article briefly presents results from the SIMBIOS and SeaWiFS Project Office's (SSPO) satellite ocean color validation activities and describes the SeaWiFS Bio-optical Archive and Storage System (SeaBASS), a state-of-the-art system for archiving, cataloging, and distributing the in situ data used in these activities.
Resumo:
Fluctuations in oxygen (d18O) and carbon (d13C) isotope values of benthic foraminiferal calcite from the tropical Pacific and Southern Oceans indicate rapid reversals in the dominant mode and direction of the thermohaline circulation during a 1 m.y. interval (71-70 Ma) in the Maastrichtian. At the onset of this change, benthic foraminiferal d18O values increased and were highest in low-latitude Pacific Ocean waters, whereas benthic and planktic foraminiferal d13C values decreased and benthic values were lowest in the Southern Ocean. Subsequently, benthic foraminiferal d18O values in the Indo-Pacific decreased, and benthic and planktic d13C values increased globally. These isotopic patterns suggest that cool intermediate-depth waters, derived from high-latitude regions, penetrated temporarily to the tropics. The low benthic d13C values at the Southern Ocean sites, however, suggest that these cool waters may have been derived from high northern rather than high southern latitudes. Correlation with eustatic sea-level curves suggests that sea-level change was the most likely mechanism to change the circulation and/or source(s) of intermediate-depth waters. We thus propose that oceanic circulation during the latest Cretaceous was vigorous and that competing sources of intermediate- and deep-water formation, linked to changes in climate and sea level, may have alternated in importance.
Resumo:
Differences in regional responses to climate fluctuations are well documented on short time scales (e.g., El Niño-Southern Oscillation), but with the exception of latitudinal temperature gradients, regional patterns are seldom considered in discussions of ancient greenhouse climates. Contrary to the expectation of global warming or global cooling implicit in most treatments of climate evolution over millions of years, this paper shows that the North Atlantic warmed by as much as 6°C (1.5% decrease in d18O values of planktic foraminifera) during the Maastrichtian global cooling interval. We suggest that warming was the result of the importation of heat from the South Atlantic. Decreasing North Atlantic d18O values are also associated with increasing gradients in planktic d13C values, suggesting increasing surface-water stratification and a correlated strengthening of the North Atlantic Polar Front. If correct, this conclusion predicts arctic cooling during the late Maastrichtian. Beyond implications for the Maastrichtian, these data demonstrate that climate does not behave as if there is a simple global thermostat, even on geologic time scales.