993 resultados para data summarization
Resumo:
Developing and maintaining a successful institutional repository for research publications requires a considerable investment by the institution. Most of the money is spent on developing the skill-sets of existing staff or hiring new staff with the necessary skills. The return on this investment can be magnified by using this valuable infrastructure to curate collections of other materials such as learning objects, student work, conference proceedings and institutional or local community heritage materials. When Queensland University of Technology (QUT) implemented its repository for research publications (QUT ePrints) over 11 years ago, it was one of the first institutional repositories to be established in Australia. Currently, the repository holds over 29,000 open access research publications and the cumulative total number of full-text downloads for these document now exceeds 16 million. The full-text deposit rate for recently-published peer reviewed papers (currently over 74%) shows how well the repository has been embraced by QUT researchers. The success of QUT ePrints has resulted in requests to accommodate a plethora of materials which are ‘out of scope’ for this repository. QUT Library saw this as an opportunity to use its repository infrastructure (software, technical know-how and policies) to develop and implement a metadata repository for its research datasets (QUT Research Data Finder), a repository for research-related software (QUT Software Finder) and to curate a number of digital collections of institutional and local community heritage materials (QUT Digital Collections). This poster describes the repositories and digital collections curated by QUT Library and outlines the value delivered to the institution, and the wider community, by these initiatives.
Resumo:
This cross disciplinary study was conducted as two research and development projects. The outcome is a multimodal and dynamic chronicle, which incorporates the tracking of spatial, temporal and visual elements of performative practice-led and design-led research journeys. The distilled model provides a strong new approach to demonstrate rigour in non-traditional research outputs including provenance and an 'augmented web of facticity'.
Resumo:
On 19 June 2015, representatives from over 40 Australian research institutions gathered in Canberra to launch their Open Data Collections. The one day event, hosted by the Australian National Data Service (ANDS), showcased to government and a range of national stakeholders the rich variety of data collections that have been generated through the Major Open Data Collections (MODC) project. Colin Eustace attended the showcase for QUT Library and presented a poster that reflected the work that he and Jodie Vaughan generated through the project. QUT’s Blueprint 4, the University’s five-year institutional strategic plan, outlines the key priorities of developing a commitment to working in partnership with industry, as well as combining disciplinary strengths with interdisciplinary application. The Division of Technology, Information and Learning Support (TILS) has undertaken a number of Australian National Data Service (ANDS) funded projects since 2009 with the aim of developing improved research data management services within the University to support these strategic aims. By leveraging existing tools and systems developed during these projects, the Major Open Data Collection (MODC) project delivered support to multi-disciplinary collaborative research activities through partnership building between QUT researchers and Queensland government agencies, in order to add to and promote the discovery and reuse of a collection of spatially referenced datasets. The MODC project built upon existing Research Data Finder infrastructure (which uses VIVO open source software, developed by Cornell University) to develop a separate collection, Spatial Data Finder (https://researchdatafinder.qut.edu.au/spatial) as the interface to display the spatial data collection. During the course of the project, 62 dataset descriptions were added to Spatial Data Finder, 7 added to Research Data Finder and two added to Software Finder, another separate collection. The project team met with 116 individual researchers and attended 13 school and faculty meetings to promote the MODC project and raise awareness of the Library’s services and resources for research data management.
Resumo:
This paper analyzes the limitations upon the amount of in- domain (NIST SREs) data required for training a probabilistic linear discriminant analysis (PLDA) speaker verification system based on out-domain (Switchboard) total variability subspaces. By limiting the number of speakers, the number of sessions per speaker and the length of active speech per session available in the target domain for PLDA training, we investigated the relative effect of these three parameters on PLDA speaker verification performance in the NIST 2008 and NIST 2010 speaker recognition evaluation datasets. Experimental results indicate that while these parameters depend highly on each other, to beat out-domain PLDA training, more than 10 seconds of active speech should be available for at least 4 sessions/speaker for a minimum of 800 speakers. If further data is available, considerable improvement can be made over solely out-domain PLDA training.
Resumo:
It is important to develop reliable finite element models for real structures not only in the design phase but also for the structural health monitoring and structural maintenance purposes. This paper describes the experience of the authors in using ambient vibration model identification techniques together with model updating tools to develop reliable finite element models of real civil engineering structures. Case studies of two real structures are presented in this paper. One is a 10 storey concrete building which is considered as a non-slender structure with complex boundary conditions. The other is a single span concrete foot bridge which is also a relatively inflexible planar structure with complex boundary conditions. Both structures are located at the Queensland University of Technology (QUT) and equipped with continuous structural health monitoring systems.
Resumo:
This paper proposes the addition of a weighted median Fisher discriminator (WMFD) projection prior to length-normalised Gaussian probabilistic linear discriminant analysis (GPLDA) modelling in order to compensate the additional session variation. In limited microphone data conditions, a linear-weighted approach is introduced to increase the influence of microphone speech dataset. The linear-weighted WMFD-projected GPLDA system shows improvements in EER and DCF values over the pooled LDA- and WMFD-projected GPLDA systems in inter-view-interview condition as WMFD projection extracts more speaker discriminant information with limited number of sessions/ speaker data, and linear-weighted GPLDA approach estimates reliable model parameters with limited microphone data.
Resumo:
Objectives Demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understand structure in missing data. Setting Data taken from employees at three different industry sites in Australia. Participants 7915 observations were included. Materials and Methods The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests, and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the Type of data (medical or environmental), the site in which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusion Researchers are encouraged to use CART and BRT models to explore and understand missing data.
Resumo:
This article examines a series of controversies within the life sciences over data sharing. Part 1 focuses upon the agricultural biotechnology firm Syngenta publishing data on the rice genome in the journal Science, and considers proposals to reform scientific publishing and funding to encourage data sharing. Part 2 examines the relationship between intellectual property rights and scientific publishing, in particular copyright protection of databases, and evaluates the declaration of the Human Genome Organisation that genomic databases should be global public goods. Part 3 looks at varying opinions on the information function of patent law, and then considers the proposals of Patrinos and Drell to provide incentives for private corporations to release data into the public domain.
Resumo:
Combining datasets across independent studies can boost statistical power by increasing the numbers of observations and can achieve more accurate estimates of effect sizes. This is especially important for genetic studies where a large number of observations are required to obtain sufficient power to detect and replicate genetic effects. There is a need to develop and evaluate methods for joint-analytical analyses of rich datasets collected in imaging genetics studies. The ENIGMA-DTI consortium is developing and evaluating approaches for obtaining pooled estimates of heritability through meta-and mega-genetic analytical approaches, to estimate the general additive genetic contributions to the intersubject variance in fractional anisotropy (FA) measured from diffusion tensor imaging (DTI). We used the ENIGMA-DTI data harmonization protocol for uniform processing of DTI data from multiple sites. We evaluated this protocol in five family-based cohorts providing data from a total of 2248 children and adults (ages: 9-85) collected with various imaging protocols. We used the imaging genetics analysis tool, SOLAR-Eclipse, to combine twin and family data from Dutch, Australian and Mexican-American cohorts into one large "mega-family". We showed that heritability estimates may vary from one cohort to another. We used two meta-analytical (the sample-size and standard-error weighted) approaches and a mega-genetic analysis to calculate heritability estimates across-population. We performed leave-one-out analysis of the joint estimates of heritability, removing a different cohort each time to understand the estimate variability. Overall, meta- and mega-genetic analyses of heritability produced robust estimates of heritability.
Resumo:
Heritability of brain anatomical connectivity has been studied with diffusion-weighted imaging (DWI) mainly by modeling each voxel's diffusion pattern as a tensor (e.g., to compute fractional anisotropy), but this method cannot accurately represent the many crossing connections present in the brain. We hypothesized that different brain networks (i.e., their component fibers) might have different heritability and we investigated brain connectivity using High Angular Resolution Diffusion Imaging (HARDI) in a cohort of twins comprising 328 subjects that included 70 pairs of monozygotic and 91 pairs of dizygotic twins. Water diffusion was modeled in each voxel with a Fiber Orientation Distribution (FOD) function to study heritability for multiple fiber orientations in each voxel. Precision was estimated in a test-retest experiment on a sub-cohort of 39 subjects. This was taken into account when computing heritability of FOD peaks using an ACE model on the monozygotic and dizygotic twins. Our results confirmed the overall heritability of the major white matter tracts but also identified differences in heritability between connectivity networks. Inter-hemispheric connections tended to be more heritable than intra-hemispheric and cortico-spinal connections. The highly heritable tracts were found to connect particular cortical regions, such as medial frontal cortices, postcentral, paracentral gyri, and the right hippocampus.
Resumo:
The Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium is a collaborative network of researchers working together on a range of large-scale studies that integrate data from 70 institutions worldwide. Organized into Working Groups that tackle questions in neuroscience, genetics, and medicine, ENIGMA studies have analyzed neuroimaging data from over 12,826 subjects. In addition, data from 12,171 individuals were provided by the CHARGE consortium for replication of findings, in a total of 24,997 subjects. By meta-analyzing results from many sites, ENIGMA has detected factors that affect the brain that no individual site could detect on its own, and that require larger numbers of subjects than any individual neuroimaging study has currently collected. ENIGMA's first project was a genome-wide association study identifying common variants in the genome associated with hippocampal volume or intracranial volume. Continuing work is exploring genetic associations with subcortical volumes (ENIGMA2) and white matter microstructure (ENIGMA-DTI). Working groups also focus on understanding how schizophrenia, bipolar illness, major depression and attention deficit/hyperactivity disorder (ADHD) affect the brain. We review the current progress of the ENIGMA Consortium, along with challenges and unexpected discoveries made on the way.
Resumo:
The reliance on police data for the counting of road crash injuries can be problematic, as it is well known that not all road crash injuries are reported to police which under-estimates the overall burden of road crash injuries. The aim of this study was to use multiple linked data sources to estimate the extent of under-reporting of road crash injuries to police in the Australian state of Queensland. Data from the Queensland Road Crash Database (QRCD), the Queensland Hospital Admitted Patients Data Collection (QHAPDC), Emergency Department Information System (EDIS), and the Queensland Injury Surveillance Unit (QISU) for the year 2009 were linked. The completeness of road crash cases reported to police was examined via discordance rates between the police data (QRCD) and the hospital data collections. In addition, the potential bias of this discordance (under-reporting) was assessed based on gender, age, road user group, and regional location. Results showed that the level of under-reporting varied depending on the data set with which the police data was compared. When all hospital data collections are examined together the estimated population of road crash injuries was approximately 28,000, with around two-thirds not linking to any record in the police data. The results also showed that the under-reporting was more likely for motorcyclists, cyclists, males, young people, and injuries occurring in Remote and Inner Regional areas. These results have important implications for road safety research and policy in terms of: prioritising funding and resources; targeting road safety interventions into areas of higher risk; and estimating the burden of road crash injuries.
Resumo:
In recent years, rapid advances in information technology have led to various data collection systems which are enriching the sources of empirical data for use in transport systems. Currently, traffic data are collected through various sensors including loop detectors, probe vehicles, cell-phones, Bluetooth, video cameras, remote sensing and public transport smart cards. It has been argued that combining the complementary information from multiple sources will generally result in better accuracy, increased robustness and reduced ambiguity. Despite the fact that there have been substantial advances in data assimilation techniques to reconstruct and predict the traffic state from multiple data sources, such methods are generally data-driven and do not fully utilize the power of traffic models. Furthermore, the existing methods are still limited to freeway networks and are not yet applicable in the urban context due to the enhanced complexity of the flow behavior. The main traffic phenomena on urban links are generally caused by the boundary conditions at intersections, un-signalized or signalized, at which the switching of the traffic lights and the turning maneuvers of the road users lead to shock-wave phenomena that propagate upstream of the intersections. This paper develops a new model-based methodology to build up a real-time traffic prediction model for arterial corridors using data from multiple sources, particularly from loop detectors and partial observations from Bluetooth and GPS devices.
Resumo:
In public transport, seamless coordinated transfer strengthens the quality of service and attracts ridership. The problem of transfer coordination is sophisticated due to (1) the stochasticity of travel time variability, (2) unavailability of passenger transfer plan. However, the proliferation of Big Data technologies provides a tremendous opportunity to solve these problems. This dissertation enhances passenger transfer quality by offline and online transfer coordination. While offline transfer coordination exploits the knowledge of travel time variability to coordinate transfers, online transfer coordination provides simultaneous vehicle arrivals at stops to facilitate transfers by employing the knowledge of passenger behaviours.
Resumo:
The 3D Water Chemistry Atlas is an intuitive, open source, Web-based system that enables the three-dimensional (3D) sub-surface visualization of ground water monitoring data, overlaid on the local geological model (formation and aquifer strata). This paper firstly describes the results of evaluating existing virtual globe technologies, which led to the decision to use the Cesium open source WebGL Virtual Globe and Map Engine as the underlying platform. Next it describes the backend database and search, filtering, browse and analysis tools that were developed to enable users to interactively explore the groundwater monitoring data and interpret it spatially and temporally relative to the local geological formations and aquifers via the Cesium interface. The result is an integrated 3D visualization system that enables environmental managers and regulators to assess groundwater conditions, identify inconsistencies in the data, manage impacts and risks and make more informed decisions about coal seam gas extraction, waste water extraction, and water reuse.