986 resultados para data disclosure
Resumo:
Background Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive, but efficient manner. Methods In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set have dissimilar expression levels to the seed. First, the genes with less than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is outputted and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty. Conclusions This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
Resumo:
miRDeep and its varieties are widely used to quantify known and novel micro RNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled off miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star
Resumo:
The IEEE Subcommittee on the Application of Probability Methods (APM) published the IEEE Reliability Test System (RTS) [1] in 1979. This system provides a consistent and generally acceptable set of data that can be used both in generation capacity and in composite system reliability evaluation [2,3]. The test system provides a basis for the comparison of results obtained by different people using different methods. Prior to its publication, there was no general agreement on either the system or the data that should be used to demonstrate or test various techniques developed to conduct reliability studies. Development of reliability assessment techniques and programs are very dependent on the intent behind the development as the experience of one power utility with their system may be quite different from that of another utility. The development and the utilization of a reliability program are, therefore, greatly influenced by the experience of a utlity and the intent of the system manager, planner and designer conducting the reliability studies. The IEEE-RTS has proved to be extremely valuable in highlighting and comparing the capabilities (or incapabilities) of programs used in reliability studies, the differences in the perception of various power utilities and the differences in the solution techniques. The IEEE-RTS contains a reasonably large power network which can be difficult to use for initial studies in an educational environment.
Resumo:
The IEEE Reliability Test System (RTS) developed by the Application of Probability Method Subcommittee has been used to compare and test a wide range of generating capacity and composite system evaluation techniques and subsequent digital computer programs. A basic reliability test system is presented which has evolved from the reliability education and research programs conducted by the Power System Research Group at the University of Saskatchewan. The basic system data necessary for adequacy evaluation at the generation and composite generation and transmission system levels are presented together with the fundamental data required to conduct reliability-cost/reliability-worth evaluation
Resumo:
Rationale: The Australasian Nutrition Care Day Survey (ANCDS) evaluated if malnutrition and decreased food intake are independent risk factors for negative outcomes in hospitalised patients. Methods: A multicentre (56 hospitals) cross-sectional survey was conducted in two phases. Phase 1 evaluated nutritional status (defined by Subjective Global Assessment) and 24-hour food intake recorded as 0, 25, 50, 75, and 100% intake. Phase 2 data, which included length of stay (LOS), readmissions and mortality, were collected 90 days post-Phase 1. Logistic regression was used to control for confounders: age, gender, disease type and severity (using Patient Clinical Complexity Level scores). Results: Of 3122 participants (53% males, mean age: 65±18 years) 32% were malnourished and 23% consumed�25% of the offered food. Median LOS for malnourished (MN) patients was higher than well-nourished (WN) patients (15 vs. 10 days, p<0.0001). Median LOS for patients consuming �25% of the food was higher than those consuming �50% (13 vs. 11 days, p<0.0001). MN patients had higher readmission rates (36% vs. 30%, p = 0.001). The odds ratios of 90-day in-hospital mortality were 1.8 times greater for MN patients (CI: 1.03 3.22, p = 0.04) and 2.7 times greater for those consuming �25% of the offered food (CI: 1.54 4.68, p = 0.001). Conclusion: The ANCDS demonstrates that malnutrition and/or decreased food intake are associated with longer LOS and readmissions. The survey also establishes that malnutrition and decreased food intake are independent risk factors for in-hospital mortality in acute care patients; and highlights the need for appropriate nutritional screening and support during hospitalisation. Disclosure of Interest: None Declared.
Resumo:
Buildings are key mediators between human activity and the environment around them, but details of energy usage and activity in buildings is often poorly communicated and understood. ECOS is an Eco-Visualization project that aims to contextualize the energy generation and consumption of a green building in a variety of different climates. The ECOS project is being developed for a large public interactive space installed in the new Science and Engineering Centre of the Queensland University of Technology that is dedicated to delivering interactive science education content to the public. This paper focuses on how design can develop ICT solutions from large data sets to create meaningful engagement with environmental data.
Resumo:
QUT’s new metadata repository (data registry), Research Data Finder, has been designed to promote the visibility and discoverability of QUT research datasets. Funded by the Australian National Data Service (ANDS), it will provide a qualitative snapshot of research data outputs created or collected by members of the QUT research community that are available via open or mediated access. As a fully integrated metadata repository Research Data Finder aligns with institutional sources of truth, such as QUT’s research administrative system, ResearchMaster, as well as QUT’s Academic Profiles system to provide high quality data descriptions that increase awareness of, and access to, shareable research data. In addition, the repository and its workflows are designed to foster smoother data management practices, enhance opportunities for collaboration and research, promote cross-disciplinary research and maximize existing research datasets. The metadata schema used in Research Data Finder is the Registry Interchange Format - Collections and Services (RIF-CS), developed by ANDS in 2009. This comprehensive schema is potentially complex for researchers; unlike metadata for publications, which are often made publicly available with the official publication, metadata for datasets are not typically available and need to be created. Research Data Finder uses a hybrid self-deposit and mediated deposit system. In addition to automated ingests from ResearchMaster (research project information) and Academic Profiles system (researcher information), shareable data is identified at a number of key “trigger points” in the research cycle. These include: research grant proposals; ethics applications; Data Management Plans; Liaison Librarian data interviews; and thesis submissions. These ingested records can be supplemented with related metadata including links to related publications, such as those in QUT ePrints. Records deposited in Research Data Finder are harvested by ANDS and made available to a national and international audience via Research Data Australia, ANDS’ discovery service for Australian research data. Researcher and research group metadata records are also harvested by the National Library of Australia (NLA) and these records are then published in Trove (the NLA’s digital information portal). By contributing records to the national infrastructure, QUT data will become more visible. Within Australia and internationally, many funding bodies have already mandated the open access of publications produced from publicly funded research projects, such as those supported by the Australian Research Council (ARC), or the National Health and Medical Research Council (NHMRC). QUT will be well placed to respond to the rapidly evolving climate of research data management. This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund (EIF) Super Science Initiative.
Resumo:
This paper describes the use of property graphs for mapping data between AEC software tools, which are not linked by common data formats and/or other interoperability measures. The intention of introducing this in practice, education and research is to facilitate the use of diverse, non-integrated design and analysis applications by a variety of users who need to create customised digital workflows, including those who are not expert programmers. Data model types are examined by way of supporting the choice of directed, attributed, multi-relational graphs for such data transformation tasks. A brief exemplar design scenario is also presented to illustrate the concepts and methods proposed, and conclusions are drawn regarding the feasibility of this approach and directions for further research.
Resumo:
Climate change and land use pressures are making environmental monitoring increasingly important. As environmental health is degrading at an alarming rate, ecologists have tried to tackle the problem by monitoring the composition and condition of environment. However, traditional monitoring methods using experts are manual and expensive; to address this issue government organisations designed a simpler and faster surrogate-based assessment technique for consultants, landholders and ordinary citizens. However, it remains complex, subjective and error prone. This makes collected data difficult to interpret and compare. In this paper we describe a work-in-progress mobile application designed to address these shortcomings through the use of augmented reality and multimedia smartphone technology.
Resumo:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help data analysis, the reliability of contributed data varies a lot. Data reliability issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. Participants may submit low quality, misleading, inaccurate, or even malicious data. Therefore, finding a way to improve the data reliability has become an urgent demand. This study aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects especially for acoustic sensing projects. In particular, we propose to design a reputation framework to enhance data reliability and also investigate some critical elements that should be aware of during developing and designing new reputation systems.
Resumo:
Open the sports or business section of your daily newspaper, and you are immediately bombarded with an array of graphs, tables, diagrams, and statistical reports that require interpretation. Across all walks of life, the need to understand statistics is fundamental. Given that our youngsters’ future world will be increasingly data laden, scaffolding their statistical understanding and reasoning is imperative, from the early grades on. The National Council of Teachers of Mathematics (NCTM) continues to emphasize the importance of early statistical learning; data analysis and probability was the Council’s professional development “Focus of the Year” for 2007–2008. We need such a focus, especially given the results of the statistics items from the 2003 NAEP. As Shaughnessy (2007) noted, students’ performance was weak on more complex items involving interpretation or application of items of information in graphs and tables. Furthermore, little or no gains were made between the 2000 NAEP and the 2003 NAEP studies. One approach I have taken to promote young children’s statistical reasoning is through data modeling. Having implemented in grades 3 –9 a number of model-eliciting activities involving working with data (e.g., English 2010), I observed how competently children could create their own mathematical ideas and representations—before being instructed how to do so. I thus wished to introduce data-modeling activities to younger children, confi dent that they would likewise generate their own mathematics. I recently implemented data-modeling activities in a cohort of three first-grade classrooms of six year- olds. I report on some of the children’s responses and discuss the components of data modeling the children engaged in.
Resumo:
From human biomonitoring data that are increasingly collected in the United States, Australia, and in other countries from large-scale field studies, we obtain snap-shots of concentration levels of various persistent organic pollutants (POPs) within a cross section of the population at different times. Not only can we observe the trends within this population with time, but we can also gain information going beyond the obvious time trends. By combining the biomonitoring data with pharmacokinetic modeling, we can re-construct the time-variant exposure to individual POPs, determine their intrinsic elimination half-lives in the human body, and predict future levels of POPs in the population. Different approaches have been employed to extract information from human biomonitoring data. Pharmacokinetic (PK) models were combined with longitudinal data1, with single2 or multiple3 average concentrations of a cross-sectional data (CSD), or finally with multiple CSD with or without empirical exposure data4. In the latter study, for the first time, the authors based their modeling outputs on two sets of CSD and empirical exposure data, which made it possible that their model outputs were further constrained due to the extensive body of empirical measurements. Here we use a PK model to analyze recent levels of PBDE concentrations measured in the Australian population. In this study, we are able to base our model results on four sets5-7 of CSD; we focus on two PBDE congeners that have been shown3,5,8-9 to differ in intake rates and half-lives with BDE-47 being associated with high intake rates and a short half-life and BDE-153 with lower intake rates and a longer half-life. By fitting the model to PBDE levels measured in different age groups in different years, we determine the level of intake of BDE-47 and BDE-153, as well as the half-lives of these two chemicals in the Australian population.
Resumo:
Driving on an approach to a signalized intersection while distracted is particularly dangerous, as potential vehicular conflicts and resulting angle collisions tend to be severe. Given the prevalence and importance of this particular scenario, the decisions and actions of distracted drivers during the onset of yellow lights are the focus of this study. Driving simulator data were obtained from a sample of 58 drivers under baseline and handheld mobile phone conditions at the University of Iowa - National Advanced Driving Simulator. Explanatory variables included age, gender, cell phone use, distance to stop-line, and speed. Although there is extensive research on drivers’ responses to yellow traffic signals, the examination has been conducted from a traditional regression-based approach, which does not necessary provide the underlying relations and patterns among the sampled data. In this paper, we exploit the benefits of both classical statistical inference and data mining techniques to identify the a priori relationships among main effects, non-linearities, and interaction effects. Results suggest that novice (16-17 years) and young drivers’ (18-25 years) have heightened yellow light running risk while distracted by a cell phone conversation. Driver experience captured by age has a multiplicative effect with distraction, making the combined effect of being inexperienced and distracted particularly risky. Overall, distracted drivers across most tested groups tend to reduce the propensity of yellow light running as the distance to stop line increases, exhibiting risk compensation on a critical driving situation.
Resumo:
A routine activity for a sports dietitian is to estimate energy and nutrient intake from an athlete's self-reported food intake. Decisions made by the dietitian when coding a food record are a source of variability in the data. The aim of the present study was to determine the variability in estimation of the daily energy and key nutrient intakes of elite athletes, when experienced coders analyzed the same food record using the same database and software package. Seven-day food records from a dietary survey of athletes in the 1996 Australian Olympic team were randomly selected to provide 13 sets of records, each set representing the self-reported food intake of an endurance, team, weight restricted, and sprint/power athlete. Each set was coded by 3-5 members of Sports Dietitians Australia, making a total of 52 athletes, 53 dietitians, and 1456 athlete-days of data. We estimated within- and between- athlete and dietitian variances for each dietary nutrient using mixed modeling, and we combined the variances to express variability as a coefficient of variation (typical variation as a percent of the mean). Variability in the mean of 7-day estimates of a nutrient was 2- to 3-fold less than that of a single day. The variability contributed by the coder was less than the true athlete variability for a 1-day record but was of similar magnitude for a 7-day record. The most variable nutrients (e.g., vitamin C, vitamin A, cholesterol) had approximately 3-fold more variability than least variable nutrients (e.g., energy, carbohydrate, magnesium). These athlete and coder variabilities need to be taken into account in dietary assessment of athletes for counseling and research.
Resumo:
This paper describes an innovative platform that facilitates the collection of objective safety data around occurrences at railway level crossings using data sources including forward-facing video, telemetry from trains and geo-referenced asset and survey data. This platform is being developed with support by the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper provides a description of the underlying accident causation model, the development methodology and refinement process as well as a description of the data collection platform. The paper concludes with a brief discussion of benefits this project is expected to provide the Australian rail industry.