905 results for "Challenge posed by omics data to compositional analysis: paucity of independent samples (n)"


Relevance: 100.00%

Abstract:

This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset is used in the case study. Covariate values in some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others for solving missing covariate problems, its results can differ markedly from the true values of the missing covariates. This study also shows that, in general, results obtained from the regression method are more accurate than those of the mean imputation method, but at the cost of higher computational expense. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
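A minimal sketch (not the paper's implementation) contrasting mean imputation with a regression-style imputer on synthetic data where values are deliberately discarded, mirroring the study design; the GMM-based method is omitted for brevity.

```python
# Illustrative comparison of mean vs. regression-style imputation.
# Data and the 10% missingness rate are placeholders, not the paper's.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] += 0.8 * X[:, 0]            # correlated covariate, so regression can help
mask = rng.random(X.shape) < 0.1    # deliberately discard ~10% of values
X_missing = X.copy()
X_missing[mask] = np.nan

X_mean = SimpleImputer(strategy="mean").fit_transform(X_missing)
X_reg = IterativeImputer(random_state=0).fit_transform(X_missing)  # regression-based

# Compare reconstruction error on the deliberately removed entries.
print("mean imputation RMSE:", np.sqrt(np.mean((X_mean[mask] - X[mask]) ** 2)))
print("regression imputation RMSE:", np.sqrt(np.mean((X_reg[mask] - X[mask]) ** 2)))
```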

Relevance: 100.00%

Abstract:

Baseline monitoring of groundwater quality aims to characterize the ambient condition of the resource and identify spatial or temporal trends. Sites comprising any baseline monitoring network must be selected to provide a representative perspective of groundwater quality across the aquifer(s) of interest. Hierarchical cluster analysis (HCA) has been used as a means of assessing the representativeness of a groundwater quality monitoring network, using example datasets from New Zealand. HCA allows New Zealand's national and regional monitoring networks to be compared in terms of the number of water-quality categories identified in each network, the hydrochemistry at the centroids of these water-quality categories, the proportions of monitoring sites assigned to each water-quality category, and the range of concentrations for each analyte within each water-quality category. Through the HCA approach, the National Groundwater Monitoring Programme (117 sites) is shown to provide a highly representative perspective of groundwater quality across New Zealand, relative to the amalgamated regional monitoring networks operated by 15 different regional authorities (680 sites have sufficient data for inclusion in HCA). This methodology can be applied to evaluate the representativeness of any subset of monitoring sites taken from a larger network.
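A minimal sketch, on assumed placeholder data, of the kind of HCA workflow described: standardise analyte concentrations, cluster sites with Ward linkage, cut the dendrogram into water-quality categories, and compare how two networks populate those categories. Analytes, sites, and network labels are illustrative, not the New Zealand datasets.

```python
# Hierarchical cluster analysis for network-representativeness checking.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(1)
# Rows = monitoring sites, columns = analytes (e.g. Na, Cl, NO3-N, HCO3).
sites = rng.lognormal(mean=0.0, sigma=1.0, size=(120, 4))
network = rng.choice(["national", "regional"], size=120)   # hypothetical labels

Z = linkage(zscore(np.log(sites)), method="ward")          # Ward linkage, log-scaled data
category = fcluster(Z, t=6, criterion="maxclust")          # cut into 6 water-quality categories

# Representativeness check: proportion of each category's sites that
# come from the national network, alongside the category size.
for c in np.unique(category):
    in_c = category == c
    print(c, (network[in_c] == "national").mean(), in_c.sum())
```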

Relevance: 100.00%

Abstract:

Data mining techniques extract repeated and useful patterns from a large data set, which in turn are utilized to predict the outcome of future events. The main purpose of the research presented in this paper is to investigate data mining strategies and develop an efficient framework for multi-attribute project information analysis to predict the performance of construction projects. The research team first reviewed existing data mining algorithms, applied them to systematically analyze a large project data set collected through a survey, and finally proposed a data-mining-based decision support framework for project performance prediction. To evaluate the potential of the framework, a case study was conducted using data collected from 139 capital projects, analyzing the relationship between the use of information technology and project cost performance. The study results showed that the proposed framework has the potential to promote fast, easy-to-use, interpretable, and accurate project data analysis.
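The framework itself is not specified in the abstract; as one hedged illustration of an interpretable data-mining step, the sketch below fits a shallow decision tree relating a hypothetical IT-use score to cost performance for 139 simulated projects. All features, labels, and data are invented for illustration.

```python
# Interpretable data-mining sketch: decision tree on project attributes.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 139                                   # matches the case-study sample size
it_use = rng.random(n)                    # hypothetical "use of IT" score
size = rng.lognormal(3.0, 1.0, n)         # hypothetical project size ($M)
X = np.column_stack([it_use, size])
y = (it_use + rng.normal(0, 0.3, n) > 0.5).astype(int)  # 1 = good cost performance

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["it_use", "project_size"]))  # readable rules
print("CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```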

Relevance: 100.00%

Abstract:

A Maintenance Test Section Survey (MTSS) was conducted as part of a Peer State Review of the Texas Maintenance Program held October 5–7, 2010. The purpose of the MTSS was to conduct a field review of 34 highway test sections and obtain participants’ opinions about pavement, roadside, and maintenance conditions. The goal was to cross-reference or benchmark TxDOT’s maintenance practices against practices used by selected peer states. Representatives from six peer states (California, Georgia, Kansas, Missouri, North Carolina, and Washington) were invited to Austin to attend a 3-day Peer State Review of TxDOT Maintenance Practices Workshop and to participate in a field survey of a number of pre-selected one-mile roadway sections. It should be emphasized that the objective of the survey was not to evaluate and grade or score TxDOT’s road network, but rather to determine whether the selected roadway sections met acceptable standards of service as perceived by Directors of Maintenance or senior maintenance managers from the peer states...

Relevance: 100.00%

Abstract:

Trivium is a bit-based stream cipher in the final portfolio of the eSTREAM project. In this paper, we apply the algebraic attack approach of Berbain et al. to Trivium-like ciphers and perform new analyses of them. We demonstrate a new algebraic attack on Bivium-A that requires less time and memory than previous techniques to recover Bivium-A's initial state. Although our attacks on Bivium-B, Trivium, and Trivium-N are worse than exhaustive key search, the systems of equations they construct are smaller and less complex than those of previous algebraic analyses. We also answer an open question posed by Berbain et al. on the feasibility of applying their technique to Trivium-like ciphers. Factors which can affect the complexity of our attack on Trivium-like ciphers are discussed in detail. Analyses of Bivium-B and Trivium-N are omitted from this manuscript; the full paper is available on the IACR ePrint Archive.
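The paper's equation systems are not reproduced here; for orientation, below is a minimal sketch of the Trivium state update following the public eSTREAM specification. The three AND terms are the cipher's only nonlinear operations, which is why each keystream bit yields low-degree equations in the state bits that algebraic attacks can exploit. Bivium-A and Bivium-B are reduced two-register variants of this design.

```python
# Trivium keystream generation per the public eSTREAM specification.
# State bits are 1-indexed s1..s288 in the spec; a 0-indexed list is used here.
def trivium_keystream(state, nbits):
    """state: list of 288 bits (after key/IV loading and 4*288 warm-up clocks)."""
    s = list(state)
    out = []
    for _ in range(nbits):
        t1 = s[65] ^ s[92]                  # s66 + s93
        t2 = s[161] ^ s[176]                # s162 + s177
        t3 = s[242] ^ s[287]                # s243 + s288
        out.append(t1 ^ t2 ^ t3)            # keystream bit
        t1 ^= (s[90] & s[91]) ^ s[170]      # s91*s92 + s171 (AND = nonlinearity)
        t2 ^= (s[174] & s[175]) ^ s[263]    # s175*s176 + s264
        t3 ^= (s[285] & s[286]) ^ s[68]     # s286*s287 + s69
        # Shift the three registers (93, 84 and 111 bits) by one position.
        s = [t3] + s[:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return out

# Degenerate all-zero key/IV state with the three constant 1 bits set,
# for illustration only (no warm-up clocks applied).
print(trivium_keystream([0] * 285 + [1, 1, 1], 8))
```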

Relevance: 100.00%

Abstract:

Background: The implementation of the Australian Consumer Law in 2011 highlighted the need for better use of injury data to improve the effectiveness and responsiveness of product safety (PS) initiatives. In the PS system, resources are allocated to different priority issues using risk assessment tools. The rapid exchange of information (RAPEX) tool for prioritising hazards, developed by the European Commission, is currently being adopted in Australia. Injury data is required as a basic input to the RAPEX tool in the risk assessment process. One of the challenges in utilising injury data in the PS system is the complexity of translating detailed clinically coded data into broad categories such as those used in the RAPEX tool.

Aims: This study aims to translate hospital burns data into a simplified format by mapping International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification (ICD-10-AM) burn codes into RAPEX severity rankings, and to use these rankings to identify priority areas in childhood product-related burns data.

Methods: ICD-10-AM burn codes were mapped into four levels of severity using the RAPEX guide table, assigning rankings from 1 to 4 in order of increasing severity. RAPEX rankings were determined by the thickness and surface area of the burn (BSA), with burn thickness extracted from the fourth character of T20-T30 codes and BSA from the fourth and fifth characters of T31 codes. Following the mapping process, secondary analysis of 2008-2010 Queensland Hospital Admitted Patient Data Collection (QHAPDC) paediatric data was conducted to identify priority areas in product-related burns.

Results: Applying RAPEX rankings to QHAPDC burn data showed that approximately 70% of paediatric burns in Queensland hospitals fell under RAPEX levels 1 and 2, 25% under RAPEX levels 3 and 4, and the remaining 5% were unclassifiable. In the PS system, prioritisation is given to issues categorised under RAPEX levels 3 and 4. Analysis of external cause codes within these levels showed that flammable materials (for children aged 10-15 years) and hot substances (for children aged under 2 years) were the most frequently identified products.

Discussion and conclusions: The mapping of ICD-10-AM burn codes into RAPEX rankings showed a favourable degree of compatibility between the two classification systems, suggesting that ICD-10-AM coded burn data can be simplified to more effectively support PS initiatives. The secondary data analysis also showed that only 25% of all admitted burn cases in Queensland were severe enough to trigger a PS response.
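A hypothetical sketch of the mapping step described in the Methods: the actual RAPEX guide-table thresholds are not given in the abstract, so the rankings below are illustrative assumptions, not the study's mapping, and the fifth character of T31 codes is ignored for brevity.

```python
# Illustrative ICD-10-AM burn code -> RAPEX severity ranking (1-4).
def rapex_rank(icd_code):
    """Map a code like 'T24.3' or 'T31.20' to a RAPEX ranking; None = unclassifiable."""
    if icd_code.startswith("T31"):
        # T31 fourth character = decile of body surface area burnt
        # (T31.0 = <10%, T31.1 = 10-19%, ...); larger BSA -> higher rank.
        decile = int(icd_code[4])
        return min(4, 1 + decile)                    # assumed thresholds
    if icd_code[:3] in {f"T{i}" for i in range(20, 31)}:
        # T20-T30 fourth character = burn thickness
        # (.1 erythema, .2 partial thickness, .3 full thickness).
        depth = icd_code[4]
        return {"1": 1, "2": 2, "3": 4}.get(depth)   # assumed mapping; .0 unclassifiable
    return None

print(rapex_rank("T24.3"), rapex_rank("T31.20"), rapex_rank("T23.0"))
```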

Relevance: 100.00%

Abstract:

The method of generalized estimating equations (GEE) is a popular tool for analysing longitudinal (panel) data. Often, the covariates collected are time-dependent in nature, for example, age, relapse status, monthly income. When using GEE to analyse longitudinal data with time-dependent covariates, crucial assumptions about the covariates are necessary for valid inferences to be drawn. When those assumptions do not hold or cannot be verified, Pepe and Anderson (1994, Communications in Statistics, Simulations and Computation 23, 939–951) advocated using an independence working correlation assumption in the GEE model as a robust approach. However, using GEE with the independence correlation assumption may lead to significant efficiency loss (Fitzmaurice, 1995, Biometrics 51, 309–317). In this article, we propose a method that extracts additional information from the estimating equations that are excluded by the independence assumption. The method always includes the estimating equations under the independence assumption and the contribution from the remaining estimating equations is weighted according to the likelihood of each equation being a consistent estimating equation and the information it carries. We apply the method to a longitudinal study of the health of a group of Filipino children.
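As a point of reference, here is a minimal sketch of fitting a GEE under the independence working correlation that Pepe and Anderson advocate, using statsmodels; the weighting scheme proposed in the article is not implemented here, and the data are simulated placeholders loosely styled on the child-health example.

```python
# GEE with the robust independence working correlation (Pepe & Anderson).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_children, n_visits = 50, 6
df = pd.DataFrame({
    "child": np.repeat(np.arange(n_children), n_visits),
    "age": np.tile(np.arange(n_visits), n_children),   # time-dependent covariate
})
df["health"] = 1.0 + 0.3 * df["age"] + rng.normal(0, 1, len(df))

model = sm.GEE.from_formula(
    "health ~ age", groups="child", data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Independence(),   # independence working correlation
)
print(model.fit().summary())
```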

Relevance: 100.00%

Abstract:

This study considered the problem of predicting survival based on three alternative models: a single Weibull, a mixture of Weibulls, and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival.

Keywords: Bayesian modelling; Bayesian model averaging; cure model; Markov chain Monte Carlo; mixture model; survival analysis; Weibull distribution
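A minimal sketch of the BIC-based averaging step under stated assumptions: two simple stand-in models (exponential and Weibull) replace the paper's single Weibull, Weibull mixture, and cure models, and the survival times are synthetic.

```python
# BIC-weighted Bayesian model averaging between two candidate survival models.
import numpy as np
from scipy.stats import weibull_min

t = weibull_min.rvs(1.5, scale=10.0, size=80, random_state=4)  # synthetic survival times
n = len(t)

# Model 1: exponential (1 parameter), MLE rate = 1/mean.
lam = 1.0 / t.mean()
ll_exp = np.sum(np.log(lam) - lam * t)

# Model 2: Weibull (2 parameters), fitted by maximum likelihood.
c, _, scale = weibull_min.fit(t, floc=0)
ll_wei = weibull_min.logpdf(t, c, 0, scale).sum()

# BIC = k*ln(n) - 2*lnL; weights proportional to exp(-BIC/2).
bic = np.array([1 * np.log(n) - 2 * ll_exp, 2 * np.log(n) - 2 * ll_wei])
w = np.exp(-0.5 * (bic - bic.min()))
w /= w.sum()                       # approximate posterior model probabilities

# BMA survival estimate at time t0: weighted average of each model's S(t0).
t0 = 12.0
s_bma = w[0] * np.exp(-lam * t0) + w[1] * weibull_min.sf(t0, c, 0, scale)
print("model weights:", w, "S(12) =", s_bma)
```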

Relevance: 100.00%

Abstract:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information: be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or information which gives an advantage to those involved in prediction markets. Often such a process is iterative, with keywords and hashtags changing over time, so both collection and analytic methodologies need to be continually adapted in response. While many of the datasets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders.

The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders for manual filtering and possible action. The paper suggests two solutions: content analysis and user profiling. In the former, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information; the latter attempts to identify those users who either serve as amplifiers of information or are known as authoritative sources. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection.

The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportspeople create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form, and emotions which has the potential to affect future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services remain niche operations, much of the value of the information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate.

The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis. A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in this sense, keyword creation is part strategy and part art. The paper describes strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence on Twitter changes over time. It also discusses opportunities and methods to extract smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study.

The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and subsequent discussion, the panel will inform the wider research community not only about the objectives and limitations of data collection, live analytics, and filtering, but also about current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches can be customized depending on the project stakeholders.
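As a hedged illustration of the content-analysis approach in the first paper, the sketch below scores tweets with assumed keyword and urgency weights, boosts known authoritative sources, and keeps only as many tweets as responders can review; all keyword lists, weights, and the capacity value are invented for illustration.

```python
# Illustrative content-analysis filter for crisis tweets.
URGENCY_CUES = {"trapped": 3.0, "injured": 3.0, "help": 2.0, "urgent": 2.0}
TOPIC_KEYWORDS = {"flood": 1.0, "evacuate": 1.5, "#qldflood": 2.0}

def score_tweet(text, author_is_authority):
    """Score a tweet by topic relevance and urgency cues."""
    words = text.lower().split()
    score = sum(URGENCY_CUES.get(w, 0.0) + TOPIC_KEYWORDS.get(w, 0.0) for w in words)
    return score * (2.0 if author_is_authority else 1.0)  # boost known sources

stream = [
    ("Family trapped on roof, urgent help needed #qldflood", False),
    ("Thoughts with everyone affected by the flood", False),
    ("Evacuate low-lying areas now", True),
]
capacity = 2  # assumed number of tweets responders can review per cycle
for text, _ in sorted(stream, key=lambda t: -score_tweet(*t))[:capacity]:
    print(text)
```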

Relevance: 100.00%

Abstract:

The purpose of this research is to analyse the problems posed for occupational health and safety (OHS) regulators by agency work/leased labour (also known as labour hire in Australasia), using Australian evidence. The analysis is based on an examination of prosecutions involving labour hire firms, along with other documentary records (union, industry, and government reports and guidance material). The study also draws on interviews with approximately 200 regulatory officials, employers, and union representatives since 2001, and workplace visits with 40 OHS inspectors in 2004-2005. The triangular relationship entailed in labour leasing, in combination with the temporary nature of most placements, poses serious problems for government agencies in terms of enforcing OHS standards, notwithstanding a growing number of successful prosecutions for breaches of legislative duties by host and labour leasing firms. Research to investigate these issues in other countries and compare findings with those for Australia is required, along with assessment of the effectiveness of new enforcement initiatives. The paper assesses existing regulatory responses and highlights the need for new regulatory strategies to combat the problems posed by labour hire. The OHS problems posed by agency work have received comparatively little attention; the paper provides insights into the specific problems posed for OHS regulators and how inspectorates are trying to address them.

Relevance: 100.00%

Abstract:

This research project provides a scientifically robust approach for assessing the resilience of water supply systems, which are critical infrastructure, to the impacts of climate change and population growth. An important outcome of this project is an approach for the identification of trigger points that allows timely and appropriate management actions to be taken to avoid catastrophic system failure. In the absence of a formal method to evaluate the resilience of a water supply system, the approach developed in this study was based on characterising resilience through a range of surrogate measures. Accordingly, a set of indicators is proposed to evaluate system behaviour, and logistic regression analysis was used to assess system behaviour under predicted rainfall, storage, and demand conditions.
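A minimal sketch, on simulated data, of the kind of logistic regression analysis described: a binary system-behaviour indicator modelled against rainfall, storage, and demand. The indicator definition, units, and coefficients are assumptions, not the project's.

```python
# Logistic regression of a system-behaviour indicator on surrogate measures.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
rainfall = rng.gamma(2.0, 50.0, n)          # hypothetical monthly rainfall (mm)
storage = rng.uniform(0.2, 1.0, n)          # storage as fraction of capacity
demand = rng.normal(100.0, 15.0, n)         # hypothetical demand (ML/day)

# Assumed indicator: 1 if the system approaches failure conditions.
logit = -2.0 - 0.01 * rainfall - 4.0 * storage + 0.03 * demand
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

X = sm.add_constant(np.column_stack([rainfall, storage, demand]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)   # fitted coefficients recover the assumed signs
```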

Relevance: 100.00%

Abstract:

Road traffic crashes are an alarming public health issue in Oman, despite ongoing improvements in traffic law enforcement practices and technology. One of the main target groups for road safety in Oman is young drivers aged 17-25 years. This report provides an overview of the characteristics of crashes in Oman involving young drivers (17-25 years) between 1 January 2009 and 31 December 2011. Although young drivers aged 17-25 years comprise around 17% of all licence holders in Oman, they represented more than one third of all drivers involved in road traffic crashes in the country. A total of 11,101 young drivers (17-25 years) were involved in registered crashes during the study period. Of these, 7,727 young drivers (69.6%) were found to be the cause of the crashes...

Relevance: 100.00%

Abstract:

Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci explain only a fraction of the heritability of MS. Some of the missing heritability may be caused by rare variants, which have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time consuming. 'Population-based linkage analysis' (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants using IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10^-6). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than of controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allows identification of rare variants in large unrelated populations with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down regions of interest for sequencing prioritisation. © 2013 Lin et al.
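As a quick sanity check, the reported p-value is consistent with the conventional one-sided chi-square conversion from a LOD score (chi-square = 2 ln(10) × LOD with one degree of freedom); this assumes the authors used the standard conversion, which the abstract does not state.

```python
# Convert the reported LOD score to a p-value via the standard relation.
import numpy as np
from scipy.stats import chi2

lod = 4.65
p = 0.5 * chi2.sf(2 * np.log(10) * lod, df=1)   # one-sided linkage test
print(f"p = {p:.2g}")   # ~1.9e-06, matching the reported value
```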