906 resultados para statistical techniques
Resumo:
This report presents the final deliverable from the project titled Conceptual and statistical framework for a water quality component of an integrated report card’ funded by the Marine and Tropical Sciences Research Facility (MTSRF; Project 3.7.7). The key management driver of this, and a number of other MTSRF projects concerned with indicator development, is the requirement for state and federal government authorities and other stakeholders to provide robust assessments of the present ‘state’ or ‘health’ of regional ecosystems in the Great Barrier Reef (GBR) catchments and adjacent marine waters. An integrated report card format, that encompasses both biophysical and socioeconomic factors, is an appropriate framework through which to deliver these assessments and meet a variety of reporting requirements. It is now well recognised that a ‘report card’ format for environmental reporting is very effective for community and stakeholder communication and engagement, and can be a key driver in galvanising community and political commitment and action. Although a report card it needs to be understandable by all levels of the community, it also needs to be underpinned by sound, quality-assured science. In this regard this project was to develop approaches to address the statistical issues that arise from amalgamation or integration of sets of discrete indicators into a final score or assessment of the state of the system. In brief, the two main issues are (1) selecting, measuring and interpreting specific indicators that vary both in space and time, and (2) integrating a range of indicators in such a way as to provide a succinct but robust overview of the state of the system. Although there is considerable research and knowledge of the use of indicators to inform the management of ecological, social and economic systems, methods on how to best to integrate multiple disparate indicators remain poorly developed. Therefore the objective of this project was to (i) focus on statistical approaches aimed at ensuring that estimates of individual indicators are as robust as possible, and (ii) present methods that can be used to report on the overall state of the system by integrating estimates of individual indicators. It was agreed at the outset, that this project was to focus on developing methods for a water quality report card. This was driven largely by the requirements of Reef Water Quality Protection Plan (RWQPP) and led to strong partner engagement with the Reef Water Quality Partnership.
Resumo:
This research is a step forward in improving the accuracy of detecting anomaly in a data graph representing connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques utilising different types of structural input features. The methods are presented within a multi-layered framework which provides the full requirements needed for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.
Resumo:
Impaired driver alertness increases the likelihood of drivers’ making mistakes and reacting too late to unexpected events while driving. This is particularly a concern on monotonous roads, where a driver’s attention can decrease rapidly. While effective countermeasures do not currently exist, the development of in-vehicle sensors opens avenues for monitoring driving behavior in real-time. The aim of this study is to predict drivers’ level of alertness through surrogate measures collected from in-vehicle sensors. Electroencephalographic activity is used as a reference to evaluate alertness. Based on a sample of 25 drivers, data was collected in a driving simulator instrumented with an eye tracking system, a heart rate monitor and an electrodermal activity device. Various classification models were tested from linear regressions to Bayesians and data mining techniques. Results indicated that Neural Networks were the most efficient model in detecting lapses in alertness. Findings also show that reduced alertness can be predicted up to 5 minutes in advance with 90% accuracy, using surrogate measures such as time to line crossing, blink frequency and skin conductance level. Such a method could be used to warn drivers of their alertness level through the development of an in-vehicle device monitoring, in real-time, drivers' behavior on highways.
Resumo:
Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.
Resumo:
Objectives Recent research has shown that machine learning techniques can accurately predict activity classes from accelerometer data in adolescents and adults. The purpose of this study is to develop and test machine learning models for predicting activity type in preschool-aged children. Design Participants completed 12 standardised activity trials (TV, reading, tablet game, quiet play, art, treasure hunt, cleaning up, active game, obstacle course, bicycle riding) over two laboratory visits. Methods Eleven children aged 3–6 years (mean age = 4.8 ± 0.87; 55% girls) completed the activity trials while wearing an ActiGraph GT3X+ accelerometer on the right hip. Activities were categorised into five activity classes: sedentary activities, light activities, moderate to vigorous activities, walking, and running. A standard feed-forward Artificial Neural Network and a Deep Learning Ensemble Network were trained on features in the accelerometer data used in previous investigations (10th, 25th, 50th, 75th and 90th percentiles and the lag-one autocorrelation). Results Overall recognition accuracy for the standard feed forward Artificial Neural Network was 69.7%. Recognition accuracy for sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running was 82%, 79%, 64%, 36% and 46%, respectively. In comparison, overall recognition accuracy for the Deep Learning Ensemble Network was 82.6%. For sedentary activities, light activities and games, moderate-to-vigorous activities, walking, and running recognition accuracy was 84%, 91%, 79%, 73% and 73%, respectively. Conclusions Ensemble machine learning approaches such as Deep Learning Ensemble Network can accurately predict activity type from accelerometer data in preschool children.
Deterrence of drug driving : the impact of the ACT drug driving legislation and detection techniques
Resumo:
Overarching Research Questions Are ACT motorists aware of roadside saliva based drug testing operations? What is the perceived deterrent impact of the operations? What factors are predictive of future intentions to drug drive? What are the differences between key subgroups
Resumo:
Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.
Resumo:
Determining the key variables of transportation disadvantage remains a great challenge as the variables are commonly selected using ad-hoc techniques. In order to identify the variables, this research develops a transportation disadvantage framework by manipulating the capability approach. Developed framework is statistically analysed using partial least square-based software to determine the framework fitness. The statistical analysis identifies mobility and socioeconomic variables that significantly influence transportation disadvantage. The research reveals the key socioeconomic variables for transportation disadvantage in the case of Brisbane, Australia as household structure, presence of dependent family member, vehicle ownership, and driving licence possession.
Resumo:
This thesis proposes three novel models which extend the statistical methodology for motor unit number estimation, a clinical neurology technique. Motor unit number estimation is important in the treatment of degenerative muscular diseases and, potentially, spinal injury. Additionally, a recent and untested statistic to enable statistical model choice is found to be a practical alternative for larger datasets. The existing methods for dose finding in dual-agent clinical trials are found to be suitable only for designs of modest dimensions. The model choice case-study is the first of its kind containing interesting results using so-called unit information prior distributions.
Resumo:
There is a wide range of potential study designs for intervention studies to decrease nosocomial infections in hospitals. The analysis is complex due to competing events, clustering, multiple timescales and time-dependent period and intervention variables. This review considers the popular pre-post quasi-experimental design and compares it with randomized designs. Randomization can be done in several ways: randomization of the cluster [intensive care unit (ICU) or hospital] in a parallel design; randomization of the sequence in a cross-over design; and randomization of the time of intervention in a stepped-wedge design. We introduce each design in the context of nosocomial infections and discuss the designs with respect to the following key points: bias, control for nonintervention factors, and generalizability. Statistical issues are discussed. A pre-post-intervention design is often the only choice that will be informative for a retrospective analysis of an outbreak setting. It can be seen as a pilot study with further, more rigorous designs needed to establish causality. To yield internally valid results, randomization is needed. Generally, the first choice in terms of the internal validity should be a parallel cluster randomized trial. However, generalizability might be stronger in a stepped-wedge design because a wider range of ICU clinicians may be convinced to participate, especially if there are pilot studies with promising results. For analysis, the use of extended competing risk models is recommended.
Resumo:
Bounds on the expectation and variance of errors at the output of a multilayer feedforward neural network with perturbed weights and inputs are derived. It is assumed that errors in weights and inputs to the network are statistically independent and small. The bounds obtained are applicable to both digital and analogue network implementations and are shown to be of practical value.
Resumo:
This paper addresses research from a three-year longitudinal study that engaged children in data modeling experiences from the beginning school year through to third year (6-8 years). A data modeling approach to statistical development differs in several ways from what is typically done in early classroom experiences with data. In particular, data modeling immerses children in problems that evolve from their own questions and reasoning, with core statistical foundations established early. These foundations include a focus on posing and refining statistical questions within and across contexts, structuring and representing data, making informal inferences, and developing conceptual, representational, and metarepresentational competence. Examples are presented of how young learners developed and sustained informal inferential reasoning and metarepresentational competence across the study to become “sophisticated statisticians”.