Digital Library

154 results for Statistical Robustness

New variational Bayesian approaches for statistical data mining : with applications to profiling and differentiating habitual consumption behaviour of customers in the wireless telecommunication industry

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initialised timestamp and cell tower location as well as the activity type and usage quantity (e.g., voice call with duration in seconds); and the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high dimensional data based on the use of VB-GMM.

Development of the rural statistical sustainability framework tool

Relevance:

20.00% 20.00%

Publisher:

Abstract:

It is important to promote a sustainable development approach to ensure that economic, environmental and social developments are maintained in balance. Sustainable development and its implications are not just a global concern; they also affect Australia. In particular, rural Australian communities are facing various economic, environmental and social challenges. Thus, the need for sustainable development in rural regions is becoming increasingly important. To promote sustainable development, proper frameworks, along with the associated tools optimised for the specific regions, need to be developed. This will ensure that the decisions made for sustainable development are evidence based rather than based on subjective opinions. To address these issues, Queensland University of Technology (QUT), through an Australian Research Council (ARC) linkage grant, has initiated research into the development of a Rural Statistical Sustainability Framework (RSSF) to aid sustainable decision making in rural Queensland. This particular branch of the research developed a decision support tool that will become the integrating component of the RSSF. The tool is developed on a web-based platform to allow easy dissemination, quick maintenance and to minimise compatibility issues. It is based on MapGuide Open Source and follows a three-tier architecture: Client tier, Web tier and Server tier. The developed tool is interactive and behaves similarly to a familiar desktop-based application. It can handle and display vector-based spatial data and can give further visual outputs using charts and tables. The data used in this tool were obtained from the QUT research team. Overall, the tool implements four tasks to help in the decision-making process: Locality Classification, Trend Display, Impact Assessment, and Data Entry and Update.
The developed tool utilises open source and freely available software and accounts for easy extensibility and long-term sustainability.

Instantaneous phase of the autocorrelation and robustness to signal-dependent noise

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The phase of an analytic signal constructed from the autocorrelation function of a signal contains significant information about the shape of the signal. Using Bedrosian's (1963) theorem for the Hilbert transform it is proved that this phase is robust to multiplicative noise if the signal is baseband and the spectra of the signal and the noise do not overlap. Higher-order spectral features are interpreted in this context and shown to extract nonlinear phase information while retaining robustness. The significance of the result is that prior knowledge of the spectra is not required.

Prediction of structural integrity, robustness and service life using advanced finite element methods

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Corrosion is a common phenomenon and a critical aspect of steel structural applications. It affects everyday design, inspection and maintenance in structural engineering, especially in heavy and complex industrial applications, where steel structures are subjected to harsh corrosive environments in combination with high working stress, often in the open field and/or in high-temperature production environments. This paper presents an actual engineering application of advanced finite element methods to the prediction of structural integrity and robustness at a designed service life for furnaces used in alumina production, which operate in high-temperature, corrosive environments and rotate under high working stress.

Using Bayesian methods for the estimation of uncertainty in complex statistical models

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The research objectives of this thesis were to contribute to Bayesian statistical methodology by contributing to risk assessment statistical methodology, and to spatial and spatio-temporal methodology, by modelling error structures using complex hierarchical models. Specifically, I hoped to consider two applied areas, and use these applications as a springboard for developing new statistical methods as well as undertaking analyses which might give answers to particular applied questions. Thus, this thesis considers a series of models, firstly in the context of risk assessments for recycled water, and secondly in the context of water usage by crops. The research objective was to model error structures using hierarchical models in two problems, namely risk assessment analyses for wastewater, and secondly, in a four dimensional dataset, assessing differences between cropping systems over time and over three spatial dimensions. The aim was to use the simplicity and insight afforded by Bayesian networks to develop appropriate models for risk scenarios, and again to use Bayesian hierarchical models to explore the necessarily complex modelling of four dimensional agricultural data. The specific objectives of the research were to develop a method for the calculation of credible intervals for the point estimates of Bayesian networks; to develop a model structure to incorporate all the experimental uncertainty associated with various constants thereby allowing the calculation of more credible credible intervals for a risk assessment; to model a single day’s data from the agricultural dataset which satisfactorily captured the complexities of the data; to build a model for several days’ data, in order to consider how the full data might be modelled; and finally to build a model for the full four dimensional dataset and to consider the time-varying nature of the contrast of interest, having satisfactorily accounted for possible spatial and temporal autocorrelations.
This work forms five papers, two of which have been published, with two submitted, and the final paper still in draft. The first two objectives were met by recasting the risk assessments as directed, acyclic graphs (DAGs). In the first case, we elicited uncertainty for the conditional probabilities needed by the Bayesian net, incorporated these into a corresponding DAG, and used Markov chain Monte Carlo (MCMC) to find credible intervals, for all the scenarios and outcomes of interest. In the second case, we incorporated the experimental data underlying the risk assessment constants into the DAG, and also treated some of that data as needing to be modelled as an ‘errors-in-variables’ problem [Fuller, 1987]. This illustrated a simple method for the incorporation of experimental error into risk assessments. In considering one day of the three-dimensional agricultural data, it became clear that geostatistical models or conditional autoregressive (CAR) models over the three dimensions were not the best way to approach the data. Instead CAR models are used with neighbours only in the same depth layer. This gave flexibility to the model, allowing both the spatially structured and non-structured variances to differ at all depths. We call this model the CAR layered model. Given the experimental design, the fixed part of the model could have been modelled as a set of means by treatment and by depth, but doing so allows little insight into how the treatment effects vary with depth. Hence, a number of essentially non-parametric approaches were taken to see the effects of depth on treatment, with the model of choice incorporating an errors-in-variables approach for depth in addition to a non-parametric smooth. The statistical contribution here was the introduction of the CAR layered model, the applied contribution the analysis of moisture over depth and estimation of the contrast of interest together with its credible intervals.
These models were fitted using WinBUGS [Lunn et al., 2000]. The work in the fifth paper deals with the fact that with large datasets, the use of WinBUGS becomes more problematic because of its highly correlated term by term updating. In this work, we introduce a Gibbs sampler with block updating for the CAR layered model. The Gibbs sampler was implemented by Chris Strickland using pyMCMC [Strickland, 2010]. This framework is then used to consider five days’ data, and we show that moisture in the soil for all the various treatments reaches levels particular to each treatment at a depth of 200 cm and thereafter stays constant, albeit with increasing variances with depth. In an analysis across three spatial dimensions and across time, there are many interactions of time and the spatial dimensions to be considered. Hence, we chose to use a daily model and to repeat the analysis at all time points, effectively creating an interaction model of time by the daily model. Such an approach allows great flexibility. However, this approach does not allow insight into the way in which the parameter of interest varies over time. Hence, a two-stage approach was also used, with estimates from the first stage being analysed as a set of time series. We see this spatio-temporal interaction model as being a useful approach to data measured across three spatial dimensions and time, since it does not assume additivity of the random spatial or temporal effects.

Non-invasive, quantitative analysis of drug mixtures in containers using spatially offset Raman spectroscopy (SORS) and multivariate statistical analysis

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, spatially offset Raman spectroscopy (SORS) is demonstrated for non-invasively investigating the composition of drug mixtures inside an opaque plastic container. The mixtures consisted of three components including a target drug (acetaminophen or phenylephrine hydrochloride) and two diluents (glucose and caffeine). The target drug concentrations ranged from 5% to 100%. After conducting SORS analysis to ascertain the Raman spectra of the concealed mixtures, principal component analysis (PCA) was performed on the SORS spectra to reveal trends within the data. Partial least squares (PLS) regression was used to construct models that predicted the concentration of each target drug, in the presence of the other two diluents. The PLS models were able to predict the concentration of acetaminophen in the validation samples with a root-mean-square error of prediction (RMSEP) of 3.8% and the concentration of phenylephrine hydrochloride with an RMSEP of 4.6%. This work demonstrates the potential of SORS, used in conjunction with multivariate statistical techniques, to perform non-invasive, quantitative analysis on mixtures inside opaque containers. This has applications for pharmaceutical analysis, such as monitoring the degradation of pharmaceutical products on the shelf, in forensic investigations of counterfeit drugs, and for the analysis of illicit drug mixtures which may contain multiple components.
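
The RMSEP values quoted in this abstract follow the usual definition: the square root of the mean squared difference between predicted and reference concentrations over the validation samples. The following is a minimal sketch of that calculation only; the function name `rmsep` and the concentration values are hypothetical and are not taken from the paper.

```python
import math


def rmsep(reference, predicted):
    """Root-mean-square error of prediction over paired validation samples."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical validation set (target drug concentration in percent).
# These numbers are illustrative only and do not come from the paper.
reference = [10.0, 40.0, 70.0, 100.0]
predicted = [13.0, 36.0, 70.0, 105.0]

print(round(rmsep(reference, predicted), 3))
```

Because RMSEP is expressed in the same units as the predicted quantity, a value of a few percent can be read directly against the 5% to 100% concentration range studied.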

On the robustness and security of digital image watermarking

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In most digital image watermarking schemes, it has become common practice to address security in terms of robustness, which is basically a norm in cryptography. Such a consideration in the development and evaluation of a watermarking scheme may severely affect its performance and render the scheme ultimately unusable. This paper provides an explicit theoretical analysis of watermarking security and robustness in order to establish the exact status of the problem in the literature. With the necessary hypotheses and analyses from a technical perspective, we demonstrate the fundamental realization of the problem. Finally, some necessary recommendations are made for a complete assessment of watermarking security and robustness.

Statistical eye model for normal eyes

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Purpose. To create a binocular statistical eye model based on previously measured ocular biometric data. Methods. Thirty-nine parameters were determined for a group of 127 healthy subjects (37 male, 90 female; 96.8% Caucasian) with an average age of 39.9 ± 12.2 years and spherical equivalent refraction of −0.98 ± 1.77 D. These parameters described the biometry of both eyes and the subjects' age. Missing parameters were complemented by data from a previously published study. After confirmation of the Gaussian shape of their distributions, these parameters were used to calculate their mean and covariance matrices. These matrices were then used to calculate a multivariate Gaussian distribution. From this, an amount of random biometric data could be generated, which were then randomly selected to create a realistic population of random eyes. Results. All parameters had Gaussian distributions, with the exception of the parameters that describe total refraction (i.e., three parameters per eye). After these non-Gaussian parameters were omitted from the model, the generated data were found to be statistically indistinguishable from the original data for the remaining 33 parameters (TOST [two one-sided t tests]; P < 0.01). Parameters derived from the generated data were also significantly indistinguishable from those calculated with the original data (P > 0.05). The only exception to this was the lens refractive index, for which the generated data had a significantly larger SD. Conclusions. A statistical eye model can describe the biometric variations found in a population and is a useful addition to the classic eye models.
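
The generation step described in this abstract (estimate a mean vector and covariance matrix, then draw from the corresponding multivariate Gaussian) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the two parameters, the `measured` values, and the use of a Cholesky factor for sampling are all assumptions made here for the example.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length parameter rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unbiased sample covariance matrix of the parameter rows."""
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def cholesky(a):
    """Lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_eyes(mu, cov, count, rng):
    """Draw `count` parameter vectors from N(mu, cov) as mu + L z."""
    low = cholesky(cov)
    d = len(mu)
    eyes = []
    for _ in range(count):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        eyes.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                     for i in range(d)])
    return eyes


# Hypothetical measurements: [axial length (mm), corneal radius (mm)].
# Illustrative values only; the study used 39 parameters per subject.
measured = [[23.1, 7.7], [24.0, 7.9], [23.6, 7.8],
            [24.8, 8.0], [22.9, 7.6], [23.9, 7.75]]

mu = mean_vector(measured)
cov = covariance_matrix(measured)
synthetic = sample_eyes(mu, cov, 5000, random.Random(0))
print(mean_vector(synthetic))
```

Sampling through the Cholesky factor preserves the correlations between parameters, which is what lets the generated population resemble the measured one rather than a set of independently drawn values.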

Three-dimensional geological modelling and multivariate statistical analysis of water chemistry data to analyse and visualise aquifer structure and groundwater composition in the Wairau Plain, Marlborough District, New Zealand

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model which consists of eight litho-stratigraphic units has been subsequently used to synthesise hydrogeological and hydrogeochemical data for different aquifers in an approach that aims to demonstrate how integration of water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.

Statistical modelling of wind effects on signal propagation for wireless sensor networks

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A wireless sensor network system must have the ability to tolerate harsh environmental conditions and reduce communication failures. In a typical outdoor situation, the presence of wind can introduce movement in the foliage. This motion of vegetation structures causes large and rapid signal fading in the communication link and must be accounted for when deploying a wireless sensor network system in such conditions. This thesis examines the fading characteristics experienced by wireless sensor nodes due to the effect of varying wind speed in a foliage-obstructed transmission path. It presents extensive measurement campaigns at two locations using a typical wireless sensor network configuration. The significance of this research lies in the varied approaches of its different experiments, involving a variety of vegetation types, scenarios and the use of different polarisations (vertical and horizontal). The non-line-of-sight (NLoS) scenarios investigate the wind effect for different vegetation densities, including those of the Acacia tree, Dogbane tree and tall grass, whereas the line-of-sight (LoS) scenario investigates the effect of wind when the grass is swaying and affecting the ground-reflected component of the signal. Vegetation types and scenarios are chosen to simulate real-life working conditions of wireless sensor network systems in outdoor foliated environments. The results from the measurements are presented in statistical models involving first and second order statistics. We found that in most of the cases, the fading amplitude could be approximated by both the Lognormal and the Nakagami distribution; the Nakagami m parameter was found to depend on received power fluctuations. The Lognormal distribution is known as the result of slow fading characteristics due to shadowing. This study concludes that fading caused by variations in received power due to wind in wireless sensor network systems is insignificant.
There is no notable difference in Nakagami m values for the low, calm and windy wind speed categories. The second order analysis also shows that deep fades are very short: 0.1 second for 10 dB attenuation below RMS level for vertical polarisation and 0.01 second for 10 dB attenuation below RMS level for horizontal polarisation. Another key finding is that the received signal strength for horizontal polarisation is more than 3 dB better than for vertical polarisation under LoS and near-LoS (thin vegetation) conditions, and up to 10 dB better under denser vegetation conditions.
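
For readers unfamiliar with the Nakagami m parameter discussed in this abstract, the sketch below shows one common way to estimate it from fading amplitudes: the moment-based estimator m = (E[R²])² / Var(R²), with spread Ω = E[R²]. The abstract does not state which estimator the thesis used, so this choice is an assumption, and the function names and simulated data are hypothetical.

```python
import math
import random


def estimate_nakagami(amplitudes):
    """Moment-based estimates of the Nakagami shape m and spread omega."""
    power = [r * r for r in amplitudes]
    n = len(power)
    omega = sum(power) / n                      # omega = E[R^2]
    var_power = sum((p - omega) ** 2 for p in power) / (n - 1)
    m = omega * omega / var_power               # m = E[R^2]^2 / Var(R^2)
    return m, omega


def simulate_nakagami(m, omega, count, rng):
    """Simulated amplitudes: if R is Nakagami(m, omega), then
    R^2 follows a Gamma distribution with shape m and scale omega / m."""
    return [math.sqrt(rng.gammavariate(m, omega / m)) for _ in range(count)]


# Simulated data standing in for measured fading amplitudes (illustrative only).
samples = simulate_nakagami(2.0, 1.0, 20000, random.Random(1))
m_hat, omega_hat = estimate_nakagami(samples)
print(round(m_hat, 2), round(omega_hat, 2))
```

A larger m indicates shallower fading (m = 1 corresponds to Rayleigh fading), so similar m values across wind speed categories are consistent with the abstract's conclusion that wind-induced fading is insignificant.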

Statistical phylogeographic tests of competing 'Lake Carpentaria hypotheses' in the mouth-brooding freshwater fish, Glossamia aprion (Apogonidae)

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Glacial cycles during the Pleistocene reduced sea levels and created new land connections in northern Australia, where many currently isolated rivers also became connected via an extensive paleo-lake system, 'Lake Carpentaria'. However, the most recent period during which populations of freshwater species were connected by gene flow across Lake Carpentaria is debated: various 'Lake Carpentaria hypotheses' have been proposed. Here, we used a statistical phylogeographic approach to assess the timing of past population connectivity across the Carpentaria region in the obligate freshwater fish, Glossamia aprion. Results for this species indicate that the most recent period of genetic exchange across the Carpentaria region coincided with the mid- to late Pleistocene, a result shown previously for other freshwater and diadromous species. Based on these findings and published studies for various freshwater, diadromous and marine species, we propose a set of 'Lake Carpentaria' hypotheses to explain past population connectivity in aquatic species: (1) strictly freshwater species had widespread gene flow in the mid- to late Pleistocene before the last glacial maximum; (2) marine species were subdivided into eastern and western populations by land during Pleistocene glacial phases; and (3) past connectivity in diadromous species reflects the relative strength of their marine affinity.

Statistical analysis of conflict involvements in port water navigation

Relevance:

20.00% 20.00%

Publisher:

Abstract:

With increasing rate of shipping traffic, the risk of collisions in busy and congested port waters is likely to rise. However, due to low collision frequencies in port waters, it is difficult to analyze such risk in a sound statistical manner. A convenient approach of investigating navigational collision risk is the application of the traffic conflict techniques, which have potential to overcome the difficulty of obtaining statistical soundness. This study aims at examining port water conflicts in order to understand the characteristics of collision risk with regard to vessels involved, conflict locations, traffic and kinematic conditions. A hierarchical binomial logit model, which considers the potential correlations between observation-units, i.e., vessels, involved in the same conflicts, is employed to evaluate the association of explanatory variables with conflict severity levels. Results show higher likelihood of serious conflicts for vessels of small gross tonnage or small overall length. The probability of serious conflict also increases at locations where vessels have more varied headings, such as traffic intersections and anchorages; becoming more critical at night time. Findings from this research should assist both navigators operating in port waters as well as port authorities overseeing navigational management.

Performance monitoring in interventional cardiology: Application of statistical process control to a single-site database

Relevance:

20.00% 20.00%

Publisher:

A comparison of methods for classifying clinical samples based on proteomics data : a case study for statistical and machine learning approaches

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so-called “omics” disciplines of the biological sciences. Such variability is uncovered by implementing multivariable data mining techniques, which fall into two primary categories: machine learning strategies and statistically based approaches. Typically, proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint and, as such, require pre-treatment to reduce the dimensionality prior to classification. Recently, machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This is a problem that might be solved using a statistical model-based approach in which not only is the importance of each individual protein explicit, but the proteins are also combined into a readily interpretable classification rule without relying on a black-box approach. Here we apply the statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA), followed by both statistical and machine learning classification methods, and compare them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.

A review of current statistical methodologies for in storage sampling and surveillance in the grains industry

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Effective, statistically robust sampling and surveillance strategies form an integral component of large agricultural industries such as the grains industry. Intensive in-storage sampling is essential for pest detection and Integrated Pest Management (IPM), for determining grain quality and for satisfying importing nations’ biosecurity concerns, while surveillance over broad geographic regions ensures that biosecurity risks can be excluded, monitored, eradicated or contained within an area. In the grains industry, a number of qualitative and quantitative methodologies for surveillance and in-storage sampling have been considered. Primarily, research has focussed on developing statistical methodologies for in-storage sampling strategies concentrating on detection of pest insects within a grain bulk; however, the need for effective and statistically defensible surveillance strategies has also been recognised. Interestingly, although surveillance and in-storage sampling have typically been considered independently, many techniques and concepts are common between the two fields of research. This review aims to consider the development of statistically based in-storage sampling and surveillance strategies and to identify methods that may be useful for both surveillance and in-storage sampling. We discuss the utility of new quantitative and qualitative approaches, such as Bayesian statistics, fault trees and more traditional probabilistic methods, and show how these methods may be used in both surveillance and in-storage sampling systems.
