Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and on the computation of normalization constants arose from the pursuit of these data-analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of the recorded zeroes: a zero may represent a zero response relative to some threshold (presence) or indicate that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses while taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts; the dingo, cypress and toad case studies described in the motivation chapter are examples.

Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for the parameters of these models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics, and model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of a normalization constant, a notoriously difficult problem.

The difficulty of estimating the normalization constant for the MRF can be overcome using a path integral approach, although this is highly computationally intensive. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea that is present, though not fully developed, in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model; this work introduces the background computations required for the full implementation of the four-tier model in Chapter 7.

Two extensions of the three-tier model to a four-tier version are investigated. The first incorporates temporal dependence in the underlying spatio-temporal process; the second allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer.
A major contribution of the thesis is the first fully Bayesian approach to inference for these hierarchical models.
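To make the path sampling idea behind IMCS concrete, the following is a minimal sketch, not the thesis's implementation, for a simplified two-parameter autologistic model with free boundaries. The log ratio of normalization constants between two values of the interaction parameter equals the integral of the expected canonical statistic along a path of parameter values; here that integral is estimated by Gibbs sampling at grid points followed by trapezoidal integration. All function names and settings are illustrative assumptions.

    import numpy as np

    def gibbs_sweep(x, alpha, beta, rng):
        """One Gibbs sweep over a binary lattice under a simple autologistic
        model p(x) ~ exp(alpha * sum(x) + beta * sum of neighbour products)."""
        n, m = x.shape
        for i in range(n):
            for j in range(m):
                s = 0.0  # sum of the four nearest neighbours (free boundary)
                if i > 0:     s += x[i - 1, j]
                if i < n - 1: s += x[i + 1, j]
                if j > 0:     s += x[i, j - 1]
                if j < m - 1: s += x[i, j + 1]
                # full conditional: logit P(x_ij = 1 | rest) = alpha + beta * s
                p = 1.0 / (1.0 + np.exp(-(alpha + beta * s)))
                x[i, j] = 1.0 if rng.random() < p else 0.0

    def pair_statistic(x):
        """Canonical statistic paired with beta: sum of neighbour products."""
        return float(np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:]))

    def log_nc_ratio(alpha, beta0, beta1, shape=(20, 20), grid=21,
                     burn=200, draws=200, seed=0):
        """Estimate log Z(beta1) - log Z(beta0) by integrating the Monte Carlo
        mean of the canonical statistic over a grid of beta values."""
        rng = np.random.default_rng(seed)
        betas = np.linspace(beta0, beta1, grid)
        means = np.empty(grid)
        for k, b in enumerate(betas):
            x = (rng.random(shape) < 0.5).astype(float)
            for _ in range(burn):
                gibbs_sweep(x, alpha, b, rng)
            total = 0.0
            for _ in range(draws):
                gibbs_sweep(x, alpha, b, rng)
                total += pair_statistic(x)
            means[k] = total / draws
        # trapezoidal rule along the path of beta values
        return float(np.sum((means[:-1] + means[1:]) / 2.0 * np.diff(betas)))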
Abstract:
Forensic imaging has been facing scalability challenges for some time. As disk capacity growth continues to outpace storage IO bandwidth, the demands placed on storage and time are ever increasing. Data reduction and de-duplication technologies are now commonplace in the Enterprise space, and are potentially applicable to forensic acquisition. Using the new AFF4 forensic file format, we employ a hash-based compression scheme to leverage an existing corpus of images, reducing both acquisition time and storage requirements. This paper additionally describes some of the recent evolution in the AFF4 file format that makes the efficient implementation of hash-based imaging a reality.
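The core of a hash-based scheme can be illustrated with a short sketch. This is not the AFF4 API or the paper's implementation; it only shows the idea that a block whose hash already appears in the corpus of prior images is stored as a reference rather than as data. The names, block size and data structures are assumptions for illustration.

    import hashlib

    BLOCK_SIZE = 32 * 1024  # illustrative block size

    def image_with_dedup(device_path, corpus_hashes, block_store):
        """Read a device block by block; store only blocks whose hash is not
        already in the corpus, and record an (offset, hash) reference map."""
        refs = []
        with open(device_path, "rb") as dev:
            offset = 0
            while True:
                block = dev.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                if digest not in corpus_hashes:
                    corpus_hashes.add(digest)      # first sighting: keep the data
                    block_store[digest] = block
                refs.append((offset, digest))      # always record the reference
                offset += len(block)
        return refs

Acquisition time and storage both shrink with the fraction of blocks already present in the corpus, which is the effect the paper exploits.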
Abstract:
In the context of learning paradigms of identification in the limit, we address the question: why is uncertainty sometimes desirable? We use mind change bounds on the output hypotheses as a measure of uncertainty, and interpret ‘desirable’ as reduction in data memorization, also defined in terms of mind change bounds. The resulting model is closely related to iterative learning with bounded mind change complexity, but the dual use of mind change bounds — for hypotheses and for data — is a key distinctive feature of our approach. We show that situations exist where the more mind changes the learner is willing to accept, the less data it needs to remember in order to converge to the correct hypothesis. We also investigate relationships between our model and learning from good examples, set-driven, monotonic and strong-monotonic learners, as well as class-comprising versus class-preserving learnability.
Abstract:
Keyword spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks and makes a number of major contributions to the field.

The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high-level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short-duration target word class.

The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvements in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds.

The third major contribution is the study of multiple-verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.

The final major contribution is a comprehensive study of the effects of limited training data on keyword spotting, performed with consideration of how these effects impact the immediate development and deployment of speech technologies for non-English languages.
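The dynamic sequence matching at the heart of Dynamic Match Lattice Spotting can be illustrated with a toy sketch: candidate phone sequences drawn from a lattice are scored against the target keyword's phone sequence with an edit-distance style cost, so that mildly erroneous lattice realisations still match. This is an illustration of the general idea under assumed uniform costs, not the thesis's algorithm; all names are hypothetical.

    def dynamic_match_cost(target, observed, sub=1.0, ins=1.0, dele=1.0):
        """Edit-distance style cost between a target phone sequence and an
        observed (possibly erroneous) lattice realisation."""
        n, m = len(target), len(observed)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele
        for j in range(1, m + 1):
            d[0][j] = j * ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0.0 if target[i - 1] == observed[j - 1] else sub
                d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitution
                              d[i - 1][j] + dele,      # phone dropped by recogniser
                              d[i][j - 1] + ins)       # spurious phone inserted
        return d[n][m]

    def spot_keyword(keyword_phones, lattice_sequences, threshold=1.0):
        """Flag lattice phone sequences whose dynamic-match cost to the
        keyword falls at or below a threshold."""
        return [seq for seq in lattice_sequences
                if dynamic_match_cost(keyword_phones, seq) <= threshold]

    # Toy usage: "cat" = [k ae t]; a one-phone recognition error still matches.
    hits = spot_keyword(["k", "ae", "t"],
                        [["k", "ae", "t"], ["k", "ah", "t"], ["d", "ao", "g"]],
                        threshold=1.0)
    # hits contains the first two sequences but not ["d", "ao", "g"]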
Abstract:
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes; genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical and bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature, as well as new plausible relationships.
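A minimal sketch of the two-stage idea follows. It is not the paper's model, which builds proper state-space models with latent states over the cluster profiles; here a first-order linear (VAR(1)) transition serves as a crude surrogate for the network step, and the data layout, function names and parameters are assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def cluster_and_link(expr, n_clusters=4, seed=0):
        """expr: genes x timepoints array of expression values.
        Stage 1: mixture-model clustering of the gene time courses.
        Stage 2: a VAR(1) surrogate linking cluster-mean profiles over time."""
        gm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(expr)
        labels = gm.predict(expr)
        # cluster-specific expression profiles (mean over member genes)
        profiles = np.vstack([expr[labels == k].mean(axis=0)
                              for k in range(n_clusters)])
        # regress each profile value at time t+1 on all profiles at time t;
        # A[i, j] then suggests how cluster j feeds into cluster i
        X, Y = profiles[:, :-1].T, profiles[:, 1:].T
        A, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return labels, profiles, A.T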
Abstract:
Ubiquitous access to patient medical records is an important aspect of ensuring patient safety. The unavailability of sufficient medical information at the point-of-care can lead to fatalities: the U.S. Institute of Medicine has reported that between 44,000 and 98,000 people die each year from medical errors such as incorrect medication dosages, caused by poor legibility in manual records or by delays in consolidating the information needed to discern the proper intervention.

In this research we propose employing emergent technologies such as Java SIM Cards (JSC), Smart Phones (SP), Next Generation Networks (NGN), Near Field Communications (NFC), Public Key Infrastructure (PKI), and biometric identification to develop a secure framework and related protocols for ubiquitous access to Electronic Health Records (EHR). A partial EHR contained within a JSC can be used at the point-of-care to support quick diagnosis of a patient’s problems, while the full EHR can be accessed from an Electronic Health Records Centre (EHRC) when time and network availability permit. Moreover, the framework and related protocols enable patients to give explicit consent, via their Smart Phone, for a doctor to access their personal medical data when the doctor needs to see or update the patient’s medical information during an examination. Our proposed solution would also give patients the power to modify the Access Control List (ACL) related to their EHRs and to view their EHRs through their Smart Phone.

Currently, very limited research has been done on using JSCs and similar technologies as a portable repository of EHRs, or on the specific security issues that are likely to arise when JSCs are used for ubiquitous access to EHRs. Previous research is concerned with using Medicare cards, a kind of Smart Card, as a repository of medical information at the point-of-care. However, this imposes some limitations on the patient’s emergency medical care, including the inability to detect the patient’s location, to call and send information to an emergency room automatically, and to interact with the patient in order to obtain consent. The aim of our framework and related protocols is to overcome these limitations by taking advantage of the SIM card and the technologies mentioned above.

Briefly, our framework and related protocols offer the full benefits of accessing an up-to-date, precise, and comprehensive medical history of a patient, whilst their mobility provides ubiquitous access to medical and patient information wherever it is needed. The objective is to automate interactions between patients, healthcare providers and insurance organisations, increase patient safety, improve quality of care, and reduce costs.
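The consent step can be made concrete with a short sketch. This is a hypothetical illustration of a PKI-based consent token, not the protocols proposed in the research: the patient's phone signs a short-lived token naming the doctor and the scope of access, and the records centre verifies it before releasing data. It assumes the third-party Python `cryptography` package, and all names and fields are illustrative.

    import json, time
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey, Ed25519PublicKey)

    def issue_consent(patient_key: Ed25519PrivateKey, doctor_id: str,
                      scope: str, ttl_seconds: int = 900) -> bytes:
        """Patient's phone signs a short-lived consent token naming the
        doctor and the scope of EHR access granted (hypothetical format)."""
        token = json.dumps({"doctor": doctor_id, "scope": scope,
                            "expires": int(time.time()) + ttl_seconds}).encode()
        return token + patient_key.sign(token)  # Ed25519 signatures are 64 bytes

    def verify_consent(patient_pub: Ed25519PublicKey, blob: bytes,
                       doctor_id: str) -> dict:
        """Records centre checks the signature, expiry, and named doctor
        before releasing anything."""
        token, sig = blob[:-64], blob[-64:]
        patient_pub.verify(sig, token)  # raises InvalidSignature if tampered
        claims = json.loads(token)
        if claims["doctor"] != doctor_id or claims["expires"] <= time.time():
            raise PermissionError("consent expired or issued to another doctor")
        return claims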
Abstract:
Australian privacy law regulates how government agencies and private sector organisations collect, store and use personal information. A coherent conceptual basis of personal information is an integral requirement of information privacy law, as it determines what information is regulated. A 2004 report conducted on behalf of the UK’s Information Commissioner (the 'Booth Report') concluded that there was no coherent definition of personal information in operation, because different data protection authorities throughout the world conceived of personal information in different ways. The authors adopt the models developed by the Booth Report to examine the conceptual basis of statutory definitions of personal information in Australian privacy laws. Research findings indicate that the definition of personal information is not construed uniformly in Australian privacy laws and that different definitions rely upon different classifications of personal information. A similar situation is evident in a review of relevant case law. Despite this, the authors conclude by asserting that greater jurisprudential discourse, grounded in a coherent conceptual framework, is required to ensure the consistent development of Australian privacy law.
Abstract:
The rapid growth in the number of online services leads to an increasing number of different digital identities each user needs to manage. As a result, many people feel overloaded with credentials, which in turn negatively impacts their ability to manage them securely. Passwords are perhaps the most common type of credential used today. To avoid the tedious task of remembering difficult passwords, users often behave less securely by using low-entropy, weak passwords. Weak passwords and bad password habits represent security threats to online services. Some solutions have been developed to eliminate the need for users to create and manage passwords. A typical solution is based on giving the user a hardware token that generates one-time passwords (OTPs), i.e. passwords valid for a single session or transaction. Unfortunately, most of these solutions do not satisfy scalability and/or usability requirements, or they are simply insecure. In this paper, we propose a scalable OTP solution that uses mobile phones, builds on trusted computing technology, and combines enhanced usability with strong security.
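To ground the one-time-password mechanism, here is a minimal sketch of standard HMAC-based OTP generation (RFC 4226), one common way such tokens derive single-use passwords. The paper's actual solution adds mobile-phone and trusted-computing elements that are not shown here.

    import hmac, hashlib, struct

    def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
        """HMAC-based one-time password (RFC 4226): each counter value
        yields a fresh single-use password."""
        msg = struct.pack(">Q", counter)
        digest = hmac.new(secret, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                       # dynamic truncation
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    # A phone-side token and the server derive the same OTP from a shared
    # secret and a synchronised counter:
    print(hotp(b"12345678901234567890", 0))  # "755224" (RFC 4226 test vector)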