869 results for Data Deduplication Compression


Relevance:

20.00%

Publisher:

Abstract:

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of zeroes recorded. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov Random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy. 
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the Integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter Autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
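The path sampling idea named in the abstract above can be illustrated with a minimal sketch. It is not the thesis's IMCS implementation: it assumes a one-parameter binary MRF on a tiny lattice with density proportional to exp(beta * s(x)), where s(x) counts neighbouring pairs that are both 1, and it replaces the MCMC averages with exact enumeration so the result can be checked. All function names are hypothetical. The identity used is that the derivative of the log normalization constant with respect to beta equals the mean canonical statistic, so the log NC ratio is the integral of that mean along the path from beta0 to beta1.

```python
import itertools
import math


def neighbour_pairs(n):
    """Horizontally and vertically adjacent site pairs on an n x n lattice."""
    pairs = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                pairs.append((r * n + c, r * n + c + 1))
            if r + 1 < n:
                pairs.append((r * n + c, (r + 1) * n + c))
    return pairs


def canonical_stat(x, pairs):
    """Illustrative canonical statistic: neighbouring pairs that are both 1."""
    return sum(x[i] * x[j] for i, j in pairs)


def log_nc(beta, n, pairs):
    """Exact log normalization constant by brute-force enumeration."""
    total = sum(math.exp(beta * canonical_stat(x, pairs))
                for x in itertools.product((0, 1), repeat=n * n))
    return math.log(total)


def mean_stat(beta, n, pairs):
    """Exact mean of s(x) under beta (an MCMC average on realistic lattices)."""
    num = den = 0.0
    for x in itertools.product((0, 1), repeat=n * n):
        s = canonical_stat(x, pairs)
        w = math.exp(beta * s)
        num += s * w
        den += w
    return num / den


def log_nc_ratio_path(beta0, beta1, n, steps=50):
    """Trapezoidal integral of the mean canonical statistic over [beta0, beta1]."""
    pairs = neighbour_pairs(n)
    h = (beta1 - beta0) / steps
    means = [mean_stat(beta0 + k * h, n, pairs) for k in range(steps + 1)]
    return h * (sum(means) - 0.5 * (means[0] + means[-1]))
```

On a 3 x 3 lattice the integrated estimate can be compared directly with `log_nc(beta1, 3, pairs) - log_nc(beta0, 3, pairs)`; on lattices of practical size enumeration is infeasible, which is why the mean statistic would be estimated by simulation at each point on the path.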

Relevance:

20.00%

Publisher:

Abstract:

The Mount Isa Basin is a new concept used to describe the area of Palaeo- to Mesoproterozoic rocks south of the Murphy Inlier that is presently, and inappropriately, described as the Mount Isa Inlier. The new basin concept presented in this thesis allows for the characterisation of basin-wide structural deformation, correlation of mineralisation with particular lithostratigraphic and seismic stratigraphic packages, and the recognition of areas with petroleum exploration potential. The northern depositional margin of the Mount Isa Basin is the metamorphic, intrusive and volcanic complex here referred to as the Murphy Inlier (not the "Murphy Tectonic Ridge"). The eastern, southern and western boundaries of the basin are obscured by younger basins (Carpentaria, Eromanga and Georgina Basins). The Murphy Inlier rocks comprise the seismic basement to the Mount Isa Basin sequence. Evidence for the continuity of the Mount Isa Basin with the McArthur Basin to the northwest and the Willyama Block (Basin) at Broken Hill to the south is presented. These areas combined with several other areas of similar age are believed to have comprised the Carpentarian Superbasin (new term). The application of seismic exploration within Authority to Prospect (ATP) 423P at the northern margin of the basin was critical to the recognition and definition of the Mount Isa Basin. The Mount Isa Basin is structurally analogous to the Palaeozoic Arkoma Basin of Oklahoma and Arkansas in southern USA but, as with all basins, it contains unique characteristics, a function of its individual development history. The Mount Isa Basin evolved in a manner similar to many well-described, Phanerozoic plate-tectonic-driven basins. A full Wilson Cycle is recognised and a plate tectonic model proposed. The northern Mount Isa Basin is defined as the Proterozoic basin area northwest of the Mount Gordon Fault. 
Deposition in the northern Mount Isa Basin began with a rift sequence of volcaniclastic sediments followed by a passive margin drift phase comprising mostly carbonate rocks. Following the rift and drift phases, major north-south compression produced east-west thrusting in the south of the basin inverting the older sequences. This compression produced an asymmetric epi- or intra-cratonic clastic dominated peripheral foreland basin provenanced in the south and thinning markedly to a stable platform area (the Murphy Inlier) in the north. The final major deformation comprised east-west compression producing north-south aligned faults that are particularly prominent at Mount Isa. Potential field studies of the northern Mount Isa Basin, principally using magnetic data (and to a lesser extent gravity data, satellite images and aerial photographs) exhibit remarkable correlation with the reflection seismic data. The potential field data contributed significantly to the unravelling of the northern Mount Isa Basin architecture and deformation. Structurally, the Mount Isa Basin consists of three distinct regions. From the north to the south they are the Bowthorn Block, the Riversleigh Fold Zone and the Cloncurry Orogen (new names). The Bowthorn Block, which is located between the Elizabeth Creek Thrust Zone and the Murphy Inlier, consists of an asymmetric wedge of volcanic, carbonate and clastic rocks. It ranges from over 10 000 m stratigraphic thickness in the south to less than 2000 m in the north. The Bowthorn Block is relatively undeformed; however, it contains a series of reverse faults trending east-west that are interpreted from seismic data to be down-to-the-north normal faults that have been reactivated as thrusts. The Riversleigh Fold Zone is a folded and faulted region south of the Bowthorn Block, comprising much of the area formerly referred to as the Lawn Hill Platform. The Cloncurry Orogen consists of the area and sequences equivalent to the former Mount Isa Orogen. 
The name Cloncurry Orogen clearly distinguishes this area from the wider concept of the Mount Isa Basin. The South Nicholson Group and its probable correlatives, the Pilpah Sandstone and Quamby Conglomerate, comprise a later phase of now largely eroded deposits within the Mount Isa Basin. The name South Nicholson Basin is now outmoded as this terminology only applied to the South Nicholson Group unlike the original broader definition in Brown et al. (1968). Cored slimhole stratigraphic and mineral wells drilled by Amoco, Esso, Elf Aquitaine and Carpentaria Exploration prior to 1986, penetrated much of the stratigraphy and intersected both minor oil and gas shows plus excellent potential source rocks. The raw data were reinterpreted and augmented with seismic stratigraphy and source rock data from resampled mineral and petroleum stratigraphic exploration wells for this study. Since 1986, Comalco Aluminium Limited, as operator of a joint venture with Monument Resources Australia Limited and Bridge Oil Limited, recorded approximately 1000 km of reflection seismic data within the basin and drilled one conventional stratigraphic petroleum well, Beamesbrook-1. This work was the first reflection seismic and first conventional petroleum test of the northern Mount Isa Basin. When incorporated into the newly developed foreland basin and maturity models, a grass roots petroleum exploration play was recognised and this led to the present thesis. The Mount Isa Basin was seen to contain excellent source rocks coupled with potential reservoirs and all of the other essential aspects of a conventional petroleum exploration play. This play, although high risk, was commensurate with the enormous and totally untested petroleum potential of the basin. The basin was assessed for hydrocarbons in 1992 with three conventional exploration wells, Desert Creek-1, Argyle Creek-1 and Egilabria-1. These wells also tested and confirmed the proposed basin model. 
No commercially viable oil or gas was encountered although evidence of its former existence was found. In addition to the petroleum exploration, indeed as a consequence of it, the association of the extensive base metal and other mineralisation in the Mount Isa Basin with hydrocarbons could not be overlooked. A comprehensive analysis of the available data suggests a link between the migration and possible generation or destruction of hydrocarbons and metal bearing fluids. Consequently, base metal exploration based on hydrocarbon exploration concepts is probably the most effective technique in such basins. The metal-hydrocarbon-sedimentary basin-plate tectonic association (analogous to Phanerozoic models) is a compelling outcome of this work on the Palaeo- to Mesoproterozoic Mount Isa Basin. Petroleum within the Bowthorn Block was apparently destroyed by hot brines that produced many ore deposits elsewhere in the basin.

Relevance:

20.00%

Publisher:

Abstract:

Forensic imaging has been facing scalability challenges for some time. As disk capacity growth continues to outpace storage I/O bandwidth, the demands placed on storage and time are ever increasing. Data reduction and de-duplication technologies are now commonplace in the enterprise space, and are potentially applicable to forensic acquisition. Using the new AFF4 forensic file format we employ a hash-based compression scheme to leverage an existing corpus of images, reducing both acquisition time and storage requirements. This paper additionally describes some of the recent evolution in the AFF4 file format making the efficient implementation of hash-based imaging a reality.
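The general idea of hash-based imaging described above can be sketched as follows. This is only an illustration of block-level de-duplication against an existing corpus, not the AFF4 format or the paper's implementation: the fixed 4096-byte block size, the choice of SHA-256, the in-memory dictionary used as the corpus, and the names `acquire` and `reconstruct` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def acquire(image: bytes, corpus: dict, block_size: int = BLOCK_SIZE):
    """Split an image into blocks and store only blocks not already in the corpus.

    Returns the ordered list of block hashes (enough to rebuild the image)
    and the number of blocks that had to be newly stored.
    """
    block_map = []
    new_blocks = 0
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in corpus:
            corpus[digest] = block  # unseen data: store it once
            new_blocks += 1
        block_map.append(digest)    # seen or not, record only the hash
    return block_map, new_blocks


def reconstruct(block_map, corpus: dict) -> bytes:
    """Rebuild the original image from its hash map and the shared corpus."""
    return b"".join(corpus[digest] for digest in block_map)
```

A second image that shares blocks with an earlier acquisition adds only its unseen blocks to the corpus, which is the source of the storage saving; acquisition time would additionally drop only if hashing a block is cheaper than writing it, which this sketch does not measure.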

Relevance:

20.00%

Publisher:

Abstract:

In the context of learning paradigms of identification in the limit, we address the question: why is uncertainty sometimes desirable? We use mind change bounds on the output hypotheses as a measure of uncertainty, and interpret ‘desirable’ as reduction in data memorization, also defined in terms of mind change bounds. The resulting model is closely related to iterative learning with bounded mind change complexity, but the dual use of mind change bounds (for hypotheses and for data) is a key distinctive feature of our approach. We show that situations exist where the more mind changes the learner is willing to accept, the less data it needs to remember in order to converge to the correct hypothesis. We also investigate relationships between our model and learning from good examples, set-driven, monotonic and strong-monotonic learners, as well as class-comprising versus class-preserving learnability.