17 results for Capture-recapture Data
in Aston University Research Archive
Abstract:
This thesis contributes to the Change Data Capture (CDC) field by providing an empirical evaluation of the performance of CDC architectures in the context of real-time data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures: pull architectures and push architectures. There is little data on the performance of CDC architectures in a real-time environment, yet such data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service-based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that efficiently execute transactions in an OLTP environment. Essentially, we develop efficient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry-recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment will capture change data with minimal latency and will have a negligible effect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition of capture latency, and of how to measure it, does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement.
The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC, on the other hand, is capable of real-time CDC and places a minimal overhead on the transaction rate, although this performance comes at the expense of CPU resources.
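The abstract describes capture latency as the time a CDC mechanism takes to capture a change that has been applied to the OLTP database. A minimal sketch of that idea follows, assuming each change carries two timestamps in seconds; the `ChangeEvent` class and its field names are hypothetical and are not taken from the thesis, whose formal definition and TPC-C extension are not given in the abstract.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ChangeEvent:
    """One data change with hypothetical timestamps (seconds)."""
    committed_at: float  # when the change was applied to the OLTP database
    captured_at: float   # when the CDC mechanism captured the change


def capture_latency(event: ChangeEvent) -> float:
    """Elapsed time between a change being applied and being captured."""
    return event.captured_at - event.committed_at


def mean_capture_latency(events) -> float:
    """Average capture latency over a batch of change events."""
    return mean(capture_latency(e) for e in events)


# Illustrative values only; these are not measurements from the thesis.
events = [
    ChangeEvent(committed_at=10.0, captured_at=10.5),
    ChangeEvent(committed_at=11.0, captured_at=11.25),
    ChangeEvent(committed_at=12.0, captured_at=12.75),
]
latencies = [capture_latency(e) for e in events]
```

In a real benchmark both timestamps would need to come from the same clock, or from synchronised clocks, for the difference to be meaningful.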
Abstract:
We present an algorithm and the associated single-view capture methodology to acquire the detailed 3D shape, bends, and wrinkles of deforming surfaces. Moving 3D data has been difficult to obtain by methods that rely on known surface features, structured light, or silhouettes. Multispectral photometric stereo is an attractive alternative because it can recover a dense normal field from an untextured surface. We show how to capture such data, which in turn allows us to demonstrate the strengths and limitations of our simple frame-to-frame registration over time. Experiments were performed on monocular video sequences of untextured cloth and faces with and without white makeup. Subjects were filmed under spatially separated red, green, and blue lights. Our first finding is that the color photometric stereo setup is able to produce smoothly varying per-frame reconstructions with high detail. Second, when these 3D reconstructions are augmented with 2D tracking results, one can both register the surfaces and relax the homogeneous-color restriction of the single-hue subject. Quantitative and qualitative experiments explore both the practicality and limitations of this simple multispectral capture system.
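The core idea of recovering a surface normal from red, green, and blue lights can be illustrated with the textbook Lambertian case. The sketch below is not the authors' algorithm: it assumes a uniformly colored, purely diffuse surface, that each color channel responds to exactly one light, and that the three light directions are known. The function names and numeric values are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve the 3x3 linear system m @ x = b with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    solution = []
    for col in range(3):
        # Replace one column of m with b and take the determinant ratio.
        replaced = [[b[r] if c == col else m[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def normal_from_rgb(lights, rgb):
    """Recover the unit normal and albedo at one pixel.

    lights: rows are the (assumed known) directions of the red, green,
            and blue lights.
    rgb:    observed intensities in the three channels.
    Under the Lambertian model, rgb = albedo * (lights @ normal).
    """
    scaled = solve3(lights, rgb)  # albedo-scaled normal
    albedo = math.sqrt(sum(x * x for x in scaled))
    return [x / albedo for x in scaled], albedo


# Synthetic check: render one pixel, then invert it.
lights = [[0.6, 0.0, 0.8], [0.0, 0.6, 0.8], [0.0, 0.0, 1.0]]
true_normal, true_albedo = [0.0, 0.0, 1.0], 0.5
rgb = [true_albedo * sum(l * n for l, n in zip(row, true_normal))
       for row in lights]
normal, albedo = normal_from_rgb(lights, rgb)
```

Because all three channels are captured in a single frame, one video frame yields one normal per pixel, which is what makes the approach suitable for deforming surfaces.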
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY WITH PRIOR ARRANGEMENT
Abstract:
This research investigates the claim that Change Data Capture (CDC) technologies capture data changes in real-time. Based on theory, our hypothesis states that real-time CDC is not achievable with traditional approaches (log scanning, triggers and timestamps). Traditional approaches to CDC require a resource to be polled, which prevents true real-time CDC. We propose an approach to CDC that encapsulates the data source with a set of web services. These web services will propagate the changes to the targets and eliminate the need for polling. Additionally, we propose a framework for CDC technologies that allows changes to flow from source to target. This paper discusses current CDC technologies and presents the theory about why they are unable to deliver changes in real-time. We then discuss our web service approach to CDC and the accompanying framework, explaining how they can produce real-time CDC. The paper concludes with a discussion of the research required to investigate the real-time capabilities of CDC technologies. © 2010 IEEE.
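The contrast between polling a resource and having a service propagate changes can be sketched in a few lines. This is only an in-process illustration under assumed names (`PollingCapture`, `DataService`); the paper's approach uses web services, whose interfaces are not described in the abstract.

```python
class PollingCapture:
    """Pull style: a change is seen only when the source is next polled."""

    def __init__(self, source_log):
        self.source_log = source_log  # e.g. a log or timestamped table
        self.position = 0

    def poll(self):
        """Return the changes appended since the previous poll."""
        new_changes = self.source_log[self.position:]
        self.position = len(self.source_log)
        return new_changes


class DataService:
    """Push style: a hypothetical service that encapsulates the data source."""

    def __init__(self):
        self.rows = {}
        self.subscribers = []

    def subscribe(self, callback):
        """Register a target that wants to be told about every change."""
        self.subscribers.append(callback)

    def write(self, key, value):
        """Apply the change, then propagate it to every target at once."""
        self.rows[key] = value
        for callback in self.subscribers:
            callback(key, value)


# Pull: the change sits unnoticed until poll() is called.
log = []
puller = PollingCapture(log)
log.append(("order:1", "NEW"))
pulled = puller.poll()

# Push: the target receives the change as part of the write itself.
received = []
service = DataService()
service.subscribe(lambda key, value: received.append((key, value)))
service.write("order:1", "NEW")
```

With polling, the delay before a change is seen depends on the polling interval; with the push variant, the delay is bounded by the cost of the notification itself.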
Abstract:
Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system, PhiVis [1], in several directions: (1) we allow for non-linear projection manifolds (the basic building block is the Generative Topographic Mapping, GTM), (2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, and (3) we describe folding patterns of the low-dimensional projection manifold in high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the nonlinear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the principle of the approach on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.
Abstract:
Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically, visualisation methods like principal components analysis are used, but these methods cannot easily deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate far more structure than a two-dimensional principal components plot could, and to deal at the same time with missing data. We show that using the generative topographic mapping provides us with an optimal method to explore the data while being able to replace missing values in a dataset, particularly where a large proportion of the data is missing.
Abstract:
In this paper we propose a data envelopment analysis (DEA) based method for assessing the comparative efficiencies of units operating production processes where input-output levels are inter-temporally dependent. One cause of inter-temporal dependence between input and output levels is capital stock, which influences output levels over many production periods. Such units cannot be assessed by traditional or 'static' DEA, which assumes input-output correspondences are contemporaneous in the sense that the output levels observed in a time period are the product solely of the input levels observed during that same period. The method developed in the paper overcomes the problem of inter-temporal input-output dependence by using input-output 'paths' mapped out by operating units over time as the basis of assessing them. As an application we compare the results of the dynamic and static models for a set of UK universities. The results suggest that the dynamic model captures efficiency better than the static model. © 2003 Elsevier Inc. All rights reserved.
Abstract:
Few works address methodological issues of how to conduct strategy-as-practice research and even fewer focus on how to analyse the subsequent data in ways that illuminate strategy as an everyday, social practice. We address this gap by proposing a quantitative method for analysing observational data, which can complement more traditional qualitative methodologies. We propose that rigorous but context-sensitive coding of transcripts can render everyday practice analysable statistically. Such statistical analysis provides a means for analytically representing patterns and shifts within the mundane, repetitive elements through which practice is accomplished. We call this approach the Event Database (EDB); it consists of five basic coding categories that help us capture the stream of practice. Indexing codes help to index or categorise the data, in order to give context and offer some basic information about the event under discussion. Indexing codes are descriptive codes, which allow us to catalogue and classify events according to their assigned characteristics. Content codes concern the qualitative nature of the event; this is the essence of the event. It is a description that helps to inform judgements about the phenomenon. Nature codes help us distinguish between discursive and tangible events. We include this code to acknowledge that some events differ qualitatively from other events. Type codes are abstracted from the data in order to help us classify events based on their description or nature. This involves significantly more judgement than the indexing codes but consequently is also more meaningful. Dynamics codes help us capture some of the movement or fluidity of events. This category has been included to let us capture the flow of activity over time.
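The five coding categories can be pictured as one record per observed event, which is what makes the coded transcripts countable. The following is a minimal sketch only: the field names, the example values, and the choice of a frequency count are assumptions for illustration and do not reproduce the authors' coding scheme or statistics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEvent:
    """One observed event coded along the five EDB categories."""
    indexing: str    # context, e.g. which meeting the event belongs to
    content: str     # short description of the essence of the event
    nature: str      # "discursive" or "tangible"
    event_type: str  # classification abstracted from the data
    dynamics: str    # movement or flow of the event over time


# Hypothetical coded events; not data from the study.
events = [
    CodedEvent("meeting-01", "budget is questioned", "discursive",
               "challenge", "escalating"),
    CodedEvent("meeting-01", "spreadsheet is circulated", "tangible",
               "artefact use", "stable"),
    CodedEvent("meeting-02", "budget is defended", "discursive",
               "justification", "de-escalating"),
]

# Simple frequency tables are the starting point for statistical analysis.
nature_counts = Counter(event.nature for event in events)
events_per_meeting = Counter(event.indexing for event in events)
```

Once events are stored this way, cross-tabulations such as nature by meeting, or type by dynamics, can be derived from the same records.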
Abstract:
Foley [J. Opt. Soc. Am. A 11 (1994) 1710] has proposed an influential psychophysical model of masking in which mask components in a contrast gain pool are raised to an exponent before summation and divisive inhibition. We tested this summation rule in experiments in which contrast detection thresholds were measured for a vertical 1 c/deg (or 2 c/deg) sine-wave component in the presence of a 3 c/deg (or 6 c/deg) mask that had either a single component oriented at -45° or a pair of components oriented at ±45°. Contrary to the predictions of Foley's model 3, we found that for masks of moderate contrast and above, threshold elevation was predicted by linear summation of the mask components in the inhibitory stage of the contrast gain pool. We built this feature into two new models, referred to as the early adaptation model and the hybrid model. In the early adaptation model, contrast adaptation controls a threshold-like nonlinearity on the output of otherwise linear pathways that provide the excitatory and inhibitory inputs to a gain control stage. The hybrid model involves nonlinear and nonadaptable routes to excitatory and inhibitory stages as well as an adaptable linear route. With only six free parameters, both models provide excellent fits to the masking and adaptation data of Foley and Chen [Vision Res. 37 (1997) 2779] but unlike Foley and Chen's model, are able to do so with only one adaptation parameter. However, only the hybrid model is able to capture the features of Foley's (1994) pedestal plus orthogonal fixed mask data. We conclude that (1) linear summation of inhibitory components is a feature of contrast masking, and (2) that the main aftereffect of spatial adaptation on contrast increment thresholds can be assigned to a single site. © 2002 Elsevier Science Ltd. All rights reserved.
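The two summation rules being compared differ only in whether the exponent is applied before or after the mask components are added together. The sketch below illustrates that difference in a deliberately simplified divisive-inhibition form: it omits the sensitivity weights and the target's own contribution to the gain pool, and the exponents `p`, `q` and the constant `z` are hypothetical values, not fitted parameters from either paper.

```python
def inhibition_exponent_then_sum(mask_contrasts, q):
    """Each mask component is raised to q before summation."""
    return sum(c ** q for c in mask_contrasts)


def inhibition_linear_sum(mask_contrasts, q):
    """Mask components are summed linearly, then raised to q (assumed form)."""
    return sum(mask_contrasts) ** q


def response(target_contrast, mask_contrasts, p, q, z, inhibition):
    """Simplified divisive gain control: excitation / (inhibition + z)."""
    return target_contrast ** p / (inhibition(mask_contrasts, q) + z)


# One mask component versus a pair of components at the same contrast (%).
single, pair = [10.0], [10.0, 10.0]
q = 2.0

pool_single = inhibition_exponent_then_sum(single, q)
pool_pair_exponent_first = inhibition_exponent_then_sum(pair, q)
pool_pair_linear = inhibition_linear_sum(pair, q)

# With q > 1, linear summation gives the pair more inhibitory weight,
# so the same target produces a smaller response.
r_exponent_first = response(5.0, pair, 2.4, q, 1.0,
                            inhibition_exponent_then_sum)
r_linear = response(5.0, pair, 2.4, q, 1.0, inhibition_linear_sum)
```

For a single mask component the two rules coincide, which is why the single-versus-pair comparison described in the abstract can separate them.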
Abstract:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data; however, they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach, based on missing data, is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Abstract:
Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Abstract:
In order to generate sales promotion response predictions, marketing analysts estimate demand models using either disaggregated (consumer-level) or aggregated (store-level) scanner data. Comparison of predictions from these demand models is complicated by the fact that models may accommodate different forms of consumer heterogeneity depending on the level of data aggregation. This study shows via simulation that demand models with various heterogeneity specifications do not produce more accurate sales response predictions than a homogeneous demand model applied to store-level data, with one major exception: a random coefficients model designed to capture within-store heterogeneity using store-level data produced significantly more accurate sales response predictions (as well as better fit) compared to other model specifications. An empirical application to the paper towel product category provides additional insights. This article has supplementary material online.
Abstract:
This paper addresses the problem of obtaining detailed 3D reconstructions of human faces in real-time and with inexpensive hardware. We present an algorithm based on a monocular multi-spectral photometric-stereo setup. This system is known to capture highly detailed deforming 3D surfaces at high frame rates and without having to use any expensive hardware or synchronized light stage. However, the main challenge of such a setup is the calibration stage, which depends on the lighting setup and how the lights interact with the specific material being captured, in this case, human faces. For this purpose we develop a self-calibration technique where the person being captured is asked to perform a rigid motion in front of the camera, maintaining a neutral expression. Rigidity constraints are then used to compute the head's motion with a structure-from-motion algorithm. Once the motion is obtained, a multi-view stereo algorithm reconstructs a coarse 3D model of the face. This coarse model is then used to estimate the lighting parameters with a stratified approach: in the first step we use a RANSAC search to identify purely diffuse points on the face and to simultaneously estimate this diffuse reflectance model. In the second step we apply non-linear optimization to fit a non-Lambertian reflectance model to the outliers of the previous step. The calibration procedure is validated with synthetic and real data.
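The first step of the stratified approach, using RANSAC to separate purely diffuse points from the rest, can be illustrated with a one-parameter toy problem. This sketch is not the paper's calibration: it assumes the shading term at each point is already known from the coarse model and estimates only a single diffuse scale, whereas the paper estimates full lighting parameters. The function name, tolerance, and data are hypothetical.

```python
import random


def ransac_diffuse(shading, intensity, tol=0.05, iterations=50, seed=0):
    """Find points consistent with intensity = scale * shading.

    shading:   assumed-known diffuse shading term per point
    intensity: observed intensity per point
    Returns the estimated scale and the indices of the diffuse inliers.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_scale, best_inliers = None, []
    n = len(shading)
    for _ in range(iterations):
        i = rng.randrange(n)  # minimal sample: one point fixes the scale
        if shading[i] <= 0:
            continue
        scale = intensity[i] / shading[i]
        inliers = [j for j in range(n)
                   if abs(intensity[j] - scale * shading[j]) < tol]
        if len(inliers) > len(best_inliers):
            best_scale, best_inliers = scale, inliers
    return best_scale, best_inliers


# Synthetic data: diffuse points follow scale 0.8; two points carry an
# extra specular component and should be rejected as outliers.
shading = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]
intensity = [0.8 * s for s in shading]
intensity[5] += 0.5
intensity[7] += 0.5

scale, diffuse_points = ransac_diffuse(shading, intensity)
outliers = [j for j in range(len(shading)) if j not in diffuse_points]
```

The indices left in `outliers` correspond to the points that the second step would hand to a non-Lambertian fit.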
Abstract:
This paper presents the digital imaging results of a collaborative research project working toward the generation of an on-line interactive digital image database of signs from ancient cuneiform tablets. An important aim of this project is the application of forensic analysis to the cuneiform symbols to identify scribal hands. Cuneiform tablets are amongst the earliest records of written communication, and could be considered as one of the original information technologies; an accessible, portable and robust medium for communication across distance and time. The earliest examples are up to 5,000 years old, and the writing technique remained in use for some 3,000 years. Unfortunately, only a small fraction of these tablets can be made available for display in museums and much important academic work has yet to be performed on the very large numbers of tablets to which there is necessarily restricted access. Our paper will describe the challenges encountered in the 2D image capture of a sample set of tablets held in the British Museum, explaining the motivation for attempting 3D imaging and the results of initial experiments scanning the smaller, more densely inscribed cuneiform tablets. We will also discuss the tractability of 3D digital capture, representation and manipulation, and investigate the requirements for scalable data compression and transmission methods. Additional information can be found on the project website: www.cuneiform.net