913 results for 004 - Informatik (Data processing Computer science)
Abstract:
We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that an MNAR model is misspecified because the estimate is on the boundary of the parameter space.
A robust Bayesian approach to null intercept measurement error model with application to dental data
Abstract:
Measurement error models often arise in epidemiological and clinical research. Usually, in this setup, it is assumed that the latent variable has a normal distribution. However, the normality assumption may not always be correct. The skew-normal/independent distribution is a class of asymmetric thick-tailed distributions which includes the skew-normal distribution as a special case. In this paper, we explore the use of the skew-normal/independent distribution as a robust alternative for the null intercept measurement error model under a Bayesian paradigm. We assume that the random errors and the unobserved value of the covariate (latent variable) jointly follow a skew-normal/independent distribution, providing an appealing robust alternative to the routine use of the symmetric normal distribution in this type of model. Specific distributions examined include univariate and multivariate versions of the skew-normal distribution, the skew-t distributions, the skew-slash distributions and the skew contaminated normal distributions. The methods developed are illustrated using a real data set from a dental clinical trial. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
Basic information theory is used to analyse the amount of confidential information which may be leaked by programs written in a very simple imperative language. In particular, a detailed analysis is given of the possible leakage due to equality tests and if statements. The analysis is presented as a set of syntax-directed inference rules and can readily be automated.
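As a worked illustration of the kind of quantity such an analysis bounds, the sketch below computes how many bits an observer learns from the outcome of a single equality test against a secret. It assumes the secret is uniformly distributed over all values of a given bit width and that the test is deterministic, so the leakage equals the Shannon entropy of the boolean outcome. The function names are hypothetical, and this is not the paper's set of inference rules.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a boolean outcome that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def equality_test_leakage(secret_bits: int) -> float:
    """Bits leaked by observing `secret == c` for a fixed constant c.

    Assumption: the secret is uniform over 2**secret_bits values, so the
    test succeeds with probability 1 / 2**secret_bits. Because the outcome
    is a deterministic function of the secret, the leakage is the entropy
    of the outcome.
    """
    p_equal = 1.0 / (2 ** secret_bits)
    return binary_entropy(p_equal)


if __name__ == "__main__":
    for width in (1, 8, 32):
        print(width, equality_test_leakage(width))
```

Under these assumptions a test on a 1-bit secret reveals the whole secret, while a test on a 32-bit secret reveals only a tiny fraction of a bit, which is why equality tests deserve a more careful treatment than simply counting them as a full leak.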
Abstract:
In e-Science experiments, it is vital to record the experimental process for later use such as in interpreting results, verifying that the correct process took place or tracing where data came from. The process that led to some data is called the provenance of that data, and a provenance architecture is the software architecture for a system that will provide the necessary functionality to record, store and use process documentation. However, there has been little principled analysis of what is actually required of a provenance architecture, so it is impossible to determine the functionality they would ideally support. In this paper, we present use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyse the use cases to determine the technical requirements of a generic, technology and application-independent architecture. We propose an architecture that meets these requirements and evaluate a preliminary implementation by attempting to realise two of the use cases.
Abstract:
This article describes a methodological approach to conditional reasoning in online asynchronous learning environments such as Virtual-U VGroups, developed by SFU, BC, Canada, consistent with the notion of meaning implication: If part of a meaning C is embedded in B and a part of a meaning B is embedded in A, then A implies C in terms of meaning [Piaget 91]. A new transcript analysis technique was developed to assess the flows of conditional meaning implications and to identify the occurrence of hypotheses and connections among them in two human science graduate mixed-mode online courses offered in the summer/spring session of 1997 by SFU. Flows of conditional meaning implications were compared with Virtual-U VGroups threads, and the results of the two courses were compared. Findings suggest that Virtual-U VGroups is a knowledge-building environment, although the tree-like Virtual-U VGroups threads should be transformed into neuronal-like threads. Findings also suggest that formulating hypotheses together triggers a collaborative problem-solving process that scaffolds knowledge-building in asynchronous learning environments: a pedagogical technique and a built-in tool for formulating hypotheses together are proposed. © Springer Pub. Co.
Abstract:
This paper addresses the problem of processing biological data, such as cardiac beats and signals in the audio and ultrasonic ranges, by calculating wavelet coefficients in real time with the processor clock running at the frequencies of present-day ASICs and FPGAs. The Parallel Filter Architecture for the DWT has been improved, calculating wavelet coefficients in real time with hardware reduced to 60%. The new architecture, which also processes the IDWT, is implemented with Radix-2 or Booth-Wallace constant multipliers. A single-integrated-circuit signal analyzer for the ultrasonic range, including series memory register banks, is presented.
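To make the DWT and IDWT operations concrete, here is a minimal software sketch of one decomposition level and its inverse. It assumes the Haar wavelet, which the abstract does not name, and it says nothing about the parallel filter hardware itself; the function names are hypothetical.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from one level of coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) * _SCALE)
        signal.append((a - d) * _SCALE)
    return signal


if __name__ == "__main__":
    samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    approx, detail = haar_dwt(samples)
    print(approx, detail)
    print(haar_idwt(approx, detail))
```

Each output pair depends on only two input samples, so the low-pass and high-pass branches can be evaluated independently, which is the property a parallel filter architecture can exploit.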
Abstract:
Until mid 2006, SCIAMACHY data processors for the operational retrieval of nitrogen dioxide (NO2) column data were based on the historical version 2 of the GOME Data Processor (GDP). On top of known problems inherent to GDP 2, ground-based validations of SCIAMACHY NO2 data revealed issues specific to SCIAMACHY, like a large cloud-dependent offset occurring at Northern latitudes. In 2006, the GDOAS prototype algorithm of the improved GDP version 4 was transferred to the off-line SCIAMACHY Ground Processor (SGP) version 3.0. In parallel, the calibration of SCIAMACHY radiometric data was upgraded. Before operational switch-on of SGP 3.0 and public release of upgraded SCIAMACHY NO2 data, we have investigated the accuracy of the algorithm transfer: (a) by checking the consistency of SGP 3.0 with prototype algorithms; and (b) by comparing SGP 3.0 NO2 data with ground-based observations reported by the WMO/GAW NDACC network of UV-visible DOAS/SAOZ spectrometers. This delta-validation study concludes that SGP 3.0 is a significant improvement with respect to the previous processor IPF 5.04. For three particular SCIAMACHY states, the study reveals unexplained features in the slant columns and air mass factors, although the quantitative impact on SGP 3.0 vertical columns is not significant.
Abstract:
In this work, an image pre-processing module has been developed to extract quantitative information from plantation images with various degrees of infestation. Four filters comprise this module: the first one smooths the image, the second one removes the image background, enhancing the plant leaves, the third filter removes isolated dots not removed by the previous filter, and the fourth one is used to highlight the leaves' edges. At first, the filters were tested with MATLAB for quick visual feedback on their behavior. Then the filters were implemented in the C programming language. Finally, the module was coded in VHDL for implementation on a Stratix II family FPGA. Tests were run and the results are shown in this paper. © 2008 Springer-Verlag Berlin Heidelberg.
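The four-stage structure described above can be sketched as a chain of small functions. The abstract does not specify the filters, so every stage here is a stand-in chosen for illustration: a 3x3 mean filter, a fixed-threshold background removal, deletion of foreground pixels with no foreground neighbour, and a boundary-pixel edge map. All names and the threshold parameter are hypothetical.

```python
def neighbors(img, r, c, diagonal=True):
    """Yield the values of the in-bounds neighbours of pixel (r, c)."""
    h, w = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            if not diagonal and dr != 0 and dc != 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w:
                yield img[rr][cc]


def smooth(img):
    """Stage 1 (stand-in): mean of each pixel and its in-bounds 8-neighbours."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = [img[r][c]] + list(neighbors(img, r, c))
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def remove_background(img, threshold):
    """Stage 2 (stand-in): binary mask of pixels brighter than the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in img]


def remove_isolated(mask):
    """Stage 3 (stand-in): drop foreground pixels with no foreground 8-neighbour."""
    return [[1 if mask[r][c] and any(neighbors(mask, r, c)) else 0
             for c in range(len(mask[0]))]
            for r in range(len(mask))]


def edges(mask):
    """Stage 4 (stand-in): keep foreground pixels that touch background or the border."""
    out = []
    for r in range(len(mask)):
        row = []
        for c in range(len(mask[0])):
            nb = list(neighbors(mask, r, c, diagonal=False))
            on_edge = mask[r][c] and (len(nb) < 4 or 0 in nb)
            row.append(1 if on_edge else 0)
        out.append(row)
    return out


def preprocess(img, threshold):
    """Run the four stages in the order given in the abstract."""
    return edges(remove_isolated(remove_background(smooth(img), threshold)))
```

Keeping each stage as a pure function of its input image mirrors the prototype-then-port workflow in the abstract, since each stage can be checked in isolation before being rewritten for another target.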
Abstract:
The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy results. Therefore, the investigation of a combining method for this kind of classifier can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. The accuracy of each subset is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than the naive approach, provided some conditions are met. It was also shown that the OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
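The combining scheme (disjoint random subsets, one classifier per subset, majority vote) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: a 1-nearest-neighbour rule stands in for the OPF classifier, the per-subset learning procedure is omitted, and all function names are hypothetical.

```python
import random
from collections import Counter


def disjoint_subsets(samples, k, seed=0):
    """Split samples into k disjoint subsets by sampling without replacement."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Striding over the shuffled list places every sample in exactly one subset.
    return [shuffled[i::k] for i in range(k)]


def nearest_neighbor_classifier(train):
    """Stand-in base classifier (NOT OPF): returns a 1-NN predict function.

    Each training sample is a (feature_tuple, label) pair.
    """
    def predict(x):
        _, label = min(
            train,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)),
        )
        return label
    return predict


def majority_vote(classifiers, x):
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [((0.0, 0.1 * i), "a") for i in range(6)]
    data += [((10.0, 0.1 * i), "b") for i in range(6)]
    ensemble = [nearest_neighbor_classifier(s) for s in disjoint_subsets(data, 3)]
    print(majority_vote(ensemble, (0.2, 0.3)))
    print(majority_vote(ensemble, (9.5, 0.1)))
```

Because the subsets are disjoint, each base classifier can be trained independently of the others, which is what makes the scheme suitable for parallel or distributed processing.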
Abstract:
We are investigating the combination of wavelets and decision trees to detect ships and other maritime surveillance targets from medium resolution SAR images. Wavelets have inherent advantages for extracting image descriptors, while decision trees are able to handle different data sources. In addition, our work aims to consider oceanic features such as ship wakes and ocean spills. In this incipient work, Haar and Cohen-Daubechies-Feauveau 9/7 wavelets obtain detailed descriptors from targets and ocean features, which are inserted with other statistical parameters and wavelets into an oblique decision tree. © 2011 Springer-Verlag.
Abstract:
Software Transactional Memory (STM) systems have poor performance under high contention scenarios. Since many transactions compete for the same data, most of them are aborted, wasting processor runtime. Contention management policies are typically used to avoid that, but they are passive approaches as they wait for an abort to happen so they can take action. More proactive approaches have emerged, trying to predict when a transaction is likely to abort so its execution can be delayed. Such techniques are limited, as they do not replace the doomed transaction by another or, when they do, they rely on the operating system for that, having little or no control on which transaction should run. In this paper we propose LUTS, a Lightweight User-Level Transaction Scheduler, which is based on an execution context record mechanism. Unlike other techniques, LUTS provides the means for selecting another transaction to run in parallel, thus improving system throughput. Moreover, it avoids most of the issues caused by pseudo parallelism, as it only launches as many system-level threads as the number of available processor cores. We discuss LUTS design and present three conflict-avoidance heuristics built around LUTS scheduling capabilities. Experimental results, conducted with STMBench7 and STAMP benchmark suites, show LUTS efficiency when running high contention applications and how conflict-avoidance heuristics can improve STM performance even more. In fact, our transaction scheduling techniques are capable of improving program performance even in overloaded scenarios. © 2011 Springer-Verlag.
Abstract:
This paper presents three methods for automatic detection of dust devil tracks in images of Mars. The methods are mainly based on Mathematical Morphology, and their performance is analyzed and compared. A dataset of 21 images of the surface of Mars, representative of the diversity of those track features, was considered for developing, testing and evaluating our methods, comparing their outputs with manually made ground truth images. Methods 1 and 3, based on closing top-hat and path closing top-hat, respectively, showed similar mean accuracies of around 90%, but the processing time was much greater for method 1 than for method 3. Method 2, based on radial closing, was the fastest but showed worse mean accuracy. Thus, processing time was the tiebreak factor. © 2011 Springer-Verlag.
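For readers unfamiliar with the closing top-hat that method 1 builds on, the sketch below computes it for a small grayscale image: the morphological closing (dilation followed by erosion) minus the original image, which highlights dark structures narrower than the structuring element. It assumes a flat square structuring element and clips windows at the image border; both choices, the function names, and the toy image are illustrative and not taken from the paper.

```python
def _morph(img, radius, pick):
    """Apply `pick` (max or min) over a square window around every pixel."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img, radius=1):
    """Grayscale dilation with a flat (2*radius+1) square structuring element."""
    return _morph(img, radius, max)


def erode(img, radius=1):
    """Grayscale erosion with a flat (2*radius+1) square structuring element."""
    return _morph(img, radius, min)


def closing(img, radius=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, radius), radius)


def closing_top_hat(img, radius=1):
    """Closing top-hat: closing(img) - img, bright where img has thin dark features."""
    closed = closing(img, radius)
    return [
        [closed[r][c] - img[r][c] for c in range(len(img[0]))]
        for r in range(len(img))
    ]


if __name__ == "__main__":
    # Bright terrain (9) crossed by a one-pixel-wide dark track (1) in column 2.
    image = [[9, 9, 1, 9, 9] for _ in range(5)]
    for row in closing_top_hat(image):
        print(row)
```

Thresholding the result would give a binary track mask; how the paper's methods threshold and post-process the output is not stated in the abstract.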