842 resultados para data movement problem
Resumo:
Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
Resumo:
This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
Resumo:
The well-studied link between psychotic traits and creativity is a subject of much debate. The present study investigated the extent to which schizotypic personality traits - as measured by O-LIFE (Oxford-Liverpool Inventory of Feelings and Experiences) - equip healthy individuals to engage as groups in everyday tasks. From a sample of 69 students, eight groups of four participants - comprised of high, medium, or low-schizotypy individuals - were assembled to work as a team to complete a creative problem-solving task. Predictably, high scorers on the O-LIFE formulated a greater number of strategies to solve the task, indicative of creative divergent thinking. However, for task success (as measured by time taken to complete the problem) an inverted U shaped pattern emerged, whereby high and low-schizotypy groups were consistently faster than medium schizotypy groups. Intriguing data emerged concerning leadership within the groups, and other tangential findings relating to anxiety, competition and motivation were explored. These findings challenge the traditional cliche that psychotic personality traits are linearly related to creative performance, and suggest that the nature of the problem determines which thinking styles are optimally equipped to solve it. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Two experiments implement and evaluate a training scheme for learning to apply frequency formats to probability judgements couched in terms of percentages. Results indicate that both conditional and cumulative probability judgements can be improved in this manner, however the scheme is insufficient to promote any deeper understanding of the problem structure. In both experiments, training on one problem type only (either conditional or cumulative risk judgements) resulted in an inappropriate transfer of a learned method at test. The obstacles facing a frequency-based training programme for teaching appropriate use of probability data are discussed. Copyright (c) 2006 John Wiley & Sons, Ltd.
Resumo:
This paper proposes a new iterative algorithm for OFDM joint data detection and phase noise (PHN) cancellation based on minimum mean square prediction error. We particularly highlight the problem of "overfitting" such that the iterative approach may converge to a trivial solution. Although it is essential for this joint approach, the overfitting problem was relatively less studied in existing algorithms. In this paper, specifically, we apply a hard decision procedure at every iterative step to overcome the overfitting. Moreover, compared with existing algorithms, a more accurate Pade approximation is used to represent the phase noise, and finally a more robust and compact fast process based on Givens rotation is proposed to reduce the complexity to a practical level. Numerical simulations are also given to verify the proposed algorithm.
OFDM joint data detection and phase noise cancellation based on minimum mean square prediction error
Resumo:
This paper proposes a new iterative algorithm for orthogonal frequency division multiplexing (OFDM) joint data detection and phase noise (PHN) cancellation based on minimum mean square prediction error. We particularly highlight the relatively less studied problem of "overfitting" such that the iterative approach may converge to a trivial solution. Specifically, we apply a hard-decision procedure at every iterative step to overcome the overfitting. Moreover, compared with existing algorithms, a more accurate Pade approximation is used to represent the PHN, and finally a more robust and compact fast process based on Givens rotation is proposed to reduce the complexity to a practical level. Numerical Simulations are also given to verify the proposed algorithm. (C) 2008 Elsevier B.V. All rights reserved.
Resumo:
This correspondence proposes a new algorithm for the OFDM joint data detection and phase noise (PHN) cancellation for constant modulus modulations. We highlight that it is important to address the overfitting problem since this is a major detrimental factor impairing the joint detection process. In order to attack the overfitting problem we propose an iterative approach based on minimum mean square prediction error (MMSPE) subject to the constraint that the estimated data symbols have constant power. The proposed constrained MMSPE algorithm (C-MMSPE) significantly improves the performance of existing approaches with little extra complexity being imposed. Simulation results are also given to verify the proposed algorithm.
Resumo:
This paper presents a study investigating how the performance of motion-impaired computer users in point and click tasks varies with target distance (A), target width (W), and force-feedback gravity well width (GWW). Six motion-impaired users performed point and click tasks across a range of values for A, W, and GWW. Times were observed to increase with A, and to decrease with W. Times also improved with GWW, and, with the addition of a gravity well, a greater improvement was observed for smaller targets than for bigger ones. It was found that Fitts Law gave a good description of behaviour for each value of GWW, and that gravity wells reduced the effect of task difficulty on performance. A model based on Fitts Law is proposed, which incorporates the effect of GWW on movement time. The model accounts for 88.8% of the variance in the observed data.
Resumo:
The genetic analysis workshop 15 (GAW15) problem 1 contained baseline expression levels of 8793 genes in immortalised B cells from 194 individuals in 14 Centre d’Etude du Polymorphisme Humane (CEPH) Utah pedigrees. Previous analysis of the data showed linkage and association and evidence of substantial individual variations. In particular, correlation was examined on expression levels of 31 genes and 25 target genes corresponding to two master regulatory regions. In this analysis, we apply Bayesian network analysis to gain further insight into these findings. We identify strong dependences and therefore provide additional insight into the underlying relationships between the genes involved. More generally, the approach is expected to be applicable for integrated analysis of genes on biological pathways.
Resumo:
We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications, as well as to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is chosen to be the LOO mean square error, while the LOO misclassification rate is adopted in two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least square algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it does not have learning hyperparameters that have to be tuned using costly cross validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples taken from regression and classification, as well as density estimation applications.
Resumo:
A common problem in many data based modelling algorithms such as associative memory networks is the problem of the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set by using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental design-based criteria we used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, but in the later stage, the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error as well as penalises the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high dimensional inputs.
Resumo:
The modelling of a nonlinear stochastic dynamical processes from data involves solving the problems of data gathering, preprocessing, model architecture selection, learning or adaptation, parametric evaluation and model validation. For a given model architecture such as associative memory networks, a common problem in non-linear modelling is the problem of "the curse of dimensionality". A series of complementary data based constructive identification schemes, mainly based on but not limited to an operating point dependent fuzzy models, are introduced in this paper with the aim to overcome the curse of dimensionality. These include (i) a mixture of experts algorithm based on a forward constrained regression algorithm; (ii) an inherent parsimonious delaunay input space partition based piecewise local lineal modelling concept; (iii) a neurofuzzy model constructive approach based on forward orthogonal least squares and optimal experimental design and finally (iv) the neurofuzzy model construction algorithm based on basis functions that are Bézier Bernstein polynomial functions and the additive decomposition. Illustrative examples demonstrate their applicability, showing that the final major hurdle in data based modelling has almost been removed.
Resumo:
We are developing computational tools supporting the detailed analysis of the dependence of neural electrophysiological response on dendritic morphology. We approach this problem by combining simulations of faithful models of neurons (experimental real life morphological data with known models of channel kinetics) with algorithmic extraction of morphological and physiological parameters and statistical analysis. In this paper, we present the novel method for an automatic recognition of spike trains in voltage traces, which eliminates the need for human intervention. This enables classification of waveforms with consistent criteria across all the analyzed traces and so it amounts to reduction of the noise in the data. This method allows for an automatic extraction of relevant physiological parameters necessary for further statistical analysis. In order to illustrate the usefulness of this procedure to analyze voltage traces, we characterized the influence of the somatic current injection level on several electrophysiological parameters in a set of modeled neurons. This application suggests that such an algorithmic processing of physiological data extracts parameters in a suitable form for further investigation of structure-activity relationship in single neurons.
Resumo:
Measured process data normally contain inaccuracies because the measurements are obtained using imperfect instruments. As well as random errors one can expect systematic bias caused by miscalibrated instruments or outliers caused by process peaks such as sudden power fluctuations. Data reconciliation is the adjustment of a set of process data based on a model of the process so that the derived estimates conform to natural laws. In this paper, techniques for the detection and identification of both systematic bias and outliers in dynamic process data are presented. A novel technique for the detection and identification of systematic bias is formulated and presented. The problem of detection, identification and elimination of outliers is also treated using a modified version of a previously available clustering technique. These techniques are also combined to provide a global dynamic data reconciliation (DDR) strategy. The algorithms presented are tested in isolation and in combination using dynamic simulations of two continuous stirred tank reactors (CSTR).
Conditioning of incremental variational data assimilation, with application to the Met Office system
Resumo:
Implementations of incremental variational data assimilation require the iterative minimization of a series of linear least-squares cost functions. The accuracy and speed with which these linear minimization problems can be solved is determined by the condition number of the Hessian of the problem. In this study, we examine how different components of the assimilation system influence this condition number. Theoretical bounds on the condition number for a single parameter system are presented and used to predict how the condition number is affected by the observation distribution and accuracy and by the specified lengthscales in the background error covariance matrix. The theoretical results are verified in the Met Office variational data assimilation system, using both pseudo-observations and real data.