80 results for Hoey, Michael: Textual interaction: An introduction to written discourse analysis
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is by maximum likelihood based on the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonically increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonically increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright © 2003 John Wiley & Sons, Ltd.
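For orientation, the two model components named in this abstract can be written out in a standard mixture form. The notation below (g failure types, covariate vector x, coefficients a_i, b_i and \beta_i) is an assumption made for illustration and is not taken from the paper itself.

```latex
% Overall survival as a mixture over g failure types
S(t; x) = \sum_{i=1}^{g} \pi_i(x)\, S_i(t; x)

% Logistic model for the probability of failure type i (i = 1, \dots, g-1)
\pi_i(x) = \frac{\exp(a_i + b_i^{\top} x)}{1 + \sum_{l=1}^{g-1} \exp(a_l + b_l^{\top} x)},
\qquad
\pi_g(x) = 1 - \sum_{l=1}^{g-1} \pi_l(x)

% Proportional hazards model conditional on failure type i,
% with the baseline hazard h_{0i}(t) left unspecified in the semi-parametric method
h_i(t; x) = h_{0i}(t)\, \exp(\beta_i^{\top} x)
```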
Abstract:
In microarray studies, clustering techniques are often applied to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
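As a rough illustration of what model-based clustering with a mixture means, the sketch below fits a two-component univariate normal mixture by EM to simulated values and assigns each point to its more probable component. It is not EMMIX-GENE: the function names, the simulated data, and the restriction to one dimension and two components are all assumptions chosen to keep the example small.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_two_component_mixture(data, n_iter=100):
    """EM for a two-component univariate normal mixture (illustrative only)."""
    means = [min(data), max(data)]      # crude but deterministic starting values
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing proportions, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    # Hard assignment: each point goes to its more probable component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels

# Hypothetical data: two groups of 100 simulated values each
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(5.0, 1.0) for _ in range(100)]
weights, means, variances, labels = fit_two_component_mixture(data)
```

Because the fit yields a likelihood, candidate numbers of components can in principle be compared on a common scale, which is the advantage over heuristic hierarchical methods that the abstract points to.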
Abstract:
For dynamic simulations to be credible, verification of the computer code must be an integral part of the modelling process. This two-part paper describes a novel approach to verification through program testing and debugging. In Part 1, a methodology is presented for detecting and isolating coding errors using back-to-back testing. Residuals are generated by comparing the output of two independent implementations, in response to identical inputs. The key feature of the methodology is that a specially modified observer is created using one of the implementations, so as to impose an error-dependent structure on these residuals. Each error can be associated with a fixed and known subspace, permitting errors to be isolated to specific equations in the code. It is shown that the geometric properties extend to multiple errors in either one of the two implementations. Copyright © 2003 John Wiley & Sons, Ltd.
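The idea of back-to-back testing with structured residuals can be sketched on a toy model. The example below is an assumption-laden simplification, not the paper's construction: the two-state model, the function names and the injected sign error are hypothetical, and restarting the second implementation from the reference state at every step stands in for the specially modified observer.

```python
def step_reference(x, u, dt=0.1):
    """Reference implementation: one Euler step of a toy two-state model."""
    x1, x2 = x
    dx1 = -0.5 * x1 + u
    dx2 = 0.3 * x1 - 0.2 * x2
    return (x1 + dt * dx1, x2 + dt * dx2)

def step_under_test(x, u, dt=0.1):
    """Second implementation with a deliberately injected error in equation 2."""
    x1, x2 = x
    dx1 = -0.5 * x1 + u
    dx2 = 0.3 * x1 + 0.2 * x2  # injected coding error: wrong sign on the x2 term
    return (x1 + dt * dx1, x2 + dt * dx2)

def back_to_back_residuals(inputs, x0=(1.0, 1.0)):
    """Feed identical inputs to both implementations and record the differences.

    The implementation under test is restarted from the reference state at
    every step, so each residual reflects only the one-step discrepancy.
    """
    residuals = []
    x = x0
    for u in inputs:
        ref = step_reference(x, u)
        test = step_under_test(x, u)
        residuals.append((test[0] - ref[0], test[1] - ref[1]))
        x = ref  # continue along the reference trajectory
    return residuals

residuals = back_to_back_residuals([1.0] * 20)
```

In this sketch every residual has a zero first component and a non-zero second component, so the residuals lie along a fixed direction that points at the second equation.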
Abstract:
In Part 1 of this paper a methodology for back-to-back testing of simulation software was described. Residuals with error-dependent geometric properties were generated. A set of potential coding errors was enumerated, along with a corresponding set of feature matrices, which describe the geometric properties imposed on the residuals by each of the errors. In this part of the paper, an algorithm is developed to isolate the coding errors present by analysing the residuals. A set of errors is isolated when the subspace spanned by their combined feature matrices corresponds to that of the residuals. Individual feature matrices are compared to the residuals and classified as 'definite', 'possible' or 'impossible'. The status of 'possible' errors is resolved using a dynamic subset testing algorithm. To demonstrate and validate the testing methodology presented in Part 1 and the isolation algorithm presented in Part 2, a case study is presented using a model for biological wastewater treatment. Both single and simultaneous errors that are deliberately introduced into the simulation code are correctly detected and isolated. Copyright © 2003 John Wiley & Sons, Ltd.
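The isolation criterion stated in this abstract, that a set of errors is isolated when the span of their combined feature matrices matches that of the residuals, can be sketched with a rank test. Everything concrete here is assumed for illustration: the error names, the feature vectors and the residual values are hypothetical, and the exhaustive search over subsets is a naive stand-in for the paper's dynamic subset testing algorithm. The 'definite', 'possible' and 'impossible' classification is not reproduced.

```python
from itertools import combinations

def rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of rows, via Gaussian elimination."""
    m = [list(r) for r in rows]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Pick the largest remaining entry in this column as the pivot
        pivot = max(range(rk, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def same_span(vectors_a, vectors_b):
    """True when the two lists of vectors span the same subspace."""
    ra, rb = rank(vectors_a), rank(vectors_b)
    return ra == rb == rank(vectors_a + vectors_b)

def isolate(features, residuals):
    """Smallest sets of errors whose combined features span the residual subspace."""
    names = sorted(features)
    for size in range(1, len(names) + 1):
        hits = []
        for subset in combinations(names, size):
            combined = [v for name in subset for v in features[name]]
            if same_span(combined, residuals):
                hits.append(subset)
        if hits:
            return hits
    return []

# Hypothetical feature vectors: each error affects one equation of a 3-state model
features = {
    "error_in_eq1": [[1.0, 0.0, 0.0]],
    "error_in_eq2": [[0.0, 1.0, 0.0]],
    "error_in_eq3": [[0.0, 0.0, 1.0]],
}
# Hypothetical residual samples with no component along equation 2
residuals = [[0.2, 0.0, -0.1], [0.4, 0.0, 0.3], [-0.1, 0.0, 0.5]]
isolated = isolate(features, residuals)
```

With these inputs the residuals span a two-dimensional subspace, and the only pair of candidate errors whose features span the same subspace is the pair affecting equations 1 and 3.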