174 resultados para Classification model stakeholders
em University of Queensland eSpace - Australia
Resumo:
Racing algorithms have recently been proposed as a general-purpose method for performing model selection in machine teaming algorithms. In this paper, we present an empirical study of the Hoeffding racing algorithm for selecting the k parameter in a simple k-nearest neighbor classifier. Fifteen widely-used classification datasets from UCI are used and experiments conducted across different confidence levels for racing. The results reveal a significant amount of sensitivity of the k-nn classifier to its model parameter value. The Hoeffding racing algorithm also varies widely in its performance, in terms of the computational savings gained over an exhaustive evaluation. While in some cases the savings gained are quite small, the racing algorithm proved to be highly robust to the possibility of erroneously eliminating the optimal models. All results were strongly dependent on the datasets used.
Resumo:
The Montreal Process indicators are intended to provide a common framework for assessing and reviewing progress toward sustainable forest management. The potential of a combined geometrical-optical/spectral mixture analysis model was assessed for mapping the Montreal Process age class and successional age indicators at a regional scale using Landsat Thematic data. The project location is an area of eucalyptus forest in Emu Creek State Forest, Southeast Queensland, Australia. A quantitative model relating the spectral reflectance of a forest to the illumination geometry, slope, and aspect of the terrain surface and the size, shape, and density, and canopy size. Inversion of this model necessitated the use of spectral mixture analysis to recover subpixel information on the fractional extent of ground scene elements (such as sunlit canopy, shaded canopy, sunlit background, and shaded background). Results obtained fron a sensitivity analysis allowed improved allocation of resources to maximize the predictive accuracy of the model. It was found that modeled estimates of crown cover projection, canopy size, and tree densities had significant agreement with field and air photo-interpreted estimates. However, the accuracy of the successional stage classification was limited. The results obtained highlight the potential for future integration of high and moderate spatial resolution-imaging sensors for monitoring forest structure and condition. (C) Elsevier Science Inc., 2000.
Resumo:
A new isotherm is proposed here for adsorption of condensable vapors and gases on nonporous materials having type II isotherms according to the Brunauer-Deming-Deming-Teller (BDDH) classification. The isotherm combines the recent molecular-continuum model in the multilayer region, with other widely used models for sub-monolayer coverage, some of which satisfy the requirement of a Henry's law asymptote. The model is successfully tested using isotherm data for nitrogen adsorption on nonporous silica, carbon and alumina, as well as benzene and hexane adsorption on nonporous carbon. Based on the data fits, out of several different alternative choices of model for the monolayer region, the Freundlich and the Unilan models are found to be the most successful when combined with the multilayer model to predict the whole isotherm. The hybrid model is consequently applicable over a wide pressure range. (C) 2000 Elsevier Science B.V. All rights reserved.
Resumo:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Resumo:
The majority of the world's population now resides in urban environments and information on the internal composition and dynamics of these environments is essential to enable preservation of certain standards of living. Remotely sensed data, especially the global coverage of moderate spatial resolution satellites such as Landsat, Indian Resource Satellite and Systeme Pour I'Observation de la Terre (SPOT), offer a highly useful data source for mapping the composition of these cities and examining their changes over time. The utility and range of applications for remotely sensed data in urban environments could be improved with a more appropriate conceptual model relating urban environments to the sampling resolutions of imaging sensors and processing routines. Hence, the aim of this work was to take the Vegetation-Impervious surface-Soil (VIS) model of urban composition and match it with the most appropriate image processing methodology to deliver information on VIS composition for urban environments. Several approaches were evaluated for mapping the urban composition of Brisbane city (south-cast Queensland, Australia) using Landsat 5 Thematic Mapper data and 1:5000 aerial photographs. The methods evaluated were: image classification; interpretation of aerial photographs; and constrained linear mixture analysis. Over 900 reference sample points on four transects were extracted from the aerial photographs and used as a basis to check output of the classification and mixture analysis. Distinctive zonations of VIS related to urban composition were found in the per-pixel classification and aggregated air-photo interpretation; however, significant spectral confusion also resulted between classes. In contrast, the VIS fraction images produced from the mixture analysis enabled distinctive densities of commercial, industrial and residential zones within the city to be clearly defined, based on their relative amount of vegetation cover. The soil fraction image served as an index for areas being (re)developed. The logical match of a low (L)-resolution, spectral mixture analysis approach with the moderate spatial resolution image data, ensured the processing model matched the spectrally heterogeneous nature of the urban environments at the scale of Landsat Thematic Mapper data.
Resumo:
Complete small subunit ribosomal RNA gene (ssrDNA) and partial (D1-D3) large subunit ribosomal RNA gene (lsrDNA) sequences were used to estimate the phylogeny of the Digenea via maximum parsimony and Bayesian inference. Here we contribute 80 new ssrDNA and 124 new lsrDNA sequences. Fully complementary data sets of the two genes were assembled from newly generated and previously published sequences and comprised 163 digenean taxa representing 77 nominal families and seven aspidogastrean outgroup taxa representing three families. Analyses were conducted on the genes independently as well as combined and separate analyses including only the higher plagiorchiidan taxa were performed using a reduced-taxon alignment including additional characters that could not be otherwise unambiguously aligned. The combined data analyses yielded the most strongly supported results and differences between the two methods of analysis were primarily in their degree of resolution. The Bayesian analysis including all taxa and characters, and incorporating a model of nucleotide substitution (general-time-reversible with among-site rate heterogeneity), was considered the best estimate of the phylogeny and was used to evaluate their classification and evolution. In broad terms, the Digenea forms a dichotomy that is split between a lineage leading to the Brachylaimoidea, Diplostomoidea and Schistosomatoidea (collectively the Diplostomida nomen novum (nom. nov.)) and the remainder of the Digenea (the Plagiorchiida), in which the Bivesiculata nom. nov. and Transversotremata nom. nov. form the two most basal lineages, followed by the Hemiurata. The remainder of the Plagiorchiida forms a large number of independent lineages leading to the crown clade Xiphidiata nom. nov. that comprises the Allocreadioidea, Gorgoderoidea, Microphalloidea and Plagiorchioidea, which are united by the presence of a penetrating stylet in their cercariae. Although a majority of families and to a lesser degree, superfamilies are supported as currently defined, the traditional divisions of the Echinostomida, Plagiorchiida and Strigeida were found to comprise non-natural assemblages. Therefore, the membership of established higher taxa are emended, new taxa erected and a revised, phylogenetically based classification proposed and discussed in light of ontogeny, morphology and taxonomic history. (C) 2003 Australian Society for Parasitology Inc. Published by Elsevier Science Ltd. All rights reserved.
Resumo:
An energy-based swing hammer mill model has been developed for coke oven feed preparation. it comprises a mechanistic power model to determine the dynamic internal recirculation and a perfect mixing mill model with a dual-classification function to mimic the operations of crusher and screen. The model parameters were calibrated using a pilot-scale swing hammer mill at various operating conditions. The effects of the underscreen configurations and the feed sizes on hammer mill operations were demonstrated through the fitted model parameters. Relationships between the model parameters and the machine configurations were established. The model was validated using the independent experimental data of single lithotype coal tests with the same BJD pilot-scale hammer mill and full operation audit data of an industrial hammer mill. The outcome of the energy-based swing hammer mill model is the capability to simulate the impact of changing blends of coal or mill configurations and operating conditions on product size distribution. Alternatively, the model can be used to select the machine settings required to achieve a desired product. (C) 2003 Elsevier Science B.V. All rights reserved.
Resumo:
Most of the modem developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitutes the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived-namely, prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, (although the methods are not restricted to binary response data).
Resumo:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exists some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonic increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Resumo:
Chemical engineers are turning to multiscale modelling to extend traditional modelling approaches into new application areas and to achieve higher levels of detail and accuracy. There is, however, little advice available on the best strategy to use in constructing a multiscale model. This paper presents a starting point for the systematic analysis of multiscale models by defining several integrating frameworks for linking models at different scales. It briefly explores how the nature of the information flow between the models at the different scales is influenced by the choice of framework, and presents some restrictions on model-framework compatibility. The concepts are illustrated with reference to the modelling of a catalytic packed bed reactor. (C) 2004 Elsevier Ltd. All rights reserved.
Resumo:
Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plants native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of an species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets. The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (Area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (Area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although requires many fewer questions to be answered.
Resumo:
We study a model for a two-mode atomic-molecular Bose-Einstein condensate. Starting with a classical analysis we determine the phase space fixed points of the system. It is found that bifurcations of the fixed points naturally separate the coupling parameter space into four regions. The different regions give rise to qualitatively different dynamics. We then show that this classification holds true for the quantum dynamics.