915 resultados para Model Classification
Resumo:
his paper discusses a process to graphically view and analyze information obtained from a network of urban streets, using an algorithm that establishes a ranking of importance of the nodes of the network itself. The basis of this process is to quantify the network information obtained by assigning numerical values to each node, representing numerically the information. These values are used to construct a data matrix that allows us to apply a classification algorithm of nodes in a network in order of importance. From this numerical ranking of the nodes, the process finish with the graphical visualization of the network. An example is shown to illustrate the whole process.
Resumo:
The Maser thesis is devoted to developing a model to technical state of gas turbine engine estimation. The approaches to preparation data, especially to handle unbalanced data were presented in the thesis. In order to efficient estimation of model performance, the special metric was chosen. Goal of the master thesis is analyzing of monitoring parameters data and developing a model of technical state of GTE estimation based on the data.
Resumo:
Questions of handling unbalanced data considered in this article. As models for classification, PNN and MLP are used. Problem of estimation of model performance in case of unbalanced training set is solved. Several methods (clustering approach and boosting approach) considered as useful to deal with the problem of input data.
Resumo:
Petrographical and mineral chemistry data are described for the mist representative basement lithologies occurring as clasts (pebble grain-size class) from the CRP-1 drillhole. Most pebbles consits of either undeformed or foliated biotite with or without hornblende monzogranites. Other rock types include biotite with or without garnet syenogranitr, biotite-hornblende granodiorite, tonalite, monzogranitic porphyries, haplogranite, quartz-monzonite (restricted to the Quaternary section), Ca-silicate rocks and biotite amphibolite (restricted to the Miocene strata). The common and ubiquitous occurence of biotite with or without hornblende monzogranite pebbles, in both the Quaternary and Miocene sections, apparently mirrors the dominance of these rock types in the granitoid assemblages which are presently exposed in the upper Precambrian-lower Paleozoic basement of the south Victoria Land. The other CRP-1 pebble lithologies show petrographical features which consitently support a dominant supply from areas of the Transantarctic Mountains located to the west and south-west of the CRP-1 site, and they thus furthercorroborate a model of local provenance for the supply of basement clasts to the CRP-1 sedimentary strata.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Most of the modem developments with classification trees are aimed at improving their predictive capacity. This article considers a curiously neglected aspect of classification trees, namely the reliability of predictions that come from a given classification tree. In the sense that a node of a tree represents a point in the predictor space in the limit, the aim of this article is the development of localized assessment of the reliability of prediction rules. A classification tree may be used either to provide a probability forecast, where for each node the membership probabilities for each class constitutes the prediction, or a true classification where each new observation is predictively assigned to a unique class. Correspondingly, two types of reliability measure will be derived-namely, prediction reliability and classification reliability. We use bootstrapping methods as the main tool to construct these measures. We also provide a suite of graphical displays by which they may be easily appreciated. In addition to providing some estimate of the reliability of specific forecasts of each type, these measures can also be used to guide future data collection to improve the effectiveness of the tree model. The motivating example we give has a binary response, namely the presence or absence of a species of Eucalypt, Eucalyptus cloeziana, at a given sampling location in response to a suite of environmental covariates, (although the methods are not restricted to binary response data).
Resumo:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exists some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonic increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Resumo:
Chemical engineers are turning to multiscale modelling to extend traditional modelling approaches into new application areas and to achieve higher levels of detail and accuracy. There is, however, little advice available on the best strategy to use in constructing a multiscale model. This paper presents a starting point for the systematic analysis of multiscale models by defining several integrating frameworks for linking models at different scales. It briefly explores how the nature of the information flow between the models at the different scales is influenced by the choice of framework, and presents some restrictions on model-framework compatibility. The concepts are illustrated with reference to the modelling of a catalytic packed bed reactor. (C) 2004 Elsevier Ltd. All rights reserved.
Resumo:
Risk assessment systems for introduced species are being developed and applied globally, but methods for rigorously evaluating them are still in their infancy. We explore classification and regression tree models as an alternative to the current Australian Weed Risk Assessment system, and demonstrate how the performance of screening tests for unwanted alien species may be quantitatively compared using receiver operating characteristic (ROC) curve analysis. The optimal classification tree model for predicting weediness included just four out of a possible 44 attributes of introduced plants examined, namely: (i) intentional human dispersal of propagules; (ii) evidence of naturalization beyond native range; (iii) evidence of being a weed elsewhere; and (iv) a high level of domestication. Intentional human dispersal of propagules in combination with evidence of naturalization beyond a plants native range led to the strongest prediction of weediness. A high level of domestication in combination with no evidence of naturalization mitigated the likelihood of an introduced plant becoming a weed resulting from intentional human dispersal of propagules. Unlikely intentional human dispersal of propagules combined with no evidence of being a weed elsewhere led to the lowest predicted probability of weediness. The failure to include intrinsic plant attributes in the model suggests that either these attributes are not useful general predictors of weediness, or data and analysis were inadequate to elucidate the underlying relationship(s). This concurs with the historical pessimism that we will ever be able to accurately predict invasive plants. Given the apparent importance of propagule pressure (the number of individuals of an species released), future attempts at evaluating screening model performance for identifying unwanted plants need to account for propagule pressure when collating and/or analysing datasets. The classification tree had a cross-validated sensitivity of 93.6% and specificity of 36.7%. Based on the area under the ROC curve, the performance of the classification tree in correctly classifying plants as weeds or non-weeds was slightly inferior (Area under ROC curve = 0.83 +/- 0.021 (+/- SE)) to that of the current risk assessment system in use (Area under ROC curve = 0.89 +/- 0.018 (+/- SE)), although requires many fewer questions to be answered.
Resumo:
We study a model for a two-mode atomic-molecular Bose-Einstein condensate. Starting with a classical analysis we determine the phase space fixed points of the system. It is found that bifurcations of the fixed points naturally separate the coupling parameter space into four regions. The different regions give rise to qualitatively different dynamics. We then show that this classification holds true for the quantum dynamics.
Resumo:
Document classification is a supervised machine learning process, where predefined category labels are assigned to documents based on the hypothesis derived from training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffied: United Kingdom, pp. 154-161.] pointed out that the effectiveness of document classification system may vary in different domains. This implies that the quality of document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation methods may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrated how the fitness of models are assessed. The results of our work contribute to the researches of feature selection, dimensionality reduction and document classification.
Resumo:
Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.
Resumo:
Abstract—This paper describes an electrical model of the ventricles incorporating real geometry and motion. Cardiac geometry and motion is obtained from segmentations of multipleslice MRI time sequences. A static heart model developed previously is deformed to match the observed geometry using a novel shape registration algorithm. The resulting electrocardiograms and body surface potential maps are compared to a static simulation in the resting heart. These results demonstrate that introducing motion into the cardiac model modifies the ECG during the T wave at peak contraction of the ventricles.