19 results for CLASSIFIER
in University of Queensland eSpace - Australia
Abstract:
The indefinite determiner 'yi "one" + classifier' is the closest Chinese equivalent of an indefinite article such as English a. It serves all the functions characteristic of the representative stages of grammaticalization from a numeral to a generalized indefinite determiner, as elaborated in the literature. This paper establishes that the Chinese indefinite determiner has developed a special use with definite expressions, serving as a backgrounding device that marks entities as of low thematic importance and unlikely to receive subsequent mention in the ensuing discourse. In this special use with definite expressions, 'yi + classifier' displays striking similarities, in terms of semantic bleaching and phonological reduction, to the same determiner at the advanced stage of grammaticalization characterized by uses with generics, nonspecifics and nonreferentials. An explanation is offered in terms of an implicational relation between nonreferentiality and low thematic importance, which characterize the two uses of the indefinite determiner. Besides providing further evidence for the claim that semantically nonreferential entities and entities of low thematic importance tend to be encoded by the same linguistic devices, the findings of this paper show how an indefinite determiner can undergo a higher degree of grammaticalization than has been reported in the literature: it expands its scope to mark not only indefinite but also definite expressions as semantically nonreferential and/or thematically unimportant. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
Merkel cell carcinoma (MCC) is a rare, aggressive skin tumor that shares histopathological and genetic features with small-cell lung carcinoma (SCLC); both are of neuroendocrine origin. As with SCLC, MCC cell lines are classified into two biochemical subgroups designated 'Classic' and 'Variant'. To identify gene-expression signatures typical of these phenotypically different MCC cell-line subgroups, and to search for genes differentially expressed between MCC and SCLC, we used cDNA arrays to profile 10 MCC cell lines and four SCLC cell lines. Using significance analysis of microarrays, we defined a set of 76 differentially expressed genes that allowed unequivocal identification of the Classic and Variant MCC subgroups. We assume that the differential expression levels of some of these genes reflect, analogous to SCLC, the different biological and clinical properties of the Classic and Variant MCC phenotypes. They may therefore serve as useful prognostic markers and as potential targets for new therapeutic interventions specific to each subgroup. Moreover, our analysis identified 17 powerful classifier genes capable of discriminating MCC from SCLC. Real-time quantitative RT-PCR analysis of these genes on 26 additional MCC and SCLC samples confirmed their diagnostic classification potential, opening opportunities for new investigations into these aggressive cancers.
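Significance analysis of microarrays (SAM) scores each gene with a "relative difference" statistic, a t-like ratio whose denominator is padded by a small constant s0 so that genes with tiny variance do not dominate. The sketch below computes only that two-class score for a single gene; the permutation step SAM uses to estimate false discovery rates is omitted, s0 is a fixed assumed value rather than SAM's data-driven choice, and the expression values are made up.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.1):
    """Two-class SAM relative difference d = (mean1 - mean2) / (s + s0) for one gene.

    s is the pooled standard error; s0 is a fudge factor (assumed fixed here).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    a = (1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2)
    sum_sq = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt(a * sum_sq)
    return (m1 - m2) / (s + s0)


# Hypothetical log-expression values of one gene in Classic vs Variant cell lines
classic = [5.1, 4.8, 5.3, 5.0]
variant = [3.2, 3.5, 3.0, 3.4]
print(round(sam_d(classic, variant), 2))
```

Genes would then be ranked by |d|, and those beyond a permutation-calibrated threshold reported as differentially expressed.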
Abstract:
The Tree Augmented Naïve Bayes (TAN) classifier relaxes the sweeping independence assumptions of the Naïve Bayes approach by taking account of dependencies between attributes. It does this in a limited sense, by incorporating the conditional probability of each attribute given the class and (at most) one other attribute. Boosting has previously proven very effective in improving the performance of Naïve Bayes classifiers; in this paper, we investigate its effectiveness when applied to the TAN classifier.
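The "at most one other attribute" parent comes from a tree over the attributes. A common way to learn that tree is to weight every attribute pair by its conditional mutual information given the class, I(Xi; Xj | C), and keep a maximum-weight spanning tree. The sketch below shows only that structure-learning step on discrete data (it does not estimate the probability tables or implement the boosting studied in the paper); the toy data and function names are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) in nats for three equal-length lists of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # P(x,y,c) * log[P(x,y|c) / (P(x|c) P(y|c))], written in terms of raw counts
        total += (count / n) * math.log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total


def tan_tree(rows, labels):
    """Return {child: parent} for a maximum-weight spanning tree over the attributes,
    with edge weights I(Xi; Xj | C), rooted (arbitrarily) at attribute 0."""
    k = len(rows[0])
    cols = [[row[i] for row in rows] for i in range(k)]
    weight = {}
    for i, j in combinations(range(k), 2):
        weight[(i, j)] = weight[(j, i)] = conditional_mutual_information(cols[i], cols[j], labels)
    in_tree, parent = {0}, {}
    while len(in_tree) < k:
        # Prim's algorithm: attach the heaviest edge leaving the current tree
        i, j = max(((a, b) for a in in_tree for b in range(k) if b not in in_tree),
                   key=lambda edge: weight[edge])
        parent[j] = i
        in_tree.add(j)
    return parent


# Made-up data: attribute 1 copies attribute 0; attribute 2 is independent of both given C
rows = [(0, 0, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(tan_tree(rows, labels))
```

In the full classifier each attribute is then conditioned on the class plus its tree parent (if any), which is exactly the "(at most) one other attribute" described above.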
Abstract:
Support vector machines (SVMs) have recently emerged as a powerful technique for solving problems in pattern classification and regression. The best performance is obtained from an SVM when its parameters are optimally set. In practice, good parameter settings are usually obtained by a lengthy process of trial and error. This paper describes the use of a genetic algorithm to evolve these parameter settings for an application in mobile robotics.
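As a rough sketch of how a genetic algorithm can replace trial and error, the code below evolves two genes, log10 C and log10 gamma (an RBF-kernel SVM is assumed), using truncation selection, uniform crossover and Gaussian mutation. The fitness function is a hypothetical stand-in that simply peaks at (1, -2); in practice it would train an SVM and return its validation accuracy. The search ranges and GA settings are likewise assumptions, not the paper's.

```python
import random

# Assumed search ranges for (log10 C, log10 gamma)
BOUNDS = [(-2.0, 4.0), (-5.0, 1.0)]


def fitness(genes):
    """Hypothetical stand-in for the validation accuracy of an SVM trained with
    C = 10**log_c and gamma = 10**log_gamma; here it just peaks at (1, -2)."""
    log_c, log_gamma = genes
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


def evolve(pop_size=30, generations=60, mutation_sd=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation, clipped back into the search range
            child = [min(hi, max(lo, g + rng.gauss(0.0, mutation_sd)))
                     for g, (lo, hi) in zip(child, BOUNDS)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best_log_c, best_log_gamma = evolve()
print(f"C = {10 ** best_log_c:.3g}, gamma = {10 ** best_log_gamma:.3g}")
```

Searching in log space is a deliberate choice here, since useful values of C and gamma typically span several orders of magnitude.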
Abstract:
This paper examines the article system in interlanguage grammar, focusing on Japanese learners of English, whose native language lacks articles. It will be demonstrated that count/mass distinctions and definiteness are the crucial factors in acquiring the English article system. Although Japanese does not use articles to encode these aspects, it will be argued that they are nevertheless syntactically encoded through its classifier system. Hence, the learners' problem must lie in mapping these features onto the appropriate surface forms, as the Missing Surface Inflection Hypothesis predicts (Prévost & White 2000). This suggestion will be further supported empirically by a fill-in-the-article task. It will be concluded that these Japanese learners understand the English article system fairly well, possibly owing to their native language, yet have problems realizing the relevant features (i.e. count/mass distinctions and definiteness) in the target language.
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information in large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured, and can take the form of text, categorical, or numerical values. An important characteristic of data mining is its ability to handle data that are large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining is useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied to decision-making problems, and sequential and time-series mining algorithms can be used for event prediction, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for engineering applications. Together with regression, classification is used mainly for predictive modelling. Many classification algorithms are now in practical use. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probabilistic methods such as the Bayesian classifier (Lewis, 1998); online methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten, 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
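To make the "example-based" category concrete, here is a minimal k-nearest-neighbors classifier: it stores the training examples and labels a query by majority vote among the k closest ones. Euclidean distance, k = 3 and the two-cluster toy data are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up two-class data
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
print(knn_predict(train, (1.1, 0.9)))  # a
print(knn_predict(train, (4.9, 5.0)))  # b
```

There is no training phase beyond storing the examples; all the work happens at query time, which is what distinguishes example-based methods from, say, decision trees.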
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting, and an effective method of predicting the occurrence of spikes has not yet been reported in the literature. In this paper, a data-mining-based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims to provide a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes, and a brief introduction to classification techniques is given for completeness. Two algorithms, a support vector machine and a probability classifier, are chosen as spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.
Abstract:
Modelling and simulation studies were carried out on 26 cement clinker grinding circuits, including tube mills, air separators and high-pressure grinding rolls (HPGR), in 8 plants. Results reported earlier showed that tube mills can be modelled as several mills in series, and that the internal partition in a tube mill can be modelled as a screen that must retain coarse particles in the first compartment without impeding the flow of drying air. In this work the modelling has been extended to show that the Tromp curve, which describes separator (classifier) performance, can be modelled in terms of d50(corr) (the corrected cut size), the by-pass, the fish hook, and the sharpness of the curve. In addition, the high-pressure grinding rolls model developed at the Julius Kruttschnitt Mineral Research Centre gives satisfactory predictions using a breakage function derived from impact and compressed bed tests. Simulation studies of a full plant incorporating a tube mill, HPGR and separators showed that the models could successfully predict the performance of another mill working under different conditions. The simulation capability can therefore be used for process optimization and design. (C) 2001 Elsevier Science Ltd. All rights reserved.
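A Tromp curve gives, for each particle size d, the fraction of separator feed reporting to the coarse stream. The sketch below builds one from three of the parameters named above: the corrected cut size (d50c), a sharpness parameter (alpha) and the by-pass. It assumes a commonly used exponential reduced-efficiency curve, which equals 0.5 exactly at d = d50c, and adds by-pass as a constant short-circuit to coarse. The fish-hook term is deliberately left out, so this is not the paper's full model, and the parameter values are made up.

```python
import math


def tromp_efficiency(d, d50c, alpha, bypass):
    """Fraction of feed particles of size d reporting to the coarse stream.

    Assumed model: exponential reduced efficiency curve plus constant by-pass;
    the fish-hook term is omitted. d50c is the corrected cut size (same units as d),
    alpha > 0 controls sharpness, bypass is a fraction in [0, 1].
    """
    x = d / d50c
    reduced = (math.exp(alpha * x) - 1.0) / (math.exp(alpha * x) + math.exp(alpha) - 2.0)
    return bypass + (1.0 - bypass) * reduced


# Hypothetical separator: d50c = 20 um, alpha = 3, 25% by-pass
for size_um in (0, 10, 20, 40, 80):
    print(size_um, round(tromp_efficiency(size_um, 20.0, 3.0, 0.25), 3))
```

The curve starts at the by-pass value for the finest particles, passes through by-pass + (1 - by-pass)/2 at the corrected cut size, and rises towards 1 for coarse particles; larger alpha gives a steeper, sharper separation.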
Abstract:
In this study we present a novel automated strategy for predicting infarct evolution, based on MR diffusion and perfusion images acquired in the acute stage of stroke. The validity of this methodology was tested on novel patient data, including data acquired from an independent stroke clinic. Regions of interest (ROIs) defining the initial diffusion lesion, and tissue with abnormal hemodynamic function as defined by the mean transit time (MTT) abnormality, were automatically extracted from DWI/PI maps. Quantitative measures of cerebral blood flow (CBF) and volume (CBV), along with ratio measures defined relative to the contralateral hemisphere (r(a)CBF and r(a)CBV), were calculated for the MTT ROIs. A parametric normal classifier algorithm incorporating these measures was used to predict infarct growth. The mean r(a)CBF and r(a)CBV values for eventually infarcted MTT tissue were 0.70 ± 0.19 and 1.20 ± 0.36, respectively; for recovered tissue the mean values were 0.99 ± 0.25 and 1.87 ± 0.71. There was a significant difference between these two regions for both measures (P
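A parametric normal classifier models each class with a normal (Gaussian) distribution and assigns a voxel to the class with the higher likelihood. The sketch below plugs in the r(a)CBF and r(a)CBV means and standard deviations reported above, but it additionally assumes that the two measures are independent within each class and that both classes are equally likely a priori; the study's actual classifier may use covariances, priors or other inputs not given here.

```python
import math

# Per-class (mean, sd) for (r(a)CBF, r(a)CBV), taken from the summary statistics above.
# Independence of the two measures and equal class priors are illustrative assumptions.
CLASS_STATS = {
    "infarcted": [(0.70, 0.19), (1.20, 0.36)],
    "recovered": [(0.99, 0.25), (1.87, 0.71)],
}


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd**2) at x."""
    return -math.log(sd * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sd) ** 2


def classify(ra_cbf, ra_cbv):
    """Return the class with the highest summed log-likelihood."""
    scores = {
        label: sum(normal_logpdf(v, m, s) for v, (m, s) in zip((ra_cbf, ra_cbv), stats))
        for label, stats in CLASS_STATS.items()
    }
    return max(scores, key=scores.get)


print(classify(0.60, 1.10))  # infarcted
print(classify(1.00, 2.00))  # recovered
```

Working with summed log-densities rather than multiplied densities avoids underflow when more measures are added.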
Abstract:
We have used microarray gene expression profiling and machine learning to predict the presence of BRAF mutations in a panel of 61 melanoma cell lines. The BRAF gene was found to be mutated in 42 samples (69%), and intragenic mutations of the NRAS gene were detected in seven samples (11%). No cell line carried mutations in both genes. Using support vector machines, we built a classifier that differentiates between melanoma cell lines based on BRAF mutation status. As few as 83 genes are able to discriminate between BRAF-mutant and BRAF wild-type samples, with clear separation observed using hierarchical clustering. Multidimensional scaling was used to visualize the relationship between a BRAF mutation signature and that of generalized mitogen-activated protein kinase (MAPK) activation (either BRAF or NRAS mutation) in the context of the discriminating gene list. We observed that samples carrying NRAS mutations lie somewhere between those with and without BRAF mutations. These observations suggest that, in addition to a common MAPK activation signal, there are gene-specific mutation signals resulting from the pleiotropic effects of either BRAF or NRAS on other signaling pathways, leading to measurably different transcriptional changes.
Abstract:
Selecting machine learning techniques requires a certain sensitivity to the requirements of the problem. In particular, a problem can be made more tractable by deliberately using algorithms that are biased toward solutions of the requisite kind. In this paper, we argue that recurrent neural networks have a natural bias toward a problem domain of which biological sequence analysis tasks are a subset. We use experiments with synthetic data to illustrate this bias. We then demonstrate that this bias can be exploited, using a data set of protein sequences containing several classes of subcellular localization targeting peptides. The results show that recurrent neural networks generally perform better than feed-forward networks on sequence analysis tasks. Furthermore, as the patterns within the sequence become more ambiguous, the choice of specific recurrent architecture becomes more critical.
Abstract:
Simplicity of design and minimal floor space requirements make the hydrocyclone the preferred classifier in mineral processing plants. Empirical models have been developed for design and process optimisation, but because of the complexity of the flow behaviour in the hydrocyclone, these do not provide information on the internal separation mechanisms. To study the interaction of design variables, the flow behaviour needs to be considered, especially when modelling the new three-product cyclone. Computational fluid dynamics (CFD) was used to model the three-product cyclone, in particular the influence of the dual vortex finder arrangement on flow behaviour. Experimental work on UG2 platinum ore showed significant differences in the classification performance of the three-product cyclone with variations in the inner vortex finder length, so simulations were performed for a range of inner vortex finder lengths. Simulations were also conducted on a conventional hydrocyclone of the same size to enable a direct comparison of the flow behaviour between the two cyclone designs. Significantly, high velocities were observed for the three-product cyclone with an inner vortex finder extending deep into the conical section of the cyclone. CFD studies revealed that the three-product cyclone has a cylindrical air core, similar to that of conventional hydrocyclones. A constant-diameter air core was observed throughout the inner vortex finder length, while no air core was present in the annulus. (c) 2006 Elsevier Ltd. All rights reserved.
Abstract:
In this article, we propose a framework, Prediction-Learning-Distillation (PLD), for interactive document classification and for distilling misclassified documents. Whenever a user points out misclassified documents, PLD learns from the mistakes and identifies the same mistakes among all other classified documents. PLD then enforces this learning in future classifications. If the classifier fails to accept relevant documents or to reject irrelevant documents in certain categories, PLD assigns those documents as new positive or negative training instances. The classifier can then correct its weaknesses by learning from these new training instances. Our experimental results demonstrate that the proposed algorithm can learn from user-identified misclassified documents and then successfully distil the rest.
Abstract:
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins from their amino acid sequence. The work extends our previous predictor, PProwler v1.1, which is itself built on the SignalP and TargetP series of predictors. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using support vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) achieves Matthews correlation coefficient (MCC) values of 0.873 for non-plant and 0.849 for plant proteins. It improves on the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data, which represents a 7.5% improvement over TargetP.
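Assuming the MCC figures quoted for PProwler are Matthews correlation coefficients, as is usual in protein-localization benchmarks, each value summarizes a binary confusion matrix: +1 for perfect prediction, 0 for chance level, -1 for total disagreement. The sketch below computes it from counts; how the per-class values are combined into the single non-plant and plant figures is not stated in the abstract, and the example counts are made up.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical one-vs-rest counts for a single localization class
print(round(matthews_corrcoef(tp=90, tn=80, fp=10, fn=20), 4))  # 0.7035
```

Unlike plain accuracy, MCC stays informative when class sizes are very unbalanced, which is typical when one localization class is scored against all the others.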