18 results for Supervised classifier
in University of Queensland eSpace - Australia
Abstract:
Data mining is the process of identifying valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, semi-structured, or unstructured. Data can be text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal with data that is large in volume, distributed, time-variant, noisy, and high-dimensional. A large number of data mining algorithms have been developed for different applications. For example, association rule mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in data mining, particularly for data mining applications in engineering fields. Together with regression, classification is mainly used for predictive modelling. So far, a number of classification algorithms have been used in practice. According to Sebastiani (2002), the main classification algorithms can be categorized as: decision tree and rule-based approaches such as C4.5 (Quinlan, 1996); probability methods such as the Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001); neural network methods (Rumelhart, Hinton & Williams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973); and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al., 1998) and Ensemble Classification (Tumer, 1996).
Abstract:
The indefinite determiner yi 'one' + classifier is the closest Chinese equivalent of an indefinite article such as the English a. It serves all the functions characteristic of the representative stages of grammaticalization from a numeral to a generalized indefinite determiner, as elaborated in the literature. This paper establishes that the Chinese indefinite determiner has developed a special use with definite expressions, serving as a backgrounding device that marks entities as being of low thematic importance and unlikely to receive subsequent mention in the ensuing discourse. In this special use with definite expressions, 'yi + classifier' displays striking similarities, in terms of semantic bleaching and phonological reduction, with the same determiner at the advanced stage of grammaticalization characterized by uses with generics, nonspecifics and nonreferentials. An explanation is offered in terms of an implicational relation between nonreferentiality and low thematic importance, which characterize the two uses of the indefinite determiner. While providing another piece of evidence in support of the claim that semantically nonreferential entities and entities of low thematic importance tend to be encoded by the same linguistic devices, the findings in this paper show how an indefinite determiner can undergo a higher degree of grammaticalization than has been reported in the literature: it expands its scope to mark not only indefinite but also definite expressions as semantically nonreferential and/or thematically unimportant. (C) 2003 Elsevier B.V. All rights reserved.
Abstract:
Merkel cell carcinoma (MCC) is a rare, aggressive skin tumor that shares histopathological and genetic features with small-cell lung carcinoma (SCLC); both are of neuroendocrine origin. Comparable to SCLC, MCC cell lines are classified into two different biochemical subgroups designated as 'Classic' and 'Variant'. With the aim of identifying typical gene-expression signatures associated with these phenotypically different MCC cell line subgroups and of searching for differentially expressed genes between MCC and SCLC, we used cDNA arrays to profile 10 MCC cell lines and four SCLC cell lines. Using significance analysis of microarrays, we defined a set of 76 differentially expressed genes that allowed unequivocal identification of Classic and Variant MCC subgroups. We assume that the differential expression levels of some of these genes reflect, analogous to SCLC, the different biological and clinical properties of Classic and Variant MCC phenotypes. Therefore, they may serve as useful prognostic markers and potential targets for the development of new therapeutic interventions specific for each subgroup. Moreover, our analysis identified 17 powerful classifier genes capable of discriminating MCC from SCLC. Real-time quantitative RT-PCR analysis of these genes on 26 additional MCC and SCLC samples confirmed their diagnostic classification potential, opening opportunities for new investigations into these aggressive cancers.
Abstract:
Introduction: Walking programmes are recommended as part of the initial treatment for intermittent claudication (IC). However, for many patients, factors such as frailty, the severe leg discomfort associated with walking, and safety concerns about exercising in public areas reduce compliance with such prescriptions. Thus, there is a need to identify a mode of exercise that provides the same benefits as regular walking while also offering convenience and comfort for these patients. The present study aims to provide evidence for the first time of the efficacy of a supervised cycle training programme compared with a conventional walking programme for the treatment of IC. Methods: Thus far 33 patients have been randomized to: a treadmill-training group (n = 12); a cycle-training group (n = 11); or a control group (n = 10). Training groups participated in three sessions of supervised training per week for a period of 6 weeks. Control patients received no experimental intervention. Maximal incremental treadmill testing was performed at baseline and after the 6 weeks of training. Measures included pain-free (PFWT) and maximal walking time (MWT), continuous heart rate and gas-analysis recording, and ankle-brachial index assessment. Results: In the treadmill-trained group MWT increased significantly from 1016.7 ± 523.7 to 1255.2 ± 432.2 s (P < 0.05). MWT tended to increase with cycle training (848.72 ± 333.18 to 939.54 ± 350.35 s, P = 0.14), and remained unchanged in the control group (1555.1 ± 683.23 to 1534.7 ± 689.87 s). For PFWT, there was a non-significant increase in the treadmill-training group from 414.4 ± 262.3 to 592.9 ± 381.9 s, while both the cycle-training and control groups displayed no significant change in this time (226.7 ± 147.1 s to 192.3 ± 56.8 s and 499.4 ± 503.7 s to 466.0 ± 526.1 s, respectively). Conclusions: These preliminary results suggest that, unlike treadmill walking, cycling has no clear effect on walking performance in patients with IC. Thus the current recommendations promoting walking-based programmes appear appropriate. The present study was funded by the National Heart Foundation of Australia.
Abstract:
The Tree Augmented Naïve Bayes (TAN) classifier relaxes the sweeping independence assumptions of the Naïve Bayes approach by taking account of conditional probabilities. It does this in a limited sense, by incorporating the conditional probability of each attribute given the class and (at most) one other attribute. The method of boosting has previously proven very effective in improving the performance of Naïve Bayes classifiers, and in this paper we investigate its effectiveness when applied to the TAN classifier.
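The factorization described in this abstract can be written out as follows. This is a sketch of the standard TAN form, not notation taken from the paper: the symbols C (class), A_i (attributes) and \pi(i) (index of the single attribute parent of A_i, empty for the tree root) are assumptions introduced here for illustration.

```latex
% TAN joint distribution: each attribute depends on the class
% and on at most one other attribute A_{\pi(i)}.
P(C, A_1, \dots, A_n) = P(C) \prod_{i=1}^{n} P\bigl(A_i \mid C, A_{\pi(i)}\bigr)

% Naive Bayes is the special case in which no attribute has an attribute parent:
P(C, A_1, \dots, A_n) = P(C) \prod_{i=1}^{n} P(A_i \mid C)

% Classification picks the most probable class for the observed attribute values:
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P\bigl(a_i \mid c, a_{\pi(i)}\bigr)
```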
Abstract:
Support vector machines (SVMs) have recently emerged as a powerful technique for solving problems in pattern classification and regression. Best performance is obtained from the SVM when its parameters have their values optimally set. In practice, good parameter settings are usually obtained by a lengthy process of trial and error. This paper describes the use of a genetic algorithm to evolve these parameter settings for an application in mobile robotics.
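The general idea of evolving parameter settings can be sketched as below. This is a minimal illustration under stated assumptions, not the method of the paper: the two genes (`log_c`, `log_gamma`), the population size, the crossover and mutation scheme, and above all the `fitness` function are hypothetical. In a real setting `fitness` would train an SVM with the candidate parameters and return a validation score; here a smooth toy function stands in for it so the example runs with the standard library alone.

```python
import random


def fitness(log_c, log_gamma):
    # Hypothetical stand-in for a cross-validated SVM score.
    # It peaks (value 1.0) at log_c = 2, log_gamma = -3; these numbers are arbitrary.
    return 1.0 / (1.0 + (log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2)


def evolve(pop_size=20, generations=30, seed=0):
    """Evolve (log_c, log_gamma) pairs with elitism, crossover and Gaussian mutation."""
    rng = random.Random(seed)
    # Random initial population of candidate parameter settings.
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(*p), reverse=True)
        history.append(fitness(*pop[0]))
        # Elitism: the better half survives unchanged.
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # One-point crossover: first gene from one parent, second from the other.
            child = (a[0], b[1])
            # Gaussian mutation on both genes.
            child = tuple(g + rng.gauss(0, 0.3) for g in child)
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda p: fitness(*p))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best (log_c, log_gamma):", best)
    print("best fitness:", fitness(*best))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; searching in log space is a common choice for SVM parameters that span several orders of magnitude.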
Abstract:
This paper examines the article system in interlanguage grammar, focusing on Japanese learners of English, whose native language lacks articles. It will be demonstrated that for the acquisition of the English article system, count/mass distinctions and definiteness are the crucial factors. Although Japanese does not employ an article system to encode these aspects, it will be argued that they are nevertheless syntactically encoded through its classifier system. Hence, the problem for these learners must be to map these features onto the appropriate surface forms, as the Missing Surface Inflection Hypothesis predicts (Prévost & White 2000). This suggestion will be further supported empirically by a fill-in-the-article task. It will be concluded that these Japanese learners understand the English article system fairly well, possibly due to their native language, yet have problems with realizing the relevant features (i.e. count/mass distinctions and definiteness) in the target language.
Abstract:
There are many techniques for electricity market price forecasting. However, most of them are designed for expected price analysis rather than price spike forecasting. An effective method of predicting the occurrence of spikes has not yet been reported in the literature. In this paper, a data mining based approach is presented to give a reliable forecast of the occurrence of price spikes. Combined with the spike value prediction techniques developed by the same authors, the proposed approach aims at providing a comprehensive tool for price spike forecasting. Feature selection techniques are first described to identify the attributes relevant to the occurrence of spikes. A simple introduction to the classification techniques is given for completeness. Two algorithms, a support vector machine and a probability classifier, are chosen as the spike occurrence predictors and are discussed in detail. Realistic market data are used to test the proposed model, with promising results.
Abstract:
Introduction: This paper reviews studies of physical activity interventions in health care settings to determine effects on physical activity and/or fitness and characteristics of successful interventions. Methods: Studies testing interventions to promote physical activity in health care settings for primary prevention (patients without disease) and secondary prevention (patients with cardiovascular disease [CVD]) were identified by computerized search methods and reference lists of reviews and articles. Inclusion criteria included assignment to intervention and control groups, physical activity or cardiorespiratory fitness outcome measures, and, for the secondary prevention studies, measurement 12 or more months after randomization. The number of studies with statistically significant effects was determined overall as well as for studies testing interventions with various characteristics. Results: Twelve studies of primary prevention were identified, seven of which were randomized. Three of four randomized studies with short-term measurement (4 weeks to 3 months after randomization), and two of five randomized studies with long-term measurement (6 months after randomization) achieved significant effects on physical activity. Twenty-four randomized studies of CVD secondary prevention were identified; 13 achieved significant effects on activity and/or fitness at twelve or more months. Studies with measurement at two time points showed decaying effects over time, particularly if the intervention was discontinued. Successful interventions contained multiple contacts, behavioral approaches, supervised exercise, provision of equipment, and/or continuing intervention. Many studies had methodologic problems such as low follow-up rates. Conclusion: Interventions in health care settings can increase physical activity for both primary and secondary prevention. Long-term effects are more likely with continuing intervention and multiple intervention components such as supervised exercise, provision of equipment, and behavioral approaches. Recommendations for additional research are given.
Abstract:
SETTING: Hlabisa Tuberculosis Programme, Hlabisa, South Africa. OBJECTIVE: To determine trends in and risk factors for interruption of tuberculosis treatment. METHODS: Data were extracted from the control programme database starting in 1991. Temporal trends in treatment interruption are described; independent risk factors for treatment interruption were determined with a multiple logistic regression model, and Kaplan-Meier survival curves for treatment interruption were constructed for patients treated in 1994-1995. RESULTS: Overall 629 of 3610 surviving patients (17%) failed to complete treatment; this proportion increased from 11% (n = 79) in 1991/1992 to 22% (n = 201) in 1996. Independent risk factors for treatment interruption were diagnosis in 1994-1996 compared with 1991-1993 (odds ratio [OR] 1.9, 95% confidence interval [CI] 1.6-2.4); human immunodeficiency virus (HIV) positivity compared with HIV negativity (OR 1.8, 95% CI 1.4-2.4); supervision by a village clinic compared with a community health worker (OR 1.9, 95% CI 1.4-2.6); and male versus female sex (OR 1.3, 95% CI 1.1-1.6). Few patients interrupted treatment during the first 2 weeks, and the treatment interruption rate thereafter was constant at 1% per 14 days. CONCLUSIONS: The frequency of treatment interruption in this programme has increased recently. The strongest risk factor was year of diagnosis, perhaps reflecting the impact of an increased caseload on programme performance. Ensuring adherence to therapy in communities with a high level of migration remains a challenge even within community-based directly observed therapy programmes.