19 resultados para Machine learning experiments
Resumo:
Document classification is a supervised machine learning process, where predefined category labels are assigned to documents based on the hypothesis derived from training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th annual international conference on Research and Development in Information Retrieval, ACM Press, Sheffied: United Kingdom, pp. 154-161.] pointed out that the effectiveness of document classification system may vary in different domains. This implies that the quality of document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation methods may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while still competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrated how the fitness of models are assessed. The results of our work contribute to the researches of feature selection, dimensionality reduction and document classification.
Resumo:
This paper presents a composite multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. The work is an extension of our previous predictor PProwler v1.1 which is itself built upon the series of predictors SignalP and TargetP. In this study we outline experiments conducted to improve the classifier design. The major improvement came from using Support Vector machines as a "smart gate" sorting the outputs of several different targeting peptide detection networks. Our final model (PProwler v1.2) gives MCC values of 0.873 for non-plant and 0.849 for plant proteins. The model improves upon the accuracy of our previous subcellular localization predictor (PProwler v1.1) by 2% for plant data (which represents 7.5% improvement upon TargetP).
Resumo:
This paper presents a new approach to improving the effectiveness of autonomous systems that deal with dynamic environments. The basis of the approach is to find repeating patterns of behavior in the dynamic elements of the system, and then to use predictions of the repeating elements to better plan goal directed behavior. It is a layered approach involving classifying, modeling, predicting and exploiting. Classifying involves using observations to place the moving elements into previously defined classes. Modeling involves recording features of the behavior on a coarse grained grid. Exploitation is achieved by integrating predictions from the model into the behavior selection module to improve the utility of the robot's actions. This is in contrast to typical approaches that use the model to select between different strategies or plays. Three methods of adaptation to the dynamic features of the environment are explored. The effectiveness of each method is determined using statistical tests over a number of repeated experiments. The work is presented in the context of predicting opponent behavior in the highly dynamic and multi-agent robot soccer domain (RoboCup)