720 resultados para Machine Learning. Semissupervised learning. Multi-label classification. Reliability Parameter


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, an improved stochastic discrimination (SD) is introduced to reduce the error rate of the standard SD in the context of multi-class classification problem. The learning procedure of the improved SD consists of two stages. In the first stage, a standard SD, but with shorter learning period is carried out to identify an important space where all the misclassified samples are located. In the second stage, the standard SD is modified by (i) restricting sampling in the important space; and (ii) introducing a new discriminant function for samples in the important space. It is shown by mathematical derivation that the new discriminant function has the same mean, but smaller variance than that of standard SD for samples in the important space. It is also analyzed that the smaller the variance of the discriminant function, the lower the error rate of the classifier. Consequently, the proposed improved SD improves standard SD by its capability of achieving higher classification accuracy. Illustrative examples axe provided to demonstrate the effectiveness of the proposed improved SD.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Species` potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species` potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In a global economy, manufacturers mainly compete with cost efficiency of production, as the price of raw materials are similar worldwide. Heavy industry has two big issues to deal with. On the one hand there is lots of data which needs to be analyzed in an effective manner, and on the other hand making big improvements via investments in cooperate structure or new machinery is neither economically nor physically viable. Machine learning offers a promising way for manufacturers to address both these problems as they are in an excellent position to employ learning techniques with their massive resource of historical production data. However, choosing modelling a strategy in this setting is far from trivial and this is the objective of this article. The article investigates characteristics of the most popular classifiers used in industry today. Support Vector Machines, Multilayer Perceptron, Decision Trees, Random Forests, and the meta-algorithms Bagging and Boosting are mainly investigated in this work. Lessons from real-world implementations of these learners are also provided together with future directions when different learners are expected to perform well. The importance of feature selection and relevant selection methods in an industrial setting are further investigated. Performance metrics have also been discussed for the sake of completion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Developing successful navigation and mapping strategies is an essential part of autonomous robot research. However, hardware limitations often make for inaccurate systems. This project serves to investigate efficient alternatives to mapping an environment, by first creating a mobile robot, and then applying machine learning to the robot and controlling systems to increase the robustness of the robot system. My mapping system consists of a semi-autonomous robot drone in communication with a stationary Linux computer system. There are learning systems running on both the robot and the more powerful Linux system. The first stage of this project was devoted to designing and building an inexpensive robot. Utilizing my prior experience from independent studies in robotics, I designed a small mobile robot that was well suited for simple navigation and mapping research. When the major components of the robot base were designed, I began to implement my design. This involved physically constructing the base of the robot, as well as researching and acquiring components such as sensors. Implementing the more complex sensors became a time-consuming task, involving much research and assistance from a variety of sources. A concurrent stage of the project involved researching and experimenting with different types of machine learning systems. I finally settled on using neural networks as the machine learning system to incorporate into my project. Neural nets can be thought of as a structure of interconnected nodes, through which information filters. The type of neural net that I chose to use is a type that requires a known set of data that serves to train the net to produce the desired output. Neural nets are particularly well suited for use with robotic systems as they can handle cases that lie at the extreme edges of the training set, such as may be produced by "noisy" sensor data. Through experimenting with available neural net code, I became familiar with the code and its function, and modified it to be more generic and reusable for multiple applications of neural nets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study presents an approach to combine uncertainties of the hydrological model outputs predicted from a number of machine learning models. The machine learning based uncertainty prediction approach is very useful for estimation of hydrological models' uncertainty in particular hydro-metrological situation in real-time application [1]. In this approach the hydrological model realizations from Monte Carlo simulations are used to build different machine learning uncertainty models to predict uncertainty (quantiles of pdf) of the a deterministic output from hydrological model . Uncertainty models are trained using antecedent precipitation and streamflows as inputs. The trained models are then employed to predict the model output uncertainty which is specific for the new input data. We used three machine learning models namely artificial neural networks, model tree, locally weighted regression to predict output uncertainties. These three models produce similar verification results, which can be improved by merging their outputs dynamically. We propose an approach to form a committee of the three models to combine their outputs. The approach is applied to estimate uncertainty of streamflows simulation from a conceptual hydrological model in the Brue catchment in UK and the Bagmati catchment in Nepal. The verification results show that merged output is better than an individual model output. [1] D. L. Shrestha, N. Kayastha, and D. P. Solomatine, and R. Price. Encapsulation of parameteric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatic, in press, 2013.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although some individual techniques of supervised Machine Learning (ML), also known as classifiers, or algorithms of classification, to supply solutions that, most of the time, are considered efficient, have experimental results gotten with the use of large sets of pattern and/or that they have a expressive amount of irrelevant data or incomplete characteristic, that show a decrease in the efficiency of the precision of these techniques. In other words, such techniques can t do an recognition of patterns of an efficient form in complex problems. With the intention to get better performance and efficiency of these ML techniques, were thought about the idea to using some types of LM algorithms work jointly, thus origin to the term Multi-Classifier System (MCS). The MCS s presents, as component, different of LM algorithms, called of base classifiers, and realized a combination of results gotten for these algorithms to reach the final result. So that the MCS has a better performance that the base classifiers, the results gotten for each base classifier must present an certain diversity, in other words, a difference between the results gotten for each classifier that compose the system. It can be said that it does not make signification to have MCS s whose base classifiers have identical answers to the sames patterns. Although the MCS s present better results that the individually systems, has always the search to improve the results gotten for this type of system. Aim at this improvement and a better consistency in the results, as well as a larger diversity of the classifiers of a MCS, comes being recently searched methodologies that present as characteristic the use of weights, or confidence values. These weights can describe the importance that certain classifier supplied when associating with each pattern to a determined class. These weights still are used, in associate with the exits of the classifiers, during the process of recognition (use) of the MCS s. Exist different ways of calculating these weights and can be divided in two categories: the static weights and the dynamic weights. The first category of weights is characterizes for not having the modification of its values during the classification process, different it occurs with the second category, where the values suffers modifications during the classification process. In this work an analysis will be made to verify if the use of the weights, statics as much as dynamics, they can increase the perfomance of the MCS s in comparison with the individually systems. Moreover, will be made an analysis in the diversity gotten for the MCS s, for this mode verify if it has some relation between the use of the weights in the MCS s with different levels of diversity

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products.Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was than used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively.Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The presence of precipitates in metallic materials affects its durability, resistance and mechanical properties. Hence, its automatic identification by image processing and machine learning techniques may lead to reliable and efficient assessments on the materials. In this paper, we introduce four widely used supervised pattern recognition techniques to accomplish metallic precipitates segmentation in scanning electron microscope images from dissimilar welding on a Hastelloy C-276 alloy: Support Vector Machines, Optimum-Path Forest, Self Organizing Maps and a Bayesian classifier. Experimental results demonstrated that all classifiers achieved similar recognition rates with good results validated by an expert in metallographic image analysis. © 2011 Springer-Verlag Berlin Heidelberg.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Wireless Sensor Networks (WSNs) can be used to monitor hazardous and inaccessible areas. In these situations, the power supply (e.g. battery) of each node cannot be easily replaced. One solution to deal with the limited capacity of current power supplies is to deploy a large number of sensor nodes, since the lifetime and dependability of the network will increase through cooperation among nodes. Applications on WSN may also have other concerns, such as meeting temporal deadlines on message transmissions and maximizing the quality of information. Data fusion is a well-known technique that can be useful for the enhancement of data quality and for the maximization of WSN lifetime. In this paper, we propose an approach that allows the implementation of parallel data fusion techniques in IEEE 802.15.4 networks. One of the main advantages of the proposed approach is that it enables a trade-off between different user-defined metrics through the use of a genetic machine learning algorithm. Simulations and field experiments performed in different communication scenarios highlight significant improvements when compared with, for instance, the Gur Game approach or the implementation of conventional periodic communication techniques over IEEE 802.15.4 networks. © 2013 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Active machine learning algorithms are used when large numbers of unlabeled examples are available and getting labels for them is costly (e.g. requiring consulting a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refining of the decision boundary by dynamically adjusting the probability to explore at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.