778 results for Machine Learning. Semi-supervised learning. Multi-label classification. Reliability Parameter


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The main objective of this PhD thesis is to examine in depth the analysis and design of an intelligent system for predicting and controlling surface roughness in the high-speed end-milling process, based fundamentally on Bayesian network classifiers, with the aim of developing a methodology that makes this type of system easier to design. The system, whose purpose is to enable surface roughness prediction and control, consists of a model learnt from experimental data with Bayesian networks, which will help to understand the dynamic processes involved in machining and the interactions among the relevant variables. Since artificial neural networks are widely used models in material cutting processes, we also include an end-milling model based on them, in which the geometry and hardness of the workpiece are introduced as novel variables not studied so far in this context. Thus, an important contribution of this thesis is these two models for surface roughness prediction, which are compared with respect to different aspects: the influence of the new variables, performance evaluation metrics, and interpretability. One of the main problems with Bayesian classifier-based modelling is understanding the enormous posterior probability tables produced.
We introduce an explanation method that generates a set of rules obtained from decision trees. These trees are induced from a simulated data set generated from the posterior probabilities of the class variable, calculated with the Bayesian network learned from a training data set. Finally, we contribute to the multi-objective field in the case where some of the objectives cannot be quantified as real numbers but only as interval-valued functions. This often occurs in machine learning applications, especially those based on supervised classification. Specifically, the ideas of dominance and the Pareto front are extended to this setting. Its application to surface roughness prediction studies the case of simultaneously maximizing the sensitivity and specificity of the induced Bayesian network classifier, rather than only maximizing the correct classification rate. The intervals for these two objectives come from an honest estimation method for both objectives, such as k-fold cross-validation or the bootstrap.
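
The abstract does not say which ordering on intervals the thesis uses, so the following is only a minimal sketch of the general idea of Pareto dominance over interval-valued objectives. It assumes, purely for illustration, that one interval is "at least as good" as another when both its lower and upper bounds are no smaller; the names `interval_geq`, `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]          # (lower bound, upper bound)
Candidate = Tuple[Interval, ...]        # one interval per objective, all maximized


def interval_geq(a: Interval, b: Interval) -> bool:
    """Assumed interval order: a is at least as good as b when both of its
    bounds are no smaller. Other orders are possible; this is an illustration."""
    return a[0] >= b[0] and a[1] >= b[1]


def dominates(p: Candidate, q: Candidate) -> bool:
    """p dominates q if it is at least as good on every objective
    and differs from q on at least one of them."""
    at_least_as_good = all(interval_geq(a, b) for a, b in zip(p, q))
    strictly_different = any(a != b for a, b in zip(p, q))
    return at_least_as_good and strictly_different


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        p for p in candidates
        if not any(dominates(q, p) for q in candidates if q is not p)
    ]


# Each candidate classifier: (sensitivity interval, specificity interval),
# e.g. as estimated by k-fold cross-validation. Values are made up.
A = ((0.80, 0.90), (0.70, 0.80))
B = ((0.75, 0.85), (0.65, 0.75))
C = ((0.60, 0.70), (0.85, 0.95))
front = pareto_front([A, B, C])
```

With these made-up values, B is dominated by A, while A and C are incomparable because each is better on one objective, so both remain on the front.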

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An automatic machine learning strategy for computing the 3D structure of monocular images from a single image query using Local Binary Patterns is presented. The 3D structure is inferred from a training set composed of a repository of color and depth images, assuming that images with similar structure have similar depth maps. Local Binary Patterns are used to characterize the structure of the color images. The depth maps of those color images whose structure is similar to that of the query image are adaptively combined and filtered to estimate the final depth map. On public databases, promising results have been obtained, outperforming other state-of-the-art algorithms at a computational cost similar to that of the most efficient 2D-to-3D algorithms.
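
For readers unfamiliar with Local Binary Patterns, the basic 3x3 operator can be sketched as follows. This is the textbook form of LBP, not necessarily the variant used in the paper; the neighbour ordering and the function names `lbp_code` and `lbp_histogram` are illustrative choices.

```python
from typing import List

# Neighbour positions in a 3x3 patch, clockwise from the top-left corner.
# The ordering is a convention chosen here; any fixed ordering works.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch: List[List[int]]) -> int:
    """8-bit LBP code of a 3x3 grayscale patch: each neighbour that is
    greater than or equal to the centre pixel sets one bit."""
    center = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOURS):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels of an image.
    Such a histogram is a common descriptor of image structure."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


example = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
code = lbp_code(example)      # neighbours 6, 9, 8, 7 are >= 5
hist = lbp_histogram(example)  # a 3x3 image has a single interior pixel
```

Images with similar histograms can then be treated as structurally similar, which is the role the abstract assigns to LBP when selecting depth maps from the training repository.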

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Pattern classification and machine learning are currently areas of expertise under continuous development, with extensive applications in many sectors of industry. The aim of this Final Degree Project is to study them and to implement a software system for impulsive noise classification, specifically through the development of a security system based on real-time classification of sound events. The solution covers all stages of the process, from sound capture to the labelling of the recorded events, including digital signal processing and feature extraction. The development distinguishes two main parts. The first comprises the user interface and the audio signal processing module, where monitoring and impulsive noise detection take place. The second focuses solely on classifying the detected sound events, defining a double-classifier architecture that determines whether detected events are false alarms or threats and, in the latter case, labels them with a specific type. The results have been satisfactory, with an overall reliability of around 90% despite some limitations in building the database of audio files. This shows that a security device based on the analysis of ambient noise could be included in a complete home alarm system, increasing home protection.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The emergence of new horizons in the field of travel assistant management is leading to the development of cutting-edge systems that improve on existing ones. New opportunities are also arising as systems tend to become more reliable and autonomous. In this paper, a self-learning embedded system for object identification based on adaptive-cooperative dynamic approaches is presented for intelligent sensor infrastructures. The proposed system is able to detect and identify moving objects using a dynamic decision tree. It combines machine learning algorithms and cooperative strategies in order to make the system more adaptive to changing environments. The proposed system may therefore be very useful for many applications, such as shadow tolls (since several types of vehicles can be distinguished), parking optimization systems, and systems for improving traffic conditions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the chemical textile domain, experts have to analyse chemical components and substances that might be harmful when used in clothing and textiles. Part of this analysis is performed by searching for opinions and reports that people have expressed about these products on the Social Web. However, this type of information is less frequent on the Internet for this domain than for others, so detecting and classifying it is difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough, and could lead to health problems such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation, which focuses on sentiment classification and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Internet traffic classification is a relevant and mature research field, yet one of growing importance and with still-open technical challenges, partly because of the pervasive presence of Internet-connected devices in everyday life. We claim the need for innovative traffic classification solutions that are lightweight, adopt a domain-based approach, and classify Internet traffic by subject rather than concentrating only on application-level protocol categorization. To this end, this paper proposes an original classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, and that can be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word embeddings from a mixed dataset composed of domain names and natural language corpora in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience, which demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $3860 per year. Reported experimental results for Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments show that words contained in domain names are related to the kind of traffic directed towards them; therefore, using specifically trained word embeddings, we are able to classify them into customizable categories. We also show that training word embeddings on larger natural language corpora leads to improvements in precision of up to 180%.
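
The evaluation measures named in this abstract have standard definitions. As a minimal sketch, they can be computed from the four counts of a binary confusion matrix as below. The paper classifies traffic into several categories, so the binary case is a simplifying assumption made here, the counts are made up, and the function assumes no denominator is zero.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard measures from binary confusion counts
    (true/false positives and negatives). Assumes non-zero denominators."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total

    # Cohen's kappa compares observed agreement (accuracy) with the agreement
    # expected by chance from the marginal totals of predictions and labels.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

Kappa is the only one of these measures that corrects for chance agreement, which is why it is often reported alongside accuracy when class frequencies are uneven.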

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Foreign exchange trading has emerged recently as a significant activity in many countries. As with most forms of trading, the activity is influenced by many random parameters so that the creation of a system that effectively emulates the trading process will be very helpful. A major issue for traders in the deregulated Foreign Exchange Market is when to sell and when to buy a particular currency in order to maximize profit. This paper presents novel trading strategies based on the machine learning methods of genetic algorithms and reinforcement learning.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: Inpatient length of stay (LOS) is an important measure of hospital activity, health care resource consumption, and patient acuity. This research work aims at developing an incremental expectation maximization (EM) based learning approach for a mixture of experts (ME) system for on-line prediction of LOS. The use of a batch-mode learning process in most existing artificial neural networks to predict LOS is unrealistic, as the data become available over time and their patterns change dynamically. In contrast, an on-line process is capable of providing an output whenever a new datum becomes available. This on-the-spot information is therefore more useful and practical for making decisions, especially when one deals with a tremendous amount of data. Methods and material: The proposed approach is illustrated using a real example of gastroenteritis LOS data. The data set was extracted from a retrospective cohort study on all infants born in 1995-1997 and their subsequent admissions for gastroenteritis. The total number of admissions in this data set was n = 692. Linked hospitalization records of the cohort were retrieved retrospectively to derive the outcome measure, patient demographics, and associated co-morbidity information. A comparative study of the incremental learning and the batch-mode learning algorithms is considered. The performances of the learning algorithms are compared based on the mean absolute difference (MAD) between the predictions and the actual LOS, and the proportion of predictions with MAD < 1 day (Prop(MAD < 1)). The significance of the comparison is assessed through a regression analysis. Results: The incremental learning algorithm provides better on-line prediction of LOS when the system has gained sufficient training from more examples (MAD = 1.77 days and Prop(MAD < 1) = 54.3%), compared to that using the batch-mode learning.
The regression analysis indicates a significant decrease of MAD (p-value = 0.063) and a significant (p-value = 0.044) increase of Prop(MAD
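
The two performance measures named in this abstract can be sketched as follows. The sketch reads Prop(MAD < 1) as the share of individual predictions whose absolute difference from the actual LOS is below one day, which is an interpretation of the abstract's wording; the function name `mad_and_prop` and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def mad_and_prop(predicted: Sequence[float],
                 actual: Sequence[float],
                 threshold: float = 1.0) -> Tuple[float, float]:
    """Return (mean absolute difference, proportion of predictions whose
    absolute difference is below `threshold`), both in the units of the inputs."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    mad = sum(errors) / len(errors)
    prop = sum(e < threshold for e in errors) / len(errors)
    return mad, prop


# Made-up lengths of stay in days, for illustration only.
predicted_los = [2.0, 3.5, 1.0, 6.0]
actual_los = [2.5, 3.0, 4.0, 6.0]
mad, prop = mad_and_prop(predicted_los, actual_los)
```

In this example a single large miss (3 days) raises the MAD to 1 day even though three of the four predictions fall within one day, which illustrates why the two measures are reported together.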

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Learning from mistakes has proven to be an effective way of learning in interactive document classification. In this paper we propose an approach for effectively learning from mistakes in the email filtering process. Our system employs both the SVM and Winnow machine learning algorithms to learn from misclassified email documents and refine the email filtering process accordingly. Our experiments have shown that the training of an email filter becomes much more effective and faster
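
Winnow is a natural fit for learning from mistakes because it changes its weights only when it misclassifies an example. The following is a minimal sketch of the textbook algorithm (the promote/demote variant with multiplicative updates) over binary features. It is not the paper's email filter; the class name, the threshold choice, and the tiny feature vectors are illustrative assumptions.

```python
from typing import List, Sequence


class Winnow:
    """Textbook Winnow over binary features: weights start at 1 and are
    multiplied or divided by `alpha` only when a prediction is wrong."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.weights: List[float] = [1.0] * n_features
        self.threshold = float(n_features)   # a common default choice
        self.alpha = alpha

    def predict(self, x: Sequence[int]) -> int:
        """Predict 1 if the summed weight of active features reaches the threshold."""
        score = sum(w for w, xi in zip(self.weights, x) if xi)
        return int(score >= self.threshold)

    def update(self, x: Sequence[int], label: int) -> bool:
        """Mistake-driven update. Returns True if a mistake was corrected."""
        if self.predict(x) == label:
            return False
        # Missed positive: promote active features. False alarm: demote them.
        factor = self.alpha if label == 1 else 1.0 / self.alpha
        self.weights = [w * factor if xi else w
                        for w, xi in zip(self.weights, x)]
        return True


# Hypothetical 4-word vocabulary; 1 means the word occurs in the email.
model = Winnow(n_features=4)
spam = [1, 1, 0, 0]
first_update_was_mistake = model.update(spam, label=1)   # initially misclassified
second_update_was_mistake = model.update(spam, label=1)  # now classified correctly
```

Because correctly classified messages leave the weights untouched, feeding the model only the misclassified emails that a user corrects is enough to drive learning.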

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Traditionally, machine learning algorithms have been evaluated in applications where assumptions can be reliably made about class priors and/or misclassification costs. In this paper, we consider the case of imprecise environments, where little may be known about these factors and they may well vary significantly when the system is applied. Specifically, the use of precision-recall analysis is investigated and compared to better-known performance measures such as error rate and the receiver operating characteristic (ROC). We argue that while ROC analysis is invariant to variations in class priors, this invariance in fact hides an important factor of the evaluation in imprecise environments. Therefore, we develop a generalised precision-recall analysis methodology in which variation due to prior class probabilities is incorporated into a multi-way analysis of variance (ANOVA). The increased sensitivity and reliability of this approach are demonstrated in a remote sensing application.
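
The contrast between ROC analysis and precision can be made concrete with Bayes' rule: a classifier's ROC operating point (true positive rate, false positive rate) does not depend on the class prior, but its precision does. The sketch below only illustrates that relationship and is not the paper's ANOVA-based methodology; the function name and the rates are made up.

```python
def precision_at_prior(tpr: float, fpr: float, prior: float) -> float:
    """Precision implied by a fixed ROC operating point (tpr, fpr)
    when the positive class has prior probability `prior`."""
    true_positive_mass = tpr * prior
    false_positive_mass = fpr * (1.0 - prior)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# The same operating point evaluated under two different class priors.
tpr, fpr = 0.8, 0.1
precision_balanced = precision_at_prior(tpr, fpr, prior=0.5)
precision_rare = precision_at_prior(tpr, fpr, prior=0.01)
```

With these rates, precision is about 0.89 when the classes are balanced but falls to about 0.07 when positives make up 1% of the data, even though the ROC point is unchanged.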