839 results for Real-world problem
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from the available training data, and second, classify new incoming unseen data instances using the learned classifier.
Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or it can also be categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying them. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy, as well as in the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out-of-date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware.
Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as achieving good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
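To make the drift-monitoring step concrete, the following is a minimal sketch of a Page-Hinkley detector applied to a stream of average log-likelihood scores; the `delta` and `lambda_` values and the toy scores are illustrative assumptions, not settings taken from the thesis.

```python
# Illustrative Page-Hinkley drift detector for a stream of average
# log-likelihood scores (a sustained drop in the score suggests drift).
class PageHinkley:
    def __init__(self, delta=0.005, lambda_=50.0):
        self.delta = delta        # magnitude of change to tolerate
        self.lambda_ = lambda_    # detection threshold
        self.mean = 0.0           # running mean of the monitored score
        self.n = 0                # number of observations seen
        self.cum = 0.0            # cumulative deviation statistic
        self.min_cum = 0.0        # minimum of the cumulative statistic

    def update(self, x):
        """Feed one score (e.g. per-batch average log-likelihood); True on drift."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # A sustained drop below the running mean makes `cum` grow.
        self.cum += self.mean - x - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lambda_

# Usage with a toy threshold so the short stream below triggers a detection.
detector = PageHinkley(lambda_=5.0)
for avg_loglik in [-2.1, -2.0, -2.2, -5.0, -5.1, -5.2]:  # toy scores
    if detector.update(avg_loglik):
        print("Concept drift detected; relearn or locally adapt the classifier.")
        break
```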
Abstract:
This paper surveys the context of feature extraction by neural network approaches, and compares and contrasts their behaviour as prospective data visualisation tools in a real-world problem. We also introduce and discuss a hybrid approach which allows us to control the degree of discriminatory and topographic information in the extracted feature space.
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May 2016
Abstract:
Design methods and tools are generally best learned and developed experientially [1]. Finding appropriate vehicles for delivering these to students is becoming increasingly challenging, especially when considering only those that will enthuse, intrigue and inspire. This paper traces the development of different eco-car design and build projects which competed in the Shell Eco-Marathon. The cars provided opportunities for experiential learning through a formal learning cycle of CDIO (Conceive, Design, Implement, Operate) or the more traditional understand, explore, create, validate, with both teams developing a functional finished prototype. Lessons learned were applied to the design of a third and fourth eco-car using experimental techniques with bio-composites, combining knowledge of fibre-reinforced composite materials and adhesives with the plywood construction techniques of the two teams. The paper discusses the importance of applying materials and techniques to a real-world problem. It also explores how the eco-car, by comparing traditional materials and construction techniques with high-tech composite materials, is an ideal teaching, learning and assessment vehicle for technical design techniques.
Abstract:
The aim of the paper is to analyze a practical real-world problem. A publishing house is given. The publishing firm has contacts with a number of wholesaler/retailer enterprises and direct contact with customers to satisfy the market demand. Book publishers work in a project industry. The publisher faces the problem of how to allocate the stock of a given, newly published book to the wholesalers and retailers, and how many copies to hold in order to satisfy customers directly from the publisher. The publisher has a buyback option. The distribution of the demand is unknown, but it can be estimated. The wholesalers/retailers maximize their profits. The problem can be modeled as a one-warehouse, N-retailer supply chain with non-identical demand distributions. The model can be transformed into a game-theory problem. It is assumed that the demand follows a Poisson distribution.
Abstract:
The aim of the paper is to analyze a practical real-world problem. A publishing house is given. The publishing firm has contacts with a number of wholesaler/retailer enterprises and direct contact with customers to satisfy the market demand. Book publishers work in a project industry. The publisher faces the problem of how to allocate the stock of a given, newly published book to the wholesalers and retailers, and how many copies to hold in order to satisfy customers directly from the publisher. The distribution of the demand is unknown, but it can be estimated. The costs consist of inventory holding and shortage/backorder costs. The decision maker wants to minimize these relevant costs. The problem can be modeled as a one-warehouse, N-retailer supply chain with non-identical demand distributions. The problem structure is similar to that of a newsvendor model. It is assumed that the demand follows a Poisson distribution.
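As a rough illustration of the newsvendor structure mentioned above, a single retailer's stock level under Poisson demand can be set by the critical ratio of shortage and holding costs. This is only a sketch under assumed cost parameters and demand rate; the paper's buyback and allocation details are not modelled.

```python
# Illustrative newsvendor-style stocking rule under Poisson demand.
# Cost parameters and the demand rate below are hypothetical.
from scipy.stats import poisson

def newsvendor_poisson(rate, underage_cost, overage_cost):
    """Smallest stock level Q with P(D <= Q) >= critical ratio."""
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return int(poisson.ppf(critical_ratio, rate))

# Example: estimated mean demand of 120 copies at one retailer,
# shortage (lost margin) cost 6 per copy, holding/return cost 2 per copy.
q = newsvendor_poisson(rate=120, underage_cost=6.0, overage_cost=2.0)
print("copies to allocate to this retailer:", q)
```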
Abstract:
Proceedings of the 8th International Symposium on Project Approaches in Engineering Education (PAEE), Guimarães, 2016
Abstract:
Ecological science contributes to solving a broad range of environmental problems. However, lack of ecological literacy in practice often limits application of this knowledge. In this paper, we highlight a critical but often overlooked demand on ecological literacy: to enable professionals of various careers to apply scientific knowledge when faced with environmental problems. Current university courses on ecology often fail to persuade students that ecological science provides important tools for environmental problem solving. We propose problem-based learning to improve the understanding of ecological science and its usefulness for real-world environmental issues that professionals in careers as diverse as engineering, public health, architecture, social sciences, or management will address. Courses should set clear learning objectives for cognitive skills they expect students to acquire. Thus, professionals in different fields will be enabled to improve environmental decision-making processes and to participate effectively in multidisciplinary work groups charged with tackling environmental issues.
Abstract:
One of the most difficult problems facing researchers experimenting with complex systems in real-world applications is the Facility Layout Design Problem. It concerns the design and location of production lines, machinery and equipment, inventory storage and shipping facilities. This work addresses the problem through the use of Constraint Logic Programming (CLP) technology. The use of Genetic Algorithms (GAs) as an optimisation technique within the CLP environment is also addressed. The approach aims at implementing genetic algorithm operators following the CLP paradigm.
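To illustrate how genetic algorithm operators could drive a layout search, the sketch below encodes a toy facility layout as a permutation and applies ordered crossover and swap mutation; the flow matrix and GA parameters are hypothetical, and the Constraint Logic Programming integration described above is not reproduced here.

```python
# Sketch of a permutation-encoded GA for a toy facility layout problem:
# facilities are assigned to positions on a line and we minimise flow * distance.
import random

FLOW = [[0, 3, 1, 2],
        [3, 0, 4, 1],
        [1, 4, 0, 5],
        [2, 1, 5, 0]]          # hypothetical material flow between 4 facilities

def cost(perm):
    # perm[p] = facility placed at position p; distance between positions = |p - q|
    n = len(perm)
    return sum(FLOW[perm[p]][perm[q]] * abs(p - q)
               for p in range(n) for q in range(n))

def ordered_crossover(a, b):
    # Classic OX: copy a slice from parent a, fill the rest in parent b's order.
    n = len(a)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j] = a[i:j]
    fill = [g for g in b if g not in child]
    for p in range(n):
        if child[p] is None:
            child[p] = fill.pop(0)
    return child

def mutate(perm, rate=0.2):
    # Swap two positions with a small probability.
    if random.random() < rate:
        i, j = random.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def ga(pop_size=20, generations=50):
    pop = [random.sample(range(4), 4) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[:pop_size // 2]                    # truncation selection
        children = [mutate(ordered_crossover(random.choice(parents),
                                             random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=cost)

best = ga()
print("best layout:", best, "cost:", cost(best))
```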
Abstract:
The container loading problem (CLP) is a combinatorial optimization problem for the spatial arrangement of cargo inside containers so as to maximize the usage of space. The algorithms for this problem are of limited practical applicability if real-world constraints are not considered, one of the most important of which is deemed to be stability. This paper addresses static stability, as opposed to dynamic stability, looking at the stability of the cargo during container loading. This paper proposes two algorithms. The first is a static stability algorithm based on static mechanical equilibrium conditions that can be used as a stability evaluation function embedded in CLP algorithms (e.g. constructive heuristics, metaheuristics). The second proposed algorithm is a physical packing sequence algorithm that, given a container loading arrangement, generates the actual sequence by which each box is placed inside the container, considering static stability and loading operation efficiency constraints.
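As a simplified illustration of a static-stability evaluation function, the sketch below accepts a box placement only if its centre of gravity projects onto a supporting box or the container floor; this is a crude proxy under assumed axis-aligned boxes, not the paper's full static mechanical equilibrium algorithm.

```python
# Simplified static-stability check for an axis-aligned box arrangement.
from dataclasses import dataclass

@dataclass
class Box:
    x: float; y: float; z: float       # minimum corner (z is height above the floor)
    dx: float; dy: float; dz: float    # dimensions along x, y, z

def supports(lower, upper, eps=1e-6):
    """True if `lower`'s top face touches `upper`'s bottom face and overlaps it."""
    touching = abs(lower.z + lower.dz - upper.z) < eps
    overlap_x = lower.x < upper.x + upper.dx and upper.x < lower.x + lower.dx
    overlap_y = lower.y < upper.y + upper.dy and upper.y < lower.y + lower.dy
    return touching and overlap_x and overlap_y

def is_statically_placed(box, placed, eps=1e-6):
    if box.z < eps:                                   # resting on the container floor
        return True
    cx, cy = box.x + box.dx / 2, box.y + box.dy / 2   # centre of gravity (x, y)
    for other in placed:
        if (supports(other, box)
                and other.x <= cx <= other.x + other.dx
                and other.y <= cy <= other.y + other.dy):
            return True
    return False

# Toy usage: a box centred on top of another is accepted, an overhanging one is not.
base = Box(0, 0, 0, 10, 10, 5)
print(is_statically_placed(Box(2, 2, 5, 6, 6, 5), [base]))    # True
print(is_statically_placed(Box(8, 8, 5, 10, 10, 5), [base]))  # False (CoG off the base)
```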
Abstract:
The Container Loading Problem (CLP) literature has traditionally evaluated the dynamic stability of cargo by applying two metrics to box arrangements: the mean number of boxes supporting the items excluding those placed directly on the floor (M1) and the percentage of boxes with insufficient lateral support (M2). However, these metrics, which aim to be proxies for cargo stability during transportation, fail to capture real-world dynamic stability conditions of the cargo. In this paper two new performance indicators are proposed to evaluate the dynamic stability of cargo arrangements: the number of fallen boxes (NFB) and the number of boxes within the Damage Boundary Curve fragility test (NB_DBC). Using 1500 solutions for well-known problem instances found in the literature, these new performance indicators are evaluated using a physics simulation tool (StableCargo), replacing real-world transportation by truck with a simulation of the dynamic behaviour of container loading arrangements. Two new dynamic stability metrics that can be integrated within any container loading algorithm are also proposed. The metrics are analytical models of the proposed stability performance indicators, computed by multiple linear regression. Pearson’s r correlation coefficient was used to evaluate the performance of the models. The extensive computational results show that the proposed metrics are better proxies for dynamic stability in the CLP than the previously widely used metrics.
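A minimal sketch of the kind of analytical proxy described above: a multiple linear regression fitted to predict a stability indicator (here NFB) from arrangement features, evaluated with Pearson's r. The feature columns and the data are invented for illustration and do not come from the paper.

```python
# Illustrative fit of an analytical proxy metric by multiple linear regression,
# evaluated with Pearson's r. Predictors and data are made up for demonstration.
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-solution features: [M1-style mean support, M2-style fraction of
# boxes lacking lateral support, mean box height], and the simulated NFB target.
X = np.array([[1.8, 0.10, 30.0],
              [1.2, 0.35, 42.0],
              [2.5, 0.05, 28.0],
              [1.0, 0.50, 55.0],
              [2.1, 0.12, 33.0]])
nfb = np.array([3.0, 9.0, 1.0, 14.0, 4.0])

# Ordinary least squares via a least-squares solve (with an intercept column).
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, nfb, rcond=None)
predicted_nfb = A @ coef

r, _ = pearsonr(nfb, predicted_nfb)
print("regression coefficients:", coef)
print("Pearson r between simulated and predicted NFB:", round(r, 3))
```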
Abstract:
Random problem distributions have played a key role in the study and design of algorithms for constraint satisfaction and Boolean satisfiability, as well as in our understanding of problem hardness, beyond standard worst-case complexity. We consider random problem distributions from a highly structured problem domain that generalizes the Quasigroup Completion problem (QCP) and Quasigroup with Holes (QWH), a widely used domain that captures the structure underlying a range of real-world applications. Our problem domain is also a generalization of the well-known Sudoku puzzle: we consider Sudoku instances of arbitrary order, with the additional generalization that the block regions can have rectangular shape, in addition to the standard square shape. We evaluate the computational hardness of Generalized Sudoku instances, for different parameter settings. Our experimental hardness results show that we can generate instances that are considerably harder than QCP/QWH instances of the same size. More interestingly, we show the impact of different balancing strategies on problem hardness. We also provide insights into backbone variables in Generalized Sudoku instances and how they correlate to problem hardness.
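The following sketch shows the constraint structure of a Generalized Sudoku instance with rectangular r x c blocks, checking that no filled symbol repeats in a row, column or block (0 marks a hole); it is illustrative only and is not the authors' instance generator.

```python
# Validity check for a partially filled generalized Sudoku grid whose blocks are
# r x c rectangles (grid side n = r * c); 0 marks an empty cell (a "hole").
def is_valid_generalized_sudoku(grid, r, c):
    n = r * c
    groups = []
    groups += [[(i, j) for j in range(n)] for i in range(n)]           # rows
    groups += [[(i, j) for i in range(n)] for j in range(n)]           # columns
    groups += [[(bi + i, bj + j) for i in range(r) for j in range(c)]  # r x c blocks
               for bi in range(0, n, r) for bj in range(0, n, c)]
    for cells in groups:
        seen = [grid[i][j] for i, j in cells if grid[i][j] != 0]
        if len(seen) != len(set(seen)):        # a repeated symbol within the group
            return False
    return True

# Order-6 example with 2 x 3 blocks and two holes (marked 0).
grid = [[1, 2, 3, 4, 5, 6],
        [4, 5, 6, 1, 2, 3],
        [2, 3, 1, 5, 6, 4],
        [5, 6, 4, 2, 3, 1],
        [3, 1, 2, 6, 0, 5],
        [6, 4, 5, 3, 1, 0]]
print(is_valid_generalized_sudoku(grid, r=2, c=3))  # True
```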
Abstract:
The hub location problem is an NP-hard problem that frequently arises in the design of transportation and distribution systems, postal delivery networks, and airline passenger flow. This work focuses on the Single Allocation Hub Location Problem (SAHLP). Genetic Algorithms (GAs) for the capacitated and uncapacitated variants of the SAHLP, based on new chromosome representations and crossover operators, are explored. The GAs are tested on two well-known sets of real-world problems with up to 200 nodes. The obtained results are very promising. For most of the test problems the GA obtains improved or best-known solutions, and the computational time remains low. The proposed GAs can easily be extended to other variants of location problems arising in network design planning in transportation systems.
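Since the abstract does not spell out the new representations, the sketch below shows one plausible chromosome encoding for single-allocation hub location (gene i holds the hub of node i, and hubs are exactly the self-allocated nodes) with a one-point crossover and a repair step; it is an assumption-laden illustration, not the paper's operators.

```python
# One possible chromosome encoding for single-allocation hub location:
# gene i holds the hub node assigned to node i; a node is a hub iff it is
# assigned to itself. Crossover and repair below are illustrative only.
import random

def random_chromosome(n_nodes, n_hubs):
    hubs = random.sample(range(n_nodes), n_hubs)
    chrom = [random.choice(hubs) for _ in range(n_nodes)]
    for h in hubs:
        chrom[h] = h               # a hub is always allocated to itself
    return chrom

def repair(chrom):
    """Ensure every referenced hub is allocated to itself (may add hubs)."""
    for i, h in enumerate(chrom):
        chrom[h] = h
    return chrom

def one_point_crossover(a, b):
    cut = random.randrange(1, len(a))
    return repair(a[:cut] + b[cut:]), repair(b[:cut] + a[cut:])

# Toy usage with 8 nodes and 3 hubs per parent.
p1, p2 = random_chromosome(8, 3), random_chromosome(8, 3)
c1, c2 = one_point_crossover(p1, p2)
print(p1, p2, c1, c2, sep="\n")
```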
Abstract:
Complex networks have recently attracted a significant amount of research attention due to their ability to model real-world phenomena. One important problem often encountered is limiting diffusive processes spreading over the network, for example mitigating the spread of a pandemic disease or a computer virus. A number of problem formulations have been proposed that aim to solve such problems based on desired network characteristics, such as maintaining the largest network component after node removal. The recently formulated critical node detection problem aims to remove a small subset of vertices from the network such that the residual network has minimum pairwise connectivity. Unfortunately, the problem is NP-hard and the number of constraints is cubic in the number of vertices, making very large-scale problems impossible to solve with traditional mathematical programming techniques. Even approximation strategies such as dynamic programming and evolutionary algorithms are unusable for networks that contain thousands to millions of vertices. A computationally efficient and simple approach is required in such circumstances, but none currently exists. In this thesis, such an algorithm is proposed. The methodology is based on a depth-first search traversal of the network and a specially designed ranking function that considers information local to each vertex. Due to the variety of network structures, a number of characteristics must be taken into consideration and combined into a single rank that measures the utility of removing each vertex. Since removing a vertex in sequential fashion impacts the network structure, an efficient post-processing algorithm is also proposed to quickly re-rank vertices. Experiments on a range of common complex network models with varying numbers of vertices are considered, in addition to real-world networks. The proposed algorithm, DFSH, is shown to be highly competitive and often outperforms existing strategies such as Google PageRank for minimizing pairwise connectivity.
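To make the objective concrete, the sketch below computes the residual pairwise connectivity after removing a vertex set, using a simple degree-based ranking as a stand-in for the DFS-based ranking function (DFSH) developed in the thesis.

```python
# Pairwise-connectivity objective for critical node detection: after removing a
# vertex set S, sum C(|comp|, 2) over connected components of the residual graph.
# The degree-based ranking is only a stand-in for the thesis's DFSH ranking.
from collections import defaultdict

def pairwise_connectivity(adj, removed):
    remaining = set(adj) - set(removed)
    seen, total = set(), 0
    for start in remaining:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                        # iterative DFS over one component
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u in remaining and u not in seen:
                    seen.add(u)
                    stack.append(u)
        total += size * (size - 1) // 2     # pairs still connected in this component
    return total

def remove_top_degree(adj, k):
    """Stand-in ranking: remove the k highest-degree vertices."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]

# Toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

removed = remove_top_degree(adj, 1)
print("removed:", removed, "residual pairwise connectivity:",
      pairwise_connectivity(adj, removed))
```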