779 results for Supervised machine learning
Abstract:
Recently there has been an increasing interest in the development of new methods using Pareto optimality to deal with multi-objective criteria (for example, accuracy and architectural complexity). Once one has learned a model based on their devised method, the problem is then how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose are not able to jointly consider several performance measures. The aim of this paper is to resolve this issue by developing statistical procedures that are able to account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend them by discovering conditional independences among measures to reduce the number of parameters of such models, as the number of studied cases is usually very small in such comparisons. Real data from a comparison among general purpose classifiers is used to show a practical application of our tests.
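As a rough illustration of the conjugate update behind a multinomial-Dirichlet model, the sketch below compares two classifiers, A and B, on two measures. It is not the paper's procedure. It assumes that each data set is assigned to one of four outcome categories (A better on both measures, A better on the first only, A better on the second only, B better on both) and that the prior is a symmetric Dirichlet. The function names, the prior value and the Monte Carlo check are all assumptions made for this example.

```python
import random


def dirichlet_posterior(counts, prior=1.0):
    """Conjugate update: add the observed counts to the prior pseudo-counts."""
    return [prior + c for c in counts]


def posterior_mean(alpha):
    """Mean of a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw one probability vector by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def prob_category_is_largest(alpha, index, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(category `index` has the largest probability)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        theta = sample_dirichlet(alpha, rng)
        if theta[index] == max(theta):
            hits += 1
    return hits / n_samples


# Hypothetical counts over 20 data sets (illustrative only, not from the paper):
# [A better on both, A better on measure 1 only, A better on measure 2 only, B better on both]
counts = [12, 3, 2, 3]
alpha = dirichlet_posterior(counts)
print(posterior_mean(alpha))
print(prob_category_is_largest(alpha, index=0))
```

With few data sets the prior pseudo-counts carry noticeable weight. This is one reason why reducing the number of parameters, as the abstract describes, matters in such comparisons.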
Abstract:
In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes ranks of web pages based on structural information; it can likewise compute ranks of instructions that reflect the structural information of the instructions used in malware analysis methods. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve the categorization accuracy.
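The idea of ranking instructions with PageRank can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the graph is built from consecutive opcode pairs in an instruction trace, that ranks are computed by plain power iteration with a damping factor of 0.85, and that dangling nodes spread their rank uniformly. The trace, the vocabulary and the function names are hypothetical.

```python
def build_graph(opcodes):
    """Directed graph with an edge from each opcode to the opcode that follows it."""
    graph = {op: set() for op in opcodes}
    for src, dst in zip(opcodes, opcodes[1:]):
        graph[src].add(dst)
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; nodes without out-links spread their rank uniformly."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def rank_features(opcodes, vocabulary):
    """Fixed-length feature vector: one rank per vocabulary entry, 0.0 if absent."""
    ranks = pagerank(build_graph(opcodes))
    return [ranks.get(op, 0.0) for op in vocabulary]


# Hypothetical instruction trace and vocabulary (illustrative only).
trace = ["push", "mov", "call", "mov", "ret"]
vocabulary = ["call", "jmp", "mov", "push", "ret"]
print(rank_features(trace, vocabulary))
```

A vector of this kind could then be passed to any standard classifier, including the bagging and boosting ensembles that the abstract mentions.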
Abstract:
This thesis proposes a different approach to robot navigation in dynamic environments, in which the robot takes advantage of pedestrian motion to improve its navigation capabilities. The main idea is that, instead of treating people as dynamic obstacles to be avoided, they should be treated as special agents with advanced knowledge of navigating dynamic environments. To benefit from pedestrian motion, this work proposes that a robot select and follow pedestrians, so that it can move along optimal paths, avoid undetected obstacles, improve navigation in densely populated environments, and increase its acceptance by humans. To achieve these goals, new methods are developed in the area of leader selection, where two techniques are explored. The first uses motion prediction methods, while the second uses machine learning techniques, trained on real examples, to evaluate the quality of leader candidates. The leader selection methods are integrated with motion planning algorithms, and experiments are carried out to validate the proposed techniques.
Abstract:
The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. This approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large number of processing modules optimized for the biomedical domain. To offer on-demand heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget.
We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
Abstract:
Nowadays, communication environments are already characterized by a myriad of competing and complementary technologies that aim to provide a ubiquitous connectivity service. Next Generation Networks need to hide this heterogeneity by providing a new abstraction level, while simultaneously being aware of the underlying technologies to deliver richer service experiences to the end-user. Moreover, the increasing interest in group-based multimedia services, together with their ever-growing resource demands and network dynamics, has been boosting research towards more scalable and flexible network control approaches. The work developed in this Thesis enables such abstraction and exploits the prevailing heterogeneity in favor of a context-aware network management and adaptation. In this scope, we introduce a novel hierarchical control framework with self-management capabilities that enables the concept of Abstract Multiparty Trees (AMTs) to ease the control of multiparty content distribution throughout heterogeneous networks. A thorough evaluation of the proposed multiparty transport control framework was performed in the scope of this Thesis, assessing its benefits in terms of network selection, delivery tree reconfiguration and resource savings. Moreover, we developed an analytical study to highlight the scalability of the AMT concept as well as its flexibility in large scale networks and group sizes. To prove the feasibility and easy deployment characteristic of the proposed control framework, we implemented a proof-of-concept demonstrator that comprises the main control procedures conceptually introduced. Its outcomes highlight a good performance of the multiparty content distribution tree control, including its local and global reconfiguration. In order to endow the AMT concept with the ability to guarantee the best service experience for the end-user, we integrate in the control framework two additional QoE enhancement approaches.
The first employs the concept of Network Coding to improve the robustness of the multiparty content delivery, aiming at mitigating the impact of possible packet losses on the end-user service perception. The second approach relies on a machine learning scheme to autonomously determine at each node the expected QoE towards a certain destination. This knowledge is then used by different QoE-aware network management schemes that, jointly, maximize the overall users' QoE. The performance and scalability of the control procedures developed, aided by the context and QoE-aware mechanisms, show the advantages of the AMT concept and the proposed hierarchical control strategy for the multiparty content distribution with enhanced service experience. Moreover, we also prove the feasibility of the solution in a practical environment, and provide future research directions that benefit the evolved control framework and make it commercially feasible.
Automatic classification of scientific records using the German Subject Heading Authority File (SWD)
Abstract:
The following paper deals with an automatic text classification method which does not require training documents. For this method, the German Subject Heading Authority File (SWD), provided by the linked data service of the German National Library, is used. Recently the SWD was enriched with notations of the Dewey Decimal Classification (DDC). As a consequence, it became possible to utilize the subject headings as textual representations of the DDC notations. Basically, we derive the classification of a text from the classification of the words in the text given by the thesaurus. The method was tested by classifying 3826 OAI-Records from 7 different repositories. Mean reciprocal rank and recall were chosen as evaluation measures. Direct comparison to a machine learning method has shown that this method is definitely competitive. Thus we can conclude that the enriched version of the SWD provides high quality information with a broad coverage for classification of German scientific articles.
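A minimal sketch of the general idea, deriving a text's classes from the classes of its words and scoring the result with mean reciprocal rank, might look as follows. It is not the paper's method. It assumes simple whitespace tokenisation, one vote per matched heading, and a tiny hand-made heading-to-notation table. The table is purely illustrative and is not taken from the SWD, and the function names are hypothetical.

```python
from collections import Counter


def classify(text, heading_to_ddc):
    """Rank DDC notations by how many words of the text map to them."""
    votes = Counter()
    for word in text.lower().split():
        for notation in heading_to_ddc.get(word, ()):
            votes[notation] += 1
    # Sort by descending vote count, breaking ties by notation for determinism.
    ordered = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [notation for notation, _ in ordered]


def mean_reciprocal_rank(rankings, truths):
    """Average of 1/rank of the correct notation; 0 when it is not in the ranking."""
    total = 0.0
    for ranking, truth in zip(rankings, truths):
        if truth in ranking:
            total += 1.0 / (ranking.index(truth) + 1)
    return total / len(truths)


# Hypothetical lookup table (illustrative only, not actual SWD data).
toy_table = {
    "algorithmus": ["004"],
    "datenbank": ["004"],
    "quantenmechanik": ["530"],
}

ranking = classify("Algorithmus Datenbank Quantenmechanik", toy_table)
print(ranking)
print(mean_reciprocal_rank([ranking], ["004"]))
```

A real system would also need lemmatisation and handling of multi-word headings, both of which are omitted here.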
Abstract:
Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2014
Abstract:
Thesis (Master's)--University of Washington, 2012
Abstract:
Thesis (Ph.D.)--University of Washington, 2013
Abstract:
Thesis (Master's)--University of Washington, 2013
Abstract:
Thesis (Master's)--University of Washington, 2014
Abstract:
Thesis (Master's)--University of Washington, 2016-03
Abstract:
Coping with an ageing population is a major concern for healthcare organisations around the world. The average cost of hospital care is higher than that of social care for older and terminally ill patients. Moreover, the average cost of social care increases with the age of the patient. Therefore, it is important to carry out efficient and fair capacity planning that also incorporates patient-centred outcomes. Predictive models can provide predictions whose accuracy can be understood and quantified. Predictive modelling can help patients and carers to get the appropriate support services, and allow clinical decision-makers to improve care quality and reduce the cost of inappropriate hospital and Accident and Emergency admissions. The aim of this study is to provide a review of modelling techniques and frameworks for predictive risk modelling of patients in hospital, based on routinely collected data such as the Hospital Episode Statistics database. A number of sub-problems can be considered, such as Length-of-Stay and End-of-Life predictive modelling. The methodologies in the literature are mainly focused on addressing the problems using regression methods and Markov models, and the majority lack generalisability. In some cases, the robustness, accuracy and re-usability of predictive risk models have been shown to be improved using Machine Learning methods. Dynamic Bayesian Network techniques can represent complex correlation models and incorporate small probabilities into the solution. The main focus of this study is to provide a review of major time-varying Dynamic Bayesian Network techniques with applications in healthcare predictive risk modelling.
Abstract:
The study of electricity market operation has been gaining importance in recent years, as a result of the new challenges produced by the restructuring of electricity markets. This restructuring increased the competitiveness of the market, but also its complexity. The growing complexity and unpredictability of the market's evolution consequently make decision making more difficult. Therefore, the participating entities are forced to rethink their behaviour and market strategies. Currently, a great deal of information concerning electricity markets is available. This data, covering numerous aspects of electricity market operation, is accessible free of charge and is essential for understanding and suitably modelling electricity markets. This paper proposes a tool which is able to handle, store and dynamically update data. The development of the proposed tool is expected to be of great importance in improving the comprehension of electricity markets and the interactions among the involved entities.
Abstract:
This paper presents MASCEM - a multi-agent based electricity market simulator. MASCEM uses game theory, machine learning techniques, scenario analysis and optimisation techniques to model market agents and to provide them with decision support. This paper focuses mainly on MASCEM's ability to provide the means to model and simulate Virtual Power Producers (VPPs). VPPs are represented as a coalition of agents, with specific characteristics and goals. The paper details some of the most important aspects considered in VPP formation and in the aggregation of new producers, and includes a case study.