720 resultados para Machine Learning. Semissupervised learning. Multi-label classification. Reliability Parameter
Resumo:
Improved clinical care for Bipolar Disorder (BD) relies on the identification of diagnostic markers that can reliably detect disease-related signals in clinically heterogeneous populations. At the very least, diagnostic markers should be able to differentiate patients with BD from healthy individuals and from individuals at familial risk for BD who either remain well or develop other psychopathology, most commonly Major Depressive Disorder (MDD). These issues are particularly pertinent to the development of translational applications of neuroimaging as they represent challenges for which clinical observation alone is insufficient. We therefore applied pattern classification to task-based functional magnetic resonance imaging (fMRI) data of the n-back working memory task, to test their predictive value in differentiating patients with BD (n=30) from healthy individuals (n=30) and from patients' relatives who were either diagnosed with MDD (n=30) or were free of any personal lifetime history of psychopathology (n=30). Diagnostic stability in these groups was confirmed with 4-year prospective follow-up. Task-based activation patterns from the fMRI data were analyzed with Gaussian Process Classifiers (GPC), a machine learning approach to detecting multivariate patterns in neuroimaging datasets. Consistent significant classification results were only obtained using data from the 3-back versus 0-back contrast. Using contrast, patients with BD were correctly classified compared to unrelated healthy individuals with an accuracy of 83.5%, sensitivity of 84.6% and specificity of 92.3%. Classification accuracy, sensitivity and specificity when comparing patients with BD to their relatives with MDD, were respectively 73.1%, 53.9% and 94.5%. Classification accuracy, sensitivity and specificity when comparing patients with BD to their healthy relatives were respectively 81.8%, 72.7% and 90.9%. We show that significant individual classification can be achieved using whole brain pattern analysis of task-based working memory fMRI data. The high accuracy and specificity achieved by all three classifiers suggest that multivariate pattern recognition analyses can aid clinicians in the clinical care of BD in situations of true clinical uncertainty regarding the diagnosis and prognosis.
Resumo:
Security defects are common in large software systems because of their size and complexity. Although efficient development processes, testing, and maintenance policies are applied to software systems, there are still a large number of vulnerabilities that can remain, despite these measures. Some vulnerabilities stay in a system from one release to the next one because they cannot be easily reproduced through testing. These vulnerabilities endanger the security of the systems. We propose vulnerability classification and prediction frameworks based on vulnerability reproducibility. The frameworks are effective to identify the types and locations of vulnerabilities in the earlier stage, and improve the security of software in the next versions (referred to as releases). We expand an existing concept of software bug classification to vulnerability classification (easily reproducible and hard to reproduce) to develop a classification framework for differentiating between these vulnerabilities based on code fixes and textual reports. We then investigate the potential correlations between the vulnerability categories and the classical software metrics and some other runtime environmental factors of reproducibility to develop a vulnerability prediction framework. The classification and prediction frameworks help developers adopt corresponding mitigation or elimination actions and develop appropriate test cases. Also, the vulnerability prediction framework is of great help for security experts focus their effort on the top-ranked vulnerability-prone files. As a result, the frameworks decrease the number of attacks that exploit security vulnerabilities in the next versions of the software. To build the classification and prediction frameworks, different machine learning techniques (C4.5 Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are employed. The effectiveness of the proposed frameworks is assessed based on collected software security defects of Mozilla Firefox.
Resumo:
Data sources are often dispersed geographically in real life applications. Finding a knowledge model may require to join all the data sources and to run a machine learning algorithm on the joint set. We present an alternative based on a Multi Agent System (MAS): an agent mines one data source in order to extract a local theory (knowledge model) and then merges it with the previous MAS theory using a knowledge fusion technique. This way, we obtain a global theory that summarizes the distributed knowledge without spending resources and time in joining data sources. New experiments have been executed including statistical significance analysis. The results show that, as a result of knowledge fusion, the accuracy of initial theories is significantly improved as well as the accuracy of the monolithic solution.
Resumo:
Las organizaciones y sus entornos son sistemas complejos. Tales sistemas son difíciles de comprender y predecir. Pese a ello, la predicción es una tarea fundamental para la gestión empresarial y para la toma de decisiones que implica siempre un riesgo. Los métodos clásicos de predicción (entre los cuales están: la regresión lineal, la Autoregresive Moving Average y el exponential smoothing) establecen supuestos como la linealidad, la estabilidad para ser matemática y computacionalmente tratables. Por diferentes medios, sin embargo, se han demostrado las limitaciones de tales métodos. Pues bien, en las últimas décadas nuevos métodos de predicción han surgido con el fin de abarcar la complejidad de los sistemas organizacionales y sus entornos, antes que evitarla. Entre ellos, los más promisorios son los métodos de predicción bio-inspirados (ej. redes neuronales, algoritmos genéticos /evolutivos y sistemas inmunes artificiales). Este artículo pretende establecer un estado situacional de las aplicaciones actuales y potenciales de los métodos bio-inspirados de predicción en la administración.
Resumo:
A utilização generalizada do computador para a automatização das mais diversas tarefas, tem conduzido ao desenvolvimento de aplicações que possibilitam a realização de actividades que até então poderiam não só ser demoradas, como estar sujeitas a erros inerentes à actividade humana. A investigação desenvolvida no âmbito desta tese, tem como objectivo o desenvolvimento de um software e algoritmos que permitam a avaliação e classificação de queijos produzidos na região de Évora, através do processamento de imagens digitais. No decurso desta investigação, foram desenvolvidos algoritmos e metodologias que permitem a identificação dos olhos e dimensões do queijo, a presença de textura na parte exterior do queijo, assim como características relativas à cor do mesmo, permitindo que com base nestes parâmetros possa ser efectuada uma classificação e avaliação do queijo. A aplicação de software, resultou num produto de simples utilização. As fotografias devem respeitar algumas regras simples, sobre as quais se efectuará o processamento e classificação do queijo. ABSTRACT: The widespread use of computers for the automation of repetitive tasks, has resulted in developing applications that allow a range of activities, that until now could not only be time consuming and also subject to errors inherent to human activity, to be performed without or with little human intervention. The research carried out within this thesis, aims to develop a software application and algorithms that enable the assessment and classification of cheeses produced in the region of Évora, by digital images processing. Throughout this research, algorithms and methodologies have been developed that allow the identification of the cheese eyes, the dimensions of the cheese, the presence of texture on the outside of cheese, as well as an analysis of the color, so that, based on these parameters, a classification and evaluation of the cheese can be conducted. The developed software application, is product simple to use, requiring no special computer knowledge. Requires only the acquisition of the photographs following a simple set of rules, based on which it will do the processing and classification of cheese.
Resumo:
This paper presents our work at 2016 FIRE CHIS. Given a CHIS query and a document associated with that query, the task is to classify the sentences in the document as relevant to the query or not; and further classify the relevant sentences to be supporting, neutral or opposing to the claim made in the query. In this paper, we present two different approaches to do the classification. With the first approach, we implement two models to satisfy the task. We first implement an information retrieval model to retrieve the sentences that are relevant to the query; and then we use supervised learning method to train a classification model to classify the relevant sentences into support, oppose or neutral. With the second approach, we only use machine learning techniques to learn a model and classify the sentences into four classes (relevant & support, relevant & neutral, relevant & oppose, irrelevant & neutral). Our submission for CHIS uses the first approach.
Resumo:
The main purpose of this study is to evaluate the best set of features that automatically enables the identification of argumentative sentences from unstructured text. As corpus, we use case laws from the European Court of Human Rights (ECHR). Three kinds of experiments are conducted: Basic Experiments, Multi Feature Experiments and Tree Kernel Experiments. These experiments are basically categorized according to the type of features available in the corpus. The features are extracted from the corpus and Support Vector Machine (SVM) and Random Forest are the used as Machine learning algorithms. We achieved F1 score of 0.705 for identifying the argumentative sentences which is quite promising result and can be used as the basis for a general argument-mining framework.
Resumo:
As descrições de produtos turísticos na área da hotelaria, aviação, rent-a-car e pacotes de férias baseiam-se sobretudo em descrições textuais em língua natural muito heterogénea com estilos, apresentações e conteúdos muito diferentes entre si. Uma vez que o sector do turismo é bastante dinâmico e que os seus produtos e ofertas estão constantemente em alteração, o tratamento manual de normalização de toda essa informação não é possível. Neste trabalho construiu-se um protótipo que permite a classificação e extracção automática de informação a partir de descrições de produtos de turismo. Inicialmente a informação é classificada quanto ao tipo. Seguidamente são extraídos os elementos relevantes de cada tipo e gerados objectos facilmente computáveis. Sobre os objectos extraídos, o protótipo com recurso a modelos de textos e imagens gera automaticamente descrições normalizadas e orientadas a um determinado mercado. Esta versatilidade permite um novo conjunto de serviços na promoção e venda dos produtos que seria impossível implementar com a informação original. Este protótipo, embora possa ser aplicado a outros domínios, foi avaliado na normalização da descrição de hotéis. As frases descritivas do hotel são classificadas consoante o seu tipo (Local, Serviços e/ou Equipamento) através de um algoritmo de aprendizagem automática que obtém valores médios de cobertura de 96% e precisão de 72%. A cobertura foi considerada a medida mais importante uma vez que a sua maximização permite que não se percam frases para processamentos posteriores. Este trabalho permitiu também a construção e população de uma base de dados de hotéis que possibilita a pesquisa de hotéis pelas suas características. Esta funcionalidade não seria possível utilizando os conteúdos originais. ABSTRACT: The description of tourism products, like hotel, aviation, rent-a-car and holiday packages, is strongly supported on natural language expressions. Due to the extent of tourism offers and considering the high dynamics in the tourism sector, manual data management is not a reliable or scalable solution. Offer descriptions - in the order of thousands - are structured in different ways, possibly comprising different languages, complementing and/or overlap one another. This work aims at creating a prototype for the automatic classification and extraction of relevant knowledge from tourism-related text expressions. Captured knowledge is represented in a normalized/standard format to enable new services based on this information in order to promote and sale tourism products that would be impossible to implement with the raw information. Although it could be applied to other areas, this prototype was evaluated in the normalization of hotel descriptions. Hotels descriptive sentences are classified according their type (Location, Services and/or Equipment) using a machine learning algorithm. The built setting obtained an average recall of 96% and precision of 72%. Recall considered the most important measure of performance since its maximization allows that sentences were not lost in further processes. As a side product a database of hotels was built and populated with search facilities on its characteristics. This ability would not be possible using the original contents.
Resumo:
A Histologia, o estudo de tecidos, é uma das áreas fundamentais da Biologia que permitiu enormes avanços científicos. Sendo uma tarefa exigente, meticulosa e demorada, será importante aproveitar a existência de ferramentas e algoritmos computacionais no seu auxílio, tornando o processo mais rápido e possibilitando a descoberta de informação que poderá não estar visível à partida. Esta dissertação tem como principal objectivo averiguar se um animal foi ou não sujeito à ingestão de um xenobiótico. Com esse objectivo em vista, utilizaram-se técnicas de processamento e segmentação de imagem aplicadas a imagens de tecido renal de ratos saudáveis e ratos que ingeriram o xenobiótico. Destas imagens extraíram-se inúmeras características do corpúsculo renal que após serem analisadas através de vários algoritmos de classificação mostraram ser possível saber se o animal ingeriu ou não o xenobiótico, com um reduzido grau de incerteza. ABSTRACT: Histology, the study of tissues, is one of the key areas of Biology that has allowed huge advances in Science. Being a demanding, meticulous and time consuming task, it is important to use the existence of computational tools and algorithms in its aid, making the process faster and enabling the discovery of information that may not be initially visible. The main goal of this thesis is to ascertain if an animal was subjected or not to the ingestion of a xenobiotic. With this in mind, were used image processing and segmentation techniques applied on images of kidney tissue from healthy rats and rats that ingested the xenobiotic. From these images were extracted several features of renal glomeruli that after being analyzed by various classification algorithms had shown to be possible to know, with an acceptable degree of certainty, if the animal ingested or not the xenobiotic.
Resumo:
With the advent of Service Oriented Architecture, Web Services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. By considering the semantic relationships of words used in describing the services as well as the use of input and output parameters can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on the large quantity of text documents covering diverse areas of domain of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an allpair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase which is the system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine which is an integral part of the system integration phase makes the final recommendations including individual and composite Web services to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web services compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
Resumo:
In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a-priori knowledge of the microphone placement and speaker location, making the system directly comparable other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross correlation based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.
Resumo:
With the recent regulatory reforms in a number of countries, railways resources are no longer managed by a single party but are distributed among different stakeholders. To facilitate the operation of train services, a train service provider (SP) has to negotiate with the infrastructure provider (IP) for a train schedule and the associated track access charge. This paper models the SP and IP as software agents and the negotiation as a prioritized fuzzy constraint satisfaction (PFCS) problem. Computer simulations have been conducted to demonstrate the effects on the train schedule when the SP has different optimization criteria. The results show that by assigning different priorities on the fuzzy constraints, agents can represent SPs with different operational objectives.
Resumo:
Software used by architectural and industrial designers – has moved from becoming a tool for drafting, towards use in verification, simulation, project management and project sharing remotely. In more advanced models, parameters for the designed object can be adjusted so a family of variations can be produced rapidly. With advances in computer aided design technology, numerous design options can now be generated and analyzed in real time. However the use of digital tools to support design as an activity is still at an early stage and has largely been limited in functionality with regard to the design process. To date, major CAD vendors have not developed an integrated tool that is able to both leverage specialized design knowledge from various discipline domains (known as expert knowledge systems) and support the creation of design alternatives that satisfy different forms of constraints. We propose that evolutionary computing and machine learning be linked with parametric design techniques to record and respond to a designer’s own way of working and design history. It is expected that this will lead to results that impact on future work on design support systems-(ergonomics and interface) as well as implicit constraint and problem definition for problems that are difficult to quantify.