776 results for Machine learning methods
Abstract:
With the advent of high-performance computing devices, deep neural networks have gained a lot of popularity in solving many Natural Language Processing tasks. However, they are also vulnerable to adversarial attacks, which are able to modify the input text in order to mislead the target model. Adversarial attacks are a serious threat to the security of deep neural networks, and they can be used to craft adversarial examples that steer the model towards a wrong decision. In this dissertation, we propose SynBA, a novel contextualized synonym-based adversarial attack for text classification. SynBA is based on the idea of replacing words in the input text with their synonyms, which are selected according to the context of the sentence. We show that SynBA successfully generates adversarial examples that are able to fool the target model with a high success rate. We demonstrate three advantages of this proposed approach: (1) effective - it outperforms state-of-the-art attacks by semantic similarity and perturbation rate, (2) utility-preserving - it preserves semantic content, grammaticality, and correct types classified by humans, and (3) efficient - it performs attacks faster than other methods.
Abstract:
In this thesis we address a multi-label hierarchical text classification problem in a low-resource setting and compare several approaches to identify the best one for our case. The goal is to train a model that classifies English school exercises according to a hierarchical taxonomy using little labeled data. The experiments employ different machine learning models and text representation techniques: CatBoost with tf-idf features, classifiers based on pre-trained models (mBERT, LASER), and SetFit, a framework for few-shot text classification. SetFit proved to be the most promising approach, achieving better performance when only a few labeled examples per class are available during training. However, this thesis does not cover the whole hierarchical taxonomy, only its first two levels: classifying into the third-level classes requires further experiments exploring zero-shot text classification, data augmentation, and strategies that exploit the hierarchical structure of the taxonomy during training.
Abstract:
Privacy issues and data scarcity in the PET field call for efficient methods to expand datasets through synthetic generation of new data that are realistic and cannot be traced back to real patients. In this thesis, machine learning techniques were applied to 1001 amyloid-beta PET images that had been evaluated for a diagnosis of Alzheimer’s disease: 540 evaluations were positive, 457 negative and 4 unknown. The Isomap algorithm was used as a manifold learning method to reduce the dimensionality of the PET dataset, and a numerical scale-free interpolation method was applied to invert the dimensionality reduction map. The interpolant was tested on the PET images via leave-one-out cross-validation (LOOCV), in which each removed image was compared with its reconstruction using the mean SSIM index (MSSIM = 0.76 ± 0.06). The effectiveness of this measure is questionable, since it indicated slightly higher performance for a comparison method based on PCA (MSSIM = 0.79 ± 0.06), which produced reconstructed images of clearly poorer quality than those recovered by the numerical inverse mapping. Ten synthetic PET images were generated and, after being mixed with ten originals, were sent to a team of clinicians for visual assessment of their realism; no significant agreement was found either between the clinicians and the true image labels or among the clinicians, meaning that original and synthetic images were indistinguishable. Looking ahead, this work aims to advance amyloid-beta PET research by increasing the available data, overcoming the constraints of data acquisition and privacy issues. Potential improvements can be achieved by refining the manifold learning and inverse mapping stages of the PET image analysis, by exploring different combinations of algorithm parameters, and by applying other non-linear dimensionality reduction algorithms. A final prospect of this work is the search for new methods to assess image reconstruction quality.
Abstract:
Worldwide, biodiversity is decreasing due to climate change, habitat fragmentation and agricultural intensification. Bees are essential crop pollinators, but their abundance and diversity are decreasing as well. Conserving them requires assessing the status of bee populations. Field data collection methods are expensive and time-consuming, so methods based on remote sensing have recently come into use. In this study we tested the possibility of using flower cover diversity estimated from UAV images (FCD-UAV) to assess bee diversity and abundance in 10 agricultural meadows in the Netherlands. To do so, field data on flower and bee diversity and abundance were collected during a campaign in May 2021. Furthermore, RGB images of the areas were collected using an Unmanned Aerial Vehicle (UAV) and post-processed into orthomosaics. Lastly, a Random Forest machine learning algorithm was applied to estimate the FCD of the species detected in each field. The resulting FCD was expressed with the Shannon and Simpson diversity indices, which were then correlated with bee Shannon and Simpson diversity indices, abundance and species richness. The results showed a positive relationship between FCD-UAV and in-situ data on bee diversity (evaluated with the Shannon index), abundance and species richness. The strongest relationship was found between FCD (Shannon index) and bee abundance, with R² = 0.52, followed by good correlations with bee species richness (R² = 0.39) and bee diversity (R² = 0.37). The R² values for the relationship between FCD (Simpson index) and bee abundance, species richness and diversity were slightly lower (0.45, 0.37 and 0.35, respectively). Our results suggest that the proposed method, which couples UAV imagery and machine learning to assess flower species diversity, could be developed into a valuable tool for large-scale, standardized and cost-effective monitoring of flower cover and of habitat quality for bees.
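For readers unfamiliar with the two diversity indices named in this abstract, the following is a minimal sketch of their common definitions: Shannon H = -Σ p_i ln p_i and the Gini-Simpson form 1 - Σ p_i². The abstract does not state which Simpson variant or logarithm base the study used, so both choices are assumptions, and the cover values are invented purely for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over non-zero proportions."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2) (assumed variant)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical flower cover per species in one meadow (illustrative values only).
cover = [40.0, 30.0, 20.0, 10.0]
print("Shannon:", round(shannon_index(cover), 3))
print("Simpson:", round(simpson_index(cover), 3))
```

Both indices grow as cover is spread more evenly over more species, and both are zero for a meadow with a single species.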
Abstract:
Robotic grasping is an important research topic in robotics: for robots to attain more general-purpose utility, grasping is a necessary skill, yet it is very challenging to master. In general, a robot may use its perception abilities, such as an image from a camera, to identify grasps for a given, usually unknown, object. A grasp describes how a robotic end-effector needs to be positioned to securely grab an object and lift it without dropping it; at present, state-of-the-art solutions are still far behind human performance. In the last 5–10 years, deep learning methods have come to the fore to overcome classical problems such as the arduous and time-consuming process of analytically deriving a task-specific algorithm. This thesis presents the progress and approaches in the robotic grasping field and the potential of deep learning methods for robotic grasping. Based on that, a Convolutional Neural Network (CNN) was implemented inside a ROS environment as a starting point for generating a grasp pose from a camera view. The developed technologies have been integrated into a pick-and-place application for a Panda robot from Franka Emika. The application includes various features related to object detection and selection. Additionally, the features have been kept as generic as possible so that they can easily be replaced or removed if needed, without losing time on improvements or new testing.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes how effectively the investigated techniques remove noisy data, measured by the accuracy that different Machine Learning classifiers obtain on the pre-processed data.
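To illustrate what a distance-based noise filter can look like, here is a minimal sketch of Edited Nearest Neighbours, which drops an instance when its label disagrees with the majority label of its k nearest neighbours. The abstract does not name the techniques that were evaluated, so the choice of this filter, the function name and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Return indices of instances whose label matches the majority
    label of their k nearest neighbours (Euclidean distance)."""
    kept = []
    for i, xi in enumerate(X):
        # Distance from instance i to every other instance, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority == y[i]:
            kept.append(i)
    return kept


# Toy data (hypothetical): two tight clusters; instance 3 lies in the
# first cluster but carries the label of the second, mimicking label noise.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
y = ["a", "a", "a", "b", "b", "b", "b", "b"]

kept = edited_nearest_neighbours(X, y, k=3)
print(kept)
```

A classifier would then be trained only on the kept instances, and its accuracy compared with training on the unfiltered data.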
Abstract:
This work presents a method for predicting resource availability in opportunistic grids by means of use pattern analysis (UPA), a technique based on unsupervised learning methods. The prediction method assumes that several classes of computational resource use patterns exist and can be used to predict resource availability. Trace-driven simulations validate this basic assumption and also provide the parameter settings for the accurate learning of resource use patterns. Experiments with an implementation of the UPA method show the feasibility of its use in the scheduling of grid tasks with very little overhead. The experiments also demonstrate the method's superiority over other predictive and non-predictive methods. An adaptive prediction method is suggested to deal with the lack of training data at initialization. Further adaptive behaviour is motivated by experiments which show that, in some special environments, reliable resource use patterns may not always be detected. Copyright (C) 2009 John Wiley & Sons, Ltd.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has successfully been applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea of a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
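As a small illustration of the documents-as-objects, keywords-as-attributes view described in this abstract, the sketch below enumerates every formal concept (a set of documents paired with exactly the keywords they all share) of a tiny context. It is a naive enumeration suitable only for toy inputs, not the RD-Tree index structure the paper proposes; the file names and keywords are hypothetical.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    `context` maps each object (document) to its set of attributes (keywords).
    """
    all_attrs = frozenset().union(*context.values())
    # Every concept intent is an intersection of object intents; the full
    # attribute set is the intent obtained from the empty set of objects.
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & other for other in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that carries all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# Hypothetical document/keyword context.
docs = {
    "a.txt": {"ml", "python"},
    "b.txt": {"ml", "grasping"},
    "c.txt": {"ml", "python", "fca"},
}

concepts = formal_concepts(docs)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

Each printed pair can be read as one "keyword view": the documents on the left are exactly those matching all keywords on the right.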
Abstract:
Since the 1990s, the Brazilian Federal Government has been implementing an ambitious agenda of State reforms centered on reducing poverty and improving the efficiency of public services. The main priorities, as set out in the Multi-Year Plan (Plano Plurianual, PPA) for 2003-2007, are: social inclusion and reduction of inequality; economic growth with job creation; income distribution and respect for the environment; promotion and expansion of citizenship rights; and strengthening of democracy. In early 2006, the Government created the National Policy for Personnel Development (Política Nacional de Desenvolvimento de Pessoal, Decree 5.707), with the aim of improving and increasing efficiency and effectiveness in the delivery of public services. Within the framework of this recent policy, schools of public administration play a fundamental role in identifying the competencies that need to be developed in government institutions, as well as in implementing training policies for public servants, directly and/or in partnership with schools of government at the federal, state or local level. Canada is also creating a framework to survey the competencies needed by public servants and to develop them as a component of government-wide Public Service Renewal. As leading institutions in developing the competencies of public servants, the Canada School of Public Service (CSPS) and the Escola Nacional de Administração Pública (ENAP) entered into a partnership to implement the Project for the Development of Governance Capacity in Brazil. The purpose of the Project is to improve the capacity of federal, state and municipal public servants in Brazil to develop and implement training programs and to manage decentralized public policies.
This partnership and the resulting sharing of experience in training for effective governance are expected to contribute to reducing poverty and inequality in Brazil by developing public servants' competencies in delivering effective, efficient and citizen-oriented public services. In addition to the two main Schools of Government in Canada and Brazil, the Project brings together six regional Brazilian Schools of Public Administration and two renowned Canadian academic institutions: Queen's University and Western Ontario University. The Ministério do Desenvolvimento Social e Combate à Fome (MDS) and three Special Secretariats of the Federal Government, for Race (SEPPIR), Human Rights (SEDH) and Policies for Women (SPM), will also take part in knowledge-sharing activities with Human Resources and Skills Development Canada (HRSDC) and the Canada Public Service Agency (CPSA). CIDA will provide CND$1,700,000 through the Brazil-Canada Knowledge Exchange for Equity Promotion Program (PIPE). ENAP's contribution will be CND$1,069,707 in cash (em espécie). CSPS will contribute courses, as well as technical knowledge and support, valued at CND$1,000,000. Building on the partnership between CSPS and ENAP, which resulted in the successful transfer and adaptation of Canadian courses and methodologies, the new project extends beyond the core of the public service in Brasília to reach schools of government in disadvantaged Brazilian regions. Similar to the role of CSPS in the first project, ENAP will strengthen the capacity of the regional partner schools to train public servants involved in delivering services to Brazilians. The structured exchange between Ministries of the Canadian and Brazilian Governments will also apply learning more directly to issues of social policies and programs in Brazil.
The challenge taken on in this Project is the adaptation of knowledge and learning with a view to improving the implementation of social policies and programs. To that end, CSPS and ENAP will introduce new courses into the curricula of the partner schools and will incorporate new learning methods and technologies, such as virtual communities of practice and a mentoring component involving Human Resources and Skills Development Canada and Brazil's Ministério do Desenvolvimento Social e Combate à Fome. Six institutions from Brazil's National Network of Schools of Government (Rede Nacional de Escolas de Governo) and from ENAP's Partnership Program were selected and invited to join CSPS and ENAP in this new Project: the Universidade Federal do Pará (UFPA), in Belém (state of Pará, North region); the Fundação Joaquim Nabuco (FUNDAJ), in Recife (Pernambuco, Northeast); the Universidade Corporativa do Serviço Público / Secretaria de Administração do Estado da Bahia (UCS/SAEB), Salvador (Bahia, Northeast); the Escola de Governo do Mato Grosso do Sul (ESCOLAGOV), Campo Grande (state of Mato Grosso do Sul, Center-West); the Escola Nacional de Ciências Estatísticas / Instituto Brasileiro de Geografia e Estatística (ENCE/IBGE), Rio de Janeiro (state of Rio de Janeiro, Southeast); and the Instituto Municipal de Administração Pública (IMAP) of Curitiba (Paraná, South). These reference schools were chosen for their capacity to work as hubs of innovative practices in public policy and to disseminate the benefits of the Project to other schools in their regions through the National Network coordinated by ENAP. The aim of this partnership is to strengthen local schools of government so that they develop, through learning events, competencies in public servants in order to increase the government's capacity to implement and manage public policies.
The Project Implementation Plan (PIP) describes the work to be carried out by these institutions over the next 30 months, while serving as a guide for the Project Partners regarding the actions and resources needed to achieve the agreed results. As the Project progresses and the partners begin a productive exchange of knowledge, the Annual Work Plan will be updated and revised through annual evaluation meetings and meetings of the Project Steering Committee, with a view to ensuring that the results described in the PIP are successfully achieved.
Abstract:
Dental implant recognition in patients without available records is a time-consuming and far from straightforward task. The traditional method is a completely user-dependent process, in which the expert compares a 2D X-ray image of the dental implant with a generic database. Due to the high number of implants available and the similarity between them, automatic/semi-automatic frameworks to aid implant model detection are essential. In this study, a novel computer-aided framework for dental implant recognition is suggested. The proposed method relies on image processing concepts, namely: (i) a segmentation strategy for semi-automatic implant delineation; and (ii) a machine learning approach for implant model recognition. Although the segmentation technique is the main focus of the current study, preliminary details of the machine learning approach are also reported. Two different scenarios are used to validate the framework: (1) comparison of the semi-automatic contours against manual implant contours in 125 X-ray images; and (2) classification of 11 known implants using a large reference database of 601 implants. In experiment 1, a Dice metric of 0.97±0.01, a mean absolute distance of 2.24±0.85 pixels and a Hausdorff distance of 11.12±6 pixels were obtained. In experiment 2, 91% of the implants were successfully recognized while reducing the reference database to 5% of its original size. Overall, the segmentation technique achieved accurate implant contours. Although the preliminary classification results prove the concept of the current work, more features and an extended database should be used in future work.
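The three segmentation measures reported for experiment 1 can be sketched as follows. This is an illustrative implementation under stated assumptions: masks are represented as sets of pixel coordinates, contours as lists of points, and both distances are computed symmetrically. The abstract does not specify these details, and the toy shapes below are invented.

```python
import math


def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two sets of pixels."""
    if not mask_a and not mask_b:
        return 1.0
    return 2.0 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def _nearest_distances(src, dst):
    """For each point in src, the distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def mean_absolute_distance(contour_a, contour_b):
    """Average of the two directed mean nearest-point distances (assumed symmetric form)."""
    d_ab = _nearest_distances(contour_a, contour_b)
    d_ba = _nearest_distances(contour_b, contour_a)
    return 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))


def hausdorff_distance(contour_a, contour_b):
    """Largest nearest-point distance in either direction."""
    return max(max(_nearest_distances(contour_a, contour_b)),
               max(_nearest_distances(contour_b, contour_a)))


# Hypothetical masks: a 4x4 square and the same square shifted by one pixel in x.
manual_mask = {(x, y) for x in range(4) for y in range(4)}
auto_mask = {(x, y) for x in range(1, 5) for y in range(4)}

# Hypothetical contours: the corner points of each square.
manual_contour = [(0, 0), (0, 3), (3, 0), (3, 3)]
auto_contour = [(1, 0), (1, 3), (4, 0), (4, 3)]

print(dice(manual_mask, auto_mask))
print(mean_absolute_distance(manual_contour, auto_contour))
print(hausdorff_distance(manual_contour, auto_contour))
```

Dice rewards region overlap, the mean absolute distance summarizes typical contour error, and the Hausdorff distance exposes the single worst contour deviation, which is why the three are often reported together.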