934 resultados para unsupervised feature learning
Resumo:
Machine Learning makes computers capable of performing tasks typically requiring human intelligence. A domain where it is having a considerable impact is the life sciences, allowing to devise new biological analysis protocols, develop patients’ treatments efficiently and faster, and reduce healthcare costs. This Thesis work presents new Machine Learning methods and pipelines for the life sciences focusing on the unsupervised field. At a methodological level, two methods are presented. The first is an “Ab Initio Local Principal Path” and it is a revised and improved version of a pre-existing algorithm in the manifold learning realm. The second contribution is an improvement over the Import Vector Domain Description (one-class learning) through the Kullback-Leibler divergence. It hybridizes kernel methods to Deep Learning obtaining a scalable solution, an improved probabilistic model, and state-of-the-art performances. Both methods are tested through several experiments, with a central focus on their relevance in life sciences. Results show that they improve the performances achieved by their previous versions. At the applicative level, two pipelines are presented. The first one is for the analysis of RNA-Seq datasets, both transcriptomic and single-cell data, and is aimed at identifying genes that may be involved in biological processes (e.g., the transition of tissues from normal to cancer). In this project, an R package is released on CRAN to make the pipeline accessible to the bioinformatic Community through high-level APIs. The second pipeline is in the drug discovery domain and is useful for identifying druggable pockets, namely regions of a protein with a high probability of accepting a small molecule (a drug). Both these pipelines achieve remarkable results. Lastly, a detour application is developed to identify the strengths/limitations of the “Principal Path” algorithm by analyzing Convolutional Neural Networks induced vector spaces. This application is conducted in the music and visual arts domains.
Resumo:
One of the e-learning environment goal is to attend the individual needs of students during the learning process. The adaptation of contents, activities and tools into different visualization or in a variety of content types is an important feature of this environment, bringing to the user the sensation that there are suitable workplaces to his profile in the same system. Nevertheless, it is important the investigation of student behaviour aspects, considering the context where the interaction happens, to achieve an efficient personalization process. The paper goal is to present an approach to identify the student learning profile analyzing the context of interaction. Besides this, the learning profile could be analyzed in different dimensions allows the system to deal with the different focus of the learning.
Resumo:
In this paper, a framework for detection of human skin in digital images is proposed. This framework is composed of a training phase and a detection phase. A skin class model is learned during the training phase by processing several training images in a hybrid and incremental fuzzy learning scheme. This scheme combines unsupervised-and supervised-learning: unsupervised, by fuzzy clustering, to obtain clusters of color groups from training images; and supervised to select groups that represent skin color. At the end of the training phase, aggregation operators are used to provide combinations of selected groups into a skin model. In the detection phase, the learned skin model is used to detect human skin in an efficient way. Experimental results show robust and accurate human skin detection performed by the proposed framework.
Resumo:
A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.
Resumo:
In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. In these methods, it is implicitly considered that frames carry equally relevant information loads and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques to partition this space. These partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized and unsupervised partition of the space, used in conjunction with a Markov Model classifier lead to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create less hubs.
Resumo:
PURPOSE: Fatty liver disease (FLD) is an increasing prevalent disease that can be reversed if detected early. Ultrasound is the safest and ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, lack of the same will result in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy added to the completely automated classification procedure makes the authors' proposed technique highly suitable for clinical deployment and usage.
Resumo:
A repository of learning objects is a system that stores electronic resources in a technology-mediated learning process. The need for this kind of repository is growing as more educators become eager to use digital educa- tional contents and more of it becomes available. The sharing and use of these resources relies on the use of content and communication standards as a means to describe and exchange educational resources, commonly known as learning objects. This paper presents the design and implementation of a service-oriented reposi- tory of learning objects called crimsonHex. This repository supports new definitions of learning objects for specialized domains and we illustrate this feature with the definition of programming exercises as learning objects and its validation by the repository. The repository is also fully compliant with existing commu- nication standards and we propose extensions by adding new functions, formalizing message interchange and providing a REST interface. To validate the interoperability features of the repository, we developed a repository plug-in for Moodle that is expected to be included in the next release of this popular learning management system.
Resumo:
Locomotor tasks characterization plays an important role in trying to improve the quality of life of a growing elderly population. This paper focuses on this matter by trying to characterize the locomotion of two population groups with different functional fitness levels (high or low) while executing three different tasks-gait, stair ascent and stair descent. Features were extracted from gait data, and feature selection methods were used in order to get the set of features that allow differentiation between functional fitness level. Unsupervised learning was used to validate the sets obtained and, ultimately, indicated that it is possible to distinguish the two population groups. The sets of best discriminate features for each task are identified and thoroughly analysed. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
Resumo:
The corner stone of the interoperability of eLearning systems is the standard definition of learning objects. Nevertheless, for some domains this standard is insufficient to fully describe all the assets, especially when they are used as input for other eLearning services. On the other hand, a standard definition of learning objects in not enough to ensure interoperability among eLearning systems; they must also use a standard API to exchange learning objects. This paper presents the design and implementation of a service oriented repository of learning objects called crimsonHex. This repository is fully compliant with the existing interoperability standards and supports new definitions of learning objects for specialized domains. We illustrate this feature with the definition of programming problems as learning objects and its validation by the repository. This repository is also prepared to store usage data on learning objects to tailor the presentation order and adapt it to learner profiles.
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
Resumo:
High-content analysis has revolutionized cancer drug discovery by identifying substances that alter the phenotype of a cell, which prevents tumor growth and metastasis. The high-resolution biofluorescence images from assays allow precise quantitative measures enabling the distinction of small molecules of a host cell from a tumor. In this work, we are particularly interested in the application of deep neural networks (DNNs), a cutting-edge machine learning method, to the classification of compounds in chemical mechanisms of action (MOAs). Compound classification has been performed using image-based profiling methods sometimes combined with feature reduction methods such as principal component analysis or factor analysis. In this article, we map the input features of each cell to a particular MOA class without using any treatment-level profiles or feature reduction methods. To the best of our knowledge, this is the first application of DNN in this domain, leveraging single-cell information. Furthermore, we use deep transfer learning (DTL) to alleviate the intensive and computational demanding effort of searching the huge parameter's space of a DNN. Results show that using this approach, we obtain a 30% speedup and a 2% accuracy improvement.
Resumo:
Learning novel actions and skills is a prevalent ability across multiple species and a critical feature for survival and competence in a constantly changing world. Novel actions are generated and learned through a process of trial and error, where an animal explores the environment around itself, generates multiple patterns of behavior and selects the ones that increase the likelihood of positive outcomes. Proper adaptation and execution of the selected behavior requires the coordination of several biomechanical features by the animal. Cortico-basal ganglia circuits and loops are critically involved in the acquisition, learning and consolidation of motor skills.(...)
Resumo:
O objetivo deste trabalho é apresentar os resultados da análise das concepções de dois protagonistas de uma reforma curricular que está sendo implementada numa escola de engenharia. A principal característica do novo currículo é o uso de projetos e oficinas como atividades complementares a serem realizadas pelos estudantes. As atividades complementares acontecerão em paralelo ao trabalho realizado nas disciplinas sem que haja uma relação de interdisciplinaridade. O novo currículo está sendo implantado desde fevereiro de 2015. Segundo Pacheco (2005) há dois momentos, dentre outros, no processo de mudança curricular, o currículo “ideal”, determinado por dimensões epistemológica, política, econômica, ideológica, técnica, estética, e histórica e, que recebe influência direta daquele que idealiza e cria o novo currículo e, o currículo “formal” que se traduz na prática implementada na escola. São essas duas etapas estudadas nesta pesquisa. Para isso serão considerados como fontes de dados dois protagonistas, um mais ligado à concepção do currículo e outro da sua implementação, a partir dos quais se busca compreender as motivações, crenças e percepções que, por sua vez, determinam a reforma curricular. Entrevistas semiestruturadas foram utilizadas como técnica de pesquisa, com o propósito de se entender a gênese da proposta e as mudanças entre essas duas etapas. Os dados revelam que mudanças aconteceram desde a idealização até a formalização do currículo, motivadas por demandas do processo de implementação, revela ainda diferenças na visão de currículo e a motivação para romper com padrões na formação de engenheiros no Brasil.
Resumo:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis from three agroecological regions (plain, plateau, and highlands) from southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), better discriminating samples regarding their collection seasons comparatively to the harvest regions. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.