820 resultados para Data classification
Resumo:
Em cenas naturais, ocorrem com certa freqüência classes espectralmente muito similares, isto é, os vetores média são muito próximos. Em situações como esta, dados de baixa dimensionalidade (LandSat-TM, Spot) não permitem uma classificação acurada da cena. Por outro lado, sabe-se que dados em alta dimensionalidade [FUK 90] tornam possível a separação destas classes, desde que as matrizes covariância sejam suficientemente distintas. Neste caso, o problema de natureza prática que surge é o da estimação dos parâmetros que caracterizam a distribuição de cada classe. Na medida em que a dimensionalidade dos dados cresce, aumenta o número de parâmetros a serem estimados, especialmente na matriz covariância. Contudo, é sabido que, no mundo real, a quantidade de amostras de treinamento disponíveis, é freqüentemente muito limitada, ocasionando problemas na estimação dos parâmetros necessários ao classificador, degradando portanto a acurácia do processo de classificação, na medida em que a dimensionalidade dos dados aumenta. O Efeito de Hughes, como é chamado este fenômeno, já é bem conhecido no meio científico, e estudos vêm sendo realizados com o objetivo de mitigar este efeito. Entre as alternativas propostas com a finalidade de mitigar o Efeito de Hughes, encontram-se as técnicas de regularização da matriz covariância. Deste modo, técnicas de regularização para a estimação da matriz covariância das classes, tornam-se um tópico interessante de estudo, bem como o comportamento destas técnicas em ambientes de dados de imagens digitais de alta dimensionalidade em sensoriamento remoto, como por exemplo, os dados fornecidos pelo sensor AVIRIS. Neste estudo, é feita uma contextualização em sensoriamento remoto, descrito o sistema sensor AVIRIS, os princípios da análise discriminante linear (LDA), quadrática (QDA) e regularizada (RDA) são apresentados, bem como os experimentos práticos dos métodos, usando dados reais do sensor. Os resultados mostram que, com um número limitado de amostras de treinamento, as técnicas de regularização da matriz covariância foram eficientes em reduzir o Efeito de Hughes. Quanto à acurácia, em alguns casos o modelo quadrático continua sendo o melhor, apesar do Efeito de Hughes, e em outros casos o método de regularização é superior, além de suavizar este efeito. Esta dissertação está organizada da seguinte maneira: No primeiro capítulo é feita uma introdução aos temas: sensoriamento remoto (radiação eletromagnética, espectro eletromagnético, bandas espectrais, assinatura espectral), são também descritos os conceitos, funcionamento do sensor hiperespectral AVIRIS, e os conceitos básicos de reconhecimento de padrões e da abordagem estatística. No segundo capítulo, é feita uma revisão bibliográfica sobre os problemas associados à dimensionalidade dos dados, à descrição das técnicas paramétricas citadas anteriormente, aos métodos de QDA, LDA e RDA, e testes realizados com outros tipos de dados e seus resultados.O terceiro capítulo versa sobre a metodologia que será utilizada nos dados hiperespectrais disponíveis. O quarto capítulo apresenta os testes e experimentos da Análise Discriminante Regularizada (RDA) em imagens hiperespectrais obtidos pelo sensor AVIRIS. No quinto capítulo são apresentados as conclusões e análise final. A contribuição científica deste estudo, relaciona-se à utilização de métodos de regularização da matriz covariância, originalmente propostos por Friedman [FRI 89] para classificação de dados em alta dimensionalidade (dados sintéticos, dados de enologia), para o caso especifico de dados de sensoriamento remoto em alta dimensionalidade (imagens hiperespectrais). A conclusão principal desta dissertação é que o método RDA é útil no processo de classificação de imagens com dados em alta dimensionalidade e classes com características espectrais muito próximas.
Resumo:
Data classification is a task with high applicability in a lot of areas. Most methods for treating classification problems found in the literature dealing with single-label or traditional problems. In recent years has been identified a series of classification tasks in which the samples can be labeled at more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). On the other hand, we have also studied a new category of learning, called semi-supervised learning, combining labeled data (supervised learning) and non-labeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Thus, since both the techniques of multi-label and hierarchical multi-label classification as semi-supervised learning has shown favorable results with its use, this work is proposed and used to apply semi-supervised learning in hierarchical multi-label classication tasks, so eciently take advantage of the main advantages of the two areas. An experimental analysis of the proposed methods found that the use of semi-supervised learning in hierarchical multi-label methods presented satisfactory results, since the two approaches were statistically similar results
Resumo:
Pós-graduação em Agronomia (Energia na Agricultura) - FCA
Resumo:
In general, pattern recognition techniques require a high computational burden for learning the discriminating functions that are responsible to separate samples from distinct classes. As such, there are several studies that make effort to employ machine learning algorithms in the context of big data classification problems. The research on this area ranges from Graphics Processing Units-based implementations to mathematical optimizations, being the main drawback of the former approaches to be dependent on the graphic video card. Here, we propose an architecture-independent optimization approach for the optimum-path forest (OPF) classifier, that is designed using a theoretical formulation that relates the minimum spanning tree with the minimum spanning forest generated by the OPF over the training dataset. The experiments have shown that the approach proposed can be faster than the traditional one in five public datasets, being also as accurate as the original OPF. (C) 2014 Elsevier B. V. All rights reserved.
Resumo:
Since the beginning, some pattern recognition techniques have faced the problem of high computational burden for dataset learning. Among the most widely used techniques, we may highlight Support Vector Machines (SVM), which have obtained very promising results for data classification. However, this classifier requires an expensive training phase, which is dominated by a parameter optimization that aims to make SVM less prone to errors over the training set. In this paper, we model the problem of finding such parameters as a metaheuristic-based optimization task, which is performed through Harmony Search (HS) and some of its variants. The experimental results have showen the robustness of HS-based approaches for such task in comparison against with an exhaustive (grid) search, and also a Particle Swarm Optimization-based implementation.
Resumo:
Complex networks have been employed to model many real systems and as a modeling tool in a myriad of applications. In this paper, we use the framework of complex networks to the problem of supervised classification in the word disambiguation task, which consists in deriving a function from the supervised (or labeled) training data of ambiguous words. Traditional supervised data classification takes into account only topological or physical features of the input data. On the other hand, the human (animal) brain performs both low- and high-level orders of learning and it has facility to identify patterns according to the semantic meaning of the input data. In this paper, we apply a hybrid technique which encompasses both types of learning in the field of word sense disambiguation and show that the high-level order of learning can really improve the accuracy rate of the model. This evidence serves to demonstrate that the internal structures formed by the words do present patterns that, generally, cannot be correctly unveiled by only traditional techniques. Finally, we exhibit the behavior of the model for different weights of the low- and high-level classifiers by plotting decision boundaries. This study helps one to better understand the effectiveness of the model. Copyright (C) EPLA, 2012
Resumo:
Semisupervised learning is a machine learning approach that is able to employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. The particles of the same class cooperate among themselves, while the particles of different classes compete with each other to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature brought by the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting low computational complexity order in comparison to other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model.
Resumo:
Este artigo pretende oferecer uma visão global dos direitos da personalidade, desde a possibilidade de sua aplicação às pessoas jurídicas, passando pela superposição do estudo de seu objeto por outros ramos do Direito, assim como por dados históricos, classificação, análise jurisprudencial, doutrinária e legislativa dos pontos centrais que envolvem tais direitos.
Resumo:
El objeto de la investigación es analizar la situación de las plataformas logísticas y caracterizar las mismas, sus áreas funcionales y los parámetros de diseño que se utilizan para su planificación y desarrollo. La investigación se ha realizado sobre las plataformas logísticas existentes en España, si bien se ha contrastado la situación nacional con otras experiencias internacionales. Para ello se ha procedido al estudio del estado del arte a nivel internacional, determinando la terminología, caracterización y clasificación que aporta la comunidad científica al respecto sobre las plataformas logísticas. La investigación se ha centrado en el estudio de un elevado número de plataformas logísticas ubicadas en distintas Comunidades Autónomas, que representan en torno al cuarenta por ciento de las que se encuentran en estos momentos en activo, analizando sus datos básicos, su clasificación en lo que respecta a su grado de centralidad, su intermodalidad, su accesibilidad, y la información urbanística relativa a superficies de áreas funcionales, edificabilidades, usos principales, complementarios y no admitidos, red viaria, zonas verdes, y parámetros máximos y mínimos de ordenación. El análisis, valoración e interpretación de los resultados obtenidos, ha permitido concluir en la heterogeneidad existente sobre el término, proponiéndose por parte del autor un nuevo concepto que englobe y defina de forma genérica pero clara la tipología de plataformas logísticas presentes en el panorama nacional. Del mismo modo, y como consecuencia de la investigación realizada, ha sido posible la caracterización de una “plataforma logística tipo”, estableciendo unos parámetros de diseño estándar para desarrollos posteriores. Finalmente se proponen varias líneas de investigación. En primer lugar, analizar el conjunto del transporte de mercancías, para determinar y valorar su desequilibrio, para de este modo potenciar la intermodalidad e internacionalizar la misma; y, en segundo lugar, establecer una regulación global de las competencias y legislación en materia de planificación de centros logísticos, para racionalizar el desarrollo de estos elementos. The aim of this research is to analyze the situation of logistics platforms and to characterize them, their functional areas and the design parameters that are used in their planning and development. This study was conducted on logistics platforms in Spain, although the situation in Spain has been compared to other international experiences. For this purpose, the international state-of-the-art has been examined, determining the terminology, characterization and classification provided by the scientific community in regard to logistics platforms. The research focuses on the study of a large number of logistics platforms located in different Autonomous Regions, representing approximately 40% of those currently in operation. An analysis has been made of the fundamental data, classification in regard to the degree of centrality, intermodality, accessibility and the urban planning information relating to functional area spaces, as well as buildable potential, main, complementary and prohibited uses, the road network, green zones and maximum and minimum development planning parameters. The analysis, evaluation and interpretation of the results obtained has led to the conclusion of the heterogeneous nature of the term, and the author proposes a new concept that would cover and define, in a generic yet clear way, the typology of logistics platforms present in the national scenario. In a similar manner, and as a result of the research conducted, it has been possible to define a “logistics platform type”, establishing standard design parameters for future developments. Lastly, various lines of research are proposed. In the first place, to analyze overall goods transport to determine and evaluate its imbalances and, in this way, to promote its intermodality and internationalization. In the second place, to establish a global regulation of competencies and legislation with regard to the planning of logistics centers, to rationalize the development of these elements.
Resumo:
In this paper, we propose a novel filter for feature selection. Such filter relies on the estimation of the mutual information between features and classes. We bypass the estimation of the probability density function with the aid of the entropic-graphs approximation of Rényi entropy, and the subsequent approximation of the Shannon one. The complexity of such bypassing process does not depend on the number of dimensions but on the number of patterns/samples, and thus the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method both in the contexts of image classification and microarray data classification.
Resumo:
The construction industry is characterised by fragmentation and suffers from lack of collaboration, often adopting adversarial working practices to achieve deliverables. For the UK Government and construction industry, BIM is a game changer aiming to rectify this fragmentation and promote collaboration. However it has become clear that there is an essential need to have better controls and definitions of both data deliverables and data classification. Traditional methods and techniques for collating and inputting data have shown to be time consuming and provide little to improve or add value to the overall task of improving deliverables. Hence arose the need in the industry to develop a Digital Plan of Work (DPoW) toolkit that would aid the decision making process, providing the required control over the project workflows and data deliverables, and enabling better collaboration through transparency of need and delivery. The specification for the existing Digital Plan of Work (DPoW) was to be, an industry standard method of describing geometric, requirements and data deliveries at key stages of the project cycle, with the addition of a structured and standardised information classification system. However surveys and interviews conducted within this research indicate that the current DPoW resembles a digitised version of the pre-existing plans of work and does not push towards the data enriched decision-making abilities that advancements in technology now offer. A Digital Framework is not simply the digitisation of current or historic standard methods and procedures, it is a new intelligent driven digital system that uses new tools, processes, procedures and work flows to eradicate waste and increase efficiency. In addition to reporting on conducted surveys above, this research paper will present a theoretical investigation into usage of Intelligent Decision Support Systems within a digital plan of work framework. Furthermore this paper will present findings on the suitability to utilise advancements in intelligent decision-making system frameworks and Artificial Intelligence for a UK BIM Framework. This should form the foundations of decision-making for projects implemented at BIM level 2. The gap identified in this paper is that the current digital toolkit does not incorporate the intelligent characteristics available in other industries through advancements in technology and collation of vast amounts of data that a digital plan of work framework could have access to and begin to develop, learn and adapt for decision-making through the live interaction of project stakeholders.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
A number of studies in the areas of Biomedical Engineering and Health Sciences have employed machine learning tools to develop methods capable of identifying patterns in different sets of data. Despite its extinction in many countries of the developed world, Hansen’s disease is still a disease that affects a huge part of the population in countries such as India and Brazil. In this context, this research proposes to develop a method that makes it possible to understand in the future how Hansen’s disease affects facial muscles. By using surface electromyography, a system was adapted so as to capture the signals from the largest possible number of facial muscles. We have first looked upon the literature to learn about the way researchers around the globe have been working with diseases that affect the peripheral neural system and how electromyography has acted to contribute to the understanding of these diseases. From these data, a protocol was proposed to collect facial surface electromyographic (sEMG) signals so that these signals presented a high signal to noise ratio. After collecting the signals, we looked for a method that would enable the visualization of this information in a way to make it possible to guarantee that the method used presented satisfactory results. After identifying the method's efficiency, we tried to understand which information could be extracted from the electromyographic signal representing the collected data. Once studies demonstrating which information could contribute to a better understanding of this pathology were not to be found in literature, parameters of amplitude, frequency and entropy were extracted from the signal and a feature selection was made in order to look for the features that better distinguish a healthy individual from a pathological one. After, we tried to identify the classifier that best discriminates distinct individuals from different groups, and also the set of parameters of this classifier that would bring the best outcome. It was identified that the protocol proposed in this study and the adaptation with disposable electrodes available in market proved their effectiveness and capability of being used in different studies whose intention is to collect data from facial electromyography. The feature selection algorithm also showed that not all of the features extracted from the signal are significant for data classification, with some more relevant than others. The classifier Support Vector Machine (SVM) proved itself efficient when the adequate Kernel function was used with the muscle from which information was to be extracted. Each investigated muscle presented different results when the classifier used linear, radial and polynomial kernel functions. Even though we have focused on Hansen’s disease, the method applied here can be used to study facial electromyography in other pathologies.
Resumo:
A number of studies in the areas of Biomedical Engineering and Health Sciences have employed machine learning tools to develop methods capable of identifying patterns in different sets of data. Despite its extinction in many countries of the developed world, Hansen’s disease is still a disease that affects a huge part of the population in countries such as India and Brazil. In this context, this research proposes to develop a method that makes it possible to understand in the future how Hansen’s disease affects facial muscles. By using surface electromyography, a system was adapted so as to capture the signals from the largest possible number of facial muscles. We have first looked upon the literature to learn about the way researchers around the globe have been working with diseases that affect the peripheral neural system and how electromyography has acted to contribute to the understanding of these diseases. From these data, a protocol was proposed to collect facial surface electromyographic (sEMG) signals so that these signals presented a high signal to noise ratio. After collecting the signals, we looked for a method that would enable the visualization of this information in a way to make it possible to guarantee that the method used presented satisfactory results. After identifying the method's efficiency, we tried to understand which information could be extracted from the electromyographic signal representing the collected data. Once studies demonstrating which information could contribute to a better understanding of this pathology were not to be found in literature, parameters of amplitude, frequency and entropy were extracted from the signal and a feature selection was made in order to look for the features that better distinguish a healthy individual from a pathological one. After, we tried to identify the classifier that best discriminates distinct individuals from different groups, and also the set of parameters of this classifier that would bring the best outcome. It was identified that the protocol proposed in this study and the adaptation with disposable electrodes available in market proved their effectiveness and capability of being used in different studies whose intention is to collect data from facial electromyography. The feature selection algorithm also showed that not all of the features extracted from the signal are significant for data classification, with some more relevant than others. The classifier Support Vector Machine (SVM) proved itself efficient when the adequate Kernel function was used with the muscle from which information was to be extracted. Each investigated muscle presented different results when the classifier used linear, radial and polynomial kernel functions. Even though we have focused on Hansen’s disease, the method applied here can be used to study facial electromyography in other pathologies.
Resumo:
Physiologists and animal scientists try to understand the relationship between ruminants and their environment. The knowledge about feeding behavior of these animals is the key to maximize the production of meat and milk and their derivatives and ensure animal welfare. Within the area called precision farming, one of the goals is to find a model that describes animal nutrition. Existing methods for determining the consumption and ingestive patterns are often time-consuming and imprecise. Therefore, an accurate and less laborious method may be relevant for feeding behaviour recognition. Surface electromyography (sEMG) is able to provide information of muscle activity. Through sEMG of the muscles of mastication, coupled with instrumentation techniques, signal processing and data classification, it is possible to extract the variables of interest that describe chewing activity. This work presents a new method for chewing pattern evaluation, feed intake prediction and for the determination of rumination, food and daily rest time through ruminant animals masseter muscle sEMG signals. Short-term evaluation results are shown and discussed, evidencing employed methods viability.