809 resultados para Machine learning classification
Resumo:
The comfort level of the seat has a major effect on the usage of a vehicle; thus, car manufacturers have been working on elevating car seat comfort as much as possible. However, still, the testing and evaluation of comfort are done using exhaustive trial and error testing and evaluation of data. In this thesis, we resort to machine learning and Artificial Neural Networks (ANN) to develop a fully automated approach. Even though this approach has its advantages in minimizing time and using a large set of data, it takes away the degree of freedom of the engineer on making decisions. The focus of this study is on filling the gap in a two-step comfort level evaluation which used pressure mapping with body regions to evaluate the average pressure supported by specific body parts and the Self-Assessment Exam (SAE) questions on evaluation of the person’s interest. This study has created a machine learning algorithm that works on giving a degree of freedom to the engineer in making a decision when mapping pressure values with body regions using ANN. The mapping is done with 92% accuracy and with the help of a Graphical User Interface (GUI) that facilitates the process during the testing time of comfort level evaluation of the car seat, which decreases the duration of the test analysis from days to hours.
Resumo:
Nella sede dell’azienda ospitante Alexide, si è ravvisata la mancanza di un sistema di controllo automatico da remoto dell’intero impianto di climatizzazione HVAC (Heating, Ventilation and Air Conditioning) utilizzato, e la soluzione migliore è risultata quella di attuare un processo di trasformazione della struttura in uno smart building. Ho quindi eseguito questa procedura di trasformazione digitale progettando e sviluppando un sistema distribuito in grado di gestire una serie di dati provenienti in tempo reale da sensori ambientali. L’architettura del sistema progettato è stata sviluppata in C# su ambiente dotNET, dove sono stati collezionati i dati necessari per il funzionamento del modello di predizione. Nella fattispecie sono stati utilizzati i dati provenienti dall’HVAC, da un sensore di temperatura interna dell'edificio e dal fotovoltaico installato nella struttura. La comunicazione tra il sistema distribuito e l’entità dell’HVAC avviene mediante il canale di comunicazione ModBus, mentre per quanto riguarda i dati della temperatura interna e del fotovoltaico questi vengono collezionati da sensori che inviano le informazioni sfruttando un canale di comunicazione che utilizza il protocollo MQTT, e lo stesso viene utilizzato come principale metodo di comunicazione all’interno del sistema, appoggiandosi ad un broker di messaggistica con modello publish/subscribe. L'automatizzazione del sistema è dovuta anche all'utilizzo di un modello di predizione con lo scopo di predire in maniera quanto più accurata possibile la temperatura interna all'edificio delle ore future. Per quanto riguarda il modello di predizione da me implementato e integrato nel sistema la scelta è stata quella di ispirarmi ad un modello ideato da Google nel 2014 ovvero il Sequence to Sequence. Il modello sviluppato si struttura come un encoder-decoder che utilizza le RNN, in particolare le reti LSTM.
Resumo:
Computational anatomy with magnetic resonance imaging (MRI) is well established as a noninvasive biomarker of Alzheimer's disease (AD); however, there is less certainty about its dependency on the staging of AD. We use classical group analyses and automated machine learning classification of standard structural MRI scans to investigate AD diagnostic accuracy from the preclinical phase to clinical dementia. Longitudinal data from the Alzheimer's Disease Neuroimaging Initiative were stratified into 4 groups according to the clinical status-(1) AD patients; (2) mild cognitive impairment (MCI) converters; (3) MCI nonconverters; and (4) healthy controls-and submitted to a support vector machine. The obtained classifier was significantly above the chance level (62%) for detecting AD already 4 years before conversion from MCI. Voxel-based univariate tests confirmed the plausibility of our findings detecting a distributed network of hippocampal-temporoparietal atrophy in AD patients. We also identified a subgroup of control subjects with brain structure and cognitive changes highly similar to those observed in AD. Our results indicate that computational anatomy can detect AD substantially earlier than suggested by current models. The demonstrated differential spatial pattern of atrophy between correctly and incorrectly classified AD patients challenges the assumption of a uniform pathophysiological process underlying clinically identified AD.
Resumo:
The automatic characterization of particles in metallographic images has been paramount, mainly because of the importance of quantifying such microstructures in order to assess the mechanical properties of materials common used in industry. This automated characterization may avoid problems related with fatigue and possible measurement errors. In this paper, computer techniques are used and assessed towards the accomplishment of this crucial industrial goal in an efficient and robust manner. Hence, the use of the most actively pursued machine learning classification techniques. In particularity, Support Vector Machine, Bayesian and Optimum-Path Forest based classifiers, and also the Otsu's method, which is commonly used in computer imaging to binarize automatically simply images and used here to demonstrated the need for more complex methods, are evaluated in the characterization of graphite particles in metallographic images. The statistical based analysis performed confirmed that these computer techniques are efficient solutions to accomplish the aimed characterization. Additionally, the Optimum-Path Forest based classifier demonstrated an overall superior performance, both in terms of accuracy and speed. © 2012 Elsevier Ltd. All rights reserved.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.
Resumo:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
Resumo:
A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.
Resumo:
Behavioral biometrics is one of the areas with growing interest within the biosignal research community. A recent trend in the field is ECG-based biometrics, where electrocardiographic (ECG) signals are used as input to the biometric system. Previous work has shown this to be a promising trait, with the potential to serve as a good complement to other existing, and already more established modalities, due to its intrinsic characteristics. In this paper, we propose a system for ECG biometrics centered on signals acquired at the subject's hand. Our work is based on a previously developed custom, non-intrusive sensing apparatus for data acquisition at the hands, and involved the pre-processing of the ECG signals, and evaluation of two classification approaches targeted at real-time or near real-time applications. Preliminary results show that this system leads to competitive results both for authentication and identification, and further validate the potential of ECG signals as a complementary modality in the toolbox of the biometric system designer.
Resumo:
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.
Resumo:
High-content analysis has revolutionized cancer drug discovery by identifying substances that alter the phenotype of a cell, which prevents tumor growth and metastasis. The high-resolution biofluorescence images from assays allow precise quantitative measures enabling the distinction of small molecules of a host cell from a tumor. In this work, we are particularly interested in the application of deep neural networks (DNNs), a cutting-edge machine learning method, to the classification of compounds in chemical mechanisms of action (MOAs). Compound classification has been performed using image-based profiling methods sometimes combined with feature reduction methods such as principal component analysis or factor analysis. In this article, we map the input features of each cell to a particular MOA class without using any treatment-level profiles or feature reduction methods. To the best of our knowledge, this is the first application of DNN in this domain, leveraging single-cell information. Furthermore, we use deep transfer learning (DTL) to alleviate the intensive and computational demanding effort of searching the huge parameter's space of a DNN. Results show that using this approach, we obtain a 30% speedup and a 2% accuracy improvement.
Resumo:
This dissertation presents a solution for environment sensing using sensor fusion techniques and a context/environment classification of the surroundings in a service robot, so it could change his behavior according to the different rea-soning outputs. As an example, if a robot knows he is outdoors, in a field environment, there can be a sandy ground, in which it should slow down. Contrariwise in indoor environments, that situation is statistically unlikely to happen (sandy ground). This simple assumption denotes the importance of context-aware in automated guided vehicles.
Resumo:
In the recent years, kernel methods have revealed very powerful tools in many application domains in general and in remote sensing image classification in particular. The special characteristics of remote sensing images (high dimension, few labeled samples and different noise sources) are efficiently dealt with kernel machines. In this paper, we propose the use of structured output learning to improve remote sensing image classification based on kernels. Structured output learning is concerned with the design of machine learning algorithms that not only implement input-output mapping, but also take into account the relations between output labels, thus generalizing unstructured kernel methods. We analyze the framework and introduce it to the remote sensing community. Output similarity is here encoded into SVM classifiers by modifying the model loss function and the kernel function either independently or jointly. Experiments on a very high resolution (VHR) image classification problem shows promising results and opens a wide field of research with structured output kernel methods.
Resumo:
We present a novel filtering method for multispectral satellite image classification. The proposed method learns a set of spatial filters that maximize class separability of binary support vector machine (SVM) through a gradient descent approach. Regularization issues are discussed in detail and a Frobenius-norm regularization is proposed to efficiently exclude uninformative filters coefficients. Experiments carried out on multiclass one-against-all classification and target detection show the capabilities of the learned spatial filters.