970 results for random forest


Relevance:

60.00%

Abstract:

Despite the importance of laughter in social interactions, it remains little studied in affective computing. Respiratory, auditory, and facial laughter signals have been investigated, but laughter-related body movements have received almost no attention. The aim of this study is twofold. The first aim is to investigate observers' perception of laughter states (hilarious, social, awkward, fake, and non-laughter) from body movements alone, through their categorization of avatars animated with natural and acted motion capture data. Significant differences in torso and limb movements were found between animations perceived as containing laughter and those perceived as non-laughter. Hilarious laughter also differed from social laughter in the amount of spine bending, shoulder rotation, and hand movement. The body movement features indicative of laughter differed between sitting and standing avatar postures. Based on the positive findings of this perceptual study, the second aim is to investigate whether the distributions of observers' ratings for the laughter states can be predicted automatically. The findings show that automated laughter recognition rates approach human rating levels, with the Random Forest method yielding the best performance.
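Predicting the *distribution* of observers' ratings, rather than a single label, means each animation's target is a vector of label proportions. A minimal sketch of building such a target, assuming every observer assigns exactly one of the five states named above; the function name is hypothetical, and a multi-output regressor (for example a random forest) would then be fit to these vectors:

```python
from collections import Counter
from typing import Dict, Sequence

# The five laughter states listed in the abstract.
STATES = ("hilarious", "social", "awkward", "fake", "non-laughter")


def rating_distribution(ratings: Sequence[str]) -> Dict[str, float]:
    """Turn one animation's observer labels into proportions over STATES."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(ratings)
    # States nobody chose get probability 0.0 (Counter returns 0 for missing keys).
    return {state: counts[state] / total for state in STATES}
```

For four observers labelling one clip as hilarious, social, social and non-laughter, the target becomes 0.25 / 0.5 / 0.0 / 0.0 / 0.25 over the five states.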

Relevance:

60.00%

Abstract:

In this study, 39 sets of hard turning (HT) experimental trials were performed on a Mori-Seiki SL-25Y (4-axis) computer numerical controlled (CNC) lathe to study the influence of cutting parameters on machined surface roughness. In all trials, an AISI 4340 steel workpiece (hardened up to 69 HRC) was machined under dry conditions with a commercially available CBN insert (Warren Tooling Limited, UK). The surface topography of the machined samples was examined with a white light interferometer, and the measurements were reconfirmed with a Form Talysurf. The machining outcomes were used as input to develop regression models that predict the average machined surface roughness of this material. Three regression models (multiple regression, Random Forest, and quantile regression) were applied to the experimental outcomes. To the best of the authors' knowledge, this paper is the first to apply Random Forest or quantile regression techniques to the machining domain. The models were compared with one another to ascertain how feed, depth of cut, and spindle speed affect surface roughness, and finally to obtain a mathematical equation correlating these variables. It was concluded that the Random Forest regression model is a better choice than multiple regression models for predicting surface roughness when machining AISI 4340 steel (69 HRC).
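The multiple regression baseline yields an explicit equation linking average roughness to the three cutting parameters. A minimal least-squares sketch with NumPy, assuming a purely linear form Ra = b0 + b1·feed + b2·depth + b3·speed with no interaction or quadratic terms (the abstract does not give the paper's actual model form); the function name is hypothetical:

```python
import numpy as np


def fit_multiple_regression(feed, depth, speed, ra):
    """Least-squares fit of Ra = b0 + b1*feed + b2*depth + b3*speed.

    Assumption: a purely linear model; returns the array [b0, b1, b2, b3].
    """
    # Design matrix: intercept column followed by the three cutting parameters.
    X = np.column_stack([np.ones(len(feed)), feed, depth, speed])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ra, dtype=float), rcond=None)
    return coef
```

A random forest regressor, by contrast, does not produce a closed-form equation of this kind, which is one reason to keep a regression baseline alongside it.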

Relevance:

60.00%

Abstract:

Despite its importance in social interactions, laughter remains little studied in affective computing. Intelligent virtual agents are often blind to users’ laughter and unable to produce convincing laughter themselves. Respiratory, auditory, and facial laughter signals have been investigated, but laughter-related body movements have received less attention. The aim of this study is threefold. First, to probe human laughter perception by analyzing patterns of categorisations of natural laughter animated on a minimal avatar. Results reveal that a low-dimensional space can describe the perception of laughter “types”. Second, to investigate observers’ perception of laughter (hilarious, social, awkward, fake, and non-laughter) based on animated avatars generated from natural and acted motion-capture data. Significant differences in torso and limb movements are found between animations perceived as laughter and those perceived as non-laughter. Hilarious laughter also differs from social laughter. Different body movement features are indicative of laughter in sitting and standing avatar postures. Third, to investigate automatic recognition of laughter to the same level of certainty as observers’ perceptions. Results show that the recognition rates of the Random Forest model approach human rating levels. Classification comparisons and feature importance analyses indicate an improvement in the recognition of social laughter when localized features and nonlinear models are used.

Relevance:

60.00%

Abstract:

Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of ~32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 × 20 pixel stamp around the centre of each candidate. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines, and random forests), and their performance is tested on a held-out subset of 25 per cent of the training data. We find the best results with the random forest classifier and demonstrate that, by accepting a false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients, and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
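The operating point described above (fix the false positive rate at 1 per cent, then read off the missed detection rate) can be computed from classifier scores on the held-out set. A minimal sketch, assuming higher scores mean "more likely real" and that a candidate is called real when its score is strictly above the threshold; all function names are hypothetical:

```python
import math
from typing import Sequence


def threshold_at_fpr(bogus_scores: Sequence[float], target_fpr: float) -> float:
    """Threshold t such that at most target_fpr of bogus candidates score above t."""
    ranked = sorted(bogus_scores, reverse=True)
    # Number of bogus detections we are allowed to accept (small epsilon guards float error).
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # every candidate may be accepted
    return ranked[allowed]


def false_positive_rate(bogus_scores: Sequence[float], t: float) -> float:
    """Fraction of bogus candidates wrongly called real at threshold t."""
    return sum(s > t for s in bogus_scores) / len(bogus_scores)


def missed_detection_rate(real_scores: Sequence[float], t: float) -> float:
    """Fraction of real transients rejected (score not above t)."""
    return sum(s <= t for s in real_scores) / len(real_scores)
```

Ties at the threshold are resolved conservatively: a bogus candidate scoring exactly `t` is rejected, so the realised false positive rate never exceeds the target.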

Relevance:

60.00%

Abstract:

With over 50 billion downloads and more than 1.3 million apps in Google’s official market, Android has continued to gain popularity amongst smartphone users worldwide. At the same time, there has been a rise in malware targeting the platform, with more recent strains employing highly sophisticated detection-avoidance techniques. As traditional signature-based methods become less effective at detecting unknown malware, alternatives are needed for timely zero-day discovery. This paper therefore proposes an approach that uses ensemble learning for Android malware detection. It combines the advantages of static analysis with the efficiency and performance of ensemble machine learning to improve Android malware detection accuracy. The machine learning models are built using a large repository of malware samples and benign apps from a leading antivirus vendor. The experimental results and analysis presented show that the proposed method, which uses a large feature space to leverage the power of ensemble learning, achieves 97.3% to 99% detection accuracy with very low false positive rates.

Relevance:

60.00%

Abstract:

One of the most popular techniques for generating classifier ensembles is stacking, which is based on a meta-learning approach. In this paper, we introduce an alternative to stacking that is based on cluster analysis. As in stacking, instances from a validation set are first classified by all base classifiers. The output of each classifier is then treated as a new attribute of the instance. The validation set is then divided into clusters according to the new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of centroids is treated as a meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets, and it can be applied to semi-supervised learning problems. We provide a theoretical investigation of the proposed method, which shows that, for the method to succeed, the base classifiers in the ensemble should each have an accuracy greater than 50%.
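The centroid-based meta-classifier described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes k-means as the clustering step (seeded deterministically with the first k points), labels each centroid by the majority class of its members, and classifies new instances by the nearest centroid; the base-classifier interface and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], float]  # hypothetical base-classifier interface


def meta_features(x: Sequence[float], base: Sequence[Classifier], keep_idx: Sequence[int]) -> List[float]:
    """Base-classifier outputs as new attributes, plus a small subset of original attributes."""
    return [float(clf(x)) for clf in base] + [float(x[i]) for i in keep_idx]


def nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def fit_cluster_meta(X_val, y_val, base, keep_idx, k: int, iters: int = 20) -> Tuple[List[List[float]], list]:
    points = [meta_features(x, base, keep_idx) for x in X_val]
    centroids = [list(p) for p in points[:k]]  # assumption: first k points seed k-means
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # empty clusters keep their previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    assign = [nearest(p, centroids) for p in points]
    fallback = Counter(y_val).most_common(1)[0][0]
    labels = []
    for c in range(k):
        votes = Counter(y for y, a in zip(y_val, assign) if a == c)
        # assumption: a centroid's class label is the majority label of its members
        labels.append(votes.most_common(1)[0][0] if votes else fallback)
    return centroids, labels


def predict_cluster_meta(x, model, base, keep_idx):
    """Classify x by the label of the nearest centroid in meta-feature space."""
    centroids, labels = model
    return labels[nearest(meta_features(x, base, keep_idx), centroids)]
```

Because the centroids only need the base classifiers' outputs, unlabelled validation instances could also join the clustering step, which is consistent with the semi-supervised use the abstract mentions.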

Relevance:

60.00%

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering

Relevance:

60.00%

Abstract:

Background: Therapy of chronic hepatitis C (CHC) with pegIFNα/ribavirin achieves a sustained virologic response (SVR) in ~55% of patients. Pre-activation of the endogenous interferon system in the liver is associated with non-response (NR). Recently, genome-wide association studies described associations of allelic variants near the IL28B (IFNλ3) gene with treatment response and with spontaneous clearance of the virus. We investigated whether the IL28B genotype determines the constitutive expression of IFN-stimulated genes (ISGs) in the liver of patients with CHC. Methods: We genotyped 93 patients with CHC for 3 IL28B single nucleotide polymorphisms (SNPs: rs12979860, rs8099917, rs12980275), extracted RNA from their liver biopsies, and quantified the expression of IL28B and of 8 previously identified classifier genes that discriminate between SVR and NR (IFI44L, RSAD2, ISG15, IFI22, LAMP3, OAS3, LGALS3BP and HTATIP2). Decision tree ensembles in the form of a random forest classifier were used to calculate the relative predictive power of these variables in a multivariate analysis. Results: The minor IL28B allele (poor risk for treatment response) was significantly associated with increased expression of ISGs and, unexpectedly, with decreased expression of IL28B. Stratification of the patients into SVR and NR revealed that ISG expression was conditionally independent of the IL28B genotype, i.e. ISG expression was increased in NR compared with SVR irrespective of the IL28B genotype. The random forest feature score (RFFS) identified IFI27 (RFFS = 2.93), RSAD2 (1.88) and HTATIP2 (1.50) expression and the HCV genotype (1.62) as the strongest predictors of treatment response. ROC curves of the IL28B SNPs showed an AUC of 0.66 with an error rate (ERR) of 0.38. A classifier with the 3 best classifying genes showed excellent test performance, with an AUC of 0.94 and an ERR of 0.15.
The addition of IL28B genotype information did not improve the predictive power of the 3-gene classifier. Conclusions: IL28B genotype and hepatic ISG expression are conditionally independent predictors of treatment response in CHC. There is no direct link between altered IFNλ3 expression and pre-activation of the endogenous interferon system in the liver. Hepatic ISG expression is by far a better predictor of treatment response than IL28B genotype.
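The AUC and error rate (ERR) reported above are standard summaries of a classifier's scores. A minimal sketch, assuming higher scores mean "more likely to be in the positive class" and that ERR is taken at a single fixed threshold (the abstract does not say how the threshold was chosen, so it is a parameter here); function names are hypothetical:

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def error_rate(pos_scores: Sequence[float], neg_scores: Sequence[float], t: float) -> float:
    """Fraction misclassified when predicting 'positive' for score > t."""
    errors = sum(p <= t for p in pos_scores) + sum(n > t for n in neg_scores)
    return errors / (len(pos_scores) + len(neg_scores))
```

The pairwise loop is quadratic, which is fine for a cohort of 93 patients; larger data sets would use a rank-based computation instead.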

Relevance:

60.00%

Abstract:

The growth of databases that contain increasingly difficult images and an ever larger number of categories is driving the development of image representation techniques that remain discriminative when working with multiple classes, and of algorithms that are efficient in learning and classification. This thesis explores the problem of classifying images according to the object they contain when a large number of categories is involved. First, we investigate how a hybrid system formed by a generative model and a discriminative model can benefit image classification when the level of human annotation is minimal. For this task, we introduce a new vocabulary using a dense representation of color-SIFT descriptors, and then investigate how the different parameters affect the final classification. Next, we propose a method to incorporate spatial information into the hybrid system, showing that context information is of great help for image classification. We then introduce a new shape descriptor that represents the image by its local shape and the spatial layout of that shape, together with a kernel that incorporates this spatial information in pyramid form. Shape is represented by a compact vector, yielding a descriptor well suited to kernel-based learning algorithms. The experiments show that this shape information gives results similar to (and sometimes better than) appearance-based descriptors. We also investigate how different features can be combined for image classification, and show that the proposed shape descriptor combined with an appearance descriptor substantially improves classification. Finally, we describe an algorithm that automatically detects regions of interest during training and classification.
This provides a way to suppress the image background and adds invariance to the position of objects within images. We show that using shape and appearance over this region of interest, together with random forest classifiers, improves both classification and computational time. We compare our results with results from the literature, using the same databases as the original authors as well as the same learning and classification protocols. All the innovations introduced are shown to improve the final image classification.

Relevance:

60.00%

Abstract:

Faculty of Geographical and Geological Sciences: Institute of Geoecology and Geoinformation

Relevance:

60.00%

Abstract:

Recent studies have shown that features extracted from brain MRIs can discriminate well between Alzheimer’s disease and Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods to find the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies were then used to solve a multi-class problem with the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the predictive power of features has not yet been investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) IntraCranial Volume (ICV) normalization can lead to overfitting and worsen prediction accuracy on the test set, and (ii) the combined use of a Random Forest-based filter with a Support Vector Machine-based wrapper improves binary classification accuracy.
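The filter-then-wrapper combination can be sketched as two stages: a cheap filter ranks features (for example by Random Forest importances) and keeps the top few, then a wrapper greedily adds features while a classifier's validation score keeps improving. A minimal, library-agnostic sketch; `top_k` and the `evaluate` callback are assumptions, and in practice `evaluate` might wrap cross-validated SVM accuracy (e.g. scikit-learn's `cross_val_score(SVC(), X[:, subset], y).mean()`):

```python
from typing import Callable, List, Sequence


def filter_then_wrap(importances: Sequence[float],
                     evaluate: Callable[[List[int]], float],
                     top_k: int) -> List[int]:
    """Filter step by importance, then greedy forward (wrapper) selection.

    `evaluate(subset)` is a hypothetical callback returning a validation score
    (higher is better) for a classifier trained on the given feature indices.
    """
    # Filter: keep the top_k features ranked by importance.
    remaining = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)[:top_k]
    selected: List[int] = []
    best = float("-inf")
    while remaining:
        # Wrapper: try each remaining feature and keep the one that helps most.
        scores = {f: evaluate(selected + [f]) for f in remaining}
        f_best = max(remaining, key=lambda f: scores[f])
        if scores[f_best] <= best:
            break  # no candidate improves the score, so stop
        best = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected
```

The filter keeps the expensive wrapper loop small: it evaluates at most `top_k` candidates per round instead of every feature.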

Relevance:

60.00%

Abstract:

Accurate speed prediction is a crucial step in the development of a dynamic vehicle activated sign (VAS). A previous study showed that the optimal trigger speed of such signs needs to be pre-determined according to the nature of the site and the traffic conditions. The objective of this paper is to find an accurate predictive model, based on historical traffic speed data, to derive the optimal trigger speed for such signs. Adaptive neuro-fuzzy inference system (ANFIS), classification and regression tree (CART), and random forest (RF) models were developed to predict one-step-ahead speed at all times of the day. The developed models were evaluated and compared with artificial neural network (ANN), multiple linear regression (MLR), and naïve prediction, using traffic speed data collected at four sites in Sweden. The data were aggregated into two periods: a short-term period (5 min) and a long-term period (1 hour). The results of this study showed that RF is a promising method for predicting mean speed in both periods. It is concluded that, in terms of performance and computational complexity, using simple input features for the predictive model markedly improved the model's response time while still delivering a low prediction error.
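One-step-ahead prediction of this kind is usually framed as supervised learning on lagged values: the previous few aggregated mean speeds predict the next one, and the naïve baseline simply repeats the last observed value. A minimal sketch of that framing, assuming a fixed number of lags (the abstract does not state which inputs the paper used); function names are hypothetical:

```python
from typing import List, Sequence, Tuple


def make_lagged(series: Sequence[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Frame one-step-ahead prediction: the previous n_lags values predict the next one."""
    X = [list(series[i - n_lags:i]) for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y


def naive_mae(X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Naive baseline: predict the next period's mean speed as the last observed one."""
    return sum(abs(row[-1] - target) for row, target in zip(X, y)) / len(y)
```

The resulting `X`, `y` pairs can be fed to any regressor (RF, CART, ANN, MLR), and its error can be compared against `naive_mae` on the same data, once for the 5-min aggregation and once for the 1-hour aggregation.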

Relevance:

60.00%

Abstract:

Vehicle activated signs (VAS) display a warning message when drivers exceed a particular threshold. VAS are often installed on local roads to display a warning message depending on the speed of approaching vehicles. VAS are usually powered by electricity; however, battery- and solar-powered VAS are also commonplace. This thesis investigated the development of an automatic trigger speed for vehicle activated signs in order to influence driver behaviour, the effect of which was measured in terms of reduced mean speed and low standard deviation. A comprehensive understanding of the effect of the VAS trigger speed on driver behaviour was established by systematically collecting data. Specifically, data on the time of day and on the speed, length, and direction of each vehicle were collected for this purpose using a Doppler radar installed at the road. A data-driven calibration method for the radar used in the experiment was also developed and evaluated. Results indicate that the trigger speed of the VAS had a variable effect on drivers’ speed at different sites and at different times of the day. It is evident that the optimal trigger speed should be set near the 85th percentile speed in order to lower the standard deviation. For battery- and solar-powered VAS, trigger speeds between the 50th and 85th percentile offered the best compromise between safety and power consumption. Results also indicate that different classes of vehicles show differences in mean speed and standard deviation: on a highway, the mean speed of cars differs slightly from that of trucks, whereas a significant difference was observed between vehicle classes on local roads. A differential trigger speed was therefore also investigated for completeness. A data-driven approach using Random Forest was found to be appropriate for predicting trigger speeds for different vehicle types and traffic conditions.
The fact that the predicted trigger speed was found to be consistently around the 85th percentile speed justifies the choice of the automatic model.
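The 85th percentile speed is the reference point the thesis uses for trigger speeds, and the differential trigger speed amounts to computing it separately per vehicle class. A minimal sketch using the nearest-rank percentile definition (an assumption; other percentile definitions give slightly different values on small samples); this is the percentile rule itself, not the thesis's Random Forest model, and the function names are hypothetical:

```python
from typing import Dict, Sequence


def percentile_nearest_rank(speeds: Sequence[float], p: int) -> float:
    """Nearest-rank p-th percentile, with p an integer in 1..100."""
    if not speeds or not 1 <= p <= 100:
        raise ValueError("need non-empty speeds and 1 <= p <= 100")
    ordered = sorted(speeds)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) in integer arithmetic
    return ordered[rank - 1]


def trigger_speeds_by_class(speeds_by_class: Dict[str, Sequence[float]], p: int = 85) -> Dict[str, float]:
    """Per-class percentile speed, e.g. separate trigger speeds for cars and trucks."""
    return {cls: percentile_nearest_rank(v, p) for cls, v in speeds_by_class.items()}
```

Using integer arithmetic for the rank avoids floating-point surprises such as `0.85 * 20` not being exactly 17.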

Relevance:

60.00%

Abstract:

In the last decade, spoken language processing has achieved significant advances; however, emotion recognition has not progressed as far and achieves only 50% to 60% accuracy. This is because most researchers in this field have focused on the synthesis of emotional speech rather than on automating human emotion recognition. Many research groups have focused on improving the performance of the classifiers they use for emotion recognition, and little work has been done on data pre-processing, such as extracting and selecting a specific set of acoustic features instead of using all the features available. Working with well-selected acoustic features does not delay the overall task; rather, it saves considerable time and resources by removing irrelevant information and reducing high-dimensional computation. In this paper, we developed an automatic feature selector based on the RF2TREE algorithm and the traditional C4.5 algorithm. RF2TREE helped us address problems with too few data examples. The ensemble learning technique was applied to enlarge the original data set by building a bagged random forest to generate many virtual examples; the new data set was then used to train a single decision tree, which selects the most efficient features to represent the speech signals for emotion recognition. Finally, the output of the selector was a specific set of acoustic features, produced by RF2TREE and a single decision tree.
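The RF2TREE idea described above (train a forest, let it label many synthetic "virtual" examples, then fit one tree on the enlarged set and keep the features that tree splits on) can be sketched with scikit-learn. This is a sketch under stated assumptions, not the paper's code: virtual examples are sampled uniformly within each feature's observed range, and scikit-learn's `DecisionTreeClassifier(criterion="entropy")` is a CART-style tree standing in for C4.5. The function name and the `n_virtual` default are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def rf2tree_select(X: np.ndarray, y: np.ndarray, n_virtual: int = 1000, seed: int = 0) -> list:
    """Return the indices of features used by a tree trained on forest-labelled data."""
    rng = np.random.default_rng(seed)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)
    # Assumption: virtual examples drawn uniformly within each feature's observed range.
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_virtual = rng.uniform(lo, hi, size=(n_virtual, X.shape[1]))
    y_virtual = forest.predict(X_virtual)  # the forest supplies the labels
    X_aug = np.vstack([X, X_virtual])
    y_aug = np.concatenate([y, y_virtual])
    # CART with an entropy criterion, standing in for C4.5.
    tree = DecisionTreeClassifier(criterion="entropy", random_state=seed).fit(X_aug, y_aug)
    # tree_.feature holds the split feature per node; leaves are marked with negative values.
    return sorted({int(f) for f in tree.tree_.feature if f >= 0})
```

The selected indices would then define the reduced acoustic feature set passed to the emotion classifier.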

Relevance:

60.00%

Abstract:

A system that can automatically detect nodules within lung images may assist expert radiologists in interpreting abnormal patterns as nodules in 2D CT lung images. A system is presented that can automatically identify nodules of various sizes within lung images. A pattern classification method is employed to develop the proposed system. A random forest ensemble classifier is formed from many weak learners, each of which grows a decision tree; the forest selects the decision that receives the most votes. The developed system consists of two random forest classifiers connected in series. A subset of CT lung images from the LIDC database is employed: it consists of 5721 images used to train and test the system, of which 411 contain nodules identified by expert radiologists. Training sets consisting of nodule, non-nodule, and false-detection patterns are constructed, and a collection of test images is also built. The first classifier is developed to detect all nodules; the second is developed to eliminate the false detections produced by the first. According to the experimental results, a true positive rate of 100% and a false positive rate of 1.4 per lung image are achieved.
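The two classifiers in series form a detection cascade: a sensitive first stage flags candidate regions, and a second stage removes the first stage's false detections. The reported metrics are a true positive rate over nodules and an average number of false positives per image. A minimal sketch, assuming candidates are hashable region identifiers and that `detector` and `fp_filter` are hypothetical callables standing in for the two trained random forests:

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple


def cascade_detect(candidates: Sequence[Hashable],
                   detector: Callable[[Hashable], bool],
                   fp_filter: Callable[[Hashable], bool]) -> List[Hashable]:
    flagged = [c for c in candidates if detector(c)]  # stage 1: tuned to catch every nodule
    return [c for c in flagged if fp_filter(c)]       # stage 2: reject stage-1 false detections


def detection_metrics(detections: Dict[str, List[Hashable]],
                      truth: Dict[str, Set[Hashable]]) -> Tuple[float, float]:
    """Return (true positive rate over all nodules, false positives per image)."""
    found = sum(len(set(detections[img]) & truth[img]) for img in truth)
    total_true = sum(len(t) for t in truth.values())
    false_pos = sum(len(set(detections[img]) - truth[img]) for img in truth)
    return found / total_true, false_pos / len(truth)
```

Splitting the work this way lets the first stage favour sensitivity without having to reach a low false positive rate on its own.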