348 results for SVM-RFE
Abstract:
With the rise of smartphones and lifelogging devices (e.g. Google Glass) and the popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. The majority of these uploaded images are poorly annotated or exist in complete semantic isolation, which makes building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos. However, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than 4 tags. Consequently, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), where one attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy: for example, does NY refer to New York or New Year? 
This thesis proposes the exploitation of an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. number of faces present) and contextual (e.g. day of the week taken) signals for tag recommendation purposes, achieving up to a 75% improvement in precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as an alternative to (slower moving) Flickr on which to build recommendation models, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, before (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from an image, we are able to predict whether it will become “popular” (or not) with 74% accuracy, using an SVM classifier. 
Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis represent an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
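The precision@5 metric reported for the tag recommendation experiments can be illustrated with a short sketch. It assumes the common convention of dividing the number of relevant tags among the top k recommendations by k; the function name and the tag lists are hypothetical and are not taken from the thesis.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended tags that are relevant.

    Assumption: the denominator is always k, even when fewer than
    k tags were recommended (one common convention among several).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for tag in recommended[:k] if tag in relevant)
    return hits / k


# Hypothetical ranked recommendations and ground-truth tags for one photo.
recommended_tags = ["newyork", "timessquare", "party", "night", "2012", "usa"]
ground_truth = {"newyork", "timessquare", "night"}

score = precision_at_k(recommended_tags, ground_truth, k=5)
print(score)  # 3 of the top 5 tags are relevant -> 0.6
```

Averaging this score over all test photos would give a collection-level figure comparable in kind to the one quoted in the abstract.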
Abstract:
Virtual screening (VS) methods can considerably aid clinical research by predicting how ligands interact with drug targets. Most VS methods assume a single binding site for the target, but it has been demonstrated that diverse ligands interact with unrelated parts of the target, and many VS methods do not take this fact into account. This problem is circumvented by a novel VS methodology named BINDSURF, which scans the whole protein surface in order to find new hotspots where ligands might potentially interact, and which is implemented on latest-generation massively parallel GPU hardware, allowing fast processing of large ligand databases. BINDSURF can thus be used in drug discovery, drug design and drug repurposing, and therefore helps considerably in clinical research. However, the accuracy of most VS methods, and of BINDSURF in particular, is constrained by limitations in the scoring function that describes biomolecular interactions, and even today these uncertainties are not completely understood. In order to improve the accuracy of the scoring functions used in BINDSURF, we propose a novel hybrid approach in which neural network (NNET) and support vector machine (SVM) methods are trained with databases of known active (drugs) and inactive compounds, with this information subsequently exploited to improve BINDSURF VS predictions.
Abstract:
Virtual Screening (VS) methods can considerably aid clinical research by predicting how ligands interact with drug targets. However, the accuracy of most VS methods is constrained by limitations in the scoring function that describes biomolecular interactions, and even today these uncertainties are not completely understood. In order to improve the accuracy of the scoring functions used in most VS methods, we propose a novel hybrid approach in which neural network (NNET) and support vector machine (SVM) methods are trained with databases of known active (drugs) and inactive compounds, with this information subsequently exploited to improve VS predictions.
Abstract:
Dissertation (master's), Universidade de Brasília, Faculdade Gama, Graduate Program in Biomedical Engineering, 2015.
Abstract:
The Mara River Basin (MRB) is endowed with pristine biodiversity, socio-cultural heritage and natural resources. The purpose of my study is to develop and apply an integrated water resource allocation framework for the MRB based on hydrological processes, water demand and economic factors. The basin was partitioned into twelve sub-basins and the rainfall-runoff processes were modeled using the Soil and Water Assessment Tool (SWAT), with a satisfactory Nash-Sutcliffe efficiency of 0.68 for calibration and 0.43 for validation at the Mara Mines station. The impact and uncertainty of climate change on the hydrology of the MRB were assessed using SWAT and three scenarios of statistically downscaled outputs from twenty Global Circulation Models. Results predicted the wet season getting wetter and the dry season getting drier, with a general increasing trend of annual rainfall through 2050. Three blocks of water demand (environmental, normal and flood) were estimated from consumptive water use by humans, wildlife, livestock, tourism, irrigation and industry. Water demand projections suggest that human consumption will surpass irrigation as the highest water demand sector by 2030. Monthly water volumes were estimated in three blocks of current minimum reliability: reserve (>95%), normal (80–95%) and flood (40%), for more than 5 months in a year. The assessment of water price and marginal productivity showed that current water use hardly responds to a change in the price or productivity of water. Finally, a water allocation model was developed and applied to investigate the optimum monthly allocation among sectors and sub-basins by maximizing the use value and hydrological reliability of water. Model results demonstrated that the status of reserve and normal volumes can be improved to ‘low’ or ‘moderate’ by updating the existing reliability to meet prevailing demand. Flow volumes and rates for four scenarios of reliability were presented. 
Results showed that the water allocation framework can be used as a comprehensive tool in the management of the MRB, and could possibly be extended to similar watersheds.
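The Nash-Sutcliffe efficiency used to judge the SWAT calibration and validation can be sketched as follows. This is the standard textbook definition (one minus the ratio of the squared simulation error to the variance of the observations about their mean), not code from the study; the function name and the flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model
    is no better than predicting the mean of the observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared differences between observed and simulated values.
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - error / spread


# Hypothetical monthly streamflow values (arbitrary units).
observed_flow = [10.0, 20.0, 30.0, 40.0]
simulated_flow = [12.0, 18.0, 33.0, 37.0]

nse = nash_sutcliffe(observed_flow, simulated_flow)
print(round(nse, 3))  # 1 - 26/500 -> 0.948
```

On this scale, the reported values of 0.68 (calibration) and 0.43 (validation) indicate that the model explains a substantial share, but not all, of the observed flow variability.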
Abstract:
Support Vector Machines (SVMs) are widely used classifiers for detecting physiological patterns in Human-Computer Interaction (HCI). Their success is due to their versatility, robustness and the wide availability of free dedicated toolboxes. The literature frequently reports insufficient detail about the SVM implementation and/or parameter selection, making it impossible to reproduce the analyses and results of a study. In order to perform an optimized classification and report a proper description of the results, it is necessary to have a comprehensive critical overview of the application of SVM. The aim of this paper is to provide a review of the usage of SVM in the determination of brain and muscle patterns for HCI, focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, an overview of the basic principles of SVM theory is outlined, together with a description of several relevant implementations from the literature. Furthermore, details concerning the reviewed papers are listed in tables, and statistics of SVM use in the literature are presented. The suitability of SVM for HCI is discussed and critical comparisons with other classifiers are reported.
Abstract:
The accuracy of a map is dependent on the reference dataset used in its construction. Classification analyses used in thematic mapping can, for example, be sensitive to a range of sampling and data quality concerns. With particular focus on the latter, the effects of reference data quality on land cover classifications from airborne thematic mapper data are explored. Variations in sampling intensity and effort are highlighted in a dataset that is widely used in mapping and modelling studies; these may need accounting for in analyses. The quality of the labelling in the reference dataset was also a key variable influencing mapping accuracy. Accuracy varied with the amount and nature of mislabelled training cases with the nature of the effects varying between classifiers. The largest impacts on accuracy occurred when mislabelling involved confusion between similar classes. Accuracy was also typically negatively related to the magnitude of mislabelled cases and the support vector machine (SVM), which has been claimed to be relatively insensitive to training data error, was the most sensitive of the set of classifiers investigated, with overall classification accuracy declining by 8% (significant at 95% level of confidence) with the use of a training set containing 20% mislabelled cases.
Abstract:
This review article gives a positive assessment of the Quijote edition published by Francisco Rico to mark the fourth centenary of the Second Part of the classic (1615). What is missing is not the sole responsibility of the editor or his team: studying the books of chivalry, military treatises and hagiographic literature, which prove fundamental to understanding the Quijote, is a task that falls to all Cervantes scholars. Details aside, the tools made available to the reader provide a fresh and precise status quaestionis; likewise, the lexicographical entries serve their purpose well, although those relating to botany await an in-depth revision.
Abstract:
Cómo se comunican has recently been attributed, for the first time in more than three centuries, to Pedro Calderón de la Barca. A comparison between this play and La selva confusa, by the same author, supports this attribution, since the parallels in the dramatic structure of the two plays are numerous and striking. Reasons are given for supposing that Cómo se comunican predates La selva confusa, which would imply a very early date of composition within Calderón's dramatic output.
Abstract:
Interactions with mobile devices normally happen in an explicit manner, which means that they are initiated by the users. Yet users are typically unaware that they also interact implicitly with their devices. For instance, our hand pose changes naturally when we type text messages. Whilst the touchscreen captures finger touches, the hand movements made during this interaction go unused. If this implicit hand movement is observed, it can be used as additional information to support or enhance the users’ text entry experience. This thesis investigates how implicit sensing can be used to improve the quality of existing, standard interaction techniques. In particular, this thesis looks into enhancing front-of-device interaction through implicit sensing of back-of-device interaction and hand movement. We pursue this investigation through machine learning techniques, examining how sensor data obtained via implicit sensing can be used to predict a certain aspect of an interaction. For instance, one of the questions that this thesis attempts to answer is whether hand movement during a touch targeting task correlates with the touch position. This is a complex relationship to understand but can be best explained through machine learning. Using machine learning as a tool, such correlation can be measured, quantified, understood and used to make predictions of future touch positions. Furthermore, this thesis also evaluates the predictive power of the sensor data. We show this through a number of studies. In Chapter 5 we show that probabilistic modelling of sensor inputs and recorded touch locations can be used to predict the general area of future touches on a touchscreen. In Chapter 7, using SVM classifiers, we show that data from implicit sensing during general mobile interactions is user-specific. This can be used to identify users implicitly. In Chapter 6, we also show that touch interaction errors can be detected from sensor data. 
In our experiment, we show that there are sufficient distinguishable patterns between normal interaction signals and signals that are strongly correlated with interaction error. In all studies, we show that performance gain can be achieved by combining sensor inputs.
Abstract:
Acute aortic syndrome can present as the characteristic clinical picture of a vascular emergency or, on the contrary, in a completely atypical form in which the diagnosis challenges the emergency physician and overlooking the condition leads to fatal errors. To show the usefulness of bedside ultrasound in the diagnosis of aortic dissection, we describe 9 cases of patients who were admitted to the emergency department and diagnosed with acute aortic syndrome thanks to the initial ultrasonographic assessment performed by residents and specialists in Emergency Medicine at a hospital in Bogotá D.C., Colombia. This case report shows that bedside ultrasound is a non-invasive, accessible and useful diagnostic method for the early detection of this condition in emergency services.
Abstract:
Subtle structural differences can be observed in the islets of Langerhans region of microscopic images of pancreas cells between rats with normal glucose tolerance and rats in a pre-diabetic (glucose-intolerant) condition. This paper proposes a way to automatically segment the islets of Langerhans region from histological images of rat pancreas cells; on the basis of morphological features extracted from the segmented region, the images are classified as normal or pre-diabetic. The experiment is done on a set of 134 images, of which 56 are of the normal type and the remaining 78 are of the pre-diabetic type. The work has two stages: first, segmentation of the region of interest (ROI), i.e. the islets of Langerhans, from the pancreatic cell; and second, extraction of morphological features from the region of interest for classification. Wavelet analysis and connected component analysis have been used for automatic segmentation of the images. Several classifiers, including OneRule, Naïve Bayes, MLP, J48 Tree and SVM, are used for evaluation, among which MLP performed best.
Abstract:
This paper describes various experiments done to investigate author profiling of tweets in 4 different languages: English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different personality dimensions: extroversion, stability, agreeableness, openness, and conscientiousness. Different sets of features were tested: bag-of-words, word n-grams, POS n-grams, and average of word embeddings. SVM was used as the classifier. TF-IDF worked best for most English tasks, while for most of the tasks in the other languages the combination of the best features worked better.
Abstract:
The main purpose of this study is to evaluate the best set of features for automatically identifying argumentative sentences in unstructured text. As the corpus, we use case law from the European Court of Human Rights (ECHR). Three kinds of experiments are conducted: Basic Experiments, Multi Feature Experiments and Tree Kernel Experiments. These experiments are categorized according to the type of features available in the corpus. The features are extracted from the corpus, and Support Vector Machine (SVM) and Random Forest are used as the machine learning algorithms. We achieved an F1 score of 0.705 for identifying argumentative sentences, which is a promising result and can be used as the basis for a general argument-mining framework.
Abstract:
The dissertation starts by describing the phenomena behind the growing importance recently acquired by satellite applications. The spread of such technology has implications, such as an increase in maintenance cost, which motivates the development of advanced techniques that give spacecraft greater autonomy in health monitoring. Machine learning techniques are widely employed to lay a foundation for effective systems specialized in fault detection by examining telemetry data. Telemetry consists of a considerable amount of information; therefore, the adopted algorithms must be able to handle multivariate data while facing the limitations imposed by on-board hardware. In the framework of outlier detection, the dissertation addresses unsupervised machine learning methods, in which no prior knowledge of the data behavior is assumed. Specifically, two models are considered, namely Local Outlier Factor and One-Class Support Vector Machines. Their performance is compared in terms of both the achieved prediction accuracy and the corresponding computational cost. Both models are trained and tested on the same sets of time series data in a variety of settings, with the aim of gaining insight into the effect of increasing dimensionality. The results indicate that both models, combined with proper tuning of their characteristic parameters, successfully serve as outlier detectors in multivariate time series data. Nevertheless, in this specific context, Local Outlier Factor outperforms One-Class SVM, in that it proves to be more stable over a wider range of input parameter values. This property is especially valuable in unsupervised learning since it suggests that the model adapts well to unforeseen patterns.
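To make the Local Outlier Factor idea concrete, the following is a minimal pure-Python sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not the dissertation's implementation: it assumes exactly k neighbours per point (ties are ignored), assumes there are no duplicate points, and uses a small hypothetical 2-D data set in place of real telemetry.

```python
import math


def local_outlier_factor(points, k=3):
    """Return one LOF score per point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # Pairwise Euclidean distances (quadratic cost, fine for a small sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []  # indices of the k nearest neighbours of each point
    k_distance = []  # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_distance[j], dist[i][j]).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(k / sum(reach))  # assumes no duplicate points (sum > 0)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical data: a tight cluster plus one distant point (index 5).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = local_outlier_factor(samples, k=3)
print([round(s, 2) for s in scores])
```

The neighbourhood size k is the main tuning parameter here; the stability of the scores as k varies is the kind of property the abstract refers to when comparing the two detectors.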