348 results for SVM-RFE
Abstract:
In this paper we compare the robustness of several types of stylistic markers for discriminating authorship at the sentence level. We train an SVM-based classifier using each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation, and word/sentence length contribute to a more robust sentence-level authorship analysis. © Springer-Verlag Berlin Heidelberg 2010.
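The shallow markers the paper relies on (punctuation, word and sentence length) are easy to extract; the sketch below is a minimal, hypothetical feature extractor in that spirit. The paper's actual feature sets, including the POS-based ones, come from its own pipeline.

```python
import string

def stylistic_features(sentence):
    """Shallow stylistic markers for one sentence: token count, mean word
    length, and per-punctuation-mark counts. (A hypothetical feature set in
    the spirit of the paper; POS-based features are not reproduced here.)"""
    tokens = sentence.split()
    words = [t.strip(string.punctuation) for t in tokens]
    words = [w for w in words if w]
    n_words = len(words)
    mean_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    punct_counts = {p: sentence.count(p) for p in ",.;:!?"}
    return {"n_words": n_words, "mean_word_len": mean_word_len, **punct_counts}

feats = stylistic_features("Well, the editorial argues: taxes rise; voters object!")
```

Each sentence becomes a small fixed-length vector of counts, which is what allows a classifier to be trained per feature family and compared for robustness.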
Abstract:
Purpose: To build a model that predicts the survival time of patients treated with stereotactic radiosurgery for brain metastases, using support vector machine (SVM) regression.
Methods and Materials: This study utilized data from 481 patients, who were randomly divided into equal training and validation datasets. The SVM model used a Gaussian RBF kernel, along with parameters such as the size of the epsilon-insensitive region and the cost parameter (C), which control the amount of error tolerated by the model. The predictor variables for the SVM model consisted of the actual survival time of the patient, the number of brain metastases, the graded prognostic assessment (GPA) and Karnofsky Performance Scale (KPS) scores, prescription dose, and the largest planning target volume (PTV). The response of the model is the survival time of the patient. The resulting survival time predictions were analyzed against the actual survival times by single-parameter and two-parameter classification. The predicted mean survival times within each classification were compared with the actual values to obtain the confidence interval associated with the model's predictions. In addition to visualizing the data on plots using the means and error bars, the correlation coefficients between the actual and predicted mean survival times were calculated during each step of the classification.
Results: The number of metastases and the KPS score were consistently shown to be the strongest predictors in the single-parameter classification, and were subsequently used as first classifiers in the two-parameter classification. When the survival times were analyzed with the number of metastases as the first classifier, the best correlation was obtained for patients with 3 metastases, while patients with 4 or 5 metastases had significantly worse results. When the KPS score was used as the first classifier, patients with KPS scores of 60 and 90/100 had similarly strong correlation results. These mixed results are likely due to the limited data available for patients with more than 3 metastases or KPS scores of 60 or less.
Conclusions: The number of metastases and the KPS score both proved to be strong predictors of patient survival time. The model was less accurate for patients with more metastases and certain KPS scores due to the lack of training data.
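As a rough illustration of the regression machinery described above, the sketch below fits a linear ε-insensitive SVR by subgradient descent on a one-dimensional toy problem. The data, tube width, and learning rate are all hypothetical; the study itself used a Gaussian-RBF SVM over clinical predictors.

```python
# Toy linear epsilon-insensitive SVR fitted by subgradient descent.
# One synthetic predictor; the study itself used an RBF-kernel SVM on
# clinical variables (metastasis count, GPA, KPS, dose, PTV).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]          # roughly y = 2x
eps, lam, lr = 0.2, 1e-3, 0.01          # tube width, regularization, step size
w, b = 0.0, 0.0
for _ in range(2000):
    for x, y in zip(xs, ys):
        r = (w * x + b) - y             # residual
        # subgradient of max(0, |r| - eps) + lam/2 * w^2
        g = (1 if r > 0 else -1) if abs(r) > eps else 0
        w -= lr * (g * x + lam * w)
        b -= lr * g
pred = w * 3.0 + b                      # predict at x = 3
```

Errors smaller than ε cost nothing, which is exactly the "amount of error tolerated" that the abstract's ε and C parameters control.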
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
The CCRUSH Study: Coarse and fine particulate matter measurements in northeastern Colorado 2009-2012
Abstract:
Coarse (PM10-2.5) and fine (PM2.5) particulate matter in the atmosphere adversely affect human health and influence climate. While PM2.5 is relatively well studied, less is known about the sources and fate of PM10-2.5. The Colorado Coarse Rural-Urban Sources and Health (CCRUSH) study measured PM10-2.5 and PM2.5 mass concentrations, as well as the fraction of semi-volatile material (SVM) in each size regime (SVM2.5, SVM10-2.5), for three years in Denver and comparatively rural Greeley, Colorado. Agricultural operations east of Greeley appear to have contributed to the peak PM10-2.5 concentrations there, but concentrations were generally lower in Greeley than in Denver. Traffic-influenced sites in Denver had PM10-2.5 concentrations that averaged from 14.6 to 19.7 µg/m³ and mean PM10-2.5/PM10 ratios of 0.56 to 0.70, higher than at residential sites in Denver or Greeley. PM10-2.5 concentrations were more temporally variable than PM2.5 concentrations. Concentrations of the two pollutants were not correlated. Spatial correlations of daily averaged PM10-2.5 concentrations ranged from 0.59 to 0.62 for pairs of sites in Denver and from 0.47 to 0.70 between Denver and Greeley. Compared to PM10-2.5, concentrations of PM2.5 were more correlated across sites within Denver and less correlated between Denver and Greeley. PM10-2.5 concentrations were highest during the summer and early fall, while PM2.5 and SVM2.5 concentrations peaked in winter during periodic multi-day inversions. SVM10-2.5 concentrations were low at all sites. Diurnal peaks in PM10-2.5 and PM2.5 concentrations corresponded to morning and afternoon peaks of traffic activity, and were enhanced by boundary layer dynamics. SVM2.5 concentrations peaked around noon on both weekdays and weekends. PM10-2.5 concentrations at sites located near highways generally increased with wind speeds above about 3 m/s. Little wind speed dependence was observed for the residential sites in Denver and Greeley.
Abstract:
Among the literary models that, in sixteenth-century epics about the conquest of Mexico, serve to give epic form to the historical material drawn from the chronicles, Virgil's Aeneid plays a fundamental role. This article aims to show how the identification of Jerónimo de Aguilar with Virgil's Achaemenides, which appears for the first time in the Carlo famoso of Luis Zapata, reappears in Francisco de Terrazas, in Gabriel Lobo Lasso de la Vega, and in Antonio de Saavedra Guzmán, and to offer some considerations on the relationships that may have existed between the works of these poets.
Abstract:
What is the human being? What are its origin and its end? What is the influence of nature on man, and what is his impact on nature? For the animalists, men are like other animals; freedom and rationality are neither signs of superiority nor grounds for rights over animals. For the ecohumanists, human beings are part of nature, but qualitatively different from and superior to animals, and are the creators of civilization. We analyze these two ecological outlooks. A special point is the contribution of the ecohumanists of the first half of the Renaissance, who dealt in extenso with the dignity and freedom of the human being, of Michelangelo and, finally, of Mozart, whose four insurmountable operas display the difficulty physical ecology faces in engendering so much beauty, so much richness, so much love for creatures, and so much variety.
Abstract:
This study analyzes the use of the diminutive suffix in an oral corpus of young speakers from the Dominican Republic. The material comes from the transcription of twenty oral interviews conducted in the 1990s in Santo Domingo. The study analyzes the documented occurrences, their morphology, their preferences regarding the word classes taken as bases for diminutive formation, and their possible semantic and communicative values; finally, it determines the frequency of diminutive use as a function of the speakers' sex.
Abstract:
This paper formulates a linear kernel support vector machine (SVM) as a regularized least-squares (RLS) problem. By defining a set of indicator variables for the errors, the solution to the RLS problem is represented as an equation that relates the error vector to the indicator variables. Through partitioning the training set, the SVM weights and bias are expressed analytically using the support vectors. It is also shown how this approach naturally extends to SVMs with nonlinear kernels, whilst avoiding the need for Lagrange multipliers and duality theory. A fast iterative solution algorithm based on Cholesky decomposition with permutation of the support vectors is suggested as a solution method. The properties of our SVM formulation are analyzed and compared with standard SVMs using a simple example that can be illustrated graphically. The correctness and behavior of our solution (derived purely in the primal context of RLS) is demonstrated using a set of public benchmarking problems for both linear and nonlinear SVMs.
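The paper's indicator-variable construction is not reproduced here, but the underlying idea, treating a linear classifier as a regularized least-squares fit to ±1 labels, can be sketched in closed form. The toy data and regularization constant below are hypothetical; a constant feature stands in for the bias term.

```python
# Regularized least-squares (RLS) view of a linear classifier on +/-1
# labels: solve (X^T X + lam*I) w = X^T y in closed form for two features.
# Features are [x, 1], so the second weight acts as the bias term.
X = [[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [-1.0, -1.0, 1.0, 1.0]
lam = 0.1

# A = X^T X + lam * I  (2x2), c = X^T y
a11 = sum(r[0] * r[0] for r in X) + lam
a12 = sum(r[0] * r[1] for r in X)
a22 = sum(r[1] * r[1] for r in X) + lam
c1 = sum(r[0] * yi for r, yi in zip(X, y))
c2 = sum(r[1] * yi for r, yi in zip(X, y))

# Hand-coded 2x2 inverse applied to c
det = a11 * a22 - a12 * a12
w = ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)

def predict(x):
    return 1 if w[0] * x + w[1] > 0 else -1
```

The paper's contribution is to recover the true hinge-loss SVM solution, support vectors and all, from this primal least-squares setting; the closed-form solve above is only the starting point of that construction.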
Abstract:
To maintain the pace of development set by Moore's law, production processes in semiconductor manufacturing are becoming more and more complex. The development of efficient and interpretable anomaly detection systems is fundamental to keeping production costs low. As the dimension of process monitoring data can become extremely high, anomaly detection systems are impacted by the curse of dimensionality; hence dimensionality reduction plays an important role. Classical dimensionality reduction approaches, such as Principal Component Analysis, generally involve transformations that seek to maximize the explained variance. In datasets with several clusters of correlated variables, the contributions of isolated variables to the explained variance may be insignificant, with the result that they may not be included in the reduced data representation. It is then not possible to detect an anomaly if it is reflected only in such isolated variables. In this paper we present a new dimensionality reduction technique that takes account of such isolated variables and demonstrate how it can be used to build an interpretable and robust anomaly detection system for Optical Emission Spectroscopy data.
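The failure mode described above is easy to demonstrate. In the hypothetical toy data below, four variables share one signal and a fifth is independent of it; power iteration on the covariance matrix recovers the top principal component, whose loading on the isolated variable is essentially zero, so a variance-maximizing projection would all but discard it.

```python
# Five variables: x1..x4 share one signal, x5 is independent of it.
# The top principal component (found by power iteration on the covariance
# matrix) loads almost entirely on the correlated cluster.
s  = [1.0, -1.0, 2.0, -2.0, 0.0]   # shared, zero-mean signal
x5 = [1.0, 1.0, -1.0, -1.0, 0.0]   # zero-mean, uncorrelated with s
data = [[s[i], s[i], s[i], s[i], x5[i]] for i in range(5)]

n, d = len(data), 5
cov = [[sum(data[k][i] * data[k][j] for k in range(n)) / n
        for j in range(d)] for i in range(d)]

v = [1.0] * d                      # power iteration for the top eigenvector
for _ in range(50):
    v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]

isolated_loading = abs(v[4])       # ~0: x5 barely contributes to PC1
```

An anomaly visible only in x5 would therefore be invisible after projecting onto the leading components, which is precisely the gap the paper's technique addresses.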
Abstract:
[EN] We investigate mechanisms that can endow the computer with the ability to describe a human face by means of computer vision techniques. This is a necessary requirement for developing HCI approaches that make the user feel perceived. This paper describes our experiences considering gender, race, and the presence of a moustache and glasses. This is accomplished by comparing, on a set of 6000 facial images, two different face representation approaches: Principal Components Analysis (PCA) and Gabor filters. The results achieved using a Support Vector Machine (SVM) based classifier are promising, and are particularly better for the second representation approach.
Abstract:
[EN]The classification speed of state-of-the-art classifiers such as SVM is an important aspect to be considered for emerging applications and domains such as data mining and human-computer interaction. Usually, a test-time speed increase in SVMs is achieved by somehow reducing the number of support vectors, which allows a faster evaluation of the decision function. In this paper a novel approach is described for fast classification in a PCA+SVM scenario. In the proposed approach, classification of an unseen sample is performed incrementally in increasingly larger feature spaces. As soon as the classification confidence is above a threshold the process stops and the class label is retrieved...
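The early-stopping idea can be sketched as follows; the per-stage weights and the confidence threshold below are hypothetical, and the real method's stage construction and stopping rule follow the paper, not this toy.

```python
# Early-stopping classification over nested feature spaces: evaluate a
# linear decision function on a growing prefix of (e.g. PCA) features and
# stop as soon as |score| clears a confidence threshold.
def cascade_classify(features, stage_weights, threshold):
    """stage_weights[k] holds weights for the first len(stage_weights[k])
    features; stages are tried in order of increasing dimensionality."""
    score = 0.0
    for w in stage_weights:
        score = sum(wi * fi for wi, fi in zip(w, features))
        if abs(score) >= threshold:    # confident enough: stop early
            break
    return (1 if score >= 0 else -1), score

# Hypothetical per-stage weights for 1-, 2-, and 4-dimensional prefixes.
stages = [[0.9], [0.9, 0.4], [0.9, 0.4, -0.2, 0.1]]
label, score = cascade_classify([2.0, 0.5, 1.0, -1.0], stages, threshold=1.0)
```

Easy samples exit in the cheapest, lowest-dimensional stage, which is where the test-time speedup comes from; only ambiguous samples pay for the full feature space.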
Abstract:
A Brain Computer Interface (BCI) is a device that allows brain signals to be measured and used to control software and/or peripherals of various kinds, from simple video games to complex robotic prostheses. Among the signals currently most used are Steady State Visual Evoked Potentials (SSVEP), rhythmic variations of electric potential that can be recorded over the primary visual cortex with a non-invasive electroencephalogram (EEG); they are evoked by periodic light stimulation and are characterized by an oscillation frequency equal to that of the stimulation. Having a particularly favorable signal-to-noise ratio (SNR) and an easily studied characteristic, SSVEPs underlie the fastest and most immediate BCIs currently available. The user is presented with a series of choices, each associated with a visual stimulation at a different frequency; the selected one then reappears in the characteristics of the user's EEG trace, extracted in real time. The goal of this thesis was to build an integrated system, developed in LabView, implementing the SSVEP-based BCI paradigm just described, allowing one to: 1. configure the generation of two light stimuli using external LEDs; 2. synchronize EEG signal acquisition with this stimulation; 3. extract features (characteristic attributes of each class) from the signal and use them to train an SVM classifier; 4. use the classifier to build a real-time BCI interface with user feedback. The system was designed with some of the most advanced techniques for spatial and temporal signal processing, and its operation was tested on 4 healthy subjects and compared with the most closely comparable modern SSVEP-based BCIs found in the literature.
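The frequency-detection step at the heart of an SSVEP BCI can be sketched with a single-bin DFT: compare signal power at the candidate stimulation frequencies and pick the strongest. The synthetic trace and frequencies below are hypothetical; the thesis's actual pipeline adds spatial and temporal filtering and an SVM classifier in LabView.

```python
import math, cmath

FS, N = 256.0, 256                 # sampling rate (Hz) and window length

def power_at(signal, freq):
    """Single-bin DFT power at `freq` Hz (one-second window assumed)."""
    k = round(freq * N / FS)       # DFT bin index for this frequency
    coeff = sum(signal[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
    return abs(coeff) ** 2

# Synthetic "EEG": a 10 Hz SSVEP response plus a weaker 17 Hz component.
trace = [math.sin(2 * math.pi * 10 * n / FS)
         + 0.2 * math.sin(2 * math.pi * 17 * n / FS) for n in range(N)]

candidates = [10.0, 15.0]          # the two LED stimulation frequencies
detected = max(candidates, key=lambda f: power_at(trace, f))
```

Because the SSVEP oscillates at the stimulation frequency, the attended LED shows up as a power peak at its frequency (and harmonics), which is what makes SSVEP BCIs fast to decode.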
Abstract:
Current Ambient Intelligence and Intelligent Environment research focuses on interpreting a subject's behaviour at the activity level by logging Activities of Daily Living (ADL) such as eating, cooking, etc. In general, the sensors employed (e.g. PIR sensors, contact sensors) provide low-resolution information. Meanwhile, the expansion of ubiquitous computing allows researchers to gather additional information from different types of sensor, which makes it possible to improve activity analysis. Building on previous research on sitting posture detection, this research attempts to further analyse human sitting activity. The aim of this research is to use a non-intrusive, low-cost chair system with embedded pressure sensors to recognize a subject's activity from their detected postures. There are three steps in this research: the first is to find a hardware solution for low-cost sitting posture detection, the second is to find a suitable strategy for sitting posture detection, and the last is to correlate time-ordered sitting posture sequences with sitting activity. The author developed a prototype sensing system called IntelliChair for sitting posture detection. Two experiments were conducted to determine the hardware architecture of the IntelliChair system. The prototype work examines sensor selection and the integration of various sensors, and identifies the best choices for a low-cost, non-intrusive system. Subsequently, this research applies signal processing theory to explore the frequency characteristics of sitting posture, in order to determine a suitable sampling rate for the IntelliChair system. For the second and third steps, ten subjects were recruited for sitting posture and sitting activity data collection. The former dataset was collected by asking subjects to perform certain pre-defined sitting postures on IntelliChair, and it is used for the posture recognition experiment.
The latter dataset was collected by asking the subjects to perform their normal sitting activity routine on IntelliChair for four hours, and it is used for the activity modelling and recognition experiment. For the posture recognition experiment, two Support Vector Machine (SVM) based classifiers are trained (one for spine postures and the other for leg postures), and their performance is evaluated. A Hidden Markov Model is utilized for sitting activity modelling and recognition, in order to recover the selected sitting activities from sitting posture sequences. After experimenting with possible sensors, the Force Sensing Resistor (FSR) was selected as the pressure sensing unit for IntelliChair. Eight FSRs are mounted on the seat and back of a chair to gather haptic (i.e., touch-based) posture information. Furthermore, the research explores the possibility of using an alternative non-intrusive sensing technology (the vision-based Kinect sensor from Microsoft) and finds that the Kinect sensor is not reliable for sitting posture detection due to the joint-drifting problem. Based on the experimental results, a suitable sampling rate for IntelliChair was determined to be 6 Hz. The posture classification performance shows that the SVM-based classifier is robust to "familiar" subject data (accuracy is 99.8% for spine postures and 99.9% for leg postures). When dealing with "unfamiliar" subject data, the accuracy is 80.7% for spine posture classification and 42.3% for leg posture classification. Activity recognition achieves 41.27% accuracy across the four selected activities (relaxing, playing a game, working with a PC, and watching video). The results of this thesis show that individual body characteristics and sitting habits influence both sitting posture and sitting activity recognition. This suggests that IntelliChair is suitable for individual usage, but a training stage is required.
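A linear SVM over 8-value FSR pressure vectors can be sketched with a Pegasos-style subgradient solver. Everything below is a hypothetical stand-in: the synthetic postures, the sensor layout, and the solver choice; the thesis trains its SVM classifiers on real IntelliChair data.

```python
# Linear SVM trained with a Pegasos-style subgradient rule on hypothetical
# 8-value FSR pressure vectors (four seat sensors, then four back sensors).
# Synthetic stand-in data: "upright" postures load the back sensors,
# "lean-forward" postures load the seat sensors.
upright = [[0.2, 0.2, 0.3, 0.3, 0.8, 0.9, 0.8, 0.9],
           [0.3, 0.2, 0.2, 0.3, 0.9, 0.8, 0.9, 0.8]]
forward = [[0.9, 0.8, 0.9, 0.8, 0.1, 0.2, 0.1, 0.2],
           [0.8, 0.9, 0.8, 0.9, 0.2, 0.1, 0.2, 0.1]]
X = upright + forward
y = [1, 1, -1, -1]

lam, w, t = 0.01, [0.0] * 8, 0
for _ in range(200):                   # cycle deterministically over samples
    for xi, yi in zip(X, y):
        t += 1
        eta = 1.0 / (lam * t)
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        if margin < 1:                 # hinge-loss subgradient step
            w = [(1 - eta * lam) * wj + eta * yi * xj
                 for wj, xj in zip(w, xi)]
        else:
            w = [(1 - eta * lam) * wj for wj in w]

preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1 for xi in X]
```

Each posture becomes an 8-dimensional pressure vector and the SVM learns a separating direction over it; the thesis's spine/leg split simply trains two such classifiers on different sensor subsets.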
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores the datasets, features, learning methods, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collecting object image datasets from web pages using an analysis of the text around each image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; the Flickr and Caltech datasets for images), which provide rich text and object appearance information. This dissertation reports results on two datasets. The first is Berg's collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy; the method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples, but this text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested on the PASCAL VOC 2006 and 2007 datasets. The feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small.
As more and more training data is collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as a kernelized SVM. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). This training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck for processing large-scale datasets. This dissertation applies the approach to train classifiers for Flickr groups with many training examples per group. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better for image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories that are similar and a set that are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that our method yields significant improvements for categories with few or even no positive examples.
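The histogram intersection kernel at the core of SIKMA is a one-liner; the sketch below uses it inside a simple kernel perceptron as a stand-in for SIKMA's stochastic solver, on hypothetical 4-bin histograms.

```python
# Histogram intersection kernel K(a, b) = sum_i min(a_i, b_i), the kernel
# SIKMA trains against, used here inside a plain kernel perceptron as a
# stand-in for its stochastic solver. Histograms are hypothetical.
def intersection_kernel(a, b):
    return sum(min(ai, bi) for ai, bi in zip(a, b))

# Two classes of 4-bin histograms: mass in the first bins vs the last bins.
X = [[0.7, 0.2, 0.1, 0.0], [0.6, 0.3, 0.1, 0.0],
     [0.0, 0.1, 0.3, 0.6], [0.0, 0.1, 0.2, 0.7]]
y = [1, 1, -1, -1]

alpha = [0.0] * len(X)
for _ in range(20):                    # kernel perceptron passes
    for i, (xi, yi) in enumerate(zip(X, y)):
        score = sum(alpha[j] * y[j] * intersection_kernel(X[j], xi)
                    for j in range(len(X)))
        if yi * score <= 0:            # misclassified: strengthen this example
            alpha[i] += 1.0

preds = [1 if sum(alpha[j] * y[j] * intersection_kernel(X[j], xi)
                  for j in range(len(X))) > 0 else -1 for xi in X]
```

The intersection kernel's special structure is what lets SIKMA evaluate and update the decision function far faster than a generic kernel machine, which this naive sketch does not exploit.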