917 results for statistical learning
Abstract:
With the significant increase in the number of digital cameras used for various purposes, there is a pressing demand for advanced video analysis techniques that can systematically interpret and understand the semantics of video content recorded for security surveillance, intelligent transportation, health care, and video retrieval and summarization. Understanding and interpreting human behaviour from video analysis faces considerable challenges due to non-rigid human motion, self- and mutual occlusion, and changes in lighting conditions. To address these problems, advanced image and signal processing technologies such as neural networks, fuzzy logic, probabilistic estimation theory, and statistical learning have been extensively investigated.
Abstract:
Super-resolution refers to the process of obtaining a high-resolution image from one or more low-resolution images. In this work, we present a novel method for the limited case of the super-resolution problem, where only one low-resolution image is given as input. The proposed method is based on statistical learning for inferring the high-frequency content that distinguishes a high-resolution image from a low-resolution one. These inferences are obtained from the correlation between low- and high-resolution regions, in terms of small neighborhoods, drawn exclusively from the image to be super-resolved. Markov random fields are used as a model to capture the local statistics of high- and low-resolution data analyzed at different scales and resolutions. Experimental results show the viability of the method.
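As an illustrative sketch only: the correlation between low- and high-resolution neighborhoods can be prototyped by harvesting patch pairs from the input image itself and pasting back the high-frequency residual of the nearest match. The MRF inference described above is replaced here by a greedy nearest-neighbor lookup, and all parameters (scale factor, patch size) are assumptions.

    import numpy as np
    from scipy.ndimage import zoom
    from sklearn.neighbors import NearestNeighbors

    def patch_pairs(img, scale=2, p=5):
        """Build (low-frequency, high-frequency residual) patch pairs from the image itself."""
        img = np.asarray(img, dtype=float)
        low = zoom(zoom(img, 1.0 / scale), float(scale))  # destroy, then crudely restore, detail
        h, w = min(img.shape[0], low.shape[0]), min(img.shape[1], low.shape[1])
        img, low = img[:h, :w], low[:h, :w]
        high = img - low                                  # the detail that downscaling removed
        lows = [low[i:i+p, j:j+p].ravel() for i in range(h - p) for j in range(w - p)]
        highs = [high[i:i+p, j:j+p].ravel() for i in range(h - p) for j in range(w - p)]
        return np.array(lows), np.array(highs)

    def super_resolve(img, scale=2, p=5):
        """Upscale, then add the stored residual of the nearest low-frequency patch."""
        lows, highs = patch_pairs(img, scale, p)
        nn = NearestNeighbors(n_neighbors=1).fit(lows)
        out = zoom(np.asarray(img, dtype=float), float(scale))
        for i in range(0, out.shape[0] - p, p):
            for j in range(0, out.shape[1] - p, p):
                _, idx = nn.kneighbors(out[i:i+p, j:j+p].ravel()[None, :])
                out[i:i+p, j:j+p] += highs[idx[0, 0]].reshape(p, p)
        return out

A full implementation would, as the abstract notes, couple overlapping patch choices through a Markov random field so that neighboring high-frequency estimates agree; the greedy loop above drops that compatibility term for brevity.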
Abstract:
The long-term goal of this research is to develop a program able to produce an automatic segmentation and categorization of textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm that takes a segmented text as input and attempts to categorize its sequences as narrative, argumentative, descriptive, and so on. This work also aims to investigate a possible convergence between unsupervised statistical learning and the typological approach developed in French text and discourse analysis, in particular by Adam (2008) and Bronckart (1997).
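A minimal sketch of the unsupervised side, assuming plain bag-of-words features (the study itself presumably relies on richer linguistic cues) and toy placeholder sequences:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Pre-segmented textual sequences (placeholders for a real segmented corpus).
    sequences = [
        "Once upon a time a knight rode north.",  # narrative
        "One could argue that taxes must rise.",  # argumentative
        "The room is long, white, and bare.",     # descriptive
        "First whisk the eggs, then add flour.",  # injunctive
    ]

    X = TfidfVectorizer().fit_transform(sequences)            # one vector per sequence
    labels = KMeans(n_clusters=4, n_init=10).fit_predict(X)   # clusters as candidate discourse types

Whether such clusters line up with the typology of Adam and Bronckart is precisely the convergence question the abstract raises.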
Abstract:
This study analyzed high-density event-related potentials (ERPs) within an electrical neuroimaging framework to provide insights regarding the interaction between multisensory processes and stimulus probabilities. Specifically, we identified the spatiotemporal brain mechanisms by which the proportion of temporally congruent and task-irrelevant auditory information influences stimulus processing during a visual duration discrimination task. The spatial position (top/bottom) of the visual stimulus was indicative of how frequently the visual and auditory stimuli would be congruent in their duration (i.e., context of congruence). Stronger influences of irrelevant sound were observed when contexts associated with a high proportion of auditory-visual congruence repeated and also when contexts associated with a low proportion of congruence switched. Context of congruence and context transition resulted in weaker brain responses at 228 to 257 ms poststimulus to conditions giving rise to larger behavioral cross-modal interactions. Importantly, a control oddball task revealed that both congruent and incongruent audiovisual stimuli triggered equivalent non-linear multisensory interactions when congruence was not a relevant dimension. Collectively, these results are well explained by statistical learning, which links a particular context (here: a spatial location) with a certain level of top-down attentional control that further modulates cross-modal interactions based on whether a particular context repeated or changed. The current findings shed new light on the importance of context-based control over multisensory processing, whose influences operate across both finer and broader time scales.
Abstract:
Treating email filtering as a binary text classification problem, researchers have applied several statistical learning algorithms to email corpora with promising results. This paper examines the performance of a Naive Bayes classifier using different approaches to feature selection and tokenization on different email corpora.
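A minimal sketch of this setup, assuming word-count features and a multinomial model; the corpus and labels are placeholders:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    emails = ["win money now", "meeting at noon", "cheap meds online", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate

    vec = CountVectorizer()                     # tokenization choices live here
    clf = MultinomialNB().fit(vec.fit_transform(emails), labels)
    print(clf.predict(vec.transform(["free money and meds"])))

Varying the vectorizer's options (n-grams, stop words, binary counts) reproduces the kind of tokenization and feature-selection comparison the paper describes.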
Abstract:
We present distribution-independent bounds on the generalization misclassification performance of a family of kernel classifiers with margin. Support Vector Machine (SVM) classifiers stem from this class of machines. The bounds are derived through computations of the $V_\gamma$ dimension of a family of loss functions to which the SVM loss belongs. Bounds that use functions of margin distributions (i.e., functions of the slack variables of the SVM) are also derived.
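Schematically, and with constants and exact logarithmic factors suppressed (so this conveys the flavor of such results rather than the paper's precise statement), a $V_\gamma$-style bound asserts that with probability at least $1-\delta$ over $n$ samples,

    $R(f) \le R_{\mathrm{emp}}^{\gamma}(f) + O\!\left(\sqrt{\frac{V_{\gamma} \log^2 n + \log(1/\delta)}{n}}\right),$

where $R_{\mathrm{emp}}^{\gamma}(f)$ counts training points that are misclassified or classified with margin smaller than $\gamma$.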
Abstract:
The Support Vector (SV) machine is a novel type of learning machine based on statistical learning theory, which contains polynomial classifiers, neural networks, and radial basis function (RBF) networks as special cases. In the RBF case, the SV algorithm automatically determines centers, weights, and threshold so as to minimize an upper bound on the expected test error. The present study is devoted to an experimental comparison of these machines with a classical approach, where the centers are determined by $k$-means clustering and the weights are found using error backpropagation. We consider three machines: a classical RBF machine, an SV machine with Gaussian kernel, and a hybrid system with the centers determined by the SV method and the weights trained by error backpropagation. Our results show that on the US Postal Service database of handwritten digits, the SV machine achieves the highest test accuracy, followed by the hybrid approach. The SV approach is thus not only theoretically well-founded but also superior in a practical application.
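A small sketch of the three-way comparison on a stand-in dataset (scikit-learn's digits rather than the US Postal Service database, and logistic regression in place of the backpropagation-trained output layer; the number of centers is an assumption):

    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    # Classical RBF machine: k-means picks the centers, a trained linear readout follows.
    centers = KMeans(n_clusters=50, n_init=10, random_state=0).fit(Xtr).cluster_centers_
    rbf_net = LogisticRegression(max_iter=2000).fit(rbf_kernel(Xtr, centers), ytr)

    # SV machine with Gaussian kernel: the centers (support vectors) fall out of training.
    svm = SVC(kernel="rbf").fit(Xtr, ytr)

    # Hybrid: centers taken from the SVM's support vectors, weights retrained.
    sv = Xtr[svm.support_]
    hybrid = LogisticRegression(max_iter=2000).fit(rbf_kernel(Xtr, sv), ytr)

    print(rbf_net.score(rbf_kernel(Xte, centers), yte),
          svm.score(Xte, yte),
          hybrid.score(rbf_kernel(Xte, sv), yte))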
Abstract:
We present an overview of current research on artificial neural networks, emphasizing a statistical perspective. We view neural networks as parameterized graphs that make probabilistic assumptions about data, and view learning algorithms as methods for finding parameter values that look probable in the light of the data. We discuss basic issues in representation and learning, and treat some of the practical issues that arise in fitting networks to data. We also discuss links between neural networks and the general formalism of graphical models.
Abstract:
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning, which provides a general foundation for the learning problem, combining functional analysis and statistics.
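The unifying object is the regularized functional (with $V$ the loss function, $H$ the hypothesis space induced by a kernel $K$, and $\ell$ the number of examples):

    $\min_{f \in H} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V\big(y_i, f(x_i)\big) + \lambda \|f\|_K^2$

Choosing the square loss $V(y, f(x)) = (y - f(x))^2$ yields Regularization Networks, while Vapnik's $\epsilon$-insensitive loss (or, for classification, the hinge loss) yields Support Vector Machines.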
Abstract:
When training Support Vector Machines (SVMs) over non-separable data sets, one sets the threshold $b$ using any dual cost coefficient that is strictly between the bounds of $0$ and $C$. We show that there exist SVM training problems with dual optimal solutions with all coefficients at bounds, but that all such problems are degenerate in the sense that the "optimal separating hyperplane" is given by $\mathbf{w} = \mathbf{0}$, and the resulting (degenerate) SVM will classify all future points identically (as the class that supplies more training data). We also derive necessary and sufficient conditions on the input data for this to occur. Finally, we show that an SVM training problem can always be made degenerate by the addition of a single data point belonging to a certain unbounded polyhedron, which we characterize in terms of its extreme points and rays.
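Concretely, for any dual coefficient with $0 < \alpha_i < C$, the Karush-Kuhn-Tucker conditions pin the threshold down as

    $b = y_i - \sum_j \alpha_j y_j K(x_j, x_i),$

which is exactly why a degenerate solution with every $\alpha_j \in \{0, C\}$ leaves $b$ undetermined by this route.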
Abstract:
This paper investigates the detection of architectural distortion in mammographic images using a support vector machine (SVM). The Hausdorff dimension is used to characterise the texture features of mammographic images. The SVM, a learning machine based on statistical learning theory, is trained through supervised learning to detect architectural distortion. Compared with radial basis function (RBF) neural networks, the SVM produced more accurate classification results in distinguishing architectural distortion from normal breast parenchyma.
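In practice the Hausdorff dimension of a binarized texture patch is typically estimated by box counting; a minimal sketch, with the box sizes chosen as an assumption:

    import numpy as np

    def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
        """Estimate the fractal (box-counting) dimension of a 2-D binary mask."""
        counts = []
        for s in sizes:
            h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
            # Count s-by-s boxes that contain at least one foreground pixel.
            boxes = mask[:h, :w].reshape(h // s, s, w // s, s).any(axis=(1, 3))
            counts.append(max(int(boxes.sum()), 1))
        # N(s) ~ s^(-D), so D is minus the slope of log N(s) against log s.
        slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
        return -slope

The dimension estimates, computed over regions of interest, then serve as the texture features fed to the SVM.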
Abstract:
Skin cancer is the most common of all cancers, and the increase in its incidence is due, in part, to people's behavior regarding sun exposure. In Brazil, non-melanoma skin cancer is the most common in most regions. Dermatoscopy and videodermatoscopy are the main types of examination for diagnosing dermatological skin diseases. The field involving the use of computational tools to support or follow up medical diagnosis of dermatological lesions is very recent. Several methods have been proposed for the automatic classification of skin pathologies from images. The present work presents a new intelligent methodology for the analysis and classification of skin cancer images, based on digital image processing techniques for the extraction of color, shape, and texture features, using the Wavelet Packet Transform (WPT) and a learning technique called the Support Vector Machine (SVM). The Wavelet Packet Transform is applied to extract texture features from the images. The WPT consists of a set of basis functions that represent the image in different frequency bands, each with a distinct resolution corresponding to each scale. In addition, color features of the lesion are computed that depend on the visual context, influenced by the colors in its surroundings, and shape attributes are obtained through Fourier descriptors. The Support Vector Machine, which is based on the structural risk minimization principle from statistical learning theory, is used for the classification task. The SVM constructs optimal hyperplanes that represent the separation between classes, and the generated hyperplane is determined by a subset of the training examples, called support vectors. For the database used in this work, the results showed good performance, achieving an overall accuracy of 92.73% for melanoma and 86% for non-melanoma and benign lesions. The extracted descriptors and the SVM classifier yield a method capable of recognizing and classifying the analyzed skin lesions.
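A sketch of the texture branch alone (wavelet family, decomposition level, and the energy statistic are assumptions; the color features and Fourier shape descriptors are omitted):

    import numpy as np
    import pywt
    from sklearn.svm import SVC

    def wpt_energy_features(img, wavelet="db2", level=2):
        """Mean energy of every wavelet-packet subband at the given level."""
        wp = pywt.WaveletPacket2D(data=img, wavelet=wavelet, maxlevel=level)
        return np.array([np.mean(node.data ** 2) for node in wp.get_level(level)])

    # images: list of grayscale lesion arrays; y: lesion labels (placeholders).
    # X = np.array([wpt_energy_features(im) for im in images])
    # clf = SVC(kernel="rbf").fit(X, y)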
Abstract:
Individuals with intellectual disabilities (ID) often struggle with learning how to read. Reading difficulties seem to be the most common secondary condition of ID. Only one in five children with mild or moderate ID achieves even minimal literacy skills. However, literacy education for children and adolescents with ID has been largely overlooked by researchers and educators. While there is little research on reading in children with ID, many training studies have been conducted with other populations with reading difficulties. The most common approach to acquiring literacy skills consists of sophisticated programs that train phonological skills and auditory perception. Only a few studies have investigated the influence of implicit learning on literacy skills. Implicit learning processes seem to be largely independent of age and IQ. Children are sensitive to the statistics of their learning environment. Through frequent word reading they acquire implicit knowledge about the frequency of single letters and letter patterns in written words. Additionally, semantic connections not only improve word understanding but also facilitate the storage of words in memory. Advances in communication technology have introduced new possibilities for remediating literacy skills. Computers can provide training material in attractive ways, for example through animations and immediate feedback. These opportunities can scaffold and support the attention processes central to learning. Thus, the aim of this intervention study was to develop and implement a computer-based word-picture training, based on statistical and semantic learning, and to examine its effects on reading, spelling, and attention in children and adolescents (9-16 years) diagnosed with mental retardation (general IQ 74). Fifty children participated in four to five weekly training sessions of 15-20 minutes over 4 weeks, and completed assessments of attention, reading, spelling, short-term memory, and fluid intelligence before and after training. After a first assessment (T1), the entire sample was divided into a training group (group A) and a waiting control group (group B). After 4 weeks of training with group A, a second assessment (T2) was administered to both groups. Afterwards, group B was trained for 4 weeks before a last assessment (T3) was carried out in both groups. Overall, the results showed that the word-picture training led to substantial gains in word decoding and attention for both training groups. These effects were preserved six weeks later (group A). There was also a clear tendency toward improvement in spelling after training for both groups, although the effect did not reach significance. These findings highlight the fact that implicit statistical learning training, delivered playfully through motivating computer programs, can promote not only reading development but also attention in children with intellectual disabilities.
Abstract:
Cervical cancer is the leading cause of death and disease from malignant neoplasms among women in developing countries. Even though the Pap smear has significantly decreased the number of deaths from cervical cancer over the years, it has its limitations. Researchers have developed an automated screening machine which can potentially detect abnormal cases that are overlooked by conventional screening. The goal of quantitative cytology is to classify the patient's tissue sample based on quantitative measurements of the individual cells. It is also much cheaper and potentially faster. One of the major challenges of collecting cells with a cytobrush is the possibility of not sampling any existing dysplastic cells on the cervix. Being able to correctly classify patients who have disease without the presence of dysplastic cells could improve the accuracy of quantitative cytology algorithms. Subtle morphologic changes in normal-appearing tissues adjacent to or distant from malignant tumors have been shown to exist, but a comparison of various statistical methods, including many recent advances in the statistical learning field, has not previously been done. The objective of this thesis is to apply different classification methods to quantitative cytology data for the detection of malignancy associated changes (MACs). In this thesis, Elastic Net performed best among the algorithms compared. When applying Elastic Net to the test set, we pooled the training and validation sets into a single training set and used 5-fold cross-validation to choose the Elastic Net parameters. The resulting classifier has a sensitivity of 47% at 80% specificity, an AUC of 0.52, and a partial AUC of 0.10 (95% CI 0.09-0.11).
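One common way to instantiate this (assuming elastic-net-penalized logistic regression; the thesis's exact formulation may differ, and the data arrays are placeholders):

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    grid = {"C": [0.01, 0.1, 1.0, 10.0], "l1_ratio": [0.1, 0.5, 0.9]}
    enet = LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000)
    # 5-fold CV over the pooled training + validation data, as described above.
    search = GridSearchCV(enet, grid, cv=5, scoring="roc_auc")
    # search.fit(X_train, y_train)
    # probs = search.best_estimator_.predict_proba(X_test)[:, 1]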
Abstract:
This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from the biomedical literature, in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from unannotated data in a constrained way while still capturing the underlying named-entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model, in which non-local features derived from the PPI ontology through inference can easily be incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
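The HVS model itself is not available in standard toolkits, but the target representation, a conditionally trained undirected graphical model that accepts arbitrary (including non-local, ontology-derived) features, can be illustrated with a linear-chain CRF via sklearn-crfsuite; every feature name and the toy sentence below are hypothetical:

    import sklearn_crfsuite

    def token_features(sent, i):
        word = sent[i]
        return {
            "lower": word.lower(),
            "looks_like_protein": word[:1].isupper(),          # crude stand-in for an NER cue
            "ontology_interaction_verb": word in {"binds", "phosphorylates"},  # ontology-derived flag
        }

    X = [[token_features(s, i) for i in range(len(s))]
         for s in [["RapA", "binds", "RecA"]]]
    y = [["PROT", "INTERACT", "PROT"]]

    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
    crf.fit(X, y)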