855 results for optimal feature selection


Relevance:

80.00%

Publisher:

Abstract:

Environmental monitoring has become increasingly important due to the significant impact of human activities and climate change on biodiversity. Environmental sound sources such as rain and insect vocalizations are a rich and underexploited source of information in environmental audio recordings. This paper is concerned with the classification of rain within acoustic sensor recordings. We present the novel application of a set of features for classifying environmental acoustics: acoustic entropy, the acoustic complexity index, spectral cover, and background noise. In order to improve the performance of the rain classification system we automatically classify segments of environmental recordings into the classes of heavy rain or non-rain. A decision tree classifier is experimentally compared with other classifiers. The experimental results show that our system is effective in classifying segments of environmental audio recordings with an accuracy of 93% for the binary classification of heavy rain/non-rain.

Relevance:

80.00%

Publisher:

Abstract:

Facial identity and facial expression matching tasks were completed by 5–12-year-old children and adults using stimuli extracted from the same set of normalized faces. Configural and feature processing were examined using speed and accuracy of responding and facial feature selection, respectively. Facial identity matching was slower than face expression matching for all age groups. Large age effects were found on both speed and accuracy of responding and feature use in both identity and expression matching tasks. Eye region preference was found on the facial identity task and mouth region preference on the facial expression task. Use of mouth region information for facial expression matching increased with age, whereas use of eye region information for facial identity matching peaked early. The feature use information suggests that the specific use of primary facial features to arrive at identity and emotion matching judgments matures across middle childhood.

Relevance:

80.00%

Publisher:

Abstract:

We present a systematic, practical approach to developing risk prediction systems, suitable for use with large databases of medical information. An important part of this approach is a novel feature selection algorithm which uses the area under the receiver operating characteristic (ROC) curve to measure the expected discriminative power of different sets of predictor variables. We describe this algorithm and use it to select variables to predict risk of a specific adverse pregnancy outcome: failure to progress in labour. Neural network, logistic regression and hierarchical Bayesian risk prediction models are constructed, all of which achieve close to the limit of performance attainable on this prediction task. We show that better prediction performance requires more discriminative clinical information rather than improved modelling techniques. It is also shown that better diagnostic criteria in clinical records would greatly assist the development of systems to predict risk in pregnancy.
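
As an illustration of the quantity this abstract relies on, the area under the ROC curve can be computed directly as the probability that a randomly chosen adverse case receives a higher score than a randomly chosen benign case, with ties counted as one half. The sketch below is a minimal version under that standard definition; the function name `auc` and the toy scores are hypothetical and do not come from the paper, and the paper's own selection algorithm is not reproduced here.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores_pos: scores assigned to positive (adverse) cases.
    scores_neg: scores assigned to negative (benign) cases.
    Both lists are assumed to be non-empty.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores, for illustration only.
print(auc([0.9, 0.8], [0.1, 0.2]))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two outcome groups.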

Relevance:

80.00%

Publisher:

Abstract:

Only some of the information contained in a medical record will be useful to the prediction of patient outcome. We describe a novel method for selecting those outcome predictors which allow us to reliably discriminate between adverse and benign end results. Using the area under the receiver operating characteristic as a nonparametric measure of discrimination, we show how to calculate the maximum discrimination attainable with a given set of discrete valued features. This upper limit forms the basis of our feature selection algorithm. We use the algorithm to select features (from maternity records) relevant to the prediction of failure to progress in labour. The results of this analysis motivate investigation of those predictors of failure to progress relevant to parous and nulliparous sub-populations.
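
One way to read the "maximum discrimination attainable" idea is sketched below. With discrete valued features, every case falls into a cell defined by its combination of feature values, and no scoring rule that sees only those features can rank cases within a cell. Ordering the cells by their observed proportion of adverse cases then gives the largest ROC area any such rule can reach on the data at hand. This is a sketch under that assumption, not the authors' published procedure; `max_auc` and the toy records are hypothetical.

```python
from collections import Counter


def max_auc(rows, labels):
    """Upper limit on ROC area for scores that depend only on `rows`.

    rows:   sequence of tuples of discrete feature values.
    labels: sequence of 0/1 outcomes (1 = adverse), same length as rows.
    Assumes at least one positive and one negative case.
    """
    pos, neg = Counter(), Counter()
    for row, y in zip(rows, labels):
        (pos if y else neg)[tuple(row)] += 1

    # Rank cells from lowest to highest observed proportion of positives.
    cells = sorted(set(pos) | set(neg),
                   key=lambda c: pos[c] / (pos[c] + neg[c]))

    wins = 0.0
    neg_below = 0  # negatives in cells ranked strictly lower so far
    for c in cells:
        # Positives in this cell beat all negatives ranked below and
        # tie (half credit) with the negatives sharing the same cell.
        wins += pos[c] * (neg_below + 0.5 * neg[c])
        neg_below += neg[c]
    return wins / (sum(pos.values()) * sum(neg.values()))


# Hypothetical single-feature records, for illustration only.
print(max_auc([(0,), (1,), (1,)], [0, 1, 0]))
```

Comparing this limit across candidate feature subsets indicates which features can contribute to discrimination at all, before any model is fitted.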

Relevance:

80.00%

Publisher:

Abstract:

Selection of features that will permit accurate pattern classification is a difficult task. However, if a particular data set is represented by discrete valued features, it becomes possible to determine empirically the contribution that each feature makes to the discrimination between classes. This paper extends the discrimination bound method so that both the maximum and average discrimination expected on unseen test data can be estimated. These estimation techniques are the basis of a backwards elimination algorithm that can be used to rank features in order of their discriminative power. Two problems are used to demonstrate this feature selection process: classification of the Mushroom Database, and a real-world, pregnancy related medical risk prediction task - assessment of risk of perinatal death.
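
The backwards elimination ranking mentioned in this abstract can be sketched generically: starting from all features, repeatedly drop the one whose removal costs the least according to some discrimination estimate, and read the ranking off the removal order. The sketch below assumes a caller-supplied `score` function over feature subsets; that function, the name `backward_elimination`, and the toy weights are hypothetical stand-ins, not the estimators described in the paper.

```python
def backward_elimination(features, score):
    """Rank features from most to least discriminative.

    features: iterable of feature names.
    score:    function mapping a list of feature names to a number,
              higher meaning better discrimination (assumed supplied).
    """
    remaining = list(features)
    removed = []
    while len(remaining) > 1:
        # The least useful feature is the one whose removal leaves
        # the highest score for the remaining subset.
        victim = max(
            remaining,
            key=lambda f: score([g for g in remaining if g != f]),
        )
        remaining.remove(victim)
        removed.append(victim)
    removed.extend(remaining)      # the last survivor ranks first
    return removed[::-1]


# Toy additive score, for illustration only.
weights = {"a": 3, "b": 1, "c": 2}
print(backward_elimination(weights, lambda subset: sum(weights[f] for f in subset)))
```

Each pass evaluates one candidate subset per remaining feature, so the whole ranking needs on the order of the square of the feature count in score evaluations.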

Relevance:

80.00%

Publisher:

Abstract:

Lateralization of temporal lobe epilepsy (TLE) is critical for successful outcome of surgery to relieve seizures. TLE affects brain regions beyond the temporal lobes and has been associated with aberrant brain networks, based on evidence from functional magnetic resonance imaging. We present here a machine learning-based method for determining the laterality of TLE, using features extracted from resting-state functional connectivity of the brain. A comprehensive feature space was constructed to include network properties within local brain regions, between brain regions, and across the whole network. Feature selection was performed based on random forest and a support vector machine was employed to train a linear model to predict the laterality of TLE on unseen patients. A leave-one-patient-out cross validation was carried out on 12 patients and a prediction accuracy of 83% was achieved. The importance of selected features was analyzed to demonstrate the contribution of resting-state connectivity attributes at voxel, region, and network levels to TLE lateralization.

Relevance:

80.00%

Publisher:

Abstract:

With the explosion of information resources, there is an imminent need to understand interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance in two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

Relevance:

80.00%

Publisher:

Abstract:

The use of remote sensing imagery as auxiliary data in forest inventory is based on the correlation between features extracted from the images and the ground truth. The bidirectional reflectance and radial displacement cause variation in image features located in different segments of the image even when forest characteristics remain the same. The variation has so far been diminished by different radiometric corrections. In this study the use of sun azimuth based converted image co-ordinates was examined to supplement auxiliary data extracted from digitised aerial photographs. The method was considered as an alternative for radiometric corrections. Additionally, the usefulness of multi-image interpretation of digitised aerial photographs in regression estimation of forest characteristics was studied. The state-owned study area was located in Leivonmäki, Central Finland, and the study material consisted of five digitised and ortho-rectified colour-infrared (CIR) aerial photographs and field measurements of 388 plots, out of which 194 were relascope (Bitterlich) plots and 194 were concentric circular plots. Both the image data and the field measurements were from the year 1999. When examining the effect of the location of the image point on pixel values and texture features of Finnish forest plots in digitised CIR photographs, the clearest differences were found between front- and back-lighted image halves. Inside the image half the differences between different blocks were clearly bigger on the front-lighted half than on the back-lighted half. The strength of the phenomenon varied by forest category. The differences between pixel values extracted from different image blocks were greatest in developed and mature stands and smallest in young stands. The differences between texture features were greatest in developing stands and smallest in young and mature stands.
The logarithm of timber volume per hectare and the angular transformation of the proportion of broadleaved trees of the total volume were used as dependent variables in regression models. Five different converted image co-ordinates based trend surfaces were used in models in order to diminish the effect of the bidirectional reflectance. The reference model of total volume, in which the location of the image point had been ignored, resulted in RMSE of 1.268 calculated from test material. The best of the trend surfaces was the complete third order surface, which resulted in RMSE of 1.107. The reference model of the proportion of broadleaved trees resulted in RMSE of 0.4292 and the second order trend surface was the best, resulting in RMSE of 0.4270. The trend surface method is applicable, but it has to be applied by forest category and by variable. The usefulness of multi-image interpretation of digitised aerial photographs was studied by building comparable regression models using either the front-lighted image features, back-lighted image features or both. The two-image model turned out to be slightly better than the one-image models in total volume estimation. The best one-image model resulted in RMSE of 1.098 and the two-image model resulted in RMSE of 1.090. The homologous features did not improve the models of the proportion of broadleaved trees. The overall result gives motivation for further research of multi-image interpretation. The focus may be improving regression estimation and feature selection or examination of stratification used in two-phase sampling inventory techniques.
Keywords: forest inventory, digitised aerial photograph, bidirectional reflectance, converted image co-ordinates, regression estimation, multi-image interpretation, pixel value, texture, trend surface

Relevance:

80.00%

Publisher:

Abstract:

The paradigm of computational vision hypothesizes that any visual function -- such as the recognition of your grandparent -- can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, in which we attempt to learn the suitable computations from the natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and the constraints and objectives specified for the learning process. This thesis consists of an introduction and 7 peer-reviewed publications, where the purpose of the introduction is to illustrate the area of study to a reader who is not familiar with computational vision research. In the scope of the introduction, we will briefly overview the primary challenges to visual processing, as well as recall some of the current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology we have used in our research, and discuss the presented results. We have included some additional remarks, speculations and conclusions to this discussion that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast were processed separately in natural systems due to their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence.
Further, we provide first-time reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response energy optimization. Then, we show that attempting to extract independent components of nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable for priming, i.e., the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selectable from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence showing that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.

Relevance:

80.00%

Publisher:

Abstract:

A new automata model Mr,k, with a conceptually significant innovation in the form of multi-state alternatives at each instance, is proposed in this study. Computer simulations of the Mr,k model in the context of feature selection in an unsupervised environment have demonstrated the superiority of the model over similar models without this multi-state-choice innovation.

Relevance:

80.00%

Publisher:

Abstract:

Early detection of (pre-)signs of ulceration on a diabetic foot is valuable for clinical practice. Hyperspectral imaging is a promising technique for detection and classification of such (pre-)signs. However, the number of spectral bands should be limited to avoid overfitting, which is critical for pixel classification with hyperspectral image data. The goal was to design a detector/classifier based on spectral imaging (SI) with a small number of optical bandpass filters. The performance and stability of the design were also investigated. The selection of the bandpass filters boils down to a feature selection problem. A dataset was built, containing reflectance spectra of 227 skin spots from 64 patients, measured with a spectrometer. Each skin spot was annotated manually by clinicians as "healthy" or a specific (pre-)sign of ulceration. Statistical analysis on the data set showed the number of required filters is between 3 and 7, depending on additional constraints on the filter set. The stability analysis revealed that shot noise was the most critical factor affecting the classification performance. It indicated that this impact could be avoided in future SI systems with a camera sensor whose saturation level is higher than 10^6, or by postimage processing.

Relevance:

80.00%

Publisher:

Abstract:

This article addresses the problem of how to select the optimal combination of sensors and how to determine their optimal placement in a surveillance region in order to meet the given performance requirements at a minimal cost for a multimedia surveillance system. We propose to solve this problem by obtaining a performance vector, with its elements representing the performances of subtasks, for a given input combination of sensors and their placement. Then we show that the optimal sensor selection problem can be converted into an Integer Linear Programming (ILP) problem by using a linear model for computing the optimal performance vector corresponding to a sensor combination. The optimal performance vector corresponding to a sensor combination is the performance vector obtained under the optimal placement of that combination. To demonstrate the utility of our technique, we design and build a surveillance system consisting of PTZ (Pan-Tilt-Zoom) cameras and active motion sensors for capturing faces. Finally, we show experimentally that optimal placement of sensors based on the design maximizes the system performance.

Relevance:

80.00%

Publisher:

Abstract:

We propose a simple and energy efficient distributed change detection scheme for sensor networks based on Page's parametric CUSUM algorithm. The sensor observations are IID over time and across the sensors conditioned on the change variable. Each sensor runs CUSUM and transmits only when the CUSUM is above some threshold. The transmissions from the sensors are fused at the physical layer. The channel is modeled as a multiple access channel (MAC) corrupted with IID noise. The fusion center, which is the global decision maker, performs another CUSUM to detect the change. We provide the analysis and simulation results for our scheme and compare the performance with an existing scheme which ensures energy efficiency via optimal power selection.
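
For readers unfamiliar with the CUSUM recursion that each sensor runs, a minimal single-sensor sketch follows. It accumulates log-likelihood ratios, resets to zero whenever the sum goes negative, and raises an alarm once the statistic exceeds a threshold. The Gaussian mean-shift model, the function name `cusum_alarm`, and all parameter values are assumptions made for illustration; the abstract does not specify the observation distributions, and the fusion over the multiple access channel is not modelled here.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Page's CUSUM for a Gaussian mean shift from mu0 to mu1.

    Returns the index of the first sample at which the statistic
    exceeds `threshold`, or None if no alarm is raised.
    """
    stat = 0.0
    for k, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # Clamp at zero so evidence against a change does not accumulate.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return k
    return None


# Hypothetical observations with a change after the third sample.
print(cusum_alarm([0, 0, 0, 1, 1, 1, 1], mu0=0.0, mu1=1.0, sigma=1.0, threshold=1.2))
```

In the scheme described above, a sensor would transmit only while its local statistic is above the threshold, which is what keeps the transmissions, and hence the energy use, low before a change occurs.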

Relevance:

80.00%

Publisher:

Abstract:

Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well known classifiers, namely, Nearest neighbor, Linear discriminant function, SVMs and artificial neural networks have been considered and efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
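
A horizontal projection profile is simply the amount of ink in each pixel row of a block. The sketch below computes it for a binarized image held as a list of rows. Dividing each row sum by the row width is only one possible reading of "length-normalized" and is an assumption here, as are the function name and the toy image; the eigen and Fisher profiles are not shown.

```python
def horizontal_projection_profile(image):
    """Fraction of ink pixels in each row of a binary image.

    image: list of equal-length rows, with 1 for ink and 0 for background.
    Dividing by the row width is an assumed normalization.
    """
    return [sum(row) / len(row) for row in image]


# Toy block: two "text lines" separated by blank rows.
block = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
print(horizontal_projection_profile(block))
```

Printed text tends to produce a profile that alternates regularly between inked rows and blank inter-line gaps, which is the regularity the abstract relies on to separate it from signatures and logos.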

Relevance:

80.00%

Publisher:

Abstract:

Homomorphic analysis and pole-zero modeling of electrocardiogram (ECG) signals are presented in this paper. Four typical ECG signals are considered and deconvolved into their minimum and maximum phase components through cepstral filtering, with a view to studying the possibility of more efficient feature selection from the component signals for diagnostic purposes. The complex cepstra of the signals are linearly filtered to extract the basic wavelet and the excitation function. The ECG signals are, in general, mixed phase and hence, exponential weighting is done to aid deconvolution of the signals. The basic wavelet for normal ECG approximates the action potential of the muscle fiber of the heart and the excitation function corresponds to the excitation pattern of the heart muscles during a cardiac cycle. The ECG signals and their components are pole-zero modeled and the pole-zero pattern of the models can give a clue to classify the normal and abnormal signals. Besides, storing only the parameters of the model can result in a data reduction of more than 3:1 for normal signals sampled at a moderate 128 samples/s.