874 results for correlation-based feature selection
Abstract:
The papermaking industry has been continuously developing intelligent solutions to characterize the raw materials it uses, to control the manufacturing process in a robust way, and to guarantee the desired quality of the end product. Based on the much-improved imaging techniques and image-based analysis methods, it has become possible to look inside the manufacturing pipeline and propose more effective alternatives to human expertise. This study is focused on the development of image analysis methods for the pulping process of papermaking. Pulping starts with wood disintegration and forming the fiber suspension that is subsequently bleached, mixed with additives and chemicals, and finally dried and shipped to the papermaking mills. At each stage of the process it is important to analyze the properties of the raw material to guarantee the product quality. In order to evaluate properties of fibers, the main component of the pulp suspension, a framework for fiber characterization based on microscopic images is proposed in this thesis as the first contribution. The framework allows computation of fiber length and curl index correlating well with the ground truth values. The bubble detection method, the second contribution, was developed in order to estimate the gas volume at the delignification stage of the pulping process based on high-resolution in-line imaging. The gas volume was estimated accurately and the solution enabled just-in-time process termination, whereas the accurate estimation of bubble size categories still remained challenging. As the third contribution of the study, optical flow computation was studied and the methods were successfully applied to pulp flow velocity estimation based on double-exposed images.
Finally, a framework for classifying dirt particles in dried pulp sheets, including the semisynthetic ground truth generation, feature selection, and performance comparison of the state-of-the-art classification techniques, was proposed as the fourth contribution. The framework was successfully tested on the semisynthetic and real-world pulp sheet images. These four contributions assist in developing an integrated factory-level vision-based process control.
Abstract:
Electricity price forecasting has become an important area of research in the aftermath of the worldwide deregulation of the power industry that launched competitive electricity markets now embracing all market participants including generation and retail companies, transmission network providers, and market managers. Based on the needs of the market, a variety of approaches forecasting day-ahead electricity prices have been proposed over the last decades. However, most of the existing approaches are reasonably effective for normal range prices but disregard price spike events, which are caused by a number of complex factors and occur during periods of market stress. In the early research, price spikes were truncated before application of the forecasting model to reduce the influence of such observations on the estimation of the model parameters; otherwise, a very large forecast error would be generated on price spike occasions. Electricity price spikes, however, matter to energy market participants who want to stay competitive in a market. Accurate price spike forecasting is important for generation companies, which must bid strategically into the market and manage their assets optimally; for retail companies, which cannot pass the spikes on to final customers; and for market managers, who must provide better management and planning for the energy market. This doctoral thesis aims at deriving a methodology able to accurately predict not only the day-ahead electricity prices within the normal range but also the price spikes. The Finnish day-ahead energy market of Nord Pool Spot is selected as the case market, and its structure is studied in detail. It is almost universally agreed in the forecasting literature that no single method is best in every situation. Since real-world problems are often complex in nature, no single model is able to capture different patterns equally well.
Therefore, a hybrid methodology that enhances the modeling capabilities appears to be a possibly productive strategy for practical use when electricity prices are predicted. The price forecasting methodology is proposed through a hybrid model applied to the price forecasting in the Finnish day-ahead energy market. The iterative search procedure employed within the methodology is developed to tune the model parameters and select the optimal input set of the explanatory variables. The numerical studies show that the proposed methodology is more accurate than all other examined methods most recently applied to case studies of energy markets in different countries. The obtained results provide extensive and useful information for participants of the day-ahead energy market, who have limited and uncertain information for price prediction when setting up an optimal short-term operation portfolio. Although the focus of this work is primarily on the Finnish price area of Nord Pool Spot, the results suggest that the same methodology is very likely to give good results when forecasting prices on the energy markets of other countries.
Abstract:
Documents published by companies, such as press releases, contain a wealth of information about the companies' various activities. They are a valuable source for business intelligence analyses. However, given their large volume, tools must be developed to exploit this source automatically. This thesis describes work on one aspect of business intelligence, namely the detection of business relations between the companies described in press releases. We propose a classification-based approach. Existing classification methods do not give satisfactory performance, notably because of two problems: representing the text by all of its words, which does not necessarily help to characterize a business relation, and class imbalance. To address the first problem, we propose a representation based on pivot words, that is, the names of the companies concerned, in order to better identify the words likely to describe them. For the second problem, we propose a two-stage classification, which proves more appropriate than traditional resampling methods. We tested our approaches on a collection of press releases from the automotive domain. Our experiments show that the proposed approaches can improve classification performance. In particular, the pivot-word document representation lets us focus on the words that are useful for detecting business relations, and the two-stage classification provides an effective solution to the class imbalance problem. This work shows that automatic detection of business relations is a feasible task. The result of this detection could be used in a business intelligence analysis.
Abstract:
Magnetic Resonance Imaging (MRI) is a multi-sequence medical imaging technique in which stacks of images are acquired with different tissue contrasts. Simultaneous observation and quantitative analysis of normal brain tissues and small abnormalities from these large numbers of different sequences is a great challenge in clinical applications. Multispectral MRI analysis can simplify the job considerably by combining an unlimited number of available co-registered sequences in a single suite. However, the poor performance of the multispectral system with conventional image classification and segmentation methods makes it inappropriate for clinical analysis. Recent works in multispectral brain MRI analysis attempted to resolve this issue with improved feature extraction approaches, such as transform-based methods, fuzzy approaches, algebraic techniques and so forth. Transform-based feature extraction methods like Independent Component Analysis (ICA) and its extensions have been effectively used in recent studies to improve the performance of multispectral brain MRI analysis. However, these global transforms were found to be inefficient and inconsistent in identifying less frequently occurring features, like small lesions, from large amounts of MR data. The present thesis focuses on improving ICA-based feature extraction techniques to enhance the performance of multispectral brain MRI analysis. Methods using spectral clustering and wavelet transforms are proposed to resolve the inefficiency of ICA in identifying small abnormalities, and problems due to ICA over-completeness. The effectiveness of the new methods in brain tissue classification and segmentation is confirmed by a detailed quantitative and qualitative analysis with synthetic and clinical, normal and abnormal, data. In comparison to conventional classification techniques, the proposed algorithms provide better performance in classification of normal brain tissues and significant small abnormalities.
Abstract:
Biometrics is an efficient technology with great possibilities in the area of security system development for official and commercial applications. Biometrics has recently become a significant part of any efficient person authentication solution. The advantage of using biometric traits is that they cannot be stolen, shared or even forgotten. The thesis addresses one of the emerging topics in authentication systems, viz., the implementation of an Improved Biometric Authentication System using Multimodal Cue Integration, as operator-assisted identification turns out to be tedious, laborious and time consuming. In order to derive the best performance for the authentication system, an appropriate feature selection criterion has been developed. It has been seen that the selection of too many features leads to deterioration in authentication performance and efficiency. In the work reported in this thesis, various judiciously chosen components of the biometric traits and their feature vectors are used for realizing the newly proposed Biometric Authentication System using Multimodal Cue Integration. The feature vectors generated from the noisy biometric traits are compared with the feature vectors available in the knowledge base, and the best-matching pattern is identified for the purpose of user authentication. In an attempt to improve the success rate of the feature-vector-based authentication system, the proposed system has been augmented with a user-dependent weighted fusion technique.
Abstract:
There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy of such systems and our understanding of them greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.
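As a point of reference for the Naive Bayes parameter estimates discussed in this abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing. It is not taken from the paper: the function names `train_nb` and `predict_nb`, the smoothing constant `alpha`, and the token-list document format are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods (hypothetical helper).

    docs is a list of token lists; labels is a parallel list of class names.
    """
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    model = {}
    for c in class_counts:
        # Add-alpha smoothing: every vocabulary word gets alpha pseudo-counts.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(labels)),
            "log_like": {
                w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
            },
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest posterior log score.

    Words outside the training vocabulary are ignored.
    """
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(
            m["log_like"][w] for w in doc if w in m["log_like"]
        )
    return max(model, key=score)
```

The independence assumption shows up in `score`, where per-word log likelihoods are simply summed; the quality of those per-word estimates is what the abstract refers to as parameter estimation.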
Abstract:
This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features such as moments extracted from orthogonal view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and modeling of feature distributions, is applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods is investigated for its advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.
Abstract:
Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor, based on a set of oriented Gaussian derivative filters, is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image, and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.
Abstract:
Mosaics have been commonly used as visual maps for undersea exploration and navigation. The position and orientation of an underwater vehicle can be calculated by integrating the apparent motion of the images which form the mosaic. A feature-based mosaicking method is proposed in this paper. The creation of the mosaic is accomplished in four stages: feature selection and matching, detection of points describing the dominant motion, homography computation and mosaic construction. In this work we demonstrate that the use of color and textures as discriminative properties of the image can improve, to a large extent, the accuracy of the constructed mosaic. The system is able to provide 3D metric information concerning the vehicle motion using the knowledge of the intrinsic parameters of the camera while integrating the measurements of an ultrasonic sensor. The method has been tested experimentally with real images on the GARBI underwater vehicle.
Abstract:
We've developed a new ambient occlusion technique based on an information-theoretic framework. Essentially, our method computes a weighted visibility from each object polygon to all viewpoints; we then use these visibility values to obtain the information associated with each polygon. So, just as a viewpoint has information about the model's polygons, the polygons gather information on the viewpoints. We therefore have two measures associated with an information channel defined by the set of viewpoints as input and the object's polygons as output, or vice versa. From this polygonal information, we obtain an occlusion map that serves as a classic ambient occlusion technique. Our approach also offers additional applications, including an importance-based viewpoint-selection guide, and a means of enhancing object features and producing nonphotorealistic object visualizations.
Abstract:
The level of ab initio theory which is necessary to compute reliable values for the static and dynamic (hyper)polarizabilities of three medium-size π-conjugated organic nonlinear optical (NLO) molecules is investigated. The calculations were made possible by employing field-induced coordinates in combination with a finite field procedure. It is stated that to obtain reasonable values for the various individual contributions to the (hyper)polarizability, it is necessary to include electron correlation. Based on the results, the convergence of the usual perturbation treatment for vibrational anharmonicity was examined.
Abstract:
The Marbled Murrelet (Brachyramphus marmoratus) is a threatened alcid that nests almost exclusively in old-growth forests along the Pacific coast of North America. Nesting habitat has significant economic importance. Murrelet nests are extremely difficult and costly to find, which adds uncertainty to management and conservation planning. Models based on air photo interpretation of forest cover maps or assessments by low-level helicopter flights are currently used to rank presumed Marbled Murrelet nesting habitat quality in British Columbia. These rankings are assumed to correlate with nest usage and murrelet breeding productivity. Our goal was to find the models that best predict Marbled Murrelet nesting habitat in the ground-accessible portion of the two regions studied. We generated Resource Selection Functions (RSF) using logistic regression models of ground-based forest stand variables gathered at plots around 64 nests, located using radio-telemetry, versus 82 random habitat plots. The RSF scores are proportional to the probability of nests occurring in a forest patch. The best models differed somewhat between the two regions, but include both ground variables at the patch scale (0.2-2.0 ha), such as platform tree density, height and trunk diameter of canopy trees and canopy complexity, and landscape scale variables such as elevation, aspect, and slope. Collecting ground-based habitat selection data would not be cost-effective for widespread use in forestry management; air photo interpretation and low-level aerial surveys are much more efficient methods for ranking habitat suitability on a landscape scale. This study provides one method for ground-truthing the remote methods, an essential step made possible using the numerical RSF scores generated herein.
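To make the link between logistic regression and RSF scores concrete, the sketch below fits a logistic model to used-versus-random plots and then evaluates the exponential form w(x) = exp(β·x), which drops the intercept and is therefore only a relative score. This is an illustration under stated assumptions, not the study's fitting procedure: the plain gradient-ascent fitter, the function names, the learning rate, and the epoch count are all hypothetical, and no data from the study is used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient ascent (assumed fitter).

    X is a list of feature lists (one per plot); y holds 1 for nest plots
    and 0 for random plots. Returns [intercept, coef_1, ..., coef_k].
    """
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        # Average the log-likelihood gradient over plots and step uphill.
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def rsf_score(beta, x):
    """Relative RSF score exp(beta . x), with the intercept beta[0] dropped."""
    return math.exp(sum(b * v for b, v in zip(beta[1:], x)))
```

Because the intercept is dropped, only ratios of scores between forest patches are meaningful, which matches the abstract's statement that the scores are proportional to the probability of nests occurring.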
Abstract:
This paper presents a new face verification algorithm based on Gabor wavelets and AdaBoost. In the algorithm, faces are represented by Gabor wavelet features generated by Gabor wavelet transform. Gabor wavelets with 5 scales and 8 orientations are chosen to form a family of Gabor wavelets. By convolving face images with these 40 Gabor wavelets, the original images are transformed into magnitude response images of Gabor wavelet features. The AdaBoost algorithm selects a small set of significant features from the pool of the Gabor wavelet features. Each feature is the basis for a weak classifier which is trained with face images taken from the XM2VTS database. The feature with the lowest classification error is selected in each iteration of the AdaBoost operation. We also address issues regarding computational costs in feature selection with AdaBoost. A support vector machine (SVM) is trained with examples of 20 features, and the results have shown a low false positive rate and a low classification error rate in face verification.
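The 5-scale by 8-orientation filter family described above can be sketched as follows. The abstract does not give the kernel parameters, so this example assumes a parameterization that is common in the face recognition literature (σ = 2π, k_max = π/2, spacing factor √2) and a 9×9 kernel size; those values, and the function names, are assumptions rather than the paper's settings.

```python
import cmath
import math


def gabor_kernel(scale, orientation, size=9, sigma=2 * math.pi,
                 k_max=math.pi / 2, f=math.sqrt(2), n_orient=8):
    """Complex Gabor kernel as a size x size nested list (assumed parameters)."""
    k = k_max / f ** scale                    # radial frequency for this scale
    phi = math.pi * orientation / n_orient    # orientation angle
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            envelope = (k * k / sigma ** 2) * math.exp(
                -k * k * (x * x + y * y) / (2 * sigma ** 2))
            # Subtracting exp(-sigma^2 / 2) removes the DC component.
            carrier = cmath.exp(1j * (kx * x + ky * y)) - math.exp(-sigma ** 2 / 2)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(n_scales=5, n_orient=8, size=9):
    """All n_scales * n_orient kernels, ordered scale-major."""
    return [gabor_kernel(v, u, size=size, n_orient=n_orient)
            for v in range(n_scales) for u in range(n_orient)]


def magnitude_response(image, kernel, row, col):
    """Magnitude of the convolution at one pixel, zero-padded at the borders."""
    half = len(kernel) // 2
    acc = 0j
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            r, c = row + dy, col + dx
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                acc += image[r][c] * kernel[half - dy][half - dx]
    return abs(acc)
```

Evaluating `magnitude_response` at every pixel for each of the 40 kernels yields the magnitude response images from which a boosting procedure could then pick individual features.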
Abstract:
A new robust neurofuzzy model construction algorithm has been introduced for the modeling of a priori unknown dynamical systems from observed finite data sets in the form of a set of fuzzy rules. Based on a Takagi-Sugeno (T-S) inference mechanism a one to one mapping between a fuzzy rule base and a model matrix feature subspace is established. This link enables rule based knowledge to be extracted from matrix subspace to enhance model transparency. In order to achieve maximized model robustness and sparsity, a new robust extended Gram-Schmidt (G-S) method has been introduced via two effective and complementary approaches of regularization and D-optimality experimental design. Model rule bases are decomposed into orthogonal subspaces, so as to enhance model transparency with the capability of interpreting the derived rule base energy level. A locally regularized orthogonal least squares algorithm, combined with a D-optimality used for subspace based rule selection, has been extended for fuzzy rule regularization and subspace based information extraction. By using a weighting for the D-optimality cost function, the entire model construction procedure becomes automatic. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.
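For orientation, the sketch below shows plain forward orthogonal least squares selection with classical Gram-Schmidt orthogonalization, ranking candidate regressors by their error reduction ratio (ERR). It deliberately omits the local regularization and the D-optimality term that the abstract describes, so it should be read as the basic building block only; the function names and the degeneracy threshold `1e-12` are assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def forward_ols(columns, y, n_select):
    """Greedy forward selection of regressors by error reduction ratio.

    columns is a list of candidate regressor vectors; y is the target.
    Returns (selected indices, ERR value of each selected regressor).
    """
    selected, basis, errs = [], [], []
    yy = dot(y, y)
    remaining = list(range(len(columns)))
    for _ in range(n_select):
        best = None
        for i in remaining:
            # Orthogonalize candidate i against the already chosen basis.
            w = list(columns[i])
            for q in basis:
                coef = dot(q, columns[i]) / dot(q, q)
                w = [wv - coef * qv for wv, qv in zip(w, q)]
            ww = dot(w, w)
            if ww < 1e-12:  # candidate is (numerically) in the chosen span
                continue
            err = dot(w, y) ** 2 / (ww * yy)
            if best is None or err > best[0]:
                best = (err, i, w)
        if best is None:
            break
        errs.append(best[0])
        selected.append(best[1])
        basis.append(best[2])
        remaining.remove(best[1])
    return selected, errs
```

Each ERR value is the fraction of the target's energy explained by the newly added orthogonal direction, which is the sense in which orthogonal subspaces can be given an energy-level interpretation.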
Abstract:
A new man-made target tracking algorithm integrating features from FLIR (Forward Looking InfraRed) image sequences is presented, based on a particle filter. First, a multiscale fractal feature (MFF) is used to enhance targets in FLIR images. Second, the gray space feature is defined by the Bhattacharyya distance between the intensity histograms of the reference target and a sample target from the MFF image. Third, the motion feature is obtained by differencing two MFF images. Fourth, a fusion coefficient for integrating the features is obtained automatically by an online feature selection method based on fuzzy logic. Finally, a particle filtering framework is developed to perform the target tracking. Experimental results show that the proposed algorithm can accurately track weak or small man-made targets in FLIR images with complicated backgrounds. The algorithm is effective, robust, and suitable for real-time tracking.
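The histogram comparison in the second step can be illustrated with a short sketch. The abstract does not say which form of the Bhattacharyya distance is used; this example assumes the form d = sqrt(1 − BC) that is common in histogram-based tracking, where BC is the Bhattacharyya coefficient. The bin count, the 8-bit intensity range, and the function names are likewise assumptions.

```python
import math


def normalized_histogram(pixels, bins=8, max_value=256):
    """Intensity histogram normalized to sum to 1 (assumes integer intensities)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // max_value, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - BC) between two normalized histograms.

    BC = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient; the result
    is 0 for identical histograms and 1 for non-overlapping ones.
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding above 1
```

In a particle filter, a distance like this is typically converted into a particle weight, so that candidate regions whose histogram resembles the reference target receive more weight.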