855 resultados para optimal feature selection


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Les documents publiés par des entreprises, tels les communiqués de presse, contiennent une foule d’informations sur diverses activités des entreprises. C’est une source précieuse pour des analyses en intelligence d’affaire. Cependant, il est nécessaire de développer des outils pour permettre d’exploiter cette source automatiquement, étant donné son grand volume. Ce mémoire décrit un travail qui s’inscrit dans un volet d’intelligence d’affaire, à savoir la détection de relations d’affaire entre les entreprises décrites dans des communiqués de presse. Dans ce mémoire, nous proposons une approche basée sur la classification. Les méthodes de classifications existantes ne nous permettent pas d’obtenir une performance satisfaisante. Ceci est notamment dû à deux problèmes : la représentation du texte par tous les mots, qui n’aide pas nécessairement à spécifier une relation d’affaire, et le déséquilibre entre les classes. Pour traiter le premier problème, nous proposons une approche de représentation basée sur des mots pivots c’est-à-dire les noms d’entreprises concernées, afin de mieux cerner des mots susceptibles de les décrire. Pour le deuxième problème, nous proposons une classification à deux étapes. Cette méthode s’avère plus appropriée que les méthodes traditionnelles de ré-échantillonnage. Nous avons testé nos approches sur une collection de communiqués de presse dans le domaine automobile. Nos expérimentations montrent que les approches proposées peuvent améliorer la performance de classification. Notamment, la représentation du document basée sur les mots pivots nous permet de mieux centrer sur les mots utiles pour la détection de relations d’affaire. La classification en deux étapes apporte une solution efficace au problème de déséquilibre entre les classes. Ce travail montre que la détection automatique des relations d’affaire est une tâche faisable. Le résultat de cette détection pourrait être utilisé dans une analyse d’intelligence d’affaire.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

La leucémie aiguë lymphoblastique (LAL) est le cancer pédiatrique le plus fréquent. Elle est la cause principale de mortalité liée au cancer chez les enfants due à un groupe de patient ne répondant pas au traitement. Les patients peuvent aussi souffrir de plusieurs toxicités associées à un traitement intensif de chimiothérapie. Les études en pharmacogénétique de notre groupe ont montré une corrélation tant individuelle que combinée entre les variants génétiques particuliers d’enzymes dépendantes du folate, particulièrement la dihydrofolate réductase (DHFR) ainsi que la thymidylate synthase (TS), principales cibles du méthotrexate (MTX) et le risque élevé de rechute chez les patients atteints de la LAL. En outre, des variations dans le gène ATF5 impliqué dans la régulation de l’asparagine synthetase (ASNS) sont associées à un risque plus élevé de rechute ou à une toxicité ASNase dépendante chez les patients ayant reçu de l’asparaginase d’E.coli (ASNase). Le but principal de mon projet de thèse est de comprendre davantage d’un point de vue fonctionnel, le rôle de variations génétiques dans la réponse thérapeutique chez les patients atteints de la LAL, en se concentrant sur deux composants majeurs du traitement de la LAL soit le MTX ainsi que l’ASNase. Mon objectif spécifique était d’analyser une association trouvée dans des paramètres cliniques par le biais d’essais de prolifération cellulaire de lignées cellulaires lymphoblastoïdes (LCLs, n=93) et d’un modèle murin de xénogreffe de la LAL. Une variation génétique dans le polymorphisme TS (homozygosité de l’allèle de la répétition triple 3R) ainsi que l’haplotype *1b de DHFR (défini par une combinaison particulière d’allèle dérivé de six sites polymorphiques dans le promoteur majeur et mineur de DHFR) et de leurs effets sur la sensibilité au MTX ont été évalués par le biais d’essais de prolifération cellulaire. Des essais in vitro similaires sur la réponse à l’ASNase de E. Coli ont permis d’évaluer l’effet de la variation T1562C de la région 5’UTR de ATF5 ainsi que des haplotypes particuliers du gène ASNS (définis par deux variations génétiques et arbitrairement appelés haplotype *1). Le modèle murin de xénogreffe ont été utilisé pour évaluer l’effet du génotype 3R3R du gène TS. L’analyse de polymorphismes additionnels dans le gène ASNS a révélé une diversification de l’haplotype *1 en 5 sous-types définis par deux polymorphismes (rs10486009 et rs6971012,) et corrélé avec la sensibilité in vitro à l’ASNase et l’un d’eux (rs10486009) semble particulièrement important dans la réduction de la sensibilité in vitro à l’ASNase, pouvant expliquer une sensibilité réduite de l’haplotype *1 dans des paramètres cliniques. Aucune association entre ATF5 T1562C et des essais de prolifération cellulaire en réponse à ASNase de E.Coli n’a été détectée. Nous n’avons pas détecté une association liée au génotype lors d’analyse in vitro de sensibilité au MTX. Par contre, des résultats in vivo issus de modèle murin de xénogreffe ont montré une relation entre le génotype TS 3R/3R et la résistance de manière dose-dépendante au traitement par MTX. Les résultats obtenus ont permis de fournir une explication concernant un haut risque significatif de rechute rencontré chez les patients au génotype TS 3R/3R et suggèrent que ces patients pourraient recevoir une augmentation de leur dose de MTX. À travers ces expériences, nous avons aussi démontré que les modèles murins de xénogreffe peuvent servir comme outil préclinique afin d’explorer l’option d’un traitement individualisé. En conclusion, la connaissance acquise à travers mon projet de thèse a permis de confirmer et/ou d’identifier quelques variants dans la voix d’action du MTX et de l’ASNase qui pourraient faciliter la mise en place de stratégies d’individualisation de la dose, permettant la sélection d’un traitement optimum ou moduler la thérapie basé sur la génétique individuelle.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a writer identification scheme for Malayalam documents. As the accomplishment rate of a scheme is highly dependent on the features extracted from the documents, the process of feature selection and extraction is highly relevant. The paper describes a set of novel features exclusively for Malayalam language. The features were studied in detail which resulted in a comparative study of all the features. The features are fused to form the feature vector or knowledge vector. This knowledge vector is then used in all the phases of the writer identification scheme. The scheme has been tested on a test bed of 280 writers of which 50 writers having only one page, 215 writers with at least 2 pages and 15 writers with at least 4 pages. To perform a comparative evaluation of the scheme the test is conducted using WD-LBP method also. A recognition rate of around 95% was obtained for the proposed approach

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Biometrics is an efficient technology with great possibilities in the area of security system development for official and commercial applications. The biometrics has recently become a significant part of any efficient person authentication solution. The advantage of using biometric traits is that they cannot be stolen, shared or even forgotten. The thesis addresses one of the emerging topics in Authentication System, viz., the implementation of Improved Biometric Authentication System using Multimodal Cue Integration, as the operator assisted identification turns out to be tedious, laborious and time consuming. In order to derive the best performance for the authentication system, an appropriate feature selection criteria has been evolved. It has been seen that the selection of too many features lead to the deterioration in the authentication performance and efficiency. In the work reported in this thesis, various judiciously chosen components of the biometric traits and their feature vectors are used for realizing the newly proposed Biometric Authentication System using Multimodal Cue Integration. The feature vectors so generated from the noisy biometric traits is compared with the feature vectors available in the knowledge base and the most matching pattern is identified for the purpose of user authentication. In an attempt to improve the success rate of the Feature Vector based authentication system, the proposed system has been augmented with the user dependent weighted fusion technique.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

There are numerous text documents available in electronic form. More and more are becoming available every day. Such documents represent a massive amount of information that is easily accessible. Seeking value in this huge collection requires organization; much of the work of organizing documents can be automated through text classification. The accuracy and our understanding of such systems greatly influences their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, improve the tools that are available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting basic properties and proposing ways for its extension and improvement. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem which gives an explanation for the improvements that can be found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly-used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly-used feature selection algorithm and develop a statistics-based framework for text feature selection. Greater understanding of Naive Bayes and the properties of text allows us to make better use of it in text classification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features such as moments extracted from orthogonal view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and modeling of feature distributions, are applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods are investigated for their advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A novel approach to multiclass tumor classification using Artificial Neural Networks (ANNs) was introduced in a recent paper cite{Khan2001}. The method successfully classified and diagnosed small, round blue cell tumors (SRBCTs) of childhood into four distinct categories, neuroblastoma (NB), rhabdomyosarcoma (RMS), non-Hodgkin lymphoma (NHL) and the Ewing family of tumors (EWS), using cDNA gene expression profiles of samples that included both tumor biopsy material and cell lines. We report that using an approach similar to the one reported by Yeang et al cite{Yeang2001}, i.e. multiclass classification by combining outputs of binary classifiers, we achieved equal accuracy with much fewer features. We report the performances of 3 binary classifiers (k-nearest neighbors (kNN), weighted-voting (WV), and support vector machines (SVM)) with 3 feature selection techniques (Golub's Signal to Noise (SN) ratios cite{Golub99}, Fisher scores (FSc) and Mukherjee's SVM feature selection (SVMFS))cite{Sayan98}.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Local descriptors are increasingly used for the task of object recognition because of their perceived robustness with respect to occlusions and to global geometrical deformations. Such a descriptor--based on a set of oriented Gaussian derivative filters-- is used in our recognition system. We report here an evaluation of several techniques for orientation estimation to achieve rotation invariance of the descriptor. We also describe feature selection based on a single training image. Virtual images are generated by rotating and rescaling the image and robust features are selected. The results confirm robust performance in cluttered scenes, in the presence of partial occlusions, and when the object is embedded in different backgrounds.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Mosaics have been commonly used as visual maps for undersea exploration and navigation. The position and orientation of an underwater vehicle can be calculated by integrating the apparent motion of the images which form the mosaic. A feature-based mosaicking method is proposed in this paper. The creation of the mosaic is accomplished in four stages: feature selection and matching, detection of points describing the dominant motion, homography computation and mosaic construction. In this work we demonstrate that the use of color and textures as discriminative properties of the image can improve, to a large extent, the accuracy of the constructed mosaic. The system is able to provide 3D metric information concerning the vehicle motion using the knowledge of the intrinsic parameters of the camera while integrating the measurements of an ultrasonic sensor. The experimental results of real images have been tested on the GARBI underwater vehicle

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Photo-mosaicing techniques have become popular for seafloor mapping in various marine science applications. However, the common methods cannot accurately map regions with high relief and topographical variations. Ortho-mosaicing borrowed from photogrammetry is an alternative technique that enables taking into account the 3-D shape of the terrain. A serious bottleneck is the volume of elevation information that needs to be estimated from the video data, fused, and processed for the generation of a composite ortho-photo that covers a relatively large seafloor area. We present a framework that combines the advantages of dense depth-map and 3-D feature estimation techniques based on visual motion cues. The main goal is to identify and reconstruct certain key terrain feature points that adequately represent the surface with minimal complexity in the form of piecewise planar patches. The proposed implementation utilizes local depth maps for feature selection, while tracking over several views enables 3-D reconstruction by bundle adjustment. Experimental results with synthetic and real data validate the effectiveness of the proposed approach

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a new face verification algorithm based on Gabor wavelets and AdaBoost. In the algorithm, faces are represented by Gabor wavelet features generated by Gabor wavelet transform. Gabor wavelets with 5 scales and 8 orientations are chosen to form a family of Gabor wavelets. By convolving face images with these 40 Gabor wavelets, the original images are transformed into magnitude response images of Gabor wavelet features. The AdaBoost algorithm selects a small set of significant features from the pool of the Gabor wavelet features. Each feature is the basis for a weak classifier which is trained with face images taken from the XM2VTS database. The feature with the lowest classification error is selected in each iteration of the AdaBoost operation. We also address issues regarding computational costs in feature selection with AdaBoost. A support vector machine (SVM) is trained with examples of 20 features, and the results have shown a low false positive rate and a low classification error rate in face verification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new man-made target tracking algorithm integrating features from (Forward Looking InfraRed) image sequence is presented based on particle filter. Firstly, a multiscale fractal feature is used to enhance targets in FLIR images. Secondly, the gray space feature is defined by Bhattacharyya distance between intensity histograms of the reference target and a sample target from MFF (Multi-scale Fractal Feature) image. Thirdly, the motion feature is obtained by differencing between two MFF images. Fourthly, a fusion coefficient can be automatically obtained by online feature selection method for features integrating based on fuzzy logic. Finally, a particle filtering framework is developed to fulfill the target tracking. Experimental results have shown that the proposed algorithm can accurately track weak or small man-made target in FLIR images with complicated background. The algorithm is effective, robust and satisfied to real time tracking.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Accurate single trial P300 classification lends itself to fast and accurate control of Brain Computer Interfaces (BCIs). Highly accurate classification of single trial P300 ERPs is achieved by characterizing the EEG via corresponding stationary and time-varying Wackermann parameters. Subsets of maximally discriminating parameters are then selected using the Network Clustering feature selection algorithm and classified with Naive-Bayes and Linear Discriminant Analysis classifiers. Hence the method is assessed on two different data-sets from BCI competitions and is shown to produce accuracies of between approximately 70% and 85%. This is promising for the use of Wackermann parameters as features in the classification of single-trial ERP responses.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Three coupled knowledge transfer partnerships used pattern recognition techniques to produce an e-procurement system which, the National Audit Office reports, could save the National Health Service £500 m per annum. An extension to the system, GreenInsight, allows the environmental impact of procurements to be assessed and savings made. Both systems require suitable products to be discovered and equivalent products recognised, for which classification is a key component. This paper describes the innovative work done for product classification, feature selection and reducing the impact of mislabelled data.