979 results for size-selection


Relevance:

20.00%

Publisher:

Abstract:

Classifying a large document collection involves dealing with a huge feature space in which each distinct word is a feature. In such an environment, classification is costly in terms of both running time and computing resources. Furthermore, it does not guarantee optimal results, because considering every feature for classification is likely to lead to overfitting. In this context, feature selection becomes necessary. This work analyses feature selection methods, explores the relations among them, and attempts to find a minimal subset of features that are discriminative for document classification.

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we present a methodology for identifying the best features from a large feature space. In a high-dimensional feature space, nearest neighbor search becomes meaningless, and both its quality and its performance suffer. Many data mining algorithms rely on nearest neighbor search, so instead of searching over all the features we need to select the relevant ones. We propose feature selection using Non-negative Matrix Factorization (NMF) and apply it to nearest neighbor search. A recent clustering algorithm based on Locally Consistent Concept Factorization (LCCF) achieves better document clustering quality by using the local geometrical and discriminating structure of the data. Using our feature selection method, we show a further improvement in clustering performance.
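The abstract does not say how features are scored once the factorization is computed, so the following Python sketch is only one plausible reading. It factorizes a non-negative document-by-feature matrix V into W and H with the standard multiplicative update rules, then ranks each feature by its strongest loading on any latent component. The helper names (`nmf`, `top_features`), the scoring rule, and the toy matrix are assumptions made for illustration, not the authors' method.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate v (n x m, non-negative) as w (n x k) times h (k x m)
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def top_features(w, h, n_keep):
    """Hypothetical scoring rule: a feature's score is its largest loading
    in h, after moving each component's scale (column sum of w) into h."""
    scale = [sum(col) for col in zip(*w)]
    scores = [max(scale[i] * h[i][j] for i in range(len(h)))
              for j in range(len(h[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Toy matrix: 4 documents, 3 features; feature 2 never occurs.
V = [[3, 0, 0],
     [2, 1, 0],
     [0, 4, 0],
     [1, 3, 0]]
w, h = nmf(V, k=2)
selected = top_features(w, h, n_keep=2)
```

A nearest neighbor search would then be run on the columns listed in `selected` rather than on the full feature space. The choice of `k` and `n_keep` is left to the caller here.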

Relevance:

20.00%

Publisher:

Abstract:

Reports of alloys formed from immiscible atoms confined in nano-sized systems have initiated several research activities in recent years. Bridging the miscibility gap at the nanoscale is significant because it has the potential to produce novel alloy materials with useful technological applications. Although the literature contains a noticeable number of reports on the formation of solid solutions between bulk-immiscible atoms, several issues related to phase stability and microstructure remain unaddressed. This article discusses some of these issues using examples from the work of the author's research group on isolated nanoparticles of bulk-immiscible binary systems such as Ag-Ni, Ag-Fe and Ag-Co.

Relevance:

20.00%

Publisher:

Abstract:

Outlier detection in high-dimensional categorical data has attracted much interest because qualitative features are used extensively to describe data across many application areas. Although there are various established methods for handling dimensionality through feature selection on numerical data, the categorical domain is still being actively explored. Because outlier detection is generally considered an unsupervised learning problem, owing to the lack of knowledge about the nature of the various types of outliers, the related feature selection task also needs to be handled in an unsupervised manner. This motivates the development of an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. To address this, we propose a novel feature selection algorithm based on the mutual information measure and entropy computation. Redundancy among the features is characterized using the mutual information measure in order to identify a suitable feature subset with less redundancy. A comparison with information gain based feature selection shows the effectiveness of the proposed algorithm for outlier detection. Its efficacy is demonstrated on various high-dimensional benchmark data sets using two existing outlier detection methods.
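As a rough illustration of the two quantities this abstract names, the following Python sketch computes the entropy of each categorical feature and the mutual information between pairs of features, then greedily keeps a feature only if it is not too redundant with those already kept. The function names, the ordering by entropy, and the `max_mi` threshold are assumptions made for illustration; the abstract does not specify how the proposed algorithm combines the two measures.

```python
from collections import Counter
from math import log2


def entropy(column):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(x, y):
    """Mutual information (in bits) between two categorical sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c * n) / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def select_features(columns, max_mi=0.5):
    """Greedy, unsupervised selection (hypothetical scheme): visit features
    in decreasing entropy and keep one only if its mutual information with
    every already-kept feature is at most max_mi."""
    order = sorted(columns, key=lambda name: entropy(columns[name]), reverse=True)
    kept = []
    for name in order:
        if all(mutual_information(columns[name], columns[k]) <= max_mi
               for k in kept):
            kept.append(name)
    return kept


# Toy data: "shade" is a relabelled copy of "color", so it is fully redundant.
data = {
    "color": ["r", "g", "b", "r", "g", "b", "r", "g"],
    "shade": ["R", "G", "B", "R", "G", "B", "R", "G"],
    "size": ["s", "l", "s", "l", "s", "l", "s", "l"],
}
selected = select_features(data, max_mi=0.5)
```

No class labels are used anywhere, which matches the unsupervised setting described above. The retained columns in `selected` would then be passed to whichever outlier detection method is in use.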