19 resultados para Knowledge Information Objects
Resumo:
Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
Resumo:
Structural Support Vector Machines (SSVMs) and Conditional Random Fields (CRFs) are popular discriminative methods used for classifying structured and complex objects like parse trees, image segments and part-of-speech tags. The datasets involved are very large dimensional, and the models designed using typical training algorithms for SSVMs and CRFs are non-sparse. This non-sparse nature of models results in slow inference. Thus, there is a need to devise new algorithms for sparse SSVM and CRF classifier design. Use of elastic net and L1-regularizer has already been explored for solving primal CRF and SSVM problems, respectively, to design sparse classifiers. In this work, we focus on dual elastic net regularized SSVM and CRF. By exploiting the weakly coupled structure of these convex programming problems, we propose a new sequential alternating proximal (SAP) algorithm to solve these dual problems. This algorithm works by sequentially visiting each training set example and solving a simple subproblem restricted to a small subset of variables associated with that example. Numerical experiments on various benchmark sequence labeling datasets demonstrate that the proposed algorithm scales well. Further, the classifiers designed are sparser than those designed by solving the respective primal problems and demonstrate comparable generalization performance. Thus, the proposed SAP algorithm is a useful alternative for sparse SSVM and CRF classifier design.
Resumo:
With the preponderance of multidomain proteins in eukaryotic genomes, it is essential to recognize the constituent domains and their functions. Often function involves communications across the domain interfaces, and the knowledge of the interacting sites is essential to our understanding of the structure-function relationship. Using evolutionary information extracted from homologous domains in at least two diverse domain architectures (single and multidomain), we predict the interface residues corresponding to domains from the two-domain proteins. We also use information from the three-dimensional structures of individual domains of two-domain proteins to train naive Bayes classifier model to predict the interfacial residues. Our predictions are highly accurate (approximate to 85%) and specific (approximate to 95%) to the domain-domain interfaces. This method is specific to multidomain proteins which contain domains in at least more than one protein architectural context. Using predicted residues to constrain domain-domain interaction, rigid-body docking was able to provide us with accurate full-length protein structures with correct orientation of domains. We believe that these results can be of considerable interest toward rational protein and interaction design, apart from providing us with valuable information on the nature of interactions. Proteins 2014; 82:1219-1234. (c) 2013 Wiley Periodicals, Inc.
Resumo:
Online Social Networks (OSNs) facilitate to create and spread information easily and rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling Influential Blogger (IB) based on the activities performed on one's blog documents who influences various other bloggers in Social Blog Network (SBN). After constructing a social blogging site, a SBN is analyzed with appropriate parameters to get the Influential Blog Power (IBP) of each blogger in the network and demonstrate that profiling IB is adequate and accurate. The proposed Profiling Influential Blogger (PIB) Algorithm survival rate of IB is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).