135 results for Automatic classification


Relevance: 20.00%

Abstract:

Background: Protein phosphorylation is a generic way to regulate signal transduction pathways in all kingdoms of life. In many organisms, it is achieved by the large family of Ser/Thr/Tyr protein kinases, which are traditionally classified into groups and subfamilies on the basis of the amino acid sequences of their catalytic domains. Many protein kinases are multidomain in nature, but the diversity of the accessory domains and their organization are usually not taken into account when classifying kinases into groups or subfamilies. Methodology: Here, we present an approach that considers the amino acid sequences of complete gene products in order to suggest refinements in sets of pre-classified sequences. The strategy is based on alignment-free similarity scores and iterative Area Under the Curve (AUC) computation. Similarity scores are computed by detecting common patterns between two sequences and scoring them using a substitution matrix, with a consistent normalization scheme. This allows us to handle full-length sequences and implicitly takes into account domain diversity and domain shuffling. We quantitatively validate our approach on a subset of 212 human protein kinases. We then apply it to the complete repertoire of human protein kinases and suggest a few qualitative refinements to the subfamily assignments stored in the KinG database, which are based on catalytic domains only. Based on our new measure, we delineate 37 cases of potential hybrid kinases: sequences for which the classical classification based entirely on catalytic domains is inconsistent with the full-length similarity scores computed here, which implicitly consider the multidomain nature and regions outside the catalytic kinase domain. We also provide some examples of hybrid kinases of the protozoan parasite Entamoeba histolytica. Conclusions: The implicit consideration of multidomain architectures is a valuable addition that complements other classification schemes.
The proposed algorithm may also be employed to classify other families of enzymes with multidomain architecture.
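
The AUC step can be illustrated with a minimal sketch, assuming the full-length similarity scores have already been computed (the pattern-detection and substitution-matrix scoring is not reproduced). For one sequence, the AUC measures how well its scores rank the members of a given subfamily above all other sequences. The hypothetical helper `flag_candidates` reports sequences whose scores rank another subfamily strictly better than their assigned one; it makes a single pass rather than the iterative procedure described above, and all names are illustrative.

```python
from typing import Dict, List

# sim[a][b]: precomputed full-length similarity score (assumed symmetric)
Similarity = Dict[str, Dict[str, float]]


def auc(pos: List[float], neg: List[float]) -> float:
    """Mann-Whitney estimate of the AUC: the probability that a random
    positive score exceeds a random negative one (ties count as 1/2)."""
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def group_auc(seq: str, group: str, sim: Similarity, labels: Dict[str, str]) -> float:
    """AUC of seq's scores when the other members of `group` are the positives."""
    others = [o for o in labels if o != seq]
    pos = [sim[seq][o] for o in others if labels[o] == group]
    neg = [sim[seq][o] for o in others if labels[o] != group]
    return auc(pos, neg)


def flag_candidates(sim: Similarity, labels: Dict[str, str]) -> Dict[str, str]:
    """Map each sequence whose scores rank another group strictly better
    than its assigned group to that better-ranked group (single pass)."""
    flagged: Dict[str, str] = {}
    groups = sorted(set(labels.values()))
    for seq, own in labels.items():
        others = [o for o in labels if o != seq]
        scores: Dict[str, float] = {}
        for g in groups:
            n_pos = sum(1 for o in others if labels[o] == g)
            if 0 < n_pos < len(others):  # AUC needs both positives and negatives
                scores[g] = group_auc(seq, g, sim, labels)
        if not scores:
            continue
        best = max(scores, key=scores.get)
        if best != own and scores[best] > scores.get(own, 0.0):
            flagged[seq] = best
    return flagged
```

A sequence flagged this way is only a candidate "hybrid"; the abstract's quantitative validation is what gives such flags meaning.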

Relevance: 20.00%

Abstract:

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving the semi-supervised binary classification problem using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation. The use of a sparse GPR model helps make the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.
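
As a rough illustration of GP regression applied to binary semi-supervised labels, the sketch below fits a zero-mean GP with an RBF kernel to +1/-1 targets and then self-trains on the unlabeled points. It is not the authors' algorithm: it uses a full (non-sparse) GP, fixed illustrative hyperparameters `noise_var` and `length_scale` instead of estimating them, and a simple sign-based relabeling loop as a stand-in for the SVR and maximum-margin-clustering connection.

```python
import math
from typing import List

Vector = List[float]


def rbf(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two input vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_mean(X_tr, y_tr, X_te, noise_var=0.1, length_scale=1.0):
    """Posterior mean k(X_te, X_tr) (K + noise_var * I)^-1 y_tr of a zero-mean GP."""
    K = [[rbf(a, b, length_scale) + (noise_var if i == j else 0.0)
          for j, b in enumerate(X_tr)] for i, a in enumerate(X_tr)]
    alpha = solve(K, y_tr)
    return [sum(rbf(x, xt, length_scale) * a for xt, a in zip(X_tr, alpha)) for x in X_te]


def self_train(X_lab, y_lab, X_unl, iters=5, noise_var=0.1, length_scale=1.0):
    """Label unlabeled points by the sign of the GP mean, then refit on all
    points with those labels until the labels stop changing."""
    def sign(m: float) -> float:
        return 1.0 if m >= 0 else -1.0

    labels = [sign(m) for m in gp_mean(X_lab, y_lab, X_unl, noise_var, length_scale)]
    for _ in range(iters):
        refit = gp_mean(X_lab + X_unl, y_lab + labels, X_unl, noise_var, length_scale)
        new = [sign(m) for m in refit]
        if new == labels:
            break
        labels = new
    return labels
```

The dense solve costs O(n^3) in the number of points, which is exactly the cost a sparse GPR model is meant to avoid.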

Relevance: 20.00%

Abstract:

Equilibrium sediment volume tests are conducted on field soils to classify them by their degree of expansivity and/or to predict their liquid limit. This technical paper examines the different equilibrium sediment volume tests and critically evaluates each of them. It discusses the settling behavior of fine-grained soils during sediment formation to develop a rationale for conducting the latest version of the equilibrium sediment volume test. Probable limitations of the equilibrium sediment volume test, and possible solutions to overcome them, are also indicated.

Relevance: 20.00%

Abstract:

This paper describes a novel mimetic technique that uses a frequency-domain approach and digital filters to generate EEG reports automatically. Digitized EEG data files, transported on a cartridge, were used for the analysis. The signals are filtered into the alpha, beta, theta and delta bands with cascaded fourth-order Butterworth infinite impulse response (IIR) digital bandpass filters. The maximum amplitude, mean frequency, continuity index and degree of asymmetry are computed for each EEG frequency band. Finally, the EEG records are searched for artifacts (eye-movement or muscle artifacts).
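
Two of the band features can be sketched as follows. This is a minimal illustration, assuming conventional band edges (delta 0.5 to 4 Hz, theta 4 to 8 Hz, alpha 8 to 13 Hz, beta 13 to 30 Hz) and swapping the cascaded Butterworth IIR filters for a direct DFT band mask, so it shows the mean-frequency and asymmetry computations rather than the paper's filter design. The asymmetry index (L - R)/(L + R) on band power is likewise an assumed definition.

```python
import cmath
import math
from typing import Dict, List, Tuple

# Conventional band edges in Hz (an assumption; laboratories differ slightly).
BANDS: Dict[str, Tuple[float, float]] = {
    "delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0),
}


def power_spectrum(x: List[float], fs: float) -> List[Tuple[float, float]]:
    """One-sided (frequency, power) pairs from a direct O(n^2) DFT."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append((k * fs / n, abs(s) ** 2))
    return out


def band_spectrum(x: List[float], fs: float, band: str) -> List[Tuple[float, float]]:
    """Keep only the DFT bins whose frequency lies in [lo, hi)."""
    lo, hi = BANDS[band]
    return [(f, p) for f, p in power_spectrum(x, fs) if lo <= f < hi]


def band_power(x: List[float], fs: float, band: str) -> float:
    return sum(p for _, p in band_spectrum(x, fs, band))


def mean_frequency(x: List[float], fs: float, band: str) -> float:
    """Power-weighted mean frequency within the band."""
    spec = band_spectrum(x, fs, band)
    total = sum(p for _, p in spec)
    return sum(f * p for f, p in spec) / total if total > 0 else float("nan")


def asymmetry(left: List[float], right: List[float], fs: float, band: str) -> float:
    """(L - R) / (L + R) of band power for a homologous channel pair."""
    pl, pr = band_power(left, fs, band), band_power(right, fs, band)
    return (pl - pr) / (pl + pr)
```

For example, a 10 Hz sinusoid sampled at 128 Hz for one second has its alpha-band mean frequency at 10 Hz, and doubling the amplitude of the left channel gives an asymmetry of 0.6.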

Relevance: 20.00%

Abstract:

Several researchers have looked into various issues related to the automatic parallelization of sequential programs for multicomputers. However, there is a need for a coherent framework that encompasses all these issues. In this paper, we present such a framework, which takes the best advantage of the multicomputer architecture. We use the tiling transformation for iteration-space partitioning and propose a scheme for automatic data partitioning and dynamic data distribution. We have tried a simple implementation of our scheme on a transputer-based multicomputer [1], and the results are encouraging.
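
Tiling for iteration-space partitioning can be sketched for a 2-D loop nest: the iteration space is cut into rectangular tiles, which are then distributed over processors. The tile sizes and the cyclic tile-to-processor mapping in the hypothetical helpers below are illustrative assumptions, not the data-partitioning scheme proposed in the paper.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int, int, int]  # (i0, i1, j0, j1) with half-open bounds


def tile_2d(n_i: int, n_j: int, t_i: int, t_j: int) -> List[Tile]:
    """Partition the iteration space [0, n_i) x [0, n_j) into t_i x t_j tiles;
    edge tiles are truncated when the sizes do not divide evenly."""
    return [(i0, min(i0 + t_i, n_i), j0, min(j0 + t_j, n_j))
            for i0 in range(0, n_i, t_i)
            for j0 in range(0, n_j, t_j)]


def points(tile: Tile) -> List[Tuple[int, int]]:
    """Iterations (i, j) executed by one tile."""
    i0, i1, j0, j1 = tile
    return [(i, j) for i in range(i0, i1) for j in range(j0, j1)]


def assign_cyclic(tiles: List[Tile], n_procs: int) -> Dict[int, List[Tile]]:
    """Hand tiles to processors round-robin (an illustrative distribution)."""
    owner: Dict[int, List[Tile]] = {p: [] for p in range(n_procs)}
    for k, tile in enumerate(tiles):
        owner[k % n_procs].append(tile)
    return owner
```

Larger tiles reduce communication between processors at the cost of load balance, which is the trade-off a data-partitioning scheme has to settle.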

Relevance: 20.00%

Abstract:

Formal specification is vital to the development of distributed real-time systems, as these systems are inherently complex and safety-critical. It is widely acknowledged that formal specification and automatic analysis of specifications can significantly increase system reliability. Although a number of specification techniques for real-time systems have been reported in the literature, most of these formalisms do not adequately address the constraints that the aspects of 'distribution' and 'real-time' impose on specifications. Further, an automatic verification tool is necessary to reduce human errors in the reasoning process. In this regard, this paper is an attempt towards the development of a novel executable specification language, DL, for distributed real-time systems. First, we give a precise characterization of the syntax and semantics of DL. Subsequently, we discuss the problems of model checking, automatic verification of the satisfiability of DL specifications, and testing the conformance of event traces with DL specifications. Effective solutions to these problems are presented as extensions to the classical first-order tableau algorithm. The use of the proposed framework is illustrated by specifying a sample problem.
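
For readers unfamiliar with tableau methods, a minimal propositional tableau satisfiability check is sketched below. DL's first-order and timing features, and the paper's extensions for model checking and trace conformance, are not reproduced; formulas are encoded as nested tuples, an encoding chosen purely for illustration.

```python
from typing import List, Set, Tuple

# ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)
Formula = Tuple


def satisfiable(branch: List[Formula]) -> bool:
    """Return True if some fully expanded branch of the tableau stays open."""
    literals: Set[Tuple[str, bool]] = set()
    todo = list(branch)
    while todo:
        f = todo.pop()
        op = f[0]
        if op == "var":
            literals.add((f[1], True))
        elif op == "not":
            g = f[1]
            if g[0] == "var":
                literals.add((g[1], False))
            elif g[0] == "not":               # double negation
                todo.append(g[1])
            elif g[0] == "and":               # not(a and b) == (not a) or (not b)
                todo.append(("or", ("not", g[1]), ("not", g[2])))
            elif g[0] == "or":                # not(a or b) == (not a) and (not b)
                todo.extend([("not", g[1]), ("not", g[2])])
        elif op == "and":                     # alpha rule: both conjuncts on the branch
            todo.extend([f[1], f[2]])
        elif op == "or":                      # beta rule: split into two branches
            kept = [("var", n) if v else ("not", ("var", n)) for n, v in literals]
            rest = todo + kept
            return satisfiable(rest + [f[1]]) or satisfiable(rest + [f[2]])
        if any((n, not v) in literals for n, v in literals):
            return False                      # branch closed: p and not p
    return True
```

A formula is valid exactly when its negation has no open branch, so the same routine also serves as a (propositional) validity checker.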

Relevance: 20.00%

Abstract:

An explicit construction of all the homogeneous holomorphic Hermitian vector bundles over the unit disc D is given. It is shown that every such vector bundle is a direct sum of irreducible ones. Among these irreducible homogeneous holomorphic Hermitian vector bundles over D, the ones corresponding to operators in the Cowen-Douglas class B_n(D) are identified. The classification of homogeneous operators in B_n(D) is completed using an explicit realization of these operators. We also show how the homogeneous operators in B_n(D) split into similarity classes. (C) 2011 Elsevier Inc. All rights reserved.

Relevance: 20.00%

Abstract:

This paper studies the problem of constructing robust classifiers when the training data are plagued by uncertainty. The problem is posed as a Chance-Constrained Program (CCP), which ensures that the uncertain data points are classified correctly with high probability. Unfortunately, such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP as a convex second-order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only Chebyshev-based relaxations were exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Owing to this efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points and error measures for evaluating classifiers robust to uncertain data are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform state-of-the-art methods in many cases.
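
The Chebyshev-based relaxation that the abstract contrasts with Bernstein bounds can be written down compactly. For an uncertain point X with mean mu and covariance Sigma, the one-sided Chebyshev (Cantelli) inequality shows that y(w^T mu + b) >= 1 + kappa * sqrt(w^T Sigma w), with kappa = sqrt((1 - eps)/eps), guarantees P(y(w^T X + b) >= 1) >= 1 - eps for any distribution with those first two moments. The sketch only checks this constraint for a given (w, b); the paper's Bernstein-based relaxation and the second-order cone solver are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def chebyshev_kappa(eps: float) -> float:
    """Multiplier from the one-sided Chebyshev (Cantelli) inequality."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.sqrt((1.0 - eps) / eps)


def quad_form(w: Sequence[float], cov: List[List[float]]) -> float:
    """w^T cov w for a dense covariance matrix."""
    d = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))


def satisfies_chance_constraint(w, b, mu, cov, y, eps) -> bool:
    """True if y(w.mu + b) >= 1 + kappa * sqrt(w^T cov w), which implies
    P(y(w.X + b) >= 1) >= 1 - eps for every distribution of X with mean mu
    and covariance cov."""
    margin = y * (sum(wi * mi for wi, mi in zip(w, mu)) + b)
    return margin >= 1.0 + chebyshev_kappa(eps) * math.sqrt(quad_form(w, cov))
```

Because kappa grows like 1/sqrt(eps), this constraint becomes very conservative for small eps, which is the weakness the Bernstein bounds are said to mitigate.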

Relevance: 20.00%

Abstract:

Our ability to infer protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but their performance is limited. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point-group symmetry under a Boolean framework to detect quaternary structures in the crystal lattice. It consistently achieves ≥90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognizing heteromeric complexes, compared with 53% on the same data set for the current state-of-the-art method. A detailed study of a limited number of prediction failures offers interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for the accurate inference of the quaternary states of proteins, especially weak affinity complexes.
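
The naive Bayes component can be illustrated with a Gaussian naive Bayes classifier that labels a lattice contact as biological or crystal. The two features (interface area and hydrogen-bond count) and the training values are hypothetical, and the point-group symmetry and Boolean combination steps of the scheme are not reproduced.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[float, List[float], List[float]]]  # class -> (prior, means, variances)


def fit(X: List[Sequence[float]], y: List[str]) -> Model:
    """Estimate per-class priors and per-feature Gaussian means and variances."""
    by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model: Model = {}
    for c, rows in by_class.items():
        d = len(rows[0])
        means = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
        # small floor keeps a constant feature from giving zero variance
        variances = [sum((r[k] - means[k]) ** 2 for r in rows) / len(rows) + 1e-9
                     for k in range(d)]
        model[c] = (len(rows) / len(X), means, variances)
    return model


def log_posterior(model: Model, x: Sequence[float]) -> Dict[str, float]:
    """Unnormalized log posterior: log prior + sum of per-feature log likelihoods,
    which is where the naive (conditional independence) assumption enters."""
    scores = {}
    for c, (prior, means, variances) in model.items():
        s = math.log(prior)
        for xk, m, v in zip(x, means, variances):
            s += -0.5 * math.log(2 * math.pi * v) - (xk - m) ** 2 / (2 * v)
        scores[c] = s
    return scores


def predict(model: Model, x: Sequence[float]) -> str:
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)
```

Usage with made-up training contacts, each given as (interface area in square angstroms, hydrogen bonds): fit on a few labeled contacts, then call `predict(model, [2000, 15])`.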

Relevance: 20.00%

Abstract:

A general analysis of squeezing transformations for two-mode systems is given based on the four-dimensional real symplectic group Sp(4, R). Within the framework of the unitary (metaplectic) representation of this group, a distinction is made between compact photon-number-conserving and noncompact photon-number-nonconserving squeezing transformations. We exploit the U(2)-invariant squeezing criterion to divide the set of all squeezing transformations into a two-parameter family of distinct equivalence classes, with representative elements chosen for each class. Familiar two-mode squeezing transformations in the literature are recognized in our framework and seen to form a set of measure zero. Examples of squeezed coherent and thermal states are worked out. The need to extend the heterodyne detection scheme to encompass all of U(2) is emphasized, and known experimental situations where all U(2) elements can be reproduced are briefly described.
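
A quick numerical check of this group structure: a real 4x4 matrix S acting on the quadratures (x1, x2, p1, p2) belongs to Sp(4, R) when S Omega S^T = Omega, and the photon-number-conserving (compact) elements are those that are also orthogonal, a subgroup isomorphic to U(2). The sketch assumes this quadrature ordering and one common sign convention for the two-mode squeezer and the beam splitter; conventions vary across the literature.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(r) for r in zip(*A)]


# Symplectic form for the quadrature ordering (x1, x2, p1, p2).
OMEGA: Matrix = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]


def close(A: Matrix, B: Matrix, tol: float = 1e-12) -> bool:
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))


def is_symplectic(S: Matrix) -> bool:
    """S belongs to Sp(4, R) iff S Omega S^T == Omega."""
    return close(matmul(matmul(S, OMEGA), transpose(S)), OMEGA)


def is_orthogonal(S: Matrix) -> bool:
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    return close(matmul(S, transpose(S)), identity)


def two_mode_squeezer(r: float) -> Matrix:
    """Noncompact element: mixes x1 with x2 (and p1 with p2) via cosh and sinh."""
    c, s = math.cosh(r), math.sinh(r)
    return [[c, s, 0, 0], [s, c, 0, 0], [0, 0, c, -s], [0, 0, -s, c]]


def beam_splitter(theta: float) -> Matrix:
    """Compact (passive) element: the same rotation on positions and momenta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0, 0], [-s, c, 0, 0], [0, 0, c, s], [0, 0, -s, c]]
```

Both matrices pass `is_symplectic`, but only the beam splitter is also orthogonal, matching the compact versus noncompact distinction drawn above.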

Relevance: 20.00%

Abstract:

A practical method is proposed to identify the mode associated with the frequency part of the eigenvalue of the Floquet transition matrix (FTM). From the FTM eigenvector, which contains the states and their derivatives, the ratio of the derivative to the state corresponding to the largest component is computed. The method exploits the fact that the imaginary part of this (complex) ratio closely approximates the frequency of the mode. It also lends itself well to automation and has been tested on a large number of FTMs of order up to 250.
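
The frequency estimate is short enough to sketch directly, assuming the eigenvector stores the n states first and their n derivatives second (the abstract does not specify the ordering). For a mode that behaves like exp(sigma t), each derivative equals sigma times its state, so the imaginary part of the ratio recovers the frequency. The function name is hypothetical.

```python
from typing import Sequence


def mode_frequency(eigvec: Sequence[complex], n_states: int) -> float:
    """Imaginary part of xdot_k / x_k, where k indexes the state component
    of largest magnitude. Assumed layout: [x_1..x_n, xdot_1..xdot_n]."""
    if len(eigvec) != 2 * n_states:
        raise ValueError("expected n states followed by their n derivatives")
    states = eigvec[:n_states]
    derivs = eigvec[n_states:]
    # Using the largest state component keeps the division well conditioned.
    k = max(range(n_states), key=lambda i: abs(states[i]))
    return (derivs[k] / states[k]).imag
```

Choosing the largest component avoids dividing by a state entry that is close to zero, which is what makes the ratio a reliable estimate in an automated setting.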

Relevance: 20.00%

Abstract:

The efficacy of the multifractal spectrum as a tool for characterizing images has been studied. This spectrum was computed for digitized images of the nuclei of human cervical cancer cells, and it was observed that the entire spectrum is almost fully reproduced for a normal cell, while only the right half (q<0) of the spectrum is reproduced for a cancerous cell. Cells at stages between these two extremes show a shortening of the left half of the spectrum proportional to their condition. The extent of this shortening was found to be sufficient to classify cells into three classes at varying distances from a basal cancerous layer: the superficial cells, the intermediate cells and the parabasal cells. This technique may be used for automatic population screening while also indicating the stage of malignancy.
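
One common way to obtain a multifractal spectrum from an image is box counting: the pixel intensities are normalized into a measure mu, the partition sum Z(q, e) = sum of mu_i^q over boxes of relative size e scales as e^tau(q), and the generalized dimension is D_q = tau(q)/(q - 1). The f(alpha) spectrum then follows from tau(q) by a Legendre transform (alpha = d tau/dq, f = q alpha - tau), which is not coded here. This is a generic sketch on a square image; the paper's exact estimation procedure is not stated in the abstract.

```python
import math
from typing import List, Sequence


def partition_sum(img: List[List[float]], box: int, q: float) -> float:
    """Sum of mu_i^q over square boxes of side `box` that carry nonzero mass."""
    n = len(img)
    total = sum(sum(row) for row in img)
    z = 0.0
    for i0 in range(0, n, box):
        for j0 in range(0, n, box):
            m = sum(img[i][j]
                    for i in range(i0, min(i0 + box, n))
                    for j in range(j0, min(j0 + box, n)))
            if m > 0:
                z += (m / total) ** q
    return z


def generalized_dimension(img: List[List[float]], q: float, boxes: Sequence[int]) -> float:
    """D_q = tau(q) / (q - 1), with tau(q) the least-squares slope of
    log Z(q, e) against log e, where e = box / image side."""
    if q == 1:
        raise ValueError("q = 1 needs the information-dimension limit")
    n = len(img)
    xs = [math.log(b / n) for b in boxes]
    ys = [math.log(partition_sum(img, b, q)) for b in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope / (q - 1)
```

As a sanity check, a uniformly bright 16x16 image with box sides 1, 2, 4 and 8 gives D_q = 2 for every q, as expected for a non-fractal planar measure; negative q values emphasize the sparsest boxes, which is why they map to the right half of the spectrum.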

Relevance: 20.00%

Abstract:

Three classification techniques, namely K-means Cluster Analysis (KCA), Fuzzy Cluster Analysis (FCA) and Kohonen Neural Networks (KNN), were employed to group 25 microwatersheds of the Kherthal watershed, Rajasthan, into homogeneous groups as a basis for formulating suitable conservation and management practices. Ten mainly morphological parameters, namely drainage density (D_d), bifurcation ratio (R_b), stream frequency (F_u), length of overland flow (L_o), form factor (R_f), shape factor (B_s), elongation ratio (R_e), circularity ratio (R_c), compactness coefficient (C_c) and texture ratio (T), are used for the classification. The optimal number of groups is chosen based on two cluster validation indices, Davies-Bouldin and Dunn's. Comparative analysis of the clustering techniques revealed that 13 of the 25 microwatersheds (52%) are commonly suggested by KCA, FCA and KNN; 17 of 25 (68%) are commonly suggested by KCA and FCA, 16 of 25 (64%) by FCA and KNN, and 15 of 25 (60%) by KNN and KCA. The KNN sensitivity analysis showed that the effect of the number of epochs (1000, 3000, 5000) and the learning rate (0.01, 0.1-0.9) on the total squared error is significant, although no fixed trend is observed. Sensitivity analysis also revealed that when the number of groups is increased from 5 to 6, 7 and 8, the microwatersheds occupy all the groups, although the number in each group differs. (C) 2010 International Association of Hydro-environment Engineering and Research, Asia Pacific Division. Published by Elsevier B.V. All rights reserved.
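
The two validation indices used to choose the number of groups can be computed directly from a finished clustering. This is a minimal sketch, assuming Euclidean distance, centroid-based scatter for Davies-Bouldin, and minimum inter-cluster point distance over maximum cluster diameter for Dunn. Lower Davies-Bouldin and higher Dunn values indicate more compact, better-separated groups. The clustering algorithms themselves (KCA, FCA, KNN) are not shown, and the input format (a list of clusters, each a list of parameter vectors) is an assumption.

```python
import math
from itertools import combinations
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    d = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(d)]


def davies_bouldin(clusters: List[List[Point]]) -> float:
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio, where
    s_i is the average distance of cluster i's points to its centroid c_i."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


def dunn(clusters: List[List[Point]]) -> float:
    """Smallest distance between points of different clusters divided by the
    largest within-cluster diameter."""
    diam = max((math.dist(a, b) for c in clusters for a, b in combinations(c, 2)),
               default=0.0)
    sep = min(math.dist(a, b)
              for ci, cj in combinations(clusters, 2) for a in ci for b in cj)
    return sep / diam
```

In practice, each candidate number of groups would be clustered, both indices computed, and the count that jointly minimizes Davies-Bouldin and maximizes Dunn retained.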