47 results for Keys to Database Searching
Abstract:
This paper describes the efforts at MILE lab, IISc, to create a 100,000-word database each in Kannada and Tamil for the design and development of online handwriting recognition. The data have been collected from over 600 users in order to capture the variations in writing style. We describe features of the scripts and how the number of symbols was reduced so that the recognizer can be trained effectively. The list of words includes all the characters, Kannada and Indo-Arabic numerals, punctuation marks and other symbols. A semi-automated tool is used to annotate the data from stroke level to word level. It segments each word into stroke groups and also acts as a validation mechanism for segmentation. The tool displays the strokes, stroke groups and aksharas of a word, and hence can be used to study the various styles of writing and delayed strokes, and to assign quality tags to the words. The tool is currently being used for annotating Tamil and Kannada data. The output is stored in a standard XML format.
Abstract:
This paper presents a preliminary analysis of Kannada WordNet and the set of relevant computational tools. Although the design has been inspired by the famous English WordNet and, to a certain extent, by the Hindi WordNet, the unique features of Kannada WordNet are graded antonyms and meronymy relationships, nominal as well as verbal compounding, complex verb constructions and an efficient underlying database design (designed to handle storage and display of Kannada Unicode characters). Kannada WordNet would not only add to the sparse collection of machine-readable Kannada dictionaries, but would also give new insights into the Kannada vocabulary. It provides a sufficient interface for applications involved in Kannada machine translation, spell checking and semantic analysis.
Abstract:
Multidrug-resistant Salmonella serovars have recently become a concern in treating infectious diseases such as typhoid. Salmonella BaeS and BaeR are two-component system (TCS) signal transduction proteins found to play an important role in its multidrug resistance. A canonical TCS comprises a histidine kinase (HK) and its cognate partner response regulator (RR). The general approaches for therapeutic targeting are either the catalytic ATP-binding domain or the dimerization domain HisKA (DHp) of the HK, and in some cases, the receiver or the regulatory domain of the RR proteins. Earlier efforts to identify novel drugs targeting the signal transduction protein have not been very successful, as it shares a similar ATP-binding domain with key housekeeping gene products of the mammalian GHL family. However, targeting the dimerization domain of HisKA, through which the signals are received from the RR, can be a better approach. In this article, we present a stepwise procedure to specifically identify the key interacting residues involved in the dimerization with the RR, along with effective targeting by ligands screened from a public database. We have found a few inhibitors which effectively target the residues important for the dimerization activity. Our results suggest a plausible de novo design of better DHp domain inhibitors.
Abstract:
In this paper, we propose a new sub-band approach to estimate the glottal activity. The method is based on the spectral harmonicity and the sub-band temporal properties of voiced speech. We propose a method to represent the glottal excitation signal using the sub-band temporal envelope. Instants of maximum glottal excitation, or Glottal Closure Instants (GCI), are extracted from the estimated glottal excitation pattern and the result is compared with a standard GCI computation method, DYPSA [1]. The performance of the algorithm is also compared on noisy signals, and it is shown that the GCI estimates of the proposed method vary less under noisy conditions than those of DYPSA. The algorithm is evaluated on the CMU-ARCTIC database.
Abstract:
Protein structure alignment is a crucial step in protein structure-function analysis. Despite the advances in protein structure alignment algorithms, some of the local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identification of local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. A set of protein blocks is a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information provided on conformationally similar and dissimilar SVRs can be helpful to model loop regions. It is also conceivable that conformationally similar SVRs with conserved residues could potentially contribute toward functional integrity of homologues, and hence identifying such SVRs could be helpful in understanding the structural basis of protein function.
Abstract:
Residue depth accurately measures burial and parameterizes local protein environment. Depth is the distance of any atom/residue to the closest bulk water. We consider the non-bulk waters to occupy cavities, whose volumes are determined using a Voronoi procedure. Our estimation of cavity sizes is statistically superior to estimates made by CASTp and VOIDOO, and on par with McVol over a data set of 40 cavities. Our calculated cavity volumes correlated best with the experimentally determined destabilization of 34 mutants from five proteins. Some of the cavities identified are capable of binding small molecule ligands. In this study, we have enhanced our depth-based predictions of binding sites by including evolutionary information. We have demonstrated that on a database (LigASite) of approximately 200 proteins, we perform on par with ConCavity and better than MetaPocket 2.0. Our predictions, while less sensitive, are more specific and precise. Finally, we use depth (and other features) to predict the pKa values of GLU, ASP, LYS and HIS residues. Our results produce an average error of less than 1 pH unit over 60 predictions. Our simple empirical method is statistically on par with two and superior to three other methods while inferior to only one. The DEPTH server (http://mspc.bii.a-star.edu.sg/depth/) is an ideal tool for rapid yet accurate structural analyses of protein structures.
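The depth definition in the abstract above (distance of an atom or residue to the closest bulk water) can be illustrated with a minimal sketch. The function names, the toy coordinates, and the choice to average atom depths into a residue depth are assumptions made for illustration; they are not taken from the DEPTH server.

```python
import math

def atom_depth(atom, bulk_waters):
    """Distance from one atom to the nearest bulk-water position."""
    return min(math.dist(atom, water) for water in bulk_waters)

def residue_depth(atoms, bulk_waters):
    """Mean atom depth over a residue (averaging is an assumption here)."""
    return sum(atom_depth(a, bulk_waters) for a in atoms) / len(atoms)

# Hypothetical toy coordinates in angstroms, given as (x, y, z) tuples.
bulk_waters = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
residue = [(3.0, 4.0, 0.0), (6.0, 0.0, 0.0)]

print(atom_depth(residue[0], bulk_waters))
print(residue_depth(residue, bulk_waters))
```

A brute-force minimum over all waters is enough for a toy case; a real structure would likely need a spatial index to keep the search fast.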
Abstract:
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have also been collected so far from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. (C) 2014 Acoustical Society of America.
Abstract:
In this paper, we propose an eigen framework for transmit beamforming for single-hop and dual-hop network models with single-antenna receivers. In cases where the number of receivers is not more than three, the proposed eigen approach is vastly superior in terms of ease of implementation and computational complexity compared with the existing convex-relaxation-based approaches. The essential premise is that the precoding problems can be posed as equivalent optimization problems of searching for an optimal vector in the joint numerical range of Hermitian matrices. We show that the latter problem has two convex approximations: the first one is a semi-definite program that yields a lower bound on the solution, and the second one is a linear matrix inequality that yields an upper bound on the solution. We study the performance of the proposed and existing techniques using numerical simulations.
Abstract:
Background: Haemophilus influenzae (H. influenzae) is the causative agent of pneumonia, bacteraemia and meningitis. The organism is responsible for a large number of deaths in both developed and developing countries. Even though the first bacterial genome to be sequenced was that of H. influenzae, there is no database dedicated exclusively to H. influenzae. This prompted us to develop the Haemophilus influenzae Genome Database (HIGDB). Methods: All data of HIGDB are stored and managed in a MySQL database. The HIGDB is hosted on a Solaris server and developed using PERL modules. Ajax and JavaScript are used for the interface development. Results: The HIGDB contains detailed information on 42,741 proteins, 18,077 genes including 10 whole genome sequences and also 284 three-dimensional structures of proteins of H. influenzae. In addition, the database provides "Motif search" and "GBrowse". The HIGDB is freely accessible through the URL: http://bioserverl.physicslisc.ernetin/HIGDB/. Discussion: The HIGDB will be a single point of access for bacteriological, clinical, genomic and proteomic information on H. influenzae. The database can also be used to identify DNA motifs within H. influenzae genomes and to compare gene or protein sequences of a particular strain with other strains of H. influenzae. (C) 2014 Elsevier Ltd. All rights reserved.
Abstract:
Streptococcus pneumoniae causes pneumonia, septicemia and meningitis. S. pneumoniae is responsible for significant mortality both in children and in the elderly. In recent years, whole genome sequencing of various S. pneumoniae strains has increased manifold, and there is an urgent need to provide organism-specific annotations to the scientific community. This prompted us to develop the Streptococcus pneumoniae Genome Database (SPGDB) to integrate and analyze the completely sequenced and available S. pneumoniae genome sequences. Further, links to several tools are provided to compare the pool of gene and protein sequences, and protein structures, across different strains of S. pneumoniae. SPGDB aids in the analysis of phenotypic variations as well as in performing extensive genomic and evolutionary studies with reference to S. pneumoniae. (C) 2014 Elsevier Inc. All rights reserved.
Abstract:
NrichD
Abstract:
Facial expressions are the most expressive way to display emotions. Many algorithms have been proposed which employ a particular set of people (usually a database) to both train and test their model. This paper focuses on the challenging task of database-independent emotion recognition, which is a generalized case of subject-independent emotion recognition. The emotion recognition system employed in this work is a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). McFIS has two components: a neuro-fuzzy inference system, which is the cognitive component, and a self-regulatory learning mechanism, which is the meta-cognitive component. The meta-cognitive component monitors the knowledge in the neuro-fuzzy inference system and efficiently decides what to learn, when to learn and how to learn from the training samples. For each sample, the McFIS decides whether to delete the sample without learning it, use it to add/prune or update the network parameters, or reserve it for future use. This helps the network avoid over-training and, as a result, improves its generalization performance on databases it was not trained on. In this study, we extract pixel-based emotion features from the well-known Japanese Female Facial Expression (JAFFE) and Taiwanese Female Expression Image (TFEID) databases. Two sets of experiments are conducted. First, we study the performance of McFIS on each database individually using 5-fold cross-validation. Next, in order to study the generalization performance, McFIS trained on the JAFFE database is tested on TFEID and vice versa. The performance comparison in both experiments against an SVM classifier gives promising results.
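The three-way choice described in the abstract above (delete a sample, learn from it, or reserve it) can be sketched as a small decision function. This is only an illustration of the idea: the use of a prediction-error score, the function name, and both thresholds are hypothetical and are not the criteria used by McFIS.

```python
def choose_action(correct, error, delete_below=0.05, learn_above=0.3):
    """Pick what to do with one training sample.

    correct: whether the current network predicts the right class.
    error:   non-negative prediction error for the sample.
    Both thresholds are hypothetical illustration values.
    """
    if correct and error < delete_below:
        return "delete"   # sample adds little new knowledge
    if not correct or error > learn_above:
        return "learn"    # add/prune or update network parameters
    return "reserve"      # keep the sample for later in training

print(choose_action(True, 0.01))
print(choose_action(False, 0.20))
print(choose_action(True, 0.10))
```

Reserved samples would be presented again in a later pass, when the network state (and therefore the error) has changed.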
Abstract:
3,4-Dichlorophenol (1) crystallizes in the tetragonal space group I4(1)/a with a short axis of 3.7926 (9) Å. The structure is unique in that both type I and type II Cl···Cl interactions are present, these contact types being distinguished by the angle ranges of the respective C-Cl···Cl angles. The present study shows that these two types of contacts are utterly different. The crystal structures of 4-bromo-3-chlorophenol (2) and 3-bromo-4-chlorophenol (3) have been determined. The crystal structure of (2) is isomorphous to that of (1), with the Br atom in the 4-position participating in a type II interaction. However, the monoclinic P2(1)/c packing of compound (3) is different; while the structure still has O-H···O hydrogen bonds, the tetramer O-H···O synthon seen in (1) and (2) is not seen. Rather than a type I Br···Br interaction, which would have been mandated if (3) were isomorphous to (1) and (2), Br forms a Br···O contact wherein its electrophilic character is clearly evident. Crystal structures of the related compounds 4-chloro-3-iodophenol (4) and 3,5-dibromophenol (5) were also determined. A computational survey of the structural landscape was undertaken for (1), (2) and (3), using a crystal structure prediction protocol in space groups P2(1)/c and I4(1)/a with the COMPASS26 force field. While both tetragonal and monoclinic structures are energetically reasonable for all compounds, the fact that (3) takes the latter structure indicates that Br prefers type II over type I contacts. In order to differentiate further between type I and type II halogen contacts, which, being chemically distinct, are expected to have different distance fall-off properties, a variable-temperature crystallography study was performed on compounds (1), (2) and (4). Length variations with temperature are greater for type II contacts than for type I.
The type II Br···Br interaction in (2) is stronger than the corresponding type II Cl···Cl interaction in (1), leading to elastic bending of the former upon application of mechanical stress, which contrasts with the plastic deformation of (1). The observation of elastic deformation in (2) is noteworthy; in that it finds an explanation based on the strengths of the respective halogen bonds, it could also be taken as a good starting model for future property design. Cl/Br isostructurality is studied with the Cambridge Structural Database, and the results indicate that this isostructurality is based on the shape and size similarity of Cl and Br, rather than arising from any chemical resemblance.
Abstract:
In big data image/video analytics, we encounter the problem of learning an over-complete dictionary for sparse representation from a large training dataset, which cannot be processed at once because of storage and computational constraints. To tackle the problem of dictionary learning in such scenarios, we propose an algorithm that exploits the inherent clustered structure of the training data and makes use of a divide-and-conquer approach. The fundamental idea behind the algorithm is to partition the training dataset into smaller clusters and learn a local dictionary for each cluster. Subsequently, the local dictionaries are merged to form a global dictionary. Merging is done by solving another dictionary learning problem on the atoms of the locally trained dictionaries. This algorithm is referred to as the split-and-merge algorithm. We show that the proposed algorithm is efficient in its usage of memory and computational complexity, and performs on par with the standard learning strategy, which operates on the entire data at a time. As an application, we consider the problem of image denoising. We present a comparative analysis of our algorithm with the standard learning techniques that use the entire database at a time, in terms of training and denoising performance. We observe that the split-and-merge algorithm results in a remarkable reduction of training time, without significantly affecting the denoising performance.
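The split-and-merge structure described in the abstract above can be sketched in a few lines. As a loud assumption, the sketch swaps in a tiny k-means routine as the dictionary learner (k-means can be read as dictionary learning with 1-sparse codes); the abstract does not say which learner the authors use, and all names and toy data here are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Tiny k-means, used here as a stand-in dictionary learner."""
    centers = [tuple(p) for p in points[:k]]  # deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a group is empty
                centers[j] = tuple(sum(c) / len(group) for c in zip(*group))
    return centers

def split_and_merge(clusters, local_atoms, global_atoms):
    """Learn a local dictionary per cluster, then a global one on their atoms."""
    pooled = []
    for cluster in clusters:
        pooled.extend(kmeans(cluster, local_atoms))
    return kmeans(pooled, global_atoms)

# Hypothetical toy data, already partitioned into two clusters.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)],
]
print(split_and_merge(clusters, local_atoms=2, global_atoms=2))
```

Only one cluster and the pooled atoms need to be in memory at any time, which is the point of the divide-and-conquer design.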
Abstract:
Although several factors have been suggested to contribute to thermostability, the stabilization strategies used by proteins are still enigmatic. Studies on a recombinant xylanase from Bacillus sp. NG-27 (RBSX), which has the ubiquitous (beta/alpha)(8)-triosephosphate isomerase barrel fold, showed that just a single mutation, V1L, although not located in any secondary structural element, markedly enhanced the stability from 70 degrees C to 75 degrees C without loss of catalytic activity. Conversely, the V1A mutation at the same position decreased the stability of the enzyme from 70 degrees C to 68 degrees C. To gain structural insights into how a single mutation at the extreme N-terminus can markedly influence the thermostability of the enzyme, we determined the crystal structures of RBSX and the two mutants. On the basis of computational analysis of their crystal structures, including residue interaction networks, we established a link between N-terminal to C-terminal contacts and RBSX thermostability. Our study reveals that augmenting N-terminal to C-terminal noncovalent interactions is associated with enhancement of the stability of the enzyme. In addition, we discuss several lines of evidence supporting a connection between N-terminal to C-terminal noncovalent interactions and protein stability in different proteins. We propose that the strategy of mutations at the termini could be exploited with a view to modulating stability without compromising enzymatic activity or, in general, protein function in diverse folds where the N and C termini are in close proximity. Database: The coordinates of RBSX, V1A and V1L have been deposited in the PDB under the accession numbers 4QCE, 4QCF, and 4QDM, respectively.