195 results for Tree identification
in Indian Institute of Science - Bangalore - India
Abstract:
The problem of identifying user intent has received considerable attention in recent years, particularly in the context of improving the search experience via query contextualization. Intent can be characterized by multiple dimensions, which are often not observed from the query words alone. Accurately identifying intent from query words remains a challenging problem, primarily because these dimensions are extremely difficult to discover. The problem is often compounded significantly by the lack of representative training samples. We present a generic, extensible framework for learning the multi-dimensional representation of user intent from the query words. The approach models the latent relationships between facets using a tree-structured distribution, which leads to an efficient and convergent algorithm, FastQ, for identifying the multi-faceted intent of users from the query words alone. We also incorporate WordNet to extend the system to queries containing words that do not appear in the training data. Empirical results show that FastQ identifies intent accurately when compared against a gold standard.
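The abstract does not detail how FastQ learns its tree-structured distribution. As a hedged illustration of what fitting such a distribution over discrete facets can look like, the sketch below uses the classic Chow-Liu procedure (a standard technique swapped in here, not necessarily the one FastQ uses): every pair of facets is weighted by its empirical mutual information, and a maximum-weight spanning tree is kept. The facet data and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (nats) between discrete columns i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    # Sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts.
    return sum((c / n) * math.log(c * n / (ci[a] * cj[b])) for (a, b), c in joint.items())


def chow_liu_tree(data, num_vars):
    """Edges of a maximum-weight spanning tree over pairwise MI (Kruskal's algorithm)."""
    scored = sorted(
        ((mutual_information(data, i, j), i, j) for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Hypothetical facet labels for four queries (three discrete facets each).
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_tree(rows, 3))  # facets 0 and 1 are perfectly dependent here
```

Once the tree is fixed, marginal and most-probable-assignment queries over the facets can be answered exactly by message passing in time linear in the number of facets, which is one reason tree-structured models lend themselves to efficient inference.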
Abstract:
Designing speaker identification schemes with a high degree of accuracy for a small number of speakers (around 10) in a controlled environment is a practical proposition today. When the number of speakers is large (say 50–100), many of these schemes cannot be directly extended, as both recognition error and computation time increase monotonically with population size. Feature selection is also complex for such schemes. Although there were earlier attempts to rank-order features using statistical distance measures, it has been observed only recently that the two individually best measurements do not necessarily form the best pair of measurements for pattern classification. We propose here a systematic approach to the problem using a decision tree, or hierarchical classifier, with the following objectives: (1) design of the optimal policy at each node of the tree given the tree structure, i.e., the tree skeleton and the features to be used at each node; (2) determination of the optimal feature measurement and decision policy given only the tree skeleton. The applicability of optimization procedures such as dynamic programming to the design of such trees is studied. The experimental results concern the design of a 50-speaker identification scheme based on this approach.
Abstract:
The hazards associated with major accident hazard (MAH) industries are fire, explosion and toxic gas release. Of these, toxic gas release is the worst, as it has the potential to cause extensive fatalities. Qualitative and quantitative hazard analyses are essential for identifying and quantifying these hazards in chemical industries. Fault tree analysis (FTA) is an established technique for hazard identification. It has the advantage of being both qualitative and quantitative, provided the probabilities and frequencies of the basic events are known. This paper outlines the estimation of the probability of chlorine release from the storage and filling facility of a chlor-alkali industry using FTA. An attempt has also been made to arrive at the probability of chlorine release for Indian conditions using expert elicitation and a proven fuzzy logic technique. A sensitivity analysis has been carried out to evaluate the percentage contribution of each basic event that could lead to chlorine release. Two-dimensional fuzzy fault tree analysis (TDFFTA) is proposed to balance the hesitation factor involved in expert elicitation. (C) 2010 Elsevier B.V. All rights reserved.
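In FTA, basic-event probabilities are combined through AND and OR gates up to the top event. The sketch below assumes independent basic events and crisp (non-fuzzy) probabilities; the event names, tree, and numbers are hypothetical placeholders rather than values from the study, and the percentage-contribution measure (the drop in top-event probability when one basic event is made impossible) is one simple choice that may differ from the paper's sensitivity analysis.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    inputs: List[Union[str, "Gate"]] = field(default_factory=list)


def top_event_probability(node, basic: Dict[str, float]) -> float:
    """Output probability of a gate (or basic event), assuming independent basic events."""
    if isinstance(node, str):  # a basic event, looked up by name
        return basic[node]
    probs = [top_event_probability(child, basic) for child in node.inputs]
    if node.kind == "AND":  # all inputs must occur
        return math.prod(probs)
    if node.kind == "OR":  # at least one input occurs
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate type: {node.kind}")


def contribution(tree, basic):
    """Percentage drop in the top-event probability when each basic event is set to 0."""
    base = top_event_probability(tree, basic)
    return {
        name: 100.0 * (base - top_event_probability(tree, {**basic, name: 0.0})) / base
        for name in basic
    }


# Hypothetical tree: release occurs if (valve leaks OR gasket fails) AND the detector fails.
tree = Gate("AND", [Gate("OR", ["valve_leak", "gasket_failure"]), "detector_fails"])
basic = {"valve_leak": 0.1, "gasket_failure": 0.2, "detector_fails": 0.5}
print(top_event_probability(tree, basic))
print(contribution(tree, basic))
```

With these placeholder values the top-event probability is 0.28 × 0.5 = 0.14. The fuzzy and two-dimensional fuzzy variants described in the abstract would replace each crisp probability with a fuzzy number; that extension is not shown.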
Abstract:
Sandalwood is an economically important aromatic tree belonging to the family Santalaceae. The trees are used mainly for their fragrant heartwood and oil, which have immense potential for earning foreign exchange. Very little information is available on the genetic diversity of this species. Hence, studies were initiated and genetic diversity was estimated using RAPD markers in 51 genotypes of Santalum album procured from different geographical regions of India and three exotic lines of S. spicatum from Australia. Eleven selected Operon primers (10-mers) generated a total of 156 consistent and unambiguous amplification products ranging from 200 bp to 4 kb. Rare and genotype-specific bands were identified that could be used effectively to distinguish the genotypes. Genetic relationships among the genotypes were evaluated by generating a dissimilarity matrix based on Ward's method (squared Euclidean distance). The phenetic dendrogram and the principal component analysis separated the 51 Indian genotypes from the three Australian lines. The cluster analysis indicated that sandalwood germplasm within India constitutes a broad genetic base, with genetic dissimilarity values ranging from 15 to 91%. A core collection of 21 selected individuals captured the same diversity as the entire population. The results show that RAPD analysis is an efficient marker technology for estimating genetic diversity and relatedness, thereby enabling the formulation of appropriate strategies for conservation, germplasm management, and selection of diverse parents for sandalwood improvement programmes.
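Ward's method merges, at each step, the pair of clusters whose fusion least increases the within-cluster sum of squares. A minimal sketch, assuming RAPD bands are scored as 1/0 presence vectors and using the Lance-Williams recurrence on squared Euclidean distances; the four profiles are hypothetical, and the percentage dissimilarity scale reported in the abstract is not reproduced.

```python
from itertools import combinations


def squared_euclidean(a, b):
    """Squared Euclidean distance between two band-presence (1/0) profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clustering(profiles):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    Returns a list of merges (members_a, members_b, merge_distance).
    """
    clusters = {i: [i] for i in range(len(profiles))}
    dist = {
        frozenset((i, j)): squared_euclidean(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }
    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        i, j = sorted(pair)
        d_ij = dist.pop(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merges.append((clusters[i], clusters[j], d_ij))
        merged = clusters.pop(i) + clusters.pop(j)
        for k, members in clusters.items():
            nk = len(members)
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            # Ward update of the distance from cluster k to the newly merged cluster.
            dist[frozenset((k, next_id))] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical RAPD scores: rows are genotypes, columns are bands (1 = present).
profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
for a, b, d in ward_clustering(profiles):
    print(a, "+", b, "at", round(d, 3))
```

Libraries such as SciPy provide the same criterion through `scipy.cluster.hierarchy.linkage(..., method='ward')`, which is usually preferable at the scale of the 54 lines studied.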
Abstract:
Adding estradiol 17-beta to first-trimester human placental minces increased the synthesis of a protein with an apparent molecular weight of 45 kDa. The specific involvement of estrogen in stimulating this protein was established by showing that its level was reduced by CGS 16949 A, an inhibitor of aromatase (a key enzyme in the biosynthesis of estradiol 17-beta), and by ICI 182,780, an estrogen receptor antagonist. The protein was purified to homogeneity; N-terminal sequencing of two internal peptides obtained by enzymatic digestion of the protein, together with the absence of a free N-terminus, indicated that it could be actin. This was confirmed by Western blotting with a commercially available actin antiserum. The role of estradiol 17-beta in stimulating actin synthesis in the human placenta was also established by monitoring the quantitative inhibition of DNase I by actin.
Identification of amino groups in the carbohydrate-binding activity of winged bean acidic agglutinin
Abstract:
Chemical modification studies reveal that modification of the amino groups in WBA II leads to a complete loss of its hemagglutinating and saccharide-binding activities. Since WBA II is a dimeric molecule with two binding sites, one amino group in each binding site is inferred to be essential for its activity. The presence of an amino group, which can form hydrogen-bonded interactions with the ligand, substantiates our observations regarding the forces involved in WBA II-receptor and WBA II-simple sugar interactions.
Inverse Sensitivity Analysis of Singular Solutions of FRF Matrix in Structural System Identification
Abstract:
The problem of structural damage detection based on measured frequency response functions (FRFs) of the structure in its damaged and undamaged states is considered. A novel procedure based on the inverse sensitivity of the singular solutions of the system FRF matrix is proposed. The treatment of a possibly ill-conditioned set of equations via a regularization scheme, and questions of spatial incompleteness of measurements, are also considered. The application of the method to systems with repeated natural frequencies and/or packets of closely spaced modes is demonstrated. The relationship between the proposed method and methods based on the inverse sensitivity of eigensolutions and frequency response functions is noted. Numerical examples on a 5-degree-of-freedom system, a one-span free-free beam and a spatially periodic multi-span beam demonstrate the efficacy of the proposed method and its superior performance vis-à-vis methods based on inverse eigensensitivity.
Abstract:
Background: Tuberculosis remains one of the deadliest infectious diseases, warranting the identification of new targets and drugs. Identifying and validating appropriate targets for drug design are critical steps in drug discovery and are at present major bottlenecks. A majority of drugs in current clinical use for many diseases were designed without knowledge of their targets, perhaps because standard methodologies for identifying such targets in a high-throughput fashion do not really exist. With the different kinds of 'omics' data now available, computational approaches can be a powerful means of obtaining short-lists of possible targets for further experimental validation. Results: We report a comprehensive in silico target identification pipeline, targetTB, for Mycobacterium tuberculosis. The pipeline incorporates a network analysis of the protein-protein interactome, a flux balance analysis of the reactome, experimentally derived phenotype essentiality data, sequence analyses and a structural assessment of targetability, using novel algorithms recently developed by us. Using flux balance analysis and network analysis, proteins critical for the survival of M. tuberculosis are first identified, followed by comparative genomics with the host and, finally, a novel structural analysis of the binding sites to assess the feasibility of each protein as a target. Further analyses include correlation with expression data and non-similarity to gut flora proteins as well as to 'anti-targets' in the host, leading to the identification of 451 high-confidence targets. Through phylogenetic profiling against 228 pathogen genomes, the shortlisted targets have been further explored to identify broad-spectrum antibiotic targets, as well as those specific to tuberculosis. Targets that address mycobacterial persistence and drug resistance mechanisms are also analysed.
Conclusion: The pipeline provides a rational scheme for drug target identification that is likely to have high rates of success, and is expected to save enormous amounts of money, resources and time in the drug discovery process. A thorough comparison with previously suggested targets in the literature demonstrates the usefulness of the integrated approach used in our study, highlighting in particular the importance of systems-level analyses. The method has the potential to be used as a general strategy for target identification and validation, and could hence significantly impact most drug discovery programmes.
Abstract:
Assembly intermediates of icosahedral viruses are usually transient and difficult to identify. In the present investigation, site-specific and deletion mutants of the coat protein gene of physalis mottle tymovirus (PhMV) were used to delineate the role of specific amino acid residues in virus assembly and to identify intermediates in this process. N-terminal 30, 34, 35 and 39 amino acid deletion mutants and a single C-terminal residue (N188) deletion mutant of the PhMV coat protein were expressed in Escherichia coli. Site-specific mutants H69A, C75A, W96A, D144N, D144N-T151A, K143E and N188A were also constructed and expressed. The mutant protein lacking 30 amino acid residues from the N terminus self-assembled into T = 3 particles in vivo, whereas deletions of 34, 35 and 39 amino acid residues yielded insoluble mutant proteins. Interestingly, the coat protein (pR PhCP) expressed using the pRSET B vector, with an additional 41 amino acid residues at the N terminus, also assembled into T = 3 particles, which were more compact and had a smaller diameter. These results demonstrate that the amino-terminal segment is flexible and that neither the deletion nor the addition of amino acid residues at the N terminus affects T = 3 capsid assembly. In contrast, the deletion of even a single residue from the C terminus (PhN188 Delta 1) resulted in unstable capsids. These capsids disassembled to a discrete intermediate with a sedimentation coefficient of 19.4 S. However, replacing the C-terminal asparagine 188 by alanine led to the formation of stable capsids. The C75A and D144N mutant proteins also assembled into capsids that were as stable as pR PhCP, suggesting that residues C75 and D144 are not crucial for T = 3 capsid assembly. The pR PhW96A and pR PhD144N-T151A mutant proteins failed to form capsids and were present as heterogeneous aggregates.
Interestingly, the pR PhK143E mutant protein behaved like the C-terminal deletion protein in forming unstable capsids. The intermediate with an s value of 19.4 S was the major assembly product of the pR PhH69A mutant protein and could correspond to a 30-mer. It is possible that assembly or disassembly is arrested at a similar stage in the pR PhN188 Delta 1, pR PhH69A and pR PhK143E mutant proteins.
Abstract:
In this article, we present detailed investigations of the platinum-related midgap state at Ec − 0.52 eV, probed by deep level transient spectroscopy. By irradiating platinum-doped samples with high-energy (1.1 MeV) gamma rays, we observed that the concentration of the midgap state increases with irradiation dose, following a square-law dependence. However, the concentration of the acceptor at Ec − 0.28 eV remained constant. Furthermore, studies of passivation by atomic hydrogen and subsequent thermal reactivation showed that the Ec − 0.52 eV level reappears in hydrogenated samples annealed at high temperatures. The interaction of platinum with various defects, together with qualitative arguments based on the law of mass action, suggests that the platinum-related midgap defect may correspond to the interstitial platinum-divacancy complex (V-Pt-V).
Abstract:
Background: Regulation of gene expression in Plasmodium falciparum (Pf) remains poorly understood. While over half of the genes are estimated to be regulated at the transcriptional level, few regulatory motifs and transcription regulators have been found. Results: The study seeks to identify putative regulatory motifs in the upstream regions of 13 functional groups of genes expressed in the intraerythrocytic developmental cycle of Pf. Three motif-discovery programs were used for this purpose, and motifs were searched for only on the coding strand of each gene. Four motifs, the 'G-rich', 'C-rich', 'TGTG' and 'CACA' motifs, were identified, and anywhere from none to all four of them occur in each of the 13 sets of upstream regions. The 'CACA' motif was absent in functional groups expressed during the ring to early trophozoite transition. For functional groups expressed in each transition, the motifs tended to be similar. Upstream motifs in some functional groups showed 'positional conservation', occurring at similar positions relative to the translational start site (TLS); this increases their significance as regulatory motifs. In the ribonucleotide synthesis, mitochondrial, proteasome and organellar translation machinery genes, the G-rich, C-rich, CACA and TGTG motifs, respectively, occur with striking positional conservation. In the organellar translation machinery group, G-rich motifs occur close to the TLS. The same motifs were sometimes identified for multiple functional groups; differences in the location and abundance of the motifs appear to ensure different modes of action. Conclusion: The identification of positionally conserved, over-represented upstream motifs sheds light on putative regulatory elements for transcription in Pf.
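The three motif-discovery programs are not named in the abstract, so the sketch below covers only the downstream step of checking positional conservation: it scans upstream sequences (assumed to end immediately before the TLS) for a known motif such as 'TGTG' on the given strand and summarises where the hits fall. The sequences and the use of the standard deviation as a conservation measure are illustrative assumptions, not the authors' statistic.

```python
import re
from statistics import mean, pstdev


def motif_positions(upstream, motif):
    """Start positions of all (possibly overlapping) motif hits relative to the TLS.

    `upstream` is assumed to end immediately before the TLS, so its last base
    is position -1 and a hit starting at index i is reported as i - len(upstream).
    """
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")  # lookahead allows overlaps
    n = len(upstream)
    return [m.start() - n for m in pattern.finditer(upstream.upper())]


def positional_summary(upstreams, motif):
    """Pool hit positions across one functional group; a small spread suggests
    positional conservation. Returns None when the motif is absent."""
    positions = [p for seq in upstreams for p in motif_positions(seq, motif)]
    if not positions:
        return None
    return {"count": len(positions), "mean": mean(positions), "sd": pstdev(positions)}


# Hypothetical upstream sequences for one functional group (coding strand only).
group = ["AATGTGAACC", "CTGTGAACCA", "GGATGTGACC"]
print(positional_summary(group, "TGTG"))
```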
Abstract:
Several techniques are known for searching an ordered collection of data. The techniques for, and analyses of, retrieval methods based on primary attributes are straightforward. Retrieval using secondary attributes depends on several factors. For secondary-attribute retrieval, the linear structures (inverted lists, multilists, doubly linked lists) and the recently proposed nonlinear tree structures (the multiple attribute tree, MAT, and the K-d tree, kdT) each have their own merits. This paper shows that, of the two tree structures, MAT possesses several features of a systematic data structure for external file organisation that make it superior to kdT. Analytic estimates of the complexity of node searches in MAT and kdT are developed and compared for several types of queries.
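A partial match query fixes some attributes and leaves the others unspecified. The sketch below illustrates the kdT side of the comparison only: a median-split k-d tree and a partial match search that counts the nodes visited, the kind of quantity the paper's node-search estimates describe. MAT is not reproduced here, and the records are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Record = Tuple[int, ...]


@dataclass
class KDNode:
    point: Record
    axis: int  # index of the discriminating attribute
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build(points: Sequence[Record], depth: int = 0) -> Optional[KDNode]:
    """Build a k-d tree by splitting at the median, cycling through attributes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    m = len(pts) // 2
    # pts[:m] have values <= the split value on `axis`; pts[m + 1:] have values >= it.
    return KDNode(pts[m], axis, build(pts[:m], depth + 1), build(pts[m + 1:], depth + 1))


def partial_match(node, query, out: List[Record], stats: dict) -> None:
    """Collect records matching `query`; None marks an unspecified attribute.

    stats["visited"] counts the nodes inspected, a rough proxy for search cost.
    """
    if node is None:
        return
    stats["visited"] += 1
    if all(q is None or q == v for q, v in zip(query, node.point)):
        out.append(node.point)
    key = query[node.axis]
    split = node.point[node.axis]
    if key is None or key <= split:
        partial_match(node.left, query, out, stats)
    if key is None or key >= split:
        partial_match(node.right, query, out, stats)


records = [(1, 5), (2, 3), (3, 8), (1, 2), (2, 7)]  # hypothetical 2-attribute records
root = build(records)
hits, stats = [], {"visited": 0}
partial_match(root, (1, None), hits, stats)
print(sorted(hits), stats["visited"])
```

Where an attribute is unspecified, the search must descend both subtrees at every node that discriminates on it, which is the source of the extra node visits for partial match queries in k-d trees.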
Abstract:
A variety of data structures, such as the inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, grid file, d-fold tree, super B-tree and Multiple Attribute Tree (MAT), have been studied for multidimensional searching and related problems. Physical database organization, an important application of multidimensional searching, is traditionally and mostly handled using the inverted file. This study proposes the MAT data structure for bibliographic file systems, demonstrating its superiority over the inverted file. The two methods are compared in terms of preprocessing, storage and query costs. A worst-case complexity analysis of both methods for a partial match query is carried out for two cases: (a) when the directory resides in main memory, and (b) when the directory resides in secondary memory. In both cases, the MAT data structure is shown to be more efficient than the inverted file method. Arguments are also given for the superiority of the MAT data structure in the average case. An efficient adaptation of the MAT data structure that exploits the special features of the MAT structure and of bibliographic files is proposed for bibliographic file systems. In this adaptation, suitable techniques for fixing and ranking the attributes of the MAT data structure are proposed. Conclusions and proposals for future research are presented.
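To make the two organisations concrete, the sketch below answers the same partial match query with an inverted file (intersecting the posting lists of the specified attributes) and with a simplified, trie-like reading of MAT in which level i of the tree branches on the values of attribute i. It is a hedged illustration under that simplification: directory placement, cost accounting, and the paper's attribute fixing and ranking techniques are not modelled, and the bibliographic records are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(records):
    """Inverted file: attribute index -> value -> set of record ids."""
    index = defaultdict(lambda: defaultdict(set))
    for rid, rec in enumerate(records):
        for a, v in enumerate(rec):
            index[a][v].add(rid)
    return index


def inverted_partial_match(index, records, query):
    """Intersect the posting lists of every specified attribute (None = unspecified)."""
    result = set(range(len(records)))
    for a, v in enumerate(query):
        if v is not None:
            result &= index[a].get(v, set())
    return result


def build_mat(records):
    """Simplified MAT: level i branches on attribute i; leaves hold record ids."""
    root = {}
    for rid, rec in enumerate(records):
        node = root
        for v in rec[:-1]:
            node = node.setdefault(v, {})
        node.setdefault(rec[-1], []).append(rid)
    return root


def mat_partial_match(node, query, level=0):
    """Follow one branch for a specified attribute and every branch otherwise."""
    v = query[level]
    if v is None:
        branches = list(node.values())
    else:
        branches = [node[v]] if v in node else []
    if level == len(query) - 1:  # branches are leaf lists of record ids
        return {rid for leaf in branches for rid in leaf}
    return set().union(*(mat_partial_match(child, query, level + 1) for child in branches))


# Hypothetical bibliographic records: (author, year, language).
records = [("rao", 1985, "en"), ("rao", 1987, "en"), ("iyer", 1985, "fr"), ("iyer", 1985, "en")]
query = (None, 1985, "en")
print(inverted_partial_match(build_inverted_file(records), records, query))
print(mat_partial_match(build_mat(records), query))
```

For the query shown, both return the record ids {0, 3}. The MAT search touches only the branches consistent with the specified values, whereas the inverted file reads whole posting lists for each specified attribute before intersecting them.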
Abstract:
A method is presented for identifying the parameters of unconfined aquifers from pumping tests, based on optimising an objective function using the least squares approach. Four parameters are evaluated, namely the hydraulic conductivity in the radial direction, the hydraulic conductivity in the vertical direction, the storage coefficient and the specific yield. The sensitivity analysis technique is used to solve the optimisation problem. Besides eliminating the subjectivity involved in the graphical procedure, the method takes into account the field data at all times without classifying them into small and large time intervals, and it does not use the approximation that the ratio of the storage coefficient to the specific yield tends to zero. Two illustrative examples are presented, and the parameter estimates from the computational and graphical procedures are found to differ fairly significantly.
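The four-parameter unconfined-aquifer model used in the paper is not given in the abstract, so the sketch below swaps in the simpler two-parameter Theis (confined) model to show the same idea: least-squares fitting in which sensitivity coefficients (derivatives of drawdown with respect to each parameter, here by finite differences) drive Gauss-Newton updates using the observations at all times at once. The pumping rate, radius, times, and starting guess are hypothetical synthetic values.

```python
import math

EULER_GAMMA = 0.5772156649015329


def theis_well_function(u):
    """Theis well function W(u) = E1(u) from its power series (moderate u only)."""
    total = -EULER_GAMMA - math.log(u)
    term = 1.0  # running value of u**n / n!
    n = 1
    while n <= 200:
        term *= u / n
        piece = term / n  # u**n / (n * n!)
        total += piece if n % 2 == 1 else -piece
        if piece < 1e-16 * max(1.0, abs(total)):
            break
        n += 1
    return total


def drawdown(params, t, Q, r):
    """Theis drawdown at radius r and time t; params = (ln T, ln S)."""
    T, S = math.exp(params[0]), math.exp(params[1])
    u = r * r * S / (4.0 * T * t)
    return Q / (4.0 * math.pi * T) * theis_well_function(u)


def fit(times, observed, Q, r, guess, iters=100):
    """Gauss-Newton least squares on p = (ln T, ln S) with step halving."""
    p = list(guess)

    def sse(q):
        return sum((drawdown(q, t, Q, r) - s) ** 2 for t, s in zip(times, observed))

    h = 1e-6  # finite-difference step for the sensitivity coefficients
    for _ in range(iters):
        res = [s - drawdown(p, t, Q, r) for t, s in zip(times, observed)]
        # Sensitivity (Jacobian) rows: d drawdown / d ln T and d drawdown / d ln S.
        J = []
        for t in times:
            base = drawdown(p, t, Q, r)
            J.append(((drawdown((p[0] + h, p[1]), t, Q, r) - base) / h,
                      (drawdown((p[0], p[1] + h), t, Q, r) - base) / h))
        # Solve the 2x2 normal equations (J^T J) dp = J^T res by Cramer's rule.
        a = sum(j0 * j0 for j0, _ in J)
        b = sum(j0 * j1 for j0, j1 in J)
        c = sum(j1 * j1 for _, j1 in J)
        g0 = sum(j0 * e for (j0, _), e in zip(J, res))
        g1 = sum(j1 * e for (_, j1), e in zip(J, res))
        det = a * c - b * b
        dp = ((c * g0 - b * g1) / det, (a * g1 - b * g0) / det)
        # Limit each update to at most 1 in log space, then halve until SSE does not grow.
        step = min(1.0, 1.0 / max(abs(dp[0]), abs(dp[1]), 1e-12))
        current = sse(p)
        while step > 1e-10 and sse((p[0] + step * dp[0], p[1] + step * dp[1])) > current:
            step /= 2
        p = [p[0] + step * dp[0], p[1] + step * dp[1]]
        if step * max(abs(dp[0]), abs(dp[1])) < 1e-12:
            break
    return math.exp(p[0]), math.exp(p[1])


# Hypothetical synthetic pumping test (SI units): generate drawdowns, then refit.
Q, r = 0.01, 30.0
times = [60.0, 120.0, 300.0, 600.0, 1200.0, 3000.0, 6000.0]
observed = [drawdown((math.log(1e-3), math.log(1e-4)), t, Q, r) for t in times]
print(fit(times, observed, Q, r, guess=(math.log(8e-4), math.log(1.5e-4))))
```

Fitting ln T and ln S rather than T and S keeps both parameters positive. The power series for W(u) is adequate for the moderate u values of this example but would need an asymptotic form for large u.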
Abstract:
The minimum cost classifier, when general cost functions are associated with the tasks of feature measurement and classification, is formulated as a decision graph that does not reject class labels at intermediate stages. Noting its complexity, a heuristic procedure to simplify this scheme to a binary decision tree is presented. The binary tree is then optimized in this context using dynamic programming. This technique is applied to voiced-unvoiced-silence classification in speech processing.
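The abstract does not spell out the dynamic program. One classic formulation, sketched here under simplifying assumptions, treats each node as the set of classes still possible and chooses the binary test that minimises its measurement cost, weighted by the probability of reaching the node, plus the optimal cost of both subtrees. Misclassification costs are ignored and each test is assumed to split the classes deterministically. The test names, costs, and priors are hypothetical.

```python
import math
from functools import lru_cache


def optimal_tree_cost(priors, tests):
    """Minimum expected measurement cost of a binary decision tree (sketch).

    priors: class label -> prior probability.
    tests:  test name -> (measurement cost, frozenset of classes answering "yes").
    Returns (expected cost, tree); a tree is a class label (leaf) or
    (test name, yes-subtree, no-subtree).
    """

    @lru_cache(maxsize=None)
    def solve(classes):
        if len(classes) == 1:  # only one class left: emit a leaf
            return 0.0, next(iter(classes))
        reach = sum(priors[c] for c in classes)  # probability of reaching this node
        best = (math.inf, None)
        for name, (cost, yes_set) in tests.items():
            yes, no = classes & yes_set, classes - yes_set
            if not yes or not no:
                continue  # this test does not separate the remaining classes
            cost_yes, tree_yes = solve(yes)
            cost_no, tree_no = solve(no)
            total = cost * reach + cost_yes + cost_no
            if total < best[0]:
                best = (total, (name, tree_yes, tree_no))
        return best

    return solve(frozenset(priors))


# Hypothetical priors, tests, and costs for a voiced/unvoiced/silence decision.
priors = {"voiced": 0.5, "unvoiced": 0.3, "silence": 0.2}
tests = {
    "energy_low": (1.0, frozenset({"silence"})),
    "zcr_high": (2.0, frozenset({"unvoiced"})),
    "periodic": (3.0, frozenset({"voiced"})),
}
print(optimal_tree_cost(priors, tests))
```

The recursion enumerates subsets of classes, so it is exponential in the number of classes in the worst case; that is manageable for a three-class problem such as voiced-unvoiced-silence but motivates heuristic simplifications for larger label sets.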