851 results for Topic discovery
Abstract:
In this paper we consider the process of discovering frequent episodes in event sequences. The most computationally intensive part of this process is counting the frequencies of a set of candidate episodes. We present two new frequency counting algorithms for speeding up this part. These, referred to as non-overlapping and non-interleaved frequency counts, are based on directly counting suitable subsets of the occurrences of an episode. Hence they differ from the frequency counts of Mannila et al. [1], which count the number of windows in which the episode occurs. Our new frequency counts offer a speed-up factor of 7 or more on real and synthetic datasets. We also show how the new frequency counts can be used when the events in episodes have time durations as well.
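To make the idea of counting occurrences directly (rather than counting windows) concrete, here is a minimal Python sketch of a non-overlapped count for a serial episode. It is a simplified illustration, not the algorithm from the paper: it assumes a single serial episode, events given as plain type labels without timestamps, and a greedy left-to-right scan that restarts after each completed occurrence. The function name `count_non_overlapped` and the sample sequence are hypothetical.

```python
from typing import Iterable, Sequence


def count_non_overlapped(events: Iterable[str], episode: Sequence[str]) -> int:
    """Greedily count non-overlapped occurrences of a serial episode.

    Assumption (not from the abstract): events are bare type labels and the
    episode is a fixed ordered list of event types.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")

    count = 0
    position = 0  # index of the next episode event type we are waiting for
    for event in events:
        if event == episode[position]:
            position += 1
            if position == len(episode):
                # One full occurrence seen; start looking for the next one
                # only after this point, so occurrences never overlap.
                count += 1
                position = 0
    return count


# Hypothetical event stream and serial episode A -> B -> C.
sequence = ["A", "B", "A", "C", "B", "C", "A", "B", "C"]
print(count_non_overlapped(sequence, ["A", "B", "C"]))
```

A single pass with one counter per candidate episode is enough here, whereas a window-based count would need to examine every sliding window of the sequence.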
Abstract:
The frequent episode discovery framework is a popular framework in temporal data mining with many applications. Over the years, many different notions of episode frequency have been proposed, along with different algorithms for episode discovery. In this paper, we present a unified view of all the apriori-based discovery methods for serial episodes under these different notions of frequency. Specifically, we present a unified view of the various frequency counting algorithms. We propose a generic counting algorithm such that all current algorithms are special cases of it. This unified view gives insight into the different frequencies, and we present quantitative relationships among them. Our unified view also helps in obtaining correctness proofs for the various counting algorithms, as we show here. It also aids in understanding and deriving the anti-monotonicity properties satisfied by the various frequencies, which are the properties exploited by the candidate generation step of any apriori-based method. We also point out how our unified view of counting helps in generalizing the algorithm to count episodes with general partial orders.
Abstract:
The quest for new drug targets in Plasmodium sp. has highlighted malonyl CoA:ACP transacylase (PfFabD) of the fatty acid biosynthetic pathway in the apicoplast. In this study, a piggyback approach was employed for receptor deorphanization using inhibitors of bacterial FabD enzymes. Owing to the lack of a crystal structure, a theoretical model was constructed using the structural details of homologous enzymes. Sequence and structure analysis localized two conserved pentapeptide motifs, GQGXG and GXSXG, and five key invariant residues, viz. Gln109, Ser193, Arg218, His305 and Gln354, characteristic of the FabD enzyme. Active site mapping of PfFabD using substrate molecules disclosed the spatial arrangement of key residues in the cavity. As structurally similar molecules exhibit similar biological activities, signature pharmacophore fingerprints of FabD antagonists were generated using 0D-3D descriptors for molecular similarity-based cluster analysis and for correlation with their binding profiles. Antagonists showing a good geometrical fitness score were grouped in cluster-1, whereas those exhibiting high binding affinities were grouped in cluster-2. This study sheds light on the active site environment, revealing the hotspot for higher-affinity binding, and helps narrow down the virtual screening process by searching for close neighbors of the active compounds.
Abstract:
Introduction: Advances in genomics technologies are providing a very large amount of data on genome-wide gene expression profiles, protein molecules and their interactions with other macromolecules and metabolites. Molecular interaction networks provide a useful way to capture this complex data and comprehend it. Networks are beginning to be used in drug discovery, in many steps of the modern discovery pipeline, with large-scale molecular networks being particularly useful for the understanding of the molecular basis of the disease. Areas covered: The authors discuss network approaches used for drug target discovery and lead identification in the drug discovery pipeline. By reconstructing networks of targets, drugs and drug candidates as well as gene expression profiles under normal and disease conditions, the paper illustrates how it is possible to find relationships between different diseases, find biomarkers, explore drug repurposing and study emergence of drug resistance. Furthermore, the authors also look at networks which address particular important aspects such as off-target effects, combination-targets, mechanism of drug action and drug safety. Expert opinion: The network approach represents another paradigm shift in drug discovery science. A network approach provides a fresh perspective of understanding important proteins in the context of their cellular environments, providing a rational basis for deriving useful strategies in drug design. Besides drug target identification and inferring mechanism of action, networks enable us to address new ideas that could prove to be extremely useful for new drug discovery, such as drug repositioning, drug synergy, polypharmacology and personalized medicine.
Abstract:
We report a series of new glitazones incorporating phenylalanine and tyrosine. All the compounds were tested for their in vitro glucose uptake activity using rat hemidiaphragm, both in the presence and absence of insulin. Six of the most active compounds from the in vitro screening were taken forward to test their in vivo triglyceride and glucose lowering activity against dexamethasone-induced hyperlipidemia and insulin resistance in Wistar rats. The liver samples of rats that received the most active compounds in the in vivo studies, 23 and 24, were subjected to histopathological examination to assess their short-term hepatotoxicity. The investigations on the in vitro glucose uptake and the in vivo triglyceride and glucose lowering activity are described here along with the quantitative structure-activity relationships. (C) 2012 Elsevier Inc. All rights reserved.
Abstract:
Facet-based sentiment analysis involves discovering the latent facets, sentiments and their associations. Traditional facet-based sentiment analysis algorithms typically perform the various tasks in sequence, and fail to take advantage of the mutual reinforcement of the tasks. Additionally, inferring sentiment levels typically requires domain knowledge or human intervention. In this paper, we propose a series of probabilistic models that jointly discover latent facets and sentiment topics, and also order the sentiment topics with respect to a multi-point scale, in a language- and domain-independent manner. This is achieved by simultaneously capturing both short-range syntactic structure and long-range semantic dependencies between the sentiment and facet words. The models further incorporate coherence in reviews, where reviewers dwell on one facet or sentiment level before moving on, for more accurate facet and sentiment discovery. For reviews that are supplemented with ratings, our models automatically order the latent sentiment topics, without requiring seed words or domain knowledge. To the best of our knowledge, our work is the first attempt to combine the notions of syntactic and semantic dependencies in the domain of review mining. Further, the concept of facet and sentiment coherence has not been explored earlier either. Extensive experimental results on real-world review data show that the proposed models outperform various state-of-the-art baselines for facet-based sentiment analysis.
Abstract:
Many popular models are available for document classification, such as the Naïve Bayes classifier, k-Nearest Neighbors and Support Vector Machines. In all these cases, the representation is based on the "Bag of Words" model. This model does not capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new "Bag of Phrases" model to capture this discriminative power of phrases for text classification. We present a novel algorithm that extracts phrases from the corpus using the well-known topic model Latent Dirichlet Allocation (LDA) and integrates them into the vector space model for classification. Experiments show that classifiers perform better with the new Bag of Phrases model than with related representation models.
Abstract:
We demonstrate the possibility of accelerated identification of potential compositions for high-temperature shape memory alloys (SMAs) through a combinatorial material synthesis and analysis approach that combines diffusion couple and indentation techniques. The former was used to generate smooth, compositionally graded inter-diffusion zones (IDZs) in the Ni-Ti-Pd ternary alloy system, with the IDZ thickness depending on the annealing time at high temperature. The IDZs thus produced were then impressed with a spherical-tipped indenter so as to inscribe a predetermined indentation strain. Subsequent annealing of the indented samples at various elevated temperatures, T_a, ranging between 150 and 550 °C, allows for partial to full relaxation of the imposed strain due to the shape memory effect. If T_a is above the austenite finish temperature, A_f, the relaxation will be complete. By measuring the depth recovery, which serves as a proxy for the shape recovery characteristic of the SMA, a three-dimensional map in the recovery-temperature-composition space is constructed. A comparison of the published A_f data for different compositions with the T_a data shows good agreement when the depth recovery is between 70% and 80%, indicating that the methodology proposed in this paper can be used to identify promising compositions. Advantages and further possibilities of this methodology are discussed.
Abstract:
Frequent episode discovery is a popular framework for pattern discovery from sequential data. It has found many applications in domains such as alarm management in telecommunication networks, fault analysis in manufacturing plants and prediction of user behavior in web click streams. In this paper, we address the discovery of serial episodes. In the episodes context, there have been multiple ways to quantify the frequency of an episode. Most of the current algorithms for episode discovery under various frequencies are apriori-based level-wise methods. These methods essentially perform a breadth-first search of the pattern space. However, there are currently no depth-first methods of pattern discovery in the frequent episode framework under many of the frequency definitions. In this paper, we try to bridge this gap. We provide new depth-first algorithms for serial episode discovery under non-overlapped and total frequencies. Under non-overlapped frequency, we present algorithms that can handle span and gap constraints on episode occurrences. Under total frequency, we present an algorithm that can handle a span constraint. We provide proofs of correctness for the proposed algorithms. We demonstrate the effectiveness of the proposed algorithms through extensive simulations. We also give detailed run-time comparisons with the existing apriori-based methods and illustrate scenarios under which the proposed pattern-growth algorithms perform better than their apriori counterparts. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
It is particularly appropriate that the Journal of the Indian Institute of Science is bringing out a commemorative issue to mark the International Year of Crystallography 2014 (IYCr2014). India has had a strong crystallographic tradition, and the earliest work in what may be described as structural crystallography from this country is the work of K. Banerjee on the determination of the crystal structure of naphthalene in 1930. The Indian Institute of Science itself has played no small part in establishing and sustaining the subject of crystallography in this country. A large number of papers in this special issue are written by authors who have either been trained in the Institute or have some kind of professional association with this organization. In this article I will try to capture some unique features that characterize the intersection of the crystallographic and the chemical domains, mostly as they pertain to the Indian contribution to this subject. Crystallography is, of course, as old as chemistry itself, and some would say it is even older. The relationships between chemistry and crystallography go back to much before the discovery of diffraction of X-rays by crystals. The discovery of polymorphism by Mitscherlich in 1822, Haüy's formulation of the molécule intégrante, and the work of Fedorov and Groth on the identification of crystals from their morphology alone are well-known examples of such relationships. A very early article by Tutton speaks of "crystallo-chemical analysis". In this article, I shall, however, be dealing with the interplay of chemistry and crystallography only in the post-diffraction era, that is, after 1912. Much has been written and said about chemical crystallography, and even within the context of the present special issue, there is a review of chemical crystallography in India including some futuristic trends. This topic was also reviewed by Nangia in a special publication brought out by the Indian Academy of Sciences in 2009, and by Desiraju in a special publication brought out by the Indian National Science Academy in 2010. A rather detailed account of crystallography in India, in which chemical crystallography was detailed, appeared in 2007 in the newsletter of the International Union of Crystallography (IUCr). Since all these publications are fairly recent, there is little need for me to attempt a comprehensive coverage of chemical crystallography in India in this short review.
Abstract:
We revisit the constraints on the parameter space of the Minimal Supersymmetric Standard Model (MSSM) from charge and color breaking minima in the light of information on the Higgs from the LHC so far. We study the behavior of the scalar potential keeping two light sfermion fields along with the Higgs in the pMSSM framework and analyze the stability of the vacuum. We find that for lightest stops ≲ 1 TeV and small μ ≲ 500 GeV, the absolute stability of the potential can be attained only for . The bounds become stronger for larger values of the μ parameter. Note that this is approximately the value of X_t which maximizes the Higgs mass. Our bounds on the low scale MSSM parameters are more stringent than those reported earlier in the literature. We reanalyze the stau sector as well, keeping both staus. We study the connections between the observed Higgs rates and vacuum (meta)stability. We show how a precision study of the ratio of signal strengths, μ_γγ/μ_ZZ, can shed further light.
Abstract:
In eubacteria, RecA is essential for recombinational DNA repair and for stalled replication forks to resume DNA synthesis. Recent work has implicated a role for RecA in the development of antibiotic resistance in pathogenic bacteria. Consequently, our goal is to identify and characterize small-molecule inhibitors that target RecA both in vitro and in vivo. We employed ATPase, DNA strand exchange and LexA cleavage assays to elucidate the inhibitory effects of suramin on Mycobacterium tuberculosis RecA. To gain insights into the mechanism of suramin action, we directly visualized the structure of RecA nucleoprotein filaments by atomic force microscopy. To determine the specificity of suramin action in vivo, we investigated its effect on the SOS response by pull-down and western blot assays as well as for its antibacterial activity. We show that suramin is a potent inhibitor of DNA strand exchange and ATPase activities of bacterial RecA proteins with IC50 values in the low micromolar range. Additional evidence shows that suramin inhibits RecA-catalysed proteolytic cleavage of the LexA repressor. The mechanism underlying such inhibitory actions of suramin involves its ability to disassemble RecA-single-stranded DNA filaments. Notably, suramin abolished ciprofloxacin-induced recA gene expression and the SOS response and augmented the bactericidal action of ciprofloxacin. Our findings suggest a strategy to chemically disrupt the vital processes controlled by RecA and hence the promise of small molecules for use against drug-susceptible as well as drug-resistant strains of M. tuberculosis for better infection control and the development of new therapies.
Abstract:
Indian civilization developed a strong system of traditional medicine, and India was one of the first nations to develop a synthetic drug. In the post-independence era, the Indian pharmaceutical industry developed a strong base for the production of generic drugs. The challenges for the future are to give its traditional medicine a strong scientific base and to develop the research and clinical capability to consistently produce new drugs based on advances in modern biological sciences.
Abstract:
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses similarity of compound-proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds covering 48,278 protein structures mapping to 2030 indications, based on basic science methodologies to predict and analyze protein structure, function, and interactions developed by us and others. Our signature comparison and ranking approach yielded benchmarking accuracies of 12-25% for 1439 indications with at least two approved compounds. We prospectively validated 49/82 'high value' predictions from nine studies covering seven indications, with activity comparable to or better than existing drugs, which serve as novel repurposed therapeutics. Our approach may be generalized to compounds beyond those approved by the FDA, and can also consider mutations in protein structures to enable personalization. Our platform provides a holistic multiscale modeling framework of complex atomic, molecular, and physiological systems with broader applications in medicine and engineering.
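The signature comparison and ranking step described above can be illustrated with a small Python sketch. This is not the CANDO implementation: the abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the compound names, signature values, and the helper names `cosine_similarity` and `rank_by_signature` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_signature(
    query: str, signatures: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank all other compounds by similarity of their signature to the query's."""
    query_signature = signatures[query]
    scored = [
        (name, cosine_similarity(query_signature, signature))
        for name, signature in signatures.items()
        if name != query
    ]
    # Most similar compounds first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical signatures: one interaction score per protein (toy values).
signatures = {
    "compound_a": [0.9, 0.1, 0.0, 0.7],
    "compound_b": [0.8, 0.2, 0.1, 0.6],
    "compound_c": [0.0, 0.9, 0.8, 0.1],
}

for name, score in rank_by_signature("compound_a", signatures):
    print(f"{name}: {score:.3f}")
```

In such a scheme, compounds ranked near a drug approved for some indication would be candidates for repurposing to that indication; how the platform itself turns rankings into predictions is not specified in the abstract.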