765 results for Sentiment Analysis, Opinion Mining, Twitter
Abstract:
Encoding protein 3D structures into 1D strings using short structural prototypes, or structural alphabets, opens a new front for structure comparison and analysis. Using the well-documented 16 motifs of Protein Blocks (PBs) as a structural alphabet, we have developed a methodology to compare protein structures encoded as sequences of PBs by aligning them with dynamic programming and a substitution matrix for PBs. This methodology is implemented in the applications available on the Protein Block Expert (PBE) server. PBE addresses common issues in the field of protein structure analysis, such as comparison of protein structures and identification of protein structures in structural databanks that resemble a given structure. PBE-T provides a facility to transform any PDB file into sequences of PBs. PBE-ALIGNc compares two protein structures based on the alignment of their corresponding PB sequences. PBE-ALIGNm is a facility for mining the SCOP database for similar structures based on the alignment of PBs. In addition, PBE provides an interface to a database (PBE-SAdb) of preprocessed PB sequences from SCOP culled at 95% and of all-against-all pairwise PB alignments at family and superfamily levels. The PBE server is freely available at http://bioinformatics.univ-reunion.fr/PBE/.
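The alignment step described in this abstract can be illustrated with a minimal sketch of global (Needleman-Wunsch) dynamic programming over two PB strings. This is not the PBE server's implementation: the function names `align_pb` and `toy_score`, the gap penalty, and the match/mismatch scores are all assumptions made for illustration, and the published PB substitution matrix is not reproduced here.

```python
def toy_score(x, y):
    """Placeholder substitution score (assumption, not the real PB matrix)."""
    return 2 if x == y else -1


def align_pb(seq_a, seq_b, score=toy_score, gap=-2):
    """Globally align two PB strings; return (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                dp[i - 1][j] + gap,   # gap in seq_b
                dp[i][j - 1] + gap,   # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `align_pb("dfklmm", "dfkmm")` aligns two short hypothetical PB strings (PBs are conventionally labelled with the letters a to p). A real substitution matrix could be passed in through the `score` argument, for example as a function that looks up a pair of letters in a dictionary.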
Abstract:
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, in this unified schema using graph mining techniques. Finally, in the last part of the thesis, we define the concept of a reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is on genetics, the concepts and algorithms can be applied to other domains as well. We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
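The notion of a strong connection between two vertices in a random graph can be made concrete with a small sketch. It does not reproduce the thesis's subgraph extraction algorithms. It only estimates, by Monte Carlo sampling, the probability that two vertices are connected, under the assumption that each edge exists independently with a given probability. The function name `estimate_reliability` and the edge format `(u, v, probability)` are hypothetical.

```python
import random


def estimate_reliability(edges, source, target, samples=10000, seed=0):
    """Estimate the probability that source and target are connected.

    edges: iterable of (u, v, p), where p is the assumed independent
    probability that the undirected edge u-v exists.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one realisation of the random graph.
        adjacency = {}
        for u, v, p in edges:
            if rng.random() < p:
                adjacency.setdefault(u, []).append(v)
                adjacency.setdefault(v, []).append(u)
        # Depth-first search from source in the sampled graph.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                hits += 1
                break
            for neighbour in adjacency.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return hits / samples
```

For a chain such as `[("gene", "protein", 0.5), ("protein", "disease", 0.5)]`, the estimate for the pair `("gene", "disease")` should come out close to 0.25, the product of the two edge probabilities. A subgraph that keeps this value high with few edges is the kind of object the abstract calls reliable.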
Abstract:
This work was conducted to answer the practically important question of whether the down conductors of lightning protection systems for tall towers and buildings can be electrically isolated from the structure itself. As a first step, it is presumed that a down conductor placed on a metallic tower is a pessimistic representation of the actual problem. This presumption is based on the fact that the proximity of a heavy metallic structure has a large damping effect. The post-stroke current distributions along the down conductors and towers, which can be quite different from that in the lightning channel, govern the post-stroke near field and the resulting gradient in the soil. Also, for a reliable estimation of the actual stroke current from the measured down conductor currents, it is essential to know the current distribution characteristics along the down conductors. In view of these, the present work attempts to deduce the post-stroke current and voltage distribution along typical down conductors and towers. A solution of the governing field equations on an electromagnetic model of the system is sought for the investigation. The simulations, which give the spatio-temporal distribution of the post-stroke current and voltage, lead to the conclusion that it is almost impossible to achieve electrical isolation between the structure and the down conductor. Furthermore, there will be significant induction into the steel matrix of the supporting structure.
Abstract:
A systematic approach is developed for scaling analysis of momentum, heat and species conservation equations pertaining to the case of solidification of a binary mixture. The problem formulation and description of boundary conditions are kept fairly general, so that a large class of problems can be addressed. Analysis of the momentum equations coupled with phase change considerations leads to the establishment of an advection velocity scale. Analysis of the energy equation leads to an estimation of the solid layer thickness. Different regimes corresponding to different dominant modes of transport are simultaneously identified. A comparative study involving several cases of possible thermal boundary conditions is also performed. Finally, a scaling analysis of the species conservation equation is carried out, revealing the effect of a non-equilibrium solidification model on solute segregation and species distribution. It is shown that non-equilibrium effects result in an enhanced macrosegregation compared with the case of an equilibrium model. For the sake of assessment of the scaling analysis, the predictions are validated against corresponding computational results.
Abstract:
Presented is a thermodynamic feasibility analysis of extracting base metal chlorides from low-grade, multimetallic oxide ores using CaCl2 as a chlorinating agent in the presence of SO2 and O2. The oxides react to form the corresponding chlorides, while CaCl2 is converted to CaSO4. The Ellingham diagram is used for comparing the standard Gibbs free energy change for the sulfation-chlorination reaction of a large number of oxides. Except for alumina, silica and chromia, most of the other metal oxides will be converted to their respective chlorides. The volatile chlorides can be condensed, and the chlorides present in the condensed state can be leached. A process is proposed that uses a nontoxic chlorinating agent and gives an efficient separation of the metallic values from the gangue.
Abstract:
The distribution of zinc cations between crystallographically nonequivalent positions in ZnFe2O4 has been determined by anomalous X-ray scattering near the Zn K absorption edge. The intensity ratio measured at two energies close to the edge can be quantitatively explained only by assigning all zinc cations to the tetrahedral position in the approximately cubic close-packed array of oxygen ions. A similar conclusion has also been reached for ZnxFe3-xO4 solid solutions with x = 0.73, 0.54 and 0.35 using the improved X-ray method. This is consistent with the EXAFS results, which indicate an almost unchanged environmental structure around the zinc cations in these solid solutions.
Abstract:
Mining association rules from a large collection of databases is based on two main tasks. One is the generation of large itemsets; the other is finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and also give a framework for knowledge-based mining of multiple databases for association rules.
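The two tasks named in this abstract, generating large itemsets and then finding associations between them, can be sketched for the simple single-database case. This is a brute-force illustration and not the paper's multi-database framework. The function names `frequent_itemsets` and `association_rules` and the threshold parameters are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Task 1: return {itemset: support} for itemsets meeting min_support.

    transactions: a non-empty list of sets of items.
    """
    total = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t) / total
            if support >= min_support:
                result[frozenset(candidate)] = support
                found_any = True
        # No frequent itemset of this size means none of a larger size.
        if not found_any:
            break
    return result


def association_rules(itemsets, min_confidence):
    """Task 2: derive rules (lhs, rhs, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules
```

With the transactions `[{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]`, a minimum support of 0.5 and a minimum confidence of 0.9, the only rule that survives is that `c` implies `b`. Extending this to several databases raises the questions the paper addresses, such as how supports from different sources should be combined.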
Abstract:
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships, which in turn lead to a better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
Abstract:
Results from elasto-plastic numerical simulations of jointed rocks using both the equivalent continuum and discrete continuum approaches are presented and compared with experimental measurements. Initially, triaxial compression tests on different types of rocks with wide variation in uniaxial compressive strength are simulated using both approaches, and the results are compared. The applicability, relative merits and limitations of both approaches for the simulation of jointed rocks are discussed. It is observed that both approaches are reasonably good at predicting the real response. However, the equivalent continuum approach predicted somewhat higher stiffness values at low strains. Considering the modelling effort involved in the discrete continuum approach for problems with complex geometry, it is suggested that a proper equivalent continuum model can be used without compromising much on the accuracy of the results. The numerical analysis of a tunnel in Japan is then taken up using the continuum approach. The predicted deformations compare well with the field measurements and with the predictions from discontinuum analysis. (C) 2012 Elsevier Ltd. All rights reserved.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
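The idea of computing n-gram perplexity in windows over a sequence can be sketched in a few lines. This is not the toolkit's interface and it does not use suffix arrays. It counts n-grams with a plain dictionary, assumes add-one smoothing over a four-letter alphabet, and requires `n >= 2` and `window >= n`. The function names and default parameters are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexities(seq, n=3, window=20, step=10, alphabet="ACGT"):
    """Return [(window_start, perplexity)] for sliding windows over seq.

    The n-gram model is estimated from the whole sequence, with add-one
    smoothing (an assumption of this sketch).
    """
    grams = ngram_counts(seq, n)
    contexts = ngram_counts(seq, n - 1)
    vocab = len(alphabet)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        log_sum, count = 0.0, 0
        for i in range(len(chunk) - n + 1):
            gram = chunk[i:i + n]
            # Smoothed estimate of P(last symbol | preceding n-1 symbols).
            prob = (grams[gram] + 1) / (contexts[gram[:-1]] + vocab)
            log_sum += math.log(prob)
            count += 1
        results.append((start, math.exp(-log_sum / count)))
    return results
```

Windows that fall on repetitive stretches tend to score a lower perplexity than windows over mixed sequence, because their n-grams are frequent in the model. That contrast is what makes window-level perplexity usable as a signal for locating repeats.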
Abstract:
In several species, including the buffalo cow, prostaglandin (PG) F-2 alpha is the key molecule responsible for regression of the corpus luteum (CL). Experiments were carried out to characterize gene expression changes in the CL tissue at various time points after administration of a luteolytic dose of PGF(2 alpha) in buffalo cows. Circulating progesterone levels decreased within 1 h of PGF(2 alpha) treatment, and evidence of apoptosis was demonstrable at 18 h post treatment. Microarray analysis indicated expression changes in several immediate early genes and transcription factors within 3 h of treatment. Changes in expression of genes associated with cell-to-cell signaling, cytokine signaling, steroidogenesis, PG synthesis and apoptosis were also observed. Analysis of various components of LH/CGR signaling in CL tissues indicated decreased LH/CGR protein expression, pCREB levels and PKA activity post PGF(2 alpha) treatment. The novel finding of this study is the down-regulation of CYP19A1 gene expression, accompanied by a decrease in expression of E-2 receptors and in circulating and intraluteal E-2 post PGF(2 alpha) treatment. Mining of microarray data revealed several differentially expressed E-2 responsive genes. Since CYP19A1 gene expression is low in the bovine CL, microarray data from PGF(2 alpha)-treated macaques, a species with high luteal CYP19A1 expression, were also mined; the differentially expressed E-2 responsive genes correlated well between the two species. Taken together, the results of this study suggest that PGF(2 alpha) interferes with luteotrophic signaling, impairs intraluteal E-2 levels and regulates various signaling pathways before the effects on structural luteolysis are manifest.
Abstract:
Stranded marine mammals have long attracted public attention. Those that wash up dead are, for all their value to science, seldom seen by the public as more than curiosities. Animals that are sick, injured, orphaned or abandoned ignite a different response. Generally, public sentiment supports any effort to rescue, treat and return them to sea. Institutions displaying marine mammals showed an early interest in live-stranded animals as a source of specimens: in 1948, Marine Studios in St. Augustine, Florida, rescued a young short-finned pilot whale (Globicephala macrorhynchus), the first ever in captivity (Kritzler 1952). Eventually, the public as well as government agencies looked to these institutions for their recognized expertise in marine mammal care and medicine. More recently, facilities have been established for the sole purpose of rehabilitating marine mammals and preparing them for return to the wild. Four such institutions are the Marine Mammal Center (Sausalito, CA), the Research Institute for Nature Management (Pieterburen, The Netherlands), the RSPCA, Norfolk Wildlife Hospital (Norfolk, United Kingdom) and the Institute for Wildlife Biology of Christian-Albrechts University (Kiel, Germany). (PDF contains 68 pages.)
Abstract:
This thesis examines four distinct facets of, and methods for understanding, political ideology, and so it includes four distinct chapters with only moderate connections between them. Chapter 2 examines how reactions to emotional stimuli vary with political opinion, and how the stimuli can produce changes in an individual's political preferences. Chapter 3 examines the connection between self-reported fear and item nonresponse on surveys. Chapter 4 examines the connection between political and moral consistency and low-dimensional ideology, and Chapter 5 develops a technique for estimating ideal points and salience in a low-dimensional ideological space.
Abstract:
194 p.