951 results for Computational tools
Abstract:
Mass spectrometry (MS) has become a standard tool for identifying metabolites in biological tissues, and metabolomics is slowly gaining recognition as a legitimate research discipline for characterizing biological conditions. The computational side of metabolomics, however, lags behind the rapid advances in analytical techniques, for two reasons. The first is the lack of a standardized data repository for mass spectra: each research institution is flooded with gigabytes of mass-spectral data from its own analytical groups and cannot host a world-class repository for mass spectra. The second is the lack of informatics experts fully experienced in spectral analyses. Both barriers must be overcome to establish a publicly free data server for MS analysis in metabolomics, as GenBank does for genomics and UniProt for proteomics. The workshop brought together bioinformaticians working on mass-spectral analyses in Finland and Japan, with the goal of establishing a consortium to freely exchange and publicize mass spectra of metabolites measured on various platforms, computational tools to analyze spectra, and spectral knowledge computationally predicted from standardized data. This book contains the abstracts of the presentations given in the workshop. The programme of the workshop consisted of oral presentations from Japan and Finland, invited lectures from Steffen Neumann (Leibniz Institute of Plant Biochemistry), Matej Oresic (VTT), Merja Penttila (VTT) and Nicola Zamboni (ETH Zurich), as well as free-form discussion among the participants. The event was funded by the Academy of Finland (grants 139203 and 118653), the Japan Society for the Promotion of Science (JSPS Japan-Finland Bilateral Seminar Program 2010) and the Department of Computer Science, University of Helsinki. We would like to thank all the people contributing to the technical programme and the sponsors for making the workshop possible. Helsinki, October 2010. Masanori Arita, Markus Heinonen and Juho Rousu
Abstract:
This paper presents a preliminary analysis of the Kannada WordNet and a set of relevant computational tools. Although the design has been inspired by the famous English WordNet and, to a certain extent, by the Hindi WordNet, the unique features of the Kannada WordNet are graded antonymy and meronymy relationships, nominal as well as verbal compounding, complex verb constructions and an efficient underlying database design (built to handle storage and display of Kannada Unicode characters). The Kannada WordNet would not only add to the sparse collection of machine-readable Kannada dictionaries but also give new insights into the Kannada vocabulary. It provides a sufficient interface for applications in Kannada machine translation, spell checking and semantic analysis.
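A minimal sketch of a lexical store supporting the two distinctive relations named above, graded antonymy and meronymy; the synset identifiers and lemmas are hypothetical illustrations, not actual Kannada WordNet entries.

```python
# Minimal sketch of a WordNet-like store with graded antonymy and meronymy.
# Synset IDs and lemmas below are hypothetical, not real Kannada WordNet data.
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str
    lemmas: list                     # Unicode lemmas (Kannada strings)
    gloss: str = ""
    meronyms: list = field(default_factory=list)  # part-whole links
    # graded antonyms: (synset_id, grade) pairs, grade in [0, 1]
    graded_antonyms: list = field(default_factory=list)

class WordNet:
    def __init__(self):
        self.synsets = {}

    def add(self, synset):
        self.synsets[synset.synset_id] = synset

    def antonyms_by_grade(self, synset_id, min_grade=0.0):
        """Return antonym synsets at or above a given antonymy grade."""
        s = self.synsets[synset_id]
        return [(self.synsets[sid], g)
                for sid, g in s.graded_antonyms if g >= min_grade]

# Usage: "hot" with a strong antonym ("cold") and a weaker one ("cool").
wn = WordNet()
wn.add(Synset("adj.hot.01", ["bisi"], "of high temperature",
              graded_antonyms=[("adj.cold.01", 1.0), ("adj.cool.01", 0.6)]))
wn.add(Synset("adj.cold.01", ["thampu"], "of low temperature"))
wn.add(Synset("adj.cool.01", ["tampu"], "moderately cold"))
print([(s.synset_id, g) for s, g in wn.antonyms_by_grade("adj.hot.01", 0.5)])
```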
Abstract:
Layer-wise, distance-dependent orientational relaxation of water confined in reverse micelles (RM) is studied using theoretical and computational tools. We use both a newly constructed "spins on a ring" (SOR) Ising-type model (with Shore-Zwanzig rotational dynamics) and atomistic simulations with explicit water. Our study explores the effect of reverse micelle size and the role of intermolecular correlations, compromised by the presence of a highly polar surface, in the dependence of water relaxation on distance from the interface. The "spins on a ring" model can capture some aspects of the distance dependence of relaxation, such as the acceleration of orientational relaxation at intermediate layers. In the atomistic simulations, layer-wise decomposition of the hydrogen-bond formation pattern clearly reveals that the hydrogen-bond arrangement of water at a certain distance from the surface can remain frustrated by the interaction with the polar surface head groups. This layer-wise analysis also reveals a non-monotonic slow relaxation component that can be attributed to this frustration effect and is accentuated in small to intermediate-size RMs. For large RMs, the long-time component decreases monotonically from the interface to the interior, with the slowest relaxation observed at the interface. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4732095]
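As a toy illustration of the "spins on a ring" idea, the sketch below runs Metropolis single-spin-flip dynamics on a ring of Ising spins with one pinned "surface" spin and compares spin autocorrelations near and far from the pinned site. This is not the Shore-Zwanzig dynamics used by the authors; parameters are arbitrary.

```python
# Illustrative Ising-type "spins on a ring": Metropolis single-spin-flip
# dynamics, one pinned spin crudely mimicking a polar interfacial site.
import numpy as np

rng = np.random.default_rng(0)
N, J, T, steps = 64, 1.0, 1.5, 5000
spins = rng.choice([-1, 1], size=N)
spins[0] = 1                       # "surface" spin, held fixed

def sweep(s):
    for _ in range(N):
        i = rng.integers(1, N)     # never flip the pinned spin
        dE = 2 * J * s[i] * (s[i - 1] + s[(i + 1) % N])
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]

near, far = [], []                 # trajectories near/far from the "surface"
for _ in range(steps):
    sweep(spins)
    near.append(spins[1])
    far.append(spins[N // 2])

def autocorr(x, lag):
    x = np.asarray(x, float)
    return np.mean(x[:-lag] * x[lag:]) - np.mean(x) ** 2

# Slower decay near the pinned spin signals interfacial slowdown.
for lag in (1, 10, 100):
    print(lag, autocorr(near, lag), autocorr(far, lag))
```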
Abstract:
This article is concerned with the evolution of haploid organisms that reproduce asexually. In a seminal piece of work, Eigen and coauthors proposed the quasispecies model in an attempt to understand such an evolutionary process. Their work has impacted antiviral treatment and vaccine design strategies. Yet, predictions of the quasispecies model are at best viewed as a guideline, primarily because it assumes an infinite population size, whereas realistic population sizes can be quite small. In this paper we consider a population genetics-based model aimed at understanding the evolution of such organisms with finite population sizes and present a rigorous study of the convergence and computational issues that arise therein. Our first result is structural and shows that, at any time during the evolution, as the population size tends to infinity, the distribution of genomes predicted by our model converges to that predicted by the quasispecies model. This justifies the continued use of the quasispecies model to derive guidelines for intervention. While the stationary state in the quasispecies model is readily obtained, due to the explosion of the state space in our model, exact computations are prohibitive. Our second set of results is computational in nature and addresses this issue. We derive conditions on the parameters of evolution under which our stochastic model mixes rapidly. Further, for a class of widely used fitness landscapes we give a fast deterministic algorithm which computes the stationary distribution of our model. These computational tools are expected to serve as a framework for the modeling of strategies for the deployment of mutagenic drugs.
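For context, the infinite-population limit referenced here is the standard quasispecies dynamics of Eigen and coworkers (generic notation, not quoted from the paper):

$$
\dot{x}_i \;=\; \sum_{j} f_j\, q_{ji}\, x_j \;-\; \bar f(t)\, x_i,
\qquad \bar f(t) \;=\; \sum_{j} f_j\, x_j(t),
$$

where $x_i$ is the frequency of genome type $i$, $f_j$ the fitness of type $j$, and $q_{ji}$ the probability that replication of type $j$ produces type $i$. The paper's first result states that the finite-population distribution converges to the solution of these equations as the population size tends to infinity.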
Abstract:
The influence of geometric parameters, such as blade profile and hub geometry, on axial flow turbines for micro hydro applications remains poorly characterized. This paper first introduces a holistic theoretical model for studying the hydraulic phenomena resulting from geometric modifications to the blades. It then describes modifications carried out on two runner stages, of which one has untwisted blades and the other has twisted blades obtained by modifying the inlet hub. The experimental results showed that the performance of the untwisted-blade runner was satisfactory, with a maximum efficiency of 68%. However, the positive effects of twisted blades were clearly evident, with an efficiency rise of more than 2%. This study also looks into possible limitations of the model and suggests extending the experimental work and using computational tools to progressively validate all experimental findings, especially on the flow physics within the hub region and the slip phenomena. The paper finally underlines the importance of developing a standardization philosophy for axial flow turbines specific to micro hydro requirements. DOI: 10.1061/(ASCE)EY.1943-7897.0000060. (C) 2012 American Society of Civil Engineers.
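For reference, the standard Euler turbomachinery relation that any such blade-geometry analysis builds on (not necessarily the exact form of the paper's holistic model) gives the specific energy exchanged at a blade section with peripheral speed $u$ and swirl velocities $c_{u1}$, $c_{u2}$ at inlet and outlet, along with the hydraulic efficiency against net head $H$:

$$
E \;=\; \frac{u\,(c_{u1}-c_{u2})}{g},
\qquad
\eta_h \;=\; \frac{P_{\text{shaft}}}{\rho\, g\, Q\, H},
$$

where $Q$ is the discharge and $P_{\text{shaft}}$ the mechanical power delivered. Twisting the blade varies $c_{u1}-c_{u2}$ along the radius, letting each blade section work closer to its optimum, which is consistent with the efficiency gain reported above.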
Abstract:
Full-length and truncated linear plug nozzle flowfields have been analyzed, using both experimental and computational tools, for pressure ratios ranging from 5 to 72, which include the transition from an open base wake to a closed base wake. Good agreement has been found between computational and experimental results on the plug surface. Considering the deficiencies of the computational tools in predicting base flows associated with truncated plug nozzles, an engineering model to predict the wake structure transition in such flows is proposed. The utility of this model in conjunction with empirical tools for closed-wake base pressure prediction is established. The model is validated against the experimental results available in the open literature.
Abstract:
The flowfields associated with truncated annular plug nozzles of varying lengths are studied both experimentally and using computational tools. The nozzles are designed to exhibit wake structure transition over the range of pressure ratios considered. A classification of the open wake regime is proposed for comparing and analyzing the plug flowfields. The three-dimensional relief experienced by the annular plug flow leads to greater wave interactions on the plug surface than in linear plug flow, resulting in a delayed transition of the base wake. The Reynolds-averaged Navier-Stokes solvers employed in the studies predicted the plug surface flow accurately, whereas they exhibited limitations with regard to plug base flow predictions. Based on the experimental data generated, an empirical model for predicting closed-wake base pressure is proposed and compared with other models available in the literature.
Abstract:
Most biological processes are governed by specific protein-ligand interactions. Discerning the different components that contribute toward a favorable protein-ligand interaction could contribute significantly toward a better understanding of protein function, rational drug design and design principles for protein engineering. The Protein Data Bank (PDB) currently hosts the structures of ~68,000 protein-ligand complexes. Although several databases exist that classify proteins according to sequence and structure, a mere handful of them annotate and classify protein-ligand interactions and provide information on different attributes of molecular recognition. In this study, an exhaustive comparison of all the biologically relevant ligand-binding sites (84,846 sites) has been conducted using PocketMatch, a rapid, parallel, in-house algorithm. PocketMatch quantifies the similarity between binding sites based on structural descriptors and residue attributes. A similarity network was constructed using binding sites whose PocketMatch scores exceeded a high similarity threshold (0.80). The binding site similarity network was clustered into discrete sets of similar sites using the Markov clustering (MCL) algorithm. Furthermore, various computational tools have been used to study different attributes of interactions within the individual clusters. The attributes can be roughly divided into (i) binding site characteristics, including pocket shape, nature of residues and interaction profiles with different kinds of atomic probes, (ii) atomic contacts, consisting of various types of polar, hydrophobic and aromatic contacts along with binding site water molecules that could play crucial roles in protein-ligand interactions, and (iii) binding energetics involved in interactions, derived from scoring functions developed for docking. For each ligand-binding site in each protein in the PDB, site similarity information, the clusters they belong to and a description of site attributes are provided in a relational database: Protein-Ligand Interaction Clusters (PLIC).
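A minimal sketch of the network-then-cluster step described above, assuming a pairwise score matrix as input (random stand-in data here, not PocketMatch output) and using a bare-bones Markov clustering (expansion/inflation) loop:

```python
# Threshold pairwise site-similarity scores at 0.80 to form a network, then
# cluster with a bare-bones Markov clustering (MCL) iteration.
import numpy as np

rng = np.random.default_rng(1)
n = 30
scores = rng.random((n, n)); scores = (scores + scores.T) / 2
np.fill_diagonal(scores, 1.0)        # stand-in for PocketMatch scores

adj = (scores >= 0.80).astype(float) # similarity network (threshold 0.80)
np.fill_diagonal(adj, 1.0)           # MCL self-loops

def mcl(A, expansion=2, inflation=2.0, iters=50):
    M = A / A.sum(axis=0)                            # column-stochastic
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)     # expansion
        M = M ** inflation                           # inflation
        M = M / M.sum(axis=0)                        # renormalize
    # attractor rows with residual mass define the clusters
    clusters = {}
    for i in np.where(M.sum(axis=1) > 1e-6)[0]:
        members = tuple(np.where(M[i] > 1e-6)[0])
        clusters.setdefault(members, members)
    return list(clusters)

for c in mcl(adj):
    print("cluster:", c)
```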
Abstract:
Sequential Monte Carlo methods, also known as particle methods, are a widely used set of computational tools for inference in non-linear non-Gaussian state-space models. In many applications it may be necessary to compute the sensitivity, or derivative, of the optimal filter with respect to the static parameters of the state-space model; for instance, in order to obtain maximum likelihood estimates of model parameters of interest, or to compute the optimal controller in an optimal control problem. In Poyiadjis et al. [2011], an original particle algorithm to compute the filter derivative was proposed, and it was shown using numerical examples that the particle estimate was numerically stable in the sense that it did not deteriorate over time. In this paper we substantiate this claim with a detailed theoretical study. $L_p$ bounds and a central limit theorem for this particle approximation of the filter derivative are presented. It is further shown that under mixing conditions these $L_p$ bounds and the asymptotic variance characterized by the central limit theorem are uniformly bounded with respect to the time index. We demonstrate the performance predicted by theory with several numerical examples. We also use the particle approximation of the filter derivative to perform online maximum likelihood parameter estimation for a stochastic volatility model.
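For context, a minimal bootstrap particle filter for a stochastic volatility model of the kind mentioned at the end of the abstract; this sketches only the filtering backbone that a filter-derivative estimator builds on, with illustrative parameter values, not the algorithm of Poyiadjis et al. itself.

```python
# Bootstrap particle filter for the stochastic volatility model
# x_t = phi*x_{t-1} + sigma*v_t,  y_t = beta*exp(x_t/2)*w_t.
import numpy as np

rng = np.random.default_rng(2)
phi, sigma, beta, T, N = 0.95, 0.3, 0.7, 200, 1000

# Simulate data from the model.
x = np.zeros(T)
x[0] = rng.normal(0, sigma / np.sqrt(1 - phi**2))   # stationary start
for t in range(1, T):
    x[t] = phi * x[t-1] + sigma * rng.normal()
y = beta * np.exp(x / 2) * rng.normal(size=T)

# Filter: propagate, weight by the observation density, resample.
particles = rng.normal(0, sigma / np.sqrt(1 - phi**2), N)
loglik = 0.0
for t in range(T):
    particles = phi * particles + sigma * rng.normal(size=N)
    sd = beta * np.exp(particles / 2)
    logw = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (y[t] / sd)**2
    m = logw.max()
    w = np.exp(logw - m)
    loglik += m + np.log(w.mean())                  # log-likelihood increment
    idx = rng.choice(N, size=N, p=w / w.sum())      # multinomial resampling
    particles = particles[idx]

print("estimated log-likelihood:", loglik)
```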
Abstract:
Sequential Monte Carlo (SMC) methods are a widely used set of computational tools for inference in non-linear non-Gaussian state-space models. We propose a new SMC algorithm to compute the expectation of additive functionals recursively. Essentially, it is an on-line or "forward only" implementation of the forward filtering backward smoothing SMC algorithm proposed by Doucet, Godsill and Andrieu (2000). Compared to the standard path-space SMC estimator, whose asymptotic variance increases quadratically with time even under favorable mixing assumptions, the non-asymptotic variance of the proposed SMC estimator only increases linearly with time. We show how this allows us to perform recursive parameter estimation using an SMC implementation of an on-line version of the Expectation-Maximization algorithm which does not suffer from the particle path degeneracy problem.
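The recursion behind such a forward-only implementation can be sketched in standard forward-smoothing notation (assumed here, not quoted from the paper). For an additive functional $S_n(x_{1:n}) = \sum_{k=1}^{n} s_k(x_{k-1}, x_k)$, define $T_n(x_n) = \mathbb{E}\!\left[S_n(X_{1:n}) \mid X_n = x_n,\, y_{1:n-1}\right]$; then

$$
T_n(x_n) \;=\; \int \left[\, T_{n-1}(x_{n-1}) + s_n(x_{n-1}, x_n) \,\right] p(x_{n-1} \mid y_{1:n-1}, x_n)\, dx_{n-1},
\qquad
p(x_{n-1} \mid y_{1:n-1}, x_n) \;\propto\; p(x_{n-1} \mid y_{1:n-1})\, f(x_n \mid x_{n-1}),
$$

and the smoothed expectation is recovered as $\mathbb{E}[S_n \mid y_{1:n}] = \int T_n(x_n)\, p(x_n \mid y_{1:n})\, dx_n$. Because only $T_n$ and the current filter are carried forward, no particle paths need to be stored, which is what avoids the path degeneracy problem mentioned above.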
Abstract:
Sequential Monte Carlo (SMC) methods are popular computational tools for Bayesian inference in non-linear non-Gaussian state-space models. For this class of models, we propose SMC algorithms to compute the score vector and observed information matrix recursively in time. We propose two different SMC implementations, one with computational complexity $\mathcal{O}(N)$ and the other with complexity $\mathcal{O}(N^{2})$, where $N$ is the number of importance sampling draws. Although cheaper, the performance of the $\mathcal{O}(N)$ method degrades quickly in time as it inherently relies on the SMC approximation of a sequence of probability distributions whose dimension increases linearly with time. In particular, even under strong mixing assumptions, the variance of the estimates computed with the $\mathcal{O}(N)$ method increases at least quadratically in time. The $\mathcal{O}(N^{2})$ method is a non-standard SMC implementation that does not suffer from this rapid degradation. We then show how both methods can be used to perform batch and recursive parameter estimation.
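Both quantities are smoothed expectations of additive functionals, which is what such SMC schemes approximate; in generic notation (not quoted from the paper), they rest on Fisher's identity for the score,

$$
\nabla_\theta \log p_\theta(y_{1:n}) \;=\; \mathbb{E}_\theta\!\left[\, \nabla_\theta \log p_\theta(X_{1:n}, y_{1:n}) \,\middle|\, y_{1:n} \right],
$$

and Louis' identity for the observed information,

$$
-\nabla_\theta^2 \log p_\theta(y_{1:n})
\;=\; \nabla_\theta \log p_\theta(y_{1:n})\, \nabla_\theta \log p_\theta(y_{1:n})^{\top}
- \mathbb{E}_\theta\!\left[ \nabla_\theta^2 \log p_\theta(X_{1:n}, y_{1:n}) \,\middle|\, y_{1:n} \right]
- \mathbb{E}_\theta\!\left[ \nabla_\theta \log p_\theta(X_{1:n}, y_{1:n})\, \nabla_\theta \log p_\theta(X_{1:n}, y_{1:n})^{\top} \,\middle|\, y_{1:n} \right].
$$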
Abstract:
The main focus of this thesis is the use of high-throughput sequencing technologies in functional genomics (in particular in the form of ChIP-seq, chromatin immunoprecipitation coupled with sequencing, and RNA-seq) and the study of the structure and regulation of transcriptomes. Some parts of it are of a more methodological nature while others describe the application of these functional genomic tools to address various biological problems. A significant part of the research presented here was conducted as part of the ENCODE (ENCyclopedia Of DNA Elements) Project.
The first part of the thesis focuses on the structure and diversity of the human transcriptome. Chapter 1 contains an analysis of the diversity of the human polyadenylated transcriptome based on RNA-seq data generated for the ENCODE Project. Chapter 2 presents a simulation-based examination of the performance of some of the most popular computational tools used to assemble and quantify transcriptomes. Chapter 3 includes a study of variation in gene expression, alternative splicing and allelic expression bias on the single-cell level and on a genome-wide scale in human lymphoblastoid cells; it also brings forward a number of methodological considerations critical to the practice of single-cell RNA-seq measurements.
The second part presents several studies applying functional genomic tools to the study of the regulatory biology of organellar genomes, primarily in mammals but also in plants. Chapter 5 contains an analysis of the occupancy of the human mitochondrial genome by TFAM, an important structural and regulatory protein in mitochondria, using ChIP-seq. In Chapter 6, the mitochondrial DNA occupancy of the TFB2M transcriptional regulator, the MTERF termination factor, and the mitochondrial RNA and DNA polymerases is characterized. Chapter 7 consists of an investigation into the curious phenomenon of the physical association of nuclear transcription factors with mitochondrial DNA, based on the diverse collections of transcription factor ChIP-seq datasets generated by the ENCODE, mouseENCODE and modENCODE consortia. In Chapter 8 this line of research is further extended to existing publicly available ChIP-seq datasets in plants and their mitochondrial and plastid genomes.
The third part is dedicated to the analytical and experimental practice of ChIP-seq. As part of the ENCODE Project, a set of metrics for assessing the quality of ChIP-seq experiments was developed, and the results of this activity are presented in Chapter 9. These metrics were later used to carry out a global analysis of ChIP-seq quality in the published literature (Chapter 10). In Chapter 11, the development and initial application of an automated robotic ChIP-seq pipeline (in which these metrics also played a major role) are presented.
The fourth part presents the results of some additional projects the author has been involved in, including the study of the role of the Piwi protein in the transcriptional regulation of transposon expression in Drosophila (Chapter 12), and the use of single-cell RNA-seq to characterize the heterogeneity of gene expression during cellular reprogramming (Chapter 13).
The last part of the thesis provides a review of the results of the ENCODE Project and the interpretation of the complexity of the biochemical activity exhibited by mammalian genomes that they have revealed (Chapters 15 and 16), an overview of the technical developments expected in the near future and their impact on the field of functional genomics (Chapter 14), and a discussion of some research areas that have so far been insufficiently explored, the future study of which will, in the author's opinion, provide deep insights into many fundamental but not yet completely answered questions about the transcriptional biology of eukaryotes and its regulation.
Abstract:
The behavior of radionuclides in soil can vary according to their interaction with the elements that make up that soil. The transfer factor (TF) is the parameter that describes the interaction between the soil and plants for a given radionuclide, given that the soil has chemical and physical properties that favor plant growth. Using computational tools, and based on extreme soils with TFs known from the literature and on the soil parameters that affect the behavior of 137Cs (such as exchangeable K, cation exchange capacity and pH), this work aims to apply geoprocessing techniques to create a soil vulnerability map for 137Cs and to automate its production. This study shows that the use of geoprocessing techniques for mapping vulnerability to 137Cs can be an important tool for planning emergency actions in rural areas, for identifying areas at risk of radioactive contamination, for choosing suitable corrective actions, and for supporting the creation of public policies.
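A minimal sketch of the kind of geoprocessing rule this mapping implies, with hypothetical thresholds and weights (the study's actual values are not given in the abstract):

```python
# Derive a per-cell 137Cs vulnerability class from exchangeable K, cation
# exchange capacity (CEC) and pH. Thresholds/weights are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
shape = (100, 100)                       # raster grid (stand-in data)
k_exch = rng.uniform(0.05, 1.5, shape)   # exchangeable K, cmolc/dm3
cec = rng.uniform(2.0, 25.0, shape)      # CEC, cmolc/dm3
ph = rng.uniform(3.5, 7.5, shape)

# Low exchangeable K, low CEC and low pH all favor 137Cs transfer to plants,
# so each condition contributes a 0-1 "risk" term; the three are averaged.
risk = ((k_exch < 0.3).astype(float)
        + (cec < 8.0).astype(float)
        + (ph < 5.0).astype(float)) / 3.0

# Classify into three vulnerability classes for the map legend.
vuln_map = np.digitize(risk, [0.34, 0.67])   # 0=low, 1=medium, 2=high
print(np.bincount(vuln_map.ravel(), minlength=3))
```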
Abstract:
Computational tools are increasingly supporting the teaching and learning process in several areas. They expand the instructor's possibilities for delivering content and interacting with students. This group of tools includes simulations based on multiagent systems. In this context, this work presents a simulation environment of the population growth of a beehive for teaching Biology. The system variables can be changed in order to analyze the different results obtained. Aspects such as the duration and timing of crop blooming, known as flower fields, can be manipulated by the student. The multiagent approach from Distributed Artificial Intelligence was chosen so that the control of the application's activities would be automated. Virtual Reality was used to add important aspects of the process that cannot be visualized through the mathematical simulation alone. A synthesis of the use of technologies in education, especially computing, is discussed in the work. Aspects of the application in Biology teaching are presented, as well as initial results of its use.
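A minimal sketch of a multiagent simulation of this kind, with hypothetical parameters: foragers collect nectar from flower fields whose bloom window can be adjusted, and hive growth follows the food balance.

```python
# Toy multiagent beehive population simulation; all parameters are
# hypothetical, for illustration only.
import random

random.seed(4)

class FlowerField:
    def __init__(self, bloom_start, bloom_end, yield_per_visit=2.0):
        self.bloom_start, self.bloom_end = bloom_start, bloom_end
        self.yield_per_visit = yield_per_visit

    def nectar(self, day):
        in_bloom = self.bloom_start <= day <= self.bloom_end
        return self.yield_per_visit if in_bloom else 0.0

class Hive:
    def __init__(self, workers=100, food=50.0):
        self.workers, self.food = workers, food

    def step(self, day, fields):
        foragers = int(self.workers * 0.4)
        for _ in range(foragers):              # each forager visits one field
            self.food += random.choice(fields).nectar(day)
        self.food -= self.workers * 0.1        # daily consumption
        if self.food > 0:
            self.workers += int(self.workers * 0.02)  # brood emergence
        else:
            self.workers = int(self.workers * 0.95)   # starvation losses
            self.food = 0.0

fields = [FlowerField(10, 60), FlowerField(40, 90)]  # student-adjustable blooms
hive = Hive()
for day in range(120):
    hive.step(day, fields)
    if day % 30 == 0:
        print(day, hive.workers, round(hive.food, 1))
```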
Abstract:
In 2011 alone, more than 1,000 TB of new digital image records from orbital remote sensing were acquired. This volume of records, which grows in a geometric progression, is added each year to the incredible and extraordinary mass of orbital image data of the Earth's surface already in existence (acquired since the 1970s). This massive quantity of records, most of which have never even been processed, requires computational tools that allow the automatic recognition of desired image patterns, so as to enable the extraction of geographic objects and targets of interest more quickly and concisely. The proposal that such recognition be performed automatically, by integrating spectral analysis and computational intelligence techniques based on the knowledge acquired from an image specialist, was implemented as an integrator built on computational (artificial) neural networks (through Kohonen's Self-Organizing Feature Map, SOFM) and fuzzy logic (through Mamdani inference). These were applied to the spectral signatures of each pattern of interest, formed by the quantization levels (gray levels) of the respective pattern in each spectral band, so that pattern classification depends, inseparably, on the correlation of the spectral signatures across the sensor's six bands, just as in the work of image specialists. Bands 1 to 5 and 7 of the LANDSAT-5 satellite were used to determine five land-cover/land-use classes of interest in three subsets of the test area, located in the State of Rio de Janeiro (Guaratiba, Mangaratiba and Magé), and the results were compared against those derived from the interpretation of the image specialist, which was corroborated by ground-truth verification. The results obtained with the integrator were also compared with two commercial software systems (IDRISI Taiga and ENVI 4.8) with respect to classification quality (Kappa index) and response time. The integrator, implementing hybrid (supervised and unsupervised) classifications, proved effective in the automatic (unsupervised) recognition of multispectral patterns and in learning these patterns: for each input subset of the test area, less training was needed for its classification to reach a final average accuracy of 87% against the image specialist's classifications. Its effectiveness was also confirmed against the tested commercial systems, with an average Kappa index of 0.86.
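A bare-bones sketch of the Kohonen SOFM stage of such an integrator, trained on 6-band spectral signatures; grid size, learning schedule and the random training data are stand-ins, not the study's configuration.

```python
# Minimal Kohonen SOFM trained on 6-band spectral signatures.
import numpy as np

rng = np.random.default_rng(5)
n_pixels, n_bands = 500, 6            # e.g. LANDSAT-5 bands 1-5 and 7
X = rng.random((n_pixels, n_bands))   # normalized gray-level signatures

grid = 5                              # 5x5 map of neurons
W = rng.random((grid * grid, n_bands))
coords = np.array([(i, j) for i in range(grid) for j in range(grid)], float)

epochs, lr0, radius0 = 20, 0.5, grid / 2.0
for epoch in range(epochs):
    lr = lr0 * (1 - epoch / epochs)                    # decaying rate
    radius = max(radius0 * (1 - epoch / epochs), 0.5)  # shrinking radius
    for xv in X:
        bmu = np.argmin(((W - xv) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * radius ** 2))            # neighborhood kernel
        W += lr * h[:, None] * (xv - W)                # pull neighbors to xv

# Each pixel then maps to a neuron; labeling the neurons with land-cover
# classes (here left to the fuzzy/Mamdani stage) yields the classification.
labels = np.array([np.argmin(((W - xv) ** 2).sum(axis=1)) for xv in X])
print(np.bincount(labels, minlength=grid * grid))
```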