Background: Temporal analysis of gene expression data has been limited to identifying genes whose expression varies with time and/or correlations between genes that have similar temporal profiles. Often, the methods do not consider the underlying network constraints that connect the genes. It is becoming increasingly evident that interactions change substantially with time. Thus far, there is no systematic method to relate the temporal changes in gene expression to the dynamics of the interactions between the genes. Information on interaction dynamics would open up possibilities for discovering new mechanisms of regulation by helping to identify time-sensitive interactions, and would permit studies of the effect of a genetic perturbation. Results: We present NETGEM, a tractable model rooted in Markov dynamics, for analyzing the dynamics of the interactions between proteins based on the dynamics of the expression changes of the genes that encode them. The model treats the interaction strengths as random variables which are modulated by suitable priors. This approach is necessitated by the extremely small sample size of the datasets, relative to the number of interactions. The model is amenable to a linear time algorithm for efficient inference. Using temporal gene expression data, NETGEM was successful in identifying (i) temporal interactions and determining their strength, (ii) functional categories of the actively interacting partners and (iii) dynamics of interactions in perturbed networks. Conclusions: NETGEM represents an optimal trade-off between model complexity and data requirement. It was able to deduce actively interacting genes and functional categories from temporal gene expression data. It permits inference by incorporating the information available in perturbed networks. 
Given that the inputs to NETGEM are only the network and the temporal variation of the nodes, this algorithm promises to have widespread applications, beyond biological systems. The source code for NETGEM is available from https://github.com/vjethava/NETGEM


We study a State Dependent Attempt Rate (SDAR) approximation to model M queues (one queue per node) served by the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol as standardized in the IEEE 802.11 Distributed Coordination Function (DCF). The approximation is that, when n of the M queues are non-empty, the (transmission) attempt probability of each of the n non-empty nodes is given by the long-term (transmission) attempt probability of n saturated nodes. With the arrival of packets into the M queues according to independent Poisson processes, the SDAR approximation reduces a single cell with non-saturated nodes to a Markovian coupled queueing system. We provide a sufficient condition under which the joint queue length Markov chain is positive recurrent. For the symmetric case of equal arrival rates and finite and equal buffers, we develop an iterative method which leads to accurate predictions for important performance measures such as collision probability, throughput and mean packet delay. We replace the MAC layer with the SDAR model of contention by modifying the NS-2 source code pertaining to the MAC layer, keeping all other layers unchanged. By this model-based simulation technique at the MAC layer, we achieve speed-ups (w.r.t. MAC layer operations) up to 5.4. Through extensive model-based simulations and numerical results, we show that the SDAR model is an accurate model for the DCF MAC protocol in single cells. (C) 2012 Elsevier B.V. All rights reserved.
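As a minimal sketch of the approximation described above, the following Python snippet computes the per-slot idle, success, and collision probabilities when n nodes are non-empty and each attempts independently with the same probability. The table `BETA` and the function name `slot_probabilities` are hypothetical; in the paper the attempt probabilities would come from the saturated analysis of n nodes, which is not reproduced here.

```python
# Hypothetical saturated attempt probabilities beta_n, indexed by the number
# of non-empty nodes n. These values are illustrative placeholders only.
BETA = {1: 0.10, 2: 0.08, 3: 0.07, 4: 0.06}


def slot_probabilities(n, beta=BETA):
    """Return (idle, success, collision) probabilities for one slot when
    n non-empty nodes each attempt independently with probability beta[n]."""
    if n == 0:
        # No node has a packet, so the slot is idle with certainty.
        return 1.0, 0.0, 0.0
    b = beta[n]
    idle = (1.0 - b) ** n                    # nobody attempts
    success = n * b * (1.0 - b) ** (n - 1)   # exactly one node attempts
    collision = 1.0 - idle - success         # two or more nodes attempt
    return idle, success, collision


if __name__ == "__main__":
    for n in range(0, 5):
        print(n, slot_probabilities(n))
```

Because the attempt probability depends only on the current number of non-empty queues, these three quantities are functions of the queue-occupancy state alone, which is what makes the coupled system Markovian under Poisson arrivals.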


Knowledge of protein-ligand interactions is essential to understand several biological processes and important for applications ranging from understanding protein function to drug discovery and protein engineering. Here, we describe an algorithm for the comparison of three-dimensional ligand-binding sites in protein structures. A previously described algorithm, PocketMatch (version 1.0), is optimised, expanded, and MPI-enabled for parallel execution. PocketMatch (version 2.0) rapidly quantifies binding-site similarity based on structural descriptors such as residue nature and interatomic distances. Atomic-scale alignments may also be obtained from the amino acid residue pairings generated. It allows an end-user to compute database-wide, all-to-all comparisons in a matter of hours. The use of our algorithm on a sample dataset, a performance analysis, and annotated source code are also included.


Atomization is the process of disintegration of a liquid jet into ligaments and subsequently into smaller droplets. A liquid jet injected from a circular orifice into a cross flow of air undergoes atomization primarily due to the interaction of the two phases rather than an intrinsic breakup. Direct numerical simulation of this process resolving the finest droplets is computationally very expensive and impractical. In the present study, we resort to multiscale modelling to reduce the computational cost. The primary breakup of the liquid jet is simulated using Gerris, an open source code, which employs the Volume-of-Fluid (VOF) algorithm. The smallest droplets formed during primary atomization are modeled as Lagrangian particles. This one-way coupling approach is validated with the help of the simple test case of tracking a particle in a Taylor-Green vortex. The temporal evolution of the liquid jet forming the spray is captured and the flattening of the cylindrical liquid column prior to breakup is observed. The size distribution of the resultant droplets is presented at different distances downstream from the location of injection and their spatial evolution is analyzed.
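The Taylor-Green test case mentioned above can be illustrated with a short Python sketch. It is not the Gerris setup used in the study: it assumes a steady, two-dimensional, unit-amplitude Taylor-Green velocity field and a massless tracer that follows the fluid velocity exactly (one-way coupling with no particle inertia), integrated with a classical fourth-order Runge-Kutta scheme. All function names are hypothetical. For this steady field the stream function sin(x) sin(y) is constant along a tracer path, which gives a simple correctness check.

```python
import math


def velocity(x, y):
    """Steady 2D Taylor-Green velocity field (assumed unit amplitude)."""
    return math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y)


def rk4_step(x, y, dt):
    """Advance a massless tracer by one classical RK4 step."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
    x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return x, y


def track(x0, y0, dt=0.01, steps=1000):
    """Return the list of tracer positions, starting at (x0, y0)."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        x, y = rk4_step(x, y, dt)
        path.append((x, y))
    return path


def stream_function(x, y):
    """Stream function of the field above; conserved along tracer paths."""
    return math.sin(x) * math.sin(y)


if __name__ == "__main__":
    path = track(1.0, 0.5)
    psi0 = stream_function(*path[0])
    drift = max(abs(stream_function(x, y) - psi0) for x, y in path)
    print("max stream-function drift:", drift)
```

A particle with inertia or drag would need an additional equation for its own velocity; the tracer limit is used here only because it has an exact invariant to compare against.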


This thesis proposes a strategy for automatically obtaining hydrodynamic and transport parameters by solving inverse problems. Obtaining the parameters of a physical model is one of the main problems in its calibration, largely because these parameters are difficult to measure in the field. In river and estuary modeling in particular, the roughness height and the turbulent diffusion coefficient are two of the parameters that are hardest to measure. This thesis presents an automated technique for estimating these parameters through an inverse problem applied to a model of the Macaé river estuary, located in the north of Rio de Janeiro. The study used the MOHID platform, developed at the Universidade Técnica de Lisboa, which has been widely applied to the simulation of water bodies. A sensitivity analysis of the model responses with respect to the parameters of interest was carried out, and salinity was found to be sensitive to both parameters. The inverse problem was then solved using several optimization methods by coupling the MOHID platform to optimization codes implemented in Fortran. The coupling was done without altering the MOHID source code, so that the computational tool developed here can be used with any version of the platform and can be adapted for use with other simulators. The tests performed confirm the efficiency of the technique and indicate the best approaches for fast and accurate parameter estimation.


This project aims to give the user greater network-level control in environments with virtual machines created on the OpenStack platform. Each time a virtual machine is started in OpenStack, its network parameters are assigned by default, which makes them very difficult to manage and control, whether for research or for maintenance. If these parameters followed a consistent pattern for each project or user, it would be much simpler to keep track of each network interface and thus manage the interfaces more efficiently. Accomplishing this requires changes to the OpenStack source code, adapting it to meet our requirements.


Recent technological advances have raised the level of qualification of researchers in epidemiology. The strategic role of education cannot be ignored. However, the Associação Brasileira de Pós-graduação em Saúde Coletiva (ABRASCO), in its latest master plan (2005-2009), points to the low value placed on the production of teaching materials and, further, to the lack of a policy for developing and using free software in the teaching of epidemiology. It is therefore timely to invest in a relational perspective, along the lines proposed by the constructivist school, since this theory has been recognized as the most suitable for developing computer-based teaching materials. In this sense, promoting interactive courses and, within them, developing related teaching material is timely and fruitful. Regarding the policy question of developing and using free software in the teaching of epidemiology, particularly in applied statistics, R has emerged as software of growing interest. Moreover, not only does it avoid possible penalties for using unlicensed commercial software, but open access to its code and programming also makes it an excellent tool for preparing teaching material in the form of hyperdocuments, important foundations for the much-desired teacher-student interaction in the classroom. The main objective is to develop teaching material in R for courses in biostatistics applied to epidemiological analysis. Because certain statistical functions are not implemented in R, the programming of additional functions was also included. 
The courses used in developing this material were based on the disciplines Uma introdução à Plataforma R para Modelagem Estatística de Dados (An Introduction to the R Platform for Statistical Data Modeling) and Instrumento de Aferição em Epidemiologia I: Teoria Clássica de Medidas (Análise) (Measurement Instruments in Epidemiology I: Classical Measurement Theory (Analysis)), offered by the Department of Epidemiology, Instituto de Medicina Social (IMS), Universidade do Estado do Rio de Janeiro (UERJ). The theoretical and pedagogical basis was defined from constructivist principles, in which the individual is an active and critical agent of his or her own knowledge, constructing meaning from personal experience. In keeping with the constructivist view, the problem-based teaching methodology (problematização) was followed, covering problems drawn from real situations and systematized in writing. The computational methods were based on the New Information and Communication Technologies (Novas Tecnologias da Informação e Comunicação, NTIC). NTICs seek to consolidate more flexible curricula, adapted to students' differing learning characteristics. The NTICs were implemented through hypertext, a structure of texts interconnected by nodes or links that forms a network of related information. While the teaching material was being designed, changes were made to the basic interface of the R help system to ensure student-material interactivity. The tutorial itself is composed of blocks that encourage discussion and the exchange of information between teacher and students.


We present a new software framework for the implementation of applications that use stencil computations on block-structured grids to solve partial differential equations. A key feature of the framework is the extensive use of automatic source code generation which is used to achieve high performance on a range of leading multi-core processors. Results are presented for a simple model stencil running on Intel and AMD CPUs as well as the NVIDIA GT200 GPU. The generality of the framework is demonstrated through the implementation of a complete application consisting of many different stencil computations, taken from the field of computational fluid dynamics. © 2010 IEEE.
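To make the idea of generating stencil code automatically more concrete, here is a small Python sketch. It is not the framework described above and does not target any of the processors mentioned: it only shows, under that assumption, how a kernel can be emitted as source text from a table of offsets and weights and then compiled at run time. The names `generate_stencil_source`, `compile_stencil`, and `LAPLACIAN` are hypothetical.

```python
def generate_stencil_source(name, coeffs):
    """Emit Python source for a function that applies a 2D stencil.

    coeffs maps (di, dj) index offsets to weights. Boundary points that the
    stencil cannot reach are left at 0.0 in the output grid.
    """
    radius = max(max(abs(di), abs(dj)) for di, dj in coeffs)
    # Unroll the stencil into one explicit expression, as a generator would.
    terms = " + ".join(
        f"({w!r}) * u[i + ({di})][j + ({dj})]"
        for (di, dj), w in sorted(coeffs.items())
    )
    return (
        f"def {name}(u):\n"
        f"    ni, nj = len(u), len(u[0])\n"
        f"    out = [[0.0] * nj for _ in range(ni)]\n"
        f"    for i in range({radius}, ni - {radius}):\n"
        f"        for j in range({radius}, nj - {radius}):\n"
        f"            out[i][j] = {terms}\n"
        f"    return out\n"
    )


def compile_stencil(name, coeffs):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_stencil_source(name, coeffs), namespace)
    return namespace[name]


# Five-point Laplacian on a unit-spaced grid.
LAPLACIAN = {(0, 0): -4.0, (1, 0): 1.0, (-1, 0): 1.0, (0, 1): 1.0, (0, -1): 1.0}
laplacian = compile_stencil("laplacian", LAPLACIAN)

if __name__ == "__main__":
    print(generate_stencil_source("laplacian", LAPLACIAN))
    grid = [[float(i * i + j * j) for j in range(5)] for i in range(5)]
    print(laplacian(grid)[2][2])
```

A production generator would emit code in the target language and apply architecture-specific transformations; the sketch only shows the separation between the stencil description and the generated kernel.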


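As hardware performance keeps improving, computers are being given ever more demanding tasks, and the software that runs on them, as the bridge between human thinking and the underlying hardware, is growing in importance. At the same time, software systems keep growing in size and the logic they involve is becoming more complex, so developers will inevitably introduce defects and leave hidden hazards through oversight during design and implementation. How to check and assure the properties of software has therefore become a pressing research topic. Against this background, static analysis of source code, which can make up for the shortcomings of existing testing methods, has begun to stand out in this research area. Accordingly, to advance the development of secure information systems, this work studies the two most important application scenarios of source code static analysis in software property assurance: automated source code auditing during the development of high-assurance secure operating systems, and remote verification of software properties when trust is established between platforms in distributed information systems. The former helps push existing secure operating systems toward higher assurance levels in depth; the latter lays a foundation for extending, in breadth, existing information security results centered on stand-alone platforms to distributed architectures. Specifically, this work takes conformance checking against programming interface specifications and remote verification of software properties using static analysis as its entry points, explores methods and uses of source code static analysis for checking and assuring software properties, and makes the following main contributions. First, this work presents an alias analysis method based on value equivalence classes. The method maintains a space of value equivalence classes according to the relevant value-passing operations and can derive equality relations between variable symbols on demand during conformance checking of programming interface specifications. It supports context-sensitive and path-sensitive global analysis, and it copes effectively with the large number of variable symbols that arise in C code from constructs such as structures and pointers. Second, to address the limited analysis scale of most existing static code analysis tools, this work presents a performance optimization scheme for conformance checking of programming interface specifications that combines effectively with the alias analysis. The scheme improves analysis efficiency by pruning execution branches irrelevant to the analysis and by introducing a caching mechanism, while keeping the impact on analysis accuracy as small as possible. Third, we designed and implemented ABAZER (A Bug AnalyZER), a static analysis tool for C code. Given programming interface specifications that the user describes with finite automaton models, the tool performs global analysis of operating-system-kernel-level software and points out parts of the code that may violate the specifications. We used ABAZER to examine the use of the locking mechanism in the FreeBSD kernel and of the GNU Libiberty library in GCC 4.x, and found several real defects. Fourth, to address the shortcomings in flexibility and practicality of existing schemes that apply trusted computing technology and perform remote verification based on integrity information, this work presents an extended scheme. By introducing virtual machine technology, the scheme collects evidence during the software build process, applies static analysis to analyze the dependencies among the software's functional modules, and identifies the modules relevant to verification. This effectively limits the size of the trusted list that verification of user-customized software depends on, so that the scheme can adapt to the many heterogeneous platforms and varied security requirements of today's network environments. It can also support the replacement and updating of the trusted computing base it relies on. Fifth, based on the characteristics of the Flask architecture, this work presents a remote verification scheme that can check the correctness of the mandatory access control implementation while preserving as much of the software's flexibility as possible, so that users can customize the software to some degree. The scheme relies on source code static analysis to delimit the modules that need not be verified on the basis of integrity; while further reducing the size of the trusted list, it uses code rewriting to automatically insert monitoring code into these modules to constrain the software's dynamic behavior, thereby ensuring the correctness of the mandatory access control implementation. The scheme gives an initial demonstration of the broad application prospects of source code static analysis in remote verification.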
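The idea of describing a programming interface specification as a finite automaton can be sketched in a few lines of Python. This is not ABAZER and involves no alias analysis or whole-program path exploration: it assumes the calls along a single execution path have already been extracted as a list of event names, and it only replays them against a hypothetical two-state lock rule. `LOCK_SPEC` and `check_path` are illustrative names.

```python
# Hypothetical lock-usage rule written as a finite automaton:
# (current state, observed call) -> next state.
LOCK_SPEC = {
    ("unlocked", "lock"): "locked",
    ("locked", "unlock"): "unlocked",
}


def check_path(events, spec=LOCK_SPEC, start="unlocked", accepting=("unlocked",)):
    """Replay one call sequence against the automaton.

    Returns a list of (index, message) pairs, one for each call that has no
    transition from the current state, plus one if the path ends in a
    non-accepting state. An empty list means the path conforms.
    """
    state = start
    violations = []
    for index, event in enumerate(events):
        next_state = spec.get((state, event))
        if next_state is None:
            violations.append((index, f"'{event}' not allowed in state '{state}'"))
        else:
            state = next_state
    if state not in accepting:
        violations.append((len(events), f"path ends in state '{state}'"))
    return violations


if __name__ == "__main__":
    print(check_path(["lock", "unlock"]))          # conforming path
    print(check_path(["lock", "lock", "unlock"]))  # double acquisition
    print(check_path(["lock"]))                    # lock never released
```

A static analyzer would apply such a check to every feasible path and to every lock object, which is where the alias analysis and the pruning of irrelevant branches described above become necessary.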


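With the rapid growth of the Internet, network congestion has become a very important problem, and network simulation is a common way to test the effectiveness of congestion control algorithms. This paper presents the principles and implementation of NS2 (Network Simulator V2), an open source network simulator. It first compares the strengths and weaknesses of four different simulators, then describes in detail NS2's module composition, working environment, main code structure, and extension methods, and finally uses two typical examples, RED (Random Early Detection) queue scheduling and mobile IP data transmission, to illustrate NS2's practical value.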
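For readers unfamiliar with RED, the core of the classical algorithm can be sketched in Python as follows. This is not NS2 code and not the configuration used in the paper: it shows only the exponentially weighted average queue length and the piecewise-linear drop probability of the basic scheme, with illustrative default parameters, and it omits refinements such as spacing drops by the count of packets since the last drop.

```python
def update_average(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg, min_th=5.0, max_th=15.0, max_p=0.1):
    """Basic RED drop probability as a function of the average queue length.

    Below min_th nothing is dropped, at or above max_th everything is
    dropped, and in between the probability rises linearly up to max_p.
    """
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


if __name__ == "__main__":
    avg = 0.0
    for queue_len in [2, 8, 12, 20, 20, 20]:
        avg = update_average(avg, queue_len, weight=0.5)
        print(queue_len, round(avg, 3), round(red_drop_probability(avg), 4))
```

The small default weight makes the average respond slowly, so short bursts do not trigger drops while sustained congestion does.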


In the first part of this paper we reviewed the fingerprint classification literature from two perspectives: feature extraction and classifier learning. In trying to determine which of the reviewed methods would perform best in a real implementation, we arrived at a discussion that showed how difficult this question is to answer. No previous comparison exists in the literature, and comparisons among papers are made under different experimental frameworks. Moreover, published methods are difficult to implement because their descriptions and parameters lack detail and no source code is shared. For this reason, in this paper we carry out an in-depth experimental study following the proposed double perspective. To do so, we have carefully implemented some of the most relevant feature extraction methods according to the explanations found in the corresponding papers, and we have tested their performance with different classifiers, including the specific proposals made by the authors. Our aim is to develop an objective experimental study in a common framework, which has not been done before and which can serve as a baseline for future work on the topic. In this way, we test not only their quality but also their reusability by other researchers, and we are able to indicate which proposals could be considered for future developments. Furthermore, we show that combining different feature extraction models in an ensemble can lead to superior performance, significantly improving on the results obtained by individual models.


This article describes advances in statistical computation for large-scale data analysis in structured Bayesian mixture models via graphics processing unit (GPU) programming. The developments are partly motivated by computational challenges arising in fitting models of increasing heterogeneity to increasingly large datasets. An example context concerns common biological studies using high-throughput technologies generating many, very large datasets and requiring increasingly high-dimensional mixture models with large numbers of mixture components. We outline important strategies and processes for GPU computation in Bayesian simulation and optimization approaches, give examples of the benefits of GPU implementations in terms of processing speed and scale-up in ability to analyze large datasets, and provide a detailed, tutorial-style exposition that will benefit readers interested in developing GPU-based approaches in other statistical models. Novel, GPU-oriented approaches to modifying existing algorithms and software design can lead to vast speed-up and, critically, enable statistical analyses that presently will not be performed due to compute time limitations in traditional computational environments. Supplemental materials are provided with all source code, example data, and details that will enable readers to implement and explore the GPU approach in this mixture modeling context. © 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America.


BACKGROUND: Computer simulations are of increasing importance in modeling biological phenomena. Their purpose is to predict behavior and guide future experiments. The aim of this project is to model the early immune response to vaccination by an agent based immune response simulation that incorporates realistic biophysics and intracellular dynamics, and which is sufficiently flexible to accurately model the multi-scale nature and complexity of the immune system, while maintaining the high performance critical to scientific computing. RESULTS: The Multiscale Systems Immunology (MSI) simulation framework is an object-oriented, modular simulation framework written in C++ and Python. The software implements a modular design that allows for flexible configuration of components and initialization of parameters, thus allowing simulations to be run that model processes occurring over different temporal and spatial scales. CONCLUSION: MSI addresses the need for a flexible and high-performing agent based model of the immune system.


MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. RESULTS: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
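The stability selection step can be illustrated with a generic Python sketch. It is not the authors' method: the base selector here is a simple top-k ranking by absolute correlation with an uncensored outcome, swapped in for the modified component-wise gradient boosting, and neither censoring nor the random permutation step is modeled. The function names, the subsample size of n/2, and the frequency threshold are assumptions made for illustration.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_selector(X, y, k):
    """Stand-in base selector: the k columns most correlated with y."""
    p = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]


def stability_selection(X, y, k=1, n_subsamples=50, threshold=0.8, seed=0):
    """Run the base selector on random half-samples and keep the variables
    whose selection frequency reaches the threshold."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in top_k_selector(X_sub, y_sub, k):
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j in range(p) if freqs[j] >= threshold]
    return stable, freqs


if __name__ == "__main__":
    noise = random.Random(1)
    y = [float(i) for i in range(40)]
    # Column 0 carries the signal; columns 1 and 2 are pure noise.
    X = [[y[i], noise.gauss(0, 1), noise.gauss(0, 1)] for i in range(40)]
    print(stability_selection(X, y))
```

Variables that are selected only in a few subsamples fall below the threshold, which is the mechanism by which stability selection limits false discoveries.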