21 resultados para Source code (Computer science)
Resumo:
To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interaction relationships, regulatory relationships, metabolic relationships, genetic relationships, and much more. With advances in science and technology, some high throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high throughput methods are prone to noise. Furthermore, the technology itself has its limitations, and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, and is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high throughput biological techniques, there is another type of relationship called semantic relationship that cannot be detected by a single technique, but can be inferred using multiple sources of biological data. The contributions of this thesis involved the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These included (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring functional network of genes, and (iv) techniques for clustering genes using multi-source data.
Resumo:
Developing analytical models that can accurately describe behaviors of Internet-scale networks is difficult. This is due, in part, to the heterogeneous structure, immense size and rapidly changing properties of today's networks. The lack of analytical models makes large-scale network simulation an indispensable tool for studying immense networks. However, large-scale network simulation has not been commonly used to study networks of Internet-scale. This can be attributed to three factors: 1) current large-scale network simulators are geared towards simulation research and not network research, 2) the memory required to execute an Internet-scale model is exorbitant, and 3) large-scale network models are difficult to validate. This dissertation tackles each of these problems. ^ First, this work presents a method for automatically enabling real-time interaction, monitoring, and control of large-scale network models. Network researchers need tools that allow them to focus on creating realistic models and conducting experiments. However, this should not increase the complexity of developing a large-scale network simulator. This work presents a systematic approach to separating the concerns of running large-scale network models on parallel computers and the user facing concerns of configuring and interacting with large-scale network models. ^ Second, this work deals with reducing memory consumption of network models. As network models become larger, so does the amount of memory needed to simulate them. This work presents a comprehensive approach to exploiting structural duplications in network models to dramatically reduce the memory required to execute large-scale network experiments. ^ Lastly, this work addresses the issue of validating large-scale simulations by integrating real protocols and applications into the simulation. With an emulation extension, a network simulator operating in real-time can run together with real-world distributed applications and services. As such, real-time network simulation not only alleviates the burden of developing separate models for applications in simulation, but as real systems are included in the network model, it also increases the confidence level of network simulation. This work presents a scalable and flexible framework to integrate real-world applications with real-time simulation.^
Resumo:
The lack of analytical models that can accurately describe large-scale networked systems makes empirical experimentation indispensable for understanding complex behaviors. Research on network testbeds for testing network protocols and distributed services, including physical, emulated, and federated testbeds, has made steady progress. Although the success of these testbeds is undeniable, they fail to provide: 1) scalability, for handling large-scale networks with hundreds or thousands of hosts and routers organized in different scenarios, 2) flexibility, for testing new protocols or applications in diverse settings, and 3) inter-operability, for combining simulated and real network entities in experiments. This dissertation tackles these issues in three different dimensions. First, we present SVEET, a system that enables inter-operability between real and simulated hosts. In order to increase the scalability of networks under study, SVEET enables time-dilated synchronization between real hosts and the discrete-event simulator. Realistic TCP congestion control algorithms are implemented in the simulator to allow seamless interactions between real and simulated hosts. SVEET is validated via extensive experiments and its capabilities are assessed through case studies involving real applications. Second, we present PrimoGENI, a system that allows a distributed discrete-event simulator, running in real-time, to interact with real network entities in a federated environment. PrimoGENI greatly enhances the flexibility of network experiments, through which a great variety of network conditions can be reproduced to examine what-if questions. Furthermore, PrimoGENI performs resource management functions, on behalf of the user, for instantiating network experiments on shared infrastructures. Finally, to further increase the scalability of network testbeds to handle large-scale high-capacity networks, we present a novel symbiotic simulation approach. We present SymbioSim, a testbed for large-scale network experimentation where a high-performance simulation system closely cooperates with an emulation system in a mutually beneficial way. On the one hand, the simulation system benefits from incorporating the traffic metadata from real applications in the emulation system to reproduce the realistic traffic conditions. On the other hand, the emulation system benefits from receiving the continuous updates from the simulation system to calibrate the traffic between real applications. Specific techniques that support the symbiotic approach include: 1) a model downscaling scheme that can significantly reduce the complexity of the large-scale simulation model, resulting in an efficient emulation system for modulating the high-capacity network traffic between real applications; 2) a queuing network model for the downscaled emulation system to accurately represent the network effects of the simulated traffic; and 3) techniques for reducing the synchronization overhead between the simulation and emulation systems.
Resumo:
As researchers and practitioners move towards a vision of software systems that configure, optimize, protect, and heal themselves, they must also consider the implications of such self-management activities on software reliability. Autonomic computing (AC) describes a new generation of software systems that are characterized by dynamically adaptive self-management features. During dynamic adaptation, autonomic systems modify their own structure and/or behavior in response to environmental changes. Adaptation can result in new system configurations and capabilities, which need to be validated at runtime to prevent costly system failures. However, although the pioneers of AC recognize that validating autonomic systems is critical to the success of the paradigm, the architectural blueprint for AC does not provide a workflow or supporting design models for runtime testing. ^ This dissertation presents a novel approach for seamlessly integrating runtime testing into autonomic software. The approach introduces an implicit self-test feature into autonomic software by tailoring the existing self-management infrastructure to runtime testing. Autonomic self-testing facilitates activities such as test execution, code coverage analysis, timed test performance, and post-test evaluation. In addition, the approach is supported by automated testing tools, and a detailed design methodology. A case study that incorporates self-testing into three autonomic applications is also presented. The findings of the study reveal that autonomic self-testing provides a flexible approach for building safe, reliable autonomic software, while limiting the development and performance overhead through software reuse. ^
Resumo:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built using Petri nets from user requirements, which is formally verified using model checking. Second, Petri nets models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from some instrumented concurrent program execution, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the world wide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real world systems including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
Resumo:
Software engineering researchers are challenged to provide increasingly more powerful levels of abstractions to address the rising complexity inherent in software solutions. One new development paradigm that places models as abstraction at the forefront of the development process is Model-Driven Software Development (MDSD). MDSD considers models as first class artifacts, extending the capability for engineers to use concepts from the problem domain of discourse to specify apropos solutions. A key component in MDSD is domain-specific modeling languages (DSMLs) which are languages with focused expressiveness, targeting a specific taxonomy of problems. The de facto approach used is to first transform DSML models to an intermediate artifact in a HLL e.g., Java or C++, then execute that resulting code.^ Our research group has developed a class of DSMLs, referred to as interpreted DSMLs (i-DSMLs), where models are directly interpreted by a specialized execution engine with semantics based on model changes at runtime. This execution engine uses a layered architecture and is referred to as a domain-specific virtual machine (DSVM). As the domain-specific model being executed descends the layers of the DSVM the semantic gap between the user-defined model and the services being provided by the underlying infrastructure is closed. The focus of this research is the synthesis engine, the layer in the DSVM which transforms i-DSML models into executable scripts for the next lower layer to process.^ The appeal of an i-DSML is constrained as it possesses unique semantics contained within the DSVM. Existing DSVMs for i-DSMLs exhibit tight coupling between the implicit model of execution and the semantics of the domain, making it difficult to develop DSVMs for new i-DSMLs without a significant investment in resources.^ At the onset of this research only one i-DSML had been created for the user- centric communication domain using the aforementioned approach. This i-DSML is the Communication Modeling Language (CML) and its DSVM is the Communication Virtual machine (CVM). A major problem with the CVM's synthesis engine is that the domain-specific knowledge (DSK) and the model of execution (MoE) are tightly interwoven consequently subsequent DSVMs would need to be developed from inception with no reuse of expertise.^ This dissertation investigates how to decouple the DSK from the MoE and subsequently producing a generic model of execution (GMoE) from the remaining application logic. This GMoE can be reused to instantiate synthesis engines for DSVMs in other domains. The generalized approach to developing the model synthesis component of i-DSML interpreters utilizes a reusable framework loosely coupled to DSK as swappable framework extensions.^ This approach involves first creating an i-DSML and its DSVM for a second do- main, demand-side smartgrid, or microgrid energy management, and designing the synthesis engine so that the DSK and MoE are easily decoupled. To validate the utility of the approach, the SEs are instantiated using the GMoE and DSKs of the two aforementioned domains and an empirical study to support our claim of reduced developmental effort is performed.^