14 results for Parallel and distributed systems
in Digital Commons at Florida International University
Abstract:
Many systems and applications continuously produce events. These events record the status of a system and trace its behavior. By examining them, system administrators can check for potential problems. If the temporal dynamics of the systems are investigated further, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict future system behavior or to mitigate potential risks. Moreover, system administrators can use the temporal patterns to set up event management rules that make the system more intelligent. With the popularity of data mining techniques in recent years, these events have gradually become more and more useful. Despite recent advances in data mining, its application to system event mining is still at a rudimentary stage. Most work still focuses on episode mining or frequent pattern discovery. These methods cannot provide a brief yet comprehensible summary that reveals the valuable information from a high-level perspective. Moreover, they provide little actionable knowledge to help system administrators better manage their systems. To make better use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered helpful for system management: (1) provide concise yet comprehensive summaries of the running status of the systems; (2) make the systems more intelligent and autonomous; (3) effectively detect abnormal system behavior. Because the event logs are so rich, all three directions can be addressed in a data-driven manner. In this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. 
This dissertation focuses on the foregoing directions, which leverage temporal mining techniques to facilitate system management. More specifically, three concrete topics are discussed: event summarization, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, an experimental evaluation is presented to demonstrate the effectiveness and efficacy of the corresponding solutions.
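To make the idea of streaming anomaly detection concrete, the following is a minimal sketch, not the dissertation's method, which the abstract does not describe. It assumes the event stream has already been reduced to one numeric value per time window (for example, an event count) and flags a value that lies far from the running mean. The class name `StreamingZScore`, the `threshold` and `warmup` parameters, and the helper `flag_anomalies` are all hypothetical.

```python
import math


class StreamingZScore:
    """Running mean and variance (Welford's algorithm) with a z-score alarm."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # hypothetical alarm level, in standard deviations
        self.warmup = warmup        # observations required before any alarm is raised
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the running mean

    def update(self, x):
        """Score x against the history seen so far, then fold x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update: one pass, constant memory, no stored history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


def flag_anomalies(values, threshold=3.0, warmup=5):
    """Return the indices of the values that trigger the alarm."""
    detector = StreamingZScore(threshold, warmup)
    return [i for i, v in enumerate(values) if detector.update(v)]
```

For instance, `flag_anomalies([10, 11, 9, 10, 12, 10, 11, 95, 10, 9])` flags only the window holding 95. Note that this sketch folds flagged values into the running statistics as well; a production detector would likely treat them separately.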
Abstract:
Electrical energy is an essential resource for the modern world. Unfortunately, its price has almost doubled in the last decade. Furthermore, energy production is also currently one of the primary sources of pollution. These concerns are becoming more important in data-centers. As more computational power is required to serve hundreds of millions of users, bigger data-centers are becoming necessary. This results in higher electrical energy consumption. Of all the energy used in data-centers, including power distribution units, lights, and cooling, computer hardware consumes as much as 80%. Consequently, there is opportunity to make data-centers more energy efficient by designing systems with lower energy footprint. Consuming less energy is critical not only in data-centers. It is also important in mobile devices where battery-based energy is a scarce resource. Reducing the energy consumption of these devices will allow them to last longer and re-charge less frequently. Saving energy in computer systems is a challenging problem. Improving a system's energy efficiency usually comes at the cost of compromises in other areas such as performance or reliability. In the case of secondary storage, for example, spinning-down the disks to save energy can incur high latencies if they are accessed while in this state. The challenge is to be able to increase the energy efficiency while keeping the system as reliable and responsive as before. This thesis tackles the problem of improving energy efficiency in existing systems while reducing the impact on performance. 
First, we propose a new technique to achieve fine-grained energy proportionality in multi-disk systems. Second, we design and implement an energy-efficient cache system using flash memory that increases disk idleness to save energy. Finally, we identify and explore solutions for the page fetch-before-update problem in caching systems that can (a) better control I/O traffic to secondary storage and (b) provide critical performance improvements for energy-efficient systems.
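The trade-off behind spinning down disks can be illustrated with a standard break-even calculation. This is a sketch under simplified assumptions rather than anything taken from the thesis: the disk draws constant power when idle and when in standby, and one spin-down plus spin-up cycle costs a fixed amount of energy and time. The function names and all numeric values in the usage note are hypothetical.

```python
def break_even_seconds(p_idle_w, p_standby_w, transition_j, transition_s):
    """Shortest idle period for which spinning down uses no more energy than idling.

    p_idle_w     -- power drawn while spinning but idle (watts)
    p_standby_w  -- power drawn while spun down (watts)
    transition_j -- energy for one spin-down plus spin-up cycle (joules)
    transition_s -- time taken by that cycle (seconds)
    """
    if p_idle_w <= p_standby_w:
        raise ValueError("standby must draw less power than idle")
    # Solve p_idle*T = transition_j + p_standby*(T - transition_s) for T.
    t = (transition_j - p_standby_w * transition_s) / (p_idle_w - p_standby_w)
    # An idle period cannot be shorter than the transition itself.
    return max(transition_s, t)


def energy_saved_joules(idle_s, p_idle_w, p_standby_w, transition_j, transition_s):
    """Energy saved by spinning down for one idle period (negative means a loss)."""
    stay_up = p_idle_w * idle_s
    spin_down = transition_j + p_standby_w * max(0.0, idle_s - transition_s)
    return stay_up - spin_down
```

With the hypothetical values 8 W idle, 1 W standby, and a 150 J, 10 s transition, the break-even idle period is 20 s: a 60 s idle period saves 280 J, while a 12 s idle period loses 56 J. This is why techniques that lengthen disk idle periods, such as the flash cache mentioned above, make spin-down worthwhile more often.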
Abstract:
Petri nets are a formal, graphical and executable modeling technique for the specification and analysis of concurrent and distributed systems, and have been widely applied in computer science and many other engineering disciplines. Low-level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High-level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low-level Petri nets, HLPNs result in compact system models that are easier to understand. Therefore, HLPNs are more useful in modeling complex systems. There are two issues in using HLPNs: modeling and analysis. Modeling concerns abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways to study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied and integrated into a framework that is supported by a tool. For modeling, this framework integrates two formal languages: a type of HLPN called Predicate Transition Net (PrT Net) is used to model a system's behavior, and first-order linear time temporal logic (FOLTL) is used to specify the system's properties. The main contribution of this dissertation with regard to modeling is the development of a software tool to support the formal modeling capabilities in this framework. For analysis, this framework combines three complementary techniques: simulation, explicit state model checking and bounded model checking (BMC). Simulation is a straightforward and speedy method, but it covers only some execution paths in an HLPN model. Explicit state model checking covers all the execution paths but suffers from the state explosion problem. 
BMC is a tradeoff, as it provides a certain level of coverage while being more efficient than explicit state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework. SAMTools, developed for this framework in this dissertation, integrates three tools: PIPE+ for HLPN behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.
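The coverage trade-off between exhaustive and bounded analysis can be sketched on a small low-level Petri net. The code below is illustrative only and is unrelated to PIPE+ or PIPE+Verifier: it explores markings explicitly, and the optional `max_depth` merely limits how many firings are followed. Real BMC instead encodes the unrolled transition relation as a SAT or SMT formula. The mutual-exclusion net and all names are hypothetical.

```python
from collections import deque


def enabled(marking, consume):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n for place, n in consume.items())


def fire(marking, consume, produce):
    """Return the marking obtained by firing an enabled transition."""
    new = dict(marking)
    for place, n in consume.items():
        new[place] = new.get(place, 0) - n
    for place, n in produce.items():
        new[place] = new.get(place, 0) + n
    return new


def freeze(marking):
    """Hashable form of a marking; places holding zero tokens are dropped."""
    return frozenset((p, n) for p, n in marking.items() if n > 0)


def reachable(initial, transitions, max_depth=None):
    """Breadth-first exploration of markings.

    max_depth=None explores exhaustively (this terminates only for bounded nets);
    an integer limits the number of firings followed from the initial marking.
    """
    seen = {freeze(initial)}
    queue = deque([(initial, 0)])
    while queue:
        marking, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for _name, consume, produce in transitions:
            if enabled(marking, consume):
                nxt = fire(marking, consume, produce)
                key = freeze(nxt)
                if key not in seen:
                    seen.add(key)
                    queue.append((nxt, depth + 1))
    return seen


# Hypothetical example: two processes sharing one lock.
MUTEX_INITIAL = {"idle1": 1, "idle2": 1, "lock": 1}
MUTEX_TRANSITIONS = [
    ("enter1", {"idle1": 1, "lock": 1}, {"crit1": 1}),
    ("exit1", {"crit1": 1}, {"idle1": 1, "lock": 1}),
    ("enter2", {"idle2": 1, "lock": 1}, {"crit2": 1}),
    ("exit2", {"crit2": 1}, {"idle2": 1, "lock": 1}),
]
```

A safety property such as mutual exclusion can then be checked by inspecting every marking returned by `reachable(MUTEX_INITIAL, MUTEX_TRANSITIONS)` and confirming that none holds tokens in both `crit1` and `crit2`. Passing a small `max_depth` checks fewer markings at lower cost, which is the sense in which a bounded analysis trades coverage for efficiency.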
Abstract:
The performance of building envelopes and roofing systems significantly depends on accurate knowledge of wind loads and the response of envelope components under realistic wind conditions. Wind tunnel testing is a well-established practice to determine wind loads on structures. For small structures much larger model scales are needed than for large structures, to maintain modeling accuracy and minimize Reynolds number effects. In these circumstances the ability to obtain a large enough turbulence integral scale is usually compromised by the limited dimensions of the wind tunnel meaning that it is not possible to simulate the low frequency end of the turbulence spectrum. Such flows are called flows with Partial Turbulence Simulation. In this dissertation, the test procedure and scaling requirements for tests in partial turbulence simulation are discussed. A theoretical method is proposed for including the effects of low-frequency turbulences in the post-test analysis. In this theory the turbulence spectrum is divided into two distinct statistical processes, one at high frequencies which can be simulated in the wind tunnel, and one at low frequencies which can be treated in a quasi-steady manner. The joint probability of load resulting from the two processes is derived from which full-scale equivalent peak pressure coefficients can be obtained. The efficacy of the method is proved by comparing predicted data derived from tests on large-scale models of the Silsoe Cube and Texas-Tech University buildings in Wall of Wind facility at Florida International University with the available full-scale data. For multi-layer building envelopes such as rain-screen walls, roof pavers, and vented energy efficient walls not only peak wind loads but also their spatial gradients are important. Wind permeable roof claddings like roof pavers are not well dealt with in many existing building codes and standards. 
Large-scale experiments were carried out to investigate the wind loading on concrete pavers including wind blow-off tests and pressure measurements. Simplified guidelines were developed for design of loose-laid roof pavers against wind uplift. The guidelines are formatted so that use can be made of the existing information in codes and standards such as ASCE 7-10 on pressure coefficients on components and cladding.
Abstract:
I proposed the study of two distinct aspects of Ten-Eleven Translocation 2 (TET2) protein for understanding specific functions in different body systems. In Part I, I characterized the molecular mechanisms of Tet2 in the hematological system. As the second member of Ten-Eleven Translocation protein family, TET2 is frequently mutated in leukemic patients. Previous studies have shown that the TET2 mutations frequently occur in 20% myelodysplastic syndrome/myeloproliferative neoplasm (MDS/MPN), 10% T-cell lymphoma leukemia and 2% B-cell lymphoma leukemia. Genetic mouse models also display distinct phenotypes of various types of hematological malignancies. I performed 5-hydroxymethylcytosine (5hmC) chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq) of hematopoietic stem/progenitor cells to determine whether the deletion of Tet2 can affect the abundance of 5hmC at myeloid, T-cell and B-cell specific gene transcription start sites, which ultimately result in various hematological malignancies. Subsequent Exome sequencing (Exome-Seq) showed that disease-specific genes are mutated in different types of tumors, which suggests that TET2 may protect the genome from being mutated. The direct interaction between TET2 and Mutator S Homolog 6 (MSH6) protein suggests TET2 is involved in DNA mismatch repair. Finally, in vivo mismatch repair studies show that the loss of Tet2 causes a mutator phenotype. Taken together, my data indicate that TET2 binds to MSH6 to protect genome integrity. In Part II, I intended to better understand the role of Tet2 in the nervous system. 5-hydroxymethylcytosine regulates epigenetic modification during neurodevelopment and aging. Thus, Tet2 may play a critical role in regulating adult neurogenesis. To examine the physiological significance of Tet2 in the nervous system, I first showed that the deletion of Tet2 reduces the 5hmC levels in neural stem cells. 
Mice lacking Tet2 show abnormal hippocampal neurogenesis along with 5hmC alterations at different gene promoters and corresponding downregulation of gene expression. In the luciferase reporter assay, two neural factors, Neurogenic differentiation 1 (NeuroD1) and Glial fibrillary acidic protein (Gfap), were down-regulated in Tet2 knockout cells. My results suggest that Tet2 regulates neural stem/progenitor cell proliferation and differentiation in the adult brain.
Abstract:
As massive data sets become increasingly available, people face the problem of how to process and understand these data effectively. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, because of both the large size of the data sets and their high dimensionality. This dissertation, in line with other MapReduce-based research, develops effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data-parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up in MapReduce to factorize matrices with dimensions on the order of millions, based on different matrix multiplication implementations. Two algorithms, which compute the CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce by carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
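Since the factorization work is described as resting on matrix multiplication in MapReduce, a small in-memory sketch of the textbook one-pass scheme may help. It is not necessarily one of the implementations used in the dissertation, which the abstract does not detail, and it simulates the map, shuffle and reduce phases in plain Python rather than on a cluster. Matrices are given as sparse `(row, column, value)` entries; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(a_entries, b_entries, m, p):
    """Emit ((i, k), (tag, j, value)) so each output cell receives the inputs it needs.

    a_entries -- (i, j, value) entries of A, which has m rows
    b_entries -- (j, k, value) entries of B, which has p columns
    """
    for i, j, v in a_entries:
        for k in range(p):
            yield (i, k), ("A", j, v)
    for j, k, v in b_entries:
        for i in range(m):
            yield (i, k), ("B", j, v)


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join the A and B values on the shared index j and sum the products."""
    a = {j: v for tag, j, v in values if tag == "A"}
    b = {j: v for tag, j, v in values if tag == "B"}
    return sum(a[j] * b[j] for j in a.keys() & b.keys())


def mapreduce_matmul(a_entries, b_entries, m, p):
    """Compute C = A x B as a dict mapping (i, k) to the cell value."""
    groups = shuffle(map_phase(a_entries, b_entries, m, p))
    return {key: reduce_phase(vals) for key, vals in groups.items()}
```

This scheme replicates every entry of A once per column of B and every entry of B once per row of A, so its communication cost grows quickly. That cost is one reason block-partitioned variants, and the careful data partitioning the abstract mentions, matter at the scale of millions of rows.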
Abstract:
Parallel processing is prevalent in many manufacturing and service systems. Many manufactured products are built and assembled from several components fabricated in parallel lines. An example of this manufacturing system configuration is observed at a manufacturing facility equipped to assemble and test web servers. A typical web server assembly line is characterized by multiple products, job circulation, and parallel processing. The primary objective of this research was to develop analytical approximations to predict performance measures of manufacturing systems with job failures and parallel processing. The analytical formulations extend previous queueing models used in assembly manufacturing systems in that they can handle serial configurations and different configurations of parallel processing with multiple product classes, as well as job circulation due to random part failures. In addition, correction terms obtained via regression analysis were added to the approximations to minimize the gap between the analytical approximations and the simulation models. Markovian and general-type manufacturing systems with multiple product classes, job circulation due to failures, and fork-join systems to model parallel processing were studied. In both the Markovian and general cases, the approximations without correction terms performed quite well for one- and two-product problem instances. However, the flow time error increased as the number of products and the net traffic intensity increased. Therefore, correction terms for single and fork-join stations were developed via regression analysis to deal with more than two products. The numerical comparisons showed that the approximations perform remarkably well when the correction factors are used. Overall, the average flow time error was reduced from 38.19% to 5.59% in the Markovian case, and from 26.39% to 7.23% in the general case. 
All the equations stated in the analytical formulations were implemented as a set of Matlab scripts. Using this set, operations managers of web server assembly lines, or of manufacturing and other service systems with similar characteristics, can estimate different system performance measures and make judicious decisions, especially in setting delivery due dates, capacity planning, and bottleneck mitigation.
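As a minimal illustration of how job circulation due to failures enters a queueing formula, consider the simplest Markovian building block: a single M/M/1 station where a job fails with a fixed probability after service and rejoins the queue. This sketch is in Python, not the Matlab scripts mentioned above, and it is far simpler than the multi-class, fork-join approximations of the dissertation. The function name and parameters are hypothetical.

```python
def mm1_flow_time_with_rework(arrival_rate, service_rate, fail_prob):
    """Mean flow time of a job at one M/M/1 station with Bernoulli rework.

    arrival_rate -- external job arrivals per unit time
    service_rate -- service completions per unit time
    fail_prob    -- probability that a served job fails and rejoins the queue
    """
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    if not 0 <= fail_prob < 1:
        raise ValueError("fail_prob must lie in [0, 1)")
    # Each job visits the station 1 / (1 - fail_prob) times on average.
    effective_rate = arrival_rate / (1 - fail_prob)
    rho = effective_rate / service_rate  # traffic intensity including rework
    if rho >= 1:
        raise ValueError("station is unstable: traffic intensity >= 1")
    jobs_in_system = rho / (1 - rho)
    # Little's law with the external rate gives time from first arrival to departure.
    return jobs_in_system / arrival_rate
```

For example, with 2 arrivals and 5 service completions per unit time, a failure probability of 0.2 raises the mean flow time from 1/3 to 1/2 time units, which matches the closed form 1 / ((1 - fail_prob) * service_rate - arrival_rate).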
Abstract:
The future power grid will effectively utilize renewable energy resources and distributed generation to respond to energy demand while incorporating information technology and communication infrastructure for their optimum operation. This dissertation contributes to the development of real-time techniques for wide-area monitoring and for secure real-time control and operation of hybrid power systems. To handle the increased level of real-time data exchange, this dissertation develops a supervisory control and data acquisition (SCADA) system equipped with a state estimation scheme that works from the real-time data. This system is verified on a specially developed laboratory-based test bed facility, a hardware and software platform that emulates actual scenarios of a real hybrid power system with the highest level of similarity and capability relative to practical utility systems. It includes phasor measurements at hundreds of measurement points on the system. These measurements were obtained from a specially developed laboratory-based Phasor Measurement Unit (PMU) that is used in addition to existing commercial PMUs. The developed PMU was used in conjunction with the interconnected system along with the commercial PMUs. The studies tested included a new technique for detecting partially islanded microgrids, in addition to several real-time techniques for synchronization and parameter identification of hybrid systems. Moreover, because of the extensive integration of renewable energy resources through DC microgrids, this dissertation works through several practical cases for improving the interoperability of such systems. 
In addition, the increased number of small and dispersed generating stations, and their need to connect quickly and properly to AC grids, motivated this work to explore the challenges that arise in synchronizing generators to the grid, and to introduce a Dynamic Brake system that improves the process of connecting distributed generators to the power grid. Real-time operation and control requires data communication security. A research effort in this dissertation developed a Trusted Sensing Base (TSB) process for data communication security. The innovative TSB approach improves the security of the power grid as a cyber-physical system. It is based on available GPS synchronization technology and provides protection against confidentiality attacks in critical power system infrastructures.
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, different database systems have been developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High Performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) 
resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Abstract:
A methodology for formally modeling and analyzing the software architecture of mobile agent systems provides a solid basis for developing high-quality mobile agent systems, and the methodology is helpful for studying other distributed and concurrent systems as well. However, providing such a methodology is a challenge because of agent mobility in mobile agent systems. The methodology was defined from two essential parts of software architecture: a formalism to define the architectural models and an analysis method to formally verify system properties. The formalism is two-layer Predicate/Transition (PrT) nets extended with dynamic channels, and the analysis method is a hierarchical approach to verifying models on different levels. The two-layer modeling formalism smoothly transforms physical models of mobile agent systems into their architectural models. Dynamic channels facilitate synchronous communication between nets, and they naturally capture the dynamic architecture configuration and agent mobility of mobile agent systems. Component properties are verified on transformed individual components, system properties are checked in a simplified system model, and interaction properties are analyzed on models composed from the nets involved. Based on the formalism and the analysis method, this researcher formally modeled and analyzed a software architecture of mobile agent systems, and designed an architectural model of a medical information processing system based on mobile agents. The model checking tool SPIN was used to verify system properties such as reachability, concurrency and safety of the medical information processing system. From the successful modeling and analysis of the software architecture of mobile agent systems, the conclusion is that PrT nets extended with channels are a powerful tool for modeling mobile agent systems, and the hierarchical analysis method provides a rigorous foundation for the modeling tool. 
The hierarchical analysis method not only reduces the complexity of the analysis, but also expands the application scope of model checking techniques. The results of formally modeling and analyzing the software architecture of the medical information processing system show that model checking is an effective and efficient way to verify software architecture. Moreover, this system demonstrates the high flexibility, efficiency and low cost of mobile agent technologies.