8 resultados para Parallel or distributed processing
em Digital Commons at Florida International University
Resumo:
This research is motivated by a practical application observed at a printed circuit board (PCB) manufacturing facility. After assembly, the PCBs (or jobs) are tested in environmental stress screening (ESS) chambers (or batch processing machines) to detect early failures. Several PCBs can be simultaneously tested as long as the total size of all the PCBs in the batch does not violate the chamber capacity. PCBs from different production lines arrive dynamically to a queue in front of a set of identical ESS chambers, where they are grouped into batches for testing. Each line delivers PCBs that vary in size and require different testing (or processing) times. Once a batch is formed, its processing time is the longest processing time among the PCBs in the batch, and its ready time is given by the PCB arriving last to the batch. ESS chambers are expensive and a bottleneck. Consequently, its makespan has to be minimized. ^ A mixed-integer formulation is proposed for the problem under study and compared to a formulation recently published. The proposed formulation is better in terms of the number of decision variables, linear constraints and run time. A procedure to compute the lower bound is proposed. For sparse problems (i.e. when job ready times are dispersed widely), the lower bounds are close to optimum. ^ The problem under study is NP-hard. Consequently, five heuristics, two metaheuristics (i.e. simulated annealing (SA) and greedy randomized adaptive search procedure (GRASP)), and a decomposition approach (i.e. column generation) are proposed—especially to solve problem instances which require prohibitively long run times when a commercial solver is used. Extensive experimental study was conducted to evaluate the different solution approaches based on the solution quality and run time. ^ The decomposition approach improved the lower bounds (or linear relaxation solution) of the mixed-integer formulation. At least one of the proposed heuristic outperforms the Modified Delay heuristic from the literature. For sparse problems, almost all the heuristics report a solution close to optimum. GRASP outperforms SA at a higher computational cost. The proposed approaches are viable to implement as the run time is very short. ^
Resumo:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Resumo:
3D geographic information system (GIS) is data and computation intensive in nature. Internet users are usually equipped with low-end personal computers and network connections of limited bandwidth. Data reduction and performance optimization techniques are of critical importance in quality of service (QoS) management for online 3D GIS. In this research, QoS management issues regarding distributed 3D GIS presentation were studied to develop 3D TerraFly, an interactive 3D GIS that supports high quality online terrain visualization and navigation. ^ To tackle the QoS management challenges, multi-resolution rendering model, adaptive level of detail (LOD) control and mesh simplification algorithms were proposed to effectively reduce the terrain model complexity. The rendering model is adaptively decomposed into sub-regions of up-to-three detail levels according to viewing distance and other dynamic quality measurements. The mesh simplification algorithm was designed as a hybrid algorithm that combines edge straightening and quad-tree compression to reduce the mesh complexity by removing geometrically redundant vertices. The main advantage of this mesh simplification algorithm is that grid mesh can be directly processed in parallel without triangulation overhead. Algorithms facilitating remote accessing and distributed processing of volumetric GIS data, such as data replication, directory service, request scheduling, predictive data retrieving and caching were also proposed. ^ A prototype of the proposed 3D TerraFly implemented in this research demonstrates the effectiveness of our proposed QoS management framework in handling interactive online 3D GIS. The system implementation details and future directions of this research are also addressed in this thesis. ^
Resumo:
Parallel processing is prevalent in many manufacturing and service systems. Many manufactured products are built and assembled from several components fabricated in parallel lines. An example of this manufacturing system configuration is observed at a manufacturing facility equipped to assemble and test web servers. Characteristics of a typical web server assembly line are: multiple products, job circulation, and paralleling processing. The primary objective of this research was to develop analytical approximations to predict performance measures of manufacturing systems with job failures and parallel processing. The analytical formulations extend previous queueing models used in assembly manufacturing systems in that they can handle serial and different configurations of paralleling processing with multiple product classes, and job circulation due to random part failures. In addition, appropriate correction terms via regression analysis were added to the approximations in order to minimize the gap in the error between the analytical approximation and the simulation models. Markovian and general type manufacturing systems, with multiple product classes, job circulation due to failures, and fork and join systems to model parallel processing were studied. In the Markovian and general case, the approximations without correction terms performed quite well for one and two product problem instances. However, it was observed that the flow time error increased as the number of products and net traffic intensity increased. Therefore, correction terms for single and fork-join stations were developed via regression analysis to deal with more than two products. The numerical comparisons showed that the approximations perform remarkably well when the corrections factors were used in the approximations. In general, the average flow time error was reduced from 38.19% to 5.59% in the Markovian case, and from 26.39% to 7.23% in the general case. All the equations stated in the analytical formulations were implemented as a set of Matlab scripts. By using this set, operations managers of web server assembly lines, manufacturing or other service systems with similar characteristics can estimate different system performance measures, and make judicious decisions - especially setting delivery due dates, capacity planning, and bottleneck mitigation, among others.
Resumo:
Parallel processing is prevalent in many manufacturing and service systems. Many manufactured products are built and assembled from several components fabricated in parallel lines. An example of this manufacturing system configuration is observed at a manufacturing facility equipped to assemble and test web servers. Characteristics of a typical web server assembly line are: multiple products, job circulation, and paralleling processing. The primary objective of this research was to develop analytical approximations to predict performance measures of manufacturing systems with job failures and parallel processing. The analytical formulations extend previous queueing models used in assembly manufacturing systems in that they can handle serial and different configurations of paralleling processing with multiple product classes, and job circulation due to random part failures. In addition, appropriate correction terms via regression analysis were added to the approximations in order to minimize the gap in the error between the analytical approximation and the simulation models. Markovian and general type manufacturing systems, with multiple product classes, job circulation due to failures, and fork and join systems to model parallel processing were studied. In the Markovian and general case, the approximations without correction terms performed quite well for one and two product problem instances. However, it was observed that the flow time error increased as the number of products and net traffic intensity increased. Therefore, correction terms for single and fork-join stations were developed via regression analysis to deal with more than two products. The numerical comparisons showed that the approximations perform remarkably well when the corrections factors were used in the approximations. In general, the average flow time error was reduced from 38.19% to 5.59% in the Markovian case, and from 26.39% to 7.23% in the general case. All the equations stated in the analytical formulations were implemented as a set of Matlab scripts. By using this set, operations managers of web server assembly lines, manufacturing or other service systems with similar characteristics can estimate different system performance measures, and make judicious decisions - especially setting delivery due dates, capacity planning, and bottleneck mitigation, among others.
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^
Resumo:
This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore data partitioning and replication management problems need to be addresses in development of an efficient and scalable solution. ^ Part of the research is to study the patterns of geographical raster data processing and to propose the algorithms to improve availability of such data. These algorithms and approaches are targeting granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service(QoS) considering distributed data delivery and processing. To achieve this goal a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images upon demand and generates mosaicked tiles only for the required region according to user's requirements such as resolution, temporal range, and target bands to reduce redundancy in storage and to utilize available computing and storage resources more efficiently. ^ Another part of the research pursued methods for efficient acquiring of GIS data from external heterogeneous databases and Web services as well as end-user GIS data delivery enhancements, automation and 3D virtual reality presentation. ^ There are vast numbers of computing, network, and storage resources idling or not fully utilized available on the Internet. Proposed "Crawling Distributed Operating System "(CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in GIS database context. ^ The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in creation of TerraFly GIS database that is used by US government, researchers, and general public to facilitate Web access to remotely-sensed imagery and GIS vector information. ^
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.