932 resultados para distributed computing projects
Resumo:
The multi-relational Data Mining approach has emerged as alternative to the analysis of structured data, such as relational databases. Unlike traditional algorithms, the multi-relational proposals allow mining directly multiple tables, avoiding the costly join operations. In this paper, is presented a comparative study involving the traditional Patricia Mine algorithm and its corresponding multi-relational proposed, MR-Radix in order to evaluate the performance of two approaches for mining association rules are used for relational databases. This study presents two original contributions: the proposition of an algorithm multi-relational MR-Radix, which is efficient for use in relational databases, both in terms of execution time and in relation to memory usage and the presentation of the empirical approach multirelational advantage in performance over several tables, which avoids the costly join operations from multiple tables. © 2011 IEEE.
Resumo:
Multi-relational data mining enables pattern mining from multiple tables. The existing multi-relational mining association rules algorithms are not able to process large volumes of data, because the amount of memory required exceeds the amount available. The proposed algorithm MRRadix presents a framework that promotes the optimization of memory usage. It also uses the concept of partitioning to handle large volumes of data. The original contribution of this proposal is enable a superior performance when compared to other related algorithms and moreover successfully concludes the task of mining association rules in large databases, bypass the problem of available memory. One of the tests showed that the MR-Radix presents fourteen times less memory usage than the GFP-growth. © 2011 IEEE.
Resumo:
Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjust the data for the later stages, especially for the stage of data mining. Such problems occur in the instance level and schema, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases, some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as original contribution an optimization of algorithm for the detection of duplicate tuples in databases through phonetic based on multithreading without the need for trained data, as well as an independent environment of language to be supported for this. © 2011 IEEE.
Resumo:
The development of new technologies that use peer-to-peer networks grows every day, with the object to supply the need of sharing information, resources and services of databases around the world. Among them are the peer-to-peer databases that take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information semantically related but syntactically heterogeneous. However, it is a challenge to ensure the efficient search for information without compromising the autonomy of each node and network flexibility, given the structural characteristics of these networks. On the other hand, some studies propose the use of ontology semantics by assigning standardized categorization of information. The main original contribution of this work is the approach of this problem with a proposal for optimization of queries supported by the Ant Colony algorithm and classification though ontologies. The results show that this strategy enables the semantic support to the searches in peer-to-peer databases, aiming to expand the results without compromising network performance. © 2011 IEEE.
Resumo:
Non-conventional database management systems are used to achieve a better performance when dealing with complex data. One fundamental concept of these systems is object identity (OID), because each object in the database has a unique identifier that is used to access and reference it in relationships to other objects. Two approaches can be used for the implementation of OIDs: physical or logical OIDs. In order to manage complex data, was proposed the Multimedia Data Manager Kernel (NuGeM) that uses a logical technique, named Indirect Mapping. This paper proposes an improvement to the technique used by NuGeM, whose original contribution is management of OIDs with a fewer number of disc accesses and less processing, thus reducing management time from the pages and eliminating the problem with exhaustion of OIDs. Also, the technique presented here can be applied to others OODBMSs. © 2011 IEEE.
Resumo:
The significant volume of work accidents in the cities causes an expressive loss to society. The development of Spatial Data Mining technologies presents a new perspective for the extraction of knowledge from the correlation between conventional and spatial attributes. One of the most important techniques of the Spatial Data Mining is the Spatial Clustering, which clusters similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area, will provide information that can contribute towards the planning of more adequate strategies for the prevention of work accidents. The original contribution of this work is to present an application of tools developed for Spatial Clustering which supply a set of graphic resources that have helped to discover knowledge and support for management in the work accidents area. © 2011 IEEE.
Resumo:
Non-conventional database management systems are used to achieve a better performance when dealing with complex data. One fundamental concept of these systems is object identity (OID). Two techniques can be used for the implementation of OIDs: physical or logical. A logical implementation of OIDs, based on an Indirection Table, is used by NuGeM, a multimedia data manager kernel which is described in this paper. NuGeM Indirection Table allows the relocation of all pages in a database. The proposed strategy modifies the workings of this table so that it is possible to reduce considerably the number of I/O operations during the request and release of pages containing objects and their OIDs. Tests show a reduction of 84% in reading operations and a 67% reduction in writing operations when pages are requested. Although no changes were observed in writing operations during the release of pages, a 100% of reduction in reading operations was obtained. © 2012 IEEE.
Resumo:
The application of Concurrency Theory to Systems Biology is in its earliest stage of progress. The metaphor of cells as computing systems by Regev and Shapiro opened the employment of concurrent languages for the modelling of biological systems. Their peculiar characteristics led to the design of many bio-inspired formalisms which achieve higher faithfulness and specificity. In this thesis we present pi@, an extremely simple and conservative extension of the pi-calculus representing a keystone in this respect, thanks to its expressiveness capabilities. The pi@ calculus is obtained by the addition of polyadic synchronisation and priority to the pi-calculus, in order to achieve compartment semantics and atomicity of complex operations respectively. In its direct application to biological modelling, the stochastic variant of the calculus, Spi@, is shown able to model consistently several phenomena such as formation of molecular complexes, hierarchical subdivision of the system into compartments, inter-compartment reactions, dynamic reorganisation of compartment structure consistent with volume variation. The pivotal role of pi@ is evidenced by its capability of encoding in a compositional way several bio-inspired formalisms, so that it represents the optimal core of a framework for the analysis and implementation of bio-inspired languages. In this respect, the encodings of BioAmbients, Brane Calculi and a variant of P Systems in pi@ are formalised. The conciseness of their translation in pi@ allows their indirect comparison by means of their encodings. Furthermore it provides a ready-to-run implementation of minimal effort whose correctness is granted by the correctness of the respective encoding functions. Further important results of general validity are stated on the expressive power of priority. Several impossibility results are described, which clearly state the superior expressiveness of prioritised languages and the problems arising in the attempt of providing their parallel implementation. To this aim, a new setting in distributed computing (the last man standing problem) is singled out and exploited to prove the impossibility of providing a purely parallel implementation of priority by means of point-to-point or broadcast communication.
Resumo:
Nowadays, computing is migrating from traditional high performance and distributed computing to pervasive and utility computing based on heterogeneous networks and clients. The current trend suggests that future IT services will rely on distributed resources and on fast communication of heterogeneous contents. The success of this new range of services is directly linked to the effectiveness of the infrastructure in delivering them. The communication infrastructure will be the aggregation of different technologies even though the current trend suggests the emergence of single IP based transport service. Optical networking is a key technology to answer the increasing requests for dynamic bandwidth allocation and configure multiple topologies over the same physical layer infrastructure, optical networks today are still “far” from accessible from directly configure and offer network services and need to be enriched with more “user oriented” functionalities. However, current Control Plane architectures only facilitate efficient end-to-end connectivity provisioning and certainly cannot meet future network service requirements, e.g. the coordinated control of resources. The overall objective of this work is to provide the network with the improved usability and accessibility of the services provided by the Optical Network. More precisely, the definition of a service-oriented architecture is the enable technology to allow user applications to gain benefit of advanced services over an underlying dynamic optical layer. The definition of a service oriented networking architecture based on advanced optical network technologies facilitates users and applications access to abstracted levels of information regarding offered advanced network services. This thesis faces the problem to define a Service Oriented Architecture and its relevant building blocks, protocols and languages. In particular, this work has been focused on the use of the SIP protocol as a inter-layers signalling protocol which defines the Session Plane in conjunction with the Network Resource Description language. On the other hand, an advantage optical network must accommodate high data bandwidth with different granularities. Currently, two main technologies are emerging promoting the development of the future optical transport network, Optical Burst and Packet Switching. Both technologies respectively promise to provide all optical burst or packet switching instead of the current circuit switching. However, the electronic domain is still present in the scheduler forwarding and routing decision. Because of the high optics transmission frequency the burst or packet scheduler faces a difficult challenge, consequentially, high performance and time focused design of both memory and forwarding logic is need. This open issue has been faced in this thesis proposing an high efficiently implementation of burst and packet scheduler. The main novelty of the proposed implementation is that the scheduling problem has turned into simple calculation of a min/max function and the function complexity is almost independent of on the traffic conditions.
Resumo:
Synthetic biology has recently had a great development, many papers have been published and many applications have been presented, spanning from the production of biopharmacheuticals to the synthesis of bioenergetic substrates or industrial catalysts. But, despite these advances, most of the applications are quite simple and don’t fully exploit the potential of this discipline. This limitation in complexity has many causes, like the incomplete characterization of some components, or the intrinsic variability of the biological systems, but one of the most important reasons is the incapability of the cell to sustain the additional metabolic burden introduced by a complex circuit. The objective of the project, of which this work is part, is trying to solve this problem through the engineering of a multicellular behaviour in prokaryotic cells. This system will introduce a cooperative behaviour that will allow to implement complex functionalities, that can’t be obtained with a single cell. In particular the goal is to implement the Leader Election, this procedure has been firstly devised in the field of distributed computing, to identify the process that allow to identify a single process as organizer and coordinator of a series of tasks assigned to the whole population. The election of the Leader greatly simplifies the computation providing a centralized control. Further- more this system may even be useful to evolutionary studies that aims to explain how complex organisms evolved from unicellular systems. The work presented here describes, in particular, the design and the experimental characterization of a component of the circuit that solves the Leader Election problem. This module, composed of an hybrid promoter and a gene, is activated in the non-leader cells after receiving the signal that a leader is present in the colony. The most important element, in this case, is the hybrid promoter, it has been realized in different versions, applying the heuristic rules stated in [22], and their activity has been experimentally tested. The objective of the experimental characterization was to test the response of the genetic circuit to the introduction, in the cellular environment, of particular molecules, inducers, that can be considered inputs of the system. The desired behaviour is similar to the one of a logic AND gate in which the exit, represented by the luminous signal produced by a fluorescent protein, is one only in presence of both inducers. The robustness and the stability of this behaviour have been tested by changing the concentration of the input signals and building dose response curves. From these data it is possible to conclude that the analysed constructs have an AND-like behaviour over a wide range of inducers’ concentrations, even if it is possible to identify many differences in the expression profiles of the different constructs. This variability accounts for the fact that the input and the output signals are continuous, and so their binary representation isn’t able to capture the complexity of the behaviour. The module of the circuit that has been considered in this analysis has a fundamental role in the realization of the intercellular communication system that is necessary for the cooperative behaviour to take place. For this reason, the second phase of the characterization has been focused on the analysis of the signal transmission. In particular, the interaction between this element and the one that is responsible for emitting the chemical signal has been tested. The desired behaviour is still similar to a logic AND, since, even in this case, the exit signal is determined by the hybrid promoter activity. The experimental results have demonstrated that the systems behave correctly, even if there is still a substantial variability between them. The dose response curves highlighted that stricter constrains on the inducers concentrations need to be imposed in order to obtain a clear separation between the two levels of expression. In the conclusive chapter the DNA sequences of the hybrid promoters are analysed, trying to identify the regulatory elements that are most important for the determination of the gene expression. Given the available data it wasn’t possible to draw definitive conclusions. In the end, few considerations on promoter engineering and complex circuits realization are presented. This section aims to briefly recall some of the problems outlined in the introduction and provide a few possible solutions.
Resumo:
In addition to multi-national Grid infrastructures, several countries operate their own national Grid infrastructures to support science and industry within national borders. These infrastructures have the benefit of better satisfying the needs of local, regional and national user communities. Although Switzerland has strong research groups in several fields of distributed computing, only recently a national Grid effort was kick-started to integrate a truly heterogeneous set of resource providers, middleware pools, and users. In the following. article we discuss our efforts to start Grid activities at a national scale to combine several scientific communities and geographical domains. We make a strong case for the need of standards that have to be built on top of existing software systems in order to provide support for a heterogeneous Grid infrastruc
Resumo:
We show a method for parallelizing top down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the máximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
Resumo:
Complexity has always been one of the most important issues in distributed computing. From the first clusters to grid and now cloud computing, dealing correctly and efficiently with system complexity is the key to taking technology a step further. In this sense, global behavior modeling is an innovative methodology aimed at understanding the grid behavior. The main objective of this methodology is to synthesize the grid's vast, heterogeneous nature into a simple but powerful behavior model, represented in the form of a single, abstract entity, with a global state. Global behavior modeling has proved to be very useful in effectively managing grid complexity but, in many cases, deeper knowledge is needed. It generates a descriptive model that could be greatly improved if extended not only to explain behavior, but also to predict it. In this paper we present a prediction methodology whose objective is to define the techniques needed to create global behavior prediction models for grid systems. This global behavior prediction can benefit grid management, specially in areas such as fault tolerance or job scheduling. The paper presents experimental results obtained in real scenarios in order to validate this approach.
Resumo:
Abstract machines provide a certain separation between platformdependent and platform-independent concerns in compilation. Many of the differences between architectures are encapsulated in the speciflc abstract machine implementation and the bytecode is left largely architecture independent. Taking advantage of this fact, we present a framework for estimating upper and lower bounds on the execution times of logic programs running on a bytecode-based abstract machine. Our approach includes a one-time, programindependent proflling stage which calculates constants or functions bounding the execution time of each abstract machine instruction. Then, a compile-time cost estimation phase, using the instruction timing information, infers expressions giving platform-dependent upper and lower bounds on actual execution time as functions of input data sizes for each program. Working at the abstract machine level makes it possible to take into account low-level issues in new architectures and platforms by just reexecuting the calibration stage instead of having to tailor the analysis for each architecture and platform. Applications of such predicted execution times include debugging/veriflcation of time properties, certiflcation of time properties in mobile code, granularity control in parallel/distributed computing, and resource-oriented specialization.