922 results for Data replication processes


Relevance:

100.00%

Publisher:

Abstract:

Currently, many museums, botanic gardens and herbaria keep data on biological collections, and researchers use computational tools to digitalize these data and provide access to them through data portals. The replication of databases in portals can be accomplished through the use of protocols and data schemas. However, implementing this solution demands a large amount of time, both for transferring fragments of data and for processing data within the portal. With the growth of data digitalization in institutions, this scenario tends to worsen, making it hard to keep the records on the portals up to date. As an original contribution, this research proposes analysing the data replication process to evaluate the performance of portals. The Inter-American Biodiversity Information Network (IABIN) biodiversity data portal of pollinators was used as a case study; it supports both situations: conventional data replication of records of specimen occurrences and interactions between them. With the results of this research, it is possible to simulate a situation before its implementation, thus predicting the performance of replication operations. Additionally, these results may contribute to future improvements to this process, in order to decrease the time required to make the data available in portals. © Rinton Press.

Relevance:

100.00%

Publisher:

Abstract:

Agrobacterium tumefaciens, a bacterial plant pathogen, when transformed with plasmid constructs containing greater than unit length DNA of tomato leaf curl geminivirus accumulates viral replicative form DNAs indistinguishable from those produced in infected plants. The accumulation of the viral DNA species depends on the presence of two origins of replication in the DNA constructs and is drastically reduced by introducing mutations into the viral replication-associated protein (Rep or C1) ORF, indicating that an active viral replication process is occurring in the bacterial cell. The accumulation of these viral DNA species is not affected by mutations or deletions in the other viral open reading frames. The observation that geminivirus DNA replication functions are supported by the bacterial cellular machinery provides evidence for the theory that these circular single-stranded DNA viruses have evolved from prokaryotic episomal replicons.

Relevance:

100.00%

Publisher:

Abstract:

This thesis proposes a set of adaptive broadcast solutions and an adaptive data replication solution to support the deployment of P2P applications. P2P applications are an emerging type of distributed application running on top of P2P networks. Typical P2P applications are video streaming, file sharing, etc. While interesting because they are fully distributed, P2P applications suffer from several deployment problems, due to the nature of the environment in which they run. Indeed, defining an application on top of a P2P network often means defining an application where peers contribute resources in exchange for their ability to use the P2P application. For example, in a P2P file sharing application, while the user is downloading some file, the P2P application is in parallel serving that file to other users. Such peers could have limited hardware resources, e.g., CPU, bandwidth and memory, or the end-user could decide a priori to limit the resources dedicated to the P2P application. In addition, a P2P network is typically immersed in an unreliable environment, where communication links and processes are subject to message losses and crashes, respectively. To support P2P applications, this thesis proposes a set of services that address some underlying constraints related to the nature of P2P networks. The proposed services include a set of adaptive broadcast solutions and an adaptive data replication solution that can be used as the basis of several P2P applications. Our data replication solution increases availability and reduces communication overhead. The broadcast solutions aim at providing a communication substrate encapsulating one of the key communication paradigms used by P2P applications: broadcast. Our broadcast solutions typically aim at offering reliability and scalability to some upper layer, be it an end-to-end P2P application or another system-level layer, such as a data replication layer.
Our contributions are organized in a protocol stack made of three layers. In each layer, we propose a set of adaptive protocols that address specific constraints imposed by the environment. Each protocol is evaluated through a set of simulations. The adaptiveness of our solutions relies on the fact that they take into account the constraints of the underlying system in a proactive manner. To model these constraints, we define an environment approximation algorithm that gives us an approximated view of the system or part of it. This approximated view includes the topology and the components' reliability expressed in probabilistic terms. To adapt to the underlying system constraints, the proposed broadcast solutions route messages through tree overlays that maximize the broadcast reliability. Here, the broadcast reliability is expressed as a function of the selected paths' reliability and of the use of available resources. These resources are modeled in terms of message quotas reflecting the receiving and sending capacities at each node. To allow deployment in a large-scale system, we take into account the memory available at processes by limiting the view they have to maintain about the system. Using this partial view, we propose three scalable broadcast algorithms, which are based on a propagation overlay that tends to the global tree overlay and adapts to some constraints of the underlying system. At a higher level, this thesis also proposes a data replication solution that is adaptive both in terms of replica placement and in terms of request routing. At the routing level, this solution takes the unreliability of the environment into account, in order to maximize reliable delivery of requests. At the replica placement level, the dynamically changing origin and frequency of read/write requests are analyzed, in order to define a set of replicas that minimizes communication cost.
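
The idea of routing a broadcast through a tree overlay built from probabilistic link reliabilities can be illustrated with a minimal sketch. It is not the thesis's algorithm: it ignores the message quotas and partial views described above, and the function name `most_reliable_tree` and the input format are assumptions made for illustration. It relies only on the standard observation that maximizing a product of delivery probabilities is equivalent to minimizing the sum of their negative logarithms, so Dijkstra's algorithm can be applied.

```python
import heapq
import math


def most_reliable_tree(links, source):
    """Return (parent, reliability) maps for a tree rooted at `source`.

    `links` maps an undirected link (u, v) to the assumed probability that a
    message sent over it is delivered. This input format is hypothetical.
    """
    # Build an adjacency list; every link can be used in both directions.
    adj = {}
    for (u, v), p in links.items():
        adj.setdefault(u, []).append((v, p))
        adj.setdefault(v, []).append((u, p))

    # Path reliability is a product of probabilities; minimizing the sum of
    # -log(p) along a path maximizes that product.
    cost = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        c, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in adj.get(u, []):
            if p <= 0.0:
                continue  # a link that never delivers is unusable
            nc = c - math.log(p)
            if nc < cost.get(v, math.inf):
                cost[v] = nc
                parent[v] = u
                heapq.heappush(heap, (nc, v))

    reliability = {node: math.exp(-c) for node, c in cost.items()}
    return parent, reliability


links = {("a", "b"): 0.9, ("b", "c"): 0.9, ("a", "c"): 0.5}
parent, reliability = most_reliable_tree(links, "a")
print(parent["c"], round(reliability["c"], 2))
```

In this example the two-hop route a, b, c (0.9 × 0.9 = 0.81) is preferred over the direct but lossier link a, c (0.5).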

Relevance:

90.00%

Publisher:

Abstract:

Master's dissertation in Informatics Engineering

Relevance:

90.00%

Publisher:

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering

Relevance:

90.00%

Publisher:

Abstract:

Nowadays, the data available to and used by companies is growing very fast, creating the need to use and manage this data in the most efficient way. To this end, data is replicated over multiple datacenters using different replication protocols, chosen according to needs such as higher availability or a stronger consistency level. The costs associated with full data replication can be very high, and most of the time full replication is not needed, since information can be logically partitioned. Another problem is that, by using datacenters to store and process information, clients become heavily dependent on them. We propose a partial replication protocol called ParTree, which replicates data to clients and organizes clients in a hierarchy, using communication between them to propagate information. This solution addresses some of these problems, namely by supporting partial data replication and an offline execution mode. Given the complexity of the protocol, the use of formal verification is crucial to ensure the protocol's two correctness properties: causal consistency and preservation of data. The use of the TLA+ language and tools to formally specify and verify the proposed protocol is also described.
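
The general notion of partial replication over a client hierarchy can be sketched as follows. This is not the ParTree protocol: it does not implement causal consistency, offline execution or failure handling, and the `Node` class, its `interest` sets and the `receive` method are all hypothetical names introduced for illustration. The sketch also assumes that every parent replicates a superset of the keys its children replicate, so that an update can always travel through the tree to every interested node.

```python
class Node:
    """A client in a hierarchy that stores only the keys it is interested in."""

    def __init__(self, name, interest, parent=None):
        self.name = name
        self.interest = set(interest)  # keys this node replicates
        self.store = {}                # local partial replica
        self.children = []
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

    def receive(self, key, value, sender=None):
        """Apply an update locally and forward it along the tree."""
        if key not in self.interest:
            return  # not replicated here; by assumption nothing below needs it
        self.store[key] = value
        neighbours = list(self.children)
        if self.parent is not None:
            neighbours.append(self.parent)
        for node in neighbours:
            if node is not sender:  # never send the update back where it came from
                node.receive(key, value, sender=self)


root = Node("root", {"x", "y"})
a = Node("a", {"x"}, parent=root)
b = Node("b", {"y"}, parent=root)
c = Node("c", {"x"}, parent=a)

c.receive("x", 1)  # a write issued at leaf client c
print(root.store, a.store, b.store, c.store)
```

Here the write to `x` reaches `c`, `a` and `root`, while `b`, which replicates only `y`, stores nothing.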

Relevance:

90.00%

Publisher:

Abstract:

Current data mining engines are difficult to use, requiring optimizations by data mining experts in order to provide optimal results. To solve this problem, a new concept was devised that maintains the functionality of current data mining tools and adds pervasive characteristics such as invisibility and ubiquity, which focus on the users and improve ease of use and usefulness by providing autonomous and intelligent data mining processes. This article introduces an architecture to implement a data mining engine, composed of four major components: database; Middleware (control); Middleware (processing); and interface. These components are interlinked but provide independent scaling, allowing for a system that adapts to the user’s needs. A prototype has been developed in order to test the architecture. The results are very promising and showed both the functionality of the components and the need for further improvements.

Relevance:

90.00%

Publisher:

Abstract:

The complete mitochondrial DNA (mtDNA) control region was amplified and directly sequenced in two species of shrew, Crocidura russula and Sorex araneus (Insectivora, Mammalia). The general organization is similar to that found in other mammals: a central conserved region surrounded by two more variable domains. However, we have found in shrews the simultaneous presence of arrays of tandem repeats in potential locations where repeats tend to occur separately in other mammalian species. These locations correspond to regions which are associated with a possible interruption of the replication processes, either at the end of the three-stranded D-loop structure or toward the end of the heavy-strand replication. In the left domain the repeated sequences (R1 repeats) are 78 bp long, whereas in the right domain the repeats are 12 bp long in C. russula and 14 bp long in S. araneus (R2 repeats). Variation in the copy number of these repeated sequences results in mtDNA control region length differences. Southern blot analysis indicates that the level of heteroplasmy (more than one mtDNA form within an individual) differs between species. A comparative study of the R2 repeats in 12 additional species representing three shrew subfamilies provides useful indications for the understanding of the origin and the evolution of these homologous tandemly repeated sequences. An asymmetry in the distribution of variants within the arrays, as well as the constant occurrence of shorter repeated sequences flanking only one side of the R2 arrays, could be related to asymmetry in the replication of each strand of the mtDNA molecule. The pattern of sequence and length variation within and between species, together with the capability of the arrays to form stable secondary structures, suggests that the dominant mechanism involved in the evolution of these arrays is unidirectional replication slippage.

Relevance:

90.00%

Publisher:

Abstract:

This thesis consists of three main theoretical themes: quality of data, success of information systems, and metadata in data warehousing. Loosely defined, metadata is descriptive data about data, and, in this thesis, master data means reference data about customers, products, etc. The objective of the thesis is to contribute to an implementation of a metadata management solution for an industrial enterprise. The metadata system incorporates a repository, integration, delivery and access tools, as well as semantic rules and procedures for master data maintenance. It aims to improve the maintenance processes and quality of hierarchical master data in the case company’s information systems. That should benefit the whole organization through improved information quality, especially in cross-system data consistency, and through more efficient and effective data management processes. As the result of this thesis, the requirements for the metadata management solution for the case in question were compiled, and the success of the new information system and the implementation project was evaluated.

Relevance:

90.00%

Publisher:

Abstract:

This article reflects on key methodological issues emerging from children and young people's involvement in data analysis processes. We outline a pragmatic framework illustrating different approaches to engaging children, using two case studies of children's experiences of participating in data analysis. The article highlights methods of engagement and important issues such as the balance of power between adults and children, training, support, ethical considerations, time and resources. We argue that involving children in data analysis processes can have several benefits, including enabling a greater understanding of children's perspectives and helping to prioritise children's agendas in policy and practice. (C) 2007 The Author(s). Journal compilation (C) 2007 National Children's Bureau.

Relevance:

90.00%

Publisher:

Abstract:

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, onboard these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting the seamless communication among handheld devices to perform data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains including security, business and telemedicine are discussed.

Relevance:

90.00%

Publisher:

Abstract:

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of social networks over the decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

Relevance:

90.00%

Publisher:

Abstract:

Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Knowing these properties, the approach selects modeling techniques to represent the series and perform predictions, which are later used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm that this new approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
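
Online prediction over a time series of application behavior, with no knowledge of past executions, can be illustrated with a minimal sketch. The abstract does not say which modeling techniques the paper selects; simple exponential smoothing is used here purely as a stand-in, and the class name `OnlinePredictor`, the `alpha` parameter and the sample values are assumptions made for illustration.

```python
class OnlinePredictor:
    """One-step-ahead forecaster based on simple exponential smoothing."""

    def __init__(self, alpha=0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.level = None   # smoothed estimate; None until the first observation

    def observe(self, value):
        """Update the estimate with a newly observed value."""
        if self.level is None:
            self.level = float(value)
        else:
            self.level = self.alpha * value + (1.0 - self.alpha) * self.level

    def predict(self):
        """Return the forecast for the next value, or None if nothing was observed."""
        return self.level


# Hypothetical series: amount of data (in MB) requested per time window.
predictor = OnlinePredictor(alpha=0.5)
for observed in [10, 20, 20]:
    predictor.observe(observed)
print(predictor.predict())
```

A storage system could use such a forecast to decide, for example, how much data to prefetch or replicate before the next window; that policy is outside this sketch.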

Relevance:

90.00%

Publisher:

Abstract:

Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or in activities involving some kind of direct risk to people. Agribusiness in the cotton industry does not have a comprehensive infrastructure covering all stages of the processes involved in production. Mapping and defining the data needed to enable product traceability amounts to delegating responsibilities to everyone involved in production. Aggregate data on cotton production are collected at specific, pre-defined stages, from the choice of variety through to processing; the scope of this article is specifically the production of lint cotton. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; this proposal provides support for the implementation of platform-independent solutions.