972 resultados para Distributed environments
Resumo:
A web service is a collection of industry standards to enable reusability of services and interoperability of heterogeneous applications. The UMLS Knowledge Source (UMLSKS) Server provides remote access to the UMLSKS and related resources. We propose a Web Services Architecture that encapsulates UMLSKS-API and makes it available in distributed and heterogeneous environments. This is the first step towards intelligent and automatic UMLS services discovery and invocation by computer systems in distributed environments such as web.
Resumo:
In computer science, different types of reusable components for building software applications were proposed as a direct consequence of the emergence of new software programming paradigms. The success of these components for building applications depends on factors such as the flexibility in their combination or the facility for their selection in centralised or distributed environments such as internet. In this article, we propose a general type of reusable component, called primitive of representation, inspired by a knowledge-based approach that can promote reusability. The proposal can be understood as a generalisation of existing partial solutions that is applicable to both software and knowledge engineering for the development of hybrid applications that integrate conventional and knowledge based techniques. The article presents the structure and use of the component and describes our recent experience in the development of real-world applications based on this approach.
Resumo:
Los avances en el hardware permiten disponer de grandes volúmenes de datos, surgiendo aplicaciones que deben suministrar información en tiempo cuasi-real, la monitorización de pacientes, ej., el seguimiento sanitario de las conducciones de agua, etc. Las necesidades de estas aplicaciones hacen emerger el modelo de flujo de datos (data streaming) frente al modelo almacenar-para-despuésprocesar (store-then-process). Mientras que en el modelo store-then-process, los datos son almacenados para ser posteriormente consultados; en los sistemas de streaming, los datos son procesados a su llegada al sistema, produciendo respuestas continuas sin llegar a almacenarse. Esta nueva visión impone desafíos para el procesamiento de datos al vuelo: 1) las respuestas deben producirse de manera continua cada vez que nuevos datos llegan al sistema; 2) los datos son accedidos solo una vez y, generalmente, no son almacenados en su totalidad; y 3) el tiempo de procesamiento por dato para producir una respuesta debe ser bajo. Aunque existen dos modelos para el cómputo de respuestas continuas, el modelo evolutivo y el de ventana deslizante; éste segundo se ajusta mejor en ciertas aplicaciones al considerar únicamente los datos recibidos más recientemente, en lugar de todo el histórico de datos. En los últimos años, la minería de datos en streaming se ha centrado en el modelo evolutivo. Mientras que, en el modelo de ventana deslizante, el trabajo presentado es más reducido ya que estos algoritmos no sólo deben de ser incrementales si no que deben borrar la información que caduca por el deslizamiento de la ventana manteniendo los anteriores tres desafíos. Una de las tareas fundamentales en minería de datos es la búsqueda de agrupaciones donde, dado un conjunto de datos, el objetivo es encontrar grupos representativos, de manera que se tenga una descripción sintética del conjunto. Estas agrupaciones son fundamentales en aplicaciones como la detección de intrusos en la red o la segmentación de clientes en el marketing y la publicidad. Debido a las cantidades masivas de datos que deben procesarse en este tipo de aplicaciones (millones de eventos por segundo), las soluciones centralizadas puede ser incapaz de hacer frente a las restricciones de tiempo de procesamiento, por lo que deben recurrir a descartar datos durante los picos de carga. Para evitar esta perdida de datos, se impone el procesamiento distribuido de streams, en concreto, los algoritmos de agrupamiento deben ser adaptados para este tipo de entornos, en los que los datos están distribuidos. En streaming, la investigación no solo se centra en el diseño para tareas generales, como la agrupación, sino también en la búsqueda de nuevos enfoques que se adapten mejor a escenarios particulares. Como ejemplo, un mecanismo de agrupación ad-hoc resulta ser más adecuado para la defensa contra la denegación de servicio distribuida (Distributed Denial of Services, DDoS) que el problema tradicional de k-medias. En esta tesis se pretende contribuir en el problema agrupamiento en streaming tanto en entornos centralizados y distribuidos. Hemos diseñado un algoritmo centralizado de clustering mostrando las capacidades para descubrir agrupaciones de alta calidad en bajo tiempo frente a otras soluciones del estado del arte, en una amplia evaluación. Además, se ha trabajado sobre una estructura que reduce notablemente el espacio de memoria necesario, controlando, en todo momento, el error de los cómputos. Nuestro trabajo también proporciona dos protocolos de distribución del cómputo de agrupaciones. Se han analizado dos características fundamentales: el impacto sobre la calidad del clustering al realizar el cómputo distribuido y las condiciones necesarias para la reducción del tiempo de procesamiento frente a la solución centralizada. Finalmente, hemos desarrollado un entorno para la detección de ataques DDoS basado en agrupaciones. En este último caso, se ha caracterizado el tipo de ataques detectados y se ha desarrollado una evaluación sobre la eficiencia y eficacia de la mitigación del impacto del ataque. ABSTRACT Advances in hardware allow to collect huge volumes of data emerging applications that must provide information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The data streaming model emerges to comply with these applications overcoming the traditional store-then-process model. With the store-then-process model, data is stored before being consulted; while, in streaming, data are processed on the fly producing continuous responses. The challenges of streaming for processing data on the fly are the following: 1) responses must be produced continuously whenever new data arrives in the system; 2) data is accessed only once and is generally not maintained in its entirety, and 3) data processing time to produce a response should be low. Two models exist to compute continuous responses: the evolving model and the sliding window model; the latter fits best with applications must be computed over the most recently data rather than all the previous data. In recent years, research in the context of data stream mining has focused mainly on the evolving model. In the sliding window model, the work presented is smaller since these algorithms must be incremental and they must delete the information which expires when the window slides. Clustering is one of the fundamental techniques of data mining and is used to analyze data sets in order to find representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection or customer segmentation in marketing and advertising. Due to the huge amount of data that must be processed by such applications (up to millions of events per second), centralized solutions are usually unable to cope with timing restrictions and recur to shedding techniques where data is discarded during load peaks. To avoid discarding of data, processing of streams (such as clustering) must be distributed and adapted to environments where information is distributed. In streaming, research does not only focus on designing for general tasks, such as clustering, but also in finding new approaches that fit bests with particular scenarios. As an example, an ad-hoc grouping mechanism turns out to be more adequate than k-means for defense against Distributed Denial of Service (DDoS). This thesis contributes to the data stream mining clustering technique both for centralized and distributed environments. We present a centralized clustering algorithm showing capabilities to discover clusters of high quality in low time and we provide a comparison with existing state of the art solutions. We have worked on a data structure that significantly reduces memory requirements while controlling the error of the clusters statistics. We also provide two distributed clustering protocols. We focus on the analysis of two key features: the impact on the clustering quality when computation is distributed and the requirements for reducing the processing time compared to the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering.We have characterized the attacks detected and we have evaluated the efficiency and effectiveness of mitigating the attack impact.
Resumo:
It has been demonstrated that rating trust and reputation of individual nodes is an effective approach in distributed environments in order to improve security, support decision-making and promote node collaboration. Nevertheless, these systems are vulnerable to deliberate false or unfair testimonies. In one scenario, the attackers collude to give negative feedback on the victim in order to lower or destroy its reputation. This attack is known as bad mouthing attack. In another scenario, a number of entities agree to give positive feedback on an entity (often with adversarial intentions). This attack is known as ballot stuffing. Both attack types can significantly deteriorate the performances of the network. The existing solutions for coping with these attacks are mainly concentrated on prevention techniques. In this work, we propose a solution that detects and isolates the abovementioned attackers, impeding them in this way to further spread their malicious activity. The approach is based on detecting outliers using clustering, in this case self-organizing maps. An important advantage of this approach is that we have no restrictions on training data, and thus there is no need for any data pre-processing. Testing results demonstrate the capability of the approach in detecting both bad mouthing and ballot stuffing attack in various scenarios.
Resumo:
El presente Proyecto de Fin de Máster consiste en crear una herramienta software capaz de monitorizar y gestionar la actividad de Hydra, una herramienta de gestión de entornos distribuidos, para que su estrategia de balanceo de carga se adecúe al modelo creado por GloBeM, una metodología de análisis de entornos distribuidos. GloBeM, que es una metodología externa, puede analizar y crear un modelo de máquina de estados finitos a partir de un sistema distribuido concreto. Hydra, una herramienta también externa, es un sistema de gestión de entornos cloud recientemente desarrollado y de código abierto, con un sistema de balanceo de carga efectivo pero algo limitado. El software construido recoge el modelo creado por GloBeM y lo analiza. A partir de ahí, monitoriza en tiempo real y a una frecuencia determinada la actividad de Hydra y el sistema cloud que ésta gestiona, y reconfigura sus parámetros para que su desempeño se ciña a lo estipulado por el modelo de GloBeM, extendiendo así el sistema de balanceo de carga original de Hydra.---ABSTRACT---This Master's Thesis Project involves creating a software able to monitor and manage the activity of Hydra, a tool for managing distributed environments, in order to adjust its load balancing strategy to the model created by GloBeM, an analysis methodology for distributed environments. GloBeM, which is an external methodology, can analyse and create a finite-state machine model from a particular cloud system. Hydra, also an external tool, is an open source management system for cloud environments recently developed, with a relatively limited system of load balancing. The created software gets the model created by GloBeM as an input and analyses it. From there, it monitors in real time and at a certain frequency Hydra’s activity and the cloud system that it manages, and reconfigures its parameters to adjust its performance to the stipulations by the GloBeM’s model, extending Hydra's original load balancing system.
Resumo:
This thesis presents the formal definition of a novel Mobile Cloud Computing (MCC) extension of the Networked Autonomic Machine (NAM) framework, a general-purpose conceptual tool which describes large-scale distributed autonomic systems. The introduction of autonomic policies in the MCC paradigm has proved to be an effective technique to increase the robustness and flexibility of MCC systems. In particular, autonomic policies based on continuous resource and connectivity monitoring help automate context-aware decisions for computation offloading. We have also provided NAM with a formalization in terms of a transformational operational semantics in order to fill the gap between its existing Java implementation NAM4J and its conceptual definition. Moreover, we have extended NAM4J by adding several components with the purpose of managing large scale autonomic distributed environments. In particular, the middleware allows for the implementation of peer-to-peer (P2P) networks of NAM nodes. Moreover, NAM mobility actions have been implemented to enable the migration of code, execution state and data. Within NAM4J, we have designed and developed a component, denoted as context bus, which is particularly useful in collaborative applications in that, if replicated on each peer, it instantiates a virtual shared channel allowing nodes to notify and get notified about context events. Regarding the autonomic policies management, we have provided NAM4J with a rule engine, whose purpose is to allow a system to autonomously determine when offloading is convenient. We have also provided NAM4J with trust and reputation management mechanisms to make the middleware suitable for applications in which such aspects are of great interest. To this purpose, we have designed and implemented a distributed framework, denoted as DARTSense, where no central server is required, as reputation values are stored and updated by participants in a subjective fashion. We have also investigated the literature regarding MCC systems. The analysis pointed out that all MCC models focus on mobile devices, and consider the Cloud as a system with unlimited resources. To contribute in filling this gap, we defined a modeling and simulation framework for the design and analysis of MCC systems, encompassing both their sides. We have also implemented a modular and reusable simulator of the model. We have applied the NAM principles to two different application scenarios. First, we have defined a hybrid P2P/cloud approach where components and protocols are autonomically configured according to specific target goals, such as cost-effectiveness, reliability and availability. Merging P2P and cloud paradigms brings together the advantages of both: high availability, provided by the Cloud presence, and low cost, by exploiting inexpensive peers resources. As an example, we have shown how the proposed approach can be used to design NAM-based collaborative storage systems based on an autonomic policy to decide how to distribute data chunks among peers and Cloud, according to cost minimization and data availability goals. As a second application, we have defined an autonomic architecture for decentralized urban participatory sensing (UPS) which bridges sensor networks and mobile systems to improve effectiveness and efficiency. The developed application allows users to retrieve and publish different types of sensed information by using the features provided by NAM4J's context bus. Trust and reputation is managed through the application of DARTSense mechanisms. Also, the application includes an autonomic policy that detects areas characterized by few contributors, and tries to recruit new providers by migrating code necessary to sensing, through NAM mobility actions.
Resumo:
Cloud computing realizes the long-held dream of converting computing capability into a type of utility. It has the potential to fundamentally change the landscape of the IT industry and our way of life. However, as cloud computing expanding substantially in both scale and scope, ensuring its sustainable growth is a critical problem. Service providers have long been suffering from high operational costs. Especially the costs associated with the skyrocketing power consumption of large data centers. In the meantime, while efficient power/energy utilization is indispensable for the sustainable growth of cloud computing, service providers must also satisfy a user's quality of service (QoS) requirements. This problem becomes even more challenging considering the increasingly stringent power/energy and QoS constraints, as well as other factors such as the highly dynamic, heterogeneous, and distributed nature of the computing infrastructures, etc. ^ In this dissertation, we study the problem of delay-sensitive cloud service scheduling for the sustainable development of cloud computing. We first focus our research on the development of scheduling methods for delay-sensitive cloud services on a single server with the goal of maximizing a service provider's profit. We then extend our study to scheduling cloud services in distributed environments. In particular, we develop a queue-based model and derive efficient request dispatching and processing decisions in a multi-electricity-market environment to improve the profits for service providers. We next study a problem of multi-tier service scheduling. By carefully assigning sub deadlines to the service tiers, our approach can significantly improve resource usage efficiencies with statistically guaranteed QoS. Finally, we study the power conscious resource provision problem for service requests with different QoS requirements. By properly sharing computing resources among different requests, our method statistically guarantees all QoS requirements with a minimized number of powered-on servers and thus the power consumptions. The significance of our research is that it is one part of the integrated effort from both industry and academia to ensure the sustainable growth of cloud computing as it continues to evolve and change our society profoundly.^
Resumo:
Postprint
Resumo:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
Resumo:
Cloud computing realizes the long-held dream of converting computing capability into a type of utility. It has the potential to fundamentally change the landscape of the IT industry and our way of life. However, as cloud computing expanding substantially in both scale and scope, ensuring its sustainable growth is a critical problem. Service providers have long been suffering from high operational costs. Especially the costs associated with the skyrocketing power consumption of large data centers. In the meantime, while efficient power/energy utilization is indispensable for the sustainable growth of cloud computing, service providers must also satisfy a user's quality of service (QoS) requirements. This problem becomes even more challenging considering the increasingly stringent power/energy and QoS constraints, as well as other factors such as the highly dynamic, heterogeneous, and distributed nature of the computing infrastructures, etc. In this dissertation, we study the problem of delay-sensitive cloud service scheduling for the sustainable development of cloud computing. We first focus our research on the development of scheduling methods for delay-sensitive cloud services on a single server with the goal of maximizing a service provider's profit. We then extend our study to scheduling cloud services in distributed environments. In particular, we develop a queue-based model and derive efficient request dispatching and processing decisions in a multi-electricity-market environment to improve the profits for service providers. We next study a problem of multi-tier service scheduling. By carefully assigning sub deadlines to the service tiers, our approach can significantly improve resource usage efficiencies with statistically guaranteed QoS. Finally, we study the power conscious resource provision problem for service requests with different QoS requirements. By properly sharing computing resources among different requests, our method statistically guarantees all QoS requirements with a minimized number of powered-on servers and thus the power consumptions. The significance of our research is that it is one part of the integrated effort from both industry and academia to ensure the sustainable growth of cloud computing as it continues to evolve and change our society profoundly.
Resumo:
Virtual Reality (VR) has grown to become state-of-theart technology in many business- and consumer oriented E-Commerce applications. One of the major design challenges of VR environments is the placement of the rendering process. The rendering process converts the abstract description of a scene as contained in an object database to an image. This process is usually done at the client side like in VRML [1] a technology that requires the client’s computational power for smooth rendering. The vision of VR is also strongly connected to the issue of Quality of Service (QoS) as the perceived realism is subject to an interactive frame rate ranging from 10 to 30 frames-per-second (fps), real-time feedback mechanisms and realistic image quality. These requirements overwhelm traditional home computers or even high sophisticated graphical workstations over their limits. Our work therefore introduces an approach for a distributed rendering architecture that gracefully balances the workload between the client and a clusterbased server. We believe that a distributed rendering approach as described in this paper has three major benefits: It reduces the clients workload, it decreases the network traffic and it allows to re-use already rendered scenes.
Resumo:
Through this study, we will measure how the collective MPI operations behaves in virtual and physical clusters, and its impact on the application performance. As we stated before, we will use as a test case the Weather Research and Forecasting simulations.
Resumo:
Abstract The solvability of the problem of fair exchange in a synchronous system subject to Byzantine failures is investigated in this work. The fair exchange problem arises when a group of processes are required to exchange digital items in a fair manner, which means that either each process obtains the item it was expecting or no process obtains any information on, the inputs of others. After introducing a novel specification of fair exchange that clearly separates safety and liveness, we give an overview of the difficulty of solving such a problem in the context of a fully-connected topology. On one hand, we show that no solution to fair exchange exists in the absence of an identified process that every process can trust a priori; on the other, a well-known solution to fair exchange relying on a trusted third party is recalled. These two results lead us to complete our system model with a flexible representation of the notion of trust. We then show that fair exchange is solvable if and only if a connectivity condition, named the reachable majority condition, is satisfied. The necessity of the condition is proven by an impossibility result and its sufficiency by presenting a general solution to fair exchange relying on a set of trusted processes. The focus is then turned towards a specific network topology in order to provide a fully decentralized, yet realistic, solution to fair exchange. The general solution mentioned above is optimized by reducing the computational load assumed by trusted processes as far as possible. Accordingly, our fair exchange protocol relies on trusted tamperproof modules that have limited communication abilities and are only required in key steps of the algorithm. This modular solution is then implemented in the context of a pedagogical application developed for illustrating and apprehending the complexity of fair exchange. This application, which also includes the implementation of a wide range of Byzantine behaviors, allows executions of the algorithm to be set up and monitored through a graphical display. Surprisingly, some of our results on fair exchange seem contradictory with those found in the literature of secure multiparty computation, a problem from the field of modern cryptography, although the two problems have much in common. Both problems are closely related to the notion of trusted third party, but their approaches and descriptions differ greatly. By introducing a common specification framework, a comparison is proposed in order to clarify their differences and the possible origins of the confusion between them. This leads us to introduce the problem of generalized fair computation, a generalization of fair exchange. Finally, a solution to this new problem is given by generalizing our modular solution to fair exchange
Resumo:
This paper proposes a solution to the problems associated with network latency within distributed virtual environments. It begins by discussing the advantages and disadvantages of synchronous and asynchronous distributed models, in the areas of user and object representation and user-to-user interaction. By introducing a hybrid solution, which utilises the concept of a causal surface, the advantages of both synchronous and asynchronous models are combined. Object distortion is a characteristic feature of the hybrid system, and this is proposed as a solution which facilitates dynamic real-time user collaboration. The final section covers implementation details, with reference to a prototype system available from the Internet.