917 results for LHC, CMS, Big Data


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Measurements of charged-particle fragmentation functions of jets produced in ultra-relativistic nuclear collisions can provide insight into the modification of parton showers in the hot, dense medium created in the collisions. ATLAS has measured jets in $\sqrt{s_{NN}} = 2.76$ TeV Pb+Pb collisions at the LHC using a data set recorded in 2011 with an integrated luminosity of 0.14 nb$^{-1}$. Jets were reconstructed using the anti-$k_t$ algorithm with distance parameter values R = 0.2, 0.3, and 0.4. Distributions of charged-particle transverse momentum and longitudinal momentum fraction are reported for seven bins in collision centrality for R = 0.4 jets with $p_T^{\mathrm{jet}} > 100$ GeV. Commensurate minimum $p_T$ values are used for the other radii. Ratios of fragment distributions in each centrality bin to those measured in the most peripheral bin are presented. These ratios show a reduction of fragment yield in central collisions relative to peripheral collisions at intermediate z values, $0.04 \lesssim z \lesssim 0.2$, and an enhancement in fragment yield for $z \lesssim 0.04$. A smaller, less significant enhancement is observed at large z and large $p_T$ in central collisions.


Abstract:

We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 10^2 to 10^4. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.
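The standard expected improvement criterion mentioned above can be illustrated with a minimal sketch. It assumes a minimisation problem and a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate point; the candidate names and numbers are hypothetical, and the sketch does not reproduce SpLEGO's sparse or local GP models.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimisation under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


# Hypothetical candidates: name -> (posterior mean, posterior standard deviation).
candidates = {"a": (0.9, 0.05), "b": (1.2, 0.60), "c": (1.0, 0.01)}
best_so_far = 1.0  # lowest objective value observed so far (assumed)

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

With these numbers the uncertain candidate `b` scores higher than the slightly better but well-known candidate `a`, which is the exploration-exploitation trade-off the criterion is meant to balance.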


Abstract:

This work deals with parallel optimization of expensive objective functions which are modelled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of q > 0 arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear as the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis' formula was recently established. Since the computational burden of this selection rule is still an issue in applications, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears as a sound option for batch design in distributed Gaussian Process optimization.
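To make the multipoint Expected Improvement concrete, the sketch below estimates it by plain Monte Carlo for a batch of q points. This is a swapped-in, simpler technique: it is neither the Tallis-based analytic expansion nor the closed-form gradient described in the abstract. It assumes minimisation, and the posterior mean vector and covariance matrix of the batch are hypothetical inputs.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def multipoint_ei_mc(mean, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max(0, best - min_i Y_i)] with Y ~ N(mean, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlated posterior draw at the q batch points.
        ys = [mean[i] + sum(low[i][k] * eps[k] for k in range(i + 1))
              for i in range(q)]
        total += max(0.0, best - min(ys))
    return total / n_samples


# Hypothetical batch of two uncorrelated points, best observed value 1.0.
batch_ei = multipoint_ei_mc([0.9, 1.2], [[0.0025, 0.0], [0.0, 0.36]], 1.0)
```

For q = 1 the estimate should approach the usual single-point EI, which gives a simple sanity check. The sampling cost of an estimator like this grows with the number of draws, which gives some intuition for why analytic expressions and gradients are attractive.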


Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences of genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises a multiple-testing problem and will give false-positive results. Although this problem can be effectively dealt with through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of the joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset among big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve the goal. First, we selected 1000 SNPs through an effective filter method, and then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, due to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, in contrast to classification accuracy. Our four studies suggest that the Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through chi-square tests shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
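A small sketch can show why a harmonic mean of sensitivity and specificity behaves differently from plain accuracy on imbalanced data, and what a Bonferroni-style cut-off looks like. The confusion-matrix counts and the number of tests below are hypothetical, and the abstract does not state how its own cut-off values were derived.

```python
def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (assumes both classes are non-empty)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0.0:
        return 0.0
    return 2.0 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(tp, fn, tn, fp):
    """Plain classification accuracy."""
    return (tp + tn) / (tp + fn + tn + fp)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical imbalanced sample: 10 cases, 90 controls,
# and a classifier that labels everyone as a control.
trivial = dict(tp=0, fn=10, tn=90, fp=0)
trivial_accuracy = accuracy(**trivial)  # high, despite detecting no cases
trivial_hmss = hmss(**trivial)          # zero, because sensitivity is zero

# Hypothetical genome-wide scan of one million SNPs at alpha = 0.05.
cutoff = bonferroni_threshold(0.05, 1_000_000)
```

The trivial classifier looks good under accuracy but scores zero under HMSS, which illustrates the claim that HMSS can be used on imbalanced data without resampling the dataset.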


Abstract:

The Internet of Things (IoT), as part of the Future Internet, has become one of the main research topics today, partly thanks to the pressure society is putting on the development of a particular kind of service (smart metering, smart grids, eHealth, etc.) and partly because of recent business forecasts that place some players, such as telecom operators (which are desperately seeking new opportunities), at the forefront, pushing interrelated technologies like Machine-to-Machine (M2M) communications. In this context, a significant number of research activities are taking place worldwide at different levels: sensor network communications, information processing, big data storage, semantics, service-level architectures, etc. Each of them, in isolation, is reaching a level of maturity that lets us envision the Internet of Things less as a dream and more as a tangible goal. However, the aforementioned services cannot wait to be developed until holistic research actions bring complete solutions. It is important to deliver intermediate results that avoid vertical solutions tailored to particular deployments. In the present work, we focus on the creation of a service-level platform intended to facilitate, on the one hand, the integration of heterogeneous and geographically dispersed Sensor and Actuator Networks (SANs) and, on the other, the development of horizontal services using them and the information they provide. This enabler will be used for horizontal service development and for IoT experimentation.
Prior to the definition of the platform, we carried out an extensive study targeting not just research works and projects, but also standardization activities. The results can be summarized in the following assertions: a) Open Geospatial Consortium (OGC®) Sensor Web Enablement (SWE™) data models today represent the most complete solution to describe SANs and observations. b) OGC interfaces, despite limitations that require changes and extensions, could be used as the basis for accessing sensors and data. c) Next Generation Networks (NGN) offer a good substrate that facilitates the integration of SANs and the development of services. Consequently, a new Service Layer platform, called Ubiquitous Sensor Networks (USN), has been defined in this Thesis to help fill the previously mentioned gaps. The main highlights of the proposed USN Platform are: a) From an architectural point of view, it follows a two-layer approach (Enabler and Gateway) similar to other enablers that run on top of NGN (like OMA Presence). b) Data models and interfaces are based on the OGC SWE standards. c) It is integrated in NGN but can be used without it over open IP infrastructures. d) Its main functions are: sensor discovery, observation storage, publish-subscribe-notify, homogeneous remote execution, security, data dictionary handling, monitoring facilities, authorization support, protocol conversion utilities, synchronous and asynchronous interactions, streaming support and basic resource arbitration.
To demonstrate the functionalities that the proposed USN Platform can offer to future IoT scenarios, experimental results are presented from three real-life, small-scale proofs of concept (Smart Metering, Smart Places and Environmental Monitoring) and a study on semantics (an in-vehicle information system). Furthermore, the proposed USN Platform is currently being used as an Enabler to develop experimentation and real services in the SmartSantander EU project (which aims to integrate around 20,000 IoT devices).


Abstract:

One of the main challenges facing next generation Cloud platform services is the need to simultaneously achieve ease of programming, consistency, and high scalability. Big Data applications have so far focused on batch processing. The next step for Big Data is to move to the online world. This shift will raise the requirements for transactional guarantees. CumuloNimbo is a new EC-funded project led by Universidad Politécnica de Madrid (UPM) that addresses these issues via a highly scalable multi-tier transactional platform as a service (PaaS) that bridges the gap between OLTP and Big Data applications.


Abstract:

Aiming to address requirements concerning integration of services in the context of "big data", this paper presents an innovative approach that (i) ensures a flexible, adaptable and scalable information and computation infrastructure, and (ii) exploits the competences of stakeholders and information workers to meaningfully confront information management issues such as information characterization, classification and interpretation, thus incorporating the underlying collective intelligence. Our approach pays much attention to the issues of usability and ease-of-use, not requiring any particular programming expertise from the end users. We report on a series of technical issues concerning the desired flexibility of the proposed integration framework and we provide related recommendations to developers of such solutions. Evaluation results are also discussed.


Abstract:

Sensor network deployments have become a primary source of big data about the real world that surrounds us, measuring a wide range of physical properties in real time. With such large amounts of heterogeneous data, a key challenge is to describe and annotate sensor data with high-level metadata, using and extending models, for instance with ontologies. However, to automate this task there is a need for enriching the sensor metadata using the actual observed measurements and extracting useful meta-information from them. This paper proposes a novel approach to the characterization and extraction of semantic metadata through the analysis of raw sensor observations. The approach consists of using approximations to represent the raw sensor measurements, based on distributions of the observation slopes; building a classification scheme to automatically infer sensor metadata such as the type of observed property; and integrating the semantic analysis results with existing sensor network metadata.
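The idea of representing raw measurements by the distribution of their slopes can be sketched as follows. This is only an illustration under stated assumptions: the bin edges, the reference profiles and the nearest-profile rule are hypothetical stand-ins, not the classification scheme of the paper, and timestamps are assumed to be strictly increasing with at least two observations.

```python
import bisect


def slope_histogram(values, timestamps, edges):
    """Normalised histogram of slopes between consecutive observations.

    `edges` are sorted bin boundaries; n edges define n + 1 bins.
    """
    pairs = list(zip(timestamps, values))
    slopes = [(v1 - v0) / (t1 - t0)
              for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])]
    counts = [0] * (len(edges) + 1)
    for slope in slopes:
        counts[bisect.bisect_right(edges, slope)] += 1
    return [count / len(slopes) for count in counts]


def nearest_profile(hist, profiles):
    """Return the name of the reference profile closest in squared distance."""
    return min(profiles,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(hist, profiles[name])))


# Three bins: falling (< -0.5), roughly flat, rising (> 0.5). Hypothetical edges.
edges = [-0.5, 0.5]

# Hypothetical reference profiles for two observed properties.
profiles = {"temperature": [0.0, 1.0, 0.0], "light": [0.5, 0.0, 0.5]}

readings = [20.0, 20.1, 20.2, 20.1, 20.0]  # slowly varying signal
hist = slope_histogram(readings, [0, 1, 2, 3, 4], edges)
inferred = nearest_profile(hist, profiles)
```

A slowly varying series puts all of its slopes in the flat bin, so it matches the hypothetical temperature profile, whereas an abruptly switching series would split between the falling and rising bins.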


Abstract:

Since the beginning of the Internet, Internet Service Providers (ISPs) have needed to give users' traffic different treatments defined by agreements between ISPs and customers. This procedure, known as Quality of Service Management, has not changed much in recent years (DiffServ and Deep Packet Inspection have been the most commonly chosen mechanisms). However, the incremental growth of Internet users and services, together with the application of recent Machine Learning techniques, opens up the possibility of going one step forward in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS Management. Then we introduce clustering and classification Machine Learning techniques for traffic characterization, along with the concept of Quality of Experience. Finally, with all these components, we present a brand new framework that will manage Quality of Service in a smart way in a telecom Big Data based scenario, for both mobile and fixed communications.


Abstract:

In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which are under even greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, nowadays it is impossible to handle hundreds of features in big data scenarios, as this only leads to unacceptable processing time and misleading classification results due to redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. In this paper a survey of feature selection methods is presented: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, future application of feature selection to a concrete project is proposed.


Abstract:

In the current situation, health IT systems are diverse, with models ranging from predominant solutions adopted and created by large organizations to ad hoc solutions developed by any competing company to meet specific needs. All these systems are under similar financial pressures, not only from current global economic conditions and rising health care costs, but also from a population that has embraced current technological advances and demands more personalized health care, on a par with the technological advances it enjoys in other areas. The purpose of this thesis is to develop a business model aimed at supporting information exchange within the clinical domain. The model is intended to increase competitiveness in the health IT sector without the need for experts in standards, providing less costly qualified technical profiles with the help of tools that simplify the use of interoperability standards. Existing open specifications such as FHIR, which publishes documentation and tutorials under open licenses, will be used to enable interoperability between systems. The main advantage of FHIR is that it introduces a shift in the current conception of how clinical information is made available; until now, clinical information has been treated as special, requiring more complex standards that cover any case, however specific. This specification allows clinical information to be used through existing web technologies (HTTP, HTML, OAuth2, JSON and XML), which everyone can use with no particular training to create and consume this information.
Starting from a market in which the integration of information is almost nonexistent compared with other current environments, spending on clinical integration will increase dramatically over the next years, leaving behind the technical challenges, whose cost will recede into the background. Investment will focus on the expectations of what can be obtained from the current trend toward personalization of patients' clinical data, with access to institutional records together with 'social/mobile/big data'.


Abstract:

Urban economic activities are an essential facet in defining city identity. Traditional approaches very often rely on the most theoretical and quantitative features of the studies, de facto excluding a direct association between those findings and the tangible subject of the analysis. To fill this gap, the Big Data era and information visualization methodologies could help analysts, stakeholders and the general audience gain new insight into the field. In this paper, we offer some food for thought about new opportunities arising in visual urban economies and present some visual results on possible scenarios.


Abstract:

Recientemente, el paradigma de la computación en la nube ha recibido mucho interés por parte tanto de la industria como del mundo académico. Las infraestructuras cloud públicas están posibilitando nuevos modelos de negocio y ayudando a reducir costes. Sin embargo, una compañía podría desear ubicar sus datos y servicios en sus propias instalaciones, o tener que atenerse a leyes de protección de datos. Estas circunstancias hacen a las infraestructuras cloud privadas ciertamente deseables, ya sea para complementar a las públicas o para sustituirlas por completo. Por desgracia, las carencias en materia de estándares han impedido que las soluciones para la gestión de infraestructuras privadas se hayan desarrollado adecuadamente. Además, la multitud de opciones disponibles ha creado en los clientes el miedo a depender de una tecnología concreta (technology lock-in). Una de las causas de este problema es la falta de alineación entre la investigación académica y los productos comerciales, ya que aquella está centrada en el estudio de escenarios idealizados sin correspondencia con el mundo real, mientras que éstos consisten en soluciones desarrolladas sin tener en cuenta cómo van a encajar con los estándares más comunes o sin preocuparse de hacer públicos sus resultados. Con objeto de resolver este problema, propongo un sistema de gestión modular para infraestructuras cloud privadas enfocado en tratar con las aplicaciones en lugar de centrarse únicamente en los recursos hardware. Este sistema de gestión sigue el paradigma de la computación autónoma y está diseñado en torno a un modelo de información sencillo, desarrollado para ser compatible con los estándares más comunes. Este modelo divide el entorno en dos vistas, que sirven para separar aquello que debe preocupar a cada actor involucrado del resto de información, pero al mismo tiempo permitiendo relacionar el entorno físico con las máquinas virtuales que se despliegan encima de él. 
En dicho modelo, las aplicaciones cloud están divididas en tres tipos genéricos (Servicios, Trabajos de Big Data y Reservas de Instancias), para que así el sistema de gestión pueda sacar partido de las características propias de cada tipo. El modelo de información está complementado por un conjunto de acciones de gestión atómicas, reversibles e independientes, que determinan las operaciones que se pueden llevar a cabo sobre el entorno y que es usado para hacer posible la escalabilidad en el entorno. También describo un motor de gestión encargado de, a partir del estado del entorno y usando el ya mencionado conjunto de acciones, la colocación de recursos. Está dividido en dos niveles: la capa de Gestores de Aplicación, encargada de tratar sólo con las aplicaciones; y la capa del Gestor de Infraestructura, responsable de los recursos físicos. Dicho motor de gestión obedece un ciclo de vida con dos fases, para así modelar mejor el comportamiento de una infraestructura real. El problema de la colocación de recursos es atacado durante una de las fases (la de consolidación) por un resolutor de programación entera, y durante la otra (la online) por un heurístico hecho ex-profeso. Varias pruebas han demostrado que este acercamiento combinado es superior a otras estrategias. Para terminar, el sistema de gestión está acoplado a arquitecturas de monitorización y de actuadores. Aquella estando encargada de recolectar información del entorno, y ésta siendo modular en su diseño y capaz de conectarse con varias tecnologías y ofrecer varios modos de acceso. ABSTRACT The cloud computing paradigm has raised in popularity within the industry and the academia. Public cloud infrastructures are enabling new business models and helping to reduce costs. However, the desire to host company’s data and services on premises, and the need to abide to data protection laws, make private cloud infrastructures desirable, either to complement or even fully substitute public oferings. 
Unfortunately, a lack of standardization has precluded private infrastructure management solutions to be developed to a certain level, and a myriad of diferent options have induced the fear of lock-in in customers. One of the causes of this problem is the misalignment between academic research and industry ofering, with the former focusing in studying idealized scenarios dissimilar from real-world situations, and the latter developing solutions without taking care about how they f t with common standards, or even not disseminating their results. With the aim to solve this problem I propose a modular management system for private cloud infrastructures that is focused on the applications instead of just the hardware resources. This management system follows the autonomic system paradigm, and is designed around a simple information model developed to be compatible with common standards. This model splits the environment in two views that serve to separate the concerns of the stakeholders while at the same time enabling the traceability between the physical environment and the virtual machines deployed onto it. In it, cloud applications are classifed in three broad types (Services, Big Data Jobs and Instance Reservations), in order for the management system to take advantage of each type’s features. The information model is paired with a set of atomic, reversible and independent management actions which determine the operations that can be performed over the environment and is used to realize the cloud environment’s scalability. From the environment’s state and using the aforementioned set of actions, I also describe a management engine tasked with the resource placement. It is divided in two tiers: the Application Managers layer, concerned just with applications; and the Infrastructure Manager layer, responsible of the actual physical resources. This management engine follows a lifecycle with two phases, to better model the behavior of a real infrastructure. 
The placement problem is tackled during one phase (consolidation) by an integer programming solver, and during the other (online) by a custom heuristic. Tests have demonstrated that this combined approach is superior to other strategies. Finally, the management system is paired with monitoring and actuator architectures. The former is able to collect the necessary information from the environment, and the latter is modular in design and capable of interfacing with several technologies and offering several access interfaces.
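
As an illustration of what an online placement heuristic can look like, the sketch below uses first-fit decreasing, a standard bin-packing heuristic. It is not the custom heuristic described in the abstract, which is not specified here. The `Host` class, the `place_first_fit_decreasing` function and the CPU and memory units are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical host with its remaining free capacity (hypothetical model)."""
    name: str
    cpu: int   # free CPU cores
    mem: int   # free memory, in GiB
    vms: list = field(default_factory=list)


def place_first_fit_decreasing(vms, hosts):
    """Place each VM on the first host that still has room for it.

    vms: list of (name, cpu, mem) tuples.
    hosts: list of Host objects; their free capacity is updated in place.
    Returns (placement, unplaced): a dict mapping VM name to host name,
    and a list of the VM names that did not fit anywhere.
    """
    placement, unplaced = {}, []
    # Handle the largest requests first so they are not starved by small ones.
    ordered = sorted(vms, key=lambda vm: (vm[1], vm[2]), reverse=True)
    for name, cpu, mem in ordered:
        for host in hosts:
            if host.cpu >= cpu and host.mem >= mem:
                host.cpu -= cpu
                host.mem -= mem
                host.vms.append(name)
                placement[name] = host.name
                break
        else:
            # No host had enough free CPU and memory for this VM.
            unplaced.append(name)
    return placement, unplaced
```

A greedy pass like this is fast enough to run on every incoming request, which is why heuristics of this kind suit an online phase, while an exact integer programming solver is better suited to a periodic consolidation phase where more time is available.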

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This Final Degree Project falls within a system for the control and development of intelligent transport systems (ITS). The project comprises several lines of development, which belong to that framework and arise from the need to improve the safety, flow, structure and maintenance of roads by incorporating the most recent technologies. First, the project focuses on developing a new real-time traffic data processing system that takes advantage of the Big Data, Cloud Computing and Map-Reduce technologies that have emerged in recent years. To that end, a preliminary study is made of the road traffic data generated by vehicles traveling on roads, focusing on the system used by Spain's Dirección General de Tráfico and comparing it with that of companies based on location services (LBS). The Hadoop model used is presented, as well as the Map-Reduce process implemented in this analyzer system. Finally, the output data are prepared and sent to a basic web module that acts as a Geographic Information System (GIS).---ABSTRACT---This Final Degree Project is part of a system for the control and development of intelligent transport systems (ITS). This work consists of several lines of development, which are included within this framework and arise from the need to increase the safety, flow, structure and maintenance of roads by incorporating the latest technologies. First, this work focuses on the development of a new real-time traffic data processing system that takes advantage of the Big Data, Cloud Computing and Map-Reduce technologies that have emerged in recent years. A preliminary study is made of the road traffic data generated by vehicles traveling by road, focusing on the system used by the Dirección General de Tráfico of Spain and comparing it with that of companies offering location-based services (LBS). 
The Hadoop model used and the Map-Reduce process implemented in this analyzer system are presented. Finally, the output data are prepared and sent to a basic web module that acts as a Geographic Information System (GIS).
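
To make the Map-Reduce idea concrete, here is a minimal sketch in plain Python rather than Hadoop. It assumes each traffic reading is a hypothetical `(segment_id, speed_kmh)` pair and computes the average speed per road segment. The record format and the function names are assumptions made for illustration, not the project's actual process.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (segment_id, (speed, 1)) for one traffic reading."""
    segment_id, speed_kmh = record
    yield segment_id, (speed_kmh, 1)


def reduce_segment(segment_id, values):
    """Reduce step: combine all (speed, count) pairs of one segment."""
    total_speed = sum(speed for speed, _ in values)
    count = sum(n for _, n in values)
    return segment_id, total_speed / count


def run_map_reduce(records):
    """Run map, group by key (the shuffle step), then reduce, all in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_segment(key, values) for key, values in groups.items())
```

Emitting `(speed, 1)` pairs instead of bare speeds keeps the reduce step associative, so partial sums could be combined on each node before the final reduce in a real cluster.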

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Technological progress in recent years has increased the need to store enormous amounts of data, leading to disorder in the data storage process, to outdated data and to more complicated analysis. This situation sparked great interest among organizations in finding an approach for obtaining relevant information from these large data stores. Thus arose what is known as business intelligence, a set of tools, procedures and strategies for carrying out “knowledge extraction”, the term commonly used for the extraction of information useful to the organization itself. Specifically, this project uses the Knowledge Discovery in Databases (KDD) approach, which makes it possible to identify patterns and to handle efficiently the anomalies that may appear in a communications network. This approach spans everything from the selection of the raw data to their final analysis for determining patterns. The core of the whole KDD approach is data mining, which contains the technology needed to identify the aforementioned patterns and extract knowledge. For this purpose, the free version of the RapidMiner tool is used, because it is more complete and easier to use than other tools such as KNIME or WEKA. Managing a network encompasses the entire deployment and maintenance process. It is in this procedure that all anomalies occurring in the network are collected and monitored, and they can be stored in a repository. The goal of this project is to develop a theoretical approach and several experiments that make it possible to identify patterns in network anomaly records. The MAWI Lab repository, in which daily anomalies have been stored, has been studied. The aim is to find characteristic yearly indications by detecting patterns. 
The various experiments and procedures in this study aim to demonstrate the usefulness of business intelligence for extracting information from a massive data store, for subsequent analysis or future studies. ABSTRACT. Technological progress in recent years has made it necessary to store large amounts of information in repositories. This information is often disordered and outdated, and requires complex analysis. This situation has generated considerable interest in methodologies for obtaining important information from these huge data stores. Business intelligence was born as a set of tools, procedures and strategies to implement "knowledge extraction". Specifically, this project uses the Knowledge Discovery in Databases (KDD) approach. KDD is one of the most important business intelligence processes for identifying patterns and efficiently managing the anomalies in a communications network. This approach includes all the necessary stages, from the selection of the raw data to the analysis that determines the patterns. The core of the whole KDD approach is the Data Mining process, which analyzes the information needed to identify the patterns and to extract the knowledge. In this project we use the RapidMiner tool to carry out the Data Mining process, because this tool has more features and is easier to use than other tools like WEKA or KNIME. Network management includes deployment, supervision and maintenance tasks. The network management process is where all anomalies are collected and monitored, and they can be stored in a repository. The goal of this project is to construct a theoretical approach, to implement a prototype and to carry out several experiments that allow patterns to be identified in anomaly records. The MAWI Lab repository, which contains daily anomalies, was selected for study. 
The different experiments show the usefulness of business intelligence for extracting information from a large data warehouse.