31 results for Big Data Analytics

at Universidad Politécnica de Madrid


Relevance:

100.00%

Publisher:

Abstract:

To date, big data applications have focused on the store-and-process paradigm. In this paper we describe an initiative to deal with big data applications for continuous streams of events. In many emerging applications, the volume of data being streamed is so large that the traditional ‘store-then-process’ paradigm is either unsuitable or too inefficient. Moreover, soft real-time requirements might severely limit the engineering solutions. Many scenarios fit this description. In network security for cloud data centres, for instance, very high volumes of IP packets and events from sensors at firewalls, network switches, routers and servers need to be analyzed so that attacks are detected in minimal time, limiting the effect of the malicious activity on the IT infrastructure. Similarly, in the fraud department of a credit card company, payment requests should be processed online, as quickly as possible, in order to provide meaningful results in real time. An ideal system would detect fraud during the authorization process, which lasts hundreds of milliseconds, and deny the payment authorization, minimizing the damage to the user and the credit card company.
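
The contrast with ‘store-then-process’ can be illustrated with a minimal sketch that evaluates each event as it arrives and keeps only a short sliding window in memory. The class name, the window length and the threshold rule are hypothetical choices made for illustration; they are not taken from the paper.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flags a card that issues too many payment requests in a short window.

    Hypothetical rule for illustration only; not the method of the paper.
    """

    def __init__(self, window_seconds=60.0, max_requests=3):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # Per-card timestamps of recent requests: the only state that is kept.
        self._recent = defaultdict(deque)

    def process(self, card_id, timestamp):
        """Handle one event on arrival; return True if it looks suspicious."""
        window = self._recent[card_id]
        # Evict timestamps that have fallen out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        window.append(timestamp)
        return len(window) > self.max_requests


detector = SlidingWindowDetector(window_seconds=60.0, max_requests=3)
events = [("card-A", 0.0), ("card-A", 10.0), ("card-B", 12.0),
          ("card-A", 20.0), ("card-A", 30.0), ("card-A", 200.0)]
# Each event is decided immediately, without storing the full stream first.
flags = [detector.process(card, ts) for card, ts in events]
```

Because the decision is made per event and the state is bounded by the window, the cost of each check stays small, which is the property a soft real-time budget of a few hundred milliseconds would need.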

Relevance:

100.00%

Publisher:

Abstract:

Since the beginning of time, human beings have needed to understand and analyze everything around them. To do so, they have relied on different media, such as cave paintings, the Library of Alexandria, vast collections of books and, nowadays, a massive amount of computerized information. All of this information has been stored, as far as the technology of each era allowed, in the hope that it would prove useful through consultation and analysis. The same is still true today. Until a few years ago, information was analyzed manually or by means of relational databases. Now the time has come for a new technology, Big Data, which makes it possible to analyze huge amounts of data of all kinds in a relatively short time. This book studies the characteristics and advantages of Big Data and also examines the Hadoop platform, a Java-based platform that can analyze massive amounts of data in different formats and from different sources. Throughout the text, the reader is given the background knowledge needed to understand the material and is placed in the timeline of the concept's development, its uses, and the forecasts for its evolution and growth in the coming years.

Relevance:

100.00%

Publisher:

Abstract:

Nowadays, being able to analyze all the information that flows through the network quickly and easily, in order to extract great value from it, plays a fundamental role. So-called Big Data is becoming more important to companies every day, which is why this work studies a novel solution for handling it. Apache Spark is a tool created to handle such volumes of information; this work presents its strengths, as well as different use cases in which it offers a great advantage over its alternatives.

Relevance:

100.00%

Publisher:

Abstract:

Context-aware applications, which can adapt their behavior to changing environments, are attracting more and more attention. To reduce the complexity of developing such applications, context-aware middleware, which introduces context awareness into traditional middleware, has been proposed to provide a homogeneous interface involving generic context management solutions. This paper provides a survey of state-of-the-art context-aware middleware architectures proposed during the period from 2009 through 2015. First, preliminary background, such as the principles of context, context awareness, context modelling, and context reasoning, is provided for a comprehensive understanding of context-aware middleware. On this basis, an overview of eleven carefully selected middleware architectures is presented and their main features explained. Then, thorough comparisons and analysis of the presented middleware architectures are performed based on technical parameters including architectural style, context abstraction, context reasoning, scalability, fault tolerance, interoperability, service discovery, storage, security & privacy, context awareness level, and cloud-based big data analytics. The analysis shows that no context-aware middleware architecture complies with all requirements. Finally, challenges are pointed out as open issues for future work.

Relevance:

100.00%

Publisher:

Abstract:

Since the beginning of the Internet, Internet Service Providers (ISPs) have needed to treat users' traffic differently according to agreements between ISPs and customers. This procedure, known as Quality of Service (QoS) Management, has not changed much in recent years (DiffServ and Deep Packet Inspection have been the most widely chosen mechanisms). However, the incremental growth of Internet users and services, together with the application of recent Machine Learning techniques, opens up the possibility of going one step forward in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS Management. Then we introduce clustering and classification Machine Learning techniques for traffic characterization, along with the concept of Quality of Experience. Finally, with all these components, we present a brand new framework that will manage Quality of Service in a smart way in a telecom Big Data based scenario, for both mobile and fixed communications.

Relevance:

90.00%

Publisher:

Abstract:

Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for purposes other than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include:
• The syntax and semantics of the SPARQLStream query language for ontology-based data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions.
• The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts.
Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are:
• A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data.
• A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.

Relevance:

90.00%

Publisher:

Abstract:

The electrical power distribution and commercialization scenario is evolving worldwide, and electricity companies, faced with the challenge of new information requirements, are demanding IT solutions to deal with the smart monitoring of power networks. Two main challenges arise from data management and smart monitoring of power networks: real-time data acquisition and big data processing over short time periods. We present a solution in the form of a system architecture that addresses real-time issues and has the capacity for big data management.

Relevance:

90.00%

Publisher:

Abstract:

The Web of Data currently comprises approximately 62 billion triples from more than 2,000 different datasets covering many fields of knowledge. This volume of structured Linked Data can be seen as a particular case of Big Data, referred to as Big Semantic Data [4]. Obviously, powerful computational configurations are traditionally required to deal with the scalability problems arising from Big Semantic Data. It is not surprising that this “data revolution” has competed in parallel with the growth of mobile computing. Smartphones and tablets are massively used at the expense of traditional computers but, to date, mobile devices have more limited computation resources. Therefore, one question that we may ask ourselves would be: can (potentially large) semantic datasets be consumed natively on mobile devices? Currently, only a few mobile apps (e.g., [1, 9, 2, 8]) make use of semantic data that they store on the mobile devices, while many others access existing SPARQL endpoints or Linked Data directly. Two main reasons can be considered for this fact. On the one hand, in spite of some initial approaches [6, 3], there are no well-established triplestores for mobile devices. This is an important limitation because any potential app must assume both RDF storage and SPARQL resolution. On the other hand, the particular features of these devices (little storage space, less computational power or more limited bandwidth) limit the adoption of semantic data for different uses and purposes. This paper introduces our HDTourist mobile application prototype. It consumes urban data from DBpedia to help tourists visiting a foreign city. Although it is a simple app, its functionality illustrates how semantic data can be stored and queried with limited resources. Our prototype is implemented for Android, but its foundations, explained in Section 2, can be deployed on any other platform. The app is described in Section 3, and Section 4 summarizes our current achievements and outlines future work.

Relevance:

90.00%

Publisher:

Abstract:

Over the last few years, the Data Center market has grown exponentially, and this tendency continues today. As a direct consequence of this trend, the industry is pushing the development and implementation of new technologies that would improve the energy efficiency of data centers. An adaptive dashboard would allow the user to monitor the most important parameters of a data center in real time. For that reason, monitoring companies work with IoT big data filtering tools and cloud computing systems to handle the amounts of data obtained from the sensors placed in a data center. Analyzing the market trends in this field, we can affirm that the study of predictive algorithms has become an essential area for competitive IT companies. Complex algorithms are used to forecast risk situations based on historical data and warn the user in case of danger. Considering that several different users will interact with this dashboard, from IT experts and maintenance staff to accounting managers, it is vital to personalize it automatically. Following that line of thought, the dashboard should show each user only the relevant metrics, in different formats such as overlaid maps or representative graphs, among others. These maps will show all the information needed in a visual and easy-to-evaluate way. To sum up, this dashboard will allow the user to visualize and control a wide range of variables. Monitoring essential factors such as average temperature, gradients or hotspots, as well as energy and power consumption and savings by rack or building, would allow clients to understand how their equipment is behaving, helping them to optimize the energy consumption and efficiency of the racks. It would also help them to prevent possible damage to the equipment by means of predictive algorithms.

Relevance:

80.00%

Publisher:

Abstract:

The Internet of Things (IoT), as part of the Future Internet, has become one of the main research topics today, in part thanks to the pressure society is putting on the development of a particular kind of services (smart metering, smart grids, eHealth, etc.), and because of recent business forecasts that place some players, such as telecom operators (which are desperately seeking new opportunities), at the forefront, pushing interrelated technologies such as Machine-to-Machine (M2M) communications. In this context, a significant number of research activities are currently taking place worldwide at different levels: sensor network communications, information processing, big data storage, semantics, service-level architectures, etc. Each of them, in isolation, is reaching a level of maturity that lets us envision the Internet of Things not as a dream but as a tangible goal. However, the aforementioned services cannot wait to be developed until holistic research actions deliver complete solutions. It is important to produce intermediate results that avoid vertical solutions tailored to particular deployments.
In the present work, we focus on the creation of a service-level platform intended to facilitate, on the one hand, the integration of heterogeneous and geographically dispersed Sensor and Actuator Networks (SANs) and, on the other, the development of horizontal services using those networks and the information they provide. This enabler will be used for horizontal service development and for IoT experimentation. Prior to defining the platform, we carried out an extensive study covering not just research works and projects but also standardization activities. The results can be summarized in the following assertions: a) Open Geospatial Consortium (OGC®) Sensor Web Enablement (SWE™) data models today represent the most complete solution to describe SANs and observations. b) OGC interfaces, despite limitations that require changes and extensions, could be used as the basis for accessing sensors and data. c) Next Generation Networks (NGN) offer a good substrate that facilitates the integration of SANs and the development of services. Consequently, a new service layer platform, called Ubiquitous Sensor Networks (USN), has been defined in this thesis to help fill the gaps identified above. The main highlights of the proposed USN Platform are: a) From an architectural point of view, it follows a two-layer approach (Enabler and Gateway) similar to other enablers that run on top of NGN (such as OMA Presence). b) Data models and interfaces are based on the OGC SWE standards. c) It is integrated in NGN, but it can be used without it over open IP infrastructures. d) Its main functions are: sensor discovery, observation storage, publish-subscribe-notify, homogeneous remote execution, security, data dictionary handling, monitoring facilities, authorization support, protocol conversion utilities, synchronous and asynchronous interactions, streaming support and basic resource arbitration.
In order to demonstrate the functionalities that the proposed USN Platform can offer to future IoT scenarios, experimental results are presented from three real-life, small-scale proofs of concept (smart metering, Smart Places and environmental monitoring) and a study on semantics (in-vehicle information system). Furthermore, we present the current use of the proposed USN Platform as an Enabler to develop experimentation and real services in the SmartSantander EU project (which aims to integrate around 20,000 IoT devices).

Relevance:

80.00%

Publisher:

Abstract:

One of the main challenges facing next generation Cloud platform services is the need to simultaneously achieve ease of programming, consistency, and high scalability. Big Data applications have so far focused on batch processing. The next step for Big Data is to move to the online world. This shift will raise the requirements for transactional guarantees. CumuloNimbo is a new EC-funded project led by Universidad Politécnica de Madrid (UPM) that addresses these issues via a highly scalable multi-tier transactional platform as a service (PaaS) that bridges the gap between OLTP and Big Data applications.

Relevance:

80.00%

Publisher:

Abstract:

Aiming to address requirements concerning integration of services in the context of “big data”, this paper presents an innovative approach that (i) ensures a flexible, adaptable and scalable information and computation infrastructure, and (ii) exploits the competences of stakeholders and information workers to meaningfully confront information management issues such as information characterization, classification and interpretation, thus incorporating the underlying collective intelligence. Our approach pays much attention to the issues of usability and ease-of-use, not requiring any particular programming expertise from the end users. We report on a series of technical issues concerning the desired flexibility of the proposed integration framework and we provide related recommendations to developers of such solutions. Evaluation results are also discussed.

Relevance:

80.00%

Publisher:

Abstract:

Sensor network deployments have become a primary source of big data about the real world that surrounds us, measuring a wide range of physical properties in real time. With such large amounts of heterogeneous data, a key challenge is to describe and annotate sensor data with high-level metadata, using and extending models, for instance with ontologies. However, to automate this task there is a need to enrich the sensor metadata using the actual observed measurements and to extract useful meta-information from them. This paper proposes a novel approach to the characterization and extraction of semantic metadata through the analysis of raw sensor observations. The approach consists of using approximations to represent the raw sensor measurements, based on distributions of the observation slopes; building a classification scheme to automatically infer sensor metadata such as the type of observed property; and integrating the semantic analysis results with existing sensor network metadata.
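
The idea of describing a series by the distribution of its observation slopes can be sketched as follows. This is a minimal illustration under stated assumptions: evenly spaced samples, three hand-picked slope bins, and a nearest-profile rule with made-up reference profiles. None of these details are taken from the paper.

```python
def slope_histogram(values, bin_edges):
    """Return the normalized histogram of slopes between consecutive samples.

    values: sensor readings, assumed to be evenly spaced in time.
    bin_edges: ascending edges; a slope s falls in bin i when
               bin_edges[i] <= s < bin_edges[i + 1].
    """
    slopes = [b - a for a, b in zip(values, values[1:])]
    counts = [0] * (len(bin_edges) - 1)
    for s in slopes:
        for i in range(len(counts)):
            if bin_edges[i] <= s < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def nearest_profile(histogram, profiles):
    """Return the label whose reference histogram is closest (L1 distance)."""
    def distance(label):
        return sum(abs(h - p) for h, p in zip(histogram, profiles[label]))
    return min(profiles, key=distance)


# Three bins: falling, roughly flat, rising (hypothetical thresholds).
EDGES = [float("-inf"), -0.5, 0.5, float("inf")]
# Hypothetical reference profiles, e.g. learned from labelled sensors.
PROFILES = {
    "temperature": [0.1, 0.8, 0.1],   # mostly slow changes
    "wind_speed": [0.4, 0.2, 0.4],    # frequent sharp changes
}

readings = [20.0, 20.1, 20.3, 20.2, 20.4, 21.0]
hist = slope_histogram(readings, EDGES)
label = nearest_profile(hist, PROFILES)
```

In this sketch the inferred label plays the role of the "type of observed property" metadata; a real classifier would use finer bins and a trained model in place of the nearest-profile rule.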

Relevance:

80.00%

Publisher:

Abstract:

In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which face even greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, it is now impossible to handle hundreds of features in big data scenarios, as doing so only leads to unacceptable processing time and misleading classification results due to redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. In this paper a survey of feature selection methods is presented: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, a future application of feature selection to a concrete project is proposed.
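
As an illustration of why redundant, correlated features can be discarded, the following minimal sketch applies a simple correlation-based filter, which is only one of many families of feature selection methods and is not claimed to be one the survey recommends. The feature names, sample values and the 0.95 threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(features, threshold=0.95):
    """Greedy filter: keep a feature only if its absolute correlation with
    every feature kept so far is below the threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-flow features for five traffic flows.
flows = {
    "packet_count": [10, 20, 30, 40, 50],
    "byte_count": [1000, 2100, 2900, 4000, 5100],   # nearly linear in packets
    "mean_interarrival_ms": [5, 1, 4, 2, 3],
}
selected = drop_redundant(flows)
```

Here `byte_count` is dropped because it is almost perfectly correlated with `packet_count`, while `mean_interarrival_ms` is kept. The greedy order dependence is a known limitation of this kind of filter.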

Relevance:

80.00%

Publisher:

Abstract:

In the current situation, health IT systems are diverse, with models ranging from predominant solutions, adopted and created by large organizations, to ad hoc solutions developed by any company to meet specific needs. All these systems are under similar financial pressures, not only from current global economic conditions and rising health care costs, but also from a population that has embraced current technological advances and demands more personalized health care, on a par with the technological advances it enjoys in other areas. The purpose of this thesis is to develop a business model aimed at supporting information exchange within the clinical domain. The model is intended to increase competitiveness in the health IT sector without the need for experts in standards, providing qualified technical profiles at lower cost with the help of tools that simplify the use of interoperability standards. Existing open specifications such as FHIR, which publishes documentation and tutorials under open licenses, will be used to enable interoperability between systems. The main advantage of FHIR is that it introduces a shift in the current conception of how clinical information is made available, which until now has been regarded as special because it required more complex standards able to address any case, however specific. This specification allows clinical information to be used through existing web technologies (HTTP, HTML, OAuth2, JSON and XML), which anyone can use without particular training to create and consume this information.
The starting point is a market in which the integration of information is almost nonexistent compared with other current environments. As a result, spending on clinical integration will increase dramatically in the coming years, leaving behind the technical challenges, whose costs will recede into the background. Investment will focus on the expectations of what can be obtained from the current trend towards personalization of patients' clinical data, with access to institutional records together with ‘social/mobile/big data’.