818 results for big data
Abstract:
Teaching is a dynamic activity. It can be very effective if its impact is constantly monitored and adjusted to the demands of changing social contexts and the needs of learners. This implies that teachers need to be aware of teaching and learning processes. Moreover, they should constantly question their didactical methods and the learning resources they provide to their students. They should reflect on whether their actions are suitable, and they should regulate their teaching, e.g., by updating learning materials based on new knowledge about learners, or by motivating learners to engage in further learning activities. In recent years, a rising interest in ‘learning analytics’ has been observable. This interest is motivated by the availability of massive amounts of educational data. In addition, continuously increasing processing power and a strong motivation to discover new information in these pools of educational data are pushing further developments within the learning analytics research field. Learning analytics could be a method for reflective teaching practice that enables and guides teachers to investigate and evaluate their work in future learning scenarios. However, this potentially positive impact has not yet been sufficiently verified by learning analytics research. Another method that pursues these goals is ‘action research’. Learning analytics promises to initiate action research processes because it facilitates awareness, reflection and regulation of teaching activities analogous to action research. Therefore, this thesis joins both concepts in order to improve the design of learning analytics tools. The central research questions of this thesis are: What are the dimensions of learning analytics in relation to action research that need to be considered when designing a learning analytics tool? How does a learning analytics dashboard impact the teachers of technology-enhanced university lectures regarding ‘awareness’, ‘reflection’ and ‘action’?
Does it initiate action research? What are the central requirements for a learning analytics tool that pursues such effects? This project followed design-based research principles in order to answer these research questions. The main contributions are: a theoretical reference model that connects action research and learning analytics, the conceptualization and implementation of a learning analytics tool, a requirements catalogue for useful and usable learning analytics design based on evaluations, a tested procedure for impact analysis, and guidelines for the introduction of learning analytics into higher education.
Abstract:
Companies' logistics networks are growing very quickly and becoming ever more complex. Companies often do not know which other companies they depend on and which business-critical risks result from these dependencies. For this reason, this article presents a concept for proactive risk management in logistics networks. The concept is based on Big Data technology and, to identify risks and to build up a model of the logistics network, uses not only internal company data but also external data, e.g. from social media platforms or other data portals. These data are analysed, and relationships that carry risk are displayed graphically to the operator. In addition, the system can show the user possible alternatives for avoiding these risks and can thus be used for decision support.
Abstract:
Simulation techniques are almost indispensable in the analysis of complex systems. Material flow processes and the related information flow processes in logistics often possess such complexity. Further problems arise because the processes change over time and pose a Big Data problem as well. To cope with these issues, adaptive simulations are used more and more frequently. This paper presents a few relevant advanced simulation models and introduces a novel model structure, which unifies the modelling of geometrical relations and time processes. In this way, the process structure and its geometric relations can be handled in an understandable and transparent way. The capabilities and applicability of the model are also presented via a demonstration example.
Abstract:
This chapter presents fuzzy cognitive maps (FCM) as a vehicle for Web knowledge aggregation, representation, and reasoning. The corresponding Web KnowARR framework incorporates findings from fuzzy logic. The first emphasis is on the Web KnowARR framework itself; the second is a stakeholder management use case that illustrates the framework’s usefulness. Stakeholder management is intended to help projects gain acceptance and assertiveness by actively involving stakeholders’ claims on company decisions in the management process. Stakeholder maps visually (re-)present these claims. On the one hand, they draw on non-public content; on the other, they draw on content that is available to the public (mostly on the Web). The Semantic Web offers opportunities not only to present public content descriptively but also to show relationships. The proposed framework can serve as the basis for the public content of stakeholder maps.
Abstract:
The fuzzy analytical network process (FANP) is introduced as a potential multi-criteria decision-making (MCDM) method to improve digital marketing management endeavors. Today’s information overload makes digital marketing optimization, which is needed to continuously improve one’s business, increasingly difficult. The proposed FANP framework is a method for enhancing the interaction between customers and marketers (i.e., the involved stakeholders) and thus for reducing the challenges of big data. The presented implementation takes the fuzziness of reality into account to manage the constant interaction and continuous development of communication between marketers and customers on the Web. Using this FANP framework, marketers are increasingly able to meet the varying requirements of their customers. To improve the understanding of the implementation, advanced visualization methods (e.g., wireframes) are used.
Abstract:
We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 10^2 to 10^4. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.
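The standard expected improvement criterion mentioned in this abstract can be illustrated with a short sketch. It is not the SpLEGO implementation: it only evaluates the textbook closed-form EI for minimization, given a GP predictive mean and standard deviation at each candidate point. The candidate names and numbers are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for minimization.

    mu, sigma: GP predictive mean and standard deviation at a candidate point.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical candidates: name -> (predictive mean, predictive std).
candidates = {"a": (0.5, 0.1), "b": (0.9, 0.6), "c": (1.2, 0.05)}
best_so_far = 0.6

scores = {
    name: expected_improvement(mu, sigma, best_so_far)
    for name, (mu, sigma) in candidates.items()
}
next_point = max(scores, key=scores.get)
```

In this toy case the candidate with the worse mean but much larger uncertainty scores highest, which shows how EI balances exploration against exploitation.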
Abstract:
This work deals with parallel optimization of expensive objective functions which are modelled as sample realizations of Gaussian processes. The study is formalized as a Bayesian optimization problem, or continuous multi-armed bandit problem, where a batch of q > 0 arms is pulled in parallel at each iteration. Several algorithms have been developed for choosing batches by trading off exploitation and exploration. As of today, the maximum Expected Improvement (EI) and Upper Confidence Bound (UCB) selection rules appear to be the most prominent approaches for batch selection. Here, we build upon recent work on the multipoint Expected Improvement criterion, for which an analytic expansion relying on Tallis’ formula was recently established. Since the computational burden of this selection rule is still an issue in applications, we derive a closed-form expression for the gradient of the multipoint Expected Improvement, which aims at facilitating its maximization using gradient-based ascent algorithms. Substantial computational savings are shown in application. In addition, our algorithms are tested numerically and compared to state-of-the-art UCB-based batch-sequential algorithms. Combining starting designs relying on UCB with gradient-based EI local optimization finally appears as a sound option for batch design in distributed Gaussian Process optimization.
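To make the multipoint Expected Improvement concrete, the following sketch estimates it by plain Monte Carlo sampling. This is a deliberately different, simpler technique than the closed-form expression and gradient the abstract describes: it draws joint Gaussian samples for a batch of q points and averages the improvement of the best one over the current minimum. The function names, sample count and seed are assumptions made for illustration.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def multipoint_ei_mc(mean, cov, best, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max(0, best - min_i Y_i)], Y ~ N(mean, cov).

    mean, cov: joint GP predictive mean vector and covariance matrix
    of the q batch points. best: lowest value observed so far.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        # Correlated sample: y = mean + L z.
        y = [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        total += max(0.0, best - min(y))
    return total / n_samples


# One point versus a batch of two independent points with the same marginals.
ei_single = multipoint_ei_mc([0.0], [[1.0]], best=0.0)
ei_batch = multipoint_ei_mc([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], best=0.0)
```

A sampling estimate like this is noisy and gives no gradient, which is one reason an analytic expression with a closed-form gradient is attractive for gradient-based maximization.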
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises the problem of multiple testing and produces false-positive results. Although this problem can be dealt with effectively through several approaches, such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that an exhaustive search for a small subset of one, two or three SNPs based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in this study.
Our results show that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls reaches 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases reaches 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study using the chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. The WTCCC study results detect only two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high predictive ability, and SNPs with good discriminant power are not necessarily causal markers for the disease.
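The claim that HMSS remains informative on imbalanced data, while plain accuracy does not, can be checked with a small sketch. The function below computes the harmonic mean of sensitivity and specificity from label lists. The label encoding (1 for cases, 0 for controls) and the toy data are assumptions for illustration, not data from the study.

```python
def hmss(y_true, y_pred, positive=1):
    """Harmonic mean of sensitivity and specificity for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if sensitivity + specificity == 0.0:
        return 0.0
    return 2.0 * sensitivity * specificity / (sensitivity + specificity)


def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced data: 10 cases (1) and 90 controls (0).
y_true = [1] * 10 + [0] * 90

# A degenerate classifier that calls everyone a control.
all_controls = [0] * 100
acc_degenerate = accuracy(y_true, all_controls)
hmss_degenerate = hmss(y_true, all_controls)

# A classifier that finds 8 of 10 cases and 45 of 90 controls.
mixed = [1] * 8 + [0] * 2 + [0] * 45 + [1] * 45
hmss_mixed = hmss(y_true, mixed)
```

The degenerate classifier looks good on accuracy because controls dominate, yet its HMSS is zero since it detects no cases; this is the property that makes HMSS usable without rebalancing the dataset.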
Abstract:
The Internet of Things (IoT), as part of the Future Internet, has become one of the main research topics today, partly because of the pressure society is putting on the development of a particular kind of services (smart metering, smart grids, eHealth, etc.), and partly because of recent business forecasts that place some players, such as telecom operators (which are desperately seeking new opportunities), at the forefront, pushing interrelated technologies like Machine-to-Machine (M2M) communications. In this context, a significant number of research activities are currently taking place worldwide at different levels: sensor network communications, information processing, big data storage, semantics, service-level architectures, etc. Each of them, in isolation, is reaching a level of maturity that lets us envision the Internet of Things as a tangible goal rather than a dream. However, the aforementioned services cannot wait to be developed until holistic research actions bring complete solutions. It is important to produce intermediate results that avoid vertical solutions tailored to particular deployments.
In the present work, we focus on the creation of a service-level platform intended to facilitate, on the one hand, the integration of heterogeneous and geographically dispersed Sensor and Actuator Networks (SANs) and, on the other, the development of horizontal services using them and the information they provide. This enabler will be used for horizontal service development and for IoT experimentation. Prior to the definition of the platform, we carried out an extensive study covering not just research works and projects, but also standardization activities. The results can be summarized in the following assertions: a) The Open Geospatial Consortium (OGC®) Sensor Web Enablement (SWE™) data models today represent the most complete solution for describing SANs and observations. b) OGC interfaces, despite limitations that require changes and extensions, could be used as the basis for accessing sensors and data. c) Next Generation Networks (NGN) offer a good substrate that facilitates the integration of SANs and the development of services. Consequently, a new service layer platform, called Ubiquitous Sensor Networks (USN), has been defined in this thesis to help fill the gaps mentioned above. The main highlights of the proposed USN Platform are: a) From an architectural point of view, it follows a two-layer approach (Enabler and Gateway) similar to other enablers that run on top of NGN (like OMA Presence). b) Data models and interfaces are based on the OGC SWE standards. c) It is integrated in NGN, but it can be used without it over open IP infrastructures. d) Its main functions are: sensor discovery, observation storage, publish-subscribe-notify, homogeneous remote execution, security, data dictionary handling, monitoring facilities, authorization support, protocol conversion utilities, synchronous and asynchronous interactions, streaming support and basic resource arbitration.
In order to demonstrate the functionalities that the proposed USN Platform can offer to future IoT scenarios, experimental results are presented for three real-life, small-scale proofs of concept (smart metering, Smart Places and environmental monitoring) and a study on semantics (in-vehicle information system). Furthermore, we also present the current use of the proposed USN Platform as an enabler to develop experimentation and real services in the SmartSantander EU project (which aims at integrating around 20,000 IoT devices).
Abstract:
One of the main challenges facing next generation Cloud platform services is the need to simultaneously achieve ease of programming, consistency, and high scalability. Big Data applications have so far focused on batch processing. The next step for Big Data is to move to the online world. This shift will raise the requirements for transactional guarantees. CumuloNimbo is a new EC-funded project led by Universidad Politécnica de Madrid (UPM) that addresses these issues via a highly scalable multi-tier transactional platform as a service (PaaS) that bridges the gap between OLTP and Big Data applications.
Abstract:
Aiming to address requirements concerning the integration of services in the context of ‘big data’, this paper presents an innovative approach that (i) ensures a flexible, adaptable and scalable information and computation infrastructure, and (ii) exploits the competences of stakeholders and information workers to meaningfully confront information management issues such as information characterization, classification and interpretation, thus incorporating the underlying collective intelligence. Our approach pays much attention to the issues of usability and ease of use, not requiring any particular programming expertise from the end users. We report on a series of technical issues concerning the desired flexibility of the proposed integration framework and we provide related recommendations to developers of such solutions. Evaluation results are also discussed.
Abstract:
Sensor network deployments have become a primary source of big data about the real world that surrounds us, measuring a wide range of physical properties in real time. With such large amounts of heterogeneous data, a key challenge is to describe and annotate sensor data with high-level metadata, using and extending models, for instance with ontologies. However, to automate this task there is a need to enrich the sensor metadata using the actual observed measurements and to extract useful meta-information from them. This paper proposes a novel approach to the characterization and extraction of semantic metadata through the analysis of raw sensor observations. The approach consists of using approximations to represent the raw sensor measurements, based on distributions of the observation slopes; building a classification scheme to automatically infer sensor metadata such as the type of observed property; and integrating the semantic analysis results with existing sensor network metadata.
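The idea of characterizing a sensor by the distribution of its observation slopes can be sketched as follows. This is only an illustration under simplifying assumptions, not the paper's algorithm: slopes are taken between consecutive raw observations, their angles are binned into a normalized histogram, and an unlabeled series is assigned the label of the nearest reference histogram by L1 distance. The bin count, labels and sample series are hypothetical.

```python
import math


def slope_histogram(times, values, n_bins=9):
    """Normalized histogram of slope angles between consecutive observations.

    Assumes strictly increasing times and at least two observations, so every
    angle lies strictly between -pi/2 and pi/2.
    """
    counts = [0] * n_bins
    for i in range(len(values) - 1):
        angle = math.atan2(values[i + 1] - values[i], times[i + 1] - times[i])
        # Map the angle range (-pi/2, pi/2) onto bin indices 0..n_bins-1.
        index = int((angle + math.pi / 2.0) / math.pi * n_bins)
        counts[min(index, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def l1_distance(hist_a, hist_b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def infer_label(times, values, references):
    """Return the label of the reference histogram closest to the series."""
    hist = slope_histogram(times, values)
    return min(references, key=lambda label: l1_distance(hist, references[label]))


# Hypothetical reference profiles built from labeled example series.
t = [0, 1, 2, 3, 4]
references = {
    "slowly-varying": slope_histogram(t, [20.0, 20.0, 20.0, 20.0, 20.0]),
    "steadily-rising": slope_histogram(t, [0.0, 2.0, 4.0, 6.0, 8.0]),
}

# An unlabeled series with small fluctuations around a constant level.
label = infer_label(t, [10.0, 10.1, 10.0, 10.1, 10.0], references)
```

With real deployments the reference histograms would be learned from many annotated sensors per observed property, and the inferred label would then be written back into the sensor metadata.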
Abstract:
Since the beginning of the Internet, Internet Service Providers (ISPs) have seen the need to give users' traffic different treatments defined by agreements between ISPs and customers. This procedure, known as Quality of Service (QoS) management, has not changed much in recent years (DiffServ and Deep Packet Inspection have been the most commonly chosen mechanisms). However, the incremental growth of Internet users and services, together with the application of recent Machine Learning techniques, opens up the possibility of going one step forward in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS management. Then we introduce clustering and classification Machine Learning techniques for traffic characterization and the concept of Quality of Experience. Finally, with all these components, we present a new framework that will manage Quality of Service in a smart way in a telecom Big Data based scenario, for both mobile and fixed communications.
Abstract:
In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which are subject to ever greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, it is now impossible to handle hundreds of features in big data scenarios, as this only leads to unacceptable processing time and misleading classification results due to redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. In this paper a survey of feature selection methods is presented: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, a future application of feature selection to a concrete project is proposed.
Abstract:
In the current situation, health IT systems are diverse, with models ranging from predominant solutions adopted and created by large organizations to ad hoc solutions developed by any competing company to meet specific needs. All these systems are under similar financial pressures, not only from current global economic conditions and rising health care costs, but also from a population that has embraced current technological advances and demands more personalized health care, on a par with the technological advances it enjoys in other areas. The purpose of this thesis is to develop a business model aimed at supporting information exchange within the clinical domain. The model is intended to increase competitiveness in the health IT sector without the need for experts in standards, providing qualified technical profiles at lower cost with the help of tools that simplify the use of interoperability standards. Existing open specifications such as FHIR, which publishes its documentation and tutorials under open licenses, will be used to enable interoperability between systems. The main advantage of FHIR is that it introduces a shift in the current conception of how clinical information is made available, which until now has been seen as a special case requiring more complex standards that address every case, however specific.
This specification allows clinical information to be used through existing web technologies (HTTP, HTML, OAuth2, JSON and XML), which anyone can use to create and consume this information without particular training. Information integration in the sector is currently almost nonexistent compared with other environments. As a result, spending on clinical integration will increase dramatically over the next years, while the cost of technical challenges recedes into the background. Investment in this area will focus on the expectations of what can be obtained from the current trend toward personalization of patients' clinical data, with access to institutional records together with ‘social/mobile/big data’.
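As a rough illustration of what using clinical information through ordinary web technologies looks like, the sketch below builds a minimal FHIR Patient resource as JSON and prepares the HTTP request that would create it on a server. It relies only on the Python standard library and does not send anything. The server URL and patient details are hypothetical, and the `application/fhir+json` media type is the one used by recent FHIR releases, which may differ from the version the thesis worked with.

```python
import json
import urllib.request

# Hypothetical FHIR server base URL, used for illustration only.
BASE_URL = "https://fhir.example.org/baseR4"

# A minimal Patient resource; the field names follow the FHIR Patient definition.
patient = {
    "resourceType": "Patient",
    "name": [{"family": "Garcia", "given": ["Ana"]}],
    "gender": "female",
    "birthDate": "1980-05-17",
}


def build_create_request(base_url, resource):
    """Prepare (but do not send) the HTTP POST that creates a resource."""
    body = json.dumps(resource).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{resource['resourceType']}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
        },
    )


request = build_create_request(BASE_URL, patient)

# Sending would be a single call, left out so the sketch runs offline:
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```

Nothing here requires a healthcare-specific toolkit, which is the point the abstract makes about lowering the skills barrier; a real deployment would add authorization (for example OAuth2) and error handling.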