21 results for Big data analytics
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This work presents the laboratory that the eLearn Center makes available to UOC faculty and e-learning researchers for the systematic design of experiments from a Learning Analytics perspective, together with mechanisms for tracking and documenting the whole process surrounding the design of teaching experiences, so that transferring them to other settings becomes simpler.
Abstract:
This work evaluates the Big Data solution Hadoop as an alternative for storing and processing large volumes of data, comparing it with traditional relational models in a corporate Enterprise Data Warehouse (EDW), and examines how it can be integrated with the visualization tools typical of Business Intelligence suites.
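As a rough illustration of the kind of integration evaluated here, the sketch below pulls an aggregate from Hive tables stored on Hadoop into a pandas DataFrame, where a BI visualization tool could pick it up. It is not taken from the work itself; the host, database, and table names are hypothetical.

```python
# A minimal sketch: querying Hive tables that sit on top of HDFS and pulling
# the aggregate into pandas for a BI tool to consume. Hostname, database,
# and table names are hypothetical.
import pandas as pd
from pyhive import hive  # pip install "pyhive[hive]"

# Connect to a HiveServer2 endpoint exposed by the Hadoop cluster.
conn = hive.Connection(host="hadoop-edge.example.com", port=10000,
                       username="analyst", database="sales_dw")

# The same star-schema aggregation one would run against a relational EDW.
query = """
    SELECT d.year, d.month, SUM(f.amount) AS revenue
    FROM   fact_sales f
    JOIN   dim_date  d ON f.date_key = d.date_key
    GROUP  BY d.year, d.month
"""
df = pd.read_sql(query, conn)  # result lands in a DataFrame for plotting
print(df.head())
```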
Abstract:
Given the strong current interest in setting up clusters dedicated to data processing with Hadoop, a Linux distribution has been designed that automates all the associated tasks. This distribution makes it possible to deploy onto a cluster and perform a basic configuration of it in as unattended a manner as possible.
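The sketch below illustrates one kind of step such a distribution automates: rendering Hadoop's core-site.xml and workers file from a host list. The hostnames are hypothetical, and the real distribution presumably covers far more than this.

```python
# A minimal sketch of unattended Hadoop configuration: generate core-site.xml
# (pointing at the NameNode) and the Hadoop 3.x workers file from a host list.
# Hostnames are hypothetical.
from pathlib import Path

NAMENODE = "node01"
WORKERS = ["node02", "node03", "node04"]

CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://{namenode}:9000</value>
  </property>
</configuration>
"""

conf_dir = Path("etc/hadoop")
conf_dir.mkdir(parents=True, exist_ok=True)
(conf_dir / "core-site.xml").write_text(CORE_SITE.format(namenode=NAMENODE))
(conf_dir / "workers").write_text("\n".join(WORKERS) + "\n")
print("Wrote", *[p.name for p in conf_dir.iterdir()])
```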
Resumo:
Este proyecto de final de carrera corresponde al área de inteligencia artificial y representa un caso de uso que pretende utilizar datos reales referentes a accidentes de tráfico (datos de accidentes, muertos, heridos, etc.) y analizarlas conjuntamente con datos que puedan tener una posible relación con los accidentes como el parque de vehículos, las temperaturas de la zona de los accidentes, etc. con la finalidad de poder obtener las posibles relaciones causa-efecto.
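A minimal sketch of the analysis pattern described, on invented numbers rather than the project's dataset: joining accident counts with candidate explanatory variables and screening correlations as a first step toward cause-effect hypotheses.

```python
# Hypothetical data illustrating the join-and-correlate pattern.
import pandas as pd

accidents = pd.DataFrame({
    "province": ["Barcelona", "Girona", "Lleida", "Tarragona"],
    "year": [2015] * 4,
    "accidents": [9800, 1200, 900, 1500],
})
context = pd.DataFrame({
    "province": ["Barcelona", "Girona", "Lleida", "Tarragona"],
    "year": [2015] * 4,
    "vehicle_fleet": [3_500_000, 450_000, 320_000, 520_000],
    "mean_temp_c": [16.2, 15.1, 14.8, 16.9],
})

merged = accidents.merge(context, on=["province", "year"])
# Pearson correlations as a first screen for possible explanatory variables.
print(merged[["accidents", "vehicle_fleet", "mean_temp_c"]].corr())
```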
Abstract:
This final degree project aims to answer a hypothetical commission from the European Union to build a relational database for storing citizens' physical activity data, obtained from wearable devices, together with data on their health status and diagnosed illnesses, collected by the information systems of the various health services. With all these data gathered, the database will make it possible, through high-level applications, to extract useful information on the real health status of citizens and to design actions and campaigns for improving it.
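A minimal schema sketch for the database described above, using SQLite in place of a production engine; all table and column names are illustrative, not the project's.

```python
# Illustrative relational schema: one citizen, many wearable readings,
# many diagnoses, linked by foreign keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citizen (
    citizen_id   INTEGER PRIMARY KEY,
    country_code TEXT NOT NULL
);
CREATE TABLE activity_reading (        -- data streamed from wearables
    reading_id   INTEGER PRIMARY KEY,
    citizen_id   INTEGER NOT NULL REFERENCES citizen(citizen_id),
    recorded_at  TEXT NOT NULL,        -- ISO-8601 timestamp
    steps        INTEGER,
    heart_rate   INTEGER
);
CREATE TABLE diagnosis (               -- data from health-service systems
    diagnosis_id INTEGER PRIMARY KEY,
    citizen_id   INTEGER NOT NULL REFERENCES citizen(citizen_id),
    icd10_code   TEXT NOT NULL,
    diagnosed_on TEXT NOT NULL
);
""")
print("tables:", [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```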
Abstract:
DnaSP is a software package for the analysis of DNA polymorphism data. The present version introduces several new modules and features which, among other options, allow: (1) handling big data sets (~5 Mb per sequence); (2) conducting a large number of coalescent-based tests by Monte Carlo computer simulations; (3) extensive analyses of the genetic differentiation and gene flow among populations; (4) analysing the evolutionary pattern of preferred and unpreferred codons; (5) generating graphical outputs for easy visualization of results. Availability: the software package, including complete documentation and examples, is freely available to academic users from http://www.ub.es/dnasp
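As an aside on item (2), the sketch below shows the Monte Carlo coalescent idea in miniature (this is not DnaSP code): simulating the null distribution of the number of segregating sites S for a sample of n sequences under the standard neutral model.

```python
# One replicate: draw coalescent waiting times (k lineages coalesce at rate
# C(k,2), in units of 2N generations), sum the total branch length, then
# drop mutations as Poisson(theta/2 * total length).
import numpy as np

def simulate_S(n, theta, rng):
    total_length = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2            # C(k,2) pairs can coalesce
        total_length += k * rng.exponential(1.0 / rate)
    return rng.poisson(theta / 2.0 * total_length)

rng = np.random.default_rng(42)
null_S = np.array([simulate_S(n=20, theta=5.0, rng=rng) for _ in range(10_000)])
# Sanity check against theory: E[S] = theta * sum_{i=1}^{n-1} 1/i
print(null_S.mean(), "vs", 5.0 * sum(1 / i for i in range(1, 20)))
```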
Abstract:
The European Space Agency's Gaia mission will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way) by providing unprecedented position, parallax, proper motion, and radial velocity measurements for about one billion stars. The resulting catalogue will be made available to the scientific community and will be analyzed in many different ways, including the production of a variety of statistics. The latter will often entail the generation of multidimensional histograms and hypercubes as part of the precomputed statistics for each data release, or for scientific analysis involving either the final data products or the raw data coming from the satellite instruments. In this paper we present and analyze a generic framework that allows the hypercube generation to be easily done within a MapReduce infrastructure, providing all the advantages of the new Big Data analysis paradigm but without dealing with any specific interface to the lower-level distributed system implementation (Hadoop). Furthermore, we show how executing the framework for different data storage model configurations (i.e. row- or column-oriented) and compression techniques can considerably improve the response time of this type of workload for the currently available simulated data of the mission. In addition, we put forward the advantages and shortcomings of deploying the framework on a public cloud provider, benchmark it against other popular solutions available (which are not always the best for such ad hoc applications), and describe some user experiences with the framework, which was employed for a number of dedicated astronomical data analysis workshops.
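A conceptual sketch of the hypercube-generation pattern the paper describes, reduced to plain Python rather than the authors' framework: the mapper bins each record into a multidimensional cell key and the reducer sums counts per cell. Field names and bin widths are illustrative of Gaia-like rows.

```python
from collections import defaultdict

def mapper(record):
    """Emit (cell, 1): the cell is the record's bin along each dimension."""
    cell = (int(record["ra"] // 10),    # 10-degree bins in right ascension
            int(record["dec"] // 10),   # 10-degree bins in declination
            int(record["mag"] // 1))    # 1-magnitude bins
    yield cell, 1

def reducer(cell, counts):
    yield cell, sum(counts)

def run_mapreduce(records):
    groups = defaultdict(list)          # stands in for the shuffle/sort phase
    for rec in records:
        for key, val in mapper(rec):
            groups[key].append(val)
    return dict(out for key, vals in groups.items()
                for out in reducer(key, vals))

stars = [{"ra": 12.3, "dec": -45.0, "mag": 14.2},
         {"ra": 15.9, "dec": -41.7, "mag": 14.8},
         {"ra": 250.1, "dec": 30.2, "mag": 9.5}]
print(run_mapreduce(stars))             # {(1, -5, 14): 2, (25, 3, 9): 1}
```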
Abstract:
The final year project came to us as an opportunity to get involved in a topic that proved attractive while majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics.

Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom of the last decades and the consequent exponential increase in the amount of data collected and stored every day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand, and, in our understanding, the work they do, although sometimes controversial in terms of ethics, is a clear source of added value both for private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument, directly related to computer science, that is valid for the analysis of large datasets: Partial Correlation Networks.

The structure of the project has been determined by our objectives throughout its development. First, the characteristics of the studied instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed to show the power and efficiency of the model. Finally, the model is put into practice by analyzing a relatively large set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series. In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that allows finding valuable results in Big Data.

As a result, the findings throughout this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of the data. Moreover, partial correlation networks are shown to be a very valid tool to represent cross-sectional interconnections between elements in large data sets.

The scope of this project is, however, limited, as there are some sections in which deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real-data application are examples of aspects in which this project could be extended.

To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding relationships that connect the elements present in a large data set. Partial correlation networks allow the owner of such a set to observe and analyze existing linkages that might otherwise have been overlooked.
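A minimal sketch of estimating a sparse partial correlation network on synthetic data. The project uses the SPACE estimator of Peng et al. (2009); here scikit-learn's graphical lasso, a related sparse precision-matrix estimator, stands in for it.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
X[:, 1] += 0.8 * X[:, 0]                  # plant one true linkage: 0 -- 1

model = GraphicalLassoCV().fit(X)
Omega = model.precision_                  # sparse inverse covariance

# Partial correlation between i and j: -Omega_ij / sqrt(Omega_ii * Omega_jj)
d = np.sqrt(np.diag(Omega))
pcorr = -Omega / np.outer(d, d)
np.fill_diagonal(pcorr, 1.0)

# Edges of the network = nonzero off-diagonal partial correlations.
edges = [(i, j, round(pcorr[i, j], 2))
         for i in range(p) for j in range(i + 1, p)
         if abs(pcorr[i, j]) > 1e-4]
print(edges)                              # expect the (0, 1) edge to dominate
```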
Abstract:
Nowadays, the possibilities of Big Data are countless. A great amount of information generated by the general population is publicly available. The challenge lies in being able to work with this information and extract useful, value-generating conclusions. In this project, we want to analyze over time the general population's interest in a common illness such as influenza, and to relate it to past flu outbreaks in order to extrapolate and predict future ones. This information, in the hands of the health authorities, can be of great help in preventing peaks of demand in emergency services, anticipating them so as to manage the available resources more effectively and thereby provide a better service to the population at large. In this way, it is the users themselves who, without knowing it, enable a greater and better response from the health services through the information they freely share, yielding valuable benefits for the general population.
Abstract:
In this project we want to analyze over time the general population's interest in a common illness such as influenza, and to relate it to past flu outbreaks in order to extrapolate and predict future ones. This information, in the hands of the health authorities, can be of great help in preventing peaks of demand in emergency services, anticipating them so as to manage the available resources more effectively and thereby provide a better service to the population at large.
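A minimal sketch of the core idea in the two projects above, on synthetic series rather than real search or surveillance data: checking whether public search interest leads recorded flu cases via lagged correlation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
weeks = pd.date_range("2014-01-05", periods=104, freq="W")
interest = pd.Series(rng.poisson(20, len(weeks)).astype(float), index=weeks)
# Synthetic assumption: cases echo search interest two weeks later, plus noise.
cases = interest.shift(2).fillna(20) * 3 + rng.normal(0, 5, len(weeks))

# Find the lag (in weeks) at which interest best predicts future cases.
best = max(range(0, 6), key=lambda lag: interest.corr(cases.shift(-lag)))
print("interest leads cases by ~", best, "weeks")   # expect ~2
```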
Abstract:
Feasibility study on deploying open source software-defined storage in enterprise environments. Comparison of Gluster, Ceph, OpenAFS, TahoeFS and XtreemFS.
Abstract:
Proceedings of Internet, Law and Politics. A decade of transformations.
Abstract:
This project consists of designing and implementing an information system hosted in an Oracle database, in order to support the Big Data project whose objective is to cross-reference the health data and the physical activity data of European citizens.
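A minimal sketch of the kind of cross-referencing query such a system would answer, using the python-oracledb driver; the connection details and table/column names are hypothetical.

```python
# Cross health and activity data: average daily steps per diagnosis code.
import oracledb  # pip install oracledb

conn = oracledb.connect(user="bigdata", password="secret",
                        dsn="dbhost.example.com/orclpdb1")
cur = conn.cursor()
cur.execute("""
    SELECT d.icd10_code,
           AVG(a.steps) AS avg_daily_steps
    FROM   diagnosis d
    JOIN   activity_reading a ON a.citizen_id = d.citizen_id
    GROUP  BY d.icd10_code
    ORDER  BY avg_daily_steps
""")
for code, avg_steps in cur:
    print(code, round(avg_steps))
```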
Abstract:
Peer-reviewed