936 resultados para Data Analytics
Resumo:
Wednesday 26th March 2014 Speaker(s): Dr Trung Dong Huynh Organiser: Dr Tim Chown Time: 26/03/2014 11:00-11:50 Location: B32/3077 File size: 349Mb Abstract Understanding the dynamics of a crowdsourcing application and controlling the quality of the data it generates is challenging, partly due to the lack of tools to do so. Provenance is a domain-independent means to represent what happened in an application, which can help verify data and infer their quality. It can also reveal the processes that led to a data item and the interactions of contributors with it. Provenance patterns can manifest real-world phenomena such as a significant interest in a piece of content, providing an indication of its quality, or even issues such as undesirable interactions within a group of contributors. In this talk, I will present an application-independent methodology for analysing provenance graphs, constructed from provenance records, to learn about such patterns and to use them for assessing some key properties of crowdsourced data, such as their quality, in an automated manner. I will also talk about CollabMap (www.collabmap.org), an online crowdsourcing mapping application, and show how we applied the approach above to the trust classification of data generated by the crowd, achieving an accuracy over 95%.
Resumo:
Abstract: Big Data has been characterised as a great economic opportunity and a massive threat to privacy. Both may be correct: the same technology can indeed be used in ways that are highly beneficial and those that are ethically intolerable, maybe even simultaneously. Using examples of how Big Data might be used in education - normally referred to as "learning analytics" - the seminar will discuss possible ethical and legal frameworks for Big Data, and how these might guide the development of technologies, processes and policies that can deliver the benefits of Big Data without the nightmares. Speaker Biography: Andrew Cormack is Chief Regulatory Adviser, Jisc Technologies. He joined the company in 1999 as head of the JANET-CERT and EuroCERT incident response teams. In his current role he concentrates on the security, policy and regulatory issues around the network and services that Janet provides to its customer universities and colleges. Previously he worked for Cardiff University running web and email services, and for NERC's Shipboard Computer Group. He has degrees in Mathematics, Humanities and Law.
Resumo:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Resumo:
As we enter an era of ‘big data’, asset information is becoming a deliverable of complex projects. Prior research suggests digital technologies enable rapid, flexible forms of project organizing. This research analyses practices of managing change in Airbus, CERN and Crossrail, through desk-based review, interviews, visits and a cross-case workshop. These organizations deliver complex projects, rely on digital technologies to manage large data-sets; and use configuration management, a systems engineering approach with mid-20th century origins, to establish and maintain integrity. In them, configuration management has become more, rather than less, important. Asset information is structured, with change managed through digital systems, using relatively hierarchical, asynchronous and sequential processes. The paper contributes by uncovering limits to flexibility in complex projects where integrity is important. Challenges of managing change are discussed, considering the evolving nature of configuration management; potential use of analytics on complex projects; and implications for research and practice.
Resumo:
Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools and their adoption is growing in popularity. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically requires also the adoption of pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms able to avoid the manual invocation of preprocessing and mining tools, that yields to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in of the Konstanz Information Miner (KNIME) workbench, that automatizes the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automatizes the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflow for neuroimage data by leveraging the rich functionalities available in the KNIME workbench.
Resumo:
This paper reviews the literature concerning the practice of using Online Analytical Processing (OLAP) systems to recall information stored by Online Transactional Processing (OLTP) systems. Such a review provides a basis for discussion on the need for the information that are recalled through OLAP systems to maintain the contexts of transactions with the data captured by the respective OLTP system. The paper observes an industry trend involving the use of OLTP systems to process information into data, which are then stored in databases without the business rules that were used to process information and data stored in OLTP databases without associated business rules. This includes the necessitation of a practice, whereby, sets of business rules are used to extract, cleanse, transform and load data from disparate OLTP systems into OLAP databases to support the requirements for complex reporting and analytics. These sets of business rules are usually not the same as business rules used to capture data in particular OLTP systems. The paper argues that, differences between the business rules used to interpret these same data sets, risk gaps in semantics between information captured by OLTP systems and information recalled through OLAP systems. Literature concerning the modeling of business transaction information as facts with context as part of the modelling of information systems were reviewed to identify design trends that are contributing to the design quality of OLTP and OLAP systems. The paper then argues that; the quality of OLTP and OLAP systems design has a critical dependency on the capture of facts with associated context, encoding facts with contexts into data with business rules, storage and sourcing of data with business rules, decoding data with business rules into the facts with the context and recall of facts with associated contexts. The paper proposes UBIRQ, a design model to aid the co-design of data with business rules storage for OLTP and OLAP purposes. The proposed design model provides the opportunity for the implementation and use of multi-purpose databases, and business rules stores for OLTP and OLAP systems. Such implementations would enable the use of OLTP systems to record and store data with executions of business rules, which will allow for the use of OLTP and OLAP systems to query data with business rules used to capture the data. Thereby ensuring information recalled via OLAP systems preserves the contexts of transactions as per the data captured by the respective OLTP system.
Resumo:
Clustering methods are increasingly being applied to residential smart meter data, providing a number of important opportunities for distribution network operators (DNOs) to manage and plan the low voltage networks. Clustering has a number of potential advantages for DNOs including, identifying suitable candidates for demand response and improving energy profile modelling. However, due to the high stochasticity and irregularity of household level demand, detailed analytics are required to define appropriate attributes to cluster. In this paper we present in-depth analysis of customer smart meter data to better understand peak demand and major sources of variability in their behaviour. We find four key time periods in which the data should be analysed and use this to form relevant attributes for our clustering. We present a finite mixture model based clustering where we discover 10 distinct behaviour groups describing customers based on their demand and their variability. Finally, using an existing bootstrapping technique we show that the clustering is reliable. To the authors knowledge this is the first time in the power systems literature that the sample robustness of the clustering has been tested.
Resumo:
New business and technology platforms are required to sustainably manage urban water resources [1,2]. However, any proposed solutions must be cognisant of security, privacy and other factors that may inhibit adoption and hence impact. The FP7 WISDOM project (funded by the European Commission - GA 619795) aims to achieve a step change in water and energy savings via the integration of innovative Information and Communication Technologies (ICT) frameworks to optimize water distribution networks and to enable change in consumer behavior through innovative demand management and adaptive pricing schemes [1,2,3]. The WISDOM concept centres on the integration of water distribution, sensor monitoring and communication systems coupled with semantic modelling (using ontologies, potentially connected to BIM, to serve as intelligent linkages throughout the entire framework) and control capabilities to provide for near real-time management of urban water resources. Fundamental to this framework are the needs and operational requirements of users and stakeholders at domestic, corporate and city levels and this requires the interoperability of a number of demand and operational models, fed with data from diverse sources such as sensor networks and crowsourced information. This has implications regarding the provenance and trustworthiness of such data and how it can be used in not only the understanding of system and user behaviours, but more importantly in the real-time control of such systems. Adaptive and intelligent analytics will be used to produce decision support systems that will drive the ability to increase the variability of both supply and consumption [3]. This in turn paves the way for adaptive pricing incentives and a greater understanding of the water-energy nexus. This integration is complex and uncertain yet being typical of a cyber-physical system, and its relevance transcends the water resource management domain. The WISDOM framework will be modeled and simulated with initial testing at an experimental facility in France (AQUASIM – a full-scale test-bed facility to study sustainable water management), then deployed and evaluated in in two pilots in Cardiff (UK) and La Spezia (Italy). These demonstrators will evaluate the integrated concept providing insight for wider adoption.
Resumo:
Retirado do blog de Marc Pickren do dia 13 jun. 2014.
Resumo:
Sound the vuvuzelas, the World Cup is officially here. The biggest sporting event in the world is set to break all kinds of viewing records. Sporting in the digital world is just as much about stats as it is about the game itself. Enter Brandwatch. The social media analytics company has taken it upon itself to track social media statistics for the entire run of the World Cup with their new real-time data visualization tool.
Resumo:
Pós-graduação em Ciências Cartográficas - FCT
Resumo:
We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents.Techniques are organized considering their target input materialeither single texts or collections of textsand their focus, which may be at displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users to handle results from a query posed to a search engine.We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models, discuss how they extract the information required to produce representative visualizations, the tasks they intend to support and the interaction issues involved, and strengths and limitations. Finally, we show a summary of techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.
Resumo:
I dispositivi mobili, dagli smartphone ai tablet, sono entrati a far parte della nostra quotidianità. Controllando l’infrastruttura delle comunicazioni, rispetto a qualsiasi altro settore, si ha un maggiore accesso a informazioni relative alla geo-localizzazione degli utenti e alle loro interazioni. Questa grande mole di informazioni può aiutare a costruire città intelligenti e sostenibili, che significa modernizzare ed innovare le infrastrutture, migliorare la qualità della vita e soddisfare le esigenze di cittadini, imprese e istituzioni. Vodafone offre soluzioni concrete nel campo dell’info-mobilità consentendo la trasformazione delle nostre città in Smart City. Obiettivo della tesi e del progetto Proactive è cercare di sviluppare strumenti che, a partire da dati provenienti dalla rete mobile Vodafone, consentano di ricavare e di rappresentare su cartografia dati indicanti la presenza dei cittadini in determinati punti d’interesse, il profilo di traffico di determinati segmenti viari e le matrici origine/destinazione. Per fare questo verranno prima raccolti e filtrati i dati della città di Milano e della regione Lombardia provenienti dalla rete mobile Vodafone per poi, in un secondo momento, sviluppare degli algoritmi e delle procedure in PL/SQL che siano in grado di ricevere questo tipo di dato, di analizzarlo ed elaborarlo restituendo i risultati prestabiliti. Questi risultati saranno poi rappresentati su cartografia grazie a QGis e grazie ad una Dashboard aziendale interna di Vodafone. Lo sviluppo delle procedure e la rappresentazione cartografica dei risultati verranno eseguite in ambiente di Test e se i risultati soddisferanno i requisiti di progetto verrà effettuato il porting in ambiente di produzione. Grazie a questo tipo di soluzioni, che forniscono dati in modalità anonima e aggregata in ottemperanza alle normative di privacy, le aziende di trasporto pubblico, ad esempio, potranno essere in grado di gestire il traffico in modo più efficiente.
Resumo:
Teaching is a dynamic activity. It can be very effective, if its impact is constantly monitored and adjusted to the demands of changing social contexts and needs of learners. This implies that teachers need to be aware about teaching and learning processes. Moreover, they should constantly question their didactical methods and the learning resources, which they provide to their students. They should reflect if their actions are suitable, and they should regulate their teaching, e.g., by updating learning materials based on new knowledge about learners, or by motivating learners to engage in further learning activities. In the last years, a rising interest in ‘learning analytics’ is observable. This interest is motivated by the availability of massive amounts of educational data. Also, the continuously increasing processing power, and a strong motivation for discovering new information from these pools of educational data, is pushing further developments within the learning analytics research field. Learning analytics could be a method for reflective teaching practice that enables and guides teachers to investigate and evaluate their work in future learning scenarios. However, this potentially positive impact has not yet been sufficiently verified by learning analytics research. Another method that pursues these goals is ‘action research’. Learning analytics promises to initiate action research processes because it facilitates awareness, reflection and regulation of teaching activities analogous to action research. Therefore, this thesis joins both concepts, in order to improve the design of learning analytics tools. Central research question of this thesis are: What are the dimensions of learning analytics in relation to action research, which need to be considered when designing a learning analytics tool? How does a learning analytics dashboard impact the teachers of technology-enhanced university lectures regarding ‘awareness’, ‘reflection’ and ‘action’? Does it initiate action research? Which are central requirements for a learning analytics tool, which pursues such effects? This project followed design-based research principles, in order to answer these research questions. The main contributions are: a theoretical reference model that connects action research and learning analytics, the conceptualization and implementation of a learning analytics tool, a requirements catalogue for useful and usable learning analytics design based on evaluations, a tested procedure for impact analysis, and guidelines for the introduction of learning analytics into higher education.