818 resultados para big data
Resumo:
Place-names are a fundamental concept in all academic collections: everything happens somewhere. Contemporary place-names are comprehensively represented in digital gazetteer and geospatial web services such as GeoNames. However, despite millions of pounds of investment by JISC and other agencies in historical online resources in recent years, there is currently no equivalent for historic place-names. This project will digitize the entire 86 volume corpus of the Survey of English Place-Names (SEPN), the ultimate authority on historic place-names in England, and make its 4 million forms available.
Resumo:
Software-as-a-service (SaaS) is a type of software service delivery model which encompasses a broad range of business opportunities and challenges. Users and service providers are reluctant to integrate their business into SaaS due to its security concerns while at the same time they are attracted by its benefits. This article highlights SaaS utility and applicability in different environments like cloud computing, mobile cloud computing, software defined networking and Internet of things. It then embarks on the analysis of SaaS security challenges spanning across data security, application security and SaaS deployment security. A detailed review of the existing mainstream solutions to tackle the respective security issues mapping into different SaaS security challenges is presented. Finally, possible solutions or techniques which can be applied in tandem are presented for a secure SaaS platform.
Resumo:
Background: Oncology is a field that profits tremendously from the genomic data generated by high-throughput technologies, including next-generation sequencing. However, in order to exploit, integrate, visualize and interpret such high-dimensional data efficiently, non-trivial computational and statistical analysis methods are required that need to be developed in a problem-directed manner.
Discussion: For this reason, computational cancer biology aims to fill this gap. Unfortunately, computational cancer biology is not yet fully recognized as a coequal field in oncology, leading to a delay in its maturation and, as an immediate consequence, an under-exploration of high-throughput data for translational research.
Summary: Here we argue that this imbalance, favoring 'wet lab-based activities', will be naturally rectified over time, if the next generation of scientists receives an academic education that provides a fair and competent introduction to computational biology and its manifold capabilities. Furthermore, we discuss a number of local educational provisions that can be implemented on university level to help in facilitating the process of harmonization.
Resumo:
This special issue provides the latest research and development on wireless mobile wearable communications. According to a report by Juniper Research, the market value of connected wearable devices is expected to reach $1.5 billion by 2014, and the shipment of wearable devices may reach 70 million by 2017. Good examples of wearable devices are the prominent Google Glass and Microsoft HoloLens. As wearable technology is rapidly penetrating our daily life, mobile wearable communication is becoming a new communication paradigm. Mobile wearable device communications create new challenges compared to ordinary sensor networks and short-range communication. In mobile wearable communications, devices communicate with each other in a peer-to-peer fashion or client-server fashion and also communicate with aggregation points (e.g., smartphones, tablets, and gateway nodes). Wearable devices are expected to integrate multiple radio technologies for various applications' needs with small power consumption and low transmission delays. These devices can hence collect, interpret, transmit, and exchange data among supporting components, other wearable devices, and the Internet. Such data are not limited to people's personal biomedical information but also include human-centric social and contextual data. The success of mobile wearable technology depends on communication and networking architectures that support efficient and secure end-to-end information flows. A key design consideration of future wearable devices is the ability to ubiquitously connect to smartphones or the Internet with very low energy consumption. Radio propagation and, accordingly, channel models are also different from those in other existing wireless technologies. A huge number of connected wearable devices require novel big data processing algorithms, efficient storage solutions, cloud-assisted infrastructures, and spectrum-efficient communications technologies.
Resumo:
Recommending users for a new social network user to follow is a topic of interest at present. The existing approaches rely on using various types of information about the new user to determine recommended users who have similar interests to the new user. However, this presents a problem when a new user joins a social network, who is yet to have any interaction on the social network. In this paper we present a particular type of conversational recommendation approach, critiquing-based recommendation, to solve the cold start problem. We present a critiquing-based recommendation system, called CSFinder, to recommend users for a new user to follow. A traditional critiquing-based recommendation system allows a user to critique a feature of a recommended item at a time and gradually leads the user to the target recommendation. However this may require a lengthy recommendation session. CSFinder aims to reduce the session length by taking a case-based reasoning approach. It selects relevant recommendation sessions of past users that match the recommendation session of the current user to shortcut the current recommendation session. It selects relevant recommendation sessions from a case base that contains the successful recommendation sessions of past users. A past recommendation session can be selected if it contains recommended items and critiques that sufficiently overlap with the ones in the current session. Our experimental results show that CSFinder has significantly shorter sessions than the ones of an Incremental Critiquing system, which is a baseline critiquing-based recommendation system.
Resumo:
Abstract
Publicly available, outdoor webcams continuously view the world and share images. These cameras include traffic cams, campus cams, ski-resort cams, etc. The Archive of Many Outdoor Scenes (AMOS) is a project aiming to geolocate, annotate, archive, and visualize these cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS dataset has archived over 750 million images of outdoor environments from 27,000 webcams since 2006. Our goal is to utilize the AMOS image dataset and crowdsourcing to develop reliable and valid tools to improve physical activity assessment via online, outdoor webcam capture of global physical activity patterns and urban built environment characteristics.
This project’s grand scale-up of capturing physical activity patterns and built environments is a methodological step forward in advancing a real-time, non-labor intensive assessment using webcams, crowdsourcing, and eventually machine learning. The combined use of webcams capturing outdoor scenes every 30 min and crowdsources providing the labor of annotating the scenes allows for accelerated public health surveillance related to physical activity across numerous built environments. The ultimate goal of this public health and computer vision collaboration is to develop machine learning algorithms that will automatically identify and calculate physical activity patterns.
Resumo:
Background: Gene expression connectivity mapping has proven to be a powerful and flexible tool for research. Its application has been shown in a broad range of research topics, most commonly as a means of identifying potential small molecule compounds, which may be further investigated as candidates for repurposing to treat diseases. The public release of voluminous data from the Library of Integrated Cellular Signatures (LINCS) programme further enhanced the utilities and potentials of gene expression connectivity mapping in biomedicine. Results: We describe QUADrATiC (http://go.qub.ac.uk/QUADrATiC), a user-friendly tool for the exploration of gene expression connectivity on the subset of the LINCS data set corresponding to FDA-approved small molecule compounds. It enables the identification of compounds for repurposing therapeutic potentials. The software is designed to cope with the increased volume of data over existing tools, by taking advantage of multicore computing architectures to provide a scalable solution, which may be installed and operated on a range of computers, from laptops to servers. This scalability is provided by the use of the modern concurrent programming paradigm provided by the Akka framework. The QUADrATiC Graphical User Interface (GUI) has been developed using advanced Javascript frameworks, providing novel visualization capabilities for further analysis of connections. There is also a web services interface, allowing integration with other programs or scripts.Conclusions: QUADrATiC has been shown to provide an improvement over existing connectivity map software, in terms of scope (based on the LINCS data set), applicability (using FDA-approved compounds), usability and speed. It offers potential to biological researchers to analyze transcriptional data and generate potential therapeutics for focussed study in the lab. QUADrATiC represents a step change in the process of investigating gene expression connectivity and provides more biologically-relevant results than previous alternative solutions.
Resumo:
Modern approaches to biomedical research and diagnostics targeted towards precision medicine are generating ‘big data’ across a range of high-throughput experimental and analytical platforms. Integrative analysis of this rich clinical, pathological, molecular and imaging data represents one of the greatest bottlenecks in biomarker discovery research in cancer and other diseases. Following on from the publication of our successful framework for multimodal data amalgamation and integrative analysis, Pathology Integromics in Cancer (PICan), this article will explore the essential elements of assembling an integromics framework from a more detailed perspective. PICan, built around a relational database storing curated multimodal data, is the research tool sitting at the heart of our interdisciplinary efforts to streamline biomarker discovery and validation. While recognizing that every institution has a unique set of priorities and challenges, we will use our experiences with PICan as a case study and starting point, rationalizing the design choices we made within the context of our local infrastructure and specific needs, but also highlighting alternative approaches that may better suit other programmes of research and discovery. Along the way, we stress that integromics is not just a set of tools, but rather a cohesive paradigm for how modern bioinformatics can be enhanced. Successful implementation of an integromics framework is a collaborative team effort that is built with an eye to the future and greatly accelerates the processes of biomarker discovery, validation and translation into clinical practice.
Resumo:
There is still a lack of effective paradigms and tools for analysing and discovering the contents and relationships of project knowledge contexts in the field of project management. In this paper, a new framework for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps under big data environments is proposed and developed. The conceptual paradigm, theoretical underpinning, extended topic model, and illustration examples of the ontology model for project knowledge maps are presented, with further research work envisaged.
Resumo:
In this article we provide brief descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems, Jobs Schedulers and Big Data Schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and features of algorithms. In summary, we discuss differences between all presented classes of schedulers and discuss their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategies design, applicable to both local and distributed systems.
Resumo:
Nos últimos anos o aumento exponencial da utilização de dispositivos móveis e serviços disponibilizados na “Cloud” levou a que a forma como os sistemas são desenhados e implementados mudasse, numa perspectiva de tentar alcançar requisitos que até então não eram essenciais. Analisando esta evolução, com o enorme aumento dos dispositivos móveis, como os “smartphones” e “tablets” fez com que o desenho e implementação de sistemas distribuidos fossem ainda mais importantes nesta área, na tentativa de promover sistemas e aplicações que fossem mais flexíveis, robutos, escaláveis e acima de tudo interoperáveis. A menor capacidade de processamento ou armazenamento destes dispositivos tornou essencial o aparecimento e crescimento de tecnologias que prometem solucionar muitos dos problemas identificados. O aparecimento do conceito de Middleware visa solucionar estas lacunas nos sistemas distribuidos mais evoluídos, promovendo uma solução a nível de organização e desenho da arquitetura dos sistemas, ao memo tempo que fornece comunicações extremamente rápidas, seguras e de confiança. Uma arquitetura baseada em Middleware visa dotar os sistemas de um canal de comunicação que fornece uma forte interoperabilidade, escalabilidade, e segurança na troca de mensagens, entre outras vantagens. Nesta tese vários tipos e exemplos de sistemas distribuídos e são descritos e analisados, assim como uma descrição em detalhe de três protocolos (XMPP, AMQP e DDS) de comunicação, sendo dois deles (XMPP e AMQP) utilzados em projecto reais que serão descritos ao longo desta tese. O principal objetivo da escrita desta tese é demonstrar o estudo e o levantamento do estado da arte relativamente ao conceito de Middleware aplicado a sistemas distribuídos de larga escala, provando que a utilização de um Middleware pode facilitar e agilizar o desenho e desenvolvimento de um sistema distribuído e traz enormes vantagens num futuro próximo.
Resumo:
In the recent past, hardly anyone could predict this course of GIS development. GIS is moving from desktop to cloud. Web 2.0 enabled people to input data into web. These data are becoming increasingly geolocated. Big amounts of data formed something that is called "Big Data". Scientists still don't know how to deal with it completely. Different Data Mining tools are used for trying to extract some useful information from this Big Data. In our study, we also deal with one part of these data - User Generated Geographic Content (UGGC). The Panoramio initiative allows people to upload photos and describe them with tags. These photos are geolocated, which means that they have exact location on the Earth's surface according to a certain spatial reference system. By using Data Mining tools, we are trying to answer if it is possible to extract land use information from Panoramio photo tags. Also, we tried to answer to what extent this information could be accurate. At the end, we compared different Data Mining methods in order to distinguish which one has the most suited performances for this kind of data, which is text. Our answers are quite encouraging. With more than 70% of accuracy, we proved that extracting land use information is possible to some extent. Also, we found Memory Based Reasoning (MBR) method the most suitable method for this kind of data in all cases.
Resumo:
Actualmente, com a massificação da utilização das redes sociais, as empresas passam a sua mensagem nos seus canais de comunicação, mas os consumidores dão a sua opinião sobre ela. Argumentam, opinam, criticam (Nardi, Schiano, Gumbrecht, & Swartz, 2004). Positiva ou negativamente. Neste contexto o Text Mining surge como uma abordagem interessante para a resposta à necessidade de obter conhecimento a partir dos dados existentes. Neste trabalho utilizámos um algoritmo de Clustering hierárquico com o objectivo de descobrir temas distintos num conjunto de tweets obtidos ao longo de um determinado período de tempo para as empresas Burger King e McDonald’s. Com o intuito de compreender o sentimento associado a estes temas foi feita uma análise de sentimentos a cada tema encontrado, utilizando um algoritmo Bag-of-Words. Concluiu-se que o algoritmo de Clustering foi capaz de encontrar temas através do tweets obtidos, essencialmente ligados a produtos e serviços comercializados pelas empresas. O algoritmo de Sentiment Analysis atribuiu um sentimento a esses temas, permitindo compreender de entre os produtos/serviços identificados quais os que obtiveram uma polaridade positiva ou negativa, e deste modo sinalizar potencias situações problemáticas na estratégia das empresas, e situações positivas passíveis de identificação de decisões operacionais bem-sucedidas.
Resumo:
An inspirational and educational flashcard resource for secondary school children. Can be used as flashcards or as a matching activity (depending on how cards are cut out).
Resumo:
Tuesday 13th May Building 34 Room 3001, 16.15-17.45 Elena & Rikki/Jian Presenting: Groups: M, N, O, P Marking Groups: Q, R, S, T Schedule and Topics 16.15-16.20: Introduction and protocol for the session 16.20 Group M: Serious games – gaming as a driver for applications online 16.40 Group N: Open Education OERs 17.00 Group O: Big Data – the big picture 17.20 Group P: Rights and equality in the workplace 17.40-18.00: Wash-up: feedback session for presentation groups