781 results for big data storage
Abstract:
Background: Oncology is a field that profits tremendously from the genomic data generated by high-throughput technologies, including next-generation sequencing. However, in order to exploit, integrate, visualize and interpret such high-dimensional data efficiently, non-trivial computational and statistical analysis methods are required that need to be developed in a problem-directed manner.
Discussion: For this reason, computational cancer biology aims to fill this gap. Unfortunately, computational cancer biology is not yet fully recognized as a coequal field in oncology, leading to a delay in its maturation and, as an immediate consequence, an under-exploration of high-throughput data for translational research.
Summary: Here we argue that this imbalance, favoring 'wet lab-based activities', will be naturally rectified over time if the next generation of scientists receives an academic education that provides a fair and competent introduction to computational biology and its manifold capabilities. Furthermore, we discuss a number of local educational provisions that can be implemented at the university level to help facilitate this process of harmonization.
Abstract:
Next-generation sequencing (NGS) technologies have begun to revolutionize the field of haematological malignancies through the assessment of a patient's genetic makeup at minimal cost. Significant discoveries have already provided unique insight into disease initiation, risk stratification and therapeutic intervention. Sequencing analysis will likely form part of routine diagnostic testing in the future. However, a number of important issues regarding result interpretation, laboratory workflow, data storage and ethics need to be addressed for that to become a reality. In this review we summarize the contribution that NGS has already made to the field of haematological malignancies. Finally, we discuss the challenges that NGS technologies will bring in relation to data storage, ethical and legal issues and laboratory validation. Despite these challenges, we predict that high-throughput DNA sequencing will redefine haematological malignancies on the basis of individualized genomic analysis.
Abstract:
Recommending users for a new social network user to follow is currently a topic of interest. Existing approaches rely on using various types of information about the new user to determine recommended users who share the new user's interests. However, this presents a problem when a new user joins a social network and has not yet had any interaction on it. In this paper we present a particular type of conversational recommendation approach, critiquing-based recommendation, to solve this cold-start problem. We present a critiquing-based recommendation system, called CSFinder, to recommend users for a new user to follow. A traditional critiquing-based recommendation system allows a user to critique one feature of a recommended item at a time and gradually leads the user to the target recommendation; however, this may require a lengthy recommendation session. CSFinder aims to reduce the session length by taking a case-based reasoning approach. From a case base containing the successful recommendation sessions of past users, it selects sessions that match the current user's session in order to shortcut it. A past recommendation session can be selected if its recommended items and critiques sufficiently overlap with those in the current session. Our experimental results show that CSFinder has significantly shorter sessions than an Incremental Critiquing system, which serves as a baseline critiquing-based recommendation system.
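The case-retrieval step described in this abstract (reusing past sessions whose recommended items and critiques sufficiently overlap the current session) can be sketched roughly as follows. The Jaccard measure, the threshold values and the session fields are illustrative assumptions, not CSFinder's actual parameters:

```python
# A minimal sketch of case retrieval in the spirit of CSFinder: a past
# session is reusable if its recommended items and critiques sufficiently
# overlap with the current session. The overlap measure (Jaccard) and the
# thresholds are illustrative assumptions, not the paper's exact formulas.

def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve_sessions(current, case_base, item_thresh=0.5, critique_thresh=0.5):
    """Return past sessions whose items and critiques both overlap enough."""
    matches = []
    for past in case_base:
        item_sim = jaccard(current["items"], past["items"])
        critique_sim = jaccard(current["critiques"], past["critiques"])
        if item_sim >= item_thresh and critique_sim >= critique_thresh:
            matches.append((past, item_sim + critique_sim))
    # Best-matching sessions first, so their targets can shortcut the dialogue.
    return [p for p, _ in sorted(matches, key=lambda x: -x[1])]
```

A matching past session's final (target) recommendation can then be offered directly, cutting the current session short.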
Abstract:
Publicly available, outdoor webcams continuously view the world and share images. These cameras include traffic cams, campus cams, ski-resort cams, etc. The Archive of Many Outdoor Scenes (AMOS) is a project aiming to geolocate, annotate, archive, and visualize these cameras and images to serve as a resource for a wide variety of scientific applications. The AMOS dataset has archived over 750 million images of outdoor environments from 27,000 webcams since 2006. Our goal is to utilize the AMOS image dataset and crowdsourcing to develop reliable and valid tools to improve physical activity assessment via online, outdoor webcam capture of global physical activity patterns and urban built environment characteristics.
This project's grand scale-up of capturing physical activity patterns and built environments is a methodological step forward in advancing a real-time, non-labor-intensive assessment using webcams, crowdsourcing, and eventually machine learning. The combined use of webcams capturing outdoor scenes every 30 min and crowdsourced workers providing the labor of annotating the scenes allows for accelerated public health surveillance related to physical activity across numerous built environments. The ultimate goal of this public health and computer vision collaboration is to develop machine learning algorithms that will automatically identify and calculate physical activity patterns.
Abstract:
Background: Gene expression connectivity mapping has proven to be a powerful and flexible tool for research. Its application has been shown in a broad range of research topics, most commonly as a means of identifying potential small molecule compounds, which may be further investigated as candidates for repurposing to treat diseases. The public release of voluminous data from the Library of Integrated Network-Based Cellular Signatures (LINCS) programme further enhanced the utility and potential of gene expression connectivity mapping in biomedicine. Results: We describe QUADrATiC (http://go.qub.ac.uk/QUADrATiC), a user-friendly tool for the exploration of gene expression connectivity on the subset of the LINCS data set corresponding to FDA-approved small molecule compounds. It enables the identification of compounds with therapeutic repurposing potential. The software is designed to cope with a larger volume of data than existing tools by taking advantage of multicore computing architectures, providing a scalable solution that may be installed and operated on a range of computers, from laptops to servers. This scalability is provided by the modern concurrent programming paradigm offered by the Akka framework. The QUADrATiC Graphical User Interface (GUI) has been developed using advanced Javascript frameworks, providing novel visualization capabilities for further analysis of connections. There is also a web services interface, allowing integration with other programs or scripts. Conclusions: QUADrATiC has been shown to provide an improvement over existing connectivity map software in terms of scope (based on the LINCS data set), applicability (using FDA-approved compounds), usability and speed. It offers biological researchers the potential to analyze transcriptional data and generate candidate therapeutics for focussed study in the lab.
QUADrATiC represents a step change in the process of investigating gene expression connectivity and provides more biologically-relevant results than previous alternative solutions.
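The core idea behind connectivity mapping, scoring a compound's ranked gene list against a query signature of up- and down-regulated genes, can be illustrated with a toy signed mean-rank score. This is a deliberately simplified stand-in; QUADrATiC's actual scoring formula differs:

```python
# A toy sketch of gene-expression connectivity scoring: a compound "connects"
# positively to a query signature when the signature's up-genes sit near the
# top of the compound's ranking and its down-genes near the bottom. This
# signed mean-rank score is illustrative only, not QUADrATiC's formula.

def connectivity_score(ranked_genes, up_genes, down_genes):
    """Score in [-1, 1]; positive means the compound mimics the signature."""
    n = len(ranked_genes)
    position = {g: i for i, g in enumerate(ranked_genes)}

    def signed(g):
        # Map rank to [-1, 1]: top of the list (most up-regulated) -> +1.
        return 1.0 - 2.0 * position[g] / (n - 1)

    up = sum(signed(g) for g in up_genes) / len(up_genes)
    down = sum(signed(g) for g in down_genes) / len(down_genes)
    return (up - down) / 2.0
```

A strongly negative score would flag a compound whose expression effects oppose the query signature, the pattern usually sought when screening for candidate therapeutics.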
Abstract:
Modern approaches to biomedical research and diagnostics targeted towards precision medicine are generating ‘big data’ across a range of high-throughput experimental and analytical platforms. Integrative analysis of this rich clinical, pathological, molecular and imaging data represents one of the greatest bottlenecks in biomarker discovery research in cancer and other diseases. Following on from the publication of our successful framework for multimodal data amalgamation and integrative analysis, Pathology Integromics in Cancer (PICan), this article will explore the essential elements of assembling an integromics framework from a more detailed perspective. PICan, built around a relational database storing curated multimodal data, is the research tool sitting at the heart of our interdisciplinary efforts to streamline biomarker discovery and validation. While recognizing that every institution has a unique set of priorities and challenges, we will use our experiences with PICan as a case study and starting point, rationalizing the design choices we made within the context of our local infrastructure and specific needs, but also highlighting alternative approaches that may better suit other programmes of research and discovery. Along the way, we stress that integromics is not just a set of tools, but rather a cohesive paradigm for how modern bioinformatics can be enhanced. Successful implementation of an integromics framework is a collaborative team effort that is built with an eye to the future and greatly accelerates the processes of biomarker discovery, validation and translation into clinical practice.
Abstract:
The Acoustic Oceanographic Buoy (AOB) telemetry system has been designed to meet acoustic rapid environmental assessment requirements. It uses a standard Institute of Electrical and Electronics Engineers (IEEE) 802.11 wireless local area network (WLAN) to integrate the air radio network (RaN), and a hydrophone array and acoustic source to integrate the underwater acoustic network (AcN). It offers advantages including local data storage, dedicated signal processing, and Global Positioning System (GPS) timing and localization. Thanks to its WLAN transceivers, the AOB can also be integrated with other similar systems to form a flexible network and perform on-line high-speed data transmissions. The AOB is a reusable system that requires little maintenance and can also work as a salt-water plug-and-play system at sea, as it is designed to operate in free-drifting mode. The AOB is also suitable for performing distributed digital signal processing tasks thanks to its digital signal processor facility.
Abstract:
There is still a lack of effective paradigms and tools for analysing and discovering the contents and relationships of project knowledge contexts in the field of project management. In this paper, a new framework for extracting and representing project knowledge contexts using topic models and dynamic knowledge maps under big data environments is proposed and developed. The conceptual paradigm, theoretical underpinning, extended topic model, and illustration examples of the ontology model for project knowledge maps are presented, with further research work envisaged.
Abstract:
In this article we provide brief descriptions of three classes of schedulers: operating system process schedulers, cluster systems job schedulers, and big data schedulers. We describe their evolution from early adoptions to modern implementations, considering both the use and the features of the algorithms. In summary, we discuss the differences between the presented classes of schedulers and their chronological development. In conclusion, we highlight similarities in the focus of scheduling strategy design that apply to both local and distributed systems.
Abstract:
This paper presents a system developed to promote the rational use of electric energy among consumers and thus increase energy efficiency. The goal is to provide energy consumers with an application that displays energy consumption/production profiles, sets consumption ceilings, defines automatic alerts and alarms, anonymously compares consumers with similar energy usage profiles by region and, in the case of non-residential installations, predicts the expected consumption/production values. The resulting distributed system is organized in two main blocks: front-end and back-end. The front-end includes user interface applications for Android mobile devices and Web browsers. The back-end provides data storage and processing functionalities and is installed on a cloud computing platform - the Google App Engine - which provides a standard Web service interface. This option ensures the interoperability, scalability and robustness of the system.
Abstract:
In recent years, the exponential growth in the use of mobile devices and of services made available in the cloud has changed the way systems are designed and implemented, in an effort to meet requirements that until then were not essential. With the enormous increase in mobile devices such as smartphones and tablets, the design and implementation of distributed systems became even more important in this area, in an attempt to deliver systems and applications that are more flexible, robust, scalable and, above all, interoperable. The limited processing and storage capacity of these devices made the emergence and growth of technologies that promise to solve many of the identified problems essential. The concept of Middleware aims to fill these gaps in the more advanced distributed systems, offering a solution at the level of system architecture organization and design while providing extremely fast, secure and reliable communications. A Middleware-based architecture equips systems with a communication channel that offers strong interoperability, scalability and security in message exchange, among other advantages. In this thesis several types and examples of distributed systems are described and analysed, along with a detailed description of three communication protocols (XMPP, AMQP and DDS), two of which (XMPP and AMQP) are used in real projects described throughout this thesis. The main objective of this thesis is to present a study and a state-of-the-art survey of the Middleware concept applied to large-scale distributed systems, showing that the use of a Middleware can ease and speed up the design and development of a distributed system and brings major advantages in the near future.
Abstract:
In the recent past, hardly anyone could have predicted this course of GIS development. GIS is moving from the desktop to the cloud. Web 2.0 enabled people to input data into the web, and these data are becoming increasingly geolocated. Large amounts of data formed what is called "Big Data", which scientists still do not fully know how to handle. Different data mining tools are used to try to extract useful information from this Big Data. In our study, we deal with one part of these data: User Generated Geographic Content (UGGC). The Panoramio initiative allows people to upload photos and describe them with tags. These photos are geolocated, meaning that they have an exact location on the Earth's surface according to a certain spatial reference system. Using data mining tools, we try to answer whether it is possible to extract land use information from Panoramio photo tags, and to what extent this information can be accurate. Finally, we compared different data mining methods to determine which performs best for this kind of data, which is text. Our answers are quite encouraging: with more than 70% accuracy, we showed that extracting land use information is possible to some extent. We also found the Memory Based Reasoning (MBR) method to be the most suitable for this kind of data in all cases.
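Memory Based Reasoning is essentially k-nearest-neighbour classification against a stored set of labelled examples. A toy version of the idea for tag-based land use inference might look like this; the tag sets, labels and similarity measure are invented for illustration and differ from the study's real features:

```python
# A toy sketch of Memory Based Reasoning (k-nearest-neighbour) for inferring
# land use from photo tags: find the k stored photos with the most shared
# tags and take a majority vote over their labels. All data are invented.

from collections import Counter

def tag_similarity(tags_a, tags_b):
    """Number of shared tags -- a crude similarity for short tag lists."""
    return len(set(tags_a) & set(tags_b))

def classify_land_use(photo_tags, labelled_photos, k=3):
    """Predict a land-use class by majority vote among the k nearest photos."""
    ranked = sorted(labelled_photos,
                    key=lambda p: tag_similarity(photo_tags, p["tags"]),
                    reverse=True)
    votes = Counter(p["label"] for p in ranked[:k])
    return votes.most_common(1)[0][0]
```

Real text classification would weight tags (e.g. by TF-IDF) rather than count raw overlaps, but the retrieve-and-vote structure is the same.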
Abstract:
Nowadays, with the widespread use of social networks, companies broadcast their message through their communication channels, but consumers voice their opinions about it. They argue, give opinions and criticise (Nardi, Schiano, Gumbrecht, & Swartz, 2004), positively or negatively. In this context, Text Mining emerges as an interesting approach to the need to obtain knowledge from the available data. In this work we used a hierarchical clustering algorithm to discover distinct topics in a set of tweets collected over a given period of time for the companies Burger King and McDonald's. To understand the sentiment associated with these topics, a sentiment analysis was performed on each topic found, using a Bag-of-Words algorithm. We concluded that the clustering algorithm was able to find topics in the collected tweets, essentially related to products and services sold by the companies. The sentiment analysis algorithm assigned a sentiment to these topics, making it possible to determine which of the identified products/services obtained a positive or negative polarity, and thus to flag potentially problematic situations in the companies' strategies, as well as positive situations that point to successful operational decisions.
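The Bag-of-Words sentiment step described in this abstract can be reduced to counting lexicon hits per topic. A minimal sketch, with a tiny invented lexicon standing in for the one actually used in the study:

```python
# A minimal bag-of-words sentiment scorer: each cluster (topic) of tweets
# gets a polarity from counts of positive and negative lexicon words.
# The tiny lexicon below is an illustrative stand-in, not the study's.

POSITIVE = {"good", "great", "tasty", "love", "fast"}
NEGATIVE = {"bad", "cold", "slow", "awful", "hate"}

def polarity(tweets):
    """Return 'positive', 'negative' or 'neutral' for a cluster of tweets."""
    score = 0
    for tweet in tweets:
        for word in tweet.lower().split():
            if word in POSITIVE:
                score += 1
            elif word in NEGATIVE:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Applied per cluster, this labels each discovered product/service topic with a polarity, which is what allows problematic and successful areas to be flagged.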
Abstract:
Therapeutic drug monitoring (TDM) aims to optimize treatments by individualizing dosage regimens based on the measurement of blood concentrations. Dosage individualization to maintain concentrations within a target range requires pharmacokinetic and clinical capabilities. Bayesian calculations currently represent the gold-standard TDM approach but require computational assistance. In recent decades, computer programs have been developed to assist clinicians in this task. The aim of this survey was to assess and compare computer tools designed to support TDM clinical activities. The literature and the Internet were searched to identify software. All programs were tested on personal computers. Each program was scored against a standardized grid covering pharmacokinetic relevance, user friendliness, computing aspects, interfacing and storage. A weighting factor was applied to each criterion of the grid to account for its relative importance. To assess the robustness of the software, six representative clinical vignettes were processed through each of them. Altogether, 12 software tools were identified, tested and ranked, representing a comprehensive review of the available software. The number of drugs handled by the software varies widely (from two to 180), and eight programs offer users the possibility of adding new drug models based on population pharmacokinetic analyses. Bayesian computation to predict dosage adaptation from blood concentration (a posteriori adjustment) is performed by ten tools, while nine are also able to propose a priori dosage regimens based only on individual patient covariates such as age, sex and bodyweight. Among those applying Bayesian calculation, MM-USC*PACK© uses the non-parametric approach. The top two programs emerging from this benchmark were MwPharm© and TCIWorks. Most other programs evaluated had good potential while being less sophisticated or less user friendly.
Programs vary in complexity and might not fit all healthcare settings. Each software tool must therefore be regarded with respect to the individual needs of hospitals or clinicians. Programs should be easy and fast for routine activities, including for non-experienced users. Computer-assisted TDM is gaining growing interest and should further improve, especially in terms of information system interfacing, user friendliness, data storage capability and report generation.
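The a posteriori (Bayesian) adjustment mentioned above can be illustrated in a deliberately oversimplified form: a one-parameter, one-compartment steady-state model with a lognormal prior on clearance and a single measured concentration. All numbers (prior mean, variances, grid range) are illustrative assumptions, and real TDM software uses far richer models:

```python
# A deliberately simplified sketch of a-posteriori Bayesian TDM:
# steady-state concentration Css = dose_rate / CL, a lognormal prior on
# clearance CL, and one measured Css. MAP estimation by grid search.
# Prior mean, variances and units are illustrative assumptions only.

import math

def map_clearance(dose_rate, measured_css, cl_prior=5.0, omega=0.3, sigma=0.2):
    """Grid-search MAP estimate of clearance (L/h) given one concentration."""
    best_cl, best_logpost = cl_prior, -math.inf
    for i in range(1, 400):
        cl = i * 0.05  # candidate clearances, 0.05 .. 19.95 L/h
        pred = dose_rate / cl
        # Lognormal prior on CL plus lognormal residual error on Css.
        log_prior = -(math.log(cl / cl_prior) ** 2) / (2 * omega ** 2)
        log_lik = -(math.log(measured_css / pred) ** 2) / (2 * sigma ** 2)
        if log_prior + log_lik > best_logpost:
            best_cl, best_logpost = cl, log_prior + log_lik
    return best_cl

def dose_rate_for_target(target_css, cl):
    """Maintenance dose rate achieving the target steady-state level."""
    return target_css * cl
```

The MAP clearance falls between the population prior and the value implied by the measurement alone, which is exactly the compromise that distinguishes Bayesian adjustment from naive proportional dose scaling.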
Abstract:
Background. The ABO and Rh(D) phenotypes of blood donors and of transfused patients are routinely analysed to ensure full compatibility. These analyses are performed by agglutination following an antibody-antigen reaction. However, because of prohibitive cost and analysis time, blood donations are not routinely tested for the minor blood antigens. This gap can result in allo-immunisation of recipient patients against one or more minor antigens and thus lead to severe complications in future transfusions. Study design and methods. To address this problem, we produced a genetic panel based on Beckman Coulter's GenomeLab SNPstream technology to analyse 22 minor blood antigens simultaneously. The DNA source is the patients' white blood cells, previously isolated on FTA paper. Results. The results show that the genotype discordance rate, measured by correlating the genotyping results from the two DNA strands, as well as the genotyping failure rate, are very low (0.1%). Likewise, the correlation between the phenotypes predicted by genotyping and the actual phenotypes obtained by serology of red blood cells and platelets varies between 97% and 100%. Experimental or database-handling errors, as well as rare polymorphisms affecting antigen conformation, could explain the differences in results. However, given that the phenotypes obtained by genotyping will always be cross-checked before any blood transfusion using the standard technologies approved by government authorities, the correlation rates obtained are well above the project's expected success criteria. Conclusion.
Genetic profiling of the minor blood antigens will make it possible to create a centralised computer database of donor phenotypes, allowing blood banks to quickly find compatible profiles between donors and recipients.