937 resultados para Data structures (Computer science)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An application that translates raw thermal melt curve data into more easily assimilated knowledge is described. This program, called ‘Meltdown’, performs a number of data remediation steps before classifying melt curves and estimating melting temperatures. The final output is a report that summarizes the results of a differential scanning fluorimetry experiment. Meltdown uses a Bayesian classification scheme, enabling reproducible identification of various trends commonly found in DSF datasets. The goal of Meltdown is not to replace human analysis of the raw data, but to provide a sensible interpretation of the data to make this useful experimental technique accessible to naïve users, as well as providing a starting point for detailed analyses by more experienced users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Self-tracking, the process of recording one's own behaviours, thoughts and feelings, is a popular approach to enhance one's self-knowledge. While dedicated self-tracking apps and devices support data collection, previous research highlights that the integration of data constitutes a barrier for users. In this study we investigated how members of the Quantified Self movement---early adopters of self-tracking tools---overcome these barriers. We conducted a qualitative analysis of 51 videos of Quantified Self presentations to explore intentions for collecting data, methods for integrating and representing data, and how intentions and methods shaped reflection. The findings highlight two different intentions---striving for self-improvement and curiosity in personal data---which shaped how these users integrated data, i.e. the effort required. Furthermore, we identified three methods for representing data---binary, structured and abstract---which influenced reflection. Binary representations supported reflection-in-action, whereas structured and abstract representations supported iterative processes of data collection, integration and reflection. For people tracking out of curiosity, this iterative engagement with personal data often became an end in itself, rather than a means to achieve a goal. We discuss how these findings contribute to our current understanding of self-tracking amongst Quantified Self members and beyond, and we conclude with directions for future work to support self-trackers with their aspirations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis introduced two novel reputation models to generate accurate item reputation scores using ratings data and the statistics of the dataset. It also presented an innovative method that incorporates reputation awareness in recommender systems by employing voting system methods to produce more accurate top-N item recommendations. Additionally, this thesis introduced a personalisation method for generating reputation scores based on users' interests, where a single item can have different reputation scores for different users. The personalised reputation scores are then used in the proposed reputation-aware recommender systems to enhance the recommendation quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the development of wearable and mobile computing technology, more and more people start using sleep-tracking tools to collect personal sleep data on a daily basis aiming at understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analyse those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis on personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in sleep of different people, and SleepExplorer could help users discover factors that are most relevant to their personal sleep.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Solving large-scale all-to-all comparison problems using distributed computing is increasingly significant for various applications. Previous efforts to implement distributed all-to-all comparison frameworks have treated the two phases of data distribution and comparison task scheduling separately. This leads to high storage demands as well as poor data locality for the comparison tasks, thus creating a need to redistribute the data at runtime. Furthermore, most previous methods have been developed for homogeneous computing environments, so their overall performance is degraded even further when they are used in heterogeneous distributed systems. To tackle these challenges, this paper presents a data-aware task scheduling approach for solving all-to-all comparison problems in heterogeneous distributed systems. The approach formulates the requirements for data distribution and comparison task scheduling simultaneously as a constrained optimization problem. Then, metaheuristic data pre-scheduling and dynamic task scheduling strategies are developed along with an algorithmic implementation to solve the problem. The approach provides perfect data locality for all comparison tasks, avoiding rearrangement of data at runtime. It achieves load balancing among heterogeneous computing nodes, thus enhancing the overall computation time. It also reduces data storage requirements across the network. The effectiveness of the approach is demonstrated through experimental studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This workshop is jointly organized by EFMI Working Groups Security, Safety and Ethics and Personal Portable Devices in cooperation with IMIA Working Group "Security in Health Information Systems". In contemporary healthcare and personal health management the collection and use of personal health information takes place in different contexts and jurisdictions. Global use of health data is also expanding. The approach taken by different experts, health service providers, data subjects and secondary users in understanding privacy and the privacy expectations others may have is strongly context dependent. To make eHealth, global healthcare, mHealth and personal health management successful and to enable fair secondary use of personal health data, it is necessary to find a practical and functional balance between privacy expectations of stakeholder groups. The workshop will highlight these privacy concerns by presenting different cases and approaches. Workshop participants will analyse stakeholder privacy expectations that take place in different real-life contexts such as portable health devices and personal health records, and develop a mechanism to balance them in such a way that global protection of health data and its meaningful use is realized simultaneously. Based on the results of the workshop, initial requirements for a global healthcare information certification framework will be developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This workshop aims at discussing alternative approaches to resolving the problem of health information fragmentation, partially resulting from difficulties of health complex systems to semantically interact at the information level. In principle, we challenge the current paradigm of keeping medical records where they were created and discuss an alternative approach in which an individual's health data can be maintained by new entities whose sole responsibility is the sustainability of individual-centric health records. In particular, we will discuss the unique characteristics of the European health information landscape. This workshop is also a business meeting of the IMIA Working Group on Health Record Banking.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Body Area Network (BAN) is an emerging technology that focuses on monitoring physiological data in, on and around the human body. BAN technology permits wearable and implanted sensors to collect vital data about the human body and transmit it to other nodes via low-energy communication. In this paper, we investigate interactions in terms of data flows between parties involved in BANs under four different scenarios targeting outdoor and indoor medical environments: hospital, home, emergency and open areas. Based on these scenarios, we identify data flow requirements between BAN elements such as sensors and control units (CUs) and parties involved in BANs such as the patient, doctors, nurses and relatives. Identified requirements are used to generate BAN data flow models. Petri Nets (PNs) are used as the formal modelling language. We check the validity of the models and compare them with the existing related work. Finally, using the models, we identify communication and security requirements based on the most common active and passive attack scenarios.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classification of large datasets is a challenging task in Data Mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating the activities of data abstraction, frequent item generation, compression, classification and use of rough sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The search engine log files have been used to gather direct user feedback on the relevancy of the documents presented in the results page. Typically the relative position of the clicks gathered from the log files is used a proxy for the direct user feedback. In this paper we identify reasons for the incompleteness of the relative position of clicks for deciphering the user preferences. Hence, we propose the use of time spent by the user in reading through the document as indicative of user preference for a document with respect to a query. Also, we identify the issues involved in using the time measure and propose means to address them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A key trait of Free and Open Source Software (FOSS) development is its distributed nature. Nevertheless, two project-level operations, the fork and the merge of program code, are among the least well understood events in the lifespan of a FOSS project. Some projects have explicitly adopted these operations as the primary means of concurrent development. In this study, we examine the effect of highly distributed software development, is found in the Linux kernel project, on collection and modelling of software development data. We find that distributed development calls for sophisticated temporal modelling techniques where several versions of the source code tree can exist at once. Attention must be turned towards the methods of quality assurance and peer review that projects employ to manage these parallel source trees. Our analysis indicates that two new metrics, fork rate and merge rate, could be useful for determining the role of distributed version control systems in FOSS projects. The study presents a preliminary data set consisting of version control and mailing list data.