949 results for data visualisation
Abstract:
Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and eliciting feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and does not involve domain experts in guiding a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The resulting segmentation is then used to develop effective local regression models.
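As a minimal sketch of the local-modelling idea described above, the following fits one least-squares line per expert-assigned segment. The segment labels, the `fit_line`, `fit_local_models` and `predict` helpers, and the toy 1-D data are all illustrative assumptions; the abstract does not specify the form of the local regression models.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, mean_y - a * mean_x


def fit_local_models(xs, ys, segments):
    """Fit one linear model per segment label.

    The labels are hypothetical: they stand in for regions that an
    expert marked interactively on a visualisation plot.
    """
    grouped = defaultdict(lambda: ([], []))
    for x, y, s in zip(xs, ys, segments):
        grouped[s][0].append(x)
        grouped[s][1].append(y)
    return {s: fit_line(gx, gy) for s, (gx, gy) in grouped.items()}


def predict(models, x, segment):
    """Predict with the local model that belongs to the given segment."""
    a, b = models[segment]
    return a * x + b


# Toy data: slope +2 in the "left" segment, slope -1 in the "right" one,
# so a single global line would fit neither region well.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.0, 4.0, 3.0, 2.0, 1.0]
segments = ["left", "left", "left", "right", "right", "right"]

models = fit_local_models(xs, ys, segments)
```

Here each segment gets its own slope and intercept, which is the simplest form of the "separate model per area of input space" idea; richer local models could be substituted without changing the structure.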
Abstract:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation was the lack of viable approaches for analysing High Throughput Screening datasets, which may include thousands of high-dimensional data points. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. With the development of new robotic technologies, the ability to test the activities of compounds has increased considerably in recent years. Traditional methods, which rely on inspecting tables and graphical plots to analyse relationships between measured activities and the structure of compounds, are not feasible for a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially those with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them can cope with only a few properties of compounds at a time. We believe that a latent variable model with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model (LTM) can deal with either continuous or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can infer the distribution of the data from magnification factor and curvature plots. Rather than obtaining the useful information from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, refined visualisation plots can be displayed at deeper levels and sub-clusters may be found.
The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive, top-down fashion. The user subjectively identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-step. Another problem that can occur when visualising a large data set is that data clusters may overlap significantly, making it very difficult for the user to judge where the centres of regions of interest should be placed. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
Abstract:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting the functions of polymorphic molecules is important for designing more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important in areas such as mate choice, epitope-based vaccine design and transplantation rejection. Most existing exploratory approaches cannot analyse these datasets because of the large number of molecules and the high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, in which we use log-transformations at certain steps of the expectation-maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants on both synthetic data and an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way, where appropriate noise models are used for each type of data, in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend the GGTM model to estimate feature saliencies while training a visualisation model; this is called GGTM with feature saliency (GGTM-FS). We evaluate visualisation quality using metrics such as the distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known, we also use KL divergence and nearest-neighbour classification error as quality metrics in order to determine the separation between classes. We demonstrate the efficacy of the proposed models on both synthetic and real biological datasets, with a main focus on the MHC class-I dataset.
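The abstract does not say which EM steps are log-transformed. One common place where high dimensionality causes numerical trouble is the E-step, where per-component likelihoods underflow to zero. The sketch below computes responsibilities from log-likelihoods with the log-sum-exp trick; the function names, the equal-prior assumption and the toy values are illustrative assumptions, not the thesis's method.

```python
import math


def log_sum_exp(log_values):
    """Compute log(sum(exp(v))) stably by factoring out the maximum."""
    m = max(log_values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def responsibilities(log_likelihoods):
    """Posterior weights of mixture components for one data point.

    Takes log p(x | component k) for each k and assumes equal priors
    (an assumption of this sketch), so the priors cancel out.
    """
    log_norm = log_sum_exp(log_likelihoods)
    return [math.exp(v - log_norm) for v in log_likelihoods]


# Log-likelihoods this small are plausible in high dimensions.
# exp(-5000) underflows to 0.0 in double precision, so normalising the
# raw likelihoods directly would divide zero by zero.
log_liks = [-5000.0, -5001.0, -5010.0]
resp = responsibilities(log_liks)
```

Working in the log domain keeps the relative weights of the components intact even when every individual likelihood is too small to represent.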
Abstract:
A teaching and learning development project is currently under way at Queensland University of Technology to develop advanced technology videotapes for use in the delivery of structural engineering courses. These tapes consist of integrated computer and laboratory simulations of important concepts, and of the behaviour of structures and their components, for a number of structural engineering subjects. They will be used as part of the regular lectures and thus will not only improve the quality of lectures and the learning environment, but will also be able to replace the ever-dwindling laboratory teaching in these subjects. The use of these videotapes, developed using advanced computer graphics, data visualization and video technologies, will enrich the learning process of the current diverse engineering student body. This paper presents the details of this new method, the methodology used, and the results and evaluation in relation to one of the structural engineering subjects, steel structures.
Abstract:
The authors currently engage in two projects to improve human-computer interaction (HCI) designs that can help conserve resources. The projects explore motivation and persuasion strategies relevant to ubiquitous computing systems that bring real-time consumption data into the homes and hands of residents in Brisbane, Australia. The first project seeks to increase understanding among university staff of the tangible and negative effects that excessive printing has on the workplace and local environment. The second project seeks to shift attitudes toward domestic energy conservation through software and hardware that monitor real-time, in situ electricity consumption in homes across Queensland. The insights drawn from these projects will help develop resource consumption user archetypes, providing a framework linking people to differing interface design requirements.
Abstract:
Computational journalism involves the application of software and technologies to the activities of journalism, and it draws from the fields of computer science, the social sciences, and media and communications. New technologies may enhance the traditional aims of journalism, or may initiate greater interaction between journalists and information and communication technology (ICT) specialists. The enhanced use of computing in news production is related in particular to three factors: larger government data sets becoming more widely available; the increasingly sophisticated and ubiquitous nature of software; and the developing digital economy. Drawing upon international examples, this paper argues that computational journalism techniques may provide new foundations for original investigative journalism and increase the scope for new forms of interaction with readers. Computational journalism provides a major opportunity to enhance the delivery of original investigative journalism, and to attract and retain readers online.
Abstract:
Twitter has become a major instrument for the rapid dissemination and subsequent debate of news stories. It has been instrumental both in drawing attention to events as they unfolded (such as the emergency landing of a plane in New York’s Hudson River in 2009) and in facilitating a sustained discussion of major stories over timeframes measured in weeks and months (including the continuing saga around Wikileaks and Julian Assange), sometimes still keeping stories alive even if mainstream media attention has moved on elsewhere. More comprehensive methodologies for research into news discussion on Twitter – beyond anecdotal or case study approaches – are only now beginning to emerge. This paper presents a large-scale quantitative approach to studying public communication in the Australian Twittersphere, developed as part of a three-year ARC Discovery project that also examines blogs and other social media spaces. The paper will both outline the innovative research tools developed for this work, and present outcomes from an application of these methodologies to recent and present news themes. Our methodology enables us to identify major themes in Twitter’s discussion of these events, trace their development and decline over time, and map the dynamics of the discussion networks formed ad hoc around specific themes (in part with the help of Twitter #hashtags: brief identifiers which mark a tweet as taking part in an established discussion). It is also able to identify links to major news stories and other online resources, and to track their dissemination across the wider Twittersphere.
Abstract:
The growth of technologies and tools branded as ‘new media’ or ‘Web 2.0’ has sparked much discussion about the internet and its place in all facets of social life. Such debate includes the potential for blogs and citizen journalism projects to replace or alter journalism and mainstream media practices. However, while the journalism-blog dynamic has attracted the most attention, the actual work of political bloggers, the roles they play in the mediasphere and the resources they use, has been comparatively ignored. This project will look at political blogging in Australia and France - sites commenting on or promoting political events and ideas, and run by citizens, politicians, and journalists alike. In doing so, the structure of networks formed by bloggers and the nature of communication within political blogospheres will be examined. Previous studies of political blogging around the world have focussed on individual nations, finding that in some cases the networks are divided between different political ideologies. By comparing two countries with different political representation (two-party dominated system vs. a wider political spectrum), this study will determine the structure of these political blogospheres, and correlate these structures with the political environment in which they are situated. The thesis adapts concepts from communication and media theories, including framing, agenda setting, and opinion leaders, to examine the work of political bloggers and their place within the mediasphere. As well as developing a hybrid theoretical base for research into blogs and other online communication, the project outlines new methodologies for carrying out studies of online activity through the analysis of several topical networks within the wider activity collected for this project. The project draws on hyperlink and textual data collected from a sample of Australian and French blogs between January and August 2009.
From this data, the thesis provides an overview of ‘everyday’ political blogging, showing posting patterns over several months of activity, away from national elections and their associated campaigns. However, while other work in this field has looked solely at cumulative networks, treating collected data as a static network, this project will also look at specific cases to see how the blogospheres change with time and topics of discussion. Three case studies are used within the thesis to examine how blogs cover politics, featuring an international political event (the Obama inauguration), and local political topics (the opposition to the ‘Création et Internet’, or HADOPI, law in France, the ‘Utegate’ scandal in Australia). By using a mixture of qualitative and quantitative methods, the study analyses data collected from a population of sites from both countries, looking at their linking patterns, relationship with mainstream media, and topics of interest. This project will subsequently help to further develop methodologies in this field and provide new and detailed information on both online networks and internet-based political communication in Australia and France.
Abstract:
Smart metering presents opportunities for business model creation. However, the viability of many potential business models in a smart metering scenario may be dictated by privacy regulation and data sharing arrangements. It is therefore valuable for businesses to understand customers’ preferences for the visualisation of their electricity consumption and the degree to which they are willing to share it. We present results from two interviews exploring data visualisation and willingness to share personal electricity consumption information. Participants displayed a high willingness to share and a preference for access to additional information when visualising their electricity consumption.
Abstract:
We have always felt that “something very special” was happening in the 48hr and other similar game jams. This “something” is more than the intensity and challenge of the experience, although this certainly has appeal for the participants. We had an intuition that these intense 48 hour game jams exposed something pertinent to the changing shape of the Australian games industry, where we see the demise of the late 20th century large studio (the “Night Elf” model) and the growth of the small independent model. There are a large number of wider economic and cultural factors around this evolution, but our interest is specifically in the change from “industry” to “creative industry” and the growth of games as a cultural medium and art practice. If we are correct in our intuition, then illuminating this something also has important ramifications for those courses which teach game and interaction design and development. Rather than undertake a formal ethno-methodological approach, we decided to track as many of the actors in the event as possible. We documented the experience (Keith Novak’s beautiful B&W photography), the individual and their technology (IOGraph mouse tracking), the teams as a group (time-lapse photography) and movement throughout the whole space (Bluetooth phone tracking). The raw data collected has given us the opportunity to start a commentary on the “something special” happening in the 48hr.
Abstract:
The Course Quality Assurance System at Queensland University of Technology (QUT) has as its centrepiece an exemplar of data visualisation known as the Individual Course Report (ICR). This report provides every course coordinator with an annual snapshot of their performance data evaluated against QUT and national benchmarks. In this article, the impact of the ICR is explored through the case study of one undergraduate course identified as underperforming. The case study features an innovative, ethnographic approach to working with course teams and highlights the importance of context, collaboration and appropriate support in creating evidence-based action plans for course improvement.
Abstract:
The convergence of locative and social media with collaborative interfaces and data visualisation has expanded the potential of online information provision. Offering new ways for communities to share contextually specific information, it presents the opportunity to expand social media’s current focus on micro self-publishing with applications that support communities to actively address areas of local need. This paper details the design and development of a prototype application that illustrates this potential. Entitled PetSearch, it was designed in collaboration with the Animal Welfare League of Queensland to support communities to map and locate lost, found and injured pets, and to build community engagement in animal welfare issues. We argue that, while established approaches to social and locative media provide a useful foundation for designing applications to harness social capital, they must be re-envisaged if they are to effectively facilitate community collaboration. We conclude by arguing that the principles of user engagement and co-operation employed in this project can be extrapolated to other online approaches that aim to facilitate co-operative problem solving for social benefit.