827 results for data-based reporting
Abstract:
In contemporary game development circles the ‘game making jam’ has become an important rite of passage and baptism event, an exploration space, and a central indie lifestyle affirmation and community event. Game jams have recently become a focus for design researchers interested in the creative process. In this paper we tell the story of an established local game jam and our various documentation and data collection methods. We present the beginnings of the current project, which seeks to map the creative teams and their process in the space of the challenge, and which aims to enable participants to be more than the objects of the data collection. A perceived issue is that typical documentation approaches are ‘about’ the event as opposed to ‘made by’ the participants; they are thus at odds with the spirit of the jam as a phenomenon and do not really access the rich, playful potential of participant experience. In the data collection and visualisation projects described here, we focus on using collected data to re-include the participants in telling stories about their experiences of the event as a place-based experience. Our goal is to find a means to encourage production of ‘anecdata’ - data based on individual storytelling that is subjective, malleable, and resists collection via formal mechanisms - and to enable mimesis, or active narrating, on the part of the participants. We present a concept design for data as game based on the logic of early medieval maps and we reflect on how we could enable participation in the data collection itself.
Abstract:
The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though the detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm obtains a clustering of the given data; this is followed by a ranking phase that determines the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
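The abstract does not spell out the two ranking schemes, so the following is only a minimal sketch of the general idea, assuming an attribute-value-frequency score and a cluster-size score combined by rank averaging; the function names and the combination rule are illustrative, not the authors' algorithm (cluster labels are taken as given from the first phase).

```python
from collections import Counter

def avf_scores(records):
    """Attribute-value-frequency score: records whose attribute values are rare
    across the data set receive low scores and are more likely outliers."""
    n_attrs = len(records[0])
    freq = [Counter(rec[j] for rec in records) for j in range(n_attrs)]
    return [sum(freq[j][rec[j]] for j in range(n_attrs)) / n_attrs for rec in records]

def cluster_scores(labels):
    """Cluster-structure score: members of very small clusters are treated as
    more likely outliers, so the score is simply the size of the record's cluster."""
    sizes = Counter(labels)
    return [sizes[lab] for lab in labels]

def rank_outliers(records, labels, top_k=5):
    """Combine the two independent rankings by averaging rank positions;
    the lowest combined rank corresponds to the most likely outlier."""
    def ranks(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        r = [0] * len(scores)
        for pos, i in enumerate(order):
            r[i] = pos
        return r
    combined = [(ra + rc) / 2
                for ra, rc in zip(ranks(avf_scores(records)), ranks(cluster_scores(labels)))]
    return sorted(range(len(records)), key=lambda i: combined[i])[:top_k]
```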
Abstract:
We develop and test a method to estimate relative abundance from catch and effort data using neural networks. Most stock assessment models use time series of relative abundance as their major source of information on abundance levels. These time series of relative abundance are frequently derived from catch-per-unit-of-effort (CPUE) data, using generalized linear models (GLMs). GLMs are used to attempt to remove variation in CPUE that is not related to the abundance of the population. However, GLMs are restricted in the types of relationships they can represent between CPUE and the explanatory variables. An alternative approach is to use structural models based on scientific understanding to develop complex non-linear relationships between CPUE and the explanatory variables. Unfortunately, the scientific understanding required to develop these models may not be available. In contrast to structural models, neural networks use the data to estimate the structure of the non-linear relationship between CPUE and the explanatory variables. Therefore, neural networks may provide a better alternative when the structure of the relationship is uncertain. We use simulated data based on a habitat-based method to test the neural network approach and to compare it to the GLM approach. Cross-validation and simulation tests show that the neural network performed better than nominal effort and the GLM approach. However, the improvement over GLMs is not substantial. We applied the neural network model to CPUE data for bigeye tuna (Thunnus obesus) in the Pacific Ocean.
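As a rough illustration of the comparison being described (not the authors' implementation), the sketch below standardises CPUE with a small feed-forward neural network and with a linear model on log(CPUE) as a GLM stand-in, comparing the two by cross-validation; the input file and column names (year, month, lat, lon, cpue) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("cpue_records.csv")                       # hypothetical input file
X = pd.get_dummies(df[["year", "month"]].astype(str)).join(df[["lat", "lon"]])
y = np.log(df["cpue"] + 1e-6)                              # log response; small constant guards zero catches

glm = LinearRegression()                                   # GLM stand-in: linear model on log(CPUE)
nn = MLPRegressor(hidden_layer_sizes=(10, 10), max_iter=5000, random_state=0)

for name, model in [("GLM", glm), ("Neural network", nn)]:
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {score:.3f}")
```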
Peat multi-proxy data from Mannikjarve bog as indicators of late Holocene climate changes in Estonia
Abstract:
As part of a wider project on European climate change over the past 4500 years, a 4.5-m peat core was taken from a lawn microform on Mannikjarve bog, Estonia. Several methods were used to yield proxy-climate data: (i) a quadrat and leaf-count method for plant macrofossil data, (ii) testate amoebae analysis, and (iii) colorimetric determination of peat humification. These data are provided with an exceptionally high resolution and precise chronology. Changes in bog surface wetness were inferred using Detrended Correspondence Analysis (DCA) and zonation of macrofossil data, particularly concerning the occurrence of Sphagnum balticum, and a transfer function for water-table depth for testate amoebae data. Based on the results, periods of high bog surface wetness appear to have occurred at c. 3100, 3010-2990, 2300, 1750-1610, 1510, 1410, 1110, 540 and 310 cal. yr BP, and during four longer periods between c. 3170 and 2850 cal. yr BP, 2450 and 2000 cal. yr BP, 1770 and 1530 cal. yr BP and in the period from 880 cal. yr BP until the present. In the period between 1770 and 1530 cal. yr BP, the extension or initiation of a hollow microtope occurred, which corresponds with other research results from Mannikjarve bog. This and other changes towards increasing bog surface wetness may be responses to colder temperatures and the predominance of a more continental climate in the region, which favoured the development of bog microdepressions and a complex bog microtopography. Located in the border zone of oceanic and continental climatic sectors, in an area almost without land uplift, this study site may provide valuable information about changes in palaeohydrological and palaeoclimatological conditions in the northern parts of the eastern Baltic Sea region.
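The abstract mentions a transfer function for water-table depth from testate amoebae data; as a hedged illustration of how such a transfer function can work in its simplest form (weighted averaging, not necessarily the model used in this study), consider the following sketch, where the array shapes and variable names are assumptions.

```python
import numpy as np

def wa_optima(train_counts, train_wtd):
    """Weighted-averaging optima: each taxon's optimum is the abundance-weighted
    mean of the water-table depths observed in the modern training set.
    train_counts: samples x taxa count matrix; train_wtd: water-table depth per sample."""
    rel = train_counts / train_counts.sum(axis=1, keepdims=True)
    return (rel * train_wtd[:, None]).sum(axis=0) / rel.sum(axis=0)

def wa_reconstruct(fossil_counts, optima):
    """Reconstruct water-table depth for fossil samples as the abundance-weighted
    mean of taxon optima (simple weighted averaging, no deshrinking)."""
    rel = fossil_counts / fossil_counts.sum(axis=1, keepdims=True)
    return (rel * optima).sum(axis=1)
```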
Abstract:
We have developed an in-house pipeline for the processing and analyses of sequence data generated during Illumina technology-based metagenomic studies of the human gut microbiota. Each component of the pipeline has been selected following comparative analysis of available tools; however, the modular nature of the software facilitates replacement of any individual component with an alternative should a better tool become available in due course. The pipeline consists of quality analysis and trimming followed by taxonomic filtering of sequence data allowing reads associated with samples to be binned according to whether they represent human, prokaryotic (bacterial/archaeal), viral, parasite, fungal or plant DNA. Viral, parasite, fungal and plant DNA can be assigned to species level on a presence/absence basis, allowing – for example – identification of dietary intake of plant-based foodstuffs and their derivatives. Prokaryotic DNA is subject to taxonomic and functional analyses, with assignment to taxonomic hierarchies (kingdom, class, order, family, genus, species, strain/subspecies) and abundance determination. After de novo assembly of sequence reads, genes within samples are predicted and used to build a non-redundant catalogue of genes. From this catalogue, per-sample gene abundance can be determined after normalization of data based on gene length. Functional annotation of genes is achieved through mapping of gene clusters against KEGG proteins and through InterProScan. The pipeline is undergoing validation using the human faecal metagenomic data of Qin et al. (2014, Nature 513, 59–64). Outputs from the pipeline allow development of tools for the integration of metagenomic and metabolomic data, moving metagenomic studies beyond determination of gene richness and representation towards microbial-metabolite mapping. There is scope to improve the outputs from viral, parasite, fungal and plant DNA analyses, depending on the depth of sequencing associated with samples. The pipeline can easily be adapted for the analyses of environmental and non-human animal samples, and for use with data generated via non-Illumina sequencing platforms.
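For the gene-length normalisation step mentioned above, a minimal sketch could look like the following; it assumes a genes-by-samples count table and a per-gene length vector, and is not the pipeline's actual code.

```python
import pandas as pd

def length_normalised_abundance(read_counts: pd.DataFrame, gene_lengths: pd.Series) -> pd.DataFrame:
    """Per-sample gene abundance normalised by gene length.

    read_counts: genes x samples matrix of mapped-read counts.
    gene_lengths: gene length in bp, indexed by the same gene identifiers.
    Counts are converted to reads per kilobase and then rescaled so that the
    abundances within each sample sum to 1.
    """
    rpk = read_counts.div(gene_lengths / 1_000, axis=0)
    return rpk.div(rpk.sum(axis=0), axis=1)
```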
Abstract:
This interactive resource introduces Social Science students to the recognition and interpretation of data contained in a table. The RLO uses data based on the causes of death of Rock and R&B musicians. When viewing the object, note that the panel on the left, generated by the repository, can be dragged sideways to view the learning object full screen. Item from RLO-CETL.
Abstract:
A project to identify metrics for assessing the quality of open data based on the needs of small voluntary sector organisations in the UK and India. For this project we assumed that the purpose of open data metrics is to determine the value of a group of open datasets to a defined community of users. We adopted a much more user-centred approach than most open data research, using small structured workshops to identify users’ key problems and then working from those problems to understand how open data can help address them and what the key attributes of the data are if it is to be successful. We then piloted different metrics that might be used to measure the presence of those attributes. The result was six metrics that we assessed for validity, reliability, discrimination, transferability and comparability. This user-centred approach to open data research highlighted some fundamental issues with expanding the use of open data beyond its enthusiast base.
Abstract:
Eye tracking has become a preponderant technique in the evaluation of user interaction and behaviour with study objects in defined contexts. Common eye tracking data representation techniques offer valuable input regarding user interaction and eye gaze behaviour, namely through the measurement of fixations and saccades. However, these and other techniques may be insufficient for the representation of acquired data in specific studies, namely because of the complexity of the study object being analysed. This paper contributes a summary of data representation and information visualization techniques used in data analysis within different contexts (advertising, websites, television news and video games). Additionally, several methodological approaches are presented, which resulted from studies developed and under development at CETAC.MEDIA - Communication Sciences and Technologies Research Centre. In the studies described, traditional data representation techniques were insufficient; new approaches were therefore necessary, and new forms of representing data, based on common techniques, were developed with the objective of improving communication and information strategies. For each of these studies, a brief summary of the contribution to its respective area is presented, as well as the data representation techniques used and some of the results obtained.
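As one concrete example of the kind of low-level measure such studies build on, here is a dispersion-threshold fixation detector; this is an illustrative I-DT-style sketch, not the approach used at CETAC.MEDIA, and the threshold values are arbitrary.

```python
def detect_fixations(gaze, max_dispersion=25.0, min_duration=5):
    """Dispersion-threshold (I-DT style) fixation detection.

    gaze: list of (x, y) gaze samples recorded at a fixed sampling rate.
    Returns (start_index, end_index, centroid) for each detected fixation.
    """
    def dispersion(window):
        xs, ys = zip(*window)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations, start = [], 0
    while start + min_duration <= len(gaze):
        end = start + min_duration
        if dispersion(gaze[start:end]) <= max_dispersion:
            # grow the window while the points stay within the dispersion threshold
            while end < len(gaze) and dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            xs, ys = zip(*gaze[start:end])
            fixations.append((start, end, (sum(xs) / len(xs), sum(ys) / len(ys))))
            start = end
        else:
            start += 1
    return fixations
```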
Abstract:
A common problem in many data-based modelling algorithms, such as associative memory networks, is the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to effectively tackle this problem. A new, simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental-design-based criteria were used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, while in the later stage the A-optimality design criterion is incorporated into a new composite cost function that minimises model prediction error as well as penalising the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high-dimensional inputs.
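To make the role of the A-optimality criterion concrete, here is a minimal numerical sketch (illustrative only; the paper's lower bound and exact composite cost are not reproduced): the criterion is the trace of the inverse information matrix, and the second-stage cost combines prediction error with a variance penalty.

```python
import numpy as np

def a_optimality(X):
    """A-optimality design criterion: trace of the inverse information matrix
    (X^T X)^-1; smaller values imply lower average parameter variance."""
    return np.trace(np.linalg.inv(X.T @ X))

def composite_cost(X, y, lam=0.1):
    """Illustrative composite cost: mean squared prediction error of the
    least-squares fit plus a penalty proportional to the A-optimality value."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ theta
    return residual @ residual / len(y) + lam * a_optimality(X)
```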
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting these data to transform them into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behaviour from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modelling, empirical data modelling, knowledge discovery, data mining, and data fusion.
Abstract:
Measured process data normally contain inaccuracies because the measurements are obtained using imperfect instruments. As well as random errors, one can expect systematic bias caused by miscalibrated instruments, or outliers caused by process peaks such as sudden power fluctuations. Data reconciliation is the adjustment of a set of process data, based on a model of the process, so that the derived estimates conform to natural laws. In this paper, techniques for the detection and identification of both systematic bias and outliers in dynamic process data are presented. A novel technique for the detection and identification of systematic bias is formulated and presented. The problem of detecting, identifying and eliminating outliers is also treated using a modified version of a previously available clustering technique. These techniques are then combined to provide a global dynamic data reconciliation (DDR) strategy. The algorithms presented are tested in isolation and in combination using dynamic simulations of two continuous stirred tank reactors (CSTRs).
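As a minimal, steady-state illustration of what data reconciliation does (the paper treats the dynamic case and adds bias and outlier handling, which this sketch omits): measurements are adjusted, weighted by their error variances, so that the estimates satisfy a linear process model.

```python
import numpy as np

def reconcile(y, A, V):
    """Linear data reconciliation: adjust measurements y so the estimates satisfy
    the process model A @ x = 0, minimising the variance-weighted adjustment.
    Closed-form solution of the constrained weighted least-squares problem."""
    correction = V @ A.T @ np.linalg.solve(A @ V @ A.T, A @ y)
    return y - correction

# toy example: three flows obeying the balance f1 - f2 - f3 = 0
y = np.array([10.2, 6.1, 3.4])          # measured values
A = np.array([[1.0, -1.0, -1.0]])       # balance constraint
V = np.diag([0.1, 0.05, 0.05])          # measurement error variances
print(reconcile(y, A, V))               # reconciled flows satisfy the balance exactly
```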
Abstract:
Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data-based quotation systems to select the best suppliers for the customers (airlines). Data quantity and quality become a key issue in determining the success of an MRO job, since cost and quality benchmarks need to be achieved. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality, and enables significantly more precise MRO job quotations. Regular expressions were utilised to analyse descriptive textual feedback (i.e. engineers’ reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
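A toy sketch of the regular-expression step is shown below; the actual report vocabulary, fields and patterns used in the paper are not given in the abstract, so everything here is illustrative.

```python
import re

# Illustrative patterns only; the real system's fields and vocabulary are not described here.
PART_RE = re.compile(r"\bP/N[:\s]*([A-Z0-9-]+)", re.IGNORECASE)
DEFECT_RE = re.compile(r"\b(crack(?:ed)?|corrosion|leak(?:ing)?|wear|delamination)\b", re.IGNORECASE)
ACTION_RE = re.compile(r"\b(replaced|repaired|overhauled|inspected)\b", re.IGNORECASE)

def normalise_report(text):
    """Turn a free-text engineer's report into a normalised record that can be
    queried when building a job quotation."""
    return {
        "parts": PART_RE.findall(text),
        "defects": [d.lower() for d in DEFECT_RE.findall(text)],
        "actions": [a.lower() for a in ACTION_RE.findall(text)],
    }

print(normalise_report("P/N 1234-AB cracked at flange; part replaced and inspected."))
# {'parts': ['1234-AB'], 'defects': ['cracked'], 'actions': ['replaced', 'inspected']}
```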
Abstract:
This technique paper describes a novel method for quantitatively and routinely identifying auroral breakup following substorm onset using the Time History of Events and Macroscale Interactions During Substorms (THEMIS) all-sky imagers (ASIs). Substorm onset is characterised by a brightening of the aurora that is followed by auroral poleward expansion and auroral breakup. This breakup can be identified by a sharp increase in the auroral intensity i(t) and the time derivative of auroral intensity i'(t). Utilising both i(t) and i'(t), we have developed an algorithm for identifying the time interval and spatial location of auroral breakup during the substorm expansion phase within the field of view of ASI data, based solely on quantifiable characteristics of the optical auroral emissions. We compare the time interval determined by the algorithm to independently identified auroral onset times from three previously published studies. In each case the time interval determined by the algorithm is within error of the onset independently identified by the prior studies. We further show the utility of the algorithm by comparing the breakup intervals determined using the automated algorithm to an independent list of substorm onset times. We demonstrate that up to 50% of the breakup intervals characterised by the algorithm are within the uncertainty of the times identified in the independent list. The quantitative description and routine identification of an interval of auroral brightening during the substorm expansion phase provides a foundation for unbiased statistical analysis of the aurora and a new scientific tool for probing the physics of the auroral substorm and for aiding the identification of the processes leading to auroral substorm onset.
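A hedged sketch of the thresholding idea described above follows; it is not the published algorithm, and the baseline window, threshold multipliers and "first contiguous run" rule are all illustrative choices.

```python
import numpy as np

def breakup_interval(t, i, k_intensity=2.0, k_derivative=2.0):
    """Flag a candidate breakup interval as the first contiguous span where both
    the intensity i(t) and its time derivative i'(t) exceed thresholds set at a
    multiple of the pre-onset standard deviation above the pre-onset mean."""
    di = np.gradient(i, t)
    baseline = slice(0, len(i) // 4)                 # assume the first quarter is pre-onset
    thr_i = i[baseline].mean() + k_intensity * i[baseline].std()
    thr_di = di[baseline].mean() + k_derivative * di[baseline].std()
    active = (i > thr_i) & (di > thr_di)
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return None
    gaps = np.flatnonzero(np.diff(idx) > 1)          # end of the first contiguous run
    end = idx[gaps[0]] if gaps.size else idx[-1]
    return t[idx[0]], t[end]
```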
Abstract:
Recent years have seen enormous advances in sequencing and array-based technologies, producing supplementary or alternative views of the genome stored in various formats and databases. Their sheer volume and differing data scope pose a challenge to jointly visualizing and integrating diverse data types. We present AmalgamScope, a new interactive software tool focused on assisting scientists with the annotation of the human genome and particularly the integration of annotation files from multiple data types, using gene identifiers and genomic coordinates. Supported platforms include next-generation sequencing and microarray technologies. The available features of AmalgamScope range from the annotation of diverse data types across the human genome to the integration of the data based on the annotation information and the visualization of the merged files within chromosomal regions or the whole genome. Additionally, users can define custom transcriptome library files for any species and use the tool's distant-server file exchange options.
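The core integration step, merging annotation tables from different platforms on shared gene identifiers and ordering the result by genomic coordinates, can be sketched as follows; the file names and column labels are hypothetical, and this is not AmalgamScope's code.

```python
import pandas as pd

# Hypothetical inputs: one annotation table per platform, both keyed by gene_id.
seq = pd.read_csv("ngs_annotation.tsv", sep="\t")        # e.g. gene_id, chrom, start, end, variant_effect
array = pd.read_csv("array_annotation.tsv", sep="\t")    # e.g. gene_id, probe_id, log2_fold_change

# Outer merge keeps genes that appear in only one platform, then order by coordinates.
merged = seq.merge(array, on="gene_id", how="outer")
merged.sort_values(["chrom", "start"]).to_csv("merged_annotation.tsv", sep="\t", index=False)
```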
Abstract:
The General Election for the 56th United Kingdom Parliament was held on 7 May 2015. Tweets related to UK politics, not only those with the specific hashtag “#GE2015”, were collected in the period between March 1 and May 31, 2015. The resulting dataset contains over 28 million tweets, for a total of 118 GB in uncompressed format or 15 GB in compressed format. This study describes the method that was used to collect the tweets, presents some analysis of the data, including a political sentiment index, and outlines interesting research directions on Big Social Data based on Twitter microblogging.
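As a toy illustration of a daily political sentiment index over such a collection (the paper's actual index construction is not described in the abstract; the word lists, field names and file layout below are assumptions):

```python
import json
from collections import defaultdict

POSITIVE = {"win", "hope", "support", "great", "strong"}     # toy lexicons, not from the paper
NEGATIVE = {"lose", "fail", "crisis", "cuts", "fear"}

daily = defaultdict(lambda: [0, 0])       # date -> [positive hits, negative hits]
with open("ge2015_tweets.jsonl") as fh:   # hypothetical file: one {"date", "text"} object per line
    for line in fh:
        tweet = json.loads(line)
        words = set(tweet["text"].lower().split())
        daily[tweet["date"]][0] += len(words & POSITIVE)
        daily[tweet["date"]][1] += len(words & NEGATIVE)

for date, (pos, neg) in sorted(daily.items()):
    index = (pos - neg) / max(pos + neg, 1)   # simple index in [-1, 1]
    print(date, round(index, 3))
```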