920 resultados para Data representation
Resumo:
Over the past couple of decades, the cultural field formerly known as ‘domestic’, and later ‘personal’ photography has been remediated and transformed as part of the social web, with its convergence of personal expression, interpersonal communication, and online social networks (most recently via platforms like Flickr, Facebook and Twitter). Meanwhile, the Digital Storytelling movement (involving the workshop-based production of short autobiographical videos) from its beginnings in the mid 1990s relied heavily on the narrative power of the personal photograph, often sourced from family albums, and later from online archives. This paper addresses the new issues arising for the politics of self-representation and personal photography in the era of social media, focusing particularly on the consequences of online image-sharing. It discusses in detail the practices of selection, curation, manipulation and editing of personal photographic images among a group of activist-oriented queer digital storytellers who have in common a stated desire to share their personal stories in pursuit of social change, and whose stories often aim to address both intimate and antagonistic publics.
Resumo:
The Bluetooth technology is being increasingly used to track vehicles throughout their trips, within urban networks and across freeway stretches. One important opportunity offered by this type of data is the measurement of Origin-Destination patterns, emerging from the aggregation and clustering of individual trips. In order to obtain accurate estimations, however, a number of issues need to be addressed, through data filtering and correction techniques. These issues mainly stem from the use of the Bluetooth technology amongst drivers, and the physical properties of the Bluetooth sensors themselves. First, not all cars are equipped with discoverable Bluetooth devices and the Bluetooth-enabled vehicles may belong to some small socio-economic groups of users. Second, the Bluetooth datasets include data from various transport modes; such as pedestrian, bicycles, cars, taxi driver, buses and trains. Third, the Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey for some vehicles may become a latent pattern that will need to be extracted from the data. Finally, sensors that are in close proximity to each other may have overlapping detection areas, thus making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. Further, we propose a methodology that can be followed, in order to cleanse, correct and aggregate Bluetooth data. We postulate that the methods introduced by this paper are the first crucial steps that need to be followed in order to compute accurate Origin-Destination matrices in urban road networks.
Resumo:
Migraine, with and without aura (MA and MO), is a prevalent and complex neurovascular disorder that is likely to be influenced by multiple genes some of which may be capable of causing vascular changes leading to disease onset. This study was conducted to determine whether the ACE I/D gene variant is involved in migraine risk and whether this variant might act in combination with the previously implicated MTHFR C677T genetic variant in 270 migraine cases and 270 matched controls. Statistical analysis of the ACE I/D variant indicated no significant difference in allele or genotype frequencies (P > 0.05). However, grouping of genotypes showed a modest, yet significant, over-representation of the DD/ID genotype in the migraine group (88%) compared to controls (81%) (OR of 1.64, 95% CI: 1.00–2.69, P = 0.048). Multivariate analysis, including genotype data for the MTHFR C677T, provided evidence that the MTHFR (TT) and ACE (ID/DD) genotypes act in combination to increase migraine susceptibility (OR = 2.18, 95% CI: 1.15–4.16, P = 0.018). This effect was greatest for the MA subtype where the genotype combination corresponded to an OR of 2.89 (95% CI:1.47–5.72, P = 0.002). In Caucasians, the ACE D allele confers a weak independent risk to migraine susceptibility and also appears to act in combination with the C677T variant in the MTHFR gene to confer a stronger influence on the disease.
Resumo:
This thesis is a study for automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can largely reduce noises in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for the study of Data Mining and Web Intelligence.
Resumo:
Aerosol mass spectrometers (AMS) are powerful tools in the analysis of the chemical composition of airborne particles, particularly organic aerosols which are gaining increasing attention. However, the advantages of AMS in providing on-line data can be outweighed by the difficulties involved in its use in field measurements at multiple sites. In contrast to the on-line measurement by AMS, a method which involves sample collection on filters followed by subsequent analysis by AMS could significantly broaden the scope of AMS application. We report the application of such an approach to field studies at multiple sites. An AMS was deployed at 5 urban schools to determine the sources of the organic aerosols at the schools directly. PM1 aerosols were also collected on filters at these and 20 other urban schools. The filters were extracted with water and the extract run through a nebulizer to generate the aerosols, which were analysed by an AMS. The mass spectra from the samples collected on filters at the 5 schools were found to have excellent correlations with those obtained directly by AMS, with r2 ranging from 0.89 to 0.98. Filter recoveries varied between the schools from 40 -115%, possibly indicating that this method provides qualitative rather than quantitative information. The stability of the organic aerosols on Teflon filters was demonstrated by analysing samples stored for up to two years. Application of the procedure to the remaining 20 schools showed that secondary organic aerosols were the main source of aerosols at the majority of the schools. Overall, this procedure provides accurate representation of the mass spectra of ambient organic aerosols and could facilitate rapid data acquisition at multiple sites where AMS could not be deployed for logistical reasons.
Resumo:
Literature is limited in its knowledge of the Bluetooth protocol based data acquisition process and in the accuracy and reliability of the analysis performed using the data. This paper extends the body of knowledge surrounding the use of data from the Bluetooth Media Access Control Scanner (BMS) as a complementary traffic data source. A multi layer simulation model named Traffic and Communication Simulation (TCS) is developed. TCS is utilised to model the theoretical properties of the BMS data and analyse the accuracy and reliability of travel time estimation using the BMS data.
Resumo:
The Macroscopic Fundamental Diagram (MFD) relates space-mean density and flow, and the existence with dynamic features was confirmed in congested urban network with real data set from loop detectors and taxi probes. Since the MFD represents the area-wide network traffic performances, it gives foundations for perimeter control strategies and an area traffic state estimation enabling area-based network control. However, limited works have been reported on real world example from signalised arterial network. This paper fuses data from multiple sources (Bluetooth, Loops and Signals) and develops a framework for the development of the MFD for Brisbane. Existence of the MFD in Brisbane network is confirmed. Different MFDs (from whole network and several sub regions) are evaluated to discover the spatial partitioning in network performance representation.
Resumo:
A significant amount of speech data is required to develop a robust speaker verification system, but it is difficult to find enough development speech to match all expected conditions. In this paper we introduce a new approach to Gaussian probabilistic linear discriminant analysis (GPLDA) to estimate reliable model parameters as a linearly weighted model taking more input from the large volume of available telephone data and smaller proportional input from limited microphone data. In comparison to a traditional pooled training approach, where the GPLDA model is trained over both telephone and microphone speech, this linear-weighted GPLDA approach is shown to provide better EER and DCF performance in microphone and mixed conditions in both the NIST 2008 and NIST 2010 evaluation corpora. Based upon these results, we believe that linear-weighted GPLDA will provide a better approach than pooled GPLDA, allowing for the further improvement of GPLDA speaker verification in conditions with limited development data.
Resumo:
We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed meth-od outperforms state-of-the-art text clustering approaches.
Resumo:
Travel time prediction has long been the topic of transportation research. But most relevant prediction models in the literature are limited to motorways. Travel time prediction on arterial networks is challenging due to involving traffic signals and significant variability of individual vehicle travel time. The limited availability of traffic data from arterial networks makes travel time prediction even more challenging. Recently, there has been significant interest of exploiting Bluetooth data for travel time estimation. This research analysed the real travel time data collected by the Brisbane City Council using the Bluetooth technology on arterials. Databases, including experienced average daily travel time are created and classified for approximately 8 months. Thereafter, based on data characteristics, Seasonal Auto Regressive Integrated Moving Average (SARIMA) modelling is applied on the database for short-term travel time prediction. The SARMIA model not only takes the previous continuous lags into account, but also uses the values from the same time of previous days for travel time prediction. This is carried out by defining a seasonality coefficient which improves the accuracy of travel time prediction in linear models. The accuracy, robustness and transferability of the model are evaluated through comparing the real and predicted values on three sites within Brisbane network. The results contain the detailed validation for different prediction horizons (5 min to 90 minutes). The model performance is evaluated mainly on congested periods and compared to the naive technique of considering the historical average.
Resumo:
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g. GPS or RFID sensors) as it does not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players need to be tracked continuously over long-periods of time (e.g. 35 minutes in field-hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix-up missed or false detections. However, in instances where a human can not intervene due to the sheer amount of data being generated - this data can not be used due to the missing/noisy data. In this paper, we investigate two representations based on raw player detections (and not tracking) which are immune to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
Resumo:
In this paper, we describe a method to represent and discover adversarial group behavior in a continuous domain. In comparison to other types of behavior, adversarial behavior is heavily structured as the location of a player (or agent) is dependent both on their teammates and adversaries, in addition to the tactics or strategies of the team. We present a method which can exploit this relationship through the use of a spatiotemporal basis model. As players constantly change roles during a match, we show that employing a "role-based" representation instead of one based on player "identity" can best exploit the playing structure. As vision-based systems currently do not provide perfect detection/tracking (e.g. missed or false detections), we show that our compact representation can effectively "denoise" erroneous detections as well as enabe temporal analysis, which was previously prohibitive due to the dimensionality of the signal. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labelled data.
Resumo:
The promise of ‘big data’ has generated a significant deal of interest in the development of new approaches to research in the humanities and social sciences, as well as a range of important critical interventions which warn of an unquestioned rush to ‘big data’. Drawing on the experiences made in developing innovative ‘big data’ approaches to social media research, this paper examines some of the repercussions for the scholarly research and publication practices of those researchers who do pursue the path of ‘big data’–centric investigation in their work. As researchers import the tools and methods of highly quantitative, statistical analysis from the ‘hard’ sciences into computational, digital humanities research, must they also subscribe to the language and assumptions underlying such ‘scientificity’? If so, how does this affect the choices made in gathering, processing, analysing, and disseminating the outcomes of digital humanities research? In particular, is there a need to rethink the forms and formats of publishing scholarly work in order to enable the rigorous scrutiny and replicability of research outcomes?
Resumo:
It is only in recent years that the critical role that spatial data can play in disaster management and strengthening community resilience has been recognised. The recognition of this importance is singularly evident from the fact that in Australia spatial data is considered as soft infrastructure. In the aftermath of every disaster this importance is being increasingly strengthened with state agencies paying greater attention to ensuring the availability of accurate spatial data based on the lessons learnt. For example, the major flooding in Queensland during the summer of 2011 resulted in a comprehensive review of responsibilities and accountability for the provision of spatial information during such natural disasters. A high level commission of enquiry completed a comprehensive investigation of the 2011 Brisbane flood inundation event and made specific recommendations concerning the collection of and accessibility to spatial information for disaster management and for strengthening community resilience during and after a natural disaster. The lessons learnt and processes implemented were subsequently tested by natural disasters during subsequent years. This paper provides an overview of the practical implementation of the recommendations of the commission of enquiry. It focuses particularly on the measures adopted by the state agencies with the primary role for managing spatial data and the evolution of this role in Queensland State, Australia. The paper concludes with a review of the development of the role and the increasing importance of spatial data as an infrastructure for disaster planning and management which promotes the strengthening of community resilience.
Resumo:
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this through using a robust cross-likelihood ratio (CLR) threshold stopping criterion for clustering, as opposed to the original stopping criterion of two speakers used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without previous knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.