424 results for Blog datasets
Abstract:
This paper addresses the development of trust in the use of Open Data through the incorporation of appropriate authentication and integrity parameters, for use by end-user Open Data application developers, in an architecture for trustworthy Open Data Services. The advantage of this architecture is that it is far more scalable: it is not another certificate-based hierarchy with the attendant problems of certificate revocation management. With the use of a Public File, if the key is compromised, the single responsible entity simply replaces the key pair with a new one and re-performs the data file signing process. Under the proposed architecture, the Open Data environment does not interfere with the internal security schemes that the entity might employ. However, the architecture incorporates, when needed, parameters from the entity (e.g. the person who authorized publishing as Open Data) at the time that datasets are created or added.
Abstract:
This chapter discusses the methodological aspects and empirical findings of a large-scale, funded project investigating public communication through social media in Australia. The project concentrates on Twitter, but we approach it as representative of broader current trends toward the integration of large datasets and computational methods into media and communication studies in general, and social media scholarship in particular. The research discussed in this chapter aims to empirically describe networks of affiliation and interest in the Australian Twittersphere, while reflecting on the methodological implications and imperatives of ‘big data’ in the humanities. Using custom network crawling technology, we have conducted a snowball crawl of Twitter accounts operated by Australian users to identify more than one million users and their follower/followee relationships, and have mapped their interconnections. In itself, the map provides an overview of the major clusters of densely interlinked users, largely centred on shared topics of interest (from politics through arts to sport) and/or sociodemographic factors (geographic origins, age groups). Our map of the Twittersphere is the first of its kind for the Australian part of the global Twitter network, and also provides a first independent and scholarly estimation of the size of the total Australian Twitter population. In combination with our investigation of participation patterns in specific thematic hashtags, the map also enables us to examine which areas of the underlying follower/followee network are activated in the discussion of specific current topics – allowing new insights into the extent to which particular topics and issues are of interest to specialised niches or to the Australian public more broadly. 
Specifically, we examine the Twittersphere footprint of dedicated political discussion, under the #auspol hashtag, and compare it with the heightened, broader interest in Australian politics during election campaigns, using #ausvotes; we explore the different patterns of Twitter activity across the map for major television events (the popular competitive cooking show #masterchef, the British #royalwedding, and the annual #stateoforigin Rugby League sporting contest); and we investigate the circulation of links to the articles published by a number of major Australian news organisations across the network. Such analysis, which combines the ‘big data’-informed map and a close reading of individual communicative phenomena, makes it possible to trace the dynamic formation and dissolution of issue publics against the backdrop of longer-term network connections, and the circulation of information across these follower/followee links. Such research sheds light on the communicative dynamics of Twitter as a space for mediated social interaction. Our work demonstrates the possibilities inherent in the current ‘computational turn’ (Berry, 2010) in the digital humanities, as well as adding to the development and critical examination of methodologies for dealing with ‘big data’ (boyd and Crawford, 2011). Our tools and methods for doing Twitter research, released under Creative Commons licences through our project Website, provide the basis for replicable and verifiable digital humanities research on the processes of public communication which take place through this important new social network.
Abstract:
Building on hashtag datasets gathered since January 2011, this paper will compare patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing ‘big data’ (boyd & Crawford, 2011), we will examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time, and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we will identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere, and highlight the roles played by key boundary riders connecting both language spheres. Further, we will examine the URLs shared in these hashtags by Twitter participants, to identify the most prominent overall information sources, examine differences in the information diet experienced by English- and Arabic-language users, and investigate whether there are any online sources whose URLs are transcending language boundaries more frequently than others.
Abstract:
Weblogs, or blogs, constitute a form and genre of online publishing that emerged in the mid-1990s as a logical consequence of the confluence of personal and professional home pages and new web publishing technologies. To overcome technological limitations, where news updates had to be manually inserted by editing the underlying HTML code, the early content-management systems in the second half of the 1990s built on server-side database technology to dynamically generate web pages; this enabled more convenient and more frequent content updates. Weblogs utilised such technologies to provide an up-to-date news feed, presenting individual news items in reverse chronological order. Most blogging platforms provide commenting functions that enable readers to respond to and discuss individual blog posts...
Abstract:
Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built with multiple relevance graded data. By handling tagging data used in recommendation systems as an ordinal relevance set of {negative,null,positive}, we propose to build a DCG based recommendation model. We present an efficient and novel learning-to-rank method by optimizing DCG for a recommendation model using the tagging data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that the method is scalable and outperforms the benchmarking methods by generating a quality top-N item recommendation list.
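For readers unfamiliar with the measure, the following is a minimal sketch of how DCG scores a ranked list. It is not the learning-to-rank method the abstract proposes. The numeric grades assigned to {negative, null, positive} (-1, 0, 1) and the names `GRADES` and `dcg_at_k` are assumptions made only for illustration.

```python
import math

# Hypothetical numeric grades for the ordinal set {negative, null, positive};
# the abstract does not state which values the authors use.
GRADES = {"negative": -1, "null": 0, "positive": 1}

def dcg_at_k(relevances, k):
    """DCG over the first k items of a ranked list of graded relevances.

    Uses the common form: sum of rel_i / log2(i + 1), with ranks i starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )

# A top-3 list in which a positively tagged item is ranked first,
# an untagged item second, and a negatively tagged item third.
ranked = [GRADES["positive"], GRADES["null"], GRADES["negative"]]
score = dcg_at_k(ranked, 3)
```

Because the discount grows with rank, moving the negatively tagged item further down the list (or the positive item up) raises the score, which is the property a top-N recommender optimised for DCG exploits.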
Abstract:
Map-matching algorithms that utilise road segment connectivity along with other data (i.e. position, speed and heading) in the process of map-matching are normally suitable for high frequency (1 Hz or higher) positioning data from GPS. When such map-matching algorithms are applied to low frequency data (such as data from a fleet of private cars, buses or light duty vehicles, or from smartphones), their performance falls to around 70% in terms of correct link identification, especially in urban and sub-urban road networks. This level of performance may be insufficient for some real-time Intelligent Transport System (ITS) applications and services such as estimating link travel time and speed from low frequency GPS data. Therefore, this paper develops a new weight-based shortest path and vehicle trajectory aided map-matching (stMM) algorithm that enhances the map-matching of low frequency positioning data on a road map. The well-known A* search algorithm is employed to derive the shortest path between two points while taking into account both link connectivity and turn restrictions at junctions. In the developed stMM algorithm, two additional weights related to the shortest path and vehicle trajectory are considered: one shortest path-based weight is related to the distance along the shortest path and the distance along the vehicle trajectory, while the other is associated with the heading difference of the vehicle trajectory. The developed stMM algorithm is tested using a series of real-world datasets of varying frequencies (i.e. 1 s, 5 s, 30 s, 60 s sampling intervals). A high-accuracy integrated navigation system (a high-grade inertial navigation system and a carrier-phase GPS receiver) is used to measure the accuracy of the developed algorithm. The results suggest that the algorithm identifies 98.9% of the links correctly for GPS data sampled every 30 s. 
Omitting the information from the shortest path and vehicle trajectory, the accuracy of the algorithm reduces to about 73% in terms of correct link identification. The algorithm can process on average 50 positioning fixes per second making it suitable for real-time ITS applications and services.
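The shortest-path step mentioned above can be sketched as follows. This is a generic A* search that respects link connectivity and turn restrictions; it is not the stMM algorithm and omits its weighting scheme. The function name `astar_with_turns`, the data layout and the toy network are all assumptions made for illustration.

```python
import heapq
import itertools
import math

def astar_with_turns(coords, links, banned_turns, start, goal):
    """Return (cost, path) for the shortest path from start to goal, or None.

    coords: node -> (x, y)
    links: node -> list of (neighbour, link_length), i.e. directed connectivity
    banned_turns: set of (from_node, via_node, to_node) triples
    """
    def h(node):
        # Straight-line distance; admissible as long as every link is at
        # least as long as the Euclidean distance between its end nodes.
        return math.dist(coords[node], coords[goal])

    tie = itertools.count()  # tie-breaker so heap entries never compare nodes
    # The search state is (previous node, current node), so a turn
    # restriction can depend on the link used to reach a junction.
    frontier = [(h(start), next(tie), 0.0, None, start, [start])]
    best = {}
    while frontier:
        _, _, g, prev, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if best.get((prev, node), math.inf) <= g:
            continue  # this state was already expanded at equal or lower cost
        best[(prev, node)] = g
        for nxt, length in links.get(node, []):
            if (prev, node, nxt) in banned_turns:
                continue  # turn not permitted at this junction
            new_g = g + length
            heapq.heappush(
                frontier,
                (new_g + h(nxt), next(tie), new_g, node, nxt, path + [nxt]),
            )
    return None

# Toy network (hypothetical): going straight on A -> B -> C is shortest,
# unless that movement is banned at junction B.
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
links = {"A": [("B", 1.0)], "B": [("C", 1.0), ("D", 1.0)], "D": [("C", 1.5)]}

free = astar_with_turns(coords, links, set(), "A", "C")
restricted = astar_with_turns(coords, links, {("A", "B", "C")}, "A", "C")
```

Keying the search on (previous node, current node) rather than on the node alone is one simple way to encode turn restrictions: the same junction can be reached legally by one approach and not by another.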
Abstract:
Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn from concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
Abstract:
In this paper we present a new method for performing Bayesian parameter inference and model choice for low count time series models with intractable likelihoods. The method involves incorporating an alive particle filter within a sequential Monte Carlo (SMC) algorithm to create a novel pseudo-marginal algorithm, which we refer to as alive SMC^2. The advantages of this approach over competing approaches are that it is naturally adaptive, it does not involve the between-model proposals required in reversible jump Markov chain Monte Carlo, and it does not rely on potentially rough approximations. The algorithm is demonstrated on Markov process and integer autoregressive moving average models applied to real biological datasets of hospital-acquired pathogen incidence, animal health time series and the cumulative number of prion disease cases in mule deer.
Abstract:
Pilot and industrial scale dilute acid pretreatment data can be difficult to obtain due to the significant infrastructure investment required. Consequently, models of dilute acid pretreatment by necessity use laboratory scale data to determine kinetic parameters and make predictions about optimal pretreatment conditions at larger scales. In order for these recommendations to be meaningful, the ability of laboratory scale models to predict pilot and industrial scale yields must be investigated. A mathematical model of the dilute acid pretreatment of sugarcane bagasse has previously been developed by the authors. This model was able to successfully reproduce the experimental yields of xylose and short chain xylooligomers obtained at the laboratory scale. In this paper, the ability of the model to reproduce pilot scale yield and composition data is examined. It was found that in general the model overpredicted the pilot scale reactor yields by a significant margin. Models that appear very promising at the laboratory scale may have limitations when predicting yields on a pilot or industrial scale. It is difficult to say whether there are any consistent trends in optimal operating conditions between reactor scale and laboratory scale hydrolysis because of the limited reactor datasets available. Further investigation is needed to determine whether the model has some efficacy when the kinetic parameters are re-evaluated by parameter fitting to reactor scale data; however, this requires the compilation of larger datasets. Alternatively, laboratory scale mathematical models may have enhanced utility for predicting larger scale reactor performance if bulk mass transport and fluid flow considerations are incorporated into the fibre scale equations. This work reinforces the need for appropriate attention to be paid to pilot scale experimental development when moving from laboratory to pilot and industrial scales for new technologies.
Abstract:
Urbanisation significantly changes the characteristics of a catchment as natural areas are transformed to impervious surfaces such as roads, roofs and parking lots. The increased fraction of impervious surfaces leads to changes to the stormwater runoff characteristics, whilst a variety of anthropogenic activities common to urban areas generate a range of pollutants such as nutrients, solids and organic matter. These pollutants accumulate on catchment surfaces and are removed and transported by stormwater runoff and thereby contribute pollutant loads to receiving waters. In summary, urbanisation influences the stormwater characteristics of a catchment, including hydrology and water quality. Due to the growing recognition that stormwater pollution is a significant environmental problem, the implementation of mitigation strategies to improve the quality of stormwater runoff is becoming increasingly common in urban areas. A scientifically robust stormwater quality treatment strategy is an essential requirement for effective urban stormwater management. The efficient design of treatment systems is closely dependent on the state of knowledge in relation to the primary factors influencing stormwater quality. In this regard, stormwater modelling outcomes provide designers with important guidance and datasets which significantly underpin the design of effective stormwater treatment systems. Therefore, the accuracy of modelling approaches and the reliability of modelling outcomes are of particular concern. This book discusses the inherent complexity and key characteristics in the areas of urban hydrology and stormwater quality, based on the influence exerted by a range of rainfall and catchment characteristics. A comprehensive field sampling and testing programme in relation to pollutant build-up, an urban catchment monitoring programme in relation to stormwater quality and the outcomes from advanced statistical analyses provided the platform for the knowledge creation. 
Two case studies and two real-world applications are discussed to illustrate the translation of the knowledge created to practical use in relation to the role of rainfall and catchment characteristics on urban stormwater quality. An innovative rainfall classification based on stormwater quality was developed to support the effective and scientifically robust design of stormwater treatment systems. Underpinned by the rainfall classification methodology, a reliable approach for design rainfall selection is proposed in order to optimise stormwater treatment based on both stormwater quality and quantity. This is a paradigm shift from the common approach where stormwater treatment systems are designed based solely on stormwater quantity data. Additionally, how pollutant build-up and stormwater runoff quality vary with a range of catchment characteristics was also investigated. Based on the study outcomes, it can be concluded that the use of only a limited number of catchment parameters such as land use and impervious surface percentage, as is the case in current modelling approaches, could result in appreciable error in water quality estimation. The catchment characteristics incorporated into modelling should also include urban form and impervious surface area distribution. The knowledge created through the research investigations discussed in this monograph is expected to make a significant contribution to engineering practice such as hydrologic and stormwater quality modelling, stormwater treatment design and urban planning, as the study outcomes provide practical approaches and recommendations for urban stormwater quality enhancement. 
Furthermore, this monograph also demonstrates how fundamental knowledge of stormwater quality processes can be translated to provide guidance on engineering practice, the comprehensive application of multivariate data analysis techniques and a paradigm for the integrative use of computer models and mathematical models to derive practical outcomes.
Abstract:
In a previous blog I was critical of the US health care system for not using cost-effectiveness information to plan their services. Today I’m going to talk about the implementation of innovation in health services, something the US does really well compared to Australia.
Abstract:
My impression is that explicit data on the cost-effectiveness of different health care services are not valued highly by US policy makers. An example is a recent decision to approve ipilimumab for the treatment of metastatic melanoma. The extra health benefit over standard treatment is 2.1 months in previously untreated patients and the cost is $120,000 for 4 doses. This is poor value for money. Had $120,000 been allocated to an intensive lifestyle modification programme for diabetes risk (Diabet Med. 2004 Nov;21(11):1229-36) then 67 years of life or 800 months could have been returned. A massive increase in health benefits for the same costs.
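The comparison can be made explicit with a short calculation. This sketch uses only the figures quoted above; the variable names are illustrative and the result is a rough per-month comparison, not a formal cost-effectiveness analysis.

```python
# Figures quoted in the post above.
COST = 120_000            # dollars, 4 doses of ipilimumab
IPILIMUMAB_MONTHS = 2.1   # extra months of life over standard treatment
LIFESTYLE_MONTHS = 800    # months of life returned by the lifestyle programme

# Dollars spent per month of life gained under each option.
cost_per_month_ipilimumab = COST / IPILIMUMAB_MONTHS
cost_per_month_lifestyle = COST / LIFESTYLE_MONTHS

# How many times more life the same budget buys with the lifestyle programme.
benefit_ratio = LIFESTYLE_MONTHS / IPILIMUMAB_MONTHS

print(f"ipilimumab: ${cost_per_month_ipilimumab:,.0f} per month of life gained")
print(f"lifestyle:  ${cost_per_month_lifestyle:,.0f} per month of life gained")
print(f"ratio:      about {benefit_ratio:.0f} times the benefit for the same cost")
```

On these numbers the drug costs roughly $57,000 per month of life gained against $150 for the lifestyle programme, a difference of about 380-fold.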
Abstract:
The collection of basic environmental data by industry members was successful and offers a way of overcoming the problems associated with differences in scale between the environment and fisheries datasets. A simple method of collecting environmental data was developed that imposed only a small time burden on skippers, yet has the potential to provide very useful information on the same scale as the catch and effort data recorded in the logbooks. The success of this trial was aided by the natural interest of fishers in learning more about the environment in which they fish. The archival temperature-depth tags chosen proved robust, reliable and easy to use. While the use of large scale environmental data may not yield significant improvements in stock assessments for most SESSF species, fine-scale data collected from selected vessels using methods developed during this project may, in the longer term, be useful for incorporation into CPUE standardisations...
Abstract:
Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies for affective content analysis are limited to low-level features or mid-level representations, and are generally criticized for their incapacity to address the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, but have seldom been investigated for affective classification of facial images towards practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (or non-laboratory) data using facial expressions, where many “real-world” challenges are present, including pose, illumination, and size variations. The proposed method is novel, with its framework designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors, and a combination of point-based texture and geometry. Performance comparisons of several key parameters of relevant algorithms are conducted to explore the optimum parameters for high accuracy and fast computation speed. A comprehensive set of experiments with existing and new datasets shows that the method is fast, effective despite pose variations, appropriate for large-scale data, and as accurate as the state-of-the-art method on laboratory-based data. The proposed method was also applied to affective classification of images from the British Broadcasting Corporation (BBC) in a task typical of a practical application, providing some valuable insights.
Abstract:
Libraries have often been first adopters of many new technological innovations, such as punch cards, computers, barcodes, and e-book readers. It is thus not surprising that many libraries have embraced the advent of the internet as an opportunity to move away from just being repositories of books, towards becoming ideas stores and local network hubs for entrepreneurial thinking and new creative practices. This presentation will look at the case of “The Edge”, an initiative of the State Library of Queensland in Brisbane, Australia, to establish a digital culture centre and learning environment deliberately designed for the co-creation and co-construction of knowledge. This initiative illustrates the potential role of libraries as testing grounds for new technologies and technological practices, which is particularly relevant in the context of the NBN rollout across Australia. It also provides an example of new engagement strategies for innovative co-working spaces that are a vital element in a trend that sees professionals, creatives and designers leave their traditional places of work and embrace the city as their office.