980 results for data-projection


Relevance:

20.00%

Publisher:

Abstract:

A number of online algorithms have been developed that have small additional loss (regret) compared to the best “shifting expert”. In this model, there is a set of experts and the comparator is the best partition of the trial sequence into a small number of segments, where the expert of smallest loss is chosen in each segment. The regret is typically defined for worst-case data / loss sequences. There has been a recent surge of interest in online algorithms that combine good worst-case guarantees with much improved performance on easy data. A practically relevant class of easy data is the case when the loss of each expert is iid and there is a gap between the mean losses of the best and second-best experts. In the full information setting, the FlipFlop algorithm by De Rooij et al. (2014) combines the best of the iid-optimal Follow-The-Leader (FL) and the worst-case-safe Hedge algorithms, whereas in the bandit information case SAO by Bubeck and Slivkins (2012) competes with the iid-optimal UCB and the worst-case-safe EXP3. We ask the same question for the shifting experts problem. First, we ask which simple and efficient algorithms exist for the shifting experts problem when the loss sequence in each segment is iid with respect to a fixed but unknown distribution. Second, we ask how to efficiently unite the performance of such algorithms on easy data with worst-case robustness. A particularly intriguing open problem is the case when the comparator shifts within a small subset of experts from a large set under the assumption that the losses in each segment are iid.
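To make the contrast between the two full-information baselines named in this abstract concrete, here is a minimal Python sketch of Follow-The-Leader and Hedge on a fixed loss matrix. It is not FlipFlop and does not handle shifting comparators; the function names, the tie-breaking rule and the learning-rate choice are assumptions made for illustration only.

```python
import math

def follow_the_leader(losses):
    """Total loss of Follow-The-Leader; losses[t][i] is expert i's loss at trial t."""
    n = len(losses[0])
    cumulative = [0.0] * n
    total = 0.0
    for round_losses in losses:
        # Play the expert with the smallest cumulative loss so far
        # (ties go to the lowest index, an arbitrary choice for this sketch).
        leader = min(range(n), key=lambda i: cumulative[i])
        total += round_losses[leader]
        for i in range(n):
            cumulative[i] += round_losses[i]
    return total

def hedge(losses, eta):
    """Expected total loss of Hedge (exponential weights) with learning rate eta."""
    n = len(losses[0])
    cumulative = [0.0] * n
    total = 0.0
    for round_losses in losses:
        # Subtract the minimum before exponentiating for numerical stability.
        m = min(cumulative)
        weights = [math.exp(-eta * (c - m)) for c in cumulative]
        z = sum(weights)
        total += sum(w / z * l for w, l in zip(weights, round_losses))
        for i in range(n):
            cumulative[i] += round_losses[i]
    return total

def best_expert_loss(losses):
    """Loss of the single best expert in hindsight (no shifts)."""
    n = len(losses[0])
    return min(sum(r[i] for r in losses) for i in range(n))

# A classic worst-case sequence for FTL: the current leader is always the expert hit next.
adversarial = [[0.5, 0.0]] + [[0.0, 1.0], [1.0, 0.0]] * 50
T, n = len(adversarial), 2
eta = math.sqrt(8 * math.log(n) / T)
ftl_loss = follow_the_leader(adversarial)
hedge_loss = hedge(adversarial, eta)
best_loss = best_expert_loss(adversarial)
```

On this sequence Follow-The-Leader pays roughly twice the loss of the best expert, while Hedge stays within the standard worst-case regret bound of sqrt(T ln(n) / 2) for losses in [0, 1].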

Relevance:

20.00%

Publisher:

Abstract:

The majority of sugar mill locomotives are equipped with GPS devices from which locomotive position data is stored. Locomotive run information (e.g. start times, run destinations and activities) is electronically stored in software called TOTools. The latest software development allows TOTools to interpret historical GPS information by combining this data with run information recorded in TOTools and geographic information from a GIS application called MapInfo. As a result, TOTools is capable of summarising run activity details such as run start and finish times and shunt activities with great accuracy. This paper presents 15 reports developed to summarise run activities and speed information. The reports will be of use pre-season to assist in developing the next year's schedule and for determining priorities for investment in the track infrastructure. They will also be of benefit during the season to closely monitor locomotive run performance against the existing schedule.

Relevance:

20.00%

Publisher:

Abstract:

Available industrial energy meters offer high accuracy and reliability, but are typically expensive and low-bandwidth, making them poorly suited to multi-sensor data acquisition schemes and power quality analysis. An alternative measurement system is proposed in this paper that is highly modular, extensible and compact. To minimise cost, the device makes use of planar coreless PCB transformers to provide galvanic isolation for both power and data. Samples from multiple acquisition devices may be concentrated by a central processor before integration with existing host control systems. This paper focusses on the practical design and implementation of planar coreless PCB transformers to facilitate the module's isolated power, clock and data signal transfer. Calculations necessary to design coreless PCB transformers, and circuits designed for the transformer's practical application in the measurement module are presented. The designed transformer and each application circuit have been experimentally verified, with test data and conclusions made applicable to coreless PCB transformers in general.

Relevance:

20.00%

Publisher:

Abstract:

Smart Card Automated Fare Collection (AFC) data has been extensively exploited to understand passenger behavior, passenger segments and trip purpose, and to improve transit planning through spatial travel pattern analysis. The literature has been evolving from simple to more sophisticated methods, for example from aggregated to individual travel pattern analysis, and from stop-to-stop to flexible stop aggregation. However, high computing complexity has limited the practical application of these methods. This paper proposes a new algorithm named Weighted Stop Density Based Scanning Algorithm with Noise (WS-DBSCAN), based on the classical Density Based Scanning Algorithm with Noise (DBSCAN), to detect and update the daily changes in travel pattern. WS-DBSCAN converts the classical DBSCAN, which has quadratic computation complexity, into a problem of sub-quadratic complexity. A numerical experiment using real AFC data from South East Queensland, Australia shows that the algorithm requires only 0.45% of the computation time of the classical DBSCAN while providing the same clustering results.
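For readers unfamiliar with the baseline, the following is a minimal Python sketch of the classical DBSCAN procedure with a linear-scan neighbourhood query, which is where the quadratic cost comes from. It is not WS-DBSCAN, whose weighting and update steps are not described in this abstract; the function names, the `NOISE` label and the toy coordinates are assumptions for illustration.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within eps of points[i]; a linear scan, hence O(n) per call."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classical DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep growing the cluster
    return labels

# Hypothetical stop coordinates: two dense groups and one isolated stop.
stops = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
stop_labels = dbscan(stops, eps=1.5, min_pts=3)
```

Because every point triggers a scan over all other points, the total work grows quadratically with the number of stops, which is the cost the abstract says WS-DBSCAN reduces.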

Relevance:

20.00%

Publisher:

Abstract:

One of the main challenges in data analytics is that discovering structures and patterns in complex datasets is a computer-intensive task. Recent advances in high-performance computing provide part of the solution. Multicore systems are now more affordable and more accessible. In this paper, we investigate how this can be used to develop more advanced methods for data analytics. We focus on two specific areas: model-driven analysis and data mining using optimisation techniques.

Relevance:

20.00%

Publisher:

Abstract:

Most real-life data analysis problems are difficult to solve using exact methods, due to the size of the datasets and the nature of the underlying mechanisms of the system under investigation. As datasets grow even larger, finding the balance between the quality of the approximation and the computing time of the heuristic becomes non-trivial. One solution is to consider parallel methods, and to use the increased computational power to perform a deeper exploration of the solution space in a similar time. It is, however, difficult to estimate a priori whether parallelisation will provide the expected improvement. In this paper we consider a well-known method, genetic algorithms, and evaluate on two distinct problem types the behaviour of the classic and parallel implementations.
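As a point of reference for the classic (serial) variant mentioned in this abstract, here is a minimal Python sketch of a genetic algorithm on a toy bit-string problem. The abstract does not specify the problems, operators or parameters used, so the OneMax fitness function, tournament selection, one-point crossover and all parameter values below are assumptions chosen only to keep the example small.

```python
import random

def one_max(bits):
    """Toy fitness function (an assumption): the number of 1-bits in the string."""
    return sum(bits)

def genetic_algorithm(fitness, n_bits=30, pop_size=20, generations=40,
                      mutation_rate=0.02, seed=0):
    """Serial GA with elitism; returns the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Fitness evaluation: the step a parallel implementation would typically distribute.
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_gen = [scored[0][:]]  # elitism: carry the current best over unchanged
        while len(next_gen) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each position.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = genetic_algorithm(one_max)
```

With elitism the best fitness never decreases from one generation to the next, which makes the trade-off between solution quality and computing time easy to track per generation.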

Relevance:

20.00%

Publisher:

Abstract:

This paper addresses the development of trust in the use of Open Data through incorporation of appropriate authentication and integrity parameters for use by end user Open Data application developers in an architecture for trustworthy Open Data Services. The advantage of this architecture is that it is far more scalable and is not another certificate-based hierarchy with the attendant problems of certificate revocation management. With the use of a Public File, if the key is compromised, it is a simple matter of the single responsible entity replacing the key pair with a new one and re-performing the data file signing process. Under the proposed architecture, the Open Data environment does not interfere with the internal security schemes that might be employed by the entity. However, the architecture incorporates, when needed, parameters from the entity, e.g. the person who authorized publishing as Open Data, at the time that datasets are created or added.

Relevance:

20.00%

Publisher:

Abstract:

Honig and Samuelsson (2014) and Delmar (2015) recently had an exchange in this journal related to a replication-and-extension attempt of two papers which originally arrived at different conclusions based on the same data set. This commentary provides further clarification on the issues and links the debate to broader issues of scholarly culture and practices in entrepreneurship research.

Relevance:

20.00%

Publisher:

Abstract:

Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement in knowledge of text mining in the injury field.

Relevance:

20.00%

Publisher:

Abstract:

Background: Historically, the paper hand-held record (PHR) has been used for sharing information between hospital clinicians, general practitioners and pregnant women in a maternity shared-care environment. Recently, in alignment with a National e-health agenda, an electronic health record (EHR) was introduced at an Australian tertiary maternity service to replace the PHR for collection and transfer of data. The aim of this study was to examine and compare the completeness of clinical data collected in a PHR and an EHR. Methods: We undertook a comparative cohort design study to determine differences in completeness between data collected from maternity records in two phases. Phase 1 data were collected from the PHR and Phase 2 data from the EHR. Records were compared for completeness of best practice variables collected. The primary outcome was the presence of best practice variables and the secondary outcomes were the differences in individual variables between the records. Results: Ninety-four percent of paper medical charts were available in Phase 1 and 100% of records from an obstetric database in Phase 2. No PHR or EHR had a complete dataset of best practice variables. The variables with significant improvement in completeness of data documented in the EHR, compared with the PHR, were urine culture, glucose tolerance test, nuchal screening, morphology scans, folic acid advice, tobacco smoking, illicit drug assessment and domestic violence assessment (p = 0.001). Additionally, the documentation of immunisations (pertussis, hepatitis B, varicella, fluvax) was markedly improved in the EHR (p = 0.001). The variables of blood pressure, proteinuria, blood group, antibody, rubella and syphilis status showed no significant differences in completeness of recording. Conclusion: This is the first paper to report on the comparison of clinical data collected on a PHR and EHR in a maternity shared-care setting. The use of an EHR demonstrated significant improvements to the collection of best practice variables. Additionally, the data in an EHR were more available to relevant clinical staff with the appropriate log-in and more easily retrieved than from the PHR. This study contributes to an under-researched area of determining the quality of data collected in patient records.

Relevance:

20.00%

Publisher:

Abstract:

With increasing concern about consumer product-related injuries in Australia, product safety regulators need evidence-based research to understand risks and patterns to inform their decision making. This study analysed paediatric injury data to identify and quantify product-related injuries in children to inform product safety prioritisation. This study provides information on novel techniques for interrogating health data to identify trends and patterns in product-related injuries to inform strategic directions in this growing area of concern.

Relevance:

20.00%

Publisher:

Abstract:

The importance of design practice informed by urban morphology has led to intensification in interest, signalled by the formation of the ISUF Research and Practice Task Force and voiced through several recent academic publications. Cognisant of this current debate, this paper reports on a recent urban design workshop at which morphology was set as one of the key themes. Initially planned to be programmed as an augmented concurrent event to the 2013 20th ISUF conference held in Brisbane, the two-day Bridge to Bridge: Ridge to Ridge urban design workshop nevertheless took place the following month, and involved over one hundred design professionals and academics. The workshop sought to develop several key urban design principles and recommendations addressing a major government development proposal sited in the most important heritage precinct of the city. The paper will focus specifically on one of the nine groups, in which the design proposal was purposefully guided by morphological input. The discussion will examine the design outcomes and elicit review and feedback from participants, shedding critical light on the issues that arise from such a design approach.

Relevance:

20.00%

Publisher:

Abstract:

This chapter discusses the methodological aspects and empirical findings of a large-scale, funded project investigating public communication through social media in Australia. The project concentrates on Twitter, but we approach it as representative of broader current trends toward the integration of large datasets and computational methods into media and communication studies in general, and social media scholarship in particular. The research discussed in this chapter aims to empirically describe networks of affiliation and interest in the Australian Twittersphere, while reflecting on the methodological implications and imperatives of ‘big data’ in the humanities. Using custom network crawling technology, we have conducted a snowball crawl of Twitter accounts operated by Australian users to identify more than one million users and their follower/followee relationships, and have mapped their interconnections. In itself, the map provides an overview of the major clusters of densely interlinked users, largely centred on shared topics of interest (from politics through arts to sport) and/or sociodemographic factors (geographic origins, age groups). Our map of the Twittersphere is the first of its kind for the Australian part of the global Twitter network, and also provides a first independent and scholarly estimation of the size of the total Australian Twitter population. In combination with our investigation of participation patterns in specific thematic hashtags, the map also enables us to examine which areas of the underlying follower/followee network are activated in the discussion of specific current topics – allowing new insights into the extent to which particular topics and issues are of interest to specialised niches or to the Australian public more broadly. 
Specifically, we examine the Twittersphere footprint of dedicated political discussion, under the #auspol hashtag, and compare it with the heightened, broader interest in Australian politics during election campaigns, using #ausvotes; we explore the different patterns of Twitter activity across the map for major television events (the popular competitive cooking show #masterchef, the British #royalwedding, and the annual #stateoforigin Rugby League sporting contest); and we investigate the circulation of links to the articles published by a number of major Australian news organisations across the network. Such analysis, which combines the ‘big data’-informed map and a close reading of individual communicative phenomena, makes it possible to trace the dynamic formation and dissolution of issue publics against the backdrop of longer-term network connections, and the circulation of information across these follower/followee links. Such research sheds light on the communicative dynamics of Twitter as a space for mediated social interaction. Our work demonstrates the possibilities inherent in the current ‘computational turn’ (Berry, 2010) in the digital humanities, as well as adding to the development and critical examination of methodologies for dealing with ‘big data’ (boyd and Crawford, 2011). Our tools and methods for doing Twitter research, released under Creative Commons licences through our project website, provide the basis for replicable and verifiable digital humanities research on the processes of public communication which take place through this important new social network.

Relevance:

20.00%

Publisher:

Abstract:

Relative abundance data is common in the life sciences, but appreciation that it needs special analysis and interpretation is scarce. Correlation is popular as a statistical measure of pairwise association but should not be used on data that carry only relative information. Using timecourse yeast gene expression data, we show how correlation of relative abundances can lead to conclusions opposite to those drawn from absolute abundances, and that its value changes when different components are included in the analysis. Once all absolute information has been removed, only a subset of those associations will reliably endure in the remaining relative data, specifically, associations where pairs of values behave proportionally across observations. We propose a new statistic φ to describe the strength of proportionality between two variables and demonstrate how it can be straightforwardly used instead of correlation as the basis of familiar analyses and visualization methods.
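The effect described in this abstract can be reproduced on a small made-up dataset. The Python sketch below builds three components, converts them to relative abundances, and compares correlations before and after. The abstract does not give the definition of φ, so the sketch uses the variance of the log-ratio as an illustrative stand-in for a proportionality measure; the data values and all function names are assumptions.

```python
import math
from statistics import mean, pvariance

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def closure(rows):
    """Convert each observation (row) to relative abundances that sum to 1."""
    return [[v / sum(row) for v in row] for row in rows]

def log_ratio_variance(a, b):
    """Variance of log(a/b); zero when a and b are exactly proportional."""
    return pvariance([math.log(x / y) for x, y in zip(a, b)])

# Hypothetical absolute abundances (x, y, z) over four observations:
# x rises, y falls, and a third component z grows by orders of magnitude.
absolute = [[10, 13, 10], [11, 12, 100], [12, 11, 1000], [13, 10, 10000]]
relative = closure(absolute)

abs_x, abs_y = [r[0] for r in absolute], [r[1] for r in absolute]
rel_x, rel_y = [r[0] for r in relative], [r[1] for r in relative]

corr_absolute = pearson(abs_x, abs_y)  # x and y move in opposite directions
corr_relative = pearson(rel_x, rel_y)  # both shrink as z dominates the total
lrv_absolute = log_ratio_variance(abs_x, abs_y)
lrv_relative = log_ratio_variance(rel_x, rel_y)
```

In this toy case the correlation flips from strongly negative to strongly positive once the data are expressed as proportions, whereas the log-ratio variance is unchanged because the per-observation total cancels in the ratio x/y.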

Relevance:

20.00%

Publisher:

Abstract:

This presentation will provide an overview of the load applied on the residuum of transfemoral amputees fitted with an osseointegrated fixation during (A) rehabilitation, including static and dynamic load bearing exercises (e.g., rowing, adduction, abduction, squat, cycling, walking with aids), and (B) activities of daily living, including standardized activities (e.g., level walking in a straight line and around a circle, ascending and descending slopes and stairs) and activities in real-world environments.