910 results for Missing-data
Abstract:
Incorporating a learner’s level of cognitive processing into Learning Analytics presents opportunities for obtaining rich data on the learning process. We propose a framework called COPA that provides a basis for mapping levels of cognitive operation into a learning analytics system. We utilise Bloom’s taxonomy, a theoretically respected conceptualisation of cognitive processing, and apply it in a flexible structure that can be implemented incrementally and with varying degree of complexity within an educational organisation. We outline how the framework is applied, and its key benefits and limitations. Finally, we apply COPA to a University undergraduate unit, and demonstrate its utility in identifying key missing elements in the structure of the course.
Abstract:
Social media platforms are of interest to interactive entertainment companies for a number of reasons. They can operate as a platform for deploying games and as a tool for communicating with customers and potential customers, and they can provide analytics on how players utilize the game, providing immediate feedback on design decisions and changes. However, as ongoing research with Australian developer Halfbrick, creators of $2, demonstrates, the use of these platforms is not universally seen as a positive. The incorporation of Big Data into already innovative development practices has the potential to cause tension between designers, whilst the platform also challenges the traditional business model, relying on micro-transactions rather than an up-front payment and requiring a substantial shift in design philosophy to take advantage of the social aspects of platforms such as Facebook.
Abstract:
Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information; be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is they are built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders, for manual filtering and possible action. The paper suggests two solutions for this, content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand, and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. 
Through these techniques, the information contained in a large dataset could be filtered down to match the expected capacity of emergency responders, and knowledge as to the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and sportswomen create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which has the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB), has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect and make the tweets available for exploration and analysis. A strategy to respond to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in a way, keyword creation is part strategy and part art.
In this paper we describe strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence in Twitter changes over time. We also discuss the opportunities and methods to extract smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid in future data collection. The intention is that through the papers presented, and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
Abstract:
This paper analyses the probabilistic linear discriminant analysis (PLDA) speaker verification approach with limited development data. It investigates the use of the median as the central tendency of a speaker’s i-vector representation, and the effectiveness of weighted discriminative techniques on the performance of state-of-the-art length-normalised Gaussian PLDA (GPLDA) speaker verification systems. The analysis shows that the median (using a median Fisher discriminator (MFD)) provides a better representation of a speaker when the number of representative i-vectors available during development is reduced, and that use of the pair-wise weighting approach in weighted LDA and weighted MFD provides further improvement in limited development conditions. Best performance is obtained using a weighted MFD approach, which shows over 10% improvement in EER over the baseline GPLDA system on mismatched and interview-interview conditions.
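The intuition behind preferring the median when few i-vectors are available can be illustrated with a small sketch. It assumes a component-wise median, which is a simplification and not necessarily the estimator used in MFD; the function names and the toy two-dimensional vectors are hypothetical.

```python
from statistics import mean, median

def centroid_mean(vectors):
    """Component-wise mean of a speaker's vectors."""
    return [mean(column) for column in zip(*vectors)]

def centroid_median(vectors):
    """Component-wise median of a speaker's vectors (illustrative assumption)."""
    return [median(column) for column in zip(*vectors)]

# Toy data: two similar vectors and one outlying session (hypothetical values).
ivectors = [[1.0, 2.0], [1.2, 2.1], [10.0, -5.0]]

mean_centroid = centroid_mean(ivectors)
median_centroid = centroid_median(ivectors)
print(mean_centroid, median_centroid)
```

With only three vectors, the single outlier pulls the mean well away from the two consistent sessions, while the median stays close to them.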
Abstract:
The geographic location of cloud data storage centres is an important issue for many organisations and individuals due to various regulations that require data and operations to reside in specific geographic locations. Thus, cloud users may want to be sure that their stored data have not been relocated into unknown geographic regions that may compromise the security of their stored data. Albeshri et al. (2012) combined proof of storage (POS) protocols with distance-bounding protocols to address this problem. However, their scheme involves unnecessary delay when utilising typical POS schemes due to computational overhead at the server side. The aim of this paper is to improve the basic GeoProof protocol by reducing the computation overhead at the server side. We show how this can maintain the same level of security while achieving more accurate geographic assurance.
Abstract:
Prophylactic surgery including hysterectomy and bilateral salpingo-oophorectomy (BSO) is recommended in BRCA positive women, while in women from the general population, hysterectomy plus BSO may increase the risk of overall mortality. The effect of hysterectomy plus BSO on women previously diagnosed with breast cancer is unknown. We used data from a population-based data linkage study of all women diagnosed with primary breast cancer in Queensland, Australia between 1997 and 2008 (n=21,067). We fitted flexible parametric breast cancer specific and overall survival models (also known as Royston-Parmar models) with 95% confidence intervals to assess the impact of risk-reducing surgery (removal of uterus, one or both ovaries). We also stratified analyses by age 20-49 and 50-79 years, respectively. Overall, 1,426 women (7%) underwent risk-reducing surgery (13% of premenopausal women and 3% of postmenopausal women). No women who had risk-reducing surgery developed a gynaecological cancer, compared to 171 who did not have risk-reducing surgery. Overall, 3,165 (15%) women died, including 2,195 (10%) from breast cancer. Hysterectomy plus BSO was associated with significantly reduced risk of death overall (adjusted HR = 0.69, 95% CI 0.53-0.89; P = 0.005). Risk reduction was greater among premenopausal women, whose risk of death halved (HR, 0.45; 95% CI, 0.25-0.79; P < 0.006). This was largely driven by reduction in breast cancer-specific mortality (HR, 0.43; 95% CI, 0.24-0.79; P < 0.006). This population-based study found that risk-reducing surgery halved the mortality risk for premenopausal breast cancer patients. Replication of our results in independent cohorts, and subsequently randomised trials, is needed to confirm these findings.
Abstract:
For industrial wireless sensor networks, maintaining the routing path for a high packet delivery ratio is one of the key objectives in network operations. It is important both to provide a high data delivery rate at the sink node and to guarantee timely delivery of data packets at the sink node. Most proactive routing protocols for sensor networks are based on simple periodic updates to distribute the routing information. A faulty link causes packet loss and retransmission at the source until periodic route update packets are issued and the link has been identified as broken. We propose a new proactive route maintenance process in which the periodic update is backed up by a secondary layer of local updates repeating with shorter periods for timely discovery of broken links. The proposed route maintenance scheme improves the reliability of the network by decreasing the packet loss due to delayed identification of broken links. We show by simulation that the proposed mechanism behaves better than the existing popular routing protocols (AODV, AOMDV and DSDV) in terms of end-to-end delay, routing overhead, and packet reception ratio.
Abstract:
Facial expression recognition (FER) systems must ultimately work on real data in uncontrolled environments, although most research studies have been conducted on lab-based data with posed or evoked facial expressions obtained in pre-set laboratory environments. It is very difficult to obtain data in real-world situations because privacy laws prevent unauthorized capture and use of video from events such as funerals, birthday parties, marriages etc. It is a challenge to acquire such data on a scale large enough for benchmarking algorithms. Although video obtained from TV or movies or postings on the World Wide Web may also contain ‘acted’ emotions and facial expressions, they may be more ‘realistic’ than lab-based data currently used by most researchers. Or are they? One way of testing this is to compare feature distributions and FER performance. This paper describes a database that has been collected from television broadcasts and the World Wide Web containing a range of environmental and facial variations expected in real conditions and uses it to answer this question. A fully automatic system that uses a fusion-based approach for FER on such data is introduced for performance evaluation. Performance improvements arising from the fusion of point-based texture and geometry features, and the robustness to image scale variations, are experimentally evaluated on this image and video dataset. Differences in FER performance between lab-based and realistic data, between different feature sets, and between different train-test data splits are investigated.
Abstract:
A retrospective, descriptive analysis of a sample of children under 18 years presenting to a hospital emergency department (ED) for treatment of an injury was conducted. The aim was to explore characteristics and identify differences between children assigned abuse codes and children assigned unintentional injury codes using an injury surveillance database. Only 0.1% of children had been assigned the abuse code and 3.9% a code indicating possible abuse. Children between 2-5 years formed the largest proportion of those coded to abuse. Superficial injury and bruising were the most common types of injury seen in children in the abuse group and the possible abuse group (26.9% and 18.8% respectively), whereas those with unintentional injury were most likely to present with open wounds (18.4%). This study demonstrates that routinely collected injury surveillance data can be a useful source of information for describing injury characteristics in children assigned abuse codes compared to those assigned no abuse codes.
Abstract:
Aims: To compare different methods for identifying alcohol involvement in injury-related emergency department presentations in Queensland youth, and to explore the alcohol terminology used in triage text. Methods: Emergency Department Information System data were provided for patients aged 12-24 years with an injury-related diagnosis code for a 5-year period 2006-2010 presenting to a Queensland emergency department (N=348895). Three approaches were used to estimate alcohol involvement: 1) analysis of coded data, 2) mining of triage text, and 3) estimation using an adaptation of alcohol attributable fractions (AAF). Cases were identified as ‘alcohol-involved’ by code and text, as well as AAF weighted. Results: Around 6.4% of these injury presentations overall had some documentation of alcohol involvement, with higher proportions of alcohol involvement documented for 18-24 year olds, females, indigenous youth, where presentations occurred on a Saturday or Sunday, and where presentations occurred between midnight and 5am. The most common alcohol terms identified for all subgroups were generic alcohol terms (e.g. ETOH or alcohol), with almost half of the cases where alcohol involvement was documented having a generic alcohol term recorded in the triage text. Conclusions: Emergency department data is a useful source of information for identification of high risk sub-groups to target intervention opportunities, though it is not a reliable source of data for incidence or trend estimation in its current unstandardised form. Improving the accuracy and consistency of identification, documenting and coding of alcohol-involvement at the point of data capture in the emergency department is the most desirable long term approach to produce a more solid evidence base to support policy and practice in this field.
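The first two identification approaches (coded data and triage-text mining) might be sketched as follows. This is a minimal illustration, not the study's method: the record fields `alcohol_code` and `triage_text` are hypothetical, the term list contains only the two generic terms named in the abstract, and treating a case as alcohol-involved when either source flags it is an assumption.

```python
import re

# Only the generic terms mentioned in the abstract; a real term list
# would presumably be much longer (assumption).
ALCOHOL_PATTERN = re.compile(r"\b(?:etoh|alcohol)\b", re.IGNORECASE)

def text_flag(triage_text):
    """Return True if the free-text triage note contains a generic alcohol term."""
    return ALCOHOL_PATTERN.search(triage_text) is not None

def alcohol_involved(record):
    """Flag a presentation if either the coded field or the triage text indicates alcohol."""
    return bool(record.get("alcohol_code")) or text_flag(record.get("triage_text", ""))

# Hypothetical records for illustration only.
records = [
    {"alcohol_code": False, "triage_text": "Fell off skateboard, ETOH on board"},
    {"alcohol_code": True, "triage_text": "Laceration to hand"},
    {"alcohol_code": False, "triage_text": "Twisted ankle at football"},
]

flags = [alcohol_involved(r) for r in records]
print(flags)
```

A word-boundary match keeps the sketch simple, but it also shows why unstandardised text is a weak basis for incidence estimates: misspellings, abbreviations and negations ("no ETOH") are not handled here.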
Abstract:
Talk of Big Data seems to be everywhere. Indeed, the apparently value-free concept of ‘data’ has seen a spectacular broadening of popular interest, shifting from the dry terminology of labcoat-wearing scientists to the buzzword du jour of marketers. In the business world, data is increasingly framed as an economic asset of critical importance, a commodity on a par with scarce natural resources (Backaitis, 2012; Rotella, 2012). It is social media that has most visibly brought the Big Data moment to media and communication studies, and beyond it, to the social sciences and humanities. Social media data is one of the most important areas of the rapidly growing data market (Manovich, 2012; Steele, 2011). Massive valuations are attached to companies that directly collect and profit from social media data, such as Facebook and Twitter, as well as to resellers and analytics companies like Gnip and DataSift. The expectation attached to the business models of these companies is that their privileged access to data and the resulting valuable insights into the minds of consumers and voters will make them irreplaceable in the future. Analysts and consultants argue that advanced statistical techniques will allow the detection of ongoing communicative events (natural disasters, political uprisings) and the reliable prediction of future ones (electoral choices, consumption)...
Abstract:
Introduction: The built environment is increasingly recognised as being associated with health outcomes. Relationships between the built environment and health differ among age groups, especially between children and adults, but also between younger, mid-age and older adults. Yet few studies address differences across life stage groups within a single population. Moreover, existing research mostly focuses on physical activity behaviours, with few studies examining objective clinical and mental health outcomes. The Life Course Built Environment and Health (LCBEH) project explores the impact of the built environment on self-reported and objectively measured health outcomes in a random sample of people across the life course. Methods and analysis: This cross-sectional data linkage study involves 15 954 children (0–15 years), young adults (16–24 years), adults (25–64 years) and older adults (65+ years) from the Perth metropolitan region who completed the Health and Wellbeing Surveillance System survey administered by the Department of Health of Western Australia from 2003 to 2009. Survey data were linked to Western Australia's (WA) Hospital Morbidity Database System (hospital admission) and Mental Health Information System (mental health system outpatient) data. Participants’ residential addresses were geocoded and features of their ‘neighbourhood’ were measured using Geographic Information Systems software. Associations between the built environment and self-reported and clinical health outcomes will be explored across varying geographic scales and life stages. Ethics and dissemination: The University of Western Australia's Human Research Ethics Committee and the Department of Health of Western Australia approved the study protocol (#2010/1). Findings will be published in peer-reviewed journals and presented at local, national and international conferences, thus contributing to the evidence base informing the design of healthy neighbourhoods for all residents.
Abstract:
Silver dressings have been widely used to successfully prevent burn wound infection and sepsis. However, a few case studies have reported the functional abnormality and failure of vital organs, possibly caused by silver deposits. The aim of this study was to investigate the serum silver level in the pediatric burn population and also in several internal organs in a porcine burn model after the application of Acticoat. A total of 125 blood samples were collected from 46 pediatric burn patients. Thirty-six patients with a mean of 13.4% TBSA burns had a mean peak serum silver level of 114 microg/L, whereas 10 patients with a mean of 1.85% TBSA burns had an undetectable level of silver (<5.4 microg/L). Overall, serum silver levels were closely related to burn sizes. However, the highest serum silver was 735 microg/L in a 15-month-old toddler with 10% TBSA burns and the second highest was 367 microg/L in a 3-year old with 28% TBSA burns. In a porcine model with 2% TBSA burns, the mean peak silver level was 38 microg/L at 2 to 3 weeks after application of Acticoat and was then significantly reduced to an almost undetectable level at 6 weeks. Of a total of four pigs, silver was detected in all four livers (1.413 microg/g) and all four hearts (0.342 microg/g), three of four kidneys (1.113 microg/g), and two of four brains (0.402 microg/g). This result demonstrated that although variable, the level of serum silver was positively associated with the size of burns, and significant amounts of silver were deposited in internal organs in pigs with only 2% TBSA burns, after application of Acticoat.
Abstract:
The use of Mahalanobis squared distance-based novelty detection in statistical damage identification has become increasingly popular in recent years. The merit of the Mahalanobis squared distance-based method is that it is simple and requires low computational effort, enabling the use of a higher dimensional damage-sensitive feature, which is generally more sensitive to structural changes. Mahalanobis squared distance-based damage identification is also believed to be one of the most suitable methods for modern sensing systems such as wireless sensors. Despite these advantages, the method is rather strict in its input requirement, as it assumes the training data to be multivariate normal, which is not always the case, particularly at an early monitoring stage. As a consequence, it may result in an ill-conditioned training model with erroneous novelty detection and damage identification outcomes. To date, there appears to be no study on how to systematically cope with such practical issues, especially in the context of a statistical damage identification problem. To address this need, this article proposes a controlled data generation scheme, which is based upon the Monte Carlo simulation methodology with the addition of several controlling and evaluation tools to assess the condition of output data. By evaluating the convergence of the data condition indices, the proposed scheme is able to determine the optimal setups for the data generation process and subsequently avoid unnecessarily excessive data. The efficacy of this scheme is demonstrated via applications to benchmark structure data in the field.
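For readers unfamiliar with the underlying technique, the following is a minimal sketch of Mahalanobis squared distance-based novelty detection, restricted to two-dimensional features so that the covariance inverse can be written by hand. It does not implement the article's controlled data generation scheme; the function names, the toy data and the choice of the largest training distance as threshold are all assumptions.

```python
from statistics import mean

def fit_baseline(samples):
    """Estimate the mean vector and 2x2 sample covariance of 2-D training features."""
    n = len(samples)
    mx = mean(s[0] for s in samples)
    my = mean(s[1] for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(x, mu, cov):
    """Mahalanobis squared distance d^T S^-1 d for a symmetric 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    if det <= 0:
        # A singular covariance is the extreme case of an ill-conditioned model.
        raise ValueError("singular covariance: training data are insufficient")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical baseline ("undamaged") features.
training = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mu, cov = fit_baseline(training)

# Assumed threshold: the largest distance seen in the training data.
threshold = max(mahalanobis_sq(s, mu, cov) for s in training)
is_novel = mahalanobis_sq((10.0, 0.0), mu, cov) > threshold
print(threshold, is_novel)
```

If the training points are too few or nearly collinear, the covariance determinant approaches zero and the distances become unreliable, which is the kind of ill-conditioning the abstract attributes to limited early-stage data.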
Abstract:
Public health research consistently demonstrates the salience of neighbourhood as a determinant of both health-related behaviours and outcomes across the human life course. This paper will report on the findings from a mixed-methods Brisbane-based study that explores how mothers with primary school children from both high and low socioeconomic suburbs use the local urban environment for the purpose of physical activity. Firstly, we demonstrate findings from an innovative methodology using the geographic information systems (GIS) embedded in social media platforms on mobile phones to track locations, resource-use, distances travelled, and modes of transport of the families in real-time; and secondly, we report on qualitative data that provides insight into reasons for differential use of the environment by both groups. Spatial/mapping and statistical data showed that while the mothers from both groups demonstrated similar daily routines, the mothers from the high SEP suburb engaged in increased levels of physical activity, travelled less frequently and less distance by car, and walked more for transport. The qualitative data revealed differences in the psychosocial processes and characteristics of the households and neighbourhoods of the respective groups, with mothers in the lower SEP suburb reporting more stress, higher conflict, and lower quality relationships with neighbours.