978 results for Categorical data
Abstract:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help with data analysis, the reliability of contributed data varies widely. Data reliability issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. Participants may submit low-quality, misleading, inaccurate, or even malicious data. Therefore, finding a way to improve data reliability has become an urgent demand. This study aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects, especially acoustic sensing projects. In particular, we propose to design a reputation framework to enhance data reliability, and we also investigate some critical elements that designers should be aware of when developing and designing new reputation systems.
Abstract:
Open the sports or business section of your daily newspaper, and you are immediately bombarded with an array of graphs, tables, diagrams, and statistical reports that require interpretation. Across all walks of life, the need to understand statistics is fundamental. Given that our youngsters’ future world will be increasingly data laden, scaffolding their statistical understanding and reasoning is imperative, from the early grades on. The National Council of Teachers of Mathematics (NCTM) continues to emphasize the importance of early statistical learning; data analysis and probability was the Council’s professional development “Focus of the Year” for 2007–2008. We need such a focus, especially given the results of the statistics items from the 2003 NAEP. As Shaughnessy (2007) noted, students’ performance was weak on more complex items involving interpretation or application of items of information in graphs and tables. Furthermore, little or no gains were made between the 2000 NAEP and the 2003 NAEP studies. One approach I have taken to promote young children’s statistical reasoning is through data modeling. Having implemented in grades 3–9 a number of model-eliciting activities involving working with data (e.g., English 2010), I observed how competently children could create their own mathematical ideas and representations, before being instructed how to do so. I thus wished to introduce data-modeling activities to younger children, confident that they would likewise generate their own mathematics. I recently implemented data-modeling activities in a cohort of three first-grade classrooms of six-year-olds. I report on some of the children’s responses and discuss the components of data modeling the children engaged in.
Abstract:
From human biomonitoring data that are increasingly collected in the United States, Australia, and other countries from large-scale field studies, we obtain snapshots of concentration levels of various persistent organic pollutants (POPs) within a cross section of the population at different times. Not only can we observe the trends within this population with time, but we can also gain information going beyond the obvious time trends. By combining the biomonitoring data with pharmacokinetic modeling, we can reconstruct the time-variant exposure to individual POPs, determine their intrinsic elimination half-lives in the human body, and predict future levels of POPs in the population. Different approaches have been employed to extract information from human biomonitoring data. Pharmacokinetic (PK) models were combined with longitudinal data [1], with single [2] or multiple [3] average concentrations of cross-sectional data (CSD), or finally with multiple CSD with or without empirical exposure data [4]. In the latter study, for the first time, the authors based their modeling outputs on two sets of CSD and empirical exposure data, so that their model outputs were further constrained by the extensive body of empirical measurements. Here we use a PK model to analyze recent levels of PBDE concentrations measured in the Australian population. In this study, we are able to base our model results on four sets [5-7] of CSD; we focus on two PBDE congeners that have been shown [3,5,8-9] to differ in intake rates and half-lives, with BDE-47 being associated with high intake rates and a short half-life and BDE-153 with lower intake rates and a longer half-life. By fitting the model to PBDE levels measured in different age groups in different years, we determine the level of intake of BDE-47 and BDE-153, as well as the half-lives of these two chemicals in the Australian population.
Abstract:
Driving on an approach to a signalized intersection while distracted is particularly dangerous, as potential vehicular conflicts and resulting angle collisions tend to be severe. Given the prevalence and importance of this particular scenario, the decisions and actions of distracted drivers during the onset of yellow lights are the focus of this study. Driving simulator data were obtained from a sample of 58 drivers under baseline and handheld mobile phone conditions at the University of Iowa - National Advanced Driving Simulator. Explanatory variables included age, gender, cell phone use, distance to stop-line, and speed. Although there is extensive research on drivers’ responses to yellow traffic signals, the examination has been conducted from a traditional regression-based approach, which does not necessarily reveal the underlying relations and patterns among the sampled data. In this paper, we exploit the benefits of both classical statistical inference and data mining techniques to identify the a priori relationships among main effects, non-linearities, and interaction effects. Results suggest that novice (16-17 years) and young (18-25 years) drivers have heightened yellow light running risk while distracted by a cell phone conversation. Driver experience, as captured by age, has a multiplicative effect with distraction, making the combined effect of being inexperienced and distracted particularly risky. Overall, distracted drivers across most tested groups tend to reduce the propensity of yellow light running as the distance to the stop line increases, exhibiting risk compensation in a critical driving situation.
Abstract:
A routine activity for a sports dietitian is to estimate energy and nutrient intake from an athlete's self-reported food intake. Decisions made by the dietitian when coding a food record are a source of variability in the data. The aim of the present study was to determine the variability in estimation of the daily energy and key nutrient intakes of elite athletes when experienced coders analyzed the same food record using the same database and software package. Seven-day food records from a dietary survey of athletes in the 1996 Australian Olympic team were randomly selected to provide 13 sets of records, each set representing the self-reported food intake of an endurance, team, weight-restricted, and sprint/power athlete. Each set was coded by 3-5 members of Sports Dietitians Australia, making a total of 52 athletes, 53 dietitians, and 1456 athlete-days of data. We estimated within- and between-athlete and dietitian variances for each dietary nutrient using mixed modeling, and we combined the variances to express variability as a coefficient of variation (typical variation as a percent of the mean). Variability in the mean of 7-day estimates of a nutrient was 2- to 3-fold less than that of a single day. The variability contributed by the coder was less than the true athlete variability for a 1-day record but was of similar magnitude for a 7-day record. The most variable nutrients (e.g., vitamin C, vitamin A, cholesterol) had approximately 3-fold more variability than the least variable nutrients (e.g., energy, carbohydrate, magnesium). These athlete and coder variabilities need to be taken into account in dietary assessment of athletes for counseling and research.
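As a rough illustration of how variance components might be combined into a coefficient of variation, the sketch below assumes independent, additive components, with the day-to-day athlete variance averaged over the number of recorded days and the coder variance treated as fixed per record. The function name, this variance structure, and all numbers are hypothetical and are not taken from the study.

```python
import math

def coefficient_of_variation(mean, athlete_day_var, coder_var, n_days=1):
    """Return the CV (%) of an n-day mean intake.

    Assumes (hypothetically) independent, additive variance components:
    day-to-day athlete variance shrinks with the number of days averaged,
    while coder variance is treated as fixed per record.
    """
    total_var = athlete_day_var / n_days + coder_var
    return 100.0 * math.sqrt(total_var) / mean

# Hypothetical values for illustration only (not results from the study).
mean_energy = 12000.0          # kJ/day
athlete_day_var = 2400.0 ** 2  # day-to-day variance within an athlete
coder_var = 600.0 ** 2         # variance between coders of one record

cv_1day = coefficient_of_variation(mean_energy, athlete_day_var, coder_var, 1)
cv_7day = coefficient_of_variation(mean_energy, athlete_day_var, coder_var, 7)
print(f"1-day CV: {cv_1day:.1f}%  7-day CV: {cv_7day:.1f}%")
```

Under these assumed inputs, averaging over seven days reduces the athlete component while leaving the coder component unchanged, which is one way the coder contribution could become comparable to the athlete contribution for a 7-day record.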
Abstract:
This paper describes an innovative platform that facilitates the collection of objective safety data around occurrences at railway level crossings using data sources including forward-facing video, telemetry from trains, and geo-referenced asset and survey data. This platform is being developed with support from the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper provides a description of the underlying accident causation model, the development methodology and refinement process, as well as a description of the data collection platform. The paper concludes with a brief discussion of the benefits this project is expected to provide to the Australian rail industry.
Abstract:
The geothermal industry in Australia and Queensland is in its infancy, and for hot dry rock (HDR) geothermal energy it is very much in the target identification and resource definition stages. As a key effort to assist the geothermal industry and exploration for HDR in Queensland, we are developing a new, comprehensive, integrated geochemical and geochronological database on igneous rocks. To date, around 18,000 igneous rocks have been analysed across Queensland for chemical and/or age information. However, these data currently reside in a number of disparate datasets (e.g., Ozchron, Champion et al., 2007, Geological Survey of Queensland, journal publications, and unpublished university theses). The goal of this project is to collate and integrate these data on Queensland igneous rocks to improve our understanding of high heat producing granites in Queensland, in terms of their distribution (particularly in the subsurface), dimensions, ages, and controlling factors in their genesis.
Abstract:
Confusion exists as to the age of the Abor Volcanics of NE India. Some consider the unit to have been emplaced in the Early Permian, others the Early Eocene, a difference of ∼230 million years. The divergence in opinion is significant because fundamentally different models explaining the geotectonic evolution of India depend on the age designation of the unit. Paleomagnetic data reported here from several exposures in the type locality of the formation in the lower Siang Valley indicate that steeply dipping primary magnetizations (mean = 72.7 ± 6.2°, equating to a paleo-latitude of 58.1°) are recorded in the formation. These are only consistent with the unit being of Permian age, possibly Artinskian based on a magnetostratigraphic argument. Plate tectonic models for this time consistently show the NE corner of the sub-continent >50°S; in the Early Eocene it was just north of the equator, which would have resulted in the unit recording shallow directions. The mean declination is counter-clockwise rotated by ∼94°, around half of which can be related to the motion of the Indian block; the remainder is likely due to local Himalayan-age thrusting in the Eastern Syntaxis. Several workers have correlated the Abor Volcanics with broadly coeval mafic volcanic suites in Oman, NE Pakistan–NW India and southern Tibet–Nepal, which developed in response to the Cimmerian block peeling off eastern Gondwana in the Early-Middle Permian, but we believe there are problems with this model. Instead, we suggest that the Abor basalts relate to India–Antarctica/India–Australia extension that was happening at about the same time. Such an explanation best accommodates the relevant stratigraphical and structural data (present-day position within the Himalayan thrust stack), as well as the plate tectonic model for Permian eastern Gondwana.
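The step from mean inclination to paleo-latitude can be checked with a short sketch. It assumes the standard geocentric axial dipole relation, tan(I) = 2 tan(λ); the abstract does not state which relation the authors used, and the function name is illustrative.

```python
import math

def paleolatitude_deg(inclination_deg):
    """Paleolatitude (degrees) from magnetic inclination (degrees).

    Uses the geocentric axial dipole relation tan(I) = 2 * tan(latitude),
    which is an assumption here rather than a method stated in the abstract.
    """
    inclination = math.radians(inclination_deg)
    return math.degrees(math.atan(math.tan(inclination) / 2.0))

mean_inclination = 72.7  # degrees, mean value quoted in the abstract
latitude = paleolatitude_deg(mean_inclination)
print(round(latitude, 1))
```

Under this assumption the result agrees with the 58.1° paleo-latitude quoted above to within rounding.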
Abstract:
Background: An increasing number of Taiwanese people live out the final stages of their lives with chronic and complex conditions. Care decisions at the end of life can also be complex, overwhelming and stressful for individuals, families and health professionals. Understanding individuals’ wishes for end-of-life care and the factors that influence individuals’ decisions is important so that the provision of quality end-of-life care for all can be promoted and ensured.
Abstract:
Travel time is an important transport performance indicator. Different modes of transport (buses and cars) have different mechanical and operational characteristics, resulting in significantly different travel behaviours and complexities in multimodal travel time estimation on urban networks. This paper explores the relationship between bus and car travel time on urban networks by utilising empirical Bluetooth and Bus Vehicle Identification data from Brisbane. The technologies and issues behind the two datasets are studied. After cleaning the data to remove outliers, the relationship between not-in-service bus and car travel time and the relationship between in-service bus and car travel time are discussed. The travel time estimation models reveal that the not-in-service bus travel time is similar to the car travel time and that the in-service bus travel time could be used to estimate car travel time during off-peak hours.
Abstract:
Traffic congestion has a significant impact on the economy and environment. Encouraging the use of multimodal transport (public transport, bicycle, park’n’ride, etc.) has been identified by traffic operators as a good strategy to tackle congestion issues and their detrimental environmental impacts. A multi-modal and multi-objective trip planner provides users with various multi-modal options optimised on objectives that they prefer (cheapest, fastest, safest, etc.) and has the potential to reduce congestion on both a temporal and spatial scale. The computation of multi-modal and multi-objective trips is a complicated mathematical problem, as it must integrate and utilize a diverse range of large data sets, including both road network information and public transport schedules, as well as optimise for a number of competing objectives, where fully optimising for one objective, such as travel time, can adversely affect other objectives, such as cost. The relationship between these objectives can also be quite subjective, as their priorities will vary from user to user. This paper will first outline the various data requirements and formats that are needed for the multi-modal multi-objective trip planner to operate, including static information about the physical infrastructure within Brisbane as well as real-time and historical data to predict traffic flow on the road network and the status of public transport. It will then present information on the graph data structures representing the road and public transport networks within Brisbane that are used in the trip planner to calculate optimal routes. This will allow for an investigation into the various shortest path algorithms that have been researched over the last few decades, and provide a foundation for the construction of the Multi-modal Multi-objective Trip Planner through the development of innovative new algorithms that can operate on the large, diverse data sets and competing objectives.
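To make the trade-off between competing objectives concrete, the following minimal sketch runs Dijkstra's algorithm on a weighted sum of per-edge costs. This is one common baseline, not the algorithm proposed in the paper; the toy network, node names, and cost values are hypothetical, and all weights are assumed to be non-negative.

```python
import heapq

def weighted_sum_shortest_path(graph, source, target, weights):
    """Dijkstra on a scalarised cost: sum(weights[k] * edge_costs[k]).

    graph maps a node to a list of (neighbour, costs) pairs, where costs is
    a dict of objective name -> non-negative value.
    """
    def scalar(costs):
        return sum(weights[k] * costs[k] for k in weights)

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, costs in graph.get(node, []):
            candidate = dist + scalar(costs)
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))

    if target not in best:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return list(reversed(path))

# Hypothetical toy network: each edge carries travel time (minutes) and cost (dollars).
graph = {
    "home": [("station", {"time": 10, "cost": 0.0}),
             ("city", {"time": 25, "cost": 12.0})],
    "station": [("city", {"time": 30, "cost": 3.0})],
}

fastest = weighted_sum_shortest_path(graph, "home", "city", {"time": 1.0, "cost": 0.0})
cheapest = weighted_sum_shortest_path(graph, "home", "city", {"time": 0.0, "cost": 1.0})
print(fastest, cheapest)
```

Changing the weights changes the preferred route, which reflects the user-dependent priorities described above. A weighted sum can miss some Pareto-optimal routes, which is one reason dedicated multi-objective algorithms are studied.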
Abstract:
Big data is big news in almost every sector, including crisis communication. However, not everyone has access to big data, and even if we have access to big data, we often do not have the necessary tools to analyze and cross-reference such a large data set. Therefore, this paper looks at patterns in small data sets that we have the ability to collect with our current tools, to understand whether we can find actionable information in what we already have. We analyzed 164,390 tweets collected during a 2011 earthquake to find out what type of location-specific information people mention in their tweets and when they talk about it. Based on our analysis, we find that even a small data set, with far less data than a big data set, can be useful for quickly finding priority disaster-specific areas.
Abstract:
The use of Wireless Sensor Networks (WSNs) for Structural Health Monitoring (SHM) has become a promising approach due to many advantages such as low cost and fast, flexible deployment. However, inherent technical issues such as data synchronization error and data loss have prevented these distinct systems from being extensively used. Recently, several SHM-oriented WSNs have been proposed and are believed to be able to overcome a large number of technical uncertainties. Nevertheless, there is limited research verifying the applicability of those WSNs with respect to demanding SHM applications like modal analysis and damage identification. This paper first presents a brief review of the most inherent uncertainties of the SHM-oriented WSN platforms and then investigates their effects on the outcomes and performance of the most robust Output-only Modal Analysis (OMA) techniques when employing merged data from multiple tests. The two OMA families selected for this investigation are Frequency Domain Decomposition (FDD) and Data-driven Stochastic Subspace Identification (SSI-data), because both have been widely applied in the past decade. Experimental accelerations collected by a wired sensory system on a large-scale laboratory bridge model are initially used as clean data before being contaminated by different data pollutants in a sequential manner to simulate practical SHM-oriented WSN uncertainties. The results of this study show the robustness of FDD and the precautions needed for the SSI-data family when dealing with SHM-WSN uncertainties. Finally, the use of the measurement channel projection for the time-domain OMA techniques and the preferred combination of the OMA techniques to cope with the SHM-WSN uncertainties are recommended.