957 results for Imbalanced datasets
Abstract:
Nature Refuges encompass the second largest extent of protected area estate in Queensland. Major problems exist in the data capture, map presentation, data quality and integrity of these boundaries. The spatial accuracy of the Nature Refuge administrative boundaries directly influences the ability to preserve valuable ecosystems by challenging negative environmental impacts on these properties. This research supports the Nature Refuge Program's efforts to secure Queensland's natural and cultural values on private land by utilising GIS and its advanced functionality. The research design organises and enters Queensland's Nature Refuge boundaries into a spatial environment. Survey-quality data collection techniques, such as the Global Positioning System (GPS), are investigated to capture Nature Refuge boundary information. Using the concepts of map communication, GIS cartography is utilised for protected area plan design. New spatial datasets are generated, facilitating effective investigative data analysis. The geodatabase model developed by this study adds rich GIS behaviour, providing the capability to store, query and manipulate geographic information. It provides the ability to leverage data relationships and enforces topological integrity, creating savings in customisation and productivity. The final phase of the research design incorporates the advanced functions of ArcGIS, which facilitate building spatial system models. The geodatabase and process models developed by this research can be easily modified, and the data relating to mining can be replaced by other negative environmental impacts affecting the Nature Refuges. Results of the research are presented as graphs and maps, providing visual evidence of the usefulness of GIS as a means of capturing, visualising and enhancing the spatial quality and integrity of Nature Refuge boundaries.
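As a rough illustration of the kind of topological-integrity checking a geodatabase of this sort enforces, the following sketch flags invalid and mutually overlapping boundary polygons. It is illustrative only: the file name and layer are hypothetical, and the study itself used an ArcGIS geodatabase rather than the open-source stack shown here.

```python
# Illustrative only: the dataset path is hypothetical, and the study
# used ArcGIS rather than the GeoPandas/Shapely stack shown here.
import geopandas as gpd
from shapely.validation import explain_validity

refuges = gpd.read_file("nature_refuges.shp")  # hypothetical boundary layer

# Flag geometrically invalid boundaries (e.g. self-intersections)
invalid = refuges[~refuges.geometry.is_valid]
for idx, row in invalid.iterrows():
    print(idx, explain_validity(row.geometry))

# Flag overlapping refuge boundaries, which a topology rule on
# administrative parcels would normally forbid
overlaps = gpd.sjoin(refuges, refuges, predicate="overlaps")
print(f"{len(invalid)} invalid geometries, {len(overlaps)} overlapping pairs")
```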
Abstract:
Predicting safety on roadways is standard practice for road safety professionals and has a correspondingly extensive literature. The majority of safety prediction models are estimated using roadway segment and intersection (microscale) data, while more recently efforts have been undertaken to predict safety at the planning level (macroscale). Safety prediction models typically include roadway, operations and exposure variables, factors known to affect safety in fundamental ways. Environmental variables, in particular variables attempting to capture the effect of rain on road safety, are difficult to obtain and have rarely been considered. In the few cases where weather variables have been included, historical averages have been used rather than the actual weather conditions under which crashes were observed. Without the inclusion of weather-related variables, researchers have had difficulty explaining regional differences in the safety performance of various entities (e.g. intersections, road segments, highways). As part of the NCHRP 8-44 research effort, researchers developed PLANSAFE, a set of planning-level safety prediction models. These models make use of socio-economic, demographic and roadway variables for predicting planning-level safety. Accounting for regional differences, as with microscale safety models, has been problematic during the development of planning-level safety prediction models. More specifically, without weather-related variables there is an insufficient set of variables for explaining safety differences across regions and states. Furthermore, omitted variable bias resulting from excluding these important variables may adversely impact the coefficients of included variables, contributing to difficulty in model interpretation and accuracy. This paper summarises the results of an effort to include weather-related variables, particularly various measures of rainfall, in models of accident frequency and of the frequency of fatal and/or injury severity crashes. The purpose of the study was to determine whether these variables do in fact improve the overall goodness of fit of the models, whether they explain some or all of the observed regional differences, and what the estimated effects of rainfall on safety are. The models are based on Traffic Analysis Zone level datasets from Michigan and from Pima and Maricopa Counties in Arizona. Numerous rain-related variables were found to be statistically significant, selected rain-related variables improved the overall goodness of fit, and inclusion of these variables reduced the portion of the model explained by the constant in the base models without weather variables. Rain tends to diminish safety, as expected, in fairly complex ways that depend on rain frequency and intensity.
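The abstract does not specify the model form, but crash frequency models of this kind are commonly estimated as negative binomial count regressions. The sketch below, with entirely hypothetical variable names and simulated data, shows what adding a rainfall covariate to such a model might look like; it is not the paper's actual specification.

```python
# Hedged sketch of a planning-level crash frequency model with a
# rainfall covariate. All variable names and data are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500  # hypothetical Traffic Analysis Zones
taz = pd.DataFrame({
    "vmt": rng.lognormal(10, 0.5, n),     # exposure (vehicle-miles travelled)
    "pop_density": rng.lognormal(7, 0.8, n),
    "rain_days": rng.poisson(60, n),      # a rain-frequency measure
})
# Simulated crash counts increasing with exposure and rainfall
mu = np.exp(-8 + 0.9 * np.log(taz["vmt"]) + 0.004 * taz["rain_days"])
taz["crashes"] = rng.poisson(mu)

X = sm.add_constant(taz[["pop_density", "rain_days"]])
X["log_vmt"] = np.log(taz["vmt"])
model = sm.GLM(taz["crashes"], X, family=sm.families.NegativeBinomial())
print(model.fit().summary())
```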
Abstract:
There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions accompanying each, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings) exist in one of two states: perfectly safe or unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of “excess” zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows Bernoulli trials with unequal probabilities of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate it. We also present the theory behind dual-state count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the “excess” zeros frequently observed. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales, not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (for observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
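A minimal sketch of the simulation idea described above: sites experience Bernoulli trials with small, heterogeneous probabilities (Poisson trials), and under low exposure the resulting counts show more zeros than a single fitted Poisson predicts, with no dual-state process involved. All parameters are illustrative, not those of the paper's experiment.

```python
# Sketch of the excess-zeros argument: heterogeneous Bernoulli (Poisson)
# trials under low exposure yield more zeros than a fitted Poisson
# implies, without any dual "safe/unsafe" state.
import numpy as np

rng = np.random.default_rng(42)
n_sites, n_trials = 10000, 50  # low exposure: few trials per site

# Each site has its own small crash probability (heterogeneity)
p = rng.gamma(shape=0.3, scale=0.02, size=n_sites)
counts = rng.binomial(n_trials, p.clip(max=1.0))

observed_zeros = (counts == 0).mean()
poisson_zeros = np.exp(-counts.mean())  # P(0) under a fitted Poisson
print(f"observed P(0) = {observed_zeros:.3f}, "
      f"Poisson-implied P(0) = {poisson_zeros:.3f}")  # observed is larger
```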
Abstract:
Queensland University of Technology (QUT) completed an Australian National Data Service (ANDS) funded “Seeding the Commons Project” to contribute metadata to Research Data Australia. The project employed two Research Data Librarians from October 2009 to July 2010. Technical support for the project was provided by QUT's High Performance Computing and Research Support Specialists.

The project identified and described QUT's category 1 (ARC/NHMRC) research datasets. Metadata for the research datasets was stored in QUT's Research Data Repository (Architecta Mediaflux). Metadata suitable for inclusion in Research Data Australia was made available to the Australian Research Data Commons (ARDC) in RIF-CS format.

Several workflows and processes were developed during the project. 195 data interviews took place in connection with 424 separate research activities, resulting in the identification of 492 datasets.

The project had a high level of technical support from the QUT High Performance Computing and Research Support Specialists, who developed the Research Data Librarian interface to the data repository. This interface enabled manual entry of interview data and dataset metadata, and the creation of relationships between repository objects. The Research Data Librarians mapped the QUT metadata repository fields to RIF-CS, and an application was created by the HPC and Research Support Specialists to generate RIF-CS files for harvest by the ARDC.

This poster will focus on the workflows and processes established for the project, including:

• Interview processes and instruments
• Data ingest from existing systems (including mapping to RIF-CS)
• Data entry and the Data Librarian interface to Mediaflux
• Verification processes
• Mapping and creation of RIF-CS for the ARDC
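As a rough illustration of the final step above (generating RIF-CS for harvest), the sketch below emits a minimal RIF-CS collection record. The record contents are hypothetical, and the namespace and element layout should be checked against the ARDC's RIF-CS documentation rather than taken as authoritative.

```python
# Minimal sketch of generating a RIF-CS collection record for harvest.
# Field values are hypothetical; verify the namespace and schema against
# the ANDS/ARDC RIF-CS documentation before relying on this layout.
import xml.etree.ElementTree as ET

NS = "http://ands.org.au/standards/rif-cs/registryObjects"  # assumed namespace
ET.register_namespace("", NS)

root = ET.Element(f"{{{NS}}}registryObjects")
obj = ET.SubElement(root, f"{{{NS}}}registryObject", group="QUT")
ET.SubElement(obj, f"{{{NS}}}key").text = "qut.edu.au/dataset/example-001"
ET.SubElement(obj, f"{{{NS}}}originatingSource").text = "http://www.qut.edu.au/"
coll = ET.SubElement(obj, f"{{{NS}}}collection", type="dataset")
name = ET.SubElement(coll, f"{{{NS}}}name", type="primary")
ET.SubElement(name, f"{{{NS}}}namePart").text = "Example QUT research dataset"

print(ET.tostring(root, encoding="unicode"))
```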
Abstract:
The Paediatric Spine Research Group was formed in 2002 to perform high-quality research into the prevention and management of spinal deformity, with an emphasis on scoliosis. The group has successfully built collaborative bridges between the scientific and research expertise at QUT and the clinical skills and experience of the spinal orthopaedic surgeons at the Mater Children's Hospital in Brisbane. Clinical and biomechanical research is now possible as a result of the development of detailed databases of patients who have innovative and unique surgical interventions for spinal deformity, such as thoracoscopic scoliosis correction, thoracoscopic staple insertion for juvenile idiopathic scoliosis, and minimally invasive growing rods. The Mater in Brisbane provides these unique datasets of spinal deformity surgery patients, whose procedures are not being performed anywhere else in the Southern Hemisphere. The most detailed is a database of thoracoscopic scoliosis correction surgery, which now contains 180 patients with electronic collections of X-rays, photographs and patient questionnaires. With ethics approval, a subset of these patients has had CT scans, and a further subset has had MRI scans with and without a compressive load to simulate the erect standing position. This database has to date contributed to 17 international refereed journal papers, a further 7 journal papers either under review or in final preparation, 53 national conference presentations and 35 international conference presentations. Major findings from selected journal publications will be presented. It is anticipated that as the surgical databases grow they will continue to provide invaluable clinical data feeding into clinically relevant projects driven by both medical and engineering researchers, whose findings will benefit spinal deformity patients and scientific knowledge worldwide.
Abstract:
Recent studies have detected a dominant accumulation mode (~100 nm) in the Sea Spray Aerosol (SSA) number distribution, and there is evidence to suggest that particles in this mode are composed primarily of organics. To investigate this hypothesis we conducted experiments on NaCl, artificial SSA and natural SSA particles with a Volatility-Hygroscopicity Tandem Differential Mobility Analyser (VH-TDMA). NaCl particles were atomiser-generated, and a bubble generator was constructed to produce artificial and natural SSA particles. Natural seawater samples for use in the bubble generator were collected from biologically active, terrestrially affected coastal water in Moreton Bay, Australia. Differences in the VH-TDMA-measured volatility curves of artificial and natural SSA particles were used to investigate and quantify the organic fraction of natural SSA particles. Hygroscopic Growth Factor (HGF) data, also obtained by the VH-TDMA, were used to confirm the conclusions drawn from the volatility data. Both datasets indicated that the organic fraction of our natural SSA particles evaporated in the VH-TDMA over the temperature range 170–200°C. The organic volume fraction for 71–77 nm natural SSA particles was 8±6%. The organic volume fraction did not vary significantly with water residence time in the bubble generator (40 seconds to 24 hours) or with SSA particle diameter in the range 38–173 nm. At room temperature we measured shape- and Kelvin-corrected HGFs at 90% RH of 2.46±0.02 for NaCl, 2.35±0.02 for artificial SSA and 2.26±0.02 for natural SSA particles. Overall, these results suggest that the natural accumulation mode SSA particles produced in these experiments contained only a minor organic fraction, which had little effect on hygroscopic growth. Our measurement of 8±6% is an order of magnitude below two previous measurements of the organic fraction in SSA particles of comparable sizes. We stress that our results were obtained using coastal seawater and cannot necessarily be applied on a regional or global ocean scale. Nevertheless, considering the order-of-magnitude discrepancy between this and previous studies, further research with independent measurement techniques and a variety of different seawaters is required to better quantify how much organic material is present in accumulation mode SSA.
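The abstract does not give the formula used, but a common way to derive an organic volume fraction from TDMA volatility data is from the cube of the measured diameter shrink factors, using the artificial (inorganic-only) SSA as a reference. A hedged sketch of that reasoning, not necessarily the study's exact procedure:

```latex
% Hedged sketch: a common TDMA-style derivation, not necessarily the
% exact procedure used in this study. D_0 is the initial mobility
% diameter and D_T the diameter after heating to temperature T.
\[
  \varepsilon_{\mathrm{org}}
  \;\approx\;
  \left(\frac{D_{T,\mathrm{artificial}}}{D_0}\right)^{3}
  -
  \left(\frac{D_{T,\mathrm{natural}}}{D_0}\right)^{3}
\]
% If the purely inorganic artificial SSA does not evaporate below T,
% the first term is 1 and the organic volume fraction reduces to
% 1 - (D_{T,natural}/D_0)^3.
```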
Abstract:
The Guardian's reportage of the 2009 United Kingdom Member of Parliament (MP) expenses scandal used crowdsourcing and computational journalism techniques. Computational journalism can be broadly defined as the application of computer science techniques to the activities of journalism. Its foundation lies in computer-assisted reporting techniques, and its importance is increasing due to: (a) the increasing availability of large-scale government datasets for scrutiny; (b) the declining cost, increasing power and ease of use of data mining and filtering software, and Web 2.0; and (c) the explosion of online public engagement and opinion. This paper provides a case study of the Guardian's MP expenses scandal reportage and reveals some key challenges and opportunities for digital journalism. It finds, first, that journalists may increasingly take an active role in understanding, interpreting, verifying and reporting clues or conclusions that arise from the interrogation of datasets (computational journalism). Secondly, a distinction should be made between information reportage and computational journalism in the digital realm, just as a distinction might be made between citizen reporting and citizen journalism. Thirdly, an opportunity exists for online news providers to take a ‘curatorial’ role, selecting and making easily available the best data sources for readers to use (information reportage). These activities have always been fundamental to journalism; however, the way in which they are undertaken may change. Findings from this paper may suggest opportunities and challenges for the implementation of computational journalism techniques in practice by digital Australian media providers, and further areas of research.
Abstract:
As organizations reach higher levels of Business Process Management maturity, they tend to accumulate large collections of process models. These repositories may contain thousands of activities and be managed by different stakeholders with varying skills and responsibilities. However, while being of great value, these repositories induce high management costs. Thus, it becomes essential to keep track of the various model versions, as they may mutually overlap, supersede one another and evolve over time. We propose an innovative versioning model and associated storage structure, specifically designed to maximize sharing across process model versions and to automatically handle change propagation. The focal point of this technique is to version single process model fragments rather than entire process models; indeed, empirical evidence shows that real-life process model repositories contain numerous duplicate fragments. Experiments on two industrial datasets confirm the usefulness of our technique.
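A toy sketch of the core idea, not the paper's actual storage structure: fragments are stored once under a content hash, and each model version is just a list of fragment references, so unchanged fragments are shared across versions.

```python
# Illustrative sketch: version process-model fragments, not whole models,
# so that unchanged fragments are stored once and shared across versions.
import hashlib

class FragmentStore:
    def __init__(self):
        self.fragments = {}   # content hash -> fragment body
        self.versions = {}    # (model_id, version) -> list of fragment hashes

    def _put(self, fragment: str) -> str:
        h = hashlib.sha256(fragment.encode()).hexdigest()
        self.fragments.setdefault(h, fragment)  # dedup: store once
        return h

    def commit(self, model_id: str, version: int, fragments: list[str]):
        self.versions[(model_id, version)] = [self._put(f) for f in fragments]

store = FragmentStore()
store.commit("claims", 1, ["receive claim", "assess claim", "pay claim"])
store.commit("claims", 2, ["receive claim", "assess claim", "reject or pay"])
# Only the changed fragment is stored anew; the rest are shared.
print(len(store.fragments))  # 4 unique fragments across 6 references
```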
Abstract:
Queensland University of Technology's institutional repository, QUT ePrints (http://eprints.qut.edu.au/), was established in 2003. With the help of an institutional mandate (endorsed in 2004), the repository now holds over 11,000 open access publications. The repository's success is celebrated within the University and acknowledged nationally and internationally. QUT ePrints was built on GNU EPrints open source repository software (currently running v3.1.3) and was originally configured to accommodate open access versions of the traditional range of research publications (journal articles, conference papers, books, book chapters and working papers). However, in 2009 the repository's scope, content and systems were broadened, and the ‘QUT Digital Repository’ is now a service encompassing a range of digital collections, services and systems. For a work to be accepted into the institutional repository, at least one of the authors/creators must have a current affiliation with QUT. However, the success of QUT ePrints in increasing the visibility and accessibility of our researchers' scholarly works resulted in requests to accept digital collections of works that were out of scope. To address this need, a number of parallel digital collections have been developed. These include OZcase, a collection of legal research materials, and ‘The Sugar Industry Collection’, a digitised collection of books and articles on sugar cane production and processing. Additionally, the Library has responded to requests from academics for a service to support the publication of new and existing peer-reviewed open access journals; a project is currently underway to help a group of senior QUT academics publish a new international peer-reviewed journal. The QUT Digital Repository website will be a portal for access to a range of resources to support copyright management, and it is likely to provide an access point for the institution's data repository. The data repository, provisionally named the ‘QUT Data Commons’, is currently a work in progress. The metadata for some QUT datasets will also be harvested by, and discoverable via, ‘Research Data Australia’, the dataset discovery service managed by the Australian National Data Service (ANDS). The QUT Digital Repository will integrate a range of technologies and services related to scholarly communication. This paper will discuss the development of the QUT Digital Repository, its strategic functions, the stakeholders involved and the lessons learned.
Abstract:
This paper reports on an empirical comparison of seven machine learning algorithms for texture classification, with application to vegetation management in power line corridors. Aiming to classify tree species in power line corridors, an object-based method is employed: individual tree crowns are segmented as the basic classification units, and three classic texture features are extracted as the input to the classification algorithms. Several widely used performance metrics are used to evaluate the classification algorithms. The experimental results demonstrate that classification performance depends on the performance metric chosen, the characteristics of the datasets and the features used.
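The abstract does not list the seven algorithms or the texture features, so the sketch below is only a generic stand-in for the experimental setup it describes: several classifiers evaluated on the same feature set with cross-validated metrics. The data, feature counts and classifier selection are all assumptions.

```python
# Generic stand-in for the described comparison; synthetic features
# replace the paper's texture features extracted from tree crowns.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in for texture features from segmented tree crowns (3 species)
X, y = make_classification(n_samples=300, n_features=12, n_classes=3,
                           n_informative=6, random_state=0)

classifiers = {
    "kNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=5,
                            scoring=["accuracy", "f1_macro"])
    print(f"{name}: acc={scores['test_accuracy'].mean():.3f} "
          f"f1={scores['test_f1_macro'].mean():.3f}")
```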
Abstract:
BACKGROUND: Indigenous patients with acute coronary syndromes represent a high-risk group. There are, however, few contemporary datasets addressing differences in the presentation and management of Indigenous and non-Indigenous patients with chest pain. METHODS: The Heart Protection Project is a multicentre retrospective audit of consecutive medical records from patients presenting with chest pain. Patients were identified as Indigenous or non-Indigenous, and times to presentation and cardiac investigations, as well as rates of cardiac investigations and procedures, were compared between the two groups. RESULTS: Of the 2380 patients included, 199 (8.4%) identified as Indigenous and 2174 (91.6%) as non-Indigenous. Indigenous patients were younger; had higher rates of hyperlipidaemia, diabetes, smoking and known coronary artery disease; had a lower rate of prior PCI; and were significantly less likely to have private health insurance, to be admitted to an interventional facility or to have a cardiologist as primary physician. Following adjustment for differences in baseline characteristics, Indigenous patients had comparable rates of cardiac investigations and comparable delay times to presentation and investigations. CONCLUSIONS: Although the Indigenous population was identified as a high-risk group, in this analysis of selected Australian hospitals there were no significant differences in the treatment or management of Indigenous patients in comparison to non-Indigenous patients.
Abstract:
There are at least four key challenges in the online news environment that computational journalism may address. Firstly, news providers operate in a rapidly evolving environment, and larger businesses are typically slower to adapt to market innovations. Secondly, news consumption patterns have changed, and news providers need to find new ways to capture and retain digital users. Thirdly, declining financial performance has led to cost cuts in mass market newspapers. Finally, investigative reporting is typically slow, high cost and sometimes tedious, yet it is valuable to the reputation of a news provider. Computational journalism involves the application of software and technologies to the activities of journalism, and it draws from the fields of computer science, social science and communications. New technologies may enhance the traditional aims of journalism, or may require “a new breed of people who are midway between technologists and journalists” (Irfan Essa in Mecklin 2009: 3). Historically referred to as ‘computer-assisted reporting’, the use of software in online reportage is increasingly valuable due to three factors: larger datasets are becoming publicly available; software is becoming more sophisticated and ubiquitous; and the Australian digital economy is developing. This paper introduces the key elements of computational journalism: why it is needed, what it involves, its benefits and challenges, and a case study with examples. When correctly used, computational techniques can quickly provide a solid factual basis for original investigative journalism and may increase interaction with readers. It is a major opportunity to enhance the delivery of original investigative journalism, which ultimately may attract and retain readers online.
Abstract:
Being in paid employment is socially valued and is linked to health, financial security and time use. Issues arising from a lack of occupational choice and control, and from diminished role partnerships, are particularly problematic in the lives of people with an intellectual disability. Informal support networks are known to influence work opportunities for people without disabilities, but their impact on the work experiences of people with disabilities has not been thoroughly explored. The experience of 'work' and preparation for work was explored with a group of four people with an intellectual disability (the participants) and the key members of their informal support networks (network members) in New South Wales, Australia. Network members and participants were interviewed, and participant observations of work and other activities were undertaken. Data analysis included open, conceptual and thematic coding, and data analysis software assisted in managing the large datasets across multiple team members. The insight and actions of network members created and sustained employment and support opportunities that effectively matched the needs and interests of the participants. Recommendations for future research are outlined.