957 results for Imbalanced datasets
Abstract:
The ability to detect unusual events in surveillance footage as they happen is a highly desirable feature for a surveillance system. However, this problem remains challenging in crowded scenes due to occlusions and the clustering of people. In this paper, we propose using the Distributed Behavior Model (DBM), which has been widely used in computer graphics, for video event detection. Our approach does not rely on object tracking and is robust to camera movements. We use sparse coding for classification, and test our approach on various datasets. Our proposed approach outperforms a state-of-the-art method that uses the social force model and Latent Dirichlet Allocation.
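As a rough illustration of the sparse-coding step, the sketch below scores observations by their reconstruction error under a dictionary learned from normal footage; the feature extraction, dimensions, and scikit-learn usage are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

# Hypothetical feature matrix: rows are descriptors extracted from
# normal (event-free) training footage, e.g. DBM-derived motion features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 64))

# Learn an overcomplete dictionary from normal behaviour.
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=200,
                          random_state=0).fit(X_train)

def anomaly_score(x):
    """Reconstruction error under the sparse code: a large error suggests
    the observation is poorly explained by normal behaviour."""
    coder = SparseCoder(dictionary=dico.components_,
                        transform_algorithm='lasso_lars', transform_alpha=1.0)
    code = coder.transform(x.reshape(1, -1))
    recon = code @ dico.components_
    return float(np.linalg.norm(x - recon))

print(anomaly_score(rng.normal(size=64)))
```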
Abstract:
Baseline monitoring of groundwater quality aims to characterize the ambient condition of the resource and identify spatial or temporal trends. Sites comprising any baseline monitoring network must be selected to provide a representative perspective of groundwater quality across the aquifer(s) of interest. Hierarchical cluster analysis (HCA) has been used as a means of assessing the representativeness of a groundwater quality monitoring network, using example datasets from New Zealand. HCA allows New Zealand's national and regional monitoring networks to be compared in terms of the number of water-quality categories identified in each network, the hydrochemistry at the centroids of these water-quality categories, the proportions of monitoring sites assigned to each water-quality category, and the range of concentrations for each analyte within each water-quality category. Through the HCA approach, the National Groundwater Monitoring Programme (117 sites) is shown to provide a highly representative perspective of groundwater quality across New Zealand, relative to the amalgamated regional monitoring networks operated by 15 different regional authorities (680 sites have sufficient data for inclusion in HCA). This methodology can be applied to evaluate the representativeness of any subset of monitoring sites taken from a larger network.
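As an illustration of the HCA workflow described, the following sketch clusters a hypothetical site-by-analyte table with Ward linkage and reports the per-category centroids and site counts; the analytes, standardisation, and cluster count are assumptions, not the study's settings.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Hypothetical site-by-analyte table (e.g. major ions, mg/L); in the study
# these would be median concentrations per monitoring site.
rng = np.random.default_rng(1)
sites = pd.DataFrame(rng.lognormal(size=(40, 4)),
                     columns=['Na', 'Cl', 'NO3-N', 'HCO3'])

# Standardise analytes so no single ion dominates, then cluster with
# Ward linkage and cut the tree into water-quality categories.
Z = linkage(sites.apply(zscore), method='ward')
sites['category'] = fcluster(Z, t=4, criterion='maxclust')

# Centroids and site counts per category: the quantities compared
# between the national and regional networks in the abstract.
print(sites.groupby('category').mean())
print(sites['category'].value_counts())
```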
Abstract:
Learning and then recognizing a route, whether travelled during the day or at night, in clear or inclement weather, and in summer or winter, is a challenging task for state-of-the-art algorithms in computer vision and robotics. In this paper, we present a new approach to visual navigation under changing conditions, dubbed SeqSLAM. Instead of calculating the single location most likely given a current image, our approach calculates the best candidate matching location within every local navigation sequence. Localization is then achieved by recognizing coherent sequences of these “local best matches”. This approach removes the need for global matching performance from the vision front-end; instead, it must only pick the best match within any short sequence of images. The approach is applicable over environmental changes that render traditional feature-based techniques ineffective. Using two car-mounted camera datasets, we demonstrate the effectiveness of the algorithm and compare it to one of the most successful feature-based SLAM algorithms, FAB-MAP. The perceptual change in the datasets is extreme: repeated traverses through environments during the day and then in the middle of the night, at times separated by months or years and in opposite seasons, and in clear weather and extremely heavy rain. While the feature-based method fails, the sequence-based algorithm is able to match trajectory segments at 100% precision with recall rates of up to 60%.
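A minimal sketch of the core idea, assuming a precomputed image-difference matrix: rather than taking the globally best single-image match, score each reference location by the best coherent sequence of local matches ending there. This simplified constant-speed version is illustrative only, not the published algorithm.

```python
import numpy as np

def seqslam_match(D, ds=10):
    """Given a difference matrix D[i, j] between query image i and
    reference image j, score each reference index by the coherent
    straight-line trajectory of length ds ending there (a simplified
    version of SeqSLAM's local sequence search)."""
    nq, nr = D.shape
    scores = np.full(nr, np.inf)
    for j in range(ds, nr):
        # Constant-velocity assumption: sum differences along the diagonal.
        scores[j] = sum(D[nq - ds + k, j - ds + k] for k in range(ds))
    return scores

# Hypothetical difference matrix from patch-normalised, downsampled images.
rng = np.random.default_rng(2)
D = rng.random((50, 200))
best = int(np.argmin(seqslam_match(D)))
print('best matching reference index:', best)
```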
Abstract:
KLK15 over-expression is reported to be a significant predictor of reduced progression-free survival and overall survival in ovarian cancer. Our aim was to analyse the KLK15 gene for putative functional single nucleotide polymorphisms (SNPs) and assess the association of these and KLK15 HapMap tag SNPs with ovarian cancer survival. Results: In silico analysis was performed to identify KLK15 regulatory elements and to classify potentially functional SNPs in these regions. After SNP validation and identification by DNA sequencing of ovarian cancer cell lines and aggressive ovarian cancer patients, 9 SNPs were shortlisted and genotyped using the Sequenom iPLEX MassARRAY platform in a cohort of Australian ovarian cancer patients (N = 319). In the Australian dataset we observed significantly worse survival for carriers of the KLK15 rs266851 SNP under a dominant model (Hazard Ratio (HR) 1.42, 95% CI 1.02-1.96). This association was observed in the same direction in two independent datasets, with a combined HR for the three studies of 1.16 (95% CI 1.00-1.34). This SNP lies 15 bp downstream of a novel exon and is predicted to be involved in mRNA splicing. The mutant allele is also predicted to abrogate an HSF-2 binding site. Conclusions: We provide evidence of association for the SNP rs266851 with ovarian cancer survival. Our results provide the impetus for downstream functional assays and additional independent validation studies to assess the role of KLK15 regulatory SNPs and KLK15 isoforms with alternative intracellular functional roles in ovarian cancer survival.
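A hazard ratio of this kind typically comes from a Cox proportional hazards model with the SNP coded under a dominant model; the sketch below shows that analysis with the lifelines package, using simulated stand-ins for the cohort, genotypes, and event times rather than the study data.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical cohort: genotype at rs266851 coded under a dominant model
# (1 = carries at least one minor allele, 0 = homozygous common allele).
rng = np.random.default_rng(3)
n = 319
carrier = rng.binomial(1, 0.4, n)
time = rng.exponential(60 / (1 + 0.4 * carrier))   # months to event
event = rng.binomial(1, 0.8, n)                    # 1 = event observed

df = pd.DataFrame({'carrier': carrier, 'time': time, 'event': event})
cph = CoxPHFitter().fit(df, duration_col='time', event_col='event')
# exp(coef) is the hazard ratio; the paper reports HR 1.42 (95% CI 1.02-1.96).
cph.print_summary()
```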
Abstract:
Background: Cohort studies can provide valuable evidence of cause and effect relationships but are subject to loss of participants over time, limiting the validity of findings. Computerised record linkage offers a passive and ongoing method of obtaining health outcomes from existing routinely collected data sources. However, the quality of record linkage is reliant upon the availability and accuracy of common identifying variables. We sought to develop and validate a method for linking a cohort study to a state-wide hospital admissions dataset with limited availability of unique identifying variables. Methods: A sample of 2000 participants from a cohort study (n = 41 514) was linked to a state-wide hospitalisations dataset in Victoria, Australia using the national health insurance (Medicare) number and demographic data as identifying variables. Availability of the health insurance number was limited in both datasets; therefore linkage was undertaken both with and without use of this number and agreement was tested between the two algorithms. Sensitivity was calculated for a sub-sample of 101 participants with a hospital admission confirmed by medical record review. Results: Of the 2000 study participants, 85% were found to have a record in the hospitalisations dataset when the national health insurance number and sex were used as linkage variables and 92% when demographic details only were used. When agreement between the two methods was tested, the disagreement fraction was 9%, mainly due to "false positive" links when demographic details only were used. A final algorithm that used multiple combinations of identifying variables resulted in a match proportion of 87%. Sensitivity of this final linkage was 95%. Conclusions: High-quality record linkage of cohort data with a hospitalisations dataset that has limited identifiers can be achieved using combinations of a national health insurance number and demographic data as identifying variables.
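A minimal sketch of multi-pass deterministic linkage of the kind described, using toy extracts and hypothetical field names; the actual algorithm and identifier combinations used in the study are not reproduced here.

```python
import pandas as pd

# Hypothetical extracts: a cohort file and a hospitalisations file sharing
# an (incomplete) insurance number plus demographic fields.
cohort = pd.DataFrame({'id': [1, 2, 3],
                       'medicare': ['123', None, '789'],
                       'dob': ['1950-01-01', '1948-07-12', '1955-03-30'],
                       'sex': ['F', 'M', 'F'],
                       'postcode': ['3000', '3121', '3056']})
hosp = pd.DataFrame({'admission': ['a', 'b', 'c'],
                     'medicare': ['123', None, None],
                     'dob': ['1950-01-01', '1948-07-12', '1955-03-30'],
                     'sex': ['F', 'M', 'F'],
                     'postcode': ['3000', '3121', '3056']})

# Pass 1: strictest rule first (insurance number + sex), then fall back
# to a demographic-only rule for records still unmatched.
passes = [['medicare', 'sex'], ['dob', 'sex', 'postcode']]
links, unmatched = [], cohort
for keys in passes:
    m = unmatched.dropna(subset=keys).merge(hosp.dropna(subset=keys), on=keys)
    links.append(m[['id', 'admission']])
    unmatched = unmatched[~unmatched['id'].isin(m['id'])]
print(pd.concat(links))
```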
Abstract:
Bactrocera dorsalis (Hendel) and B. papayae Drew & Hancock represent a closely related sibling species pair for which the biological species limits are unclear; i.e., it is uncertain whether they are truly two biological species, or one biological species that has been incorrectly split taxonomically. The geographic ranges of the two taxa are thought to abut or overlap on or around the Isthmus of Kra, a recognised biogeographic barrier located on the narrowest portion of the Thai Peninsula. We collected fresh material of B. dorsalis sensu lato (i.e., B. dorsalis sensu stricto + B. papayae) in a north-south transect down the Thai Peninsula, from areas regarded as being exclusively B. dorsalis s.s., across the Kra Isthmus, and into regions regarded as exclusively B. papayae. We carried out microsatellite analyses and took measurements of male genitalia and wing shape. Both of the latter morphological tests have been used previously to separate these two taxa. No significant population structuring was found in the microsatellite analysis, and the results were consistent with an interpretation of one, predominantly panmictic population. Both morphological datasets showed consistent, clinal variation along the transect, with no evidence of disjunction. No test yielded evidence supporting historical vicariance driven by the Isthmus of Kra, and none of the three datasets supported the current taxonomy of two species. Rather, within and across the area of range overlap or abutment between the two species, only continuous morphological and genetic variation was recorded. The recognition that morphological traits previously used to separate these taxa are continuous, and that there is no genetic evidence for population segregation in the region of suspected species overlap, is consistent with a growing body of literature that reports no evidence of biological differentiation between these taxa.
Abstract:
Quality-oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers' expectations by supplying reliable, good-quality products and services is the key factor for an organization and even a government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and the continuous improvement philosophy, and are now being applied widely to improve the quality of products and services in industrial and business sectors. Recently, SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the characteristics and needs of the health sector. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore, analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents opportunities for transferring knowledge and for future development in each sector. Meanwhile, the capabilities of the Bayesian approach, particularly Bayesian hierarchical models and computational techniques in which all uncertainty is expressed as a structure of probability, facilitate decision making and cost-effectiveness analyses. Therefore, this research investigates the use of the quality improvement cycle in a health setting using clinical data from a hospital. The need for clinical data for monitoring purposes is investigated in two respects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in the available datasets and data flow; a data capturing algorithm using Bayesian decision-making methods is then developed to determine an economical sample size for statistical analyses within the quality improvement cycle. Having ensured clinical data quality, some characteristics of control charts in the health context, including the necessity of monitoring attribute data and correlated quality characteristics, are considered. To this end, multivariate control charts from an industrial context are adapted to monitor the radiation delivered to patients undergoing diagnostic coronary angiograms, and various risk-adjusted control charts are constructed and investigated for monitoring binary outcomes of clinical interventions as well as post-intervention survival time. Meanwhile, the adoption of a Bayesian approach is proposed as a new framework for estimating the change point following a control chart's signal. This estimate aims to facilitate root-cause analysis efforts in the quality improvement cycle, since it narrows the search for the potential causes of detected changes to a tighter time-frame prior to the signal. This approach enables us to obtain highly informative estimates for change point parameters, since probability-distribution-based results are obtained. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios, including step changes, linear trends and multiple changes in a Poisson process, are developed and investigated.
The benefits of change point investigation are revisited and promoted in monitoring hospital outcomes, where the developed Bayesian estimator reports the true time of the shifts, compared to a priori known causes, detected by control charts in monitoring the rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators is then extended to healthcare surveillance of processes in which pre-intervention characteristics of patients affect the outcomes. In this setting, the Bayesian estimator is first extended to capture the patient mix (covariates) through the risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in the odds ratio of intensive care unit outcomes in a local hospital. Secondly, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention which is being monitored by risk-adjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix, and the survival function is constructed using a survival prediction model. The simulation studies undertaken in each research component, and the results obtained, highly recommend the developed Bayesian estimators as a strong alternative for change point estimation within the quality improvement cycle in healthcare surveillance as well as industrial and business contexts. The superiority of the proposed Bayesian framework and estimators is enhanced when the probability quantification, flexibility and generalizability of the developed models are also considered. The advantages of the Bayesian approach seen in the general context of quality control may also be extended to the industrial and business domains where quality monitoring was initially developed.
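As a simplified illustration of Bayesian change point estimation for the step-change scenario in a Poisson process, the sketch below computes the exact posterior over the change time under conjugate Gamma priors; the thesis itself uses richer hierarchical models fitted by Markov chain Monte Carlo, so this is a sketch of the idea, not the developed estimators.

```python
import numpy as np
from scipy.special import gammaln

def changepoint_posterior(y, a=1.0, b=1.0):
    """Exact posterior over the change point tau in a Poisson process:
    y[:tau] ~ Poisson(lam1), y[tau:] ~ Poisson(lam2), with conjugate
    Gamma(a, b) priors on both rates and a uniform prior on tau."""
    n = len(y)
    c = np.cumsum(y)
    logp = np.full(n - 1, -np.inf)
    for tau in range(1, n):                 # change after observation tau
        s1, s2 = c[tau - 1], c[-1] - c[tau - 1]
        logp[tau - 1] = (gammaln(a + s1) - (a + s1) * np.log(b + tau)
                         + gammaln(a + s2) - (a + s2) * np.log(b + n - tau))
    logp -= logp.max()                      # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Simulated step change in an adverse-event count at t = 30.
rng = np.random.default_rng(4)
y = np.concatenate([rng.poisson(2.0, 30), rng.poisson(5.0, 20)])
post = changepoint_posterior(y)
print('posterior mode of tau:', int(np.argmax(post)) + 1)
```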
Abstract:
Background: In Australia and other developed countries, there are consistent and marked socioeconomic inequalities in health. Diet is a major contributing factor to the poorer health of lower socioeconomic groups: the dietary patterns of disadvantaged groups are the least consistent with dietary recommendations for the prevention of diet-related chronic diseases, compared with their more advantaged counterparts. Part of the reason that lower socioeconomic groups have poorer diets may be their consumption of takeaway foods. These foods typically have nutrient contents that fail to comply with the dietary recommendations for the prevention of chronic disease and associated risk factors. A high level of takeaway food consumption, therefore, may negatively influence overall dietary intakes and, consequently, lead to adverse health outcomes. Despite this, little attention has focused on the association between socioeconomic position (SEP) and takeaway food consumption, with the limited number of studies showing mixed results. Additionally, studies have been limited by only considering a narrow range of takeaway foods and not examining how different socioeconomic groups make choices that are more (or less) consistent with dietary recommendations. While a large number of earlier studies have consistently reported that socioeconomically disadvantaged groups consume less fruit and vegetables, there is limited knowledge about the role of takeaway food in socioeconomic variations in fruit and vegetable intake. Furthermore, no known studies have investigated why there are socioeconomic differences in takeaway food consumption. The aims of this study are to: examine takeaway food consumption and the types of takeaway food consumed (healthy and less healthy) by different socioeconomic groups; determine whether takeaway food consumption patterns explain socioeconomic variations in fruit and vegetable intake; and investigate the role of a range of psychosocial factors in explaining the association between SEP and takeaway food consumption and the choice of takeaway food. Methods: This study used two cross-sectional population-based datasets: 1) the 1995 Australian National Nutrition Survey (NNS), which was conducted among a nationally representative sample of adults aged between 25 and 64 years (N = 7319, 61% response rate); and 2) the Food and Lifestyle Survey (FLS), which was conducted by the candidate among randomly selected adults aged between 25 and 64 years residing in Brisbane, Australia in 2009 (N = 903, 64% response rate). The FLS extended the NNS in several ways: by describing current socioeconomic differences in takeaway food consumption patterns, formally assessing the mediating effect of takeaway food consumption on socioeconomic inequalities in fruit and vegetable intake, and also investigating whether (and which) psychosocial factors contributed to the observed socioeconomic variations in takeaway food consumption patterns. Results: Approximately 32% of the NNS participants consumed takeaway food in the previous 24 hours and 38% of the FLS participants reported consuming takeaway food once a week or more. The results from analyses of the NNS and the FLS were somewhat mixed; however, disadvantaged groups were more likely to consume a high level of 'less healthy' takeaway food compared with their more advantaged counterparts. The lower fruit and vegetable intake among lower socioeconomic groups was partly mediated by their high consumption of 'less healthy' takeaway food.
Lower socioeconomic groups were more likely to have negative meal preparation behaviours and attitudes, and weaker health and nutrition-related beliefs and knowledge. Socioeconomic differences in takeaway food consumption were partly explained by meal preparation behaviours and attitudes, and these factors, along with health and nutrition-related beliefs and knowledge, appeared to contribute to the socioeconomic variations in the choice of takeaway foods. Conclusion: This thesis enhances our understanding of socioeconomic differences in dietary behaviours and the potential pathways involved by describing takeaway food consumption patterns by SEP, explaining the role of takeaway food consumption in socioeconomic inequalities in fruit and vegetable intake, and identifying the potential impact of psychosocial factors on socioeconomic differences in takeaway food consumption and the choice of takeaway food. Some important evidence is also provided for developing policies and effective intervention programs to improve the diet quality of the population, especially among lower socioeconomic groups. This thesis concludes with a discussion of a number of recommendations about future research and strategies to improve the dietary intake of the whole population, especially among disadvantaged groups.
Abstract:
While undertaking the ANDS RDA Gold Standard Record Exemplars project, research data sharing was discussed with many QUT researchers. Our experiences provided rich insight into researcher attitudes towards their data and the sharing of such data. Generally, we found that traditional altruistic motivations for research data sharing did not inspire researchers, but that an explanation of the more achievement-oriented benefits was more compelling.
Abstract:
The Queensland University of Technology (QUT) in Brisbane, Australia, is involved in a number of projects funded by the Australian National Data Service (ANDS). Currently, QUT is working on a project (Metadata Stores Project) that uses open source VIVO software to aid in the storage and management of metadata relating to data sets created/managed by the QUT research community. The registry (called QUT Research Data Finder) will support the sharing and reuse of research datasets, within and external to QUT. QUT uses VIVO for both the display and the editing of research metadata.
Abstract:
The most common software analysis tools available for measuring fluorescence images are designed for two-dimensional (2D) data; they rely on manual settings for the inclusion and exclusion of data points, and on computer-aided pattern recognition to support the interpretation and findings of the analysis. It has become increasingly important to be able to measure fluorescence images constructed from three-dimensional (3D) datasets in order to capture the complexity of cellular dynamics and understand the basis of cellular plasticity within biological systems. Sophisticated microscopy instruments have permitted the visualization of 3D fluorescence images through the acquisition of multispectral fluorescence images and powerful analytical software that reconstructs the images from confocal stacks, which then provides a 3D representation of the collected 2D images. Advanced design-based stereology methods have progressed from the approximations and assumptions of the original model-based stereology(1), even in complex tissue sections(2). Despite these scientific advances in microscopy, a need remains for an automated analytic method that fully exploits the intrinsic 3D data to allow for the analysis and quantification of the complex changes in cell morphology, protein localization and receptor trafficking. Current techniques available to quantify fluorescence images include MetaMorph (Molecular Devices, Sunnyvale, CA) and ImageJ (NIH), which provide manual analysis. Imaris (Andor Technology, Belfast, Northern Ireland) software provides the feature MeasurementPro, which allows the manual creation of measurement points that can be placed in a volume image or drawn on a series of 2D slices to create a 3D object. This method is useful for single-click point measurements to measure a line distance between two objects or to create a polygon that encloses a region of interest, but it is difficult to apply to complex cellular network structures. Filament Tracer (Andor) allows automatic detection of 3D neuronal filament-like structures; however, this module has been developed to measure defined structures such as neurons, which are comprised of dendrites, axons and spines (a tree-like structure). This module has been ingeniously utilized to make morphological measurements of non-neuronal cells(3); however, the output data provide information about an extended cellular network using software that depends on a defined cell shape rather than an amorphous-shaped cellular model. To overcome the issue of analyzing amorphous-shaped cells and to make the software more suitable for biological applications, Imaris developed Imaris Cell. This was a scientific project with the Eidgenössische Technische Hochschule, developed to calculate the relationship between cells and organelles. While the software enables the detection of biological constraints, by forcing one nucleus per cell and using cell membranes to segment cells, it cannot be used to analyze fluorescence data that are not continuous, because it ideally builds the cell surface without void spaces. To our knowledge, at present no user-modifiable automated approach has been developed that provides morphometric information from 3D fluorescence images and achieves cellular spatial information for an undefined shape (Figure 1). We have developed an analytical platform using the Imaris core software module and Imaris XT interfaced to MATLAB (MathWorks, Inc.).
These tools allow the 3D measurement of cells without a predefined shape and with inconsistent fluorescence network components. Furthermore, this method will allow researchers who have extensive expertise in biological systems, but little familiarity with computer applications, to perform quantification of morphological changes in cell dynamics.
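As a simplified, open-source illustration of shape-agnostic 3D morphometry (not the Imaris/MATLAB platform itself), the sketch below segments a hypothetical confocal stack and reports per-object volumes and centroids using scikit-image.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

# Hypothetical 3D fluorescence stack (z, y, x); in practice this would be
# loaded from a confocal series rather than simulated.
rng = np.random.default_rng(5)
stack = ndimage.gaussian_filter(rng.random((30, 128, 128)), sigma=3)

# Segment without assuming any predefined cell shape: global threshold,
# connected-component labelling, then per-object morphometrics.
mask = stack > threshold_otsu(stack)
objects = regionprops(label(mask))
for obj in objects[:5]:
    print(f'object {obj.label}: volume={obj.area} voxels, '
          f'centroid={tuple(round(c, 1) for c in obj.centroid)}')
```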
Abstract:
This paper presents a combined structure for using real, complex, and binary valued vectors for semantic representation. The theory, implementation, and application of this structure are all significant. For the theory underlying quantum interaction, it is important to develop a core set of mathematical operators that describe systems of information, just as core mathematical operators in quantum mechanics are used to describe the behavior of physical systems. The system described in this paper enables us to compare more traditional quantum mechanical models (which use complex state vectors) alongside more generalized quantum models that use real and binary vectors. The implementation of such a system presents fundamental computational challenges. For large and sometimes sparse datasets, the demands on time and space are different for real, complex, and binary vectors. To accommodate these demands, the Semantic Vectors package has been carefully adapted and can now switch between different number types relatively seamlessly. This paper describes the key abstract operations in our semantic vector models and their implementations for real, complex, and binary vectors. We also discuss some of the key questions that arise in the field of quantum interaction and informatics, explaining how the wide availability of modelling options for different number fields will help to investigate some of these questions.
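A minimal sketch of how the core overlap operation differs across the three number types, using plain NumPy; these are textbook definitions for illustration, not the Semantic Vectors package's internals.

```python
import numpy as np

def sim_real(a, b):
    """Cosine overlap for real-valued vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def sim_complex(a, b):
    """Overlap for complex state vectors: |<a|b>| after unit normalisation."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return abs(np.vdot(a, b))

def sim_binary(a, b):
    """Overlap for binary vectors: 1 minus normalised Hamming distance."""
    return 1.0 - np.count_nonzero(a != b) / a.size

rng = np.random.default_rng(6)
d = 1024
print(sim_real(rng.normal(size=d), rng.normal(size=d)))
print(sim_complex(rng.normal(size=d) + 1j * rng.normal(size=d),
                  rng.normal(size=d) + 1j * rng.normal(size=d)))
print(sim_binary(rng.integers(0, 2, d), rng.integers(0, 2, d)))
```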
Abstract:
Complex flow datasets are often difficult to represent in detail using traditional vector visualisation techniques such as arrow plots and streamlines. This is particularly true when the flow regime changes in time. Texture-based techniques, which are based on the advection of dense textures, are novel methods for visualising such flows (i.e., flows with complex, time-dependent dynamics). In this paper, we review two popular texture-based techniques and their application to flow datasets sourced from real research projects. The texture-based techniques investigated were Line Integral Convolution (LIC) and Image-Based Flow Visualisation (IBFV). We evaluated these techniques and report here on their visualisation effectiveness (compared with traditional techniques), their ease of implementation, and their computational overhead.
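A minimal, unoptimised LIC sketch is given below to make the advection idea concrete: each output pixel averages a noise texture along its local streamline, so streaks align with the flow. Production implementations differ substantially (fast streamline reuse, higher-order integration), so treat this as illustrative only.

```python
import numpy as np

def lic(u, v, texture, L=20):
    """Minimal Line Integral Convolution: at each pixel, average a white-
    noise texture along the local streamline of the vector field (u, v)."""
    h, w = texture.shape
    out = np.zeros_like(texture)
    for i in range(h):
        for j in range(w):
            acc, cnt = 0.0, 0
            for direction in (1.0, -1.0):   # trace forward and backward
                y, x = float(i), float(j)
                for _ in range(L):
                    vy, vx = v[int(y), int(x)], u[int(y), int(x)]
                    n = np.hypot(vx, vy) or 1.0
                    y = min(max(y + direction * vy / n, 0), h - 1)
                    x = min(max(x + direction * vx / n, 0), w - 1)
                    acc += texture[int(y), int(x)]
                    cnt += 1
            out[i, j] = acc / cnt
    return out

# Circulating flow over a noise texture: streaks follow the streamlines.
ys, xs = np.mgrid[-1:1:64j, -1:1:64j]
img = lic(-ys, xs, np.random.default_rng(7).random((64, 64)))
```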
Abstract:
A Flash Event (FE) represents a period of time when a web-server experiences a dramatic increase in incoming traffic, either following a newsworthy event that has prompted users to locate and access it, or as a result of redirection from other popular web or social media sites. This usually leads to network congestion and Quality-of-Service (QoS) degradation. These events can be mistaken for Distributed Denial-of-Service (DDoS) attacks aimed at disrupting the server. Accurate detection of FEs and their distinction from DDoS attacks is important, since different actions need to be undertaken by network administrators in these two cases. However, lack of public domain FE datasets hinders research in this area. In this paper we present a detailed study of flash events and classify them into three broad categories. In addition, the paper describes FEs in terms of three key components: the volume of incoming traffic, the related source IP-addresses, and the resources being accessed. We present such a FE model with minimal parameters and use publicly available datasets to analyse and validate our proposed model. The model can be used to generate different types of FE traffic, closely approximating real-world scenarios, in order to facilitate research into distinguishing FEs from DDoS attacks.
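As an illustration of how such a three-component model might drive traffic generation, the sketch below synthesises a flash event from a volume profile, a pool of mostly-new source IPs, and Zipf-like resource popularity; all shapes and parameters are invented for illustration, not the fitted model from the paper.

```python
import numpy as np

def generate_flash_event(t_rise=60, t_sustain=120, t_decay=180,
                         peak_rps=5000, base_rps=50, n_sources=2000, seed=8):
    """Synthetic flash-event trace built from the three components in the
    abstract: per-second request volume, source IP-addresses, and resources."""
    rng = np.random.default_rng(seed)
    # Volume: baseline, rapid ramp-up, sustained peak, slower decay.
    ramp = np.linspace(base_rps, peak_rps, t_rise)
    sustain = np.full(t_sustain, peak_rps)
    decay = np.linspace(peak_rps, base_rps, t_decay)
    volume = np.concatenate([ramp, sustain, decay]).astype(int)
    # Sources: drawn from a large pool of legitimate clients, unlike the
    # comparatively fixed source set of a DDoS botnet.
    sources = [rng.integers(0, n_sources, size=r) for r in volume]
    # Resources: a few pages attract most requests (Zipf-like popularity).
    resources = [rng.zipf(1.5, size=r) % 50 for r in volume]
    return volume, sources, resources

volume, sources, resources = generate_flash_event()
print('peak requests/sec:', volume.max())
```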
Abstract:
Motor unit number estimation (MUNE) is a method which aims to provide a quantitative indicator of the progression of diseases that lead to loss of motor units, such as motor neurone disease. However, the development of a reliable, repeatable and fast real-time MUNE method has hitherto proved elusive. Ridall et al. (2007) implement a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm to produce a posterior distribution for the number of motor units, using a Bayesian hierarchical model that takes into account biological information about motor unit activation. However, we find that the approach can be unreliable for some datasets since it can suffer from poor cross-dimensional mixing. Here we focus on improved inference by marginalising over latent variables to create the likelihood. In particular, we explore how this can improve the RJMCMC mixing and investigate alternative approaches that utilise the likelihood (e.g. DIC (Spiegelhalter et al., 2002)). For this model, the marginalisation is over a set of latent binary variables whose joint sample space grows exponentially with the number of motor units, making the summation over all combinations intractable for larger numbers of units. We provide a tractable and accurate approximation for this quantity and also investigate simulation approaches incorporated into RJMCMC using the results of Andrieu and Roberts (2009).
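To make the intractability concrete, the sketch below computes the marginal likelihood by brute-force summation over all 2^n firing configurations in a toy Gaussian-response model; this is only feasible for small unit counts, which is precisely why a tractable approximation is needed. The model form here is a simplified stand-in, not the hierarchical model of the paper.

```python
import itertools
import numpy as np
from scipy.stats import norm

def marginal_likelihood(y, firing_prob, amplitude, sigma):
    """Exact likelihood of an observed muscle response y with the latent
    firing indicators summed out: the number of terms is 2**n_units, so
    this brute-force version only works for small unit counts."""
    n_units = len(firing_prob)
    total = 0.0
    for z in itertools.product([0, 1], repeat=n_units):
        z = np.array(z)
        prior = np.prod(np.where(z == 1, firing_prob, 1 - firing_prob))
        mean = z @ amplitude                   # summed unit contributions
        total += prior * norm.pdf(y, loc=mean, scale=sigma)
    return total

# Hypothetical values for a 10-unit model (2**10 = 1024 terms).
rng = np.random.default_rng(9)
p = rng.uniform(0.2, 0.9, 10)
a = rng.uniform(0.5, 2.0, 10)
print(marginal_likelihood(y=6.0, firing_prob=p, amplitude=a, sigma=0.3))
```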