407 results for Data Storage


Relevance:

20.00%

Publisher:

Abstract:

Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
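As a rough, self-contained illustration of the kind of model described above (not the thesis's actual analysis), the sketch below fits a truncated Dirichlet Process Gaussian mixture to synthetic two-dimensional data using scikit-learn's variational implementation; the synthetic data and all hyperparameters are assumptions made for the example. The posterior responsibilities it reports are one simple way of expressing uncertainty in cluster membership.

```python
# Illustrative sketch (not the thesis's models): a truncated Dirichlet Process
# Gaussian mixture fitted by variational inference with scikit-learn.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic "patient" data drawn from three latent subgroups.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.5, size=(100, 2)),
    rng.normal(loc=[0, 5], scale=0.5, size=(100, 2)),
])

# Truncated DP mixture: n_components is an upper bound; the stick-breaking
# prior lets unused components shrink towards zero weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
# Posterior responsibilities quantify uncertainty in each point's cluster membership.
responsibilities = dpgmm.predict_proba(X)
print("effective clusters:", np.unique(labels).size)
print("max membership uncertainty:", (1 - responsibilities.max(axis=1)).max().round(3))
```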

Relevance:

20.00%

Publisher:

Abstract:

In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of so many approaches is due to the different applications requiring the XML data to be clustered. These applications need data in the form of similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful, then we survey the approaches proposed so far, relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. This presentation leads to a taxonomy in which the current approaches can be classified and compared. We aim to introduce an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the paper describes future trends and research issues that still need to be addressed.
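As a toy illustration of one family of approaches covered by such surveys (structure-only similarity; the documents and the path-based measure here are invented for the example, not taken from the paper), the sketch below represents each XML document by its set of root-to-leaf tag paths and compares documents with the Jaccard coefficient.

```python
# Minimal sketch of one structure-based similarity measure: represent each XML
# document by its set of root-to-leaf tag paths and compare documents with the
# Jaccard coefficient. Real approaches also weigh content, semantics and order.
import xml.etree.ElementTree as ET

def tag_paths(xml_text):
    """Collect the set of root-to-leaf tag paths of an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths

def structural_similarity(doc_a, doc_b):
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    return len(a & b) / len(a | b)

d1 = "<book><title>T</title><author><name>A</name></author></book>"
d2 = "<book><title>T2</title><author><name>B</name><born>1970</born></author></book>"
d3 = "<article><heading>H</heading><body>...</body></article>"

print(structural_similarity(d1, d2))  # high: shared tag structure
print(structural_similarity(d1, d3))  # zero: disjoint structures
```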

Relevance:

20.00%

Publisher:

Abstract:

This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Results are reported from the first year of a 3-year longitudinal study in which three classes of first-grade children (6-year-olds) and their teachers engaged in data modelling activities. The theme of “Looking after our Environment”, part of the children’s science curriculum, provided the task context. The goals for the two activities addressed here included engaging children in core components of data modelling, namely, selecting attributes, structuring and representing data, identifying variation in data, and making predictions from given data. Results include the various ways in which children represented and re-represented collected data, including attribute selection, and the metarepresentational competence they displayed in doing so. The “data lenses” through which the children dealt with informal inference (variation and prediction) are also reported.

Relevance:

20.00%

Publisher:

Abstract:

In response to the need to leverage private finance and the lack of competition in some parts of the Australian public sector infrastructure market, especially in the very large economic infrastructure sector procured using Public Private Partnerships, the Australian Federal government has demonstrated its desire to attract new sources of in-bound foreign direct investment (FDI). This paper aims to report on progress towards an investigation into the determinants of multinational contractors’ willingness to bid for Australian public sector major infrastructure projects. This research deploys Dunning’s eclectic theory for the first time in terms of in-bound FDI by multinational contractors into Australia. Elsewhere, the authors have developed Dunning’s principal hypothesis to suit the context of this research and to address a weakness in this hypothesis, namely that it is based on a nominal approach to the factors in Dunning's eclectic framework and fails to speak to the relative explanatory power of these factors. In this paper, a first-stage test of the authors' development of Dunning's hypothesis is presented by way of an initial review of secondary data vis-à-vis the selected sector (roads and bridges) in Australia (as the host location) and with respect to four selected home countries (China, Japan, Spain and the US). In doing so, the next stage in the research method, concerning sampling and case studies, is also further developed and described in this paper. In conclusion, the extent to which the initial review of secondary data suggests the relative importance of the factors in the eclectic framework is considered. It is noted that more robust conclusions are expected following the future planned stages of the research, including primary data from the case studies and a global survey of the world’s largest contractors, which is briefly previewed. Finally, beyond the theoretical contributions expected from the overall approach taken to developing and testing Dunning’s framework, other expected contributions concerning research method and practical implications are mentioned.

Relevance:

20.00%

Publisher:

Abstract:

A rule-based approach for classifying previously identified medical concepts in clinical free text into an assertion category is presented. There are six different categories of assertions for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, a health-based clinical terminology called SNOMED CT and other publicly available dictionaries were used to classify assertions which did not fit the NegEx/Context model. The data for this task include discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports, each with pairs of ground truth concept and assertion files for system development, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83, 0.83 and 0.83 for recall, precision and F1-measure, respectively. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.
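To make the NegEx/Context-style idea concrete, here is a deliberately simplified sketch: it scans a short window of words before an identified concept for trigger phrases and maps the first match to an assertion category. The trigger lists, window size and naive substring matching are placeholder assumptions for illustration, not the dictionaries or rules used by the actual system.

```python
# Highly simplified, illustrative sketch of the NegEx/Context idea: look for
# trigger phrases in a window before an identified concept and assign an
# assertion category. Trigger lists are tiny placeholders, and the substring
# matching is naive (a real system uses word boundaries and scope rules).
import re

TRIGGERS = {
    "Absent": ["no ", "denies", "without", "negative for"],
    "Possible": ["possible", "may represent", "suspicion of"],
    "Hypothetical": ["if ", "should ", "return if"],
    "Conditional": ["on exertion", "when "],
    "Not associated with the patient": ["family history of", "mother had"],
}

def classify_assertion(sentence, concept, window=6):
    """Classify the assertion for `concept` from the words preceding it."""
    sent = sentence.lower()
    idx = sent.find(concept.lower())
    if idx < 0:
        return "Present"
    preceding = " ".join(re.findall(r"\w+['\w]*", sent[:idx])[-window:]) + " "
    for category, phrases in TRIGGERS.items():
        if any(p in preceding for p in phrases):
            return category
    return "Present"

print(classify_assertion("The patient denies chest pain.", "chest pain"))                # Absent
print(classify_assertion("Family history of diabetes.", "diabetes"))                      # Not associated with the patient
print(classify_assertion("Patient reports shortness of breath.", "shortness of breath"))  # Present
```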

Relevance:

20.00%

Publisher:

Abstract:

The International Classification of Diseases (ICD) is used to categorise diseases, injuries and external causes, and is a key epidemiological tool enabling the storage and retrieval of data from health and vital records to produce core international mortality and morbidity statistics. The ICD is updated periodically to ensure the classification remains current, and work is now underway to develop the next revision, ICD-11. It has been almost 20 years since the last ICD edition was published and over 60 years since the last substantial structural revision of the external causes chapter. Revision of such a critical tool requires transparency and documentation to ensure that changes made to the classification system are recorded comprehensively for future reference. In this paper, the authors provide a history of external causes classification development and outline the external cause structure. Approaches to manage ICD-10 deficiencies are discussed, and the ICD-11 revision approach is outlined with regard to the development of, rationale for and implications of proposed changes to the chapter. Through improved capture of external cause concepts in ICD-11, a stronger evidence base will be available to inform injury prevention, treatment, rehabilitation and policy initiatives, ultimately contributing to a reduction in injury morbidity and mortality.

Relevance:

20.00%

Publisher:

Abstract:

This poster describes projects funded by the Australian National Data Service (ANDS). The specific projects that were funded included: a) the Greenhouse Gas Emissions Project (N2O) with Prof. Peter Grace from QUT’s Institute of Sustainable Resources; b) the Q150 Project for the management of multimedia data collected at Festival events with Prof. Phil Graham from QUT’s Institute of Creative Industries; and c) bio-diversity environmental sensing with Prof. Paul Roe from the QUT Microsoft eResearch Centre. For the purposes of these projects the Eclipse Rich Client Platform (Eclipse RCP) was chosen as an appropriate software development framework within which to develop the respective software. The poster presents a brief overview of the requirements of the projects, the experiences of the project team in using Eclipse RCP, the advantages and disadvantages of using Eclipse, and the team’s perspective on Eclipse as an integrated tool for supporting future data management requirements.

Relevance:

20.00%

Publisher:

Abstract:

The gathering of people in everyday life is intertwined with travelling to negotiated locations. As a result, mobile phones are often used to rearrange meetings when one or more participants are late or cannot make it on time. Our research is based on the hypothesis that the provision of location data can enhance the experience of people who are meeting each other in different locations. Disposable Maps allows users to select contacts from their phone’s address book who then receive up-to-date location data. The utilisation of peer-to-peer notifications and the application of unique URLs for location storage and presentation enable location sharing whilst ensuring users’ location privacy. In contrast to other location sharing services like Google Latitude, Disposable Maps enables ad hoc location sharing to actively selected location receivers for a fixed period of time in a specific given situation.
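The following sketch illustrates the general pattern suggested by the abstract, not the Disposable Maps implementation itself: a hard-to-guess, single-purpose URL token with a fixed expiry gives selected receivers access to a sender's location for a limited period. The in-memory store, the example.org URL and all function names are assumptions made for the example.

```python
# Conceptual sketch only (not the Disposable Maps implementation): an
# unguessable, single-purpose URL token with an expiry time gives selected
# receivers access to a location for a fixed period, after which the share
# simply stops resolving.
import secrets
import time

SHARES = {}  # token -> share record; stands in for a server-side store

def create_share(receivers, lifetime_seconds=2 * 60 * 60):
    token = secrets.token_urlsafe(16)            # hard-to-guess identifier
    SHARES[token] = {
        "receivers": receivers,
        "expires_at": time.time() + lifetime_seconds,
        "location": None,
    }
    # In the real service each receiver would be notified peer-to-peer with this URL.
    return f"https://example.org/share/{token}"

def update_location(token, lat, lon):
    share = SHARES.get(token)
    if share and time.time() < share["expires_at"]:
        share["location"] = (lat, lon)

def read_location(token):
    share = SHARES.get(token)
    if share is None or time.time() >= share["expires_at"]:
        return None                              # expired or unknown: share is "disposed"
    return share["location"]

url = create_share(["alice", "bob"], lifetime_seconds=3600)
token = url.rsplit("/", 1)[-1]
update_location(token, -27.4705, 153.0260)
print(read_location(token))
```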

Relevance:

20.00%

Publisher:

Abstract:

As the international community struggles to find a cost-effective solution to mitigate climate change and reduce greenhouse gas emissions, carbon capture and storage (CCS) has emerged as a project mechanism with the potential to assist in transitioning society towards its low carbon future. Being a politically attractive option, legal regimes to promote and approve CCS have proceeded at an accelerated pace in multiple jurisdictions including the European Union and Australia. This acceleration and emphasis on the swift commercial deployment of CCS projects has left the legal community in the undesirable position of having to advise on the strengths and weaknesses of the key features of these regimes once they have been passed and become operational. This is an area where environmental law principles are tested to their very limit. On the one hand, implementation of this new technology should proceed in a precautionary manner to avoid adverse impacts on the atmosphere, local community and broader environment. On the other hand, excessive regulatory restrictions will stifle innovation and act as a barrier to the swift deployment of CCS projects around the world. Finding the balance between precaution and innovation is no easy feat. This is an area where lawyers, academics, regulators and industry representatives can benefit from the sharing of collective experiences, both positive and negative, across the jurisdictions. This exemplary book appears to have been collated with this philosophy in mind and provides an insightful addition to the global dialogue on establishing effective national and international regimes for the implementation of CCS projects...

Relevance:

20.00%

Publisher:

Abstract:

It is a big challenge to acquire correct user profiles for personalized text classification since users may be unsure when describing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases due to the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text and constrains the specific knowledge with reasoning rules to eliminate conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experimental results on Reuters Corpus Volume 1 and TREC topics show that the proposed technique achieves encouraging performance compared with state-of-the-art relevance feedback models.
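As a minimal illustration of the Dempster-Shafer machinery mentioned above (the mass values and the two evidence sources are invented for the example), the sketch below combines two mass functions over the frame {relevant, nonrelevant} with Dempster's rule of combination.

```python
# Minimal sketch of Dempster's rule of combination on a toy frame of
# discernment {relevant, nonrelevant}. Mass values are invented for illustration.
from itertools import product

FRAME = frozenset({"relevant", "nonrelevant"})

def combine(m1, m2):
    """Dempster's rule: combine two mass functions over subsets of FRAME."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    # Normalise by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Evidence from term-based features and from mined pattern features (toy numbers).
m_terms = {frozenset({"relevant"}): 0.6, FRAME: 0.4}
m_patterns = {frozenset({"relevant"}): 0.5, frozenset({"nonrelevant"}): 0.2, FRAME: 0.3}

m = combine(m_terms, m_patterns)
belief_relevant = m.get(frozenset({"relevant"}), 0.0)
print(round(belief_relevant, 3))
```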

Relevance:

20.00%

Publisher:

Abstract:

This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset are used in the case study. Covariate values in some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others for solving missing covariate problems, the results it produces can differ markedly from the real values of the missing covariates. This study also shows that, in general, results obtained from the regression method are more accurate than those of the mean imputation method, but at the cost of higher computational expense. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
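The contrast between the first two methods can be illustrated with a small synthetic example (the data and covariate names below are invented, not the paper's engineering asset data): mean imputation fills every gap with the same value, whereas regression imputation exploits the relationship between covariates.

```python
# Illustrative sketch on synthetic data: mean imputation ignores the relation
# between covariates, while regression imputation exploits it.
import numpy as np

rng = np.random.default_rng(1)
n = 200
operating_hours = rng.uniform(0, 1000, n)
# A condition-monitoring covariate that depends on operating hours.
vibration = 0.02 * operating_hours + rng.normal(0, 1.0, n)

missing = rng.random(n) < 0.3             # ~30% of vibration readings discarded
observed = ~missing

# Mean imputation: every gap gets the same value.
mean_filled = np.where(missing, vibration[observed].mean(), vibration)

# Regression imputation: predict each gap from the observed covariate.
slope, intercept = np.polyfit(operating_hours[observed], vibration[observed], 1)
reg_filled = np.where(missing, slope * operating_hours + intercept, vibration)

true_missing = vibration[missing]
print("mean imputation RMSE:", np.sqrt(np.mean((mean_filled[missing] - true_missing) ** 2)).round(2))
print("regression RMSE:     ", np.sqrt(np.mean((reg_filled[missing] - true_missing) ** 2)).round(2))
```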

Relevance:

20.00%

Publisher:

Abstract:

In this paper, a hardware-based path planning architecture for unmanned aerial vehicle (UAV) adaptation is proposed. The architecture aims to provide UAVs with higher autonomy using an application-specific evolutionary algorithm (EA) implemented entirely on a field programmable gate array (FPGA) chip. The physical attributes of an FPGA chip, compact size and low power consumption, make it an ideal platform for UAV applications. The design, which is implemented entirely in hardware, consists of EA modules, population storage resources, and three-dimensional terrain information necessary to the path planning process, subject to constraints accounted for separately via UAV, environment and mission profiles. The architecture has been successfully synthesised for a target Xilinx Virtex-4 FPGA platform with 32% logic slice utilisation. Results obtained from case studies for a small UAV helicopter, with environment data derived from LIDAR (Light Detection and Ranging), verify the effectiveness of the proposed FPGA-based path planner and demonstrate convergence at rates above the typical 10 Hz update frequency of an autopilot system.
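For readers unfamiliar with evolutionary path planning, the sketch below shows the basic idea in plain Python; the paper's planner is implemented entirely in FPGA hardware, so this software version, its cost function and its parameters are illustrative assumptions only.

```python
# Software-only sketch of the idea behind an evolutionary path planner.
# Candidate paths are fixed-length waypoint lists scored by path length plus a
# penalty for entering a circular no-fly zone; selection keeps the best quarter.
import random
import math

START, GOAL = (0.0, 0.0), (10.0, 10.0)
OBSTACLE, RADIUS = (5.0, 5.0), 2.0
N_WAYPOINTS, POP, GENERATIONS = 4, 60, 200

def fitness(waypoints):
    pts = [START, *waypoints, GOAL]
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    penalty = sum(50.0 for p in pts if math.dist(p, OBSTACLE) < RADIUS)
    return length + penalty

def random_path():
    return [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(N_WAYPOINTS)]

def mutate(path, sigma=0.5):
    return [(x + random.gauss(0, sigma), y + random.gauss(0, sigma)) for x, y in path]

population = [random_path() for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness)
    elite = population[: POP // 4]                       # keep the best quarter
    population = elite + [mutate(random.choice(elite)) for _ in range(POP - len(elite))]

best = min(population, key=fitness)
print("best path cost:", round(fitness(best), 2))
```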

Relevance:

20.00%

Publisher:

Abstract:

Since the availability of 3D full body scanners and the associated software systems for operations with large point clouds, 3D anthropometry has been marketed as a breakthrough and milestone in ergonomic design. The assumptions made by the representatives of the 3D paradigm need to be critically reviewed though. 3D anthropometry has advantages as well as shortfalls, which need to be carefully considered. While it is apparent that the measurement of a full body point cloud allows for easier storage of raw data and improves quality control, the difficulties in calculating standardized measurements from the point cloud are widely underestimated. Early studies that made use of 3D point clouds to derive anthropometric dimensions have shown unacceptable deviations from the standardized results measured manually. While 3D human point clouds provide a valuable tool to replicate specific single persons for further virtual studies, or to personalize garments, their use in ergonomic design must be critically assessed. Ergonomic, volumetric problems are defined by their two-dimensional boundaries or one-dimensional sections. A 1D/2D approach is therefore sufficient to solve an ergonomic design problem. As a consequence, all modern 3D human manikins are defined by the underlying anthropometric girths (2D) and lengths/widths (1D), which can be measured efficiently using manual techniques. Traditionally, ergonomists have taken a statistical approach to design for generalized percentiles of the population rather than for a single user. The underlying method is based on the distribution function of meaningful one- and two-dimensional anthropometric variables. Compared to these variables, the distribution of human volume has no ergonomic relevance. On the other hand, if volume is to be seen as a two-dimensional integral or distribution function of length and girth, the calculation of combined percentiles, a common ergonomic requirement, is undefined. Consequently, we suggest critically reviewing the cost and use of 3D anthropometry. We also recommend making proper use of the widely available one- and two-dimensional anthropometric data in ergonomic design.
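A small numerical example (with invented, synthetic body dimensions) illustrates the combined-percentile issue raised above: accommodating the 95th percentile of two correlated dimensions separately accommodates noticeably fewer than 95% of people.

```python
# Numeric illustration of the combined-percentile issue with synthetic,
# correlated "stature" and "waist girth" values (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
mean = [1750.0, 900.0]                       # mm
cov = [[60.0**2, 0.4 * 60.0 * 90.0],
       [0.4 * 60.0 * 90.0, 90.0**2]]
stature, girth = rng.multivariate_normal(mean, cov, size=n).T

p95_stature = np.percentile(stature, 95)
p95_girth = np.percentile(girth, 95)
within_both = np.mean((stature <= p95_stature) & (girth <= p95_girth))
print(f"accommodated on both dimensions: {within_both:.1%}")   # noticeably below 95%
```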

Relevance:

20.00%

Publisher:

Abstract:

Cities accumulate and distribute vast sets of digital information. Many decision-making and planning processes in councils, local governments and organisations are based on both real-time and historical data. Until recently, only a small, carefully selected subset of this information has been released to the public, usually for specific purposes (e.g. train timetables or the release of planning applications through websites, to name just a few). This situation is however changing rapidly. Regulatory frameworks, such as the Freedom of Information legislation in the US, the UK, the European Union and many other countries, guarantee public access to data held by the state. One of the results of this legislation and of changing attitudes towards open data has been the widespread release of public information as part of recent Government 2.0 initiatives. This includes the creation of public data catalogues such as data.gov (U.S.), data.gov.uk (U.K.) and data.gov.au (Australia) at federal government levels, and datasf.org (San Francisco) and data.london.gov.uk (London) at municipal levels. The release of this data has opened up the possibility of a wide range of future applications and services which are now the subject of intensified research efforts. Previous research endeavours have explored the creation of specialised tools to aid decision-making by urban citizens, councils and other stakeholders (Calabrese, Kloeckl & Ratti, 2008; Paulos, Honicky & Hooker, 2009). While these initiatives represent an important step towards open data, they too often result in mere collections of data repositories. Proprietary database formats and the lack of an open application programming interface (API) limit the full potential achievable by allowing these data sets to be cross-queried. Our research, presented in this paper, looks beyond the pure release of data. It is concerned with three essential questions: First, how can data from different sources be integrated into a consistent framework and made accessible? Second, how can ordinary citizens be supported in easily composing data from different sources in order to address their specific problems? Third, what interfaces make it easy for citizens to interact with data in an urban environment, and how can data be accessed and collected?

Relevance:

20.00%

Publisher:

Abstract:

Accurate and detailed road models play an important role in a number of geospatial applications, such as infrastructure planning, traffic monitoring, and driver assistance systems. In this thesis, an integrated approach for the automatic extraction of precise road features from high resolution aerial images and LiDAR point clouds is presented. A framework of road information modeling has been proposed, for rural and urban scenarios respectively, and an integrated system has been developed to deal with road feature extraction using image and LiDAR analysis. For road extraction in rural regions, a hierarchical image analysis is first performed to maximize the exploitation of road characteristics in different resolutions. The rough locations and directions of roads are provided by the road centerlines detected in low resolution images, both of which can be further employed to facilitate road information generation in high resolution images. The histogram thresholding method is then chosen to classify road details in high resolution images, where color space transformation is used for data preparation. After road surface detection, anisotropic Gaussian and Gabor filters are employed to enhance road pavement markings while suppressing other ground objects, such as vegetation and houses. Afterwards, pavement markings are obtained from the filtered image using Otsu's clustering method. The final road model is generated by superimposing the lane markings on the road surfaces, where the digital terrain model (DTM) produced by LiDAR data can also be combined to obtain the 3D road model. As the extraction of roads in urban areas is greatly affected by buildings, shadows, vehicles, and parking lots, we combine high resolution aerial images and dense LiDAR data to fully exploit the precise spectral and horizontal spatial resolution of aerial images and the accurate vertical information provided by airborne LiDAR. Object-oriented image analysis methods are employed to perform feature classification and road detection in aerial images. In this process, we first utilize an adaptive mean shift (MS) segmentation algorithm to segment the original images into meaningful object-oriented clusters. Then the support vector machine (SVM) algorithm is further applied on the MS segmented image to extract road objects. The road surface detected in LiDAR intensity images is taken as a mask to remove the effects of shadows and trees. In addition, the normalized DSM (nDSM) obtained from LiDAR is employed to filter out other above-ground objects, such as buildings and vehicles. The proposed road extraction approaches are tested using rural and urban datasets respectively. The rural road extraction method is performed using pan-sharpened aerial images of the Bruce Highway, Gympie, Queensland. The road extraction algorithm for urban regions is tested using the datasets of Bundaberg, which combine aerial imagery and LiDAR data. Quantitative evaluation of the extracted road information for both datasets has been carried out. The experiments and the evaluation results using the Gympie datasets show that more than 96% of the road surfaces and over 90% of the lane markings are accurately reconstructed, and the false alarm rates for road surfaces and lane markings are below 3% and 2% respectively. For the urban test sites of Bundaberg, more than 93% of the road surface is correctly reconstructed, and the mis-detection rate is below 10%.
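As an illustration of just one stage of this pipeline, the sketch below applies Otsu's thresholding to a synthetic grey-level image with a bright stripe standing in for a lane marking; the image and parameters are invented for the example and the implementation is a generic from-scratch version, not the thesis code.

```python
# Sketch of one pipeline stage: Otsu's method, used above to separate pavement
# markings from the road surface, implemented from scratch on a synthetic image.
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the grey level that maximises between-class variance."""
    hist, edges = np.histogram(image, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0

    w0 = np.cumsum(hist)                        # weight of the "dark" class
    w1 = 1.0 - w0
    cum_mean = np.cumsum(hist * centers)
    mu0 = cum_mean / np.where(w0 > 0, w0, 1)
    mu1 = (cum_mean[-1] - cum_mean) / np.where(w1 > 0, w1, 1)

    between_var = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between_var)]

# Synthetic filtered image: dark road surface with a bright lane-marking stripe.
rng = np.random.default_rng(3)
road = rng.normal(0.25, 0.05, size=(100, 100))
road[:, 48:52] = rng.normal(0.85, 0.05, size=(100, 4))   # the bright marking
image = np.clip(road, 0.0, 1.0)

t = otsu_threshold(image)
markings = image > t
print(f"threshold={t:.2f}, marking pixels={markings.sum()}")
```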