215 results for Computing Classification Systems
Abstract:
One of the main challenges in data analytics is that discovering structures and patterns in complex datasets is a compute-intensive task. Recent advances in high-performance computing provide part of the solution: multicore systems are now more affordable and more accessible. In this paper, we investigate how such systems can be used to develop more advanced methods for data analytics. We focus on two specific areas: model-driven analysis and data mining using optimisation techniques.
Abstract:
As computational models in fields such as medicine and engineering become more refined, their resource requirements increase. Initially, these needs have been met using parallel computing and HPC clusters. However, such systems are often costly and lack flexibility, so HPC users are tempted to move to elastic HPC using cloud services. One difficulty in making this transition is that HPC and cloud systems differ, and performance may vary between them. The purpose of this study is to evaluate cloud services as a means to minimise both cost and computation time for large-scale simulations, and to identify which system properties have the most significant impact on performance. Our simulation results show that, while virtual CPU (vCPU) performance is satisfactory, network throughput may lead to difficulties.
Abstract:
Narrative text is a useful way of identifying injury circumstances from routine emergency department data collections. Automatically classifying narratives with machine learning is a promising approach that can reduce the tedious manual classification process. Existing work focuses on Naive Bayes, which does not always offer the best performance. This paper proposes matrix factorisation approaches, together with a learning enhancement process, for this task. The results are compared with the performance of various other classification approaches, and the impact of parameter settings on the classification results for a medical text dataset is discussed. With the right choice of dimension k, the Non-negative Matrix Factorization (NMF) based method achieves a 10-fold cross-validation accuracy of 0.93.
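A minimal sketch of an NMF-based narrative classification pipeline of this kind, assuming scikit-learn; the toy narratives, the latent dimension k, and the logistic-regression classifier are illustrative placeholders, and the paper's learning enhancement process is not reproduced:

```python
# Sketch: NMF-based narrative classification, assuming scikit-learn.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Tiny toy corpus standing in for real emergency department narratives.
falls = [f"patient fell from {x} at home" for x in ("ladder", "stairs", "bed", "chair", "roof")]
burns = [f"burn injury to hand from {x}" for x in ("stove", "hot oil", "kettle", "iron", "fire")]
narratives = falls * 2 + burns * 2
labels = ["fall"] * 10 + ["burn"] * 10

k = 5                                       # latent dimension; accuracy is sensitive to this choice
pipeline = make_pipeline(
    TfidfVectorizer(),                      # term weighting of the narratives
    NMF(n_components=k, max_iter=500),      # factorise into a k-dimensional latent space
    LogisticRegression(max_iter=1000),      # classify in the latent space
)
scores = cross_val_score(pipeline, narratives, labels, cv=10)   # 10-fold CV, as in the abstract
print(f"mean 10-fold CV accuracy: {scores.mean():.2f}")
```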
Abstract:
Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built from multi-graded relevance data. By treating the tagging data used in recommender systems as an ordinal relevance set {negative, null, positive}, we propose a DCG-based recommendation model. We present a novel and efficient learning-to-rank method that optimises DCG for a recommendation model under this tagging-data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that it is scalable and outperforms the benchmark methods in generating high-quality top-N item recommendation lists.
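For reference, a short sketch of DCG over such an ordinal tag interpretation; the numeric grade mapping below is an assumption for illustration, not taken from the paper:

```python
import math

# Assumed mapping of the ordinal tag set to relevance grades (illustrative only).
GRADE = {"negative": 0, "null": 1, "positive": 2}

def dcg_at_n(ranked_tags, n):
    """Standard DCG with the (2^rel - 1) gain and log2(rank + 1) discount."""
    return sum(
        (2 ** GRADE[tag] - 1) / math.log2(i + 2)   # i is 0-based, so rank = i + 1
        for i, tag in enumerate(ranked_tags[:n])
    )

# DCG@3 of a top-N list whose items the user tagged positive/null/negative/positive.
print(dcg_at_n(["positive", "null", "negative", "positive"], n=3))
```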
Abstract:
In this paper we discuss some preliminary results of an ethnographic study focused on the ways money and financial issues are collaboratively handled within families. Families develop ‘systems’ or methods through which they organize and manage their everyday financial activities. These systems not only organize everyday family finances, but also represent and shape family relationships. Through analysis of our ethnographic field study data, we identify four types of financial systems observed in the field: banking arrangements, physical hubs, goal-oriented systems and spatio-temporal organization. In this paper, we discuss examples of these systems and their implications for designing tools to support household financial practices.
Abstract:
Traditional text classification technology based on machine learning and data mining techniques has made significant progress. However, it remains difficult to draw an exact decision boundary between relevant and irrelevant objects in binary classification, owing to the uncertainty produced by traditional algorithms. The proposed model, CTTC (Centroid Training for Text Classification), builds an uncertainty boundary that absorbs as many indeterminate objects as possible, so as to increase the certainty of the relevant and irrelevant groups through a centroid clustering and training process. The clustering starts from the two training subsets labelled relevant and irrelevant respectively, creating two principal centroid vectors by which all training samples are separated into three groups: POS, NEG and BND, with all indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through a subsequent iterative multi-learning process, and together they predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy are chosen as the key evaluation measures; we stress F1 because it reflects the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), an important standard dataset in the field. The experimental results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
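A minimal sketch of the initial three-way centroid split described above; the cosine similarity, margin threshold, and toy data are assumptions, and the subsequent iterative multi-learning stage is not reproduced:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def three_way_split(X, y, tau=0.05):
    """X: (n_docs, n_terms) tf-idf matrix; y: +1 relevant, -1 irrelevant."""
    c_pos = X[y == 1].mean(axis=0)      # principal centroid of relevant training docs
    c_neg = X[y == -1].mean(axis=0)     # principal centroid of irrelevant training docs
    groups = {"POS": [], "NEG": [], "BND": []}
    for i, x in enumerate(X):
        margin = cosine(x, c_pos) - cosine(x, c_neg)
        if margin > tau:
            groups["POS"].append(i)
        elif margin < -tau:
            groups["NEG"].append(i)
        else:
            groups["BND"].append(i)     # indeterminate: absorbed by the uncertainty boundary
    return groups

X = np.random.rand(8, 5)                         # toy tf-idf matrix
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
print(three_way_split(X, y))
```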
Abstract:
For future planetary robot missions, multi-robot systems can be considered a suitable platform for performing space missions faster and more reliably. In heterogeneous robot teams, each robot can have different abilities and sensor equipment. In this paper we describe a lunar demonstration scenario in which a team of mobile robots explores an unknown area and identifies a set of objects belonging to a lunar infrastructure. Our robot team consists of two exploring scout robots and a mobile manipulator. The mission goal is to locate the objects within a certain area, to identify them, and to transport them to a base station. The robots have different sensor setups and different capabilities. In order to classify parts of the lunar infrastructure, the robots have to share knowledge about the objects. Based on the different sensing capabilities, several information modalities have to be shared and combined by the robots. In this work we propose an approach using spatial features and fuzzy logic based reasoning for distributed object classification.
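A minimal sketch of fuzzy-membership reasoning over shared spatial features; the feature names, triangular membership functions, and class rules are illustrative assumptions, not the paper's rule base:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify(observation):
    """Fuse fuzzy memberships of shared spatial features into per-class scores."""
    height, diameter = observation["height_m"], observation["diameter_m"]
    scores = {
        # An antenna mast is tall and narrow (illustrative rule).
        "mast": min(triangular(height, 1.0, 2.0, 3.0), triangular(diameter, 0.0, 0.1, 0.3)),
        # A supply container is low and wide (illustrative rule).
        "container": min(triangular(height, 0.2, 0.5, 1.0), triangular(diameter, 0.4, 0.8, 1.5)),
    }
    return max(scores, key=scores.get), scores

# Spatial features measured by one robot and shared with the team.
print(classify({"height_m": 1.8, "diameter_m": 0.15}))
```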
Abstract:
Affect is an important feature of multimedia content and conveys valuable information for multimedia indexing and retrieval. Most existing studies of affective content analysis are limited to low-level features or mid-level representations, and are generally criticised for their inability to address the gap between low-level features and high-level human affective perception. The facial expressions of subjects in images carry important semantic information that can substantially influence human affective perception, but they have seldom been investigated for affective classification of facial images in practical applications. This paper presents an automatic image emotion detector (IED) for affective classification of practical (or non-laboratory) data using facial expressions, where many real-world challenges are present, including pose, illumination and size variations. The proposed method is novel, with its framework designed specifically to overcome these challenges using multi-view versions of face and fiducial point detectors and a combination of point-based texture and geometry features. Performance comparisons of several key parameters of the relevant algorithms are conducted to explore the optimum parameters for high accuracy and fast computation. A comprehensive set of experiments with existing and new datasets shows that the method is effective despite pose variations, fast and appropriate for large-scale data, and as accurate as the state-of-the-art method on laboratory-based data. The proposed method was also applied to affective classification of images from the British Broadcasting Corporation (BBC) in a task typical of a practical application, providing some valuable insights.
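A minimal sketch of combining point-based texture and geometry features around fiducial points; the patch size, raw-pixel texture descriptor, and toy coordinates are assumptions, and the multi-view face and fiducial point detectors themselves are not reproduced:

```python
import numpy as np

def texture_features(image, points, half=4):
    """Flatten a small grey-level patch around each fiducial point."""
    patches = [image[y - half:y + half, x - half:x + half].ravel() for (x, y) in points]
    return np.concatenate(patches)

def geometry_features(points):
    """Scale-normalised pairwise distances between fiducial points."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    iu = np.triu_indices(len(pts), k=1)
    return d[iu] / (d[iu].max() + 1e-12)

image = np.random.rand(64, 64)                      # toy grey-level face crop
points = [(20, 25), (44, 25), (32, 40), (32, 52)]   # toy fiducial points (eyes, nose, mouth)
feature_vector = np.concatenate([texture_features(image, points), geometry_features(points)])
print(feature_vector.shape)                         # combined texture + geometry descriptor
```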
Abstract:
Cloud computing has significantly impacted a broad range of industries, but these technologies and services have been absorbed throughout the marketplace unevenly. Some industries have moved aggressively towards cloud computing, while others have moved much more slowly. For the most part, the energy sector has approached cloud computing in a measured and cautious way, with progress often in the form of private cloud solutions rather than public ones, or hybridized information technology systems that combine cloud and existing non-cloud architectures. By moving towards cloud computing in a very slow and tentative way, however, the energy industry may prevent itself from reaping the full benefit that a more complete migration to the public cloud has brought about in several other industries. This short communication is accordingly intended to offer a high-level overview of cloud computing, and to put forward the argument that the energy sector should make a more complete migration to the public cloud in order to unlock the major system-wide efficiencies that cloud computing can provide. It also argues that assets within the energy sector should be designed with as much modularity and flexibility as possible so that they are not locked out of cloud-friendly options in the future.
Abstract:
Crashes at level crossings are a major issue worldwide. In Australia, as in other countries, the number of crashes involving vehicles has declined in recent years, while the number of crashes involving pedestrians appears to have remained unchanged. A systematic review of research on pedestrian behaviour highlighted a number of important scientific gaps in current knowledge. The complexity of such intersections imposes particular constraints on the understanding of pedestrians’ crossing behaviour. A new systems-based framework, the Pedestrian Unsafe Level Crossing (PULC) framework, was developed. The PULC organises factors contributing to crossing behaviour on different system levels, following the hierarchical classification of Jens Rasmussen’s Framework for Risk Management. In addition, the framework adapts James Reason’s classification to distinguish between different types of unsafe behaviour. The framework was developed as a tool for collecting generalizable data that could be used to predict current or future system failures or to identify aspects of the system that require further safety improvement. To provide initial support for the framework, the PULC was applied to the analysis of qualitative data from focus group discussions. A total of 12 pedestrians who regularly crossed the same level crossing were asked about their daily experience and their observations of others’ behaviour, which allowed the extraction and classification of factors associated with errors and violations. Two case studies using Rasmussen’s AcciMap technique are presented as examples of the framework's potential application. A discussion of the identified multiple risk-contributing factors and their interactions is provided, in light of the benefits of applying a systems approach to understanding the origins of individuals’ behaviour. Potential actions towards safety improvement are discussed.
Abstract:
The efficient computation of matrix function vector products has become an important area of research in recent times, driven in particular by two important applications: the numerical solution of fractional partial differential equations and the integration of large systems of ordinary differential equations. In this work we consider a problem that combines these two applications, in the form of a numerical solution algorithm for fractional reaction-diffusion equations that, after spatial discretisation, is advanced in time using the exponential Euler method. We focus on the efficient implementation of the algorithm on Graphics Processing Units (GPUs), as we wish to make use of the increased computational power available with this hardware. We compute the matrix function vector products using the contour integration method of [N. Hale, N. Higham, and L. Trefethen. Computing A^α, log(A), and related matrix functions by contour integrals. SIAM J. Numer. Anal., 46(5):2505–2523, 2008]. Multiple levels of preconditioning are applied to reduce the GPU memory footprint and to further accelerate convergence. We also derive an error bound for the convergence of the contour integral method that allows us to pre-determine the appropriate number of quadrature points. Results are presented that demonstrate the effectiveness of the method for large two-dimensional problems, showing a speedup of more than an order of magnitude compared to a CPU-only implementation.
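In standard notation (a sketch, not the paper's exact formulation or preconditioning), the exponential Euler step for a semi-discretised semilinear system u' = Au + g(u) and the contour-integral quadrature behind the matrix function vector products are:

```latex
\begin{align}
  u_{n+1} &= e^{\tau A}\, u_n + \tau\, \varphi_1(\tau A)\, g(u_n),
  \qquad \varphi_1(z) = \frac{e^{z} - 1}{z}, \\
  f(A)\, b &= \frac{1}{2\pi i} \oint_{\Gamma} f(z)\, (zI - A)^{-1} b \,\mathrm{d}z
  \;\approx\; \sum_{k=1}^{N_q} w_k\, f(z_k)\, (z_k I - A)^{-1} b .
\end{align}
```

Each quadrature point thus contributes one shifted linear solve (z_k I - A)^{-1} b, which is where the preconditioning and GPU acceleration described in the abstract come into play.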
Abstract:
This paper addresses the problem of predicting the outcome of an ongoing case of a business process based on event logs. In this setting, the outcome of a case may refer, for example, to the achievement of a performance objective or the fulfillment of a compliance rule upon completion of the case. Given a log consisting of traces of completed cases, a trace of an ongoing case, and two or more possible outcomes (e.g., a positive and a negative outcome), the paper addresses the problem of determining the most likely outcome for the case in question. Previous approaches to this problem are largely based on simple symbolic sequence classification, meaning that they extract features from traces seen as sequences of event labels, and use these features to construct a classifier for runtime prediction. In doing so, these approaches ignore the data payload associated with each event. This paper approaches the problem from a different angle by treating traces as complex symbolic sequences, that is, sequences of events each carrying a data payload. In this context, the paper outlines different feature encodings of complex symbolic sequences and compares their predictive accuracy on real-life business process event logs.
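A minimal sketch of one possible index-based encoding of such a complex symbolic sequence (event label plus data payload per position); the event attributes, prefix length, and padding scheme are illustrative assumptions rather than the encodings evaluated in the paper:

```python
PREFIX_LEN = 3   # fixed prefix length of the ongoing case (illustrative)

def encode_prefix(trace, prefix_len=PREFIX_LEN):
    """Flatten the first `prefix_len` events, keeping each event's payload."""
    features = {}
    for i in range(prefix_len):
        if i < len(trace):
            event = trace[i]
            features[f"activity_{i}"] = event["activity"]
            features[f"amount_{i}"] = event.get("amount", 0.0)     # payload attribute
            features[f"resource_{i}"] = event.get("resource", "none")
        else:                                                      # pad short prefixes
            features[f"activity_{i}"] = "PAD"
            features[f"amount_{i}"] = 0.0
            features[f"resource_{i}"] = "none"
    return features

trace = [
    {"activity": "register", "amount": 120.0, "resource": "clerk_1"},
    {"activity": "check",    "amount": 120.0, "resource": "clerk_2"},
]
print(encode_prefix(trace))   # one-hot encode the categorical entries before training a classifier
```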
Abstract:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
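A minimal sketch of a simulated annealing loop over a data-to-node assignment for an all-to-all comparison workload; the cost function (cross-node comparison pairs plus a load-imbalance penalty), the cooling schedule, and the problem sizes are illustrative assumptions, not the paper's formulation:

```python
import math
import random

def cost(assign, n_nodes):
    """Cross-node comparison pairs (data movement) plus a load-imbalance penalty."""
    items = sorted(assign)
    remote = sum(1 for i in items for j in items if i < j and assign[i] != assign[j])
    loads = [sum(1 for node in assign.values() if node == n) for n in range(n_nodes)]
    return remote + 5 * (max(loads) - min(loads))

def anneal(n_items=12, n_nodes=3, steps=2000, t0=5.0):
    assign = {i: random.randrange(n_nodes) for i in range(n_items)}
    current = cost(assign, n_nodes)
    best, best_cost = dict(assign), current
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-3            # simple linear cooling schedule
        i = random.randrange(n_items)
        old = assign[i]
        assign[i] = random.randrange(n_nodes)         # propose reassigning one data item
        candidate = cost(assign, n_nodes)
        if candidate < current or random.random() < math.exp((current - candidate) / t):
            current = candidate
            if current < best_cost:
                best, best_cost = dict(assign), current
        else:
            assign[i] = old                           # reject the move
    return best, best_cost

assignment, total_cost = anneal()
print(total_cost, assignment)
```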
Abstract:
This tutorial primarily focuses on the implementation of Information Accountability (IA) protocols, defined in an Information Accountability Framework (IAF), in eHealth systems. Concerns over the security and privacy of patient information are one of the biggest hindrances to sharing health information and to the wide adoption of eHealth systems. At present, healthcare consumers' (i.e. patients') requirements compete with healthcare professionals' (HCPs') requirements: while consumers want control over their information, healthcare professionals want access to as much information as required in order to make well-informed decisions and provide quality care. This conflict is evident in the review of Australia's PCEHR system and in recent studies of patient control of access to their eHealth information. In order to balance these requirements, the use of an Information Accountability Framework devised for eHealth systems has been proposed. Through the use of IA protocols, so-called Accountable-eHealth (AeH) systems create an eHealth environment where health information is available to the right person at the right time without rigid barriers, whilst empowering consumers with information control and transparency. In this half-day tutorial, we will discuss and describe the technical challenges surrounding the implementation of the IAF protocols into existing eHealth systems and demonstrate their use. The functionality of the protocols and AeH systems will be demonstrated, and an example of the implementation of the IAF protocols into an existing eHealth system will be presented and discussed.
Abstract:
RFID is an important technology that can be used to build a ubiquitous society. However, an RFID system uses an open radio-frequency signal to transfer information, which exposes it to many serious privacy and security threats. In general, the computing and storage resources of an RFID tag are very limited, which makes its security and privacy problems difficult to solve, especially for low-cost tags. In order to ensure the security and privacy of low-cost RFID systems, we propose a lightweight authentication protocol based on a hash function. The protocol ensures forward security and prevents information leakage, location tracing, eavesdropping, replay attacks and spoofing. It achieves strong authentication of the reader to the tag by authenticating twice, and it transfers only part of the encrypted tag identifier in each session, so it is difficult for an adversary to intercept a tag’s whole identifier. The protocol is simple, requires little computation and storage, and is therefore well suited to low-cost RFID systems.
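A minimal sketch of a generic hash-based challenge-response with partial identifier disclosure, in the spirit of the abstract; this is not the paper's protocol, and the message layout, key handling, and the 'half identifier per round' rule are illustrative assumptions only:

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    """Hash function shared by tag and reader (SHA-256 here for illustration)."""
    return hashlib.sha256(b"|".join(parts)).digest()

def tag_respond(tag_id, key, reader_nonce, round_no):
    """Tag returns a keyed digest plus only half of its hashed identifier."""
    tag_nonce = secrets.token_bytes(8)
    hid = h(tag_id)                                   # never send the raw identifier
    half = hid[:16] if round_no == 0 else hid[16:]    # partial disclosure per round
    proof = h(key, reader_nonce, tag_nonce, half)
    return tag_nonce, half, proof

def reader_verify(tag_id, key, reader_nonce, tag_nonce, half, proof, round_no):
    hid = h(tag_id)
    expected_half = hid[:16] if round_no == 0 else hid[16:]
    return half == expected_half and proof == h(key, reader_nonce, tag_nonce, half)

key, tag_id = secrets.token_bytes(16), b"TAG-0001"
for r in range(2):                                    # two authentication rounds
    n_r = secrets.token_bytes(8)                      # reader challenge
    n_t, half, proof = tag_respond(tag_id, key, n_r, r)
    print(f"round {r}: verified = {reader_verify(tag_id, key, n_r, n_t, half, proof, r)}")
```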