927 resultados para Data processing Computer science


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we propose and study low complexity algorithms for on-line estimation of hidden Markov model (HMM) parameters. The estimates approach the true model parameters as the measurement noise approaches zero, but otherwise give improved estimates, albeit with bias. On a nite data set in the high noise case, the bias may not be signi cantly more severe than for a higher complexity asymptotically optimal scheme. Our algorithms require O(N3) calculations per time instant, where N is the number of states. Previous algorithms based on earlier hidden Markov model signal processing methods, including the expectation-maximumisation (EM) algorithm require O(N4) calculations per time instant.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Health Information Exchange (HIE) is an interesting phenomenon. It is a patient centric health and/or medical information management scenario enhanced by integration of Information and Communication Technologies (ICT). While health information systems are repositioning complex system directives, in the wake of the ‘big data’ paradigm, extracting quality information is challenging. It is anticipated that in this talk, ICT enabled healthcare scenarios with big data analytics will be shared. In addition, research and development regarding big data analytics, such as current trends of using these technologies for health care services and critical research challenges when extracting quality of information to improve quality of life will be discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Double-pulse tests are commonly used as a method for assessing the switching performance of power semiconductor switches in a clamped inductive switching application. Data generated from these tests are typically in the form of sampled waveform data captured using an oscilloscope. In cases where it is of interest to explore a multi-dimensional parameter space and corresponding result space it is necessary to reduce the data into key performance metrics via feature extraction. This paper presents techniques for the extraction of switching performance metrics from sampled double-pulse waveform data. The reported techniques are applied to experimental data from characterisation of a cascode gate drive circuit applied to power MOSFETs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a design framework intended to conceptually map the influence that game design has on the creative activity people engage in during gameplay. The framework builds on behavioral and verbal analysis of people playing puzzle games. The analysis was designed to better understand the extent to which gameplay activities within different games facilitate creative problem solving. We have used an expert review process to evaluate these games in terms of their game design elements and have taken a cognitive action approach to this process to investigate how particular elements produce the potential for creative activity. This paper proposes guidelines that build upon our understanding of the relationship between the creative processes that players undertake during a game and the components of the game that allow these processes to occur. These guidelines may be used in the game design process to better facilitate creative gameplay activity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the past few years, there has been a steady increase in the attention, importance and focus of green initiatives related to data centers. While various energy aware measures have been developed for data centers, the requirement of improving the performance efficiency of application assignment at the same time has yet to be fulfilled. For instance, many energy aware measures applied to data centers maintain a trade-off between energy consumption and Quality of Service (QoS). To address this problem, this paper presents a novel concept of profiling to facilitate offline optimization for a deterministic application assignment to virtual machines. Then, a profile-based model is established for obtaining near-optimal allocations of applications to virtual machines with consideration of three major objectives: energy cost, CPU utilization efficiency and application completion time. From this model, a profile-based and scalable matching algorithm is developed to solve the profile-based model. The assignment efficiency of our algorithm is then compared with that of the Hungarian algorithm, which does not scale well though giving the optimal solution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research is a step forward in improving the accuracy of detecting anomaly in a data graph representing connectivity between people in an online social network. The proposed hybrid methods are based on fuzzy machine learning techniques utilising different types of structural input features. The methods are presented within a multi-layered framework which provides the full requirements needed for finding anomalies in data graphs generated from online social networks, including data modelling and analysis, labelling, and evaluation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Distributed computation and storage have been widely used for processing of big data sets. For many big data problems, with the size of data growing rapidly, the distribution of computing tasks and related data can affect the performance of the computing system greatly. In this paper, a distributed computing framework is presented for high performance computing of All-to-All Comparison Problems. A data distribution strategy is embedded in the framework for reduced storage space and balanced computing load. Experiments are conducted to demonstrate the effectiveness of the developed approach. They have shown that about 88% of the ideal performance capacity have be achieved in multiple machines through using the approach presented in this paper.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. An extensive pre-processing is carried out to ensure the quality of data and, in hence, the quality classification outcome. The records with a null entry in injury description are removed. The misspelling correction process is carried out by finding and replacing the misspelt word with a soundlike word. Meaningful phrases have been identified and kept, instead of removing the part of phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of abbreviations is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset, under consideration, is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, Matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature space have been built. In experiments, a set of tests are conducted to reflect which classification method is best for the medical text classification. The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting which works well for long text classification is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as it affects the classification performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we summarize our recent work in analyz- ing and predicting behaviors in sports using spatiotemporal data. We specifically focus on two recent works: 1) Predicting the location of shot in tennis using Hawk-Eye tennis data, and 2) Clustering spatiotemporal plays in soccer to discover the methods in which they get a shot on goal from a professional league.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a single pass algorithm for mining discriminative Itemsets in data streams using a novel data structure and the tilted-time window model. Discriminative Itemsets are defined as Itemsets that are frequent in one data stream and their frequency in that stream is much higher than the rest of the streams in the dataset. In order to deal with the data structure size, we propose a pruning process that results in the compact tree structure containing discriminative Itemsets. Empirical analysis shows the sound time and space complexity of the proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rapid recursive estimation of hidden Markov Model (HMM) parameters is important in applications that place an emphasis on the early availability of reasonable estimates (e.g. for change detection) rather than the provision of longer-term asymptotic properties (such as convergence, convergence rate, and consistency). In the context of vision- based aircraft (image-plane) heading estimation, this paper suggests and evaluates the short-data estimation properties of 3 recursive HMM parameter estimation techniques (a recursive maximum likelihood estimator, an online EM HMM estimator, and a relative entropy based estimator). On both simulated and real data, our studies illustrate the feasibility of rapid recursive heading estimation, but also demonstrate the need for careful step-size design of HMM recursive estimation techniques when these techniques are intended for use in applications where short-data behaviour is paramount.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Heterogeneous health data is a critical issue when managing health information for quality decision making processes. In this paper we examine the efficient aggregation of lifestyle information through a data warehousing architecture lens. We present a proof of concept for a clinical data warehouse architecture that enables evidence based decision making processes by integrating and organising disparate data silos in support of healthcare services improvement paradigms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree structure guided scheme based enumeration approach is defined using BOCF for systematically enumerating the valid subtrees only. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. The extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying product families has been considered as an effective way to accommodate the increasing product varieties across the diverse market niches. In this paper, we propose a novel framework to identifying product families by using a similarity measure for a common product design data BOM (Bill of Materials) based on data mining techniques such as frequent mining and clus-tering. For calculating the similarity between BOMs, a novel Extended Augmented Adjacency Matrix (EAAM) representation is introduced that consists of information not only of the content and topology but also of the fre-quent structural dependency among the various parts of a product design. These EAAM representations of BOMs are compared to calculate the similarity between products and used as a clustering input to group the product fami-lies. When applied on a real-life manufacturing data, the proposed framework outperforms a current baseline that uses orthogonal Procrustes for grouping product families.