45 resultados para data gathering algorithm
Resumo:
With mixed feature data, problems are induced in modeling the gating network of normalized Gaussian (NG) networks as the assumption of multivariate Gaussian becomes invalid. In this paper, we propose an independence model to handle mixed feature data within the framework of NG networks. The method is illustrated using a real example of breast cancer data.
Resumo:
An approach and strategy for automatic detection of buildings from aerial images using combined image analysis and interpretation techniques is described in this paper. It is undertaken in several steps. A dense DSM is obtained by stereo image matching and then the results of multi-band classification, the DSM, and Normalized Difference Vegetation Index (NDVI) are used to reveal preliminary building interest areas. From these areas, a shape modeling algorithm has been used to precisely delineate their boundaries. The Dempster-Shafer data fusion technique is then applied to detect buildings from the combination of three data sources by a statistically-based classification. A number of test areas, which include buildings of different sizes, shape, and roof color have been investigated. The tests are encouraging and demonstrate that all processes in this system are important for effective building detection.
Resumo:
Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
Resumo:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical Study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which tender the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression model with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
Resumo:
The paper investigates a Bayesian hierarchical model for the analysis of categorical longitudinal data from a large social survey of immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response and then subsequent terms are introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Resumo:
The estimation of a concentration-dependent diffusion coefficient in a drying process is known as an inverse coefficient problem. The solution is sought wherein the space-average concentration is known as function of time (mass loss monitoring). The problem is stated as the minimization of a functional and gradient-based algorithms are used to solve it. Many numerical and experimental examples that demonstrate the effectiveness of the proposed approach are presented. Thin slab drying was carried out in an isothermal drying chamber built in our laboratory. The diffusion coefficients of fructose obtained with the present method are compared with existing literature results.
Resumo:
This paper investigates the performance of EASI algorithm and the proposed EKENS algorithm for linear and nonlinear mixtures. The proposed EKENS algorithm is based on the modified equivariant algorithm and kernel density estimation. Theory and characteristic of both the algorithms are discussed for blind source separation model. The separation structure of nonlinear mixtures is based on a nonlinear stage followed by a linear stage. Simulations with artificial and natural data demonstrate the feasibility and good performance of the proposed EKENS algorithm.
Resumo:
In many online applications, we need to maintain quantile statistics for a sliding window on a data stream. The sliding windows in natural form are defined as the most recent N data items. In this paper, we study the problem of estimating quantiles over other types of sliding windows. We present a uniform framework to process quantile queries for time constrained and filter based sliding windows. Our algorithm makes one pass on the data stream and maintains an E-approximate summary. It uses O((1)/(epsilon2) log(2) epsilonN) space where N is the number of data items in the window. We extend this framework to further process generalized constrained sliding window queries and proved that our technique is applicable for flexible window settings. Our performance study indicates that the space required in practice is much less than the given theoretical bound and the algorithm supports high speed data streams.
Resumo:
Online multimedia data needs to be encrypted for access control. To be capable of working on mobile devices such as pocket PC and mobile phones, lightweight video encryption algorithms should be proposed. The two major problems in these algorithms are that they are either not fast enough or unable to work on highly compressed data stream. In this paper, we proposed a new lightweight encryption algorithm based on Huffman error diffusion. It is a selective algorithm working on compressed data. By carefully choosing the most significant parts (MSP), high performance is achieved with proper security. Experimental results has proved the algorithm to be fast. secure: and compression-compatible.
Resumo:
Frequent Itemsets mining is well explored for various data types, and its computational complexity is well understood. There are methods to deal effectively with computational problems. This paper shows another approach to further performance enhancements of frequent items sets computation. We have made a series of observations that led us to inventing data pre-processing methods such that the final step of the Partition algorithm, where a combination of all local candidate sets must be processed, is executed on substantially smaller input data. The paper shows results from several experiments that confirmed our general and formally presented observations.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
Resumo:
Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.
Resumo:
In recent years many real time applications need to handle data streams. We consider the distributed environments in which remote data sources keep on collecting data from real world or from other data sources, and continuously push the data to a central stream processor. In these kinds of environments, significant communication is induced by the transmitting of rapid, high-volume and time-varying data streams. At the same time, the computing overhead at the central processor is also incurred. In this paper, we develop a novel filter approach, called DTFilter approach, for evaluating the windowed distinct queries in such a distributed system. DTFilter approach is based on the searching algorithm using a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thus lots of network resources are saved. In addition, theoretical analysis of the time spent in performing the search, and of the amount of memory needed is provided. Extensive experiments also show that DTFilter approach owns high performance.
Resumo:
In this paper, we propose an algorithm for partitioning parameterized orthogonal polygons into rectangles. The algorithm is based on the plane-sweep technique and can be used for partitioning polygons which contain holes. The input to the algorithm consists of the contour of a parameterized polygon to be partitioned and the constraints for those parameters which reside in the contour. The algorithm uses horizontal cuts only and generates a minimum number of rectangles whose union is the original orthogonal polygon. The proposed algorithm can be used as the basis to build corner stitching data structure for parameterized VLSI layouts and has been implemented in Java programming language. Copyright © 2010 ACM, Inc.