873 results for constrained clustering
Abstract:
In this paper, we study the problem of wireless sensor network design by deploying a minimum number of additional relay nodes (to minimize network design cost) at a subset of given potential relay locations in order to convey the data from already existing sensor nodes (hereafter called source nodes) to a Base Station within a specified mean delay bound. We formulate this problem in two different ways and show that it is NP-hard. For a problem in which the number of existing sensor nodes and potential relay locations is n, we propose an O(n) approximation algorithm of polynomial time complexity. Results on various randomly generated network scenarios show that the algorithm performs efficiently: in over 90% of the tested scenarios, its solutions were either optimal or exceeded the optimum by just one relay.
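The abstract does not spell out the approximation algorithm. The sketch below illustrates the problem setting only. It makes several assumptions: nodes are points in the plane, two nodes can communicate when they are within a radio range `rng`, and hop count stands in as a hypothetical proxy for the mean-delay bound. Under these assumptions, it takes the union of relays on shortest BFS paths from the base station and then greedily prunes relays that are not needed. This is not the authors' method, and all names are illustrative.

```python
import math
from collections import deque


def bfs_tree(pos, active, bs, rng):
    """BFS from the base station over active nodes that are within radio range.

    Returns (hop distance, parent) maps. Hop count is a hypothetical stand-in
    for the mean-delay bound used in the paper.
    """
    dist, parent = {bs: 0}, {bs: None}
    queue = deque([bs])
    while queue:
        u = queue.popleft()
        for v in sorted(active):  # sorted for deterministic tie-breaking
            if v not in dist and math.dist(pos[u], pos[v]) <= rng:
                dist[v], parent[v] = dist[u] + 1, u
                queue.append(v)
    return dist, parent


def feasible(pos, sources, relays, bs, rng, max_hops):
    """True if every source reaches the base station within max_hops."""
    dist, _ = bfs_tree(pos, set(sources) | set(relays) | {bs}, bs, rng)
    return all(s in dist and dist[s] <= max_hops for s in sources)


def place_relays(pos, sources, candidates, bs, rng, max_hops):
    """Shortest-path-union heuristic followed by greedy pruning."""
    dist, parent = bfs_tree(pos, set(sources) | set(candidates) | {bs}, bs, rng)
    if any(s not in dist or dist[s] > max_hops for s in sources):
        return None  # infeasible even with every candidate relay deployed
    chosen = set()
    for s in sources:
        node = parent[s]
        while node is not None:  # walk the shortest path back to the base station
            if node in candidates:
                chosen.add(node)
            node = parent[node]
    # Drop any relay whose removal keeps every source within the hop bound.
    for r in sorted(chosen):
        if feasible(pos, sources, chosen - {r}, bs, rng, max_hops):
            chosen.discard(r)
    return sorted(chosen)
```

The pruning pass only ever removes relays while feasibility is maintained. The result is therefore always feasible, but it is not guaranteed to be minimum, which matches the NP-hardness noted above.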
Abstract:
A wireless Energy Harvesting Sensor (EHS) needs to send data packets arriving in its queue over a fading channel at the maximum possible throughput while ensuring acceptable packet delays. At the same time, it needs to ensure energy neutrality, i.e., the average energy drawn from the battery should equal the amount of energy deposited in it minus the energy lost due to battery inefficiency. In this work, a framework is developed under which a system designer can optimize the performance of the EHS node using power control based on the current channel state information, when the EHS node employs a single modulation and coding scheme and the channel is Rayleigh fading. Optimal system parameters are derived for throughput-optimal, delay-optimal and delay-constrained throughput-optimal policies that ensure energy neutrality. It is seen that a throughput-optimal (maximal) policy has unbounded packet delay, whereas an average-delay-optimal (minimal) policy achieves negligibly small throughput. Finally, the influence of the harvested energy profile on the performance of the EHS is illustrated through the example of solar energy harvesting.
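The sketch below shows how energy neutrality can constrain a channel-dependent power control policy. It uses a textbook truncated channel-inversion policy, not the optimal policies derived in the paper, and makes several assumptions. Under Rayleigh fading, the channel power gain is modelled as g ~ Exp(1). With a single modulation and coding scheme, the node transmits with power `rho / g` only when g ≥ g0, so every transmission reaches the same received SNR. The parameter `rho`, which is the required received SNR normalised by noise power, is hypothetical. The average transmit power is then `rho * E1(g0)`, where E1 is the exponential integral. Energy neutrality with battery efficiency `eta` (a hypothetical parameter) caps this average at `eta` times the mean harvest rate.

```python
import math

EULER_GAMMA = 0.5772156649015329


def e1(x):
    """Exponential integral E1(x) via its power series (adequate for 0 < x <= ~10)."""
    total, term, k = 0.0, 1.0, 0
    while True:
        k += 1
        term *= x / k                     # term = x**k / k!
        contribution = term / k
        total += contribution if k % 2 == 1 else -contribution
        if contribution < 1e-17 and k > x:
            break
    return -EULER_GAMMA - math.log(x) + total


def solve_threshold(avg_power_budget, rho, lo=1e-9, hi=10.0, iters=200):
    """Bisection for the cutoff g0 at which rho * E1(g0) meets the power budget.

    E1 is decreasing, so spending above budget means the cutoff must rise.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rho * e1(mid) > avg_power_budget:
            lo = mid
        else:
            hi = mid
    return hi


def energy_neutral_policy(harvest_rate, eta, rho, rate_bits):
    """Average draw = eta * average harvest (the fraction 1 - eta is lost in the battery)."""
    budget = eta * harvest_rate
    g0 = solve_threshold(budget, rho)
    throughput = rate_bits * math.exp(-g0)  # single MCS: transmit only when g >= g0
    return g0, throughput
```

A larger harvest rate lowers the cutoff g0, so the node transmits in more fading states and its throughput rises. This illustrates the harvested-energy dependence mentioned in the abstract.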
Abstract:
Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. It is difficult to cluster high-dimensional data objects when they are sparse and skewed. Updates are quite common in dynamic databases, and they are usually processed in batch mode. In very large dynamic databases, it is necessary to perform cluster analysis incrementally, only on the updates. We present an incremental clustering algorithm for subspace clustering in very high dimensions, which handles both insertions and deletions of data points to the back-end databases.
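The abstract does not describe the algorithm's internals. The sketch below shows, under simplifying assumptions, why incremental maintenance is cheap for density-based subspace clustering. It uses axis-aligned subspaces only, rather than the arbitrarily aligned ones in the abstract, together with a fixed grid and a density threshold. Each insertion or deletion touches only the grid cells of the changed point, so no re-clustering of the whole database is needed. The class and its parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalGridDensity:
    """Hypothetical grid-cell counts over all axis-aligned k-dimensional subspaces.

    Insertions increment, and deletions decrement, only the cells that the
    updated point falls into, so updates never rescan the whole database.
    """

    def __init__(self, n_dims, k, cell_width, min_count):
        self.subspaces = list(combinations(range(n_dims), k))
        self.cell_width = cell_width
        self.min_count = min_count
        self.counts = Counter()

    def _cells(self, point):
        for dims in self.subspaces:
            cell = tuple(int(point[d] // self.cell_width) for d in dims)
            yield (dims, cell)

    def insert(self, point):
        for key in self._cells(point):
            self.counts[key] += 1

    def delete(self, point):
        for key in self._cells(point):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]

    def dense_cells(self):
        """(subspace, cell) pairs whose count meets the density threshold."""
        return sorted(key for key, c in self.counts.items() if c >= self.min_count)
```

Dense cells in the same subspace could then be joined into clusters. That step is omitted here.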
Abstract:
Delineation of homogeneous precipitation regions (regionalization) is necessary for investigating the frequency and spatial distribution of meteorological droughts. The conventional methods of regionalization use statistics of precipitation as attributes to establish homogeneous regions. Therefore, they cannot be used to form regions in ungauged areas, and they may not be useful for forming meaningful regions in areas with sparse rain gauge density. Further, validation of the regions for homogeneity in precipitation is not possible, since it is not appropriate to use the precipitation statistics both to form the regions and subsequently to test their homogeneity. To alleviate this problem, an approach based on fuzzy cluster analysis is presented. It allows delineation of homogeneous precipitation regions in data-sparse areas using, as attributes, large-scale atmospheric variables (LSAV) that influence precipitation in the study area. The LSAV, location parameters (latitude, longitude and altitude) and seasonality of precipitation are suggested as features for regionalization. The approach allows independent validation of the identified regions for homogeneity using statistics computed from the observed precipitation. Further, it can form regions even in ungauged areas, owing to the use of attributes that can be reliably estimated even when no at-site precipitation data are available. The approach was applied to delineate homogeneous annual rainfall regions in India, and its effectiveness is illustrated by comparing the results with those obtained using rainfall statistics, regionalization based on hard cluster analysis, and meteorological sub-divisions in India. (C) 2011 Elsevier B.V. All rights reserved.
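The abstract names fuzzy cluster analysis but not the specific algorithm. Fuzzy c-means is the most common choice, so the sketch below assumes it. Each site would be a feature vector, for example standardised LSAV values, latitude, longitude, altitude and seasonality. The site's memberships give its degree of belonging to each candidate region. The deterministic initialisation and the fuzzifier m = 2 are illustrative assumptions.

```python
import math


def fuzzy_c_means(points, n_clusters, m=2.0, iters=100, eps=1e-9):
    """Standard fuzzy c-means: returns (centers, memberships).

    memberships[i][j] is the degree to which site i belongs to region j.
    Centers are initialised deterministically from evenly spaced points.
    """
    n, dim = len(points), len(points[0])
    step = max(1, n // n_clusters)
    centers = [list(points[j * step]) for j in range(n_clusters)]
    u = [[0.0] * n_clusters for _ in range(n)]
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        for i, x in enumerate(points):
            dists = [max(math.dist(x, c), eps) for c in centers]
            for j in range(n_clusters):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** power for dk in dists)
        # Center update: mean of all points weighted by u_ij^m.
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers[j] = [sum(w[i] * points[i][a] for i in range(n)) / total
                          for a in range(dim)]
    return centers, u
```

A hard regionalization assigns each site to the single region with the largest membership. The fuzzy memberships additionally expose sites that lie near region boundaries.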
Abstract:
MATLAB is an array language that was initially popular for rapid prototyping but is now increasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control-flow-dominated scalar regions that affect the program's execution time. Today's computer systems have tremendous computing power in the form of traditional CPU cores and throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control-flow-dominated regions to the CPU and the data-parallel regions to the GPU can significantly improve program performance. In this paper, we present the design and implementation of MEGHA, a compiler that automatically compiles MATLAB programs to enable synergistic execution on heterogeneous processors. Our solution is fully automated and does not require programmer input for identifying data-parallel regions. We propose a set of compiler optimizations tailored for MATLAB. Our compiler identifies data-parallel regions of the program and composes them into kernels. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or the GPU so that kernel execution on the CPU and the GPU happens synergistically and the amount of data transfer needed is minimized. To ensure the required data movement for dependencies across basic blocks, we propose a data flow analysis and edge splitting strategy. Thus, our compiler automatically handles composition of kernels, mapping of kernels to the CPU and GPU, scheduling, and insertion of required data transfers. The proposed compiler was implemented, and experimental evaluation using a set of MATLAB benchmarks shows that our approach achieves a geometric mean speedup of 19.8X for data-parallel benchmarks over native execution of MATLAB.
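To make "combining statements into kernels as constrained graph clustering" concrete, the sketch below merges statements along data-dependence edges with a union-find structure. A merge is allowed only when both endpoints satisfy a constraint. Here the constraint is assumed to be that both statements are data parallel and share the same iteration-space shape. The statement descriptors and the constraint are hypothetical, and MEGHA's actual formulation is not given in the abstract. The sketch also ignores the acyclicity requirement that a real compiler must enforce when fusing statements.

```python
def compose_kernels(statements, edges):
    """Greedy constrained clustering of statements into kernels (hypothetical).

    statements: dict name -> (is_data_parallel, shape)
    edges: list of (producer, consumer) data-dependence pairs
    Two statements are fused only if both are data parallel and share a shape.
    Ignores the acyclicity constraint a real compiler must also enforce.
    """
    parent = {s: s for s in statements}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps trees shallow
            s = parent[s]
        return s

    for a, b in edges:
        par_a, shape_a = statements[a]
        par_b, shape_b = statements[b]
        if par_a and par_b and shape_a == shape_b:
            parent[find(a)] = find(b)

    groups = {}
    for s in statements:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())
```

Each resulting group is a candidate kernel. A separate heuristic would then decide whether each group runs on the CPU or the GPU.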
Abstract:
Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20K samples per second with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. Because the volume of data is huge, the data processing for neural analysis is highly compute-intensive. The major processing requirements are noise removal, pattern recovery, pattern matching, clustering and so on. To interface a neuronal colony with the physical world, these computations need to be performed in real time. A single processor such as a desktop computer may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world, while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing system. With 8 processors, the system is able to compute and map incoming signals, segmented over a period of 200 ms, into an action in a trained cluster system in real time.
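The data-volume claim can be checked with a little arithmetic. The sketch assumes that 20K samples per second are taken on each of the 64 electrodes and stored as 2 bytes (16 bits) per sample. The even split of 8 electrodes per processor is a hypothetical partitioning used only for illustration. Integer milliseconds keep the arithmetic exact.

```python
def raw_bytes(duration_ms, channels=64, sample_rate_hz=20_000, bits_per_sample=16):
    """Raw bytes produced by the electrode array over duration_ms milliseconds."""
    samples_per_channel = sample_rate_hz * duration_ms // 1000
    return channels * samples_per_channel * (bits_per_sample // 8)


per_minute = raw_bytes(60_000)          # 64 * 1,200,000 * 2 bytes
per_segment = raw_bytes(200)            # one 200 ms segment
per_processor = per_segment // 8        # hypothetical even split over 8 processors
```

This gives 153.6 MB per minute, so a few minutes of recording does reach a few hundred megabytes. Each 200 ms segment is 512,000 bytes, or 64,000 bytes per processor under the even-split assumption.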
Abstract:
Technology scaling has caused Negative Bias Temperature Instability (NBTI) to emerge as a major circuit reliability concern. Simultaneously, leakage power is becoming a greater fraction of the total power dissipated by logic circuits. As both NBTI and leakage power are highly dependent on the vectors applied at the circuit’s inputs, they can be minimized by applying carefully chosen input vectors during periods when the circuit is in standby or idle mode. Unfortunately, input vectors that minimize leakage power are not the ones that minimize NBTI degradation, so there is a need for a methodology to generate input vectors that minimize both of these variables. This paper proposes such a systematic methodology for generating input vectors that minimize leakage power under the constraint that NBTI degradation does not exceed a specified limit. These input vectors can be applied at the primary inputs of a circuit when it is in standby/idle mode; they are chosen so that the gates dissipate only a small amount of leakage power and a large majority of the transistors on critical paths are in the “recovery” phase of NBTI degradation. The advantage of this methodology is that allowing circuit designers to constrain NBTI degradation to below a specified limit enables tighter guardbanding, increasing performance. Our methodology guarantees that the generated input vector dissipates the least leakage power among all the input vectors that satisfy the degradation constraint. We formulate the problem as a zero-one integer linear program and show that this formulation produces input vectors whose leakage power is within 1% of a minimum leakage vector selected by a search algorithm, while simultaneously reducing NBTI by about 5.75% of maximum circuit delay compared to the worst-case NBTI degradation. Our paper also proposes two new algorithms for identifying the circuit paths that are most affected by NBTI degradation.
The number of such paths identified by our algorithms is an order of magnitude smaller than the number identified by previously proposed heuristics.
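The optimisation target, minimum leakage subject to an NBTI limit, can be illustrated on a toy circuit. For a handful of inputs, exhaustive enumeration finds the same constrained optimum that the paper's zero-one ILP is guaranteed to find. Enumeration is exponential in the number of inputs, however, which is why an ILP is used in practice. The two-gate NAND circuit and the per-pattern leakage values are hypothetical numbers chosen for illustration. The stress model is a simplified assumption: each 0 at a critical-path gate input puts that input's PMOS transistor under NBTI stress, while a 1 lets it recover.

```python
from itertools import product

# Hypothetical 2-input NAND leakage (nA) per input pattern; illustrative only.
NAND_LEAKAGE = {(0, 0): 10.0, (0, 1): 25.0, (1, 0): 20.0, (1, 1): 40.0}

# Hypothetical circuit: (gate name, input 1, input 2); "a", "b", "c" are primary inputs.
CIRCUIT = [("g1", "a", "b"), ("g2", "g1", "c")]
CRITICAL_GATES = {"g1", "g2"}  # gates assumed to lie on critical paths


def evaluate(vector):
    """Return (total leakage, number of stressed critical-path PMOS inputs)."""
    values = dict(zip("abc", vector))
    leakage, stressed = 0.0, 0
    for name, x, y in CIRCUIT:
        pattern = (values[x], values[y])
        leakage += NAND_LEAKAGE[pattern]
        if name in CRITICAL_GATES:
            stressed += pattern.count(0)  # a 0 input keeps its PMOS under stress
        values[name] = 1 - (pattern[0] & pattern[1])  # NAND output
    return leakage, stressed


def best_vector(max_stressed):
    """Minimum-leakage input vector among those meeting the stress limit."""
    feasible = []
    for vector in product((0, 1), repeat=3):
        leakage, stressed = evaluate(vector)
        if stressed <= max_stressed:
            feasible.append((leakage, vector))
    return min(feasible) if feasible else None
```

With these made-up numbers, the unconstrained minimum-leakage vector (0, 0, 0) also has the most stressed inputs. Tightening the stress limit forces a higher-leakage vector, which mirrors the trade-off described in the abstract.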
Abstract:
Advertisements (ads) are the main revenue earner for television (TV) broadcasters. As TV reaches a large audience, it acts as the best medium for advertising products and services. With the emergence of digital TV, it is important for broadcasters to provide an intelligent service according to various dimensions such as program features, ad features, viewers’ interests and sponsors’ preferences. We present an automatic ad recommendation algorithm that selects a set of ads by considering these dimensions and semantically matches them with programs. Features of the ad video are captured in terms of annotations, and they are grouped into a number of predefined semantic categories using a categorization technique. A fuzzy categorical data clustering technique is applied to the categorized data to select better-suited ads for a particular program. Since the same ad can be recommended for more than one program depending on multiple parameters, fuzzy clustering is the best-suited method for ad recommendation. The relative fuzzy score, called the “degree of membership”, calculated for each ad indicates the membership of that ad in different program clusters. Subjective evaluation of the algorithm was performed by 10 different people, who rated it with a high success score.
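The abstract does not name its fuzzy categorical clustering method. The sketch below assumes a fuzzy k-modes style membership to show how a "degree of membership" could be computed for one ad against several program clusters. Each cluster is represented by a categorical mode, dissimilarity is simple matching (a count of mismatched attributes), and the fuzzifier m = 2 is used. The attribute values are hypothetical.

```python
def hamming(a, b):
    """Simple matching dissimilarity between two categorical vectors."""
    return sum(x != y for x, y in zip(a, b))


def ad_memberships(ad, cluster_modes, m=2.0):
    """Fuzzy k-modes style degree of membership of one ad in each program cluster."""
    dists = [hamming(ad, mode) for mode in cluster_modes]
    zeros = dists.count(0)
    if zeros:  # the ad equals one or more cluster modes: share membership among them
        return [1.0 / zeros if d == 0 else 0.0 for d in dists]
    power = 1.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
```

For an ad described as ("sports", "male", "afternoon"), the first mode ("sports", "male", "evening") differs in one attribute and the second mode ("cooking", "female", "afternoon") differs in two. With m = 2 this yields memberships of 2/3 and 1/3, so the ad can be recommended to both programs with different strengths.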
Abstract:
In this paper, we study how TCP and UDP flows interact with each other when the end system is a CPU-resource-constrained thin client. The problem addressed is twofold: (1) the throughput of TCP flows degrades severely in the presence of heavily loaded UDP flows, and (2) fairness and the minimum QoS requirements of UDP are not maintained. First, we identify the factors affecting TCP throughput by providing an in-depth analysis of end-to-end delay and packet loss variations. The results obtained from this first part lead to our second contribution: we propose and study the use of an algorithm that ensures fairness across flows. The algorithm improves the performance of TCP flows in the presence of multiple UDP flows admitted under an admission algorithm, and it maintains the minimum QoS requirements of the UDP flows. The advantage of the algorithm is that it requires no changes to the TCP/IP stack; control is achieved through receiver window control.
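The abstract says control is exercised through the receiver window but does not give the allocation rule. The sketch below is a hypothetical version of the idea. Admitted UDP flows first receive their minimum rates. The remaining capacity is split equally among TCP flows. Each TCP flow's advertised window is then capped at its share multiplied by the RTT (the bandwidth-delay product), rounded down to whole segments. The equal split, the 1460-byte MSS and the one-segment floor are all assumptions.

```python
def receiver_windows(capacity_bps, rtt_ms, udp_min_bps, n_tcp, mss_bytes=1460):
    """Hypothetical fair-share allocation enforced through TCP receiver windows.

    Returns (per-TCP-flow rate in bit/s, advertised window in bytes).
    """
    reserved = sum(udp_min_bps)
    if reserved > capacity_bps:
        raise ValueError("admitted UDP minimum rates exceed capacity")
    share_bps = (capacity_bps - reserved) // n_tcp if n_tcp else 0
    # Bandwidth-delay product in bytes: rate (bit/s) * RTT (ms) / (8 bits * 1000 ms).
    window_bytes = share_bps * rtt_ms // 8000
    segments = max(1, window_bytes // mss_bytes)  # never advertise less than one MSS
    return share_bps, segments * mss_bytes
```

On a 10 Mbit/s path with two UDP flows guaranteed 2 Mbit/s each and two TCP flows at a 50 ms RTT, each TCP flow gets 3 Mbit/s. Its window is then capped at 12 segments (17,520 bytes).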
Abstract:
Support Vector Clustering has gained reasonable attention from researchers in exploratory data analysis due to its firm theoretical foundation in statistical learning theory. The hard partitioning of the data set achieved by Support Vector Clustering may not be acceptable in real-world scenarios. Rough Support Vector Clustering is an extension of Support Vector Clustering that attains a soft partitioning of the data set. However, the Quadratic Programming Problem involved in Rough Support Vector Clustering makes it computationally expensive for large datasets. In this paper, we propose the Rough Core Vector Clustering algorithm, a computationally efficient realization of Rough Support Vector Clustering. Here, the Rough Support Vector Clustering problem is formulated as an approximate Minimum Enclosing Ball problem and is solved using an approximate Minimum Enclosing Ball finding algorithm. Experiments with several large multi-class datasets, such as Forest Cover Type, and other multi-class datasets taken from the LIBSVM page show that the proposed strategy is efficient and finds meaningful soft cluster abstractions that provide better generalization performance than the SVM classifier.
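As an illustration of an "approximate Minimum Enclosing Ball finding algorithm", the sketch below implements the well-known Badoiu-Clarkson iteration in plain Euclidean space. The center starts at an arbitrary point and, at step i, moves toward the current farthest point by a fraction 1/(i + 1). After ceil(1/eps²) steps, the radius is within a factor (1 + eps) of optimal. Core vector methods apply a kernelised variant of this idea in feature space, and the rough clustering formulation in the paper adds further structure. This is therefore a simplified stand-in, not the authors' solver.

```python
import math


def approx_meb(points, eps):
    """(1 + eps)-approximate minimum enclosing ball via the Badoiu-Clarkson iteration.

    Returns (center, radius) with radius <= (1 + eps) * optimal radius.
    """
    center = list(points[0])
    for i in range(1, math.ceil(1.0 / eps ** 2) + 1):
        far = max(points, key=lambda p: math.dist(p, center))  # current farthest point
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

The number of iterations depends only on eps, not on the number of points. This independence from data size is what makes MEB-based formulations attractive for large datasets.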
Abstract:
Applications in various domains often lead to very large and frequently high-dimensional data. Successful algorithms must avoid the curse of dimensionality while remaining computationally efficient. Finding useful patterns in large datasets has attracted considerable interest recently. The primary goal of this paper is to implement an efficient hybrid tree-based clustering method built on the CF-Tree and KD-Tree, and to combine the clustering method with KNN classification. The implementation must balance several concerns, such as good accuracy and low space and time requirements. We will evaluate the time and space efficiency, sensitivity to data input order, and clustering quality through several experiments.
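The CF-Tree half of the hybrid method rests on the BIRCH clustering feature CF = (N, LS, SS). Here N is the number of points in a subcluster, LS is their linear sum and SS is the sum of their squared norms. The sketch below covers only this building block. The tree structure, the KD-Tree and the KNN classification step are omitted. Its key property is additivity: two subclusters merge by adding their CFs, so centroids and radii are available without revisiting raw points.

```python
import math


class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) summarising one subcluster."""

    def __init__(self, dim):
        self.n, self.ls, self.ss = 0, [0.0] * dim, 0.0

    def add(self, x):
        """Absorb a single point into the summary."""
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        """CF additivity: the CF of a union of subclusters is the sum of their CFs."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        """Root-mean-square distance of the members from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))
```

Because each CF occupies constant space regardless of how many points it summarises, a CF-Tree can hold a compact summary of a very large dataset in memory.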
Abstract:
Clustering techniques are used in regional flood frequency analysis (RFFA) to partition watersheds into natural groups or regions with similar hydrologic responses. Kohonen's linear self‐organizing feature map (SOFM) has been applied as a clustering technique for RFFA in several recent studies. However, it is seldom possible to interpret clusters from the output of an SOFM, irrespective of its size and dimensionality. In this study, we demonstrate that SOFMs may nevertheless serve as a useful precursor to clustering algorithms. We present a two‐level, SOFM‐based clustering approach to form regions for FFA. In the first level, the SOFM is used to form a two‐dimensional feature map. In the second level, the output nodes of the SOFM are clustered using the fuzzy c‐means algorithm to form regions. The optimal number of regions is determined using fuzzy cluster validation measures. The effectiveness of the proposed approach in forming homogeneous regions for FFA is illustrated through application to data from watersheds in Indiana, USA. Results show that the proposed approach forms better regions than the classical SOFM.
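The abstract says the number of regions is chosen with fuzzy cluster validation measures but does not list which ones. The sketch below assumes two classic measures: Bezdek's partition coefficient (PC) and partition entropy (PE), both computed from a membership matrix such as the one produced by fuzzy c-means. PC equals 1 and PE equals 0 for a crisp partition. For a maximally fuzzy partition into c regions, PC equals 1/c and PE equals ln c. The selection helper, which simply takes the highest PC, is a hypothetical simplification.

```python
import math


def partition_coefficient(u):
    """Bezdek's partition coefficient: mean of squared memberships (higher is crisper)."""
    return sum(v * v for row in u for v in row) / len(u)


def partition_entropy(u):
    """Partition entropy: mean membership entropy (lower is crisper)."""
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / len(u)


def best_region_count(memberships_by_c):
    """Hypothetical rule: choose the number of regions whose partition has the highest PC."""
    return max(memberships_by_c, key=lambda c: partition_coefficient(memberships_by_c[c]))
```

In practice, such indices are usually inspected together across a range of c values rather than trusted individually, because PC tends to favour small numbers of clusters.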
Abstract:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large-scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well to large datasets compared to the state-of-the-art classifier, Support Vector Machines (SVMs). Experiments on real-world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in training time and achieves similar accuracies.
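The per-cluster constraint can be written out explicitly. Suppose a cluster has mean mu and spherical covariance sigma² I, and a point x is drawn from it. The decision value Z = y(w·x + b) then has mean y(w·mu + b) and standard deviation sigma·‖w‖. The one-sided Chebyshev-Cantelli inequality shows that requiring y(w·mu + b) ≥ 1 + kappa·sigma·‖w‖, with kappa = sqrt(eta / (1 − eta)), implies P(Z < 0) ≤ 1/(1 + kappa²) = 1 − eta. In other words, the point is classified correctly with probability at least eta. The sketch below simply checks this constraint for one cluster. It assumes this standard form, and the paper's exact formulation may differ in detail.

```python
import math


def cantelli_kappa(eta):
    """kappa = sqrt(eta / (1 - eta)) from the one-sided Chebyshev-Cantelli bound."""
    return math.sqrt(eta / (1.0 - eta))


def cluster_constraint_satisfied(w, b, mu, sigma, y, eta):
    """Check y * (w.mu + b) >= 1 + kappa * sigma * ||w|| for one cluster.

    If it holds, a point drawn from a cluster with mean mu and covariance
    sigma^2 I lies on the correct side of the hyperplane with probability >= eta.
    """
    margin = y * (sum(wi * mi for wi, mi in zip(w, mu)) + b)
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return margin >= 1.0 + cantelli_kappa(eta) * sigma * norm_w
```

The number of constraints therefore equals the number of clusters, and the ‖w‖ term is what makes each constraint a second-order cone rather than a linear inequality.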