918 results for Turing machines.
Abstract:
Past studies use deterministic models to evaluate the optimal cache configuration or to explore its design space. However, with the increasing number of components on a chip multiprocessor (CMP), deterministic approaches do not scale well. Hence, we apply probabilistic genetic algorithms (GA) to determine a near-optimal cache configuration for a sixteen-tile CMP. We propose and implement a faster trace-based approach to estimate the fitness of a chromosome. It shows up to 218x simulation speedup over cycle-accurate architectural simulation. Our methodology can be applied to other cache optimization problems, such as design space exploration of the cache and its partitioning among applications/virtual machines.
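To make the GA formulation concrete, here is a minimal sketch of a genetic search over a cache configuration space. The design space (per-tile L2 size, associativity, line size), the operators (truncation selection with elitism, uniform crossover, random-reset mutation), and `toy_fitness` are all hypothetical illustrations; `toy_fitness` merely stands in for the trace-based fitness estimate described above and is not the authors' model.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-tile L2 design space; each gene is an index into one list.
SIZES_KB = [256, 512, 1024, 2048]
WAYS = [2, 4, 8, 16]
LINE_BYTES = [32, 64, 128]
SPACE: List[Sequence[int]] = [SIZES_KB, WAYS, LINE_BYTES]

Chromosome = Tuple[int, ...]


def decode(c: Chromosome) -> Tuple[int, ...]:
    """Map gene indices to concrete (size_kb, ways, line_bytes) values."""
    return tuple(SPACE[i][g] for i, g in enumerate(c))


def toy_fitness(c: Chromosome) -> float:
    """Placeholder for a trace-based fitness estimate (higher is better)."""
    size, ways, line = decode(c)
    return -(abs(size - 1024) / 1024 + abs(ways - 8) / 8 + abs(line - 64) / 64)


def run_ga(fitness: Callable[[Chromosome], float], pop_size: int = 20,
           generations: int = 30, mutation_rate: float = 0.1,
           seed: int = 0) -> Tuple[Chromosome, float]:
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(len(opts)) for opts in SPACE) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection with elitism: the best quarter survives unchanged.
        elite = sorted(pop, key=fitness, reverse=True)[: max(2, pop_size // 4)]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover, then per-gene random-reset mutation.
            child = tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))
            child = tuple(rng.randrange(len(SPACE[i])) if rng.random() < mutation_rate else g
                          for i, g in enumerate(child))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


best, score = run_ga(toy_fitness)
print(decode(best), score)
```

Because every fitness call here is a cheap function, the sketch hides the main cost; in the setting above each evaluation is a simulation, which is why a faster trace-based fitness estimate matters.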
Abstract:
Rapid advancements in multi-core processor architectures, coupled with low-cost, low-latency, high-bandwidth interconnects, have made clusters of multi-core machines a common computing resource. Unfortunately, writing good parallel programs that efficiently utilize all the resources in such a cluster is still a major challenge. Various programming languages have been proposed as a solution to this problem, but they have yet to be widely adopted for performance-critical code, mainly due to their relatively immature software frameworks and the effort involved in rewriting existing code in a new language. In this paper, we motivate and describe our initial study in exploring CUDA as a programming language for a cluster of multi-cores. We develop CUDA-For-Clusters (CFC), a framework that transparently orchestrates the execution of CUDA kernels on a cluster of multi-core machines. The well-structured nature of CUDA kernels, together with the growing popularity, support, and stability of the CUDA software stack, makes CUDA a good candidate programming language for a cluster. CFC uses a mixture of source-to-source compiler transformations, a work distribution runtime, and a lightweight software distributed shared memory to manage parallel execution. Initial results on several standard CUDA benchmark programs show speedups of up to 7.5X on a cluster with 8 nodes, opening up an interesting direction for further research.
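One reason CUDA kernels suit cluster execution is that thread blocks are independent, so a runtime can hand contiguous ranges of block indices to different nodes. The sketch below illustrates that idea only; `partition_blocks`, `vector_add_block`, and `launch_on_cluster` are hypothetical names, the nodes run sequentially, and a plain Python list stands in for the software distributed shared memory. It is not CFC's actual runtime.

```python
from typing import Callable, List


def partition_blocks(num_blocks: int, num_nodes: int) -> List[range]:
    """Split block indices 0..num_blocks-1 into contiguous, near-equal chunks."""
    base, extra = divmod(num_blocks, num_nodes)
    chunks, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # first `extra` nodes get one more block
        chunks.append(range(start, start + size))
        start += size
    return chunks


def vector_add_block(block_idx: int, block_dim: int, a, b, out) -> None:
    """Emulates one CUDA thread block: each 'thread' handles one element."""
    for tid in range(block_dim):
        i = block_idx * block_dim + tid
        if i < len(out):  # bounds guard, as in a typical CUDA kernel
            out[i] = a[i] + b[i]


def launch_on_cluster(kernel: Callable[..., None], num_blocks: int, block_dim: int,
                      num_nodes: int, *args) -> None:
    # Each node would run its chunk locally; here nodes run one after another,
    # and `out` is an ordinary list standing in for distributed shared memory.
    for node_blocks in partition_blocks(num_blocks, num_nodes):
        for blk in node_blocks:
            kernel(blk, block_dim, *args)


n, block_dim = 10, 4
a, b = list(range(n)), [10] * n
out = [0] * n
num_blocks = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) = 3
launch_on_cluster(vector_add_block, num_blocks, block_dim, 2, a, b, out)
print(out)
```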
Abstract:
In this paper, a current hysteresis controller with parabolic boundaries is proposed for an induction motor (IM) drive fed by a 12-sided polygonal voltage space vector inverter. The proposed controller uses parabolic boundaries with generalized vector selection logic that is valid for all sectors and both rotational directions. The current error space phasor boundary is obtained by first studying the drive scheme with a space vector based PWM (SVPWM) controller. Four parabolas are used to approximate this current error space phasor boundary. The system is then run with a space-phasor-based hysteresis PWM controller that limits the current error space vector (CESV) within the parabolic boundary. The proposed controller offers a simple implementation, nearly constant switching frequency, an extended modulation range, and fast dynamic response with a smooth transition into the overmodulation region.
Abstract:
Structural Support Vector Machines (SSVMs) and Conditional Random Fields (CRFs) are popular discriminative methods used for classifying structured and complex objects like parse trees, image segments, and part-of-speech tags. The datasets involved are very high-dimensional, and the models designed using typical training algorithms for SSVMs and CRFs are non-sparse. This non-sparsity results in slow inference. Thus, there is a need to devise new algorithms for sparse SSVM and CRF classifier design. The elastic net and L1 regularizers have already been explored for solving the primal CRF and SSVM problems, respectively, to design sparse classifiers. In this work, we focus on the dual elastic net regularized SSVM and CRF. By exploiting the weakly coupled structure of these convex programming problems, we propose a new sequential alternating proximal (SAP) algorithm to solve these dual problems. The algorithm sequentially visits each training example and solves a simple subproblem restricted to a small subset of variables associated with that example. Numerical experiments on various benchmark sequence labeling datasets demonstrate that the proposed algorithm scales well. Further, the resulting classifiers are sparser than those obtained by solving the respective primal problems and achieve comparable generalization performance. Thus, the proposed SAP algorithm is a useful alternative for sparse SSVM and CRF classifier design.
Abstract:
This paper presents the design technique adopted for packaging a polyvinylidene fluoride (PVDF) nasal sensor for biomedical applications. A PVDF film 10 mm long, 5 mm wide, and 28 µm thick was firmly adhered to one end of a plastic base (8 mm × 5 mm × 30 µm) so that it forms a cantilever, leaving the other end free to deflect. With leads attached to the surface of the PVDF film, this cantilever configuration becomes the PVDF nasal sensor. To mount the sensors, a special headphone was designed that fits most human head sizes. Two flexible strings are soldered to the headphone, one on each side. Two identical PVDF nasal sensors were then connected to these strings so that they sit below the right and left nostrils, respectively, without disturbing normal breathing. When a subject wears the headphone with the PVDF nasal sensors, the piezoelectric property of the PVDF film generates two voltage signals corresponding to his/her nasal airflow from the right and left nostrils. The entire design was made compact, so that the PVDF nasal sensors along with the headphone are portable. No special equipment or machines are needed to mount the PVDF nasal sensors. Packaging the PVDF nasal sensors takes little time, and the approximate cost of the entire assembly (PVDF nasal sensors + headphone) is nominal.
Abstract:
Elastic net regularizers have shown much promise in designing sparse classifiers for linear classification. In this work, we propose an alternating optimization approach to solve the dual problems of elastic net regularized linear classification Support Vector Machines (SVMs) and logistic regression (LR). One of the subproblems turns out to be a simple projection. The other subproblem can be solved using dual coordinate descent methods developed for non-sparse L2-regularized linear SVMs and LR, without altering their iteration complexity and convergence properties. Experiments on very large datasets indicate that the proposed dual coordinate descent-projection (DCD-P) methods are fast and achieve comparable generalization performance after the first pass through the data, with extremely sparse models.
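Why an elastic net penalty yields sparse weights can be seen from its proximal operator, which has a closed form: for a step t, each coordinate becomes sign(v) * max(|v| - t*lam1, 0) / (1 + t*lam2). The sketch below implements that standard operator. It illustrates the generic soft-thresholding mechanism and is not the specific projection subproblem or DCD-P update from the abstract; the parameter names `lam1`, `lam2`, and `step` are assumptions.

```python
import math
from typing import List


def elastic_net_prox(v: List[float], lam1: float, lam2: float,
                     step: float = 1.0) -> List[float]:
    """Proximal operator of step * (lam1 * ||w||_1 + (lam2 / 2) * ||w||^2), elementwise."""
    out = []
    for x in v:
        shrunk = max(abs(x) - step * lam1, 0.0)          # L1 part: soft-thresholding
        w = 0.0 if shrunk == 0.0 else math.copysign(shrunk, x)
        out.append(w / (1.0 + step * lam2))              # L2 part: uniform shrinkage
    return out


print(elastic_net_prox([3.0, -0.5, 0.2, -2.0], lam1=1.0, lam2=1.0))
# [1.0, 0.0, 0.0, -0.5]
```

Coordinates whose magnitude falls below t*lam1 become exactly zero, which is the source of sparsity; the L2 term only rescales the survivors.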
Abstract:
A micro-newton static force sensor is presented here as a packaged product. The sensor, which is based on the mechanics of deformable objects, consists of a compliant mechanism that amplifies the displacement caused by the force to be measured. The output displacement, captured using a digital microscope and analyzed using image processing techniques, is used to calculate the force from a precalibrated force-displacement curve. Images are scanned in real time at 15 frames per second and sampled at around half the scanning frequency. The sensor was built, packaged, calibrated, and tested. Its simulated and measured stiffness values are 2.60 N/m and 2.57 N/m, respectively. The smallest force it can reliably measure in the presence of noise is about 2 µN, over a range of 1.4 mN. Apart from the off-the-shelf digital microscope, all of its components are purely mechanical; they are inexpensive and can be easily made using simple machines. Another highlight of the sensor is that its movable and delicate components are easily replaceable. The sensor can be used in aqueous environments because it does not use electric, magnetic, thermal, or any other fields. Currently, it can only measure static forces or forces that vary at less than 1 Hz, because its response time and bandwidth are limited by the speed of imaging with a camera. With its digital microscope connected over universal serial bus (USB), a custom-developed graphical user interface (GUI), and related software, the sensor is fully developed as a ready-to-use product.
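The reported figures can be related with simple arithmetic. Assuming a linear force-displacement relation F = k * delta with the measured stiffness k = 2.57 N/m (the actual calibration curve may not be linear, and the abstract does not say whether the stiffness refers to the input or the amplified output displacement), the sketch below computes the displacement the imaging system must resolve at the smallest reliable force and at the full range. The function names are hypothetical.

```python
K_MEASURED = 2.57  # N/m, measured stiffness reported above


def force_from_displacement(delta_m: float, k: float = K_MEASURED) -> float:
    """Force (N) for a displacement (m), assuming a linear curve F = k * delta."""
    return k * delta_m


def displacement_for_force(force_n: float, k: float = K_MEASURED) -> float:
    """Displacement (m) corresponding to a given force (N) under the same assumption."""
    return force_n / k


d_min = displacement_for_force(2e-6)    # smallest reliable force, about 2 uN
d_max = displacement_for_force(1.4e-3)  # full range, 1.4 mN
print(f"{d_min * 1e6:.2f} um, {d_max * 1e3:.3f} mm")
# 0.78 um, 0.545 mm
```

Under these assumptions, resolving about 2 µN means detecting sub-micrometre motion, which suggests why the compliant mechanism's displacement amplification and image-based measurement go together.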
Abstract:
This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm uses cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and subsequent classification using Support Vector Machines (SVM). The ultimate goal of our work is to demonstrate an algorithm much faster than its pixel-domain counterparts, with comparable accuracy, using only the sparse information in compressed video. Partial decoding avoids the complexity of full decoding and minimizes computational load and memory usage, which can result in reduced hardware utilization and faster recognition. The proposed approach can handle illumination changes and scale and appearance variations, and it is robust in both outdoor and indoor testing scenarios. We have tested our method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at more than 2000 fps, approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.
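As an illustration of how motion vectors read from a partially decoded stream can become a fixed-length feature for an SVM, here is a sketch of a magnitude-weighted motion-vector orientation histogram. This particular feature, its bin count, and the `min_mag` noise threshold are hypothetical choices for illustration; the abstract does not specify the exact features, and the quantization-parameter cues are omitted here.

```python
import math
from typing import List, Sequence, Tuple


def mv_orientation_histogram(mvs: Sequence[Tuple[float, float]], bins: int = 8,
                             min_mag: float = 0.5) -> List[float]:
    """Magnitude-weighted histogram of motion-vector directions, normalized to sum to 1."""
    hist = [0.0] * bins
    for dx, dy in mvs:
        mag = math.hypot(dx, dy)
        if mag < min_mag:
            continue  # drop near-zero vectors (static background, noise)
        angle = math.atan2(dy, dx) % (2 * math.pi)  # direction in [0, 2*pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


print(mv_orientation_histogram([(1, 0), (2, 0), (0, 1), (0, 0)], bins=4))
# [0.75, 0.25, 0.0, 0.0]
```

Such per-frame histograms could be pooled over a clip and fed to any SVM implementation; no pixel data is needed, which is the source of the speed advantage claimed above.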
Abstract:
With the premise that electronic noise dominates mechanical noise in micromachined accelerometers, we present a method to enhance sensitivity and resolution at kHz bandwidth using mechanical amplification. This is achieved by means of a Displacement-amplifying Compliant Mechanism (DaCM) appended to the usual sensing element, which comprises a proof-mass and a suspension. A differential comb-drive arrangement is used for capacitive sensing. The DaCM is designed to match the stiffness of the suspension, so that there is substantial net amplification without compromising the bandwidth. A spring-mass-lever model is used to estimate the lumped parameters of the system. A DaCM-aided accelerometer and one without a DaCM, both occupying the same footprint, are compared to show that the former gives enhanced sensitivity: 8.7 nm/g vs. 1.4 nm/g displacement at the sensing combs under static conditions. A prototype of the DaCM-aided micromachined accelerometer was fabricated using bulk micromachining. It was tested at the die level and then packaged on a printed circuit board with an off-the-shelf integrated circuit for measuring the change in capacitance. Under dynamic conditions, the displacement measured at the output of the DaCM was about 11 times larger than that of the proof-mass, thus validating the concept of enhancing the sensitivity of accelerometers using mechanical amplifiers. The measured first in-plane natural frequency of the fabricated accelerometer was 6.25 kHz. The packaged DaCM-aided accelerometer had a measured sensitivity of 26.7 mV/g at 40 Hz.
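The sensitivity-bandwidth trade-off that motivates mechanical amplification can be checked numerically. For an idealized single-degree-of-freedom spring-mass (not the spring-mass-lever model used above), the static displacement per unit acceleration is m/k = 1/omega_n^2, so raising the natural frequency to widen bandwidth costs sensitivity quadratically. The sketch evaluates that relation at 6.25 kHz purely for scale, not as a prediction of the fabricated device, and computes the static gain implied by the 8.7 nm/g and 1.4 nm/g figures.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def static_displacement_per_g(f_n_hz: float) -> float:
    """Ideal single-DOF spring-mass: x/a = m/k = 1/omega_n^2, evaluated at a = 1 g (metres)."""
    omega_n = 2 * math.pi * f_n_hz
    return G / omega_n ** 2


# Static gain implied by the figures above: 8.7 nm/g with a DaCM vs 1.4 nm/g without.
static_gain = 8.7 / 1.4
print(round(static_gain, 2))                     # 6.21
print(static_displacement_per_g(6.25e3) * 1e9)   # about 6.4 nm/g for an ideal 6.25 kHz resonator
```

Doubling f_n cuts the ideal displacement by a factor of four; a DaCM instead amplifies the displacement seen by the sensing combs without softening the suspension.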
Abstract:
Accurately estimating the working set size (WSS) of an application is essential for various optimizations, such as partitioning the cache among virtual machines or reducing the leakage power dissipated in an over-allocated cache by switching the over-allocated portion OFF. However, state-of-the-art heuristics such as average memory access latency (AMAL) and cache miss ratio (CMR) are poorly correlated with the WSS of an application due to 1) over-sized caches and 2) their dispersed nature. Past studies focus on estimating the WSS of an application executing on a uniprocessor platform. Estimating it for a chip multiprocessor (CMP) with a large, dispersed cache is challenging due to the presence of concurrently executing threads/processes. Hence, we propose a scalable, highly accurate method to estimate the WSS of an application, which we call the "tagged WSS (TWSS)" estimation method. We demonstrate the use of TWSS to switch OFF over-allocated cache ways in Static and Dynamic Non-Uniform Cache Architectures (SNUCA, DNUCA) on a tiled CMP. In our implementation of way-adaptable SNUCA and DNUCA caches, the decision to alter associativity is made by each L2 controller, so this approach scales better with the number of cores on a CMP. On SNUCA, it gives overall (geometric mean) 26% and 19% higher energy-delay product savings than the AMAL and CMR heuristics, respectively.
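For reference, the classic sliding-window working-set definition counts the distinct cache blocks touched in the last few accesses. The sketch below implements that textbook definition, plus a hypothetical rule that converts a WSS into the number of ways per set worth keeping powered. It is not the TWSS method itself; the 64-byte block size, the window length, and `ways_needed` are all assumptions.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List


def working_set_sizes(trace: Iterable[int], window: int, block_bytes: int = 64) -> List[int]:
    """After each access, WSS in bytes = distinct blocks among the last `window` accesses."""
    recent: Deque[int] = deque()
    counts: Dict[int, int] = {}
    sizes: List[int] = []
    for addr in trace:
        blk = addr // block_bytes
        recent.append(blk)
        counts[blk] = counts.get(blk, 0) + 1
        if len(recent) > window:  # slide the window forward by one access
            old = recent.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        sizes.append(len(counts) * block_bytes)
    return sizes


def ways_needed(wss_bytes: int, num_sets: int, block_bytes: int = 64, max_ways: int = 16) -> int:
    """Hypothetical sizing rule: ways per set needed to hold the WSS if spread evenly over sets."""
    blocks = -(-wss_bytes // block_bytes)  # ceiling division
    return min(max_ways, max(1, -(-blocks // num_sets)))


print(working_set_sizes([0, 8, 64, 70, 128, 0], window=2))
print(ways_needed(192 * 1024, num_sets=1024))
```

Ways beyond `ways_needed` would be candidates for switching OFF; the abstract's point is that on a CMP such estimates must be made per L2 controller to scale.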
Abstract:
Virtualization is one of the key enabling technologies for Cloud computing. Although it facilitates improved resource utilization, virtualization can lead to performance degradation due to the sharing of physical resources like CPU, memory, network interfaces, and disk controllers. Multi-tenancy can cause highly unpredictable performance for concurrent I/O applications running inside virtual machines that share local disk storage in the Cloud. Disk I/O requests in a typical Cloud setup may have varied latency and throughput requirements, as they arise from a range of heterogeneous applications with diverse performance goals. This necessitates providing differentiated performance services to different I/O applications. In this paper, we present PriDyn, a novel scheduling framework that considers I/O performance metrics of applications, such as acceptable latency, and converts them into an appropriate priority value for disk access based on the current system state. This framework aims to provide differentiated I/O service to various applications and to ensure predictable performance for critical applications in a multi-tenant Cloud environment. Through experimental validation on real-world I/O traces, we demonstrate that this framework appreciably improves I/O performance, indicating that this approach is a promising step towards enabling QoS guarantees on Cloud storage.
Abstract:
Predicting the queue waiting times of jobs submitted to production parallel batch systems is important for giving users overall estimates, and it can also help meta-schedulers make scheduling decisions. In this work, we have developed a framework for predicting ranges of queue waiting times for jobs by employing multi-class classification of similar jobs in history. Our hierarchical prediction strategy first predicts the point wait time of a job using a dynamic k-Nearest Neighbor (kNN) method. It then performs a multi-class classification using Support Vector Machines (SVMs) among all the classes of jobs. The probabilities given by the SVM for the class predicted using kNN and its neighboring classes are used to provide a set of predicted wait-time ranges with probabilities. We have used these predictions and probabilities in a meta-scheduling strategy that distributes jobs to different queues/sites in a multi-queue/grid environment to minimize the jobs' wait times. Experiments with different production supercomputer job traces show that our prediction strategies give correct predictions for about 77-87% of the jobs and are about 12% more accurate than the next best existing method. Experiments with our meta-scheduling strategy using different production and synthetic job traces for various system sizes, partitioning schemes, and workloads show that it outperforms existing scheduling policies, reducing the overall average queue waiting times of the jobs by about 47%.
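The hierarchical strategy can be sketched in two steps: a kNN point estimate selects a wait-time class, and classifier probabilities are then reported for that class and its neighbours. In this minimal sketch the kNN uses a fixed k and Euclidean distance (the abstract's kNN is dynamic), the class boundaries in `BIN_EDGES` and the job features are hypothetical, and `svm_probs` is a hard-coded stand-in for the probabilities a trained SVM would output.

```python
import math
from typing import Dict, List, Sequence, Tuple

Job = Tuple[Sequence[float], float]  # (feature vector, observed wait in seconds)

# Hypothetical wait-time classes (seconds): <10 min, 10 min-1 h, 1-4 h, 4-24 h.
BIN_EDGES = [0, 600, 3600, 4 * 3600, 24 * 3600]


def knn_point_wait(history: List[Job], query: Sequence[float], k: int = 3) -> float:
    """Point estimate: mean wait of the k most similar past jobs (fixed k, Euclidean)."""
    nearest = sorted(history, key=lambda job: math.dist(job[0], query))[:k]
    return sum(wait for _, wait in nearest) / len(nearest)


def wait_class(wait: float) -> int:
    for c in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[c] <= wait < BIN_EDGES[c + 1]:
            return c
    return len(BIN_EDGES) - 2  # clamp waits beyond the last edge into the last class


def predicted_ranges(point_wait: float, class_probs: Dict[int, float]):
    """Ranges for the kNN-predicted class and its neighbours, with classifier probabilities."""
    c = wait_class(point_wait)
    return [((BIN_EDGES[n], BIN_EDGES[n + 1]), class_probs.get(n, 0.0))
            for n in (c - 1, c, c + 1) if 0 <= n < len(BIN_EDGES) - 1]


history = [((4, 1.0), 300.0), ((4, 1.5), 500.0), ((8, 2.0), 900.0), ((64, 12.0), 7200.0)]
p = knn_point_wait(history, (4, 1.2))
svm_probs = {0: 0.6, 1: 0.3, 2: 0.1}  # stand-in for SVM class probabilities
print(p, predicted_ranges(p, svm_probs))
```

A meta-scheduler could compare such range/probability lists across queues or sites when deciding where to submit a job.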
Abstract:
A wheeled mobile robot (WMR) will move on uneven terrain without slip if its torus-shaped wheels tilt in the lateral direction. An independent two-degree-of-freedom (DOF) suspension is required both to maintain contact with the uneven terrain and to allow lateral tilting. This article deals with the modeling and simulation of a three-wheeled mobile robot with torus-shaped wheels and four novel two-DOF suspension mechanism concepts. Simulations are performed on uneven terrain for three representative paths: a straight line, a circle, and an S-shaped path. The simulations show that a novel concept using a double four-bar mechanism performs better than the other three concepts.
Abstract:
In structured output learning, obtaining labeled data for real-world applications is usually costly, while unlabeled examples are available in abundance. Semisupervised structured classification deals with a small number of labeled examples and a large amount of unlabeled structured data. In this work, we consider semisupervised structural support vector machines with domain constraints. The optimization problem, which in general is not convex, contains the loss terms associated with the labeled and unlabeled examples, along with the domain constraints. We propose a simple optimization approach that alternates between solving a supervised learning problem and a constraint matching problem. Solving the constraint matching problem is difficult for structured prediction, and we propose an efficient and effective label switching method to solve it. The alternating optimization is carried out within a deterministic annealing framework, which helps achieve effective constraint matching and avoid poor local minima. The algorithm is simple and easy to implement. Further, it is suitable for any structured output learning problem where exact inference is available. Experiments on benchmark sequence labeling data sets and a natural language parsing data set show that the proposed approach, though simple, achieves comparable generalization performance.
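To convey the idea behind label switching, here is a deliberately simplified, flat (non-structured) binary analogue: the domain constraint fixes the number of positive labels among unlabeled examples, and labels predicted by the current classifier are flipped, cheapest first, until the constraint holds. Treating |score| as the cost of a flip is an assumption of this sketch; the structured version, the annealing schedule, and the supervised step are not shown.

```python
from typing import List


def switch_labels(scores: List[float], target_pos: int) -> List[int]:
    """Start from sign(score) labels and flip the cheapest ones (cost = |score|)
    until exactly `target_pos` positives remain."""
    labels = [1 if s > 0 else 0 for s in scores]
    pos = sum(labels)
    if pos > target_pos:
        # Too many positives: demote those with the smallest (least confident) scores.
        order = sorted((i for i, y in enumerate(labels) if y == 1), key=lambda i: scores[i])
        for i in order[: pos - target_pos]:
            labels[i] = 0
    elif pos < target_pos:
        # Too few positives: promote negatives with the largest scores.
        order = sorted((i for i, y in enumerate(labels) if y == 0), key=lambda i: -scores[i])
        for i in order[: target_pos - pos]:
            labels[i] = 1
    return labels


print(switch_labels([2.0, 0.5, -0.1, -3.0, 1.0], target_pos=2))
# [1, 0, 0, 0, 1]
```

In an alternating scheme, the switched labels would be fed back as targets for the next supervised step, and the classifier scores recomputed.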
Abstract:
This paper discusses a novel high-speed approach for human action recognition in the H.264/AVC compressed domain. The proposed algorithm uses cues from quantization parameters and motion vectors extracted from the compressed video sequence for feature extraction and subsequent classification using Support Vector Machines (SVM). The ultimate goal of the proposed work is to demonstrate an algorithm much faster than its pixel-domain counterparts, with comparable accuracy, using only the sparse information in compressed video. Partial decoding avoids the complexity of full decoding and minimizes computational load and memory usage, which can result in reduced hardware utilization and faster recognition. The proposed approach can handle illumination changes and scale and appearance variations, and it is robust in both outdoor and indoor testing scenarios. We have evaluated the performance of the proposed method on two benchmark action datasets and achieved more than 85% accuracy. The proposed algorithm classifies actions at more than 2,000 fps, approximately 100 times faster than existing state-of-the-art pixel-domain algorithms.