65 results for Associative Classifiers
Abstract:
This paper presents the design of a full-fledged OCR system for printed Kannada text. The machine recognition of Kannada characters is difficult due to the similarity in the shapes of different characters, script complexity and non-uniqueness in the representation of diacritics. The document image is subjected to line segmentation, word segmentation and zone detection. From the zonal information, base characters, vowel modifiers and consonant conjuncts are separated. A knowledge-based approach is employed for recognizing the base characters. Various features are employed for recognizing the characters. These include the coefficients of the Discrete Cosine Transform, Discrete Wavelet Transform and Karhunen-Loève Transform. These features are fed to different classifiers. Structural features are used in the subsequent levels to discriminate confused characters. The use of structural features increases the recognition rate from 93% to 98%. Apart from the classical pattern classification technique of nearest neighbour, Artificial Neural Network (ANN) based classifiers like Back Propagation and Radial Basis Function (RBF) networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. The highest recognition rate of 99% is obtained with RBF using second-level approximation coefficients of Haar wavelets as the features on pre-segmented base characters.
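As a rough illustration of the transform-feature pipeline described in this abstract (not the authors' implementation), the sketch below extracts 2-D DCT coefficients from a character glyph and classifies it with a nearest-neighbour rule; the glyph size, coefficient count and the random stand-in data are assumptions.

```python
# A minimal sketch of DCT-feature nearest-neighbour classification,
# assuming 32x32 character glyphs; not the paper's actual pipeline.
import numpy as np
from scipy.fftpack import dct

def dct_features(glyph, k=8):
    """Return the top-left k x k block of 2-D DCT coefficients."""
    coeffs = dct(dct(glyph, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs[:k, :k].ravel()

def nearest_neighbour(x, train_feats, train_labels):
    """Classical 1-NN over Euclidean distance in feature space."""
    d = np.linalg.norm(train_feats - x, axis=1)
    return train_labels[np.argmin(d)]

# Toy usage with random stand-ins for segmented base characters.
rng = np.random.default_rng(0)
train = rng.random((100, 32, 32))          # hypothetical training glyphs
labels = rng.integers(0, 10, size=100)     # hypothetical class ids
feats = np.array([dct_features(g) for g in train])
print(nearest_neighbour(dct_features(train[0]), feats, labels))
```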
Abstract:
Given a parametrized n-dimensional SQL query template and a choice of query optimizer, a plan diagram is a color-coded pictorial enumeration of the execution plan choices of the optimizer over the query parameter space. These diagrams have proved to be a powerful metaphor for the analysis and redesign of modern optimizers, and are gaining currency in diverse industrial and academic institutions. However, their utility is adversely impacted by the impractically large computational overheads incurred when standard brute-force exhaustive approaches are used for producing fine-grained diagrams on high-dimensional query templates. In this paper, we investigate strategies for efficiently producing close approximations to complex plan diagrams. Our techniques are customized to the features available in the optimizer's API, ranging from the generic optimizers that provide only the optimal plan for a query, to those that also support costing of sub-optimal plans and enumerating rank-ordered lists of plans. The techniques collectively feature both random and grid sampling, as well as inference techniques based on nearest-neighbor classifiers, parametric query optimization and plan cost monotonicity. Extensive experimentation with a representative set of TPC-H and TPC-DS-based query templates on industrial-strength optimizers indicates that our techniques are capable of delivering 90% accurate diagrams while incurring less than 15% of the computational overheads of the exhaustive approach. In fact, for full-featured optimizers, we can guarantee zero error with less than 10% overheads. These approximation techniques have been implemented in the publicly available Picasso optimizer visualization tool.
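The sampling-plus-inference strategy sketched below is one hedged reading of the techniques this abstract mentions: sample a sparse subset of the parameter space exhaustively, then infer the plan at unsampled points with a nearest-neighbour classifier. The `optimal_plan` function is a purely hypothetical stand-in for a call to an optimizer's API.

```python
# A minimal sketch of approximating a 2-D plan diagram via random
# sampling plus nearest-neighbour inference; `optimal_plan` stands in
# for the optimizer API and its diagonal plan switch is invented.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def optimal_plan(sel_x, sel_y):
    # Placeholder: pretend the optimizer switches plans on a diagonal.
    return 0 if sel_x + sel_y < 1.0 else 1

res = 100                                   # diagram resolution per axis
grid = np.linspace(0.0, 1.0, res)
xs, ys = np.meshgrid(grid, grid)
points = np.column_stack([xs.ravel(), ys.ravel()])

# Optimize only a small random subset of the diagram's query points.
rng = np.random.default_rng(1)
idx = rng.choice(len(points), size=len(points) // 10, replace=False)
sampled = points[idx]
plans = np.array([optimal_plan(x, y) for x, y in sampled])

# Infer the plan at every remaining point from its nearest samples.
approx = KNeighborsClassifier(n_neighbors=3).fit(sampled, plans)
diagram = approx.predict(points).reshape(res, res)
print(diagram.shape)
```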
Abstract:
Growing concern over the status of global and regional bioenergy resources has necessitated the analysis and monitoring of land cover and land use parameters on spatial and temporal scales. Knowledge of land cover and land use is very important in understanding natural resource utilization, conversion and management. Land cover, land use intensity and land use diversity are land quality indicators for sustainable land management. Optimal management of resources aids in maintaining the ecosystem balance and thereby ensures the sustainable development of a region. Thus, sustainable development of a region requires a synoptic ecosystem approach in the management of natural resources that relates to the dynamics of natural variability and the effects of human intervention on key indicators of biodiversity and productivity. Spatial and temporal tools such as remote sensing (RS), geographic information systems (GIS) and the global positioning system (GPS) provide spatial and attribute data at regular intervals, and their decision-support functionalities of visualisation, querying and analysis aid in the sustainable management of natural resources. Remote sensing data and GIS technologies play an important role in spatially evaluating bioresource availability and demand. This paper explores various land cover and land use techniques that could be used for bioresource monitoring, considering the spatial data of Kolar district, Karnataka state, India. Slope-based and distance-based vegetation indices are computed for qualitative and quantitative assessment of land cover using remote spectral measurements. Different-scale mapping of the land use pattern in Kolar district is done using supervised classification approaches. Slope-based vegetation indices show the area under vegetation ranging from 47.65% to 49.05%, while distance-based vegetation indices show a range from 40.40% to 47.41%. Land use analyses using the maximum likelihood classifier indicate that 46.69% is agricultural land, 42.33% is wasteland (barren land), 4.62% is built up, 3.07% is plantation, 2.77% is natural forest and 0.53% is water bodies. A comparative analysis of the various classifiers indicates that the Gaussian maximum likelihood classifier has the least error. The computation of taluk-wise bioresource status shows that Chikballapur taluk has better availability of resources compared to the other taluks in the district.
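For concreteness, the sketch below computes one slope-based index (NDVI) and one distance-based index (a perpendicular vegetation index) from red and near-infrared bands; the soil-line slope/intercept and the vegetation threshold are illustrative assumptions, not values from the study.

```python
# A minimal sketch of slope- and distance-based vegetation indices
# computed from red and near-infrared reflectance bands.
import numpy as np

def ndvi(nir, red):
    """Slope-based index: Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red + 1e-9)

def pvi(nir, red, a=1.2, b=0.04):
    """Distance-based index: perpendicular distance to the soil line
    nir = a * red + b (a, b assumed here for illustration)."""
    return (nir - a * red - b) / np.sqrt(1.0 + a * a)

rng = np.random.default_rng(2)
red = rng.random((4, 4))    # stand-in for a red-band image
nir = rng.random((4, 4))    # stand-in for a NIR-band image
veg_fraction = (ndvi(nir, red) > 0.2).mean()   # threshold illustrative
print(f"area under vegetation: {veg_fraction:.2%}")
```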
Abstract:
Uttara Kannada is the only district in Karnataka with a forested area of about 80%, and it falls in the region of the Western Ghats. It is considered very resourceful in terms of abundant natural resources and constitutes an important district in Karnataka. The forest resources of the district are under pressure, as a large portion of the forested area has been converted to non-forestry activities since independence owing to the increased demands of the human and animal populations, resulting in degradation of the forest ecosystem. This has led to poor productivity and regenerative capacity, evident in the form of barren hill tops in the coastal taluks of Uttara Kannada, making regular monitoring of the forest resources essential. The classification of forest is a prerequisite for managing forest resources. A Geographical Information System (GIS) allows the spatial and temporal analysis of the features of interest, and helps in solving the problem of deforestation and associated environmental and ecological problems. Spatial and temporal tools such as GIS and remotely sensed data help planners and decision makers in evolving sustainable strategies for the management and conservation of natural resources. Uttara Kannada district was classified on the basis of land use using supervised hard classifiers. The land use categories identified were urban area, water bodies, agricultural land, forest cover and wasteland. Further classification was carried out on the basis of forest type. The forest types categorised were semi-evergreen, evergreen, moist deciduous, dry deciduous, plantations, scrub and thorny vegetation, and non-forested area. The identified classes were correlated with ground data collected during field visits. The observed results were compared with historic data and the changes in the forest cover were analysed. From the assessment it was clear that there has been a considerable degree of forest loss in certain areas of the district. It was also observed that plantations and social forests have increased drastically over the last fifteen years, while natural forests have declined.
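A supervised hard classifier of the kind referenced above can be sketched as a Gaussian maximum likelihood rule: each class is modelled by the mean and covariance of its training pixels, and every pixel is assigned to the most likely class. The band count and random data below are placeholders, not the district's imagery.

```python
# A minimal sketch of Gaussian maximum likelihood land-use
# classification over multispectral pixels.
import numpy as np
from scipy.stats import multivariate_normal

def fit_classes(train_pixels, train_labels):
    """One Gaussian (mean, covariance) per land-use class."""
    models = {}
    for c in np.unique(train_labels):
        s = train_pixels[train_labels == c]
        models[c] = multivariate_normal(s.mean(axis=0), np.cov(s.T))
    return models

def classify(pixels, models):
    """Assign each pixel to the class with highest log-likelihood."""
    ll = np.column_stack([m.logpdf(pixels) for m in models.values()])
    classes = np.array(list(models.keys()))
    return classes[np.argmax(ll, axis=1)]

rng = np.random.default_rng(3)
train = rng.random((60, 4))                 # hypothetical 4-band samples
labels = rng.integers(0, 3, size=60)        # hypothetical class ids
print(classify(rng.random((5, 4)), fit_classes(train, labels)))
```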
Abstract:
The inherent temporal locality in memory accesses is filtered out by the L1 cache. As a consequence, an L2 cache with LRU replacement incurs significantly more misses than the optimal replacement policy (OPT). We propose to narrow this gap through a novel replacement strategy that mimics the replacement decisions of OPT. The L2 cache is logically divided into two components, a Shepherd Cache (SC) with simple FIFO replacement and a Main Cache (MC) with an emulation of optimal replacement. The SC plays the dual role of caching lines and guiding the replacement decisions in the MC. Our proposed organization can cover 40% of the gap between OPT and LRU for a 2MB cache, resulting in 7% overall speedup. Comparison with the dynamic insertion policy, a victim buffer, a V-Way cache and an LRU-based fully associative cache demonstrates that our scheme performs better than all these strategies.
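The toy, single-set simulation below conveys the SC/MC split only: new lines enter a FIFO Shepherd Cache and graduate into the Main Cache, where a victim must be chosen. The least-recently-used victim rule here is a crude placeholder for the paper's Next-Use-based emulation of OPT, and the way counts are arbitrary.

```python
# A highly simplified, single-set sketch of the Shepherd/Main cache
# organization; not the proposed hardware mechanism.
from collections import deque

class ShepherdSet:
    def __init__(self, sc_ways=2, mc_ways=4):
        self.sc = deque()                  # FIFO shepherd cache
        self.mc = {}                       # line -> time of last use
        self.sc_ways, self.mc_ways = sc_ways, mc_ways
        self.time = 0

    def access(self, line):
        self.time += 1
        if line in self.mc:                # MC hit
            self.mc[line] = self.time
            return True
        if line in self.sc:                # SC hit
            return True
        self.sc.append(line)               # miss: fill into the SC
        if len(self.sc) > self.sc_ways:
            graduate = self.sc.popleft()   # oldest SC line moves to MC
            if len(self.mc) >= self.mc_ways:
                # Crude stand-in for the paper's OPT emulation:
                # evict the least recently used MC line.
                victim = min(self.mc, key=self.mc.get)
                del self.mc[victim]
            self.mc[graduate] = self.time
        return False

s = ShepherdSet()
hits = sum(s.access(a) for a in [1, 2, 3, 1, 2, 4, 5, 1, 2, 3])
print(f"hits: {hits}/10")
```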
Abstract:
This paper presents a novel Second Order Cone Programming (SOCP) formulation for large scale binary classification tasks. Assuming that the class conditional densities are mixture distributions, where each component of the mixture has a spherical covariance, the second order statistics of the components can be estimated efficiently using clustering algorithms like BIRCH. For each cluster, the second order moments are used to derive a second order cone constraint via a Chebyshev-Cantelli inequality. This constraint ensures that any data point in the cluster is classified correctly with a high probability. This leads to a large margin SOCP formulation whose size depends on the number of clusters rather than the number of training data points. Hence, the proposed formulation scales well for large datasets when compared to the state-of-the-art classifiers, Support Vector Machines (SVMs). Experiments on real world and synthetic datasets show that the proposed algorithm outperforms SVM solvers in terms of training time and achieves similar accuracies.
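A hedged sketch of the formulation follows: each class is clustered (with BIRCH, as the abstract suggests), and each cluster contributes one cone constraint y(w·mu + b) >= 1 + kappa*sigma*||w||, where kappa = sqrt(eta/(1-eta)) comes from the Chebyshev-Cantelli bound. The cluster counts, eta, objective and synthetic data are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch of the cluster-based SOCP formulation using cvxpy
# and sklearn's BIRCH; sizes and parameters are illustrative.
import cvxpy as cp
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(4)
X_pos = rng.normal(+3.0, 1.0, size=(200, 2))   # synthetic class +1
X_neg = rng.normal(-3.0, 1.0, size=(200, 2))   # synthetic class -1

eta = 0.8                                      # required in-cluster prob.
kappa = np.sqrt(eta / (1.0 - eta))
w, b = cp.Variable(2), cp.Variable()
constraints = []
for X, y in [(X_pos, +1.0), (X_neg, -1.0)]:
    labels = Birch(n_clusters=5).fit_predict(X)
    for c in np.unique(labels):
        S = X[labels == c]
        mu = S.mean(axis=0)
        sigma = np.sqrt(S.var(axis=0).sum())   # spherical spread estimate
        # One second-order cone constraint per cluster.
        constraints.append(y * (w @ mu + b) >= 1 + kappa * sigma * cp.norm(w, 2))

prob = cp.Problem(cp.Minimize(cp.norm(w, 2)), constraints)
prob.solve()
print(w.value, b.value)
```

Note that the problem size is governed by the number of clusters (here ten constraints), not the 400 training points, which is the scaling argument the abstract makes.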
Abstract:
We present a fractal coding method to recognize online handwritten Tamil characters and propose a novel technique to increase the efficiency, in terms of time, of coding and decoding. This technique exploits the redundancy in the data, thereby achieving better compression and lower memory usage. It also reduces the encoding time and causes little distortion during reconstruction. Experiments have been conducted to use these fractal codes to classify the online handwritten Tamil characters of the IWFHR 2006 competition dataset. In one approach, we use the fractal coding and decoding process. A recognition accuracy of 90% has been achieved by using DTW for distortion evaluation during the classification and encoding processes, as compared to 78% using a nearest neighbor classifier. In other experiments, we use the fractal code, fractal dimensions and features derived from fractal codes as features in separate classifiers. While the fractal code is successful as a feature, the other two features are not able to capture the wide within-class variations.
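The DTW distance used for distortion evaluation above can be sketched as follows; this is a generic DTW nearest-neighbour baseline over (x, y) stroke points, not the paper's fractal coder, and the toy strokes are random placeholders.

```python
# A minimal sketch of DTW-based nearest-neighbour classification of
# online handwriting strokes (sequences of (x, y) points).
import numpy as np

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(stroke, train):
    """train: list of (stroke, label); returns label of DTW-nearest."""
    return min(train, key=lambda t: dtw(stroke, t[0]))[1]

rng = np.random.default_rng(5)
train = [(rng.random((20, 2)), c) for c in range(3)]   # toy strokes
print(classify(rng.random((25, 2)), train))
```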
Abstract:
In this paper, we compare experimental results for Tamil online handwritten character recognition using HMM and Statistical Dynamic Time Warping (SDTW) as classifiers. The HMM was used for a 156-class problem. Different feature sets and values for the HMM states and mixtures were tried, and the best combination was found to be 16 states and 14 mixtures, giving an accuracy of 85%. The features used in this combination were retained, and an SDTW model with 20 states and a single Gaussian was used as the classifier. Also, the symbol set was increased to include numerals, punctuation marks and special symbols like $, & and #, taking the number of classes to 188. It was found that, with a small addition to the feature set, this simple SDTW classifier performed on par with the more complicated HMM model, giving an accuracy of 84%. The mixture density estimation computation was reduced by a factor of 11. The recognition is writer independent, as the dataset used is quite large, with a variety of handwriting styles.
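A per-class HMM classifier of the kind compared above can be sketched with hmmlearn's GMMHMM: train one model per character class and classify by maximum log-likelihood. The small state/mixture counts, feature dimension and random sequences below are placeholders (the paper's best HMM used 16 states and 14 mixtures).

```python
# A minimal sketch of per-class Gaussian-mixture HMM classification.
import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(6)

def train_class_model(sequences, n_states=3, n_mix=2):
    """Fit one GMMHMM to all training sequences of a class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=n_states, n_mix=n_mix, n_iter=20)
    model.fit(X, lengths)
    return model

# One model per character class; classification picks the model with
# the highest log-likelihood for the observed feature sequence.
models = {c: train_class_model([rng.random((30, 4)) for _ in range(5)])
          for c in range(3)}
test = rng.random((30, 4))
print(max(models, key=lambda c: models[c].score(test)))
```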
Abstract:
In this paper, we consider the problem of time series classification. Using piecewise linear interpolation, various novel kernels are obtained which can be used with Support Vector Machines for designing classifiers capable of deciding the class of a given time series. The approach is general and is applicable in many scenarios. We apply the method to the task of online Tamil handwritten character recognition with promising results.
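One hedged reading of the interpolation idea: variable-length series are piecewise-linearly resampled onto a common grid, and the inner product of the interpolants defines a kernel for an SVM. The grid size and toy data below are assumptions; the paper derives its kernels more carefully.

```python
# A minimal sketch of a piecewise-linear-interpolation kernel used
# with an SVM via a precomputed Gram matrix.
import numpy as np
from sklearn.svm import SVC

GRID = np.linspace(0.0, 1.0, 32)

def resample(series):
    """Piecewise linear interpolation onto the common grid."""
    t = np.linspace(0.0, 1.0, len(series))
    return np.interp(GRID, t, series)

def kernel(A, B):
    """Linear kernel between interpolated representations."""
    RA = np.array([resample(s) for s in A])
    RB = np.array([resample(s) for s in B])
    return RA @ RB.T

rng = np.random.default_rng(7)
train = [rng.random(rng.integers(20, 40)) for _ in range(20)]  # toy series
y = rng.integers(0, 2, size=20)
clf = SVC(kernel='precomputed').fit(kernel(train, train), y)
test = [rng.random(25)]
print(clf.predict(kernel(test, train)))
```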
Intelligent Approach for Fault Diagnosis in Power Transmission Systems Using Support Vector Machines
Abstract:
This paper presents an approach for identifying the faulted line section and fault location on transmission systems using support vector machines (SVMs) for diagnosis/post-fault analysis purposes. Power system disturbances are often caused by faults on transmission lines. When a fault occurs on a transmission system, the protective relay detects the fault and initiates the tripping operation, which isolates the affected part from the rest of the power system. Based on the fault section identified, rapid and corrective restoration procedures can be taken to minimize the power interruption and limit the impact of the outage on the system. The approach is particularly important for post-fault diagnosis of any mal-operation of relays following a disturbance on a neighboring line connected to the same substation. This may help in improving the fault monitoring/diagnosis process, thus assuring secure operation of the power system. In this paper, we compare SVMs with radial basis function neural networks (RBFNN) on data sets corresponding to different faults on a transmission system. Classification and regression accuracy is reported for both strategies. Studies on a practical 24-bus equivalent EHV transmission system of the Indian Southern region are presented, indicating the improved generalization of the large margin classifiers and the efficacy of the chosen model.
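The two SVM tasks described above, classifying the faulted section and regressing the fault location, can be sketched as follows; the feature vectors and data are synthetic placeholders, not the 24-bus study system.

```python
# A minimal sketch of SVM fault-section classification plus
# fault-location regression.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(8)
X = rng.random((300, 6))                 # stand-in measurement features
section = rng.integers(0, 4, size=300)   # faulted line section labels
location = rng.random(300) * 100.0       # fault distance along line (km)

section_clf = SVC(kernel='rbf', C=10.0).fit(X, section)
location_reg = SVR(kernel='rbf', C=10.0).fit(X, location)

x_new = rng.random((1, 6))
print("section:", section_clf.predict(x_new)[0],
      "location (km): %.1f" % location_reg.predict(x_new)[0])
```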
Abstract:
This paper discusses an approach for river mapping and flood evaluation based on multi-temporal time series analysis of satellite images, utilizing pixel spectral information for image classification and region-based segmentation for extracting water-covered regions. Analysis of MODIS satellite images is applied in three stages: before flood, during flood and after flood. Water regions are extracted from the MODIS images using image classification (based on spectral information) and image segmentation (based on spatial information). Multi-temporal MODIS images from "normal" (non-flood) and flood time periods are processed in two steps. In the first step, image classifiers such as Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) separate the image pixels into water and non-water groups based on their spectral features. The classified image is then segmented using spatial features of the water pixels to remove misclassified water. From the results obtained, we evaluate the performance of the method and conclude that the use of image classification (SVM and ANN) and region-based image segmentation is an accurate and reliable approach for the extraction of water-covered regions.
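The two-step idea can be sketched as below: an SVM labels each pixel water/non-water from its spectral bands, then small connected components are discarded as misclassifications. The band count, area threshold and toy image are illustrative assumptions.

```python
# A minimal sketch of spectral SVM classification followed by
# region-based clean-up of misclassified water pixels.
import numpy as np
from scipy import ndimage
from sklearn.svm import SVC

rng = np.random.default_rng(9)
H, W, BANDS = 32, 32, 7                          # toy MODIS-like image
image = rng.random((H, W, BANDS))
train_pix = rng.random((200, BANDS))             # labelled training pixels
train_lab = rng.integers(0, 2, size=200)         # 1 = water, 0 = non-water

# Step 1: spectral classification of every pixel.
clf = SVC(kernel='rbf').fit(train_pix, train_lab)
water = clf.predict(image.reshape(-1, BANDS)).reshape(H, W)

# Step 2: region-based clean-up -- drop tiny connected water regions.
labels, n = ndimage.label(water)
sizes = ndimage.sum(water, labels, index=range(1, n + 1))
for region, size in enumerate(sizes, start=1):
    if size < 5:                                 # illustrative threshold
        water[labels == region] = 0
print("water fraction:", water.mean())
```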
Abstract:
In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem, which can be solved in principle by a polynomial-time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient-type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and employ a special gradient scheme which works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditsky and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure data sets shows that the proposed formulations achieve the desired robustness, and the saddle point based algorithm outperforms the IP method significantly.
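The saddle-point view can be illustrated, in heavily simplified form, by maximising the SVM dual over alpha while an adversary minimises over the kernel weights eta; here plain projected gradient descent-ascent with the simplex as a stand-in uncertainty set, not the paper's Juditsky-Nemirovski scheme. Data, step sizes and C are illustrative.

```python
# A toy sketch of projected gradient descent-ascent on the SVM dual
# with an uncertain affine kernel combination.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

rng = np.random.default_rng(10)
n, C, lr = 40, 1.0, 0.05
X = rng.normal(size=(n, 3))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n))
K1 = X @ X.T                                   # linear kernel
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF
kernels = [K1, K2]

alpha = np.zeros(n)
eta = np.full(len(kernels), 1.0 / len(kernels))
for _ in range(500):
    K = sum(e * k for e, k in zip(eta, kernels))
    grad_a = 1.0 - (K * np.outer(y, y)) @ alpha       # ascent in alpha
    alpha = np.clip(alpha + lr * grad_a, 0.0, C)
    ya = alpha * y
    grad_e = np.array([-0.5 * ya @ k @ ya for k in kernels])
    eta = project_simplex(eta - lr * grad_e)          # descent in eta
print("kernel weights:", eta)
```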
Abstract:
The effectiveness of the last-level shared cache is crucial to the performance of a multi-core system. In this paper, we observe and make use of the DelinquentPC - Next-Use characteristic to improve shared cache performance. We propose a new PC-centric cache organization, NUcache, for the shared last-level cache of multi-cores. NUcache logically partitions the associative ways of a cache set into MainWays and DeliWays. While all lines have access to the MainWays, only lines brought in by a subset of delinquent PCs, selected by a PC selection mechanism, are allowed to enter the DeliWays. The PC selection mechanism is an intelligent cost-benefit analysis based algorithm that utilizes Next-Use information to select the set of PCs that can maximize the hits experienced in the DeliWays. Performance evaluation reveals that NUcache improves performance over a baseline design by 9.6%, 30% and 33%, respectively, for dual-, quad- and eight-core workloads comprising SPEC benchmarks. We also show that NUcache is more effective than other well-known cache-partitioning algorithms.
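The MainWays/DeliWays split can be sketched with a toy, single-set model: every line may use the MainWays (LRU), but only lines fetched by a chosen set of delinquent PCs may overflow into the DeliWays. The PC selection here is a fixed input; the paper's cost-benefit Next-Use algorithm is not reproduced, and way counts are arbitrary.

```python
# A toy, single-set sketch of the NUcache MainWays/DeliWays idea.
from collections import OrderedDict

class NUSet:
    def __init__(self, main_ways=4, deli_ways=2, delinquent_pcs=()):
        self.main = OrderedDict()            # line -> fetching PC, LRU order
        self.deli = OrderedDict()
        self.main_ways, self.deli_ways = main_ways, deli_ways
        self.delinquent = set(delinquent_pcs)

    def access(self, pc, line):
        for part in (self.main, self.deli):
            if line in part:                 # hit: refresh LRU position
                part.move_to_end(line)
                return True
        # Miss: MainWays victims spill into the DeliWays only when the
        # fetching PC is delinquent; otherwise plain LRU in MainWays.
        self.main[line] = pc
        if len(self.main) > self.main_ways:
            vline, vpc = self.main.popitem(last=False)
            if vpc in self.delinquent:
                self.deli[vline] = vpc
                if len(self.deli) > self.deli_ways:
                    self.deli.popitem(last=False)
        return False

s = NUSet(delinquent_pcs={0x40})
trace = [(0x40, 1), (0x80, 2), (0x80, 3), (0x80, 4), (0x80, 5), (0x40, 1)]
hits = sum(s.access(pc, line) for pc, line in trace)
print(f"hits: {hits}/{len(trace)}")
```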