7 results for Short-text clustering

at Indian Institute of Science - Bangalore - India


Relevance:

40.00%

Publisher:

Abstract:

Computer Assisted Assessment (CAA) has existed for several years. While some forms of CAA do not require sophisticated text understanding (e.g., multiple-choice questions), other student answers consist of free text and require analysis of that text. Research on the latter has so far concentrated on two main sub-tasks: (i) grading of essays, which is done mainly by checking the style, grammatical correctness, and coherence of the essay, and (ii) assessment of short free-text answers. In this paper, we present a structured view of relevant research in automated assessment techniques for short free-text answers. We review papers spanning the last 15 years of research, with emphasis on recent papers. Our main objectives are twofold. First, we present the survey in a structured way by segregating information on datasets, problem formulation, techniques, and evaluation measures. Second, we discuss some potential future directions in this domain, which we hope will be helpful for researchers.

Relevance:

30.00%

Publisher:

Abstract:

A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. The data set is divided randomly into a number of partitions. The samples of each partition are clustered separately using a hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to obtain the final classification. When the clusters are well separated, this algorithm yields the same classification as the hierarchical agglomerative clustering algorithm. Its advantages are short run time and a small storage requirement. The savings in storage space and computation time are observed to increase nonlinearly with the sample size.
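
The partition, cluster, and merge steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name a linkage criterion or say how many sub-clusters each partition keeps, so the sketch assumes centroid linkage and up to `k` sub-clusters per partition. All function names are hypothetical.

```python
import random


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(groups, k):
    """Repeatedly merge the two groups with the closest centroids
    (centroid linkage, an assumption) until only k groups remain."""
    groups = [list(g) for g in groups]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = sq_dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups[j])
        del groups[j]
    return groups


def multilevel_cluster(points, k, n_partitions, seed=0):
    """Two-level clustering: cluster each random partition separately,
    then merge the resulting sub-clusters down to k final clusters."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    # Level 1: random partitions, each clustered on its own.
    partitions = [shuffled[i::n_partitions] for i in range(n_partitions)]
    sub_clusters = []
    for part in partitions:
        if part:
            singletons = [[p] for p in part]
            sub_clusters.extend(agglomerate(singletons, min(k, len(part))))
    # Level 2: merge the sub-clusters from all partitions.
    return agglomerate(sub_clusters, k)
```

Each level-1 call searches only the pairs within one partition, and level 2 starts from at most `k * n_partitions` groups, so far fewer pairwise distances are examined than when clustering all samples at once. On well-separated data, `multilevel_cluster(points, k, n_partitions)` should return the same groups as `agglomerate([[p] for p in points], k)`.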

Relevance:

30.00%

Publisher:

Abstract:

This paper presents the design of a full-fledged OCR system for printed Kannada text. Machine recognition of Kannada characters is difficult because of the similarity in the shapes of different characters, the complexity of the script, and the non-uniqueness in the representation of diacritics. The document image is subjected to line segmentation, word segmentation, and zone detection. From the zonal information, base characters, vowel modifiers, and consonant conjuncts are separated. A knowledge-based approach is employed for recognizing the base characters. Various features are employed for recognizing the characters, including the coefficients of the Discrete Cosine Transform, the Discrete Wavelet Transform, and the Karhunen-Loève Transform. These features are fed to different classifiers. Structural features are used at the subsequent levels to discriminate between confused characters. The use of structural features increases the recognition rate from 93% to 98%. Apart from the classical pattern classification technique of nearest neighbour, Artificial Neural Network (ANN) based classifiers such as Back Propagation and Radial Basis Function (RBF) networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. The highest recognition rate of 99% is obtained with RBF, using second-level approximation coefficients of Haar wavelets as the features, on presegmented base characters.
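
To illustrate the feature that gave the best result in this abstract, the sketch below computes second-level Haar approximation coefficients of a small character image and classifies them. It is a minimal sketch under assumptions, not the paper's system: it uses the orthonormal Haar normalisation (each 2x2 block is summed and divided by 2), assumes image height and width are divisible by 4, and substitutes a nearest-neighbour rule (which the abstract says was also studied) for the RBF network. All names are hypothetical.

```python
def haar_approximation(image):
    """One level of the 2D Haar approximation (LL) band.

    Each non-overlapping 2x2 block is reduced to (a + b + c + d) / 2,
    the orthonormal Haar scaling. Height and width must be even.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def second_level_features(image):
    """Flattened second-level approximation coefficients of an image."""
    level2 = haar_approximation(haar_approximation(image))
    return [value for row in level2 for value in row]


def nearest_neighbour(features, training):
    """Return the label of the closest training vector.

    training is a list of (feature_vector, label) pairs; closeness is
    squared Euclidean distance.
    """
    def distance(item):
        return sum((a - b) ** 2 for a, b in zip(features, item[0]))

    return min(training, key=distance)[1]
```

An 8x8 glyph is reduced to four coefficients, so the classifier compares vectors 16 times shorter than the raw pixel vector.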

Relevance:

30.00%

Publisher:

Abstract:

Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. It is difficult to cluster high-dimensional data objects when they are sparse and skewed. Updates are common in dynamic databases and are usually processed in batch mode. In very large dynamic databases, it is necessary to perform incremental cluster analysis on the updates only. We present an incremental algorithm for subspace clustering in very high dimensions, which handles both insertions and deletions of data points in the backend databases.

Relevance:

30.00%

Publisher:

Abstract:

This paper illustrates the application of a new technique, based on Support Vector Clustering (SVC), for the direct identification of coherent synchronous generators in large interconnected multi-machine power systems. The clustering is based on coherency measures obtained from the time-domain responses of the generators following system disturbances. The proposed clustering algorithm could be integrated into a wide-area measurement system that enables fast identification of coherent clusters of generators for the construction of dynamic equivalent models. To show the effectiveness of this clustering approach, the proposed method is demonstrated on a practical 15-generator, 72-bus system, an equivalent of the Indian Southern grid. The effects of short-circuit fault locations on coherency are also investigated.

Relevance:

30.00%

Publisher:

Abstract:

This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke, and gradient information. Experiments on camera-captured images that contain variable fonts, sizes, and colors, irregular layouts, non-uniform illumination, and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86, respectively, on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
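
The abstract reports precision and recall but no combined score. As a purely illustrative aside, the two figures can be summarised with the standard F-measure; the value computed below is derived from the reported numbers and is not a result stated in the abstract. The function name and the `beta` parameter are assumptions for this sketch.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta = 1 weights the two equally and gives the usual F1 score.
    Returns 0.0 when both inputs are zero, to avoid dividing by zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Reported values from the abstract: precision 0.8, recall 0.86.
score = f_measure(0.8, 0.86)  # about 0.829
```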