17 results for triangulation of documents
at Indian Institute of Science - Bangalore - India
Abstract:
For d ≥ 2, Walkup's class K(d) consists of the d-dimensional simplicial complexes all of whose vertex links are stacked (d − 1)-spheres. Kalai showed that for d ≥ 4, all connected members of K(d) are obtained from stacked d-spheres by finitely many elementary handle additions. According to a result of Walkup, the face vector of any triangulated 4-manifold X with Euler characteristic χ satisfies f_1 ≥ 5f_0 − (15/2)χ, with equality only for X ∈ K(4). Kühnel observed that this implies f_0(f_0 − 11) ≥ −15χ, with equality only for 2-neighborly members of K(4). Kühnel also asked whether there is a triangulated 4-manifold with f_0 = 15 and χ = −4 (attaining equality in his lower bound). In this paper, guided by Kalai's theorem, we show that there is indeed such a triangulation. It triangulates the connected sum of three copies of the twisted sphere product S^3 ⋊ S^1. Because of Kühnel's inequality, the given triangulation of this manifold is a vertex-minimal triangulation. By a recent result of Effenberger, the triangulation constructed here is tight. Apart from the neighborly 2-manifolds and the infinite family of (2d + 3)-vertex sphere products S^{d−1} × S^1 (twisted for d odd), only fourteen tight triangulated manifolds were known so far. The present construction yields a new member of this sporadic family. We also present a self-contained proof of Kalai's result.
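The step from Walkup's inequality to Kühnel's bound uses only the trivial count f_1 ≤ (f_0 choose 2); a short derivation:

```latex
% Every pair of vertices spans at most one edge, so f_1 <= binom(f_0, 2).
% Combining this with Walkup's inequality:
\binom{f_0}{2} \;\ge\; f_1 \;\ge\; 5 f_0 - \tfrac{15}{2}\chi
\quad\Longrightarrow\quad
f_0 (f_0 - 1) - 10 f_0 \;\ge\; -15\chi
\quad\Longrightarrow\quad
f_0 (f_0 - 11) \;\ge\; -15\chi .
% Equality forces f_1 = binom(f_0, 2) (2-neighborliness) together with
% equality in Walkup's bound (membership in K(4)). For f_0 = 15 and
% chi = -4, both sides equal 60, so the triangulation constructed in
% the paper attains the bound.
```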
Abstract:
The symmetric group S_d acts on the Cartesian product (S^2)^d by coordinate permutation, and the quotient space is homeomorphic to the complex projective space CP^d. We used the case d = 2 of this fact to construct a 10-vertex triangulation of CP^2 earlier. In this paper, we have constructed a 124-vertex simplicial subdivision of the 64-vertex standard cellulation of (S^2)^3, such that the S_3-action on this cellulation naturally extends to an action on the subdivision. Further, the S_3-action on the subdivision is "good", so that the quotient simplicial complex is a 30-vertex triangulation of CP^3. In other words, we have constructed a simplicial realization of the branched covering (S^2)^3 → CP^3.
Abstract:
We propose a set of metrics that evaluate the uniformity, sharpness, continuity, noise, stroke-width variance, pulse-width ratio, transient-pixel density, entropy and variance of components to quantify the quality of a document image. The measures are intended to be used in any optical character recognition (OCR) engine to estimate, a priori, the expected performance of the OCR. The suggested measures have been evaluated on many document images in different scripts. The quality of a document image is manually annotated by users to create a ground truth. The idea is to correlate the values of the measures with the user-annotated data. If the calculated measure matches the annotated description, then the metric is accepted; otherwise it is rejected. Of the metrics proposed, some are accepted and the rest are rejected. We have defined metrics that are easy to estimate. The metrics proposed in this paper are based on the feedback of home-grown OCR engines for Indic (Tamil and Kannada) languages. The metrics are independent of the script, and depend only on the quality and age of the paper and the printing. Experiments and results for each proposed metric are discussed. Actual recognition of the printed text is not performed to evaluate the proposed metrics. Sometimes a document image containing broken characters is rated as a good document image by the evaluated metrics, which is part of the unsolved challenges. The proposed measures work on grayscale document images and fail to provide reliable information on binarized document images.
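The abstract names the measures but not their formulas. As a rough illustration of the kind of computation involved, the sketch below evaluates two plausible proxies on a grayscale page, entropy of the gray-level histogram and a crude stroke-width variance via a distance transform, assuming OpenCV and NumPy; the file name and both formulas are stand-ins, not the paper's definitions.

```python
# Hypothetical sketch: two document-quality proxies of the kind the
# abstract describes (the paper's exact formulas are not given here).
import cv2
import numpy as np

def grayscale_entropy(img):
    """Shannon entropy of the gray-level histogram (higher = busier page)."""
    hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def stroke_width_variance(img):
    """Crude stroke-width spread: distance transform inside dark strokes."""
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 3)
    widths = dist[dist > 0]            # half-widths at stroke interiors
    return float(widths.var()) if widths.size else 0.0

page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
print(grayscale_entropy(page), stroke_width_variance(page))
```

Note that this proxy binarizes internally for brevity, whereas the paper's measures operate on the grayscale image directly.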
Abstract:
When a document corpus is very large, we often need to reduce the number of features. But it is not possible to apply conventional Non-negative Matrix Factorization (NMF) to a billion-by-million matrix, as the matrix may not fit in memory. Here we present a novel Online NMF algorithm. Using Online NMF, we reduce the original high-dimensional space to a low-dimensional space. Then we cluster all the documents in the reduced dimension using the k-means algorithm. We show experimentally that by processing small subsets of documents we are able to achieve good performance. The proposed method outperforms existing algorithms.
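The abstract does not give the update rules of the proposed Online NMF. As a sketch of the same pipeline (stream small batches through an online NMF, then run k-means in the reduced space), here is a version built on scikit-learn's MiniBatchNMF, an off-the-shelf online NMF standing in for the paper's algorithm; the corpus and parameters are toy values.

```python
# Sketch of the pipeline described above, using scikit-learn's MiniBatchNMF
# (an off-the-shelf online NMF, not the paper's algorithm).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import MiniBatchNMF
from sklearn.cluster import KMeans

docs = ["first document ...", "second document ...", "third one ..."]  # toy corpus

X = TfidfVectorizer().fit_transform(docs)          # documents x terms (non-negative)

nmf = MiniBatchNMF(n_components=2, batch_size=2, random_state=0)
for start in range(0, X.shape[0], 2):              # stream small subsets
    nmf.partial_fit(X[start:start + 2])

W = nmf.transform(X)                               # low-dimensional representation
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(W)
print(labels)
```

The point of partial_fit here is that only one small batch of rows needs to be in memory at a time, which is what makes the billion-by-million case tractable.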
Abstract:
Extensible Markup Language (XML) has emerged as a medium for interoperability over the Internet. As the number of documents published in the form of XML increases, there is a need for selective dissemination of XML documents based on user interests. In the proposed technique, a combination of Adaptive Genetic Algorithms and a multi-class Support Vector Machine (SVM) is used to learn a user model. Based on feedback from the users, the system automatically adapts to the user's preferences and interests. The user model and a similarity metric are used for selective dissemination of a continuous stream of XML documents. Experimental evaluations performed over a wide range of XML documents indicate that the proposed approach significantly improves the performance of the selective dissemination task, with respect to both accuracy and efficiency.
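As a rough sketch of what "learn a user model and disseminate by similarity" can look like, the snippet below trains a multi-class SVM on user-labelled documents and gates delivery with a cosine-similarity threshold; the XML feature extraction and the genetic-algorithm adaptation step of the paper are omitted, and all names and the threshold are hypothetical.

```python
# Illustrative user model for selective dissemination: a multi-class SVM
# assigns each incoming XML document to an interest class, and a
# cosine-similarity threshold gates delivery.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics.pairwise import cosine_similarity

train_docs = ["<doc>cricket scores</doc>", "<doc>stock market</doc>",
              "<doc>match highlights</doc>", "<doc>bond yields</doc>"]
interests = ["sports", "finance", "sports", "finance"]   # user feedback labels

vec = TfidfVectorizer()
X = vec.fit_transform(train_docs)
user_model = SVC(kernel="linear").fit(X, interests)      # multi-class SVM

def disseminate(xml_doc, profile_docs, threshold=0.1):
    """Deliver xml_doc if it matches a learned interest class closely enough."""
    x = vec.transform([xml_doc])
    label = user_model.predict(x)[0]
    sim = cosine_similarity(x, vec.transform(profile_docs)).max()
    return (label, sim >= threshold)

print(disseminate("<doc>football match report</doc>", train_docs))
```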
Abstract:
XML has emerged as a medium for interoperability over the Internet. As the number of documents published in the form of XML increases, there is a need for selective dissemination of XML documents based on user interests. In the proposed technique, a combination of the Self-Adaptive Migration Model Genetic Algorithm (SAMGA) [5] and a multi-class Support Vector Machine (SVM) is used to learn a user model. Based on feedback from the users, the system automatically adapts to the user's preferences and interests. The user model and a similarity metric are used for selective dissemination of a continuous stream of XML documents. Experimental evaluations performed over a wide range of XML documents indicate that the proposed approach significantly improves the performance of the selective dissemination task, with respect to both accuracy and efficiency.
Abstract:
Skew correction of complex document images is a difficult task. We propose an edge-based connected-component approach for robust skew correction of documents with complex layout and content. The algorithm essentially consists of two steps: an 'initialization' step to determine the image orientation from the centroids of the connected components, and a 'search' step to find the actual skew of the image. During initialization, we choose two different sets of points regularly spaced across the image, one from left to right and the other from top to bottom. The image orientation is determined from the slope between the two successive nearest neighbors of each of the points in the chosen set. The search step finds successive nearest neighbors that satisfy the parameters obtained in the initialization step. The final skew is determined from the slopes obtained in the 'search' step. Unlike other connected-component-based methods, the proposed method does not require the binarization step that generally precedes connected-component analysis. The method works well for scanned documents with complex layout at any skew, with a precision of 0.5 degrees.
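The two-step procedure suggests the following simplified sketch: connected-component centroids vote for the skew via the slope to their nearest near-horizontal neighbour, and the median vote is taken. Unlike the paper's edge-based method, this sketch binarizes first (purely for brevity) and collapses the search step into the vote.

```python
# Simplified skew estimation from connected-component centroids:
# each centroid's nearest near-horizontal neighbour contributes a slope,
# and the median slope is reported as the page skew.
import cv2
import numpy as np

def estimate_skew_degrees(gray):
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, _, centroids = cv2.connectedComponentsWithStats(bw)
    pts = centroids[1:]                      # drop the background component
    angles = []
    for i, (x, y) in enumerate(pts):
        d = pts - (x, y)
        d[i] = (np.inf, np.inf)              # exclude the point itself
        j = np.argmin(np.hypot(d[:, 0], d[:, 1]))
        dx, dy = pts[j] - (x, y)
        if dx < 0:                           # normalize direction left-to-right
            dx, dy = -dx, -dy
        if dx > abs(dy):                     # keep near-horizontal neighbor pairs
            angles.append(np.degrees(np.arctan2(dy, dx)))
    return float(np.median(angles)) if angles else 0.0

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
print(estimate_skew_degrees(gray))
```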
Abstract:
It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor, in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M1 and M2, as given by C_{d×w} = M1_{d×t} × M2_{t×w}, where d is the number of documents present in the corpus and w is the size of the vocabulary. The quality of the split depends on t, the right number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this shows up as a 'dip' at the right value of t.
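The abstract leaves the "salient distributions" unspecified; one common reading of this measure (due to Arun et al.) takes the singular-value spectrum of the topic-word factor and the length-weighted topic mixture of the document-topic factor. A sketch under that reading, using scikit-learn's LDA, which may differ in detail from the paper's construction:

```python
# Sketch of the divergence measure (one plausible reading): compare the
# singular-value spectrum of the topic-word factor with the
# length-weighted document-topic mixture via symmetric KL-divergence,
# scanning t for the dip.
import numpy as np
from scipy.stats import entropy
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["apple banana fruit", "fruit juice apple", "cpu gpu chip", "chip wafer cpu"]
C = CountVectorizer().fit_transform(docs)            # C_{d x w}

def topic_divergence(C, t):
    lda = LatentDirichletAllocation(n_components=t, random_state=0).fit(C)
    M2 = lda.components_                             # t x w (topic-word)
    M1 = lda.transform(C)                            # d x t (document-topic)
    cm1 = np.linalg.svd(M2, compute_uv=False)        # singular-value spectrum
    cm1 = cm1 / cm1.sum()
    lengths = np.asarray(C.sum(axis=1)).ravel()      # document lengths
    cm2 = lengths @ M1
    cm2 = np.sort(cm2 / cm2.sum())[::-1]
    return entropy(cm1, cm2) + entropy(cm2, cm1)     # symmetric KL

for t in (2, 3):
    print(t, topic_divergence(C, t))
```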
Abstract:
A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. Several algorithms for binarizing a scanned document can be found in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different types of documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate the selection of the best segmented document, we either need the ground truth of the document or an evaluation metric. If ground truth is available, then precision and recall can be used to choose the best binarized document. But what if ground truth is not available? Can we come up with a metric that evaluates these binarized documents? Hence, we propose a metric to evaluate binarized document images using eigenvalue decomposition. We have evaluated this measure on the DIBCO and H-DIBCO datasets. The proposed method chooses the best binarized document, namely the one closest to the ground truth of the document.
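The eigenvalue-based metric itself is not spelled out in the abstract, so the sketch below only illustrates the ground-truth route mentioned above: score each candidate binarization by F-measure (the harmonic mean of precision and recall) against the ground truth and keep the best; the arrays are toy stand-ins.

```python
# Baseline selection of the best binarization when ground truth exists:
# highest F-measure against the ground-truth ink mask wins.
import numpy as np

def f_measure(binarized, ground_truth):
    """binarized, ground_truth: boolean arrays, True = ink pixel."""
    tp = np.logical_and(binarized, ground_truth).sum()
    precision = tp / max(binarized.sum(), 1)
    recall = tp / max(ground_truth.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

def best_binarization(candidates, ground_truth):
    scores = [f_measure(b, ground_truth) for b in candidates]
    return int(np.argmax(scores)), scores

# toy example with two hypothetical candidate binarizations
gt = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
cands = [np.array([[1, 0, 0], [0, 1, 0]], dtype=bool),
         np.array([[1, 1, 1], [1, 1, 1]], dtype=bool)]
print(best_binarization(cands, gt))
```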
Abstract:
All triangulated d-manifolds satisfy the inequality (f_0 − d − 1 choose 2) ≥ (d + 2 choose 2) β_1 for d ≥ 3. A triangulated d-manifold is called tight neighborly if it attains equality in this bound. For each d ≥ 3, a (2d + 3)-vertex tight neighborly triangulation of the S^{d−1}-bundle over S^1 with β_1 = 1 was constructed by Kühnel in 1986. In this paper, it is shown that there does not exist a tight neighborly triangulated manifold with β_1 = 2. In other words, there is no tight neighborly triangulation of (S^{d−1} × S^1)^{#2} or (S^{d−1} ⋊ S^1)^{#2} for d ≥ 3. A short proof of the uniqueness of Kühnel's complexes for d ≥ 4 under the assumption β_1 ≠ 0 is also presented.
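For β_1 = 1 the bound pins down Kühnel's vertex count exactly; a one-line check:

```latex
% With beta_1 = 1 the inequality reads
%   binom{f_0 - d - 1}{2} >= binom{d + 2}{2},
% which forces f_0 - d - 1 >= d + 2, i.e. f_0 >= 2d + 3.
% Kuehnel's (2d+3)-vertex triangulation attains equality:
\binom{(2d+3) - d - 1}{2} \;=\; \binom{d+2}{2} \;=\; \binom{d+2}{2}\cdot\beta_1 .
```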
Abstract:
The broader goal of the research described here is to automatically acquire diagnostic knowledge from documents in the domain of manual and mechanical assembly of aircraft structures. These documents are treated as a discourse used by experts to communicate with others. It therefore becomes possible to use discourse analysis to enable machine understanding of the text. The research challenge addressed in the paper is to identify documents, or sections of documents, that are potential sources of knowledge. In a subsequent step, domain knowledge will be extracted from these segments. The segmentation task requires partitioning the document into relevant segments and understanding the context of each segment. In discourse analysis, the division of a discourse into segments is achieved through certain indicative clauses called cue phrases, which signal changes in the discourse context. However, in formal documents such language may not be used. Hence the use of a domain-specific ontology and an assembly process model is proposed to segregate chunks of the text based on a local context. Elements of the ontology/model and their related terms serve as indicators of the current context of a segment and of changes in context between segments. Local contexts are aggregated over increasingly larger segments to identify whether the document (or portions of it) pertains to the topic of interest, namely assembly. Knowledge acquired through such processes can then be reused during any part of the lifecycle of a product.
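As an illustration of using ontology terms as context indicators in place of cue phrases, the sketch below scores each segment by the density of assembly-ontology terms and keeps segments above a threshold; both the term list and the threshold are invented stand-ins for the paper's ontology and process model.

```python
# Illustrative context indication: score each text segment by the density
# of domain-ontology terms it contains, and flag segments whose score
# clears a threshold as assembly-related.
import re

ASSEMBLY_TERMS = {"rivet", "fastener", "jig", "shim", "drill", "torque", "bracket"}

def segment_score(segment):
    words = re.findall(r"[a-z]+", segment.lower())
    hits = sum(1 for w in words if w in ASSEMBLY_TERMS)
    return hits / max(len(words), 1)

def assembly_segments(segments, threshold=0.05):
    return [s for s in segments if segment_score(s) >= threshold]

doc = ["Drill the pilot holes and install each rivet with the jig.",
       "The quarterly review meeting was postponed to Friday."]
print(assembly_segments(doc))
```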
Abstract:
Minimal crystallizations of simply connected PL 4-manifolds are very natural objects. Many of their topological features are reflected in their combinatorial structure which, in addition, is preserved under the connected sum operation. We present a minimal crystallization of the standard PL K3 surface. In combination with known results, this yields minimal crystallizations of all simply connected PL 4-manifolds of "standard" type, that is, all connected sums of CP^2, S^2 × S^2, and the K3 surface. In particular, we obtain minimal crystallizations of a pair of homeomorphic but non-PL-homeomorphic 4-manifolds. In addition, we give an elementary proof that the minimal 8-vertex crystallization of CP^2 is unique, and that its associated pseudotriangulation is related to the 9-vertex combinatorial triangulation of CP^2 by a minimum of four edge contractions.
Abstract:
A triangulation of a closed 2-manifold is tight with respect to a field of characteristic two if and only if it is neighbourly; and it is tight with respect to a field of odd characteristic if and only if it is neighbourly and orientable. No such characterization of tightness was previously known for higher-dimensional manifolds. In this paper, we prove that a triangulation of a closed 3-manifold is tight with respect to a field of odd characteristic if and only if it is neighbourly, orientable and stacked. In consequence, the Kühnel-Lutz conjecture is valid in dimension three for fields of odd characteristic. Next let F be a field of characteristic two. It is known that, in this case, any neighbourly and stacked triangulation of a closed 3-manifold is F-tight. For closed, triangulated 3-manifolds with at most 71 vertices or with first Betti number at most 188, we show that the converse is true. But the possibility of the existence of an F-tight, non-stacked triangulation on a larger number of vertices remains open. We prove the following upper bound theorem on such triangulations: if an F-tight triangulation of a closed 3-manifold has n vertices and first Betti number β_1, then (n − 4)(617n − 3861) ≤ 15444 β_1. Equality holds here if and only if all the vertex links of the triangulation are connected sums of boundary complexes of icosahedra.
Abstract:
We present an elementary combinatorial proof of the existence and uniqueness of the 9-vertex triangulation of CP^2. The original proof of existence, due to Kühnel, as well as the original proof of uniqueness, due to Kühnel and Lassmann, were based on extensive computer search. Recently Arnoux and Marin used cohomology theory to present a computer-free proof. Our proof has the advantage of displaying a canonical copy of the affine plane over the three-element field inside this complex, in terms of which the entire complex has a very neat and short description. This exhibits the full automorphism group of the Kühnel complex as a subgroup of the automorphism group of this affine plane. Our method also brings out the rich combinatorial structure inside this complex.