945 results for Document ranking
Abstract:
Extraction of text areas from document images with complex content and layout is a challenging task. A few texture-based techniques have already been proposed for extracting such text blocks, but most are computationally expensive and hence far from realizable in real time. In this work, we propose a modification to two of the existing texture-based techniques to reduce their computation, accomplished with Harris corner detectors. The efficiency of these two texture-based algorithms, one based on Gabor filters and the other on log-polar wavelet signatures, is compared. A combination of Gabor-feature-based texture classification performed on a smaller set of Harris-corner-detected points is observed to deliver both accuracy and efficiency.
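As an illustration of the proposed speed-up, here is a minimal sketch (not the authors' implementation) that computes Gabor texture features only at Harris-corner points, assuming OpenCV and a hypothetical grayscale page image page.png:

```python
import cv2
import numpy as np

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Step 1: Harris corner response; keeping only the strongest points means
# the texture classifier runs on a small candidate set, not every pixel.
response = cv2.cornerHarris(img, blockSize=2, ksize=3, k=0.04)
ys, xs = np.where(response > 0.01 * response.max())

# Step 2: a small Gabor filter bank (4 orientations); filter magnitudes
# sampled at the corner points form the texture feature vectors.
bank = [cv2.getGaborKernel((21, 21), sigma=4.0, theta=t, lambd=10.0, gamma=0.5)
        for t in np.arange(0, np.pi, np.pi / 4)]
features = np.stack(
    [np.abs(cv2.filter2D(img, cv2.CV_32F, k))[ys, xs] for k in bank], axis=1)

# Step 3 (placeholder): any texture classifier, e.g. k-means into text /
# non-text clusters, would label the corner points here.
```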
Abstract:
Instability of laminated curved composite beams made of repeated sublaminate construction is studied using the finite element method. In repeated sublaminate construction, a full laminate is obtained by repeating a basic sublaminate that has a smaller number of plies. This paper deals with the determination of the optimum lay-up for buckling by ranking of such composite curved beams (which may be solid or sandwich). For this purpose, use is made of a two-noded, 16-degrees-of-freedom curved composite beam finite element. The displacements u, v, w of the element reference axis are expressed in terms of one-dimensional first-order Hermite interpolation polynomials, and line member assumptions are invoked in the formulation of the elastic stiffness matrix and the geometric stiffness matrix. The nonlinear expressions for the strains occurring in beams subjected to axial, flexural and torsional loads are incorporated in a general instability analysis. The computer program developed has been used, after extensive checking for correctness, to obtain the optimum orientation scheme of the plies in the sublaminate so as to achieve the maximum buckling load for typical curved solid/sandwich composite beams.
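For context, the instability calculation behind such a ranking reduces to the generalized eigenproblem K·φ = λ·Kg·φ between the elastic and geometric stiffness matrices; below is a minimal sketch with illustrative 2x2 stand-ins for the assembled finite element matrices:

```python
# Buckling load factors are eigenvalues of K phi = lambda * Kg phi, with
# K the elastic and Kg the geometric stiffness matrix. The small matrices
# below are illustrative stand-ins for assembled FE matrices.
import numpy as np
from scipy.linalg import eigh

K = np.array([[4.0, -1.0], [-1.0, 3.0]])   # elastic stiffness (SPD)
Kg = np.array([[1.0, 0.2], [0.2, 0.8]])    # geometric stiffness

# The smallest positive eigenvalue is the critical buckling load factor
# used to rank candidate lay-ups against each other.
eigvals = eigh(K, Kg, eigvals_only=True)
critical = min(v for v in eigvals if v > 0)
print(f"critical buckling load factor: {critical:.4f}")
```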
Abstract:
In this paper we describe a method for the optimum design of fiber-reinforced composite laminates for strength by ranking. The software developed on the basis of this method is capable of designing laminates for strength that are subjected to in-plane and/or bending loads and, optionally, hygrothermal loads. Only symmetric laminates are considered, which are assumed to be made of repeated sublaminate construction. Various lay-up schemes are evaluated based on laminated plate theory and a quadratic failure criterion for the given mechanical and hygrothermal loads. The optimum lay-up sequence in the sublaminate and the number of such sublaminates required are obtained. Further, a ply-drop round-off scheme is adopted to arrive at an optimum laminate thickness. As an example, a family of 0/90/45/-45 bi-directional lamination schemes is examined for different types of loads, and the gains from optimizing the ply orientations in a sublaminate are demonstrated.
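The design-by-ranking loop itself is simple to sketch: enumerate candidate orderings of the sublaminate plies, score each, and keep the best. In the sketch below, strength_of is a hypothetical placeholder for the paper's laminated-plate-theory and quadratic-failure-criterion evaluation:

```python
from itertools import permutations

def strength_of(layup):
    # Placeholder score; a real implementation would compute ply stresses
    # via laminated plate theory and apply a quadratic failure criterion.
    return -sum(i * abs(a) for i, a in enumerate(layup))

candidates = set(permutations([0, 90, 45, -45]))        # sublaminate lay-ups
ranked = sorted(candidates, key=strength_of, reverse=True)
best = ranked[0]
print("optimum sublaminate:", best, "-> full laminate:", best * 3)  # 3 repeats
```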
Abstract:
A successful protein-protein docking study culminates in the identification of decoys at top ranks with near-native quaternary structures. However, this task remains enigmatic because no generalized scoring functions exist that effectively rank decoys according to their similarity to near-native quaternary structures. Difficulties arise because of the highly irregular nature of the protein surface and the significant variation of the nonbonding and solvation energies with the chemical composition of the protein-protein interface. In this work, we describe a novel method combining an interface-size filter, a regression model for geometric compatibility (based on two correlated surface and packing parameters), and a normalized interaction energy (calculated from correlated nonbonded and solvation energies) to effectively rank decoys from a set of 10,000. Tests on 30 unbound binary protein-protein complexes show that in 16 cases we can identify at least one decoy in the top three ranks having <= 10 angstrom backbone root mean square deviation from the true binding geometry. Comparisons with other state-of-the-art methods confirm the improved ranking power of our method without the use of any experiment-guided restraints, evolutionary information, statistical propensities, or modified interaction energy equations. Tests on 118 less-difficult bound binary protein-protein complexes with <= 35% sequence redundancy at the interface show that in 77% of cases at least one of the 10,000 decoys with <= 5 angstrom backbone root mean square deviation from the true geometry was identified at first rank. The work will promote the use of new concepts where correlations among parameters provide more robust scoring models. It will facilitate studies involving molecular interactions, including modeling of large macromolecular assemblies and protein structure prediction. (C) 2010 Wiley Periodicals, Inc. J Comput Chem 32: 787-796, 2011.
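As an illustration of the filter-then-score shape of this pipeline (with made-up field names and weights, not the paper's fitted regression model):

```python
from dataclasses import dataclass

@dataclass
class Decoy:
    interface_area: float   # buried surface area (A^2)
    packing: float          # geometric compatibility term
    energy: float           # normalized interaction energy (lower = better)

def rank_decoys(decoys, min_area=600.0, w_geom=1.0, w_energy=1.0):
    # Phase 1: discard decoys with implausibly small interfaces.
    kept = [d for d in decoys if d.interface_area >= min_area]
    # Phase 2: sort by a weighted combination of geometry and energy.
    return sorted(kept, key=lambda d: w_energy * d.energy - w_geom * d.packing)

decoys = [Decoy(850, 0.7, -12.1), Decoy(400, 0.9, -20.0), Decoy(900, 0.8, -15.3)]
print(rank_decoys(decoys)[:3])   # top-ranked candidates
```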
Abstract:
Buckling of discretely stiffened composite cylindrical panels made of repeated sublaminate construction is studied using a finite element method. In repeated sublaminate construction, a full laminate is obtained by repeating a basic sublaminate, which has a smaller number of plies. This paper deals with the determination of the optimum lay-up for buckling by ranking of such stiffened (longitudinal and hoop) composite cylindrical panels. For this purpose, we use a particularized form of a four-noded, 48-degrees-of-freedom doubly curved quadrilateral thin-shell finite element together with a fully compatible two-noded, 16-degrees-of-freedom composite stiffener element. The computer program developed has been used, after extensive checking for correctness, to obtain an optimum orientation scheme of the plies in the sublaminate so as to achieve the maximum buckling load for a specified thickness of typical stiffened composite cylindrical panels.
Abstract:
Learning to rank from relevance judgments is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has recently been applied to optimize important non-decomposable ranking criteria like AUC (area under the ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems, MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain), in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries: e.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
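For reference, the two criteria being optimized are straightforward to compute for a single ranked result list; the following is a textbook-style sketch of the metrics, not the paper's max-margin learner:

```python
import math

def reciprocal_rank(relevance):
    # relevance: binary labels in ranked order, e.g. [0, 0, 1, 0]
    for i, r in enumerate(relevance, start=1):
        if r:
            return 1.0 / i
    return 0.0

def ndcg(gains, k=10):
    # gains: graded relevance labels in ranked order, e.g. [3, 2, 0, 1]
    def dcg(gs):
        return sum((2**g - 1) / math.log2(i + 1) for i, g in enumerate(gs, 1))
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

print(reciprocal_rank([0, 0, 1, 0]))   # 0.333... (first hit at rank 3)
print(ndcg([3, 2, 0, 1]))              # discounted, normalized graded gain
```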
Abstract:
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support-vector-style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm-based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
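The "push" idea can be sketched directly: the loss penalizes the single worst negative, i.e. an ℓ∞-norm over negatives of a hinge count of the positives each negative outranks. The toy below uses plain numerical subgradient descent on a linear scorer, not the paper's dual ℓ1,∞ projection algorithm:

```python
import numpy as np

def infinite_push_loss(w, X_pos, X_neg):
    s_pos, s_neg = X_pos @ w, X_neg @ w
    # Per negative, a hinge-relaxed count of positives it pushes down.
    margins = 1.0 - (s_pos[None, :] - s_neg[:, None])   # (n_neg, n_pos)
    per_negative = np.maximum(margins, 0.0).sum(axis=1)
    return per_negative.max()                           # l_inf over negatives

rng = np.random.default_rng(0)
X_pos, X_neg = rng.normal(1, 1, (20, 5)), rng.normal(0, 1, (80, 5))
w, eps = np.zeros(5), 1e-4
for _ in range(200):   # crude finite-difference subgradient descent
    grad = np.array([(infinite_push_loss(w + eps * e, X_pos, X_neg)
                      - infinite_push_loss(w, X_pos, X_neg)) / eps
                     for e in np.eye(5)])
    w -= 0.05 * grad
print("loss at the absolute top:", infinite_push_loss(w, X_pos, X_neg))
```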
Abstract:
In this paper, we present a novel approach that makes use of topic models based on Latent Dirichlet Allocation (LDA) for generating single-document summaries. Our approach is distinguished from other LDA-based approaches in that we identify the summary topics which best describe a given document and extract sentences only from those paragraphs within the document which are highly correlated with the summary topics. This ensures that our summaries always highlight the crux of the document without relying on the grammar or the structure of the documents. Finally, we evaluate our summaries on the DUC 2002 single-document summarization corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.
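A minimal sketch of this flavor of LDA-based extraction, using scikit-learn on a toy sentence list (the paper's paragraph-level correlation step is simplified here to sentence-topic weights):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

sentences = [
    "The model learns latent topics from word co-occurrence statistics.",
    "Topic models such as LDA represent documents as mixtures of topics.",
    "Our experiments use the DUC 2002 single document corpus.",
    "Evaluation is performed with ROUGE measures.",
]

X = CountVectorizer(stop_words="english").fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                # sentence-by-topic weights

summary_topic = doc_topics.sum(axis=0).argmax()  # dominant "summary" topic
top = doc_topics[:, summary_topic].argsort()[::-1][:2]
print([sentences[i] for i in sorted(top)])       # two-sentence extract
```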
Abstract:
Classification of a large document collection involves dealing with a huge feature space in which each distinct word is a feature. In such an environment, classification is a costly task in terms of both running time and computing resources. Furthermore, it does not guarantee optimal results, because a classifier that considers every feature is likely to overfit. In such a context, feature selection is inevitable. This work analyses feature selection methods, explores the relations among them, and attempts to find a minimal subset of features that are discriminative for document classification.
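As a concrete example of the kind of selection the abstract motivates, here is a chi-square filter over a bag-of-words representation (scikit-learn, with the 20 Newsgroups data as a stand-in corpus):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
X = CountVectorizer(stop_words="english").fit_transform(data.data)

# Keep the 500 terms most associated with the class labels; the classifier
# then trains on a small discriminative subset instead of the full vocabulary.
selector = SelectKBest(chi2, k=500).fit(X, data.target)
X_small = selector.transform(X)
print(X.shape, "->", X_small.shape)
```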
Abstract:
The rapid growth of the field of data mining has led to the development of various methods for outlier detection. Though the detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm explores a clustering of the given data, followed by a ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure of the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of the algorithm is demonstrated through experiments on various public-domain categorical data sets.
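A hedged sketch of the two-phase idea, with exact-duplicate grouping standing in for a real categorical clustering and an illustrative scoring rule, not the paper's definitions:

```python
from collections import Counter

records = [("red", "small"), ("red", "small"), ("red", "large"),
           ("blue", "small"), ("green", "tiny")]

# Phase 1: "clusters" = groups of identical records (illustrative only).
cluster_size = Counter(records)

# Attribute value frequencies, computed per attribute position.
freqs = [Counter(r[i] for r in records) for i in range(len(records[0]))]

def outlier_score(rec):
    # Rarer attribute values and smaller clusters give higher scores,
    # combining the two independent ranking schemes the abstract mentions.
    rarity = sum(1.0 / freqs[i][v] for i, v in enumerate(rec))
    return rarity + 1.0 / cluster_size[rec]

# Phase 2: rank; the top of the list holds the most likely outliers.
for rec in sorted(set(records), key=outlier_score, reverse=True):
    print(rec, round(outlier_score(rec), 2))
```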