997 resultados para document technologique


Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present a novel approach that makes use of topic models based on Latent Dirichlet allocation(LDA) for generating single document summaries. Our approach is distinguished from other LDA based approaches in that we identify the summary topics which best describe a given document and only extract sentences from those paragraphs within the document which are highly correlated given the summary topics. This ensures that our summaries always highlight the crux of the document without paying any attention to the grammar and the structure of the documents. Finally, we evaluate our summaries on the DUC 2002 Single document summarization data corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Classification of a large document collection involves dealing with a huge feature space where each distinct word is a feature. In such an environment, classification is a costly task both in terms of running time and computing resources. Further it will not guarantee optimal results because it is likely to overfit by considering every feature for classification. In such a context, feature selection is inevitable. This work analyses the feature selection methods, explores the relations among them and attempts to find a minimal subset of features which are discriminative for document classification.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. In order to binarize a scanned document, we can find several algorithms in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different type of documents. Manually choosing the best from a set of binarized documents is time consuming. To automate the selection of the best segmented document, either we need to use ground-truth of the document or propose an evaluation metric. If ground-truth is available, then precision and recall can be used to choose the best binarized document. What is the case, when ground-truth is not available? Can we come up with a metric which evaluates these binarized documents? Hence, we propose a metric to evaluate binarized document images using eigen value decomposition. We have evaluated this measure on DIBCO and H-DIBCO datasets. The proposed method chooses the best binarized document that is close to the ground-truth of the document.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

(287 page document)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The principal purpose of this document is to assist programme teams throughout the development process when they are considering the development or review of a route through the award where it will be delivered wholly, or primarily, via online distance learning. Please note that this document is current as of Sept 2015 but it is considered to be an evolving document and is updated/tweaked from time to time.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Commissioned paper from Cameron Neylon (Curtin University) on citation practices for research data. Includes information on current (2016) global activity in the field, parallels with traditional citation, and recommendations.