Digital Library

15 results for document and text processing

in Bulgarian Digital Mathematics Library at IMI-BAS

Manuscript Digitization and Electronic Processing of Manuscripts in the Czech National Library

Relevance: 100.00%

Abstract:

The paper reviews the history of manuscript digitization at the National Library of the Czech Republic, as well as other issues concerning the processing of manuscripts. The main consequence of massive digitization and of record and/or full-text processing is a paradigm shift toward digital history.

The Involvement of Institute for Information Technologies in Text Processing

Relevance: 100.00%

Abstract:

The activities of the Institute of Information Technologies in the area of automatic text processing are outlined. Major problems related to different steps of processing are pointed out together with the shortcomings of the existing solutions.

Hierarchical Three-level Ontology for Text Processing

Relevance: 100.00%

Abstract:

The principal feature of the ontology, which is developed for text processing, is a wider representation of knowledge about the external world, achieved by introducing a three-level hierarchy. This improves the semantic interpretation of natural language texts.

A Workbench for Document Processing

Relevance: 100.00%

Abstract:

During the MEMORIAL project, an international consortium developed a software solution called DDW (Digital Document Workbench). It provides a set of tools that support the digitisation of documents, from scanning through to a retrievable presentation of the content. The focus is on machine-typed archival documents. One important feature is the evaluation of quality at each step of the process. The workbench consists of automatic parts as well as parts that require human activity. The measurable improvement of 20% shows that the approach is successful.

A Statistical Approach for Multilingual Document Clustering and Topic Extraction from Clusters

Relevance: 100.00%

Abstract:

2000 Mathematics Subject Classification: 62H30

The Latest Prague Contributions to Written Cultural Heritage Processing

Relevance: 100.00%

Abstract:

* The following text was originally published in the Proceedings of the Language Resources and Evaluation Conference held in Lisbon, Portugal, 2004, under the title "Towards Intelligent Written Cultural Heritage Processing - Lexical processing". I present here a revised version of that paper and add the latest efforts made at the Center for Computational Linguistics in Prague in the field under discussion.

Computer-aided System of Semantic Text Analysis of a Technical Specification

Relevance: 100.00%

Abstract:

This work is devoted to the development of a computer-aided system for semantic text analysis of a technical specification. Its purpose is to increase the efficiency of software engineering by automating the semantic text analysis of a technical specification. The work proposes and investigates a model for analysing the text of the technical project; constructs an attribute grammar of a technical specification, intended to formalize a restricted subset of Russian, in order to analyse the sentences of the specification text; considers the stylistic features of the technical project as a class of documents; and formulates recommendations on preparing the text of a technical specification for automated processing. The computer-aided system for semantic text analysis of a technical specification is described. The system consists of the following subsystems: preliminary text processing, syntactic and semantic analysis and construction of software models, storage of documents, and interface.

Computer Support of Semantic Text Analysis of a Technical Specification on Designing Software

Relevance: 100.00%

Abstract:

This work is devoted to the development of a computer-aided system for semantic text analysis of a technical specification. Its purpose is to increase the efficiency of software engineering by automating the semantic text analysis of a technical specification. The work proposes and investigates a technique for analysing the text of a technical specification; constructs an extended fuzzy attribute grammar of a technical specification, intended to formalize a restricted subset of the Russian language, in order to analyse the sentences of the specification text; considers the stylistic features of the technical specification as a class of documents; and formulates recommendations on preparing the text of a technical specification for automated processing. The computer-aided system for semantic text analysis of a technical specification is described. The system consists of the following subsystems: preliminary text processing, syntactic and semantic analysis and construction of software models, storage of documents, and interface.

Demo: Using RapidMiner for Text Mining

Relevance: 100.00%

Abstract:

This demo reviews basic text mining technologies using RapidMiner. The basic characteristics of RapidMiner and its text mining operators are described. A text mining example using the Naive Bayes algorithm and process modeling is presented.
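As a rough illustration of the Naive Bayes text classification mentioned in this abstract, the following is a minimal sketch in plain Python. It does not use RapidMiner and does not reproduce the demo's process model; the function names, the whitespace tokenization, the Laplace smoothing and the toy training set are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()   # assumption: whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for illustration only.
training = [
    ("scanned manuscript archive digitization", "digitization"),
    ("manuscript scanning in the national library", "digitization"),
    ("cluster model predicts ore assay records", "data_mining"),
    ("data mining of ore assay clusters", "data_mining"),
]
model = train_naive_bayes(training)
print(predict(model, "digitization of a manuscript"))
print(predict(model, "mining ore data"))
```

Working in log space avoids numeric underflow on longer texts, and the add-one smoothing keeps a single unseen word from zeroing out a class.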

Development of Database for Distributed Information Measurement and Control System

Relevance: 100.00%

Abstract:

The purpose of this work is to develop a database for a distributed information measurement and control system that implements methods of optical spectroscopy for research in plasma physics and atomic collisions and provides remote access to information and hardware resources over Intranet/Internet networks. The database is based on the Oracle9i database management system. The client software was implemented in Java. The software was developed using the Model View Controller architecture, which separates application data from graphical presentation components and input processing logic. The following graphical presentations were implemented: measurement of radiation spectra of beam and plasma objects, excitation functions for non-elastic collisions of heavy particles, and analysis of data acquired in preceding experiments. The graphical clients support the following interactions with the database: browsing information on experiments of a certain type, searching for data by various criteria, and inserting information about preceding experiments.

Greedy Approximation with Regard to Bases and General Minimal Systems

Relevance: 100.00%

Abstract:

*This research was supported by the National Science Foundation Grant DMS 0200187 and by ONR Grant N00014-96-1-1003

Automatic Generation of Titles for a Corpus of Questions

Relevance: 100.00%

Abstract:

This paper describes the methodology followed to automatically generate titles for a corpus of questions that belong to sociological opinion polls. Titles for questions have a twofold function: (1) they are the input of user searches and (2) they inform about the whole contents of the question and its possible answer options. Thus, the generation of titles can be considered a case of automatic summarization. However, the fact that summarization had to be performed over very short texts, together with the aforementioned quality conditions imposed on newly generated titles, led the authors to follow knowledge-rich and domain-dependent strategies for summarization, disregarding the more common extractive techniques.

Topic Segmentation: How Much Can We Do by Counting Words and Sequences of Words

Relevance: 100.00%

Abstract:

In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes word co-occurrence into account in order to avoid reliance on existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontologies. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. It has been used extensively in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that addresses three main problems identified by previous research: systems based solely on lexical repetition show reliability problems; systems based on lexical cohesion use existing linguistic resources that are usually available only for dominant languages and consequently do not apply to less favored languages; and other systems need previously harvested training data. For that purpose, we use only statistics on words and sequences of words computed from a set of texts. This provides a flexible solution that may narrow the gap between dominant languages and less favored languages, thus allowing equivalent access to information.
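To make the task concrete, here is a minimal Python sketch of topic segmentation that uses only word counts. It is not the system described in the abstract: a plain bag-of-words cosine similarity between adjacent sentences stands in for the paper's informative similarity measure, whose definition is not given here. The function names, the threshold value and the sample sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, threshold=0.1):
    """Return indices i such that a topic boundary falls before sentences[i].

    A boundary is placed wherever two adjacent sentences share too little
    vocabulary (similarity below the assumed threshold).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    boundaries = []
    for i in range(1, len(bags)):
        if cosine(bags[i - 1], bags[i]) < threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical sample text, invented for illustration only.
sentences = [
    "the library scans old manuscripts",
    "the scans of manuscripts are stored in the library",
    "zinc ore samples were assayed",
    "ore samples show high zinc content",
]
print(segment(sentences))
```

This baseline depends on exact word repetition, which is the kind of approach the abstract describes as having reliability problems; a co-occurrence-aware similarity would replace `cosine` while the boundary-detection loop could stay the same.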

Analysis and Data Mining of Lead-Zinc Ore Data

Relevance: 100.00%

Abstract:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. Testing and validation of the Pb-Zn cluster data mining model were carried out in order to show its reasonable accuracy before it is used in a production environment. The Pb-Zn cluster data mining model can be used to change the mine's grinding and flotation processing parameters in almost real time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.

Classification of Paintings by Artist, Movement, and Indoor Setting Using MPEG-7 Descriptor Features

Relevance: 100.00%

Abstract:

ACM Computing Classification System (1998): I.4.9, I.4.10.