6 resultados para Vector Space IR, Search Engines, Document Clustering, Document
em Cochin University of Science
Resumo:
This paper describes about an English-Malayalam Cross-Lingual Information Retrieval system. The system retrieves Malayalam documents in response to query given in English or Malayalam. Thus monolingual information retrieval is also supported in this system. Malayalam is one of the most prominent regional languages of Indian subcontinent. It is spoken by more than 37 million people and is the native language of Kerala state in India. Since we neither had any full-fledged online bilingual dictionary nor any parallel corpora to build the statistical lexicon, we used a bilingual dictionary developed in house for translation. Other language specific resources like Malayalam stemmer, Malayalam morphological root analyzer etc developed in house were used in this work
Resumo:
Sharing of information with those in need of it has always been an idealistic goal of networked environments. With the proliferation of computer networks, information is so widely distributed among systems, that it is imperative to have well-organized schemes for retrieval and also discovery. This thesis attempts to investigate the problems associated with such schemes and suggests a software architecture, which is aimed towards achieving a meaningful discovery. Usage of information elements as a modelling base for efficient information discovery in distributed systems is demonstrated with the aid of a novel conceptual entity called infotron.The investigations are focused on distributed systems and their associated problems. The study was directed towards identifying suitable software architecture and incorporating the same in an environment where information growth is phenomenal and a proper mechanism for carrying out information discovery becomes feasible. An empirical study undertaken with the aid of an election database of constituencies distributed geographically, provided the insights required. This is manifested in the Election Counting and Reporting Software (ECRS) System. ECRS system is a software system, which is essentially distributed in nature designed to prepare reports to district administrators about the election counting process and to generate other miscellaneous statutory reports.Most of the distributed systems of the nature of ECRS normally will possess a "fragile architecture" which would make them amenable to collapse, with the occurrence of minor faults. This is resolved with the help of the penta-tier architecture proposed, that contained five different technologies at different tiers of the architecture.The results of experiment conducted and its analysis show that such an architecture would help to maintain different components of the software intact in an impermeable manner from any internal or external faults. The architecture thus evolved needed a mechanism to support information processing and discovery. This necessitated the introduction of the noveI concept of infotrons. Further, when a computing machine has to perform any meaningful extraction of information, it is guided by what is termed an infotron dictionary.The other empirical study was to find out which of the two prominent markup languages namely HTML and XML, is best suited for the incorporation of infotrons. A comparative study of 200 documents in HTML and XML was undertaken. The result was in favor ofXML.The concept of infotron and that of infotron dictionary, which were developed, was applied to implement an Information Discovery System (IDS). IDS is essentially, a system, that starts with the infotron(s) supplied as clue(s), and results in brewing the information required to satisfy the need of the information discoverer by utilizing the documents available at its disposal (as information space). The various components of the system and their interaction follows the penta-tier architectural model and therefore can be considered fault-tolerant. IDS is generic in nature and therefore the characteristics and the specifications were drawn up accordingly. Many subsystems interacted with multiple infotron dictionaries that were maintained in the system.In order to demonstrate the working of the IDS and to discover the information without modification of a typical Library Information System (LIS), an Information Discovery in Library Information System (lDLIS) application was developed. IDLIS is essentially a wrapper for the LIS, which maintains all the databases of the library. The purpose was to demonstrate that the functionality of a legacy system could be enhanced with the augmentation of IDS leading to information discovery service. IDLIS demonstrates IDS in action. IDLIS proves that any legacy system could be augmented with IDS effectively to provide the additional functionality of information discovery service.Possible applications of IDS and scope for further research in the field are covered.
Resumo:
This thesis Entitled Internet Utilization and Academic Activities of Faculty Members in the Universities of kerala: an analytical study. Today, scientific research is throwing up new discoveries, inventions and vistas by the hour. We are witnessing a veritable knowledge explosion. It is important for members of university faculty members to keep abreast of it for giving up-t-date information to their students about the new development in the subject of their study. The internet is an invaluable tool for achieving it. Most of the universities have sufficient internet facility, but the accessibility to all the faculty members is not adequate. University Libraries also provides standard supplementary service in the internet area. This study indicates differential level of awareness and utilization of the internet services by the faculty members in the areas of teaching, research and publication. However the overall impression is that the awareness and utilization is inadequate. This point to the urgent need to devise programs and schemes to promote internet utilization among the faculty members. The suggestions indicate the key areas that deserve attention by policy makers and administrators. Thanks to the internet, every new development in every field of study is just a click away for faculty members, research scholars and students.
Resumo:
There is a recent trend to describe physical phenomena without the use of infinitesimals or infinites. This has been accomplished replacing differential calculus by the finite difference theory. Discrete function theory was first introduced in l94l. This theory is concerned with a study of functions defined on a discrete set of points in the complex plane. The theory was extensively developed for functions defined on a Gaussian lattice. In 1972 a very suitable lattice H: {Ci qmxO,I qnyo), X0) 0, X3) 0, O < q < l, m, n 5 Z} was found and discrete analytic function theory was developed. Very recently some work has been done in discrete monodiffric function theory for functions defined on H. The theory of pseudoanalytic functions is a generalisation of the theory of analytic functions. When the generator becomes the identity, ie., (l, i) the theory of pseudoanalytic functions reduces to the theory of analytic functions. Theugh the theory of pseudoanalytic functions plays an important role in analysis, no discrete theory is available in literature. This thesis is an attempt in that direction. A discrete pseudoanalytic theory is derived for functions defined on H.
Resumo:
The work is intended to study the following important aspects of document image processing and develop new methods. (1) Segmentation ofdocument images using adaptive interval valued neuro-fuzzy method. (2) Improving the segmentation procedure using Simulated Annealing technique. (3) Development of optimized compression algorithms using Genetic Algorithm and parallel Genetic Algorithm (4) Feature extraction of document images (5) Development of IV fuzzy rules. This work also helps for feature extraction and foreground and background identification. The proposed work incorporates Evolutionary and hybrid methods for segmentation and compression of document images. A study of different neural networks used in image processing, the study of developments in the area of fuzzy logic etc is carried out in this work
Resumo:
This work proposes a parallel genetic algorithm for compressing scanned document images. A fitness function is designed with Hausdorff distance which determines the terminating condition. The algorithm helps to locate the text lines. A greater compression ratio has achieved with lesser distortion