Biblioteca Digital

427 results for indexing

Élaboration d'un corpus étalon pour l'évaluation d'extracteurs de termes [Building a gold-standard corpus for the evaluation of term extractors]

Relevance: 10.00%

Abstract:

This work concerns the construction of a gold-standard corpus for the automated evaluation of term extractors. These programs, designed to automatically extract the terms contained in a corpus, are used in various applications, such as terminography, translation, information retrieval, and indexing. Their evaluation must therefore be carried out with a specific application in mind. One way to evaluate extractors is to annotate every occurrence of the terms in a corpus, which requires a protocol for identifying and delimiting terminological units. To our knowledge, no well-documented annotated corpus exists for evaluating extractors. This work aims to build such a corpus and to describe the problems that must be addressed to do so. The gold-standard corpus we propose is a fully annotated corpus built for a specific application, namely the compilation of a specialized dictionary of automotive mechanics. The corpus reflects the variety of ways in which terms are realized in context. Terms are selected according to precise criteria related to the application and to certain formal, linguistic, and conceptual properties of terms and terminological variants. To evaluate an extractor with this corpus, one simply extracts all the terminological units from the corpus and compares that list, by means of metrics, with the extractor's output. One can also create a custom reference list by extracting subsets of terms according to different criteria. This work enables an automatic evaluation of extractors that takes the role of the application into account. Because the evaluation is reproducible, it can serve not only to measure the quality of an extractor but also to compare different extractors and to improve extraction techniques.
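The comparison step described in this abstract can be sketched roughly as follows. The abstract does not name its metrics, so precision, recall, and F1 are assumed here; the function name `score_extractor` and the sample terms are hypothetical and serve only as an illustration.

```python
def score_extractor(reference_terms, extracted_terms):
    """Compare an extractor's output with a reference list (assumed metrics)."""
    # Assumption: case and surrounding whitespace are ignored when matching.
    reference = {t.strip().lower() for t in reference_terms}
    extracted = {t.strip().lower() for t in extracted_terms}
    true_positives = len(reference & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference list (from the annotated corpus) and extractor output.
reference = ["brake pad", "spark plug", "camshaft", "fuel pump"]
extracted = ["Brake pad", "camshaft", "engine", "fuel pump", "road"]
scores = score_extractor(reference, extracted)
```

A custom reference list, as mentioned in the abstract, would simply be a filtered subset of `reference` passed to the same function.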

A Document Browsing Tool Based on Book Indexes

Relevance: 10.00%

Abstract:

This research project contributes to the field of information retrieval, specifically to the development of tools that enable information access in digital documents. We recognize the need to give the user flexible access to the contents of large, potentially complex digital documents by means other than a search function or a handful of metadata elements. The goal is to produce a text browsing tool that offers a maximum of information from a fairly superficial linguistic analysis. We are concerned with a type of extensive single-document indexing, not indexing by a set of keywords (see Klement, 2002, for a clear distinction between the two). The desired browsing tool would not only show at a glance the main topics discussed in the document, but would also present relationships between these topics. It would also give direct access to the text (via hypertext links to specific passages). After reviewing previous research on this and similar topics, the present paper discusses the methodology and the main characteristics of a prototype we have devised. Experimental results are presented, as well as an analysis of remaining hurdles and potential applications.

Analyse documentaire en milieu universitaire : deux approches générales comparées [Document analysis in academic settings: two general approaches compared]

Relevance: 10.00%

Abstract:

This master's thesis deals with document analysis in academic settings. Two general approaches are studied first: the document-centered approach (chapter one), predominant in the library science tradition, and the user-centered approach (chapter two), influenced by the development of tools most often associated with Web 2.0. The opposition between these two approaches reflects a dichotomy at the heart of the notion of subject, namely its objective and subjective dimensions. The thesis therefore takes the form of an essay whose main advantage is to consider both important achievements of the library science tradition and more recent developments that have a significant impact on the evolution of document analysis in academic settings. Our hypothesis is that these two general tendencies must be brought out in order to examine more deeply the matching problem, that is, the difficulty of reconciling the vocabulary the user employs in document searches with the vocabulary produced by document analysis (subject metadata). In the third chapter, we examine certain particularities of the use of documentation in academic settings in order to identify possibilities and requirements for document analysis in such a setting. Drawing on elements based on domain analysis and on the analytico-synthetic approach, the aim is to strengthen the potential interaction between users and document analysts with respect to the vocabulary used on each side.

Relating Dependent Terms in Information Retrieval

Relevance: 10.00%

Abstract:

Search engines are part of our daily lives. Currently, more than a third of the world's population uses the Internet. Search engines allow these users to quickly find the information or products they want. Information retrieval (IR) is the foundation of modern search engines. Traditional IR approaches assume that index terms are independent. Yet terms that appear in the same context are often dependent. Failing to account for these dependencies is one cause of noise in the results (non-relevant results). Some studies have proposed integrating certain types of dependency, such as proximity, co-occurrence, adjacency, and grammatical dependency. In most cases, the dependency models are built separately and then combined with the traditional word-based model using a constant weight. Consequently, they cannot properly capture variable dependency and dependency strength. For example, the dependency between the adjacent words "Black Friday" is stronger than that between the words "road constructions". In this thesis, we study different approaches to capturing relations between terms and their dependency strengths. We propose the following methods:
- We re-examine the combination approach using different indexing units for monolingual Chinese IR and cross-language IR between English and Chinese. In addition to words, we study the possibility of using bigrams and unigrams as translation units for Chinese. Several translation models are built to translate English words into Chinese unigrams, bigrams, and words using a parallel corpus. An English query is then translated in several ways, and a ranking score is produced with each translation. The final ranking score combines all these types of translation.
- We model the dependency between terms using the Dempster-Shafer theory of evidence. An occurrence of a text fragment (of several words) in a document is considered to represent the set of all its constituent terms. Probability is assigned to such a set of terms rather than to each individual term. At query evaluation time, this probability is redistributed to the query terms if they differ. This approach allows us to integrate dependency relations between terms.
- We propose a discriminative model that integrates the different types of dependency according to their strength and their usefulness for IR. In particular, we consider adjacency and co-occurrence dependencies at different distances, that is, bigrams and pairs of terms within windows of 2, 4, 8, and 16 words. The weight of a bigram or a pair of dependent terms is determined from a set of features using SVM regression.
All the proposed methods are evaluated on several English and/or Chinese collections, and the experimental results show that these methods yield substantial improvements over the state of the art.
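To make the window-based dependencies in this abstract concrete, here is a minimal sketch that counts how often an unordered pair of terms co-occurs within windows of 2, 4, 8, and 16 words. The exact window definition is an assumption (two positions share a window of w words when they differ by at most w - 1), the function name `cooccurrence_counts` is hypothetical, and the SVM regression that would turn such counts into weights is not shown.

```python
def cooccurrence_counts(tokens, term_a, term_b, windows=(2, 4, 8, 16)):
    """Count occurrences of an unordered term pair within each window size."""
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    counts = {}
    for w in windows:
        # Assumption: two tokens share a window of w words when their
        # positions differ by at most w - 1 (w = 2 means adjacent words).
        counts[w] = sum(1 for i in pos_a for j in pos_b if 0 < abs(i - j) < w)
    return counts


# Hypothetical tokenised document.
tokens = "black friday sale starts friday black market".split()
counts = cooccurrence_counts(tokens, "black", "friday")
```

Counts like these could serve as input features for a pair-weighting model, which is how the abstract describes using dependencies of different strengths.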

Development of Shape Descriptors Based on Legendre Polynomials and Content-Based Retrieval of Scoliosis Images

Relevance: 10.00%

Abstract:

The wealth of information freely available on the web and in medical image databases poses a major problem for end users: how to find the information they need. Content-based image retrieval is the obvious solution. A standard called MPEG-7 was developed to address the interoperability issues of content-based search. The work presented in this thesis concentrates mainly on developing new shape descriptors and a framework for content-based retrieval of scoliosis images. New region-based and contour-based shape descriptors are developed based on orthogonal Legendre polynomials. A novel system for indexing and retrieval of digital spine radiographs with scoliosis is presented.

Potential public health significance of faecal contamination and multidrug-resistant Escherichia coli and Salmonella serotypes in a lake in India

Relevance: 10.00%

Abstract:

Objectives: To assess the prevalence of faecal coliform bacteria and multiple drug resistance among Escherichia coli and Salmonella serotypes from Vembanadu Lake. Study design: Systematic microbiological testing. Methods: Water samples were collected monthly from ten stations on the southern and northern sides of a salt water regulator constructed in Vembanadu Lake to prevent the incursion of seawater during certain periods of the year. The density of faecal coliform bacteria was estimated. E. coli and Salmonella were isolated and their serotypes were identified. Antibiotic resistance of the E. coli and Salmonella serotypes was analysed, and the MAR index of individual isolates was calculated. Results: The density of faecal coliform bacteria ranged from a mean MPN value of 2900 to 7100/100 ml. The bacterial isolates showed a pattern of multiple drug resistance. E. coli showed more than 50% resistance to amikacin, oxytetracycline, streptomycin, tetracycline and kanamycin, while Salmonella showed high resistance to oxytetracycline, streptomycin, tetracycline and ampicillin. MAR indexing of the isolates showed that they originated from high-risk sources such as humans, poultry and dairy cows. Conclusions: The high density of faecal coliform bacteria and the prevalence of multidrug-resistant E. coli and Salmonella serotypes in the lake may pose a severe public health risk through related water-borne and food-borne outbreaks.
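The MAR index mentioned in this abstract is conventionally the number of antibiotics an isolate resists divided by the number of antibiotics it was tested against. The sketch below illustrates that ratio; the isolate profile is invented for illustration, and the 0.2 cutoff for "high-risk source" is the commonly cited convention, assumed here because the abstract does not state the threshold it used.

```python
def mar_index(resistance_profile):
    """MAR index = antibiotics resisted / antibiotics tested, for one isolate."""
    tested = len(resistance_profile)
    if tested == 0:
        raise ValueError("at least one antibiotic must be tested")
    resisted = sum(1 for is_resistant in resistance_profile.values() if is_resistant)
    return resisted / tested


# Hypothetical isolate: True means resistant to that antibiotic.
isolate = {
    "amikacin": True, "oxytetracycline": True, "streptomycin": True,
    "tetracycline": False, "kanamycin": False, "ampicillin": False,
    "chloramphenicol": False, "gentamicin": False,
}
index = mar_index(isolate)

HIGH_RISK_THRESHOLD = 0.2  # commonly cited cutoff; assumed, not from the abstract
is_high_risk = index > HIGH_RISK_THRESHOLD
```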

Risk Assessment of Rooftop-Collected Rainwater for Individual Household and Community Use in Central Kerala, India

Relevance: 10.00%

Abstract:

The quality of rooftop-collected rainwater is an issue of increasing interest, particularly in developing countries where the collected water is used as a source of drinking water. Bacteriological and chemical parameters of 25 samples of rooftop-harvested rainwater stored in ferrocement tanks were analyzed in the study described in this article. Except for pH and lower dissolved oxygen levels, all physicochemical parameters were within World Health Organization guidelines. Bacteriological results revealed that rooftop-harvested rainwater stored in tanks often does not meet the bacteriological quality standards prescribed for drinking water. Fifty percent of the samples of harvested rainwater for rural and urban community use and 20% of the samples for individual household use showed the presence of E. coli. Fecal coliform/fecal streptococci ratios revealed nonhuman animal sources of fecal pollution. Risk assessment of bacterial isolates from the harvested rainwater showed high resistance to ampicillin, erythromycin, penicillin, and vancomycin. Multiple antibiotic resistance (MAR) indexing of the isolates and elucidation of the resistance patterns revealed that 73% of the isolates exhibited MAR.

Multiple antibiotic resistance profiles of various Escherichia coli serotypes isolated from Cochin Estuary

Relevance: 10.00%

Abstract:

A total of eighty-one Escherichia coli isolates belonging to forty-three different serotypes, including several pathogenic strains such as enterotoxigenic E. coli (ETEC), enterohaemorrhagic E. coli (EHEC), enteropathogenic E. coli (EPEC) and uropathogenic E. coli (UPEC), isolated from Cochin estuary between November 2001 and October 2002, were tested against twelve antibiotics to determine the prevalence of multiple antibiotic resistance (MAR) and antimicrobial resistance profiles as a measure of high-risk sources of contamination. The results revealed that more than 95% of the isolates were multiple antibiotic resistant (resistant to more than three antibiotics). MAR indexing of the isolates showed that all these strains originated from high-risk sources of contamination. The incidence of multiple antibiotic resistant E. coli, especially the pathogenic strains, in natural waters poses a serious threat to the human population.

Prevalence of multi drug resistant Escherichia coli serotypes in a tropical estuary, India

Relevance: 10.00%

Abstract:

A total of 81 Escherichia coli isolates belonging to 43 different serotypes, including several pathogenic strains such as enterotoxigenic E. coli, isolated from a tropical estuary were tested against 12 antibiotics to determine the prevalence of multiple antibiotic resistance and the antimicrobial resistance profiles, and to identify high-risk sources of contamination by MAR indexing.

Prevalence and Antibiotic sensitivity of Escherichia coli in Extensive Brackish water Aquaculture Ponds

Relevance: 10.00%

Abstract:

The prevalence and antibiotic resistance of Escherichia coli in water and sediment samples from brackish water aquaculture ponds adjacent to the Cochin backwaters were analysed. More than 50% of the water samples and more than 80% of the sediment samples from all the sampling stations tested positive for E. coli. Risk assessment of the E. coli strains was carried out using multiple antibiotic resistance (MAR) indexing. The majority of the strains were found to be multiple antibiotic resistant, suggesting that they originated from high-risk sources of contamination, such as humans, where antibiotics are frequently used. While none of the E. coli strains were resistant to amikacin, chloramphenicol, streptomycin and trimethoprim, considerable levels of resistance were encountered against ampicillin, erythromycin, penicillin G and vancomycin. The high prevalence of E. coli in the water and sediment samples of these extensive brackish water ponds indicates a high degree of faecal pollution of this environment. The high-risk nature of the strains warrants efficient post-harvest and processing measures to avoid health risks to consumers.

Antibiotic resistance of Aeromonas hydrophila isolated from marketed fish and prawn of South India

Relevance: 10.00%

Abstract:

A total of 319 strains of Aeromonas hydrophila were isolated from 536 fish and 278 prawns over a 2-year period. All the strains were tested for resistance to 15 antibiotics; 100% of the strains were resistant to methicillin and rifampicin, followed by bacitracin and novobiocin (99%). Only 3% of the strains exhibited resistance to chloramphenicol. Multiple antibiotic resistance (MAR) indexing of the A. hydrophila strains showed that all of them originated from high-risk sources.

Content Based Video Retrieval using SURF Descriptor

Relevance: 10.00%

Abstract:

This paper presents a robust Content Based Video Retrieval (CBVR) system. The system retrieves similar videos based on a local feature descriptor called SURF (Speeded Up Robust Feature). The high dimensionality of SURF-like feature descriptors causes heavy storage consumption when video information is indexed. To reduce the dimensionality of the SURF feature descriptor, the system employs a stochastic dimensionality reduction method and thus produces model data for the videos. On retrieval, the model data of the test clip is matched to its similar videos using a minimum distance classifier. The performance of the system is evaluated using two different minimum distance classifiers during the retrieval stage. The experimental analyses performed on the system show that it has a retrieval performance of 78%. The paper also analyses the performance efficiency of the low-dimensional SURF descriptor.
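The retrieval step in this abstract, matching a test clip's model data to stored videos with a minimum distance classifier, can be sketched as below. The abstract does not say which two distance measures were used, so Euclidean distance is an assumption; the names `nearest_video` and `model_data` and the three-dimensional vectors are hypothetical stand-ins for the reduced SURF representation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_video(query_vector, model_data, distance=euclidean):
    """Return the label of the stored vector closest to the query."""
    return min(model_data,
               key=lambda label: distance(query_vector, model_data[label]))


# Hypothetical low-dimensional model data, one vector per stored video.
model_data = {
    "news_clip": [0.9, 0.1, 0.3],
    "sports_clip": [0.2, 0.8, 0.5],
    "cartoon_clip": [0.4, 0.4, 0.9],
}
match = nearest_video([0.25, 0.75, 0.45], model_data)
```

Passing a different function as `distance` is one way to compare two minimum distance classifiers on the same model data.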

Use of Online Information Resources for Knowledge Organisation in Library and Information Centres: A Case Study of CUSAT

Relevance: 10.00%

Abstract:

The paper discusses the use of online information resources for organising knowledge in the library and information centres of Cochin University of Science and Technology (CUSAT). It describes the status and extent of automation in the CUSAT library. The use of different online resources, and the purposes for which they are used, is explained in detail. A structured interview method was applied to collect data. It was observed that 67 per cent of users consult online resources to assist knowledge organisation. The Library of Congress catalogue is the most widely used online resource (100 per cent), followed by the OPAC of CUSAT and the catalogue of the British Library. The main purposes for using these resources are class number building and subject indexing.

Hierarchical Object Recognition Using Libraries of Parameterized Model Sub-Parts

Relevance: 10.00%

Abstract:

This thesis describes the development of a model-based vision system that exploits hierarchies of both object structure and object scale. The focus of the research is to use these hierarchies to achieve robust recognition based on effective organization and indexing schemes for model libraries. The goal of the system is to recognize parameterized instances of non-rigid model objects contained in a large knowledge base despite the presence of noise and occlusion. Robustness is achieved by developing a system that can recognize viewed objects that are scaled or mirror-image instances of the known models, or that contain component sub-parts with different relative scaling, rotation, or translation than in the models. The approach taken in this thesis is to develop an object shape representation that incorporates a component sub-part hierarchy, to allow for efficient and correct indexing into an automatically generated model library as well as for relative parameterization among sub-parts, and a scale hierarchy, to allow for a general-to-specific recognition procedure. After analysis of the issues and inherent tradeoffs in the recognition process, a system is implemented using a representation based on significant contour curvature changes and a recognition engine based on geometric constraints of feature properties. Examples of the system's performance are given, followed by an analysis of the results. In conclusion, the system's benefits and limitations are presented.

The Use of Grouping in Visual Object Recognition.

Relevance: 10.00%

Abstract:

The report describes a recognition system called GROPER, which performs grouping by using distance and relative orientation constraints that estimate the likelihood of different edges in an image coming from the same object. The thesis presents both a theoretical analysis of the grouping problem and a practical implementation of a grouping system. GROPER also uses an indexing module to allow it to make use of knowledge of different objects, any of which might appear in an image. We test GROPER by comparing it to a similar recognition system that does not use grouping.
