921 resultados para cross-language information retrieval
Resumo:
Artificial Intelligence is reshaping the field of fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, it is a challenging endeavour as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product's description which are often cluttered, inadequate and differ across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and doesn't guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed to model both the product image and description to solve this problems. However, the suggested solutions do not exploit effectively the semantic or syntactic information of these modalities, and the unique qualities and relations of clothing items. In this work of thesis, a novel approach is proposed to address this issues, which aims to model and process images and text descriptions as graphs in order to exploit the relations inside and between each modality and employs specific techniques to extract syntactic and semantic information. The results obtained show promising performances on different tasks when compared to the present state-of-the-art deep learning architectures.
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Los hablantes bilingües tienen un acceso al léxico más lento y menos robusto que los monolingües, incluso cuando hablan en su lengua materna y dominante. Este fenómeno, comúnmente llamado “la desventaja bilingüe” también se observa en hablantes de una segunda lengua en comparación con hablantes de una primera lengua. Una causa que posiblemente contribuya a estas desventajas es el uso de control inhibitorio durante la producción del lenguaje: la inhibición de palabras coactivadas de la lengua actualmente no en uso puede prevenir intrusiones de dicha lengua, pero al mismo tiempo ralentizar la producción del lenguaje. El primer objetivo de los estudios descritos en este informe era testear esta hipótesis mediante diferentes predicciones generadas por teorías de control inhibitorio del lenguaje. Un segundo objetivo era investigar la extensión de la desventaja bilingüe dentro y fuera de la producción de palabras aisladas, así como avanzar en el conocimiento de las variables que la modulan. En lo atingente al primer objetivo, la evidencia obtenida es incompatible con un control inhibitorio global, desafiando la idea de mecanismos específicos en el hablante bilingüe utilizados para la selección léxica. Esto implica que una explicación común para el control de lenguaje y la desventaja bilingüe en el acceso al léxico es poco plausible. En cuanto al segundo objetivo, los resultados muestran que (a) la desventaja bilingüe no tiene un impacto al acceso a la memoria; (b) la desventaja bilingüe extiende a la producción del habla conectada; y (c) similitudes entre lenguas a diferentes niveles de representación así como la frecuencia de uso son factores que modulan la desventaja bilingüe.
Resumo:
Background: Recent research based on comparisons between bilinguals and monolinguals postulates that bilingualism enhances cognitive control functions, because the parallel activation of languages necessitates control of interference. In a novel approach we investigated two groups of bilinguals, distinguished by their susceptibility to cross-language interference, asking whether bilinguals with strong language control abilities ('non-switchers") have an advantage in executive functions (inhibition of irrelevant information, problem solving, planning efficiency, generative fluency and self-monitoring) compared to those bilinguals showing weaker language control abilities ('switchers"). Methods: 29 late bilinguals (21 women) were evaluated using various cognitive control neuropsychological tests [e.g., Tower of Hanoi, Ruff Figural Fluency Task, Divided Attention, Go/noGo] tapping executive functions as well as four subtests of the Wechsler Adult Intelligence Scale. The analysis involved t-tests (two independent samples). Non-switchers (n = 16) were distinguished from switchers (n = 13) by their performance observed in a bilingual picture-naming task. Results: The non-switcher group demonstrated a better performance on the Tower of Hanoi and Ruff Figural Fluency task, faster reaction time in a Go/noGo and Divided Attention task, and produced significantly fewer errors in the Tower of Hanoi, Go/noGo, and Divided Attention tasks when compared to the switchers. Non-switchers performed significantly better on two verbal subtests of the Wechsler Adult Intelligence Scale (Information and Similarity), but not on the Performance subtests (Picture Completion, Block Design). Conclusions: The present results suggest that bilinguals with stronger language control have indeed a cognitive advantage in the administered tests involving executive functions, in particular inhibition, self-monitoring, problem solving, and generative fluency, and in two of the intelligence tests. What remains unclear is the direction of the relationship between executive functions and language control abilities.
Resumo:
Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.
Resumo:
Background: Recent research based on comparisons between bilinguals and monolinguals postulates that bilingualism enhances cognitive control functions, because the parallel activation of languages necessitates control of interference. In a novel approach we investigated two groups of bilinguals, distinguished by their susceptibility to cross-language interference, asking whether bilinguals with strong language control abilities ('non-switchers") have an advantage in executive functions (inhibition of irrelevant information, problem solving, planning efficiency, generative fluency and self-monitoring) compared to those bilinguals showing weaker language control abilities ('switchers"). Methods: 29 late bilinguals (21 women) were evaluated using various cognitive control neuropsychological tests [e.g., Tower of Hanoi, Ruff Figural Fluency Task, Divided Attention, Go/noGo] tapping executive functions as well as four subtests of the Wechsler Adult Intelligence Scale. The analysis involved t-tests (two independent samples). Non-switchers (n = 16) were distinguished from switchers (n = 13) by their performance observed in a bilingual picture-naming task. Results: The non-switcher group demonstrated a better performance on the Tower of Hanoi and Ruff Figural Fluency task, faster reaction time in a Go/noGo and Divided Attention task, and produced significantly fewer errors in the Tower of Hanoi, Go/noGo, and Divided Attention tasks when compared to the switchers. Non-switchers performed significantly better on two verbal subtests of the Wechsler Adult Intelligence Scale (Information and Similarity), but not on the Performance subtests (Picture Completion, Block Design). Conclusions: The present results suggest that bilinguals with stronger language control have indeed a cognitive advantage in the administered tests involving executive functions, in particular inhibition, self-monitoring, problem solving, and generative fluency, and in two of the intelligence tests. What remains unclear is the direction of the relationship between executive functions and language control abilities.
Resumo:
Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.
Resumo:
This work is aimed at building an adaptable frame-based system for processing Dravidian languages. There are about 17 languages in this family and they are spoken by the people of South India.Karaka relations are one of the most important features of Indian languages. They are the semabtuco-syntactic relations between verbs and other related constituents in a sentence. The karaka relations and surface case endings are analyzed for meaning extraction. This approach is comparable with the borad class of case based grammars.The efficiency of this approach is put into test in two applications. One is machine translation and the other is a natural language interface (NLI) for information retrieval from databases. The system mainly consists of a morphological analyzer, local word grouper, a parser for the source language and a sentence generator for the target language. This work make contributios like, it gives an elegant account of the relation between vibhakthi and karaka roles in Dravidian languages. This mapping is elegant and compact. The same basic thing also explains simple and complex sentence in these languages. This suggests that the solution is not just ad hoc but has a deeper underlying unity. This methodology could be extended to other free word order languages. Since the frame designed for meaning representation is general, they are adaptable to other languages coming in this group and to other applications.
Resumo:
This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements
Resumo:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, which is a knowledge-assisted semantic-driven context-aware visual information retrieval system applied in the film post production domain. We mainly focus on the automatic labelling and topic map related aspects of the framework. The use of the context- related collateral knowledge, represented by a novel probabilistic based visual keyword co-occurrence matrix, had been proven effective via the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine which can automatically construct ontological networks using Topic Maps technology, which dramatically enhances the indexing and retrieval performance of the system towards an even higher semantic level.
Resumo:
The use of indexing language in university libraries collective catalogs and the socio-cognitive context of indexing and users were evaluated. The methodology consisted of a diagnostic study elaboration of the functioning and treatment procedures of the indexing information from nine libraries of the UNESP Network, representing the Civil Engineering, Pedagogy and Dentistry areas from a data collection using the Verbal Protocol introspective technique in the Individual and Group forms. The study conducted a reflection upon the statements issued by the seventy-two participating individuals whose the results revealed unsatisfactory results about the use of the Subject Headings List of the BIBLIODATA Network, indexing language utilizing by the UNESP Libraries Network, Brazil, in the representation and in the information retrieval process in the ATHENA catalog, about the sequent aspects of the language: lack of specialized vocabulary as well as updated; lack of remissives and of specific headings, and others. We have concluded that the adequate use of indexing languages of specialized scientific areas becomes by means of evaluation as to updating, specificity and compatibility in order to meet the needs of indexing and information retrieval.
Resumo:
The goal of the project is to analyze, experiment, and develop intelligent, interactive and multilingual Text Mining technologies, as a key element of the next generation of search engines, systems with the capacity to find "the need behind the query". This new generation will provide specialized services and interfaces according to the search domain and type of information needed. Moreover, it will integrate textual search (websites) and multimedia search (images, audio, video), it will be able to find and organize information, rather than generating ranked lists of websites.
Resumo:
Interviews with Australian university students returning from study in France indicate that problems in accessing crucial information are common experiences, and frequently lead to students reproducing stereotypes of French administrative inefficiency. Our paper argues that the issue is not one of information per se but of cultural differences in the dissemination of information. It analyses the ways in which students interpret their information-gathering difficulties, and the appropriateness of the strategies they devise for overcoming them. It then examines the pedagogical implications for preparing students for study abroad, suggesting means of both equipping students with alternative ways of understanding 'information skills' and intervening in the perpetuation of stereotypes. Cet article se base sur une quarantaine d'interviews avec des étudiants australiens ayant effectué des séjours d'études en France. La difficulté d'accéder aux renseignements jugés indispensables revient souvent au cours des entretiens, source de frustrations qui amène les Australiens à reproduire un stéréotype de l'inefficacité française. Nous posons qu'il s'agit moins d'un manque d'informations que d'une différence culturelle dans la diffusion des renseignements. Notre analyse porte sur les façons dont les étudiants interprètent leurs difficultés, ainsi que sur l'utilité de leurs stratégies pour réunir les données souhaitées. Ce travail a des conséquences pédagogiques pour la préparation de tels séjours : nous suggérons des moyens de conduire les étudiants à concevoir autrement la recherche de l'information et leurs expériences, intervenant ainsi dans la transmission des stéréotypes.