965 resultados para Natural language processing (Computer science) -- TFC
Resumo:
Il lavoro di tesi presentato è nato da una collaborazione con il Politecnico di Macao, i referenti sono: Prof. Rita Tse, Prof. Marcus Im e Prof. Su-Kit Tang. L'obiettivo consiste nella creazione di un modello di traduzione automatica italiano-cinese e nell'osservarne il comportamento, al fine di determinare se sia o meno possibile l'impresa. Il trattato approfondisce l'argomento noto come Neural Language Processing (NLP), rientrando dunque nell'ambito delle traduzioni automatiche. Sono servizi che, attraverso l'ausilio dell'intelligenza artificiale sono in grado di elaborare il linguaggio naturale, per poi interpretarlo e tradurlo. NLP è una branca dell'informatica che unisce: computer science, intelligenza artificiale e studio di lingue. Dal punto di vista della ricerca, le più grandi sfide in questo ambito coinvolgono: il riconoscimento vocale (speech-recognition), comprensione del testo (natural-language understanding) e infine la generazione automatica di testo (natural-language generation). Lo stato dell'arte attuale è stato definito dall'articolo "Attention is all you need" \cite{vaswani2017attention}, presentato nel 2017 a partire da una collaborazione di ricercatori della Cornell University.\\ I modelli di traduzione automatica più noti ed utilizzati al momento sono i Neural Machine Translators (NMT), ovvero modelli che attraverso le reti neurali artificiali profonde, sono in grado effettuare traduzioni o predizioni. La qualità delle traduzioni è particolarmente buona, tanto da arrivare quasi a raggiungere la qualità di una traduzione umana. Il lavoro infatti si concentrerà largamente sullo studio e utilizzo di NMT, allo scopo di proporre un modello funzionale e che sia in grado di performare al meglio nelle traduzioni da italiano a cinese e viceversa.
Resumo:
Computer Science is a subject which has difficulty in marketing itself. Further, pinning down a standard curriculum is difficult-there are many preferences which are hard to accommodate. This paper argues the case that part of the problem is the fact that, unlike more established disciplines, the subject does not clearly distinguish the study of principles from the study of artifacts. This point was raised in Curriculum 2001 discussions, and debate needs to start in good time for the next curriculum standard. This paper provides a starting point for debate, by outlining a process by which principles and artifacts may be separated, and presents a sample curriculum to illustrate the possibilities. This sample curriculum has some positive points, though these positive points are incidental to the need to start debating the issue. Other models, with a less rigorous ordering of principles before artifacts, would still gain from making it clearer whether a specific concept was fundamental, or a property of a specific technology. (C) 2003 Elsevier Ltd. All rights reserved.
Resumo:
Ao longo dos tempos foi possível constatar que uma grande parte do tempo dos professores é gasta na componente de avaliação. Por esse facto, há já algumas décadas que a correcção automática de texto livre é alvo de investigação. Sendo a correcção de exercícios efectuada pelo computador permite que o professor dedique o seu tempo em tarefas que melhorem a aprendizagem dos alunos. Para além disso, cada vez mais as novas tecnologias permitem o uso de ferramentas com bastante utilidade no ensino, pois para além de facilitarem a exposição do conhecimento também permitem uma maior retenção da informação. Logo, associar ferramentas de gestão de sala de aula à correcção automática de respostas de texto livre é um desafio bastante interessante. O objectivo desta dissertação foi a realização de um estudo relativamente à área de avaliação assistida por computador em que este trabalho se insere. Inicialmente, foram analisados alguns correctores ortográficos para seleccionar aquele que seria integrado no módulo proposto. De seguida, foram estudadas as técnicas mais relevantes e as ferramentas que mais se enquadram no âmbito deste trabalho. Neste contexto, a ideia foi partir da existência de uma ferramenta de gestão de sala de aula e desenvolver um módulo para a correcção de exercícios. A aplicação UNI_NET-Classroom, que foi a ferramenta para a qual o módulo foi desenvolvido, já continha um componente de gestão de exercícios que apenas efectuava a correcção para as respostas de escolha múltipla. Com este trabalho pretendeu-se acrescentar mais uma funcionalidade a esse componente, cujo intuito é dar apoio ao professor através da correcção de exercícios e sugestão da cotação a atribuir. Por último, foram realizadas várias experiências sobre o módulo desenvolvido, de forma a ser possível retirar algumas conclusões para o presente trabalho. A conclusão mais importante foi que as ferramentas de correcção automática são uma mais-valia para os professores e escolas.
Resumo:
2nd ser. : v.9 (1831)
Resumo:
2nd ser. : v.3 (1828)
Resumo:
2nd ser. : v.4 (1828)
Resumo:
El presente trabajo proporciona un nivel de ayuda para los usuarios de aplicaciones sociales, brindándoles un criterio adicional al momento de contactar a otra persona, apoyando al usuario con advertencias sobre algún comportamiento de riesgo de sus contactos y fomentando así la prevención de enfermedades.
Resumo:
Language Resources are a critical component for Natural Language Processing applications. Throughout the years many resources were manually created for the same task, but with different granularity and coverage information. To create richer resources for a broad range of potential reuses, nformation from all resources has to be joined into one. The hight cost of comparing and merging different resources by hand has been a bottleneck for merging existing resources. With the objective of reducing human intervention, we present a new method for automating merging resources. We have addressed the merging of two verbs subcategorization frame (SCF) lexica for Spanish. The results achieved, a new lexicon with enriched information and conflicting information signalled, reinforce our idea that this approach can be applied for other task of NLP.
Resumo:
BACKGROUND: Molecular interaction Information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to this very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
Resumo:
Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.
Resumo:
As the world becomes more technologically advanced and economies become globalized, computer science evolution has become faster than ever before. With this evolution and globalization come the need for sustainable university curricula that adequately prepare graduates for life in the industry. Additionally, behavioural skills or “soft” skills have become just as important as technical abilities and knowledge or “hard” skills. The objective of this study was to investigate the current skill gap that exists between computer science university graduates and actual industry needs as well as the sustainability of current computer science university curricula by conducting a systematic literature review of existing publications on the subject as well as a survey of recently graduated computer science students and their work supervisors. A quantitative study was carried out with respondents from six countries, mainly Finland, 31 of the responses came from recently graduated computer science professionals and 18 from their employers. The observed trends suggest that a skill gap really does exist particularly with “soft” skills and that many companies are forced to provide additional training to newly graduated employees if they are to be successful at their jobs.
Resumo:
En apprentissage automatique, domaine qui consiste à utiliser des données pour apprendre une solution aux problèmes que nous voulons confier à la machine, le modèle des Réseaux de Neurones Artificiels (ANN) est un outil précieux. Il a été inventé voilà maintenant près de soixante ans, et pourtant, il est encore de nos jours le sujet d'une recherche active. Récemment, avec l'apprentissage profond, il a en effet permis d'améliorer l'état de l'art dans de nombreux champs d'applications comme la vision par ordinateur, le traitement de la parole et le traitement des langues naturelles. La quantité toujours grandissante de données disponibles et les améliorations du matériel informatique ont permis de faciliter l'apprentissage de modèles à haute capacité comme les ANNs profonds. Cependant, des difficultés inhérentes à l'entraînement de tels modèles, comme les minima locaux, ont encore un impact important. L'apprentissage profond vise donc à trouver des solutions, en régularisant ou en facilitant l'optimisation. Le pré-entraînnement non-supervisé, ou la technique du ``Dropout'', en sont des exemples. Les deux premiers travaux présentés dans cette thèse suivent cette ligne de recherche. Le premier étudie les problèmes de gradients diminuants/explosants dans les architectures profondes. Il montre que des choix simples, comme la fonction d'activation ou l'initialisation des poids du réseaux, ont une grande influence. Nous proposons l'initialisation normalisée pour faciliter l'apprentissage. Le second se focalise sur le choix de la fonction d'activation et présente le rectifieur, ou unité rectificatrice linéaire. Cette étude a été la première à mettre l'accent sur les fonctions d'activations linéaires par morceaux pour les réseaux de neurones profonds en apprentissage supervisé. Aujourd'hui, ce type de fonction d'activation est une composante essentielle des réseaux de neurones profonds. Les deux derniers travaux présentés se concentrent sur les applications des ANNs en traitement des langues naturelles. Le premier aborde le sujet de l'adaptation de domaine pour l'analyse de sentiment, en utilisant des Auto-Encodeurs Débruitants. Celui-ci est encore l'état de l'art de nos jours. Le second traite de l'apprentissage de données multi-relationnelles avec un modèle à base d'énergie, pouvant être utilisé pour la tâche de désambiguation de sens.
Resumo:
The goal of this work is to develop an Open Agent Architecture for Multilingual information retrieval from Relational Database. The query for information retrieval can be given in plain Hindi or Malayalam; two prominent regional languages of India. The system supports distributed processing of user requests through collaborating agents. Natural language processing techniques are used for meaning extraction from the plain query and information is given back to the user in his/ her native language. The system architecture is designed in a structured way so that it can be adapted to other regional languages of India
Resumo:
Statistical Machine Translation (SMT) is one of the potential applications in the field of Natural Language Processing. The translation process in SMT is carried out by acquiring translation rules automatically from the parallel corpora. However, for many language pairs (e.g. Malayalam- English), they are available only in very limited quantities. Therefore, for these language pairs a huge portion of phrases encountered at run-time will be unknown. This paper focuses on methods for handling such out-of-vocabulary (OOV) words in Malayalam that cannot be translated to English using conventional phrase-based statistical machine translation systems. The OOV words in the source sentence are pre-processed to obtain the root word and its suffix. Different inflected forms of the OOV root are generated and a match is looked up for the word variants in the phrase translation table of the translation model. A Vocabulary filter is used to choose the best among the translations of these word variants by finding the unigram count. A match for the OOV suffix is also looked up in the phrase entries and the target translations are filtered out. Structuring of the filtered phrases is done and SMT translation model is extended by adding OOV with its new phrase translations. By the results of the manual evaluation done it is observed that amount of OOV words in the input has been reduced considerably