911 resultados para Machine Learning,Natural Language Processing,Descriptive Text Mining,POIROT,Transformer


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the present work, we propose a model for the statistical distribution of people versus number of steps acquired by them in a learning process, based on competition, learning and natural selection. We consider that learning ability is normally distributed. We found that the number of people versus step acquired by them in a learning process is given through a power law. As competition, learning and selection is also at the core of all economical and social systems, we consider that power-law scaling is a quantitative description of this process in social systems. This gives an alternative thinking in holistic properties of complex systems. (C) 2004 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The applications of Automatic Vowel Recognition (AVR), which is a sub-part of fundamental importance in most of the speech processing systems, vary from automatic interpretation of spoken language to biometrics. State-of-the-art systems for AVR are based on traditional machine learning models such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), however, such classifiers can not deal with efficiency and effectiveness at the same time, existing a gap to be explored when real-time processing is required. In this work, we present an algorithm for AVR based on the Optimum-Path Forest (OPF), which is an emergent pattern recognition technique recently introduced in literature. Adopting a supervised training procedure and using speech tags from two public datasets, we observed that OPF has outperformed ANNs, SVMs, plus other classifiers, in terms of training time and accuracy. ©2010 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

O processamento de voz tornou-se uma tecnologia cada vez mais baseada na modelagem automática de vasta quantidade de dados. Desta forma, o sucesso das pesquisas nesta área está diretamente ligado a existência de corpora de domínio público e outros recursos específicos, tal como um dicionário fonético. No Brasil, ao contrário do que acontece para a língua inglesa, por exemplo, não existe atualmente em domínio público um sistema de Reconhecimento Automático de Voz (RAV) para o Português Brasileiro com suporte a grandes vocabulários. Frente a este cenário, o trabalho tem como principal objetivo discutir esforços dentro da iniciativa FalaBrasil [1], criada pelo Laboratório de Processamento de Sinais (LaPS) da UFPA, apresentando pesquisas e softwares na área de RAV para o Português do Brasil. Mais especificamente, o presente trabalho discute a implementação de um sistema de reconhecimento de voz com suporte a grandes vocabulários para o Português do Brasil, utilizando a ferramenta HTK baseada em modelo oculto de Markov (HMM) e a criação de um módulo de conversão grafema-fone, utilizando técnicas de aprendizado de máquina.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Esta pesquisa teve como objetivo identificar e analisar quais as possíveis dificuldades advindas da linguagem que alunos enfrentam na conversão da língua natural para a linguagem matemática. A investigação foi realizada ao longo do ano letivo de 2008 em classes de Ensino Médio de duas escolas públicas da cidade de Belém, onde foram coletadas informações por meio de registros produzidos pelos alunos em testes e avaliações bimestrais. Para subsidiar a investigação foram utilizadas, como aporte teórico, idéias de Raymond Duval acerca da teoria dos registros de representação semiótica; o conceito de significado ligado a filosofia da linguagem segundo Wittgenstein; algumas considerações feitas por Gottlob Frege sobre a distinção entre sentido e referência assim como algumas idéias do filósofo Gilles-Gaston Granger no que concerne ao problema das significações e do aspecto formal da linguagem matemática. As análises das informações que foram coletadas no decorrer do processo investigativo revelaram que, na perspectiva dos alunos, a conversão da língua natural para a linguagem matemática se depara com quatro tipos de dificuldades: a primeira apontou para o fato de existirem em cada registro de representação de um mesmo objeto matemático, diferentes conteúdos a serem mobilizados; a segunda mostrou que os alunos fracassam ao realizar a conversão da língua natural para a linguagem matemática quando não interpretam corretamente as regras matemáticas implícitas no enunciado de uma situação problema; a terceira surgiu do fato de existirem no texto de uma situação problema, palavras que os alunos não compreendiam o seu significado ou que geravam ambigüidade de sentidos; a quarta surgiu a partir do fato dos alunos não conseguirem compreender o significado matemático das letras utilizadas nos enunciados dos problemas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In active learning, a machine learning algorithmis given an unlabeled set of examples U, and is allowed to request labels for a relatively small subset of U to use for training. The goal is then to judiciously choose which examples in U to have labeled in order to optimize some performance criterion, e.g. classification accuracy. We study how active learning affects AUC. We examine two existing algorithms from the literature and present our own active learning algorithms designed to maximize the AUC of the hypothesis. One of our algorithms was consistently the top performer, and Closest Sampling from the literature often came in second behind it. When good posterior probability estimates were available, our heuristics were by far the best.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples’ labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms for choosing which attributes to purchase of which examples in the budgeted learning model based on algorithms for the multi-armed bandit problem. All of our approaches outperformed the current state of the art. Furthermore, we present a new means for selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, which is typically done. Our new example selection method improved performance of all the algorithms we tested, both ours and those in the literature.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied for modelling and processing biomedical signals in two different contexts: ultrasound medical imaging systems and primate neural activity analysis and modelling. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from a ultrasonic transducer and automatic image segmentation and classification of prostate ultrasound scans. In the former application a stochastic model of the radio frequency signal measured from a ultrasonic transducer is derived. This model is then employed for developing in a statistical framework a regularized deconvolution procedure, for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then uses to segment the images in region of interests by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different region of interests. In the context of neural activity signals, an example of bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in 7a parietal region of primate monkeys, during the execution of learned behavioural tasks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data coming out from various researches carried out over the last years in Italy on the problem of school dispersion in secondary school show that difficulty in studying mathematics is one of the most frequent reasons of discomfort reported by students. Nevertheless, it is definitely unrealistic to think we can do without such knowledge in today society: mathematics is largely taught in secondary school and it is not confined within technical-scientific courses only. It is reasonable to say that, although students may choose academic courses that are, apparently, far away from mathematics, all students will have to come to terms, sooner or later in their life, with this subject. Among the reasons of discomfort given by the study of mathematics, some mention the very nature of this subject and in particular the complex symbolic language through which it is expressed. In fact, mathematics is a multimodal system composed by oral and written verbal texts, symbol expressions, such as formulae and equations, figures and graphs. For this, the study of mathematics represents a real challenge to those who suffer from dyslexia: this is a constitutional condition limiting people performances in relation to the activities of reading and writing and, in particular, to the study of mathematical contents. Here the difficulties in working with verbal and symbolic codes entail, in turn, difficulties in the comprehension of texts from which to deduce operations that, once combined together, would lead to the problem final solution. Information technologies may support this learning disorder effectively. However, these tools have some implementation limits, restricting their use in the study of scientific subjects. Vocal synthesis word processors are currently used to compensate difficulties in reading within the area of classical studies, but they are not used within the area of mathematics. This is because the vocal synthesis (or we should say the screen reader supporting it) is not able to interpret all that is not textual, such as symbols, images and graphs. The DISMATH software, which is the subject of this project, would allow dyslexic users to read technical-scientific documents with the help of a vocal synthesis, to understand the spatial structure of formulae and matrixes, to write documents with a technical-scientific content in a format that is compatible with main scientific editors. The system uses LaTex, a text mathematic language, as mediation system. It is set up as LaTex editor, whose graphic interface, in line with main commercial products, offers some additional specific functions with the capability to support the needs of users who are not able to manage verbal and symbolic codes on their own. LaTex is translated in real time into a standard symbolic language and it is read by vocal synthesis in natural language, in order to increase, through the bimodal representation, the ability to process information. The understanding of the mathematic formula through its reading is made possible by the deconstruction of the formula itself and its “tree” representation, so allowing to identify the logical elements composing it. Users, even without knowing LaTex language, are able to write whatever scientific document they need: in fact the symbolic elements are recalled by proper menus and automatically translated by the software managing the correct syntax. The final aim of the project, therefore, is to implement an editor enabling dyslexic people (but not only them) to manage mathematic formulae effectively, through the integration of different software tools, so allowing a better teacher/learner interaction too.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il problema relativo alla predizione, la ricerca di pattern predittivi all‘interno dei dati, è stato studiato ampiamente. Molte metodologie robuste ed efficienti sono state sviluppate, procedimenti che si basano sull‘analisi di informazioni numeriche strutturate. Quella testuale, d‘altro canto, è una tipologia di informazione fortemente destrutturata. Quindi, una immediata conclusione, porterebbe a pensare che per l‘analisi predittiva su dati testuali sia necessario sviluppare metodi completamente diversi da quelli ben noti dalle tecniche di data mining. Un problema di predizione può essere risolto utilizzando invece gli stessi metodi : dati testuali e documenti possono essere trasformati in valori numerici, considerando per esempio l‘assenza o la presenza di termini, rendendo di fatto possibile una utilizzazione efficiente delle tecniche già sviluppate. Il text mining abilita la congiunzione di concetti da campi di applicazione estremamente eterogenei. Con l‘immensa quantità di dati testuali presenti, basti pensare, sul World Wide Web, ed in continua crescita a causa dell‘utilizzo pervasivo di smartphones e computers, i campi di applicazione delle analisi di tipo testuale divengono innumerevoli. L‘avvento e la diffusione dei social networks e della pratica di micro blogging abilita le persone alla condivisione di opinioni e stati d‘animo, creando un corpus testuale di dimensioni incalcolabili aggiornato giornalmente. Le nuove tecniche di Sentiment Analysis, o Opinion Mining, si occupano di analizzare lo stato emotivo o la tipologia di opinione espressa all‘interno di un documento testuale. Esse sono discipline attraverso le quali, per esempio, estrarre indicatori dello stato d‘animo di un individuo, oppure di un insieme di individui, creando una rappresentazione dello stato emotivo sociale. L‘andamento dello stato emotivo sociale può condizionare macroscopicamente l‘evolvere di eventi globali? Studi in campo di Economia e Finanza Comportamentale assicurano un legame fra stato emotivo, capacità nel prendere decisioni ed indicatori economici. Grazie alle tecniche disponibili ed alla mole di dati testuali continuamente aggiornati riguardanti lo stato d‘animo di milioni di individui diviene possibile analizzare tali correlazioni. In questo studio viene costruito un sistema per la previsione delle variazioni di indici di borsa, basandosi su dati testuali estratti dalla piattaforma di microblogging Twitter, sotto forma di tweets pubblici; tale sistema include tecniche di miglioramento della previsione basate sullo studio di similarità dei testi, categorizzandone il contributo effettivo alla previsione.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computer-assisted translation (or computer-aided translation or CAT) is a form of language translation in which a human translator uses computer software in order to facilitate the translation process. Machine translation (MT) is the automated process by which a computerized system produces a translated text or speech from one natural language to another. Both of them are leading and promising technologies in the translation industry; it therefore seems important that translation students and professional translators become familiar with this relatively new types of technology. Whether used together, not only might these two different types of systems reduce translation time, but also lead to a further improvement in the field of translation technologies. The dissertation consists of four chapters. The first one surveys the chronological development of MT and CAT tools, the emergence of pre-editing, post-editing and controlled language and the very last frontiers in this sector. The second one provide a general overview on the four main CAT tools that are used nowadays and tested hereto. The third chapter is dedicated to the experimentations that have been conducted in order to analyze and evaluate the performance of the four integrated systems that are the core subject of this dissertation. Finally, the fourth chapter deals with the issue of terminological equivalence in interlinguistic translation. The purpose of this dissertation is not to provide an objective and definitive solution to the complex issues that arise at any time in the field of translation technologies, this aim being well away from being achieved, but to supply information about the limits and potentiality that are typical of those instruments which are now essential to any professional translator.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La tesi da me svolta durante questi ultimi sei mesi è stata sviluppata presso i laboratori di ricerca di IMA S.p.a.. IMA (Industria Macchine Automatiche) è una azienda italiana che naque nel 1961 a Bologna ed oggi riveste il ruolo di leader mondiale nella produzione di macchine automatiche per il packaging di medicinali. Vorrei subito mettere in luce che in tale contesto applicativo l’utilizzo di algoritmi di data-mining risulta essere ostico a causa dei due ambienti in cui mi trovo. Il primo è quello delle macchine automatiche che operano con sistemi in tempo reale dato che non presentano a pieno le risorse di cui necessitano tali algoritmi. Il secondo è relativo alla produzione di farmaci in quanto vige una normativa internazionale molto restrittiva che impone il tracciamento di tutti gli eventi trascorsi durante l’impacchettamento ma che non permette la visione al mondo esterno di questi dati sensibili. Emerge immediatamente l’interesse nell’utilizzo di tali informazioni che potrebbero far affiorare degli eventi riconducibili a un problema della macchina o a un qualche tipo di errore al fine di migliorare l’efficacia e l’efficienza dei prodotti IMA. Lo sforzo maggiore per riuscire ad ideare una strategia applicativa è stata nella comprensione ed interpretazione dei messaggi relativi agli aspetti software. Essendo i dati molti, chiusi, e le macchine con scarse risorse per poter applicare a dovere gli algoritmi di data mining ho provveduto ad adottare diversi approcci in diversi contesti applicativi: • Sistema di identificazione automatica di errore al fine di aumentare di diminuire i tempi di correzione di essi. • Modifica di un algoritmo di letteratura per la caratterizzazione della macchina. La trattazione è così strutturata: • Capitolo 1: descrive la macchina automatica IMA Adapta della quale ci sono stati forniti i vari file di log. Essendo lei l’oggetto di analisi per questo lavoro verranno anche riportati quali sono i flussi di informazioni che essa genera. • Capitolo 2: verranno riportati degli screenshoot dei dati in mio possesso al fine di, tramite un’analisi esplorativa, interpretarli e produrre una formulazione di idee/proposte applicabili agli algoritmi di Machine Learning noti in letteratura. • Capitolo 3 (identificazione di errore): in questo capitolo vengono riportati i contesti applicativi da me progettati al fine di implementare una infrastruttura che possa soddisfare il requisito, titolo di questo capitolo. • Capitolo 4 (caratterizzazione della macchina): definirò l’algoritmo utilizzato, FP-Growth, e mostrerò le modifiche effettuate al fine di poterlo impiegare all’interno di macchine automatiche rispettando i limiti stringenti di: tempo di cpu, memoria, operazioni di I/O e soprattutto la non possibilità di aver a disposizione l’intero dataset ma solamente delle sottoporzioni. Inoltre verranno generati dei DataSet per il testing di dell’algoritmo FP-Growth modificato.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research tests the hypothesis that knowledge of derivational morphology facilitates vocabulary acquisition in beginning adult second language learners. Participants were mono-lingual English-speaking college students aged 18 years and older enrolled inintroductory Spanish courses. Knowledge of Spanish derivational morphology was tested through the use of a forced-choice translation task. Spanish lexical knowledge was measured by a translation task using direct translation (English word) primes and conceptual (picture) primes. A 2x2x2 mixed factor ANOVA examined the relationships between morphological knowledge (strong, moderate), error type (form-based, conceptual), and prime type (direct translation, picture). The results are consistent with the existence of a relationship between knowledge of derivational morphology andacquisition of second language vocabulary. Participants made more conceptually-based errors than form-based errors F (1,22)=7.744, p=.011. This result is consistent with Clahsen & Felser’s (2006) and Ullman’s (2004) models of second language processing. Additionally, participants with Strong morphological knowledge made fewer errors onthe lexical knowledge task than participants with Moderate morphological knowledge t(23)=-2.656, p=.014. I suggest future directions to clarify the relationship between morphological knowledge and lexical development in adult second language learners.