951 results for Visual Speech Recognition, Multiple Views, Frontal View, Profile View
Abstract:
These are the full proceedings of the conference.
Abstract:
The purpose of this research was to study some facial analyses used for orthodontic diagnosis and to assess the agreement between the lateral and frontal views in the evaluation of facial pleasantness for the lay and professional groups, the agreement between these groups in the evaluation of facial pleasantness in the lateral and frontal views, and the association between facial pleasantness and the Golden Proportion, between facial pleasantness and Facial Pattern, and between Facial Pattern and the Golden Proportion. A total of 208 standardized facial photographs (104 lateral and 104 frontal) of 104 randomly selected individuals were used; they were first classified as pleasant, acceptable, or unpleasant by two distinct groups: an Orthodontics group and a Laypersons group. The lateral and frontal photographs were subjected to Facial Golden Proportion measurements using a computer program, and the individuals were classified by Facial Pattern according to their lateral appearance. After statistical analysis, no agreement was found among the facial pleasantness evaluation variables studied, and there was no association between the Golden Proportion and either facial pleasantness or Facial Pattern. Between facial pleasantness and Facial Pattern, a strongly positive association was observed for the lateral view, whereas for the frontal view there was no association for either group of evaluators.
Abstract:
This paper discusses the first of three studies which collectively represent a convergence of two ongoing research agendas: (1) the empirically-based comparison of the effects of evaluation environment on mobile usability evaluation results; and (2) the effect of environment - in this case lobster fishing boats - on achievable speech-recognition accuracy. We describe, in detail, our study and outline our results to date based on preliminary analysis. Broadly speaking, the potential for effective use of speech for data collection and vessel control looks very promising - surprisingly so! We outline our ongoing analysis and further work.
Abstract:
In this paper, we consider the task of recognizing epigraphs in images such as photos taken with mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-nearest-neighbor approach to perform the recognition. The contribution of this work is an evaluation of state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors (VLAD), obtained by aggregating SIFT descriptors, is the best choice for this task.
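As a rough illustration of the technique named in this abstract, the sketch below aggregates local descriptors into a VLAD vector and retrieves the nearest database entry. It is not the authors' pipeline: random 128-dimensional vectors stand in for SIFT descriptors, the codebook is random rather than learned with k-means, and the names `vlad`, `knn_labels`, `epigraph_a` and `epigraph_b` are hypothetical.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into one VLAD vector of length k*d."""
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    k, d = centroids.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Sum of residuals between descriptors and their assigned centroid.
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    # Signed square-root (power) normalisation followed by L2 normalisation.
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def knn_labels(query, database, labels, k=1):
    """Return the labels of the k database vectors closest to the query."""
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)[:k]
    return [labels[i] for i in order]

# Synthetic stand-ins (assumption): real use would extract SIFT descriptors
# from photos and learn the centroids with k-means on a training set.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(4, 128))
img_a = rng.normal(size=(50, 128))
img_b = rng.normal(size=(60, 128)) + 3.0
database = np.stack([vlad(img_a, centroids), vlad(img_b, centroids)])
labels = ["epigraph_a", "epigraph_b"]

# A slightly perturbed copy of img_a plays the role of a new photo.
query = vlad(img_a + 0.01 * rng.normal(size=img_a.shape), centroids)
print(knn_labels(query, database, labels, k=1))
```

Each photo collapses to a single fixed-length vector, so recognition reduces to a nearest-neighbor search over those vectors.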
Abstract:
In recent years, the remarkably rapid advance of technology has led to the development and spread of portable electronic devices that are extremely small and, at the same time, have considerable computational capacity. More specifically, one category of devices that is currently under strong development and has already appeared on the world market is that of wearable devices. As the name suggests, these are designed to be literally worn and are intended to provide continuous support, in various settings, to those who use them. If the user does not necessarily have to use their hands to interact with them, they are called hands-free wearable devices. These are generally able to perceive and capture the user's input through various techniques and methods that are not based on touch. One of these models the user's input through their voice, relying on the discipline of ASR (Automatic Speech Recognition), which deals with the translation of spoken language into text by means of computerized devices. This leads to the objective of the thesis, which is to develop a framework, usable with wearable devices, that provides a speech recognition service by relying on an existing one, so that it offers a certain level of efficiency and ease of use. More generally, this document aims to provide an in-depth description of wearable and hands-free wearable devices, defining their characteristics, critical issues, and areas of use. In addition, it aims to illustrate the operating principles of Automatic Speech Recognition and then to move on to the analysis, design, and development of the framework mentioned above.
Abstract:
Through the engravings published in the pages of newspapers that included graphic humor in their editions during the War of the Pacific (1879-1883), Chilean caricaturists deployed an aggressive visual discourse in a patriotic and bellicose key, presenting their readers with a critical and contemptuous image of Chile's adversaries. They emphasized the adversaries' supposed lack of spirit and fighting courage at the mere presence of Chilean military forces, both at sea and on land. Thus ink and paper became another of the weapons that took part in Chile's conflict with Peru and Bolivia over possession of the rich nitrate territories of Tarapacá and Antofagasta. The images were interpreted on the basis of the postulates of the Warburg School, especially those of Erwin Panofsky, who proposes three levels of study of the meaning of each work, namely the "pre-iconographic description", then the "iconographic study" as such and, finally, the "iconological interpretation".
Abstract:
The ability to understand written texts, i.e., to construct a coherent mental representation of text content, is a necessary prerequisite for successful development in and out of school. It is therefore a central concern of the education system to diagnose reading difficulties early and to address them with targeted intervention programs. This requires comprehensive knowledge of the cognitive component processes underlying reading comprehension, their interrelations, and their development. The present dissertation aims to contribute to a comprehensive understanding of reading comprehension by experimentally investigating a selection of open questions. Study 1 examines the extent to which phonological recoding and orthographic decoding skills contribute to sentence and text comprehension and how both skills develop in German primary school children from Grade 2 to Grade 4. The results suggest that both skills make significant and independent contributions to reading comprehension and that their relative contribution does not change across grade levels. In addition, even German second graders recognize the majority of written words in age-appropriate texts via orthographic comparison processes. Nevertheless, German primary school children apparently continue to use phonological information to optimize visual word recognition. Study 2 extends previous empirical research on one of the best-known models of reading comprehension, the Simple View of Reading (SVR, Gough & Tunmer, 1986). The study tests the SVR (Reading comprehension = Decoding x Comprehension) using optimized and methodologically rigorous measures of the model's constituents and examines its generalizability to German third and fourth graders.
Study 2 shows that the SVR does not withstand a methodologically rigorous test and cannot readily be generalized to German third and fourth graders. Only weak evidence was found for a multiplicative combination of decoding (D) and listening comprehension (C) skills. The fact that a considerable portion of the variance in reading comprehension (R) could not be explained by D and C suggests that the model is incomplete and may need to be supplemented with additional components. Study 3 examines the processing of positive-causal and negative-causal coherence relations in German first to fourth graders and adults in reading and listening comprehension. In line with the Cumulative Cognitive Complexity approach (Evers-Vermeul & Sanders, 2009; Spooren & Sanders, 2008), Study 3 shows that processing negative-causal coherence relations and connectives is cognitively more demanding than processing positive-causal relations. Moreover, comprehension of both coherence relations continues to develop throughout primary school and, for negative-causal relations, is not yet complete at the end of Grade 4. Study 4 demonstrates and discusses the usefulness of process-oriented reading tests such as ProDi-L (Richter et al., in press), which selectively assess individual differences in the cognitive component skills of reading comprehension. To this end, the construct validity of the ProDi-L subtest 'Syntactic Integration' is demonstrated as an example. Explanatory item response models show that the test assesses syntactic integration skills separately and can identify children with deficient syntactic skills.
The reported findings contribute to a comprehensive understanding of the cognitive component skills of reading comprehension, which is essential for the optimal design of reading instruction and for the creation of learning materials, reading instructions, and textbooks. Furthermore, this understanding provides the basis for a meaningful diagnosis of individual reading difficulties and for the design of adaptive, targeted intervention programs to promote reading comprehension in poor readers.
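One common way to probe a multiplicative claim such as the Simple View of Reading product R = D x C is to compare an additive regression with one that also includes the product term. The sketch below does this on synthetic data. It is an illustrative assumption, not the dissertation's analysis: the scores, noise level, and the helper `r_squared` are all made up for the example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

# Synthetic scores (assumption): D = decoding, C = listening comprehension,
# R = reading comprehension generated as a noisy product of the two.
rng = np.random.default_rng(0)
D = rng.uniform(0, 1, 200)
C = rng.uniform(0, 1, 200)
R = D * C + rng.normal(0, 0.05, 200)

ones = np.ones_like(D)
additive = np.column_stack([ones, D, C])            # R ~ D + C
with_product = np.column_stack([ones, D, C, D * C])  # R ~ D + C + D*C

r2_additive = r_squared(additive, R)
r2_product = r_squared(with_product, R)
print(f"additive R^2:     {r2_additive:.3f}")
print(f"with product R^2: {r2_product:.3f}")
```

A clear gain in R^2 from the product term would support a multiplicative combination, while a negligible gain would correspond to the weak evidence the abstract reports. Variance left unexplained by either model would point to missing components.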
Abstract:
This study addresses the question of how scenario analysis could help acquiring companies reduce uncertainty in the acquisition process. It was motivated by the mismatch between the academic world's caveat emptor and the business world's eagerness to pursue acquisitions. Acquisitions are as popular as ever, so managing the uncertainty surrounding these transactions is relevant. The study creates a generic theoretical model with a strategy-level scope; it neither discusses nor seeks answers to operational issues in either field. The study is explorative and constructivist in nature. It briefly discusses the concepts and relatedness of risk and uncertainty and establishes a hierarchy between the two: risk is a "sub-section" of uncertainty, although without clear boundaries. Acquisition theory follows the process view, which understands acquisitions as a process with various levels, some strategic and some operational. Scenario analysis is presented as a tool for management to enrich its strategic discussion and understand its future options. The empirical data were collected through interviews. The results are reflected against the literature on strategic management, the scenario literature, and a consultancy's report picturing firms' strategies in accordance with their acquisition processes. The study takes an abductive approach, as it tries to combine multiple views and generates discussion between the literature review, the interviews, the report, and a second round of literature. The model suggests three propositions. First, at the strategic decision-making level, once the decision whether or not to pursue an acquisition growth strategy has been made, scenario analysis provides firms with new data and enriches the strategic discussion. Second, once the acquisition strategy has been created, scenario analysis can be applied as a tool to measure possible acquisition targets against the backdrop of the first set of scenarios.
Third, because scenario analysis requires the inclusion of people with various backgrounds and from multiple levels of the corporate hierarchy, it could help managers avoid biases stemming from hubris.
Abstract:
The memristor is one of the basic components of electronics, alongside the resistor, the capacitor, and the inductor. It is a passive component whose theory was developed by Leon Chua in 1971. However, it took more than thirty years before the theory could be linked to experimental results. In 2008 Hewlett Packard published an article in which they claimed to have fabricated the first working memristor. The memristor, or memory resistor, is a resistive component whose resistance value can be changed. As its name suggests, the memristor can also retain its resistance value without continuous current and voltage. Typically a memristor has at least two resistance values, each of which can be selected by applying voltage or current to the component. For this reason memristors are often called resistive switches. Resistive switches are currently studied extensively, particularly because of the memory technology they enable. Memory built from resistive switches is called ReRAM (short for resistive random access memory). Like Flash memory, ReRAM is a non-volatile memory that can be electrically programmed or erased. Flash memory is currently used, for example, in USB memory sticks. ReRAM, however, enables faster and lower-power operation than Flash, so it is a serious future competitor on the market. ReRAM also makes it possible to store several bits in a single memory cell instead of operating in a binary ("0" or "1") fashion. Typically a ReRAM memory cell has two limiting resistance values, but it may be possible to program several states between these two. Memory cells can be called analog if the number of states is not limited. Analog memory cells would make it possible to build, for example, neural networks efficiently. Neural networks aim to model the functioning of the brain and to perform tasks that are typically difficult for conventional computer programs.
Neural networks are used, for example, in speech recognition and in artificial intelligence implementations. This master's thesis examines the analog operation of a Ta2O5-based ReRAM memory cell with its suitability for neural networks in mind. The fabrication of the ReRAM memory cell and the measurement results are presented. The operation of a memory cell is rarely fully analog, because there is often a limited number of states between the two limiting resistance values. For this reason the operation is called pseudo-analog. The measurement results show that a single ReRAM memory cell is well capable of binary operation. To some extent a single cell can store multiple states, but the resistance values vary greatly between successive programming cycles, which complicates interpretation. The fabricated ReRAM memory cell cannot operate as a pseudo-analog memory as such; it requires a current-limiting component alongside it. Developing the fabrication process would also reduce the variance in the operation of a single cell, so that its operation would more closely resemble that of a pseudo-analog memory.
Abstract:
The thesis work presented here grew out of a collaboration with Macao Polytechnic; the contacts are Prof. Rita Tse, Prof. Marcus Im, and Prof. Su-Kit Tang. The objective is to create an Italian-Chinese machine translation model and to observe its behavior, in order to determine whether or not the undertaking is feasible. The work explores the topic known as Natural Language Processing (NLP) and thus falls within the field of machine translation: services that, with the aid of artificial intelligence, are able to process natural language and then interpret and translate it. NLP is a branch of computing that combines computer science, artificial intelligence, and the study of languages. From a research standpoint, the biggest challenges in this field involve speech recognition, natural-language understanding, and natural-language generation. The current state of the art was defined by the article "Attention is all you need" (Vaswani et al., 2017), presented in 2017 by researchers at Google and the University of Toronto. The best-known and most widely used machine translation models at present are Neural Machine Translators (NMT), i.e., models that use deep artificial neural networks to perform translations or predictions. The quality of the translations is particularly good, almost reaching that of a human translation. The work therefore focuses largely on the study and use of NMT, with the aim of proposing a functional model that performs as well as possible in translating from Italian to Chinese and vice versa.
Abstract:
The recording and processing of voice data raises increasing privacy concerns for users and service providers. One way to address these issues is to move processing onto the edge device, closer to the recording, so that potentially identifiable information is not transmitted over the internet. However, this is often not possible due to hardware limitations. An interesting alternative is the development of voice anonymization techniques that remove individual speakers' characteristics while preserving linguistic and acoustic information in the data. In this work, a state-of-the-art approach to sequence-to-sequence speech conversion, initially based on x-vectors and bottleneck features for automatic speech recognition, is explored to disentangle the two kinds of acoustic information using different pre-trained speech and speaker representations. Furthermore, different strategies for selecting target speech representations are analyzed. Results on public datasets in terms of equal error rate and word error rate show that good privacy is achieved with limited impact on converted speech quality relative to the original method.
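The two metrics named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code used in the work. The scores and sentences are invented, the function names are hypothetical, the EER is approximated by sweeping thresholds over the observed scores, and the WER assumes a non-empty reference.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER; a higher score means 'more likely the same speaker'."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(nontarget >= t)  # different-speaker trials accepted
        frr = np.mean(target < t)      # same-speaker trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)

# Made-up verification scores and transcripts, for illustration only.
eer = equal_error_rate([0.9, 0.6, 0.4], [0.5, 0.3, 0.1])
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(eer, wer)
```

In anonymization evaluations, a higher EER for a speaker-verification attacker generally indicates better privacy, while a lower WER indicates that the linguistic content survived the conversion.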
Abstract:
The aim of this study was to compare the learning process of a highly complex ballet skill following demonstrations of point-light and video models. Sixteen participants, divided into point-light and video groups (ns = 8), performed 160 trials of a pirouette, equally distributed in blocks of 20 trials, alternating periods of demonstration and practice, with a retention test a day later. Measures of head and trunk oscillation, coordination disparity from the model, and movement time difference showed similarities between the video and point-light groups; ballet experts' evaluations indicated superiority of performance in the video group over the point-light group. Results are discussed in terms of the task requirements of dissociation between head and trunk rotations, focusing on the hypothesis of sufficiency and higher relevance of information contained in biological motion models applied to the learning of complex motor skills.
Abstract:
Additional neurological features have recently been described in seven families transmitting pathogenic mutations in OPA1, the most common cause of autosomal dominant optic atrophy. However, the frequency of these syndromal 'dominant optic atrophy plus' variants and the extent of neurological involvement have not been established. In this large multi-centre study of 104 patients from 45 independent families, including 60 new cases, we show that extra-ocular neurological complications are common in OPA1 disease, and affect up to 20% of all mutational carriers. Bilateral sensorineural deafness beginning in late childhood and early adulthood was a prominent manifestation, followed by a combination of ataxia, myopathy, peripheral neuropathy and progressive external ophthalmoplegia from the third decade of life onwards. We also identified novel clinical presentations with spastic paraparesis mimicking hereditary spastic paraplegia, and a multiple sclerosis-like illness. In contrast to initial reports, multi-system neurological disease was associated with all mutational subtypes, although there was an increased risk with missense mutations (odds ratio = 3.06, 95% confidence interval = 1.44-6.49; P = 0.0027), and mutations located within the guanosine triphosphatase region (odds ratio = 2.29, 95% confidence interval = 1.08-4.82; P = 0.0271). Histochemical and molecular characterization of skeletal muscle biopsies revealed the presence of cytochrome c oxidase-deficient fibres and multiple mitochondrial DNA deletions in the majority of patients harbouring OPA1 mutations, even in those with isolated optic nerve involvement. However, the cytochrome c oxidase-deficient load was over four times higher in the dominant optic atrophy plus group compared to the pure optic neuropathy group, implicating a causal role for these secondary mitochondrial DNA defects in disease pathophysiology.
Individuals with dominant optic atrophy plus phenotypes also had significantly worse visual outcomes, and careful surveillance is therefore mandatory to optimize the detection and management of neurological disability in a group of patients who already have significant visual impairment.
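For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, the sketch below computes both from a 2x2 table using the standard Wald interval on the log scale. The counts are hypothetical, since the abstract does not report the underlying table, and the abstract does not state which interval method the authors used. The last lines only check that the reported missense figures are numerically consistent with a log-symmetric interval.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts (assumption), chosen only to exercise the function.
odds_ratio, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")

# Consistency check on the reported missense result (OR 3.06, CI 1.44-6.49):
# for a log-symmetric interval the geometric mean of the bounds is near the OR.
geometric_mean = math.sqrt(1.44 * 6.49)
implied_se = (math.log(6.49) - math.log(1.44)) / (2 * 1.96)
print(f"geometric mean of bounds = {geometric_mean:.2f}, implied SE = {implied_se:.3f}")
```

An interval that excludes 1, as both reported intervals do, corresponds to a statistically significant association at the 5% level.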
Abstract:
Speech understanding disorders in the elderly may be due to peripheral or central auditory dysfunctions. Asymmetry of results in dichotic testing increases with age, and may reflect a lack of inter-hemisphere transmission and cognitive decline. Aim: To investigate auditory processing in aged people with no hearing complaints. Study design: clinical prospective. Materials and Methods: Twenty-two volunteers, aged between 55 and 75 years, were evaluated. They reported no hearing complaints and had maximum auditory thresholds of 40 dB HL up to 4 kHz, minimum speech recognition scores of 80%, and peripheral symmetry between the ears. We used two kinds of tests: speech in noise and dichotic alternated dissyllables (SSW). Results were compared between males and females, between right and left ears, and between age groups. Results: There were no significant differences between genders in either test. Left ears showed worse results in the competitive condition of the SSW. Individuals aged 65 or older performed more poorly than those aged 55 to 64. Conclusion: Central auditory tests showed worse performance with aging. The use of a dichotic test in the auditory evaluation of the elderly may help in the early identification of degenerative processes, which are common among these patients.