746 results for Machine Vision


Relevance:

30.00%

Publisher:

Abstract:

In a marvelous but somewhat neglected paper, 'The Corporation: Will It Be Managed by Machines?', Herbert Simon articulated, from the perspective of 1960, his vision of what we now call the New Economy: the machine-aided system of production and management of the late twentieth century. Simon's analysis sprang from what I term the principle of cognitive comparative advantage: one has to understand the quite different cognitive structures of humans and machines (including computers) in order to explain and predict the tasks to which each will be most suited. Perhaps unlike Simon's better-known predictions about progress in artificial intelligence research, the predictions of this 1960 article hold up remarkably well and continue to offer important insights. In what follows I attempt to tell a coherent story about the evolution of machines and the division of labor between humans and machines. Although inspired by Simon's 1960 paper, I weave many other strands into the tapestry, from classical discussions of the division of labor to present-day evolutionary psychology. The basic conclusion is that, with growth in the extent of the market, we should see humans 'crowded into' tasks that call for the kinds of cognition for which humans have been equipped by biological evolution. These human cognitive abilities range from the exercise of judgment in situations of ambiguity and surprise to more mundane abilities in spatio-temporal perception and locomotion. Conversely, we should see machines 'crowded into' tasks with a well-defined structure. This conclusion is not based (merely) on a claim that machines, including computers, are specialized idiots-savants today because of the limits (whether temporary or permanent) of artificial intelligence; rather, it rests on a claim that, for what are broadly 'economic' reasons, it will continue to make economic sense to create machines that are idiots-savants.

Relevance:

30.00%

Publisher:

Abstract:

Automatic 2D-to-3D conversion is an important application for filling the gap between the increasing number of 3D displays and the still scant 3D content. However, existing approaches have an excessive computational cost that complicates their practical application. In this paper, a fast automatic 2D-to-3D conversion technique is proposed, which uses a machine learning framework to infer the 3D structure of a query color image from a training database of color and depth images. Assuming that photometrically similar images have analogous 3D structures, a depth map is estimated by searching for the most similar color images in the database and fusing the corresponding depth maps. Larger databases are desirable to achieve better results, but the computational cost also increases. A clustering-based hierarchical search, using compact SURF descriptors to characterize images, is proposed to drastically reduce search times. A significant improvement in computational time over other state-of-the-art approaches has been obtained while maintaining the quality of the results.
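
The retrieval-and-fusion scheme described above can be illustrated with a minimal sketch (not the authors' code): database descriptors are clustered offline, a query is first matched to the nearest cluster and then ranked only against that cluster's members, and the depth maps of the best matches are fused per pixel. Random vectors stand in for the compact SURF descriptors, and median fusion is an assumed choice of fusion rule.

    import numpy as np
    from sklearn.cluster import KMeans

    # --- Offline: build the cluster-based (hierarchical) index -------------
    # Placeholder descriptors; the paper characterizes images with compact
    # SURF descriptors, here random vectors stand in for them.
    rng = np.random.default_rng(0)
    n_images, dim, h, w = 1000, 64, 48, 64
    descriptors = rng.normal(size=(n_images, dim)).astype(np.float32)
    depth_maps = rng.uniform(0.0, 1.0, size=(n_images, h, w)).astype(np.float32)

    kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(descriptors)
    labels = kmeans.labels_

    # --- Online: coarse-to-fine search and depth fusion --------------------
    def estimate_depth(query_desc, k=8):
        # 1) Coarse step: pick the closest cluster centroid only.
        cluster = int(kmeans.predict(query_desc[None, :])[0])
        members = np.flatnonzero(labels == cluster)
        # 2) Fine step: rank only the members of that cluster.
        dists = np.linalg.norm(descriptors[members] - query_desc, axis=1)
        nearest = members[np.argsort(dists)[:k]]
        # 3) Fuse the retrieved depth maps (per-pixel median) as the estimate.
        return np.median(depth_maps[nearest], axis=0)

    query = rng.normal(size=dim).astype(np.float32)
    depth_estimate = estimate_depth(query)
    print(depth_estimate.shape)  # (48, 64)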

Relevance:

30.00%

Publisher:

Abstract:

Federal Aviation Administration, Atlantic City International Airport, N.J.

Relevance:

30.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

30.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

30.00%

Publisher:

Abstract:

Unlike traditional commerce, in online commerce the customer cannot touch or try the product. The purchase decision is made on the basis of the data made available by the seller through the title, descriptions, and images, and on the reviews left by previous customers. It is therefore possible to predict how well a product will sell from this information. Most of the solutions currently found in the literature make predictions based on reviews, or analyze the language used in descriptions to understand how it influences sales. Reviews, however, are not available to sellers before the product goes on the market; moreover, using only textual data neglects the influence of images. The goal of this thesis is to use machine learning models to predict the sales success of a product from the information available to the seller before commercialization. This is done by introducing a cross-modal model based on a Vision-Language Transformer that performs classification. A model of this kind can help sellers maximize the sales success of their products. Because the literature lacks datasets of products sold online that include an indication of sales success, the work also includes the construction of a dataset suitable for testing the developed solution. The dataset contains 78,300 fashion products sold on Amazon; for each product it reports the main information made available by the seller and a measure of market success, derived from buyers' ratings and from the product's position in a ranking based on the number of units sold.
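
As an illustration only (the thesis' actual architecture and data are not reproduced here), a cross-modal classifier of this kind could be sketched with a pretrained CLIP-style Vision-Language encoder and a small classification head over the concatenated image and text features; the checkpoint name, head size, and binary success target below are assumptions.

    import torch
    import torch.nn as nn
    from transformers import CLIPModel, CLIPProcessor

    class ProductSuccessClassifier(nn.Module):
        """Cross-modal classifier: product image + title/description -> success class."""

        def __init__(self, num_classes=2):
            super().__init__()
            # Pretrained Vision-Language encoder (illustrative checkpoint choice).
            self.encoder = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
            dim = self.encoder.config.projection_dim  # 512 for this checkpoint
            self.head = nn.Sequential(
                nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, num_classes)
            )

        def forward(self, pixel_values, input_ids, attention_mask):
            img = self.encoder.get_image_features(pixel_values=pixel_values)
            txt = self.encoder.get_text_features(
                input_ids=input_ids, attention_mask=attention_mask
            )
            return self.head(torch.cat([img, txt], dim=-1))

    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    model = ProductSuccessClassifier()
    # `image` is a PIL image of the product, `text` its title/description:
    # inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
    # logits = model(inputs["pixel_values"], inputs["input_ids"], inputs["attention_mask"])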

Relevance:

30.00%

Publisher:

Abstract:

Image captioning is a machine learning task that consists of generating a caption describing the characteristics of an input image. It can be applied, for example, to describe in detail the products for sale on an e-commerce site, improving the accessibility of the website and enabling more informed purchases by visually impaired customers. Generating accurate descriptions for online fashion items is important not only to improve customers' shopping experiences but also to increase online sales. Beyond the need to present item attributes correctly, describing products with the right language can help capture customers' attention. In this thesis, our goal is to develop a system capable of generating a caption that describes in detail the input image of a fashion product, whether a garment or some kind of accessory. In recent years, many studies have proposed solutions based on convolutional networks and LSTMs. In this project we instead propose an encoder-decoder architecture that uses the Vision Transformer model to encode images and GPT-2 to generate the text. We also study how deep metric learning techniques, applied end-to-end during training, affect the metrics and the quality of the captions generated by our model.
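
A minimal sketch of such an encoder-decoder captioner, wiring a pretrained Vision Transformer encoder to a GPT-2 decoder with the Hugging Face transformers library; the specific checkpoints and generation settings are illustrative choices, not the ones used in the thesis, and the deep metric learning component is omitted.

    from transformers import (
        VisionEncoderDecoderModel,
        ViTImageProcessor,
        GPT2TokenizerFast,
    )

    # Tie a ViT encoder to a GPT-2 decoder (checkpoints are illustrative choices).
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        "google/vit-base-patch16-224-in21k", "gpt2"
    )
    image_processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    # GPT-2 has no padding token; reuse EOS and tell the model which ids to use.
    tokenizer.pad_token = tokenizer.eos_token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    # Caption generation for a PIL image `image` of a fashion product:
    # pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
    # ids = model.generate(pixel_values, max_new_tokens=32)
    # caption = tokenizer.decode(ids[0], skip_special_tokens=True)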

Relevance:

30.00%

Publisher:

Abstract:

Although the debate over what data science is has a long history and has not yet reached complete consensus, Data Science can be summarized as the process of learning from data. Guided by this vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task to support research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset, which collects pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze the error messages produced by failed transfers and propose a machine learning pipeline that leverages the word2vec language model and K-means clustering. This yields groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing a promising ability to understand the message content and provide meaningful groupings, in line with incidents previously reported by human operators.
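
The error-clustering pipeline of the second part can be sketched as follows (toy messages and illustrative parameters, not the ATLAS data): word2vec embeddings are learned on the message corpus, each message is represented here by the mean of its word vectors (an assumed aggregation step), and K-means groups similar messages for the operators.

    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    # Toy stand-ins for error messages of failed data transfers.
    messages = [
        "connection timed out while contacting destination storage",
        "checksum mismatch detected after transfer",
        "destination storage unreachable connection refused",
        "source file not found on storage element",
    ]
    tokenized = [m.split() for m in messages]

    # Learn word embeddings on the error-message corpus.
    w2v = Word2Vec(sentences=tokenized, vector_size=50, window=3, min_count=1, epochs=50)

    # Represent each message as the mean of its word vectors (illustrative choice).
    def embed(tokens):
        return np.mean([w2v.wv[t] for t in tokens], axis=0)

    X = np.stack([embed(t) for t in tokenized])

    # Group similar messages; each cluster is a candidate issue for the operators.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    for msg, cluster in zip(messages, km.labels_):
        print(cluster, msg)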

Relevance:

30.00%

Publisher:

Abstract:

One of the most visionary goals of Artificial Intelligence is to create a system able to mimic and eventually surpass the intelligence observed in biological systems including, ambitiously, that observed in humans. The main distinctive strength of humans is their ability to build a deep understanding of the world by learning continuously and drawing from their experiences. This ability, which is found to various degrees in all intelligent biological beings, allows them to adapt and properly react to changes by incrementally expanding and refining their knowledge. Arguably, achieving this ability is one of the main goals of Artificial Intelligence and a cornerstone towards the creation of intelligent artificial agents. Modern Deep Learning approaches have allowed researchers and industries to achieve great advancements towards the resolution of many long-standing problems in areas like Computer Vision and Natural Language Processing. However, while this current age of renewed interest in AI has allowed for the creation of extremely useful applications, a concerningly limited effort is being directed towards the design of systems able to learn continuously. The biggest problem that hinders an AI system from learning incrementally is the catastrophic forgetting phenomenon. This phenomenon, which was discovered in the 1990s, naturally occurs in Deep Learning architectures when classic learning paradigms are applied to learning incrementally from a stream of experiences. This dissertation revolves around the field of Continual Learning, a sub-field of Machine Learning research that has recently made a comeback following the renewed interest in Deep Learning approaches. This work focuses on a comprehensive view of continual learning, considering the algorithmic, benchmarking, and applicative aspects of the field. The dissertation also touches on community aspects such as the design and creation of research tools aimed at supporting Continual Learning research, as well as the theoretical and practical aspects of public competitions in this field.

Relevance:

30.00%

Publisher:

Abstract:

Vision systems are powerful tools that play an increasingly important role in modern industry for detecting errors and maintaining product standards. With the increased availability of affordable industrial cameras, computer vision algorithms have been applied ever more widely to monitor industrial manufacturing processes. Until a few years ago, industrial computer vision applications relied only on ad-hoc algorithms designed for the specific object and acquisition setup being monitored, with a strong focus on co-designing the acquisition and processing pipeline. Deep learning has overcome these limits, providing greater flexibility and faster re-configuration. In this work, the process to be inspected is the formation of packs of vials entering a freeze-dryer, a common scenario in pharmaceutical active-ingredient packaging lines. To ensure that the machine produces proper packs, a vision system is installed at the entrance of the freeze-dryer to detect possible anomalies, with execution times compatible with the production specifications. Further constraints come from the sterility and safety standards required in pharmaceutical manufacturing. This work presents an overview of the production line, with particular focus on the vision system designed, and of the trials conducted to reach the final performance. Transfer learning, which alleviates the need for large amounts of training data, combined with data augmentation methods consisting in the generation of synthetic images, was used to effectively increase performance while reducing the cost of data acquisition and annotation. The proposed vision algorithm is composed of two main subtasks, designed for vial counting and discrepancy detection, respectively. The first was trained on more than 23k vials (about 300 images) and tested on 5k more (about 75 images), whereas 60 training images and 52 testing images were used for the second.
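
A minimal sketch of the transfer-learning-plus-augmentation recipe, applied here to the discrepancy-detection subtask: a pretrained ImageNet backbone is frozen and only a new binary head is trained, while standard geometric and photometric augmentations stand in for the synthetic-image generation. The backbone, augmentations, and hyperparameters are assumptions, not the deployed system.

    import torch
    import torch.nn as nn
    from torchvision import models, transforms

    # Data augmentation: geometric and photometric jitter stands in for the
    # synthetic-image generation used to enlarge the training set.
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.RandomAffine(degrees=5, translate=(0.05, 0.05)),
        transforms.ToTensor(),
    ])

    # Transfer learning: reuse an ImageNet backbone, retrain only a small head
    # for the binary "pack is conforming / pack shows a discrepancy" decision.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, 2)  # new trainable classifier head

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    # Training then iterates over augmented pack images:
    # logits = model(batch_images); loss = criterion(logits, batch_labels)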

Relevance:

30.00%

Publisher:

Abstract:

In recent years machine learning has gained increasing popularity in scientific research and its applications. The aim of this thesis was to study machine learning in its general aspects and to apply it to computer vision problems. The thesis addresses the difficulty of explaining, from a theoretical point of view, the algorithms underlying convolutional neural networks, and then tackles two concrete image recognition problems: the MNIST dataset (images of handwritten digits) and a dataset referred to as the "MELANOMA dataset" (images of melanomas and healthy nevi). Using the techniques explained in the theoretical section, satisfactory results were obtained for both datasets, reaching an accuracy of 98% on MNIST and 76.8% on the MELANOMA dataset.
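
For illustration, a small convolutional network of the kind used for the MNIST task might look like the sketch below; the layer sizes are assumptions rather than the exact architecture of the thesis.

    import torch
    import torch.nn as nn

    # A small CNN for 28x28 grayscale digits (illustrative layer sizes).
    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)

        def forward(self, x):            # x: (batch, 1, 28, 28)
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = SmallCNN()
    logits = model(torch.randn(8, 1, 28, 28))
    print(logits.shape)  # torch.Size([8, 10])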

Relevance:

20.00%

Publisher:

Abstract:

To evaluate the use of optical and nonoptical aids during reading and writing activities in individuals with acquired low vision. This study was performed using descriptive and cross-sectional surveys. The data collection instrument was created with structured questions that were developed from an exploratory study and a previous interview-based test, and it evaluated the following variables: personal characteristics, use of optical and nonoptical aids, and activities that required the use of optical and nonoptical aids. The study population included 30 subjects with acquired low vision and visual acuities of 20/200-20/400. Most subjects (60.0%) reported the use of some optical aid. Of these, the majority (83.3%) cited spectacles as the most widely used optical aid. The majority of subjects (63.3%) also reported the use of nonoptical aids, the most frequent being letter magnification (68.4%), followed by bringing objects closer to the eyes (57.8%). Subjects often used more than one nonoptical aid. The majority of participants reported the use of optical and nonoptical aids during reading activities, highlighting the use of spectacles, magnifying glasses, and letter magnification; however, even with these aids, the subjects often needed to read a text more than once to understand it. During writing activities, all subjects reported the use of optical aids, while most stated that they did not use nonoptical aids for such activities.

Relevance:

20.00%

Publisher:

Abstract:

PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using spectral-domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP), and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and the global indices of SAP. Subsequently, the following MLCs were tested using parameters from SD-OCT and SAP: Bagging (BAG), Naive Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), AdaBoost M1 (ADA), Support Vector Machine Linear (SVML), and Support Vector Machine Gaussian (SVMG). The areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with those of MLCs using OCT+SAP data. RESULTS: Combining OCT and SAP data, the MLCs' aROCs varied from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC, obtained with RAN (0.946), was significantly larger than that of the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved diagnostic accuracy compared with OCT data alone.
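
As a rough illustration of the evaluation setup, the sketch below trains the best-performing classifier family reported above (Random Forest) and computes an aROC with scikit-learn; the features are synthetic stand-ins for the SD-OCT and SAP parameters, not the study data.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the combined SD-OCT + SAP feature table
    # (110 eyes in the study: 62 glaucoma, 48 healthy; here generated at random).
    X, y = make_classification(n_samples=110, n_features=20, weights=[0.44, 0.56],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                              random_state=0)

    # Random Forest on the combined feature set, evaluated with the area
    # under the ROC curve (aROC).
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print("aROC:", roc_auc_score(y_te, scores))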

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVES: To assess the influence of Nd:YAG (neodymium:yttrium-aluminum-garnet) laser unilateral posterior capsulotomy on visual acuity and patients' perception of difficulties with vision-related activities of daily life. METHODS: We conducted an interventional survey that included 48 patients between 40 and 80 years of age with uni- or bilateral pseudophakia, posterior capsule opacification, and visual acuity <0.30 (logMAR) in one eye who were seen at a Brazilian university hospital. All patients underwent posterior capsulotomy using an Nd:YAG laser. Before and after the intervention, patients were asked to complete a questionnaire that was developed in an exploratory study. RESULTS: Before posterior capsulotomy, the median visual acuity (logMAR) of the included patients was 0.52 (range 0.30-1.60). After posterior capsulotomy, the median visual acuity of the included patients improved to 0.10 (range 0.0-0.52). According to the subjects' perceptions, their ability to perform most of their daily life activities improved after the intervention (p<0.05). CONCLUSIONS: After patients underwent posterior capsulotomy with an Nd:YAG laser, a significant improvement in the visual acuity of the treated eye was observed. Additionally, subjects felt that they experienced less difficulty performing most of their vision-dependent activities of daily living.

Relevance:

20.00%

Publisher:

Abstract:

Electroretinography (ERG) is an objective, non-invasive diagnostic method for assessing retinal function and detecting, at an early stage and in several species, lesions of the outer retinal layers. The most common indications for ERG in dogs are pre-surgical evaluation of cataract patients and characterization of blinding disorders; the dog also serves as an important model for the study of the retinal dystrophies that affect humans. Several factors can alter the ERG, such as the electroretinograph, the light stimulation source, the type of electrode, dark-adaptation time, pupil size, media opacity, and the sedation or anesthesia protocol, in addition to species, breed, and age. The aim of this study was to standardize the ERG for dogs under sedation, following the protocol of the International Society for Clinical Electrophysiology of Vision (ISCEV) and using a Ganzfeld stimulator and Burian-Allen electrodes. A total of 233 electroretinograms were performed in dogs, 147 females and 86 males, aged between one and 14 years. Of the 233 dogs examined, 100 had cataracts at different stages of maturation, 72 were diabetic with mature or hypermature cataracts, 26 had electroretinograms compatible with progressive retinal degeneration, three had electroretinograms compatible with sudden acquired retinal degeneration syndrome, and 32 showed no retinal lesion capable of attenuating the ERG responses and were considered normal with respect to retinal function. Sedation produced good immobilization of the patient without rotation of the eye globe, allowing adequate bilateral retinal stimulation with the aid of the Ganzfeld. The Veris electrodiagnostic system successfully recorded, simultaneously from both eyes, the five responses recommended by the ISCEV. Since full-field ERG has become a fundamental examination in the ophthalmological routine, its standardization is indispensable when comparing results from different laboratories. The reliability and reproducibility of this protocol were demonstrated by obtaining high-quality recordings using the standard ISCEV protocol, the Veris electroretinograph, the Ganzfeld, and Burian-Allen electrodes in sedated dogs.