948 results for biometria, impronte digitali, estrazione minuzie, ground truth
Abstract:
This thesis explores the problem of mobile robot navigation in dense human crowds. We begin by considering a fundamental impediment to classical motion planning algorithms called the freezing robot problem: once the environment surpasses a certain level of complexity, the planner decides that all forward paths are unsafe, and the robot freezes in place (or performs unnecessary maneuvers) to avoid collisions. Since a feasible path typically exists, this behavior is suboptimal. Existing approaches have focused on reducing predictive uncertainty by employing higher-fidelity individual dynamics models or heuristically limiting the individual predictive covariance to prevent overcautious navigation. We demonstrate that both the individual prediction and the individual predictive uncertainty have little to do with this undesirable navigation behavior. Additionally, we provide evidence that dynamic agents are able to navigate in dense crowds by engaging in joint collision avoidance, cooperatively making room to create feasible trajectories. We accordingly develop interacting Gaussian processes, a prediction density that captures cooperative collision avoidance, and a "multiple goal" extension that models the goal-driven nature of human decision making. Navigation naturally emerges as a statistic of this distribution.
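A minimal numerical sketch of the interacting-Gaussian-process idea, assuming independent GP trajectory priors per agent that are reweighted by a hypothetical pairwise interaction potential; the squared-exponential kernel, the potential, and all parameter values below are illustrative assumptions, not the thesis's implementation:

    import numpy as np

    rng = np.random.default_rng(0)

    def gp_posterior(t_obs, x_obs, t_pred, ell=2.0, sf=1.0, noise=0.1):
        """Posterior mean/covariance of a 1-D GP with a squared-exponential kernel."""
        k = lambda a, b: sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
        K = k(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
        Ks, Kss = k(t_pred, t_obs), k(t_pred, t_pred)
        A = Ks @ np.linalg.solve(K, np.eye(len(t_obs)))
        mean = A @ x_obs
        cov = Kss - A @ Ks.T
        cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(t_pred))   # symmetrize, jitter
        return mean, cov

    def interaction_weight(paths, safety=0.8):
        """Down-weight joint samples in which any two agents come too close."""
        w, n = 1.0, len(paths)
        for i in range(n):
            for j in range(i + 1, n):
                d = np.abs(paths[i] - paths[j])          # 1-D distance over time
                w *= np.prod(1.0 - np.exp(-(d / safety)**2))
        return w

    # toy data: robot (agent 0) and one pedestrian, observed 1-D positions over time
    t_obs, t_pred = np.array([0.0, 1.0, 2.0]), np.linspace(3.0, 6.0, 8)
    observations = [np.array([0.0, 0.5, 1.0]),      # robot drifting right
                    np.array([4.0, 3.3, 2.6])]      # pedestrian approaching

    samples, weights = [], []
    for _ in range(500):
        joint = []
        for x_obs in observations:
            mean, cov = gp_posterior(t_obs, x_obs, t_pred)
            joint.append(rng.multivariate_normal(mean, cov))
        samples.append(joint)
        weights.append(interaction_weight(joint))

    weights = np.array(weights) / np.sum(weights)
    # navigation "emerges as a statistic": take the weighted mean robot path
    robot_plan = np.sum(weights[:, None] * np.array([s[0] for s in samples]), axis=0)
    print(robot_plan)

In this toy setting, the weighted mean of the robot's component of the joint samples plays the role of the "statistic of this distribution" used as the navigation plan.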
Most importantly, we empirically validate our models in the Chandler dining hall at Caltech during peak hours and, in the process, carry out the first extensive quantitative study of robot navigation in dense human crowds (collecting data on 488 runs). The multiple goal interacting Gaussian processes algorithm performs comparably with human teleoperators at crowd densities nearing 1 person/m², while a state-of-the-art noncooperative planner exhibits unsafe behavior more than 3 times as often as the multiple goal extension, and twice as often as the basic interacting Gaussian process approach. Furthermore, a reactive planner based on the widely used dynamic window approach proves insufficient for crowd densities above 0.55 people/m². For inclusive validation purposes, we also show that either our noncooperative planner or our reactive planner captures the salient characteristics of nearly any existing dynamic navigation algorithm. Based on these experimental results and theoretical observations, we conclude that a cooperation model is critical for safe and efficient robot navigation in dense human crowds.
Finally, we produce a large database of ground truth pedestrian crowd data. We make this ground truth database publicly available for further scientific study of crowd prediction models, learning from demonstration algorithms, and human-robot interaction models in general.
Abstract:
In the quest for a descriptive theory of decision-making, the rational actor model in economics imposes rather unrealistic expectations and abilities on human decision makers. The further we move from idealized scenarios, such as perfectly competitive markets, and the more ambitiously we extend the reach of the theory to describe everyday decision-making situations, the less sense these assumptions make. Behavioural economics has instead proposed models based on assumptions that are more psychologically realistic, with the aim of gaining more precision and descriptive power. Increased psychological realism, however, comes at the cost of a greater number of parameters and greater model complexity. There is now a plethora of models, based on different assumptions and applicable in differing contextual settings, and selecting the right model to use tends to be an ad-hoc process. In this thesis, we develop optimal experimental design methods and evaluate different behavioural theories against evidence from lab and field experiments.
We look at evidence from controlled laboratory experiments. Subjects are presented with choices between monetary gambles, or lotteries. Different decision-making theories evaluate the choices differently and make distinct predictions about the subjects' choices. Theories whose predictions are inconsistent with the actual choices can be systematically eliminated. Behavioural theories can have multiple parameters, requiring complex experimental designs with a very large number of possible choice tests; this imposes computational and economic constraints on classical experimental design methods. We develop a methodology of adaptive tests, Bayesian Rapid Optimal Adaptive Designs (BROAD), that sequentially chooses the "most informative" test at each stage and, based on the response, updates its posterior beliefs over the theories, which in turn informs the next test to run. BROAD uses the Equivalence Class Edge Cutting (EC2) criterion to select tests. We prove that the EC2 criterion is adaptively submodular, which allows us to prove theoretical guarantees relative to the Bayes-optimal testing sequence even in the presence of noisy responses. In simulated ground-truth experiments, we find that the EC2 criterion recovers the true hypotheses with significantly fewer tests than more widely used criteria such as Information Gain and Generalized Binary Search. We show, both theoretically and experimentally, that these popular criteria can, surprisingly, perform poorly in the presence of noise or subject errors. Furthermore, we use the adaptive submodularity of EC2 to implement an accelerated greedy version of BROAD, which leads to orders-of-magnitude speedups over other methods.
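A minimal, noiseless sketch of the EC2-style greedy selection that BROAD builds on, assuming a small finite set of hypotheses grouped into theories (equivalence classes) and binary tests with known predicted outcomes; the toy prediction table and the simple Bayes update below are illustrative assumptions, not the BROAD implementation:

    import numpy as np

    # predictions[h, t]: outcome (0/1) that hypothesis h predicts for test t
    predictions = np.array([[0, 1, 1],
                            [0, 1, 0],
                            [1, 0, 1],
                            [1, 1, 1]])
    theory_of = np.array([0, 0, 1, 1])   # equivalence class (theory) of each hypothesis
    posterior = np.full(4, 0.25)

    def cross_theory_edge_weight(p):
        """Total weight of 'edges' between hypotheses belonging to different theories."""
        same_class = sum(p[theory_of == c].sum() ** 2 for c in np.unique(theory_of))
        return (p.sum() ** 2 - same_class) / 2.0

    def expected_ec2_gain(p, t):
        """Expected weight of edges cut by test t (noiseless case)."""
        gain = 0.0
        for outcome in (0, 1):
            consistent = predictions[:, t] == outcome
            p_outcome = p[consistent].sum()
            surviving = cross_theory_edge_weight(np.where(consistent, p, 0.0))
            gain += p_outcome * (cross_theory_edge_weight(p) - surviving)
        return gain

    asked = []
    for _ in range(predictions.shape[1]):
        remaining = [t for t in range(predictions.shape[1]) if t not in asked]
        t = max(remaining, key=lambda t: expected_ec2_gain(posterior, t))
        outcome = predictions[0, t]                              # pretend hypothesis 0 is the truth
        posterior = posterior * (predictions[:, t] == outcome)   # noiseless Bayes update
        posterior = posterior / posterior.sum()
        asked.append(t)
        print(f"asked test {t}; posterior over hypotheses: {posterior}")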
We use BROAD to perform two experiments. First, we compare the main classes of theories for decision-making under risk, namely expected value, prospect theory, constant relative risk aversion (CRRA) and moments models. Subjects are given an initial endowment and sequentially presented with choices between two lotteries, with the possibility of losses. The lotteries are selected using BROAD, and 57 subjects from Caltech and UCLA are incentivized by randomly realizing one of the lotteries chosen. Aggregate posterior probabilities over the theories show limited evidence in favour of CRRA and moments models. Classifying the subjects into types shows that most subjects are described by prospect theory, followed by expected value. Adaptive experimental design raises the possibility that subjects could engage in strategic manipulation, i.e., subjects could mask their true preferences and choose differently in order to obtain more favourable tests in later rounds, thereby increasing their payoffs. We pay close attention to this problem; strategic manipulation is ruled out because it is infeasible in practice and because we find no signatures of it in our data.
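For concreteness, a small sketch of how competing theories score the same lottery; the functional forms and parameter values below (a Tversky-Kahneman-style value and weighting function, a power CRRA utility) are standard textbook choices, not the estimates obtained in the thesis:

    import numpy as np

    # a lottery: outcomes (relative to the reference point / endowment) and probabilities
    outcomes = np.array([40.0, -25.0])
    probs = np.array([0.5, 0.5])

    def expected_value(x, p):
        return np.sum(p * x)

    def crra_utility(x, p, wealth=100.0, rho=0.7):
        # constant relative risk aversion over final wealth
        u = lambda w: w ** (1 - rho) / (1 - rho)
        return np.sum(p * u(wealth + x))

    def prospect_theory(x, p, alpha=0.88, beta=0.88, lam=2.25, gamma=0.61):
        # Tversky-Kahneman-style value function (gains/losses split) and probability weighting
        v = np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)
        w = p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)
        return np.sum(w * v)

    print("EV:      ", expected_value(outcomes, probs))
    print("CRRA EU: ", crra_utility(outcomes, probs))
    print("PT value:", prospect_theory(outcomes, probs))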
In the second experiment, we compare the main theories of time preference: exponential discounting, hyperbolic discounting, the "present bias" models (quasi-hyperbolic (α, β) discounting and fixed-cost discounting), and generalized-hyperbolic discounting. Forty subjects from UCLA were given choices between two options: a smaller but more immediate payoff versus a larger but later payoff. We found very limited evidence for the present bias models and hyperbolic discounting; most subjects were classified as generalized-hyperbolic discounting types, followed by exponential discounting.
In these models the passage of time is treated as linear. We instead consider a psychological model in which the perception of time is subjective. We prove that when biological (subjective) time is positively dependent, it gives rise to hyperbolic discounting and temporal choice inconsistency.
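One standard illustration, stated here only as a related special case and not as the thesis's construction: if delays are discounted exponentially but the effective rate r is random, say gamma-distributed with shape k and scale \theta, then averaging over r yields a generalized-hyperbolic discount function,

    D(t) = \mathbb{E}\!\left[e^{-rt}\right]
         = \int_0^\infty e^{-rt}\,\frac{r^{k-1} e^{-r/\theta}}{\Gamma(k)\,\theta^{k}}\,dr
         = (1 + \theta t)^{-k},

which declines hyperbolically in t and generates the familiar preference reversals.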
We also test the predictions of behavioural theories in the "wild". We focus on prospect theory, which emerged as the dominant theory in our lab experiments on risky choice. Loss aversion and reference dependence predict that consumers will behave in a way distinctly different from what the standard rational model predicts. Specifically, loss aversion predicts that when an item is offered at a discount, the demand for it will be greater than can be explained by its price elasticity. Even more importantly, when the item is no longer discounted, demand for its close substitute should increase excessively. We tested this prediction using a discrete choice model with a loss-averse utility function on data from a large eCommerce retailer. Not only did we identify loss aversion, but we also found that the effect decreased with consumers' experience. We outline the policy implications that consumer loss aversion entails, and strategies for competitive pricing.
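A minimal sketch of a logit discrete-choice model with a reference-dependent, loss-averse price term, in the spirit of the analysis described above; the utility specification, reference-price mechanism, and parameter values are illustrative assumptions rather than the estimated model:

    import numpy as np

    def loss_averse_utility(price, ref_price, quality, beta_q=1.0, beta_p=0.05, lam=2.25):
        """Reference-dependent utility: paying more than the reference price hurts
        lam times as much as paying less helps (loss aversion on the price dimension)."""
        gain_loss = ref_price - price                       # positive = perceived gain
        gl_term = np.where(gain_loss >= 0, gain_loss, lam * gain_loss)
        return beta_q * quality - beta_p * price + beta_p * gl_term

    def logit_choice_probs(utilities):
        expu = np.exp(utilities - np.max(utilities))
        return expu / expu.sum()

    # two close substitutes; item 0 had been discounted, so its reference price is low
    prices     = np.array([10.0, 10.0])
    ref_prices = np.array([ 8.0, 10.0])    # item 0's discount has just ended
    quality    = np.array([ 1.0,  1.0])

    u = loss_averse_utility(prices, ref_prices, quality)
    print("choice probabilities after the discount ends:", logit_choice_probs(u))
    # item 0 now feels like a 'loss' relative to its reference price, so demand
    # shifts excessively toward the substitute (item 1), as loss aversion predicts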
In future work, BROAD could be widely applied to testing different behavioural models, e.g., in social preference and game theory, and in different contextual settings. Additional measurements beyond choice data, including biological measurements such as skin conductance, could be used to more rapidly eliminate hypotheses and speed up model comparison. Discrete choice models also provide a framework for testing behavioural models with field data, and encourage combined lab-field experiments.
Abstract:
Optical Coherence Tomography (OCT) is a popular, rapidly growing imaging technique with an increasing number of biomedical applications due to its noninvasive nature. However, there are three major challenges in understanding and improving an OCT system: (1) Obtaining an OCT image is not easy. It either requires a real medical experiment or days of computer simulation. Without much data, it is difficult to study the physical processes underlying OCT imaging of different objects, simply because there are not many imaged objects. (2) Interpreting an OCT image is also hard. This challenge is more profound than it appears. For instance, it requires a trained expert to tell from an OCT image of human skin whether there is a lesion or not. This is expensive in its own right, and even the expert cannot be sure about the exact size of the lesion or the width of the various skin layers. The take-away message is that analyzing an OCT image even at a high level usually requires a trained expert, and pixel-level interpretation is simply unrealistic. The reason is simple: we have OCT images but not their underlying ground-truth structure, so there is nothing to learn from. (3) The imaging depth of OCT is very limited (millimeter or sub-millimeter in human tissue). While OCT uses infrared light for illumination to stay noninvasive, the downside is that photons at such long wavelengths can only penetrate a limited depth into the tissue before being back-scattered. To image a particular region of a tissue, photons first need to reach that region. As a result, OCT signals from deeper regions of the tissue are both weak (since few photons reach them) and distorted (due to multiple scattering of the contributing photons). This fact alone makes OCT images very hard to interpret.
This thesis addresses the above challenges by developing an advanced Monte Carlo simulation platform that is 10,000 times faster than the state-of-the-art simulator in the literature, bringing the simulation time down from 360 hours to a single minute. This powerful simulation tool not only enables us to efficiently generate as many OCT images of objects with arbitrary structure and shape as we want on a common desktop computer, but it also provides the underlying ground truth of the simulated images at the same time, because we specify it at the start of the simulation. This is one of the key contributions of this thesis. Building such a powerful simulation tool required a thorough understanding of the signal formation process, a careful implementation of the importance sampling/photon splitting procedure, efficient use of a voxel-based mesh system for determining photon-mesh intersections, and parallel computation of the different A-scans that constitute a full OCT image, among other programming and mathematical techniques, all of which are explained in detail later in the thesis.
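For orientation, a minimal sketch of the photon random walk at the core of tissue-optics Monte Carlo simulators (exponential free paths, Henyey-Greenstein scattering, weight attenuation by the single-scattering albedo); the optical properties below are illustrative, and the accelerated, importance-sampled, voxel-mesh implementation described in the thesis is far more involved than this:

    import numpy as np

    rng = np.random.default_rng(1)
    mu_a, mu_s, g = 0.1, 10.0, 0.9          # absorption, scattering (1/mm), anisotropy
    mu_t = mu_a + mu_s

    def henyey_greenstein_cos(g):
        xi = rng.random()
        if g == 0:
            return 2 * xi - 1
        return (1 + g**2 - ((1 - g**2) / (1 - g + 2 * g * xi))**2) / (2 * g)

    def trace_photon(max_depth=2.0):
        pos = np.zeros(3)
        direction = np.array([0.0, 0.0, 1.0])   # launched straight into the tissue
        weight = 1.0
        while weight > 1e-4 and 0.0 <= pos[2] <= max_depth:
            step = -np.log(rng.random()) / mu_t          # free path length
            pos = pos + step * direction
            weight *= mu_s / mu_t                        # survive scattering, lose absorbed part
            # sample a new direction from the Henyey-Greenstein phase function
            cos_t = henyey_greenstein_cos(g)
            sin_t = np.sqrt(max(0.0, 1 - cos_t**2))
            phi = 2 * np.pi * rng.random()
            # build an orthonormal frame around the current direction and rotate
            a = np.array([1.0, 0.0, 0.0]) if abs(direction[2]) > 0.99 else np.array([0.0, 0.0, 1.0])
            u = np.cross(direction, a); u /= np.linalg.norm(u)
            v = np.cross(direction, u)
            direction = cos_t * direction + sin_t * (np.cos(phi) * u + np.sin(phi) * v)
        return pos, weight

    depths = [trace_photon()[0][2] for _ in range(1000)]
    print("mean exit/termination depth (mm):", float(np.mean(depths)))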
Next we address the inverse problem: given an OCT image, predict/reconstruct its ground-truth structure at the pixel level. By solving this problem we would be able to interpret an OCT image completely and precisely without help from a trained expert. It turns out that we can do much better: for simple structures we are able to reconstruct the ground truth of an OCT image more than 98% correctly, and for more complicated structures (e.g., a multi-layered brain structure) we reach roughly 93%. We achieve this through extensive use of machine learning. The success of the Monte Carlo simulation already puts us in a strong position by providing a great deal of data (effectively unlimited) in the form of (image, truth) pairs. Through a transformation of the high-dimensional response variable, we convert the learning task into a multi-output multi-class classification problem and a multi-output regression problem. We then build a hierarchical architecture of machine learning models (a committee of experts) and train different parts of the architecture with specifically designed data sets. At prediction time, an unseen OCT image first goes through a classification model to determine its structure (e.g., the number and types of layers present in the image); the image is then handed to a regression model, trained specifically for that particular structure, which predicts the length of the different layers and thereby reconstructs the ground truth of the image. We also demonstrate that ideas from deep learning can be used to further improve the performance.
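A minimal scikit-learn sketch of the classify-then-regress idea, in which a structure classifier routes each image to a regressor trained only on that structure; the model types, feature representation, and synthetic data below are illustrative placeholders, not the committee-of-experts architecture itself:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(0)

    # synthetic stand-in for (image features, structure label, layer lengths)
    n, d = 600, 50
    X = rng.normal(size=(n, d))                      # "OCT image" feature vectors
    structure = rng.integers(0, 3, size=n)           # e.g., number/type of layers
    lengths = np.abs(rng.normal(size=(n, 4))) + structure[:, None]  # per-layer lengths

    # stage 1: classify the structure of the image
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, structure)

    # stage 2: one regressor per structure, trained only on images of that structure
    regressors = {}
    for s in np.unique(structure):
        mask = structure == s
        regressors[s] = RandomForestRegressor(n_estimators=100, random_state=0).fit(
            X[mask], lengths[mask])

    def reconstruct(x):
        """Predict structure, then hand off to the structure-specific regressor."""
        s = int(clf.predict(x.reshape(1, -1))[0])
        layers = regressors[s].predict(x.reshape(1, -1))[0]
        return s, layers

    s, layers = reconstruct(X[0])
    print("predicted structure:", s, "predicted layer lengths:", np.round(layers, 2))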
It is worth pointing out that solving the inverse problem automatically improves the imaging depth, since the lower half of an OCT image (i.e., greater depth), which previously could hardly be seen, now becomes fully resolved. Interestingly, although the OCT signals constituting the lower half of the image are weak, messy, and uninterpretable to human eyes, they still carry enough information that, when fed into a well-trained machine learning model, yields precisely the true structure of the object being imaged. This is another case in which artificial intelligence (AI) outperforms humans. To the best of the author's knowledge, this thesis is not only a successful attempt but also the first attempt to reconstruct an OCT image at the pixel level. Even attempting this kind of task would normally require fully annotated OCT images, and a lot of them (hundreds or even thousands), which is clearly impossible without a powerful simulation tool like the one developed in this thesis.
Abstract:
In the 1990s, with the increase in computer processing power and memory, digital photogrammetry emerged, whose main objective is the automatic mapping of natural and man-made terrain features using the digital photogrammetric image as the primary data source. Photogrammetric solutions became more compact and versatile. The E-FOTO educational digital photogrammetric workstation is a multidisciplinary project under development at the Digital Photogrammetry laboratory of the Universidade do Estado do Rio de Janeiro, based on two pillars: self-learning and free availability. The general objective of this work is to evaluate the quality of photogrammetric measurements obtained with the integrated version 1.0β of E-FOTO. To this end, two photographic blocks from distinct regions of the planet were used: a block of photographs (2005) of the municipality of Seropédica-RJ and a block of historical photographs (1953) of the region of Santiago de Compostela, Spain. The results obtained with E-FOTO were compared with the results of the commercial digital photogrammetry software Leica Photogrammetry Suite (LPS 2010) and with the object-space coordinates of points measured with global satellite positioning (ground truth). This made it possible to evaluate the methodologies of the two software packages in obtaining the interior and exterior orientation parameters and in determining the accuracy of the object-space coordinates of the check points measured in the stereoplotter module, version 1.64, of E-FOTO. The results obtained with the integrated version 1.0β of E-FOTO in determining the interior and exterior orientation parameters and in computing the coordinates of the check points, without including additional parameters and self-calibration, are compatible with the processing carried out with the LPS software. The differences in the X0 and Y0 parameters obtained in the exterior orientation with E-FOTO, when compared with those obtained with LPS including the additional parameters and self-calibration of the photogrammetric camera, are not significant. Given the quality of the results and according to the Brazilian Cartographic Accuracy Standard (Padrão de Exatidão Cartográfica), it would be possible to produce a Class A cartographic document for planimetry and Class B for altimetry at the 1:10,000 scale with the Rural project, and Class A for planimetry and Class C for altimetry at the 1:25,000 scale with the Santiago de Compostela project. The three-dimensional coordinates (E, N and H) of the check points obtained photogrammetrically in the stereoplotter module, version 1.64, of E-FOTO can be considered equivalent to those measured with satellite positioning technology.
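A minimal sketch of the kind of check-point comparison described above: discrepancies between photogrammetrically determined coordinates and the satellite-positioning ground truth, summarized as an RMSE per component; the coordinate values below are made-up placeholders, not the project's data:

    import numpy as np

    # (E, N, H) of a few check points: stereoplotter output vs. GNSS ground truth (illustrative)
    photogrammetric = np.array([[655120.4, 7472310.8, 33.6],
                                [655480.1, 7472055.2, 35.1],
                                [655890.7, 7472499.9, 31.8]])
    gnss            = np.array([[655121.1, 7472310.2, 34.4],
                                [655479.4, 7472055.9, 34.3],
                                [655891.5, 7472499.1, 32.9]])

    diff = photogrammetric - gnss
    rmse = np.sqrt(np.mean(diff**2, axis=0))
    print("RMSE E, N, H (m):", np.round(rmse, 2))
    # these RMSE values are what get compared against the tolerances of the
    # Cartographic Accuracy Standard for a given map scale and class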
Abstract:
In 2011 alone, more than 1,000 TB of new digital image records from orbital remote sensing were acquired. This volume of records, which grows in an increasing geometric progression, is added every year to the already extraordinary mass of existing orbital image data of the Earth's surface (acquired since the 1970s). This massive quantity of records, most of which has never even been processed, requires computational tools that allow the automatic recognition of desired image patterns, so that geographic objects and targets of interest can be extracted more quickly and concisely. The proposal that such recognition be performed automatically, by integrating Spectral Analysis and Computational Intelligence techniques based on the knowledge of an image analyst, was implemented as an integrator built on Computational (or Artificial) Neural Networks (through Kohonen's Self-Organizing Feature Map, SOFM) and Fuzzy Logic (through Mamdani inference). These techniques were applied to the spectral signatures of each pattern of interest, formed by the quantization levels (gray levels) of the respective pattern in each of the spectral bands, so that the classification of the patterns depends, inseparably, on the correlation of the spectral signatures across the six bands of the sensor, just as in the work of image analysts. Bands 1 to 5 and 7 of the LANDSAT-5 satellite were used to determine five land-cover/land-use classes of interest in three subsets of the test area, located in the State of Rio de Janeiro (Guaratiba, Mangaratiba and Magé), and the results were compared with those derived from the interpretation of an image analyst, which in turn was corroborated by ground-truth verification. The results obtained with the integrator were also compared with two commercial software systems (IDRISI Taiga and ENVI 4.8) in terms of classification quality (Kappa index) and response time. The integrator, which combines hybrid (supervised and unsupervised) classifications in its implementation, proved effective in the automatic (unsupervised) recognition of multispectral patterns and in learning these patterns: for each successive subset of the test area, less training was required for its classification to reach a final average accuracy of 87% relative to the image analyst's classifications. Its effectiveness was also confirmed against the tested software systems, with an average Kappa index of 0.86.
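The agreement figures quoted above are Cohen's kappa values; a minimal sketch of how kappa is computed from a confusion matrix between the integrator's map and the image analyst's reference classification (the counts below are made up for illustration):

    import numpy as np

    # rows: classifier output, columns: analyst's reference classes (illustrative counts)
    confusion = np.array([[120,   5,   3],
                          [  7, 140,   6],
                          [  4,   8, 110]], dtype=float)

    total = confusion.sum()
    observed_agreement = np.trace(confusion) / total
    expected_agreement = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total**2
    kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
    print(f"Kappa = {kappa:.3f}")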
Abstract:
We present a multispectral photometric stereo method for capturing the geometry of deforming surfaces. A novel photometric calibration technique allows calibration of scenes containing multiple piecewise-constant chromaticities. The method estimates per-pixel photometric properties, then uses a RANSAC-based approach to estimate the dominant chromaticities in the scene. A likelihood term linking surface normal, image intensity and photometric properties is developed, which allows the estimation of the number of chromaticities present in a scene to be framed as a model estimation problem. The Bayesian Information Criterion is applied to automatically estimate the number of chromaticities present during calibration. A two-camera stereo system provides low-resolution geometry, allowing the likelihood term to be used to segment new images into regions of constant chromaticity. This segmentation is carried out in a Markov Random Field framework and allows the correct photometric properties to be used at each pixel to estimate a dense normal map. Results are shown on several challenging real-world sequences, demonstrating state-of-the-art results using only two cameras and three light sources. Quantitative evaluation is provided against synthetic ground truth data. © 2011 IEEE.
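A minimal sketch of the per-pixel photometric-stereo step that such a calibration ultimately supports: with known light directions and a Lambertian model, the albedo-scaled normal follows from a small linear solve. The light directions and intensities below are illustrative, and the multispectral, single-image setting of the paper is more involved than this classic three-source case:

    import numpy as np

    # rows: unit directions of the three light sources (illustrative calibration values)
    L = np.array([[0.0, 0.0, 1.0],
                  [0.8, 0.0, 0.6],
                  [0.0, 0.8, 0.6]])
    intensities = np.array([0.9, 0.55, 0.6])   # observed intensities at one pixel

    # Lambertian model: I = albedo * (L @ n); solve for the scaled normal
    scaled_normal, *_ = np.linalg.lstsq(L, intensities, rcond=None)
    albedo = np.linalg.norm(scaled_normal)
    normal = scaled_normal / albedo
    print("albedo:", round(float(albedo), 3), "normal:", np.round(normal, 3))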
Abstract:
In this paper we propose a new algorithm for reconstructing phase-encoded velocity images of catalytic reactors from undersampled NMR acquisitions. Previous work on this application has employed total variation and nonlinear conjugate gradients, which, although promising, yield unsatisfactory, unphysical visual results. Our approach leverages prior knowledge about the piecewise smoothness of the phase map and the physical constraints imposed by the system under study. We show how iteratively regularizing the real and imaginary parts of the acquired complex image separately in a shift-invariant wavelet domain produces, in general, a piecewise-smooth velocity map. Using appropriately defined metrics, we demonstrate higher fidelity to the ground truth and to the physical system constraints than previous methods for this specific application. © 2013 IEEE.
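A minimal sketch of this kind of iterative reconstruction, assuming an orthonormal Haar transform as a stand-in for the shift-invariant wavelet domain and soft-thresholding applied to the real and imaginary parts separately; the sampling mask, threshold, and toy phantom are illustrative assumptions, not the paper's algorithm:

    import numpy as np

    def haar2(x):
        """One level of an orthonormal 2-D Haar transform (even-sized input)."""
        lo, hi = (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)
        x = np.vstack([lo, hi])
        lo, hi = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2), (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
        return np.hstack([lo, hi])

    def ihaar2(c):
        n0, n1 = c.shape
        x = np.empty_like(c)
        lo, hi = c[:, : n1 // 2], c[:, n1 // 2 :]
        x[:, 0::2], x[:, 1::2] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
        y = np.empty_like(x)
        lo, hi = x[: n0 // 2], x[n0 // 2 :]
        y[0::2], y[1::2] = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
        return y

    def soft(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    rng = np.random.default_rng(0)
    n = 64
    truth = np.zeros((n, n), dtype=complex)
    truth[16:48, 16:48] = np.exp(1j * 0.8)          # piecewise-constant phase region

    mask = rng.random((n, n)) < 0.35                # random k-space undersampling
    data = mask * np.fft.fft2(truth)                # acquired (undersampled) k-space

    x = np.zeros_like(truth)
    for _ in range(100):
        # gradient step toward data consistency in k-space
        x = x - np.fft.ifft2(mask * np.fft.fft2(x) - data)
        # regularize the real and imaginary parts *separately* in the wavelet domain
        x = (ihaar2(soft(haar2(x.real), 0.02))
             + 1j * ihaar2(soft(haar2(x.imag), 0.02)))

    print("relative reconstruction error:",
          float(np.linalg.norm(x - truth) / np.linalg.norm(truth)))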
Abstract:
This study takes the muddy tidal-flat coast of northern Jiangsu (Subei), which has typical features, as the study area. Fourteen Landsat and SPOT satellite images covering the region between 1975 and 2003 were used as the main data source, combined with field surveys and validation work. Supported by remote-sensing image processing and GIS analysis techniques, the tidal flats, coastline, waterlines and salt-marsh vegetation in the area were interpreted from the imagery, and the spatial distribution and dynamic evolution of the Subei radial sand ridge field and coastal landforms were analyzed. The results show the following: the tidal water-level processes across the Subei radial sand ridge sea area are generally not synchronized, which limits the applicability and interpretation accuracy of conventional remote-sensing data for studying the Subei tidal-flat geomorphology; with the aid of manual interpretation, unsupervised classification of multispectral imagery can effectively extract the waterline of muddy tidal flats; the modified soil-adjusted vegetation index (MSAVI) can satisfactorily extract salt-marsh vegetation information on the tidal flats; the rapid accretion of the Subei coastal tidal flats has promoted rapid seaward expansion of the salt-marsh vegetation belt, while continuous tidal-flat reclamation projects in recent years have kept encroaching on the belt from the landward side, narrowing it and in places eliminating it; under the combined influence of large-scale human activities and natural conditions, the coastline of the Subei radial sand ridge coast is tending to become straighter, and disorderly reclamation projects have over-consumed the reclaimable flat resources; between 1975 and 2002, the high tidal flats along the northern and southern coasts of the study area were generally accreting, whereas the high tidal flats along the central coast and on the offshore sand bars were extensively eroded; since 1999, regularly arranged features have begun to form on the lower tidal flats of the study area and have spread over larger areas year by year, probably as a result of the expansion of laver (Porphyra) aquaculture zones.
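For reference, the modified soil-adjusted vegetation index mentioned above is commonly written (in its MSAVI2 form) in terms of the near-infrared and red reflectances as

    \mathrm{MSAVI} = \frac{2\rho_{\mathrm{NIR}} + 1 - \sqrt{(2\rho_{\mathrm{NIR}} + 1)^{2} - 8\,(\rho_{\mathrm{NIR}} - \rho_{\mathrm{red}})}}{2}.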
Abstract:
False-match filtering is studied from both the 2-D and 3-D perspectives. A gray-level preprocessing algorithm, applied before matching to reduce false matches, and a disparity filtering algorithm based on real control points are proposed. The former performs gray-level equalization only on the overlapping region of the two images, which reduces the computational load; the latter builds on conventional disparity mean filtering and further improves the efficiency of filtering out false matches. Experimental results on real images show that the new algorithms can effectively remove false matches, improve the accuracy of 3-D reconstruction, and guarantee the quality of the reconstruction.
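A minimal sketch of the disparity mean-filtering idea referred to above: disparities that deviate too far from a local mean (here blended with a reference derived from a few trusted control-point disparities) are flagged as false matches; the window size, threshold, and synthetic data are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    disparity = rng.normal(20.0, 0.5, size=200)           # mostly correct matches
    disparity[[15, 90, 170]] = [5.0, 42.0, 60.0]          # a few gross false matches
    control_disparity = np.array([19.8, 20.1, 20.3])      # from trusted control points

    reference = control_disparity.mean()
    window, threshold = 15, 3.0
    keep = np.ones_like(disparity, dtype=bool)
    for i in range(len(disparity)):
        lo, hi = max(0, i - window), min(len(disparity), i + window + 1)
        local_mean = 0.5 * (disparity[lo:hi].mean() + reference)   # blend local and control info
        keep[i] = abs(disparity[i] - local_mean) < threshold

    print("rejected as false matches:", np.where(~keep)[0])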
Abstract:
Grande, M.; Kellett, B.; Howe, C.; Perry, C. H., 'The D-CIXS X-ray spectrometer on the SMART-1 mission to the Moon - First Results', Planetary and Space Science (2007) 55(4), pp. 494-502. RAE2008
Abstract:
High intensity focused ultrasound (HIFU) can be used to control bleeding, both from individual blood vessels and from gross damage to the capillary bed. This process, called acoustic hemostasis, is being studied in the hope that such a method would ultimately provide a lifesaving treatment during the so-called "golden hour", a brief grace period after a severe trauma in which prompt therapy can save the life of an injured person. Thermal effects play a major role in the occlusion of small vessels and also appear to contribute to the sealing of punctures in major blood vessels. However, aggressive ultrasound-induced tissue heating can also impact healthy tissue and can lead to deleterious mechanical bioeffects. Moreover, the presence of vascularity can limit one's ability to elevate the temperature of blood vessel walls owing to convective heat transport. In an effort to better understand the heating process in tissues with vascular structure, we have developed a numerical simulation that couples models for ultrasound propagation, acoustic streaming, ultrasound heating and blood cooling in Newtonian viscous media. The 3-D simulation allows for the study of complicated biological structures and insonation geometries. We have also undertaken a series of in vitro experiments, in non-uniform flow-through tissue phantoms, designed to provide a ground-truth verification of the model predictions. The calculated and measured results were compared over a range of values for insonation pressure, insonation time, and flow rate; we show good agreement between predictions and measurements. We then conducted a series of simulations that address two limiting problems of interest: hemostasis in small and large vessels. We employed realistic human tissue properties and considered more complex geometries. Results show that the heating pattern in and around a blood vessel differs for different vessel sizes, flow rates, and beam orientations relative to the flow axis. Complete occlusion and wall-puncture sealing are both possible depending on the exposure conditions. These results concur with prior clinical observations and may prove useful for planning more effective HIFU treatment procedures.
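A common starting point for such coupled heating models, given here for orientation and not necessarily in the exact form used in this work, is the Pennes bioheat equation with an acoustic source term and perfusion cooling,

    \rho c \,\frac{\partial T}{\partial t} = k\,\nabla^{2} T \;-\; \rho_b c_b\,\omega_b\,(T - T_a) \;+\; Q, \qquad Q \approx 2\alpha I,

where \rho, c and k are the tissue density, specific heat and thermal conductivity, \omega_b is the blood perfusion rate, T_a the arterial temperature, \alpha the acoustic absorption coefficient and I the local acoustic intensity; convective cooling by flow in larger vessels requires an additional advection term.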
Abstract:
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24% is observed in 17 out of 21 test sequences. In all but one case the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.
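A minimal sketch of histogram-based skin classification with a dynamically updated color model, much simpler than (but in the spirit of) the second-order Markov prediction described above; the bin layout, blending weight, and synthetic pixel data are illustrative assumptions:

    import numpy as np

    BINS = 16

    def hs_histogram(pixels):
        """Normalized 2-D hue/saturation histogram of an (N, 2) array of HS values in [0, 1)."""
        h, _, _ = np.histogram2d(pixels[:, 0], pixels[:, 1],
                                 bins=BINS, range=[[0, 1], [0, 1]])
        return h / max(h.sum(), 1.0)

    def classify(pixels, skin_hist, bg_hist):
        """Label a pixel as skin when its skin-histogram likelihood exceeds the background's."""
        idx = np.minimum((pixels * BINS).astype(int), BINS - 1)
        return skin_hist[idx[:, 0], idx[:, 1]] > bg_hist[idx[:, 0], idx[:, 1]]

    rng = np.random.default_rng(0)
    skin_hist = hs_histogram(rng.normal([0.05, 0.5], 0.05, size=(2000, 2)) % 1.0)
    bg_hist   = hs_histogram(rng.random((2000, 2)))

    for frame in range(5):
        # illumination drift: the frame's skin pixels slowly shift in color space
        frame_skin = rng.normal([0.05 + 0.01 * frame, 0.5], 0.05, size=(500, 2)) % 1.0
        labels = classify(frame_skin, skin_hist, bg_hist)
        # dynamic update: blend the current segmentation's histogram into the model
        skin_hist = 0.8 * skin_hist + 0.2 * hs_histogram(frame_skin[labels])
        print(f"frame {frame}: skin recall = {labels.mean():.2f}")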
Abstract:
A probabilistic, nonlinear supervised learning model is proposed: the Specialized Mappings Architecture (SMA). The SMA employs a set of several forward mapping functions that are estimated automatically from training data. Each specialized function maps certain domains of the input space (e.g., image features) onto the output space (e.g., articulated body parameters). The SMA can model ambiguous, one-to-many mappings that may yield multiple valid output hypotheses. Once learned, the mapping functions generate a set of output hypotheses for a given input via a statistical inference procedure. The SMA inference procedure incorporates an inverse mapping, or feedback function, in evaluating the likelihood of each of the hypotheses. Possible feedback functions include computer graphics rendering routines that can generate images for given hypotheses. The SMA employs a variant of the Expectation-Maximization algorithm for simultaneous learning of the specialized domains and the mapping functions, along with approximate strategies for inference. The framework is demonstrated in a computer vision system that can estimate the articulated pose parameters of a human's body or hands, given silhouettes from a single image. The accuracy and stability of the SMA are also tested using synthetic images of human bodies and hands, where ground truth is known.
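A toy sketch of the SMA-style inference step on a deliberately ambiguous one-to-many problem: each specialized forward map proposes an output hypothesis, and a feedback (inverse) function scores how well each hypothesis explains the input. Linear experts, quantile-based domains, and an analytic feedback map are stand-ins for the EM-learned mappings and the graphics-rendering feedback described above:

    import numpy as np

    rng = np.random.default_rng(0)

    # toy "one-to-many" problem: input feature x = sin(theta), output = theta
    theta = rng.uniform(-np.pi, np.pi, size=2000)
    X = np.sin(theta).reshape(-1, 1)                 # image-feature stand-in
    Y = theta.reshape(-1, 1)                         # body-parameter stand-in

    # split the OUTPUT space into specialized domains (stand-in for the EM step)
    n_experts = 4
    edges = np.quantile(Y, np.linspace(0, 1, n_experts + 1))
    domain = np.clip(np.searchsorted(edges, Y[:, 0], side="right") - 1, 0, n_experts - 1)

    # one linear forward map per specialized domain: x -> y
    experts = []
    for k in range(n_experts):
        A = np.hstack([X[domain == k], np.ones((np.sum(domain == k), 1))])
        coef, *_ = np.linalg.lstsq(A, Y[domain == k], rcond=None)
        experts.append(coef)

    feedback = lambda y: np.sin(y)                   # "render" a hypothesis back to features

    def infer(x):
        hypotheses = [float(coef[0, 0] * x + coef[1, 0]) for coef in experts]
        scores = [-(feedback(h) - x) ** 2 for h in hypotheses]   # log-likelihood up to scale
        return hypotheses, scores

    hyps, scores = infer(0.5)
    print("hypotheses:", np.round(hyps, 2), " best:", round(hyps[int(np.argmax(scores))], 2))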
Abstract:
A novel technique to detect and localize periodic movements in video is presented. The distinctive feature of the technique is that it requires neither feature tracking nor object segmentation. Intensity patterns along linear sample paths in space-time are used to estimate the period of object motion in a given sequence of frames. Sample paths are obtained by connecting (in space-time) sample points from regions of high motion magnitude in the first and last frames. Oscillations in intensity values are induced at time instants when an object intersects the sample path. The locations of peaks in intensity are determined by parameters of both the cyclic object motion and the orientation of the sample path with respect to the object motion. The information about the peaks is used in a least squares framework to obtain an initial estimate of these parameters. The estimate is further refined using the full intensity profile. The best estimate for the period of cyclic object motion is obtained by looking for consensus among estimates from many sample paths. The proposed technique is evaluated with synthetic videos where ground truth is known, and with American Sign Language videos where the goal is to detect periodic hand motions.
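A minimal sketch of the period-estimation step described above: peak locations along each sample path are fit by least squares to an arithmetic progression, and the per-path estimates are combined by consensus (here, the median); the simulated peak times and noise level are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def estimate_period(peak_times):
        """Least-squares fit of peak_times[k] ~= t0 + k * T, returning the period T."""
        k = np.arange(len(peak_times))
        A = np.column_stack([np.ones_like(k, dtype=float), k])
        (t0, T), *_ = np.linalg.lstsq(A, peak_times, rcond=None)
        return T

    # simulated peak times along several sample paths of the same cyclic motion
    true_period = 12.0
    estimates = []
    for _ in range(15):
        offset = rng.uniform(0, true_period)
        peaks = offset + true_period * np.arange(8) + rng.normal(0, 0.7, size=8)
        estimates.append(estimate_period(peaks))

    # consensus among sample paths: take the median of the per-path estimates
    print("per-path estimates:", np.round(estimates, 2))
    print("consensus period:", round(float(np.median(estimates)), 2))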
Abstract:
An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments.
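A minimal sketch of one family of similarity measures of the kind listed above, a chamfer-style edge-location distance computed with a distance transform; the synthetic edge maps are placeholders, and scipy's Euclidean distance transform is used purely for convenience:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def chamfer_distance(query_edges, db_edges):
        """Mean distance from each query edge pixel to the nearest database edge pixel."""
        dist_to_db = distance_transform_edt(~db_edges)   # distance to nearest True pixel
        return dist_to_db[query_edges].mean()

    def ring(cx, cy, r, size=64):
        y, x = np.mgrid[:size, :size]
        d = np.hypot(x - cx, y - cy)
        return np.abs(d - r) < 1.0

    query = ring(32, 32, 15)
    database = {"view_a": ring(33, 31, 15),
                "view_b": ring(32, 32, 22),
                "view_c": ring(20, 40, 10)}

    scores = {name: chamfer_distance(query, edges) for name, edges in database.items()}
    for name in sorted(scores, key=scores.get):
        print(f"{name}: chamfer distance = {scores[name]:.2f}")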