15 results for 3-D modeling
at Massachusetts Institute of Technology
Abstract:
We describe a psychophysical investigation of the effects of object complexity and familiarity on the variation of recognition time and recognition accuracy over different views of novel 3D objects. Our findings indicate that with practice the response times for different views become more uniform and the initially orderly dependency of the response time on the distance to a "good" view disappears. One possible interpretation of our results is in terms of a tradeoff between memory needed for storing specific-view representations of objects and time spent in recognizing the objects.
Abstract:
This thesis examines a complete design framework for a real-time, autonomous system with specialized VLSI hardware for computing 3-D camera motion. In the proposed architecture, the first step is to determine point correspondences between two images. Two processors, a CCD array edge detector and a mixed analog/digital binary block correlator, are proposed for this task. The report is divided into three parts. Part I covers the algorithmic analysis; part II describes the design and test of a 32$\times$32 CCD edge detector fabricated through MOSIS; and part III compares the design of the mixed analog/digital correlator to a fully digital implementation.
Abstract:
We discuss a strategy for visual recognition by forming groups of salient image features, and then using these groups to index into a database to find all of the matching groups of model features. We discuss the most space-efficient possible method of representing 3-D models for indexing from 2-D data, and show how to account for sensing error when indexing. We also present a convex grouping method that is robust and efficient, both theoretically and in practice. Finally, we combine these modules into a complete recognition system, and test its performance on many real images.
Abstract:
Three-dimensional (3-D) integrated circuits can be fabricated by bonding previously processed device layers using metal-metal bonds that also serve as layer-to-layer interconnects. Bonded copper interconnect test structures were created by thermocompression bonding, and the bond toughness was measured using the four-point test. The effects of bonding temperature, physical bonding and failure mechanisms were investigated. The effects of the pre-bond clean (with glacial acetic acid) on the copper surface were also examined. A maximum average bond toughness of approximately 35 J/m² was obtained at a bonding temperature of 300 °C.
Abstract:
Visual object recognition requires the matching of an image with a set of models stored in memory. In this paper we propose an approach to recognition in which a 3-D object is represented by the linear combination of 2-D images of the object. If $M = \{M_1, \ldots, M_k\}$ is the set of pictures representing a given object, and $P$ is the 2-D image of an object to be recognized, then $P$ is considered an instance of $M$ if $P = \sum_{i=1}^{k} a_i M_i$ for some constants $a_i$. We show that this approach correctly handles rigid 3-D transformations of objects with sharp as well as smooth boundaries, and can also handle non-rigid transformations. The paper is divided into two parts. In the first part we show that the variety of views depicting the same object under different transformations can often be expressed as linear combinations of a small number of views. In the second part we suggest how this linear combination property may be used in the recognition process.
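The membership test in this abstract can be illustrated with a small least-squares fit. The sketch below is not the authors' implementation: it assumes each view has already been flattened into a vector of numbers (for instance, feature-point coordinates), and the function names and the tolerance `tol` are hypothetical choices made for illustration.

```python
def solve_linear_system(a, b):
    """Solve the small square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("model views are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_combination(model_views, image):
    """Return (coeffs, residual) for the best fit image ~ sum_i a_i * M_i.

    Assumption: every view is a flat list of numbers of the same length.
    """
    k = len(model_views)
    # Normal equations: (M M^T) a = M p.
    gram = [[sum(x * y for x, y in zip(model_views[i], model_views[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * y for x, y in zip(model_views[i], image)) for i in range(k)]
    coeffs = solve_linear_system(gram, rhs)
    recon = [sum(c * view[p] for c, view in zip(coeffs, model_views))
             for p in range(len(image))]
    residual = sum((r - t) ** 2 for r, t in zip(recon, image)) ** 0.5
    return coeffs, residual


def is_instance(model_views, image, tol=1e-6):
    """Treat the image as an instance of the model if the fit residual is small."""
    _, residual = fit_linear_combination(model_views, image)
    return residual <= tol
```

With noisy measurements the residual is never exactly zero, so in practice the tolerance would have to reflect the expected sensing error.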
Abstract:
Different approaches to visual object recognition can be divided into two general classes: model-based vs. non-model-based schemes. In this paper we establish some limitations on the class of non-model-based recognition schemes. We show that every function that is invariant to viewing position of all objects is the trivial (constant) function. It follows that every consistent recognition scheme for recognizing all 3-D objects must in general be model based. The result is extended to recognition schemes that are imperfect (allowed to make mistakes) or restricted to certain classes of objects.
Abstract:
We address the computational role that the construction of a complete surface representation may play in the recovery of 3-D structure from motion. We present a model that combines a feature-based structure-from-motion algorithm with smooth surface interpolation. This model can represent multiple surfaces in a given viewing direction, incorporates surface constraints from object boundaries, and groups image features using their 2-D image motion. Computer simulations relate the model's behavior to perceptual observations. In a companion paper, we discuss further perceptual experiments regarding the role of surface reconstruction in the human recovery of 3-D structure from motion.
Abstract:
The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 12 function units, on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user-accessible message passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms maximize both single thread performance and overall system throughput.
Abstract:
The utility of vision-based face tracking for dual pointing tasks is evaluated. We first describe a 3-D face tracking technique based on real-time parametric motion-stereo, which is non-invasive, robust, and self-initialized. The tracker provides a real-time estimate of a "frontal face ray" whose intersection with the display surface plane is used as a second stream of input for scrolling or pointing, in parallel with hand input. We evaluated the performance of combined head/hand input on a box selection and coloring task: users selected boxes with one pointer and colors with a second pointer, or performed both tasks with a single pointer. We found that performance with head and one hand was intermediate between single hand performance and dual hand performance. Our results are consistent with previously reported dual hand conflict in symmetric pointing tasks, and suggest that a head-based input stream should be used for asymmetric control.
Abstract:
We introduce a new method to describe, in a single image, changes in shape over time. We acquire both range and image information with a stationary stereo camera. From the pictures taken, we display a composite image consisting of the image data from the surface closest to the camera at every pixel. This reveals the 3-D relationships over time through easy-to-interpret occlusion relationships in the composite image. We call the composite a shape-time photograph. Small errors in depth measurements cause artifacts in the shape-time images. We correct most of these using a Markov network to estimate the most probable front surface, taking into account the depth measurements, their uncertainties, and layer continuity assumptions.
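The basic compositing rule described in this abstract (keep, at every pixel, the image data from the surface closest to the camera) can be sketched in a few lines. This is only an illustration under stated assumptions: frames are given as hypothetical `(image, depth)` pairs of equally sized 2-D lists, a smaller depth value means a closer surface, and the Markov-network correction of depth errors mentioned in the abstract is not included.

```python
def shape_time_composite(frames):
    """Build a composite from a list of (image, depth) pairs.

    Assumptions: every image and depth map is a 2-D list of the same size,
    and a smaller depth value means the surface is closer to the camera.
    """
    first_image, _ = frames[0]
    rows, cols = len(first_image), len(first_image[0])
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Pick the frame whose surface is nearest at this pixel.
            nearest_image, _ = min(frames, key=lambda frame: frame[1][r][c])
            composite[r][c] = nearest_image[r][c]
    return composite
```

Because the rule is applied independently at each pixel, a single noisy depth value can flip which frame wins, which is consistent with the artifacts the abstract reports for small depth errors.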
Abstract:
For applications involving the control of moving vehicles, the recovery of relative motion between a camera and its environment is of high utility. This thesis describes the design and testing of a real-time analog VLSI chip which estimates the focus of expansion (FOE) from measured time-varying images. Our approach assumes a camera moving through a fixed world with translational velocity; the FOE is the projection of the translation vector onto the image plane. This location is the point towards which the camera is moving, and from which other points appear to expand outward. By way of the camera imaging parameters, the location of the FOE gives the direction of 3-D translation. The algorithm we use for estimating the FOE minimizes the sum of squares of the differences at every pixel between the observed time variation of brightness and the predicted variation given the assumed position of the FOE. This minimization is not straightforward, because the relationship between the brightness derivatives depends on the unknown distance to the surface being imaged. However, image points where brightness is instantaneously constant play a critical role. Ideally, the FOE would be at the intersection of the tangents to the iso-brightness contours at these "stationary" points. In practice, brightness derivatives are hard to estimate accurately given that the image is quite noisy. Reliable results can nevertheless be obtained if the image contains many stationary points and the point is found that minimizes the sum of squares of the perpendicular distances from the tangents at the stationary points. The FOE chip calculates the gradient of this least-squares minimization sum, and the estimation is performed by closing a feedback loop around it. The chip has been implemented using an embedded CCD imager for image acquisition and a row-parallel processing scheme. A 64 × 64 version was fabricated in a 2 µm CCD/BiCMOS process through MOSIS with a design goal of 200 mW of on-chip power, a top frame rate of 1000 frames/second, and a basic accuracy of 5%. A complete experimental system which estimates the FOE in real time using real motion and image scenes is demonstrated.
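The least-squares step described in this abstract, finding the point that minimizes the sum of squared perpendicular distances to the tangent lines at the stationary points, has a closed-form solution in software. The sketch below is not the chip's method, which computes the gradient of the sum and closes a feedback loop around it; it solves the same minimization directly through the 2 × 2 normal equations. The function name and the input format (a point and a tangent direction per stationary point) are hypothetical.

```python
import math


def least_squares_intersection(points, tangents):
    """Point minimizing the summed squared perpendicular distance to a set of lines.

    Each line passes through points[i] with direction tangents[i].
    Assumption: at least two lines are not parallel.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (px, py), (tx, ty) in zip(points, tangents):
        length = math.hypot(tx, ty)
        # Unit normal to the tangent direction.
        nx, ny = -ty / length, tx / length
        offset = nx * px + ny * py  # signed distance of the line from the origin
        sxx += nx * nx
        sxy += nx * ny
        syy += ny * ny
        bx += nx * offset
        by += ny * offset
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("all tangent lines are (nearly) parallel")
    # Solve [[sxx, sxy], [sxy, syy]] @ (x, y) = (bx, by) by Cramer's rule.
    x = (syy * bx - sxy * by) / det
    y = (sxx * by - sxy * bx) / det
    return x, y
```

Using many stationary points averages out errors in the individual tangent estimates, which matches the abstract's remark that reliable results require an image with many such points.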
Abstract:
This technical report describes a new protocol, the Unique Token Protocol, for reliable message communication. This protocol eliminates the need for end-to-end acknowledgments and minimizes the communication effort when no dynamic errors occur. Various properties of end-to-end protocols are presented. The Unique Token Protocol solves the associated problems. It eliminates source buffering by maintaining in the network at least two copies of a message. A token is used to decide if a message was delivered to the destination exactly once. This technical report also presents a possible implementation of the protocol in a wormhole-routed, 3-D mesh network.
Abstract:
The problems under consideration center around the interpretation of binocular stereo disparity. In particular, the goal is to establish a set of mappings from stereo disparity to corresponding three-dimensional scene geometry. An analysis has been developed that shows how disparity information can be interpreted in terms of three-dimensional scene properties, such as surface depth, discontinuities, and orientation. These theoretical developments have been embodied in a set of computer algorithms for the recovery of scene geometry from input stereo disparity. The results of applying these algorithms to several disparity maps are presented. Comparisons are made to the interpretation of stereo disparity by biological systems.
Abstract:
Building robust recognition systems requires a careful understanding of the effects of error in sensed features. Error in these image features results in a region of uncertainty in the possible image location of each additional model feature. We present an accurate, analytic approximation for this uncertainty region when model poses are based on matching three image and model points, for both Gaussian and bounded error in the detection of image points, and for both scaled-orthographic and perspective projection models. This result applies to objects that are fully three-dimensional, where past results considered only two-dimensional objects. Further, we introduce a linear programming algorithm to compute the uncertainty region when poses are based on any number of initial matches. Finally, we use these results to extend, from two-dimensional to three-dimensional objects, robust implementations of alignment, interpretation-tree search, and transformation clustering.
Abstract:
Colloidal self-assembly is an efficient method for making 3-D ordered nanostructures suitable for materials such as photonic crystals and macroscopic solids for catalysis and sensor applications. Colloidal crystals grown by convective methods exhibit defects on two different scales. Macro defects such as cracks and void bands originate from the dynamics of meniscus motion during colloidal crystal growth, while micro defects like vacancies, dislocations and stacking faults are intrinsic to the colloidal crystalline structure. This paper analyses the crystallography and energetics of the microscopic defects from the point of view of classical thermodynamics and discusses the strategy for the control of the macroscopic defects through optimization of the liquid-vapor interface.