927 results for 3D object recognition


Relevance:

80.00%

Abstract:

The rapid growth of visual information on the Web has led to immense interest in multimedia information retrieval (MIR). While advances in MIR systems have achieved some success in specific domains, particularly with content-based approaches, general Web users still struggle to find the images they want. Despite the success of content-based object recognition and concept extraction, the major problem in current Web image searching remains the querying process. Since most online users express their needs only in terms of semantic concepts or objects, systems that use visual features (e.g., color or texture) to search images create a semantic gap that prevents general users from fully expressing their needs. In addition, query-by-example (QBE) retrieval imposes extra obstacles for exploratory search because users may not have a representative image at hand or in mind when starting a search (i.e., the page zero problem). As a result, most current online image search engines (e.g., Google, Yahoo, and Flickr) still rely primarily on textual queries. The problem with query-based retrieval systems is that they capture users’ information needs only in terms of formal queries; the implicit and abstract parts of those needs are inevitably overlooked. Hence, users often struggle to formulate queries that best represent their needs, and some compromises have to be made. Studies of Web search logs suggest that multimedia searches are more difficult than textual Web searches, and that Web image searching is the most difficult compared with video or audio searches. Online users therefore need to put in more effort when searching multimedia content, especially images. Most interactions in Web image searching occur during query reformulation. While log analysis provides intriguing views of how the majority of users search, their search needs and motivations are ultimately neglected.
User studies on image searching have attempted to understand users’ search contexts in terms of their background (e.g., knowledge, profession, motivation for search, and task types) and search outcomes (e.g., use of retrieved images, search performance). However, these studies typically focused on particular domains with a selected group of professional users. General users’ Web image searching contexts and behaviors are little understood, although they represent the majority of online image searching activity today. We argue that only by understanding Web image users’ contexts can current Web search engines further improve their usefulness and provide more efficient searches. To understand users’ search contexts, a user study was conducted based on university students’ Web image searching in the News, Travel, and commercial Product domains. The three search domains were deliberately chosen to reflect image users’ interests in people, time, events, locations, and objects. We investigated participants’ Web image searching behavior, focusing on query reformulation and search strategies. Participants’ search contexts, such as their search background, motivation for search, and search outcomes, were gathered by questionnaires. The searching activity was recorded together with participants’ think-aloud data to analyze significant search patterns. The relationships between participants’ search contexts and the corresponding search strategies were discovered using a Grounded Theory approach. Our key findings include the following aspects:
- Effects of users’ interactive intents on query reformulation patterns and search strategies
- Effects of task domain on task specificity and task difficulty, as well as on some specific searching behaviors
- Effects of searching experience on result expansion strategies
A contextual image searching model was constructed based on these findings.
The model helped us understand Web image searching from the user’s perspective and introduced a context-aware searching paradigm for current retrieval systems. A query recommendation tool was also developed to demonstrate how users’ query reformulation contexts can potentially contribute to more efficient searching.

Relevance:

80.00%

Abstract:

The integration of separate, yet complementary, cortical pathways appears to play a role in visual perception and action when intercepting objects. The ventral system is responsible for object recognition and identification, while the dorsal system facilitates the continuous regulation of action. This dual-system model implies that empirically manipulating different sources of visual information during the performance of an interceptive action might lead to the emergence of distinct gaze and movement pattern profiles. To test this idea, we recorded the hand kinematics and eye movements of participants as they attempted to catch balls projected from a novel apparatus that synchronised or de-synchronised accompanying video images of a throwing action and ball trajectory. Results revealed that ball catching performance was less successful when patterns of hand movements and gaze behaviours were constrained by the absence of advance perceptual information from the thrower's actions. Under these task constraints, participants began tracking the ball later, followed less of its trajectory, and adapted their actions by initiating movements later and moving the hand faster. There were no performance differences when the throwing action image and ball speed were synchronised or de-synchronised, since hand movements were closely linked to information from the ball trajectory. Results are interpreted relative to the two-visual-system hypothesis, demonstrating that accurate interception requires the integration of advance visual information from the kinematics of the throwing action and from the ball flight trajectory.

Relevance:

80.00%

Abstract:

Mobile devices are rapidly developing into the primary technology for users to work, socialize, and play in a variety of settings and contexts. Their pervasiveness has provided researchers with the means to investigate innovative solutions to ever more complex user demands. Tools for Mobile Multimedia Programming and Development investigates the use of mobile platforms for research projects, focusing on the development, testing, and evaluation of prototypes rather than final products, which enables researchers to better understand the needs of users through image processing, object recognition, sensor integration, and user interactions. This book benefits researchers and professionals in multiple disciplines who utilize such techniques in the creation of prototypes for mobile devices and applications. This book is part of the Advances in Wireless Technologies and Telecommunication series collection.

Relevance:

80.00%

Abstract:

Robots currently recognise and use objects through algorithms that are hand-coded or specifically trained. Such robots can operate in known, structured environments but cannot learn to recognise or use novel objects as they appear. This thesis demonstrates that a robot can develop meaningful object representations by learning the fundamental relationship between action and change in sensory state; the robot learns sensorimotor coordination. Methods based on Markov Decision Processes are experimentally validated on a mobile robot capable of gripping objects, and it is found that object recognition and manipulation can be learnt as an emergent property of sensorimotor coordination.
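The abstract mentions methods based on Markov Decision Processes but gives no details. As a minimal, hypothetical sketch of what solving an MDP involves, the snippet below runs textbook value iteration on a toy three-state "approach and grip" problem. The states, actions, transition probabilities and rewards are invented for illustration. Note that value iteration assumes a known model, whereas the thesis describes a robot that learns from its own sensorimotor experience, so this is not the thesis's method.

```python
# Hypothetical toy example: value iteration on a tiny "approach and grip" MDP.
# States, actions, probabilities and rewards are illustrative assumptions.

def q_value(s, a, values, transition, reward, gamma):
    """Expected return of taking action a in state s, then following `values`."""
    return sum(p * (reward(s, a, s2) + gamma * values[s2]) for p, s2 in transition(s, a))


def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-10):
    """Return (values, greedy policy) for a finite MDP with a known model."""
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a, values, transition, reward, gamma) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q_value(s, a, values, transition, reward, gamma))
              for s in states}
    return values, policy


STATES = ["far", "near", "gripped"]
ACTIONS = ["approach", "grip"]


def transition(s, a):
    """List of (probability, next_state) pairs; 'gripped' is absorbing."""
    if s == "gripped":
        return [(1.0, "gripped")]
    if a == "approach":
        return [(1.0, "near")]
    if s == "near":  # gripping only succeeds (80% of the time) when close to the object
        return [(0.8, "gripped"), (0.2, "near")]
    return [(1.0, "far")]


def reward(s, a, s_next):
    """Reward 1 for the transition that first secures the object."""
    return 1.0 if s != "gripped" and s_next == "gripped" else 0.0


values, policy = value_iteration(STATES, ACTIONS, transition, reward)
print(policy)
```

In this toy setting the greedy policy approaches the object when far and grips when near; in the thesis's framing, such action-outcome structure is what the robot would have to discover rather than be given.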

Relevance:

80.00%

Abstract:

It’s commonly assumed that psychiatric violence is motivated by delusions, but here the concept of a reversed impetus is explored, to understand whether delusions are formed as ad hoc or post hoc rationalizations of behaviour, or in advance of the actus reus. The reflexive violence model proposes that perceptual stimuli have motivational power and that this may trigger unwanted actions and hallucinations. The model is based on the theory of ecological perception, in which the opportunities an object enables are cues to act. As an apple triggers a desire to eat, a gun triggers a desire to shoot. These affordances (as they are called) are part of the perceptual apparatus; they allow the direct recognition of objects and, in emergencies, enable the fastest possible reactions. Even under normal circumstances, the presence of a weapon will trigger violent impulses, which are ordinarily inhibited. So will the presence of a victim. Under normal circumstances, however, these affordances don’t become violent, because negative action impulses are totally inhibited, whereas in psychotic illness negative action impulses are treated as emergencies and bypass frontal inhibitory circuits. What would have been object recognition becomes a blind, automatic action. A range of mental illnesses can cause inhibition to be bypassed. At its most innocuous, this causes simple hallucinations (where the motivational power of an object is misattributed). But ecological perception may also have the power to trigger serious violence: a kind that’s devoid of motives or planning and is often shrouded in amnesia or post-rational delusions.

Relevance:

80.00%

Abstract:

Domain-invariant representations are key to addressing the domain shift problem, where the training and test examples follow different distributions. Existing techniques that have attempted to match the distributions of the source and target domains typically compare these distributions in the original feature space. This space, however, may not be directly suitable for such a comparison, since some of the features may have been distorted by the domain shift, or may be domain specific. In this paper, we introduce a Domain Invariant Projection approach: an unsupervised domain adaptation method that overcomes this issue by extracting the information that is invariant across the source and target domains. More specifically, we learn a projection of the data to a low-dimensional latent space where the distance between the empirical distributions of the source and target examples is minimized. We demonstrate the effectiveness of our approach on the task of visual object recognition and show that it outperforms state-of-the-art methods on a standard domain adaptation benchmark dataset.
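To make the idea of "a projection where the source and target distributions are close" concrete, here is a simplified, hypothetical sketch. The abstract does not state the distance or the optimizer, so this does not reproduce the paper's objective. It uses a linear-kernel MMD (the squared distance between projected means, which compares only first moments) and replaces the learned projection with a brute-force search over a few hand-picked unit vectors. Restricting candidates to unit vectors stands in for an orthonormality constraint; without it, the zero projection would trivially give zero distance.

```python
# Simplified sketch: compare candidate projections by the distance between
# projected source and target samples. A linear-kernel MMD (squared distance
# between means) stands in for the paper's distance measure (an assumption).

def project(points, basis):
    """Project each point onto the given unit basis vectors (a list of d-dim lists)."""
    return [[sum(x_i * w_i for x_i, w_i in zip(x, w)) for w in basis] for x in points]


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def linear_mmd2(source, target):
    """Squared Euclidean distance between empirical means (linear-kernel MMD)."""
    return sum((a - b) ** 2 for a, b in zip(mean(source), mean(target)))


def best_direction(source, target, candidates):
    """Pick the unit direction whose 1-D projection makes the domains closest."""
    return min(candidates,
               key=lambda w: linear_mmd2(project(source, [w]), project(target, [w])))


# Toy data: feature 0 is shared; feature 1 is shifted by the domain (domain specific).
source = [[0.0, 0.0], [1.0, 0.2], [2.0, -0.2]]
target = [[0.0, 5.0], [1.0, 5.2], [2.0, 4.8]]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]  # hypothetical candidates

print(linear_mmd2(source, target))                 # distance in the original space
print(best_direction(source, target, candidates))  # direction ignoring the shifted feature
```

The search picks the direction that discards the domain-specific feature, which is the intuition behind comparing distributions in a latent space rather than the original feature space.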

Relevance:

80.00%

Abstract:

In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario, where no labeled samples from the target domain are provided, a popular approach consists of transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
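The MMD that this abstract uses as its baseline can be estimated directly from samples. The minimal sketch below computes the standard biased (V-statistic) estimate of the squared MMD with a Gaussian kernel. The bandwidth `sigma` and the toy one-dimensional samples are assumptions, and the manifold-based distance the paper proposes is not implemented here.

```python
import math


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


# Toy 1-D samples (hypothetical): a slightly shifted and a strongly shifted target.
source = [[0.0], [1.0]]
target_near = [[0.5], [1.5]]
target_far = [[3.0], [4.0]]
print(mmd2_biased(source, target_near), mmd2_biased(source, target_far))
```

The estimate is zero for identical samples and grows as the target drifts away from the source; it measures this gap in a kernel feature space rather than along the statistical manifold the paper argues for.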

Relevance:

80.00%

Abstract:

The earliest stages of human cortical visual processing can be conceived as the extraction of local stimulus features. However, more complex visual functions, such as object recognition, require the integration of multiple features. Recently, the neural processes underlying feature integration in the visual system have been under intensive study. A specialized mid-level stage preceding the object recognition stage has been proposed to account for the processing of contours, surfaces and shapes, as well as configuration. This thesis consists of four experimental, psychophysical studies on human visual feature integration. In two studies, the classification image, a recently developed psychophysical reverse correlation method, was used. In this method, visual noise is added to near-threshold stimuli. By investigating the relationship between random features in the noise and the observer’s perceptual decision in each trial, it is possible to estimate which features of the stimuli are critical for the task. The method allows the critical features used in a psychophysical task to be visualized directly as a spatial correlation map, yielding an effective "behavioral receptive field". Visual context is known to modulate the perception of stimulus features. Some of these interactions are quite complex, and it is not known whether they reflect early or late stages of perceptual processing. The first study investigated the mechanisms of collinear facilitation, where nearby collinear Gabor flankers increase the detectability of a central Gabor. The behavioral receptive field of the mechanism mediating the detection of the central Gabor stimulus was measured by the classification image method. The results show that collinear flankers increase the extent of the behavioral receptive field for the central Gabor in the direction of the flankers. The increased sensitivity at the ends of the receptive field suggests a low-level explanation for the facilitation.
The second study investigated how visual features are integrated into percepts of surface brightness. A novel variant of the classification image method with a brightness matching task was used. Many theories assume that perceived brightness is based on the analysis of luminance border features. Here, for the first time, this assumption was directly tested. The classification images show that the perceived brightness of both an illusory Craik-O’Brien-Cornsweet stimulus and a real uniform step stimulus depends solely on the border. Moreover, the spatial tuning of the features remains almost constant when the stimulus size is changed, suggesting that brightness perception is based on the output of a single spatial frequency channel. The third and fourth studies investigated global form integration in random-dot Glass patterns. In these patterns, a global form can be immediately perceived if even a small proportion of the random dots are paired into dipoles according to a geometrical rule. The third study measured the discrimination of orientation structure in highly coherent concentric and Cartesian (straight) Glass patterns. The results showed that the global form was more efficiently discriminated in concentric patterns. The fourth study investigated how form detectability depends on the global regularity of the Glass pattern. The local structure was either Cartesian or curved. Randomizing the local orientation was shown to impair performance only with the curved pattern. The results support the idea that curved and Cartesian patterns are processed in at least partially separate neural systems.
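The classification image method described in this abstract (noise added to near-threshold stimuli, then related to the observer's trial-by-trial decisions) can be illustrated with a simulated observer. The sketch below is hypothetical throughout: a one-dimensional stimulus, a linear template observer with a fixed criterion, Gaussian pixel noise, and a common way of combining the data, which averages the noise fields within each stimulus-response class and takes "yes" minus "no" within each stimulus class. If the method works, the recovered image should peak where the observer's template is largest.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classification_image(template, n_trials=4000, amplitude=0.5, seed=0):
    """Estimate a 1-D classification image for a simulated linear observer."""
    rng = random.Random(seed)
    n = len(template)
    # Hypothetical observer: says "yes" when its template response exceeds a
    # criterion halfway between the expected signal-absent and signal-present responses.
    criterion = 0.5 * amplitude * dot(template, template)
    classes = [(present, yes) for present in (True, False) for yes in (True, False)]
    sums = {c: [0.0] * n for c in classes}
    counts = {c: 0 for c in classes}
    for _ in range(n_trials):
        present = rng.random() < 0.5
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        stimulus = [(amplitude * t if present else 0.0) + e for t, e in zip(template, noise)]
        said_yes = dot(template, stimulus) > criterion
        key = (present, said_yes)
        counts[key] += 1
        sums[key] = [s + e for s, e in zip(sums[key], noise)]
    # Mean noise field per stimulus-response class (assumes every class occurs).
    means = {c: [s / counts[c] for s in sums[c]] for c in classes}
    # "Yes" minus "no" within each stimulus class, summed over the two classes.
    return [
        means[(True, True)][i] - means[(True, False)][i]
        + means[(False, True)][i] - means[(False, False)][i]
        for i in range(n)
    ]


template = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]  # hypothetical observer template
ci = classification_image(template)
print([round(v, 2) for v in ci])
```

Because only the noise is correlated with the responses, pixels the observer ignores average out toward zero, while pixels it weights heavily stand out. This is the sense in which the resulting map acts as a "behavioral receptive field".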

Relevance:

80.00%

Abstract:

Deep convolutional neural networks (DCNNs) have been employed in many computer vision tasks with great success due to their robustness in feature learning. One of the advantages of DCNNs is the robustness of their representations to object location, which is useful for object recognition tasks. However, this also discards spatial information, which is useful when dealing with the topological information of the image (e.g., scene labeling, face recognition). In this paper, we propose a deeper and wider network architecture to tackle the scene labeling task. The depth is achieved by incorporating predictions from multiple early layers of the DCNN. The width is achieved by combining multiple outputs of the network. We then further refine the parsing task by adopting graphical models (GMs) as a post-processing step to incorporate spatial and contextual information into the network. The new strategy of a deeper, wider convolutional network coupled with graphical models has shown promising results on the PASCAL-Context dataset.

Relevance:

80.00%

Abstract:

This paper presents an enhanced relational description for prescribing the grasp requirement and for the evolution of the posture of a digital human hand towards satisfying this requirement. A precise relational description needs anatomical segmentation of the hand geometry into palmar, dorsal and lateral patches using the palm plane and joint location information, and operational segmentation of the object geometry into pull, push and lateral patches with due consideration of the effect of friction. The relational description identifies appropriate patches for a desired grasp condition. Satisfaction of this requirement occurs in two discrete stages, namely, contact establishment and post-contact force exertion for object capturing. Contact establishment occurs in four potentially overlapping phases, namely, re-orientation, transfer, pre-shaping, and closing-in. The novel hand re-orientation phase enables the palm to face the object in a task sequence scenario, transfer takes the wrist to the ballpark, and pre-shaping and closing-in finally achieve the contact. In this paper, an anatomically pertinent closed-form formulation is presented for the closing-in phase to identify the point of contact on the patches prescribed by the relational description. Since mere contact does not ensure a grasp, and slip at the point of contact on application of force is a common occurrence, the effect of slip in the presence of friction has been studied for 2D and 3D object grasping, and a computational generation of the slip locus is presented. A general slip locus is found to be a non-linear curve even on planar faces. Two varieties of slip phenomena, namely, stabilizing and non-stabilizing slips, and their local characteristics have been identified. Studying the evolution of this slip characteristic over the slip locus revealed diverse possibilities for grasping behaviour.
Thus, the relational description paradigm not only makes the requirement specification easy and meaningful but also makes high-fidelity hand-object interaction studies possible.

Relevance:

80.00%

Abstract:

RATIONALE: Impulsivity is a vulnerability marker for drug addiction in which other behavioural traits such as anxiety and novelty seeking ('sensation seeking') are also widely present. However, inter-relationships between impulsivity, novelty seeking and anxiety traits are poorly understood. OBJECTIVE: The objective of this paper was to investigate the contribution of novelty seeking and anxiety traits to the expression of behavioural impulsivity in rats. METHODS: Rats were screened on the five-choice serial reaction time task (5-CSRTT) for spontaneously high impulsivity (SHI) and low impulsivity (SLI) and subsequently tested for novelty reactivity and preference, assessed by open-field locomotor activity (OF), novelty place preference (NPP), and novel object recognition (OR). Anxiety was assessed on the elevated plus maze (EPM) both prior to and following the administration of the anxiolytic drug diazepam, and by blood corticosterone levels following forced novelty exposure. Finally, the effects of diazepam on impulsivity and visual attention were assessed in SHI and SLI rats. RESULTS: SHI rats were significantly faster to enter an open arm on the EPM and exhibited preference for novelty in the OR and NPP tests, unlike SLI rats. However, there was no dimensional relationship between impulsivity and either novelty-seeking behaviour, anxiety levels, OF activity or novelty-induced changes in blood corticosterone levels. By contrast, diazepam (0.3-3 mg/kg), whilst not significantly increasing or decreasing impulsivity in SHI and SLI rats, did reduce the contrast in impulsivity between these two groups of animals. CONCLUSIONS: This investigation indicates that behavioural impulsivity in rats on the 5-CSRTT, which predicts vulnerability for cocaine addiction, is distinct from anxiety, novelty reactivity and novelty-induced stress responses, and thus has relevance for the aetiology of drug addiction.

Relevance:

80.00%

Abstract:

This paper presents a novel coarse-to-fine global localization approach inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by scale-invariant feature transform (SIFT) descriptors are used as natural landmarks. They are indexed into two databases: a location vector space model (LVSM) and a location database. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the LVSM is fast but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow but more accurate. The integration of coarse and fine stages makes fast and reliable localization possible. If necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. In addition, the localization system recovers the position of the camera by essential matrix decomposition. The localization system has been tested in indoor and outdoor environments. The results show that our approach is efficient and reliable. © 2006 IEEE.
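The abstract does not define how the location vector space model is weighted. Borrowing from the text retrieval techniques it cites, one plausible reading, assumed here and not confirmed by the paper, treats each location as a bag of quantized descriptors ("visual words"), weights them by tf-idf, and ranks locations by cosine similarity to the query view. The word ids and location names below are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: location -> list of visual-word ids. Returns (tf-idf vectors, idf table)."""
    n_docs = len(docs)
    df = Counter(w for words in docs.values() for w in set(words))
    idf = {w: math.log(n_docs / df[w]) for w in df}
    vectors = {loc: {w: c * idf[w] for w, c in Counter(words).items()}
               for loc, words in docs.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def coarse_rank(query_words, vectors, idf):
    """Rank locations by similarity to the query; unseen words get zero weight."""
    q = {w: c * idf.get(w, 0.0) for w, c in Counter(query_words).items()}
    return sorted(vectors, key=lambda loc: cosine(q, vectors[loc]), reverse=True)


# Hypothetical database: visual-word ids observed at each location.
docs = {"hall": [1, 2, 3, 3], "lab": [3, 4, 5], "yard": [6, 7, 8]}
vectors, idf = tfidf_vectors(docs)
print(coarse_rank([1, 2, 3], vectors, idf))
```

Scoring sparse vectors is cheap, which matches the abstract's point that this coarse stage is fast; its ranking would then be refined by the slower, more accurate voting over the location database.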

Relevance:

80.00%

Abstract:

This paper presents a novel coarse-to-fine global localization approach that is inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors are used as natural landmarks. These descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built based on a visual vocabulary learned from the feature descriptors. In the location database, each location is directly represented by a set of scale-invariant descriptors. The localization process consists of two stages: coarse localization and fine localization. Coarse localization from the inverted index is fast but not accurate enough, whereas localization from the location database using a voting algorithm is relatively slow but more accurate. The combination of coarse and fine stages makes fast and reliable localization possible. In addition, if necessary, the localization result can be verified by epipolar geometry between the representative view in the database and the view to be localized. Experimental results show that our approach is efficient and reliable. ©2005 IEEE.
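The two-stage structure in this abstract can be illustrated with a minimal sketch that assumes descriptors have already been quantized into visual words. In the sketch, the inverted index maps each word to the locations containing it. Coarse localization shortlists the locations sharing the most query words, and fine localization votes by nearest-neighbour descriptor matching within that shortlist only. The distance threshold `max_dist`, the shortlist size `k`, and all data are hypothetical; the abstract does not specify the paper's actual voting algorithm.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(location_words):
    """Map each visual word to the set of locations in which it occurs."""
    index = defaultdict(set)
    for loc, words in location_words.items():
        for w in words:
            index[w].add(loc)
    return index


def coarse_candidates(query_words, index, k=2):
    """Fast stage: shortlist the k locations sharing the most query words."""
    votes = Counter()
    for w in set(query_words):
        for loc in index.get(w, ()):
            votes[loc] += 1
    return [loc for loc, _ in votes.most_common(k)]


def fine_localize(query_desc, location_desc, candidates, max_dist=0.5):
    """Slow stage: each query descriptor votes for a candidate if its nearest
    stored descriptor there lies within max_dist; the most-voted candidate wins."""
    best_loc, best_votes = None, -1
    for loc in candidates:
        votes = sum(1 for q in query_desc
                    if min(math.dist(q, d) for d in location_desc[loc]) <= max_dist)
        if votes > best_votes:
            best_loc, best_votes = loc, votes
    return best_loc, best_votes


# Hypothetical database: visual words and 2-D stand-ins for SIFT descriptors.
location_words = {"corridor": [1, 2, 3, 4], "office": [3, 4, 5], "lobby": [6, 7]}
location_desc = {"corridor": [(0.0, 0.0), (1.0, 1.0)], "office": [(5.0, 5.0)],
                 "lobby": [(9.0, 9.0)]}
index = build_inverted_index(location_words)

shortlist = coarse_candidates([1, 2, 3], index)
query_desc = [(0.1, 0.0), (1.0, 0.9)]
print(shortlist, fine_localize(query_desc, location_desc, shortlist))
```

Restricting the expensive descriptor matching to the shortlist is what lets the combination be both fast and accurate, as the abstract claims; epipolar verification would be a further, optional step.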

Relevance:

80.00%

Abstract:

215 p.

Relevance:

80.00%