6 results for object modeling from images

in Helda - Digital Repository of University of Helsinki


Relevance:

100.00%

Publisher:

Abstract:

Tove Jansson (1914-2001) was a Finnish illustrator, author, artist, caricaturist and comic artist. She is best known for her Moomin books, written in Swedish and published between 1945 and 1977, which she illustrated herself. My study focuses on the interweaving of images and words in Jansson's picturebooks, novels and short stories situated in the fantasy world of Moomin Valley. In particular, it concentrates on Jansson's development of a special kind of aesthetics of movement and stasis, based upon both illustration and text. The conventions of picturebook art and illustration are significant to both Jansson's visual art and her writing, and she was acutely conscious of them. My analysis of Jansson's work begins by discussing her first published picturebooks and less familiar illustrations (from before she began her Moomin books), and I then proceed to discuss her three Moomin picturebooks: The Book about Moomin, Mymble and Little My; Who Will Comfort Toffle?; and The Dangerous Journey. The discussion moves from images to words and from words to images: Barthes's (1982) concept of anchoring and, in particular, what he calls "relaying", form a starting point for reading and viewing Moomin texts and illustrations in a complementary relation, in which the message's unity occurs on a higher level: that of the story, the anecdote, the diegesis. The eight illustrated Moomin novels and one collection of short stories are analysed in a similar manner, taking into account the academic discourse about picturebooks developed in the last decade of the 20th century and the beginning of the 21st century by scholars such as Nodelman, Rhedin, Doonan, Thiele, Stephens, Lewis, Nikolajeva and Scott. In her Moomin books, Jansson uses a wide variety of narrative and illustrative styles which complement each other.
Each book is different and unique in its own way, but a certain development or progression of mood and representation can be seen when assessing the series as a whole. Jansson's early stories are happy and adventurous, but her later Moomin novels, beginning with Moominland Midwinter, focus more on the interiority of the characters, placing them in difficult situations which approximate social reality. This orientation is also reflected in the representation of movement and space. The books published first include more obviously descriptive passages, exemplifying the tradition of literary pictorialism. In Jansson's later work, by contrast, space develops into something alive that can have an enduring effect on the characters' personalities and behaviour. This study shows how the idea of an image, a dynamic image, forms a holistic foundation for Jansson's imagination and work. The idea of central perspective, or frame, for instance, provided inspiration for whole stories or for the way she developed her characters, as in the case of the Fillyjonk, a complex female figure who is simultaneously frantic and prim. The idea of movement is central to the narrative art of picturebooks and illustrated texts, particularly in relation to the way that action is depicted. Jansson, however, also develops a specific choreography of characters in which poses and postures signify action, feelings and relationships. Here, I use two ideas from modern dance, contraction and release (Graham), to characterise the language of movement which is evident in Jansson's words and images. In Jansson's final Moomin novels and short stories, the idea of space becomes increasingly dynamic and closely linked with characterisation. My study also examines a number of Jansson's early sketches for her Moomin novels, in which movement is performed much more dramatically than in the illustrations which appeared in the last novels to be published.

Relevance:

100.00%

Publisher:

Abstract:

Modern smartphones often come with significant computational power and an integrated digital camera, making them an ideal platform for intelligent assistants. This work is restricted to retail environments, where users could be provided with, for example, navigational instructions to desired products or information about special offers in their close proximity. Applications of this kind usually require information about the user's current location in the domain environment, which in our case corresponds to a retail store. We propose a vision-based positioning approach that recognizes the products at which the camera of the user's mobile phone is currently pointing. The products are related to locations within the store, which enables us to locate the user when the phone's camera is pointed at a group of products. The first step of our method is to extract meaningful features from digital images. We use the Scale-Invariant Feature Transform (SIFT) algorithm, which extracts features that are highly distinctive in the sense that they can be correctly matched against a large database of features from many images. We collect a comprehensive set of images from all meaningful locations within our domain and extract the SIFT features from each of these images. Because the SIFT features are high-dimensional, comparing individual features is infeasible; we therefore apply the Bags of Keypoints method, which creates a generic representation, a visual category, from all features extracted from images taken at a specific location. A category for an unseen image can be deduced by extracting the corresponding SIFT features and choosing the category that best fits the extracted features. We have applied the proposed method within a Finnish supermarket. We consider grocery shelves as categories, which is a sufficient level of accuracy to help users navigate or to provide useful information about nearby products.
We achieve 40% accuracy, which is quite low for commercial applications but significantly outperforms the random-guess baseline. Our results suggest that the accuracy of the classification could be increased with a deeper analysis of the domain and by combining existing positioning methods with ours.
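
The pipeline described in this abstract (local descriptors, a shared visual vocabulary, one histogram per image, one category per shelf location) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the descriptors are toy 2-D tuples standing in for 128-dimensional SIFT vectors, the vocabulary is built with a small hand-written k-means, and categories are chosen by nearest mean histogram, which is an assumed stand-in for whatever classifier the thesis actually used. All function names and the location labels are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the visual word (cluster centre) closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def kmeans(descriptors, k, iters=20, seed=0):
    """Tiny k-means that builds the visual vocabulary (assumed clustering step)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if no descriptor was assigned to it
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def histogram(descriptors, centers):
    """Normalised bag-of-keypoints histogram for one image."""
    counts = Counter(nearest(d, centers) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(centers))]


def train(images_by_location, k):
    """Build the vocabulary and one mean histogram per location (category)."""
    all_desc = [d for imgs in images_by_location.values() for img in imgs for d in img]
    centers = kmeans(all_desc, k)
    models = {}
    for location, imgs in images_by_location.items():
        hists = [histogram(img, centers) for img in imgs]
        models[location] = [sum(col) / len(hists) for col in zip(*hists)]
    return centers, models


def classify(descriptors, centers, models):
    """Pick the location whose mean histogram best fits the unseen image."""
    h = histogram(descriptors, centers)
    return min(models, key=lambda location: sq_dist(h, models[location]))


if __name__ == "__main__":
    # Hypothetical toy data: each image is a list of 2-D "descriptors".
    data = {
        "dairy": [[(0, 0), (1, 0), (0, 1), (10, 10)], [(1, 1), (0, 0), (1, 0), (9, 10)]],
        "bread": [[(10, 10), (9, 9), (10, 9), (0, 1)], [(9, 10), (10, 10), (9, 9), (1, 1)]],
    }
    centers, models = train(data, k=2)
    print(classify([(0, 0), (1, 1), (0, 1), (1, 0)], centers, models))
```

In a real setting the descriptor lists would come from a SIFT extractor and the vocabulary would contain hundreds or thousands of visual words; the structure of the steps would stay the same.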

Relevance:

50.00%

Publisher:

Abstract:

In visual object detection and recognition, classifiers have two interesting characteristics: accuracy and speed. Accuracy depends on the complexity of the image features and classifier decision surfaces. Speed depends on the hardware and the computational effort required to use the features and decision surfaces. When attempts to increase accuracy lead to increases in complexity and effort, it is necessary to ask how much we are willing to pay for increased accuracy. For example, if increased computational effort implies quickly diminishing returns in accuracy, then those designing inexpensive surveillance applications cannot aim for maximum accuracy at any cost. It becomes necessary to find trade-offs between accuracy and effort. We study efficient classification of images depicting real-world objects and scenes. Classification is efficient when a classifier can be controlled so that the desired trade-off between accuracy and effort (speed) is achieved and unnecessary computations are avoided on a per-input basis. A framework is proposed for understanding and modeling efficient classification of images. Classification is modeled as a tree-like process. In designing the framework, it is important to recognize what is essential and to avoid structures that are narrow in applicability. Earlier frameworks are lacking in this regard. The overall contribution is twofold. First, the framework is presented, subjected to experiments, and shown to be satisfactory. Second, certain unconventional approaches are experimented with. This allows the separation of the essential from the conventional. To determine whether the framework is satisfactory, three categories of questions are identified: trade-off optimization, classifier tree organization, and rules for delegation and confidence modeling. Questions and problems related to each category are addressed and empirical results are presented.
For example, in relation to trade-off optimization, we address the problem of computational bottlenecks that limit the range of trade-offs. We also ask whether accuracy versus effort trade-offs can be controlled after training. As another example, regarding classifier tree organization, we first consider the task of organizing a tree in a problem-specific manner. We then ask whether problem-specific organization is necessary.
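
The idea of delegation under a confidence rule, with effort spent on a per-input basis, can be illustrated with a small sketch. It is an assumption-laden toy, not the framework from the thesis: the classifier tree is reduced to a simple chain (a degenerate tree), and the two classifiers, their costs and the confidence thresholds are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A prediction is (label, confidence in [0, 1]).
Prediction = Tuple[str, float]
# A stage is (classifier, effort cost, confidence threshold); all hypothetical.
Stage = Tuple[Callable[[Sequence[float]], Prediction], float, float]


def classify_with_delegation(x: Sequence[float], stages: List[Stage]) -> Tuple[Optional[str], float]:
    """Run stages in order; delegate to the next one while confidence is too low.

    Returns the final label and the total effort spent on this input.
    """
    total_effort = 0.0
    label = None
    for i, (classifier, cost, threshold) in enumerate(stages):
        label, confidence = classifier(x)
        total_effort += cost
        is_last = i == len(stages) - 1
        if is_last or confidence >= threshold:
            break  # confident enough, or nothing left to delegate to
    return label, total_effort


def cheap(x: Sequence[float]) -> Prediction:
    """Toy low-effort classifier: thresholds the mean pixel intensity."""
    mean = sum(x) / len(x)
    return ("bright" if mean > 0.5 else "dark", abs(mean - 0.5) * 2)


def expensive(x: Sequence[float]) -> Prediction:
    """Toy high-effort classifier: thresholds the median, always confident."""
    median = sorted(x)[len(x) // 2]
    return ("bright" if median > 0.5 else "dark", 1.0)


if __name__ == "__main__":
    stages = [(cheap, 1.0, 0.8), (expensive, 10.0, 0.0)]
    print(classify_with_delegation([0.9, 0.95, 1.0], stages))  # handled by the cheap stage
    print(classify_with_delegation([0.1, 0.6, 0.7], stages))   # delegated to the expensive stage
```

Because the thresholds are plain parameters of the stage list, raising or lowering them shifts the accuracy versus effort trade-off without retraining either classifier, which is one simple reading of controlling the trade-off after training.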

Relevance:

50.00%

Publisher:

Abstract:

The paradigm of computational vision hypothesizes that any visual function, such as the recognition of your grandparent, can be replicated by computational processing of the visual input. What are these computations that the brain performs? What should or could they be? Working on the latter question, this dissertation takes the statistical approach, which attempts to learn suitable computations from the natural visual data itself. In particular, we empirically study the computational processing that emerges from the statistical properties of the visual world and the constraints and objectives specified for the learning process. This thesis consists of an introduction and 7 peer-reviewed publications, where the purpose of the introduction is to illustrate the area of study to a reader who is not familiar with computational vision research. In the introduction, we briefly review the primary challenges to visual processing and recall some of the current opinions on visual processing in the early visual systems of animals. Next, we describe the methodology we have used in our research and discuss the presented results. We have included in this discussion some additional remarks, speculations and conclusions that were not featured in the original publications. We present the following results in the publications of this thesis. First, we empirically demonstrate that luminance and contrast are strongly dependent in natural images, contradicting previous theories suggesting that luminance and contrast are processed separately in natural systems because of their independence in the visual data. Second, we show that simple-cell-like receptive fields of the primary visual cortex can be learned in the nonlinear contrast domain by maximization of independence.
Further, we provide first-time reports of the emergence of conjunctive (corner-detecting) and subtractive (opponent orientation) processing due to nonlinear projection pursuit with simple objective functions related to sparseness and response energy optimization. Then, we show that attempting to extract independent components of nonlinear histogram statistics of a biologically plausible representation leads to projection directions that appear to differentiate between visual contexts. Such processing might be applicable for priming, i.e. the selection and tuning of later visual processing. We continue by showing that a different kind of thresholded low-frequency priming can be learned and used to make object detection faster with little loss in accuracy. Finally, we show that in a computational object detection setting, nonlinearly gain-controlled visual features of medium complexity can be acquired sequentially as images are encountered and discarded. We present two online algorithms to perform this feature selection, and propose the idea that for artificial systems, some processing mechanisms could be selectable from the environment without optimizing the mechanisms themselves. In summary, this thesis explores learning visual processing on several levels. The learning can be understood as an interplay of input data, model structures, learning objectives, and estimation algorithms. The presented work adds to the growing body of evidence showing that statistical methods can be used to acquire intuitively meaningful visual processing mechanisms. The work also presents some predictions and ideas regarding biological visual processing.

Relevance:

40.00%

Publisher:

Abstract:

The synchronization of neuronal activity, especially in the beta- (14-30 Hz) and gamma- (30-80 Hz) frequency bands, is thought to provide a means for the integration of anatomically distributed processing and for the formation of transient neuronal assemblies. Thus, non-stimulus-locked (i.e. induced) gamma-band oscillations are believed to underlie feature binding and the formation of neuronal object representations. On the other hand, the functional roles of neuronal oscillations in the slower theta- (4-8 Hz) and alpha- (8-14 Hz) frequency bands remain controversial. In addition, early stimulus-locked activity has been largely ignored, as it is believed to reflect merely the physical properties of sensory stimuli. With human neuromagnetic recordings, both the functional roles of gamma- and alpha-band oscillations and the significance of early stimulus-locked activity in neuronal processing were examined in this thesis. Study I of this thesis shows that even the stimulus-locked (evoked) gamma oscillations were sensitive to high-level stimulus features for speech and non-speech sounds, suggesting that they may underlie the formation of early neuronal object representations for stimuli with behavioural relevance. Study II shows that neuronal processing for consciously perceived and unperceived stimuli differed as early as 30 ms after stimulus onset. This study also showed that alpha-band oscillations correlated selectively with conscious perception. Study III, in turn, shows that prestimulus alpha-band oscillations influence the subsequent detection and processing of sensory stimuli. Further, in Study IV, we asked whether phase synchronization between distinct frequency bands is present in cortical circuits. This study revealed prominent task-sensitive phase synchrony between alpha and beta/gamma oscillations. Finally, the implications of Studies II, III, and IV for the broader scientific context are analysed in the last study of this thesis (V).
In this thesis, I suggest that neuronal processing may be extremely fast and that the evoked response is important for cognitive processes. I also propose that alpha oscillations define the global neuronal workspace of perception, action, and consciousness and, further, that cross-frequency synchronization is required for the integration of neuronal object representations into the global neuronal workspace.
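
Phase synchrony between two frequency bands, as examined in Study IV, is often quantified with an n:m phase-locking value. The sketch below shows that common measure under stated assumptions: the abstract does not say which measure the thesis used, the function name and parameters are hypothetical, and the input is assumed to be instantaneous phase in radians that has already been extracted from band-pass-filtered signals.

```python
import cmath
import math
from typing import Sequence


def phase_locking_value(phase_a: Sequence[float], phase_b: Sequence[float], n: int = 1, m: int = 1) -> float:
    """n:m phase-locking value between two phase time series (radians).

    Computes |mean(exp(i * (n * phase_a - m * phase_b)))|, which is 1 when the
    generalised phase difference is constant and near 0 when it is uniformly spread.
    """
    if len(phase_a) != len(phase_b) or len(phase_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (n * a - m * b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


if __name__ == "__main__":
    # Hypothetical example: a 10 Hz (alpha) phase and a 20 Hz (beta) phase locked 2:1,
    # sampled at 1000 Hz for one second.
    alpha = [2 * math.pi * 10 * t / 1000 for t in range(1000)]
    beta = [2 * p + 0.3 for p in alpha]
    print(phase_locking_value(alpha, beta, n=2, m=1))
    print(phase_locking_value(alpha, beta, n=1, m=1))
```

With real recordings the value would be compared against surrogate data or a baseline period, since finite samples give non-zero values even for unrelated signals.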