987 results for Video genre classification





Co-training is a semi-supervised learning method designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditionally independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in a form that may be suitable for co-training. In this paper, we investigate the suitability of using text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, conducted within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of redundancy when it is present.





We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). In all cases we achieve classification performance superior to that of recent publications that have used a bag of visual words representation, using the authors' own data sets and testing protocols. We also investigate the gain from adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.





Short video on laser classification produced by the National Physical Laboratory





In this paper, we propose a content selection framework that improves the users' experience when they are enriching or authoring pieces of news. This framework combines a variety of techniques to retrieve semantically related videos, based on a set of criteria which are specified automatically depending on the media's constraints. The combination of different content selection mechanisms can improve the quality of the retrieved scenes, because each technique's limitations are offset by other techniques' strengths. We present an evaluation based on a number of experiments, which show that the retrieved results are better when all criteria are used together.





Wooden railway sleeper inspections in Sweden are currently performed manually by a human operator and are based on visual analysis. A machine-vision approach has been developed to emulate the visual abilities of the human operator and so enable automation of the process. Through this process bad sleepers are identified, and each is marked with a spot of a specific color (blue in the current case) on the rail so that the maintenance operators are able to find the spot and replace the sleeper. The aim of this thesis is to help the operators identify the sleepers that are marked with colored spots, using an “Intelligent Vehicle” which is capable of running on the track. By capturing video while the vehicle runs on the track and segmenting the object of interest (the spot), we can automate this work and minimize the reliance on human judgment. The video acquisition process depends on camera position and light source to obtain adequate brightness; we tested four different combinations of camera position and light source to record the video and test the validity of the proposed method. A sequence of real-time rail frames is extracted from these videos, and further processing (depending on the data acquisition process) is done to identify the spots. After a spot has been identified, each frame is divided into nine regions to determine the region in which the spot lies and to avoid confusing it with noise. The proposed method reports the region in which the spot lies, based on the nine regions in each frame. From the generated results we have classified the data collection techniques with respect to efficiency, time and speed.
In this report, extensive experiments using image sequences from a particular camera are reported. The experiments were done using the intelligent vehicle as well as a test vehicle, and the results show that we achieved 95% success in identifying the spots when the video is used as it is. In the other method, where some frames are skipped in pre-processing to increase speed, the segmentation success dropped to 85%, but the processing time was much lower than with the first method. This shows the validity of the proposed method for identifying spots on wooden railway sleepers, where time can be traded against efficiency to get the desired result.
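The nine-region step described in this abstract can be illustrated with a small sketch. The thesis does not state how the regions are laid out or numbered, so this assumes a 3x3 grid of equal cells numbered 1 to 9 in row-major order; the function name `grid_region` and the frame size in the example are hypothetical.

```python
def grid_region(x, y, width, height, rows=3, cols=3):
    """Return the 1-based, row-major index of the grid cell containing pixel (x, y).

    Assumption: the frame is split into rows x cols equal cells (3x3 by default).
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("point lies outside the frame")
    col = x * cols // width    # 0-based column of the cell
    row = y * rows // height   # 0-based row of the cell
    return row * cols + col + 1


# Example: the centre of a 640x480 frame falls in the middle cell.
print(grid_region(320, 240, 640, 480))  # 5
```

In practice `(x, y)` would be the centroid of the segmented spot; how that centroid is obtained is outside the scope of this sketch.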





Previously, the authors proposed a new, simple method of frequency domain analysis based on the two-dimensional discrete wavelet transform to objectively measure the pilling intensity in sample fabric images. The method was further characterized, and the results obtained indicate that standard deviation and variance are the most appropriate measures of the dispersion of wavelet detail coefficients for analysis, that the relationship between wavelet analysis scale and fabric inter-yarn pitch is empirically confirmed, and that fabrics with random patterns do not appear to affect the effectiveness of the analysis method.
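As a rough illustration of measuring the dispersion of wavelet detail coefficients, the sketch below computes a one-level 2D transform and the standard deviation of its detail coefficients. The abstract does not name the wavelet, the analysis scale, or how the sub-bands are combined, so the Haar wavelet, the single level, and the pooling of all three detail sub-bands are assumptions; `haar_dwt2` and `detail_std` are hypothetical names.

```python
import statistics


def haar_dwt2(image):
    """One-level 2D Haar transform of a grayscale image (list of equal-length rows).

    Assumes even height and width. Returns four sub-bands:
    approximation, row-difference, column-difference and diagonal details.
    """
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, rd_row, cd_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            p, q = image[r][c], image[r][c + 1]          # top pair of the 2x2 block
            s, t = image[r + 1][c], image[r + 1][c + 1]  # bottom pair
            a_row.append((p + q + s + t) / 2)
            rd_row.append((p + q - s - t) / 2)   # top row minus bottom row
            cd_row.append((p - q + s - t) / 2)   # left column minus right column
            d_row.append((p - q - s + t) / 2)    # diagonal difference
        approx.append(a_row)
        row_diff.append(rd_row)
        col_diff.append(cd_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


def detail_std(image):
    """Population standard deviation of all detail coefficients (pooled, by assumption)."""
    _, row_diff, col_diff, diag = haar_dwt2(image)
    coeffs = [v for band in (row_diff, col_diff, diag) for row in band for v in row]
    return statistics.pstdev(coeffs)


flat = [[5, 5, 5, 5] for _ in range(4)]          # featureless surface
textured = [[0, 1, 0, 1], [1, 0, 1, 0]] * 2      # fine checkerboard texture
print(detail_std(flat), detail_std(textured))
```

A featureless image yields zero dispersion, while fine texture yields a positive value; mapping such a value to a pilling grade would require calibration that this sketch does not attempt.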





A new two-level real-time vehicle detection method is proposed in order to meet the robustness and efficiency requirements of real-world applications. At the higher level, pixels of the background image are classified into three categories according to the characteristics of Red, Green, Blue (RGB) curves. The robustness of the classification is further enhanced by using line detection and pattern connectivity. At the lower level, an exponential forgetting algorithm with adaptive parameters for different categories is used to calculate the background and reduce the distortion caused by small motions of the video camera. Scene tests show that the proposed method is more robust and faster than previous methods, making it well suited to real-time vehicle detection in outdoor environments, especially at locations where the level of illumination changes frequently and speed detection is important.
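The lower-level background estimate can be illustrated with a running-average sketch. The abstract does not give the update rule or the parameter values; this assumes the common exponential forgetting form B_t = (1 - a) * B_{t-1} + a * I_t applied per pixel, with the forgetting rate a chosen by pixel category. The category names and rates below are hypothetical.

```python
def update_background(background, frame, categories, rates):
    """Per-pixel exponential forgetting: B_t = (1 - a) * B_{t-1} + a * I_t.

    background, frame: 2D lists of pixel intensities.
    categories: 2D list giving each pixel's category label.
    rates: dict mapping a category label to its forgetting rate a in [0, 1].
    """
    return [
        [(1 - rates[c]) * b + rates[c] * f
         for b, f, c in zip(b_row, f_row, c_row)]
        for b_row, f_row, c_row in zip(background, frame, categories)
    ]


# Hypothetical categories: a "stable" pixel ignores the new frame,
# a "changing" pixel moves halfway toward it.
rates = {"stable": 0.0, "changing": 0.5}
background = [[100.0, 100.0]]
frame = [[200.0, 200.0]]
categories = [["stable", "changing"]]
print(update_background(background, frame, categories, rates))  # [[100.0, 150.0]]
```

A larger rate makes the background follow illumination changes more quickly, at the cost of absorbing slow-moving vehicles into the background sooner.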





In this habitat mapping study, multi-beam acoustic data are integrated with extensive, precisely geo-referenced video validation data in a GIS environment to classify benthic substrates and biota at a 33 km² site in the nearshore waters of Victoria, Australia. Using an automated decision-tree classification method, 5 representative biotic groups were identified in the Cape Nelson survey area using a combination of multi-beam bathymetry, backscatter and derivative products. Rigorous error assessment of the derived, classified maps showed high overall accuracies (>85%) for all mapping products. In addition, a discrete multivariate analysis technique (kappa analysis) was used to assess classification accuracy. High-resolution (2.5 m cell-size) representation of sea floor morphology and textural characteristics provided by the multi-beam bathymetry and backscatter datasets allowed the interpretation of benthic substrates of the Cape Nelson site and the communities of sessile organisms that populate them. Non-parametric multivariate statistical analysis (ANOSIM) revealed a significant difference in biotic composition between depth strata, and between substrate types. Combined with other descriptive measures, these results indicate that depth and substrate are important factors in the distributional ecology of the biotic communities at the Cape Nelson study site. BIOENV analysis indicates that derivatives of both multi-beam datasets (bathymetry and backscatter) are correlated with the distribution and density of biotic communities. Results from this study provide new tools for research and management of the coastal zone.
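For readers unfamiliar with kappa analysis, the following sketch computes Cohen's kappa from a confusion matrix of mapped class against validation class. It shows the standard formula only; the study's actual error matrices are not given in the abstract, so the numbers below are made up for illustration.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: mapped, columns: reference).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from the row and column totals.
    Undefined (division by zero) when p_e equals 1.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Illustrative two-class matrix (not data from the study).
matrix = [[45, 5],
          [10, 40]]
print(round(cohen_kappa(matrix), 3))  # 0.7
```

Here the overall accuracy is 85%, but kappa is lower because half of that agreement would be expected by chance given the class totals.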





Interpretation of video information is a difficult task for computer vision and machine intelligence. In this paper we examine the utility of a non-image based source of information about video contents, namely the shot list, and study its use in aiding image interpretation. We show how the shot list may be analysed to produce a simple summary of the 'who and where' of a documentary or interview video. In order to detect the subject of a video we use the notion of a 'shot syntax' of a particular genre to isolate actual interview sections.





We present results on an extension to our approach for automatic sports video annotation. Sports video is augmented with accelerometer data from wrist bands worn by umpires in the game. We solve the problem of automatic segmentation and robust gesture classification using a hierarchical hidden Markov model in conjunction with a filler model. The hierarchical model allows us to consider gestures at different levels of abstraction, and the filler model allows us to handle extraneous umpire movements. Results are presented for labeling video of a game of cricket.





This paper addresses the area of video annotation, indexing and retrieval, and shows how a set of tools can be employed, along with domain knowledge, to detect narrative structure in broadcast news. The initial structure is detected using low-level audio-visual processing in conjunction with domain knowledge. Higher-level processing may then use the detected initial structure to direct further processing that improves and extends the initial classification.

The detected structure breaks a news broadcast into segments, each of which contains a single topic of discussion. Further, the segments are labeled as (a) anchor person or reporter, (b) footage with a voice-over or (c) sound bite. This labeling may be used to provide a summary, for example by presenting a thumbnail for each reporter present in a section of the video. The inclusion of domain knowledge in computation allows a more directed application of high-level processing, so that much less effort is wasted. This allows valid deductions to be made about the structure and semantics of the contents of a news video stream, as demonstrated by our experiments on CNN news broadcasts.





Many tasks in computer vision can be expressed as graph problems. This allows the task to be solved using a well-studied algorithm; however, many of these algorithms are of exponential complexity. This is a disadvantage in the context of searching a database of images or videos for similarity. Work by Messmer and Bunke (1995) has suggested a new class of graph matching algorithms which uses a priori knowledge about a database of models to reduce the time taken during online classification. This paper presents a new algorithm which extends the earlier work to detection of the largest common subgraph.





This paper presents a comparative evaluation of popular multi-label classification (MLC) methods on several multi-label problems from different domains. The methods include multi-label k-nearest neighbor, binary relevance, label power set, random k-label set ensemble learning, calibrated label ranking, hierarchy of multi-label classifiers and triple random ensemble multi-label classification algorithms. These multi-label learning algorithms are evaluated using several widely used MLC evaluation metrics. The evaluation results show that for each multi-label classification problem a particular MLC method can be recommended. The multi-label evaluation datasets used in this study are related to scene images, multimedia video frames, diagnostic medical reports, email messages, emotional music data, biological genes and multi-structural protein categorization.





In this paper we consider face recognition from sets of face images and, in particular, recognition invariance to illumination. The main contribution is an algorithm based on the novel concept of maximally probable mutual modes (MMPM). Specifically, we (i) discuss and derive a local manifold illumination invariant and (ii) show how the invariant naturally leads to a formulation of "common modes" of two face appearance distributions. Recognition is then performed by finding the most probable mode, which is shown to be an eigenvalue problem. The effectiveness of the proposed method is demonstrated empirically on a challenging database containing a total of 700 video sequences of 100 individuals.