412 results for multi-camera environment
in Queensland University of Technology - ePrints Archive
Abstract:
CCTV and surveillance networks are increasingly being used for operational as well as security tasks. One emerging area of technology that lends itself to operational analytics is soft biometrics. Soft biometrics can be used to describe a person and detect them throughout a sparse multi-camera network. This enables them to be used to perform tasks such as determining the time taken to get from point to point, and the paths taken through an environment, by detecting and matching people across disjoint views. However, in a busy environment such as an airport, where there are hundreds if not thousands of people, attempting to monitor everyone is highly unrealistic. In this paper we propose an average soft biometric that can be used to identify people who look distinct, and are thus suitable for monitoring through a large, sparse camera network. We demonstrate how an average soft biometric can be used to identify unique people to calculate operational measures such as the time taken to travel from point to point.
Abstract:
Person re-identification involves recognising individuals in different locations across a network of cameras and is a challenging task due to a large number of varying factors such as pose (both subject and camera) and ambient lighting conditions. Existing databases do not adequately capture these variations, making evaluations of proposed techniques difficult. In this paper, we present a new challenging multi-camera surveillance database designed for the task of person re-identification. This database consists of 150 unscripted sequences of subjects travelling in a building environment through up to eight camera views, appearing from various angles and in varying illumination conditions. A flexible XML-based evaluation protocol is provided to allow a highly configurable evaluation setup, enabling a variety of scenarios relating to pose and lighting conditions to be evaluated. A baseline person re-identification system consisting of colour, height and texture models is demonstrated on this database.
Abstract:
Automated crowd counting has become an active field of computer vision research in recent years. Existing approaches are scene-specific, as they are designed to operate in the single camera viewpoint that was used to train the system. Real world camera networks often span multiple viewpoints within a facility, including many regions of overlap. This paper proposes a novel scene invariant crowd counting algorithm that is designed to operate across multiple cameras. The approach uses camera calibration to normalise features between viewpoints and to compensate for regions of overlap. This compensation is performed by constructing an 'overlap map' which provides a measure of how much an object at one location is visible within other viewpoints. An investigation into the suitability of various feature types and regression models for scene invariant crowd counting is also conducted. The features investigated include object size, shape, edges and keypoints. The regression models evaluated include neural networks, K-nearest neighbours, linear and Gaussian process regression. Our experiments demonstrate that accurate crowd counting was achieved across seven benchmark datasets, with optimal performance observed when all features were used and when Gaussian process regression was used. The combination of scene invariance and multi-camera crowd counting is evaluated by training the system on footage obtained from the QUT camera network and testing it on three cameras from the PETS 2009 database. Highly accurate crowd counting was observed with a mean relative error of less than 10%. Our approach enables a pre-trained system to be deployed on a new environment without any additional training, bringing the field one step closer toward a 'plug and play' system.
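The abstract above reports accuracy as a mean relative error. The sketch below shows one common way to compute such a figure from per-frame estimates and ground-truth counts. The exact definition used in the paper is not given here, so the formula, the function name `mean_relative_error` and the sample numbers are assumptions for illustration only.

```python
def mean_relative_error(estimates, ground_truth):
    """Average of |estimate - truth| / truth over frames with a non-zero true count.

    Assumed definition: frames whose ground-truth count is zero are skipped,
    because the relative error is undefined for them.
    """
    pairs = [(e, t) for e, t in zip(estimates, ground_truth) if t > 0]
    if not pairs:
        raise ValueError("need at least one frame with a non-zero ground-truth count")
    return sum(abs(e - t) / t for e, t in pairs) / len(pairs)


# Hypothetical per-frame crowd estimates and annotated counts.
estimates = [18.0, 22.0, 30.0]
ground_truth = [20, 20, 30]
mre = mean_relative_error(estimates, ground_truth)
print(f"mean relative error: {mre:.1%}")
```

Under this definition, a value below 0.10 would correspond to the "less than 10%" figure quoted in the abstract.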
Abstract:
Handling information overload online is, from the user's point of view, a big challenge, especially when the number of websites is growing rapidly due to growth in e-commerce and other related activities. Personalization based on user needs is the key to solving the problem of information overload. Personalization methods help in identifying relevant information, which may be liked by a user. User profile and object profile are the important elements of a personalization system. When creating user and object profiles, most of the existing methods adopt two-dimensional similarity methods based on vector or matrix models in order to find inter-user and inter-object similarity. Moreover, for recommending similar objects to users, personalization systems use the users-users, items-items and users-items similarity measures. In most cases similarity measures such as Euclidean, Manhattan, cosine and many others based on vector or matrix methods are used to find the similarities. Web logs are high-dimensional datasets, consisting of multiple users, multiple searches with many attributes to each. Two-dimensional data analysis methods may often overlook latent relationships that may exist between users and items. In contrast to other studies, this thesis utilises tensors, the high-dimensional data models, to build user and object profiles and to find the inter-relationships between users-users and users-items. To create an improved personalized Web system, this thesis proposes to build three types of profiles: individual user, group users and object profiles utilising decomposition factors of tensor data models. A hybrid recommendation approach utilising group profiles (forming the basis of a collaborative filtering method) and object profiles (forming the basis of a content-based method) in conjunction with individual user profiles (forming the basis of a model based approach) is proposed for making effective recommendations.
A tensor-based clustering method is proposed that utilises the outcomes of popular tensor decomposition techniques such as PARAFAC, Tucker and HOSVD to group similar instances. An individual user profile, showing the user's highest interest, is represented by the top dimension values, extracted from the component matrix obtained after tensor decomposition. A group profile, showing similar users and their highest interest, is built by clustering similar users based on tensor decomposed values. A group profile is represented by the top association rules (containing various unique object combinations) that are derived from the searches made by the users of the cluster. An object profile is created to represent similar objects clustered on the basis of their similarity of features. Depending on the category of a user (known, anonymous or frequent visitor to the website), any of the profiles or their combinations is used for making personalized recommendations. A ranking algorithm is also proposed that utilizes the personalized information to order and rank the recommendations. The proposed methodology is evaluated on data collected from a real life car website. Empirical analysis confirms the effectiveness of recommendations made by the proposed approach over other collaborative filtering and content-based recommendation approaches based on two-dimensional data analysis methods.
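The thesis abstract above contrasts its tensor models with two-dimensional similarity measures such as Euclidean, Manhattan and cosine. As a point of reference, the sketch below shows those three vector-based measures in plain Python. The function names and the example profile vectors are illustrative assumptions, and the tensor decomposition itself is not shown.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical user profiles: interest counts over three attributes.
user_a = [3, 0, 4]
user_b = [0, 0, 0.5]
print(euclidean(user_a, user_b), manhattan(user_a, user_b))
print(cosine_similarity(user_a, user_b))
```

Each measure compares one pair of flat vectors at a time, which is the limitation the abstract points to when it argues that web-log data with users, searches and attributes calls for a higher-dimensional model.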
'Going live' : establishing the creative attributes of the live multi-camera television professional
Abstract:
In my capacity as a television professional and teacher specialising in multi-camera live television production for over 40 years, I was drawn to the conclusion that opaque or inadequately formed understandings of how creativity applies to the field of live television have impeded the development of pedagogies suitable to the teaching of live television in universities. In the pursuit of this hypothesis, the thesis shows that television degrees were born out of film studies degrees, where intellectual creativity was aligned to single camera production, and the 'creative roles' of producers, directors and scriptwriters. At the same time, multi-camera live television production was subsumed under the 'mass communication' banner, leading to an understanding that roles other than producer and director are simply technical, and bereft of creative intent or acumen. The thesis goes on to show that this attitude to other television production personnel, for example, the vision mixer, videotape operator and camera operator, relegates their roles to that of 'button pusher'. This has resulted in university teaching models with inappropriate resources and unsuitable teaching practices. As a result, the industry is struggling to find people with the skills to fill the demands of the multi-camera live television sector. In specific terms the central hypothesis is pursued through the following sequenced approach. Firstly, the thesis sets out to outline the problems, and traces the origins of the misconceptions that hold with the notion that intellectual creativity does not exist in live multi-camera television. Secondly, this more adequately conceptualised rendition of the origins particular to the misconceptions of live television and creativity is then anchored to the field of examination by presentation of the foundations of the roles involved in making live television programs, using multi-camera production techniques.
Thirdly, this more nuanced rendition of the field sets the stage for a thorough analysis of education and training in the industry, and teaching models at Australian universities. The findings clearly establish that the pedagogical models are aimed at single camera production, a position that deemphasises the creative aspects of multi-camera live television production. Informed by an examination of theories of learning, qualitative interviews, professional reflective practice and observations, the roles of four multi-camera live production crewmembers (camera operator, vision mixer, EVS/videotape operator and director's assistant), demonstrate the existence of intellectual creativity during live production. Finally, supported by the theories of learning, and the development and explication of a successful teaching model, a new approach to teaching students how to work in live television is proposed and substantiated.
Abstract:
The selection of optimal camera configurations (camera locations, orientations, etc.) for multi-camera networks remains an unsolved problem. Previous approaches largely focus on proposing various objective functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To tackle this, we propose a statistical framework for the problem and a trans-dimensional simulated annealing algorithm to deal with it effectively. We compare our approach with a state-of-the-art method based on binary integer programming (BIP) and show that our approach offers similar performance on small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale problems and show that our approach produces better results than two alternative heuristics designed to deal with the scalability issue of BIP. Last, we show the versatility of our approach using a number of specific scenarios.
Abstract:
In public venues, crowd size is a key indicator of crowd safety and stability. Crowding levels can be detected using holistic image features; however, this requires a large amount of training data to capture the wide variations in crowd distribution. If a crowd counting algorithm is to be deployed across a large number of cameras, such a large and burdensome training requirement is far from ideal. In this paper we propose an approach that uses local features to count the number of people in each foreground blob segment, so that the total crowd estimate is the sum of the group sizes. This results in an approach that is scalable to crowd volumes not seen in the training data, and can be trained on a very small data set. As a local approach is used, the proposed algorithm can easily be used to estimate crowd density throughout different regions of the scene and be used in a multi-camera environment. A unique localised approach to ground truth annotation that reduces the required training data is also presented, as a localised approach to crowd counting has different training requirements to a holistic one. Testing on a large pedestrian database compares the proposed technique to existing holistic techniques and demonstrates improved accuracy, and superior performance when test conditions are unseen in the training set, or a minimal training set is used.
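The core idea in the abstract above, estimating a group size for each foreground blob and summing the groups, can be sketched in a few lines. This is a deliberately reduced illustration: it assumes a single hypothetical feature (blob area in pixels) and a through-origin linear fit, whereas the paper's actual local features and regression model are not specified here.

```python
def fit_people_per_pixel(areas, counts):
    """Least-squares slope for counts ~ slope * area (no intercept).

    Assumes blob area is the only feature; this is a stand-in for the
    paper's unspecified local features.
    """
    denom = sum(a * a for a in areas)
    if denom == 0:
        raise ValueError("training blobs must have non-zero area")
    return sum(a * c for a, c in zip(areas, counts)) / denom


def estimate_crowd(blob_areas, slope):
    """Total crowd estimate is the sum of the per-blob group-size estimates."""
    per_blob = [slope * area for area in blob_areas]
    return per_blob, sum(per_blob)


# Hypothetical annotated training blobs: area in pixels, people in the blob.
train_areas = [100, 200, 300]
train_counts = [1, 2, 3]
slope = fit_people_per_pixel(train_areas, train_counts)

# Two unseen blobs from a new frame.
per_blob, total = estimate_crowd([150, 400], slope)
print(per_blob, total)
```

Because the model is learned per blob rather than per frame, a frame with more blobs than any training frame still yields a usable total, which is the scalability property the abstract describes.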
Abstract:
This paper proposes a semi-supervised intelligent visual surveillance system to exploit the information from multi-camera networks for the monitoring of people and vehicles. Modules are proposed to perform critical surveillance tasks including: the management and calibration of cameras within a multi-camera network; tracking of objects across multiple views; recognition of people utilising biometrics and in particular soft-biometrics; the monitoring of crowds; and activity recognition. Recent advances in these computer vision modules and capability gaps in surveillance technology are also highlighted.
Abstract:
The selection of optimal camera configurations (camera locations, orientations, etc.) for multi-camera networks remains an unsolved problem. Previous approaches largely focus on proposing various objective functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To tackle this, we introduce a statistical formulation of the optimal selection of camera configurations and propose a Trans-Dimensional Simulated Annealing (TDSA) algorithm to effectively solve the problem. We compare our approach with a state-of-the-art method based on Binary Integer Programming (BIP) and show that our approach offers similar performance on small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale problems and show that our approach produces better results than two alternative heuristics designed to deal with the scalability issue of BIP.
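To give a feel for annealing over camera sets of varying size, here is a minimal sketch. It is not the authors' TDSA algorithm: it is ordinary simulated annealing whose only move toggles one candidate camera in or out, so the number of selected cameras changes from step to step. The objective (cells covered minus a per-camera cost), the cooling schedule and all names are assumptions for illustration.

```python
import math
import random


def score(selected, coverage, camera_cost):
    """Hypothetical objective: covered cells minus a penalty per camera."""
    covered = set()
    for cam in selected:
        covered |= coverage[cam]
    return len(covered) - camera_cost * len(selected)


def anneal(coverage, camera_cost=0.5, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over subsets of candidate cameras."""
    rng = random.Random(seed)
    cams = sorted(coverage)
    current = frozenset()
    current_score = score(current, coverage, camera_cost)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Toggle one camera: adds it if absent, removes it if present.
        proposal = current ^ {rng.choice(cams)}
        proposal_score = score(proposal, coverage, camera_cost)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Hypothetical candidate cameras and the floor cells each one can see.
coverage = {"A": {1, 2, 3}, "B": {3, 4}, "C": {1, 2}, "D": {5}}
best, best_score = anneal(coverage)
print(sorted(best), best_score)
```

The per-camera cost is what keeps the search from simply selecting every candidate; a real formulation would replace `score` with a task-specific objective.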
Abstract:
After first observing a person, the task of person re-identification involves recognising an individual at different locations across a network of cameras at a later time. Traditionally, this task has been performed by first extracting appearance features of an individual and then matching these features to the previous observation. However, identifying an individual based solely on appearance can be ambiguous, particularly when people wear similar clothing (e.g. people dressed in uniforms in sporting and school settings). This task is made more difficult when the resolution of the input image is small, as is typically the case in multi-camera networks. To circumvent these issues, we need to use other contextual cues. In this paper, we use "group" information as our contextual feature to aid in the re-identification of a person, which is heavily motivated by the fact that people generally move together as a collective group. To encode group context, we learn a linear mapping function to assign each person to a "role" or position within the group structure. We then combine the appearance and group context cues using a weighted summation. We demonstrate how this improves the performance of person re-identification in a sports environment over appearance-based features.
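The weighted summation of appearance and group context cues mentioned above can be illustrated as follows. The score values, the weight of 0.5 and the function names are hypothetical; how the paper learns the role mapping or chooses its weight is not described here.

```python
def fuse_scores(appearance, group_context, weight=0.5):
    """Weighted sum of two similarity scores for each candidate identity.

    `weight` is the share given to appearance; the remainder goes to the
    group-context score. Both inputs map candidate id -> similarity.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return {
        pid: weight * appearance[pid] + (1.0 - weight) * group_context[pid]
        for pid in appearance
    }


def best_match(scores):
    """Candidate with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical gallery: two players in near-identical uniforms.
appearance = {"p1": 0.80, "p2": 0.82}
group_context = {"p1": 0.90, "p2": 0.30}

fused = fuse_scores(appearance, group_context, weight=0.5)
print(best_match(appearance), best_match(fused))
```

In this made-up case appearance alone narrowly prefers the wrong candidate, and the group-context term is enough to change the ranking, which is the kind of ambiguity the abstract describes for uniformed players.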
Abstract:
Agent-oriented conceptual modelling (AoCM) approaches in Requirements Engineering (RE) have received considerable attention recently. Semi-formal modeling frameworks such as i* assist analysts in requirements elicitation and reasoning of early-phase RE. AgentSpeak(L) is a widely accepted agent programming language. The Strategic Rationale (SR) model of the i* framework naturally lends itself to AgentSpeak(L) programs. Furthermore, the Strategic Dependency (SD) component of the i* framework prescribes the interaction between the agents in a multi-agent environment. This paper proposes a formal methodology for transforming an SR model to an AgentSpeak(L) agent. The constructed AgentSpeak(L) agents will then form the essential components of a multi-agent system (MAS).
Abstract:
In a typical collaborative application, users contend for common resources by mutual exclusion. The introduction of multi-modal environments, however, has introduced problems such as frequent dropped connections or the limited connection speed of mobile users. This paper targets 3D resources, which require additional considerations such as the dependency of users' manipulation commands. This paper introduces the Dynamic Locking Synchronisation technique to enable a seamless and collaborative environment for a large number of users, by combining the contention-free concepts of a locking mechanism and the seamless nature of a lockless design.
Abstract:
Seaport container terminals are an important part of the logistics systems in international trade. This paper investigates the relationship between quay cranes, yard machines and container storage locations in a multi-berth and multi-ship environment. The aims are to develop a model for improving the operation efficiency of the seaports and to develop an analytical tool for yard operation planning. Because the container transfer times are sequence-dependent and a large number of variables are involved, the proposed model cannot be solved in a reasonable time interval for realistically sized problems. For this reason, List Scheduling and Tabu Search algorithms have been developed to solve this formidable and NP-hard scheduling problem. Numerical implementations have been analysed and promising results have been achieved.
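For readers unfamiliar with list scheduling, the sketch below shows the basic greedy rule in its textbook form: take jobs in a fixed priority order and give each one to whichever machine becomes free first. It is a generic illustration, not the paper's algorithm; in particular it ignores the sequence-dependent transfer times that the abstract highlights, and the job durations and machine count are made up.

```python
import heapq


def list_schedule(durations, n_machines):
    """Greedy list scheduling on identical machines.

    Jobs are taken in list order; each goes to the earliest-available
    machine. Returns (schedule, makespan), where schedule holds
    (job, machine, start, end) tuples.
    """
    if n_machines < 1:
        raise ValueError("need at least one machine")
    # Min-heap of (time the machine becomes free, machine id).
    machines = [(0, m) for m in range(n_machines)]
    heapq.heapify(machines)
    schedule = []
    for job, duration in enumerate(durations):
        start, machine = heapq.heappop(machines)
        end = start + duration
        schedule.append((job, machine, start, end))
        heapq.heappush(machines, (end, machine))
    makespan = max((end for _, _, _, end in schedule), default=0)
    return schedule, makespan


# Hypothetical container moves (minutes) handled by two yard machines.
schedule, makespan = list_schedule([4, 3, 2, 2], n_machines=2)
for row in schedule:
    print(row)
print("makespan:", makespan)
```

A heuristic of this kind produces a feasible schedule quickly, which is why it is often paired with a local search such as Tabu Search that then tries to improve on it.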
Abstract:
In public places, crowd size may be an indicator of congestion, delay, instability, or of abnormal events, such as a fight, riot or emergency. Crowd related information can also provide important business intelligence such as the distribution of people throughout spaces, throughput rates, and local densities. A major drawback of many crowd counting approaches is their reliance on large numbers of holistic features, training data requirements of hundreds or thousands of frames per camera, and that each camera must be trained separately. This makes deployment in large multi-camera environments such as shopping centres very costly and difficult. In this chapter, we present a novel scene-invariant crowd counting algorithm that uses local features to monitor crowd size. The use of local features allows the proposed algorithm to calculate local occupancy statistics, scale to conditions which are unseen in the training data, and be trained on significantly less data. Scene invariance is achieved through the use of camera calibration, allowing the system to be trained on one or more viewpoints and then deployed on any number of new cameras for testing without further training. A pre-trained system could then be used as a ‘turn-key’ solution for crowd counting across a wide range of environments, eliminating many of the costly barriers to deployment which currently exist.