944 resultados para Programming languages (Electronic computers) - Semantics
Resumo:
A collaboration between dot.rural at the University of Aberdeen and the iSchool at Northumbria University, POWkist is a pilot-study exploring potential usages of currently available linked datasets within the cultural heritage domain. Many privately-held family history collections (shoebox archives) remain vulnerable unless a sustainable, affordable and accessible model of citizen-archivist digital preservation can be offered. Citizen-historians have used the web as a platform to preserve cultural heritage, however with no accessible or sustainable model these digital footprints have been ad hoc and rarely connected to broader historical research. Similarly, current approaches to connecting material on the web by exploiting linked datasets do not take into account the data characteristics of the cultural heritage domain. Funded by Semantic Media, the POWKist project is investigating how best to capture, curate, connect and present the contents of citizen-historians’ shoebox archives in an accessible and sustainable online collection. Using the Curios platform - an open-source digital archive - we have digitised a collection relating to a prisoner of war during WWII (1939-1945). Following a series of user group workshops, POWkist is now connecting these ‘made digital’ items with the broader web using a semantic technology model and identifying appropriate linked datasets of relevant content such as DBPedia (an archived linked dataset of Wikipedia) and Ordnance Survey Open Data. We are analysing the characteristics of cultural heritage linked datasets, so that these materials are better visualised, contextualised and presented in an attractive and comprehensive user interface. Our paper will consider the issues we have identified, the solutions we are developing and include a demonstration of our work-in-progress.
Resumo:
Recent years have seen an astronomical rise in SQL Injection Attacks (SQLIAs) used to compromise the confidentiality, authentication and integrity of organisations’ databases. Intruders becoming smarter in obfuscating web requests to evade detection combined with increasing volumes of web traffic from the Internet of Things (IoT), cloud-hosted and on-premise business applications have made it evident that the existing approaches of mostly static signature lack the ability to cope with novel signatures. A SQLIA detection and prevention solution can be achieved through exploring an alternative bio-inspired supervised learning approach that uses input of labelled dataset of numerical attributes in classifying true positives and negatives. We present in this paper a Numerical Encoding to Tame SQLIA (NETSQLIA) that implements a proof of concept for scalable numerical encoding of features to a dataset attributes with labelled class obtained from deep web traffic analysis. In the numerical attributes encoding: the model leverages proxy in the interception and decryption of web traffic. The intercepted web requests are then assembled for front-end SQL parsing and pattern matching by applying traditional Non-Deterministic Finite Automaton (NFA). This paper is intended for a technique of numerical attributes extraction of any size primed as an input dataset to an Artificial Neural Network (ANN) and statistical Machine Learning (ML) algorithms implemented using Two-Class Averaged Perceptron (TCAP) and Two-Class Logistic Regression (TCLR) respectively. This methodology then forms the subject of the empirical evaluation of the suitability of this model in the accurate classification of both legitimate web requests and SQLIA payloads.
Resumo:
Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak-edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree in multiple-heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
Resumo:
Participation Space Studies explore eParticipation in the day-to-day activities of local, citizen-led groups, working to improve their communities. The focus is the relationship between activities and contexts. The concept of a participation space is introduced in order to reify online and offline contexts where people participate in democracy. Participation spaces include websites, blogs, email, social media presences, paper media, and physical spaces. They are understood as sociotechnical systems: assemblages of heterogeneous elements, with relevant histories and trajectories of development and use. This approach enables the parallel study of diverse spaces, on and offline. Participation spaces are investigated within three case studies, centred on interviews and participant observation. Each case concerns a community or activist group, in Scotland. The participation spaces are then modelled using a Socio-Technical Interaction Network (STIN) framework (Kling, McKim and King, 2003). The participation space concept effectively supports the parallel investigation of the diverse social and technical contexts of grassroots democracy and the relationship between the case-study groups and the technologies they use to support their work. Participants’ democratic participation is supported by online technologies, especially email, and they create online communities and networks around their goals. The studies illustrate the mutual shaping relationship between technology and democracy. Participants’ choice of technologies can be understood in spatial terms: boundaries, inhabitants, access, ownership, and cost. Participation spaces and infrastructures are used together and shared with other groups. Non-public online spaces, such as Facebook groups, are vital contexts for eParticipation; further, the majority of participants’ work is non-public, on and offline. It is informational, potentially invisible, work that supports public outputs. The groups involve people and influence events through emotional and symbolic impact, as well as rational argument. Images are powerful vehicles for this and digital images become an increasingly evident and important feature of participation spaces throughout the consecutively conducted case studies. Collaboration of diverse people via social media indicates that these spaces could be understood as boundary objects (Star and Griesemer, 1989). The Participation Space Studies draw from and contribute to eParticipation, social informatics, mediation, social shaping studies, and ethnographic studies of Internet use.
Resumo:
Choosing a single similarity threshold for cutting dendrograms is not sufficient for performing hierarchical clustering analysis of heterogeneous data sets. In addition, alternative automated or semi-automated methods that cut dendrograms in multiple levels make assumptions about the data in hand. In an attempt to help the user to find patterns in the data and resolve ambiguities in cluster assignments, we developed MLCut: a tool that provides visual support for exploring dendrograms of heterogeneous data sets in different levels of detail. The interactive exploration of the dendrogram is coordinated with a representation of the original data, shown as parallel coordinates. The tool supports three analysis steps. Firstly, a single-height similarity threshold can be applied using a dynamic slider to identify the main clusters. Secondly, a distinctiveness threshold can be applied using a second dynamic slider to identify “weak-edges” that indicate heterogeneity within clusters. Thirdly, the user can drill-down to further explore the dendrogram structure - always in relation to the original data - and cut the branches of the tree at multiple levels. Interactive drill-down is supported using mouse events such as hovering, pointing and clicking on elements of the dendrogram. Two prototypes of this tool have been developed in collaboration with a group of biologists for analysing their own data sets. We found that enabling the users to cut the tree at multiple levels, while viewing the effect in the original data, is a promising method for clustering which could lead to scientific discoveries.
Resumo:
Physical places are given contextual meaning by the objects and people that make up the space. Presence in physical places can be utilised to support mobile interaction by making access to media and notifications on a smartphone easier and more visible to other people. Smartphone interfaces can be extended into the physical world in a meaningful way by anchoring digital content to artefacts, and interactions situated around physical artefacts can provide contextual meaning to private manipulations with a mobile device. Additionally, places themselves are designed to support a set of tasks, and the logical structure of places can be used to organise content on the smartphone. Menus that adapt the functionality of a smartphone can support the user by presenting the tools most likely to be needed just-in-time, so that information needs can be satisfied quickly and with little cognitive effort. Furthermore, places are often shared with people whom the user knows, and the smartphone can facilitate social situations by providing access to content that stimulates conversation. However, the smartphone can disrupt a collaborative environment, by alerting the user with unimportant notifications, or sucking the user in to the digital world with attractive content that is only shown on a private screen. Sharing smartphone content on a situated display creates an inclusive and unobtrusive user experience, and can increase focus on a primary task by allowing content to be read at a glance. Mobile interaction situated around artefacts of personal places is investigated as a way to support users to access content from their smartphone while managing their physical presence. A menu that adapts to personal places is evaluated to reduce the time and effort of app navigation, and coordinating smartphone content on a situated display is found to support social engagement and the negotiation of notifications. Improving the sensing of smartphone users in places is a challenge that is out-with the scope of this thesis. Instead, interaction designers and developers should be provided with low-cost positioning tools that utilise presence in places, and enable quantitative and qualitative data to be collected in user evaluations. Two lightweight positioning tools are developed with the low-cost sensors that are currently available: The Microsoft Kinect depth sensor allows movements of a smartphone user to be tracked in a limited area of a place, and Bluetooth beacons enable the larger context of a place to be detected. Positioning experiments with each sensor are performed to highlight the capabilities and limitations of current sensing techniques for designing interactions with a smartphone. Both tools enable prototypes to be built with a rapid prototyping approach, and mobile interactions can be tested with more advanced sensing techniques as they become available. Sensing technologies are becoming pervasive, and it will soon be possible to perform reliable place detection in-the-wild. Novel interactions that utilise presence in places can support smartphone users by making access to useful functionality easy and more visible to the people who matter most in everyday life.
Resumo:
Cache-coherent non uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system’s distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation for the interconnection between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a locality richness which exists naturally in connected objects that contain a root object and its reachable set— ‘rooted sub-graphs’. Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism. A garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks of an established real-world DaCapo benchmark suite. In addition, evaluation involves a widely used SPECjbb benchmark and Neo4J graph database Java benchmark, as well as an artificial benchmark. The results of the NUMA-aware garbage collector on a multi-hop NUMA architecture show an average of 15% performance improvement. Furthermore, this performance gain is shown to be as a result of an improved NUMA memory access in a ccNUMA system. Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy uses outdated assumptions and it generates a constant thread count. In fact, the Hotspot JVM still uses this policy in the production version. This research shows that the optimal number of garbage collection threads is application-specific and configuring the optimal number of garbage collection threads yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique, which involves heuristics from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average of 21% improvements to the garbage collection performance for DaCapo benchmarks.
Resumo:
This thesis reports on an investigation of the feasibility and usefulness of incorporating dynamic management facilities for managing sensed context data in a distributed contextaware mobile application. The investigation focuses on reducing the work required to integrate new sensed context streams in an existing context aware architecture. Current architectures require integration work for new streams and new contexts that are encountered. This means of operation is acceptable for current fixed architectures. However, as systems become more mobile the number of discoverable streams increases. Without the ability to discover and use these new streams the functionality of any given device will be limited to the streams that it knows how to decode. The integration of new streams requires that the sensed context data be understood by the current application. If the new source provides data of a type that an application currently requires then the new source should be connected to the application without any prior knowledge of the new source. If the type is similar and can be converted then this stream too should be appropriated by the application. Such applications are based on portable devices (phones, PDAs) for semi-autonomous services that use data from sensors connected to the devices, plus data exchanged with other such devices and remote servers. Such applications must handle input from a variety of sensors, refining the data locally and managing its communication from the device in volatile and unpredictable network conditions. The choice to focus on locally connected sensory input allows for the introduction of privacy and access controls. This local control can determine how the information is communicated to others. This investigation focuses on the evaluation of three approaches to sensor data management. The first system is characterised by its static management based on the pre-pended metadata. This was the reference system. Developed for a mobile system, the data was processed based on the attached metadata. The code that performed the processing was static. The second system was developed to move away from the static processing and introduce a greater freedom of handling for the data stream, this resulted in a heavy weight approach. The approach focused on pushing the processing of the data into a number of networked nodes rather than the monolithic design of the previous system. By creating a separate communication channel for the metadata it is possible to be more flexible with the amount and type of data transmitted. The final system pulled the benefits of the other systems together. By providing a small management class that would load a separate handler based on the incoming data, Dynamism was maximised whilst maintaining ease of code understanding. The three systems were then compared to highlight their ability to dynamically manage new sensed context. The evaluation took two approaches, the first is a quantitative analysis of the code to understand the complexity of the relative three systems. This was done by evaluating what changes to the system were involved for the new context. The second approach takes a qualitative view of the work required by the software engineer to reconfigure the systems to provide support for a new data stream. The evaluation highlights the various scenarios in which the three systems are most suited. There is always a trade-o↵ in the development of a system. The three approaches highlight this fact. The creation of a statically bound system can be quick to develop but may need to be completely re-written if the requirements move too far. Alternatively a highly dynamic system may be able to cope with new requirements but the developer time to create such a system may be greater than the creation of several simpler systems.
Resumo:
Due to the growth of design size and complexity, design verification is an important aspect of the Logic Circuit development process. The purpose of verification is to validate that the design meets the system requirements and specification. This is done by either functional or formal verification. The most popular approach to functional verification is the use of simulation based techniques. Using models to replicate the behaviour of an actual system is called simulation. In this thesis, a software/data structure architecture without explicit locks is proposed to accelerate logic gate circuit simulation. We call thus system ZSIM. The ZSIM software architecture simulator targets low cost SIMD multi-core machines. Its performance is evaluated on the Intel Xeon Phi and 2 other machines (Intel Xeon and AMD Opteron). The aim of these experiments is to: • Verify that the data structure used allows SIMD acceleration, particularly on machines with gather instructions ( section 5.3.1). • Verify that, on sufficiently large circuits, substantial gains could be made from multicore parallelism ( section 5.3.2 ). • Show that a simulator using this approach out-performs an existing commercial simulator on a standard workstation ( section 5.3.3 ). • Show that the performance on a cheap Xeon Phi card is competitive with results reported elsewhere on much more expensive super-computers ( section 5.3.5 ). To evaluate the ZSIM, two types of test circuits were used: 1. Circuits from the IWLS benchmark suit [1] which allow direct comparison with other published studies of parallel simulators.2. Circuits generated by a parametrised circuit synthesizer. The synthesizer used an algorithm that has been shown to generate circuits that are statistically representative of real logic circuits. The synthesizer allowed testing of a range of very large circuits, larger than the ones for which it was possible to obtain open source files. The experimental results show that with SIMD acceleration and multicore, ZSIM gained a peak parallelisation factor of 300 on Intel Xeon Phi and 11 on Intel Xeon. With only SIMD enabled, ZSIM achieved a maximum parallelistion gain of 10 on Intel Xeon Phi and 4 on Intel Xeon. Furthermore, it was shown that this software architecture simulator running on a SIMD machine is much faster than, and can handle much bigger circuits than a widely used commercial simulator (Xilinx) running on a workstation. The performance achieved by ZSIM was also compared with similar pre-existing work on logic simulation targeting GPUs and supercomputers. It was shown that ZSIM simulator running on a Xeon Phi machine gives comparable simulation performance to the IBM Blue Gene supercomputer at very much lower cost. The experimental results have shown that the Xeon Phi is competitive with simulation on GPUs and allows the handling of much larger circuits than have been reported for GPU simulation. When targeting Xeon Phi architecture, the automatic cache management of the Xeon Phi, handles and manages the on-chip local store without any explicit mention of the local store being made in the architecture of the simulator itself. However, targeting GPUs, explicit cache management in program increases the complexity of the software architecture. Furthermore, one of the strongest points of the ZSIM simulator is its portability. Note that the same code was tested on both AMD and Xeon Phi machines. The same architecture that efficiently performs on Xeon Phi, was ported into a 64 core NUMA AMD Opteron. To conclude, the two main achievements are restated as following: The primary achievement of this work was proving that the ZSIM architecture was faster than previously published logic simulators on low cost platforms. The secondary achievement was the development of a synthetic testing suite that went beyond the scale range that was previously publicly available, based on prior work that showed the synthesis technique is valid.
Resumo:
The Internet has grown in size at rapid rates since BGP records began, and continues to do so. This has raised concerns about the scalability of the current BGP routing system, as the routing state at each router in a shortest-path routing protocol will grow at a supra-linearly rate as the network grows. The concerns are that the memory capacity of routers will not be able to keep up with demands, and that the growth of the Internet will become ever more cramped as more and more of the world seeks the benefits of being connected. Compact routing schemes, where the routing state grows only sub-linearly relative to the growth of the network, could solve this problem and ensure that router memory would not be a bottleneck to Internet growth. These schemes trade away shortest-path routing for scalable memory state, by allowing some paths to have a certain amount of bounded “stretch”. The most promising such scheme is Cowen Routing, which can provide scalable, compact routing state for Internet routing, while still providing shortest-path routing to nearly all other nodes, with only slightly stretched paths to a very small subset of the network. Currently, there is no fully distributed form of Cowen Routing that would be practical for the Internet. This dissertation describes a fully distributed and compact protocol for Cowen routing, using the k-core graph decomposition. Previous compact routing work showed the k-core graph decomposition is useful for Cowen Routing on the Internet, but no distributed form existed. This dissertation gives a distributed k-core algorithm optimised to be efficient on dynamic graphs, along with with proofs of its correctness. The performance and efficiency of this distributed k-core algorithm is evaluated on large, Internet AS graphs, with excellent results. This dissertation then goes on to describe a fully distributed and compact Cowen Routing protocol. This protocol being comprised of a landmark selection process for Cowen Routing using the k-core algorithm, with mechanisms to ensure compact state at all times, including at bootstrap; a local cluster routing process, with mechanisms for policy application and control of cluster sizes, ensuring again that state can remain compact at all times; and a landmark routing process is described with a prioritisation mechanism for announcements that ensures compact state at all times.
Resumo:
The next generation of vehicles will be equipped with automated Accident Warning Systems (AWSs) capable of warning neighbouring vehicles about hazards that might lead to accidents. The key enabling technology for these systems is the Vehicular Ad-hoc Networks (VANET) but the dynamics of such networks make the crucial timely delivery of warning messages challenging. While most previously attempted implementations have used broadcast-based data dissemination schemes, these do not cope well as data traffic load or network density increases. This problem of sending warning messages in a timely manner is addressed by employing a network coding technique in this thesis. The proposed NETwork COded DissEmination (NETCODE) is a VANET-based AWS responsible for generating and sending warnings to the vehicles on the road. NETCODE offers an XOR-based data dissemination scheme that sends multiple warning in a single transmission and therefore, reduces the total number of transmissions required to send the same number of warnings that broadcast schemes send. Hence, it reduces contention and collisions in the network improving the delivery time of the warnings. The first part of this research (Chapters 3 and 4) asserts that in order to build a warning system, it is needful to ascertain the system requirements, information to be exchanged, and protocols best suited for communication between vehicles. Therefore, a study of these factors along with a review of existing proposals identifying their strength and weakness is carried out. Then an analysis of existing broadcast-based warning is conducted which concludes that although this is the most straightforward scheme, loading can result an effective collapse, resulting in unacceptably long transmission delays. The second part of this research (Chapter 5) proposes the NETCODE design, including the main contribution of this thesis, a pair of encoding and decoding algorithms that makes the use of an XOR-based technique to reduce transmission overheads and thus allows warnings to get delivered in time. The final part of this research (Chapters 6--8) evaluates the performance of the proposed scheme as to how it reduces the number of transmissions in the network in response to growing data traffic load and network density and investigates its capacity to detect potential accidents. The evaluations use a custom-built simulator to model real-world scenarios such as city areas, junctions, roundabouts, motorways and so on. The study shows that the reduction in the number of transmissions helps reduce competition in the network significantly and this allows vehicles to deliver warning messages more rapidly to their neighbours. It also examines the relative performance of NETCODE when handling both sudden event-driven and longer-term periodic messages in diverse scenarios under stress caused by increasing numbers of vehicles and transmissions per vehicle. This work confirms the thesis' primary contention that XOR-based network coding provides a potential solution on which a more efficient AWS data dissemination scheme can be built.
Resumo:
In this thesis, we present a quantitative approach using probabilistic verification techniques for the analysis of reliability, availability, maintainability, and safety (RAMS) properties of satellite systems. The subject of our research is satellites used in mission critical industrial applications. A strong case for using probabilistic model checking to support RAMS analysis of satellite systems is made by our verification results. This study is intended to build a foundation to help reliability engineers with a basic background in model checking to apply probabilistic model checking to small satellite systems. We make two major contributions. One of these is the approach of RAMS analysis to satellite systems. In the past, RAMS analysis has been extensively applied to the field of electrical and electronics engineering. It allows system designers and reliability engineers to predict the likelihood of failures from the indication of historical or current operational data. There is a high potential for the application of RAMS analysis in the field of space science and engineering. However, there is a lack of standardisation and suitable procedures for the correct study of RAMS characteristics for satellite systems. This thesis considers the promising application of RAMS analysis to the case of satellite design, use, and maintenance, focusing on its system segments. Data collection and verification procedures are discussed, and a number of considerations are also presented on how to predict the probability of failure. Our second contribution is leveraging the power of probabilistic model checking to analyse satellite systems. We present techniques for analysing satellite systems that differ from the more common quantitative approaches based on traditional simulation and testing. These techniques have not been applied in this context before. We present the use of probabilistic techniques via a suite of detailed examples, together with their analysis. Our presentation is done in an incremental manner: in terms of complexity of application domains and system models, and a detailed PRISM model of each scenario. We also provide results from practical work together with a discussion about future improvements.
Resumo:
Embedded software systems in vehicles are of rapidly increasing commercial importance for the automotive industry. Current systems employ a static run-time environment; due to the difficulty and cost involved in the development of dynamic systems in a high-integrity embedded control context. A dynamic system, referring to the system configuration, would greatly increase the flexibility of the offered functionality and enable customised software configuration for individual vehicles, adding customer value through plug-and-play capability, and increased quality due to its inherent ability to adjust to changes in hardware and software. We envisage an automotive system containing a variety of components, from a multitude of organizations, not necessarily known at development time. The system dynamically adapts its configuration to suit the run-time system constraints. This paper presents our vision for future automotive control systems that will be regarded in an EU research project, referred to as DySCAS (Dynamically Self-Configuring Automotive Systems). We propose a self-configuring vehicular control system architecture, with capabilities that include automatic discovery and inclusion of new devices, self-optimisation to best-use the processing, storage and communication resources available, self-diagnostics and ultimately self-healing. Such an architecture has benefits extending to reduced development and maintenance costs, improved passenger safety and comfort, and flexible owner customisation. Specifically, this paper addresses the following issues: The state of the art of embedded software systems in vehicles, emphasising the current limitations arising from fixed run-time configurations; and the benefits and challenges of dynamic configuration, giving rise to opportunities for self-healing, self-optimisation, and the automatic inclusion of users’ Consumer Electronic (CE) devices. Our proposal for a dynamically reconfigurable automotive software system platform is outlined and a typical use-case is presented as an example to exemplify the benefits of the envisioned dynamic capabilities.
Resumo:
Abstract not available
Resumo:
With the rise of smart phones, lifelogging devices (e.g. Google Glass) and popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their life online producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation making the process of building retrieval systems difficult as one must firstly understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos, however, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with < 4 tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a) where one attempts to bridge the semantic gap between an image’s appearance and meaning e.g. the objects present. Despite two decades of research the semantic gap still largely exists and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present etc). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined often resulting in poor PTR accuracy e.g. does NY refer to New York or New Year? This thesis proposes the exploitation of an image’s context which, unlike textual evidences, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidences in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. # of faces present) and contextual (e.g. day-of-the-week taken) signals for tag recommendation purposes, achieving up to a 75% improvement to precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia & Twitter) as an alternative to (slower moving) Flickr in order to build recommendation models on, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a rank of relevant and diverse images for various news events by (i) removing irrelevant images such memes and visual duplicates (ii) before semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict if it will become “popular” (or not) with 74% accuracy, using an SVM classifier. Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis show an important step towards building effective image retrieval models when there lacks sufficient textual content (i.e. a cold start).