814 results for data gathering algorithm
Abstract:
Monitoring Internet traffic is critical to acquiring a good understanding of threats to computer and network security and to designing efficient computer security systems. Researchers and network administrators have applied several approaches to monitoring traffic for malicious content. These techniques include monitoring network components, aggregating IDS alerts, and monitoring unused IP address spaces. Another method for monitoring and analyzing malicious traffic, which has been widely tried and accepted, is the use of honeypots. Honeypots are very valuable security resources for gathering artefacts associated with a variety of Internet attack activities. As honeypots run no production services, any contact with them is considered potentially malicious or suspicious by definition. This unique characteristic of the honeypot reduces the amount of collected traffic and makes it a more valuable source of information than other existing techniques. Currently, there is insufficient research in the honeypot data analysis field. To date, most of the work on honeypots has been devoted to designing new honeypots or optimizing existing ones. Approaches for analyzing data collected from honeypots, especially low-interaction honeypots, are presently immature: analysis techniques remain manual and focus mainly on identifying existing attacks. This research addresses the need for more advanced techniques for analyzing Internet traffic data collected from low-interaction honeypots. We believe that characterizing honeypot traffic will improve the security of networks and, if the honeypot data is analysed in time, give early warning of new vulnerabilities or outbreaks of new automated malicious code, such as worms. The outcomes of this research include:
• Identification of repeated use of attack tools and attack processes, by grouping activities that exhibit similar packet inter-arrival time distributions using the cliquing algorithm;
• Application of principal component analysis to detect the structure of attackers’ activities present in low-interaction honeypots and to visualize attackers’ behaviors;
• Detection of new attacks in low-interaction honeypot traffic through the use of the principal components’ residual space and the squared prediction error (SPE) statistic (a sketch of this idea follows below);
• Real-time detection of new attacks using recursive principal component analysis;
• A proof-of-concept implementation for honeypot traffic analysis and real-time monitoring.
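The third outcome above, detecting new attacks via the residual space and the SPE statistic, can be illustrated with a short sketch. This is a minimal, generic illustration in Python assuming a numeric feature matrix of honeypot traffic; the features, the number of retained components and the empirical control limit are assumptions, not the thesis' actual configuration.

```python
# Minimal sketch: new-attack detection with the PCA residual space and the
# squared prediction error (SPE) statistic. Feature matrix and thresholds are
# illustrative assumptions only.
import numpy as np
from sklearn.decomposition import PCA

def fit_spe_detector(X_train, n_components=3, quantile=0.99):
    """Fit PCA on 'normal' honeypot traffic features and set an SPE control limit."""
    pca = PCA(n_components=n_components).fit(X_train)
    residual = X_train - pca.inverse_transform(pca.transform(X_train))
    spe = np.sum(residual ** 2, axis=1)      # squared prediction error per sample
    limit = np.quantile(spe, quantile)       # empirical control limit (an assumption;
                                             # chi-square approximations are also common)
    return pca, limit

def flag_new_attacks(pca, limit, X_new):
    """Flag samples whose projection error exceeds the control limit."""
    residual = X_new - pca.inverse_transform(pca.transform(X_new))
    spe = np.sum(residual ** 2, axis=1)
    return spe > limit

# toy usage with random "traffic feature" vectors
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))
pca, limit = fit_spe_detector(X_train)
print(flag_new_attacks(pca, limit, rng.normal(size=(5, 10)) + 4.0))
```

The intuition is that traffic resembling previously seen activity is reconstructed well by the retained components, so a large reconstruction error signals structure the model has not seen before.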
Abstract:
This paper investigates the use of the FAB-MAP appearance-only SLAM algorithm as a method for performing visual data association for RatSLAM, a semi-metric full SLAM system. While both systems have shown the ability to map large (60-70 km) outdoor locations of approximately the same scale, for either larger areas or longer time periods both algorithms encounter difficulties with false positive matches. By combining these algorithms using a mapping between appearance and pose space, both false positives and false negatives generated by FAB-MAP are significantly reduced during outdoor mapping with a forward-facing camera. The hybrid FAB-MAP-RatSLAM system developed demonstrates the potential for successful SLAM over long periods of time.
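A sketch of the kind of appearance-to-pose gating such a hybrid relies on: an appearance-only match is accepted only if it is also plausible given the current pose estimate. FAB-MAP itself is not implemented here; `appearance_prob`, the probability threshold and the gate radius are placeholder assumptions for illustration, not the paper's parameters.

```python
# Minimal sketch of gating appearance-only loop closures with pose information.
import math

def accept_loop_closure(appearance_prob, current_pose, matched_pose,
                        prob_threshold=0.98, gate_radius_m=50.0):
    """Accept an appearance match only if it is also plausible in pose space."""
    if appearance_prob < prob_threshold:        # FAB-MAP-style appearance test
        return False
    dx = current_pose[0] - matched_pose[0]
    dy = current_pose[1] - matched_pose[1]
    return math.hypot(dx, dy) <= gate_radius_m  # reject matches far from the estimated pose

# example: a confident appearance match that is consistent with the pose estimate
print(accept_loop_closure(0.995, current_pose=(120.0, 40.0), matched_pose=(118.0, 43.0)))
```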
Abstract:
Cloud computing is a recent computing paradigm in which applications, data and IT services are provided over the Internet. Cloud computing has become a primary medium for Software as a Service (SaaS) providers to host their SaaS, as it can provide the scalability a SaaS requires. The challenges in the composite SaaS placement process stem from several factors, including the large size of the Cloud network, competing SaaS resource requirements, interactions between SaaS components, and interactions between the SaaS and its data components. However, existing application placement methods for data centres do not consider the placement of a component's data. In addition, a Cloud network is much larger than the data centre networks discussed in existing studies. This paper proposes a penalty-based genetic algorithm (GA) for the composite SaaS placement problem in the Cloud. We believe this is the first attempt to place a SaaS together with its data on a Cloud provider's servers. Experimental results demonstrate the feasibility and scalability of the GA.
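A minimal sketch of the penalty idea in a penalty-based GA for placement: candidate placements that violate server capacities are not discarded but penalised in the fitness, so the search can traverse infeasible regions. The capacities, demands, traffic matrix and GA operators below are invented for illustration and are not the paper's formulation.

```python
# Minimal sketch of a penalty-based GA for component placement; all numbers are
# illustrative assumptions.
import random

SERVER_CAPACITY = [8, 8, 16]                  # CPU units per server (assumed)
DEMAND = [2, 4, 3, 6]                         # CPU demand per SaaS component (assumed)
TRAFFIC = {(0, 1): 5, (1, 2): 3, (2, 3): 4}   # traffic between components (assumed)

def fitness(placement, penalty_weight=100.0):
    """Lower is better: inter-server traffic cost plus a penalty for overloads."""
    load = [0] * len(SERVER_CAPACITY)
    for comp, server in enumerate(placement):
        load[server] += DEMAND[comp]
    overload = sum(max(0, l - c) for l, c in zip(load, SERVER_CAPACITY))
    comm_cost = sum(t for (a, b), t in TRAFFIC.items() if placement[a] != placement[b])
    return comm_cost + penalty_weight * overload

def evolve(pop_size=30, generations=50, mutation_rate=0.2):
    """Small GA loop: tournament selection, uniform crossover, random-reset mutation."""
    pop = [[random.randrange(len(SERVER_CAPACITY)) for _ in DEMAND] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            parent1 = min(random.sample(pop, 2), key=fitness)
            parent2 = min(random.sample(pop, 2), key=fitness)
            child = [random.choice(genes) for genes in zip(parent1, parent2)]
            if random.random() < mutation_rate:
                child[random.randrange(len(child))] = random.randrange(len(SERVER_CAPACITY))
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```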
Dynamic analysis of on-board mass data to determine tampering in heavy vehicle on-board mass systems
Abstract:
Transport Certification Australia Limited, jointly with the National Transport Commission, has undertaken a project to investigate the feasibility of on-board mass monitoring (OBM) devices for regulatory purposes. OBM increases jurisdictional confidence in operational heavy vehicle compliance. This paper covers technical issues regarding the potential use of dynamic data from OBM systems to indicate that tampering has occurred. The tamper-evidence and accuracy of current OBM systems needed to be determined before any regulatory schemes were put in place for their use. Tests were performed to determine the potential for, and ease of, tampering. An algorithm was developed to detect tamper events, and its results are detailed.
Abstract:
The study described in this paper developed a model of animal movement which explicitly recognised each individual as the central unit of measure. The model was developed by learning from a real dataset that measured and calculated, for individual cows in a herd, their linear and angular positions and their directional and angular speeds. Two learning algorithms were implemented: a hidden Markov model (HMM) and a long-term prediction algorithm. It is shown that an HMM can be used to describe the animal's movement and state-transition behaviour within several “stay” areas where cows remained for long periods. Model parameters were estimated for hidden behaviour states such as relocating, foraging and bedding. For cows’ movement between the “stay” areas, a long-term prediction algorithm was implemented. By combining these two algorithms it was possible to develop a successful model, which produced results closely matching the collected animal behaviour data. This modelling methodology could easily be applied to interactions of other animal species.
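A minimal sketch of the HMM part of such a model, assuming per-animal movement features (speed and turning rate) and the hmmlearn library; the three states and the synthetic observations below are illustrative stand-ins for the study's real herd dataset.

```python
# Minimal sketch: fit a Gaussian HMM to movement features and decode hidden
# behaviour states. Data and state labels are synthetic assumptions.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)
# synthetic (speed m/s, angular speed rad/s) observations for one animal
slow = rng.normal([0.1, 0.05], 0.02, size=(200, 2))   # e.g. bedding
medium = rng.normal([0.5, 0.30], 0.10, size=(200, 2)) # e.g. foraging
fast = rng.normal([1.5, 0.10], 0.20, size=(100, 2))   # e.g. relocating
X = np.vstack([slow, medium, fast])

model = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50, random_state=0)
model.fit(X)                  # Baum-Welch estimation of transition and emission parameters
states = model.predict(X)     # Viterbi decoding of the hidden behaviour sequence
print(model.transmat_.round(2))
print(np.bincount(states))
```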
Abstract:
We present the design and deployment results for PosNet - a large-scale, long-duration sensor network that gathers summary position and status information from mobile nodes. The mobile nodes have a fixed-sized memory buffer to which position data is added at a constant rate, and from which data is downloaded at a non-constant rate. We have developed a novel algorithm that performs online summarization of position data within the buffer, where the algorithm naturally accommodates data input and output rate mismatch, and also provides a delay-tolerant approach to data transport. The algorithm has been extensively tested in a large-scale long-duration cattle monitoring and control application.
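A sketch of the general idea of online summarisation in a fixed-size buffer: when the buffer overflows, the interior position fix whose removal distorts the recorded path least is dropped, so the stored track degrades gracefully when the download rate falls behind the input rate. This is a generic illustration of the idea, not the PosNet algorithm itself.

```python
# Minimal sketch of online position summarisation in a bounded buffer.
import math

def _deviation(prev_pt, pt, next_pt):
    """Perpendicular distance of pt from the segment prev_pt -> next_pt."""
    (x1, y1), (x2, y2), (x0, y0) = prev_pt, next_pt, pt
    seg_len = math.hypot(x2 - x1, y2 - y1)
    if seg_len == 0:
        return math.hypot(x0 - x1, y0 - y1)
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / seg_len

class SummarisingBuffer:
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.points = []

    def add(self, point):
        self.points.append(point)
        if len(self.points) > self.capacity:
            # drop the interior fix that is best explained by its neighbours
            idx = min(range(1, len(self.points) - 1),
                      key=lambda i: _deviation(self.points[i - 1],
                                               self.points[i],
                                               self.points[i + 1]))
            del self.points[idx]

# toy usage: a long track is reduced to a bounded, shape-preserving summary
buf = SummarisingBuffer(capacity=6)
for t in range(50):
    buf.add((t, math.sin(t / 5.0)))
print(buf.points)
```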
Abstract:
Cloud computing has become a primary medium for Software as a Service (SaaS) hosting, as it can provide the scalability a SaaS requires. One of the challenges in hosting a SaaS is the placement process, which has to consider the interactions between the SaaS components as well as the interactions between the SaaS and its data components. Previous research tackled this problem using a classical genetic algorithm (GA) approach. This paper proposes a cooperative coevolutionary algorithm (CCEA) approach. The CCEA has been implemented and evaluated, and the results show that it produces higher-quality solutions than the GA.
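A minimal sketch of the cooperative coevolution idea, assuming the placement is decomposed into two subpopulations (application components and data components) that are evaluated against each other's best individuals; the cost model and problem sizes are invented for illustration only.

```python
# Minimal sketch of a cooperative coevolutionary algorithm for a two-part placement.
import random

N_SERVERS, N_APP, N_DATA = 4, 3, 3
_rng = random.Random(0)
LINK = {(i, j): _rng.randint(1, 5) for i in range(N_APP) for j in range(N_DATA)}

def cost(app_place, data_place):
    """Traffic cost between app components and data components on different servers."""
    return sum(LINK[i, j] for i in range(N_APP) for j in range(N_DATA)
               if app_place[i] != data_place[j])

def random_individual(n):
    return [random.randrange(N_SERVERS) for _ in range(n)]

def step(pop, collaborator, own_index):
    """One generation: evaluate with the collaborator, keep the better half, mutate."""
    def fit(ind):
        return cost(ind, collaborator) if own_index == 0 else cost(collaborator, ind)
    pop = sorted(pop, key=fit)
    survivors = pop[: len(pop) // 2]
    children = [[g if random.random() > 0.3 else random.randrange(N_SERVERS) for g in ind]
                for ind in survivors]
    return survivors + children

app_pop = [random_individual(N_APP) for _ in range(10)]
data_pop = [random_individual(N_DATA) for _ in range(10)]
for _ in range(30):
    app_pop = step(app_pop, data_pop[0], own_index=0)   # evolve app placements
    data_pop = step(data_pop, app_pop[0], own_index=1)  # evolve data placements
print(app_pop[0], data_pop[0], cost(app_pop[0], data_pop[0]))
```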
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from the pursuit of these data-analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of recorded zeroes: these may represent a zero response given some threshold (presence) or the fact that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses while taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts, and the dingo, cypress and toad case studies described in the motivation chapter are examples of this.

Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for the parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy; this requires evaluation of a normalization constant, a notoriously difficult problem.

Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea that is present, though not fully developed, in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces the background computations required for the full implementation of the four-tier model in Chapter 7.

Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer.
A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time.
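The IMCS method mentioned above rests on the path sampling identity: for an exponential-family MRF p(x | θ) ∝ exp(θ s(x)), d/dθ log Z(θ) = E_θ[s(x)], so log Z(θ₁) − log Z(θ₀) can be estimated by averaging the canonical statistic over MCMC draws at a grid of θ values and integrating numerically. The sketch below does this for a single-parameter Ising-style model on a tiny lattice; the thesis itself targets the three-parameter autologistic model, so this is an illustration of the principle rather than the IMCS implementation.

```python
# Minimal sketch of a path-sampling estimate of a log normalisation-constant ratio
# for a single-parameter binary MRF on a small torus lattice (illustrative only).
import numpy as np

def neighbour_sum(x, i, j):
    n, m = x.shape
    return (x[(i - 1) % n, j] + x[(i + 1) % n, j] +
            x[i, (j - 1) % m] + x[i, (j + 1) % m])

def canonical_stat(x):
    """s(x): sum of x_i * x_j over lattice edges (each edge counted once on the torus)."""
    return np.sum(x * np.roll(x, 1, axis=0)) + np.sum(x * np.roll(x, 1, axis=1))

def gibbs_mean_stat(theta, shape=(8, 8), sweeps=150, burn=50, seed=0):
    """Estimate E_theta[s(x)] with a simple Gibbs sampler on a binary lattice."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=shape)
    stats = []
    for sweep in range(sweeps):
        for i in range(shape[0]):
            for j in range(shape[1]):
                logit = theta * neighbour_sum(x, i, j)   # conditional log-odds of x_ij = 1
                x[i, j] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        if sweep >= burn:
            stats.append(canonical_stat(x))
    return float(np.mean(stats))

def log_nc_ratio(theta0, theta1, n_grid=9):
    """log Z(theta1) - log Z(theta0) via trapezoidal integration of the mean canonical statistic."""
    grid = np.linspace(theta0, theta1, n_grid)
    means = np.array([gibbs_mean_stat(t) for t in grid])
    return float(np.sum((means[:-1] + means[1:]) / 2.0 * np.diff(grid)))

print(log_nc_ratio(0.0, 0.4))
```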
Abstract:
Keyword spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high-level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short-duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers. The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
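One of the verifier families mentioned above, the speech background model based verifier, amounts to a log-likelihood-ratio test of the keyword model against a background model. A minimal sketch, with placeholder scores standing in for acoustic-model likelihoods:

```python
# Minimal sketch of background-model-based keyword verification; the scores and
# threshold are placeholders, not values from the thesis.
def verify_keyword(keyword_loglik, background_loglik, n_frames, threshold=0.5):
    """Accept a putative keyword hit if the length-normalised LLR clears the threshold."""
    llr = (keyword_loglik - background_loglik) / max(n_frames, 1)
    return llr >= threshold

# example: a 42-frame putative hit
print(verify_keyword(keyword_loglik=-310.0, background_loglik=-345.0, n_frames=42))
```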
Abstract:
A new explicit rate allocation algorithm is proposed for achieving generic weight-proportional max-min (GWPMM) fairness in asynchronous transfer mode (ATM) available bit rate services. The algorithm scales well, with a fixed computational complexity of O(1), and can accurately realise GWPMM-fair rate allocation in an ATM network.
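As a reference for what GWPMM fairness means, the sketch below computes a weight-proportional max-min fair allocation for sessions sharing a single bottleneck by iterative water-filling. It illustrates the fairness criterion only; the paper's O(1) explicit-rate algorithm is not reproduced here, and the capacities, demands and weights are assumptions.

```python
# Minimal sketch of weight-proportional max-min fair allocation on one bottleneck.
def gwpmm_allocate(capacity, demands, weights):
    """Return per-session rates that are weight-proportional max-min fair."""
    rates = {s: 0.0 for s in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[s] for s in active)
        saturated = set()
        for s in active:
            share = remaining * weights[s] / total_weight
            if demands[s] - rates[s] <= share:   # session satisfied by its weighted share
                rates[s] = demands[s]
                saturated.add(s)
        if not saturated:                        # every remaining session takes its full share
            for s in active:
                rates[s] += remaining * weights[s] / total_weight
            break
        remaining = capacity - sum(rates.values())
        active -= saturated
    return rates

# example: three ABR sessions sharing 100 Mbit/s with weights 1:2:1
print(gwpmm_allocate(100.0, demands={"a": 60, "b": 80, "c": 10},
                     weights={"a": 1, "b": 2, "c": 1}))
```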
Abstract:
Advances in data mining have provided techniques for automatically discovering underlying knowledge and extracting useful information from large volumes of data. Data mining offers tools for the quick discovery of relationships, patterns and knowledge in large, complex databases. The application of data mining to manufacturing is relatively limited, mainly because of the complexity of manufacturing data. The growing self-organizing map (GSOM) algorithm has proven to be efficient for analyzing unsupervised DNA data; however, it produced unsatisfactory clustering when used on some large manufacturing data. In this paper, a data mining methodology is proposed using a GSOM tool that was developed using a modified GSOM algorithm. The proposed method is used to generate clusters of good and faulty products from a manufacturing dataset. The clustering quality (CQ) measure proposed in the paper is used to evaluate the performance of the cluster maps. The paper also proposes automatic identification of the variables that are the most probable causative factors discriminating between good and faulty products, by quickly examining the historical manufacturing data. The proposed method helps manufacturers smooth the production flow and improve the quality of their products. Simulation results on small and large manufacturing data show the effectiveness of the proposed method.
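A minimal sketch of the growth rule at the heart of a GSOM-style algorithm: each winning node accumulates quantisation error, and when that error exceeds a threshold derived from the spread factor, new nodes are inserted at free neighbouring grid positions. The details below (learning rate, neighbourhood, error redistribution, weight initialisation) are simplified assumptions, not the paper's modified GSOM.

```python
# Minimal sketch of a growing self-organising map (GSOM)-style algorithm.
import numpy as np

def train_gsom(data, spread_factor=0.3, lr=0.1, epochs=15, seed=None):
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    gt = -dim * np.log(spread_factor)      # growth threshold derived from the spread factor
    nodes = {(x, y): rng.random(dim) for x in (0, 1) for y in (0, 1)}   # start as a 2x2 grid
    error = {pos: 0.0 for pos in nodes}
    for _ in range(epochs):
        for v in data:
            winner = min(nodes, key=lambda p: np.linalg.norm(v - nodes[p]))
            error[winner] += np.linalg.norm(v - nodes[winner])
            # update the winner and its direct grid neighbours towards the sample
            for dx, dy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                p = (winner[0] + dx, winner[1] + dy)
                if p in nodes:
                    nodes[p] += lr * (v - nodes[p])
            # grow: spawn new nodes in free neighbouring positions of a high-error winner
            if error[winner] > gt:
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    p = (winner[0] + dx, winner[1] + dy)
                    if p not in nodes:
                        nodes[p] = nodes[winner].copy()   # simple weight initialisation
                        error[p] = 0.0
                error[winner] = gt / 2.0                  # reset after growth (simplification)
    return nodes

# toy usage: two synthetic clusters standing in for good/faulty product records
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(60, 3)) for m in (0.2, 0.8)])
gsom = train_gsom(X, spread_factor=0.3, seed=1)
print(f"map grew to {len(gsom)} nodes")
```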
Abstract:
Australia’s Arts and Entertainment Sector underpins cultural and social innovation, improves the quality of community life, is essential to maintaining our cities as world-class attractors of talent and investment, and helps create ‘Brand Australia’ in the global marketplace of ideas (QUT Creative Industries Faculty 2010). The sector makes a significant contribution to the Australian economy. So what is the size and nature of this contribution? The Creative Industries Faculty at Queensland University of Technology recently conducted an exercise to source and present statistics in order to produce a data picture of Australia’s Arts and Entertainment Sector. The exercise involved gathering the latest statistics on broadcasting, new media, the performing arts, and music composition, distribution and publishing, as well as Australia’s performance in world markets.
Abstract:
Decentralised sensor networks typically consist of multiple processing nodes supporting one or more sensors, with the nodes interconnected via wireless communication. Practical applications of Decentralised Data Fusion have generally been restricted to Gaussian-based approaches such as the Kalman or Information Filter. This paper proposes the use of Parzen window estimates as an alternative representation for performing Decentralised Data Fusion. The common information between two nodes must be removed from any received estimates before local data fusion may occur; otherwise, estimates may become overconfident due to data incest. A closed-form approximation to the division of two estimates is described, enabling conservative assimilation of incoming information at a node in a decentralised data fusion network. A simple example of tracking a moving particle with Parzen density estimates is shown to demonstrate how this algorithm allows conservative assimilation of network information.
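A sketch of the channel-filter-style update behind this kind of decentralised fusion: the fused density is proportional to the product of the two node estimates divided by their common information. Here the Parzen window estimates are Gaussian KDEs and the multiply-and-divide step is done naively on a 1-D grid; the paper's closed-form approximation to the division of two estimates is not reproduced, and the sample data are invented.

```python
# Minimal sketch of multiply-and-divide fusion of Parzen window (KDE) estimates on a grid.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
grid = np.linspace(-5, 5, 400)

# Parzen window estimates held at two nodes, plus the estimate they already share
# (their common information), each built from that node's samples.
p_node_a = gaussian_kde(rng.normal(0.5, 1.0, 300))(grid)
p_node_b = gaussian_kde(rng.normal(1.0, 1.2, 300))(grid)
p_common = gaussian_kde(rng.normal(0.8, 1.5, 300))(grid)

# multiply the incoming estimate in, divide the common information out, renormalise
fused = p_node_a * p_node_b / np.maximum(p_common, 1e-12)
fused /= fused.sum() * (grid[1] - grid[0])

print("fused mean:", float(np.sum(grid * fused) * (grid[1] - grid[0])))
```

Dividing out the common information is what prevents the same observations from being counted twice when estimates circulate around the network.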