819 results for Hybrid clustering algorithm
Abstract:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method: the agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks traded on the Bulgarian Stock Exchange, in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool called XLMiner™ for Microsoft Office Excel.
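For illustration, a minimal sketch of agglomerative hierarchical clustering of stock return series follows (in Python rather than XLMiner; the ticker symbols and returns are hypothetical, and a correlation-based distance stands in for whatever resemblance measure the paper uses):

```python
# Minimal sketch: agglomerative hierarchical clustering of stock series.
# Tickers and returns are hypothetical; not the paper's actual data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]      # hypothetical symbols
returns = rng.normal(size=(len(tickers), 250))     # daily returns, ~1 year

# Correlation-based distance: similar temporal behavior -> small distance.
corr = np.corrcoef(returns)
dist = squareform(1.0 - corr, checks=False)        # condensed distance vector

Z = linkage(dist, method="average")                # agglomerative clustering
labels = fcluster(Z, t=3, criterion="maxclust")    # cut dendrogram: 3 clusters
for ticker, lab in zip(tickers, labels):
    print(ticker, "-> cluster", lab)
```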
Abstract:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
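A minimal sketch of the core step, UPGMA clustering with a cophenetic correlation check, assuming synthetic abundance data in place of the Bristol Channel zooplankton densities:

```python
# Minimal sketch: UPGMA (group-average) clustering plus cophenetic
# correlation, on synthetic "samples x taxa" counts (not the real dataset).
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
abund = rng.poisson(lam=3.0, size=(12, 20))    # 12 samples, 20 taxa

d = pdist(abund, metric="braycurtis")          # ecological dissimilarity
Z = linkage(d, method="average")               # UPGMA linkage

# Cophenetic correlation: how faithfully the dendrogram preserves d;
# values well below 1 flag a tree that poorly represents the resemblances.
c, coph_d = cophenet(Z, d)
print(f"cophenetic correlation = {c:.3f}")
```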
Abstract:
Hybrid simulation is a technique that combines experimental and numerical testing and has been used for the last few decades in the fields of aerospace, civil and mechanical engineering. During this time, most of the research has focused on developing algorithms and the necessary technology, including, but not limited to, error minimisation techniques, phase lag compensation and faster hydraulic cylinders. However, one of the main shortcomings of hybrid simulation that has prevented its widespread use is the size of the numerical models and the effect that higher frequencies may have on the stability and accuracy of the simulation. The first chapter of this document provides an overview of the hybrid simulation method, the different hybrid simulation schemes, and the corresponding time integration algorithms that are more commonly used in this field. The scope of this thesis is presented in more detail in chapter 2: a substructure algorithm, the Substep Force Feedback (Subfeed), is adapted in order to fulfil the necessary requirements in terms of speed. The effects of more complex models on the Subfeed are also studied in detail, and the improvements made are validated experimentally. Chapters 3 and 4 detail the methodologies that have been used to accomplish these objectives, listing the different cases of study and detailing the hardware and software used to validate them experimentally. The third chapter contains a brief introduction to a project, the DFG Subshake, whose data have been used as a starting point for the developments that are shown later in this thesis. The results obtained are presented in chapters 5 and 6: the first focuses on purely numerical simulations, while the second is oriented towards a more practical application, including experimental real-time hybrid simulation tests with large numerical models. Chapter 7 lists the hardware and software requirements that have to be met in order to apply the methods described in this document. The last chapter, chapter 8, focuses on the conclusions and achievements extracted from the results, namely: the adaptation of the hybrid simulation algorithm Subfeed for use with large numerical models, the study of the effect of high frequencies on the substructure algorithm, and experimental real-time hybrid simulation tests with vibrating subsystems using large numerical models and shake tables. A brief discussion of possible future research activities can be found in the concluding chapter.
Abstract:
Automatic detection of suspicious activities in CCTV camera feeds is crucial to the success of video surveillance systems. Such a capability can help transform dumb CCTV cameras into smart surveillance tools for fighting crime and terror. Learning and classification of basic human actions is a precursor to detecting suspicious activities. Most current approaches rely on the non-realistic assumption that a complete dataset of normal human actions is available. This paper presents a different approach to the problem of understanding human actions in video when no prior information is available. This is achieved by working with an incomplete dataset of basic actions which is continuously updated. Initially, all video segments are represented by the Bag-of-Words (BOW) method using only Term Frequency-Inverse Document Frequency (TF-IDF) features. Then, a data-stream clustering algorithm is applied to update the system's knowledge from the incoming video feeds. Finally, all the actions are classified into different sets. Experiments and comparisons are conducted on the well-known Weizmann and KTH datasets to show the efficacy of the proposed approach.
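A minimal sketch of the pipeline's shape, with hypothetical token strings standing in for visual words and scikit-learn's MiniBatchKMeans standing in for the paper's data-stream clustering algorithm:

```python
# Minimal sketch: video segments as TF-IDF-weighted bags of (visual) words,
# clustered incrementally. Token strings below are hypothetical stand-ins
# for real visual-word extraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import MiniBatchKMeans   # stand-in for a stream clusterer

segments = [
    "walk walk arm_swing", "run run lean", "walk arm_swing",
    "wave wave arm_raise", "run lean lean", "wave arm_raise",
]

tfidf = TfidfVectorizer().fit_transform(segments)

# partial_fit lets the model absorb new video feeds as they arrive.
clusterer = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
clusterer.partial_fit(tfidf)
print(clusterer.predict(tfidf))               # action-cluster label per segment
```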
Abstract:
Accurate road lane information is crucial for advanced vehicle navigation and safety applications. With the increasing availability of very high resolution (VHR) imagery of astonishing quality from digital airborne sources, automatically extracting road details from aerial images would greatly facilitate data acquisition and significantly reduce the cost of data collection and updates. In this paper, we propose an effective approach to detect road lanes from aerial images using image analysis procedures. The algorithm starts with constructing the Digital Surface Model (DSM) and true orthophotos from the stereo images. Next, a maximum likelihood clustering algorithm is used to separate roads from other ground objects. After the detection of the road surface, the road traffic and lane lines are further detected using texture enhancement and morphological operations. Finally, the generated road network is evaluated to test the performance of the proposed approach, using datasets provided by the Queensland Department of Main Roads. The experimental results demonstrate the effectiveness of our approach.
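A minimal sketch of the clustering and morphology steps, assuming a Gaussian mixture as the maximum likelihood clusterer and a random array in place of the orthophoto (the DSM construction is not reproduced):

```python
# Minimal sketch: maximum-likelihood pixel clustering via a Gaussian mixture,
# followed by morphological clean-up. The image is a random stand-in for a
# true orthophoto; class-to-"road" assignment is assumed.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy import ndimage

rng = np.random.default_rng(2)
image = rng.random((64, 64, 3))                  # stand-in for an orthophoto

pixels = image.reshape(-1, 3)
gmm = GaussianMixture(n_components=3, random_state=0).fit(pixels)
labels = gmm.predict(pixels).reshape(64, 64)     # ML class per pixel

road_mask = labels == 0                          # suppose class 0 is "road"
road_mask = ndimage.binary_opening(road_mask)    # remove speckle
road_mask = ndimage.binary_closing(road_mask)    # bridge small gaps
print("road pixels:", int(road_mask.sum()))
```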
Abstract:
Ad hoc networks are vulnerable to attacks due to their distributed nature and lack of infrastructure. Intrusion detection systems (IDS) provide audit and monitoring capabilities that offer local security to a node and help to perceive the specific trust level of other nodes. Clustering protocols can be taken as an additional advantage in these processing-constrained networks to collaboratively detect intrusions with less power usage and minimal overhead. Existing clustering protocols are not suitable for intrusion detection purposes, because they are linked with the routes: route establishment and route renewal affect the clusters and, as a consequence, the processing and traffic overhead increases due to the instability of clusters. Ad hoc networks are battery- and power-constrained, and therefore a trusted monitoring node should be available to detect and respond to intrusions in time. This can be achieved only if the clusters are stable for a long period of time. If the clusters change regularly with the routes, intrusion detection will not prove to be effective. Therefore, a generalized clustering algorithm has been proposed that can run on top of any routing protocol and can monitor intrusions constantly, irrespective of the routes. The proposed simplified clustering scheme has been used to detect intrusions, resulting in high detection rates and low processing and memory overhead irrespective of the routes, connections, traffic types and mobility of nodes in the network. Clustering is also useful for detecting intrusions collaboratively, since an individual node can neither detect a malicious node alone nor take action against that node on its own.
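The abstract does not specify the clustering rule itself; as one hedged illustration of route-independent cluster formation, a lowest-ID cluster-head election over one-hop neighbourhoods might look like this (adjacency and node IDs are hypothetical, and this heuristic is an assumption, not the thesis' algorithm):

```python
# Minimal sketch: route-independent MANET clustering via a lowest-ID
# cluster-head heuristic. The one-hop adjacency below is hypothetical.
neighbours = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4},
}

def elect_heads(neighbours):
    """Each node attaches to the lowest-ID node in its closed neighbourhood,
    so clusters depend only on topology, never on active routes."""
    heads = {}
    for node, nbrs in neighbours.items():
        heads[node] = min(nbrs | {node})   # head = lowest ID seen locally
    return heads

print(elect_heads(neighbours))             # {1: 1, 2: 1, 3: 1, 4: 2, 5: 4}
```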
Abstract:
Stereo vision is a method of depth perception, in which depth information is inferred from two (or more) images of a scene, taken from different perspectives. Practical applications for stereo vision include aerial photogrammetry, autonomous vehicle guidance, robotics and industrial automation. The initial motivation behind this work was to produce a stereo vision sensor for mining automation applications. For such applications, the input stereo images would consist of close-range scenes of rocks. A fundamental problem faced by matching algorithms is the matching or correspondence problem: locating corresponding points or features in two images. For this application, speed, reliability and the ability to produce a dense depth map are of foremost importance. This work implemented a number of area-based matching algorithms to assess their suitability for this application. Area-based techniques were investigated because of their potential to yield dense depth maps, their amenability to fast hardware implementation, and their suitability to textured scenes such as rocks. In addition, two non-parametric transforms, the rank and census, were also compared. Both the rank and the census transforms were found to result in improved reliability of matching in the presence of radiometric distortion; this is significant since radiometric distortion is a problem which commonly arises in practice. In addition, they have low computational complexity, making them amenable to fast hardware implementation. Therefore, it was decided that matching algorithms using these transforms would be the subject of the remainder of the thesis. An analytic expression for the process of matching using the rank transform was derived from first principles. This work resulted in a number of important contributions. Firstly, the derivation process resulted in one constraint which must be satisfied for a correct match. This was termed the rank constraint. The theoretical derivation of this constraint is in contrast to the existing matching constraints, which have little theoretical basis. Experimental work with actual and contrived stereo pairs has shown that the new constraint is capable of resolving ambiguous matches, thereby improving match reliability. Secondly, a novel matching algorithm incorporating the rank constraint has been proposed. This algorithm was tested using a number of stereo pairs. In all cases, the modified algorithm consistently resulted in an increased proportion of correct matches. Finally, the rank constraint was used to devise a new method for identifying regions of an image where the rank transform, and hence matching, are more susceptible to noise. The rank constraint was also incorporated into a new hybrid matching algorithm, where it was combined with a number of other ideas. These included the use of an image pyramid for match prediction, and a method of edge localisation to improve match accuracy in the vicinity of edges. Experimental results obtained from the new algorithm showed that the algorithm is able to remove a large proportion of invalid matches and improve match accuracy.
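A minimal sketch of the two non-parametric transforms on a grayscale image (window radius and test data are illustrative; the matching step itself, e.g. SAD on rank values or Hamming distance on census strings, is not shown):

```python
# Minimal sketch: rank and census transforms over a 3x3 window.
import numpy as np

def rank_transform(img, r=1):
    """Each pixel -> count of window neighbours strictly less than centre."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = img[y - r:y + r + 1, x - r:x + r + 1]
            out[y, x] = np.sum(win < img[y, x])
    return out

def census_transform(img, r=1):
    """Each pixel -> bit string (row-major): 1 where neighbour < centre.
    The centre compares equal to itself, contributing a 0 bit."""
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint32)
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = img[y - r:y + r + 1, x - r:x + r + 1]
            bits = (win < img[y, x]).flatten()
            out[y, x] = int("".join("1" if b else "0" for b in bits), 2)
    return out

img = np.random.default_rng(3).integers(0, 256, (8, 8))
print(rank_transform(img))
print(census_transform(img))
```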
Abstract:
Speaker verification is the process of verifying the identity of a person by analysing their speech. There are several important applications for automatic speaker verification (ASV) technology, including suspect identification, tracking terrorists and detecting a person’s presence at a remote location in the surveillance domain, as well as person authentication for phone banking and credit card transactions in the private sector. Telephones and telephony networks provide a natural medium for these applications. The aim of this work is to improve the usefulness of ASV technology for practical applications in the presence of adverse conditions. In a telephony environment, background noise, handset mismatch, channel distortions, room acoustics and restrictions on the available testing and training data are common sources of errors for ASV systems. Two research themes were pursued to overcome these adverse conditions: modelling mismatch and modelling uncertainty. To address the performance degradation incurred through mismatched conditions, it was proposed to model this mismatch directly. Feature mapping was evaluated for combating handset mismatch, and was extended through the use of a blind clustering algorithm to remove the need for accurate handset labels for the training data. Mismatch modelling was then generalised by explicitly modelling the session conditions as a constrained offset of the speaker model means. This session variability modelling approach enabled the modelling of arbitrary sources of mismatch, including handset type, and halved the error rates in many cases. Methods to model the uncertainty in speaker model estimates and verification scores were developed to address the difficulties of limited training and testing data. The Bayes factor was introduced to account for the uncertainty of the speaker model estimates in testing by applying Bayesian theory to the verification criterion, with improved performance in matched conditions. Modelling the uncertainty in the verification score itself met with significant success. Estimating a confidence interval for the "true" verification score enabled an order-of-magnitude reduction in the average quantity of speech required to make a confident verification decision based on a threshold. The confidence measures developed in this work may also have significant applications for forensic speaker verification tasks.
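A minimal sketch of the confidence-interval idea for early verification decisions, with synthetic per-frame scores and an assumed threshold (the thesis' actual scoring model is not reproduced):

```python
# Minimal sketch: accumulate per-frame verification scores and stop as soon
# as a confidence interval for the mean score clears the decision threshold.
import numpy as np

THRESHOLD = 0.0                       # accept if true mean score > threshold
Z95 = 1.96                            # ~95% normal confidence multiplier

def early_decision(score_stream, min_frames=10):
    scores = []
    for s in score_stream:
        scores.append(s)
        if len(scores) < min_frames:
            continue
        m = np.mean(scores)
        half = Z95 * np.std(scores, ddof=1) / np.sqrt(len(scores))
        if m - half > THRESHOLD:
            return "accept", len(scores)    # interval clears threshold
        if m + half < THRESHOLD:
            return "reject", len(scores)
    return "undecided", len(scores)

rng = np.random.default_rng(4)
print(early_decision(rng.normal(0.5, 1.0, size=500)))  # genuine-like scores
```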
Abstract:
Many large coal mining operations in Australia rely heavily on the rail network to transport coal from mines to coal terminals at ports for shipment. Over the last few years, due to fast-growing demand, the coal rail network has become one of the worst industrial bottlenecks in Australia. This provides great incentives for pursuing better optimisation and control strategies for the operation of the whole rail transportation system under network and terminal capacity constraints. This PhD research aims to achieve a significant efficiency improvement in a coal rail network through the development of standard modelling approaches and generic solution techniques. Generally, the train scheduling problem can be modelled as a Blocking Parallel-Machine Job-Shop Scheduling (BPMJSS) problem. In a BPMJSS model for train scheduling, trains and sections are synonymous with jobs and machines respectively, and an operation is regarded as the movement/traversal of a train across a section. To begin, an improved shifting bottleneck procedure algorithm combined with metaheuristics has been developed to efficiently solve Parallel-Machine Job-Shop Scheduling (PMJSS) problems without the blocking conditions. Due to the lack of buffer space, real-life train scheduling should consider blocking or hold-while-wait constraints, which means that a track section cannot release and must hold a train until the next section on the routing becomes available. As a consequence, the problem has been considered as BPMJSS with the blocking conditions. To develop efficient solution techniques for BPMJSS, extensive studies on non-classical scheduling problems regarding the various buffer conditions (i.e. blocking, no-wait, limited-buffer, unlimited-buffer and combined-buffer) have been conducted. In this procedure, an alternative graph, as an extension of the classical disjunctive graph, is developed and specially designed for non-classical scheduling problems such as the blocking flow-shop scheduling (BFSS), no-wait flow-shop scheduling (NWFSS) and blocking job-shop scheduling (BJSS) problems. By exploring the blocking characteristics based on the alternative graph, a new algorithm called the topological-sequence algorithm is developed for solving the non-classical scheduling problems. To demonstrate the merits of the proposed algorithm, we compare it with two known algorithms (i.e. Recursive Procedure and Directed Graph) in the literature. Moreover, we define a new type of non-classical scheduling problem, called combined-buffer flow-shop scheduling (CBFSS), which covers four extreme cases: the classical flow-shop scheduling (FSS) with infinite buffer, the blocking FSS (BFSS) with no buffer, the no-wait FSS (NWFSS) and the limited-buffer FSS (LBFSS). After exploring the structural properties of CBFSS, we propose an innovative constructive algorithm named the LK algorithm to construct feasible CBFSS schedules. Detailed numerical illustrations for the various cases are presented and analysed. By adjusting only the attributes in the data input, the proposed LK algorithm is generic and enables the construction of feasible schedules for many types of non-classical scheduling problems with different buffer constraints.
Inspired by the shifting bottleneck procedure algorithm for PMJSS and the characteristic analysis based on the alternative graph for non-classical scheduling problems, a new constructive algorithm called the Feasibility Satisfaction Procedure (FSP) is proposed to obtain feasible BPMJSS solutions. A real-world train scheduling case is used for illustrating and comparing the PMJSS and BPMJSS models. Some real-life applications, including considering the train length, upgrading the track sections, accelerating a tardy train and changing the bottleneck sections, are discussed. Furthermore, the BPMJSS model is generalised to a No-Wait Blocking Parallel-Machine Job-Shop Scheduling (NWBPMJSS) problem for scheduling trains with priorities, in which prioritised trains such as express passenger trains are considered simultaneously with non-prioritised trains such as freight trains. In this case, no-wait conditions, which are more restrictive than blocking constraints, arise when considering the prioritised trains, which should traverse continuously without any interruption or unplanned pauses because of the high cost of waiting during travel. In comparison, non-prioritised trains are allowed to enter the next section immediately if possible, or to remain in a section until the next section on the routing becomes available. Based on the FSP algorithm, a more generic algorithm called the SE algorithm is developed to solve a class of train scheduling problems under different conditions in train scheduling environments. To construct the feasible train schedule, the proposed SE algorithm consists of many individual modules, including the feasibility-satisfaction procedure, time-determination procedure, tune-up procedure and conflict-resolve procedure algorithms. To find a good train schedule, a two-stage hybrid heuristic algorithm called the SE-BIH algorithm is developed by combining the constructive heuristic (i.e. the SE algorithm) and the local-search heuristic (i.e. the Best-Insertion-Heuristic algorithm). To optimise the train schedule, a three-stage algorithm called the SE-BIH-TS algorithm is developed by combining the tabu search (TS) metaheuristic with the SE-BIH algorithm. Finally, a case study is performed for a complex real-world coal rail network under network and terminal capacity constraints. The computational results validate that the proposed methodology is very promising, as it can be applied as a fundamental tool for modelling and solving many real-world scheduling problems.
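A minimal sketch of the blocking (hold-while-wait) condition that distinguishes BPMJSS from classical job-shop models, with a hypothetical two-train layout (the scheduling algorithms themselves are not reproduced):

```python
# Minimal sketch: a train holds its current section until the next section
# on its route is free, releasing it only on entry. Layout is hypothetical.
routes = {"T1": ["A", "B", "C"], "T2": ["B", "C"]}
pos = {t: -1 for t in routes}          # -1 = not yet on the network
occupied = {}                          # section -> train currently holding it

def step(train):
    """Advance one section if possible; otherwise hold-while-wait."""
    p = pos[train]
    route = routes[train]
    if p + 1 == len(route):            # route finished: free the last section
        occupied.pop(route[p], None)
        return "done"
    nxt = route[p + 1]
    if nxt in occupied:
        return "blocked"               # no buffer: must hold current section
    if p >= 0:
        occupied.pop(route[p], None)   # release only once the next is entered
    occupied[nxt] = train
    pos[train] = p + 1
    return f"entered {nxt}"

for train in ["T1", "T2", "T1", "T2", "T1"]:
    print(train, step(train), occupied)
```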
Abstract:
A hybrid genetic algorithm / scaled conjugate gradient regularisation method is designed to alleviate ANN 'over-fitting'. In application to day-ahead load forecasting, the proposed algorithm performs better than early-stopping and Bayesian regularisation, showing promising initial results.
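A minimal sketch of the hybrid idea, with a tiny ridge model trained by SciPy's plain conjugate gradient standing in for an SCG-trained ANN, and a mutate-and-select loop standing in for the genetic algorithm (both substitutions are assumptions):

```python
# Minimal sketch: a genetic-style search over the regularisation strength,
# with conjugate-gradient training of the (stand-in) model inside the loop.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 100)

def fitness(lam):
    """Validation error of a CG-trained ridge model with penalty lam."""
    Xtr, ytr, Xva, yva = X[:70], y[:70], X[70:], y[70:]
    loss = lambda w: np.mean((Xtr @ w - ytr) ** 2) + lam * (w @ w)
    w = minimize(loss, np.zeros(4), method="CG").x
    return np.mean((Xva @ w - yva) ** 2)

# Tiny GA: mutate-and-select over log10(lambda).
pop = list(rng.uniform(-4, 1, size=8))
for _ in range(10):
    pop += [g + rng.normal(0, 0.3) for g in pop]              # mutation
    pop = sorted(pop, key=lambda g: fitness(10.0 ** g))[:8]   # selection
print("best lambda ~", 10.0 ** pop[0])
```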
Abstract:
Video surveillance technology, based on Closed Circuit Television (CCTV) cameras, is one of the fastest growing markets in the field of security technologies. However, the existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. To overcome this limitation, it is necessary to have “intelligent” processes which are able to highlight the salient data and filter out normal conditions that do not pose a threat to security. In order to create such intelligent systems, an understanding of human behaviour, specifically suspicious behaviour, is required. One of the challenges in achieving this is that human behaviour can only be understood correctly in the context in which it appears. Although context has been exploited in the general computer vision domain, it has not been widely used in the automatic suspicious behaviour detection domain. It is therefore essential that context be formulated, stored and used by the system in order to understand human behaviour. Finally, since surveillance systems can be modelled as large-scale data stream systems, it is difficult to have a complete knowledge base. In this case, the systems need not only to continuously update their knowledge but also to be able to retrieve the extracted information related to a given context. To address these issues, a context-based approach for detecting suspicious behaviour is proposed. In this approach, contextual information is exploited in order to make a better detection. The proposed approach utilises a data stream clustering algorithm in order to discover the behaviour classes and their frequencies of occurrence from the incoming behaviour instances. Contextual information is then used in addition to the above information to detect suspicious behaviour. The proposed approach is able to detect observed, unobserved and contextual suspicious behaviour. Two case studies, using video feeds taken from the CAVIAR dataset and from the Z-Block building at the Queensland University of Technology, are presented in order to test the proposed approach. These experiments show that by using information about context, the proposed system is able to make more accurate detections, especially of those behaviours which are only suspicious in some contexts while being normal in others. Moreover, this information gives critical feedback to the system designers to refine the system. Finally, the proposed modified Clustream algorithm enables the system both to continuously update its knowledge and to effectively retrieve the information learned in a given context. The outcomes from this research are: (a) a context-based framework for automatically detecting suspicious behaviour, which can be used by an intelligent video surveillance system in making decisions; (b) a modified Clustream data stream clustering algorithm which continuously updates the system knowledge and is able to retrieve contextually related information effectively; and (c) an update-describe approach which extends the capability of the existing human local motion features, called interest-point-based features, to the data stream environment.
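A minimal sketch of Clustream-style micro-clusters, the incremental summaries that let a stream clusterer absorb behaviour instances as they arrive (the thesis' context-retrieval modifications are not shown; radius and cluster cap are illustrative):

```python
# Minimal sketch: micro-clusters as incremental cluster feature vectors
# (count, linear sum, squared sum) updated from a stream of instances.
import numpy as np

class MicroCluster:
    def __init__(self, x):
        self.n, self.ls, self.ss = 1, x.copy(), x * x

    def absorb(self, x):
        self.n += 1
        self.ls += x
        self.ss += x * x

    @property
    def centre(self):
        return self.ls / self.n

def update(clusters, x, radius=1.0, max_clusters=10):
    """Absorb x into the nearest micro-cluster, or start a new one."""
    if clusters:
        d = [np.linalg.norm(c.centre - x) for c in clusters]
        i = int(np.argmin(d))
        if d[i] < radius:
            clusters[i].absorb(x)
            return
    if len(clusters) < max_clusters:
        clusters.append(MicroCluster(x))

rng = np.random.default_rng(6)
clusters = []
for x in rng.normal(size=(200, 2)):        # stream of behaviour features
    update(clusters, x)
print(len(clusters), "micro-clusters")
```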
Abstract:
Continuous user authentication with keystroke dynamics uses character sequences as features. Since users can type characters in any order, it is imperative to find character sequences (n-graphs) that are representative of user typing behavior. Contemporary feature selection approaches do not guarantee selecting frequently-typed features, which may cause less accurate statistical user representation. Furthermore, the selected features do not inherently reflect user typing behavior. We propose four statistics-based feature selection techniques that mitigate the limitations of existing approaches. The first technique selects the most frequently occurring features. The other three consider different user typing behaviors by selecting: n-graphs that are typed quickly; n-graphs that are typed with consistent time; and n-graphs that have large time variance among users. We use Gunetti’s keystroke dataset and the k-means clustering algorithm for our experiments. The results show that among the proposed techniques, the most-frequent feature selection technique can effectively find user-representative features. We further substantiate our results by comparing the most-frequent feature selection technique with three existing approaches (popular Italian words, common n-graphs, and least-frequent n-graphs). We find that it performs better than the existing approaches after selecting a certain number of most-frequent n-graphs.
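A minimal sketch of the most-frequent feature selection technique with a k-means step, using synthetic text and timings in place of Gunetti's dataset (feature values and user count are hypothetical):

```python
# Minimal sketch: count digraph (2-graph) occurrences across typing samples,
# keep the top-k most frequent, then cluster users on per-digraph timings.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

samples = ["the quick brown", "then the thin one", "brown hen then"]
counts = Counter()
for text in samples:
    counts.update(text[i:i + 2] for i in range(len(text) - 1))

top = [g for g, _ in counts.most_common(5)]    # most-frequent digraphs
print("selected n-graphs:", top)

# One row per user: hypothetical mean flight times (ms) for each kept digraph.
rng = np.random.default_rng(7)
features = rng.normal(120, 25, size=(6, len(top)))
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)                                  # cluster label per user
```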
Abstract:
Video surveillance using Closed Circuit Television (CCTV) cameras is one of the fastest growing areas in the field of security technologies. However, the existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. This work attempts to address these problems by proposing an automatic suspicious behaviour detection approach which utilises contextual information. The utilisation of contextual information is done via three main components: a context space model, a data stream clustering algorithm, and an inference algorithm. The utilisation of contextual information is still limited in the domain of suspicious behaviour detection. Furthermore, it is nearly impossible to correctly understand human behaviour without considering the context in which it is observed. This work presents experiments using video feeds taken from the CAVIAR dataset and from a camera mounted on one of the buildings (Z-Block) at the Queensland University of Technology, Australia. From these experiments, it is shown that by exploiting contextual information, the proposed system is able to make more accurate detections, especially of those behaviours which are only suspicious in some contexts while being normal in others. Moreover, this information gives critical feedback to the system designers to refine the system.