282 results for clustering algorithm
Abstract:
Background: Heatwaves can cause excess deaths ranging from tens to thousands within a couple of weeks in a local area. The excess mortality due to a special event (e.g., a heatwave or an epidemic outbreak) is estimated by subtracting the expected mortality under 'normal' conditions from the historical daily mortality records. Calculating excess mortality is a scientific challenge because of the stochastic temporal pattern of daily mortality data, which is characterised by (a) long-term changes in mean levels (i.e., non-stationarity) and (b) the non-linear temperature-mortality association. The Hilbert-Huang Transform (HHT) algorithm is a novel method originally developed in signal processing for analysing non-linear and non-stationary time series, but it has not yet been applied in public health research. This paper aims to demonstrate the applicability and strength of the HHT algorithm in analysing health data.
Methods: Special R functions were developed to implement the HHT algorithm, decomposing the daily mortality time series into trend and non-trend components according to the underlying physical mechanism. Excess mortality is then calculated directly from the resulting non-trend component series.
Results: Daily mortality time series from Brisbane (Queensland, Australia) and Chicago (United States) were used to calculate heatwave-associated excess mortality. The HHT algorithm estimated 62 excess deaths for the February 2004 Brisbane heatwave. For the July 1995 Chicago heatwave, the algorithm had to handle the mode-mixing issue, and it estimated 510 excess deaths for that event. To exemplify potential applications, the HHT decomposition results for the Brisbane data were used as input to a subsequent regression analysis investigating the association between excess mortality and different risk factors.
Conclusions: The HHT algorithm is a novel and powerful analytical tool for time series analysis. It has real potential for a wide range of applications in public health research because of its ability to decompose a non-linear and non-stationary time series into trend and non-trend components consistently and efficiently.
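To make the decomposition idea concrete (this sketch is not from the paper): a toy empirical mode decomposition, the first stage of the HHT, peels oscillatory components off a daily mortality series and leaves a slow residue usable as a 'normal' baseline; the excess over a hypothetical event window is then the observed counts minus that baseline. The synthetic data, event window and sifting settings are all illustrative, and boundary effects are ignored.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def emd(x, max_imfs=6, sifts=8):
    """Crude EMD: repeatedly sift off oscillatory intrinsic mode functions
    (IMFs); what remains is a slow residue usable as a baseline trend.
    Fixed sift counts, no boundary handling -- illustration only."""
    t, residue, imfs = np.arange(len(x)), x.astype(float), []
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(sifts):
            mx = argrelextrema(h, np.greater)[0]
            mn = argrelextrema(h, np.less)[0]
            if len(mx) < 4 or len(mn) < 4:   # too few extrema to build envelopes
                return imfs, residue
            mean_env = (CubicSpline(mx, h[mx])(t) + CubicSpline(mn, h[mn])(t)) / 2
            h -= mean_env                    # pull the component towards zero mean
        imfs.append(h)
        residue = residue - h
    return imfs, residue

# Synthetic two-year daily mortality series: slow trend + season + noise.
rng = np.random.default_rng(0)
days = np.arange(730)
deaths = 30 + 0.01 * days + 5 * np.sin(2 * np.pi * days / 365) + rng.poisson(3, days.size)
imfs, baseline = emd(deaths)
window = slice(400, 410)                           # hypothetical heatwave days
print((deaths[window] - baseline[window]).sum())   # excess deaths over the window
```

In practice the baseline would aggregate the residue together with the low-frequency IMFs, and mode mixing (as in the Chicago series) needs dedicated handling.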
Abstract:
The Common Scrambling Algorithm Stream Cipher (CSA-SC) is a shift-register-based stream cipher designed to encrypt digital video broadcasts. CSA-SC produces a pseudo-random binary sequence that is used to mask the contents of the transmission. In this paper, we analyse the initialisation process of the CSA-SC keystream generator and demonstrate weaknesses that lead to state convergence, slid pairs and shifted keystreams. As a result, the cipher may be vulnerable to distinguishing attacks, time-memory-data trade-off attacks or slide attacks.
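For readers unfamiliar with shift-register stream ciphers, a generic sketch of the principle (this is not the actual CSA-SC, whose register lengths, taps and combining logic are specified in the DVB standard; the 16-bit width and tap positions below are illustrative only): a linear feedback shift register expands a secret initial state into a keystream that is XORed onto the transmission.

```python
def lfsr_keystream(state, taps, n):
    """Generic Fibonacci LFSR: output the low bit, then shift and feed
    back the XOR of the tapped bits. Toy 16-bit register."""
    out = []
    for _ in range(n):
        out.append(state & 1)               # keystream bit
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1          # feedback = XOR of tap positions
        state = (state >> 1) | (fb << 15)   # shift right, insert feedback
    return out

def mask(data: bytes, key_state: int) -> bytes:
    """XOR the transmission with the keystream (illustrative taps)."""
    ks = lfsr_keystream(key_state, taps=(0, 2, 3, 5), n=8 * len(data))
    kbytes = [sum(ks[8 * i + j] << j for j in range(8)) for i in range(len(data))]
    return bytes(b ^ k for b, k in zip(data, kbytes))

ct = mask(b"attack at dawn", 0xACE1)   # encrypt
pt = mask(ct, 0xACE1)                  # the same call decrypts
```

Masking and unmasking are the same operation, which is also why initialisation weaknesses (e.g., distinct keys converging to the same internal state) translate directly into distinguishable or shifted keystreams.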
Abstract:
Organisations are constantly seeking new ways to improve operational efficiency. This study investigates a novel way to identify potential efficiency gains in business operations: observing how they were carried out in the past and then exploring better ways of executing them, taking into account the trade-offs between time, cost and resource utilisation. The paper demonstrates how these trade-offs can be incorporated into the assessment of alternative process execution scenarios by making use of a cost environment. A genetic-algorithm-based approach is proposed to explore and assess alternative process execution scenarios, with an objective function given by a comprehensive cost structure that captures the different process dimensions. Experiments conducted with different variants of the genetic algorithm evaluate the approach's feasibility. The findings demonstrate that a genetic-algorithm-based approach can use cost reduction to identify improved execution scenarios, in terms of reduced case durations and increased resource utilisation. The ultimate aim is to use the cost-related insights gained from such improved scenarios to put forward recommendations for reducing process-related costs within organisations.
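As a hedged illustration of this kind of search (the encoding, operators and cost terms below are invented for the example, not taken from the paper): a genetic algorithm explores activity orderings, with an objective mixing a flow-time term and a resource-clash penalty standing in for a comprehensive cost structure.

```python
import random

def scenario_cost(order, dur, res):
    """Hypothetical composite cost: total flow time plus a penalty for
    consecutive activities competing for the same resource."""
    t = flow = 0.0
    for a in order:
        t += dur[a]
        flow += t                                    # cumulative completion time
    clash = sum(res[a] == res[b] for a, b in zip(order, order[1:]))
    return flow + 5.0 * clash

def ga(activities, dur, res, pop=60, gens=200):
    P = [random.sample(activities, len(activities)) for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=lambda o: scenario_cost(o, dur, res))
        elite, children = P[: pop // 4], []
        while len(elite) + len(children) < pop:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + [x for x in b if x not in a[:cut]]  # order crossover
            if random.random() < 0.2:                             # swap mutation
                i, j = random.randrange(len(child)), random.randrange(len(child))
                child[i], child[j] = child[j], child[i]
            children.append(child)
        P = elite + children
    return min(P, key=lambda o: scenario_cost(o, dur, res))

acts = list(range(8))
dur = {a: random.uniform(1.0, 5.0) for a in acts}
res = {a: random.choice("XY") for a in acts}
print(ga(acts, dur, res))
```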
Abstract:
Live migration of multiple Virtual Machines (VMs) has become an integral management activity in data centers, used for power saving, load balancing and system maintenance. While state-of-the-art live migration techniques focus on improving the migration performance of a single, independent VM, little attention has been paid to the live migration of multiple interacting VMs. Live migration is heavily influenced by network bandwidth, and arbitrarily migrating a VM that has data inter-dependencies with other VMs may increase bandwidth consumption and adversely affect the performance of subsequent migrations. In this paper, we propose a Random Key Genetic Algorithm (RKGA) that efficiently schedules the migration of a given set of VMs, accounting for both inter-VM dependencies and the data center communication network. The experimental results show that the RKGA schedules the migration of multiple VMs with significantly shorter total migration time and total downtime than a heuristic algorithm.
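A minimal sketch of the random-key encoding that gives the RKGA its name (the fitness model below is a stand-in of our own, not the paper's): each chromosome is a vector of real-valued keys, and sorting the keys yields a migration order, so standard real-valued crossover and mutation always decode to valid schedules.

```python
import numpy as np

def decode(keys):
    """Random-key decoding: sorting the keys yields the migration order."""
    return list(np.argsort(keys))

def total_migration_time(order, size, deps, bandwidth=1.0):
    """Hypothetical fitness: migrating a VM before the VMs it depends on
    inflates the data to transfer (stand-in for bandwidth contention)."""
    done, t = set(), 0.0
    for vm in order:
        penalty = 1.5 if any(d not in done for d in deps.get(vm, ())) else 1.0
        t += penalty * size[vm] / bandwidth
        done.add(vm)
    return t

keys = np.random.rand(5)                  # one chromosome, five VMs
order = decode(keys)
size = {vm: 2.0 + vm for vm in range(5)}  # VM memory sizes (GB), illustrative
deps = {2: (0,), 4: (1, 3)}               # VM 2 depends on VM 0, etc.
print(order, total_migration_time(order, size, deps))
```

The GA then evolves the key vectors directly; any crossover of two key vectors still decodes to a permutation, which is the main attraction of the encoding.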
Abstract:
This thesis presents new methods for the classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, in which similar documents are automatically associated with clusters that each represent a distinct topic. The automatically discovered topics are in turn used to improve search engine performance by searching only the topics deemed relevant to a particular user query.
Abstract:
The work presented in this report aims to implement a cost-effective offline mission path planner for aerial inspection of large linear infrastructure. Like most real-world optimisation problems, mission path planning involves a number of objectives that should ideally be minimised simultaneously. Understandably, the objectives of a practical optimisation problem conflict with one another, and minimising one of them necessarily comes at the expense of the others. This leads to the need to find a set of optimal solutions for the problem; once such a set of available options is produced, the mission planning problem reduces to a decision-making problem for the mission specialists, who will choose the solution that best fits the requirements of the mission. The goal of this work is therefore to develop a multi-objective optimisation tool that, given the environment data, the mission requirements and the definition of the objectives to minimise, provides the mission specialists with a set of optimal solutions for the inspection task from which the final trajectory will be chosen. The set of all optimal solutions of a multi-objective optimisation problem is said to form the Pareto-optimal front of the problem. For any Pareto-optimal solution, it is impossible to improve one objective without worsening at least one other; among a set of Pareto-optimal solutions, no solution is absolutely better than another, and the final choice must be a trade-off between the objectives of the problem. Multi-Objective Evolutionary Algorithms (MOEAs) are recognised as a convenient method for exploring the Pareto-optimal front of multi-objective optimisation problems. Their efficiency stems from their inherently parallel, population-based architecture, which allows several optimal solutions to be found in a single run.
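To pin down the Pareto terminology used above, a small self-contained check (the candidate objective values are made up): a solution is Pareto-optimal when no other candidate dominates it, i.e. is at least as good in every objective and strictly better in one.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in at
    least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated candidates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Illustrative (path length, turn cost) pairs for candidate trajectories.
cands = [(10.0, 4.0), (12.0, 3.0), (11.0, 5.0), (9.0, 6.0)]
print(pareto_front(cands))   # (11.0, 5.0) is dominated by (10.0, 4.0)
```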
Abstract:
In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained in a first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization, and we demonstrate its practical application using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora to evaluate our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of the inter-session speaker information linked in the initial diarization pass to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.
Abstract:
We present a novel method for improving hierarchical speaker clustering in the tasks of speaker diarization and speaker linking. In hierarchical clustering, a tree can be formed that captures the various levels of clustering. We propose a ratio that expresses the impact of each cluster on the formation of this tree, and we use it to rescale cluster scores, providing score normalisation based on the impact of each cluster. Using a state-of-the-art speaker diarization and linking system on the SAIVT-BNEWS corpus, we show that the proposed impact ratio provides a relative improvement of 16% in diarization error rate (DER).
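As a generic illustration of the tree being described (the paper's impact ratio itself is defined in the full text, so it is not reproduced here): scipy builds exactly such a tree for agglomerative clustering, and each row of the linkage matrix is one merge, i.e. one level of the tree against which a per-cluster score could be rescaled.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two illustrative groups of 2-D points standing in for speaker segments.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (5, 2)), rng.normal(3, 0.3, (5, 2))])

Z = linkage(X, method="average")                  # each row of Z is one merge
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)
```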
Abstract:
A computationally efficient sequential Monte Carlo algorithm is proposed for the sequential design of experiments for the collection of block data described by mixed effects models. The difficulty in applying a sequential Monte Carlo algorithm in such settings is the need to evaluate the observed-data likelihood, which is typically intractable for all but linear Gaussian models. To overcome this difficulty, we propose to estimate the likelihood unbiasedly, and to perform inference and make decisions based on an exact-approximate algorithm. Two estimators are proposed: one using quasi-Monte Carlo methods and one using the Laplace approximation with importance sampling. Both approaches can be computationally expensive, so we propose exploiting parallel computational architectures to ensure that designs can be derived in a timely manner. We also extend our approach to allow for model uncertainty. This research is motivated by important pharmacological studies related to the treatment of critically ill patients.
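A minimal sketch of the "estimate the likelihood unbiasedly" idea on a toy random-intercept model (the model, numbers and plain Monte Carlo estimator here are illustrative; the paper's estimators use quasi-Monte Carlo and Laplace-with-importance-sampling): averaging p(y | b) over draws of the random effect b from its prior is unbiased for the intractable observed-data likelihood p(y) = E_b[p(y | b)].

```python
import numpy as np
from scipy.stats import norm

def likelihood_hat(y, sigma, tau, draws=2000, rng=None):
    """Unbiased plain Monte Carlo estimate of p(y) for a toy random-intercept
    block: y_j | b ~ N(b, sigma^2), with random effect b ~ N(0, tau^2)."""
    rng = rng or np.random.default_rng(0)
    b = rng.normal(0.0, tau, size=draws)                       # b ~ p(b)
    py_given_b = norm.pdf(y[:, None], loc=b, scale=sigma).prod(axis=0)
    return py_given_b.mean()                                   # unbiased for p(y)

y = np.array([1.2, 0.8, 1.1])    # one block of observations
print(likelihood_hat(y, sigma=0.5, tau=1.0))
```

Quasi-Monte Carlo replaces the prior draws with a low-discrepancy sequence to reduce variance; the same estimate is what the exact-approximate algorithm plugs in wherever the true likelihood would appear.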
Abstract:
Network Real-Time Kinematic (NRTK) is a technology that provides centimetre-level positioning services in real time, enabled by a network of Continuously Operating Reference Stations (CORS). The location-oriented CORS placement problem is an important problem in the design of an NRTK, as it directly affects not only the installation and operational costs of the NRTK, but also the quality of the positioning services it provides. This paper presents a Memetic Algorithm (MA) for the location-oriented CORS placement problem that hybridises the powerful explorative search capacity of a genetic algorithm with the efficient and effective exploitative search capacity of a local search. Experimental results show that the MA outperforms existing approaches. We also conduct an empirical study of the scalability of the MA, the effectiveness of the hybridisation technique and the selection of the crossover operator.
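A schematic of the hybridisation pattern in generic form (operators and parameters are placeholders, not the paper's): a memetic algorithm is a genetic algorithm in which every offspring is additionally refined by a local search before it re-enters the population.

```python
import random

def local_search(sol, cost, neighbours, steps=20):
    """Exploitative half of the memetic algorithm: simple hill climbing
    over a user-supplied neighbourhood."""
    for _ in range(steps):
        cand = random.choice(neighbours(sol))
        if cost(cand) < cost(sol):
            sol = cand
    return sol

def memetic_step(population, cost, crossover, mutate, neighbours):
    """One generation: explorative GA operators, then local refinement."""
    population.sort(key=cost)
    parents = population[: len(population) // 2]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in parents]
    # The defining memetic twist: refine each offspring locally.
    children = [local_search(c, cost, neighbours) for c in children]
    return parents + children
```

For CORS placement, a solution would be a set of candidate station sites, with neighbours generated by swapping one selected site for an unselected one; the cost function is where installation cost and positioning quality enter.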
Abstract:
The problem of clustering a large document collection is challenged not only by the number of documents and the number of dimensions, but also by the number and sizes of the clusters. Traditional clustering methods fail to scale when they need to generate a large number of clusters. Furthermore, when cluster sizes in the solution are heterogeneous, i.e. some clusters are large, the similarity measures tend to degrade. A ranking-based clustering method is proposed to deal with these issues in the context of the Social Event Detection task. Ranking scores are used to select a small number of the most relevant clusters against which a document is compared and placed. Additionally, instead of conventional cluster centroids, clusters are represented by cluster patches, hub-like sets of documents. Text, temporal, spatial and visual content information collected from the social event images is used in calculating similarity. Results show that these strategies allow the clustering method to balance the performance and the accuracy of the clustering solution.
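A small sketch of the ranking step (the vector representation and threshold rule are illustrative assumptions): instead of comparing an incoming document with every cluster, clusters are ranked by similarity to the document and only the top few are considered; each row of patch_vecs would summarise a hub-like patch of documents rather than a full centroid.

```python
import numpy as np

def assign(doc_vec, patch_vecs, top_k=5, threshold=0.5):
    """Rank clusters by cosine similarity between the document and each
    cluster patch; compare against only the top_k, not every cluster."""
    sims = patch_vecs @ doc_vec / (
        np.linalg.norm(patch_vecs, axis=1) * np.linalg.norm(doc_vec) + 1e-12)
    ranked = np.argsort(sims)[::-1][:top_k]    # the ranking step
    best = ranked[0]
    return int(best) if sims[best] >= threshold else -1   # -1: open a new cluster
```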
Abstract:
Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees, called the balanced-optimal-search canonical form (BOCF), that handles the isomorphism problem efficiently. Using BOCF, we define an enumeration approach, guided by a tree-structure scheme, that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm, based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees in a database of labelled rooted unordered trees. Experiments on real datasets compare the efficiency of BOSTER against the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
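BOCF itself is defined in the paper, but the role a canonical form plays is easy to show with the classic sorted-children encoding for rooted labelled unordered trees: isomorphic trees map to the same string, so duplicate candidates can be detected by string equality instead of explicit isomorphism tests.

```python
def canonical(tree):
    """Classic canonical string for a rooted labelled unordered tree:
    encode children recursively and sort the encodings, so isomorphic
    trees (and only those) receive identical strings."""
    label, children = tree
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"

t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])   # same tree, children reordered
assert canonical(t1) == canonical(t2)
```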
Abstract:
This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). An enumeration approach, guided by a tree-structure scheme, is defined using BOCF to systematically enumerate only the valid subtrees. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. Extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.