282 results for Clustering algorithm
Abstract:
Here we present a sequential Monte Carlo (SMC) algorithm that can be used for any one-at-a-time Bayesian sequential design problem in the presence of model uncertainty where discrete data are encountered. Our focus is on adaptive design for model discrimination but the methodology is applicable if one has a different design objective such as parameter estimation or prediction. An SMC algorithm is run in parallel for each model and the algorithm relies on a convenient estimator of the evidence of each model which is essentially a function of importance sampling weights. Other methods for this task such as quadrature, often used in design, suffer from the curse of dimensionality. Approximating posterior model probabilities in this way allows us to use model discrimination utility functions derived from information theory that were previously difficult to compute except for conjugate models. A major benefit of the algorithm is that it requires very little problem specific tuning. We demonstrate the methodology on three applications, including discriminating between models for decline in motor neuron numbers in patients suffering from neurological diseases such as Motor Neuron disease.
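The evidence estimator mentioned above can be illustrated with a minimal sketch. It assumes one weighted particle set per model and a Bernoulli likelihood for the discrete data; the function names (`update_model`, `posterior_model_probs`, `bernoulli`) are hypothetical, and the resampling and move steps of a full SMC algorithm are omitted.

```python
import math

def update_model(weights, particles, likelihood, y):
    """Reweight one model's particles with a new observation y.

    Returns (new_normalised_weights, evidence_increment). The increment
    estimates p(y | earlier data, model) as the weight-averaged likelihood.
    """
    lik = [likelihood(y, theta) for theta in particles]
    increment = sum(w * l for w, l in zip(weights, lik))
    new_weights = [w * l / increment for w, l in zip(weights, lik)]
    return new_weights, increment

def posterior_model_probs(log_evidences, prior=None):
    """Convert per-model log-evidence estimates into posterior model probabilities."""
    k = len(log_evidences)
    prior = prior or [1.0 / k] * k
    logs = [le + math.log(p) for le, p in zip(log_evidences, prior)]
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bernoulli(y, p):
    """Assumed likelihood for binary data with success probability p."""
    return p if y == 1 else 1.0 - p

# Two toy models, each represented by three equally weighted particles.
models = {"A": [0.1, 0.2, 0.3], "B": [0.7, 0.8, 0.9]}
log_z = {}
for name, particles in models.items():
    weights = [1.0 / len(particles)] * len(particles)
    log_z[name] = 0.0
    for y in [1, 1, 0, 1]:
        weights, inc = update_model(weights, particles, bernoulli, y)
        log_z[name] += math.log(inc)  # evidence is the product of increments

probs = posterior_model_probs([log_z["A"], log_z["B"]])
```

The product of the increments is the running evidence estimate for each model, so the posterior model probabilities needed by a discrimination utility come from the weights alone.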
Abstract:
While extensive research efforts have been devoted to improving motorcycle safety, the relationship between rider behavior and crash risk is still not well understood. The objective of this study is to evaluate how behavioral factors influence crash risk and to identify the most vulnerable group of motorcyclists. To explore rider behavior, a 61-item questionnaire covering impulsive sensation seeking, aggression, and risk-taking behavior was developed. By clustering crash risk using the medoid partitioning algorithm, a log-linear model relating rider behavior to crash risk was developed. Results show that crash-involved motorcyclists score higher in all three behavioral traits. Aggressive and high risk-taking motorcyclists are more likely to fall into the highly vulnerable group, while impulsive sensation seeking is not found to be significant. When personality types are defined from aggression and risk-taking behavior, the “Extrovert” and “Follower” types of motorcyclists are more prone to crashes. The findings of this study will help road safety campaign planners focus on the target group, and will also be useful to those who employ motorcyclists for their delivery business.
Abstract:
Introduction: In Singapore, motorcycle crashes account for 50% of traffic fatalities and 53% of injuries. While extensive research efforts have been devoted to improving motorcycle safety, the relationship between rider behavior and crash risk is still not well understood. The objective of this study is to evaluate how behavioral factors influence crash risk and to identify the most vulnerable group of motorcyclists. Methods: To explore rider behavior, a 61-item questionnaire examining sensation seeking (Zuckerman et al., 1978), impulsiveness (Eysenck et al., 1985), aggressiveness (Buss & Perry, 1992), and risk-taking behavior (Weber et al., 2002) was developed. A total of 240 respondents with at least one year of riding experience formed the sample, which related behavior to crash history, traffic penalty awareness, and demographic characteristics. By clustering crash risk using the medoid partitioning algorithm, a log-linear model relating rider behavior to crash risk was developed. Results and Discussion: Crash-involved motorcyclists scored higher in impulsive sensation seeking, aggression, and risk-taking behavior. Aggressive and high risk-taking motorcyclists were respectively 1.30 and 2.21 times more likely to fall into the high crash involvement group, while impulsive sensation seeking was not found to be significant. Based on their scores for risk-taking and aggression, the motorcyclists were clustered into four distinct personality combinations, namely extrovert (aggressive, impulsive risk-takers), leader (cautious, aggressive risk-takers), follower (agreeable, ignorant risk-takers), and introvert (self-conscious, fainthearted risk-takers). “Extrovert” motorcyclists were most prone to crashes, being 3.34 times more likely to be involved in a crash and 8.29 times more vulnerable than the “introvert”.
Mediating factors such as demographic characteristics, riding experience, and traffic penalty awareness were not found to be significant in reducing crash risk. Conclusion: The findings of this study will help road safety campaign planners focus on the target group, and will also be useful to those who employ motorcyclists for their delivery business.
Abstract:
The Wright-Fisher model is an Itô stochastic differential equation that was originally introduced to model genetic drift within finite populations and has recently been used as an approximation to ion channel dynamics within cardiac and neuronal cells. While analytic solutions to this equation remain within the interval [0,1], current numerical methods are unable to preserve such boundaries in the approximation. We present a new numerical method that guarantees approximations to a form of Wright-Fisher model, which includes mutation, remain within [0,1] for all time with probability one. Strong convergence of the method is proved and numerical experiments suggest that this new scheme converges with strong order 1/2. Extending this method to a multidimensional case, numerical tests suggest that the algorithm still converges strongly with order 1/2. Finally, numerical solutions obtained using this new method are compared to those obtained using the Euler-Maruyama method where the Wiener increment is resampled to ensure solutions remain within [0,1].
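The comparison baseline described above, Euler-Maruyama with a resampled Wiener increment, can be sketched as follows. This is not the new boundary-preserving method. The drift form `a(1 - X) - bX` is an assumed parameterisation of the mutation term, and the function name and the `max_tries` safeguard are hypothetical.

```python
import math
import random

def euler_maruyama_wf(x0, a, b, dt, n_steps, rng, max_tries=1000):
    """Simulate dX = (a(1 - X) - bX) dt + sqrt(X(1 - X)) dW on [0, 1].

    Whenever a proposed step would leave [0, 1], the Wiener increment is
    redrawn. Rejecting increments changes their distribution, which is
    why this serves only as a baseline.
    """
    path = [x0]
    x = x0
    for _ in range(n_steps):
        drift = a * (1.0 - x) - b * x
        diffusion = math.sqrt(x * (1.0 - x))
        for _ in range(max_tries):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
            x_new = x + drift * dt + diffusion * dw
            if 0.0 <= x_new <= 1.0:
                break
        else:
            raise RuntimeError("no admissible increment found")
        x = x_new
        path.append(x)
    return path

path = euler_maruyama_wf(0.5, 0.5, 0.5, 0.01, 1000, random.Random(0))
```

Without the rejection loop, a plain Euler-Maruyama step near 0 or 1 can leave the interval, after which the square root in the diffusion term is undefined.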
Abstract:
A composite SaaS (Software as a Service) is software composed of several software components and data components. The composite SaaS placement problem is to determine where each of the components should be deployed in a cloud computing environment such that the performance of the composite SaaS is optimal. From a computational point of view, the composite SaaS placement problem is a large-scale combinatorial optimization problem. An Iterative Cooperative Co-evolutionary Genetic Algorithm (ICCGA) was therefore proposed. The ICCGA can find solutions of reasonable quality, but its computation is noticeably slow. To reduce the computation time, we propose an unsynchronized Parallel Cooperative Co-evolutionary Genetic Algorithm (PCCGA) in this paper. Experimental results show that the PCCGA not only computes faster but also generates better-quality solutions than the ICCGA.
Abstract:
In this paper, the goal of identifying disease subgroups based on differences in observed symptom profile is considered. Commonly referred to as phenotype identification, solutions to this task often involve the application of unsupervised clustering techniques. We investigate the application of a Dirichlet Process mixture (DPM) model for this task. This model is defined by the placement of the Dirichlet Process (DP) on the unknown components of a mixture model, allowing for the expression of uncertainty about the partitioning of observed data into homogeneous subgroups. To exemplify this approach, an application to phenotype identification in Parkinson’s disease (PD) is considered, with symptom profiles collected using the Unified Parkinson’s Disease Rating Scale (UPDRS). Keywords: Clustering, Dirichlet Process mixture, Parkinson’s disease, UPDRS.
Abstract:
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing work, which prevents ineffective clusterings that provide no useful result from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
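A minimal sketch of the idea, assuming the random baseline is built by shuffling the cluster assignments (which keeps the number and sizes of clusters) and using purity as the example quality measure. Both choices, and the function names, are illustrative assumptions rather than the paper's exact procedure.

```python
import random
from collections import Counter

def purity(assignments, labels):
    """Fraction of documents that fall in the majority category of their cluster."""
    by_cluster = {}
    for c, y in zip(assignments, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)

def divergence_from_random(assignments, labels, measure, rng, n_baselines=20):
    """Score of the clustering minus the mean score of shuffled clusterings."""
    real = measure(assignments, labels)
    total = 0.0
    for _ in range(n_baselines):
        shuffled = list(assignments)
        rng.shuffle(shuffled)  # same cluster sizes, no relation to the documents
        total += measure(shuffled, labels)
    return real - total / n_baselines

labels = ["a"] * 10 + ["b"] * 10
good = [0] * 10 + [1] * 10      # recovers the two categories
singletons = list(range(20))    # one document per cluster: useless but "pure"

rng = random.Random(0)
d_good = divergence_from_random(good, labels, purity, rng)
d_single = divergence_from_random(singletons, labels, purity, rng)
```

Both clusterings have a raw purity of 1.0, but the singleton clustering scores the same as its shuffled versions, so its divergence is zero, while the informative clustering keeps a positive score.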
Abstract:
The 2010 LAGI competition was held on three underutilized sites in the United Arab Emirates (UAE). By choosing Staten Island, New York in 2012, the competition organisers have again brought into question new roles for public open space in the contemporary city. In the case of the UAE sites, the competition produced many entries that aimed to create a sculpture and, by doing so, to attract people to the selected empty spaces in an arid climate. In a way these proposals were the incubators and the new characters of these empty spaces. The competition was thus successful at advancing understandings of the expanded role of public open spaces in the UAE and elsewhere. LAGI 2012 differs significantly from the UAE program because Fresh Kills Park has already been planned as a public open space for New Yorkers - with or without these clean energy sculptures. Furthermore, Fresh Kills Park is already a (gas) energy generating site in its own right. We believe Fresh Kills Park, as a site, presents a problem which somewhat transcends the aims of the competition brief. Advancing a sustainable urban design proposition for the site therefore requires a fundamental reconsideration of the established paradigms of public open space. Hence our strategy is not only to create an energy generating, site-specific art work, but also to create synergy between the public and the site while complementing the idiosyncrasies of the pre-existing engineered landscape. Current PhD research about energy generation in public open spaces informs this work.
Abstract:
Improving energy efficiency has become increasingly important in data centers in recent years as a way to curb their rapidly growing electricity consumption. The power dissipation of the physical servers is the root cause of the power usage of other systems, such as cooling systems. Many efforts have been made to make data centers more energy efficient. One of them is to minimize the total power consumption of the servers in a data center through virtual machine (VM) consolidation, which is implemented by virtual machine placement. The placement problem is often modeled as a bin packing problem. Because the problem is NP-hard, heuristic solutions such as the First Fit and Best Fit algorithms have often been used and generally give good results. However, their performance leaves room for further improvement. In this paper we propose a Simulated Annealing (SA) based algorithm, which aims at further improvement from any feasible placement. This is the first published attempt at using SA to solve the VM placement problem to optimize power consumption. Experimental results show that this SA algorithm can generate better results, saving up to 25 percent more energy than First Fit Decreasing in an acceptable time frame.
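For orientation, the First Fit Decreasing baseline named above can be sketched in a few lines. The sketch assumes a single resource dimension and identical servers; it is the comparison heuristic, not the proposed SA algorithm, and the function name is hypothetical.

```python
def first_fit_decreasing(demands, capacity):
    """Pack VM demands onto servers of equal capacity; returns one list per server."""
    servers = []
    for d in sorted(demands, reverse=True):  # largest demand first
        if d > capacity:
            raise ValueError("demand exceeds server capacity")
        for s in servers:
            if sum(s) + d <= capacity:       # first server with enough room
                s.append(d)
                break
        else:
            servers.append([d])              # no server fits: power on a new one
    return servers

placement = first_fit_decreasing([2, 5, 4, 2, 3], capacity=8)
```

Any placement produced this way is feasible, so it could serve as the starting point that a local search such as simulated annealing then tries to improve.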
Abstract:
Server consolidation using virtualization technology has become an important way to improve the energy efficiency of data centers. Virtual machine placement is the key to server consolidation. In the past few years, many approaches to virtual machine placement have been proposed. However, existing approaches consider only the energy consumed by the physical machines in a data center and ignore the energy consumed by its communication network. The energy consumption of the communication network is not trivial, and should therefore be considered in virtual machine placement in order to make the data center more energy-efficient. In this paper, we propose a genetic algorithm for a new virtual machine placement problem that considers the energy consumption of both the servers and the communication network in the data center. Experimental results show that the genetic algorithm performs well on test problems of different kinds, and scales up well as the problem size increases.
Abstract:
A simple and effective down-sample algorithm, the Peak-Hold-Down-Sample (PHDS) algorithm, is developed in this paper to enable rapid and efficient data transfer in remote condition monitoring applications. The algorithm is particularly useful for high frequency Condition Monitoring (CM) techniques and for low speed machine applications, since the combination of a high sampling frequency and a low rotating speed generally leads to large, unwieldy data sizes. The effectiveness of the algorithm was evaluated and tested on four sets of data in the study. One set was extracted from the condition monitoring signal of a practical industry application. Another set was acquired from a low speed machine test rig in the laboratory. The other two sets were computer-simulated bearing defect signals with either a single or multiple bearing defects. The results show that the PHDS algorithm can substantially reduce the size of the data while preserving the critical bearing defect information for all the data sets used in this work, even when a large down-sample ratio was used (i.e., 500 times down-sampled). In contrast, the normal down-sample technique used in signal processing eliminates useful and critical information, such as bearing defect frequencies, when the same down-sample ratio is employed. The normal down-sample technique also induces noise and artificial frequency components, which limits its usefulness for machine condition monitoring applications.
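The abstract does not spell out the PHDS steps, so the following is only a sketch of one plausible reading: each block of `ratio` consecutive samples is replaced by the sample with the largest magnitude. The function names are hypothetical, and `decimate_naive` stands in for the "normal" down-sampling it is compared against.

```python
def peak_hold_downsample(signal, ratio):
    """Keep, from each block of `ratio` samples, the one with the largest magnitude."""
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    return [max(signal[i:i + ratio], key=abs)
            for i in range(0, len(signal), ratio)]

def decimate_naive(signal, ratio):
    """Plain down-sampling: keep every `ratio`-th sample."""
    return signal[::ratio]

# Two short impulses, as a bearing defect might produce, in a quiet signal.
sig = [0.0, 0.1, -3.0, 0.2, 0.05, 0.0, 0.1, 2.5]
held = peak_hold_downsample(sig, 4)
plain = decimate_naive(sig, 4)
```

In this toy signal the peak-hold version keeps both impulses, while plain decimation happens to sample between them and loses both.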
Abstract:
A fundamental problem faced by stereo vision algorithms is that of determining correspondences between two images which comprise a stereo pair. This paper presents work towards the development of a new matching algorithm, based on the rank transform. This algorithm makes use of both area-based and edge-based information, and is therefore referred to as a hybrid algorithm. In addition, this algorithm uses a number of matching constraints, including the novel rank constraint. Results obtained using a number of test pairs show that the matching algorithm is capable of removing a significant proportion of invalid matches. The accuracy of matching in the vicinity of edges is also improved.
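As background for the matching algorithm, the rank transform itself can be sketched briefly: each pixel is replaced by the number of pixels in a surrounding window that are darker than it. The sketch assumes a square window clipped at the image border; the hybrid matching and the rank constraint are not shown.

```python
def rank_transform(image, radius=1):
    """Replace each pixel by the count of window pixels with a lower intensity.

    `image` is a list of rows of intensities; windows are clipped at the border.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            centre = image[r][c]
            count = 0
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    if image[rr][cc] < centre:
                        count += 1
            out[r][c] = count
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ranks = rank_transform(img)
brighter = [[2 * v + 10 for v in row] for row in img]  # monotonic intensity change
```

Because only the ordering of intensities matters, the transformed image is unchanged by a monotonic change in brightness or gain between the two cameras, which is what makes it attractive for stereo matching.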
Abstract:
A fundamental problem faced by stereo vision algorithms is that of determining correspondences between two images which comprise a stereo pair. This paper presents work towards the development of a new matching algorithm, based on the rank transform. This algorithm makes use of both area-based and edge-based information, and is therefore referred to as a hybrid algorithm. In addition, this algorithm uses a number of matching constraints, including the novel rank constraint. Results obtained using a number of test pairs show that the matching algorithm is capable of removing most invalid matches. The accuracy of matching in the vicinity of edges is also improved.
Abstract:
Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. Results: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. Conclusions: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis.
mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
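A COPA-style outlier screen for a single gene can be sketched as below. It assumes the usual median centring and scaling by the median absolute deviation (MAD), and flags samples on both tails with a fixed, hypothetical threshold; the actual mCOPA refinements are not described in the abstract and are not reproduced here.

```python
import statistics

def copa_transform(values):
    """Centre one gene's expression values on the median and scale by the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; gene has no spread")
    return [(v - med) / mad for v in values]

def outlier_samples(values, threshold=3.0):
    """Indices of over- and under-expressed samples for one gene.

    `threshold` is an illustrative cut-off, not a value from the paper.
    """
    scores = copa_transform(values)
    over = [i for i, s in enumerate(scores) if s > threshold]
    under = [i for i, s in enumerate(scores) if s < -threshold]
    return over, under

expression = [1.0, 1.2, 0.9, 1.1, 1.0, 9.0, 0.8, -6.0]
over, under = outlier_samples(expression)
```

Using the median and MAD rather than the mean and standard deviation keeps the scaling from being dominated by the very outliers the screen is meant to find.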
Abstract:
This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.