873 resultados para constrained clustering
Resumo:
Recent technological advances have increased the quantity of movement data being recorded. While valuable knowledge can be gained by analysing such data, its sheer volume creates challenges. Geovisual analytics, which helps the human cognition process by using tools to reason about data, offers powerful techniques to resolve these challenges. This paper introduces such a geovisual analytics environment for exploring movement trajectories, which provides visualisation interfaces, based on the classic space-time cube. Additionally, a new approach, using the mathematical description of motion within a space-time cube, is used to determine the similarity of trajectories and forms the basis for clustering them. These techniques were used to analyse pedestrian movement. The results reveal interesting and useful spatiotemporal patterns and clusters of pedestrians exhibiting similar behaviour.
Resumo:
Increasingly semiconductor manufacturers are exploring opportunities for virtual metrology (VM) enabled process monitoring and control as a means of reducing non-value added metrology and achieving ever more demanding wafer fabrication tolerances. However, developing robust, reliable and interpretable VM models can be very challenging due to the highly correlated input space often associated with the underpinning data sets. A particularly pertinent example is etch rate prediction of plasma etch processes from multichannel optical emission spectroscopy data. This paper proposes a novel input-clustering based forward stepwise regression methodology for VM model building in such highly correlated input spaces. Max Separation Clustering (MSC) is employed as a pre-processing step to identify a reduced srt of well-conditioned, representative variables that can then be used as inputs to state-of-the-art model building techniques such as Forward Selection Regression (FSR), Ridge regression, LASSO and Forward Selection Ridge Regression (FCRR). The methodology is validated on a benchmark semiconductor plasma etch dataset and the results obtained are compared with those achieved when the state-of-art approaches are applied directly to the data without the MSC pre-processing step. Significant performance improvements are observed when MSC is combined with FSR (13%) and FSRR (8.5%), but not with Ridge Regression (-1%) or LASSO (-32%). The optimal VM results are obtained using the MSC-FSR and MSC-FSRR generated models. © 2012 IEEE.
Resumo:
The recent development of the massive multiple-input multiple-output (MIMO) paradigm, has been extensively based on the pursuit of favorable propagation: in the asymptotic limit, the channel vectors become nearly orthogonal and interuser interference tends to zero [1]. In this context, previous studies
have considered fixed inter-antenna distance, which implies an increasing array aperture as the number of elements increases. Here, we focus on a practical, space-constrained topology, where an increase in the number of antenna elements in a fixed total space imposes an inversely proportional decrease in the inter-antenna distance. Our analysis shows that, contrary to existing studies, inter-user interference does not vanish in the massive MIMO regime, thereby creating a saturation effect on the achievable rate.
Resumo:
Approximate execution is a viable technique for energy-con\-strained environments, provided that applications have the mechanisms to produce outputs of the highest possible quality within the given energy budget.
We introduce a framework for energy-constrained execution with controlled and graceful quality loss. A simple programming model allows users to express the relative importance of computations for the quality of the end result, as well as minimum quality requirements. The significance-aware runtime system uses an application-specific analytical energy model to identify the degree of concurrency and approximation that maximizes quality while meeting user-specified energy constraints. Evaluation on a dual-socket 8-core server shows that the proposed
framework predicts the optimal configuration with high accuracy, enabling energy-constrained executions that result in significantly higher quality compared to loop perforation, a compiler approximation technique.
Resumo:
Electric vehicles (EVs) offer great potential to move from fossil fuel dependency in transport once some of the technical barriers related to battery reliability and grid integration are resolved. The European Union has set a target to achieve a 10% reduction in greenhouse gas emissions by 2020 relative to 2005 levels. This target is binding in all the European Union member states. If electric vehicle issues are overcome then the challenge is to use as much renewable energy as possible to achieve this target. In this paper, the impacts of electric vehicle charged in the all-Ireland single wholesale electricity market after the 2020 deadline passes is investigated using a power system dispatch model. For the purpose of this work it is assumed that a 10% electric vehicle target in the Republic of Ireland is not achieved, but instead 8% is reached by 2025 considering the slow market uptake of electric vehicles. Our experimental study shows that the increasing penetration of EVs could contribute to approach the target of the EU and Ireland government on emissions reduction, regardless of different charging scenarios. Furthermore, among various charging scenarios, the off-peak charging is the best approach, contributing 2.07% to the target of 10% reduction of Greenhouse gas emissions by 2025.
Resumo:
A novel approach for the multi-objective design optimisation of aerofoil profiles is presented. The proposed method aims to exploit the relative strengths of global and local optimisation algorithms, whilst using surrogate models to limit the number of computationally expensive CFD simulations required. The local search stage utilises a re-parameterisation scheme that increases the flexibility of the geometry description by iteratively increasing the number of design variables, enabling superior designs to be generated with minimal user intervention. Capability of the algorithm is demonstrated via the conceptual design of aerofoil sections for use on a lightweight laminar flow business jet. The design case is formulated to account for take-off performance while reducing sensitivity to leading edge contamination. The algorithm successfully manipulates boundary layer transition location to provide a potential set of aerofoils that represent the trade-offs between drag at cruise and climb conditions in the presence of a challenging constraint set. Variations in the underlying flow physics between Pareto-optimal aerofoils are examined to aid understanding of the mechanisms that drive the trade-offs in objective functions.
Resumo:
In this paper we propose a graph stream clustering algorithm with a unied similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike others, our approach does not require an input parameter for the number of clusters, instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm which shows that our streaming algorithm can achieve comparable or better average cluster purity.
Resumo:
Throughout the European Union there is an increasing amount of wind generation being dispatched-down due to the binding of power system operating constraints from high levels of wind generation. This paper examines the impact a system non-synchronous penetration limit has on the dispatch-down of wind and quantifies the significance of interconnector counter-trading to the priority dispatching of wind power. A fully coupled economic dispatch and security constrained unit commitment model of the Single Electricity Market of the Republic of Ireland and Northern Ireland and the British Electricity Trading and Transmission Arrangement was used in this study. The key finding was interconnector counter-trading reduces the impact the system non-synchronous penetration limit has on the dispatch-down of wind. The capability to counter-trade on the interconnectors and an increase in system non-synchronous penetration limit from 50% to 55% reduces the dispatch-down of wind by 311 GW h and decreases total electricity payments to the consumer by €1.72/MW h. In terms of the European Union electricity market integration, the results show the importance of developing individual electricity markets that allow system operators to counter-trade on interconnectors to ensure the priority dispatch of the increasing levels of wind generation.
Resumo:
Introduction
Mild cognitive impairment (MCI) has clinical value in its ability to predict later dementia. A better understanding of cognitive profiles can further help delineate who is most at risk of conversion to dementia. We aimed to (1) examine to what extent the usual MCI subtyping using core criteria corresponds to empirically defined clusters of patients (latent profile analysis [LPA] of continuous neuropsychological data) and (2) compare the two methods of subtyping memory clinic participants in their prediction of conversion to dementia.
Methods
Memory clinic participants (MCI, n = 139) and age-matched controls (n = 98) were recruited. Participants had a full cognitive assessment, and results were grouped (1) according to traditional MCI subtypes and (2) using LPA. MCI participants were followed over approximately 2 years after their initial assessment to monitor for conversion to dementia.
Results
Groups were well matched for age and education. Controls performed significantly better than MCI participants on all cognitive measures. With the traditional analysis, most MCI participants were in the amnestic multidomain subgroup (46.8%) and this group was most at risk of conversion to dementia (63%). From the LPA, a three-profile solution fit the data best. Profile 3 was the largest group (40.3%), the most cognitively impaired, and most at risk of conversion to dementia (68% of the group).
Discussion
LPA provides a useful adjunct in delineating MCI participants most at risk of conversion to dementia and adds confidence to standard categories of clinical inference.
Resumo:
Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules that are composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the format of the rules; RGC employs disjunctions and conjunctions in rule generation whereas RGC-D rules are simple disjunctions of conditions signifying presence of various words. In both the cases, each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former leads to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering and also an approach we provide for generating interpretable models for general clusterings, both by significant margins. We empirically show that the purity and f-measure losses to achieve interpretability can be as little as 3 and 5%, respectively using the algorithms presented herein.
Resumo:
Al Rawi, Anas F., Emiliano Garcia-Palacios, Sonia Aissa, Charalampos C. Tsimenidis, and Bayan S. Sharif. "Dual-Diversity Combining for Constrained Resource Allocation and Throughput Maximization in OFDMA Networks." In Vehicular Technology Conference (VTC Spring), 2013 IEEE 77th, pp. 1-5. IEEE, 2013.
Resumo:
Most traditional data mining algorithms struggle to cope with the sheer scale of data efficiently. In this paper, we propose a general framework to accelerate existing clustering algorithms to cluster large-scale datasets which contain large numbers of attributes, items, and clusters. Our framework makes use of locality sensitive hashing (LSH) to significantly reduce the cluster search space. We also theoretically prove that our framework has a guaranteed error bound in terms of the clustering quality. This framework can be applied to a set of centroid-based clustering algorithms that assign an object to the most similar cluster, and we adopt the popular K-Modes categorical clustering algorithm to present how the framework can be applied. We validated our framework with five synthetic datasets and a real world Yahoo! Answers dataset. The experimental results demonstrate that our framework is able to speed up the existing clustering algorithm between factors of 2 and 6, while maintaining comparable cluster purity.
Resumo:
Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.
Resumo:
One of the most popular techniques of generating classifier ensembles is known as stacking which is based on a meta-learning approach. In this paper, we introduce an alternative method to stacking which is based on cluster analysis. Similar to stacking, instances from a validation set are initially classified by all base classifiers. The output of each classifier is subsequently considered as a new attribute of the instance. Following this, a validation set is divided into clusters according to the new attributes and a small subset of the original attributes of the instances. For each cluster, we find its centroid and calculate its class label. The collection of centroids is considered as a meta-classifier. Experimental results show that the new method outperformed all benchmark methods, namely Majority Voting, Stacking J48, Stacking LR, AdaBoost J48, and Random Forest, in 12 out of 22 data sets. The proposed method has two advantageous properties: it is very robust to relatively small training sets and it can be applied in semi-supervised learning problems. We provide a theoretical investigation regarding the proposed method. This demonstrates that for the method to be successful, the base classifiers applied in the ensemble should have greater than 50% accuracy levels.