806 results for Fuzzy Clustering
Abstract:
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is a challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations, including lighting and background changes, occlusions, and changes in expression and make-up. In this paper, we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus, and incorporate this into a novel two-stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues, after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able to effectively model the session variation observed in the data, resulting in improved clustering performance with much greater computational efficiency than other methods.
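The corpus-level stage can be sketched as average-linkage hierarchical agglomerative clustering over face tracks. This minimal sketch assumes each face track has already been reduced to one fixed-length, nonzero feature vector (the kind of compact representation a total-variability model produces); the cosine distance, average linkage, and `threshold` stopping rule are illustrative assumptions rather than the paper's configuration, and the Local TVM modelling itself is not shown.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def agglomerative_cluster(tracks, threshold):
    """Average-linkage HAC: repeatedly merge the closest pair of clusters
    until the smallest inter-cluster distance exceeds `threshold`.
    Returns a list of clusters, each a sorted list of track indices."""
    clusters = [[i] for i in range(len(tracks))]
    # Pairwise distances between individual face-track vectors.
    dist = [[cosine_distance(a, b) for b in tracks] for a in tracks]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # no remaining pair is similar enough to be the same person
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return clusters
```

The threshold plays the role of "same identity or not": a lower value yields more, purer clusters; a higher value merges more aggressively.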
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and for continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
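For the simplest discrete family, the multinomial, the exponential NML normalizing sum C(K, n) (K categories, n observations) can be computed without enumerating every count vector. This sketch uses a known recurrence, C(k+2, n) = C(k+1, n) + (n/k) C(k, n) with C(1, n) = 1, and checks it against brute-force enumeration. It illustrates the kind of technique described, not the dissertation's own code; all function names are hypothetical.

```python
import math
from itertools import product

def binomial_normalizer(n):
    """C(2, n) = sum_h comb(n, h) (h/n)^h ((n-h)/n)^(n-h), the NML normalizer
    for a two-valued variable. Python evaluates 0.0 ** 0 as 1.0, as required."""
    if n == 0:
        return 1.0
    return sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
               for h in range(n + 1))

def multinomial_normalizer(K, n):
    """C(K, n) via C(k+2, n) = C(k+1, n) + (n / k) * C(k, n), with C(1, n) = 1.
    After C(2, n) is known, each further category costs O(1)."""
    if K == 1:
        return 1.0
    prev, curr = 1.0, binomial_normalizer(n)  # C(1, n), C(2, n)
    for k in range(1, K - 1):
        prev, curr = curr, curr + (n / k) * prev  # now C(k+1, n), C(k+2, n)
    return curr

def brute_force_normalizer(K, n):
    """Direct exponential sum over all count vectors (h_1..h_K) summing to n >= 1."""
    total = 0.0
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        coef = math.factorial(n)
        for h in counts:
            coef //= math.factorial(h)  # multinomial coefficient n! / prod(h!)
        total += coef * math.prod((h / n) ** h for h in counts)
    return total

def nml_code_length(counts):
    """Stochastic complexity in nats: -log P_ML(data) + log C(K, n)."""
    n, K = sum(counts), len(counts)
    neg_log_ml = -sum(h * math.log(h / n) for h in counts if h > 0)
    return neg_log_ml + math.log(multinomial_normalizer(K, n))
```

For example, `multinomial_normalizer(3, 2)` gives 4.5, matching the direct sum over the six count vectors of two observations in three categories.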
Abstract:
Online content services can greatly benefit from personalisation features that enable delivery of content suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes each user's actions towards content items to learn that user's preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic, and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques, with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to current information needs that are not optimally served by traditional techniques.
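Implicit preference learning of this kind can be sketched as accumulating the concept weights of the items a user reads. This sketch assumes each item's thematic profile is a sparse mapping from ontology concepts to weights; the additive update and the cosine matching are assumptions for illustration, not the thesis's actual user model.

```python
import math
from collections import defaultdict

def update_profile(profile, item_concepts, weight=1.0):
    """Implicit feedback: when a user reads an item, add the item's concept
    weights to that user's profile (no explicit rating needed)."""
    for concept, w in item_concepts.items():
        profile[concept] += weight * w
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse concept -> weight mappings."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0
```

A profile would start as `defaultdict(float)` and grow with each read item; the resulting profile vectors are what a clustering step (for example k-means over the concept space) would segment into interest groups.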
Abstract:
The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure of the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced, which implicitly meet the need for feature selection in the context of clustering techniques.
Abstract:
Uncertainty plays an important role in water quality management problems. The major sources of uncertainty in a water quality management problem are the random nature of hydrologic variables and the imprecision (fuzziness) associated with the goals of the dischargers and pollution control agencies (PCAs). Many Waste Load Allocation (WLA) problems are solved by considering these two sources of uncertainty. Apart from randomness and fuzziness, missing data in the time series of a hydrologic variable may result in additional uncertainty due to partial ignorance. These uncertainties render the input parameters imprecise in water quality decision making. In this paper, an Imprecise Fuzzy Waste Load Allocation Model (IFWLAM) is developed for water quality management of a river system subject to uncertainty arising from partial ignorance. In a WLA problem, both randomness and imprecision can be addressed simultaneously through the fuzzy risk of low water quality. A methodology is developed for computing the imprecise fuzzy risk of low water quality when the parameters are characterized by uncertainty due to partial ignorance. A Monte Carlo simulation is performed to evaluate the imprecise fuzzy risk of low water quality, treating the input variables as imprecise. Fuzzy multiobjective optimization is used to formulate the multiobjective model. The model developed is based on a fuzzy multiobjective optimization problem with max-min as the operator. This usually does not yield a unique solution but gives multiple solutions. Two optimization models are developed to capture all the decision alternatives, or multiple solutions. The objective of the two optimization models is to obtain a range of fractional removal levels for the dischargers such that the resultant fuzzy risk remains within acceptable limits. Specifying a range for fractional removal levels enhances flexibility in decision making.
The methodology is demonstrated with a case study of the Tunga-Bhadra river system in India.
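The Monte Carlo evaluation of fuzzy risk can be sketched as estimating the expected membership of a water quality indicator in the fuzzy set "low water quality". This sketch assumes a dissolved-oxygen concentration that is normally distributed with a mean known only to lie in an interval (a stand-in for partial ignorance), a linear membership function, and illustrative thresholds. It is not the IFWLAM formulation, which derives the indicator from a river water quality model.

```python
import random

def low_quality_membership(c, c_min, c_des):
    """Degree to which concentration c (e.g. dissolved oxygen, mg/L) counts
    as 'low water quality': 1 at or below c_min, 0 at or above c_des."""
    if c <= c_min:
        return 1.0
    if c >= c_des:
        return 0.0
    return (c_des - c) / (c_des - c_min)

def fuzzy_risk(mean, sd, c_min, c_des, n_samples=20000, seed=0):
    """Monte Carlo estimate of the fuzzy risk E[mu_low(C)], assuming
    C ~ Normal(mean, sd)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += low_quality_membership(rng.gauss(mean, sd), c_min, c_des)
    return total / n_samples

def imprecise_fuzzy_risk(mean_interval, sd, c_min, c_des, steps=5):
    """When the mean is only known to lie in an interval, evaluate the risk
    on a grid over that interval and return the (lowest, highest) estimate.
    `steps` must be at least 2."""
    lo, hi = mean_interval
    means = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    risks = [fuzzy_risk(m, sd, c_min, c_des) for m in means]
    return min(risks), max(risks)
```

Because every call reuses the same seed, the grid evaluations share random numbers, so differences between them reflect the change in the mean rather than sampling noise. The result is an interval of risk values rather than a single number.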
Abstract:
A new clustering technique, based on the concept of immediate neighbourhood, with a novel capability to self-learn the number of clusters expected in the unsupervised environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovations and computational economy. A test implementation of the scheme using C-1 flight line training sample data in a simulated unsupervised mode has demonstrated the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervised learning capabilities, yielding unified learning systems capable of operating in both supervised and unsupervised environments. This makes the technique an attractive proposition for the analysis of remotely sensed earth resources data, where such a unified learning capability is essential.
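The idea of letting the number of clusters emerge from neighbourhood relations, instead of fixing it in advance, can be illustrated with a simple stand-in: connected components of a fixed-radius neighbourhood graph. This is not the paper's immediate-neighbourhood technique, whose details are not given here; the `radius` parameter and Euclidean distance are assumptions.

```python
import math

def neighbourhood_clusters(points, radius):
    """Label points so that any two within `radius` of each other share a
    cluster, taking the transitive closure (connected components of the
    radius-neighbourhood graph). Returns (labels, number_of_clusters);
    the cluster count is an output, not an input."""
    n = len(points)
    labels = [-1] * n
    k = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier seed
        labels[start] = k
        stack = [start]
        while stack:  # depth-first flood fill from the seed point
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    labels[j] = k
                    stack.append(j)
        k += 1
    return labels, k
```

The flood fill is O(n^2) distance evaluations, which is fine for a sketch; the radius now carries the burden that a fixed cluster count carries in k-means.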
Abstract:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
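The bioinformatics-style alignment step can be sketched with the Needleman-Wunsch global alignment recurrence applied to sequences of headings. This sketch assumes each resampled trajectory has been reduced to a sequence of headings in radians, scores a match by the cosine of the angle difference (a circular similarity in [-1, 1]), and uses a hypothetical linear gap penalty. The von Mises mixture modelling and the paper's actual scoring scheme are not reproduced.

```python
import math

def heading_similarity(a, b):
    """Circular similarity in [-1, 1] between two headings in radians:
    1 for identical directions, -1 for opposite ones."""
    return math.cos(a - b)

def global_alignment_score(s, t, gap=-0.5):
    """Needleman-Wunsch global alignment score between two heading sequences."""
    m, n = len(s), len(t)
    # F[i][j] = best score for aligning s[:i] with t[:j]
    F = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # s[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + heading_similarity(s[i - 1], t[j - 1]),  # align
                F[i - 1][j] + gap,  # gap in t
                F[i][j - 1] + gap,  # gap in s
            )
    return F[m][n]
```

A matrix of such pairwise scores would then be converted into distances for the k-medoids step; the paper's specific distance function is not given here.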
Abstract:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Abstract:
The Fuzzy Waste Load Allocation Model (FWLAM), developed in an earlier study, derives the optimal fractional removal levels for base flow conditions, considering the goals of the Pollution Control Agency (PCA) and the dischargers. The Modified Fuzzy Waste Load Allocation Model (MFWLAM), developed subsequently, is a stochastic model that considers the moments (mean, variance and skewness) of water quality indicators, incorporating uncertainty due to the randomness of input variables along with uncertainty due to imprecision. The risk of low water quality is reduced significantly by using this modified model, but the inclusion of new constraints leads to a low value of the acceptability level, A, interpreted as the maximized minimum satisfaction in the system. To improve this value, a new model combining FWLAM and MFWLAM is presented, allowing some violations of the MFWLAM constraints. This combined model is a multiobjective optimization model with two objectives: maximization of the acceptability level and minimization of constraint violations. Fuzzy multiobjective programming, goal programming and fuzzy goal programming are used to find the solutions. For the optimization model, Probabilistic Global Search Lausanne (PGSL) is used as a nonlinear optimization tool. The methodology is applied to a case study of the Tunga-Bhadra river system in south India. The model yields a compromise solution with a higher acceptability level than MFWLAM and a satisfactory value of risk. Thus the goal of risk minimization is achieved with a comparatively better acceptability level.
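The max-min formulation behind the acceptability level can be sketched as choosing fractional removal levels that maximize the satisfaction of the least-satisfied party. Everything numeric here is hypothetical: two dischargers, a linear dissolved-oxygen response, linear membership functions, and removal levels in [0.3, 0.9]. An exhaustive grid search stands in for the PGSL optimizer, and the MFWLAM risk constraints are omitted.

```python
from itertools import product

def linear_membership(value, worst, best):
    """Satisfaction in [0, 1]: 0 at `worst`, 1 at `best`, linear in between.
    Works whether `best` is above or below `worst`."""
    t = (value - worst) / (best - worst)
    return max(0.0, min(1.0, t))

def dissolved_oxygen(x1, x2):
    """Hypothetical water quality response (mg/L) to the fractional removal
    levels of two dischargers; a real model would come from river simulation."""
    return 3.0 + 2.0 * x1 + 1.5 * x2

def max_min_allocation(x_min=0.3, x_max=0.9, steps=61):
    """Maximise lambda = min(memberships) over a grid of removal levels.
    Returns (lambda, x1, x2)."""
    levels = [x_min + (x_max - x_min) * i / (steps - 1) for i in range(steps)]
    best = None
    for x1, x2 in product(levels, repeat=2):
        memberships = (
            linear_membership(dissolved_oxygen(x1, x2), worst=4.0, best=6.0),  # PCA goal
            linear_membership(x1, worst=x_max, best=x_min),  # discharger 1 prefers less removal
            linear_membership(x2, worst=x_max, best=x_min),  # discharger 2 likewise
        )
        lam = min(memberships)  # satisfaction of the least-satisfied party
        if best is None or lam > best[0]:
            best = (lam, x1, x2)
    return best
```

With these made-up numbers the continuous optimum is about lambda = 0.524, where all three memberships are equal; the grid result approaches it from below. Ties at the optimum are one reason max-min formulations can admit multiple solutions.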
Abstract:
The present study deals with the application of cluster analysis, Fuzzy Cluster Analysis (FCA) and Kohonen Artificial Neural Networks (KANN) methods for classification of 159 meteorological stations in India into meteorologically homogeneous groups. Eight parameters, namely latitude, longitude, elevation, average temperature, humidity, wind speed, sunshine hours and solar radiation, are considered as the classification criteria for grouping. The optimal number of groups is determined as 14 based on the Davies-Bouldin index approach. It is observed that the FCA approach performed better than the other two methodologies for the present study.
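The Davies-Bouldin index used to choose the number of groups can be computed directly from its definition. With S_i the mean distance of cluster i's members to its centroid and d(c_i, c_j) the distance between centroids, DB = (1/K) Σ_i max_{j≠i} (S_i + S_j) / d(c_i, c_j), and lower values indicate more compact, better-separated groups. The sketch assumes Euclidean distance and at least two clusters; the study does not say how its eight parameters were scaled, so standardizing features first would be an assumption.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better) for a labelled point set."""
    clusters = sorted(set(labels))
    dim = len(points[0])
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        centroids[c] = centroid
        # S_c: mean Euclidean distance of members to their centroid
        scatter[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity R_ij of cluster i to any other cluster j
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)
```

To pick the number of groups, one would cluster the stations for each candidate K and keep the K with the smallest index.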
Abstract:
A health-monitoring and life-estimation strategy for composite rotor blades is developed in this work. The cross-sectional stiffness reduction obtained by physics-based models is expressed as a function of the life of the structure using a recent phenomenological damage model. This stiffness reduction is further used to study the behavior of measurable system parameters, such as blade deflections, loads, and strains, of a composite rotor blade in static analysis and forward flight. The simulated measurements are obtained using an aeroelastic analysis of the composite rotor blade, based on finite elements in space and time, with physics-based damage modes that are then linked to the life consumption of the blade. The model-based measurements are contaminated with noise to simulate real data. Genetic fuzzy systems are developed for global online prediction of physical damage and life consumption using displacement- and force-based measurement deviations between damaged and undamaged conditions. Furthermore, local online prediction of physical damage and life consumption is done using strains measured along the blade length. It is observed that life consumption in the matrix-cracking zone is about 12-15%, and in the debonding/delamination zone about 45-55%, of the total life of the blade. It is also observed that the success rate of the genetic fuzzy systems depends on the number of measurements, the type of measurements, and the training and testing noise levels. The genetic fuzzy systems work quite well with noisy data and are recommended for online structural health monitoring of composite helicopter rotor blades.
Abstract:
The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height-balanced trees to address this issue. Specifically, we make two major contributions: (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means), whose running time compares favorably with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of clustering quality while taking significantly less time.
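For reference, the cost that RACK targets is visible in standard Lloyd's k-means: every assignment pass scans all k centres for each point. The sketch below is that baseline, not RACK; the height-balanced-tree method is not reproduced, and initializing centres from randomly sampled data points is an assumption.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Standard Lloyd's k-means. Every assignment pass compares each of the
    n points with each of the k centres: O(n * k * d) work per iteration.
    Returns (centres, labels)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # k distinct data points
    labels = None
    for _ in range(iters):
        # Assignment step: the linear-in-k scan over all centres.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, so centres will not move again
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centre where it is
                centres[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centres, labels
```

The `min(range(k), ...)` line is the linear-in-k step; tree-based approaches aim to avoid comparing every point against every centre.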
Abstract:
The keyword-based search technique suffers from the problem of synonymic and polysemic queries. Current approaches address only the problem of synonymic queries, in which different queries might have the same information requirement. But the problem of polysemic queries, i.e., the same query having different intentions, still remains unaddressed. In this paper, we propose the notion of intent clusters, the members of which have the same intention. We develop a clustering algorithm that uses the user session information in query logs, in addition to query-URL entries, to identify clusters of queries having the same intention. The proposed approach has been studied through case examples from actual AOL log data, and the clustering algorithm is shown to be successful in discerning user intentions.
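The use of session context to separate the intents of a polysemic query can be sketched by clustering individual query occurrences rather than query strings. In this hypothetical sketch, each occurrence carries the set of queries issued in its session and the URLs clicked after it; similarity blends the Jaccard overlap of the two, and occurrences at or above a threshold are merged with union-find. The blend weight `alpha`, the threshold, and the single-linkage grouping are assumptions, not the paper's algorithm.

```python
from typing import NamedTuple

class Occurrence(NamedTuple):
    query: str
    session_queries: frozenset  # all queries issued in the same session
    clicked_urls: frozenset     # URLs clicked after this query

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(x, y, alpha=0.5):
    """Blend click evidence with session-context evidence."""
    return (alpha * jaccard(x.clicked_urls, y.clicked_urls)
            + (1 - alpha) * jaccard(x.session_queries, y.session_queries))

def intent_clusters(occurrences, threshold=0.3, alpha=0.5):
    """Group occurrence indices by union-find over sufficiently similar pairs."""
    parent = list(range(len(occurrences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(occurrences)):
        for j in range(i + 1, len(occurrences)):
            if similarity(occurrences[i], occurrences[j], alpha) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(occurrences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With two sessions, one pairing "jaguar" with "car prices" and one pairing it with "big cats", the two "jaguar" occurrences land in different clusters, because their session contexts and clicked URLs barely overlap.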