873 results for constrained clustering
Abstract:
The paper deals with a model-theoretic approach to clustering. The approach can be used to generate cluster descriptions based on knowledge alone. Such a process of generating descriptions would be extremely useful in clustering partially specified objects. A natural byproduct of the proposed approach is that missing values of attributes of an object can be estimated with ease in a meaningful fashion. An important feature of the approach is that noisy objects can be detected effectively, leading to the formation of natural groups. The proposed algorithm is applied to a library database consisting of a collection of books.
Abstract:
Relative geometric arrangements of the sample points, with reference to the structure of the embedding space, produce clusters. Hence, if each sample point is imagined to acquire the volume of a small M-cube (called a pattern-cell), whose size depends on the ranges of its M features and the number N of samples, then overlapping pattern-cells indicate naturally closer sample points. A chain or blob of such overlapping cells would mean a cluster, and separate clusters would not share a common pattern-cell between them. The conditions and an analytic method to find such an overlap are developed. A simple, intuitive, nonparametric clustering procedure, based on such overlapping pattern-cells, is presented. It may be classified as an agglomerative, hierarchical, linkage-type clustering procedure. The algorithm is fast, requires low storage and can identify irregular clusters. Two extensions of the algorithm, to separate overlapping clusters and to estimate the nature of pattern distributions in the sample space, are also indicated.
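The overlapping-cell idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the cell-size rule, so the side length used here (each feature's range divided by N^(1/M), times a hypothetical `scale` factor) is an assumption, as is the function name `pattern_cell_clusters`. Chains of overlapping cells are merged with a simple union-find.

```python
import numpy as np

def pattern_cell_clusters(points, scale=1.0):
    """Label points by chains of overlapping axis-aligned cells."""
    x = np.asarray(points, dtype=float)
    n, m = x.shape
    # Assumed cell side per feature: the feature range shrunk by N^(1/M).
    ranges = x.max(axis=0) - x.min(axis=0)
    side = scale * ranges / n ** (1.0 / m)

    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for k in range(i + 1, n):
            # Two cells centred on the points overlap when the centres
            # are closer than one cell side along every feature.
            if np.all(np.abs(x[i] - x[k]) < side):
                parent[find(i)] = find(k)

    # Map each root to a compact cluster label in order of first appearance.
    roots = [find(i) for i in range(n)]
    relabel = {}
    return [relabel.setdefault(r, len(relabel)) for r in roots]
```

Calling `pattern_cell_clusters([[0, 0], [0.1, 0.1], [10, 10], [10.1, 10.1]])` groups the first two points together and the last two together. The pairwise loop is quadratic in N and is kept only for clarity; it does not reflect the speed or storage claims made in the abstract.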
Abstract:
Clustering is a process of partitioning a given set of patterns into meaningful groups. The clustering process can be viewed as consisting of the following three phases: (i) feature selection phase, (ii) classification phase, and (iii) description generation phase. Conventional clustering algorithms implicitly use knowledge about the clustering environment to a large extent in the feature selection phase. This reduces the need for environmental knowledge in the remaining two phases, permitting the use of a simple numerical measure of similarity in the classification phase. Conceptual clustering algorithms proposed by Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] and Stepp and Michalski [Artif. Intell., pp. 43–69 (1986)] make use of knowledge about the clustering environment, in the form of a set of predefined concepts, to compute the conceptual cohesiveness during the classification phase. Michalski and Stepp [IEEE Trans. PAMI, PAMI-5, 396–410 (1983)] have argued that the results obtained with the conceptual clustering algorithms are superior to conventional methods of numerical classification. However, this claim was not supported by the experimental results obtained by Dale [IEEE Trans. PAMI, PAMI-7, 241–244 (1985)]. In this paper a theoretical framework, based on an intuitively appealing set of axioms, is developed to characterize the equivalence between conceptual clustering and conventional clustering. In other words, it is shown that any classification obtained using conceptual clustering can also be obtained using conventional clustering and vice versa.
Abstract:
The ability of DNA sequences to adopt unusual structures under superhelical torsional stress has been studied. Sequences that are forced to adopt unusual conformations in topologically constrained pBR322 form V DNA (Lk=0) were mapped using restriction enzymes as probes. Restriction enzymes such as BamHI, PstI, AvaI and HindIII could not cleave their recognition sequences. The removal of the topological constraint relieved this inhibition. The influence of neighbouring sequences on the ability of a given sequence to adopt an unusual DNA structure, presumably the left-handed Z conformation, was studied through single hit analysis. Using multiple cut restriction enzymes such as NarI and FspI, it could be shown that under identical topological strain, the extent of structural alteration is greatly influenced by the neighbouring sequences. In the light of the variety of sequences and locations that could be mapped to adopt non-B conformation in pBR322 form V DNA, restriction enzymes appear as potential structural probes for natural DNA sequences.
Abstract:
The design of folded structures in peptides containing the higher homologues of alpha-amino acid residues requires the restriction of the range of local conformational choices. In alpha-amino acids, stereochemically constrained residues like the alpha,alpha-dialkylated residue aminoisobutyric acid (Aib) and D-Proline ((D)Pro) have proved extremely useful in the design of helices and hairpins in short peptides. Extending this approach, backbone substitution and cyclization are anticipated to be useful in generating conformationally constrained beta- and gamma-residues. This brief review provides a survey of work on hybrid peptide sequences concerning the conformationally constrained gamma-amino acid residue 1-aminomethyl cyclohexane acetic acid, gabapentin (Gpn). This achiral, beta,beta-disubstituted gamma-residue strongly favors gauche-gauche conformations about the C-alpha-C-beta (theta(2)) and C-beta-C-gamma (theta(1)) bonds, facilitating local folding. The Gpn residue can adopt both C-7 (NH(i) -> CO(i)) and C-9 (CO(i-1) <- NH(i+1)) hydrogen bonds, which are analogous to the C-5 and C-7 (gamma-turn) conformations at alpha-residues. In conjunction with adjacent residues, Gpn may be used in alpha gamma and gamma alpha segments to generate C-12 hydrogen bonded conformations, which may be considered as expanded analogs of conventional beta-turns. The structural characterization of C-12 helices, C-12/C-10 helices with mixed hydrogen bond directionalities, and beta-hairpins incorporating Gpn residues at the turn segment is illustrated. (C) 2010 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 94: 733-741, 2010.
Abstract:
We develop four algorithms for simulation-based optimization under multiple inequality constraints. Both the cost and the constraint functions are considered to be long-run averages of certain state-dependent single-stage functions. We pose the problem in the simulation optimization framework by using the Lagrange multiplier method. Two of our algorithms estimate only the gradient of the Lagrangian, while the other two estimate both the gradient and the Hessian of it. In the process, we also develop various new estimators for the gradient and Hessian. All our algorithms use two simulations each. Two of these algorithms are based on the smoothed functional (SF) technique, while the other two are based on the simultaneous perturbation stochastic approximation (SPSA) method. We prove the convergence of our algorithms and show numerical experiments on a setting involving an open Jackson network. The Newton-based SF algorithm is seen to show the best overall performance.
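As a rough illustration of how a gradient of the Lagrangian can be estimated from two simulations, the sketch below uses the standard two-sided SPSA estimator. It is not taken from the paper: the function name `spsa_lagrangian_gradient`, the perturbation size `delta`, the constraint convention g(theta) <= 0, and the use of plain Python callables in place of long-run simulation averages are all assumptions made for this example.

```python
import numpy as np

def spsa_lagrangian_gradient(cost, constraints, theta, lam, delta=0.1, rng=None):
    """Two-sided SPSA estimate of the gradient of
    L(theta, lam) = cost(theta) + lam . constraints(theta)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    lam = np.asarray(lam, dtype=float)

    # Random +/-1 perturbation applied to all coordinates at once.
    perturb = rng.choice([-1.0, 1.0], size=theta.shape)

    def lagrangian(t):
        # Stand-in for one simulation run that returns the cost and
        # constraint values at parameter t (assumed form g(t) <= 0).
        return cost(t) + lam @ np.asarray(constraints(t), dtype=float)

    # Exactly two evaluations, regardless of the dimension of theta.
    l_plus = lagrangian(theta + delta * perturb)
    l_minus = lagrangian(theta - delta * perturb)
    return (l_plus - l_minus) / (2.0 * delta * perturb)
```

In a full scheme this estimate would drive a descent step in `theta`, while the multipliers `lam` are updated by ascent on the constraint values and projected to stay non-negative. Step sizes and timescales are left out of this sketch.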
Abstract:
We develop in this article the first actor-critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagrange multiplier method to handle the inequality constraints. Our algorithm makes use of multi-timescale stochastic approximation and incorporates a temporal difference (TD) critic and an actor that makes a gradient search in the space of policy parameters using efficient simultaneous perturbation stochastic approximation (SPSA) gradient estimates. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal policy. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
This paper studies the problem of constructing robust classifiers when the training is plagued with uncertainty. The problem is posed as a Chance-Constrained Program (CCP) which ensures that the uncertain data points are classified correctly with high probability. Unfortunately such a CCP turns out to be intractable. The key novelty is in employing Bernstein bounding schemes to relax the CCP as a convex second order cone program whose solution is guaranteed to satisfy the probabilistic constraint. Prior to this work, only the Chebyshev based relaxations were exploited in learning algorithms. Bernstein bounds employ richer partial information and hence can be far less conservative than Chebyshev bounds. Due to this efficient modeling of uncertainty, the resulting classifiers achieve higher classification margins and hence better generalization. Methodologies for classifying uncertain test data points and error measures for evaluating classifiers robust to uncertain data are discussed. Experimental results on synthetic and real-world datasets show that the proposed classifiers are better equipped to handle data uncertainty and outperform state-of-the-art in many cases.
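For orientation, the Chebyshev-based relaxation that the abstract cites as prior work can be written out as follows. The notation is assumed for this sketch and is not taken from the paper: a training point x_i with label y_i, known mean \mu_i and covariance \Sigma_i, slack \xi_i, and tolerated violation probability \epsilon. The Bernstein-based relaxation proposed in the paper is not reproduced here.

```latex
% Chance constraint on an uncertain training point:
\Pr\!\left[\, y_i\,(w^\top x_i + b) \ge 1 - \xi_i \,\right] \ge 1 - \epsilon

% A sufficient second-order cone constraint, obtained from the
% one-sided (Cantelli) form of the Chebyshev inequality using only
% the mean and covariance of x_i:
y_i\,(w^\top \mu_i + b) \;\ge\; 1 - \xi_i + \kappa\,\bigl\lVert \Sigma_i^{1/2} w \bigr\rVert_2,
\qquad \kappa = \sqrt{\frac{1-\epsilon}{\epsilon}}
```

Any (w, b, \xi_i) satisfying the second inequality also satisfies the chance constraint, so the relaxation is safe but can be conservative. A smaller multiplier on the norm term means a less conservative constraint, which is the sense in which the abstract compares Bernstein bounds with Chebyshev bounds.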
Abstract:
Here we rederive the hierarchy of equations for the evolution of distribution functions of various orders using a convenient parameterization. We use this to obtain equations for two- and three-point correlation functions in powers of a small parameter, viz., the initial density contrast. The correspondence of the lowest order solutions of these equations to the results from the linear theory of density perturbations is shown for an Ω = 1 universe. These equations are then used to calculate, to the lowest order, the induced three-point correlation function that arises from Gaussian initial conditions in an Ω = 1 universe. We obtain an expression which explicitly exhibits the spatial structure of the induced three-point correlation function. It is seen that the spatial structure of this quantity is independent of the value of Ω. We also calculate the triplet momentum. We find that the induced three-point correlation function does not have the "hierarchical" form often assumed. We discuss possibilities of using the induced three-point correlation to interpret observational data. The formalism developed here can also be used to test the validity of different schemes to close the
Abstract:
Optimizing a shell and tube heat exchanger for a given duty is an important and relatively difficult task. There is a need for a simple, general and reliable method for realizing this task. The authors present here one such method for optimizing single phase shell-and-tube heat exchangers with given geometric and thermohydraulic constraints. They discuss the problem in detail. Then they introduce a basic algorithm for optimizing the exchanger. This algorithm is based on data from an earlier study of a large collection of feasible designs generated for different process specifications. The algorithm ensures a near-optimal design satisfying the given heat duty and geometric constraints. The authors also provide several sub-algorithms to satisfy imposed velocity limitations. They illustrate how useful these sub-algorithms are with several examples where the exchanger weight is minimized.
Abstract:
In the knowledge-based clustering approaches reported in the literature, explicit knowledge, typically in the form of a set of concepts, is used in computing similarity or conceptual cohesiveness between objects and in grouping them. We propose a knowledge-based clustering approach in which the domain knowledge is also used in the pattern representation phase of clustering. We argue that such a knowledge-based pattern representation scheme reduces the complexity of the similarity computation and grouping phases. We present a knowledge-based clustering algorithm for grouping books in a library.
Abstract:
We use the BBGKY hierarchy equations to calculate, perturbatively, the lowest order nonlinear correction to the two-point correlation and the pair velocity for Gaussian initial conditions in a critical density matter-dominated cosmological model. We compare our results with the results obtained using the hydrodynamic equations that neglect pressure and find that the two match, indicating that there are no effects of multistreaming at this order of perturbation. We analytically study the effect of small scales on the large scales by calculating the nonlinear correction for a Dirac delta function initial two-point correlation. We find that the induced two-point correlation has an x^(-6) behavior at large separations. We have considered a class of initial conditions where the initial power spectrum at small k has the form k^n with 0 < n ≤ 3 and have numerically calculated the nonlinear correction to the two-point correlation, its average over a sphere and the pair velocity over a large dynamical range. We find that at small separations the effect of the nonlinear term is to enhance the clustering, whereas at intermediate scales it can act to either increase or decrease the clustering. At large scales we find a simple formula that gives a very good fit for the nonlinear correction in terms of the initial function. This formula explicitly exhibits the influence of small scales on large scales, and because of this coupling the perturbative treatment breaks down at large scales much before one would expect it to if the nonlinearity were local in real space. We physically interpret this formula in terms of a simple diffusion process. We have also investigated the case n = 0, and we find that it differs from the other cases in certain respects.
We investigate a recently proposed scaling property of gravitational clustering, and we find that the lowest order nonlinear terms cause deviations from the scaling relations that are strictly valid in the linear regime. The approximate validity of these relations in the nonlinear regime in N-body simulations cannot be understood at this order of evolution.
Abstract:
A numerical study of the ductile rupture in a metal foil constrained between two stiff ceramic blocks is performed. The finite element analysis is carried out under the conditions of mode I, plane strain, small-scale yielding. The rate-independent version of the Gurson model that accounts for the ductile failure mechanisms of microvoid nucleation, growth and coalescence is employed to represent the behavior of the metal foil. Different distributions of void nucleating sites in the metal foil are considered for triggering the initiation of discrete voids. The results clearly show that far-field triaxiality-induced cavitation is the dominant failure mode when the spacing of the void nucleating sites is large. On the contrary, void coalescence near the notch tip is found to be the operative failure mechanism when closely spaced void nucleating sites are considered.
Abstract:
In this article, we present a novel application of a quantum clustering (QC) technique to objectively cluster the conformations, sampled by molecular dynamics simulations performed on different ligand bound structures of the protein. We further portray each conformational population in terms of dynamically stable network parameters which beautifully capture the ligand induced variations in the ensemble in atomistic detail. The conformational populations thus identified by the QC method and verified by network parameters are evaluated for different ligand bound states of the protein pyrrolysyl-tRNA synthetase (DhPylRS) from D. hafniense. The ligand/environment induced re-distribution of protein conformational ensembles forms the basis for understanding several important biological phenomena such as allostery and enzyme catalysis. The atomistic level characterization of each population in the conformational ensemble in terms of the re-orchestrated networks of amino acids is a challenging problem, especially when the changes are minimal at the backbone level. Here we demonstrate that the QC method is sensitive to such subtle changes and is able to cluster MD snapshots which are similar at the side-chain interaction level. Although we have applied these methods on simulation trajectories of a modest time scale (20 ns each), we emphasize that our methodology provides a general approach towards an objective clustering of large-scale MD simulation data and may be applied to probe multistate equilibria at higher time scales, and to problems related to protein folding for any protein or protein-protein/RNA/DNA complex of interest with a known structure.