849 results for constrained clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

The literature offers several ways to cluster categorical data and, setting aside time and budget constraints, the choice among them depends strongly on the researcher's aim. Clustering approaches are usually divided into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and must be estimated; the latter evaluate distances among objects using a defined dissimilarity measure and, on that basis, allocate units to the closest group. In clustering, one may be interested in classifying similar objects into groups, or in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how well do clustering methods designed to fulfil one of these aims perform in terms of the other? To answer these questions, two approaches, namely a latent class model (a mixture of multinomial distributions) and partitioning around medoids, are evaluated and compared using the Adjusted Rand Index, the Average Silhouette Width and the Pearson-Gamma index in a fairly wide simulation study. Simulation outcomes are plotted in two-dimensional graphs via Multidimensional Scaling; the size of each point is proportional to the number of overlapping points, and different colours indicate cluster membership.
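
As an illustration of one of the comparison criteria named above, the Adjusted Rand Index can be computed from the contingency counts of two partitions. The following is a minimal sketch using only the Python standard library; the function name and the handling of degenerate inputs are assumptions made for this example, not details taken from the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # assumption: fewer than two objects counts as perfect agreement

    # Contingency counts: objects in cluster i of A and cluster j of B.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Numbers of object pairs placed together by both, by A, and by B.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)
```

The index equals 1 when the two partitions coincide up to a relabelling of the clusters, and is close to 0 (possibly negative) when the agreement is no better than chance.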

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Data mining aims at the automatic extraction of meaningful patterns from large amounts of data. One example of such patterns is a meaningful grouping of the data, in which case we speak of clustering. Traditional clustering algorithms show severe limitations on high-dimensional datasets, that is, datasets composed of objects described by a large number of attributes. Such datasets therefore call for a different analysis methodology: subspace clustering. Subspace clustering consists of traversing the lattice of all possible subspaces in search of meaningful groups (clusters). A search of this kind is computationally very expensive. Several optimizations have been proposed to make subspace clustering algorithms more efficient. This thesis tackles the problem from a different angle: the use of parallelization to reduce the computational cost of a subspace clustering algorithm.
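
The idea of evaluating the nodes of the subspace lattice in parallel can be sketched as follows. This is only an illustrative toy, not the algorithm of the thesis: the grid-based density score, the function names and all parameters are assumptions made for the example. A thread pool is used for portability; for CPU-bound scoring in CPython a process pool would normally be the choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def dense_cells(points, dims, bins=4, min_points=3):
    """Count grid cells of the subspace `dims` holding at least `min_points` points.

    Coordinates are assumed to lie in [0, 1).
    """
    cells = Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )
    return sum(1 for count in cells.values() if count >= min_points)


def scan_subspaces(points, max_dim=2, workers=4):
    """Score every subspace with up to `max_dim` attributes, one task per subspace."""
    n_attrs = len(points[0])
    subspaces = [
        dims
        for k in range(1, max_dim + 1)
        for dims in combinations(range(n_attrs), k)
    ]
    # Each subspace is independent of the others, so the scoring can be farmed out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda dims: dense_cells(points, dims), subspaces))
    return dict(zip(subspaces, scores))
```

Subspaces with a non-zero score are candidates for containing clusters; a real algorithm would also prune the lattice instead of enumerating it exhaustively.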

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis studies the clustering of galaxy clusters and the determination of the position of the BAO peak in order to obtain constraints on cosmological parameters. To this end, a code was implemented to estimate the error using the jackknife and bootstrap methods. Thanks to the very small estimated error, the measurement of the BAO peak, compared with cosmological models, was found to be in agreement with the LambdaCDM model and makes it possible to constrain some parameters of the cosmological models.
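
The two resampling error estimates mentioned above can be sketched in a few lines. This is a generic illustration on a list of scalar measurements, not the code of the thesis; the function names, the default estimator and the number of resamples are assumptions made for the example.

```python
import random
from statistics import mean, stdev


def jackknife_error(data, estimator=mean):
    """Jackknife standard error: recompute the estimator leaving one datum out."""
    n = len(data)
    replicates = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(replicates)
    return ((n - 1) / n * sum((r - center) ** 2 for r in replicates)) ** 0.5


def bootstrap_error(data, estimator=mean, n_resamples=1000, seed=0):
    """Bootstrap standard error: spread of the estimator over resamples drawn
    with replacement from the data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return stdev(replicates)
```

For the sample mean, the jackknife estimate coincides with the familiar standard error (sample standard deviation divided by the square root of the sample size), which gives a quick sanity check of the implementation.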

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Power electronic converters are extensively adopted to address timely issues such as power quality improvement in industrial plants, energy management in hybrid electrical systems, and control of electrical generators for renewables. Besides nonlinearity, these systems are typically characterized by hard constraints on the control inputs and, sometimes, on the state variables. In this respect, control laws able to handle input saturation are crucial to formally characterize the systems' stability and performance properties. From a practical viewpoint, proper saturation management makes it possible to extend the systems' transient and steady-state operating ranges, improving their reliability and availability. The main topic of this thesis concerns saturated control methodologies, based on modern approaches, applied to power electronics and electromechanical systems. The objective pursued is to provide formal results under any saturation scenario, overcoming the drawbacks of the classic solutions commonly applied to cope with saturation of power converters, and enhancing performance. For this purpose, two main approaches are exploited and extended to deal with power electronic applications: modern anti-windup strategies, which provide formal results and systematic design rules for the anti-windup compensator devoted to handling control saturation, and “one step” saturated feedback design techniques, which rely on a suitable characterization of the saturation nonlinearity and on less conservative extensions of standard absolute stability theory results. The first part of the thesis presents and develops a novel general anti-windup scheme, which is then specifically applied to a class of power converters adopted for power quality enhancement in industrial plants.
In the second part, a polytopic differential inclusion representation of the saturation nonlinearity is presented and extended to deal with a class of multiple-input power converters used to manage hybrid electrical energy sources. The third part concerns the design of adaptive observers for robust estimation of the parameters required for high-performance control of power systems.
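
To make the windup phenomenon concrete, the sketch below simulates a PI loop on a toy first-order plant with a saturated input, with and without the textbook back-calculation correction. It is not the novel scheme developed in the thesis: the plant, the gains and the function name are all assumptions chosen for illustration.

```python
def simulate_pi(setpoint, kp=2.0, ki=4.0, u_max=1.0, kaw=0.0, dt=0.01, steps=1000):
    """Simulate a PI loop on the plant x' = -x + sat(u) with forward Euler.

    kaw is the back-calculation anti-windup gain (0 disables the correction).
    Returns the trajectory of x and the largest |integrator state| observed.
    """
    x, integ, peak = 0.0, 0.0, 0.0
    xs = []
    for _ in range(steps):
        err = setpoint - x
        u = kp * err + integ                    # unconstrained PI command
        u_sat = max(-u_max, min(u_max, u))      # actuator saturation
        # Back-calculation: bleed the integrator by the saturation excess.
        integ += dt * (ki * err + kaw * (u_sat - u))
        x += dt * (-x + u_sat)
        peak = max(peak, abs(integ))
        xs.append(x)
    return xs, peak
```

Running the function once with kaw=0 and once with a positive kaw for the same setpoint shows the integrator state growing far beyond its steady-state value while the actuator is saturated in the first case, and staying bounded in the second.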

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This dissertation studies the geometric static problem of under-constrained cable-driven parallel robots (CDPRs) supported by n cables, with n ≤ 6. The task consists of determining the overall robot configuration when a set of n variables is assigned. When variables relating to the platform posture are assigned, an inverse geometric static problem (IGP) must be solved; whereas, when cable lengths are given, a direct geometric static problem (DGP) must be considered. Both problems are challenging, as the robot continues to preserve some degrees of freedom even after n variables are assigned, with the final configuration determined by the applied forces. Hence, kinematics and statics are coupled and must be resolved simultaneously. In this dissertation, a general methodology is presented for modelling the aforementioned scenario with a set of algebraic equations. An elimination procedure is provided, aimed at solving the governing equations analytically and obtaining a least-degree univariate polynomial in the corresponding ideal for any value of n. Although an analytical procedure based on elimination is important from a mathematical point of view, providing an upper bound on the number of solutions in the complex field, it is not practical to compute these solutions as it would be very time-consuming. Thus, for the efficient computation of the solution set, a numerical procedure based on homotopy continuation is implemented. A continuation algorithm is also applied to find a set of robot parameters with the maximum number of real assembly modes for a given DGP. Finally, the end-effector pose depends on the applied load and may change due to external disturbances. An investigation into equilibrium stability is therefore performed.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The purpose of clustering is thus to identify meaningful structures in data, and it is from this very definition that this thesis work began, providing an innovative and unexplored approach to clustering: not searching for the relationship, but reasoning about what a relationship is not. Looking at a set of data, what does non-relationship represent? It is a difficult question to ask, and it intrinsically carries its own answer: the independence of each individual datum from all the others. The search for independence among the data thus led our thinking to the statistical approach to data, since independence is well described and proven in statistics. For a point in a dataset to be considered “free of links/relationships” means that it has the same probability of being present in every spatial element of the entire dataset. Mathematically speaking, every point P in a space S has the same probability of falling in a region R, which means that the point can RANDOMLY lie within any region of the dataset. The thesis work starts from this assumption and is divided into several parts. The second chapter analyzes the state of the art of clustering, set against the growing problem of data volume: with the spread of the Internet, knowledge bases have grown exponentially both in number of attributes (dimensions) and in amount of data (Big Data). The third chapter recalls the theoretical and statistical concepts used by the statistical algorithms implemented. The fourth chapter gives the details of the implementation of the algorithms, describing the various phases of investigation, the reasons behind the architectural choices, and the considerations that led to the exclusion of one of the three implemented versions.
In the fifth chapter, algorithms 2 and 3 are compared with some algorithms from the literature to demonstrate the potential and the problems of the developed algorithm; these tests are qualitative, since the goal of the thesis is to show how a statistical approach can prove to be a winning strategy, not to provide a new algorithm usable across the various clustering problems. The sixth chapter draws conclusions on the work carried out and lists possible future developments from which the newly started research on statistical clustering could grow.
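
The "no relationship" hypothesis described above, in which every point is equally likely to fall in any region, can be checked with a classical quadrat-count chi-square statistic. The sketch below is a generic illustration of that idea and not one of the algorithms implemented in the thesis; the unit-square domain, the equal-area grid and the function name are assumptions made for the example.

```python
def uniformity_chi_square(points, grid=4):
    """Chi-square statistic of 2-D points against spatial uniformity.

    The unit square [0, 1) x [0, 1) is split into grid x grid equal cells.
    Under the independence hypothesis each cell expects len(points) / grid**2
    points; large values of the statistic indicate structure (clustering).
    """
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row][col] += 1
    expected = len(points) / (grid * grid)
    return sum((c - expected) ** 2 / expected for line in counts for c in line)
```

The statistic would be compared with a chi-square distribution with grid**2 - 1 degrees of freedom; it is 0 when every cell holds exactly the expected number of points and grows as the points concentrate in few cells.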

Relevance:

20.00% 20.00%

Publisher:

Abstract:

When designing metaheuristic optimization methods, there is a trade-off between application range and effectiveness. For large real-world instances of combinatorial optimization problems, out-of-the-box metaheuristics often fail, and optimization methods need to be adapted to the problem at hand. Knowledge about the structure of high-quality solutions can be exploited by introducing a so-called bias into one of the components of the metaheuristic used. These problem-specific adaptations make it possible to increase search performance. This thesis analyzes the characteristics of high-quality solutions for three constrained spanning tree problems: the optimal communication spanning tree problem, the quadratic minimum spanning tree problem and the bounded diameter minimum spanning tree problem. Several relevant tree properties that should be explored when analyzing a constrained spanning tree problem are identified. Based on the insights gained into the structure of high-quality solutions, efficient and robust solution approaches are designed for each of the three problems. Experimental studies analyze the performance of the developed approaches compared to the current state of the art.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis deals principally with the development of different chemical protocols, ranging from environmentally sustainable peptide synthesis, to the asymmetric synthesis of modified tryptophans, to a series of straightforward procedures for constraining peptide backbones without the need for a pre-formed scaffold. Much effort has been dedicated to structural analysis in a biomimetic environment, which is fundamental for predicting the in vivo conformation of compounds as well as for giving a rationale to the experimentally determined bioactivity. The conformational analysis in solution was carried out mostly by NMR (2D gCosy, Roesy, VT, titration experiments, molecular dynamics, etc.), FT-IR and ECD spectroscopy. As a practical application, 3D rigid scaffolds have been employed for the synthesis of biologically active compounds based on peptidomimetic and retro-mimetic structures. These mimics have been investigated for their potential as anti-inflammatory agents, and the results obtained are very promising. Moreover, the synthesis of the Amo ring permitted the development of an alternative, highly effective synthetic pathway for obtaining the antibiotic Linezolid. The final section is instead dedicated to the construction of a new biosensor based on zeolite L SAMs functionalized with the integrin ligand c[RGDfK], which has shown high efficiency for the selective detection of tumor cells. Such a sensor could, in fact, enable the convenient, non-invasive detection and diagnosis of cancer in its early stages from a few drops of a patient's blood or other biological fluids. In conclusion, the research described herein demonstrates that the peptidomimetic approach to 3D-defined structures allows unambiguous investigation of structure-activity relationships, giving access to a wide range of bioactive compounds of pharmaceutical interest, to be used not only as potential drugs but also for diagnostic and theranostic applications.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis analyzes an optimization problem posed by retail stores that need to select and arrange their items in the shop. The problem arises from the need to maximize the overall expected profit of the products on display by finding a location on the shelves for each of them. The products are divided into departments, from each of which only one item must be selected and displayed. In addition, constraints on the location and compatibility of products can be expressed. The resulting problem is a generalization of the well-known Multiple-Choice Knapsack Problem and Multiple Knapsack Problem. An exhaustive literature search showed that this problem has not yet been studied. The problem was therefore formalized as an integer linear programming model. An exact algorithm for solving the problem, based on column generation and branch and price, is proposed. Four different models were formulated for the pricing problem underlying the column generation, in order to identify the most efficient one. Three of the four proposed models have comparable performance, while the last proved less efficient. The results show that the proposed solution method is suitable for small-to-medium instances.
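
For readers unfamiliar with the base problem, the plain Multiple-Choice Knapsack Problem (exactly one item per class, a single capacity) can be solved on small instances by dynamic programming. The sketch below covers only that textbook special case; it is neither the generalized shelf-allocation problem nor the column generation and branch-and-price method of the thesis. The function name and the data layout are assumptions made for the example.

```python
def multiple_choice_knapsack(classes, capacity):
    """Best profit when exactly one item is picked from every class.

    classes: list of classes, each a list of (weight, profit) pairs with
    non-negative integer weights. Returns None if no selection fits.
    """
    NEG = float("-inf")
    best = [NEG] * (capacity + 1)
    best[0] = 0  # best[w]: best profit with total weight exactly w so far
    for items in classes:
        nxt = [NEG] * (capacity + 1)
        for w, value in enumerate(best):
            if value == NEG:
                continue  # weight w is not reachable with the classes seen so far
            for weight, profit in items:
                if w + weight <= capacity:
                    nxt[w + weight] = max(nxt[w + weight], value + profit)
        best = nxt
    answer = max(best)
    return None if answer == NEG else answer
```

The running time is proportional to the capacity times the total number of items, which is practical only when the capacity is a moderate integer.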

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with the surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, and several approaches to data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped into two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but also clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes.
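
One of the many preprocessing and clustering combinations mentioned above can be sketched as follows: z-score normalization of each marker followed by agglomerative clustering. The choice of single linkage with Euclidean distance, the function names and the toy data layout are assumptions made for this example; the abstract does not say which of the eight algorithms and three metrics were finally selected.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Scale every column (marker) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are mapped to 0
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def single_linkage(rows, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters,
    where the distance between clusters is that of their closest members."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    euclidean(rows[i], rows[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

Without the normalization step, a marker measured on a much larger scale would dominate the Euclidean distance, which is one reason preprocessing choices are evaluated together with the clustering algorithm.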

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objectives: We assessed mortality associated with immunologic and virologic patterns of response at 6 months of highly active antiretroviral therapy (HAART) in HIV-infected individuals from resource-limited countries in Africa and South America. Methods: Patients who initiated HAART between 1996 and 2007, were aged 16 years or older, and had at least 1 measurement (HIV-1 RNA plasma viral load or CD4 cell count) at 6 months of therapy (3-9 month window) were included. Therapy response was categorized as complete, discordant (virologic only or immunologic only), or absent. Associations between 6-month response to therapy and all-cause mortality were assessed by Cox proportional hazards regression. Robust standard errors were calculated to account for intrasite correlation. Results: A total of 7160 patients, corresponding to 15,107 person-years, were analyzed. In multivariable analysis adjusted for age at HAART initiation, baseline clinical stage and CD4 cell count, year of HAART initiation, clinic, and occurrence of an AIDS-defining condition within the first 6 months of treatment, discordant and absent responses were associated with increased risk of death. Conclusions: Similar to reports from high-income countries, discordant immunologic and virologic responses were associated with intermediate risk of death compared with complete and no response in this large cohort of HIV-1 patients from resource-limited countries. Our results support a recommendation for wider availability of plasma viral load testing to monitor antiretroviral therapy in these settings.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In recent years, enamel matrix derivative (EMD) has garnered much interest in the dental field for its apparent bioactivity, which stimulates regeneration of periodontal tissues including periodontal ligament, cementum and alveolar bone. Despite its widespread use, the underlying cellular mechanisms remain unclear, and an understanding of its biological interactions could identify new strategies for tissue engineering. Previous in vitro research has demonstrated that EMD promotes premature osteoblast clustering at early time points. The aim of the present study was to evaluate the influence of cell clustering on the vital osteoblast cell-cell communication and adhesion molecules connexin 43 (cx43) and N-cadherin (N-cad), as assessed by immunofluorescence imaging, real-time PCR and Western blot analysis. In addition, differentiation markers of osteoblasts were quantified using alkaline phosphatase, osteocalcin and von Kossa staining. EMD significantly increased the expression of connexin 43 and N-cadherin at early time points ranging from 2 to 5 days. Protein expression was localized to cell membranes when compared to control groups. Alkaline phosphatase activity was also significantly increased on EMD-coated samples at 3, 5 and 7 days post seeding. Interestingly, higher activity was localized to cell cluster regions. There was a 3-fold increase in osteocalcin and bone sialoprotein mRNA levels for osteoblasts cultured on EMD-coated culture dishes. Moreover, EMD significantly increased extracellular mineral deposition in cell clusters, as assessed through von Kossa staining at 5, 7, 10 and 14 days post seeding. We conclude that EMD up-regulates the expression of vital osteoblast cell-cell communication and adhesion molecules, which enhances the differentiation and mineralization activity of osteoblasts. These findings provide further support for the clinical evidence that EMD increases the speed and quality of new bone formation in vivo.