54 resultados para model-based clustering

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objectives: A recently introduced pragmatic scheme promises to be a useful catalog of interneuron names.We sought to automatically classify digitally reconstructed interneuronal morphologies according tothis scheme. Simultaneously, we sought to discover possible subtypes of these types that might emergeduring automatic classification (clustering). We also investigated which morphometric properties weremost relevant for this classification.Materials and methods: A set of 118 digitally reconstructed interneuronal morphologies classified into thecommon basket (CB), horse-tail (HT), large basket (LB), and Martinotti (MA) interneuron types by 42 of theworld?s leading neuroscientists, quantified by five simple morphometric properties of the axon and fourof the dendrites. We labeled each neuron with the type most commonly assigned to it by the experts. Wethen removed this class information for each type separately, and applied semi-supervised clustering tothose cells (keeping the others? cluster membership fixed), to assess separation from other types and lookfor the formation of new groups (subtypes). We performed this same experiment unlabeling the cells oftwo types at a time, and of half the cells of a single type at a time. The clustering model is a finite mixtureof Gaussians which we adapted for the estimation of local (per-cluster) feature relevance. We performedthe described experiments on three different subsets of the data, formed according to how many expertsagreed on type membership: at least 18 experts (the full data set), at least 21 (73 neurons), and at least26 (47 neurons).Results: Interneurons with more reliable type labels were classified more accurately. We classified HTcells with 100% accuracy, MA cells with 73% accuracy, and CB and LB cells with 56% and 58% accuracy,respectively. We identified three subtypes of the MA type, one subtype of CB and LB types each, andno subtypes of HT (it was a single, homogeneous type). We got maximum (adapted) Silhouette widthand ARI values of 1, 0.83, 0.79, and 0.42, when unlabeling the HT, CB, LB, and MA types, respectively,confirming the quality of the formed cluster solutions. The subtypes identified when unlabeling a singletype also emerged when unlabeling two types at a time, confirming their validity. Axonal morphometricproperties were more relevant that dendritic ones, with the axonal polar histogram length in the [pi, 2pi) angle interval being particularly useful.Conclusions: The applied semi-supervised clustering method can accurately discriminate among CB, HT, LB, and MA interneuron types while discovering potential subtypes, and is therefore useful for neuronal classification. The discovery of potential subtypes suggests that some of these types are more heteroge-neous that previously thought. Finally, axonal variables seem to be more relevant than dendritic ones fordistinguishing among the CB, HT, LB, and MA interneuron types.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present two approaches to cluster dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and use them to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction of the word error rate of 15.17%, which helps to improve the performance of the understanding and the dialogue manager modules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present two approaches to cluster dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and use them to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction of the word error rate of 15.17%, which helps to improve the performance of the understanding and the dialogue manager modules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis deals with the problem of efficiently tracking 3D objects in sequences of images. We tackle the efficient 3D tracking problem by using direct image registration. This problem is posed as an iterative optimization procedure that minimizes a brightness error norm. We review the most popular iterative methods for image registration in the literature, turning our attention to those algorithms that use efficient optimization techniques. Two forms of efficient registration algorithms are investigated. The first type comprises the additive registration algorithms: these algorithms incrementally compute the motion parameters by linearly approximating the brightness error function. We centre our attention on Hager and Belhumeur’s factorization-based algorithm for image registration. We propose a fundamental requirement that factorization-based algorithms must satisfy to guarantee good convergence, and introduce a systematic procedure that automatically computes the factorization. Finally, we also bring out two warp functions to register rigid and nonrigid 3D targets that satisfy the requirement. The second type comprises the compositional registration algorithms, where the brightness function error is written by using function composition. We study the current approaches to compositional image alignment, and we emphasize the importance of the Inverse Compositional method, which is known to be the most efficient image registration algorithm. We introduce a new algorithm, the Efficient Forward Compositional image registration: this algorithm avoids the necessity of inverting the warping function, and provides a new interpretation of the working mechanisms of the inverse compositional alignment. By using this information, we propose two fundamental requirements that guarantee the convergence of compositional image registration methods. Finally, we support our claims by using extensive experimental testing with synthetic and real-world data. We propose a distinction between image registration and tracking when using efficient algorithms. We show that, depending whether the fundamental requirements are hold, some efficient algorithms are eligible for image registration but not for tracking.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Models are an effective tool for systems and software design. They allow software architects to abstract from the non-relevant details. Those qualities are also useful for the technical management of networks, systems and software, such as those that compose service oriented architectures. Models can provide a set of well-defined abstractions over the distributed heterogeneous service infrastructure that enable its automated management. We propose to use the managed system as a source of dynamically generated runtime models, and decompose management processes into a composition of model transformations. We have created an autonomic service deployment and configuration architecture that obtains, analyzes, and transforms system models to apply the required actions, while being oblivious to the low-level details. An instrumentation layer automatically builds these models and interprets the planned management actions to the system. We illustrate these concepts with a distributed service update operation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Runtime management of distributed information systems is a complex and costly activity. One of the main challenges that must be addressed is obtaining a complete and updated view of all the managed runtime resources. This article presents a monitoring architecture for heterogeneous and distributed information systems. It is composed of two elements: an information model and an agent infrastructure. The model negates the complexity and variability of these systems and enables the abstraction over non-relevant details. The infrastructure uses this information model to monitor and manage the modeled environment, performing and detecting changes in execution time. The agents infrastructure is further detailed and its components and the relationships between them are explained. Moreover, the proposal is validated through a set of agents that instrument the JEE Glassfish application server, paying special attention to support distributed configuration scenarios.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes new approaches to improve the local and global approximation (matching) and modeling capability of Takagi–Sugeno (T-S) fuzzy model. The main aim is obtaining high function approximation accuracy and fast convergence. The main problem encountered is that T-S identification method cannot be applied when the membership functions are overlapped by pairs. This restricts the application of the T-S method because this type of membership function has been widely used during the last 2 decades in the stability, controller design of fuzzy systems and is popular in industrial control applications. The approach developed here can be considered as a generalized version of T-S identification method with optimized performance in approximating nonlinear functions. We propose a noniterative method through weighting of parameters approach and an iterative algorithm by applying the extended Kalman filter, based on the same idea of parameters’ weighting. We show that the Kalman filter is an effective tool in the identification of T-S fuzzy model. A fuzzy controller based linear quadratic regulator is proposed in order to show the effectiveness of the estimation method developed here in control applications. An illustrative example of an inverted pendulum is chosen to evaluate the robustness and remarkable performance of the proposed method locally and globally in comparison with the original T-S model. Simulation results indicate the potential, simplicity, and generality of the algorithm. An illustrative example is chosen to evaluate the robustness. In this paper, we prove that these algorithms converge very fast, thereby making them very practical to use.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Open source is a software development paradigm that has seen a huge rise in recent years. It reduces IT costs and time to market, while increasing security and reliability. However, the difficulty in integrating developments from different communities and stakeholders prevents this model from reaching its full potential. This is mainly due to the challenge of determining and locating the correct dependencies for a given software artifact. To solve this problem we propose the development of an extensible software component repository based upon models. This repository should be capable of solving the dependencies between several components and work with already existing repositories to access the needed artifacts transparently. This repository will also be easily expandable, enabling the creation of modules that support new kinds of dependencies or other existing repository technologies. The proposed solution will work with OSGi components and use OSGi itself.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Industrial applications of computer vision sometimes require detection of atypical objects that occur as small groups of pixels in digital images. These objects are difficult to single out because they are small and randomly distributed. In this work we propose an image segmentation method using the novel Ant System-based Clustering Algorithm (ASCA). ASCA models the foraging behaviour of ants, which move through the data space searching for high data-density regions, and leave pheromone trails on their path. The pheromone map is used to identify the exact number of clusters, and assign the pixels to these clusters using the pheromone gradient. We applied ASCA to detection of microcalcifications in digital mammograms and compared its performance with state-of-the-art clustering algorithms such as 1D Self-Organizing Map, k-Means, Fuzzy c-Means and Possibilistic Fuzzy c-Means. The main advantage of ASCA is that the number of clusters needs not to be known a priori. The experimental results show that ASCA is more efficient than the other algorithms in detecting small clusters of atypical data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We use an automatic weather station and surface mass balance dataset spanning four melt seasons collected on Hurd Peninsula Glaciers, South Shetland Islands, to investigate the point surface energy balance, to determine the absolute and relative contribution of the various energy fluxes acting on the glacier surface and to estimate the sensitivity of melt to ambient temperature changes. Long-wave incoming radiation is the main energy source for melt, while short-wave radiation is the most important flux controlling the variation of both seasonal and daily mean surface energy balance. Short-wave and long-wave radiation fluxes do, in general, balance each other, resulting in a high correspondence between daily mean net radiation flux and available melt energy flux. We calibrate a distributed melt model driven by air temperature and an expression for the incoming short-wave radiation. The model is calibrated with the data from one of the melt seasons and validated with the data of the three remaining seasons. The model results deviate at most 140 mm w.e. from the corresponding observations using the glaciological method. The model is very sensitive to changes in ambient temperature: a 0.5 ◦ C increase results in 56 % higher melt rates.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article describes a knowledge-based method for generating multimedia descriptions that summarize the behavior of dynamic systems. We designed this method for users who monitor the behavior of a dynamic system with the help of sensor networks and make decisions according to prefixed management goals. Our method generates presentations using different modes such as text in natural language, 2D graphics and 3D animations. The method uses a qualitative representation of the dynamic system based on hierarchies of components and causal influences. The method includes an abstraction generator that uses the system representation to find and aggregate relevant data at an appropriate level of abstraction. In addition, the method includes a hierarchical planner to generate a presentation using a model with dis- course patterns. Our method provides an efficient and flexible solution to generate concise and adapted multimedia presentations that summarize thousands of time series. It is general to be adapted to differ- ent dynamic systems with acceptable knowledge acquisition effort by reusing and adapting intuitive rep- resentations. We validated our method and evaluated its practical utility by developing several models for an application that worked in continuous real time operation for more than 1 year, summarizing sen- sor data of a national hydrologic information system in Spain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning techniques are used for extracting valuable knowledge from data. Nowa¬days, these techniques are becoming even more important due to the evolution in data ac¬quisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied with advances in machine learning techniques to solve new challenges that might arise, on both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification or clustering is one of the most known techniques when data lack of supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of labeling process ignorance or cost, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., to discover clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the most known validation indices behave. The main goal of this work is however to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. For the first algorithm, available data labels are used for searching for subspaces firstly, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of real and current application, different machine learning tech¬niques, including one of the proposals of this work (the most sophisticated one) are applied to a task of one of the most challenging biological problems nowadays, the human brain model¬ing. Specifically, expert neuroscientists do not agree with a neuron classification for the brain cortex, which makes impossible not only any modeling attempt but also the day-to-day work without a common way to name neurons. Therefore, machine learning techniques may help to get an accepted solution to this problem, which can be an important milestone for future research in neuroscience. Resumen Las técnicas de aprendizaje automático se usan para extraer información valiosa de datos. Hoy en día, la importancia de estas técnicas está siendo incluso mayor, debido a que la evolución en la adquisición y almacenamiento de datos está llevando a datos con diferentes características que deben ser explotadas. Por lo tanto, los avances en la recolección de datos deben ir ligados a avances en las técnicas de aprendizaje automático para resolver nuevos retos que pueden aparecer, tanto en aplicaciones académicas como reales. Existen varias técnicas de aprendizaje automático dependiendo de las características de los datos y del propósito. La clasificación no supervisada o clustering es una de las técnicas más conocidas cuando los datos carecen de supervisión (datos sin etiqueta), siendo el objetivo descubrir nuevos grupos (agrupaciones) dependiendo de la similitud de los datos. Por otra parte, la clasificación supervisada necesita datos con supervisión (datos etiquetados) y su objetivo es realizar predicciones sobre las etiquetas de nuevos datos. La presencia de las etiquetas es una característica muy importante que guía no solo el aprendizaje sino también otras tareas relacionadas como la validación. Cuando solo algunos de los datos disponibles están etiquetados, mientras que el resto permanece sin etiqueta (datos parcialmente etiquetados), ni el clustering ni la clasificación supervisada se pueden utilizar. Este escenario, que está llegando a ser común hoy en día debido a la ignorancia o el coste del proceso de etiquetado, es abordado utilizando técnicas de aprendizaje semi-supervisadas. Esta tesis trata la rama del aprendizaje semi-supervisado más cercana al clustering, es decir, descubrir agrupaciones utilizando las etiquetas disponibles como apoyo para guiar y mejorar el proceso de clustering. Otra característica importante de los datos, distinta de la presencia de etiquetas, es la relevancia o no de los atributos de los datos. Los datos se caracterizan por atributos, pero es posible que no todos ellos sean relevantes, o igualmente relevantes, para el proceso de aprendizaje. Una tendencia reciente en clustering, relacionada con la relevancia de los datos y llamada clustering en subespacios, afirma que agrupaciones diferentes pueden estar descritas por subconjuntos de atributos diferentes. Esto difiere de las soluciones tradicionales para el problema de la relevancia de los datos, en las que se busca un único subconjunto de atributos (normalmente el conjunto original de atributos) y se utiliza para realizar el proceso de clustering. La cercanía de este trabajo con el clustering lleva al primer objetivo de la tesis. Como se ha comentado previamente, la validación en clustering es una tarea difícil debido a la ausencia de etiquetas. Aunque existen muchos índices que pueden usarse para evaluar la calidad de las soluciones de clustering, estas validaciones dependen de los algoritmos de clustering utilizados y de las características de los datos. Por lo tanto, en el primer objetivo tres conocidos algoritmos se usan para agrupar datos con valores atípicos y ruido para estudiar de forma crítica cómo se comportan algunos de los índices de validación más conocidos. El objetivo principal de este trabajo sin embargo es combinar clustering semi-supervisado con clustering en subespacios para obtener soluciones de clustering que puedan ser validadas de forma correcta utilizando índices conocidos u opiniones expertas. Se proponen dos algoritmos desde dos puntos de vista diferentes para descubrir agrupaciones caracterizadas por diferentes subespacios. Para el primer algoritmo, las etiquetas disponibles se usan para bus¬car en primer lugar los subespacios antes de buscar las agrupaciones. Este algoritmo asigna cada instancia a un único cluster (hard clustering) y se basa en mapear las etiquetas cono-cidas a subespacios utilizando técnicas de clasificación supervisada. El segundo algoritmo utiliza las etiquetas disponibles para buscar de forma simultánea los subespacios y las agru¬paciones en un proceso iterativo. Este algoritmo asigna cada instancia a cada cluster con una probabilidad de pertenencia (soft clustering) y se basa en integrar las etiquetas conocidas y la búsqueda en subespacios dentro de clustering basado en modelos. Las propuestas son probadas utilizando diferentes bases de datos reales y sintéticas, incluyendo comparaciones con otros métodos cuando resulten apropiadas. Finalmente, a modo de ejemplo de una aplicación real y actual, se aplican diferentes técnicas de aprendizaje automático, incluyendo una de las propuestas de este trabajo (la más sofisticada) a una tarea de uno de los problemas biológicos más desafiantes hoy en día, el modelado del cerebro humano. Específicamente, expertos neurocientíficos no se ponen de acuerdo en una clasificación de neuronas para la corteza cerebral, lo que imposibilita no sólo cualquier intento de modelado sino también el trabajo del día a día al no tener una forma estándar de llamar a las neuronas. Por lo tanto, las técnicas de aprendizaje automático pueden ayudar a conseguir una solución aceptada para este problema, lo cual puede ser un importante hito para investigaciones futuras en neurociencia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although most of the research on Cognitive Radio is focused on communication bands above the HF upper limit (30 MHz), Cognitive Radio principles can also be applied to HF communications to make use of the extremely scarce spectrum more efficiently. In this work we consider legacy users as primary users since these users transmit without resorting to any smart procedure, and our stations using the HFDVL (HF Data+Voice Link) architecture as secondary users. Our goal is to enhance an efficient use of the HF band by detecting the presence of uncoordinated primary users and avoiding collisions with them while transmitting in different HF channels using our broad-band HF transceiver. A model of the primary user activity dynamics in the HF band is developed in this work to make short-term predictions of the sojourn time of a primary user in the band and avoid collisions. It is based on Hidden Markov Models (HMM) which are a powerful tool for modelling stochastic random processes and are trained with real measurements of the 14 MHz band. By using the proposed HMM based model, the prediction model achieves an average 10.3% prediction error rate with one minute-long channel knowledge but it can be reduced when this knowledge is extended: with the previous 8 min knowledge, an average 5.8% prediction error rate is achieved. These results suggest that the resulting activity model for the HF band could actually be used to predict primary users activity and included in a future HF cognitive radio based station.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Solar drying is one of the important processes used for extending the shelf life of agricultural products. Regarding consumer requirements, solar drying should be more suitable in terms of curtailing total drying time and preserving product quality. Therefore, the objective of this study was to develop a fuzzy logic-based control system, which performs a ?human-operator-like? control approach through using the previously developed low-cost model-based sensors. Fuzzy logic toolbox of MatLab and Borland C++ Builder tool were utilized to develop a required control system. An experimental solar dryer, constructed by CONA SOLAR (Austria) was used during the development of the control system. Sensirion sensors were used to characterize the drying air at different positions in the dryer, and also the smart sensor SMART-1 was applied to be able to include the rate of wood water extraction into the control system (the difference of absolute humidity of the air between the outlet and the inlet of solar dryer is considered by SMART-1 to be the extracted water). A comprehensive test over a 3 week period for different fuzzy control models has been performed, and data, obtained from these experiments, were analyzed. Findings from this study would suggest that the developed fuzzy logic-based control system is able to tackle difficulties, related to the control of solar dryer process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A simplified CFD wake model based on the actuator-disk concept is used to simulate the wind turbine, represented by an actuator disk upon which a distribution of forces, defined as axial momentum sources, are applied on the incoming flow. The rotor is supposed to be uniformly loaded, with the exerted forces as a function of the incident wind speed, the thrust coefficient and the rotor diameter. The model is validated through experimental measurements downwind of a wind turbine in terms of wind speed deficit. Validation on turbulence intensity will also be made in the near future.