36 results for scenario clustering

at Universidad Politécnica de Madrid


Relevance:

30.00%

Publisher:

Abstract:

Machine learning techniques are used for extracting valuable knowledge from data. Nowadays, these techniques are becoming even more important due to the evolution in data acquisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied by advances in machine learning techniques to solve the new challenges that arise, in both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification, or clustering, is one of the best known techniques when data lack supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about the labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled while the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because the labeling process is either unfamiliar or costly, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., discovering clusters using the available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering trend, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to the data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on the clustering algorithms and data characteristics. Hence, in the first goal, three well-known clustering algorithms are used to cluster data with outliers and noise, in order to critically study how some of the best known validation indices behave. The main goal of this work, however, is to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated using either known indices or expert opinions. Two different algorithms are proposed, from different points of view, to discover clusters characterized by different subspaces. In the first algorithm, the available data labels are first used to search for subspaces, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping the known labels to subspaces using supervised classification techniques. The subspaces are then used to find clusters using traditional clustering techniques.
The second algorithm uses the available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster with a membership probability (soft clustering) and is based on integrating the known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are included when appropriate. Finally, as an example of a real and current application, different machine learning techniques, including one of the proposals of this work (the most sophisticated one), are applied to one of the most challenging biological problems nowadays: human brain modeling. Specifically, expert neuroscientists do not agree on a neuron classification for the cerebral cortex, which not only makes any modeling attempt impossible but also hinders day-to-day work, since there is no common way to name neurons. Therefore, machine learning techniques may help to reach an accepted solution to this problem, which could be an important milestone for future research in neuroscience.
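
A minimal sketch of the first algorithm's two-step idea, assuming scikit-learn as the toolkit; the random-forest feature ranking, the top-k subspace selection and all names are hypothetical stand-ins for the supervised label-to-subspace mapping described above, not the thesis implementation:

```python
# Hypothetical sketch: map known labels to a feature subspace with a
# supervised model, then run hard clustering inside that subspace.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

def subspace_then_cluster(X, y_partial, n_clusters, k_features=5):
    """X: (n, d) data; y_partial: labels, with -1 marking unlabeled rows."""
    labeled = y_partial != -1
    # Step 1: supervised mapping from known labels to a relevant subspace.
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X[labeled], y_partial[labeled])
    subspace = np.argsort(clf.feature_importances_)[::-1][:k_features]
    # Step 2: traditional hard clustering restricted to that subspace.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X[:, subspace])
    return subspace, labels
```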

Relevance:

30.00%

Publisher:

Abstract:

Time series are proficiently converted into graphs via the horizontal visibility (HV) algorithm, which prompts interest in its capability for capturing the nature of different classes of series in a network context. We have recently shown [B. Luque et al., PLoS ONE 6, 9 (2011)] that dynamical systems can be studied from a novel perspective via this method. Specifically, the period-doubling and band-splitting attractor cascades that characterize unimodal maps transform into families of graphs that turn out to be independent of map nonlinearity or other particulars. Here, we provide an in-depth description of the HV treatment of the Feigenbaum scenario, together with analytical derivations relating to the degree distributions, mean distances, clustering coefficients, etc., associated with the bifurcation cascades and their accumulation points. We describe how the resulting families of graphs can be framed into a renormalization group scheme in which fixed-point graphs reveal their scaling properties. These fixed points are then re-derived from an entropy optimization process defined for the graph sets, confirming a suggested connection between the renormalization group and entropy optimization. Finally, we provide analytical and numerical results for the graph entropy and show that it emulates the Lyapunov exponent of the map independently of its sign.
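
For reference, the HV criterion is simple: two series points x_i and x_j are linked if and only if every intermediate point lies strictly below both, i.e. x_k < min(x_i, x_j) for all i < k < j. A minimal Python sketch:

```python
# Horizontal visibility: points i and j are linked iff every value
# strictly between them lies below min(x[i], x[j]).
def horizontal_visibility_graph(x):
    n = len(x)
    edges = set()
    for i in range(n - 1):
        edges.add((i, i + 1))        # consecutive points always see each other
        highest = x[i + 1]           # tallest intermediate bar so far
        for j in range(i + 2, n):
            if highest < min(x[i], x[j]):
                edges.add((i, j))    # line of sight is unobstructed
            highest = max(highest, x[j])
            if highest >= x[i]:
                break                # nothing further right is visible from i
    return edges

# Example: horizontal_visibility_graph([0.7, 0.2, 0.9, 0.4, 0.6])
# -> {(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)}
```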

Relevance:

20.00%

Publisher:

Abstract:

This paper presents an algorithm for generating scale-free networks with an adjustable clustering coefficient. The algorithm is based on a random walk procedure combined with a triangle generation scheme that takes into account genetic factors; in this way, preferential attachment and clustering control are implemented using only local information. Simulations are presented that support the validity of the scheme and characterize its tuning capabilities.
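
A minimal sketch of the general random-walk-plus-triangle-closure idea; the use of networkx and all parameter names are assumptions, and the paper's genetic factors are not modelled here:

```python
# Grow a network using only local information: a new node attaches to
# the endpoint of a short random walk (which mimics preferential
# attachment, since walks end at high-degree nodes more often) and,
# with probability p_triangle, also to one of that node's neighbours
# (closing a triangle and raising the clustering coefficient).
import random
import networkx as nx

def grow_network(n_nodes, walk_length=3, p_triangle=0.5, seed=0):
    rng = random.Random(seed)
    G = nx.complete_graph(3)                      # small seed graph
    for new in range(3, n_nodes):
        target = rng.choice(list(G.nodes))
        for _ in range(walk_length):              # local random walk
            target = rng.choice(list(G.neighbors(target)))
        G.add_edge(new, target)
        if rng.random() < p_triangle:             # triad-formation step
            other = rng.choice([v for v in G.neighbors(target) if v != new])
            G.add_edge(new, other)
    return G

# Tuning p_triangle adjusts the clustering coefficient, e.g.:
# nx.average_clustering(grow_network(2000, p_triangle=0.9))
```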

Relevance:

20.00%

Publisher:

Abstract:

A new method for detecting microcalcifications in regions of interest (ROIs) extracted from digitized mammograms is proposed. The top-hat transform is a technique based on mathematical morphology operations and, in this paper, is used to perform contrast enhancement of the microcalcifications. To improve microcalcification detection, a novel image sub-segmentation approach based on the possibilistic fuzzy c-means algorithm is used. From the original ROIs, window-based features, such as the mean and standard deviation, were extracted; these features were used as an input vector to a classifier. The classifier is based on an artificial neural network that identifies patterns belonging to microcalcifications and healthy tissue. Our results show that the proposed method is a good alternative for automatically detecting microcalcifications, because this stage is an important part of early breast cancer detection.
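
The white top-hat enhancement step admits a short sketch: the transform subtracts the morphological opening from the image, so only bright structures smaller than the structuring element survive. The OpenCV usage and the kernel size are illustrative assumptions, not the paper's settings:

```python
# White top-hat contrast enhancement for small bright spots (such as
# microcalcifications): image minus its morphological opening.
import cv2

def tophat_enhance(roi_gray, kernel_size=15):
    # Structuring element larger than the spots to be preserved.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    tophat = cv2.morphologyEx(roi_gray, cv2.MORPH_TOPHAT, kernel)
    # Stretch the result to the full 8-bit range for display/processing.
    return cv2.normalize(tophat, None, 0, 255, cv2.NORM_MINMAX)

# roi = cv2.imread("roi.png", cv2.IMREAD_GRAYSCALE)
# enhanced = tophat_enhance(roi)
```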

Relevance:

20.00%

Publisher:

Abstract:

In the present uncertain global context of pursuing social stability and a steadily thriving economy, power demand is expected to grow, and global electricity generation could nearly double from 2005 to 2030. Fossil fuels will remain a significant contributor to this energy mix up to 2050, with an expected share of around 70% of global and ca. 60% of European electricity generation. Coal will remain a key player. Hence, a direct effect on the CO2 emissions business-as-usual scenario is expected, with forecasts of three times the present CO2 concentration, up to 1,200 ppm, by the end of this century. The Kyoto Protocol was the first approach to taking global responsibility for CO2 emissions, through monitoring and cap targets for 2012 with reference to 1990 levels. Some of the principal CO2 emitters did not ratify the reduction targets, although the USA and China are taking their own actions and parallel reduction measures. More efficient combustion processes that consume less fuel, although a significant contribution from the electricity generation sector towards dwindling CO2 concentration levels, might not be sufficient. Carbon Capture and Storage (CCS) technologies have gained importance since the beginning of the decade, with research efforts and funding aimed at making them practical. After the first research projects and initial pilot-scale testing, three principal capture processes are available today, with first figures showing up to 90% CO2 removal in standard applications at coal-fired power stations. Regarding the last part of the CO2 reduction chain, two options can be considered worthwhile: reuse (EOR & EGR) and storage. The study evaluates the state of CO2 capture technology development and the availability and investment cost of the different technologies, with only limited operating-cost analysis possible at the time. Main findings and the abatement potential for coal applications are presented. DOE, NETL, MIT, European universities and research institutions, key technology enterprises and utilities, and key technology suppliers are the main sources of this study. A vision of technology deployment is presented.

Relevance:

20.00%

Publisher:

Abstract:

This paper shows the role that some foresight tools, such as scenario design, may play in exploring the future impacts of global challenges on contemporary society. Additionally, it provides some clues about how to reinforce scenario design so that it delivers more in-depth analysis without losing its qualitative nature and communication advantages. Since its inception in the early seventies, scenario design has become one of the most popular foresight tools, used in several fields of knowledge. Nevertheless, its wide acceptance has not been matched in the urban planning academic and professional realm. In some instances, scenario design is perceived as a mere storytelling technique that generates oversimplified future visions without the support of rigorous and sound analysis. As a matter of fact, the potential of scenario design for providing more in-depth analysis and for connecting with quantitative methods has generally been missed, giving arguments away to its critics. Based on these premises, this document tries to prove the capability of scenario design to anticipate the impacts of complex global challenges, and to do so in a more analytical way. These assumptions are tested through a scenario design exercise which explores the future evolution of the sustainable development paradigm (SD) and its implications for the Spanish urban development model. In order to reinforce the perception of scenario design as a useful, added-value instrument for urban planners, three sets of implications (functional, parametric and spatial) are displayed to provide substantial, in-depth information for policy makers. This study shows some major findings. First, it is feasible to set up a systematic approach that provides anticipatory intelligence about future disruptive events that may affect the natural environment and socioeconomic fabric of a given territory. Second, there are opportunities for innovating in Spanish urban planning processes and city governance models. Third, as a foresight tool, scenario design can be substantially reinforced if proper efforts are made to display the functional, parametric and spatial implications generated by the scenarios. Fourth, the study confirms that foresight offers interesting opportunities to urban planners, such as anticipating changes, formulating visions, fostering participation and building networks.

Relevance:

20.00%

Publisher:

Abstract:

We propose to study the stability properties of an air flow wake forced by a dielectric barrier discharge (DBD) actuator, which is a type of electrohydrodynamic (EHD) actuator. These actuators add momentum to the flow around a cylinder in regions close to the wall and, in our case, are symmetrically disposed near the boundary layer separation point. Since the forcing frequencies, typical of DBD, are much higher than the natural shedding frequency of the flow, we will be considering the forcing actuation as stationary. In the first part, the flow around a circular cylinder modified by EHD actuators will be experimentally studied by means of particle image velocimetry (PIV). In the second part, the EHD actuators have been numerically implemented as a boundary condition on the cylinder surface. Using this boundary condition, the computationally obtained base flow is then compared with the experimental one in order to relate the control parameters from both methodologies. After validating the obtained agreement, we study the Hopf bifurcation that appears once the flow starts the vortex shedding through experimental and computational approaches. For the base flow derived from experimentally obtained snapshots, we monitor the evolution of the velocity amplitude oscillations. As to the computationally obtained base flow, its stability is analyzed by solving a global eigenvalue problem obtained from the linearized Navier–Stokes equations. Finally, the critical parameters obtained from both approaches are compared.
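
The global stability analysis mentioned at the end can be summarized, in generic notation (an assumption, since the paper's own operators are not reproduced here), by the standard normal-mode formulation:

```latex
% Generic normal-mode formulation of global linear stability analysis:
% perturb the base flow and keep terms of first order in \epsilon.
\[
  \mathbf{q}(x,y,t) \;=\; \bar{\mathbf{q}}(x,y)
    \;+\; \epsilon\,\hat{\mathbf{q}}(x,y)\,e^{\lambda t},
  \qquad \epsilon \ll 1,
\]
\[
  \mathcal{J}(\bar{\mathbf{q}})\,\hat{\mathbf{q}}
  \;=\; \lambda\,\mathcal{B}\,\hat{\mathbf{q}},
\]
% where \mathcal{J} is the Navier–Stokes operator linearized about the
% base flow \bar{\mathbf{q}}. A Hopf bifurcation occurs when a
% complex-conjugate pair of eigenvalues \lambda crosses the imaginary
% axis as the control parameter (e.g., the Reynolds number) increases.
```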

Relevance:

20.00%

Publisher:

Abstract:

Industrial applications of computer vision sometimes require the detection of atypical objects that occur as small groups of pixels in digital images. These objects are difficult to single out because they are small and randomly distributed. In this work we propose an image segmentation method using the novel Ant System-based Clustering Algorithm (ASCA). ASCA models the foraging behaviour of ants, which move through the data space searching for high data-density regions and leave pheromone trails on their path. The pheromone map is used to identify the exact number of clusters and to assign the pixels to these clusters using the pheromone gradient. We applied ASCA to the detection of microcalcifications in digital mammograms and compared its performance with state-of-the-art clustering algorithms such as the 1D Self-Organizing Map, k-Means, Fuzzy c-Means and Possibilistic Fuzzy c-Means. The main advantage of ASCA is that the number of clusters need not be known a priori. The experimental results show that ASCA is more efficient than the other algorithms in detecting small clusters of atypical data.
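
A rough sketch of the pheromone mechanism, heavily simplified relative to the published ASCA and with every parameter an illustrative assumption: ants walk over the data points with a bias towards dense regions, deposit pheromone, and high-pheromone points become cluster cores, so the cluster count emerges rather than being fixed in advance:

```python
# Simplified pheromone-based clustering in the spirit of ASCA.
import numpy as np

def ant_cluster(X, n_ants=50, n_steps=200, evaporation=0.05,
                bandwidth=1.0, core_quantile=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = np.exp(-(d / bandwidth) ** 2).sum(axis=1)   # local data density
    pheromone = np.zeros(n)
    pos = rng.integers(0, n, size=n_ants)                 # random start points
    for _ in range(n_steps):
        pheromone *= (1.0 - evaporation)                  # evaporation
        for a in range(n_ants):
            # Move to a nearby point, biased by density and pheromone.
            w = (density + pheromone) * np.exp(-(d[pos[a]] / bandwidth) ** 2)
            pos[a] = rng.choice(n, p=w / w.sum())
            pheromone[pos[a]] += 1.0                      # deposit
    # Points above a pheromone quantile are cluster cores; link cores
    # closer than `bandwidth` and assign each point to its nearest core.
    cores = np.where(pheromone >= np.quantile(pheromone, core_quantile))[0]
    labels_core = -np.ones(n, dtype=int)
    cluster = 0
    for c in cores:
        if labels_core[c] == -1:
            stack = [c]
            while stack:                                  # flood-fill nearby cores
                u = stack.pop()
                if labels_core[u] != -1:
                    continue
                labels_core[u] = cluster
                stack.extend(v for v in cores if d[u, v] < bandwidth)
            cluster += 1
    nearest_core = cores[np.argmin(d[:, cores], axis=1)]
    return labels_core[nearest_core]                      # follow the gradient
```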

Relevance:

20.00%

Publisher:

Abstract:

Large-scale structure formation can be modeled as a nonlinear process that transfers energy from the largest scales to successively smaller scales until it is dissipated, in analogy with Kolmogorov's cascade model of incompressible turbulence. However, cosmic turbulence is very compressible, and vorticity plays a secondary role in it. The simplest model of cosmic turbulence is the adhesion model, which can be studied perturbatively or by adapting to it Kolmogorov's non-perturbative approach to incompressible turbulence. This approach leads to observationally testable predictions, e.g., to the power-law exponent of the matter density two-point correlation function.
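
For reference, the adhesion model mentioned above is the three-dimensional Burgers equation for the cosmic peculiar velocity field, taken in the limit of vanishing viscosity (a standard statement of the model, written here in generic notation):

```latex
% The adhesion model: Burgers' equation for the peculiar velocity field,
% where the small viscosity \nu mimics the gravitational sticking of
% matter and regularizes the caustics of the pressureless limit.
\[
  \partial_{\tau}\mathbf{u} \;+\; (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  \;=\; \nu\,\nabla^{2}\mathbf{u},
  \qquad \nu \to 0^{+}.
\]
```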

Relevance:

20.00%

Publisher:

Abstract:

The microarray technique is rather powerful, as it allows testing up to thousands of genes at a time, but this produces an overwhelming set of data files containing huge amounts of data, which are quite difficult to pre-process, separate, classify and correlate for interesting conclusions to be extracted. Modern machine learning, data mining and clustering techniques based on information theory are needed to read and interpret the information content buried in those large data sets. The Independent Component Analysis method can be used to correct data affected by corruption processes, or to filter out the uncorrectable data, and then clustering methods can group similar genes or classify samples. In this paper a hybrid approach is used to obtain a two-way unsupervised clustering of corrected microarray data.
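
A minimal sketch of the hybrid pipeline, assuming scikit-learn; the component and cluster counts are illustrative, and standard FastICA stands in for whatever ICA variant the paper uses:

```python
# ICA-based correction followed by two-way clustering of a microarray.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def two_way_cluster(expr, n_components=10, n_gene_clusters=8,
                    n_sample_clusters=3):
    """expr: (n_genes, n_samples) expression matrix."""
    # Correct/denoise by keeping only the dominant independent
    # components and reconstructing the data from them.
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(expr)             # (n_genes, n_components)
    corrected = ica.inverse_transform(sources)    # denoised matrix
    # Two-way clustering: rows (genes) and columns (samples) separately.
    gene_labels = KMeans(n_clusters=n_gene_clusters,
                         n_init=10).fit_predict(corrected)
    sample_labels = KMeans(n_clusters=n_sample_clusters,
                           n_init=10).fit_predict(corrected.T)
    return gene_labels, sample_labels
```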

Relevance:

20.00%

Publisher:

Abstract:

In this work we propose an image acquisition and processing methodology (framework) developed for in-field grape and leaf detection and quantification, based on a six-step methodology: 1) image segmentation through Fuzzy C-Means with Gustafson-Kessel (FCM-GK) clustering; 2) extraction of the FCM-GK outputs (centroids) to act as seeds for K-Means clustering; 3) identification of the clusters generated by K-Means using a Support Vector Machine (SVM) classifier; 4) application of morphological operations over the grape and leaf clusters in order to fill holes and eliminate small pixel clusters; 5) creation of a mosaic image via the Scale-Invariant Feature Transform (SIFT) in order to avoid overlapping between images; 6) calculation of the areas of leaves and grapes and location of the centroids of the grape bunches. Image data are collected using a colour camera fixed to a mobile platform. This platform was developed to provide a stabilized surface, guaranteeing that the images were acquired parallel to the vineyard rows. In this way, the platform avoids the image distortions that lead to poor estimation of the areas. Our preliminary results are promising, although they have also shown that it is necessary to implement a camera stabilization system to avoid undesired camera movements, as well as a parallel processing procedure to speed up the mosaicking process.
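
Steps 1 and 2 of the pipeline can be sketched compactly; standard fuzzy c-means from scikit-fuzzy stands in for the Gustafson-Kessel variant (an assumption), and the cluster count is illustrative:

```python
# Fuzzy c-means over pixel colours, whose centroids then seed K-Means.
import skfuzzy as fuzz
from sklearn.cluster import KMeans

def fcm_seeded_kmeans(image_rgb, n_clusters=3):
    pixels = image_rgb.reshape(-1, 3).astype(float)    # (N, 3) colour vectors
    # Step 1: fuzzy c-means (skfuzzy expects features x samples).
    centroids, *_ = fuzz.cluster.cmeans(pixels.T, c=n_clusters, m=2.0,
                                        error=1e-4, maxiter=100)
    # Step 2: seed K-Means with the fuzzy centroids instead of random init.
    km = KMeans(n_clusters=n_clusters, init=centroids, n_init=1)
    return km.fit_predict(pixels).reshape(image_rgb.shape[:2])
```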

Relevance:

20.00%

Publisher:

Abstract:

The World Health Organization actively stresses the importance of the health, nutrition and well-being of the mother to foster children's development. This issue is critical in the rural areas of developing countries, where monitoring of the health status of children is hardly performed, since the population suffers from a lack of access to health care. The aim of this research is to design, implement and deploy an e-health information and communication system to support health care in 26 rural communities of Cusmapa, Nicaragua. The final solution consists of a hybrid WiMAX/WiFi architecture that provides good-quality communications through VoIP, taking advantage of low-cost WiFi mobile devices. Thus, a WiMAX base station was installed in the health center to provide a radio link with the rural health post "El Carrizo", located 7.4 km away in line of sight. This service makes possible personal broadband voice and data communications with the health center, based on WiFi-enabled devices such as laptops and cellular phones, at no communications cost. A free-software PBX was installed at the "San José de Cusmapa" health care site to enable communications for physicians, nurses and a technician through mobile telephones with the IEEE 802.11 b/g protocol and SIP, provided by the project. Additionally, the rural health post staff (midwives, brigade members) received two mobile phones with these same features. In a complementary way, the deployed health information system is ready to analyze the distribution of the maternal-child population at risk and the distribution of diseases on a geographical basis. The system works with four information layers: fertile women, children, people with disabilities and diseases. Thus, authorized staff can obtain reports about prenatal monitoring tasks, the status of the communities, malnutrition and immunization control. Data need to be updated by health care staff in order to detect problem sources in a timely manner and implement measures to alleviate them and permanently improve the health status of the population. Ongoing research focuses on a mobile platform that collects the height and weight of children in the remote communities and automatically updates them in the information system. This research is funded by the Millennium Rural Communities program of the Technical University of Madrid.

Relevance:

20.00%

Publisher:

Abstract:

We present two approaches to clustering dialogue-based information obtained from the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and to use these models to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a 15.17% relative reduction in word error rate, which helps to improve the performance of the understanding and dialogue manager modules.
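
The global criterion of the second approach can be illustrated with a minimal sketch: evaluate the perplexity of a linear interpolation of cluster-specific LMs on held-out text and keep the clustering that minimizes it. Unigram models and fixed interpolation weights are simplifying assumptions for illustration, not the paper's LMs:

```python
# Perplexity of an interpolated mixture of cluster language models.
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM as a {word: probability} dict."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def interpolated_perplexity(lms, weights, heldout):
    """Mixture perplexity on held-out tokens (assumed to be in vocab)."""
    log_prob = sum(math.log(sum(wt * lm[w] for wt, lm in zip(weights, lms)))
                   for w in heldout)
    return math.exp(-log_prob / len(heldout))

# Global decision rule, reduced to unigrams: merge two clusters only if
# the interpolated perplexity over the new partition decreases.
```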

Relevance:

20.00%

Publisher:

Abstract:

The area of human-machine interfaces is growing fast due to its high importance in all technological systems. The basic idea behind designing human-machine interfaces is to enrich communication with the technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must properly identify the action the user wants to perform, so proper gesture recognition is of the highest importance. However, most systems based on gesture recognition use complex methods requiring high-resource devices. In this work, we propose to model gestures by capturing their temporal properties, which significantly reduces storage requirements, and to use clustering techniques, namely self-organizing maps and an unsupervised genetic algorithm, for their classification. We further propose to train a number of algorithms with different parameters and combine their decisions using majority voting in order to decrease the false positive rate. The main advantage of the approach is its simplicity, which enables implementation on devices with limited resources, and therefore at low cost. The testing results demonstrate its high potential.
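
The ensemble step lends itself to a short sketch: train several classifiers with different parameters and combine their predictions by majority voting. The base learners are left abstract here, since the paper's SOM and genetic-algorithm classifiers are not reproduced:

```python
# Majority voting over an ensemble of independently trained classifiers.
import numpy as np

def majority_vote(classifiers, X):
    """Most frequent prediction per sample (assumes non-negative int labels)."""
    votes = np.stack([clf.predict(X) for clf in classifiers])  # (n_clf, n)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

# classifiers = [SomeGestureClassifier(param=p).fit(X_train, y_train)
#                for p in (3, 5, 7)]   # hypothetical parameter sweep
# y_pred = majority_vote(classifiers, X_test)
```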

Relevance:

20.00%

Publisher:

Abstract:

We address a cognitive radio scenario where a number of secondary users perform identification of which primary user, if any, is transmitting, in a distributed way and using limited location information. We propose two fully distributed algorithms: the first is a direct identification scheme, while in the other a distributed sub-optimal detection based on a simplified Neyman-Pearson energy detector precedes the identification scheme. Both algorithms are studied analytically in a realistic transmission scenario, and the advantage obtained by detection pre-processing is also verified via simulation. Finally, we give details of their fully distributed implementation via consensus averaging algorithms.
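
The consensus-averaging primitive behind the fully distributed implementation admits a compact sketch; the topology, step size and iteration count are illustrative assumptions:

```python
# Distributed consensus averaging: each node repeatedly replaces its
# value with a weighted average of its neighbours' values, so every
# node converges to the network-wide mean of the initial measurements
# using only local exchanges.
import numpy as np

def consensus_average(x0, adjacency, epsilon=0.1, n_iters=200):
    """x0: one local statistic per node; adjacency: symmetric 0/1 matrix."""
    x = np.asarray(x0, dtype=float).copy()
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
    for _ in range(n_iters):
        x = x - epsilon * (L @ x)         # each node mixes with neighbours only
    return x                              # all entries approach mean(x0)

# Convergence requires a connected topology and, as a sufficient
# condition, 0 < epsilon < 1 / max_degree.
```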