Biblioteca Digital

29 resultados para Real- and imaginary-time propagation

em Universidad Politécnica de Madrid

Analysis and optimization of propagation losses in LiNbO3 optical waveguides produced by swift heavy-ion irradiation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The propagation losses (PL) of lithium niobate optical planar waveguides fabricated by swift heavy-ion irradiation (SHI), an alternative to conventional ion implantation, have been investigated and optimized. For waveguide fabrication, congruently melting LiNbO3 substrates were irradiated with F ions at 20 MeV or 30 MeV and fluences in the range 1013–1014 cm−2. The influence of the temperature and time of post-irradiation annealing treatments has been systematically studied. Optimum propagation losses lower than 0.5 dB/cm have been obtained for both TE and TM modes, after a two-stage annealing treatment at 350 and 375∘C. Possible loss mechanisms are discussed.

Pulmonary Artery-Vein Segmentation in Real and Synthetic CT Images = Segmentación Arteria-Vena Pulmonares en Imágenes de CT Reales y Sintéticas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La tomografía axial computerizada (TAC) es la modalidad de imagen médica preferente para el estudio de enfermedades pulmonares y el análisis de su vasculatura. La segmentación general de vasos en pulmón ha sido abordada en profundidad a lo largo de los últimos años por la comunidad científica que trabaja en el campo de procesamiento de imagen; sin embargo, la diferenciación entre irrigaciones arterial y venosa es aún un problema abierto. De hecho, la separación automática de arterias y venas está considerado como uno de los grandes retos futuros del procesamiento de imágenes biomédicas. La segmentación arteria-vena (AV) permitiría el estudio de ambas irrigaciones por separado, lo cual tendría importantes consecuencias en diferentes escenarios médicos y múltiples enfermedades pulmonares o estados patológicos. Características como la densidad, geometría, topología y tamaño de los vasos sanguíneos podrían ser analizados en enfermedades que conllevan remodelación de la vasculatura pulmonar, haciendo incluso posible el descubrimiento de nuevos biomarcadores específicos que aún hoy en dípermanecen ocultos. Esta diferenciación entre arterias y venas también podría ayudar a la mejora y el desarrollo de métodos de procesamiento de las distintas estructuras pulmonares. Sin embargo, el estudio del efecto de las enfermedades en los árboles arterial y venoso ha sido inviable hasta ahora a pesar de su indudable utilidad. La extrema complejidad de los árboles vasculares del pulmón hace inabordable una separación manual de ambas estructuras en un tiempo realista, fomentando aún más la necesidad de diseñar herramientas automáticas o semiautomáticas para tal objetivo. Pero la ausencia de casos correctamente segmentados y etiquetados conlleva múltiples limitaciones en el desarrollo de sistemas de separación AV, en los cuales son necesarias imágenes de referencia tanto para entrenar como para validar los algoritmos. Por ello, el diseño de imágenes sintéticas de TAC pulmonar podría superar estas dificultades ofreciendo la posibilidad de acceso a una base de datos de casos pseudoreales bajo un entorno restringido y controlado donde cada parte de la imagen (incluyendo arterias y venas) está unívocamente diferenciada. En esta Tesis Doctoral abordamos ambos problemas, los cuales están fuertemente interrelacionados. Primero se describe el diseño de una estrategia para generar, automáticamente, fantomas computacionales de TAC de pulmón en humanos. Partiendo de conocimientos a priori, tanto biológicos como de características de imagen de CT, acerca de la topología y relación entre las distintas estructuras pulmonares, el sistema desarrollado es capaz de generar vías aéreas, arterias y venas pulmonares sintéticas usando métodos de crecimiento iterativo, que posteriormente se unen para formar un pulmón simulado con características realistas. Estos casos sintéticos, junto a imágenes reales de TAC sin contraste, han sido usados en el desarrollo de un método completamente automático de segmentación/separación AV. La estrategia comprende una primera extracción genérica de vasos pulmonares usando partículas espacio-escala, y una posterior clasificación AV de tales partículas mediante el uso de Graph-Cuts (GC) basados en la similitud con arteria o vena (obtenida con algoritmos de aprendizaje automático) y la inclusión de información de conectividad entre partículas. La validación de los fantomas pulmonares se ha llevado a cabo mediante inspección visual y medidas cuantitativas relacionadas con las distribuciones de intensidad, dispersión de estructuras y relación entre arterias y vías aéreas, los cuales muestran una buena correspondencia entre los pulmones reales y los generados sintéticamente. La evaluación del algoritmo de segmentación AV está basada en distintas estrategias de comprobación de la exactitud en la clasificación de vasos, las cuales revelan una adecuada diferenciación entre arterias y venas tanto en los casos reales como en los sintéticos, abriendo así un amplio abanico de posibilidades en el estudio clínico de enfermedades cardiopulmonares y en el desarrollo de metodologías y nuevos algoritmos para el análisis de imágenes pulmonares. ABSTRACT Computed tomography (CT) is the reference image modality for the study of lung diseases and pulmonary vasculature. Lung vessel segmentation has been widely explored by the biomedical image processing community, however, differentiation of arterial from venous irrigations is still an open problem. Indeed, automatic separation of arterial and venous trees has been considered during last years as one of the main future challenges in the field. Artery-Vein (AV) segmentation would be useful in different medical scenarios and multiple pulmonary diseases or pathological states, allowing the study of arterial and venous irrigations separately. Features such as density, geometry, topology and size of vessels could be analyzed in diseases that imply vasculature remodeling, making even possible the discovery of new specific biomarkers that remain hidden nowadays. Differentiation between arteries and veins could also enhance or improve methods processing pulmonary structures. Nevertheless, AV segmentation has been unfeasible until now in clinical routine despite its objective usefulness. The huge complexity of pulmonary vascular trees makes a manual segmentation of both structures unfeasible in realistic time, encouraging the design of automatic or semiautomatic tools to perform the task. However, this lack of proper labeled cases seriously limits in the development of AV segmentation systems, where reference standards are necessary in both algorithm training and validation stages. For that reason, the design of synthetic CT images of the lung could overcome these difficulties by providing a database of pseudorealistic cases in a constrained and controlled scenario where each part of the image (including arteries and veins) is differentiated unequivocally. In this Ph.D. Thesis we address both interrelated problems. First, the design of a complete framework to automatically generate computational CT phantoms of the human lung is described. Starting from biological and imagebased knowledge about the topology and relationships between structures, the system is able to generate synthetic pulmonary arteries, veins, and airways using iterative growth methods that can be merged into a final simulated lung with realistic features. These synthetic cases, together with labeled real CT datasets, have been used as reference for the development of a fully automatic pulmonary AV segmentation/separation method. The approach comprises a vessel extraction stage using scale-space particles and their posterior artery-vein classification using Graph-Cuts (GC) based on arterial/venous similarity scores obtained with a Machine Learning (ML) pre-classification step and particle connectivity information. Validation of pulmonary phantoms from visual examination and quantitative measurements of intensity distributions, dispersion of structures and relationships between pulmonary air and blood flow systems, show good correspondence between real and synthetic lungs. The evaluation of the Artery-Vein (AV) segmentation algorithm, based on different strategies to assess the accuracy of vessel particles classification, reveal accurate differentiation between arteries and vein in both real and synthetic cases that open a huge range of possibilities in the clinical study of cardiopulmonary diseases and the development of methodological approaches for the analysis of pulmonary images.

Wave Statistics and Space-time Extremes via Stereo Imaging

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present an analysis of the space-time dynamics of oceanic sea states exploiting stereo imaging techniques. In particular, a novel Wave Acquisition Stereo System (WASS) has been developed and deployed at the oceanographic tower Acqua Alta in the Northern Adriatic Sea, off the Venice coast in Italy. The analysis of WASS video measurements yields accurate estimates of the oceanic sea state dynamics, the associated directional spectra and wave surface statistics that agree well with theoretical models. Finally, we show that a space-time extreme, defined as the expected largest surface wave height over an area, is considerably larger than the maximum crest observed in time at a point, in agreement with theoretical predictions.

Mixing Snapshots and Fast Time Integration of PDEs

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A local proper orthogonal decomposition (POD) plus Galerkin projection method was recently developed to accelerate time dependent numerical solvers of PDEs. This method is based on the combined use of a numerical code (NC) and a Galerkin sys- tem (GS) in a sequence of interspersed time intervals, INC and IGS, respectively. POD is performed on some sets of snapshots calculated by the numerical solver in the INC inter- vals. The governing equations are Galerkin projected onto the most energetic POD modes and the resulting GS is time integrated in the next IGS interval. The major computa- tional e®ort is associated with the snapshots calculation in the ¯rst INC interval, where the POD manifold needs to be completely constructed (it is only updated in subsequent INC intervals, which can thus be quite small). As the POD manifold depends only weakly on the particular values of the parameters of the problem, a suitable library can be con- structed adapting the snapshots calculated in other runs to drastically reduce the size of the ¯rst INC interval and thus the involved computational cost. The strategy is success- fully tested in (i) the one-dimensional complex Ginzburg-Landau equation, including the case in which it exhibits transient chaos, and (ii) the two-dimensional unsteady lid-driven cavity problem

Strict and non-strict independent and-parallelism in logic programs: Correctness, efficiency, and compile-time conditions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents some fundamental properties of independent and-parallelism and extends its applicability by enlarging the class of goals eligible for parallel execution. A simple model of (independent) and-parallel execution is proposed and issues of correctness and efficiency discussed in the light of this model. Two conditions, "strict" and "non-strict" independence, are defined and then proved sufficient to ensure correctness and efñciency of parallel execution: if goals which meet these conditions are executed in parallel the solutions obtained are the same as those produced by standard sequential execution. Also, in absence of failure, the parallel proof procedure does not genérate any additional work (with respect to standard SLD-resolution) while the actual execution time is reduced. Finally, in case of failure of any of the goals no slow down will occur. For strict independence the results are shown to hold independently of whether the parallel goals execute in the same environment or in sepárate environments. In addition, a formal basis is given for the automatic compile-time generation of independent and-parallelism: compile-time conditions to efficiently check goal independence at run-time are proposed and proved sufficient. Also, rules are given for constructing simpler conditions if information regarding the binding context of the goals to be executed in parallel is available to the compiler.

Integrating software testing and run-time checking in an assertion verification framework

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have designed and implemented a framework that unifies unit testing and run-time verification (as well as static verification and static debugging). A key contribution of our approach is that a unified assertion language is used for all of these tasks. We first propose methods for compiling runtime checks for (parts of) assertions which cannot be verified at compile-time via program transformation. This transformation allows checking preconditions and postconditions, including conditional postconditions, properties at arbitrary program points, and certain computational properties. The implemented transformation includes several optimizations to reduce run-time overhead. We also propose a minimal addition to the assertion language which allows defining unit tests to be run in order to detect possible violations of the (partial) specifications expressed by the assertions. This language can express for example the input data for performing the unit tests or the number of times that the unit tests should be repeated. We have implemented the framework within the Ciao/CiaoPP system and effectively applied it to the verification of ISO-prolog compliance and to the detection of different types of bugs in the Ciao system source code. Several experimental results are presented that ¡Ilústrate different trade-offs among program size, running time, or levéis of verbosity of the messages shown to the user.

Integrating software testing and run-time checking in an assertion verification framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have designed and implemented a framework that unifies unit testing and run-time verification (as well as static verification and static debugging). A key contribution of our approach is that a unified assertion language is used for all of these tasks. We first propose methods for compiling runtime checks for (parts of) assertions which cannot be verified at compile-time via program transformation. This transformation allows checking preconditions and postconditions, including conditional postconditions, properties at arbitrary program points, and certain computational properties. The implemented transformation includes several optimizations to reduce run-time overhead. We also propose a minimal addition to the assertion language which allows defining unit tests to be run in order to detect possible violations of the (partial) specifications expressed by the assertions. This language can express for example the input data for performing the unit tests or the number of times that the unit tests should be repeated. We have implemented the framework within the Ciao/CiaoPP system and effectively applied it to the verification of ISO-prolog compliance and to the detection of different types of bugs in the Ciao system source code. Several experimental results are presented that illustrate different trade-offs among program size, running time, or levels of verbosity of the messages shown to the user.

Calculus of the uncertainty in acoustic field measurements: comparative study between the uncertainty propagation method and the distribution propagation method

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The new Spanish Regulation in Building Acoustic establishes values and limits for the different acoustic magnitudes whose fulfillment can be verify by means field measurements. In this sense, an essential aspect of a field measurement is to give the measured magnitude and the uncertainty associated to such a magnitude. In the calculus of the uncertainty it is very usual to follow the uncertainty propagation method as described in the Guide to the expression of Uncertainty in Measurements (GUM). Other option is the numerical calculus based on the distribution propagation method by means of Monte Carlo simulation. In fact, at this stage, it is possible to find several publications developing this last method by using different software programs. In the present work, we used Excel for the Monte Carlo simulation for the calculus of the uncertainty associated to the different magnitudes derived from the field measurements following ISO 140-4, 140-5 and 140-7. We compare the results with the ones obtained by the uncertainty propagation method. Although both methods give similar values, some small differences have been observed. Some arguments to explain such differences are the asymmetry of the probability distributions associated to the entry magnitudes,the overestimation of the uncertainty following the GUM

Study of a-Si crystallization dependence on power and irradiation time using a CW green laser

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An advantage of laser crystallization over conventional heating methods is its ability to limit rapid heating and cooling to thin surface layers. Laser energy is used to heat the a-Si thin film to change the microstructure to poly-Si. Thin film samples of a-Si were irradiated with a CW-green laser source. Laser irradiated spots were produced by using different laser powers and irradiation times. These parameters are identified as key variables in the crystallization process. The power threshold for crystallization is reduced as the irradiation time is increased. When this threshold is reached the crystalline fraction increases lineally with power for each irradiation time. The experimental results are analysed with the aid of a numerical thermal model and the presence of two crystallization mechanisms are observed: one due to melting and the other due to solid phase transformation.

La arquitectura en la literatura de ciencia-ficción

Relevância:

100.00% 100.00%

Publicador:

Resumo:

En esta Tesis se estudian las Imágenes Arquitectónicas que aparecen descritas en textos literarios de Ciencia Ficción dentro del periodo temporal que limitan los años 1.926 a 1.976. Consta de tres partes fundamentales: En la primera, se plantean las condiciones y objetivos del trabajo y se define el espacio conceptual que ocupa la Arquitectura descrita en Ciencia Ficción. En la segunda, se realizan análisis sobre las formas y temas arquitectónicos, así como de problemas relacionados con ellos que se encuentran en la narrativa fantacientífica, y se desarrollan ideas e hipótesis especulativas, sobre sugerencias referidas a situaciones reales o imaginarias, partiendo de posiciones implícitas en los relatos. En la tercera, se elabora una antología que data y ordena una amplia muestra de textos representativos. SUMMARY. This thesis studies the Architectural Images described in the Science Fiction stories between 1.926 and 1.976. It consist of three main parts: The first one, explains the conditions and objectives of this work and defines the conceptual space of Described Architecture in Science Fiction. The second one, analyses the architectural forms and items, as well as the problems related to the latter, that can be found in the scientific-fantastic narrative. It also develops ideas and speculative hipothesis about suggestions related to real and imaginary situations from implicit positions in the stories. The third one, builds up an anthology, dating and ordering a wide variety of representative excerpts of the S. F. texts o

Coupled model of initiation and propagation of corrosion in reinforced concrete

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Corrosion of reinforcing steel in concrete due to chloride ingress is one of the main causes of the deterioration of reinforced concrete structures. Structures most affected by such a corrosion are marine zone buildings and structures exposed to de-icing salts like highways and bridges. Such process is accompanied by an increase in volume of the corrosión products on the rebarsconcrete interface. Depending on the level of oxidation, iron can expand as much as six times its original volume. This increase in volume exerts tensile stresses in the surrounding concrete which result in cracking and spalling of the concrete cover if the concrete tensile strength is exceeded. The mechanism by which steel embedded in concrete corrodes in presence of chloride is the local breakdown of the passive layer formed in the highly alkaline condition of the concrete. It is assumed that corrosion initiates when a critical chloride content reaches the rebar surface. The mathematical formulation idealized the corrosion sequence as a two-stage process: an initiation stage, during which chloride ions penetrate to the reinforcing steel surface and depassivate it, and a propagation stage, in which active corrosion takes place until cracking of the concrete cover has occurred. The aim of this research is to develop computer tools to evaluate the duration of the service life of reinforced concrete structures, considering both the initiation and propagation periods. Such tools must offer a friendly interface to facilitate its use by the researchers even though their background is not in numerical simulation. For the evaluation of the initiation period different tools have been developed: Program TavProbabilidade: provides means to carry out a probability analysis of a chloride ingress model. Such a tool is necessary due to the lack of data and general uncertainties associated with the phenomenon of the chloride diffusion. It differs from the deterministic approach because it computes not just a chloride profile at a certain age, but a range of chloride profiles for each probability or occurrence. Program TavProbabilidade_Fiabilidade: carries out reliability analyses of the initiation period. It takes into account the critical value of the chloride concentration on the steel that causes breakdown of the passive layer and the beginning of the propagation stage. It differs from the deterministic analysis in that it does not predict if the corrosion is going to begin or not, but to quantifies the probability of corrosion initiation. Program TavDif_1D: was created to do a one dimension deterministic analysis of the chloride diffusion process by the finite element method (FEM) which numerically solves Fick’second Law. Despite of the different FEM solver already developed in one dimension, the decision to create a new code (TavDif_1D) was taken because of the need to have a solver with friendly interface for pre- and post-process according to the need of IETCC. An innovative tool was also developed with a systematic method devised to compare the ability of the different 1D models to predict the actual evolution of chloride ingress based on experimental measurements, and also to quantify the degree of agreement of the models with each others. For the evaluation of the entire service life of the structure: a computer program has been developed using finite elements method to do the coupling of both service life periods: initiation and propagation. The program for 2D (TavDif_2D) allows the complementary use of two external programs in a unique friendly interface: • GMSH - an finite element mesh generator and post-processing viewer • OOFEM – a finite element solver. This program (TavDif_2D) is responsible to decide in each time step when and where to start applying the boundary conditions of fracture mechanics module in function of the amount of chloride concentration and corrosion parameters (Icorr, etc). This program is also responsible to verify the presence and the degree of fracture in each element to send the Information of diffusion coefficient variation with the crack width. • GMSH - an finite element mesh generator and post-processing viewer • OOFEM – a finite element solver. The advantages of the FEM with the interface provided by the tool are: • the flexibility to input the data such as material property and boundary conditions as time dependent function. • the flexibility to predict the chloride concentration profile for different geometries. • the possibility to couple chloride diffusion (initiation stage) with chemical and mechanical behavior (propagation stage). The OOFEM code had to be modified to accept temperature, humidity and the time dependent values for the material properties, which is necessary to adequately describe the environmental variations. A 3-D simulation has been performed to simulate the behavior of the beam on both, action of the external load and the internal load caused by the corrosion products, using elements of imbedded fracture in order to plot the curve of the deflection of the central region of the beam versus the external load to compare with the experimental data.

Electrical Power Fluctuations in a Network of DC/AC inverters in a Large PV Plant: relationship between correlation, distance and time scale

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes the correlation between the fluctuations of the electrical power generated by the ensemble of 70 DC/AC inverters from a 45.6 MW PV plant. The use of real electrical power time series from a large collection of photovoltaic inverters of a same plant is an impor- tant contribution in the context of models built upon simplified assumptions to overcome the absence of such data. This data set is divided into three different fluctuation categories with a clustering proce- dure which performs correctly with the clearness index and the wavelet variances. Afterwards, the time dependent correlation between the electrical power time series of the inverters is esti- mated with the wavelet transform. The wavelet correlation depends on the distance between the inverters, the wavelet time scales and the daily fluctuation level. Correlation values for time scales below one minute are low without dependence on the daily fluctuation level. For time scales above 20 minutes, positive high correlation values are obtained, and the decay rate with the distance depends on the daily fluctuation level. At intermediate time scales the correlation depends strongly on the daily fluctuation level. The proposed methods have been implemented using free software. Source code is available as supplementary material.

PV self-consumption optimization with storage and Active DSM for the residential sector

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the rising prices of the retail electricity and the decreasing cost of the PV technology, grid parity with commercial electricity will soon become a reality in Europe. This fact, together with less attractive PV feed-in-tariffs in the near future and incentives to promote self-consumption suggest, that new operation modes for the PV Distributed Generation should be explored; differently from the traditional approach which is only based on maximizing the exported electricity to the grid. The smart metering is experiencing a growth in Europe and the United States but the possibilities of its use are still uncertain, in our system we propose their use to manage the storage and to allow the user to know their electrical power and energy balances. The ADSM has many benefits studied previously but also it has important challenges, in this paper we can observe and ADSM implementation example where we propose a solution to these challenges. In this paper we study the effects of the Active Demand-Side Management (ADSM) and storage systems in the amount of consumed local electrical energy. It has been developed on a prototype of a self-sufficient solar house called “MagicBox” equipped with grid connection, PV generation, lead–acid batteries, controllable appliances and smart metering. We carried out simulations for long-time experiments (yearly studies) and real measures for short and mid-time experiments (daily and weekly studies). Results show the relationship between the electricity flows and the storage capacity, which is not linear and becomes an important design criterion.

Interval-based resource usage verification: Formalization and prototype

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In an increasing number of applications (e.g., in embedded, real-time, or mobile systems) it is important or even essential to ensure conformance with respect to a specification expressing resource usages, such as execution time, memory, energy, or user-defined resources. In previous work we have presented a novel framework for data size-aware, static resource usage verification. Specifications can include both lower and upper bound resource usage functions. In order to statically check such specifications, both upper- and lower-bound resource usage functions (on input data sizes) approximating the actual resource usage of the program which are automatically inferred and compared against the specification. The outcome of the static checking of assertions can express intervals for the input data sizes such that a given specification can be proved for some intervals but disproved for others. After an overview of the approach in this paper we provide a number of novel contributions: we present a full formalization, and we report on and provide results from an implementation within the Ciao/CiaoPP framework (which provides a general, unified platform for static and run-time verification, as well as unit testing). We also generalize the checking of assertions to allow preconditions expressing intervals within which the input data size of a program is supposed to lie (i.e., intervals for which each assertion is applicable), and we extend the class of resource usage functions that can be checked.

Semi-supervised subspace clustering and applications to neuroscience

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Machine learning techniques are used for extracting valuable knowledge from data. Nowa¬days, these techniques are becoming even more important due to the evolution in data ac¬quisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied with advances in machine learning techniques to solve new challenges that might arise, on both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification or clustering is one of the most known techniques when data lack of supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of labeling process ignorance or cost, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., to discover clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the most known validation indices behave. The main goal of this work is however to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. For the first algorithm, available data labels are used for searching for subspaces firstly, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of real and current application, different machine learning tech¬niques, including one of the proposals of this work (the most sophisticated one) are applied to a task of one of the most challenging biological problems nowadays, the human brain model¬ing. Specifically, expert neuroscientists do not agree with a neuron classification for the brain cortex, which makes impossible not only any modeling attempt but also the day-to-day work without a common way to name neurons. Therefore, machine learning techniques may help to get an accepted solution to this problem, which can be an important milestone for future research in neuroscience. Resumen Las técnicas de aprendizaje automático se usan para extraer información valiosa de datos. Hoy en día, la importancia de estas técnicas está siendo incluso mayor, debido a que la evolución en la adquisición y almacenamiento de datos está llevando a datos con diferentes características que deben ser explotadas. Por lo tanto, los avances en la recolección de datos deben ir ligados a avances en las técnicas de aprendizaje automático para resolver nuevos retos que pueden aparecer, tanto en aplicaciones académicas como reales. Existen varias técnicas de aprendizaje automático dependiendo de las características de los datos y del propósito. La clasificación no supervisada o clustering es una de las técnicas más conocidas cuando los datos carecen de supervisión (datos sin etiqueta), siendo el objetivo descubrir nuevos grupos (agrupaciones) dependiendo de la similitud de los datos. Por otra parte, la clasificación supervisada necesita datos con supervisión (datos etiquetados) y su objetivo es realizar predicciones sobre las etiquetas de nuevos datos. La presencia de las etiquetas es una característica muy importante que guía no solo el aprendizaje sino también otras tareas relacionadas como la validación. Cuando solo algunos de los datos disponibles están etiquetados, mientras que el resto permanece sin etiqueta (datos parcialmente etiquetados), ni el clustering ni la clasificación supervisada se pueden utilizar. Este escenario, que está llegando a ser común hoy en día debido a la ignorancia o el coste del proceso de etiquetado, es abordado utilizando técnicas de aprendizaje semi-supervisadas. Esta tesis trata la rama del aprendizaje semi-supervisado más cercana al clustering, es decir, descubrir agrupaciones utilizando las etiquetas disponibles como apoyo para guiar y mejorar el proceso de clustering. Otra característica importante de los datos, distinta de la presencia de etiquetas, es la relevancia o no de los atributos de los datos. Los datos se caracterizan por atributos, pero es posible que no todos ellos sean relevantes, o igualmente relevantes, para el proceso de aprendizaje. Una tendencia reciente en clustering, relacionada con la relevancia de los datos y llamada clustering en subespacios, afirma que agrupaciones diferentes pueden estar descritas por subconjuntos de atributos diferentes. Esto difiere de las soluciones tradicionales para el problema de la relevancia de los datos, en las que se busca un único subconjunto de atributos (normalmente el conjunto original de atributos) y se utiliza para realizar el proceso de clustering. La cercanía de este trabajo con el clustering lleva al primer objetivo de la tesis. Como se ha comentado previamente, la validación en clustering es una tarea difícil debido a la ausencia de etiquetas. Aunque existen muchos índices que pueden usarse para evaluar la calidad de las soluciones de clustering, estas validaciones dependen de los algoritmos de clustering utilizados y de las características de los datos. Por lo tanto, en el primer objetivo tres conocidos algoritmos se usan para agrupar datos con valores atípicos y ruido para estudiar de forma crítica cómo se comportan algunos de los índices de validación más conocidos. El objetivo principal de este trabajo sin embargo es combinar clustering semi-supervisado con clustering en subespacios para obtener soluciones de clustering que puedan ser validadas de forma correcta utilizando índices conocidos u opiniones expertas. Se proponen dos algoritmos desde dos puntos de vista diferentes para descubrir agrupaciones caracterizadas por diferentes subespacios. Para el primer algoritmo, las etiquetas disponibles se usan para bus¬car en primer lugar los subespacios antes de buscar las agrupaciones. Este algoritmo asigna cada instancia a un único cluster (hard clustering) y se basa en mapear las etiquetas cono-cidas a subespacios utilizando técnicas de clasificación supervisada. El segundo algoritmo utiliza las etiquetas disponibles para buscar de forma simultánea los subespacios y las agru¬paciones en un proceso iterativo. Este algoritmo asigna cada instancia a cada cluster con una probabilidad de pertenencia (soft clustering) y se basa en integrar las etiquetas conocidas y la búsqueda en subespacios dentro de clustering basado en modelos. Las propuestas son probadas utilizando diferentes bases de datos reales y sintéticas, incluyendo comparaciones con otros métodos cuando resulten apropiadas. Finalmente, a modo de ejemplo de una aplicación real y actual, se aplican diferentes técnicas de aprendizaje automático, incluyendo una de las propuestas de este trabajo (la más sofisticada) a una tarea de uno de los problemas biológicos más desafiantes hoy en día, el modelado del cerebro humano. Específicamente, expertos neurocientíficos no se ponen de acuerdo en una clasificación de neuronas para la corteza cerebral, lo que imposibilita no sólo cualquier intento de modelado sino también el trabajo del día a día al no tener una forma estándar de llamar a las neuronas. Por lo tanto, las técnicas de aprendizaje automático pueden ayudar a conseguir una solución aceptada para este problema, lo cual puede ser un importante hito para investigaciones futuras en neurociencia.

«
1
2
»