Biblioteca Digital

25 resultados para Data Acquisition Methods.

em Universidad Politécnica de Madrid

Timing performances of a data acquisition system for time of flight PET

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We are investigating the performances of a data acquisition system for Time of Flight PET, based on LYSO crystal slabs and 64 channels Silicon Photomultipliers matrices (1.2 cm2 of active area each). Measurements have been performed to test the timing capability of the detection system (SiPM matices coupled to a LYSO slab and the read-out electronics) with both test signal and radioactive source.

Simplifying data acquisition in plant canopies- Measurements of leaf angles with a cell phone

Relevância:

100.00% 100.00%

Publicador:

Resumo:

1. Canopies are complex multilayered structures comprising individual plant crowns exposing a multifaceted surface area to sunlight. Foliage arrangement and properties are the main mediators of canopy functions. The leaves act as light traps whose exposure to sunlight varies with time of the day, date and latitude in a trade-off between photosynthetic light harvesting and excessive or photoinhibitory light avoidance. To date, ecological research based upon leaf sampling has been limited by the available echnology, with which data acquisition becomes labour intensive and time-consuming, given the verwhelming number of leaves involved. 2. In the present study, our goal involved developing a tool capable of easuring a sufficient number of leaves to enable analysis of leaf populations, tree crowns and canopies.We specifically tested whether a cell phone working as a 3Dpointer could yield reliable, repeatable and valid leaf anglemeasurements with a simple gesture. We evaluated the accuracy of this method under controlled conditions, using a 3D digitizer, and we compared performance in the field with the methods commonly used. We presented an equation to estimate the potential proportion of the leaf exposed to direct sunlight (SAL) at any given time and compared the results with those obtained bymeans of a graphicalmethod. 3. We found a strong and highly significant correlation between the graphical methods and the equation presented. The calibration process showed a strong correlation between the results derived from the two methods with amean relative difference below 10%. Themean relative difference in calculation of instantaneous exposure was below 5%. Our device performed equally well in diverse locations, in which we characterized over 700 leaves in a single day. 4. The newmethod, involving the use of a cell phone, ismuchmore effective than the traditionalmethods or digitizers when the goal is to scale up from leaf position to performance of leaf populations, tree crowns or canopies. Our methodology constitutes an affordable and valuable tool within which to frame a wide range of ecological hypotheses and to support canopy modelling approaches.

Advanced data acquisition system implementation for the ITER Neutron Diagnostic use case using EPICS and FlexRIO technology on a PXIe platform

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the framework of the ITER Control Breakdown Structure (CBS), Plant System Instrumentation & Control (I&C) defines the hardware and software required to control one or more plant systems [1]. For diagnostics, most of the complex Plant System I&C are to be delivered by ITER Domestic Agencies (DAs). As an example for the DAs, ITER Organization (IO) has developed several use cases for diagnostics Plant System I&C that fully comply with guidelines presented in the Plant Control Design Handbook (PCDH) [2]. One such use case is for neutron diagnostics, specifically the Fission Chamber (FC), which is responsible for delivering time-resolved measurements of neutron source strength and fusion power to aid in assessing the functional performance of ITER [3]. ITER will deploy four Fission Chamber units, each consisting of three individual FC detectors. Two of these detectors contain Uranium 235 for Neutron detection, while a third "dummy" detector will provide gamma and noise detection. The neutron flux from each MFC is measured by the three methods: . Counting Mode: measures the number of individual pulses and their location in the record. Pulse parameters (threshold and width) are user configurable. . Campbelling Mode (Mean Square Voltage): measures the RMS deviation in signal amplitude from its average value. .Current Mode: integrates the signal amplitude over the measurement period

Characterization and Test of a Data Acquisition System for PET

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A small Positron Emission Tomography demonstrator based on LYSO slabs and Silicon Photomultiplier matrices is under construction at the University and INFN of Pisa. In this paper we present the characterization results of the read-out electronics and of the detection system. Two SiPM matrices, composed by 8 × 8 SiPM pixels, 1.5 mm pitch, have been coupled one to one to a LYSO crystals array. Custom Front-End ASICs were used to read the 64 channels of each matrix. Data from each Front-End were multiplexed and sent to a DAQ board for the digital conversion; a motherboard collects the data and communicates with a host computer through a USB port. Specific tests were carried out on the system in order to assess its performance. Futhermore we have measured some of the most important parameters of the system for PET application.

Physical data acquisition and PWM signal generation using a microcontroller allowing control for DC motor rotation speed

Relevância:

100.00% 100.00%

Publicador:

Resumo:

El objetivo de este proyecto es diseñar un sistema capaz de controlar la velocidad de rotación de un motor DC en función del valor de temperatura obtenido de un sensor. Para ello se generará con un microcontrolador una señal PWM, cuyo ciclo de trabajo estará en función de la temperatura medida. En lo que respecta a la fase de diseño, hay dos partes claramente diferenciadas, relativas al hardware y al software. En cuanto al diseño del hardware puede hacerse a su vez una división en dos partes. En primer lugar, hubo que diseñar la circuitería necesaria para adaptar los niveles de tensión entregados por el sensor de temperatura a los niveles requeridos por ADC, requerido para digitalizar la información para su posterior procesamiento por parte del microcontrolador. Por tanto hubo que diseñar capaz de corregir el offset y la pendiente de la función tensión-temperatura del sensor, a fin de adaptarlo al rango de tensión requerido por el ADC. Por otro lado, hubo que diseñar el circuito encargado de controlar la velocidad de rotación del motor. Este circuito estará basado en un transistor MOSFET en conmutación, controlado mediante una señal PWM como se mencionó anteriormente. De esta manera, al variar el ciclo de trabajo de la señal PWM, variará de manera proporcional la tensión que cae en el motor, y por tanto su velocidad de rotación. En cuanto al diseño del software, se programó el microcontrolador para que generase una señal PWM en uno de sus pines en función del valor entregado por el ADC, a cuya entrada está conectada la tensión obtenida del circuito creado para adaptar la tensión generada por el sensor. Así mismo, se utiliza el microcontrolador para representar el valor de temperatura obtenido en una pantalla LCD. Para este proyecto se eligió una placa de desarrollo mbed, que incluye el microcontrolador integrado, debido a que facilita la tarea del prototipado. Posteriormente se procedió a la integración de ambas partes, y testeado del sistema para comprobar su correcto funcionamiento. Puesto que el resultado depende de la temperatura medida, fue necesario simular variaciones en ésta, para así comprobar los resultados obtenidos a distintas temperaturas. Para este propósito se empleó una bomba de aire caliente. Una vez comprobado el funcionamiento, como último paso se diseñó la placa de circuito impreso. Como conclusión, se consiguió desarrollar un sistema con un nivel de exactitud y precisión aceptable, en base a las limitaciones del sistema. SUMMARY: It is obvious that day by day people’s daily life depends more on technology and science. Tasks tend to be done automatically, making them simpler and as a result, user life is more comfortable. Every single task that can be controlled has an electronic system behind. In this project, a control system based on a microcontroller was designed for a fan, allowing it to go faster when temperature rises or slowing down as the environment gets colder. For this purpose, a microcontroller was programmed to generate a signal, to control the rotation speed of the fan depending on the data acquired from a temperature sensor. After testing the whole design developed in the laboratory, the next step taken was to build a prototype, which allows future improvements in the system that are discussed in the corresponding section of the thesis.

Implementation of intelligent data acquisition system for ITER fast controllers using RIO devices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A basic requirement of the data acquisition systems used in long pulse fusion experiments is the real time physical events detection in signals. Developing such applications is usually a complex task, so it is necessary to develop a set of hardware and software tools that simplify their implementation. This type of applications can be implemented in ITER using fast controllers. ITER is standardizing the architectures to be used for fast controller implementation. Until now the standards chosen are PXIe architectures (based on PCIe) for the hardware and EPICS middleware for the software. This work presents the methodology for implementing data acquisition and pre-processing using FPGA-based DAQ cards and how to integrate these in fast controllers using EPICS.

Integration of a data acquisition system based on FlexRIO technology with EPICS

Relevância:

100.00% 100.00%

Publicador:

Resumo:

EPICS (Experimental Physics and Industrial Control System) lies in a set of software tools and applications which provide a software infrastructure for building distributed data acquisition and control systems. Currently there is an increase in use of such systems in large Physics experiments like ITER, ESS, and FREIA. In these experiments, advanced data acquisition systems using FPGA-based technology like FlexRIO are more frequently been used. The particular case of ITER (International Thermonuclear Experimental Reactor), the instrumentation and control system is supported by CCS (CODAC Core System), based on RHEL (Red Hat Enterprise Linux) operating system, and by the plant design specifications in which every CCS element is defined either hardware, firmware or software. In this degree final project the methodology proposed in Implementation of Intelligent Data Acquisition Systems for Fusion Experiments using EPICS and FlexRIO Technology Sanz et al. [1] is used. The final objective is to provide a document describing the fulfilled process and the source code of the data acquisition system accomplished. The use of the proposed methodology leads to have two diferent stages. The first one consists of the hardware modelling with graphic design tools like LabVIEWFPGA which later will be implemented in the FlexRIO device. In the next stage the design cycle is completed creating an EPICS controller that manages the device using a generic device support layer named NDS (Nominal Device Support). This layer integrates the data acquisition system developed into CCS (Control, data access and communication Core System) as an EPICS interface to the system. The use of FlexRIO technology drives the use of LabVIEW and LabVIEW FPGA respectively. RESUMEN. EPICS (Experimental Physics and Industrial Control System) es un conjunto de herramientas software utilizadas para el desarrollo e implementación de sistemas de adquisición de datos y control distribuidos. Cada vez es más utilizado para entornos de experimentación física a gran escala como ITER, ESS y FREIA entre otros. En estos experimentos se están empezando a utilizar sistemas de adquisición de datos avanzados que usan tecnología basada en FPGA como FlexRIO. En el caso particular de ITER, el sistema de instrumentación y control adoptado se basa en el uso de la herramienta CCS (CODAC Core System) basado en el sistema operativo RHEL (Red Hat) y en las especificaciones del diseño del sistema de planta, en la cual define todos los elementos integrantes del CCS, tanto software como firmware y hardware. En este proyecto utiliza la metodología propuesta para la implementación de sistemas de adquisición de datos inteligente basada en EPICS y FlexRIO. Se desea generar una serie de ejemplos que cubran dicho ciclo de diseño completo y que serían propuestos como casos de uso de dichas tecnologías. Se proporcionará un documento en el que se describa el trabajo realizado así como el código fuente del sistema de adquisición. La metodología adoptada consta de dos etapas diferenciadas. En la primera de ellas se modela el hardware y se sintetiza en el dispositivo FlexRIO utilizando LabVIEW FPGA. Posteriormente se completa el ciclo de diseño creando un controlador EPICS que maneja cada dispositivo creado utilizando una capa software genérica de manejo de dispositivos que se denomina NDS (Nominal Device Support). Esta capa integra la solución en CCS realizando la interfaz con la capa EPICS del sistema. El uso de la tecnología FlexRIO conlleva el uso del lenguaje de programación y descripción hardware LabVIEW y LabVIEW FPGA respectivamente.

CompactRIO: advanced data acquisition systems integration in CODAC Core System

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Las herramientas de configuración basadas en lenguajes de alto nivel como LabVIEW permiten el desarrollo de sistemas de adquisición de datos basados en hardware reconfigurable FPGA muy complejos en un breve periodo de tiempo. La estandarización del ciclo de diseño hardware/software y la utilización de herramientas como EPICS facilita su integración con la plataforma de adquisición y control ITER CODAC CORE SYSTEM (CCS) basada en Linux. En este proyecto se propondrá una metodología que simplificará el ciclo completo de integración de plataformas novedosas, como cRIO, en las que el funcionamiento del hardware de adquisición puede ser modificado por el usuario para que éste se amolde a sus requisitos específicos. El objetivo principal de este proyecto fin de master es realizar la integración de un sistema cRIO NI9159 y diferentes módulos de E/S analógica y digital en EPICS y en CODAC CORE SYSTEM (CCS). Este último consiste en un conjunto de herramientas software que simplifican la integración de los sistemas de instrumentación y control del experimento ITER. Para cumplir el objetivo se realizarán las siguientes tareas: • Desarrollo de un sistema de adquisición de datos basado en FPGA con la plataforma hardware CompactRIO. En esta tarea se realizará la configuración del sistema y la implementación en LabVIEW para FPGA del hardware necesario para comunicarse con los módulos: NI9205, NI9264, NI9401.NI9477, NI9426, NI9425 y NI9476 • Implementación de un driver software utilizando la metodología de AsynDriver para integración del cRIO con EPICS. Esta tarea requiere definir todos los records necesarios que exige EPICS y crear las interfaces adecuadas que permitirán comunicarse con el hardware. • Implementar la descripción del sistema cRIO y del driver EPICS en el sistema de descripción de plantas de ITER llamado SDD. Esto automatiza la creación de las aplicaciones de EPICS que se denominan IOCs. SUMMARY The configuration tools based in high-level programing languages like LabVIEW allows the development of high complex data acquisition systems based on reconfigurable hardware FPGA in a short time period. The standardization of the hardware/software design cycle and the use of tools like EPICS ease the integration with the data acquisition and control platform of ITER, the CODAC Core System based on Linux. In this project a methodology is proposed in order to simplify the full integration cycle of new platforms like CompactRIO (cRIO), in which the data acquisition functionality can be reconfigured by the user to fits its concrete requirements. The main objective of this MSc final project is to develop the integration of a cRIO NI-9159 and its different analog and digital Input/Output modules with EPICS in a CCS. The CCS consists of a set of software tools that simplifies the integration of instrumentation and control systems in the International Thermonuclear Reactor (ITER) experiment. To achieve such goal the following tasks are carried out: • Development of a DAQ system based on FPGA using the cRIO hardware platform. This task comprehends the configuration of the system and the implementation of the mandatory hardware to communicate to the I/O adapter modules NI9205, NI9264, NI9401, NI9477, NI9426, NI9425 y NI9476 using LabVIEW for FPGA. • Implementation of a software driver using the asynDriver methodology to integrate such cRIO system with EPICS. This task requires the definition of the necessary EPICS records and the creation of the appropriate interfaces that allow the communication with the hardware. • Develop the cRIO system’s description and the EPICS driver in the ITER plant description tool named SDD. This development will automate the creation of EPICS applications, called IOCs.

Viscoelastic vibration damping identification methods. Application to laminated glass.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Laminatedglass is composed of two glass layers and a thin intermediate PVB layer, strongly influencing PVB's viscoelastic behaviour its dynamic response. While natural frequencies are relatively easily identified even with simplified FE models, damping ratios are not identified with such an ease. In order to determine to what extent external factors influence dampingidentification, different tests have been carried out. The external factors considered, apart from temperature, are accelerometers, connection cables and the effect of the glass layers. To analyse the influence of the accelerometers and their connection cables a laser measuring device was employed considering three possibilities: sample without instrumentation, sample with the accelerometers fixed and sample completely instrumented. When the sample is completely instrumented, accelerometer readings are also analysed. To take into consideration the effect of the glass layers, tests were realised both for laminatedglass and monolithic samples. This paper presents in depth data analysis of the different configurations and establishes criteria for data acquisition when testing laminatedglass.

Optimización de la Exactitud Geométrica al Integrar Diversas Fuentes de Datos en Proyectos de Inventario Forestal Basados en Escaneo Laser Aerotransportado=Optimizing Geometric Accuracy in Data Source Integration for Airborne Laser Scanning-based Forest Inventory Projects

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La mayoría de las aplicaciones forestales del escaneo laser aerotransportado (ALS, del inglés airborne laser scanning) requieren la integración y uso simultaneo de diversas fuentes de datos, con el propósito de conseguir diversos objetivos. Los proyectos basados en sensores remotos normalmente consisten en aumentar la escala de estudio progresivamente a lo largo de varias fases de fusión de datos: desde la información más detallada obtenida sobre un área limitada (la parcela de campo), hasta una respuesta general de la cubierta forestal detectada a distancia de forma más incierta pero cubriendo un área mucho más amplia (la extensión cubierta por el vuelo o el satélite). Todas las fuentes de datos necesitan en ultimo termino basarse en las tecnologías de sistemas de navegación global por satélite (GNSS, del inglés global navigation satellite systems), las cuales son especialmente erróneas al operar por debajo del dosel forestal. Otras etapas adicionales de procesamiento, como la ortorectificación, también pueden verse afectadas por la presencia de vegetación, deteriorando la exactitud de las coordenadas de referencia de las imágenes ópticas. Todos estos errores introducen ruido en los modelos, ya que los predictores se desplazan de la posición real donde se sitúa su variable respuesta. El grado por el que las estimaciones forestales se ven afectadas depende de la dispersión espacial de las variables involucradas, y también de la escala utilizada en cada caso. Esta tesis revisa las fuentes de error posicional que pueden afectar a los diversos datos de entrada involucrados en un proyecto de inventario forestal basado en teledetección ALS, y como las propiedades del dosel forestal en sí afecta a su magnitud, aconsejando en consecuencia métodos para su reducción. También se incluye una discusión sobre las formas más apropiadas de medir exactitud y precisión en cada caso, y como los errores de posicionamiento de hecho afectan a la calidad de las estimaciones, con vistas a una planificación eficiente de la adquisición de los datos. La optimización final en el posicionamiento GNSS y de la radiometría del sensor óptico permitió detectar la importancia de este ultimo en la predicción de la desidad relativa de un bosque monoespecífico de Pinus sylvestris L. ABSTRACT Most forestry applications of airborne laser scanning (ALS) require the integration and simultaneous use of various data sources, pursuing a variety of different objectives. Projects based on remotely-sensed data generally consist in upscaling data fusion stages: from the most detailed information obtained for a limited area (field plot) to a more uncertain forest response sensed over a larger extent (airborne and satellite swath). All data sources ultimately rely on global navigation satellite systems (GNSS), which are especially error-prone when operating under forest canopies. Other additional processing stages, such as orthorectification, may as well be affected by vegetation, hence deteriorating the accuracy of optical imagery’s reference coordinates. These errors introduce noise to the models, as predictors displace from their corresponding response. The degree to which forest estimations are affected depends on the spatial dispersion of the variables involved and the scale used. This thesis reviews the sources of positioning errors which may affect the different inputs involved in an ALS-assisted forest inventory project, and how the properties of the forest canopy itself affects their magnitude, advising on methods for diminishing them. It is also discussed how accuracy should be assessed, and how positioning errors actually affect forest estimation, toward a cost-efficient planning for data acquisition. The final optimization in positioning the GNSS and optical image allowed to detect the importance of the latter in predicting relative density in a monospecific Pinus sylvestris L. forest.

A set of canonical problems in sloshing. Part 0: Experimental setup and data processing

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In a series of attempts to research and document relevant sloshing type phenomena, a series of experiments have been conducted. The aim of this paper is to describe the setup and data processing of such experiments. A sloshing tank is subjected to angular motion. As a result pressure registers are obtained at several locations, together with the motion data, torque and a collection of image and video information. The experimental rig and the data acquisition systems are described. Useful information for experimental sloshing research practitioners is provided. This information is related to the liquids used in the experiments, the dying techniques, tank building processes, synchronization of acquisition systems, etc. A new procedure for reconstructing experimental data, that takes into account experimental uncertainties, is presented. This procedure is based on a least squares spline approximation of the data. Based on a deterministic approach to the first sloshing wave impact event in a sloshing experiment, an uncertainty analysis procedure of the associated first pressure peak value is described.

Silicon Photomultipliers (SiPM) as novel photodetectors for PET

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Next generation PET scanners should fulfill very high requirements in terms of spatial, energy and timing resolution. Modern scanner performances are inherently limited by the use of standard photomultiplier tubes. The use of Silicon Photomultipliers (SiPMs) is proposed for the construction of a 4D-PET module of 4.8×4.8 cm2 aimed to replace the standard PMT based PET block detector. The module will be based on a LYSO continuous crystal read on two faces by Silicon Photomultipliers. A high granularity detection surface made by SiPM matrices of 1.5 mm pitch will be used for the x–y photon hit position determination with submillimetric accuracy, while a low granularity surface constituted by 16 mm2 SiPM pixels will provide the fast timing information (t) that will be used to implement the Time of Flight technique (TOF). The spatial information collected by the two detector layers will be combined in order to measure the Depth of Interaction (DOI) of each event (z). The use of large area multi-pixel Silicon Photomultiplier (SiPM) detectors requires the development of a multichannel Data Acquisition system (DAQ) as well as of a dedicated front-end in order not to degrade the intrinsic detector capabilities and to manage many channels. The paper describes the progress made on the development of the proof of principle module under construction at the University of Pisa.

NetCDF based data archiving system applied to ITER Fast Plant System Control prototype

Relevância:

90.00% 90.00%

Publicador:

Resumo:

EURATOM/CIEMAT and Technical University of Madrid (UPM) have been involved in the development of a FPSC [1] (Fast Plant System Control) prototype for ITER, based on PXIe (PCI eXtensions for Instrumentation). One of the main focuses of this project has been data acquisition and all the related issues, including scientific data archiving. Additionally, a new data archiving solution has been developed to demonstrate the obtainable performances and possible bottlenecks of scientific data archiving in Fast Plant System Control. The presented system implements a fault tolerant architecture over a GEthernet network where FPSC data are reliably archived on remote, while remaining accessible to be redistributed, within the duration of a pulse. The storing service is supported by a clustering solution to guaranty scalability, so that FPSC management and configuration may be simplified, and a unique view of all archived data provided. All the involved components have been integrated under EPICS [2] (Experimental Physics and Industrial Control System), implementing in each case the necessary extensions, state machines and configuration process variables. The prototyped solution is based on the NetCDF-4 [3] and [4] (Network Common Data Format) file format in order to incorporate important features, such as scientific data models support, huge size files management, platform independent codification, or single-writer/multiple-readers concurrency. In this contribution, a complete description of the above mentioned solution is presented, together with the most relevant results of the tests performed, while focusing in the benefits and limitations of the applied technologies.

Gaussian Multiscale Aggregation oriented to Hand Biometric Segmentation in Mobile Devices

Relevância:

90.00% 90.00%

Publicador:

Resumo:

New trends in biometrics are oriented to mobile devices in order to increase the overall security in daily actions like bank account access, e-commerce or even document protection within the mobile. However, applying biometrics to mobile devices imply challenging aspects in biometric data acquisition, feature extraction or private data storage. Concretely, this paper attempts to deal with the problem of hand segmentation given a picture of the hand in an unknown background, requiring an accurate result in terms of hand isolation. For the sake of user acceptability, no restrictions are done on background, and therefore, hand images can be taken without any constraint, resulting segmentation in an exigent task. Multiscale aggregation strategies are proposed in order to solve this problem due to their accurate results in unconstrained and complicated scenarios, together with their properties in time performance. This method is evaluated with a public synthetic database with 480000 images considering different backgrounds and illumination environments. The results obtained in terms of accuracy and time performance highlight their capability of being a suitable solution for the problem of hand segmentation in contact-less environments, outperforming competitive methods in literature like Lossy Data Compression image segmentation (LDC).

Semi-supervised subspace clustering and applications to neuroscience

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Machine learning techniques are used for extracting valuable knowledge from data. Nowa¬days, these techniques are becoming even more important due to the evolution in data ac¬quisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied with advances in machine learning techniques to solve new challenges that might arise, on both academic and real applications. There are several machine learning techniques depending on both data characteristics and purpose. Unsupervised classification or clustering is one of the most known techniques when data lack of supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of labeling process ignorance or cost, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., to discover clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the most known validation indices behave. The main goal of this work is however to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. For the first algorithm, available data labels are used for searching for subspaces firstly, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of real and current application, different machine learning tech¬niques, including one of the proposals of this work (the most sophisticated one) are applied to a task of one of the most challenging biological problems nowadays, the human brain model¬ing. Specifically, expert neuroscientists do not agree with a neuron classification for the brain cortex, which makes impossible not only any modeling attempt but also the day-to-day work without a common way to name neurons. Therefore, machine learning techniques may help to get an accepted solution to this problem, which can be an important milestone for future research in neuroscience. Resumen Las técnicas de aprendizaje automático se usan para extraer información valiosa de datos. Hoy en día, la importancia de estas técnicas está siendo incluso mayor, debido a que la evolución en la adquisición y almacenamiento de datos está llevando a datos con diferentes características que deben ser explotadas. Por lo tanto, los avances en la recolección de datos deben ir ligados a avances en las técnicas de aprendizaje automático para resolver nuevos retos que pueden aparecer, tanto en aplicaciones académicas como reales. Existen varias técnicas de aprendizaje automático dependiendo de las características de los datos y del propósito. La clasificación no supervisada o clustering es una de las técnicas más conocidas cuando los datos carecen de supervisión (datos sin etiqueta), siendo el objetivo descubrir nuevos grupos (agrupaciones) dependiendo de la similitud de los datos. Por otra parte, la clasificación supervisada necesita datos con supervisión (datos etiquetados) y su objetivo es realizar predicciones sobre las etiquetas de nuevos datos. La presencia de las etiquetas es una característica muy importante que guía no solo el aprendizaje sino también otras tareas relacionadas como la validación. Cuando solo algunos de los datos disponibles están etiquetados, mientras que el resto permanece sin etiqueta (datos parcialmente etiquetados), ni el clustering ni la clasificación supervisada se pueden utilizar. Este escenario, que está llegando a ser común hoy en día debido a la ignorancia o el coste del proceso de etiquetado, es abordado utilizando técnicas de aprendizaje semi-supervisadas. Esta tesis trata la rama del aprendizaje semi-supervisado más cercana al clustering, es decir, descubrir agrupaciones utilizando las etiquetas disponibles como apoyo para guiar y mejorar el proceso de clustering. Otra característica importante de los datos, distinta de la presencia de etiquetas, es la relevancia o no de los atributos de los datos. Los datos se caracterizan por atributos, pero es posible que no todos ellos sean relevantes, o igualmente relevantes, para el proceso de aprendizaje. Una tendencia reciente en clustering, relacionada con la relevancia de los datos y llamada clustering en subespacios, afirma que agrupaciones diferentes pueden estar descritas por subconjuntos de atributos diferentes. Esto difiere de las soluciones tradicionales para el problema de la relevancia de los datos, en las que se busca un único subconjunto de atributos (normalmente el conjunto original de atributos) y se utiliza para realizar el proceso de clustering. La cercanía de este trabajo con el clustering lleva al primer objetivo de la tesis. Como se ha comentado previamente, la validación en clustering es una tarea difícil debido a la ausencia de etiquetas. Aunque existen muchos índices que pueden usarse para evaluar la calidad de las soluciones de clustering, estas validaciones dependen de los algoritmos de clustering utilizados y de las características de los datos. Por lo tanto, en el primer objetivo tres conocidos algoritmos se usan para agrupar datos con valores atípicos y ruido para estudiar de forma crítica cómo se comportan algunos de los índices de validación más conocidos. El objetivo principal de este trabajo sin embargo es combinar clustering semi-supervisado con clustering en subespacios para obtener soluciones de clustering que puedan ser validadas de forma correcta utilizando índices conocidos u opiniones expertas. Se proponen dos algoritmos desde dos puntos de vista diferentes para descubrir agrupaciones caracterizadas por diferentes subespacios. Para el primer algoritmo, las etiquetas disponibles se usan para bus¬car en primer lugar los subespacios antes de buscar las agrupaciones. Este algoritmo asigna cada instancia a un único cluster (hard clustering) y se basa en mapear las etiquetas cono-cidas a subespacios utilizando técnicas de clasificación supervisada. El segundo algoritmo utiliza las etiquetas disponibles para buscar de forma simultánea los subespacios y las agru¬paciones en un proceso iterativo. Este algoritmo asigna cada instancia a cada cluster con una probabilidad de pertenencia (soft clustering) y se basa en integrar las etiquetas conocidas y la búsqueda en subespacios dentro de clustering basado en modelos. Las propuestas son probadas utilizando diferentes bases de datos reales y sintéticas, incluyendo comparaciones con otros métodos cuando resulten apropiadas. Finalmente, a modo de ejemplo de una aplicación real y actual, se aplican diferentes técnicas de aprendizaje automático, incluyendo una de las propuestas de este trabajo (la más sofisticada) a una tarea de uno de los problemas biológicos más desafiantes hoy en día, el modelado del cerebro humano. Específicamente, expertos neurocientíficos no se ponen de acuerdo en una clasificación de neuronas para la corteza cerebral, lo que imposibilita no sólo cualquier intento de modelado sino también el trabajo del día a día al no tener una forma estándar de llamar a las neuronas. Por lo tanto, las técnicas de aprendizaje automático pueden ayudar a conseguir una solución aceptada para este problema, lo cual puede ser un importante hito para investigaciones futuras en neurociencia.

«
1
2
»