59 results for Data acquisition card
Abstract:
The development of new-generation intelligent vehicle technologies will lead to better road safety and reduced CO2 emissions. However, the weak point of all these systems is their need for comprehensive and reliable data. For traffic data acquisition, two sources are currently available: 1) infrastructure sensors and 2) floating vehicles. The former consist of a set of fixed-point detectors installed on the roads, and the latter consist of mobile probe vehicles used as mobile sensors. However, both systems still have deficiencies. Infrastructure sensors retrieve information from static points of the road that are, in some cases, spaced kilometers apart, so the picture they give of the actual traffic situation is incomplete. This deficiency is corrected by floating cars, which retrieve dynamic information on the traffic situation. Unfortunately, the number of floating-data vehicles currently available is too small to give a complete picture of road traffic. In this paper, we present a floating car data (FCD) augmentation system that combines information from floating-data vehicles and infrastructure sensors and that, by using neural networks, is capable of increasing the amount of FCD with virtual information. The system has been implemented and tested on actual roads, and the results show little difference between the data supplied by the floating vehicles and by the virtual vehicles.
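A minimal sketch of the augmentation idea described above, assuming the simplest possible setup: a small neural network (scikit-learn's MLPRegressor, used purely as an illustrative stand-in) is trained on synthetic historical records to map fixed-detector measurements to the speed a floating car would report on the same stretch, and is then used to emit "virtual" FCD where no real probe vehicle is present. All variables, value ranges and data are made up; the paper's actual network and inputs are not reproduced here.

```python
# Illustrative sketch of FCD augmentation with a neural network (synthetic data):
# train a regressor that maps fixed-detector measurements to the speed a floating
# car would report, then emit "virtual FCD" for sections without real probes.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for historical records: [flow (veh/h), occupancy (%), detector speed (km/h)]
X = rng.uniform([200, 2, 30], [2000, 40, 120], size=(5000, 3))
# Hypothetical ground truth: probe speed loosely tied to detector speed and occupancy.
y = X[:, 2] * (1.0 - 0.006 * X[:, 1]) + rng.normal(0, 3, size=5000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# "Virtual" floating-car speed for a road section with no real probe vehicle.
virtual_fcd_speed = model.predict([[950.0, 18.0, 85.0]])
print(f"virtual FCD speed: {virtual_fcd_speed[0]:.1f} km/h")
print(f"test R^2: {model.score(X_test, y_test):.2f}")
```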
Abstract:
Most forestry applications of airborne laser scanning (ALS) require the integration and simultaneous use of various data sources in pursuit of a variety of objectives. Projects based on remotely sensed data generally consist of progressively upscaling through several data-fusion stages: from the most detailed information obtained over a limited area (the field plot) to a more uncertain forest response sensed over a much larger extent (the airborne or satellite swath). All data sources ultimately rely on global navigation satellite systems (GNSS), which are especially error-prone when operating under forest canopies. Other processing stages, such as orthorectification, may also be affected by vegetation, deteriorating the accuracy of the optical imagery's reference coordinates. These errors introduce noise into the models, as predictors are displaced from the true position of their corresponding response. The degree to which forest estimates are affected depends on the spatial dispersion of the variables involved and on the scale used in each case.
This thesis reviews the sources of positioning error that may affect the different inputs involved in an ALS-assisted forest inventory project, and how the properties of the forest canopy itself affect their magnitude, advising accordingly on methods for reducing them. It also discusses the most appropriate ways to measure accuracy and precision in each case, and how positioning errors actually affect the quality of the estimates, with a view to cost-efficient planning of data acquisition. The final optimization of GNSS positioning and of the optical sensor's radiometry made it possible to detect the importance of the latter in predicting the relative density of a monospecific Pinus sylvestris L. forest.
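To make the co-registration argument concrete, the following synthetic sketch (not from the thesis) simulates GNSS plot-positioning error by shifting plot centres on a smooth artificial canopy raster and reports how the correlation between an ALS plot metric and its field-measured response degrades as the displacement grows; the raster, kernel and error magnitudes are arbitrary.

```python
# Synthetic illustration: positioning error displaces the ALS predictor from its
# field response, degrading their correlation (all values are artificial).
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)

# Smooth artificial "canopy height" raster.
size = 200
canopy = convolve2d(rng.normal(size=(size, size)),
                    np.ones((15, 15)) / 225.0, mode="same", boundary="wrap")

def plot_metric(raster, row, col, half=5):
    """Mean canopy height over an 11x11-cell 'plot' centred at (row, col)."""
    return raster[row - half:row + half + 1, col - half:col + half + 1].mean()

rows = rng.integers(20, size - 20, 300)
cols = rng.integers(20, size - 20, 300)
field_response = np.array([plot_metric(canopy, r, c) for r, c in zip(rows, cols)])

for err in (0, 2, 5, 10):  # simulated positioning error, in raster cells
    dr = rng.integers(-err, err + 1, 300) if err else np.zeros(300, dtype=int)
    dc = rng.integers(-err, err + 1, 300) if err else np.zeros(300, dtype=int)
    displaced = np.array([plot_metric(canopy, r + a, c + b)
                          for r, c, a, b in zip(rows, cols, dr, dc)])
    r_val = np.corrcoef(field_response, displaced)[0, 1]
    print(f"error = {err:2d} cells -> correlation = {r_val:.3f}")
```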
Abstract:
The electrical power distribution and commercialization scenario is evolving worldwide, and electricity companies, faced with the challenge of new information requirements, are demanding IT solutions for the smart monitoring of power networks. Two main challenges arise from data management and smart monitoring of power networks: real-time data acquisition and big data processing over short time periods. We present a solution in the form of a system architecture that addresses the real-time issues and has the capacity for big data management.
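As a toy illustration of the "big data processing over short time periods" requirement (not the architecture proposed in the paper), the sketch below aggregates a stream of smart-meter readings into fixed one-minute windows; the data model and names are hypothetical.

```python
# Minimal sketch (hypothetical data model): aggregate a stream of smart-meter
# readings into fixed one-minute windows, the kind of short-period summarisation
# a smart-monitoring back end would run continuously.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MeterReading:
    meter_id: str
    timestamp: float   # seconds since epoch
    power_kw: float

WINDOW_S = 60  # one-minute aggregation windows

def aggregate(readings):
    """Return mean power per meter per one-minute window."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in readings:
        window = int(r.timestamp // WINDOW_S) * WINDOW_S
        acc = sums[(r.meter_id, window)]
        acc[0] += r.power_kw
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

stream = [MeterReading("M-001", 1000.0, 3.2),
          MeterReading("M-001", 1030.0, 3.6),
          MeterReading("M-002", 1010.0, 1.1)]
print(aggregate(stream))
```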
Abstract:
This final-year project (PFC) is a highly practical piece of work; its objectives were set by the supervisor as part of the development of tools (software and hardware) that will later be used for teaching and research. The project has two areas of work. The first and main one is the use of a thermal simulation tool to characterize semiconductor devices fitted with a heatsink; the second is the expansion of a data acquisition card with custom-designed PCBs that were not available commercially. "Autodesk 2013 Inventor Fusion" and "Autodesk 2013 Simulation Multiphysics", for which UPM currently holds licences, were tested and configured for the thermal simulation of high-power devices; these applications cover mechanical design and thermal simulation, respectively. In this part of the project a user manual was written so that this line of work can be continued in future final-year projects. In addition, high-brightness light-emitting diodes (HB-LEDs), both white and near-ultraviolet (UVA), were mechanically designed and thermally simulated. The thermal simulations cover several types of LEDs that are currently being used and thermally characterized in other final-year projects and a doctoral thesis. In the second part of the project, printed circuit boards (PCBs) were designed and built to form part of LabVIEW-based automatic data acquisition instrumentation systems. With this instrumentation, reliability tests and other kinds of tests can be performed on electronic devices and systems.
Abstract:
Embedded context management in resource-constrained devices (e.g. mobile phones, autonomous sensors or smart objects) imposes special requirements in terms of lightness for data modelling and reasoning. In this paper, we explore the state of the art in data representation and reasoning tools for embedded mobile reasoning and propose a light inference system (LIS) aimed at simplifying embedded inference processes, offering a set of functionalities to avoid redundancy in context management operations. The system is part of a service-oriented mobile software framework, conceived to facilitate the creation of context-aware applications: it decouples sensor data acquisition and context processing from the application logic. LIS, composed of several modules, encapsulates existing lightweight tools for ontology data management and rule-based reasoning, and it is ready to run on Java-enabled handheld devices. Data management and reasoning processes are designed to handle a general ontology that enables communication among framework components. Both the applications running on top of the framework and the framework components themselves can configure the rule and query sets in order to retrieve the information they need from LIS. In order to test LIS features in a real application scenario, an 'Activity Monitor' has been designed and implemented: a personal health-persuasive application that provides feedback on the user's lifestyle, combining data from physical and virtual sensors. In this use case, LIS is used to evaluate the user's activity level in a timely manner, to decide on the convenience of triggering notifications, and to determine the best interface or channel to deliver these context-aware alerts.
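The rule-based part of LIS is only described, not listed, in the abstract; as a purely conceptual Python sketch (the real system wraps existing lightweight ontology and rule tools and runs on Java-enabled handhelds), a toy forward-chaining loop over context facts could look like the following, with all facts and rules invented for the example.

```python
# Toy forward-chaining sketch of rule-based context reasoning (conceptual only;
# LIS itself wraps existing lightweight ontology/rule tools on Java handsets).
facts = {("user", "stepsLastHour", 40), ("phone", "screen", "off")}

# Each rule: (set of premises, fact to assert when all premises hold).
rules = [
    ({("user", "stepsLastHour", 40)}, ("user", "activityLevel", "low")),
    ({("user", "activityLevel", "low"), ("phone", "screen", "off")},
     ("system", "sendNotification", "move-reminder")),
]

changed = True
while changed:                      # iterate until no rule adds a new fact
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(("system", "sendNotification", "move-reminder") in facts)  # True
```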
Abstract:
This paper describes the basic tools for working with wireless sensors. TinyOS has a component-based architecture which enables rapid innovation and implementation while minimizing code size, as required by the severe memory constraints inherent in sensor networks. TinyOS's component library includes network protocols, distributed services, sensor drivers, and data acquisition tools, all of which can be used as-is or be further refined for a custom application. TinyOS was originally developed as a research project at the University of California, Berkeley, but has since grown to have an international community of developers and users. Some algorithms concerning packet routing are shown. In-car entertainment systems can be based on wireless sensors in order to obtain information from the Internet, but routing protocols must be implemented in order to avoid bottleneck problems. Ant colony algorithms are very useful in such cases, so they can be embedded into the sensors to perform this routing task, as sketched below.
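A self-contained sketch of the ant-colony idea mentioned above (plain Python, not TinyOS/nesC code): each "ant" picks next hops with pheromone-weighted probability, pheromone evaporates every round, and shorter routes are reinforced more strongly, so cheaper paths tend to accumulate pheromone. The topology and parameters are invented.

```python
# Illustrative ant-colony next-hop selection for a small sensor network:
# pheromone-weighted random choice, evaporation, and reinforcement of short routes.
import random

neighbours = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pheromone = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
cost = {("A", "B"): 1.0, ("A", "C"): 3.0, ("B", "D"): 1.0, ("C", "D"): 1.0}

def walk(src, dst):
    """One ant walks from src to dst choosing hops with pheromone-weighted probability."""
    path, node = [], src
    while node != dst:
        nxt = random.choices(neighbours[node],
                             weights=[pheromone[(node, n)] for n in neighbours[node]])[0]
        path.append((node, nxt))
        node = nxt
    return path

for _ in range(200):                       # simulate 200 ants
    path = walk("A", "D")
    length = sum(cost[e] for e in path)
    for edge in pheromone:                 # evaporation
        pheromone[edge] *= 0.95
    for edge in path:                      # reinforcement, stronger for shorter routes
        pheromone[edge] += 1.0 / length

print(f"pheromone A->B: {pheromone[('A', 'B')]:.2f}, A->C: {pheromone[('A', 'C')]:.2f}")
```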
Abstract:
A new and effective method for the reduction of truncation errors in partial spherical near-field (SNF) measurements is proposed. The method is useful when measuring electrically large antennas, where the measurement time with the classical SNF technique is prohibitively long and an acquisition over the whole spherical surface is not practical. Therefore, to reduce the data acquisition time, a partial-sphere measurement is usually made, taking samples over a portion of the spherical surface in the direction of the main beam. In this case, however, the radiation pattern is not known outside the measured angular sector, and a truncation error is present in the calculated far-field pattern within this sector. The method is based on the Gerchberg-Papoulis algorithm used to extrapolate functions, and it is able to extend the valid region of the calculated far-field pattern up to the whole forward hemisphere. To verify the effectiveness of the method, several examples are presented using both simulated and measured truncated near-field data.
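The Gerchberg-Papoulis iteration itself is easy to show on a 1-D toy problem (the paper applies it to spherical near-field data, which is not reproduced here): alternately re-impose the known samples inside the measured window and the band limit in the Fourier domain, so the signal is extrapolated outside the window.

```python
# 1-D sketch of the Gerchberg-Papoulis iteration: extrapolate a band-limited
# signal outside a measured window by alternating two constraints
# (known samples inside the window, band limit in the Fourier domain).
import numpy as np

N, band = 256, 20                 # signal length and (one-sided) band limit in bins
t = np.arange(N)
true = np.cos(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 7 * t / N)

window = np.zeros(N, bool)
window[64:192] = True             # only the central half is "measured"

estimate = np.where(window, true, 0.0)
for _ in range(200):
    spectrum = np.fft.fft(estimate)
    spectrum[band:N - band] = 0.0            # enforce the band limit
    estimate = np.fft.ifft(spectrum).real
    estimate[window] = true[window]          # re-impose the measured samples

err = np.max(np.abs(estimate[~window] - true[~window]))
print(f"max extrapolation error outside the window: {err:.3e}")
```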
Abstract:
Next-generation PET scanners must fulfil very demanding requirements in terms of spatial, energy and timing resolution. The performance of modern scanners is inherently limited by the use of standard photomultiplier tubes. The use of silicon photomultipliers (SiPMs) is proposed for the construction of a 4D-PET module of 4.8×4.8 cm2 aimed at replacing the standard PMT-based PET block detector. The module will be based on a continuous LYSO crystal read out on two faces by silicon photomultipliers. A high-granularity detection surface made of SiPM matrices with a 1.5 mm pitch will be used to determine the x-y photon hit position with submillimetric accuracy, while a low-granularity surface made up of 16 mm2 SiPM pixels will provide the fast timing information (t) that will be used to implement the time-of-flight (TOF) technique. The spatial information collected by the two detector layers will be combined in order to measure the depth of interaction (DOI) of each event (z). The use of large-area multi-pixel silicon photomultiplier (SiPM) detectors requires the development of a multichannel data acquisition system (DAQ) as well as a dedicated front end, in order not to degrade the intrinsic detector capabilities and to manage the many channels. The paper describes the progress made on the development of the proof-of-principle module under construction at the University of Pisa.
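As a simplified illustration of how a hit position can be estimated from the light shared over a SiPM matrix, the sketch below applies a plain centre-of-gravity (Anger-style) calculation to synthetic data; the real module combines two readout faces, DOI and TOF, none of which is modelled here.

```python
# Centre-of-gravity (Anger-logic) sketch for x-y hit estimation on a SiPM matrix
# with 1.5 mm pitch (illustrative only; synthetic light distribution).
import numpy as np

PITCH_MM = 1.5
n = 32                                           # 32 x 32 pixels, ~4.8 cm side
xs = (np.arange(n) - (n - 1) / 2) * PITCH_MM     # pixel centre coordinates (mm)

# Synthetic light distribution: Gaussian spot centred at (x0, y0) plus noise.
x0, y0 = 3.7, -5.2
X, Y = np.meshgrid(xs, xs, indexing="xy")
signal = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * 4.0 ** 2))
signal += np.random.default_rng(2).normal(0, 0.01, signal.shape)

# Centre of gravity of the collected charge.
total = signal.sum()
x_est = (signal * X).sum() / total
y_est = (signal * Y).sum() / total
print(f"estimated hit position: ({x_est:.2f}, {y_est:.2f}) mm, true ({x0}, {y0}) mm")
```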
Abstract:
The ITER CODAC design identifies slow and fast plant system controllers (PSC). The fast PSCs are based on embedded technologies, permit sampling rates greater than 1 kHz, meet stringent real-time requirements, and will be devoted to data acquisition tasks and control purposes. CIEMAT and UPM have implemented a prototype of a fast PSC based on commercial off-the-shelf (COTS) technologies, with PXI hardware and software based on EPICS.
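For readers unfamiliar with EPICS, a minimal Channel Access client in Python using the pyepics package gives a feel for the kind of software interface such a controller exposes; the process variable names below are hypothetical, and this is not the CIEMAT/UPM prototype code.

```python
# Minimal EPICS Channel Access client sketch with pyepics (PV names are
# hypothetical; the actual fast PSC prototype runs EPICS IOCs on PXI hardware).
from epics import PV

sample_rate = PV("FPSC:DAQ:SampleRate")   # hypothetical process variables
waveform = PV("FPSC:DAQ:Channel01")

sample_rate.put(10000)                    # request a 10 kS/s acquisition
data = waveform.get()                     # read the latest acquired waveform
if data is not None:
    print(f"received {len(data)} samples")
```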
Abstract:
Laminated glass is composed of two glass layers and a thin intermediate PVB layer, whose viscoelastic behaviour strongly influences the dynamic response. While natural frequencies are relatively easily identified even with simplified FE models, damping ratios are not identified with such ease. In order to determine to what extent external factors influence damping identification, different tests have been carried out. The external factors considered, apart from temperature, are the accelerometers, their connection cables and the effect of the glass layers. To analyse the influence of the accelerometers and their connection cables, a laser measuring device was employed, considering three configurations: sample without instrumentation, sample with the accelerometers fixed, and sample completely instrumented. When the sample is completely instrumented, the accelerometer readings are also analysed. To take into consideration the effect of the glass layers, tests were carried out on both laminated-glass and monolithic samples. This paper presents an in-depth analysis of the data from the different configurations and establishes criteria for data acquisition when testing laminated glass.
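As background for the damping discussion, a standard baseline (not the identification procedure used in the paper) is the half-power, -3 dB bandwidth method applied to a frequency response function; the sketch below runs it on a synthetic single-mode FRF with made-up parameters.

```python
# Half-power (-3 dB) bandwidth sketch for damping-ratio identification on a
# synthetic single-mode FRF (a standard baseline, not the paper's procedure).
import numpy as np

fn, zeta = 35.0, 0.012                     # natural frequency (Hz) and damping ratio
f = np.linspace(20, 50, 20001)
r = f / fn
H = 1.0 / np.sqrt((1 - r**2) ** 2 + (2 * zeta * r) ** 2)   # normalised FRF magnitude

peak = np.argmax(H)
half_power = H[peak] / np.sqrt(2)
above = np.where(H >= half_power)[0]       # indices inside the -3 dB band
f1, f2 = f[above[0]], f[above[-1]]

zeta_est = (f2 - f1) / (2 * f[peak])       # zeta ~ bandwidth / (2 * peak frequency)
print(f"true damping ratio {zeta:.4f}, estimated {zeta_est:.4f}")
```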
Abstract:
New trends in biometrics are oriented towards mobile devices in order to increase the overall security of daily actions such as bank account access, e-commerce or even document protection within the mobile. However, applying biometrics to mobile devices implies challenges in biometric data acquisition, feature extraction and private data storage. Specifically, this paper addresses the problem of hand segmentation given a picture of the hand against an unknown background, requiring an accurate result in terms of hand isolation. For the sake of user acceptability, no restrictions are imposed on the background, so hand images can be taken without any constraint, which makes segmentation a demanding task. Multiscale aggregation strategies are proposed to solve this problem, given their accurate results in unconstrained and complicated scenarios together with their time performance. The method is evaluated with a public synthetic database of 480,000 images covering different backgrounds and illumination environments. The results obtained in terms of accuracy and time performance show that it is a suitable solution for the problem of hand segmentation in contact-less environments, outperforming competing methods in the literature such as Lossy Data Compression image segmentation (LDC).
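For contrast with the multiscale aggregation approach, a naive skin-colour baseline (explicitly not the paper's method) shows how simple segmentation can be attempted and why it is brittle on unconstrained backgrounds; the thresholds are illustrative and OpenCV 4.x is assumed.

```python
# Naive skin-colour baseline for hand segmentation (NOT the paper's multiscale
# aggregation method; thresholds are illustrative and background-dependent,
# which is exactly the limitation the paper's approach targets). OpenCV 4.x.
import cv2
import numpy as np

def naive_hand_mask(bgr_image):
    """Return a binary mask of the largest skin-coloured region."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((9, 9), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    if contours:
        largest = max(contours, key=cv2.contourArea)
        cv2.drawContours(clean, [largest], -1, 255, thickness=cv2.FILLED)
    return clean

# Usage: mask = naive_hand_mask(cv2.imread("hand.jpg"))
```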
Abstract:
Crop diseases are sometimes related to the irradiance that the crop receives. When an experiment requires irradiance measurements, it usually results in an expensive data acquisition system, and if many test points need to be checked, the use of traditional sensors further increases the cost of the experiment. By using low-cost sensors based on the photovoltaic effect, it is possible to perform precise irradiance measurements at a reduced price. This work presents an experiment performed in Ademuz (Valencia, Spain) in September 2011 to check the validity of low-cost sensors based on solar cells.
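The physical principle behind such low-cost sensors can be sketched in a few lines: the short-circuit current of a calibrated cell is roughly proportional to irradiance, with a small temperature correction. The calibration constants below are illustrative, not those of the sensors used in the experiment.

```python
# Sketch of low-cost irradiance estimation from a calibrated reference cell:
# short-circuit current is approximately proportional to irradiance, with a
# small temperature correction (calibration constants are illustrative).
G_STC = 1000.0          # reference irradiance at standard test conditions (W/m^2)
ISC_STC = 0.152         # hypothetical short-circuit current of this cell at STC (A)
ALPHA_REL = 0.0005      # hypothetical relative temperature coefficient of Isc (1/degC)
T_STC = 25.0            # STC cell temperature (degC)

def irradiance(isc_measured, cell_temp_c):
    """Estimate irradiance (W/m^2) from measured Isc and cell temperature."""
    isc_corrected = isc_measured / (1.0 + ALPHA_REL * (cell_temp_c - T_STC))
    return G_STC * isc_corrected / ISC_STC

print(f"{irradiance(0.121, 41.0):.0f} W/m^2")
```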
Simulation of ship manoeuvres with non-conventional propulsion systems in shallow waters
Abstract:
Increasingly demanding requirements in terms of missions, operational and environmental constraints, as well as new technologies, constantly challenge naval architects to generate alternative ship designs and assess their performance in the early stages of the project. That is the case of the Riverine Support Patrol Vessels (RSPV, Spanish designation PAF-P), designed and built by COTECMAR for the Colombian Navy. The RSPV are riverine ships whose beam-to-draft ratio exceeds that of most existing ships (B/T = 9.5), mainly due to the restrictions on draft imposed by the shallow-water environment. The ships are equipped with azimuthal propulsion systems of the "Pump-Jet" type. The peculiarities of the ship and of the operational environment, characterized by tropical rivers whose depth varies with the rainy and dry seasons, as well as the lack of channelization and the effect of the current, make manoeuvrability and controllability fundamental to the fulfilment of the mission; in addition, there are no validated mathematical models available to predict, in the early design stages, the manoeuvrability of such ships together with the associated water-depth effects.
This dissertation addresses the development of a mathematical model for the simulation of shallow-water manoeuvring of ships with a high beam-to-draft ratio and azimuthal "Pump-Jet" propulsion, whose jet not only delivers the thrust required to drive the ship forward but also generates the steering force as a function of its orientation angle, eliminating the need for rudders. The mathematical model has been validated against the results of the RSPV's full-scale manoeuvring trials, through a comparison of trajectories, time series of the most significant state variables, and turning-circle parameters such as turning diameter, tactical diameter, advance and transfer. The test plan was based on Design of Experiments (DOE) techniques to rationalize the number of runs under different conditions of water depth, ship speed and jet orientation (rudder angle). As part of this research, and to minimize errors due to environmental effects and to inaccuracy of the measurement instruments, a data acquisition and processing system was developed following the ITTC guidelines.
The existing literature documents the negative effects of water depth on the manoeuvrability parameters of conventional ships (type S effect): the trajectories described in turning circles grow as the depth decreases. However, for ships with a high beam-to-draft ratio, B/T = 7.51 (Yoshimura et al., 1988) and B/T = 6.38 (Yasukawa et al., 1995), the opposite effect has been reported (type NS, non-standard). The latter effect had been observed in model-scale experiments but, until now, had not been validated in full-scale trials. In ships fitted with propellers and rudders, the type NS effect is attributed to the rudder force increasing more than the hull forces as the depth decreases; in the case under study, the phenomenon is associated with the better efficiency of the pump-jet as the vessel speed decreases, owing to the added hull resistance caused by the reduced water depth. The results of the full-scale trials validate the excellent performance of this class of ships, exceeding the existing manoeuvrability criteria, and show that the turning diameter and other manoeuvring characteristics improve with decreasing depth in ships with a high beam-to-draft ratio.
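The thesis model itself is not reproduced in the abstract; as a far simpler, generic illustration of how a turning manoeuvre is simulated from a constant steering input, the sketch below integrates a first-order Nomoto yaw model with Euler steps and reads off an approximate turning diameter. The coefficients and speed are invented, and pump-jet and shallow-water dynamics are not modelled.

```python
# Generic first-order Nomoto yaw model integrated with Euler steps to trace a
# turning circle for a constant steering angle (made-up coefficients; this is
# NOT the thesis model and ignores pump-jet and shallow-water effects).
import math

K, T = 0.08, 12.0          # Nomoto gain (1/s per rad) and time constant (s)
U = 4.0                    # ship speed (m/s)
delta = math.radians(20)   # constant jet/rudder deflection

dt, r, psi, x, y = 0.1, 0.0, 0.0, 0.0, 0.0
track = []
for step in range(int(600 / dt)):          # 10 minutes of simulated time
    r += dt * (K * delta - r) / T          # yaw-rate dynamics: T*r' + r = K*delta
    psi += dt * r                          # heading
    x += dt * U * math.cos(psi)            # track in the horizontal plane
    y += dt * U * math.sin(psi)
    track.append((x, y))

ys = [p[1] for p in track]
print(f"approximate turning diameter: {max(ys) - min(ys):.0f} m")
```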
Abstract:
Machine learning techniques are used to extract valuable knowledge from data. Nowadays, these techniques are becoming even more important due to the evolution of data acquisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied by advances in machine learning techniques that solve the new challenges that might arise, in both academic and real applications. There are several machine learning techniques depending on both the data characteristics and the purpose. Unsupervised classification, or clustering, is one of the best-known techniques when data lack supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. Supervised classification, on the other hand, needs data with supervision (labeled data) and its aim is to make predictions about the labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also related tasks such as validation. When only some of the available data are labeled while the rest remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of the ignorance or cost of the labeling process, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., discovering clusters using the available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of the data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. A recent clustering tendency, related to data relevance and called subspace clustering, claims that different clusters might be described by different feature subsets. This differs from traditional solutions to the data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process.
The proximity of this work to clustering leads to the first goal of this thesis. As mentioned above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on the clustering algorithms and the data characteristics. Hence, for the first goal, three well-known clustering algorithms are used to cluster data with outliers and noise, in order to study critically how some of the best-known validation indices behave. The main goal of this work, however, is to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated using either known indices or expert opinions. Two algorithms are proposed, from different points of view, to discover clusters characterized by different subspaces. In the first algorithm, the available data labels are used to search for subspaces first, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping the known labels to subspaces using supervised classification techniques; the subspaces are then used to find clusters with traditional clustering techniques (a sketch of this idea is given below). The second algorithm uses the available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster with a membership probability (soft clustering) and is based on integrating the known labels and the search for subspaces into a model-based clustering approach. The different proposals are tested using various real and synthetic databases, and comparisons with other methods are included where appropriate.
Finally, as an example of a real and current application, different machine learning techniques, including one of the proposals of this work (the most sophisticated one), are applied to one of the most challenging biological problems nowadays: human brain modeling. Specifically, expert neuroscientists do not agree on a neuron classification for the cerebral cortex, which not only prevents any modeling attempt but also hampers day-to-day work, since there is no common way to name neurons. Machine learning techniques may therefore help to reach an accepted solution to this problem, which could be an important milestone for future research in neuroscience.
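A rough sketch of the idea behind the first proposed algorithm, as described above: the labelled subset is used to find a relevant feature subspace (here with random-forest feature importances as a stand-in choice), and a traditional clustering algorithm is then run in that subspace. The data are synthetic and the specific components are not those of the thesis.

```python
# Sketch of the first algorithm's idea: learn a feature subspace from the
# labelled subset, then run traditional (hard) clustering in that subspace.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# Only 10% of the data keep their labels (partially labelled scenario).
rng = np.random.default_rng(0)
labelled = rng.choice(len(X), size=60, replace=False)

# Step 1: map the known labels to a feature subspace.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X[labelled], y[labelled])
subspace = np.argsort(forest.feature_importances_)[-4:]   # keep the 4 strongest features

# Step 2: hard clustering of all data inside that subspace.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X[:, subspace])
print("cluster sizes:", np.bincount(clusters))
```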