187 resultados para Preprocessing
Resumo:
Standard stereotaxic reference systems play a key role in human brain studies. Stereotaxic coordinate systems have also been developed for experimental animals including non-human primates, dogs, and rodents. However, they are lacking for other species being relevant in experimental neuroscience including sheep. Here, we present a spatial, unbiased ovine brain template with tissue probability maps (TPM) that offer a detailed stereotaxic reference frame for anatomical features and localization of brain areas, thereby enabling inter-individual and cross-study comparability. Three-dimensional data sets from healthy adult Merino sheep (Ovis orientalis aries, 12 ewes and 26 neutered rams) were acquired on a 1.5 T Philips MRI using a T1w sequence. Data were averaged by linear and non-linear registration algorithms. Moreover, animals were subjected to detailed brain volume analysis including examinations with respect to body weight (BW), age, and sex. The created T1w brain template provides an appropriate population-averaged ovine brain anatomy in a spatial standard coordinate system. Additionally, TPM for gray (GM) and white (WM) matter as well as cerebrospinal fluid (CSF) classification enabled automatic prior-based tissue segmentation using statistical parametric mapping (SPM). Overall, a positive correlation of GM volume and BW explained about 15% of the variance of GM while a positive correlation between WM and age was found. Absolute tissue volume differences were not detected, indeed ewes showed significantly more GM per bodyweight as compared to neutered rams. The created framework including spatial brain template and TPM represent a useful tool for unbiased automatic image preprocessing and morphological characterization in sheep. Therefore, the reported results may serve as a starting point for further experimental and/or translational research aiming at in vivo analysis in this species.
Resumo:
In this paper, we describe NewsCATS (news categorization and trading system), a system implemented to predict stock price trends for the time immediately after the publication of press releases. NewsCATS consists mainly of three components. The first component retrieves relevant information from press releases through the application of text preprocessing techniques. The second component sorts the press releases into predefined categories. Finally, appropriate trading strategies are derived by the third component by means of the earlier categorization. The findings indicate that a categorization of press releases is able to provide additional information that can be used to forecast stock price trends, but that an adequate trading strategy is essential for the results of the categorization to be fully exploited.
Resumo:
Random Forests™ is reported to be one of the most accurate classification algorithms in complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis Random Forests was applied to a large-scale lung cancer case-control study. A novel way of automatically selecting prognostic factors was proposed. Also, synthetic positive control was used to validate Random Forests method. Throughout this study we showed that Random Forests can deal with large number of weak input variables without overfitting. It can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. ^ Random Forests can deal with the large-scale data sets without rigorous data preprocessing. It has robust variable importance ranking measure. Proposed is a novel variable selection method in context of Random Forests that uses the data noise level as the cut-off value to determine the subset of the important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors for complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. ^ When the data set had high variables to observations ratio, Random Forests complemented the established logistic regression. This study suggested that Random Forests is recommended for such high dimensionality data. One can use Random Forests to select the important variables and then use logistic regression or Random Forests itself to estimate the effect size of the predictors and to classify new observations. ^ We also found that the mean decrease of accuracy is a more reliable variable ranking measurement than mean decrease of Gini. ^
Resumo:
The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (ie, without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
Resumo:
Geostrophic surface velocities can be derived from the gradients of the mean dynamic topography-the difference between the mean sea surface and the geoid. Therefore, independently observed mean dynamic topography data are valuable input parameters and constraints for ocean circulation models. For a successful fit to observational dynamic topography data, not only the mean dynamic topography on the particular ocean model grid is required, but also information about its inverse covariance matrix. The calculation of the mean dynamic topography from satellite-based gravity field models and altimetric sea surface height measurements, however, is not straightforward. For this purpose, we previously developed an integrated approach to combining these two different observation groups in a consistent way without using the common filter approaches (Becker et al. in J Geodyn 59(60):99-110, 2012, doi:10.1016/j.jog.2011.07.0069; Becker in Konsistente Kombination von Schwerefeld, Altimetrie und hydrographischen Daten zur Modellierung der dynamischen Ozeantopographie, 2012, http://nbn-resolving.de/nbn:de:hbz:5n-29199). Within this combination method, the full spectral range of the observations is considered. Further, it allows the direct determination of the normal equations (i.e., the inverse of the error covariance matrix) of the mean dynamic topography on arbitrary grids, which is one of the requirements for ocean data assimilation. In this paper, we report progress through selection and improved processing of altimetric data sets. We focus on the preprocessing steps of along-track altimetry data from Jason-1 and Envisat to obtain a mean sea surface profile. During this procedure, a rigorous variance propagation is accomplished, so that, for the first time, the full covariance matrix of the mean sea surface is available. The combination of the mean profile and a combined GRACE/GOCE gravity field model yields a mean dynamic topography model for the North Atlantic Ocean that is characterized by a defined set of assumptions. We show that including the geodetically derived mean dynamic topography with the full error structure in a 3D stationary inverse ocean model improves modeled oceanographic features over previous estimates.
Resumo:
The Imbrie and Kipp transfer function method (IKM) and the modern analog technique (MAT) are accepted tools for quantitative paleoenvironmental reconstructions. However, no uncomplicated, flexible software has been available to apply these methods on modern computer devices. For this reason the software packages PaleoToolBox, MacTransfer, WinTransfer, MacMAT, and PanPlot have been developed. The PaleoToolBox package provides a flexible tool for the preprocessing of microfossil reference and downcore data as well as hydrographic reference parameters. It includes procedures to randomize the raw databases; to switch specific species in or out of the total species list; to establish individual ranking systems and their application on the reference and downcore databasessemi; and to convert the prepared databases into the file formats of IKM and MAT software for estimation of paleohydrographic parameters.
Resumo:
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Espanola: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. In order to evaluate the system a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
Resumo:
La diabetes mellitus es una enfermedad que se caracteriza por la nula o insuficiente producción de insulina, o la resistencia del organismo a la misma. La insulina es una hormona que ayuda a que la glucosa (por ejemplo la obtenida a partir de los alimentos ingeridos) llegue a los tejidos periféricos y al sistema nervioso para suministrar energía. Hoy en día la tecnología actual permite abordar el desarrollo del llamado “páncreas endocrino artificial”, que consta de un sensor continuo de glucosa subcutánea, una bomba de infusión subcutánea de insulina y un algoritmo de control en lazo cerrado que calcule la dosis de insulina requerida por el paciente en cada momento, según la medida de glucosa obtenida por el sensor y según unos objetivos. El mayor problema que presentan los sistemas de control en lazo cerrado son los retardos, el sensor de glucosa subcutánea mide la glucosa del líquido intersticial, que representa la que hubo en la sangre un tiempo atrás, por tanto, un cambio en los niveles de glucosa en la sangre, debidos por ejemplo, a una ingesta, tardaría un tiempo en ser detectado por el sensor. Además, una dosis de insulina suministrada al paciente, tarda un tiempo aproximado de 20-30 minutos para la llegar a la sangre. Para evitar trabajar en la medida que sea posible con estos retardos, se intenta predecir cuál será el nivel de glucosa en un futuro próximo, para ello se utilizara un predictor de glucosa subcutánea, con la información disponible de glucosa e insulina. El objetivo del proyecto es diseñar una metodología para estimar el valor futuro de los niveles de glucosa obtenida a partir de un sensor subcutáneo, basada en la identificación recursiva del sistema glucorregulatorio a través de modelos lineales y determinando un horizonte de predicción óptimo de trabajo y analizando la influencia de la insulina en los resultados de la predicción. Se ha implementado un predictor paramétrico basado en un modelo autorregresivo ARX que predice con mejor precisión y con menor RMSE que un predictor ZOH a un horizonte de predicción de treinta minutos. Utilizar información relativa a la insulina no tiene efecto en la predicción. El preprocesado, postprocesado y el tratamiento de la estabilidad tienen un efecto muy beneficioso en la predicción. Diabetes mellitusis a group of metabolic diseases in which a person has high blood sugar, either because the body does not produce enough insulin, or because cells do not respond to the insulin produced. The insulin is a hormone that helps the glucose to reach to outlying tissues and the nervous system to supply energy. Nowadays, the actual technology allows raising the development of the “artificial endocrine pancreas”. It involves a continuous glucose sensor, an insulin bump, and a full closed loop algorithm that calculate the insulin units required by patient at any time, according to the glucose measure obtained by the sensor and any target. The main problem of the full closed loop systems is the delays, the glucose sensor measures the glucose in the interstitial fluid that represents the glucose was in the blood some time ago. Because of this, a change in the glucose in blood would take some time to be detected by the sensor. In addition, insulin units administered by a patient take about 20-30 minutes to reach the blood stream. In order to avoid this effect, it will try to predict the glucose level in the near future. To do that, a subcutaneous glucose predictor is used to predict the future glucose with the information about insulin and glucose. The goal of the proyect is to design a method in order to estimate the future valor of glucose obtained by a subcutaneous sensor. It is based on the recursive identification of the regulatory system through the linear models, determining optimal prediction horizon and analyzing the influence of insuline on the prediction results. A parametric predictor based in ARX autoregressive model predicts with better precision and with lesser RMSE than ZOH predictor in a thirty minutes prediction horizon. Using the relative insulin information has no effect in the prediction. The preprocessing, the postprocessing and the stability treatment have many advantages in the prediction.
Resumo:
Sampling a network with a given probability distribution has been identified as a useful operation. In this paper we propose distributed algorithms for sampling networks, so that nodes are selected by a special node, called the source, with a given probability distribution. All these algorithms are based on a new class of random walks, that we call Random Centrifugal Walks (RCW). A RCW is a random walk that starts at the source and always moves away from it. Firstly, an algorithm to sample any connected network using RCW is proposed. The algorithm assumes that each node has a weight, so that the sampling process must select a node with a probability proportional to its weight. This algorithm requires a preprocessing phase before the sampling of nodes. In particular, a minimum diameter spanning tree (MDST) is created in the network, and then nodes weights are efficiently aggregated using the tree. The good news are that the preprocessing is done only once, regardless of the number of sources and the number of samples taken from the network. After that, every sample is done with a RCW whose length is bounded by the network diameter. Secondly, RCW algorithms that do not require preprocessing are proposed for grids and networks with regular concentric connectivity, for the case when the probability of selecting a node is a function of its distance to the source. The key features of the RCW algorithms (unlike previous Markovian approaches) are that (1) they do not need to warm-up (stabilize), (2) the sampling always finishes in a number of hops bounded by the network diameter, and (3) it selects a node with the exact probability distribution.
Resumo:
La medicina ha evolucionado de forma que las imágenes digitales tienen un papel de gran relevancia para llevar a cabo el diagnóstico de enfermedades. Son muchos y de diversa naturaleza los problemas que pueden presentar el aparato fonador. Un paso previo para la caracterización de imágenes digitales de la laringe es la segmentación de las cuerdas vocales. Hasta el momento se han desarrollado algoritmos que permiten la segmentación de la glotis. El presente proyecto pretende avanzar un paso más en el estudio, procurando asimismo la segmentación de las cuerdas vocales. Para ello, es necesario aprovechar la información de color que ofrecen las imágenes, pues es lo que va a determinar la diferencia entre una región y otra de la imagen. En este proyecto se ha desarrollado un novedoso método de segmentación de imágenes en color estroboscópicas de la laringe basado en el crecimiento de regiones a partir de píxeles-semilla. Debido a los problemas que presentan las imágenes obtenidas por la técnica de la estroboscopia, para conseguir óptimos resultados de la segmentación es necesario someter a las imágenes a un preprocesado, que consiste en la eliminación de altos brillos y aplicación de un filtro de difusión anisotrópica. Tras el preprocesado, comienza el crecimiento de la región a partir de unas semillas que se obtienen previamente. La condición de inclusión de un píxel en la región se basa en un parámetro de tolerancia que se determina de forma adaptativa. Este parámetro comienza teniendo un valor muy bajo y va aumentando de forma recursiva hasta alcanzar una condición de parada. Esta condición se basa en el análisis de la distribución estadística de los píxeles dentro de la región que va creciendo. La última fase del proyecto consiste en la realización de las pruebas necesarias para verificar el funcionamiento del sistema diseñado, obteniéndose buenos resultados en la segmentación de la glotis y resultados esperanzadores para seguir mejorando el sistema para la segmentación de las cuerdas vocales. ABSTRACT Medicine has evolved so that digital images have a very important role to perform disease diagnosis. There are wide variety of problems that can present the vocal apparatus. A preliminary step for characterization of digital images of the larynx is the segmentation of the vocal folds. To date, some algorithms that allow the segmentation of the glottis have been developed. This project aims to go one step further in the study, also seeking the segmentation of the vocal folds. To do this, we must use the color information offered by images, since this is what will determine the difference between different regions in a picture. In this project a novel method of larynx color images segmentation based on region growing from a pixel seed is developed. Due to the problems of the images obtained by the technique of stroboscopy, to achieve optimal results of the segmentation is necessary a preprocessing of the images, which involves the removal of high brightness and applying an anisotropic diffusion filter. After this preprocessing, the growth of the region from previously obtained seeds starts. The condition for inclusion of a pixel in the region is based on a tolerance parameter, which is adaptively determined. It initially has a low value and this is recursively increased until a stop condition is reached. This condition is based on the analysis of the statistical distribution of the pixels within the grown region. The last phase of the project involves the necessary tests to verify the proper working of the designed system, obtaining very good results in the segmentation of the glottis and encouraging results to keep improving the system for the segmentation of the vocal folds.
Resumo:
En este proyecto se ha desarrollado un código de MATLAB para el procesamiento de imágenes tomográficas 3D, de muestras de asfalto de carreteras en Polonia. Estas imágenes en 3D han sido tomadas por un equipo de investigación de la Universidad Tecnológica de Lodz (LUT). El objetivo de este proyecto es crear una herramienta que se pueda utilizar para estudiar las diferentes muestras de asfalto 3D y pueda servir para estudiar las pruebas de estrés que experimentan las muestras en el laboratorio. Con el objetivo final de encontrar soluciones a la degradación sufrida en las carreteras de Polonia, debido a diferentes causas, como son las condiciones meteorológicas. La degradación de las carreteras es un tema que se ha investigado desde hace muchos años, debido a la fuerte degradación causada por diferentes factores como son climáticos, la falta de mantenimiento o el tráfico excesivo en algunos casos. Es en Polonia, donde estos tres factores hacen que la composición de muchas carreteras se degrade rápidamente, sobre todo debido a las condiciones meteorológicas sufridas a lo largo del año, con temperaturas que van desde 30° C en verano a -20° C en invierno. Esto hace que la composición de las carreteras sufra mucho y el asfalto se levante, lo que aumenta los costos de mantenimiento y los accidentes de carretera. Este proyecto parte de la base de investigación que se lleva a cabo en la LUT, tratando de mejorar el análisis de las muestras de asfalto, por lo que se realizarán las pruebas de estrés y encontrar soluciones para mejorar el asfalto en las carreteras polacas. Esto disminuiría notablemente el costo de mantenimiento. A pesar de no entrar en aspectos muy técnicos sobre el asfalto y su composición, se ha necesitado realizar un estudio profundo sobre todas sus características, para crear un código capaz de obtener los mejores resultados. Por estas razones, se ha desarrollado en Matlab, los algoritmos que permiten el estudio de los especímenes 3D de asfalto. Se ha utilizado este software, ya que Matlab es una poderosa herramienta matemática que permite operar con matrices para realización de operaciones rápidamente, permitiendo desarrollar un código específico para el tratamiento y procesamiento de imágenes en 3D. Gracias a esta herramienta, estos algoritmos realizan procesos tales como, la segmentación de la imagen 3D, pre y post procesamiento de la imagen, filtrado o todo tipo de análisis microestructural de las muestras de asfalto que se están estudiando. El código presentado para la segmentación de las muestras de asfalto 3D es menos complejo en su diseño y desarrollo, debido a las herramientas de procesamiento de imágenes que incluye Matlab, que facilitan significativamente la tarea de programación, así como el método de segmentación utilizado. Respecto al código, este ha sido diseñado teniendo en cuenta el objetivo de facilitar el trabajo de análisis y estudio de las imágenes en 3D de las muestras de asfalto. Por lo tanto, el principal objetivo es el de crear una herramienta para el estudio de este código, por ello fue desarrollado para que pueda ser integrado en un entorno visual, de manera que sea más fácil y simple su utilización. Ese es el motivo por el cual todos estos algoritmos y funciones, que ha sido desarrolladas, se integrarán en una herramienta visual que se ha desarrollado con el GUIDE de Matlab. Esta herramienta ha sido creada en colaboración con Jorge Vega, y fue desarrollada en su proyecto final de carrera, cuyo título es: Segmentación microestructural de Imágenes en 3D de la muestra de asfalto utilizando Matlab. En esta herramienta se ha utilizado todo las funciones programadas en este proyecto, y tiene el objetivo de desarrollar una herramienta que permita crear un entorno gráfico intuitivo y de fácil uso para el estudio de las muestras de 3D de asfalto. Este proyecto se ha dividido en 4 capítulos, en un primer lugar estará la introducción, donde se presentarán los aspectos más importante que se va a componer el proyecto. En el segundo capítulo se presentarán todos los datos técnicos que se han tenido que estudiar para desarrollar la herramienta, entre los que cabe los tres temas más importantes que se han estudiado en este proyecto: materiales asfálticos, los principios de la tomografías 3D y el procesamiento de imágenes. Esta será la base para el tercer capítulo, que expondrá la metodología utilizada en la elaboración del código, con la explicación del entorno de trabajo utilizado en Matlab y todas las funciones de procesamiento de imágenes utilizadas. Además, se muestra todo el código desarrollado, así como una descripción teórica de los métodos utilizados para el pre-procesamiento y segmentación de las imagenes en 3D. En el capítulo 4, se mostrarán los resultados obtenidos en el estudio de una de las muestras de asfalto, y, finalmente, el último capítulo se basa en las conclusiones sobre el desarrollo de este proyecto. En este proyecto se ha llevado han realizado todos los puntos que se establecieron como punto de partida en el anteproyecto para crear la herramienta, a pesar de que se ha dejado para futuros proyectos nuevas posibilidades de este codigo, como por ejemplo, la detección automática de las diferentes regiones de una muestra de asfalto debido a su composición. Como se muestra en este proyecto, las técnicas de procesamiento de imágenes se utilizan cada vez más en multitud áreas, como pueden ser industriales o médicas. En consecuencia, este tipo de proyecto tiene multitud de posibilidades, y pudiendo ser la base para muchas nuevas aplicaciones que se puedan desarrollar en un futuro. Por último, se concluye que este proyecto ha contribuido a fortalecer las habilidades de programación, ampliando el conocimiento de Matlab y de la teoría de procesamiento de imágenes. Del mismo modo, este trabajo proporciona una base para el desarrollo de un proyecto más amplio cuyo alcance será una herramienta que puedas ser utilizada por el equipo de investigación de la Universidad Tecnológica de Lodz y en futuros proyectos. ABSTRACT In this project has been developed one code in MATLAB to process X-ray tomographic 3D images of asphalt specimens. These images 3D has been taken by a research team of the Lodz University of Technology (LUT). The aim of this project is to create a tool that can be used to study differents asphalt specimen and can be used to study them after stress tests undergoing the samples. With the final goal to find solutions to the degradation suffered roads in Poland due to differents causes, like weather conditions. The degradation of the roads is an issue that has been investigated many years ago, due to strong degradation suffered caused by various factors such as climate, poor maintenance or excessive traffic in some cases. It is in Poland where these three factors make the composition of many roads degrade rapidly, especially due to the weather conditions suffered along the year, with temperatures ranging from 30 o C in summer to -20 ° C in winter. This causes the roads suffers a lot and asphalt rises shortly after putting, increasing maintenance costs and road accident. This project part of the base that research is taking place at the LUT, in order to better analyze the asphalt specimens, they are tested for stress and find solutions to improve the asphalt on Polish roads. This would decrease remarkable maintenance cost. Although this project will not go into the technical aspect as asphalt and composition, but it has been required a deep study about all of its features, to create a code able to obtain the best results. For these reasons, there have been developed in Matlab, algorithms that allow the study of 3D specimens of asphalt. Matlab is a powerful mathematical tool, which allows arrays operate fastly, allowing to develop specific code for the treatment and processing of 3D images. Thus, these algorithms perform processes such as the multidimensional matrix sgementation, pre and post processing with the same filtering algorithms or microstructural analysis of asphalt specimen which being studied. All these algorithms and function that has been developed to be integrated into a visual tool which it be developed with the GUIDE of Matlab. This tool has been created in the project of Jorge Vega which name is: Microstructural segmentation of 3D images of asphalt specimen using Matlab engine. In this tool it has been used all the functions programmed in this project, and it has the aim to develop an easy and intuitive graphical environment for the study of 3D samples of asphalt. This project has been divided into 4 chapters plus the introduction, the second chapter introduces the state-of-the-art of the three of the most important topics that have been studied in this project: asphalt materials, principle of X-ray tomography and image processing. This will be the base for the third chapter, which will outline the methodology used in developing the code, explaining the working environment of Matlab and all the functions of processing images used. In addition, it will be shown all the developed code created, as well as a theoretical description of the methods used for preprocessing and 3D image segmentation. In Chapter 4 is shown the results obtained from the study of one of the specimens of asphalt, and finally the last chapter draws the conclusions regarding the development of this project.
Resumo:
Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow the imaging of developmental processes in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate Terabytes of image data, the handling and analysis of which becomes a major bottleneck in extracting the targeted information. Here, we describe the current state of the art in computational image analysis in the zebrafish system. We discuss the challenges encountered when handling high-content image data, especially with regard to data quality, annotation, and storage. We survey methods for preprocessing image data for further analysis, and describe selected examples of automated image analysis, including the tracking of cells during embryogenesis, heartbeat detection, identification of dead embryos, recognition of tissues and anatomical landmarks, and quantification of behavioral patterns of adult fish. We review recent examples for applications using such methods, such as the comprehensive analysis of cell lineages during early development, the generation of a three-dimensional brain atlas of zebrafish larvae, and high-throughput drug screens based on movement patterns. Finally, we identify future challenges for the zebrafish image analysis community, notably those concerning the compatibility of algorithms and data formats for the assembly of modular analysis pipelines.
Resumo:
La obtención de energía a partir de la fusión nuclear por confinamiento magnético del plasma, es uno de los principales objetivos dentro de la comunidad científica dedicada a la energía nuclear. Desde la construcción del primer dispositivo de fusión, hasta la actualidad, se han llevado a cabo multitud de experimentos, que hoy en día, gran parte de ellos dan soporte al proyecto International Thermonuclear Experimental Reactor (ITER). El principal problema al que se enfrenta ITER, se basa en la monitorización y el control del plasma. Gracias a las nuevas tecnologías, los sistemas de instrumentación y control permiten acercarse más a la solución del problema, pero a su vez, es más complicado estandarizar los sistemas de adquisición de datos que se usan, no solo en ITER, sino en otros proyectos de igual complejidad. Desarrollar nuevas implementaciones hardware y software bajo los requisitos de los diagnósticos definidos por los científicos, supone una gran inversión de tiempo, retrasando la ejecución de nuevos experimentos. Por ello, la solución que plantea esta tesis, consiste en la definición de una metodología de diseño que permite implementar sistemas de adquisición de datos inteligentes y su fácil integración en entornos de fusión para la implementación de diagnósticos. Esta metodología requiere del uso de los dispositivos Reconfigurable Input/Output (RIO) y Flexible RIO (FlexRIO), que son sistemas embebidos basados en tecnología Field-Programmable Gate Array (FPGA). Para completar la metodología de diseño, estos dispositivos van a ser soportados por un software basado en EPICS Device Support utilizando la tecnología EPICS software asynDriver. Esta metodología se ha evaluado implementando prototipos para los controladores rápidos de planta de ITER, tanto para casos prácticos de ámbito general como adquisición de datos e imágenes, como para casos concretos como el diagnóstico del fission chamber, implementando pre-procesado en tiempo real. Además de casos prácticos, esta metodología se ha utilizado para implementar casos reales, como el Ion Source Hydrogen Positive (ISHP), desarrollada por el European Spallation Source (ESS Bilbao) y la Universidad del País Vasco. Finalmente, atendiendo a las necesidades que los experimentos en los entornos de fusión requieren, se ha diseñado un mecanismo mediante el cual los sistemas de adquisición de datos, que pueden ser implementados mediante la metodología de diseño propuesta, pueden integrar un reloj hardware capaz de sincronizarse con el protocolo IEEE1588-V2, permitiendo a estos, obtener los TimeStamps de las muestras adquiridas con una exactitud y precisión de decenas de nanosegundos y realizar streaming de datos con TimeStamps. ABSTRACT Fusion energy reaching by means of nuclear fusion plasma confinement is one of the main goals inside nuclear energy scientific community. Since the first fusion device was built, many experiments have been carried out and now, most of them give support to the International Thermonuclear Experimental Reactor (ITER) project. The main difficulty that ITER has to overcome is the plasma monitoring and control. Due to new technologies, the instrumentation and control systems allow an approaching to the solution, but in turn, the standardization of the used data acquisition systems, not only in ITER but also in other similar projects, is more complex. To develop new hardware and software implementations under scientific diagnostics requirements, entail time costs, delaying new experiments execution. Thus, this thesis presents a solution that consists in a design methodology definition, that permits the implementation of intelligent data acquisition systems and their easy integration into fusion environments for diagnostic purposes. This methodology requires the use of Reconfigurable Input/Output (RIO) and Flexible RIO (FlexRIO) devices, based on Field-Programmable Gate Array (FPGA) embedded technology. In order to complete the design methodology, these devices are going to be supported by an EPICS Device Support software, using asynDriver technology. This methodology has been evaluated implementing ITER PXIe fast controllers prototypes, as well as data and image acquisition, so as for concrete solutions like the fission chamber diagnostic use case, using real time preprocessing. Besides of these prototypes solutions, this methodology has been applied for the implementation of real experiments like the Ion Source Hydrogen Positive (ISHP), developed by the European Spallation Source and the Basque country University. Finally, a hardware mechanism has been designed to integrate a hardware clock into RIO/FlexRIO devices, to get synchronization with the IEEE1588-V2 precision time protocol. This implementation permits to data acquisition systems implemented under the defined methodology, to timestamp all data acquired with nanoseconds accuracy, permitting high throughput timestamped data streaming.
Diseño de algoritmos de guerra electrónica y radar para su implementación en sistemas de tiempo real
Resumo:
Esta tesis se centra en el estudio y desarrollo de algoritmos de guerra electrónica {electronic warfare, EW) y radar para su implementación en sistemas de tiempo real. La llegada de los sistemas de radio, radar y navegación al terreno militar llevó al desarrollo de tecnologías para combatirlos. Así, el objetivo de los sistemas de guerra electrónica es el control del espectro electomagnético. Una de la funciones de la guerra electrónica es la inteligencia de señales {signals intelligence, SIGINT), cuya labor es detectar, almacenar, analizar, clasificar y localizar la procedencia de todo tipo de señales presentes en el espectro. El subsistema de inteligencia de señales dedicado a las señales radar es la inteligencia electrónica {electronic intelligence, ELINT). Un sistema de tiempo real es aquel cuyo factor de mérito depende tanto del resultado proporcionado como del tiempo en que se da dicho resultado. Los sistemas radar y de guerra electrónica tienen que proporcionar información lo más rápido posible y de forma continua, por lo que pueden encuadrarse dentro de los sistemas de tiempo real. La introducción de restricciones de tiempo real implica un proceso de realimentación entre el diseño del algoritmo y su implementación en plataformas “hardware”. Las restricciones de tiempo real son dos: latencia y área de la implementación. En esta tesis, todos los algoritmos presentados se han implementado en plataformas del tipo field programmable gate array (FPGA), ya que presentan un buen compromiso entre velocidad, coste total, consumo y reconfigurabilidad. La primera parte de la tesis está centrada en el estudio de diferentes subsistemas de un equipo ELINT: detección de señales mediante un detector canalizado, extracción de los parámetros de pulsos radar, clasificación de modulaciones y localization pasiva. La transformada discreta de Fourier {discrete Fourier transform, DFT) es un detector y estimador de frecuencia quasi-óptimo para señales de banda estrecha en presencia de ruido blanco. El desarrollo de algoritmos eficientes para el cálculo de la DFT, conocidos como fast Fourier transform (FFT), han situado a la FFT como el algoritmo más utilizado para la detección de señales de banda estrecha con requisitos de tiempo real. Así, se ha diseñado e implementado un algoritmo de detección y análisis espectral para su implementación en tiempo real. Los parámetros más característicos de un pulso radar son su tiempo de llegada y anchura de pulso. Se ha diseñado e implementado un algoritmo capaz de extraer dichos parámetros. Este algoritmo se puede utilizar con varios propósitos: realizar un reconocimiento genérico del radar que transmite dicha señal, localizar la posición de dicho radar o bien puede utilizarse como la parte de preprocesado de un clasificador automático de modulaciones. La clasificación automática de modulaciones es extremadamente complicada en entornos no cooperativos. Un clasificador automático de modulaciones se divide en dos partes: preprocesado y el algoritmo de clasificación. Los algoritmos de clasificación basados en parámetros representativos calculan diferentes estadísticos de la señal de entrada y la clasifican procesando dichos estadísticos. Los algoritmos de localization pueden dividirse en dos tipos: triangulación y sistemas cuadráticos. En los algoritmos basados en triangulación, la posición se estima mediante la intersección de las rectas proporcionadas por la dirección de llegada de la señal. En cambio, en los sistemas cuadráticos, la posición se estima mediante la intersección de superficies con igual diferencia en el tiempo de llegada (time difference of arrival, TDOA) o diferencia en la frecuencia de llegada (frequency difference of arrival, FDOA). Aunque sólo se ha implementado la estimación del TDOA y FDOA mediante la diferencia de tiempos de llegada y diferencia de frecuencias, se presentan estudios exhaustivos sobre los diferentes algoritmos para la estimación del TDOA, FDOA y localización pasiva mediante TDOA-FDOA. La segunda parte de la tesis está dedicada al diseño e implementación filtros discretos de respuesta finita (finite impulse response, FIR) para dos aplicaciones radar: phased array de banda ancha mediante filtros retardadores (true-time delay, TTD) y la mejora del alcance de un radar sin modificar el “hardware” existente para que la solución sea de bajo coste. La operación de un phased array de banda ancha mediante desfasadores no es factible ya que el retardo temporal no puede aproximarse mediante un desfase. La solución adoptada e implementada consiste en sustituir los desfasadores por filtros digitales con retardo programable. El máximo alcance de un radar depende de la relación señal a ruido promedio en el receptor. La relación señal a ruido depende a su vez de la energía de señal transmitida, potencia multiplicado por la anchura de pulso. Cualquier cambio hardware que se realice conlleva un alto coste. La solución que se propone es utilizar una técnica de compresión de pulsos, consistente en introducir una modulación interna a la señal, desacoplando alcance y resolución. ABSTRACT This thesis is focused on the study and development of electronic warfare (EW) and radar algorithms for real-time implementation. The arrival of radar, radio and navigation systems to the military sphere led to the development of technologies to fight them. Therefore, the objective of EW systems is the control of the electromagnetic spectrum. Signals Intelligence (SIGINT) is one of the EW functions, whose mission is to detect, collect, analyze, classify and locate all kind of electromagnetic emissions. Electronic intelligence (ELINT) is the SIGINT subsystem that is devoted to radar signals. A real-time system is the one whose correctness depends not only on the provided result but also on the time in which this result is obtained. Radar and EW systems must provide information as fast as possible on a continuous basis and they can be defined as real-time systems. The introduction of real-time constraints implies a feedback process between the design of the algorithms and their hardware implementation. Moreover, a real-time constraint consists of two parameters: Latency and area of the implementation. All the algorithms in this thesis have been implemented on field programmable gate array (FPGAs) platforms, presenting a trade-off among performance, cost, power consumption and reconfigurability. The first part of the thesis is related to the study of different key subsystems of an ELINT equipment: Signal detection with channelized receivers, pulse parameter extraction, modulation classification for radar signals and passive location algorithms. The discrete Fourier transform (DFT) is a nearly optimal detector and frequency estimator for narrow-band signals buried in white noise. The introduction of fast algorithms to calculate the DFT, known as FFT, reduces the complexity and the processing time of the DFT computation. These properties have placed the FFT as one the most conventional methods for narrow-band signal detection for real-time applications. An algorithm for real-time spectral analysis for user-defined bandwidth, instantaneous dynamic range and resolution is presented. The most characteristic parameters of a pulsed signal are its time of arrival (TOA) and the pulse width (PW). The estimation of these basic parameters is a fundamental task in an ELINT equipment. A basic pulse parameter extractor (PPE) that is able to estimate all these parameters is designed and implemented. The PPE may be useful to perform a generic radar recognition process, perform an emitter location technique and can be used as the preprocessing part of an automatic modulation classifier (AMC). Modulation classification is a difficult task in a non-cooperative environment. An AMC consists of two parts: Signal preprocessing and the classification algorithm itself. Featurebased algorithms obtain different characteristics or features of the input signals. Once these features are extracted, the classification is carried out by processing these features. A feature based-AMC for pulsed radar signals with real-time requirements is studied, designed and implemented. Emitter passive location techniques can be divided into two classes: Triangulation systems, in which the emitter location is estimated with the intersection of the different lines of bearing created from the estimated directions of arrival, and quadratic position-fixing systems, in which the position is estimated through the intersection of iso-time difference of arrival (TDOA) or iso-frequency difference of arrival (FDOA) quadratic surfaces. Although TDOA and FDOA are only implemented with time of arrival and frequency differences, different algorithms for TDOA, FDOA and position estimation are studied and analyzed. The second part is dedicated to FIR filter design and implementation for two different radar applications: Wideband phased arrays with true-time delay (TTD) filters and the range improvement of an operative radar with no hardware changes to minimize costs. Wideband operation of phased arrays is unfeasible because time delays cannot be approximated by phase shifts. The presented solution is based on the substitution of the phase shifters by FIR discrete delay filters. The maximum range of a radar depends on the averaged signal to noise ratio (SNR) at the receiver. Among other factors, the SNR depends on the transmitted signal energy that is power times pulse width. Any possible hardware change implies high costs. The proposed solution lies in the use of a signal processing technique known as pulse compression, which consists of introducing an internal modulation within the pulse width, decoupling range and resolution.
Resumo:
Models for prediction of oil content as percentage of dried weight in olive fruits were comput- ed through PLS regression on NIR spectra. Spectral preprocessing was carried out by apply- ing multiplicative signal correction (MSC), Sa vitzky–Golay algorithm, standard normal variate correction (SNV), and detrending (D) to NIR spectra. MSC was the preprocessing technique showing the best performance. Further reduction of variability was performed by applying the Wold method of orthogonal signal correction (OSC). The calibration model achieved a R 2 of 0.93, a SEPc of 1.42, and a RPD of 3.8. The R 2 obtained with the validation set remained 0.93, and the SEPc was 1.41.