28 resultados para Multi-dimensional

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. Basically, it consists of determining the Markov blanket around each class variable using the HITON algorithm, then specifying the directionality over the MBC subgraphs. Our approach is applied to the prediction problem of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson’s patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, Yeast data set, as well as on a real-world Parkinson’s disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hoy en día, con la evolución continua y rápida de las tecnologías de la información y los dispositivos de computación, se recogen y almacenan continuamente grandes volúmenes de datos en distintos dominios y a través de diversas aplicaciones del mundo real. La extracción de conocimiento útil de una cantidad tan enorme de datos no se puede realizar habitualmente de forma manual, y requiere el uso de técnicas adecuadas de aprendizaje automático y de minería de datos. La clasificación es una de las técnicas más importantes que ha sido aplicada con éxito a varias áreas. En general, la clasificación se compone de dos pasos principales: en primer lugar, aprender un modelo de clasificación o clasificador a partir de un conjunto de datos de entrenamiento, y en segundo lugar, clasificar las nuevas instancias de datos utilizando el clasificador aprendido. La clasificación es supervisada cuando todas las etiquetas están presentes en los datos de entrenamiento (es decir, datos completamente etiquetados), semi-supervisada cuando sólo algunas etiquetas son conocidas (es decir, datos parcialmente etiquetados), y no supervisada cuando todas las etiquetas están ausentes en los datos de entrenamiento (es decir, datos no etiquetados). Además, aparte de esta taxonomía, el problema de clasificación se puede categorizar en unidimensional o multidimensional en función del número de variables clase, una o más, respectivamente; o también puede ser categorizado en estacionario o cambiante con el tiempo en función de las características de los datos y de la tasa de cambio subyacente. A lo largo de esta tesis, tratamos el problema de clasificación desde tres perspectivas diferentes, a saber, clasificación supervisada multidimensional estacionaria, clasificación semisupervisada unidimensional cambiante con el tiempo, y clasificación supervisada multidimensional cambiante con el tiempo. Para llevar a cabo esta tarea, hemos usado básicamente los clasificadores Bayesianos como modelos. La primera contribución, dirigiéndose al problema de clasificación supervisada multidimensional estacionaria, se compone de dos nuevos métodos de aprendizaje de clasificadores Bayesianos multidimensionales a partir de datos estacionarios. Los métodos se proponen desde dos puntos de vista diferentes. El primer método, denominado CB-MBC, se basa en una estrategia de envoltura de selección de variables que es voraz y hacia delante, mientras que el segundo, denominado MB-MBC, es una estrategia de filtrado de variables con una aproximación basada en restricciones y en el manto de Markov. Ambos métodos han sido aplicados a dos problemas reales importantes, a saber, la predicción de los inhibidores de la transcriptasa inversa y de la proteasa para el problema de infección por el virus de la inmunodeficiencia humana tipo 1 (HIV-1), y la predicción del European Quality of Life-5 Dimensions (EQ-5D) a partir de los cuestionarios de la enfermedad de Parkinson con 39 ítems (PDQ-39). El estudio experimental incluye comparaciones de CB-MBC y MB-MBC con los métodos del estado del arte de la clasificación multidimensional, así como con métodos comúnmente utilizados para resolver el problema de predicción de la enfermedad de Parkinson, a saber, la regresión logística multinomial, mínimos cuadrados ordinarios, y mínimas desviaciones absolutas censuradas. En ambas aplicaciones, los resultados han sido prometedores con respecto a la precisión de la clasificación, así como en relación al análisis de las estructuras gráficas que identifican interacciones conocidas y novedosas entre las variables. La segunda contribución, referida al problema de clasificación semi-supervisada unidimensional cambiante con el tiempo, consiste en un método nuevo (CPL-DS) para clasificar flujos de datos parcialmente etiquetados. Los flujos de datos difieren de los conjuntos de datos estacionarios en su proceso de generación muy rápido y en su aspecto de cambio de concepto. Es decir, los conceptos aprendidos y/o la distribución subyacente están probablemente cambiando y evolucionando en el tiempo, lo que hace que el modelo de clasificación actual sea obsoleto y deba ser actualizado. CPL-DS utiliza la divergencia de Kullback-Leibler y el método de bootstrapping para cuantificar y detectar tres tipos posibles de cambio: en las predictoras, en la a posteriori de la clase o en ambas. Después, si se detecta cualquier cambio, un nuevo modelo de clasificación se aprende usando el algoritmo EM; si no, el modelo de clasificación actual se mantiene sin modificaciones. CPL-DS es general, ya que puede ser aplicado a varios modelos de clasificación. Usando dos modelos diferentes, el clasificador naive Bayes y la regresión logística, CPL-DS se ha probado con flujos de datos sintéticos y también se ha aplicado al problema real de la detección de código malware, en el cual los nuevos ficheros recibidos deben ser continuamente clasificados en malware o goodware. Los resultados experimentales muestran que nuestro método es efectivo para la detección de diferentes tipos de cambio a partir de los flujos de datos parcialmente etiquetados y también tiene una buena precisión de la clasificación. Finalmente, la tercera contribución, sobre el problema de clasificación supervisada multidimensional cambiante con el tiempo, consiste en dos métodos adaptativos, a saber, Locally Adpative-MB-MBC (LA-MB-MBC) y Globally Adpative-MB-MBC (GA-MB-MBC). Ambos métodos monitorizan el cambio de concepto a lo largo del tiempo utilizando la log-verosimilitud media como métrica y el test de Page-Hinkley. Luego, si se detecta un cambio de concepto, LA-MB-MBC adapta el actual clasificador Bayesiano multidimensional localmente alrededor de cada nodo cambiado, mientras que GA-MB-MBC aprende un nuevo clasificador Bayesiano multidimensional. El estudio experimental realizado usando flujos de datos sintéticos multidimensionales indica los méritos de los métodos adaptativos propuestos. ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from an available training data, and secondly, classify the new incoming unseen data instances using the learned classifier. Classification is supervised when the whole class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when the whole class values are missing in the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or can be also categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Through this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised unidimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both considered case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from the stationary data sets by their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out-of-date requiring to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general as it can be applied to several classification models. Using two different models, namely, naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where the new received files should be continuously classified into malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having a good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. Experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Interneuron classification is an important and long-debated topic in neuroscience. A recent study provided a data set of digitally reconstructed interneurons classified by 42 leading neuroscientists according to a pragmatic classification scheme composed of five categorical variables, namely, of the interneuron type and four features of axonal morphology. From this data set we now learned a model which can classify interneurons, on the basis of their axonal morphometric parameters, into these five descriptive variables simultaneously. Because of differences in opinion among the neuroscientists, especially regarding neuronal type, for many interneurons we lacked a unique, agreed-upon classification, which we could use to guide model learning. Instead, we guided model learning with a probability distribution over the neuronal type and the axonal features, obtained, for each interneuron, from the neuroscientists’ classification choices. We conveniently encoded such probability distributions with Bayesian networks, calling them label Bayesian networks (LBNs), and developed a method to predict them. This method predicts an LBN by forming a probabilistic consensus among the LBNs of the interneurons most similar to the one being classified. We used 18 axonal morphometric parameters as predictor variables, 13 of which we introduce in this paper as quantitative counterparts to the categorical axonal features. We were able to accurately predict interneuronal LBNs. Furthermore, when extracting crisp (i.e., non-probabilistic) predictions from the predicted LBNs, our method outperformed related work on interneuron classification. Our results indicate that our method is adequate for multi-dimensional classification of interneurons with probabilistic labels. Moreover, the introduced morphometric parameters are good predictors of interneuron type and the four features of axonal morphology and thus may serve as objective counterparts to the subjective, categorical axonal features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Interneuron classification is an important and long-debated topic in neuroscience. A recent study provided a data set of digitally reconstructed interneurons classified by 42 leading neuroscientists according to a pragmatic classification scheme composed of five categorical variables, namely, of the interneuron type and four features of axonal morphology. From this data set we now learned a model which can classify interneurons, on the basis of their axonal morphometric parameters, into these five descriptive variables simultaneously. Because of differences in opinion among the neuroscientists, especially regarding neuronal type, for many interneurons we lacked a unique, agreed-upon classification, which we could use to guide model learning. Instead, we guided model learning with a probability distribution over the neuronal type and the axonal features, obtained, for each interneuron, from the neuroscientists’ classification choices. We conveniently encoded such probability distributions with Bayesian networks, calling them label Bayesian networks (LBNs), and developed a method to predict them. This method predicts an LBN by forming a probabilistic consensus among the LBNs of the interneurons most similar to the one being classified. We used 18 axonal morphometric parameters as predictor variables, 13 of which we introduce in this paper as quantitative counterparts to the categorical axonal features. We were able to accurately predict interneuronal LBNs. Furthermore, when extracting crisp (i.e., non-probabilistic) predictions from the predicted LBNs, our method outperformed related work on interneuron classification. Our results indicate that our method is adequate for multi-dimensional classification of interneurons with probabilistic labels. Moreover, the introduced morphometric parameters are good predictors of interneuron type and the four features of axonal morphology and thus may serve as objective counterparts to the subjective, categorical axonal features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The multi-dimensional classification problem is a generalisation of the recently-popularised task of multi-label classification, where each data instance is associated with multiple class variables. There has been relatively little research carried out specific to multi-dimensional classification and, although one of the core goals is similar (modelling dependencies among classes), there are important differences; namely a higher number of possible classifications. In this paper we present method for multi-dimensional classification, drawing from the most relevant multi-label research, and combining it with important novel developments. Using a fast method to model the conditional dependence between class variables, we form super-class partitions and use them to build multi-dimensional learners, learning each super-class as an ordinary class, and thus explicitly modelling class dependencies. Additionally, we present a mechanism to deal with the many class values inherent to super-classes, and thus make learning efficient. To investigate the effectiveness of this approach we carry out an empirical evaluation on a range of multi-dimensional datasets, under different evaluation metrics, and in comparison with high-performing existing multi-dimensional approaches from the literature. Analysis of results shows that our approach offers important performance gains over competing methods, while also exhibiting tractable running time.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The impact of the Parkinson's disease and its treatment on the patients' health-related quality of life can be estimated either by means of generic measures such as the european quality of Life-5 Dimensions (EQ-5D) or specific measures such as the 8-item Parkinson's disease questionnaire (PDQ-8). In clinical studies, PDQ-8 could be used in detriment of EQ-5D due to the lack of resources, time or clinical interest in generic measures. Nevertheless, PDQ-8 cannot be applied in cost-effectiveness analyses which require generic measures and quantitative utility scores, such as EQ-5D. To deal with this problem, a commonly used solution is the prediction of EQ-5D from PDQ-8. In this paper, we propose a new probabilistic method to predict EQ-5D from PDQ-8 using multi-dimensional Bayesian network classifiers. Our approach is evaluated using five-fold cross-validation experiments carried out on a Parkinson's data set containing 488 patients, and is compared with two additional Bayesian network-based approaches, two commonly used mapping methods namely, ordinary least squares and censored least absolute deviations, and a deterministic model. Experimental results are promising in terms of predictive performance as well as the identification of dependence relationships among EQ-5D and PDQ-8 items that the mapping approaches are unable to detect

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-dimensional classification (MDC) is the supervised learning problem where an instance is associated with multiple classes, rather than with a single class, as in traditional classification problems. Since these classes are often strongly correlated, modeling the dependencies between them allows MDC methods to improve their performance – at the expense of an increased computational cost. In this paper we focus on the classifier chains (CC) approach for modeling dependencies, one of the most popular and highest-performing methods for multi-label classification (MLC), a particular case of MDC which involves only binary classes (i.e., labels). The original CC algorithm makes a greedy approximation, and is fast but tends to propagate errors along the chain. Here we present novel Monte Carlo schemes, both for finding a good chain sequence and performing efficient inference. Our algorithms remain tractable for high-dimensional data sets and obtain the best predictive performance across several real data sets.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

This paper proposes a new multi-objective estimation of distribution algorithm (EDA) based on joint modeling of objectives and variables. This EDA uses the multi-dimensional Bayesian network as its probabilistic model. In this way it can capture the dependencies between objectives, variables and objectives, as well as the dependencies learnt between variables in other Bayesian network-based EDAs. This model leads to a problem decomposition that helps the proposed algorithm to find better trade-off solutions to the multi-objective problem. In addition to Pareto set approximation, the algorithm is also able to estimate the structure of the multi-objective problem. To apply the algorithm to many-objective problems, the algorithm includes four different ranking methods proposed in the literature for this purpose. The algorithm is applied to the set of walking fish group (WFG) problems, and its optimization performance is compared with an evolutionary algorithm and another multi-objective EDA. The experimental results show that the proposed algorithm performs significantly better on many of the problems and for different objective space dimensions, and achieves comparable results on some compared with the other algorithms.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Probabilistic modeling is the de�ning characteristic of estimation of distribution algorithms (EDAs) which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. `1-regularization is a type of this technique with the appealing variable selection property which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from di�erent aspects when used for optimization in a high-dimensional setting, where the population size of EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve signi�cantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random �eld model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, speci�cally models inspired from multi-dimensional Bayesian network classi�ers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objectivevariable and objective-objective relationships. An extensive experimental study shows the e�ectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely �-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on `1-regularization for multi-objective feature subset selection in classi�cation, where six di�erent measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two di�erent Bayesian classi�ers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Global linear instability theory is concerned with the temporal or spatial development of small-amplitude perturbations superposed upon laminar steady or time-periodic threedimensional flows, which are inhomogeneous in two (and periodic in one) or all three spatial directions.1 The theory addresses flows developing in complex geometries, in which the parallel or weakly nonparallel basic flow approximation invoked by classic linear stability theory does not hold. As such, global linear theory is called to fill the gap in research into stability and transition in flows over or through complex geometries. Historically, global linear instability has been (and still is) concerned with solution of multi-dimensional eigenvalue problems; the maturing of non-modal linear instability ideas in simple parallel flows during the last decade of last century2–4 has given rise to investigation of transient growth scenarios in an ever increasing variety of complex flows. After a brief exposition of the theory, connections are sought with established approaches for structure identification in flows, such as the proper orthogonal decomposition and topology theory in the laminar regime and the open areas for future research, mainly concerning turbulent and three-dimensional flows, are highlighted. Recent results obtained in our group are reported in both the time-stepping and the matrix-forming approaches to global linear theory. In the first context, progress has been made in implementing a Jacobian-Free Newton Krylov method into a standard finite-volume aerodynamic code, such that global linear instability results may now be obtained in compressible flows of aeronautical interest. In the second context a new stable very high-order finite difference method is implemented for the spatial discretization of the operators describing the spatial BiGlobal EVP, PSE-3D and the TriGlobal EVP; combined with sparse matrix treatment, all these problems may now be solved on standard desktop computers.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Synthetic Aperture Radar (SAR) images a target region reflectivity function in the multi-dimensional spatial domain of range and cross-range. SAR synthesizes a large aperture radar in order to achieve finer azimuth resolution than the one provided by any on-board real antenna. Conventional SAR techniques assume a single reflection of transmitted waveforms from targets. Nevertheless, today¿s new scenes force SAR systems to work in urban environments. Consequently, multiple-bounce returns are added to direct-scatter echoes. We refer to these as ghost images, since they obscure true target image and lead to poor resolution. By analyzing the quadratic phase error (QPE), this paper demonstrates that Earth¿s curvature influences the defocusing degree of multipath returns. In addition to the QPE, other parameters such as integrated sidelobe ratio (ISLR), peak sidelobe ratio (PSLR), contrast and entropy provide us with the tools to identify direct-scatter echoes in images containing undesired returns coming from multipath.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Synthetic Aperture Radar (SAR) images a target region reflectivity function in the multi-dimensional spatial domain of range and cross-range. SAR synthesizes a large aperture radar in order to achieve a finer azimuth resolution than the one provided by any on-board real antenna. Conventional SAR techniques assume a single reflection of transmitted waveforms from targets. Nevertheless, today¿s new scenes force SAR systems to work in urban environments. Consequently, multiple-bounce returns are added to directscatter echoes. We refer to these as ghost images, since they obscure true target image and lead to poor resolution. By analyzing the quadratic phase error (QPE), this paper demonstrates that Earth¿s curvature influences the defocusing degree of multipath returns. In addition to the QPE, other parameters such as integrated sidelobe ratio (ISLR), peak sidelobe ratio (PSLR), contrast (C) and entropy (E) provide us with the tools to identify direct-scatter echoes in images containing undesired returns coming from multipath.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Synthetic Aperture Radar (SAR) images a target region reflectivity function in the multi-dimensional spatial domain of range and cross-range with a finer azimuth resolution than the one provided by any on-board real antenna. Conventional SAR techniques assume a single reflection of transmitted waveforms from targets. Nevertheless, new uses of Unmanned Aerial Vehicles (UAVs) for civilian-security applications force SAR systems to work in much more complex scenes such as urban environments. Consequently, multiple-bounce returns are additionally superposed to direct-scatter echoes. They are known as ghost images, since they obscure true target image and lead to poor resolution. All this may involve a significant problem in applications related to surveillance and security. In this work, an innovative multipath mitigation technique is presented in which Time Reversal (TR) concept is applied to SAR images when the target is concealed in clutter, leading to TR-SAR technique. This way, the effect of multipath is considerably reduced ?or even removed?, recovering the lost resolution due to multipath propagation. Furthermore, some focusing indicators such as entropy (E), contrast (C) and Rényi entropy (RE) provide us with a good focusing criterion when using TR-SAR.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Abstract The creation of atlases, or digital models where information from different subjects can be combined, is a field of increasing interest in biomedical imaging. When a single image does not contain enough information to appropriately describe the organism under study, it is then necessary to acquire images of several individuals, each of them containing complementary data with respect to the rest of the components in the cohort. This approach allows creating digital prototypes, ranging from anatomical atlases of human patients and organs, obtained for instance from Magnetic Resonance Imaging, to gene expression cartographies of embryo development, typically achieved from Light Microscopy. Within such context, in this PhD Thesis we propose, develop and validate new dedicated image processing methodologies that, based on image registration techniques, bring information from multiple individuals into alignment within a single digital atlas model. We also elaborate a dedicated software visualization platform to explore the resulting wealth of multi-dimensional data and novel analysis algo-rithms to automatically mine the generated resource in search of bio¬logical insights. In particular, this work focuses on gene expression data from developing zebrafish embryos imaged at the cellular resolution level with Two-Photon Laser Scanning Microscopy. Disposing of quantitative measurements relating multiple gene expressions to cell position and their evolution in time is a fundamental prerequisite to understand embryogenesis multi-scale processes. However, the number of gene expressions that can be simultaneously stained in one acquisition is limited due to optical and labeling constraints. These limitations motivate the implementation of atlasing strategies that can recreate a virtual gene expression multiplex. The developed computational tools have been tested in two different scenarios. The first one is the early zebrafish embryogenesis where the resulting atlas constitutes a link between the phenotype and the genotype at the cellular level. The second one is the late zebrafish brain where the resulting atlas allows studies relating gene expression to brain regionalization and neurogenesis. The proposed computational frameworks have been adapted to the requirements of both scenarios, such as the integration of partial views of the embryo into a whole embryo model with cellular resolution or the registration of anatom¬ical traits with deformable transformation models non-dependent on any specific labeling. The software implementation of the atlas generation tool (Match-IT) and the visualization platform (Atlas-IT) together with the gene expression atlas resources developed in this Thesis are to be made freely available to the scientific community. Lastly, a novel proof-of-concept experiment integrates for the first time 3D gene expression atlas resources with cell lineages extracted from live embryos, opening up the door to correlate genetic and cellular spatio-temporal dynamics. La creación de atlas, o modelos digitales, donde la información de distintos sujetos puede ser combinada, es un campo de creciente interés en imagen biomédica. Cuando una sola imagen no contiene suficientes datos como para describir apropiadamente el organismo objeto de estudio, se hace necesario adquirir imágenes de varios individuos, cada una de las cuales contiene información complementaria respecto al resto de componentes del grupo. De este modo, es posible crear prototipos digitales, que pueden ir desde atlas anatómicos de órganos y pacientes humanos, adquiridos por ejemplo mediante Resonancia Magnética, hasta cartografías de la expresión genética del desarrollo de embrionario, típicamente adquiridas mediante Microscopía Optica. Dentro de este contexto, en esta Tesis Doctoral se introducen, desarrollan y validan nuevos métodos de procesado de imagen que, basándose en técnicas de registro de imagen, son capaces de alinear imágenes y datos provenientes de múltiples individuos en un solo atlas digital. Además, se ha elaborado una plataforma de visualization específicamente diseñada para explorar la gran cantidad de datos, caracterizados por su multi-dimensionalidad, que resulta de estos métodos. Asimismo, se han propuesto novedosos algoritmos de análisis y minería de datos que permiten inspeccionar automáticamente los atlas generados en busca de conclusiones biológicas significativas. En particular, este trabajo se centra en datos de expresión genética del desarrollo embrionario del pez cebra, adquiridos mediante Microscopía dos fotones con resolución celular. Disponer de medidas cuantitativas que relacionen estas expresiones genéticas con las posiciones celulares y su evolución en el tiempo es un prerrequisito fundamental para comprender los procesos multi-escala característicos de la morfogénesis. Sin embargo, el número de expresiones genéticos que pueden ser simultáneamente etiquetados en una sola adquisición es reducido debido a limitaciones tanto ópticas como del etiquetado. Estas limitaciones requieren la implementación de estrategias de creación de atlas que puedan recrear un multiplexado virtual de expresiones genéticas. Las herramientas computacionales desarrolladas han sido validadas en dos escenarios distintos. El primer escenario es el desarrollo embrionario temprano del pez cebra, donde el atlas resultante permite constituir un vínculo, a nivel celular, entre el fenotipo y el genotipo de este organismo modelo. El segundo escenario corresponde a estadios tardíos del desarrollo del cerebro del pez cebra, donde el atlas resultante permite relacionar expresiones genéticas con la regionalización del cerebro y la formación de neuronas. La plataforma computacional desarrollada ha sido adaptada a los requisitos y retos planteados en ambos escenarios, como la integración, a resolución celular, de vistas parciales dentro de un modelo consistente en un embrión completo, o el alineamiento entre estructuras de referencia anatómica equivalentes, logrado mediante el uso de modelos de transformación deformables que no requieren ningún marcador específico. Está previsto poner a disposición de la comunidad científica tanto la herramienta de generación de atlas (Match-IT), como su plataforma de visualización (Atlas-IT), así como las bases de datos de expresión genética creadas a partir de estas herramientas. Por último, dentro de la presente Tesis Doctoral, se ha incluido una prueba conceptual innovadora que permite integrar los mencionados atlas de expresión genética tridimensionales dentro del linaje celular extraído de una adquisición in vivo de un embrión. Esta prueba conceptual abre la puerta a la posibilidad de correlar, por primera vez, las dinámicas espacio-temporales de genes y células.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The understanding of the embryogenesis in living systems requires reliable quantitative analysis of the cell migration throughout all the stages of development. This is a major challenge of the "in-toto" reconstruction based on different modalities of "in-vivo" imaging techniques -spatio-temporal resolution and image artifacts and noise. Several methods for cell tracking are available, but expensive manual interaction -time and human resources- is always required to enforce coherence. Because of this limitation it is necessary to restrict the experiments or assume an uncontrolled error rate. Is it possible to obtain automated reliable measurements of migration? can we provide a seed for biologists to complete cell lineages efficiently? We propose a filtering technique that considers trajectories as spatio-temporal connected structures that prunes out those that might introduce noise and false positives by using multi-dimensional morphological operators.