909 resultados para Hidden Markov random fields
Resumo:
There is an emerging interest in modeling spatially correlated survival data in biomedical and epidemiological studies. In this paper, we propose a new class of semiparametric normal transformation models for right censored spatially correlated survival data. This class of models assumes that survival outcomes marginally follow a Cox proportional hazard model with unspecified baseline hazard, and their joint distribution is obtained by transforming survival outcomes to normal random variables, whose joint distribution is assumed to be multivariate normal with a spatial correlation structure. A key feature of the class of semiparametric normal transformation models is that it provides a rich class of spatial survival models where regression coefficients have population average interpretation and the spatial dependence of survival times is conveniently modeled using the transformed variables by flexible normal random fields. We study the relationship of the spatial correlation structure of the transformed normal variables and the dependence measures of the original survival times. Direct nonparametric maximum likelihood estimation in such models is practically prohibited due to the high dimensional intractable integration of the likelihood function and the infinite dimensional nuisance baseline hazard parameter. We hence develop a class of spatial semiparametric estimating equations, which conveniently estimate the population-level regression coefficients and the dependence parameters simultaneously. We study the asymptotic properties of the proposed estimators, and show that they are consistent and asymptotically normal. The proposed method is illustrated with an analysis of data from the East Boston Ashma Study and its performance is evaluated using simulations.
Resumo:
We focus on kernels incorporating different kinds of prior knowledge on functions to be approximated by Kriging. A recent result on random fields with paths invariant under a group action is generalised to combinations of composition operators, and a characterisation of kernels leading to random fields with additive paths is obtained as a corollary. A discussion follows on some implications on design of experiments, and it is shown in the case of additive kernels that the so-called class of “axis designs” outperforms Latin hypercubes in terms of the IMSE criterion.
Resumo:
Sterols are an essential class of lipids in eukaryotes, where they serve as structural components of membranes and play important roles as signaling molecules. Sterols are also of high pharmacological significance: cholesterol-lowering drugs are blockbusters in human health, and inhibitors of ergosterol biosynthesis are widely used as antifungals. Inhibitors of ergosterol synthesis are also being developed for Chagas's disease, caused by Trypanosoma cruzi. Here we develop an in silico pipeline to globally evaluate sterol metabolism and perform comparative genomics. We generate a library of hidden Markov model-based profiles for 42 sterol biosynthetic enzymes, which allows expressing the genomic makeup of a given species as a numerical vector. Hierarchical clustering of these vectors functionally groups eukaryote proteomes and reveals convergent evolution, in particular metabolic reduction in obligate endoparasites. We experimentally explore sterol metabolism by testing a set of sterol biosynthesis inhibitors against trypanosomatids, Plasmodium falciparum, Giardia, and mammalian cells, and by quantifying the expression levels of sterol biosynthetic genes during the different life stages of T. cruzi and Trypanosoma brucei. The phenotypic data correlate with genomic makeup for simvastatin, which showed activity against trypanosomatids. Other findings, such as the activity of terbinafine against Giardia, are not in agreement with the genotypic profile.
Resumo:
We explore a generalisation of the L´evy fractional Brownian field on the Euclidean space based on replacing the Euclidean norm with another norm. A characterisation result for admissible norms yields a complete description of all self-similar Gaussian random fields with stationary increments. Several integral representations of the introduced random fields are derived. In a similar vein, several non-Euclidean variants of the fractional Poisson field are introduced and it is shown that they share the covariance structure with the fractional Brownian field and converge to it. The shape parameters of the Poisson and Brownian variants are related by convex geometry transforms, namely the radial pth mean body and the polar projection transforms.
Resumo:
Ecological speciation is the process by which reproductively isolated populations emerge as a consequence of divergent natural or ecologically-mediated sexual selection. Most genomic studies of ecological speciation have investigated allopatric populations, making it difficult to infer reproductive isolation. The few studies on sympatric ecotypes have focused on advanced stages of the speciation process after thousands of generations of divergence. As a consequence, we still do not know what genomic signatures of the early onset of ecological speciation look like. Here, we examined genomic differentiation among migratory lake and resident stream ecotypes of threespine stickleback reproducing in sympatry in one stream, and in parapatry in another stream. Importantly, these ecotypes started diverging less than 150 years ago. We obtained 34,756 SNPs with restriction-site associated DNA sequencing and identified genomic islands of differentiation using a Hidden Markov Model approach. Consistent with incipient ecological speciation, we found significant genomic differentiation between ecotypes both in sympatry and parapatry. Of 19 islands of differentiation resisting gene flow in sympatry, all were also differentiated in parapatry and were thus likely driven by divergent selection among habitats. These islands clustered in quantitative trait loci controlling divergent traits among the ecotypes, many of them concentrated in one region with low to intermediate recombination. Our findings suggest that adaptive genomic differentiation at many genetic loci can arise and persist in sympatry at the very early stage of ecotype divergence, and that the genomic architecture of adaptation may facilitate this.
Resumo:
SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.^
Resumo:
Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. Many recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. The current study incorporated gene network information into gene-based analysis of GWAS data for Crohn's disease (CD). The purpose was to develop statistical models to boost the power of identifying disease-associated genes and gene subnetworks by maximizing the use of existing biological knowledge from multiple sources. The results revealed that Markov random field (MRF) based mixture model incorporating direct neighborhood information from a single gene network is not efficient in identifying CD-related genes based on the GWAS data. The incorporation of solely direct neighborhood information might lead to the low efficiency of these models. Alternative MRF models looking beyond direct neighboring information are necessary to be developed in the future for the purpose of this study.^
Resumo:
The classification of airborne lidar data is a relevant task in different disciplines. The information about the geometry and the full waveform can be used in order to classify the 3D point cloud. In Wadden Sea areas the classification of lidar data is of main interest for the scientific monitoring of coastal morphology and habitats, but it becomes a challenging task due to flat areas with hardly any discriminative objects. For the classification we combine a Conditional Random Fields framework with a Random Forests approach. By classifying in this way, we benefit from the consideration of context on the one hand and from the opportunity to utilise a high number of classification features on the other hand. We investigate the relevance of different features for the lidar points in coastal areas as well as for the interaction of neighbouring points.
Resumo:
Este trabajo de Tesis ha abordado el objetivo de dar robustez y mejorar la Detección de Actividad de Voz en entornos acústicos adversos con el fin de favorecer el comportamiento de muchas aplicaciones vocales, por ejemplo aplicaciones de telefonía basadas en reconocimiento automático de voz, aplicaciones en sistemas de transcripción automática, aplicaciones en sistemas multicanal, etc. En especial, aunque se han tenido en cuenta todos los tipos de ruido, se muestra especial interés en el estudio de las voces de fondo, principal fuente de error de la mayoría de los Detectores de Actividad en la actualidad. Las tareas llevadas a cabo poseen como punto de partida un Detector de Actividad basado en Modelos Ocultos de Markov, cuyo vector de características contiene dos componentes: la energía normalizada y la variación de la energía. Las aportaciones fundamentales de esta Tesis son las siguientes: 1) ampliación del vector de características de partida dotándole así de información espectral, 2) ajuste de los Modelos Ocultos de Markov al entorno y estudio de diferentes topologías y, finalmente, 3) estudio e inclusión de nuevas características, distintas de las del punto 1, para filtrar los pulsos de pronunciaciones que proceden de las voces de fondo. Los resultados de detección, teniendo en cuenta los tres puntos anteriores, muestran con creces los avances realizados y son significativamente mejores que los resultados obtenidos, bajo las mismas condiciones, con otros detectores de actividad de referencia. This work has been focused on improving the robustness at Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many vocal applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, multichannel systems applications, and so on. In particular, though all types of noise have taken into account, this research has special interest in the study of pronunciations coming from far-field speakers, the main error source of most activity detectors today. The tasks carried out have, as starting point, a Hidden Markov Models Voice Activity Detector which a feature vector containing two components: normalized energy and delta energy. The key points of this Thesis are the following: 1) feature vector extension providing spectral information, 2) Hidden Markov Models adjustment to environment and study of different Hidden Markov Model topologies and, finally, 3) study and inclusion of new features, different from point 1, to reject the pronunciations coming from far-field speakers. Detection results, taking into account the above three points, show the advantages of using this method and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.
Resumo:
Belief propagation (BP) is a technique for distributed inference in wireless networks and is often used even when the underlying graphical model contains cycles. In this paper, we propose a uniformly reweighted BP scheme that reduces the impact of cycles by weighting messages by a constant ?edge appearance probability? rho ? 1. We apply this algorithm to distributed binary hypothesis testing problems (e.g., distributed detection) in wireless networks with Markov random field models. We demonstrate that in the considered setting the proposed method outperforms standard BP, while maintaining similar complexity. We then show that the optimal ? can be approximated as a simple function of the average node degree, and can hence be computed in a distributed fashion through a consensus algorithm.
Resumo:
Light detection and ranging (LiDAR) technology is beginning to have an impact on agriculture. Canopy volume and/or fruit tree leaf area can be estimated using terrestrial laser sensors based on this technology. However, the use of these devices may have different options depending on the resolution and scanning mode. As a consequence, data accuracy and LiDAR derived parameters are affected by sensor configuration, and may vary according to vegetative characteristics of tree crops. Given this scenario, users and suppliers of these devices need to know how to use the sensor in each case. This paper presents a computer program to determine the best configuration, allowing simulation and evaluation of different LiDAR configurations in various tree structures (or training systems). The ultimate goal is to optimise the use of laser scanners in field operations. The software presented generates a virtual orchard, and then allows the scanning simulation with a laser sensor. Trees are created using a hidden Markov tree (HMT) model. Varying the foliar structure of the orchard the LiDAR simulation was applied to twenty different artificially created orchards with or without leaves from two positions (lateral and zenith). To validate the laser sensor configuration, leaf surface of simulated trees was compared with the parameters obtained by LiDAR measurements: the impacted leaf area, the impacted total area (leaves and wood), and th impacted area in the three outer layers of leaves.
Resumo:
Probabilistic modeling is the de�ning characteristic of estimation of distribution algorithms (EDAs) which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. `1-regularization is a type of this technique with the appealing variable selection property which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from di�erent aspects when used for optimization in a high-dimensional setting, where the population size of EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve signi�cantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random �eld model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, speci�cally models inspired from multi-dimensional Bayesian network classi�ers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objectivevariable and objective-objective relationships. An extensive experimental study shows the e�ectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely �-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on `1-regularization for multi-objective feature subset selection in classi�cation, where six di�erent measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two di�erent Bayesian classi�ers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
Resumo:
MP2RAGE has proven to be a bias-free MR acquisition with excellent contrast between grey and white matter. We investigated the ability of three state-of-the-art algorithms to automatically extract white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF) from MPRAGE and MP2RAGE images: unified Segmentation (S) in SPM82 , its extension New Segment (NS), and an in-house Expectation-Maximization Markov Random Field tissue classification3 (EM-MRF) with Graph Cut (GC) optimization4 . Our goal is to quantify the differences between MPRAGE and MP2RAGE-based brain tissue probability maps.
Resumo:
En esta tesis doctoral se propone una técnica biométrica de verificación en teléfonos móviles consistente en realizar una firma en el aire con la mano que sujeta el teléfono móvil. Los acelerómetros integrados en el dispositivo muestrean las aceleraciones del movimiento de la firma en el aire, generando tres señales temporales que pueden utilizarse para la verificación del usuario. Se proponen varios enfoques para la implementación del sistema de verificación, a partir de los enfoques más utilizados en biometría de firma manuscrita: correspondencia de patrones, con variantes de los algoritmos de Needleman-Wusch (NW) y Dynamic Time Warping (DTW), modelos ocultos de Markov (HMM) y clasificador estadístico basado en Máquinas de Vector Soporte (SVM). Al no existir bases de datos públicas de firmas en el aire y con el fin de evaluar los métodos propuestos en esta tesis doctoral, se han capturado dos con distintas características; una con falsificaciones reales a partir del estudio de las grabaciones de usuarios auténticos y otra con muestras de usuarios obtenidas en diferentes sesiones a lo largo del tiempo. Utilizando estas bases de datos se han evaluado una gran cantidad de algoritmos para implementar un sistema de verificación basado en firma en el aire. Esta evaluación se ha realizado de acuerdo con el estándar ISO/IEC 19795, añadiendo el caso de verificación en mundo abierto no incluido en la norma. Además, se han analizado las características que hacen que una firma sea suficientemente segura. Por otro lado, se ha estudiado la permanencia de las firmas en el aire a lo largo del tiempo, proponiendo distintos métodos de actualización, basados en una adaptación dinámica del patrón, para mejorar su rendimiento. Finalmente, se ha implementado un prototipo de la técnica de firma en el aire para teléfonos Android e iOS. Los resultados de esta tesis doctoral han tenido un gran impacto, generando varias publicaciones en revistas internacionales, congresos y libros. La firma en el aire ha sido nombrada también en varias revistas de divulgación, portales de noticias Web y televisión. Además, se han obtenido varios premios en competiciones de ideas innovadoras y se ha firmado un acuerdo de explotación de la tecnología con una empresa extranjera. ABSTRACT This thesis proposes a biometric verification technique on mobile phones consisting on making a signature in the air with the hand holding a mobile phone. The accelerometers integrated in the device capture the movement accelerations, generating three temporal signals that can be used for verification. This thesis suggests several approaches for implementing the verification system, based on the most widely used approaches in handwritten signature biometrics: template matching, with a lot of variations of the Needleman- Wusch (NW) and Dynamic Time Warping (DTW) algorithms, Hidden Markov Models (HMM) and Supported Vector Machines (SVM). As there are no public databases of in-air signatures and with the aim of assessing the proposed methods, there have been captured two databases; one. with real falsification attempts from the study of recordings captured when genuine users made their signatures in front of a camera, and other, with samples obtained in different sessions over a long period of time. These databases have been used to evaluate a lot of algorithms in order to implement a verification system based on in-air signatures. This evaluation has been conducted according to the standard ISO/IEC 19795, adding the open-set verification scenario not included in the norm. In addition, the characteristics of a secure signature are also investigated, as well as the permanence of in-air signatures over time, proposing several updating strategies to improve its performance. Finally, a prototype of in-air signature has been developed for iOS and Android phones. The results of this thesis have achieved a high impact, publishing several articles in SCI journals, conferences and books. The in-air signature deployed in this thesis has been also referred in numerous media. Additionally, this technique has won several awards in the entrepreneurship field and also an exploitation agreement has been signed with a foreign company.
Resumo:
For most of us, speaking in a non-native language involves deviating to some extent from native pronunciation norms. However, the detailed basis for foreign accent (FA) remains elusive, in part due to methodological challenges in isolating segmental from suprasegmental factors. The current study examines the role of segmental features in conveying FA through the use of a generative approach in which accent is localised to single consonantal segments. Three techniques are evaluated: the first requires a highly-proficiency bilingual to produce words with isolated accented segments; the second uses cross-splicing of context-dependent consonants from the non-native language into native words; the third employs hidden Markov model synthesis to blend voice models for both languages. Using English and Spanish as the native/non-native languages respectively, listener cohorts from both languages identified words and rated their degree of FA. All techniques were capable of generating accented words, but to differing degrees. Naturally-produced speech led to the strongest FA ratings and synthetic speech the weakest, which we interpret as the outcome of over-smoothing. Nevertheless, the flexibility offered by synthesising localised accent encourages further development of the method.