672 results for Dirichlet eigenvalues
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
In this paper we propose the design of communication systems based on the periodic nonlinear Fourier transform (PNFT), following the introduction of the method in Part I. We show that the famous "eigenvalue communication" idea [A. Hasegawa and T. Nyu, J. Lightwave Technol. 11, 395 (1993)] can also be generalized to the PNFT: in this case, the main spectrum attributed to the PNFT signal decomposition remains constant during propagation down the optical fiber link. Therefore, the main PNFT spectrum can be encoded with data in the same way as soliton eigenvalues in the original proposal. The results are presented in terms of bit-error rate (BER) values for different modulation techniques and different constellation sizes vs. the propagation distance, demonstrating the good potential of the technique.
Abstract:
Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous values or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; under this assumption, the factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimation of population structure from genotype data.
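A minimal sketch of the low-rank covariance idea behind the factor model, using scikit-learn; the dimensions and simulated data are illustrative assumptions.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 500, 20, 3

# Simulate data with genuine low-rank structure: x = W z + noise.
W = rng.normal(size=(n_features, n_factors))
Z = rng.normal(size=(n_samples, n_factors))
X = Z @ W.T + 0.5 * rng.normal(size=(n_samples, n_features))

fa = FactorAnalysis(n_components=n_factors).fit(X)
# Implied covariance: loadings @ loadings.T + diag(noise variances),
# i.e. a rank-3 matrix plus a diagonal, instead of a full 20x20 estimate.
low_rank_part = fa.components_.T @ fa.components_
cov_estimate = low_rank_part + np.diag(fa.noise_variance_)
print(np.linalg.matrix_rank(low_rank_part))  # 3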
Abstract:
Scheduling optimization is concerned with the optimal allocation of events to time slots. In this paper, we look at one particular example of a scheduling problem: the 2015 Joint Statistical Meetings. We want to assign sessions to time slots so that sessions on similar topics do not conflict. Chapter 1 briefly discusses the motivation for this example as well as the constraints and the optimality criterion. Chapter 2 proposes the use of Latent Dirichlet Allocation (LDA) to identify the topic proportions in each session and describes the fitting of the model. Chapter 3 translates these ideas into a mathematical formulation and introduces a greedy algorithm to minimize conflicts. Chapter 4 demonstrates the improvement in the scheduling achieved with this method.
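A minimal sketch of the LDA-plus-greedy idea: given per-session topic proportions (as LDA would estimate), greedily place each session in the time slot where it adds the least topical conflict. All names, sizes, and the conflict measure are illustrative assumptions, not the paper's exact formulation.

import numpy as np

rng = np.random.default_rng(1)
n_sessions, n_topics, n_slots, slot_capacity = 30, 8, 10, 3

# Rows sum to 1: hypothetical topic proportions for each session.
theta = rng.dirichlet(np.ones(n_topics), size=n_sessions)

def conflict(a, b):
    # One plausible conflict measure: overlap of two topic-proportion vectors.
    return float(np.minimum(a, b).sum())

slots = [[] for _ in range(n_slots)]
for s in range(n_sessions):
    # Added conflict if session s joins each slot; full slots are excluded.
    costs = [
        sum(conflict(theta[s], theta[t]) for t in slot)
        if len(slot) < slot_capacity else np.inf
        for slot in slots
    ]
    slots[int(np.argmin(costs))].append(s)

print([len(slot) for slot in slots])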
Abstract:
OBJECTIVES: Three dental topography measurements, Dirichlet Normal Energy (DNE), Relief Index (RFI), and Orientation Patch Count Rotated (OPCR), are examined for their interaction with measures of wear, within and between upper and lower molars in Alouatta palliata. Potential inferences about the "dental sculpting" phenomenon are explored. MATERIALS AND METHODS: Fifteen occluding pairs of howling monkey first molars (15 upper, 15 lower), opportunistically collected from La Pacifica, Costa Rica, were selected to sample wear stages ranging from unworn to heavily worn as measured by the Dentine Exposure Ratio (DER). DNE, RFI, and OPCR were measured from three-dimensional surface reconstructions (PLY files) derived from high-resolution CT scans. Relationships among the variables were tested with regression analyses. RESULTS: Upper molars have more cutting edges, exhibiting significantly higher DNE, but have significantly lower RFI values. However, the relationships among the measures are concordant across both sets of molars. DER and enamel-dentine junction length (EDJL) are curvilinearly related. DER is positively correlated with DNE, negatively correlated with RFI, and uncorrelated with OPCR. EDJL is not correlated with DNE or RFI, but is positively correlated with OPCR among lower molars only. DISCUSSION: The relationships among these metrics suggest that howling monkey teeth adaptively engage macrowear. DNE increases with wear in this sample, presumably improving food breakdown. RFI is initially high but declines with wear, suggesting that the initially high RFI safeguards against dental senescence. OPCR values in howling monkey teeth do not show a clear relationship with wear.
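A rough, hedged sketch of the Dirichlet Normal Energy (DNE) idea on a triangulated surface: faces where the unit normals change quickly contribute more area-weighted energy. This is a simplified proxy, not the exact published computation; the paraboloid mesh and per-face energy measure are illustrative assumptions.

import numpy as np

# Build a small triangulated paraboloid z = x^2 + y^2 as a stand-in surface.
n = 15
g = np.linspace(-1, 1, n)
xs, ys = np.meshgrid(g, g)
verts = np.column_stack([xs.ravel(), ys.ravel(), (xs**2 + ys**2).ravel()])

# Analytic unit normals of the paraboloid at each vertex.
nrm = np.column_stack([-2 * verts[:, 0], -2 * verts[:, 1], np.ones(len(verts))])
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)

# Two triangles per grid cell.
faces = []
for i in range(n - 1):
    for j in range(n - 1):
        a, b, c, d = i * n + j, i * n + j + 1, (i + 1) * n + j, (i + 1) * n + j + 1
        faces += [(a, b, c), (b, d, c)]
faces = np.array(faces)

v0, v1, v2 = (verts[faces[:, k]] for k in range(3))
area = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)

# Per-face energy proxy: squared variation of vertex normals, area-weighted.
n0, n1, n2 = (nrm[faces[:, k]] for k in range(3))
energy = ((n1 - n0) ** 2).sum(1) + ((n2 - n0) ** 2).sum(1)
print(float((energy * area).sum()))  # higher for more sharply curved crowns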
Abstract:
Researchers studying mammalian dentitions from functional and adaptive perspectives have increasingly moved towards dental topography measures that can be estimated from 3D surface scans and do not require identification of specific homologous landmarks. Here we present molaR, a new R package designed to assist researchers in calculating four commonly used topographic measures from surface scans of teeth: Dirichlet Normal Energy (DNE), Relief Index (RFI), Orientation Patch Count (OPC), and Orientation Patch Count Rotated (OPCR), enabling a unified application of these informative new metrics. In addition to providing topographic measuring tools, molaR has complementary plotting functions enabling highly customizable visualization of results. This article gives a detailed description of the DNE measure, walks researchers through installing, operating, and troubleshooting molaR and its functions, and gives an example of a simple comparison that measured teeth of the primates Alouatta and Pithecia in molaR and other available software packages. molaR is free and open source software, available at doi:10.13140/RG.2.1.3563.4961 (molaR v. 2.0) as well as from the CRAN repository of R packages.
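As a hedged Python illustration of the Relief Index idea (molaR itself is an R package): RFI compares a crown's 3D surface area to its 2D projected footprint, commonly formulated as ln(sqrt(A3D)/sqrt(A2D)). The paraboloid below is an illustrative stand-in for a tooth crown; the exact formulation should be checked against molaR's documentation.

import numpy as np

g = np.linspace(-1, 1, 60)
xs, ys = np.meshgrid(g, g)
zs = xs**2 + ys**2
dx = g[1] - g[0]

# 3D area: integrate sqrt(1 + |grad z|^2) over the grid.
gy, gx = np.gradient(zs, dx, dx)   # gradients along the two grid axes
a3d = float(np.sqrt(1.0 + gx**2 + gy**2).sum() * dx * dx)

# 2D footprint area of the same patch.
a2d = float(zs.size * dx * dx)

rfi = np.log(np.sqrt(a3d) / np.sqrt(a2d))
print(rfi)  # higher for more "hilly" crowns relative to their footprint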
Abstract:
Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied to target kinematics modeling in various applications, including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexity adaptively from data as necessary, and are resistant to overfitting and underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori, or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done on controlling a sensor with a bounded field of view to obtain the measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to learning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced first; it is capable of describing time-invariant spatial phenomena such as ocean currents, temperature distributions, and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixtures of mobile targets, such as pedestrian motion patterns.
Novel information-theoretic functions are developed for these Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback-Leibler (KL) divergence is developed as the expectation, with respect to future measurements, of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models. This approach is then extended to a new information value function for estimating target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem shows that the novel information-theoretic functions are bounded; based on this theorem, efficient estimators of the new functions are designed and proved to be unbiased, with the variance of the resulting approximation error decreasing linearly as the number of samples increases. Optimizing the novel information-theoretic functions under sensor dynamics constraints is proved to be NP-hard, and a cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed; its efficiency is demonstrated by a numerical experiment with ocean current data obtained from moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained; synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate its performance. Moreover, a lexicographic algorithm is designed, based on the cumulative lower bound of the novel information-theoretic functions, for the scenario where the sensor dynamics are constrained; numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine it. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation, based on the novel information-theoretic functions, are superior at learning the target kinematics with little or no prior knowledge.
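A hedged sketch of the expected-KL idea described above: for each candidate sensor placement, average the KL divergence between the posterior and prior Gaussian process (both multivariate Gaussians on a grid of test points) over simulated future measurements, and pick the placement with the largest value. The kernel, noise level, grid, and KL direction are illustrative assumptions, not the dissertation's exact setup.

import numpy as np

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

xt = np.linspace(0, 1, 25)            # test grid
K = rbf(xt, xt) + 1e-8 * np.eye(25)   # prior covariance (zero prior mean)
noise = 0.05

def kl_gauss(mu_p, S_p, mu_q, S_q):
    # KL(N(mu_p, S_p) || N(mu_q, S_q)) for multivariate Gaussians.
    k = len(mu_p)
    Sq_inv = np.linalg.inv(S_q)
    d = mu_q - mu_p
    return 0.5 * (np.trace(Sq_inv @ S_p) + d @ Sq_inv @ d - k
                  + np.linalg.slogdet(S_q)[1] - np.linalg.slogdet(S_p)[1])

def expected_kl(x_sensor, n_draws=200, rng=np.random.default_rng(2)):
    kxs = rbf(xt, np.array([x_sensor]))                 # cov(test grid, sensor)
    s2 = 1.0 + noise**2                                 # marginal measurement variance
    S_post = K - (kxs @ kxs.T) / s2                     # GP posterior covariance
    kls = []
    for _ in range(n_draws):
        y = rng.normal(0.0, np.sqrt(s2))                # simulated future measurement
        mu_post = (kxs[:, 0] / s2) * y                  # GP posterior mean
        kls.append(kl_gauss(mu_post, S_post, np.zeros(25), K))
    return float(np.mean(kls))

# The control input (sensor placement) maximizing expected KL is most informative.
print(max(np.linspace(0, 1, 11), key=expected_kl))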
Abstract:
Thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
Abstract:
For decades scientists have attempted to use ideas from classical mechanics to choose basis functions for calculating spectra. The hope is that a classically motivated basis set will be small because it covers only the dynamically important part of phase space. One popular idea is to use phase space localized (PSL) basis functions. This thesis improves on previous efforts to use PSL functions and examines the usefulness of these improvements. Because the overlap matrix in the matrix eigenvalue problem obtained by using PSL functions with the variational method is not an identity, it is costly to use iterative methods to solve the eigenvalue problem. We show that it is possible to circumvent the orthogonality (overlap) problem and use iterative eigensolvers. We also present an altered method of calculating the matrix elements that improves the performance of the PSL basis functions, as well as a new method that more efficiently chooses which PSL functions to include. These improvements are applied to a variety of single-well molecules. We conclude that for single-minimum molecules, PSL functions are inferior to other basis functions. However, the ideas developed here can be applied to other types of basis functions, and PSL functions may be useful for multi-well systems.
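A minimal sketch of the overlap problem described above: with a non-orthogonal basis, the variational method yields the generalized eigenvalue problem H c = E S c rather than an ordinary one. The small random symmetric matrices stand in for real Hamiltonian and overlap matrices.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 6
A = rng.normal(size=(n, n))
H = (A + A.T) / 2                       # stand-in Hamiltonian matrix
B = rng.normal(size=(n, n))
S = B @ B.T + n * np.eye(n)             # stand-in overlap matrix (SPD, not identity)

E, C = eigh(H, S)                       # solves H c = E S c directly
print(E)                                # variational energy levels

# Equivalent route via Cholesky: transform to an ordinary eigenproblem,
# L^-1 H L^-T y = E y with S = L L^T, which is what iterative eigensolvers
# must effectively handle when the overlap is not the identity.
L = np.linalg.cholesky(S)
Ht = np.linalg.solve(L, np.linalg.solve(L, H).T).T
print(np.linalg.eigvalsh(Ht))           # same eigenvalues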
Abstract:
Archaeozoological mortality profiles have been used to infer site-specific subsistence strategies. There is, however, no common agreement on the best way to present these profiles and the confidence intervals around age class proportions. To deal with these issues, we propose the use of the Dirichlet distribution and present a new approach to performing age-at-death multivariate graphical comparisons. We demonstrate the efficiency of this approach using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in southern France and the Iberian Peninsula. We show that the Dirichlet distribution in age-at-death analysis can be used: (i) to generate Bayesian credible intervals around each age class of a mortality profile, even when not all age classes are observed; and (ii) to create 95% kernel density contours around each age-at-death frequency distribution when multiple sites are compared using correspondence analysis. The statistical procedure we present is applicable to the analysis of any categorical count data and is particularly well suited to archaeological data (e.g. potsherds, arrowheads) where sample sizes are typically small.
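A minimal sketch of point (i) above: with a weak Dirichlet prior on the age-class proportions, posterior credible intervals exist for every class, including classes with zero observed deaths. The counts and prior are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)
age_classes = ["A", "B", "C", "D", "E"]
counts = np.array([3, 11, 7, 0, 2])     # hypothetical age-at-death counts

alpha = np.ones(len(counts))            # uniform Dirichlet prior
post = rng.dirichlet(alpha + counts, size=10_000)   # posterior draws of proportions

lo, hi = np.percentile(post, [2.5, 97.5], axis=0)
for c, l, h, k in zip(age_classes, lo, hi, counts):
    # Note class D (n=0) still receives a finite credible interval.
    print(f"class {c} (n={k}): 95% credible interval [{l:.3f}, {h:.3f}]")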
Abstract:
Text cohesion is an important element of discourse processing. This paper presents a new approach to modeling, quantifying, and visualizing text cohesion using automated cohesion flow indices that capture semantic links among paragraphs. Cohesion flow is calculated by applying Cohesion Network Analysis, a combination of semantic distances, Latent Semantic Analysis, and Latent Dirichlet Allocation, together with Social Network Analysis. Experiments performed on 315 timed essays indicated that the cohesion flow indices are significantly correlated with human ratings of text coherence and essay quality. Visualizations of the global cohesion indices are also included to support an easier understanding of how cohesion flow impacts coherence in terms of semantic dependencies between paragraphs.
Abstract:
This paper introduces a novel, in-depth approach to analyzing the differences in writing style between two famous Romanian orators, based on automated textual complexity indices for the Romanian language. The considered authors are: (a) Mihai Eminescu, Romania's national poet and a remarkable journalist of his time, and (b) Ion C. Brătianu, one of the most important Romanian politicians of the middle of the 19th century. Both orators share a journalistic interest in spreading the word about political issues in Romania via the printing press, the most important public voice at that time. In addition, both authors exhibit writing style particularities, and our aim is to explore these differences through our ReaderBench framework, which computes a wide range of lexical and semantic textual complexity indices for Romanian and other languages. The corpus contains two collections of speeches, one per orator, covering the period 1857-1880. The results of this study highlight the lexical and cohesive textual complexity indices that best reflect the differences in writing style, namely measures relying on the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) semantic models.
Abstract:
The article analyzes electoral change in León, Guanajuato, tracing how the National Action Party (PAN) came to displace the Institutional Revolutionary Party (PRI) in this municipality from 1988 up to the shift in the balance of forces in 2012. This provides a basis for analyzing the scenarios that could characterize this year's upcoming elections. To this end, a statistical model is proposed for the study: the Dirichlet regression model, which makes it possible to account for the nature of electoral data.
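A hedged sketch of Dirichlet regression: each party's concentration parameter is modeled as the exponential of a linear predictor, and the coefficients are fit by maximum likelihood. The simulated vote shares and covariate are stand-ins, not the León electoral data.

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(5)
n, parties = 80, 3
x = rng.uniform(-1, 1, size=n)                  # e.g. a standardized covariate
X = np.column_stack([np.ones(n), x])            # intercept + covariate

beta_true = np.array([[1.0, 0.8], [1.2, -0.5], [0.9, 0.1]])   # one row per party
alpha_true = np.exp(X @ beta_true.T)
Y = np.array([rng.dirichlet(a) for a in alpha_true])          # vote-share rows

def neg_loglik(beta_flat):
    beta = beta_flat.reshape(parties, 2)
    alpha = np.exp(X @ beta.T)
    # Dirichlet log-density: lnG(sum a) - sum lnG(a_j) + sum (a_j - 1) ln y_j
    ll = (gammaln(alpha.sum(1)) - gammaln(alpha).sum(1)
          + ((alpha - 1) * np.log(Y)).sum(1))
    return -ll.sum()

fit = minimize(neg_loglik, np.zeros(parties * 2), method="BFGS")
print(fit.x.reshape(parties, 2))   # should be close to beta_true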
Abstract:
Tests for dependence of continuous, discrete, and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we present three Bayesian tests for dependence of binary, continuous, and mixed variables. These tests are nonparametric and based on the Dirichlet process, which allows us to use the same prior model for all of them. The tests are therefore "consistent" with each other, in the sense that the probabilities of dependence computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.
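A hedged sketch of the binary case: for two binary variables, a Dirichlet-process prior reduces to a Dirichlet over the four cell probabilities of the 2x2 contingency table, so posterior draws yield the probability of dependence directly. The counts and prior mass are illustrative assumptions, not the paper's exact construction.

import numpy as np

rng = np.random.default_rng(6)
counts = np.array([30, 10, 12, 28])      # cells: (0,0), (0,1), (1,0), (1,1)
alpha = np.full(4, 0.25)                 # weak prior (total mass 1)

p = rng.dirichlet(alpha + counts, size=20_000)
p00, p01, p10, p11 = p.T
# Dependence measure: covariance of the two indicators under each draw,
# cov = P(X=1,Y=1) - P(X=1) P(Y=1).
cov = p11 - (p10 + p11) * (p01 + p11)
print("P(positive dependence | data) =", float((cov > 0).mean()))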
Abstract:
There has been increasing interest in the development of new methods that use Pareto optimality to deal with multi-objective criteria (for example, accuracy and time complexity). Once one has developed an approach to a problem of interest, the question is then how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. The standard tests used for this purpose can jointly consider neither multiple performance measures nor multiple competitors at once. The aim of this paper is to resolve these issues by developing statistical procedures that account for multiple competing measures at the same time and compare multiple algorithms altogether. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend both by discovering conditional independences among the measures to reduce the number of model parameters, as the number of cases studied in such comparisons is usually small. Data from a comparison among general-purpose classifiers is used to show a practical application of our tests.
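A hedged sketch of the multinomial-Dirichlet idea: each data set yields one joint outcome over the competing measures (e.g. who wins on accuracy and who wins on time), the outcome counts receive a Dirichlet posterior, and any comparison of interest becomes a posterior probability. The outcome coding and counts are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(7)
# Joint outcomes over (accuracy, time) for algorithms A vs. B:
# A wins both, A wins accuracy only, A wins time only, B wins both.
counts = np.array([14, 6, 3, 7])

theta = rng.dirichlet(np.ones(4) + counts, size=20_000)   # posterior draws

# Posterior probability that "A wins both measures" is the most likely outcome.
print(float((theta.argmax(axis=1) == 0).mean()))
# Marginal posterior probability that A wins on accuracy (first two outcomes).
print(float(theta[:, :2].sum(axis=1).mean()))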