3 results for quantitative factor analysis
at Duke University
Abstract:
© 2016 Springer Science+Business Media New York. Researchers studying mammalian dentitions from functional and adaptive perspectives have increasingly moved toward dental topography measures that can be estimated from 3D surface scans and that do not require identification of specific homologous landmarks. Here we present molaR, a new R package designed to assist researchers in calculating four commonly used topographic measures from surface scans of teeth: Dirichlet Normal Energy (DNE), Relief Index (RFI), Orientation Patch Count (OPC), and Orientation Patch Count Rotated (OPCR). The package enables a unified application of these informative new metrics. In addition to providing topographic measuring tools, molaR has complementary plotting functions that enable highly customizable visualization of results. This article gives a detailed description of the DNE measure, walks researchers through installing, operating, and troubleshooting molaR and its functions, and gives an example of a simple comparison in which teeth of the primates Alouatta and Pithecia were measured in molaR and in other available software packages. molaR is a free and open source software extension, available at doi:10.13140/RG.2.1.3563.4961 (molaR v. 2.0) as well as on CRAN, the Internet repository that stores R packages.
Abstract:
Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true of biological data analysis. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.
Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector. Under this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these statistical methods, developed to address challenges in big data. The novel methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data, and estimation of population structure from genotype data.
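As a rough illustration of the low-rank covariance implied by a factor analysis model, the sketch below builds Sigma = W W^T + diag(psi) from a loading matrix W and per-variable noise variances psi. This is the standard textbook form, not the dissertation's own extension; the function name `factor_covariance` and the example numbers are hypothetical and chosen only for illustration.

```python
def factor_covariance(loadings, noise_variances):
    """Covariance implied by a factor model: Sigma = W W^T + diag(psi).

    loadings: p rows of k factor loadings each (the matrix W).
    noise_variances: p positive values (the diagonal of psi).
    Both names are illustrative assumptions, not taken from the abstract.
    """
    p = len(loadings)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # Low-rank part: inner product of the loading rows i and j.
            sigma[i][j] = sum(a * b for a, b in zip(loadings[i], loadings[j]))
        # Diagonal part: variable-specific noise variance.
        sigma[i][i] += noise_variances[i]
    return sigma


# Three observed variables driven by a single latent factor (made-up values).
W = [[1.0], [2.0], [0.5]]
psi = [0.1, 0.2, 0.3]
Sigma = factor_covariance(W, psi)
```

With k much smaller than p, the W W^T term has rank at most k, which is what makes the estimate low rank apart from the diagonal noise.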
Abstract:
Multi-output Gaussian processes provide a convenient framework for multi-task problems. An illustrative and motivating example of a multi-task problem is multi-region electrophysiological time-series data, where experimentalists are interested in both power and phase coherence between channels. Recently, the spectral mixture (SM) kernel was proposed to model the spectral density of a single task in a Gaussian process framework. This work develops a novel covariance kernel for multiple outputs, called the cross-spectral mixture (CSM) kernel. This new, flexible kernel represents both the power and phase relationship between multiple observation channels. The expressive capabilities of the CSM kernel are demonstrated through implementation of 1) a Bayesian hidden Markov model, where the emission distribution is a multi-output Gaussian process with a CSM covariance kernel, and 2) a Gaussian process factor analysis model, where factor scores represent the utilization of cross-spectral neural circuits. Results are presented for measured multi-region electrophysiological data.
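For orientation, the single-output spectral mixture (SM) kernel mentioned above can be sketched for one-dimensional inputs as a weighted sum of Gaussian-windowed cosines. This is the commonly cited single-task form, not the cross-spectral mixture (CSM) kernel developed in the work; the function name `spectral_mixture_kernel` and the parameter values are hypothetical.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """Single-output SM kernel for a 1-D time lag tau.

    k(tau) = sum_q w_q * exp(-2 * pi^2 * tau^2 * v_q) * cos(2 * pi * tau * mu_q)

    weights, means, variances: per-component weight w_q, spectral mean mu_q
    (a frequency), and spectral variance v_q. All names are illustrative
    assumptions, not taken from the abstract.
    """
    total = 0.0
    for w, mu, v in zip(weights, means, variances):
        envelope = math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v)
        oscillation = math.cos(2.0 * math.pi * tau * mu)
        total += w * envelope * oscillation
    return total


# Two spectral components centered at 4 Hz and 10 Hz (made-up values).
weights = [1.0, 0.5]
means = [4.0, 10.0]
variances = [0.5, 1.0]
k_at_lag = spectral_mixture_kernel(0.05, weights, means, variances)
```

At zero lag the kernel equals the sum of the weights, and it is symmetric in the lag, so it carries power information only; representing phase relationships between channels is what the multi-output CSM kernel adds.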