940 results for maximum likelihood analysis
Abstract:
Recently, the target function for crystallographic refinement has been improved through a maximum likelihood analysis, which makes proper allowance for the effects of data quality, model errors, and incompleteness. The maximum likelihood target reduces the significance of false local minima during the refinement process, but it does not completely eliminate them, necessitating the use of stochastic optimization methods such as simulated annealing for poor initial models. It is shown that the combination of maximum likelihood with cross-validation, which reduces overfitting, and simulated annealing by torsion angle molecular dynamics, which simplifies the conformational search problem, results in a major improvement of the radius of convergence of refinement and the accuracy of the refined structure. Torsion angle molecular dynamics and the maximum likelihood target function interact synergistically, the combination of both methods being significantly more powerful than each method individually. This is demonstrated in realistic test cases at two typical minimum Bragg spacings (dmin = 2.0 and 2.8 Å, respectively), illustrating the broad applicability of the combined method. In an application to the refinement of a new crystal structure, the combined method automatically corrected a mistraced loop in a poor initial model, moving the backbone by 4 Å.
Abstract:
The work presented evaluates the statistical characteristics of regional bias and expected error in reconstructions of real positron emission tomography (PET) data from human brain fluorodeoxyglucose (FDG) studies carried out by the maximum likelihood estimator (MLE) method with a robust stopping rule, and compares them with the results of filtered backprojection (FBP) reconstructions and with the method of sieves. The task of evaluating radioisotope uptake in regions of interest (ROIs) is investigated. An assessment of bias and variance in uptake measurements is carried out with simulated data. Then, by using three transition matrices with different degrees of accuracy and a components-of-variance model for statistical analysis, it is shown that the characteristics obtained from real human FDG brain data are consistent with the results of the simulation studies.
Abstract:
The variogram is essential for local estimation and mapping of any variable by kriging. The variogram itself must usually be estimated from sample data. The sampling density is a compromise between precision and cost, but it must be sufficiently dense to encompass the principal spatial sources of variance. A nested, multi-stage sampling design with separating distances increasing in geometric progression from stage to stage will do that. The data may then be analyzed by a hierarchical analysis of variance to estimate the components of variance for every stage, and hence for every lag. By accumulating the components, starting from the shortest lag, one obtains a rough variogram for modest effort. For balanced designs the analysis of variance is optimal; for unbalanced ones, however, these estimators are not necessarily the best, and analysis by residual maximum likelihood (REML) will usually be preferable. The paper summarizes the underlying theory and illustrates its application with data from three surveys: one in which the design had four stages and was balanced, and two implemented with unbalanced designs to economize when there were more stages. A Fortran program is available for the analysis of variance, and code for the REML analysis is listed in the paper. (c) 2005 Elsevier Ltd. All rights reserved.
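The accumulation step described in this abstract can be illustrated with a minimal Python sketch. It assumes the variance components have already been estimated (by analysis of variance or REML). The function name, distances and component values are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def rough_variogram(distances, components):
    """Accumulate variance components from the shortest lag upward.

    distances  -- separating distance of each sampling stage
    components -- estimated variance component of each stage
    Returns (distance, accumulated semivariance) pairs, shortest lag first.
    """
    if len(distances) != len(components):
        raise ValueError("one variance component is needed per stage")
    # Order the stages by separating distance, shortest first.
    pairs = sorted(zip(distances, components))
    lags = [d for d, _ in pairs]
    # Running sum of the components gives the rough variogram values.
    semivariances = list(accumulate(c for _, c in pairs))
    return list(zip(lags, semivariances))


# Hypothetical four-stage design with distances in geometric progression.
example = rough_variogram([600.0, 6.0, 60.0, 0.6], [0.5, 1.0, 2.0, 0.25])
```

Each returned pair gives a separating distance and the sum of all components up to that stage, so the values level off once the longer-range components become small.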
Abstract:
An unbalanced nested sampling design was used to investigate the spatial scale of soil and herbicide interactions at the field scale. A hierarchical analysis of variance based on residual maximum likelihood (REML) was used to analyse the data and provide a first estimate of the variogram. Soil samples were taken at 108 locations at a range of separating distances in a 9 ha field to explore small- and medium-scale spatial variation. Soil organic matter content, pH, particle size distribution, microbial biomass, and the degradation and sorption of the herbicide isoproturon were determined for each soil sample. A large proportion of the spatial variation in isoproturon degradation and sorption occurred at sampling intervals of less than 60 m; however, the sampling design did not resolve the variation present at scales greater than this. A sampling interval of 20-25 m should ensure that the main spatial structures are identified for isoproturon degradation rate and sorption without too great a loss of information in this field.
Abstract:
The modal analysis of a structural system consists of computing its vibrational modes. Estimating these modes experimentally requires exciting the system with a measured or known input and then measuring the system output at different points using sensors. Finally, the system inputs and outputs are used to compute the modes of vibration. When the system is a large structure such as a building or a bridge, the tests have to be performed in situ, so it is not possible to measure system inputs such as wind or traffic. Even if a known input is applied, the procedure is usually difficult and expensive, and there are still uncontrolled disturbances acting at the time of the test. These facts led to the idea of computing the modes of vibration using only the measured vibrations, regardless of the inputs that originated them, whether they are ambient vibrations (wind, earthquakes, etc.) or operational loads (traffic, human loading, etc.). This procedure is usually called Operational Modal Analysis (OMA), and in general consists of fitting a mathematical model to the measured data, assuming that the unobserved excitations are realizations of a stationary stochastic process (usually white noise). Then, the modes of vibration are computed from the estimated model. The first issue investigated in this thesis is the performance of the Expectation-Maximization (EM) algorithm for the maximum likelihood estimation of the state space model in the field of OMA. The algorithm is described in detail, together with an analysis of how to apply it to vibration data. It is then compared with another well-known method, the Stochastic Subspace Identification algorithm. The maximum likelihood estimate enjoys some optimal statistical properties, which makes it very attractive in practice, but the most remarkable property of the EM algorithm is that it can be used to address a wide range of situations in OMA.
In this work, three additional state space models are proposed and estimated using the EM algorithm:
• The first model is proposed to estimate the modes of vibration when several tests are performed on the same structural system. Instead of analysing the data record by record and then computing averages, the EM algorithm is extended for the joint estimation of the proposed state space model using all the available data.
• The second state space model is used to estimate the modes of vibration when the number of available sensors is lower than the number of points to be tested. In these cases it is usual to perform several tests, changing the position of the sensors from one test to the next (multiple sensor setups). Here, the proposed state space model and the EM algorithm are used to estimate the modal parameters taking into account the data from all setups.
• Finally, a state space model is proposed to estimate the modes of vibration in the presence of unmeasured inputs that cannot be modelled as white noise processes. In these cases, the frequency components of the inputs cannot be separated from the eigenfrequencies of the system, and spurious modes are obtained in the identification process. The idea is to measure the response of the structure to different inputs; it is then assumed that the parameters common to all the data correspond to the structure (modes of vibration), and that the parameters found only in a specific test correspond to the input in that test. The problem is solved using the proposed state space model and the EM algorithm.
Abstract:
Thesis--Illinois.
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
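To make the binned and truncated setting concrete, the following Python sketch evaluates the log-likelihood that an EM procedure of this kind would increase. It is restricted to the univariate Gaussian mixture case, where the bin integrals reduce to differences of normal CDFs. The multivariate integrals discussed in the abstract are not covered, the multinomial constant is omitted, and all function and parameter names are illustrative assumptions.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(edges, weights, means, sds):
    """Mixture probability mass falling in each bin [edges[j], edges[j+1])."""
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        probs.append(mass)
    return probs


def binned_truncated_loglik(counts, edges, weights, means, sds):
    """Log-likelihood of bin counts, conditional on the observed range.

    Truncation is handled by renormalising each bin probability by the
    total mass inside the observed range (first edge to last edge).
    """
    probs = bin_probabilities(edges, weights, means, sds)
    total = sum(probs)
    return sum(n * math.log(p / total) for n, p in zip(counts, probs) if n > 0)
```

For example, `binned_truncated_loglik([3, 5], [-1.0, 0.0, 1.0], [1.0], [0.0], [1.0])` scores eight observations in two bins under a single standard normal component truncated to the interval from -1 to 1.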
Abstract:
This paper presents a time-domain stochastic system identification method based on maximum likelihood estimation (MLE) with the expectation maximization (EM) algorithm. The effectiveness of this structural identification method is evaluated through numerical simulation in the context of the ASCE benchmark problem on structural health monitoring. The benchmark structure is a four-story, two-bay by two-bay steel-frame scale model structure built in the Earthquake Engineering Research Laboratory at the University of British Columbia, Canada. This paper focuses on Phase I of the analytical benchmark studies. A MATLAB-based finite element analysis code obtained from the IASC-ASCE SHM Task Group web site is used to calculate the dynamic response of the prototype structure. One hundred simulations were run with this code in order to evaluate the proposed identification method. Of the several techniques available for system identification, the stochastic subspace identification (SSI) method has been used here for comparison. SSI is a well-known method that computes accurate estimates of the modal parameters. The principles of the SSI method are introduced in the paper, and the proposed MLE with the EM algorithm is then explained in detail. The advantages of the proposed structural identification method can be summarized as follows: (i) the method is based on maximum likelihood, which implies asymptotically minimum-variance estimates; (ii) EM is a computationally simpler estimation procedure than other optimization algorithms; (iii) it estimates more parameters than SSI, and these estimates are accurate. On the other hand, the main disadvantages of the method are: (i) the EM algorithm is an iterative procedure and takes time to reach convergence; and (ii) the method needs starting values for the parameters.
Modal parameters (eigenfrequencies, damping ratios and mode shapes) of the benchmark structure have been estimated using both the SSI method and the proposed MLE + EM method. The numerical results show that the proposed method identifies eigenfrequencies, damping ratios and mode shapes reasonably well even in the presence of 10% measurement noise. These modal parameters are more accurate than those estimated by SSI.
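Once a discrete-time state space model has been estimated, eigenfrequencies and damping ratios are commonly read from the eigenvalues of the state matrix. The Python sketch below shows that standard conversion for a single eigenvalue. The abstract does not say how the authors performed this step, so the function name and the sampling-interval argument `dt` are assumptions.

```python
import cmath
import math


def modal_parameters(eigenvalue, dt):
    """Convert one discrete-time eigenvalue into (frequency in Hz, damping ratio).

    eigenvalue -- complex eigenvalue of the discrete-time state matrix
    dt         -- sampling interval in seconds
    """
    # Map the discrete-time eigenvalue to its continuous-time pole.
    pole = cmath.log(eigenvalue) / dt
    # The modulus of the pole is the undamped natural angular frequency.
    omega = abs(pole)
    frequency_hz = omega / (2.0 * math.pi)
    damping_ratio = -pole.real / omega
    return frequency_hz, damping_ratio
```

The principal branch of the complex logarithm is used, so the conversion is only valid for modes below the Nyquist frequency 1/(2·dt).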
Abstract:
In this paper we propose a method to estimate by maximum likelihood the divergence time between two populations, specifically designed for the analysis of nonrecurrent rare mutations. Given the rapidly growing amount of data, rare disease mutations affecting humans seem the most suitable candidates for this method. The estimator RD, and its conditional version RDc, were derived assuming that the population dynamics of rare alleles can be described by a birth–death process approximation and that each mutation arose before the split of a common ancestral population into the two diverging populations. The RD estimator seems more suitable for large sample sizes and few alleles whose age can be approximated, whereas the RDc estimator appears preferable when this is not the case. When applied to three cystic fibrosis mutations, the estimator RD could not exclude a very recent time of divergence among three Mediterranean populations. On the other hand, the divergence time between these populations and the Danish population was estimated to be, on average, 4,500 years when a selective advantage for cystic fibrosis carriers is assumed and 15,000 years when it is not. Confidence intervals are large, however, and can probably be reduced only by analyzing more alleles or loci.
Abstract:
This report discusses the calculation of analytic second-order bias corrections for the maximum likelihood estimates (MLEs) of the unknown parameters of distributions used in quality and reliability analysis. MLEs are widely used to estimate the unknown parameters of probability distributions because of their desirable properties; for example, they are asymptotically unbiased, consistent, and asymptotically normal. However, many of these properties depend on extremely large sample sizes. Properties such as unbiasedness may not hold for small or even moderate sample sizes, which are more common in real data applications. Therefore, bias-correction techniques for the MLEs are desirable in practice, especially when the sample size is small. Two commonly used techniques for reducing the bias of the MLEs are the ‘preventive’ and ‘corrective’ approaches. Both can reduce the bias of the MLEs to order O(n^-2), but the ‘preventive’ approach does not have an explicit closed-form expression. Consequently, we mainly focus on the ‘corrective’ approach in this report. To illustrate the importance of bias correction in practice, we apply the bias-corrected method to two popular lifetime distributions: the inverse Lindley distribution and the weighted Lindley distribution. Numerical studies based on the two distributions show that the bias-corrected technique considered is highly recommended over other commonly used estimators without bias correction. Therefore, special attention should be paid when estimating the unknown parameters of probability distributions when the sample size is small or moderate.
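The idea behind the ‘corrective’ approach, subtracting an estimate of the leading bias term from the MLE, can be shown on a simpler case than the Lindley-type distributions treated in the report. The Python sketch below uses the exponential distribution as a stand-in, chosen here only because its bias is known in closed form: the rate MLE n/Σx has expectation nλ/(n−1), so its leading bias term is λ/n. The function names are hypothetical.

```python
def exponential_rate_mle(sample):
    """Maximum likelihood estimate of the exponential rate: n / sum(x)."""
    return len(sample) / sum(sample)


def corrected_rate(sample):
    """Corrective bias adjustment: subtract the estimated leading bias term.

    The leading bias of the rate MLE is rate / n, so it is estimated by
    plugging the MLE into that expression.
    """
    n = len(sample)
    mle = exponential_rate_mle(sample)
    return mle - mle / n


# Hypothetical small sample of lifetimes.
lifetimes = [1.0, 2.0, 3.0, 2.0]
print(exponential_rate_mle(lifetimes), corrected_rate(lifetimes))
```

In this case the corrected estimator equals (n−1)/Σx, which is exactly unbiased. In general the corrective approach only removes the bias up to the stated order.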
Abstract:
Didanosine-loaded chitosan microspheres were developed by applying a surface-response methodology and using a modified Maximum Likelihood Classification. The operational conditions were optimized with the aim of maintaining the active form of didanosine (ddI), which is sensitive to acid pH, and of developing a modified, mucoadhesive formulation. The drug was loaded into the chitosan microspheres by the ionotropic gelation technique, with sodium tripolyphosphate (TPP) as the cross-linking agent and magnesium hydroxide (Mg(OH)2) to ensure the stability of ddI. The optimization conditions were set using a surface-response methodology and applying the Maximum Likelihood Classification, with the initial chitosan concentration, the TPP concentration and the ddI concentration as the independent variables. The maximum ddI loading in the microspheres (1433 mg of ddI/g chitosan) was obtained with 2% (w/v) chitosan and 10% TPP. The microspheres had an average diameter of 11.42 μm, and ddI was gradually released over 2 h in simulated enteric fluid.
Abstract:
The development of genetic maps for self-incompatible species, such as the yellow passion fruit (Passiflora edulis Sims f. flavicarpa Deg.), is restricted because traditional mapping populations based on inbred lines cannot be obtained. For this reason, yellow passion fruit linkage maps have generally been constructed using a strategy known as the two-way pseudo-testcross, based on monoparental dominant markers segregating in a 1:1 fashion. Because these markers carry no information for one of the parents, two individual (parental) maps were obtained. However, integration of these maps is essential, and biparental markers can be used for this purpose. The objective of our study was to construct an integrated molecular map for a full-sib population of yellow passion fruit, combining different locus configurations generated from amplified fragment length polymorphisms (AFLPs) and microsatellite markers and using a novel approach based on simultaneous maximum-likelihood estimation of linkage and linkage phases, specially designed for outcrossing species. Of the total number of loci, approximately 76%, 21%, 0.7%, and 2.3% segregated in 1:1, 3:1, 1:2:1, and 1:1:1:1 ratios, respectively. Ten linkage groups (LGs) were established with a logarithm of the odds (LOD) score ≥ 5.0, assuming a recombination fraction ≤ 0.35. On average, 24 markers were assigned per LG, representing a total map length of 1687 cM, with a marker density of 6.9 cM. No markers were placed as accessories on the map, as was done with previously constructed individual maps.
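The LOD threshold quoted in this abstract compares the likelihood at the estimated recombination fraction with the likelihood under free recombination (r = 0.5). The Python sketch below computes that score for the simplest case only: two markers segregating 1:1 with known linkage phase, where recombinant offspring can be counted directly. The simultaneous estimation of linkage and linkage phase used in the study is not reproduced, and the function name and counts are hypothetical.

```python
import math


def lod_score(recombinants, total):
    """Return (estimated recombination fraction, LOD score).

    recombinants -- number of recombinant offspring observed
    total        -- total number of informative offspring
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("counts must satisfy 0 <= recombinants <= total, total > 0")
    # Maximum likelihood estimate, restricted to the range [0, 0.5].
    r = min(recombinants / total, 0.5)
    loglik = 0.0
    if recombinants > 0:
        loglik += recombinants * math.log10(r)
    if total - recombinants > 0:
        loglik += (total - recombinants) * math.log10(1.0 - r)
    # Log-likelihood under no linkage (r = 0.5).
    null_loglik = total * math.log10(0.5)
    return r, loglik - null_loglik
```

Under these assumptions, `lod_score(10, 100)` gives an estimated recombination fraction of 0.1, and a pair of markers would be grouped only if the returned LOD met the chosen threshold.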