939 results for Covariance.
Abstract:
Objective: To explore the characteristics of the regional distribution of cancer deaths in Shandong Province using principal components analysis. Methods: Principal components analysis on the covariance matrix of the age-adjusted mortality rates and of the percentages of 20 types of cancer in 22 counties (cities) of Shandong Province for 2004-2006 was carried out using SAS software. Results: Over 90% of the total information was captured by the top 3 principal components, and the first principal component alone represented more than half of the overall regional variance. The first component mainly reflected the area differences of esophageal cancer; the second mainly reflected the area differences of lung cancer, together with stomach cancer and liver cancer. The first principal component scores showed a clear trend of higher values in the western areas and lower values in the east. Based on the top two components, the 22 counties (cities) could be divided into several geographical clusters. Conclusion: The overall regional difference in cancer mortality in Shandong is dominated by a few major cancers, including esophageal, lung, stomach and liver cancer; among them, esophageal cancer makes the largest contribution. If the range of counties (cities) analyzed could be further widened, the characteristics of the regional distribution of cancer mortality could be examined more thoroughly.
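As an illustrative sketch of the covariance-matrix PCA described above (not the study's code or data; the 22 x 20 matrix below is a random stand-in for the age-adjusted mortality rates):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-in: 22 counties x 20 cancer types of age-adjusted mortality rates.
rates = rng.gamma(shape=2.0, scale=5.0, size=(22, 20))

# Principal components from the covariance matrix (no scaling to unit variance).
centered = rates - rates.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("variance explained by top 3 components:", explained[:3].sum())

# County scores on the first two components, usable for geographical clustering.
scores = centered @ eigvecs[:, :2]
```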
Abstract:
In this paper we introduce a novel domain-invariant covariance normalization (DICN) technique to relocate both in-domain and out-domain i-vectors into a third, dataset-invariant space, providing an improvement for out-domain PLDA speaker verification with a very small number of unlabelled in-domain adaptation i-vectors. By capturing the dataset variance from a global mean using both development out-domain i-vectors and limited unlabelled in-domain i-vectors, we obtain domain-invariant representations of the PLDA training data. The DICN-compensated out-domain PLDA system is shown to perform as well as in-domain PLDA training with as few as 500 unlabelled in-domain i-vectors for the NIST-2010 SRE and 2000 unlabelled in-domain i-vectors for the NIST-2008 SRE, and to give considerable relative improvement over both out-domain and in-domain PLDA development if more are available.
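A minimal sketch of the idea of capturing between-dataset variance around a global mean and shrinking it away, using random stand-ins for the development i-vectors; the function name, dimensions and the inverse-square-root compensation below are illustrative assumptions, not the paper's exact DICN formulation:

```python
import numpy as np

def dataset_invariant_transform(out_domain, in_domain):
    """Sketch of dataset-variance compensation for i-vectors.

    out_domain, in_domain: (n, d) arrays of i-vectors.  The between-dataset
    variance around the global mean is estimated and shrunk away; the exact
    DICN formulation in the paper may differ.
    """
    global_mean = np.vstack([out_domain, in_domain]).mean(axis=0)
    shifts = np.stack([out_domain.mean(axis=0) - global_mean,
                       in_domain.mean(axis=0) - global_mean])
    between = shifts.T @ shifts / shifts.shape[0]      # between-dataset covariance (d x d)

    # Inverse square root of (I + between): directions carrying no dataset
    # variance are left unchanged, directions carrying dataset variance shrink.
    vals, vecs = np.linalg.eigh(np.eye(between.shape[0]) + between)
    w = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return w, global_mean

# Hypothetical usage with random stand-ins for development i-vectors.
rng = np.random.default_rng(1)
out_iv = rng.normal(0.5, 1.0, size=(5000, 400))    # out-domain i-vectors (offset mean)
in_iv = rng.normal(0.0, 1.0, size=(500, 400))      # 500 unlabelled in-domain i-vectors
W, mu = dataset_invariant_transform(out_iv, in_iv)
plda_train = (out_iv - mu) @ W.T                   # compensated PLDA training data
```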
Abstract:
Spatial data analysis has become more and more important in studies of ecology and economics during the last decade. One focus of spatial data analysis is how to select predictors, variance functions and correlation functions. In general, however, the true covariance function is unknown and the working covariance structure is often misspecified. In this paper, our target is to find a good strategy for identifying the best model from the candidate set using model selection criteria. We evaluate the ability of several information criteria (the corrected Akaike information criterion, the Bayesian information criterion (BIC) and the residual information criterion (RIC)) to choose the optimal model when the working correlation function, the working variance function and the working mean function are correct or misspecified. Simulations are carried out for small to moderate sample sizes. Four candidate covariance functions (exponential, Gaussian, Matérn and rational quadratic) are used in the simulation studies. Summarizing the simulation results, we find that a misspecified working correlation structure can still capture some of the spatial correlation information in model fitting. When the sample size is large enough, BIC and RIC perform well even if the working covariance is misspecified. Moreover, the performance of these information criteria is related to the average level of model fit, as indicated by the average adjusted R-square, and overall RIC performs well.
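A minimal sketch of comparing candidate covariance functions with an information criterion, here BIC computed from a Gaussian maximum-likelihood fit on hypothetical spatial data; only two of the four covariance families are shown and all names and settings are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

# Hypothetical spatial data: random locations and a zero-mean Gaussian response.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(80, 2))
d = cdist(coords, coords)
true_cov = np.exp(-d / 2.0)                              # exponential covariance, range 2
y = rng.multivariate_normal(np.zeros(len(coords)), true_cov)

covariances = {
    "exponential": lambda dist, s2, r: s2 * np.exp(-dist / r),
    "gaussian":    lambda dist, s2, r: s2 * np.exp(-(dist / r) ** 2),
}

def neg_loglik(theta, dist, y, kernel):
    s2, r = np.exp(theta)                                # positivity via log-parameters
    c = kernel(dist, s2, r) + 1e-6 * np.eye(len(y))      # small jitter for stability
    chol = np.linalg.cholesky(c)
    alpha = np.linalg.solve(chol, y)
    return 0.5 * (alpha @ alpha) + np.log(np.diag(chol)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

for name, kernel in covariances.items():
    fit = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(d, y, kernel))
    k, n = 2, len(y)
    bic = 2 * fit.fun + k * np.log(n)                    # BIC = -2 logL + k log n
    print(f"{name}: BIC = {bic:.1f}")
```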
Abstract:
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
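A sketch of the Gaussian pseudolikelihood criterion for comparing working covariance models, evaluated on per-cluster residuals; the toy data and the exact form (up to constants) are assumptions rather than the paper's implementation:

```python
import numpy as np

def gaussian_pseudolikelihood(residuals, working_covs):
    """Gaussian pseudolikelihood for a working covariance model (sketch).

    residuals:    list of per-cluster residual vectors from a GEE fit
    working_covs: list of fitted per-cluster working covariance matrices
    Larger values indicate a better-fitting working covariance model.
    """
    total = 0.0
    for r, v in zip(residuals, working_covs):
        _, logdet = np.linalg.slogdet(v)
        total += -0.5 * (logdet + r @ np.linalg.solve(v, r) + len(r) * np.log(2 * np.pi))
    return total

# Toy comparison: independence vs exchangeable working covariance for 3-visit clusters.
rng = np.random.default_rng(3)
true = 0.6 * np.ones((3, 3)) + 0.4 * np.eye(3)
resid = [rng.multivariate_normal(np.zeros(3), true) for _ in range(200)]
for name, v in [("independence", np.eye(3)), ("exchangeable", true)]:
    print(name, round(gaussian_pseudolikelihood(resid, [v] * len(resid)), 1))
```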
Abstract:
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When this smoothing is applied to rank estimation, the lack of smoothness that prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for the standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given.
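A one-parameter toy illustrating the induced-smoothing idea (not the paper's rank estimator): a sign-based estimating function for the median is replaced by its expectation under a normal perturbation of the parameter, which makes it smooth enough to solve and differentiate; the smoothing scale below is an assumed plug-in value:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
y = rng.standard_cauchy(200)          # heavy-tailed sample; estimate its median

def score(beta):
    """Non-smooth estimating function for the median (sign-based)."""
    return np.sign(y - beta).sum()

def smoothed_score(beta, sigma):
    """Induced smoothing: E_Z[ score(beta + sigma * Z) ] with Z ~ N(0, 1), in closed form."""
    return (2 * norm.cdf((y - beta) / sigma) - 1).sum()

# Assumed plug-in smoothing scale of the order of the estimator's standard error
# (the paper iterates this choice); the smoothed equation is now solvable by bisection
# and differentiable, which is what enables sandwich-type standard errors.
sigma = 1.0 / np.sqrt(len(y))
lo, hi = np.min(y), np.max(y)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if smoothed_score(mid, sigma) > 0 else (lo, mid)
print("smoothed median estimate:", 0.5 * (lo + hi))
```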
Abstract:
Aims: Develop and validate tools to estimate residual noise covariance in Planck frequency maps. Quantify signal error effects and compare different techniques to produce low-resolution maps. Methods: We derive analytical estimates of the covariance of the residual noise contained in low-resolution maps produced using a number of map-making approaches. We test these analytical predictions using Monte Carlo simulations and assess their impact on angular power spectrum estimation. We use simulations to quantify the level of signal error incurred in the different resolution downgrading schemes considered in this work. Results: We find excellent agreement between the optimal residual noise covariance matrices and the Monte Carlo noise maps. For destriping map-makers, the extent of agreement is dictated by the knee frequency of the correlated noise component and the chosen baseline offset length. Signal striping is shown to be insignificant when properly dealt with. In map resolution downgrading, we find that a carefully selected window function is required to reduce aliasing to the sub-percent level at multipoles ℓ > 2N_side, where N_side is the HEALPix resolution parameter. We show that sufficient characterization of the residual noise is unavoidable if one is to draw reliable constraints on large-scale anisotropy. Conclusions: We have described how to compute low-resolution maps with a controlled sky signal level and a reliable estimate of the residual noise covariance. We have also presented a method to smooth the residual noise covariance matrices to describe the noise correlations in smoothed, bandwidth-limited maps.
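A toy numerical illustration of the Monte Carlo validation step, comparing an assumed "analytical" residual-noise covariance against the empirical covariance of simulated noise maps; the pixel count, noise model and amplitudes are placeholders, and no map-making is performed:

```python
import numpy as np

rng = np.random.default_rng(5)
npix, nsim = 192, 4000                 # toy low-resolution map size and MC realizations

# Stand-in "analytical" residual-noise covariance: white noise plus a large-scale
# correlated component (a crude proxy for residual correlated noise after destriping).
pix = np.arange(npix)
corr = np.exp(-np.abs(pix[:, None] - pix[None, :]) / 20.0)
analytic_cov = 1e-4 * np.eye(npix) + 5e-5 * corr

# Monte Carlo noise maps drawn from the same model, then compared to the prediction.
chol = np.linalg.cholesky(analytic_cov)
noise_maps = rng.standard_normal((nsim, npix)) @ chol.T
mc_cov = np.cov(noise_maps, rowvar=False)

rel_err = np.linalg.norm(mc_cov - analytic_cov) / np.linalg.norm(analytic_cov)
print(f"relative Frobenius difference between MC and analytic covariance: {rel_err:.3f}")
```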
Abstract:
Volatile organic compounds (VOCs) are emitted into the atmosphere from natural and anthropogenic sources, vegetation being the dominant source on a global scale. Some of these reactive compounds are deemed major contributors or inhibitors to aerosol particle formation and growth, thus making VOC measurements essential for current climate change research. This thesis discusses ecosystem scale VOC fluxes measured above a boreal Scots pine dominated forest in southern Finland. The flux measurements were performed using the micrometeorological disjunct eddy covariance (DEC) method combined with proton transfer reaction mass spectrometry (PTR-MS), which is an online technique for measuring VOC concentrations. The measurement, calibration, and calculation procedures developed in this work proved to be well suited to long-term VOC concentration and flux measurements with PTR-MS. A new averaging approach based on running averaged covariance functions improved the determination of the lag time between wind and concentration measurements, which is a common challenge in DEC when measuring fluxes near the detection limit. The ecosystem scale emissions of methanol, acetaldehyde, and acetone were substantial. These three oxygenated VOCs made up about half of the total emissions, with the rest comprised of monoterpenes. Contrary to the traditional assumption that monoterpene emissions from Scots pine originate mainly as evaporation from specialized storage pools, the DEC measurements indicated a significant contribution from de novo biosynthesis to the ecosystem scale monoterpene emissions. This thesis offers practical guidelines for long-term DEC measurements with PTR-MS. In particular, the new averaging approach to the lag time determination seems useful in the automation of DEC flux calculations. Seasonal variation in the monoterpene biosynthesis and the detailed structure of a revised hybrid algorithm, describing both de novo and pool emissions, should be determined in further studies to improve biological realism in the modelling of monoterpene emissions from Scots pine forests. The increasing number of DEC measurements of oxygenated VOCs will probably enable better estimates of the role of these compounds in plant physiology and tropospheric chemistry. Keywords: disjunct eddy covariance, lag time determination, long-term flux measurements, proton transfer reaction mass spectrometry, Scots pine forests, volatile organic compounds
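A basic sketch of lag-time determination by locating the maximum of the wind-concentration cross-covariance, using synthetic 10 Hz data; the thesis's running-averaged covariance functions would replace the plain covariance computed here:

```python
import numpy as np

def lag_by_max_covariance(w, c, max_lag):
    """Find the lag (in samples) that maximizes |cov(w, c shifted by lag)|.

    Basic lag-time determination for (disjunct) eddy covariance; the thesis
    refines this with running-averaged covariance functions to stabilise the
    estimate for fluxes near the detection limit.
    """
    w = w - w.mean()
    c = c - c.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    covs = []
    for lag in lags:
        if lag >= 0:
            covs.append(np.mean(w[: len(w) - lag] * c[lag:]))
        else:
            covs.append(np.mean(w[-lag:] * c[: len(c) + lag]))
    covs = np.array(covs)
    best = int(np.argmax(np.abs(covs)))
    return lags[best], covs[best]

# Synthetic example: concentration responds to vertical wind with a 25-sample delay.
rng = np.random.default_rng(6)
w = rng.standard_normal(36000)                      # e.g. 1 h of 10 Hz wind data
c = np.roll(w, 25) * 0.05 + rng.standard_normal(36000) * 0.2
lag, flux = lag_by_max_covariance(w, c, max_lag=100)
print("estimated lag:", lag, "covariance (flux) at that lag:", round(flux, 4))
```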
Abstract:
The eddy covariance (EC) flux measurement technique is based on measuring the turbulent motions of air with accurate and fast instruments. For instance, measuring methane flux requires a fast methane gas analyser that samples methane concentration at least ten times per second, in addition to a sonic anemometer measuring the three wind components at the same sampling interval. Previously, measuring methane flux with the EC technique was almost impossible due to the lack of sufficiently fast gas analysers; however, during the last decade new instruments have been developed, and methane EC-flux measurements have become more common. The performance of four methane gas analysers suitable for eddy covariance measurements is assessed in this thesis. The assessment and comparison were performed by analysing EC data obtained during summer 2010 (1.4.-26.10.) at Siikaneva fen. The four participating methane gas analysers are TGA-100A (Campbell Scientific Inc., USA), RMT-200 (Los Gatos Research, USA), G1301-f (Picarro Inc., USA) and Prototype-7700 (LI-COR Biosciences, USA). RMT-200 functioned most reliably throughout the measurement campaign, and the corresponding methane flux data had the smallest random error. In addition, methane fluxes calculated from G1301-f and RMT-200 data agree remarkably well throughout the measurement campaign. The calculated cospectra and power spectra agree well with the corresponding temperature spectra. Prototype-7700 functioned for only slightly over one month at the beginning of the measurement campaign, and thus its accuracy and long-term performance are difficult to assess.
Abstract:
This paper addresses the problem of separation of pitched sounds in monaural recordings. We present a novel feature for the estimation of the parameters of overlapping harmonics, which considers the covariance of the partials of pitched sounds. Sound templates are formed from the monophonic parts of the mixture recording. A match for every note is found among these templates on the basis of the covariance profile of their harmonics. The matching template for a note provides the second-order characteristics for the overlapped harmonics of that note. The algorithm is tested on the RWC music database instrument sounds. The results clearly show that the covariance characteristics can be used to reconstruct overlapping harmonics effectively.
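An illustrative sketch of using the covariance of partial-amplitude tracks as a template-matching feature; the Frobenius-distance matching rule and the random stand-in templates are simplifying assumptions, not the paper's algorithm:

```python
import numpy as np

def harmonic_covariance(partial_amps):
    """Covariance of partial amplitudes over time frames.

    partial_amps: (n_frames, n_partials) array of harmonic amplitude tracks.
    Such second-order profiles, built from monophonic template segments, are
    the kind of feature described above; this function is a simplification.
    """
    return np.cov(partial_amps, rowvar=False)

def match_template(note_amps, templates):
    """Pick the template whose harmonic covariance is closest (Frobenius norm)."""
    target = harmonic_covariance(note_amps)
    dists = {name: np.linalg.norm(target - harmonic_covariance(t))
             for name, t in templates.items()}
    return min(dists, key=dists.get)

# Toy usage with random stand-ins for partial-amplitude tracks.
rng = np.random.default_rng(7)
templates = {"violin": rng.gamma(2.0, 1.0, (100, 10)),
             "flute":  rng.gamma(5.0, 0.2, (100, 10))}
note = templates["violin"] + 0.05 * rng.standard_normal((100, 10))
print("matched template:", match_template(note, templates))
```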
Abstract:
We consider the problem of extracting a signature representation of similar entities employing covariance descriptors. Covariance descriptors can efficiently represent objects and are robust to scale and pose changes. We posit that covariance descriptors corresponding to similar objects share a common geometrical structure, which can be extracted through joint diagonalization. We term this diagonalizing matrix the Covariance Profile (CP). The CP can be used to measure the distance of a novel object to an object set through a diagonality measure. We demonstrate how the CP can be employed on images as well as videos, for applications such as face recognition and object-track clustering.
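A rough sketch of the covariance-profile idea: a common diagonalizing basis is estimated for a set of covariance descriptors, and a diagonality measure scores how well a new descriptor fits it; the mean-covariance eigenbasis used here is a shortcut standing in for a proper approximate joint diagonalization:

```python
import numpy as np

def covariance_profile(cov_set):
    """Common diagonalizing basis for a set of covariance descriptors.

    Shortcut sketch: eigenvectors of the mean covariance.  A Jacobi-type
    approximate joint diagonalization of the whole set would be the faithful
    replacement for this shortcut.
    """
    _, vecs = np.linalg.eigh(np.mean(cov_set, axis=0))
    return vecs

def diagonality(cov, basis):
    """Fraction of energy on the diagonal after rotating cov into the basis."""
    rotated = basis.T @ cov @ basis
    return np.sum(np.diag(rotated) ** 2) / np.sum(rotated ** 2)

# Toy usage: descriptors sharing one eigenbasis vs an unrelated descriptor.
rng = np.random.default_rng(8)
q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
shared = [q @ np.diag(rng.uniform(1, 10, 5)) @ q.T for _ in range(20)]
cp = covariance_profile(shared)
outlier = np.cov(rng.standard_normal((100, 5)), rowvar=False)
print("same-class diagonality:", round(diagonality(shared[0], cp), 3))
print("novel-object diagonality:", round(diagonality(outlier, cp), 3))
```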
Abstract:
High wind poses a number of hazards in areas such as structural safety, aviation, wind energy (where low wind speed is also a concern) and pollutant transport, to name a few. Therefore, a good prediction tool for wind speed is necessary in these areas. Like many other natural processes, the behavior of wind is associated with considerable uncertainties stemming from different sources, and to develop a reliable prediction tool these uncertainties should be taken into account. In this work, we propose a probabilistic framework for the prediction of wind speed from measured spatio-temporal data. The framework is based on decompositions of the spatio-temporal covariance and simulation using these decompositions. A novel simulation method based on a tensor decomposition is used in this context. The proposed framework is composed of a set of four modules, which are flexible enough to accommodate further modifications. The framework is applied to measured wind speed data in Ireland. Both short- and long-term predictions are addressed.
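A minimal sketch of the decompose-then-simulate idea behind the framework, with a plain eigendecomposition of a spatial covariance standing in for the paper's tensor decomposition and random data standing in for the Irish wind-speed measurements:

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical stand-in for measured wind speed: n_times x n_stations.
n_times, n_stations = 2000, 12
base = rng.standard_normal((n_times, 1))
data = 6.0 + 2.0 * (0.7 * base + 0.3 * rng.standard_normal((n_times, n_stations)))

# Step 1 (sketch): estimate the spatial covariance of the measurements.
mean = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

# Step 2 (sketch): decompose the covariance.  The paper decomposes the full
# spatio-temporal covariance with a tensor method; an eigendecomposition of
# the spatial covariance is the simplest stand-in for that step.
vals, vecs = np.linalg.eigh(cov)
factor = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None)))

# Step 3 (sketch): simulate wind-speed scenarios from the decomposition,
# giving an ensemble (probabilistic) prediction rather than a point forecast.
n_scenarios = 500
scenarios = mean + rng.standard_normal((n_scenarios, n_stations)) @ factor.T
print("ensemble mean per station:", np.round(scenarios.mean(axis=0), 2))
```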
Abstract:
Variable selection for regression is a classical statistical problem, motivated by concerns that too large a number of covariates may bring about overfitting and unnecessarily high measurement costs. Novel difficulties arise in streaming contexts, where the correlation structure of the process may be drifting, in which case it must be constantly tracked so that selections may be revised accordingly. A particularly interesting phenomenon is that non-selected covariates become missing variables, inducing bias on subsequent decisions. This raises an intricate exploration-exploitation tradeoff, whose dependence on the covariance tracking algorithm and the choice of variable selection scheme is too complex to be dealt with analytically. We hence capitalise on the strength of simulations to explore this problem, taking the opportunity to tackle the difficult task of simulating dynamic correlation structures. © 2008 IEEE.
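A sketch of one possible covariance tracking algorithm for this streaming setting, an exponentially weighted update, paired with a naive top-k selection rule; both choices and the toy drifting stream are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

class EwmaCovarianceTracker:
    """Exponentially weighted tracking of a drifting mean and covariance (sketch)."""

    def __init__(self, dim, lam=0.99):
        self.lam = lam                    # forgetting factor: smaller = faster adaptation
        self.mean = np.zeros(dim)
        self.cov = np.eye(dim)

    def update(self, x):
        delta = x - self.mean
        self.mean += (1 - self.lam) * delta
        self.cov = self.lam * self.cov + (1 - self.lam) * np.outer(delta, delta)
        return self.cov

def select_top_k(cov, response_idx, k):
    """Select the k covariates most correlated with the response in the tracked covariance."""
    sd = np.sqrt(np.diag(cov))
    corr = cov[response_idx] / (sd * sd[response_idx])
    candidates = [i for i in np.argsort(-np.abs(corr)) if i != response_idx]
    return candidates[:k]

# Toy stream whose correlation structure drifts halfway through.
rng = np.random.default_rng(10)
tracker = EwmaCovarianceTracker(dim=6)
for t in range(5000):
    x = rng.standard_normal(6)
    x[0] = (0.9 if t < 2500 else 0.1) * x[1] + (0.1 if t < 2500 else 0.9) * x[2] + 0.3 * x[0]
    tracker.update(x)
print("selected covariates for x[0]:", select_top_k(tracker.cov, response_idx=0, k=2))
```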