Digital Library

29 results for Population set-based methods

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain

BadiRate: Estimating Family Turnover Rates by Likelihood-Based Methods

Relevance: 100.00%

Abstract:

Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypotheses, such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset.

Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods

Relevance: 100.00%

Abstract:

Usual image fusion methods inject features from a high spatial resolution panchromatic sensor into every low spatial resolution multispectral band, trying to preserve spectral signatures while improving spatial resolution to that of the panchromatic sensor. The objective is to obtain the image that would be observed by a sensor with the same spectral response (i.e., spectral sensitivity and quantum efficiency) as the multispectral sensors and the spatial resolution of the panchromatic sensor. In these methods, however, features from regions of the electromagnetic spectrum not covered by the multispectral sensors are injected into the multispectral bands, and the physical spectral responses of the sensors are not considered during the process. This produces some undesirable effects, such as overinjection of spatial resolution in the images and slightly modified spectral signatures in some features. The authors present a technique that takes into account the physical electromagnetic spectrum responses of the sensors during the fusion process, producing images closer to the one that the ideal sensor would obtain than those produced by usual wavelet-based image fusion methods. This technique is used to define a new wavelet-based image fusion method.

Determining shoal membership: A comparison between momentary and trajectory-based methods

Relevance: 100.00%

Abstract:

Miller and Gerlai proposed two methods for determining shoal membership in Danio rerio, one based on momentary mean inter-individual distances and the other on post hoc analysis of the trajectories of nearest-neighbor distances. We propose a method based on momentary nearest-neighbor distances and compare the three methods using simulation. In general, our method yielded results that were more similar to those of their second method than to those of their first, and it is computationally simpler.
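
The idea of a momentary nearest-neighbor criterion can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cutoff `threshold`, and the sample coordinates are all hypothetical, and the abstract does not state the actual decision rule. The sketch simply assumes that an individual counts as a shoal member in a given frame when its closest neighbor lies within the cutoff distance.

```python
import math


def nearest_neighbor_distances(positions):
    """Return, for each individual, the distance to its closest neighbor.

    `positions` is a list of (x, y) tuples for one video frame and must
    contain at least two individuals.
    """
    distances = []
    for i, p in enumerate(positions):
        others = [math.dist(p, q) for j, q in enumerate(positions) if j != i]
        distances.append(min(others))
    return distances


def shoal_members(positions, threshold):
    """Flag individuals whose nearest neighbor is within `threshold`.

    The cutoff is a hypothetical parameter chosen for illustration only.
    """
    return [d <= threshold for d in nearest_neighbor_distances(positions)]


# One hypothetical frame: three fish close together, one far away.
frame = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
members = shoal_members(frame, threshold=2.0)
```

Because the rule uses only the positions in a single frame, it needs no trajectory history, which is consistent with the abstract's remark that a momentary method is computationally simple.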

Intramolecular basis set superposition error effects on the planarity of benzene and other aromatic molecules: A solution to the problem

Relevance: 100.00%

Abstract:

Recently, the surprising result has been reported that ab initio calculations on benzene and other planar arenes at the correlated MP2, MP3, configuration interaction with singles and doubles (CISD), and coupled cluster with singles and doubles levels of theory, using standard Pople basis sets, yield nonplanar minima. The planar optimized structures turn out to be transition states presenting one or more large imaginary frequencies, whereas single-determinant-based methods lead to the expected planar minima and no imaginary frequencies. It has been suggested that such anomalous behavior may originate from two-electron basis set incompleteness error. In this work, we show that the reported pitfalls can be interpreted in terms of intramolecular basis set superposition error (BSSE) effects, mostly between the C–H moieties constituting the arenes. We have carried out counterpoise-corrected optimizations and frequency calculations at the Hartree–Fock, B3LYP, MP2, and CISD levels of theory with several basis sets for a number of arenes. In all cases, correcting for intramolecular BSSE fixes the anomalous behavior of the correlated methods, whereas no significant differences are observed in the single-determinant case. Consequently, all systems studied are planar at all levels of theory. The effect of different intramolecular fragment definitions and the particular case of charged species, namely the cyclopentadienyl and indenyl anions, are also discussed.

Minimizing recombinations in consensus networks for phylogeographic studies

Relevance: 100.00%

Abstract:

Background: We address the problem of studying recombinational variations in (human) populations. In this paper, our focus is on one computational aspect of the general task: given two networks G1 and G2, with both mutation and recombination events, defined on overlapping sets of extant units, the objective is to compute a consensus network G3 with a minimum number of additional recombinations. We describe a polynomial time algorithm with a guarantee that the number of computed new recombination events is within ϵ = sz(G1, G2) (function sz is a well-behaved function of the sizes and topologies of G1 and G2) of the optimal number of recombinations. To date, this is the best known result for a network consensus problem. Results: Although the network consensus problem can be applied to a variety of domains, here we focus on the structure of human populations. With our preliminary analysis on a segment of the human Chromosome X data we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. These results have been verified independently using traditional manual procedures. To the best of our knowledge, this is the first recombinations-based characterization of human populations. Conclusion: We show that our mathematical model identifies recombination spots in the individual haplotypes; the aggregate of these spots over a set of haplotypes defines a recombinational landscape that has enough signal to detect continental as well as population divides based on a short segment of Chromosome X. In particular, we are able to infer ancient recombinations, population-specific recombinations and more, which also support the widely accepted 'Out of Africa' model. The agreement with mutation-based analysis can be viewed as an indirect validation of our results and the model. Since the model in principle gives us more information embedded in the networks, in our future work we plan to investigate more non-traditional questions via the structures computed by our methodology.

Geometry-based demosaicking

Relevance: 100.00%

Abstract:

Demosaicking is a particular case of interpolation problems where, from a scalar image in which each pixel has either the red, the green or the blue component, we want to interpolate the full-color image. State-of-the-art demosaicking algorithms perform interpolation along edges, but these edges are estimated locally. We propose a level-set-based geometric method to estimate image edges, inspired by the image in-painting literature. This method has a time complexity of O(S), where S is the number of pixels in the image, and compares favorably with the state-of-the-art algorithms both visually and in most relevant image quality measures.

On the performance of small-area estimators: Fixed vs. random area parameters

Relevance: 100.00%

Abstract:

Most methods for small-area estimation are based on composite estimators derived from design- or model-based methods. A composite estimator is a linear combination of a direct and an indirect estimator with weights that usually depend on unknown parameters which need to be estimated. Although model-based small-area estimators are usually based on random-effects models, the assumption of fixed effects is at face value more appropriate. Model-based estimators are justified by the assumption of random (interchangeable) area effects; in practice, however, areas are not interchangeable. In the present paper we empirically assess the quality of several small-area estimators in the setting in which the area effects are treated as fixed. We consider two settings: one that draws samples from a theoretical population, and another that draws samples from an empirical population of a labor force register maintained by the National Institute of Social Security (NISS) of Catalonia. We distinguish two types of composite estimators: a) those that use weights that involve area-specific estimates of bias and variance; and b) those that use weights that involve a common variance and a common squared bias estimate for all the areas. We assess their precision and discuss alternatives to optimizing composite estimation in applications.
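
The linear combination described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimators evaluated in the paper: the function names and numbers are hypothetical, the direct estimator is assumed unbiased, and the covariance between the two estimators is ignored, in which case the weight that minimizes the mean squared error of the combination is MSE(indirect) / (Var(direct) + MSE(indirect)).

```python
def composite_weight(var_direct, mse_indirect):
    """Weight on the direct estimator that minimizes the composite MSE.

    Assumes an unbiased direct estimator and ignores the covariance
    between the direct and indirect estimators (textbook simplification).
    """
    return mse_indirect / (var_direct + mse_indirect)


def composite_estimate(direct, indirect, weight):
    """Linear combination of a direct and an indirect estimate."""
    return weight * direct + (1.0 - weight) * indirect


# Hypothetical area: a noisy direct estimate and a more stable indirect one.
w = composite_weight(var_direct=4.0, mse_indirect=1.0)
estimate = composite_estimate(direct=10.0, indirect=20.0, weight=w)
```

In practice the variance and squared bias entering the weight are unknown and must themselves be estimated, either per area or pooled across areas, which is the distinction the abstract draws between its two types of composite estimators.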

The importance of individual heterogeneity in the decomposition of measures of socioeconomic inequality in health: An approach based on quantile regression

Relevance: 100.00%

Abstract:

This paper shows how recently developed regression-based methods for the decomposition of health inequality can be extended to incorporate individual heterogeneity in the responses of health to the explanatory variables. We illustrate our method with an application to the Canadian NPHS of 1994. Our strategy for the estimation of heterogeneous responses is based on the quantile regression model. The results suggest that there is an important degree of heterogeneity in the association of health to explanatory variables which, in turn, accounts for a substantial percentage of inequality in observed health. A particularly interesting finding is that the marginal response of health to income is zero for healthy individuals but positive and significant for unhealthy individuals. The heterogeneity in the income response reduces both overall health inequality and income-related health inequality.

Exemplar-based interpolation of sparsely sampled images

Relevance: 100.00%

Abstract:

A nonlocal variational formulation for interpolating a sparsely sampled image is introduced in this paper. The proposed variational formulation, originally motivated by image inpainting problems, encourages the transfer of information between similar image patches, following the paradigm of exemplar-based methods. Contrary to the classical inpainting problem, no complete patches are available from the sparse image samples, and the patch similarity criterion has to be redefined as proposed here. Initial experimental results with the proposed framework, at very low sampling densities, are very encouraging. We also explore some departures from the variational setting, showing a remarkable ability to recover textures at low sampling densities.

Kernel-PCA data integration with enhanced interpretability

Relevance: 100.00%

Abstract:

Background: Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Kernel-based methods are among the most powerful methods for integrating heterogeneous data types. Kernel-based data integration approaches consist of two basic steps: first, the right kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher or lower values of the variables analyzed. Conclusions: The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.

Assessment of nitrate and hardness levels in drinking water in four areas of Spain participating in the epidemiological study Infància i Medi Ambient (INMA)

Relevance: 100.00%

Abstract:

Water is one of the basic components of life and a ubiquitous source of exposure to contaminants, since the entire population consumes it. The INMA epidemiological study will assess whether exposure to nitrates during pregnancy and to water hardness during childhood is associated with low birth weight and atopic eczema, respectively. Objective: To assess nitrate and water hardness levels in the drinking water of the INMA study population. Methods: A descriptive study carried out in four of the seven INMA cohorts: Asturias, Gipuzkoa, Sabadell and Valencia. Data on nitrate and hardness levels in municipal drinking water during the periods of interest (2003 to 2008 and 2004 to 2012) were collected from town councils and water companies. The mean, standard deviation, maximum and minimum of the nitrate and hardness levels were calculated overall and by geographical area, year and season. In Sabadell, three rounds of water sampling were carried out to analyse hardness at different points in the city. Results: The mean nitrate level (mg/L NO3-) is 4.2 in Asturias, 4.0 in Gipuzkoa, 9.2 in Sabadell and 15.2 in Valencia. The mean hardness level (mg/L CaCO3) is 89.1 in Asturias, 132.7 in Gipuzkoa, 178.3 in Valencia and 230.9 in Sabadell. The analysis carried out in Sabadell found hardness slightly lower than the reported values, with no geographical variability. No clear pattern of seasonal or temporal variability was observed for either nitrates or hardness. Conclusions: Variability in nitrate and water hardness levels was detected across the study areas. Nitrate levels are moderate, with the highest found in agricultural areas of Valencia. Water hardness is fairly high owing to the predominantly calcareous subsoils of the study areas.

Indirect likelihood inference (revised)

Relevance: 100.00%

Abstract:

Standard indirect inference (II) estimators take a given finite-dimensional statistic, Z_{n}, and then estimate the parameters by matching the sample statistic with the model-implied population moment. We here propose a novel estimation method that utilizes all available information contained in the distribution of Z_{n}, not just its first moment. This is done by computing the likelihood of Z_{n}, and then estimating the parameters by either maximizing the likelihood or computing the posterior mean for a given prior of the parameters. These are referred to as the maximum indirect likelihood (MIL) and Bayesian indirect likelihood (BIL) estimators, respectively. We show that the IL estimators are first-order equivalent to the corresponding moment-based II estimator that employs the optimal weighting matrix. However, due to higher-order features of Z_{n}, the IL estimators are higher-order efficient relative to the standard II estimator. The likelihood of Z_{n} will in general be unknown and so simulated versions of IL estimators are developed. Monte Carlo results for a structural auction model and a DSGE model show that the proposed estimators indeed have attractive finite sample properties.

Estimation of parametric and nonparametric models for univariate claim severity distributions - an approach using R

Relevance: 100.00%

Abstract:

This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The nonparametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel-based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness of fit analysis and, importantly, statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described.
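
The log-transformation step mentioned in this abstract can be illustrated with a short sketch. It is written in Python rather than R and is not the code from the paper's Appendix: the function names, the claim amounts, and the normal-reference bandwidth rule (1.06 times the standard deviation times n to the power -1/5) are assumptions made for illustration. The sketch fits a Gaussian kernel density estimate to the logarithms of the claims and maps it back to the original scale with the change-of-variables factor 1/x.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate as a function of y."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample
        )

    return density


def log_transformed_kde(claims):
    """Estimate a severity density by applying a KDE to log(claims).

    The bandwidth uses a normal-reference rule of thumb, which is an
    assumption of this sketch and not necessarily the paper's choice.
    """
    logs = [math.log(c) for c in claims]
    bandwidth = 1.06 * statistics.stdev(logs) * len(logs) ** (-0.2)
    kde_on_logs = gaussian_kde(logs, bandwidth)

    def density(x):
        # Change of variables: f_X(x) = f_Y(log x) / x for x > 0.
        return kde_on_logs(math.log(x)) / x if x > 0 else 0.0

    return density


# Hypothetical, right-skewed claim amounts (not data from the paper).
claims = [120.0, 250.0, 300.0, 450.0, 800.0, 1500.0, 5200.0]
severity_density = log_transformed_kde(claims)
```

Working on the log scale keeps the estimated density at zero for non-positive amounts and avoids spending a single global bandwidth on the long right tail, which is one benefit of transforming skewed claim data before kernel smoothing.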
