67 resultados para Real data

em Université de Lausanne, Switzerland


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can cause wrong inference about population parameters. When the missing data process relates to the trait of interest, a valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Analyzing functional data often leads to finding common factors, for which functional principal component analysis proves to be a useful tool to summarize and characterize the random variation in a function space. The representation in terms of eigenfunctions is optimal in the sense of L-2 approximation. However, the eigenfunctions are not always directed towards an interesting and interpretable direction in the context of functional data and thus could obscure the underlying structure. To overcome such difficulty, an alternative to functional principal component analysis is proposed that produces directed components which may be more informative and easier to interpret. These structural components are similar to principal components, but are adapted to situations in which the domain of the function may be decomposed into disjoint intervals such that there is effectively independence between intervals and positive correlation within intervals. The approach is demonstrated with synthetic examples as well as real data. Properties for special cases are also studied.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Evaluation of segmentation methods is a crucial aspect in image processing, especially in the medical imaging field, where small differences between segmented regions in the anatomy can be of paramount importance. Usually, segmentation evaluation is based on a measure that depends on the number of segmented voxels inside and outside of some reference regions that are called gold standards. Although some other measures have been also used, in this work we propose a set of new similarity measures, based on different features, such as the location and intensity values of the misclassified voxels, and the connectivity and the boundaries of the segmented data. Using the multidimensional information provided by these measures, we propose a new evaluation method whose results are visualized applying a Principal Component Analysis of the data, obtaining a simplified graphical method to compare different segmentation results. We have carried out an intensive study using several classic segmentation methods applied to a set of MRI simulated data of the brain with several noise and RF inhomogeneity levels, and also to real data, showing that the new measures proposed here and the results that we have obtained from the multidimensional evaluation, improve the robustness of the evaluation and provides better understanding about the difference between segmentation methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of well known nonparametric principles of probability density modeling using kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can be easily used in decision support systems dealing with problems of automatic classification. Support vector machine is an implementation of the principles of statistical learning theory for the classification tasks. Recently they were successfully applied for different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Genome-wide association studies have been instrumental in identifying genetic variants associated with complex traits such as human disease or gene expression phenotypes. It has been proposed that extending existing analysis methods by considering interactions between pairs of loci may uncover additional genetic effects. However, the large number of possible two-marker tests presents significant computational and statistical challenges. Although several strategies to detect epistasis effects have been proposed and tested for specific phenotypes, so far there has been no systematic attempt to compare their performance using real data. We made use of thousands of gene expression traits from linkage and eQTL studies, to compare the performance of different strategies. We found that using information from marginal associations between markers and phenotypes to detect epistatic effects yielded a lower false discovery rate (FDR) than a strategy solely using biological annotation in yeast, whereas results from human data were inconclusive. For future studies whose aim is to discover epistatic effects, we recommend incorporating information about marginal associations between SNPs and phenotypes instead of relying solely on biological annotation. Improved methods to discover epistatic effects will result in a more complete understanding of complex genetic effects.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Genotypic frequencies at codominant marker loci in population samples convey information on mating systems. A classical way to extract this information is to measure heterozygote deficiencies (FIS) and obtain the selfing rate s from FIS = s/(2 - s), assuming inbreeding equilibrium. A major drawback is that heterozygote deficiencies are often present without selfing, owing largely to technical artefacts such as null alleles or partial dominance. We show here that, in the absence of gametic disequilibrium, the multilocus structure can be used to derive estimates of s independent of FIS and free of technical biases. Their statistical power and precision are comparable to those of FIS, although they are sensitive to certain types of gametic disequilibria, a bias shared with progeny-array methods but not FIS. We analyse four real data sets spanning a range of mating systems. In two examples, we obtain s = 0 despite positive FIS, strongly suggesting that the latter are artefactual. In the remaining examples, all estimates are consistent. All the computations have been implemented in a open-access and user-friendly software called rmes (robust multilocus estimate of selfing) available at http://ftp.cefe.cnrs.fr, and can be used on any multilocus data. Being able to extract the reliable information from imperfect data, our method opens the way to make use of the ever-growing number of published population genetic studies, in addition to the more demanding progeny-array approaches, to investigate selfing rates.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Typically MEG source reconstruction is used to estimate the distribution of current flow on a single anatomically derived cortical surface model. In this study we use two such models representing superficial and deep cortical laminae. We establish how well we can discriminate between these two different cortical layer models based on the same MEG data in the presence of different levels of co-registration noise, Signal-to-Noise Ratio (SNR) and cortical patch size. We demonstrate that it is possible to make a distinction between superficial and deep cortical laminae for levels of co-registration noise of less than 2mm translation and 2° rotation at SNR>11dB. We also show that an incorrect estimate of cortical patch size will tend to bias layer estimates. We then use a 3D printed head-cast (Troebinger et al., 2014) to achieve comparable levels of co-registration noise, in an auditory evoked response paradigm, and show that it is possible to discriminate between these cortical layer models in real data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Diffusion MRI is a well established imaging modality providing a powerful way to probe the structure of the white matter non-invasively. Despite its potential, the intrinsic long scan times of these sequences have hampered their use in clinical practice. For this reason, a large variety of methods have been recently proposed to shorten the acquisition times. Among them, spherical deconvolution approaches have gained a lot of interest for their ability to reliably recover the intra-voxel fiber configuration with a relatively small number of data samples. To overcome the intrinsic instabilities of deconvolution, these methods use regularization schemes generally based on the assumption that the fiber orientation distribution (FOD) to be recovered in each voxel is sparse. The well known Constrained Spherical Deconvolution (CSD) approach resorts to Tikhonov regularization, based on an ℓ(2)-norm prior, which promotes a weak version of sparsity. Also, in the last few years compressed sensing has been advocated to further accelerate the acquisitions and ℓ(1)-norm minimization is generally employed as a means to promote sparsity in the recovered FODs. In this paper, we provide evidence that the use of an ℓ(1)-norm prior to regularize this class of problems is somewhat inconsistent with the fact that the fiber compartments all sum up to unity. To overcome this ℓ(1) inconsistency while simultaneously exploiting sparsity more optimally than through an ℓ(2) prior, we reformulate the reconstruction problem as a constrained formulation between a data term and a sparsity prior consisting in an explicit bound on the ℓ(0)norm of the FOD, i.e. on the number of fibers. The method has been tested both on synthetic and real data. Experimental results show that the proposed ℓ(0) formulation significantly reduces modeling errors compared to the state-of-the-art ℓ(2) and ℓ(1) regularization approaches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mendelian randomization refers to the random allocation of alleles at the time of gamete formation. In observational epidemiology, this refers to the use of genetic variants to estimate a causal effect between a modifiable risk factor and an outcome of interest. In this review, we recall the principles of a "Mendelian randomization" approach in observational epidemiology, which is based on the technique of instrumental variables; we provide simulations and an example based on real data to demonstrate its implications; we present the results of a systematic search on original articles having used this approach; and we discuss some limitations of this approach in view of what has been found so far.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We present a method for segmenting white matter tracts from high angular resolution diffusion MR. images by representing the data in a 5 dimensional space of position and orientation. Whereas crossing fiber tracts cannot be separated in 3D position space, they clearly disentangle in 5D position-orientation space. The segmentation is done using a 5D level set method applied to hyper-surfaces evolving in 5D position-orientation space. In this paper we present a methodology for constructing the position-orientation space. We then show how to implement the standard level set method in such a non-Euclidean high dimensional space. The level set theory is basically defined for N-dimensions but there are several practical implementation details to consider, such as mean curvature. Finally, we will show results from a synthetic model and a few preliminary results on real data of a human brain acquired by high angular resolution diffusion MRI.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cette thèse s'intéresse à étudier les propriétés extrémales de certains modèles de risque d'intérêt dans diverses applications de l'assurance, de la finance et des statistiques. Cette thèse se développe selon deux axes principaux, à savoir: Dans la première partie, nous nous concentrons sur deux modèles de risques univariés, c'est-à- dire, un modèle de risque de déflation et un modèle de risque de réassurance. Nous étudions le développement des queues de distribution sous certaines conditions des risques commun¬s. Les principaux résultats sont ainsi illustrés par des exemples typiques et des simulations numériques. Enfin, les résultats sont appliqués aux domaines des assurances, par exemple, les approximations de Value-at-Risk, d'espérance conditionnelle unilatérale etc. La deuxième partie de cette thèse est consacrée à trois modèles à deux variables: Le premier modèle concerne la censure à deux variables des événements extrême. Pour ce modèle, nous proposons tout d'abord une classe d'estimateurs pour les coefficients de dépendance et la probabilité des queues de distributions. Ces estimateurs sont flexibles en raison d'un paramètre de réglage. Leurs distributions asymptotiques sont obtenues sous certaines condi¬tions lentes bivariées de second ordre. Ensuite, nous donnons quelques exemples et présentons une petite étude de simulations de Monte Carlo, suivie par une application sur un ensemble de données réelles d'assurance. L'objectif de notre deuxième modèle de risque à deux variables est l'étude de coefficients de dépendance des queues de distributions obliques et asymétriques à deux variables. Ces distri¬butions obliques et asymétriques sont largement utiles dans les applications statistiques. Elles sont générées principalement par le mélange moyenne-variance de lois normales et le mélange de lois normales asymétriques d'échelles, qui distinguent la structure de dépendance de queue comme indiqué par nos principaux résultats. Le troisième modèle de risque à deux variables concerne le rapprochement des maxima de séries triangulaires elliptiques obliques. Les résultats théoriques sont fondés sur certaines hypothèses concernant le périmètre aléatoire sous-jacent des queues de distributions. -- This thesis aims to investigate the extremal properties of certain risk models of interest in vari¬ous applications from insurance, finance and statistics. This thesis develops along two principal lines, namely: In the first part, we focus on two univariate risk models, i.e., deflated risk and reinsurance risk models. Therein we investigate their tail expansions under certain tail conditions of the common risks. Our main results are illustrated by some typical examples and numerical simu¬lations as well. Finally, the findings are formulated into some applications in insurance fields, for instance, the approximations of Value-at-Risk, conditional tail expectations etc. The second part of this thesis is devoted to the following three bivariate models: The first model is concerned with bivariate censoring of extreme events. For this model, we first propose a class of estimators for both tail dependence coefficient and tail probability. These estimators are flexible due to a tuning parameter and their asymptotic distributions are obtained under some second order bivariate slowly varying conditions of the model. Then, we give some examples and present a small Monte Carlo simulation study followed by an application on a real-data set from insurance. The objective of our second bivariate risk model is the investigation of tail dependence coefficient of bivariate skew slash distributions. Such skew slash distributions are extensively useful in statistical applications and they are generated mainly by normal mean-variance mixture and scaled skew-normal mixture, which distinguish the tail dependence structure as shown by our principle results. The third bivariate risk model is concerned with the approximation of the component-wise maxima of skew elliptical triangular arrays. The theoretical results are based on certain tail assumptions on the underlying random radius.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

MOTIVATION: Supporting the functionality of recent duplicate gene copies is usually difficult, owing to high sequence similarity between duplicate counterparts and shallow phylogenies, which hamper both the statistical and experimental inference. RESULTS: We developed an integrated evolutionary approach to identify functional duplicate gene copies and other lineage-specific genes. By repeatedly simulating neutral evolution, our method estimates the probability that an ORF was selectively conserved and is therefore likely to represent a bona fide coding region. In parallel, our method tests whether the accumulation of non-synonymous substitutions reveals signatures of selective constraint. We show that our approach has high power to identify functional lineage-specific genes using simulated and real data. For example, a coding region of average length (approximately 1400 bp), restricted to hominoids, can be predicted to be functional in approximately 94-100% of cases. Notably, the method may support functionality for instances where classical selection tests based on the ratio of non-synonymous to synonymous substitutions fail to reveal signatures of selection. Our method is available as an automated tool, ReEVOLVER, which will also be useful to systematically detect functional lineage-specific genes of closely related species on a large scale. AVAILABILITY: ReEVOLVER is available at http://www.unil.ch/cig/page7858.html.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents and discusses the use of Bayesian procedures - introduced through the use of Bayesian networks in Part I of this series of papers - for 'learning' probabilities from data. The discussion will relate to a set of real data on characteristics of black toners commonly used in printing and copying devices. Particular attention is drawn to the incorporation of the proposed procedures as an integral part in probabilistic inference schemes (notably in the form of Bayesian networks) that are intended to address uncertainties related to particular propositions of interest (e.g., whether or not a sample originates from a particular source). The conceptual tenets of the proposed methodologies are presented along with aspects of their practical implementation using currently available Bayesian network software.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

SummaryDiscrete data arise in various research fields, typically when the observations are count data.I propose a robust and efficient parametric procedure for estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to indentify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed.The weights are determined via an adaptive process such that if the data follow the model, then asymptotically no observation is downweighted.I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as the influence function of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient.The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performances of the WMLs based on each of them are studied.It results that in a great variety of situations the WML substantially improves the initial estimator, both in terms of finite sample mean square error and in terms of bias under contamination. Besides, the performances of the WML are rather stable under a change of the MDE even if the MDEs have very different behaviors.Two examples of application of the WML to real data are considered. In both of them, the necessity for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers.This procedure is particularly natural in the discrete distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.RésuméLes données discrètes sont présentes dans différents domaines de recherche, en particulier lorsque les observations sont des comptages.Je propose une méthode paramétrique robuste et efficace pour l'estimation de distributions discrètes. L'estimation est faite en deux phases. Tout d'abord, un estimateur très robuste des paramètres du modèle est calculé, et utilisé pour la détection des données aberrantes (outliers). Cet estimateur n'est pas nécessairement efficace. Ensuite, soit les outliers sont retirés de l'échantillon, soit des faibles poids leur sont attribués, et un estimateur du maximum de vraisemblance pondéré (WML) est calculé.Les poids sont déterminés via un processus adaptif, tel qu'asymptotiquement, si les données suivent le modèle, aucune observation n'est dépondérée.Je prouve que le point de rupture de l'estimateur final est au moins aussi élevé que celui de l'estimateur initial, et que sa fonction d'influence au modèle est la même que celle du maximum de vraisemblance, ce qui suggère que cet estimateur est pleinement efficace asymptotiquement.L'estimateur initial est un estimateur de disparité minimale (MDE). Les MDE sont asymptotiquement pleinement efficaces, et certains d'entre eux ont un point de rupture très élevé et un très faible biais sous contamination. J'étudie les performances du WML basé sur différents MDEs.Le résultat est que dans une grande variété de situations le WML améliore largement les performances de l'estimateur initial, autant en terme du carré moyen de l'erreur que du biais sous contamination. De plus, les performances du WML restent assez stables lorsqu'on change l'estimateur initial, même si les différents MDEs ont des comportements très différents.Je considère deux exemples d'application du WML à des données réelles, où la nécessité d'un estimateur robuste est manifeste : l'estimateur du maximum de vraisemblance est fortement corrompu par la présence de quelques outliers.La méthode proposée est particulièrement naturelle dans le cadre des distributions discrètes, mais pourrait être étendue au cas continu.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A new strategy for incremental building of multilayer feedforward neural networks is proposed in the context of approximation of functions from R-p to R-q using noisy data. A stopping criterion based on the properties of the noise is also proposed. Experimental results for both artificial and real data are performed and two alternatives of the proposed construction strategy are compared.