166 results for Doubly robust estimation

at Université de Lausanne, Switzerland


Relevance: 90.00%

Abstract:

Over the past 20 years, the theory of robust estimation has become an important topic in mathematical statistics. We discuss some basic concepts of this theory with the help of simple examples. Furthermore, we describe a subroutine library for the application of robust statistical procedures, developed with the support of the Swiss National Science Foundation.
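
The abstract does not list the library's contents, but a canonical example of the kind of procedure such a toolbox would contain is the Huber M-estimator of location. The sketch below is a minimal illustration under that assumption, not the library's actual code: it computes the estimate by iteratively reweighted least squares with the conventional tuning constant c = 1.345 (about 95% efficiency at the normal model).

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                           # robust starting point
    scale = np.median(np.abs(x - mu)) / 0.6745  # MAD estimate of scale
    for _ in range(max_iter):
        r = (x - mu) / scale                    # standardized residuals
        w = c / np.maximum(np.abs(r), c)        # weight 1 inside [-c, c], c/|r| outside
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])  # one gross outlier
print(huber_location(data))  # stays near 10, barely affected by the outlier
print(data.mean())           # pulled toward the outlier (12.5)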

Relevance: 80.00%

Abstract:

In the subject of fingerprints, the rise of computer tools has made it possible to create powerful automated search algorithms. These algorithms allow, among other things, the comparison of a fingermark against a fingerprint database, and therefore the establishment of a link between the mark and a known source. With the growth of the capacities of these systems and of data storage, as well as increasing collaboration between police services at the international level, the size of these databases increases. The current challenge for the field of fingerprint identification lies in the growth of these databases, which makes it possible to find impressions that are very similar but come from distinct fingers. At the same time, however, these data and systems allow a description of the variability between different impressions from the same finger and between impressions from different fingers. This statistical description of the within- and between-finger variabilities, computed on the basis of minutiae and their relative positions, can then be used in a statistical approach to interpretation. The computation of a likelihood ratio, employing simultaneously the comparison between the mark and the print of the case, the within-variability of the suspect's finger, and the between-variability of the mark with respect to a database, can then be based on representative data. These data thus allow an evaluation that may be more detailed than one obtained by applying rules established long before the advent of these large databases, or by relying on specialists' experience alone. The goal of the present thesis is to evaluate likelihood ratios computed from the scores of an automated fingerprint identification system (AFIS) when the source of the tested and compared marks is known. These ratios must support the hypothesis that is known to be true, and they should support it more and more strongly as information is added in the form of additional minutiae. For the modeling of within- and between-variability, the necessary data were defined and acquired for one finger of a first donor and two fingers of a second donor. The database used for between-variability includes approximately 600,000 inked prints. The minimal number of observations necessary for a robust estimation was determined for the two distributions used. Factors that influence these distributions were also analyzed: the number of minutiae included in the configuration and the configuration as such for both distributions, as well as the finger number and the general pattern for between-variability, and the orientation of the minutiae for within-variability. In the present study, the only factor for which no influence could be shown is the orientation of the minutiae. The results show that likelihood ratios derived from the scores of an AFIS can be used for evaluation. Relatively low rates of likelihood ratios supporting the hypothesis known to be false were obtained. The maximum rate of likelihood ratios supporting the hypothesis that two impressions were left by the same finger, when they in fact came from different fingers, is 5.2%, for a configuration of 6 minutiae. When a 7th and then an 8th minutia are added, this rate drops to 3.2% and then to 0.8%. In parallel, for these same configurations, the likelihood ratios obtained when the two impressions do come from the same finger are on average of the order of 100, 1,000, and 10,000 for 6, 7, and 8 minutiae.
These likelihood ratios can therefore be an important aid for decision making. Both positive effects linked to the addition of minutiae (a drop in the rate of likelihood ratios that could lead to an erroneous decision, and an increase in the value of the likelihood ratio) were observed systematically within the framework of the study. Approximations based on 3 scores for within-variability and on 10 scores for between-variability were found and showed satisfactory results.
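
The core computation the thesis describes is a likelihood ratio built from AFIS comparison scores: the score of the mark-print comparison is evaluated under a model of same-finger (within) score variability and under a model of different-finger (between) score variability. The sketch below uses assumed synthetic score distributions and kernel density estimates; the thesis's actual score data and density models are not reproduced here.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical AFIS comparison scores, standing in for real data.
within_scores  = rng.normal(120, 15, size=300)    # same-finger comparisons
between_scores = rng.normal(60, 12, size=10_000)  # different-finger comparisons

# Model the within- and between-finger score densities (here with Gaussian KDE).
f_within  = gaussian_kde(within_scores)
f_between = gaussian_kde(between_scores)

def likelihood_ratio(score):
    """LR = P(score | same source) / P(score | different sources)."""
    return f_within(score)[0] / f_between(score)[0]

print(likelihood_ratio(110.0))  # large: supports the same-source hypothesis
print(likelihood_ratio(55.0))   # small: supports the different-source hypothesis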

Relevance: 80.00%

Abstract:

Over thirty years ago, Leamer (1983) - among many others - expressed doubts about the quality and usefulness of empirical analyses for the economic profession by stating that "hardly anyone takes data analyses seriously. Or perhaps more accurately, hardly anyone takes anyone else's data analyses seriously" (p. 37). Improvements in data quality, more robust estimation methods, and the evolution of better research designs seem to make that assertion no longer justifiable (see Angrist and Pischke (2010) for a recent response to Leamer's essay). The economic profession and policy makers alike often rely on empirical evidence as a means to investigate policy-relevant questions. The approach of using scientifically rigorous and systematic evidence to identify policies and programs that are capable of improving policy-relevant outcomes is known under the increasingly popular notion of evidence-based policy. Evidence-based economic policy often relies on randomized or quasi-natural experiments in order to identify causal effects of policies. These can require relatively strong assumptions or raise concerns of external validity. In the context of this thesis, potential concerns are, for example, endogeneity of policy reforms with respect to the business cycle in the first chapter, the trade-off between precision and bias in the regression-discontinuity setting in chapter 2, and non-representativeness of the sample due to self-selection in chapter 3. While the identification strategies are very useful to gain insights into the causal effects of specific policy questions, transforming the evidence into concrete policy conclusions can be challenging. Policy development should therefore rely on the systematic evidence of a whole body of research on a specific policy question rather than on a single analysis. In this sense, this thesis cannot and should not be viewed as a comprehensive analysis of specific policy issues but rather as a first step towards a better understanding of certain aspects of a policy question. The thesis applies new and innovative identification strategies to policy-relevant and topical questions in the fields of labor economics and behavioral environmental economics. Each chapter relies on a different identification strategy. In the first chapter, we employ a difference-in-differences approach that exploits a quasi-experimental change in the entitlement to the maximum unemployment benefit duration to identify the medium-run effects of reduced benefit durations on post-unemployment outcomes. Shortening benefit duration carries a double dividend: it generates fiscal benefits without deteriorating the quality of job matches. On the contrary, shortened benefit durations improve medium-run earnings and employment, possibly by containing the negative effects of skill depreciation or stigmatization. While the first chapter provides only indirect evidence on the underlying behavioral channels, in the second chapter I develop a novel approach that makes it possible to learn about the relative importance of the two key margins of job search: reservation wage choice and search effort. In the framework of a standard non-stationary job search model, I show how the exit rate from unemployment can be decomposed in a way that is informative about reservation wage movements over the unemployment spell.
The empirical analysis relies on a sharp discontinuity in unemployment benefit entitlement, which can be exploited in a regression-discontinuity approach to identify the effects of extended benefit durations on unemployment and survivor functions. I find evidence that points to an important role of reservation wage choices in job search behavior. This can have direct implications for the optimal design of unemployment insurance policies. The third chapter - while thematically detached from the other chapters - addresses one of the major policy challenges of the 21st century: climate change and resource consumption. Many governments have recently put energy efficiency at the top of their agendas. While pricing instruments aimed at regulating energy demand have often been found to be short-lived and difficult to enforce politically, the focus of energy conservation programs has shifted towards behavioral approaches, such as the provision of information or social norm feedback. The third chapter describes a randomized controlled field experiment in which we discuss the effectiveness of different types of feedback on residential electricity consumption. We find that detailed and real-time feedback caused persistent electricity reductions on the order of 3 to 5% of daily electricity consumption. Social norm information can also generate substantial electricity savings when designed appropriately. The findings suggest that behavioral approaches constitute an effective and relatively cheap way of improving residential energy efficiency.
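
The first chapter's difference-in-differences design reduces, in its simplest form, to comparing the before-after change in outcomes for the group affected by the benefit-duration reform with the before-after change for an unaffected group. A minimal sketch on synthetic data (the chapter's actual microdata and control variables are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Synthetic panel: the treated group faces the reform in the post period.
treated = rng.integers(0, 2, n)   # 1 = affected by the reform
post    = rng.integers(0, 2, n)   # 1 = observed after the reform
true_effect = 0.8                 # assumed effect on the outcome

outcome = (1.5 * treated          # level difference between groups
           + 0.5 * post           # common time trend
           + true_effect * treated * post
           + rng.normal(0, 1, n))

def did(y, g, t):
    """(after - before) for the treated, minus (after - before) for controls."""
    return ((y[(g == 1) & (t == 1)].mean() - y[(g == 1) & (t == 0)].mean())
          - (y[(g == 0) & (t == 1)].mean() - y[(g == 0) & (t == 0)].mean()))

print(did(outcome, treated, post))  # approximately 0.8: recovers the assumed effect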

Relevance: 40.00%

Abstract:

Pulse wave velocity (PWV) is a surrogate of arterial stiffness and represents a non-invasive marker of cardiovascular risk. The non-invasive measurement of PWV requires tracking the arrival time of pressure pulses recorded in vivo, commonly referred to as pulse arrival time (PAT). In the state of the art, PAT is estimated by identifying a characteristic point of the pressure pulse waveform. This paper demonstrates that for ambulatory scenarios, where signal-to-noise ratios are below 10 dB, the repeatability of PAT measurements based on characteristic-point identification degrades drastically. Hence, we introduce a novel family of PAT estimators based on parametric modeling of the anacrotic phase of a pressure pulse. In particular, we propose a parametric PAT estimator (TANH) that correlates highly with the Complior® characteristic point D1 (CC = 0.99), increases noise robustness, and reduces the number of heartbeats required to obtain reliable PAT measurements by a factor of five.
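
The abstract does not give the exact TANH parameterization; a plausible minimal reading is to model the rising (anacrotic) edge of the pulse as a shifted, scaled hyperbolic tangent and to take the fitted transition time as the arrival time. A sketch of that assumed model on a synthetic noisy pulse:

```python
import numpy as np
from scipy.optimize import curve_fit

def rising_edge(t, baseline, amplitude, t0, width):
    """Hypothetical tanh model of the anacrotic phase of a pressure pulse."""
    return baseline + 0.5 * amplitude * (1.0 + np.tanh((t - t0) / width))

# Synthetic pulse onset with heavy additive noise; true arrival at t0 = 0.25 s.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 0.5, 500)
clean = rising_edge(t, 0.0, 1.0, 0.25, 0.02)
noisy = clean + rng.normal(0, 0.3, t.size)

# Fit the parametric model to the noisy waveform; t0 plays the role of the PAT.
p0 = [0.0, 1.0, 0.2, 0.05]  # rough initial guess
params, _ = curve_fit(rising_edge, t, noisy, p0=p0)
print(f"estimated arrival time: {params[2]:.4f} s")  # close to 0.25
```

Fitting a whole-edge model uses all samples of the rise rather than a single characteristic point, which is the intuition behind the reported noise robustness.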

Relevance: 30.00%

Abstract:

Discrete data arise in various research fields, typically when the observations are count data. I propose a robust and efficient parametric procedure for the estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to identify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed. The weights are determined via an adaptive process such that, if the data follow the model, asymptotically no observation is downweighted. I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as the influence function of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient. The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performance of the WML based on each of them is studied. The result is that, in a great variety of situations, the WML substantially improves on the initial estimator, both in terms of finite-sample mean squared error and in terms of bias under contamination. Moreover, the performance of the WML is rather stable under a change of the MDE, even when the MDEs themselves behave very differently. Two examples of application of the WML to real data are considered. In both of them, the necessity for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers. The procedure is particularly natural in the discrete distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.
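
To make the two-phase structure concrete, here is a minimal sketch for a Poisson model. Two simplifications are assumptions of this sketch, not the thesis's method: the initial estimator is a crude median-based one rather than a minimum disparity estimator, and the adaptive weighting is reduced to a hard cut on Pearson residuals.

```python
import numpy as np

def weighted_ml_poisson(x, z=3.0):
    """Two-phase robust fit of a Poisson mean, in the spirit of the abstract.
    Phase 1: very robust (if inefficient) initial estimate, used to flag outliers.
    Phase 2: weighted maximum likelihood with the flagged points downweighted."""
    x = np.asarray(x, dtype=float)

    # Phase 1: robust initial estimate of the mean (stand-in for an MDE).
    lam0 = np.median(x)

    # Flag observations with large Pearson residuals under the initial fit.
    resid = (x - lam0) / np.sqrt(max(lam0, 1e-12))
    weights = np.where(np.abs(resid) <= z, 1.0, 0.0)  # hard 0/1 downweighting

    # Phase 2: weighted MLE; for the Poisson this is the weighted mean.
    return np.sum(weights * x) / np.sum(weights)

sample = np.array([3, 4, 2, 5, 3, 4, 3, 2, 4, 50])  # one gross outlier
print(weighted_ml_poisson(sample))  # about 3.3, near the uncontaminated mean
print(sample.mean())                # 8.0, badly corrupted by the outlier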

Relevance: 30.00%

Abstract:

For a wide range of environmental, hydrological, and engineering applications there is a fast-growing need for high-resolution imaging. In this context, waveform tomographic imaging of crosshole georadar data is a powerful method able to provide images of pertinent electrical properties in near-surface environments with unprecedented spatial resolution. In contrast, conventional ray-based tomographic methods, which consider only a very limited part of the recorded signal (first-arrival traveltimes and maximum first-cycle amplitudes), suffer from inherent limitations in resolution and may prove inadequate in complex environments. For a typical crosshole georadar survey, the potential improvement in resolution when using waveform-based approaches instead of ray-based approaches is in the range of one order of magnitude. Moreover, the spatial resolution of waveform-based inversions is comparable to that of common logging methods. While waveform tomographic imaging has become well established in exploration seismology over the past two decades, it is still comparatively underdeveloped in the georadar domain despite corresponding needs. Recently, different groups have presented finite-difference time-domain waveform inversion schemes for crosshole georadar data, which are adaptations and extensions of Tarantola's seminal nonlinear generalized least-squares approach developed for the seismic case. First applications of these new crosshole georadar waveform inversion schemes to synthetic and field data have shown promising results. However, little is known about the limits and performance of such schemes in complex environments. The general motivation of my thesis is therefore the evaluation of the robustness and limitations of waveform inversion algorithms for crosshole georadar data, in order to apply such schemes to a wide range of real-world problems. One crucial issue in making any waveform scheme applicable and effective for real-world crosshole georadar problems is the accurate estimation of the source wavelet, which is unknown in reality. Waveform inversion schemes for crosshole georadar data require forward simulations of the wavefield in order to iteratively solve the inverse problem, so accurate knowledge of the source wavelet is critically important for their successful application. Relatively small differences in the estimated source wavelet shape can lead to large differences in the resulting tomograms. In the first part of my thesis, I explore the viability and robustness of a relatively simple iterative deconvolution technique that incorporates the estimation of the source wavelet into the waveform inversion procedure rather than adding additional model parameters to the inversion problem. Extensive tests indicate that this source wavelet estimation technique is simple yet effective, and is able to provide remarkably accurate and robust estimates of the source wavelet in the presence of strong heterogeneity in both the dielectric permittivity and the electrical conductivity, as well as significant ambient noise in the recorded data.
Furthermore, our tests also indicate that the approach is insensitive to the phase characteristics of the starting wavelet, which is not the case when the wavelet estimation is directly incorporated into the inverse problem. Another critical issue with crosshole georadar waveform inversion schemes that clearly needs to be investigated is the consequence of the common assumption of frequency-independent electromagnetic constitutive parameters. This is crucial since, in reality, these parameters are known to be frequency-dependent and complex, and recorded georadar data may thus show significant dispersive behaviour. In particular, in the presence of water, there is a wide body of evidence showing that the dielectric permittivity can be significantly frequency-dependent over the GPR frequency range, due to a variety of relaxation processes. The second part of my thesis is therefore dedicated to evaluating the reconstruction limits of a non-dispersive crosshole georadar waveform inversion scheme in the presence of varying degrees of dielectric dispersion. I show that the inversion algorithm, combined with the iterative deconvolution-based source wavelet estimation procedure, which is partially able to account for the frequency-dependent effects through an "effective" wavelet, performs remarkably well in weakly to moderately dispersive environments and has the ability to provide adequate tomographic reconstructions.
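
A single step of the deconvolution idea can be sketched as a regularized (water-level) frequency-domain deconvolution: given observed data d ≈ w * g and a forward-simulated impulse response g, divide the spectra while stabilizing spectral notches. This is a simplified stand-in on synthetic signals, not the thesis's actual iterative scheme, which embeds such a step inside the waveform inversion loop.

```python
import numpy as np

def estimate_wavelet(observed, simulated_response, water_level=1e-2):
    """Water-level deconvolution: estimate the source wavelet w from
    data d ≈ w * g, given the simulated impulse response g."""
    D = np.fft.rfft(observed)
    G = np.fft.rfft(simulated_response)
    denom = np.abs(G) ** 2
    denom = np.maximum(denom, water_level * denom.max())  # stabilize the division
    W = D * np.conj(G) / denom
    return np.fft.irfft(W, n=len(observed))

# Synthetic check: a Ricker-like wavelet convolved with a spiky medium response.
rng = np.random.default_rng(3)
t = np.arange(256) * 1e-3
arg = (np.pi * 30 * (t - 0.05)) ** 2
wavelet = (1 - 2 * arg) * np.exp(-arg)
g = np.zeros(256)
g[[40, 90, 150]] = [1.0, -0.6, 0.3]                    # impulse response
d = np.convolve(wavelet, g)[:256] + rng.normal(0, 0.01, 256)

w_est = estimate_wavelet(d, g)
print(np.corrcoef(w_est, wavelet)[0, 1])  # close to 1: wavelet recovered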

Relevance: 30.00%

Abstract:

Atlas registration is a recognized paradigm for the automatic segmentation of normal MR brain images. Unfortunately, atlas-based segmentation has been of limited use in the presence of large space-occupying lesions. In fact, brain deformations induced by such lesions are added to normal anatomical variability, and they may dramatically shift and deform anatomically or functionally important brain structures. In this work, we chose to focus on the problem of inter-subject registration of MR images with large tumors inducing a significant shift of surrounding anatomical structures. First, a brief survey of the existing methods that have been proposed to deal with this problem is presented. This introduces the discussion of the requirements and desirable properties that we consider necessary for a registration method in this context: to have a dense and smooth deformation field and a model of lesion growth, to model different deformability for some structures, to introduce more prior knowledge, and to use voxel-based features with a similarity measure robust to intensity differences. In the second part of this work, we propose a new approach that overcomes some of the main limitations of the existing techniques while complying with most of the requirements above. Our algorithm combines the mathematical framework for computing a variational flow proposed by Hermosillo et al. [G. Hermosillo, C. Chefd'Hotel, O. Faugeras, A variational approach to multi-modal image matching, Tech. Rep., INRIA (February 2001)] with the radial lesion growth pattern presented by Bach Cuadra et al. [M. Bach Cuadra, C. Pollo, A. Bardera, O. Cuisenaire, J.-G. Villemure, J.-Ph. Thiran, Atlas-based segmentation of pathological MR brain images using a model of lesion growth, IEEE Trans. Med. Imag. 23 (10) (2004) 1301-1314]. Results on patients with a meningioma are visually assessed and compared to those obtained with the most similar method from the state of the art.

Relevance: 30.00%

Abstract:

A major issue in the application of waveform inversion methods to crosshole ground-penetrating radar (GPR) data is the accurate estimation of the source wavelet. Here, we explore the viability and robustness of incorporating this step into a recently published time-domain inversion procedure through an iterative deconvolution approach. Our results indicate that, at least in non-dispersive electrical environments, such an approach provides remarkably accurate and robust estimates of the source wavelet even in the presence of strong heterogeneity of both the dielectric permittivity and electrical conductivity. Our results also indicate that the proposed source wavelet estimation approach is relatively insensitive to ambient noise and to the phase characteristics of the starting wavelet. Finally, there appears to be little to no trade-off between the wavelet estimation and the tomographic imaging procedures.

Relevance: 30.00%

Abstract:

Positive selection is widely estimated from protein-coding sequence alignments by the nonsynonymous-to-synonymous rate ratio omega. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of the estimation of the omega ratio, more effort is needed to assess this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model and investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence of underestimation of the synonymous substitution rate dS for values as small as 0.5, with a slight increase in false positives for the branch-site test. When dS increases further, the underestimation of dS worsens, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, with a maximum for intermediate values of dS. Thus, high dS is more of a concern for loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We showed that there is no significant difference in false positives between high-GC (up to ~80%) and low-GC (~30%) genes. Moreover, neither shifts of GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site test is very conservative.
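
The branch-site test itself is a likelihood ratio test: twice the difference in log-likelihood between an alternative model that allows omega > 1 on the foreground branch and a null model with omega fixed at 1, conventionally compared against a chi-squared distribution with one degree of freedom (the conservative choice). A sketch of that decision step with hypothetical log-likelihood values; fitting the codon models themselves (e.g. with PAML/codeml) is outside this snippet.

```python
from scipy.stats import chi2

def branch_site_test(lnL_alt, lnL_null, alpha=0.05):
    """Decision step of the branch-site test: 2 * (lnL_alt - lnL_null)
    compared with a chi-squared distribution with one degree of freedom."""
    stat = 2.0 * (lnL_alt - lnL_null)
    pval = chi2.sf(stat, df=1)
    return stat, pval, pval < alpha

# Hypothetical log-likelihoods from codon-model fits.
print(branch_site_test(lnL_alt=-2345.6, lnL_null=-2349.8))
# stat = 8.4, p ≈ 0.004: positive selection inferred on the foreground branch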

Relevance: 30.00%

Abstract:

The goal of this study was to investigate the impact of computing parameters and of the location of volumes of interest (VOIs) on the calculation of the 3D noise power spectrum (NPS), in order to determine an optimal set of computing parameters and propose a robust method for evaluating the noise properties of imaging systems. Noise stationarity in noise volumes acquired with a water phantom on a 128-MDCT and a 320-MDCT scanner was analyzed in the spatial domain in order to define locally stationary VOIs. The influence of the computing parameters in the 3D NPS measurement - the sampling distances (bx, by, bz), the VOI lengths (Lx, Ly, Lz), the number of VOIs (N_VOI), and the structured noise - was investigated to minimize measurement errors. The effect of the VOI locations on the NPS was also investigated. Results showed that the noise (standard deviation) varies more in the r-direction (phantom radius) than in the z-direction. A 25 × 25 × 40 mm³ VOI associated with DFOV = 200 mm (Lx = Ly = Lz = 64, bx = by = 0.391 mm with a 512 × 512 matrix) and a first-order detrending method to reduce structured noise led to an accurate NPS estimation. NPS estimated from off-centered small VOIs showed a directional dependency, contrary to NPS obtained from large VOIs located in the center of the volume or from small VOIs located on a concentric circle. This shows that the VOI size and location play a major role in the determination of the NPS when images are not stationary. This study emphasizes the need for consistent measurement methods to assess and compare image quality in CT.
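
The NPS computation the study optimizes follows the standard definition: NPS = (bx·by·bz)/(Lx·Ly·Lz) · mean over VOIs of |DFT3{VOI - trend}|², with first-order detrending against structured noise. A minimal sketch on synthetic white noise; the parameter values echo those quoted above (bx = by = 0.391 mm, and bz = 40 mm / 64 = 0.625 mm is an inferred value), while the study's phantom data are not reproduced.

```python
import numpy as np

def nps_3d(vois, voxel_size):
    """3D noise power spectrum from an ensemble of noise-only VOIs,
    with first-order (planar) detrending to suppress structured noise."""
    bx, by, bz = voxel_size
    spectra = []
    for voi in vois:
        nx, ny, nz = voi.shape
        x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                              indexing="ij")
        # First-order detrending: subtract the best-fit linear ramp.
        A = np.column_stack([np.ones(voi.size), x.ravel(), y.ravel(), z.ravel()])
        coef, *_ = np.linalg.lstsq(A, voi.ravel(), rcond=None)
        detrended = voi - (A @ coef).reshape(voi.shape)
        spectra.append(np.abs(np.fft.fftn(detrended)) ** 2)
    nx, ny, nz = vois[0].shape
    return (bx * by * bz) / (nx * ny * nz) * np.mean(spectra, axis=0)

# White-noise check: the NPS should be flat at sigma^2 * voxel volume.
rng = np.random.default_rng(4)
vois = [rng.normal(0, 10, (64, 64, 64)) for _ in range(10)]
nps = nps_3d(vois, voxel_size=(0.391, 0.391, 0.625))
print(nps.mean())  # ≈ 100 * 0.391 * 0.391 * 0.625 ≈ 9.6 mm³ HU² units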

Relevance: 30.00%

Abstract:

Spatial data analysis, mapping, and visualization are of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on some empirical data (measurements). A number of state-of-the-art methods can be used for the task: deterministic interpolations; methods of geostatistics, i.e. the family of kriging estimators (Deutsch and Journel, 1997); machine learning algorithms such as artificial neural networks (ANN) of different architectures; hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996); etc. All the methods mentioned above can be used for solving the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, often of unknown nature. That is one of the reasons why deterministic models can be inconsistent, since they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as the realization of some spatial random process. To obtain an estimate with kriging, one has to model the spatial structure of the data: the spatial correlation function or (semi-)variogram. This task can be complicated if there is not a sufficient number of measurements, and the variogram is sensitive to outliers and extremes. ANNs are a powerful tool, but they also suffer from a number of problems. ANNs of a special type - multilayer perceptrons - are often used as a detrending tool in hybrid (ANN + geostatistics) models (Kanevski and Maignan, 2004). Therefore, the development and adaptation of a method that is nonlinear and robust to noise in the measurements, that can deal with small empirical datasets, and that has a solid mathematical background is of great importance. The present paper deals with such a model, based on Statistical Learning Theory (SLT): Support Vector Regression. SLT is a general mathematical framework devoted to the problem of estimating dependencies from empirical data (Hastie et al., 2004; Vapnik, 1998). SLT models for classification - Support Vector Machines - have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al., 2002). The properties of SVM for regression - Support Vector Regression (SVR) - are less studied. First results of the application of SVR to spatial mapping of physical quantities were obtained by the authors for mapping of medium porosity (Kanevski et al., 1999) and for mapping of radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of the SVR model for spatial data analysis and mapping. A detailed description of SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996), and the basic equations for nonlinear modeling are given in Section 2. Section 3 discusses the application of SVR to spatial data mapping in a real case study: soil pollution by the Cs137 radionuclide. Section 4 discusses the properties of the model applied to noisy data or data with outliers.
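
A minimal sketch of the mapping workflow the paper studies, here with scikit-learn's SVR on hypothetical coordinates and measurements (the paper's Cs137 data and exact kernel settings are not reproduced). The epsilon-insensitive loss, which ignores errors smaller than epsilon, is what lends the estimator its robustness to measurement noise.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical spatial dataset: (x, y) coordinates and noisy measurements of
# a smooth field, standing in for e.g. soil-contamination samples.
rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))
values = (np.sin(coords[:, 0]) + np.cos(coords[:, 1])
          + rng.normal(0, 0.2, 200))        # measurement noise

# RBF-kernel SVR with epsilon-insensitive loss.
model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"))
model.fit(coords, values)

# Predict on a regular grid: the spatial mapping step.
gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
prediction_map = model.predict(grid).reshape(gx.shape)
print(prediction_map.shape)  # (50, 50) map ready for visualization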

Relevance: 20.00%

Abstract:

OBJECTIVE: To compare heart-rate monitoring with the doubly labelled water (²H₂¹⁸O) method for estimating total daily energy expenditure in obese and non-obese children. DESIGN: Cross-sectional study of obese and normal-weight children. SUBJECTS: 13 prepubertal children: six obese (4M, 2F, 9.1 ± 1.5 years, 47.3 ± 9.7 kg) and seven non-obese (3M, 4F, 9.3 ± 0.6 years, 31.8 ± 3.2 kg). MEASUREMENTS: Total daily energy expenditure (TEE) was assessed by means of the doubly labelled water method (TEE_DLW) and of heart-rate monitoring (TEE_HR). RESULTS: TEE_HR was significantly (P < 0.05) higher than TEE_DLW in obese children (9.47 ± 0.84 MJ/d vs 8.99 ± 0.63 MJ/d), whereas it was not different in non-obese children (8.43 ± 2.02 MJ/d vs 8.42 ± 2.30 MJ/d, P = NS). The difference in TEE assessed by HR monitoring in the obese group averaged 6.2 ± 4.7%. At the individual level, the degree of agreement (difference between TEE_HR and TEE_DLW ± 2 s.d.) was low both in obese (-0.36, 1.32 MJ/d) and in non-obese children (-1.30, 1.34 MJ/d). At the group level, the agreement between the two methods was good in non-obese children (95% CI for the bias: -0.59, 0.63 MJ/d) but not in obese children (0.04, 0.92 MJ/d). Duration of sleep and energy expenditure during resting and physical activity were not significantly different between the two groups. Patterns of heart rate (or derived energy expenditure) during the daytime were similar in obese and non-obese children. CONCLUSION: The HR monitoring technique provides an estimate of TEE close to that assessed by the DLW method in non-obese prepubertal children. In comparison with DLW, the HR monitoring method yields a greater TEE value in obese children.
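
The individual- and group-level agreement figures quoted above follow the Bland-Altman recipe: the bias is the mean difference between methods, the limits of agreement are bias ± 2 s.d. of the differences, and the group-level comparison uses a 95% confidence interval for the bias. A sketch with hypothetical paired values (the study's raw measurements are not given):

```python
import numpy as np

def agreement(tee_hr, tee_dlw):
    """Bland-Altman style agreement between two TEE methods: bias,
    limits of agreement (bias ± 2 s.d.), and a 95% CI for the bias."""
    d = np.asarray(tee_hr) - np.asarray(tee_dlw)
    bias = d.mean()
    sd = d.std(ddof=1)
    se = sd / np.sqrt(d.size)
    return {"bias": bias,
            "limits_of_agreement": (bias - 2 * sd, bias + 2 * sd),
            "bias_95ci": (bias - 1.96 * se, bias + 1.96 * se)}

# Hypothetical paired measurements in MJ/d.
tee_hr  = [9.5, 8.9, 10.1, 9.0, 8.8, 9.6]
tee_dlw = [9.0, 8.6,  9.4, 8.7, 8.3, 9.1]
print(agreement(tee_hr, tee_dlw))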

Relevance: 20.00%

Abstract:

The accurate estimation of total daily energy expenditure (TEE) in chronic kidney disease patients is essential to allow the provision of nutritional requirements; however, it remains a challenge to capture actual physical activity and resting energy expenditure in maintenance dialysis patients. The direct measurement of TEE by direct calorimetry or doubly labeled water cannot be used easily, so in clinical practice TEE is usually estimated from resting energy expenditure and physical activity. Prediction equations may also be used to estimate resting energy expenditure; however, their use has been poorly documented in dialysis patients. Recently, a new system called SenseWear Armband (BodyMedia, Pittsburgh, PA) was developed to assess TEE, but so far no data have been published in chronic kidney disease patients. The aim of this review is to describe new measurements of energy expenditure and physical activity in chronic kidney disease patients.

Relevance: 20.00%

Abstract:

Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
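
Replicate-based error estimation reduces to counting disagreements between genotype calls that ought to be identical. A simplified sketch with hypothetical diploid genotype calls; the paper additionally scores SNP-level error and feeds these rates into the optimization of Stacks assembly parameters.

```python
from collections import Counter

def genotype_error_rates(rep1, rep2):
    """Locus- and allele-level error rates from one pair of sample replicates,
    which should carry identical genotypes. Genotypes are unordered allele
    pairs per locus, e.g. (0, 1); None marks a missing genotype."""
    locus_err, allele_err, compared = 0, 0, 0
    for g1, g2 in zip(rep1, rep2):
        if g1 is None or g2 is None:
            continue                     # locus missing in a replicate
        compared += 1
        shared = sum((Counter(g1) & Counter(g2)).values())
        allele_err += 2 - shared         # alleles not shared (0, 1, or 2)
        locus_err += int(shared < 2)     # any disagreement at the locus
    return locus_err / compared, allele_err / (2 * compared)

# Hypothetical genotype calls for the same individual, sequenced twice.
rep_a = [(0, 0), (0, 1), (1, 1), None,   (0, 1)]
rep_b = [(0, 0), (0, 0), (1, 1), (0, 1), (0, 1)]
print(genotype_error_rates(rep_a, rep_b))  # (0.25, 0.125): 1 locus of 4, 1 allele of 8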