75 results for kernel estimator
Abstract:
1. Aim - Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data.
2. Location - Europe, North America, South America.
3. Methods - The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with predefined distributions and amounts of niche overlap to evaluate several ordination and species distribution modeling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species.
4. Results - We show that niche overlap can be accurately detected with the framework when the variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographic space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and the arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting the variables on which niche overlap is calculated provides contrasting results.
5. Main conclusions - The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate for studying niche differences between species, subspecies or intraspecific lineages that differ in their geographical distributions. Alternatively, it can be used to measure the degree to which the environmental niche of a species or intraspecific lineage has changed over time.
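A minimal Python sketch of the core idea, assuming a two-dimensional (e.g. PCA-reduced) environmental space and Schoener's D as the overlap metric; the correction for the availability of environmental conditions across geographic space mentioned above is omitted, and none of the names come from the authors' implementation.

```python
# Kernel-smoothed occurrence densities on a gridded environmental space and a
# niche-overlap metric (Schoener's D). Illustrative sketch only.
import numpy as np
from scipy.stats import gaussian_kde

def occurrence_density(env_scores, grid):
    """Kernel-smoothed occurrence density on a gridded 2-D environmental space."""
    kde = gaussian_kde(env_scores.T)          # env_scores: (n_occurrences, 2)
    dens = kde(grid)                          # grid: (2, n_cells)
    return dens / dens.sum()                  # normalise to a discrete density

def schoener_d(dens_a, dens_b):
    """Schoener's D overlap between two normalised densities (1 = identical niches)."""
    return 1.0 - 0.5 * np.abs(dens_a - dens_b).sum()

# Two simulated species with partly overlapping niches
rng = np.random.default_rng(0)
sp1 = rng.normal([0.0, 0.0], 0.8, size=(300, 2))
sp2 = rng.normal([0.5, 0.3], 0.8, size=(300, 2))
xx, yy = np.meshgrid(np.linspace(-4, 4, 100), np.linspace(-4, 4, 100))
grid = np.vstack([xx.ravel(), yy.ravel()])
D = schoener_d(occurrence_density(sp1, grid), occurrence_density(sp2, grid))
print(f"Niche overlap (Schoener's D): {D:.2f}")
```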
Abstract:
In recent years there has been explosive growth in the development of adaptive and data-driven methods. One efficient, data-driven approach is based on statistical learning theory (SLT; Vapnik 1998). The theory is built on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we try not only to reduce the training error, i.e. to fit the available data with a model, but also to reduce the complexity of the model and the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is Support Vector Machines (SVM). At present SLT is still under intensive development and SVM are finding new areas of application (www.kernel-machines.org). SVM develop robust and nonlinear data models with excellent generalisation abilities, which is very important for both monitoring and forecasting. SVM perform extremely well when the input space is high dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
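As a small illustration of the last point, a scikit-learn sketch showing that only a subset of the training samples end up as support vectors; the RBF kernel, toy data and parameter values are assumptions for illustration, not taken from the text.

```python
# SVM classifier and its support vectors (scikit-learn); toy nonlinear problem.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                          # moderately high-dimensional inputs
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)   # nonlinear decision rule

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # trades training error against model complexity
clf.fit(X, y)

# Only the support vectors define the decision boundary -- the property the abstract
# links to sampling optimisation and data-redundancy analysis.
print("support vectors:", clf.support_vectors_.shape[0], "of", X.shape[0], "samples")
```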
Abstract:
Introduction: The Thalidomide-Dexamethasone (TD) regimen has provided encouraging results in relapsed MM. To improve results, bortezomib (Velcade) has been added to the combination in previous phase II studies, the so-called VTD regimen. In January 2006, the European Group for Blood and Marrow Transplantation (EBMT) and the Intergroupe Francophone du Myélome (IFM) initiated a prospective, randomized, parallel-group, open-label phase III, multicenter study comparing VTD (arm A) with TD (arm B) for MM patients progressing or relapsing after autologous transplantation. Patients and Methods: Inclusion criteria: patients in first progression or relapse after at least one autologous transplantation, including those who had received bortezomib or thalidomide before transplant. Exclusion criteria: subjects with neuropathy above grade 1 or non-secretory MM. The primary study end point was time to progression (TTP). Secondary end points included safety, response rate, progression-free survival (PFS) and overall survival (OS). Treatment was scheduled as follows: bortezomib 1.3 mg/m2 was given as an i.v. bolus on days 1, 4, 8 and 11 followed by a 10-day rest period (days 12 to 21) for 8 cycles (6 months) and then on days 1, 8, 15, 22 followed by a 20-day rest period (days 23 to 42) for 4 cycles (6 months). In both arms, thalidomide was scheduled at 200 mg/day orally for one year and dexamethasone at 40 mg/day orally four days every three weeks for one year. Patients reaching remission could proceed to a new stem cell harvest. However, transplantation, either autologous or allogeneic, could only be performed in patients who completed the planned one-year treatment period. Response was assessed by EBMT criteria, with the additional category of near complete remission (nCR). Adverse events were graded by the NCI-CTCAE, Version 3.0. The trial was based on a group sequential design, with 4 planned interim analyses and one final analysis that allowed stopping for efficacy as well as futility. The overall alpha and power were set equal to 0.025 and 0.90 respectively. The test for decision making was based on the comparison in terms of the ratio of the cause-specific hazards of relapse/progression, estimated in a Cox model stratified on the number of previous autologous transplantations. Relapse/progression cumulative incidence was estimated using the proper nonparametric estimator, and the comparison was done by the Gray test. PFS and OS probabilities were estimated by Kaplan-Meier curves, and the comparison was performed by the log-rank test. An interim safety analysis was performed when the first hundred patients had been included. The safety committee recommended continuing the trial. Results: As of 1st July 2010, 269 patients had been enrolled in the study, 139 in France (IFM 2005-04 study), 21 in Italy, 38 in Germany, 19 in Switzerland (a SAKK study), 23 in Belgium, 8 in Austria, 8 in the Czech Republic, 11 in Hungary, 1 in the UK and 1 in Israel. One hundred and sixty-nine patients were male and 100 female; the median age was 61 yrs (range 29-76). One hundred and thirty-six patients were randomized to receive VTD and 133 to receive TD. The current analysis is based on 246 patients (124 in arm A, 122 in arm B) included in the second interim analysis, carried out when 134 events were observed. Following this analysis, the trial was stopped because of the significant superiority of VTD over TD. Follow-up of the remaining patients was too short for them to contribute to the analysis.
The number of previous autologous transplants was one in 63 vs 60 and two or more in 61 vs 62 patients in arm A vs B respectively. The median follow-up was 25 months. The median TTP was 20 months vs 15 months respectively in arms A and B, with a cumulative incidence of relapse/progression at 2 years equal to 52% (95% CI: 42%-64%) vs 70% (95% CI: 61%-81%) (p=0.0004, Gray test). The same superiority of arm A was also observed when stratifying on the number of previous autologous transplantations. At 2 years, PFS was 39% (95% CI: 30%-51%) vs 23% (95% CI: 16%-34%) (A vs B, p=0.0006, log-rank test). OS in the first two years was comparable in the two groups. Conclusion: VTD resulted in significantly longer TTP and PFS in patients relapsing after ASCT. Analysis of response and safety data is ongoing and results will be presented at the meeting. Protocol EU-DRACT number: 2005-001628-35.
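A minimal sketch of the kind of survival comparison described above (Kaplan-Meier estimates plus a log-rank test), using the lifelines library on hypothetical data; this is not the trial's analysis code and all numbers are invented for illustration.

```python
# Kaplan-Meier fits and a log-rank comparison of two arms (hypothetical PFS data).
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2)
t_a = rng.exponential(scale=20, size=124)   # hypothetical PFS times, arm A (months)
t_b = rng.exponential(scale=14, size=122)   # hypothetical PFS times, arm B (months)
e_a = rng.uniform(size=124) < 0.8           # event indicators (True = progression/death)
e_b = rng.uniform(size=122) < 0.8

km_a = KaplanMeierFitter().fit(t_a, event_observed=e_a, label="VTD (arm A)")
km_b = KaplanMeierFitter().fit(t_b, event_observed=e_b, label="TD (arm B)")
print("median PFS:", km_a.median_survival_time_, "vs", km_b.median_survival_time_)

res = logrank_test(t_a, t_b, event_observed_A=e_a, event_observed_B=e_b)
print(f"log-rank p-value: {res.p_value:.4f}")
```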
Abstract:
The interhemispheric asymmetries that originate from connectivity-related structuring of the cortex are compromised in schizophrenia (SZ). Under the assumption that such abnormalities affect functional connectivity, we analyzed its correlate, EEG synchronization, in SZ patients and matched controls. We applied multivariate synchronization measures based on Laplacian EEG and tuned to various spatial scales. Compared to the controls, who had rightward asymmetry at a local level (EEG power), rightward anterior and leftward posterior asymmetries at an intraregional level (1st and 2nd order S-estimator), and rightward global asymmetry (hemispheric S-estimator), SZ patients showed generally attenuated asymmetry, the effect being strongest for intraregional synchronization in the alpha and beta bands. The abnormalities of asymmetry increased with the duration of the disease and correlated with the negative symptoms. We discuss the tentative links between these findings and gross anatomical asymmetries, including the cerebral torque and gyrification pattern, in normal subjects and SZ patients.
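A minimal sketch of an S-estimator computation. It assumes the commonly cited formulation (one minus the normalised entropy of the eigenvalue spectrum of the channel correlation matrix); the exact definition, preprocessing and spatial tuning used in the study are not reproduced.

```python
# S-estimator of multivariate synchronization from a block of multichannel EEG.
import numpy as np

def s_estimator(eeg):
    """eeg: array (n_channels, n_samples). Returns S in [0, 1]; 1 = full synchrony."""
    c = np.corrcoef(eeg)                          # channel-by-channel correlation matrix
    lam = np.linalg.eigvalsh(c)
    lam = lam / lam.sum()                         # normalised eigenvalue spectrum
    lam = lam[lam > 1e-12]                        # avoid log(0)
    entropy = -(lam * np.log(lam)).sum()
    return 1.0 - entropy / np.log(c.shape[0])     # 1 - normalised spectral entropy

rng = np.random.default_rng(3)
common = rng.normal(size=2000)
eeg = 0.7 * common + 0.3 * rng.normal(size=(8, 2000))   # 8 partly synchronised channels
print(f"S = {s_estimator(eeg):.2f}")
```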
Abstract:
The OLS estimator of the intergenerational earnings correlation is biased towards zero, while the instrumental variables estimator is biased upwards. The first of these results arises because of measurement error, while the latter rests on the presumption that the parents' education is an invalid instrument. We propose a panel data framework for quantifying the asymptotic biases of these estimators, as well as a mis-specification test for the IV estimator. [Author]
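A simulated illustration of the attenuation (measurement-error) bias mentioned above: regressing child earnings on a noisy proxy of parental permanent earnings biases OLS towards zero. All parameter values are assumptions for illustration, not from the paper.

```python
# OLS attenuation bias under classical measurement error in the regressor.
import numpy as np

rng = np.random.default_rng(4)
n, beta = 100_000, 0.4                               # true intergenerational elasticity
parent_perm = rng.normal(size=n)                     # permanent (lifetime) log earnings
child = beta * parent_perm + rng.normal(scale=0.8, size=n)
parent_obs = parent_perm + rng.normal(scale=0.7, size=n)   # noisy single-year measure

ols = np.cov(child, parent_obs)[0, 1] / np.var(parent_obs)
reliability = np.var(parent_perm) / np.var(parent_obs)     # signal share of the proxy
print(f"true beta = {beta}, OLS = {ols:.3f}, expected attenuated value = {beta * reliability:.3f}")
```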
Abstract:
A semisupervised support vector machine is presented for the classification of remote sensing images. The method exploits the wealth of unlabeled samples for regularizing the training kernel representation locally by means of cluster kernels. The method learns a suitable kernel directly from the image and thus avoids assuming a priori signal relations by using a predefined kernel structure. Good results are obtained in image classification examples when few labeled samples are available. The method scales almost linearly with the number of unlabeled samples and provides out-of-sample predictions.
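A simplified sketch of the cluster-kernel idea: an RBF kernel on labeled plus unlabeled pixels is blended with a cluster co-membership kernel obtained from k-means, and an SVM is trained on the labeled block of the precomputed kernel. This is an illustrative stand-in under assumed data and parameters, not the paper's exact formulation or learning scheme.

```python
# Semisupervised cluster kernel: blend an RBF kernel with a k-means co-membership kernel.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X_lab = rng.normal(size=(20, 6)); y_lab = (X_lab[:, 0] > 0).astype(int)   # few labeled pixels
X_unl = rng.normal(size=(500, 6))                                         # many unlabeled pixels
X_all = np.vstack([X_lab, X_unl])

labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X_all)
K_cluster = (labels[:, None] == labels[None, :]).astype(float)   # 1 if same cluster, else 0
K = 0.5 * rbf_kernel(X_all, X_all) + 0.5 * K_cluster             # blended kernel on all pixels

n = len(X_lab)
clf = SVC(kernel="precomputed").fit(K[:n, :n], y_lab)            # train on the labeled block
print("training accuracy:", clf.score(K[:n, :n], y_lab))
```

Out-of-sample predictions for the unlabeled pixels would use the rectangular block `K[n:, :n]`, i.e. the kernel between unlabeled and labeled samples.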
Abstract:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this makes it possible to distinguish objects at the surface of our planet automatically yet accurately. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive, but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, therefore hindering the transfer of knowledge from one image to another. The goal of this thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches drawn from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest so as to collect labels only at the most useful pixels. This iterative routine is based on a constant evaluation of the pertinence to the new image of the initial training data, which actually belong to a different image. Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels into a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are highly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. We also gauge the portability of classification models across the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics, ultimately facilitating the land-cover mapping across images.
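An illustrative sketch of the "relative normalization" idea in the second approach: fit a kernel feature extractor on pixels pooled from both images, project each image into the shared space, then train on the source image and predict on the target one. KernelPCA is used here as a simplified stand-in for the thesis' specific framework, and the simulated radiometric shift is an assumption.

```python
# Kernel feature extraction fitted jointly on two images to improve cross-image classification.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

rng = np.random.default_rng(6)
src = rng.normal(size=(400, 8)); y_src = (src[:, 0] > 0).astype(int)   # "source image" pixels
tgt = src * 1.2 + 0.3 + rng.normal(scale=0.1, size=(400, 8))           # shifted "target image"
y_tgt = y_src                                                          # same underlying classes

kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1).fit(np.vstack([src, tgt]))
z_src, z_tgt = kpca.transform(src), kpca.transform(tgt)

clf = SVC().fit(z_src, y_src)                      # train on source, evaluate on target
print("cross-image accuracy after projection:", clf.score(z_tgt, y_tgt))
```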
Abstract:
In groundwater applications, Monte Carlo methods are employed to model the uncertainty on geological parameters. However, their brute-force application becomes computationally prohibitive for highly detailed geological descriptions, complex physical processes, and a large number of realizations. The Distance Kernel Method (DKM) overcomes this issue by clustering the realizations in a multidimensional space based on the flow responses obtained by means of an approximate (computationally cheaper) model; then, the uncertainty is estimated from the exact responses that are computed only for one representative realization per cluster (the medoid). Usually, DKM is employed to decrease the size of the sample of realizations that are considered to estimate the uncertainty. We propose to use the information from the approximate responses for uncertainty quantification. The subset of exact solutions provided by DKM is then employed to construct an error model and correct the potential bias of the approximate model. Two error models are devised that both employ the difference between approximate and exact medoid solutions, but differ in the way medoid errors are interpolated to correct the whole set of realizations. The Local Error Model rests upon the clustering defined by DKM and can be seen as a natural way to account for intra-cluster variability; the Global Error Model employs a linear interpolation of all medoid errors regardless of the cluster to which the single realization belongs. These error models are evaluated for an idealized pollution problem in which the uncertainty of the breakthrough curve needs to be estimated. For this numerical test case, we demonstrate that the error models improve the uncertainty quantification provided by the DKM algorithm and are effective in correcting the bias of the estimate computed solely from the MsFV results. The framework presented here is not specific to the methods considered and can be applied to other combinations of approximate models and techniques to select a subset of realizations.
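A minimal sketch of the workflow described above: cluster realizations by their approximate responses, run the "exact" model only for one medoid per cluster, and apply a global (linear) error model to correct all approximate responses. The scalar responses, the toy stand-ins for the cheap and expensive models, and the clustering details are assumptions for illustration, not the paper's implementation.

```python
# DKM-style medoid selection plus a global error model correcting the approximate responses.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
n_real = 200
approx = rng.normal(size=n_real) + 5.0            # cheap-model response, one per realization
exact_fn = lambda a: 1.1 * a + 0.5                # stand-in for the expensive exact model

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(approx.reshape(-1, 1))
medoids = [np.argmin(np.abs(approx - c)) for c in km.cluster_centers_.ravel()]

exact_medoids = np.array([exact_fn(approx[i]) for i in medoids])   # only a few exact runs
errors = exact_medoids - approx[medoids]

# Global error model: linear fit of medoid errors vs. approximate response, applied to all.
coef = np.polyfit(approx[medoids], errors, deg=1)
corrected = approx + np.polyval(coef, approx)
print("bias before:", np.mean(approx - exact_fn(approx)).round(3),
      "after:", np.mean(corrected - exact_fn(approx)).round(3))
```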
Abstract:
Alzheimer's disease (AD) disrupts functional connectivity in distributed cortical networks. We analyzed changes in the S-estimator, a measure of multivariate intraregional synchronization, in electroencephalogram (EEG) source space in 15 mild AD patients versus 15 age-matched controls to evaluate its potential as a marker of AD progression. All participants underwent 2 clinical evaluations and 2 EEG recording sessions, at diagnosis and after a year. The main effect of AD was hyposynchronization in the medial temporal and frontal regions and relative hypersynchronization in posterior cingulate, precuneus, cuneus, and parietotemporal cortices. However, the S-estimator did not change over time in either group. This result motivated an analysis of rapidly versus slowly progressing AD patients. Rapidly progressing AD patients showed a significant reduction in synchronization with time, manifest in the left frontotemporal cortex. Thus, the evolution of source EEG synchronization over time is correlated with the rate of disease progression and should be considered as a cost-effective AD biomarker.
Abstract:
Preface: The starting point for this work, and eventually the subject of the whole thesis, was the question of how to estimate the parameters of affine stochastic volatility jump-diffusion models. These models are very important for contingent claim pricing. Their major advantage, the availability of analytical solutions for characteristic functions, made them the models of choice for many theoretical constructions and practical applications. At the same time, estimation of the parameters of stochastic volatility jump-diffusion models is not a straightforward task. The problem comes from the variance process, which is non-observable. There are several estimation methodologies that deal with estimation problems of latent variables. One appeared to be particularly interesting. It proposes an estimator that, in contrast to the other methods, requires neither discretization nor simulation of the process: the Continuous Empirical Characteristic Function (ECF) estimator based on the unconditional characteristic function. However, the procedure was derived only for stochastic volatility models without jumps. Thus, it became the subject of my research. This thesis consists of three parts. Each one is written as an independent and self-contained article. At the same time, the questions that are answered by the second and third parts of this work arise naturally from the issues investigated and results obtained in the first one. The first chapter is the theoretical foundation of the thesis. It proposes an estimation procedure for stochastic volatility models with jumps in both the asset price and variance processes. The estimation procedure is based on the joint unconditional characteristic function for the stochastic process. The major analytical result of this part, as well as of the whole thesis, is the closed-form expression for the joint unconditional characteristic function for the stochastic volatility jump-diffusion models. The empirical part of the chapter suggests that, besides stochastic volatility, jumps in both the mean and the volatility equation are relevant for modelling returns of the S&P500 index, which has been chosen as a general representative of the stock asset class. Hence, the next question is: what jump process should be used to model returns of the S&P500? The decision about the jump process in the framework of affine jump-diffusion models boils down to defining the intensity of the compound Poisson process, a constant or some function of state variables, and to choosing the distribution of the jump size. While the jump in the variance process is usually assumed to be exponential, there are at least three distributions of the jump size which are currently used for the asset log-prices: normal, exponential and double exponential. The second part of this thesis shows that normal jumps in the asset log-returns should be used if we are to model the S&P500 index by a stochastic volatility jump-diffusion model. This is a surprising result. The exponential distribution has fatter tails, and for this reason either the exponential or the double exponential jump size was expected to provide the best fit of the stochastic volatility jump-diffusion models to the data. The idea of testing the efficiency of the Continuous ECF estimator on simulated data had already appeared when the first estimation results of the first chapter were obtained. In the absence of a benchmark or any ground for comparison, it is unreasonable to be sure that our parameter estimates and the true parameters of the models coincide.
The conclusion of the second chapter provides one more reason to do that kind of test. Thus, the third part of this thesis concentrates on the estimation of parameters of stochastic volatility jump-diffusion models on the basis of asset price time series simulated from various "true" parameter sets. The goal is to show that the Continuous ECF estimator based on the joint unconditional characteristic function is capable of finding the true parameters. The third chapter proves that our estimator indeed has the ability to do so. Once it is clear that the Continuous ECF estimator based on the unconditional characteristic function is working, the next question does not wait to appear. The question is whether the computational effort can be reduced without affecting the efficiency of the estimator, or whether the efficiency of the estimator can be improved without dramatically increasing the computational burden. The efficiency of the Continuous ECF estimator depends on the number of dimensions of the joint unconditional characteristic function which is used for its construction. Theoretically, the more dimensions there are, the more efficient is the estimation procedure. In practice, however, this relationship is not so straightforward due to the increasing computational difficulties. The second chapter, for example, in addition to the choice of the jump process, discusses the possibility of using the marginal, i.e. one-dimensional, unconditional characteristic function in the estimation instead of the joint, bi-dimensional, unconditional characteristic function. As a result, the preference for one or the other depends on the model to be estimated. Thus, the computational effort can be reduced in some cases without affecting the efficiency of the estimator. The improvement of the estimator's efficiency by increasing its dimensionality faces more difficulties. The third chapter of this thesis, in addition to what was discussed above, compares the performance of the estimators with bi- and three-dimensional unconditional characteristic functions on simulated data. It shows that the theoretical efficiency of the Continuous ECF estimator based on the three-dimensional unconditional characteristic function is not attainable in practice, at least for the moment, due to the limitations of the computer power and optimization toolboxes available to the general public. Thus, the Continuous ECF estimator based on the joint, bi-dimensional, unconditional characteristic function has every reason to exist and to be used for the estimation of parameters of the stochastic volatility jump-diffusion models.
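A minimal sketch of the characteristic-function estimation principle: match the empirical characteristic function of the data to the model characteristic function by minimising a weighted integrated squared distance. It is shown for an i.i.d. Gaussian toy model, not for the joint unconditional characteristic function of the stochastic volatility jump-diffusion models derived in the thesis; the weight function and grid are assumptions.

```python
# Empirical characteristic function (ECF) estimation on a toy i.i.d. Gaussian sample.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
returns = rng.normal(loc=0.05, scale=0.2, size=5000)     # simulated "returns"
u = np.linspace(-20, 20, 201)                            # CF evaluation points
w = np.exp(-0.05 * u**2)                                 # weight damping high frequencies

ecf = np.mean(np.exp(1j * u[:, None] * returns[None, :]), axis=1)   # empirical CF

def objective(theta):
    mu, sigma = theta
    model_cf = np.exp(1j * u * mu - 0.5 * (sigma * u) ** 2)          # Gaussian CF
    return np.sum(w * np.abs(ecf - model_cf) ** 2)                   # weighted squared distance

fit = minimize(objective, x0=[0.0, 0.1], method="Nelder-Mead")
print("estimated (mu, sigma):", np.round(fit.x, 3))
```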
Abstract:
In this paper we propose an innovative methodology for automated profiling of illicit tablets by their surface granularity, a feature previously unexamined for this purpose. We make use of the tiny inconsistencies at the tablet surface, referred to as speckles, to generate a quantitative granularity profile of tablets. Euclidean distance is used as a measurement of (dis)similarity between granularity profiles. The frequency of observed distances is then modelled by kernel density estimation in order to generalize the observations and to calculate likelihood ratios (LRs). The resulting LRs are used to evaluate the potential of granularity profiles to differentiate between same-batch and different-batch tablets. Furthermore, we use the LRs as a similarity metric to refine database queries. We are able to derive reliable LRs within a scope that represents the true evidential value of the granularity feature. These metrics are used to refine candidate hit-lists from a database containing physical features of illicit tablets. We observe improved or identical ranking of candidate tablets in 87.5% of cases when granularity is considered.
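A minimal sketch of the likelihood-ratio computation described above: kernel density estimates of the same-batch and different-batch distance distributions, with the LR taken as their ratio at an observed distance. The simulated distance distributions are assumptions for illustration, not the paper's data.

```python
# KDE-based likelihood ratios for same-batch vs. different-batch granularity distances.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
d_same = rng.gamma(shape=2.0, scale=0.5, size=500)    # distances between same-batch tablets
d_diff = rng.gamma(shape=6.0, scale=0.7, size=500)    # distances between different-batch tablets

kde_same = gaussian_kde(d_same)
kde_diff = gaussian_kde(d_diff)

def likelihood_ratio(distance):
    """LR > 1 supports the same-batch proposition, LR < 1 the different-batch one."""
    return kde_same(distance)[0] / kde_diff(distance)[0]

for d in (0.8, 2.5, 5.0):
    print(f"distance {d}: LR = {likelihood_ratio(d):.2f}")
```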
Abstract:
Questionnaire studies indicate that high-anxious musicians may suffer from hyperventilation symptoms before and/or during performance. Reported symptoms include, amongst others, shortness of breath, fast or deep breathing, dizziness and a thumping heart. A self-report study by Widmer, Conway, Cohen and Davies (1997) shows that up to seventy percent of the tested highly anxious musicians are hyperventilators during performance. However, no study has yet tested whether these self-reported symptoms reflect actual cardiorespiratory changes just before and during performance. Disturbances in breathing patterns and hyperventilation may negatively affect performance quality in stressful performance situations. The main goal of this study is to determine whether music performance anxiety is manifest physiologically in specific correlates of cardiorespiratory activity. We studied 74 professional music students of Swiss Music Universities divided into two groups (high- and low-anxious) based on their self-reported performance anxiety (State-Trait Anxiety Inventory by Spielberger). The students were tested in three distinct situations: baseline, performance without audience, and performance with audience. We measured a) breathing patterns, end-tidal carbon dioxide, which is a good non-invasive estimator of hyperventilation, and cardiac activation, and b) self-perceived emotions and self-perceived physiological activation. Analyses of heart rate, respiratory rate, self-perceived palpitations, self-perceived shortness of breath and self-perceived anxiety for the 15 most and the 15 least anxious musicians show that high-anxious and low-anxious music students have comparable physiological activation during the different measurement periods. However, high-anxious music students feel significantly more anxious and perceive significantly stronger palpitations and significantly stronger shortness of breath just before and during a public performance. The results indicate that low- and high-anxious music students a) do not differ in the considered physiological responses and b) differ in the considered self-perceived physiological symptoms and the self-reported anxiety before and/or during a public performance.
Abstract:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
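A simplified illustration of the multiple-kernel idea: one RBF kernel per feature group (e.g. terrain indices computed at different spatial scales), combined as a convex combination whose weights are chosen by held-out performance rather than by the full MKL optimisation used in the paper. The feature groups, data and weight grid are assumptions for illustration.

```python
# Convex combination of group-wise kernels for support vector regression (simplified MKL).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 6))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] + 0.1 * rng.normal(size=300)   # synthetic "wind speed"

train, test = np.arange(200), np.arange(200, 300)
groups = [slice(0, 3), slice(3, 6)]                  # two hypothetical feature groups
kernels = [rbf_kernel(X[:, g], X[:, g]) for g in groups]

best = None
for w1 in np.linspace(0, 1, 11):
    w = (w1, 1.0 - w1)                               # weights on the simplex
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    model = SVR(kernel="precomputed").fit(K[np.ix_(train, train)], y[train])
    score = model.score(K[np.ix_(test, train)], y[test])
    if best is None or score > best[0]:
        best = (score, w)
print("best kernel weights:", np.round(best[1], 2), "held-out R^2:", round(best[0], 3))
```

The learned weights play the interpretability role described above: a large weight on one kernel indicates which group of terrain features the regression actually relies on.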
Abstract:
Robust estimators for accelerated failure time models with asymmetric (or symmetric) error distribution and censored observations are proposed. It is assumed that the error model belongs to a log-location-scale family of distributions and that the mean response is the parameter of interest. Since the scale is a main component of the mean, it is not treated as a nuisance parameter. A three-step procedure is proposed. In the first step, an initial high-breakdown-point S-estimate is computed. In the second step, observations that are unlikely under the estimated model are rejected or down-weighted. Finally, a weighted maximum likelihood estimate is computed. To define the estimates, functions of censored residuals are replaced by their estimated conditional expectation given that the response is larger than the observed censored value. The rejection rule in the second step is based on an adaptive cut-off that, asymptotically, does not reject any observation when the data are generated according to the model. Therefore, the final estimate attains full efficiency at the model, with respect to the maximum likelihood estimate, while maintaining the breakdown point of the initial estimator. Asymptotic results are provided. The new procedure is evaluated with the help of Monte Carlo simulations. Two examples with real data are discussed.
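A heavily simplified illustration of the three-step scheme, for an uncensored log-normal sample: (1) a robust initial location/scale estimate, (2) rejection of observations that are unlikely under the initial fit, (3) maximum likelihood on the retained observations. The fixed cut-off, the median/MAD initial estimate and the absence of censoring are simplifications; the paper's S-estimate, adaptive cut-off and censored-residual handling are not reproduced here.

```python
# Three-step robust estimation sketch for an uncensored log-location-scale sample.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
log_t = rng.normal(loc=2.0, scale=0.5, size=300)            # clean log failure times
log_t[:30] = rng.normal(loc=6.0, scale=0.2, size=30)        # 10% gross outliers

# Step 1: high-breakdown initial estimates (median and MAD).
mu0 = np.median(log_t)
s0 = 1.4826 * np.median(np.abs(log_t - mu0))

# Step 2: reject observations that are very unlikely under the initial fit.
z = (log_t - mu0) / s0
keep = np.abs(z) < norm.ppf(0.9975)                         # fixed cut-off (not adaptive)

# Step 3: maximum likelihood on the retained observations (Gaussian ML = mean/SD).
mu_hat, s_hat = log_t[keep].mean(), log_t[keep].std(ddof=1)
print(f"initial ({mu0:.2f}, {s0:.2f}) -> final ({mu_hat:.2f}, {s_hat:.2f})")
```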
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the regional scale represents a major, and as-of-yet largely unresolved, challenge. To address this problem, we have developed a downscaling procedure based on a non-linear Bayesian sequential simulation approach. The basic objective of this algorithm is to estimate the value of the sparsely sampled hydraulic conductivity at non-sampled locations based on its relation to the electrical conductivity, which is available throughout the model space. The in situ relationship between the hydraulic and electrical conductivities is described through a non-parametric multivariate kernel density function. This method is then applied to the stochastic integration of low-resolution, regional-scale electrical resistivity tomography (ERT) data in combination with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities. Finally, the overall viability of this downscaling approach is tested and verified by performing and comparing flow and transport simulation through the original and the downscaled hydraulic conductivity fields. Our results indicate that the proposed procedure does indeed allow for obtaining remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.
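A minimal sketch of the core step described above: a non-parametric (kernel) joint density of co-located log hydraulic conductivity (log K) and log electrical conductivity (log EC) is built from borehole data, and K is then sampled at an unsampled cell conditional on the ERT-derived EC value. The sequential-simulation and downscaling machinery of the paper is not reproduced; the data, grids and parameter values are assumptions for illustration.

```python
# Conditional sampling of hydraulic conductivity from a kernel-estimated joint density.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(12)
log_ec = rng.normal(loc=-2.0, scale=0.4, size=150)                    # borehole log EC
log_k = 1.5 * log_ec + rng.normal(scale=0.3, size=150)                # co-located log K
joint_kde = gaussian_kde(np.vstack([log_ec, log_k]))                  # joint kernel density

def sample_k_given_ec(ec_value, k_grid, rng):
    """Draw one log K value from the kernel-estimated conditional density p(K | EC)."""
    cond = joint_kde(np.vstack([np.full_like(k_grid, ec_value), k_grid]))
    cond = cond / cond.sum()
    return rng.choice(k_grid, p=cond)

k_grid = np.linspace(-6.0, 0.0, 400)
ec_at_cell = -2.3                                                     # regional-scale ERT value
draws = [sample_k_given_ec(ec_at_cell, k_grid, rng) for _ in range(5)]
print("conditional log K draws:", np.round(draws, 2))
```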