75 resultados para kernel estimator
Resumo:
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.
Resumo:
In this study we propose an evaluation of the angular effects altering the spectral response of the land-cover over multi-angle remote sensing image acquisitions. The shift in the statistical distribution of the pixels observed in an in-track sequence of WorldView-2 images is analyzed by means of a kernel-based measure of distance between probability distributions. Afterwards, the portability of supervised classifiers across the sequence is investigated by looking at the evolution of the classification accuracy with respect to the changing observation angle. In this context, the efficiency of various physically and statistically based preprocessing methods in obtaining angle-invariant data spaces is compared and possible synergies are discussed.
Resumo:
Background: Glutathione (GSH) dysregulation at the gene, protein and functional levels observed in schizophrenia patients, and schizophrenia-like anomalies in GSH deficit experimental models, suggest that genetic glutathione synthesis impairments represent one major risk factor for the disease (Do et al., 2009). In a randomized, double blind, placebo controlled, add-on clinical trial of 140 patients, the GSH precursor N-Acetyl-Cysteine (NAC, 2 g/day, 6 months) significantly improved the negative symptoms and reduced side-effects due to antipsychotics (Berk et al., 2008). In a subset of patients (n=7), NAC (2 g/day, 2 months, cross-over design) also improved auditory evoked potentials, the NMDAdependent mismatch negativity (Lavoie et al, 2008). Methods: To determine whether increased GSH levels would modulate the topography of functional brain connectivity, we applied a multivariate phase synchronization (MPS) estimator (Knyazeva et al, 2008) to dense-array EEGs recorded during rest with eyes closed at the protocol onset, the point of crossover, and at its end. Phase synchronization phenomena are appealing because they can be associated to synchronized phases while the amplitudes stay uncorrelated. MPS measures the degree of interactions among the recorded neuronal oscillators by quantifiying to what extent they behave like a macro-oscillator (i.e. the oscillators are phase synchronous). To assess the whole-head synchronization topography, we computed the MPS sensor-wise over the cluster of locations defined by the sensor itself and he surrounding ones belonging to its second-order neighborhood (Carmeli et al, 2005). Such a cluster spans about 12 cm on average. Abstracts 245 Results: The whole-head imaging revealed a specific synchronization landscape in NAC compared to placebo condition. In particular, NAC increased MPS over frontal and left temporal regions in a frequency-specific manner. Importantly, the topography and direction of MPS changes were similar and robust in all 7 patients. Moreover, these changes correlated with the changes in the Liddle's score of disorganization (Liddle, 1987) thus linking EEG synchronization to the improvement of clinical picture. Discussion: The data suggest an important pathway towards new therapeutic strategies that target GSH dysregulation in schizophrenia. They also show the utility of MPS mapping as a marker of treatment efficacy.
Resumo:
It is estimated that around 230 people die each year due to radon (222Rn) exposure in Switzerland. 222Rn occurs mainly in closed environments like buildings and originates primarily from the subjacent ground. Therefore it depends strongly on geology and shows substantial regional variations. Correct identification of these regional variations would lead to substantial reduction of 222Rn exposure of the population based on appropriate construction of new and mitigation of already existing buildings. Prediction of indoor 222Rn concentrations (IRC) and identification of 222Rn prone areas is however difficult since IRC depend on a variety of different variables like building characteristics, meteorology, geology and anthropogenic factors. The present work aims at the development of predictive models and the understanding of IRC in Switzerland, taking into account a maximum of information in order to minimize the prediction uncertainty. The predictive maps will be used as a decision-support tool for 222Rn risk management. The construction of these models is based on different data-driven statistical methods, in combination with geographical information systems (GIS). In a first phase we performed univariate analysis of IRC for different variables, namely the detector type, building category, foundation, year of construction, the average outdoor temperature during measurement, altitude and lithology. All variables showed significant associations to IRC. Buildings constructed after 1900 showed significantly lower IRC compared to earlier constructions. We observed a further drop of IRC after 1970. In addition to that, we found an association of IRC with altitude. With regard to lithology, we observed the lowest IRC in sedimentary rocks (excluding carbonates) and sediments and the highest IRC in the Jura carbonates and igneous rock. The IRC data was systematically analyzed for potential bias due to spatially unbalanced sampling of measurements. In order to facilitate the modeling and the interpretation of the influence of geology on IRC, we developed an algorithm based on k-medoids clustering which permits to define coherent geological classes in terms of IRC. We performed a soil gas 222Rn concentration (SRC) measurement campaign in order to determine the predictive power of SRC with respect to IRC. We found that the use of SRC is limited for IRC prediction. The second part of the project was dedicated to predictive mapping of IRC using models which take into account the multidimensionality of the process of 222Rn entry into buildings. We used kernel regression and ensemble regression tree for this purpose. We could explain up to 33% of the variance of the log transformed IRC all over Switzerland. This is a good performance compared to former attempts of IRC modeling in Switzerland. As predictor variables we considered geographical coordinates, altitude, outdoor temperature, building type, foundation, year of construction and detector type. Ensemble regression trees like random forests allow to determine the role of each IRC predictor in a multidimensional setting. We found spatial information like geology, altitude and coordinates to have stronger influences on IRC than building related variables like foundation type, building type and year of construction. Based on kernel estimation we developed an approach to determine the local probability of IRC to exceed 300 Bq/m3. In addition to that we developed a confidence index in order to provide an estimate of uncertainty of the map. All methods allow an easy creation of tailor-made maps for different building characteristics. Our work is an essential step towards a 222Rn risk assessment which accounts at the same time for different architectural situations as well as geological and geographical conditions. For the communication of 222Rn hazard to the population we recommend to make use of the probability map based on kernel estimation. The communication of 222Rn hazard could for example be implemented via a web interface where the users specify the characteristics and coordinates of their home in order to obtain the probability to be above a given IRC with a corresponding index of confidence. Taking into account the health effects of 222Rn, our results have the potential to substantially improve the estimation of the effective dose from 222Rn delivered to the Swiss population.
Resumo:
Descriptors: music performance anxiety, respiration, hyperventilation Surveys indicate that high-anxious musicians may suffer from hyperventilation (HV) before or during performance. Reported symptoms include shortness of breath, fast/ deep breathing and thumping heart. However, no study has yet tested if these selfreported symptoms reflect actual cardiorespiratory activity. Themain goal of this study was to determine if MPA is manifested physiologically in specific correlates of cardiorespiratory activity associated with HV.We studied 74 professional music students from Swiss Music Academies. In this study, we compared the most anxious students (highanxious; n 5 20) with the least anxious students (low-anxious; n 5 23) based on their self-reported performance anxiety. We measured cardiorespiratory patterns with the Lifeshirt system, end-tidal CO2 with a capnograph (EtCO2, a good non-invasive estimator of HV), self-perceived physiological activation and affective experience in three situations on different days: baseline, performance without audience, and performance with audience. Comparing measures for the private vs. the public concert, high- compared to low-anxious students showed a significant drop in EtCO2 before the public concert and reported larger increases in anxiety, tension, palpitations and breathing difficulties. In contrast, heart rate, respiratory rate and volume did not differ significantly between groups. The results of this study support the hypothesis thatMPA may be associated with a tendency to hyperventilate and, thus, point to a potential hyperventilation problem in high-anxious music students.
Resumo:
Introduction: The Thalidomide-Dexamethasone (TD) regimen has provided encouraging results in relapsed MM. To improve results, bortezomib (Velcade) has been added to the combination in previous phase II studies, the so called VTD regimen. In January 2006, the European Group for Blood and Marrow Transplantation (EBMT) and the Intergroupe Francophone du Myélome (IFM) initiated a prospective, randomized, parallel-group, open-label phase III, multicenter study, comparing VTD (arm A) with TD (arm B) for MM patients progressing or relapsing after autologous transplantation. Patients and Methods: Inclusion criteria: patients in first progression or relapse after at least one autologous transplantation, including those who had received bortezomib or thalidomide before transplant. Exclusion criteria: subjects with neuropathy above grade 1 or non secretory MM. Primary study end point was time to progression (TTP). Secondary end points included safety, response rate, progression-free survival (PFS) and overall survival (OS). Treatment was scheduled as follows: bortezomib 1.3 mg/m2 was given as an i.v bolus on Days 1, 4, 8 and 11 followed by a 10-Day rest period (days 12 to 21) for 8 cycles (6 months) and then on Days 1, 8, 15, 22 followed by a 20-Day rest period (days 23 to 42) for 4 cycles (6 months). In both arms, thalidomide was scheduled at 200 mg/Day orally for one year and dexamethasone 40 mg/Day orally four days every three weeks for one year. Patients reaching remission could proceed to a new stem cell harvest. However, transplantation, either autologous or allogeneic, could only be performed in patients who completed the planned one year treatment period. Response was assessed by EBMT criteria, with additional category of near complete remission (nCR). Adverse events were graded by the NCI-CTCAE, Version 3.0.The trial was based on a group sequential design, with 4 planned interim analyses and one final analysis that allowed stopping for efficacy as well as futility. The overall alpha and power were set equal to 0.025 and 0.90 respectively. The test for decision making was based on the comparison in terms of the ratio of the cause-specific hazards of relapse/progression, estimated in a Cox model stratified on the number of previous autologous transplantations. Relapse/progression cumulative incidence was estimated using the proper nonparametric estimator, the comparison was done by the Gray test. PFS and OS probabilities were estimated by the Kaplan-Meier curves, the comparison was performed by the Log-Rank test. An interim safety analysis was performed when the first hundred patients had been included. The safety committee recommended to continue the trial. Results: As of 1st July 2010, 269 patients had been enrolled in the study, 139 in France (IFM 2005-04 study), 21 in Italy, 38 in Germany, 19 in Switzerland (a SAKK study), 23 in Belgium, 8 in Austria, 8 in the Czech republic, 11 in Hungary, 1 in the UK and 1 in Israel. One hundred and sixty nine patients were males and 100 females; the median age was 61 yrs (range 29-76). One hundred and thirty six patients were randomized to receive VTD and 133 to receive TD. The current analysis is based on 246 patients (124 in arm A, 122 in arm B) included in the second interim analysis, carried out when 134 events were observed. Following this analysis, the trial was stopped because of significant superiority of VTD over TD. The remaining patients were too premature to contribute to the analysis. The number of previous autologous transplants was one in 63 vs 60 and two or more in 61 vs 62 patients in arm A vs B respectively. The median follow-up was 25 months. The median TTP was 20 months vs 15 months respectively in arm A and B, with cumulative incidence of relapse/progression at 2 years equal to 52% (95% CI: 42%-64%) vs 70% (95% CI: 61%-81%) (p=0.0004, Gray test). The same superiority of arm A was also observed when stratifying on the number of previous autologous transplantations. At 2 years, PFS was 39% (95% CI: 30%-51%) vs 23% (95% CI: 16%-34%) (A vs B, p=0.0006, Log-Rank test). OS in the first two years was comparable in the two groups. Conclusion: VTD resulted in significantly longer TTP and PFS in patients relapsing after ASCT. Analysis of response and safety data are on going and results will be presented at the meeting.
Resumo:
The state of the art to describe image quality in medical imaging is to assess the performance of an observer conducting a task of clinical interest. This can be done by using a model observer leading to a figure of merit such as the signal-to-noise ratio (SNR). Using the non-prewhitening (NPW) model observer, we objectively characterised the evolution of its figure of merit in various acquisition conditions. The NPW model observer usually requires the use of the modulation transfer function (MTF) as well as noise power spectra. However, although the computation of the MTF poses no problem when dealing with the traditional filtered back-projection (FBP) algorithm, this is not the case when using iterative reconstruction (IR) algorithms, such as adaptive statistical iterative reconstruction (ASIR) or model-based iterative reconstruction (MBIR). Given that the target transfer function (TTF) had already shown it could accurately express the system resolution even with non-linear algorithms, we decided to tune the NPW model observer, replacing the standard MTF by the TTF. It was estimated using a custom-made phantom containing cylindrical inserts surrounded by water. The contrast differences between the inserts and water were plotted for each acquisition condition. Then, mathematical transformations were performed leading to the TTF. As expected, the first results showed a dependency of the image contrast and noise levels on the TTF for both ASIR and MBIR. Moreover, FBP also proved to be dependent of the contrast and noise when using the lung kernel. Those results were then introduced in the NPW model observer. We observed an enhancement of SNR every time we switched from FBP to ASIR to MBIR. IR algorithms greatly improve image quality, especially in low-dose conditions. Based on our results, the use of MBIR could lead to further dose reduction in several clinical applications.
Resumo:
We consider the problem of estimating the mean hospital cost of stays of a class of patients (e.g., a diagnosis-related group) as a function of patient characteristics. The statistical analysis is complicated by the asymmetry of the cost distribution, the possibility of censoring on the cost variable, and the occurrence of outliers. These problems have often been treated separately in the literature, and a method offering a joint solution to all of them is still missing. Indirect procedures have been proposed, combining an estimate of the duration distribution with an estimate of the conditional cost for a given duration. We propose a parametric version of this approach, allowing for asymmetry and censoring in the cost distribution and providing a mean cost estimator that is robust in the presence of extreme values. In addition, the new method takes covariate information into account.
Resumo:
This letter presents advanced classification methods for very high resolution images. Efficient multisource information, both spectral and spatial, is exploited through the use of composite kernels in support vector machines. Weighted summations of kernels accounting for separate sources of spectral and spatial information are analyzed and compared to classical approaches such as pure spectral classification or stacked approaches using all the features in a single vector. Model selection problems are addressed, as well as the importance of the different kernels in the weighted summation.
Resumo:
We examine the power of different exact tests of differentiation for diploid populations. Since there is not necessarily random mating within populations, the appropriate hypothesis to construct exact tests is that of independent sampling of genotypes. There are two categories of tests, FST-estimator tests and goodness of fit tests. In this latter category, we distinguish "allelic statistics", which account for the nature of alleles within genotypes, from "genotypic statistics" that do not. We show that the power of FST-estimator tests and of allelic goodness of fit tests are similar when sampling is balanced, and higher than the power of genotypic goodness of fit tests. When sampling is unbalanced, the most powerful tests are shown to belong to the allelic goodness of fit group.
Resumo:
In this paper, mixed spectral-structural kernel machines are proposed for the classification of very-high resolution images. The simultaneous use of multispectral and structural features (computed using morphological filters) allows a significant increase in classification accuracy of remote sensing images. Subsequently, weighted summation kernel support vector machines are proposed and applied in order to take into account the multiscale nature of the scene considered. Such classifiers use the Mercer property of kernel matrices to compute a new kernel matrix accounting simultaneously for two scale parameters. Tests on a Zurich QuickBird image show the relevance of the proposed method : using the mixed spectral-structural features, the classification accuracy increases of about 5%, achieving a Kappa index of 0.97. The multikernel approach proposed provide an overall accuracy of 98.90% with related Kappa index of 0.985.
Resumo:
We consider robust parametric procedures for univariate discrete distributions, focusing on the negative binomial model. The procedures are based on three steps: ?First, a very robust, but possibly inefficient, estimate of the model parameters is computed. ?Second, this initial model is used to identify outliers, which are then removed from the sample. ?Third, a corrected maximum likelihood estimator is computed with the remaining observations. The final estimate inherits the breakdown point (bdp) of the initial one and its efficiency can be significantly higher. Analogous procedures were proposed in [1], [2], [5] for the continuous case. A comparison of the asymptotic bias of various estimates under point contamination points out the minimum Neyman's chi-squared disparity estimate as a good choice for the initial step. Various minimum disparity estimators were explored by Lindsay [4], who showed that the minimum Neyman's chi-squared estimate has a 50% bdp under point contamination; in addition, it is asymptotically fully efficient at the model. However, the finite sample efficiency of this estimate under the uncontaminated negative binomial model is usually much lower than 100% and the bias can be strong. We show that its performance can then be greatly improved using the three step procedure outlined above. In addition, we compare the final estimate with the procedure described in
Resumo:
Notre consommation en eau souterraine, en particulier comme eau potable ou pour l'irrigation, a considérablement augmenté au cours des années. De nombreux problèmes font alors leur apparition, allant de la prospection de nouvelles ressources à la remédiation des aquifères pollués. Indépendamment du problème hydrogéologique considéré, le principal défi reste la caractérisation des propriétés du sous-sol. Une approche stochastique est alors nécessaire afin de représenter cette incertitude en considérant de multiples scénarios géologiques et en générant un grand nombre de réalisations géostatistiques. Nous rencontrons alors la principale limitation de ces approches qui est le coût de calcul dû à la simulation des processus d'écoulements complexes pour chacune de ces réalisations. Dans la première partie de la thèse, ce problème est investigué dans le contexte de propagation de l'incertitude, oú un ensemble de réalisations est identifié comme représentant les propriétés du sous-sol. Afin de propager cette incertitude à la quantité d'intérêt tout en limitant le coût de calcul, les méthodes actuelles font appel à des modèles d'écoulement approximés. Cela permet l'identification d'un sous-ensemble de réalisations représentant la variabilité de l'ensemble initial. Le modèle complexe d'écoulement est alors évalué uniquement pour ce sousensemble, et, sur la base de ces réponses complexes, l'inférence est faite. Notre objectif est d'améliorer la performance de cette approche en utilisant toute l'information à disposition. Pour cela, le sous-ensemble de réponses approximées et exactes est utilisé afin de construire un modèle d'erreur, qui sert ensuite à corriger le reste des réponses approximées et prédire la réponse du modèle complexe. Cette méthode permet de maximiser l'utilisation de l'information à disposition sans augmentation perceptible du temps de calcul. La propagation de l'incertitude est alors plus précise et plus robuste. La stratégie explorée dans le premier chapitre consiste à apprendre d'un sous-ensemble de réalisations la relation entre les modèles d'écoulement approximé et complexe. Dans la seconde partie de la thèse, cette méthodologie est formalisée mathématiquement en introduisant un modèle de régression entre les réponses fonctionnelles. Comme ce problème est mal posé, il est nécessaire d'en réduire la dimensionnalité. Dans cette optique, l'innovation du travail présenté provient de l'utilisation de l'analyse en composantes principales fonctionnelles (ACPF), qui non seulement effectue la réduction de dimensionnalités tout en maximisant l'information retenue, mais permet aussi de diagnostiquer la qualité du modèle d'erreur dans cet espace fonctionnel. La méthodologie proposée est appliquée à un problème de pollution par une phase liquide nonaqueuse et les résultats obtenus montrent que le modèle d'erreur permet une forte réduction du temps de calcul tout en estimant correctement l'incertitude. De plus, pour chaque réponse approximée, une prédiction de la réponse complexe est fournie par le modèle d'erreur. Le concept de modèle d'erreur fonctionnel est donc pertinent pour la propagation de l'incertitude, mais aussi pour les problèmes d'inférence bayésienne. Les méthodes de Monte Carlo par chaîne de Markov (MCMC) sont les algorithmes les plus communément utilisés afin de générer des réalisations géostatistiques en accord avec les observations. Cependant, ces méthodes souffrent d'un taux d'acceptation très bas pour les problèmes de grande dimensionnalité, résultant en un grand nombre de simulations d'écoulement gaspillées. Une approche en deux temps, le "MCMC en deux étapes", a été introduite afin d'éviter les simulations du modèle complexe inutiles par une évaluation préliminaire de la réalisation. Dans la troisième partie de la thèse, le modèle d'écoulement approximé couplé à un modèle d'erreur sert d'évaluation préliminaire pour le "MCMC en deux étapes". Nous démontrons une augmentation du taux d'acceptation par un facteur de 1.5 à 3 en comparaison avec une implémentation classique de MCMC. Une question reste sans réponse : comment choisir la taille de l'ensemble d'entrainement et comment identifier les réalisations permettant d'optimiser la construction du modèle d'erreur. Cela requiert une stratégie itérative afin que, à chaque nouvelle simulation d'écoulement, le modèle d'erreur soit amélioré en incorporant les nouvelles informations. Ceci est développé dans la quatrième partie de la thèse, oú cette méthodologie est appliquée à un problème d'intrusion saline dans un aquifère côtier. -- Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years and groundwater is becoming an increasingly scarce and endangered resource. Nofadays, we are facing many problems ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knofledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of realizations. The main limitation of this approach is the computational cost associated with performing complex of simulations in each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knofledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the of response of each realization. Due to computational constraints, state-of-the-art methods make use of approximate of simulation, to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy of model is then run for this subset based on which inference is made. Our objective is to increase the performance of this approach by using all of the available information and not solely the subset of exact responses. Two error models are proposed to correct the approximate responses follofing a machine learning approach. For the subset identified by a classical approach (here the distance kernel method) both the approximate and the exact responses are knofn. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational costs and leads to an increase in accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists in learning from a subset of realizations the relationship between proxy and exact curves. In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allofs a diagnostic of the quality of the error model in the functional space. The proposed methodology is applied to a pollution problem by a non-aqueous phase-liquid. The error model allofs a strong reduction of the computational cost while providing a good estimate of the uncertainty. The individual correction of the proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of functional error model is useful not only in the context of uncertainty propagation, but also, and maybe even more so, to perform Bayesian inference. Monte Carlo Markov Chain (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. Hofever, this approach suffers from lof acceptance rate in high dimensional problems, resulting in a large number of wasted of simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulation of the exact of thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor three with respect to one-stage MCMC results. An open question remains: hof do we choose the size of the learning set and identify the realizations to optimize the construction of the error model. This requires devising an iterative strategy to construct the error model, such that, as new of simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
Resumo:
Abstract This work studies the multi-label classification of turns in simple English Wikipedia talk pages into dialog acts. The treated dataset was created and multi-labeled by (Ferschke et al., 2012). The first part analyses dependences between labels, in order to examine the annotation coherence and to determine a classification method. Then, a multi-label classification is computed, after transforming the problem into binary relevance. Regarding features, whereas (Ferschke et al., 2012) use features such as uni-, bi-, and trigrams, time distance between turns or the indentation level of the turn, other features are considered here: lemmas, part-of-speech tags and the meaning of verbs (according to WordNet). The dataset authors applied approaches such as Naive Bayes or Support Vector Machines. The present paper proposes, as an alternative, to use Schoenberg transformations which, following the example of kernel methods, transform original Euclidean distances into other Euclidean distances, in a space of high dimensionality. Résumé Ce travail étudie la classification supervisée multi-étiquette en actes de dialogue des tours de parole des contributeurs aux pages de discussion de Simple English Wikipedia (Wikipédia en anglais simple). Le jeu de données considéré a été créé et multi-étiqueté par (Ferschke et al., 2012). Une première partie analyse les relations entre les étiquettes pour examiner la cohérence des annotations et pour déterminer une méthode de classification. Ensuite, une classification supervisée multi-étiquette est effectuée, après recodage binaire des étiquettes. Concernant les variables, alors que (Ferschke et al., 2012) utilisent des caractéristiques telles que les uni-, bi- et trigrammes, le temps entre les tours de parole ou l'indentation d'un tour de parole, d'autres descripteurs sont considérés ici : les lemmes, les catégories morphosyntaxiques et le sens des verbes (selon WordNet). Les auteurs du jeu de données ont employé des approches telles que le Naive Bayes ou les Séparateurs à Vastes Marges (SVM) pour la classification. Cet article propose, de façon alternative, d'utiliser et d'étendre l'analyse discriminante linéaire aux transformations de Schoenberg qui, à l'instar des méthodes à noyau, transforment les distances euclidiennes originales en d'autres distances euclidiennes, dans un espace de haute dimensionnalité.
Resumo:
Aim: Emerging polyploids may depend on environmental niche shifts for successful establishment. Using the alpine plant Ranunculus kuepferi as a model system, we explore the niche shift hypothesis at different spatial resolutions and in contrasting parts of the species range. Location: European Alps. Methods: We sampled 12 individuals from each of 102 populations of R. kuepferi across the Alps, determined their ploidy levels, derived coarse-grain (100x100m) environmental descriptors for all sampling sites by downscaling WorldClim maps, and calculated fine-scale environmental descriptors (2x2m) from indicator values of the vegetation accompanying the sampled individuals. Both coarse and fine-scale variables were further computed for 8239 vegetation plots from across the Alps. Subsequently, we compared niche optima and breadths of diploid and tetraploid cytotypes by combining principal components analysis and kernel smoothing procedures. Comparisons were done separately for coarse and fine-grain data sets and for sympatric, allopatric and the total set of populations. Results: All comparisons indicate that the niches of the two cytotypes differ in optima and/or breadths, but results vary in important details. The whole-range analysis suggests differentiation along the temperature gradient to be most important. However, sympatric comparisons indicate that this climatic shift was not a direct response to competition with diploid ancestors. Moreover, fine-grained analyses demonstrate niche contraction of tetraploids, especially in the sympatric range, that goes undetected with coarse-grained data. Main conclusions: Although the niche optima of the two cytotypes differ, separation along ecological gradients was probably less decisive for polyploid establishment than a shift towards facultative apomixis, a particularly effective strategy to avoid minority cytotype exclusion. In addition, our results suggest that coarse-grained analyses overestimate niche breadths of widely distributed taxa. Niche comparison analyses should hence be conducted at environmental data resolutions appropriate for the organism and question under study.