882 results for Least-squares support vector machine
Abstract:
Whereas numerical modeling using finite-element methods (FEM) can provide the transient temperature distribution in a component with sufficient accuracy, the development of compact dynamic thermal models that can be used for electrothermal simulation is of the utmost importance. While in most cases single power sources are considered, here we focus on the simultaneous presence of multiple sources. The thermal model takes the form of a thermal impedance matrix containing the thermal impedance transfer functions between two arbitrary ports. Each individual transfer function element Z_ij is obtained from the analysis of the temperature transient at node i after a power step at node j. Different options for multiexponential transient analysis are detailed and compared. Among the options explored, small thermal models can be obtained by constrained nonlinear least squares (NLSQ) methods if the order is selected properly using validation signals. The methods are applied to the extraction of dynamic compact thermal models for a new ultrathin chip stack technology (UTCS).
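As a concrete illustration of the constrained NLSQ step described above, here is a minimal sketch (my own, not the authors' code) that fits a Foster-type multiexponential step response with non-negativity bounds on the amplitudes and time constants; in practice the model order would be chosen by comparing such fits against validation transients.

```python
# Foster-type multiexponential step response: Zth(t) = sum_k R_k * (1 - exp(-t/tau_k))
import numpy as np
from scipy.optimize import least_squares

def zth(params, t):
    n = len(params) // 2
    R, tau = params[:n], params[n:]
    return np.sum(R[:, None] * (1.0 - np.exp(-t[None, :] / tau[:, None])), axis=0)

def fit_multiexp(t, z_meas, order):
    # Log-spaced initial time constants spanning the transient, equal initial R_k
    tau0 = np.logspace(np.log10(t[0]), np.log10(t[-1]), order)
    x0 = np.concatenate([np.full(order, z_meas[-1] / order), tau0])
    # Lower bounds enforce R_k, tau_k > 0: this is the "constrained" part of NLSQ
    res = least_squares(lambda p: zth(p, t) - z_meas, x0, bounds=(1e-12, np.inf))
    return res.x

# Synthetic two-pole transient standing in for a measured node temperature
t = np.logspace(-4, 1, 200)
z_true = zth(np.array([2.0, 5.0, 1e-3, 0.5]), t)
noise = 1e-3 * np.random.default_rng(0).normal(size=t.size)
params = fit_multiexp(t, z_true + noise, order=2)
```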
Abstract:
Distance-based regression is a prediction method consisting of two steps: from the distances between observations we obtain latent variables, which become the regressors of an ordinary least squares linear model. The distances are computed from the original predictors using a suitable dissimilarity function. Since, in general, the regressors are related nonlinearly to the response, their selection with the usual F test is not possible. In this work we propose a solution to this predictor selection problem by defining generalized test statistics and adapting a nonparametric bootstrap method for the estimation of p-values. We include a numerical example with automobile insurance data.
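A minimal sketch of the two-step procedure, assuming Euclidean dissimilarities and classical multidimensional scaling (principal coordinates) to obtain the latent regressors; the generalized tests and bootstrap p-values proposed in the paper are not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def principal_coordinates(D, k):
    # Classical MDS: double-center the squared dissimilarities, then eigendecompose
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]                    # keep the k largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                        # original predictors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)     # nonlinear response
D = squareform(pdist(X, metric="euclidean"))         # any suitable dissimilarity
Z = principal_coordinates(D, k=5)                    # latent variables
Z1 = np.column_stack([np.ones(len(y)), Z])
beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)        # ordinary least squares step
```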
Abstract:
Objective: Health status measures usually have an asymmetric distribution and present a high percentage of respondents with the best possible score (ceiling effect), especially when they are assessed in the general population. Different methods that take the ceiling effect into account have been proposed to model this type of variable: tobit models, censored least absolute deviations (CLAD) models, and two-part models, among others. The objective of this work was to describe the tobit model and compare it with the ordinary least squares (OLS) model, which ignores the ceiling effect.
Methods: Two data sets were used to compare the models: a) real data from the European Study of Mental Disorders (ESEMeD), used to model the EQ5D index, one of the utility measures most commonly used for the evaluation of health status; and b) data obtained from simulation. Cross-validation was used to compare the predicted values of the tobit and OLS models. The following estimators were compared: the percentage of absolute error (R1), the percentage of squared error (R2), the Mean Squared Error (MSE) and the Mean Absolute Prediction Error (MAPE). Different data sets were created for different values of the error variance and different percentages of individuals with the ceiling effect. The coefficient estimates, the percentage of explained variance and the plots of residuals versus predicted values obtained under each model were compared.
Results: For the ESEMeD study, the predicted values obtained with the OLS model and those obtained with the tobit model were very similar. The regression coefficients of the linear model were consistently smaller than those of the tobit model. In the simulation study, we observed that when the error variance was small (s=1), the tobit model gave unbiased coefficient estimates and accurate predicted values, especially when the percentage of individuals with the highest possible score was small. However, when the error variance was larger (s=10 or s=20), the percentage of explained variance and the predicted values of the tobit model were more similar to those obtained with the OLS model.
Conclusions: The proportion of variability accounted for by the models and the percentage of individuals with the highest possible score have an important effect on the performance of the tobit model relative to the linear model.
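For concreteness, a hedged sketch of a tobit model with an upper censoring point (the ceiling), fitted by maximum likelihood on simulated ceiling-effect data; all names and the data-generating process are illustrative, not those of the study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y, ceiling):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                          # keeps sigma positive
    mu = X @ beta
    cens = y >= ceiling                                # respondents at the ceiling
    ll_unc = norm.logpdf(y[~cens], mu[~cens], sigma)   # fully observed part
    ll_cen = norm.logsf((ceiling - mu[cens]) / sigma)  # P(latent >= ceiling)
    return -(ll_unc.sum() + ll_cen.sum())

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
latent = X @ np.array([0.8, 0.4]) + 0.3 * rng.normal(size=500)
y = np.minimum(latent, 1.0)                            # ceiling at the best score
fit = minimize(tobit_negloglik, x0=np.zeros(3), args=(X, y, 1.0), method="BFGS")
beta_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])    # compare with OLS on y
```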
Abstract:
The application of infrared spectroscopic techniques (NIRS, near-infrared spectroscopy, and DRIFTS, diffuse reflectance Fourier transform spectroscopy) to inorganic soil analysis has been proposed since the 1970s, but to date few such methods have been implemented routinely in Brazil. This is due to the difficulty of building calibration models, through multivariate statistical methods, using real soil samples of complex constitution that varies geographically and with management. The objectives of this work were therefore to build NIRS and DRIFTS calibration models for quantifying the clay and sand fractions in soil samples of different classes - Latossolo Vermelho (predominant), Nitossolo, Argissolo Vermelho and Neossolo Quartzarênico - and to evaluate which of the two techniques is more suitable for this purpose, as well as the influence of sample grouping and of spectral variable selection on the quality of these models. To this end, reference values obtained by the densimeter method, widely used in soil analysis laboratories, were correlated with NIRS and DRIFTS absorbance values using partial least squares (PLS) regression, yielding high coefficients of determination (R²) of 0.95, 0.90 and 0.91 for clay, silt and sand, respectively, in external validation. This confirms the applicability of spectroscopic techniques to soil particle-size analysis for agricultural purposes. Grouping samples by location and selecting spectral variables had little influence on model quality. DRIFTS proved to be the more suitable spectroscopic technique for this purpose.
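A brief sketch of the kind of PLS calibration workflow described above, using synthetic stand-ins for the absorbance spectra and the densimeter reference values.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
spectra = rng.normal(size=(120, 700))                # stand-in absorbance spectra
clay = 5.0 * spectra[:, 100] + rng.normal(size=120)  # stand-in densimeter values

X_cal, X_val, y_cal, y_val = train_test_split(spectra, clay, test_size=0.3,
                                              random_state=0)
pls = PLSRegression(n_components=8)                  # number of latent variables
pls.fit(X_cal, y_cal)
print("external-validation R²:", r2_score(y_val, pls.predict(X_val).ravel()))
```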
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection that the above methods did. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
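As a pointer to what generalized least squares means in this context, here is a toy sketch (my illustration, not the authors' simulation code) comparing OLS with GLS when errors follow an exponential spatial covariance.

```python
import numpy as np
import statsmodels.api as sm
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(200, 2))        # synthetic site locations
Sigma = np.exp(-squareform(pdist(coords)) / 2.0)  # exponential covariance model
Sigma += 1e-8 * np.eye(200)                       # small nugget for stability
X = sm.add_constant(rng.normal(size=200))
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=200)

ols = sm.OLS(y, X).fit()                          # ignores the spatial structure
gls = sm.GLS(y, X, sigma=Sigma).fit()             # models it explicitly
print(ols.params, ols.bse)                        # OLS standard errors are off
print(gls.params, gls.bse)
```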
Abstract:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
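A simplified stand-in for MKL regression, in case the idea of dedicated per-feature kernels is unfamiliar: a convex combination of RBF kernels, one per feature, plugged into kernel ridge regression, with the mixing weight picked on a validation split. Real MKL solvers learn these weights jointly with the regression model; this grid search only imitates the effect.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))                        # e.g. two topographic features
y = np.sin(X[:, 0]) + 0.2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
tr, va = np.arange(200), np.arange(200, 300)

# One RBF kernel per feature; MKL would learn the weights jointly with the model
K = [rbf_kernel(X[:, [m]], X[:, [m]], gamma=1.0) for m in range(2)]

best = None
for d0 in np.linspace(0.0, 1.0, 11):                 # grid over d0 + d1 = 1
    Kmix = d0 * K[0] + (1.0 - d0) * K[1]
    model = KernelRidge(alpha=0.1, kernel="precomputed")
    model.fit(Kmix[np.ix_(tr, tr)], y[tr])
    mse = np.mean((model.predict(Kmix[np.ix_(va, tr)]) - y[va]) ** 2)
    if best is None or mse < best[0]:
        best = (mse, d0)
print("validation MSE, weight on feature 0:", best)  # the weight flags relevance
```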
Abstract:
Intensity-modulated radiotherapy (IMRT) treatment plan verification by comparison with measured data requires having access to the linear accelerator and is time consuming. In this paper, we propose a method for monitor unit (MU) calculation and plan comparison for step-and-shoot IMRT based on the Monte Carlo code EGSnrc/BEAMnrc. The beamlets of an IMRT treatment plan are individually simulated using Monte Carlo and converted into absorbed dose to water per MU. The dose of the whole treatment can be expressed through a linear matrix equation of the MU and dose per MU of every beamlet. Due to the positivity of the absorbed dose and MU values, this equation is solved for the MU values using a non-negative least-squares fit optimization algorithm (NNLS). The Monte Carlo plan is formed by multiplying the Monte Carlo absorbed dose to water per MU with the Monte Carlo/NNLS MU. Treatment plans for several localizations calculated with a commercial treatment planning system (TPS) are compared with the proposed method for validation. The Monte Carlo/NNLS MUs are close to the ones calculated by the TPS and lead to a treatment dose distribution which is clinically equivalent to the one calculated by the TPS. This procedure can be used as an IMRT QA, and further development could allow this technique to be used for other radiotherapy techniques like tomotherapy or volumetric modulated arc therapy.
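The MU determination step reduces to a standard non-negative least-squares problem, sketched below with random stand-in data in place of the Monte Carlo beamlet doses.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(5)
n_voxels, n_beamlets = 1000, 40
D = rng.uniform(0.0, 1e-2, size=(n_voxels, n_beamlets))  # dose to water per MU
mu_ref = rng.uniform(5.0, 50.0, size=n_beamlets)         # reference monitor units
d_target = D @ mu_ref                                    # dose to be reproduced

mu, residual = nnls(D, d_target)   # min ||D mu - d||_2  subject to  mu >= 0
mc_plan = D @ mu                   # Monte Carlo dose-per-MU times the NNLS MUs
```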
Abstract:
PURPOSE: To compare different techniques for positive contrast imaging of susceptibility markers with MRI for three-dimensional visualization. As several different techniques have been reported, the choice of the suitable method depends on its properties with regard to the amount of positive contrast and the desired background suppression, as well as other imaging constraints needed for a specific application.
MATERIALS AND METHODS: Six different positive contrast techniques are investigated for their ability to image a single susceptibility marker in vitro at 3 Tesla. The white marker method (WM), susceptibility gradient mapping (SGM), inversion recovery with on-resonant water suppression (IRON), frequency selective excitation (FSX), fast low flip-angle positive contrast SSFP (FLAPS), and iterative decomposition of water and fat with echo asymmetry and least-squares estimation (IDEAL) were implemented and investigated.
RESULTS: The different methods were compared with respect to the volume of positive contrast, the product of volume and signal intensity, imaging time, and the level of background suppression. Quantitative results are provided, and strengths and weaknesses of the different approaches are discussed.
CONCLUSION: The appropriate choice of positive contrast imaging technique depends on the desired level of background suppression, acquisition speed, and robustness against artifacts, for which in vitro comparative data are now available.
Abstract:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
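A short sketch of the plain SVR step on which the MKL extension builds, using synthetic stand-ins for the multi-scale terrain features; parameter values are illustrative only.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
features = rng.normal(size=(400, 6))   # slopes, curvatures, directional derivatives
wind = 4.0 + features[:, 0] - 0.5 * features[:, 3] + 0.3 * rng.normal(size=400)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
svr.fit(features[:300], wind[:300])
rmse = np.sqrt(np.mean((svr.predict(features[300:]) - wind[300:]) ** 2))
print("held-out RMSE:", rmse)
```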
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulation inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. As a multivariate process, it was important at first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data, but it did not prove to be the sole determinant for the spatial modeling.
The statistical analysis of the data, at both the univariate and multivariate level, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-windows methods. The use of the Quantité Morisita Index (QMI) as a procedure to evaluate data clustering as a function of the radon level was proposed. The existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase comes along with the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis was done with a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, data partitioning was optimized in order to cope with the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools.
In the following section, different spatial interpolation methods were applied to a particular dataset. A bottom-up approach in method complexity was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests for data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and the limitations of generalization methods for reproducing statistical distributions.
The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations thus allowed the use of multigaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for the reproduction of the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed in a more robust way. An error measure was defined in relation to the decision function for hardening the data classification. Among the classification methods, probabilistic neural networks (PNN) were shown to be better adapted for modeling high-threshold categorization and for automation. Support vector machines (SVM), on the contrary, performed well under balanced category conditions.
In general, it was concluded that no single prediction or estimation method is better under all conditions of scale and neighborhood definition. Simulations should be the basis, while other methods can provide complementary information to support efficient indoor radon decision making.
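One concrete piece of the exploratory toolbox mentioned above, sketched on synthetic data: selecting the k-nearest-neighbours neighbourhood size for spatial prediction by cross-validation. The data and scales are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
coords = rng.uniform(0, 100, size=(500, 2))             # dwelling locations [km]
log_radon = 1.0 + 0.02 * coords[:, 0] + 0.5 * rng.normal(size=500)

scores = {k: cross_val_score(KNeighborsRegressor(n_neighbors=k),
                             coords, log_radon, cv=10,
                             scoring="neg_mean_squared_error").mean()
          for k in (1, 2, 4, 8, 16, 32)}
best_k = max(scores, key=scores.get)    # neighbourhood with the lowest CV error
```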
Abstract:
BACKGROUND: Evidence regarding the effectiveness of oral vitamin B12 in patients with serum vitamin B12 levels between 125-200 pM/l is lacking. We compared the effectiveness of one-month oral vitamin B12 supplementation in patients with a subtle vitamin B12 deficiency to that of a placebo.
METHODS: This multicentre (13 general practices, two nursing homes, and one primary care center in western Switzerland), parallel, randomised, controlled, closed-label, observer-blind trial included 50 patients with serum vitamin B12 levels between 125-200 pM/l who were randomized to receive either oral vitamin B12 (1000 μg daily, N = 26) or placebo (N = 24) for four weeks. The institution's pharmacist used simple randomisation to generate a table and allocate treatments. The primary outcome was the change in serum methylmalonic acid (MMA) levels after one month of treatment. Secondary outcomes were changes in total homocysteine and serum vitamin B12 levels. Blood samples were centralised for analysis and adherence to treatment was verified by an electronic device (MEMS; Aardex Europe, Switzerland). Trial registration: ISRCTN 22063938.
RESULTS: Baseline characteristics and adherence to treatment were similar in both groups. After one month, one patient in the placebo group was lost to follow-up. Data were evaluated by intention-to-treat analysis. One month of vitamin B12 treatment (N = 26) lowered serum MMA levels by 0.13 μmol/l (95%CI 0.06-0.19) more than the change observed in the placebo group (N = 23). The number of patients needed to treat to detect a metabolic response in MMA after one month was 2.6 (95% CI 1.7-6.4). A significant change was observed for the B12 serum level, but not for the homocysteine level, hematocrit, or mean corpuscular volume. After three months without active treatment (at four months), significant differences in MMA levels were no longer detected.
CONCLUSIONS: Oral vitamin B12 treatment normalised the metabolic markers of vitamin B12 deficiency. However, a one-month daily treatment with 1000 μg oral vitamin B12 was not sufficient to normalise the deficiency markers for four months, and treatment had no effect on haematological signs of B12 deficiency.
Abstract:
Drainage-basin and channel-geometry multiple-regression equations are presented for estimating design-flood discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at stream sites on rural, unregulated streams in Iowa. Design-flood discharge estimates determined by Pearson Type-III analyses using data collected through the 1990 water year are reported for the 188 streamflow-gaging stations used in either the drainage-basin or channel-geometry regression analyses. Ordinary least-squares multiple-regression techniques were used to identify selected drainage-basin and channel-geometry regions. Weighted least-squares multiple-regression techniques, which account for differences in the variance of flows at different gaging stations and for variable lengths in station records, were used to estimate the regression parameters. Statewide drainage-basin equations were developed from analyses of 164 streamflow-gaging stations. Drainage-basin characteristics were quantified using a geographic-information-system (GIS) procedure to process topographic maps and digital cartographic data. The significant characteristics identified for the drainage-basin equations included contributing drainage area, relative relief, drainage frequency, and 2-year, 24-hour precipitation intensity. The average standard errors of prediction for the drainage-basin equations ranged from 38.6% to 50.2%. The GIS procedure expanded the capability to quantitatively relate drainage-basin characteristics to the magnitude and frequency of floods for stream sites in Iowa and provides a flood-estimation method that is independent of hydrologic regionalization. Statewide and regional channel-geometry regression equations were developed from analyses of 157 streamflow-gaging stations. Channel-geometry characteristics were measured on site and on topographic maps. Statewide and regional channel-geometry regression equations that are dependent on whether a stream has been channelized were developed on the basis of bankfull and active-channel characteristics. The significant channel-geometry characteristics identified for the statewide and regional regression equations included bankfull width and bankfull depth for natural channels unaffected by channelization, and active-channel width for stabilized channels affected by channelization. The average standard errors of prediction ranged from 41.0% to 68.4% for the statewide channel-geometry equations and from 30.3% to 70.0% for the regional channel-geometry equations. Procedures provided for applying the drainage-basin and channel-geometry regression equations depend on whether the design-flood discharge estimate is for a site on an ungaged stream, an ungaged site on a gaged stream, or a gaged site. When both a drainage-basin and a channel-geometry regression-equation estimate are available for a stream site, a procedure is presented for determining a weighted average of the two flood estimates.
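A schematic example (synthetic numbers, not the report's data) of the weighted least-squares idea: stations with longer records, and hence more reliable flood quantiles, receive larger weights in the regional regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 150
log_area = rng.uniform(0.0, 3.0, size=n)         # log10 contributing drainage area
record_years = rng.integers(10, 80, size=n)      # station record lengths
noise = rng.normal(scale=0.2 / np.sqrt(record_years / 10.0))
log_q100 = 1.5 + 0.7 * log_area + noise          # log10 100-year flood discharge

X = sm.add_constant(log_area)
wls = sm.WLS(log_q100, X, weights=record_years).fit()  # longer records weigh more
print(wls.params, wls.bse)
```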
Abstract:
Relaxation rates provide important information about tissue microstructure. Multi-parameter mapping (MPM) estimates multiple relaxation parameters from multi-echo FLASH acquisitions with different basic contrasts, i.e., proton density (PD), T1 or magnetization transfer (MT) weighting. Motion can particularly affect maps of the apparent transverse relaxation rate R2*, which are derived from the signal of PD-weighted images acquired at different echo times. To address the motion artifacts, we introduce ESTATICS, which robustly estimates R2* from images even when they are acquired with different basic contrasts. ESTATICS extends the fitted signal model to account for the inherent contrast differences in the PDw, T1w and MTw images. The fit was implemented as a conventional ordinary least squares optimization and as a robust fit with a small or large confidence interval. These three implementations of ESTATICS were tested on data affected by severe motion artifacts and on data with no prominent motion artifacts, as determined by visual assessment or fast optical motion tracking. ESTATICS improved the quality of the R2* maps and reduced the coefficient of variation for both types of data, with average reductions of 30% when severe motion artifacts were present. ESTATICS can be applied to any protocol comprising multiple 2D/3D multi-echo FLASH acquisitions, as used in general research and clinical settings.
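A hedged sketch of the core ESTATICS idea on synthetic signals: a log-linear fit with one intercept (log S0) per contrast and a single shared R2* slope across the PDw, T1w and MTw echo trains; the robust-fitting variants are omitted.

```python
import numpy as np

rng = np.random.default_rng(9)
TE = np.array([2.3, 4.6, 6.9, 9.2, 11.5]) * 1e-3   # echo times [s]
r2s_true = 40.0                                    # shared R2* [1/s]
s0 = {"PDw": 1000.0, "T1w": 600.0, "MTw": 500.0}   # contrast-specific amplitudes

rows, y = [], []
for c, amp in enumerate(s0.values()):
    signal = amp * np.exp(-r2s_true * TE) * (1.0 + 0.01 * rng.normal(size=TE.size))
    for te, s in zip(TE, signal):
        onehot = np.zeros(len(s0))
        onehot[c] = 1.0                            # per-contrast intercept
        rows.append(np.concatenate([onehot, [-te]]))  # shared slope column
        y.append(np.log(s))

beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
log_s0_hat, r2s_hat = beta[:3], beta[3]            # one R2* for all contrasts
```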
Abstract:
Due to advances in sensor networks and remote sensing technologies, the acquisition and storage rates of meteorological and climatological data increase every day and call for novel and efficient processing algorithms. A fundamental problem of data analysis and modeling is the spatial prediction of meteorological variables in complex orography, which serves, among other purposes, extended climatological analyses, the assimilation of data into numerical weather prediction models, the preparation of inputs to hydrological models, and real-time monitoring and short-term forecasting of weather.
In this thesis, a new framework for spatial estimation is proposed by taking advantage of a class of algorithms emerging from statistical learning theory. Nonparametric kernel-based methods for nonlinear data classification, regression and target detection, known as support vector machines (SVM), are adapted for the mapping of meteorological variables in complex orography.
With the advent of high-resolution digital elevation models, the field of spatial prediction has opened up new horizons. In fact, by exploiting image processing tools along with physical heuristics, a very large number of terrain features accounting for the topographic conditions at multiple spatial scales can be extracted. Such features are highly relevant for the mapping of meteorological variables because they control a considerable part of the spatial variability of meteorological fields in the complex Alpine orography. For instance, patterns of orographic rainfall, wind speed and cold air pools are known to be correlated with particular terrain forms, e.g. convex/concave surfaces and the upwind sides of mountain slopes.
Kernel-based methods are employed to learn the nonlinear statistical dependence which links the multidimensional space of geographical and topographic explanatory variables to the variable of interest, that is, the wind speed as measured at the weather stations or the occurrence of orographic rainfall patterns as extracted from sequences of radar images. Compared to low-dimensional models integrating only the geographical coordinates, the proposed framework opens a way to regionalize meteorological variables which are multidimensional in nature and rarely show spatial autocorrelation in the original space, making the use of classical geostatistics problematic.
The challenges explored in the thesis are manifold. First, the complexity of the models is optimized to impose appropriate smoothness properties and reduce the impact of noisy measurements. Second, a multiple kernel extension of SVM is considered to select the multiscale features which explain most of the spatial variability of wind speed. Then, SVM target detection methods are implemented to describe the orographic conditions which cause persistent and stationary rainfall patterns. Finally, the optimal splitting of the data is studied to estimate realistic performances and confidence intervals characterizing the uncertainty of the predictions.
The resulting maps of average wind speed find applications in renewable resource assessment and open a route to decreasing the temporal scale of analysis to meet hydrological requirements. Furthermore, the maps depicting the susceptibility to orographic rainfall enhancement can be used to improve current radar-based quantitative precipitation estimation and forecasting systems and to generate stochastic ensembles of precipitation fields conditioned upon the orography.
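An illustrative sketch (not the thesis code) of the multi-scale terrain feature extraction step: Laplacian-of-Gaussian filtering of a digital elevation model flags convex and concave surfaces at several spatial scales, producing covariates for the kernel-based models.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(10)
# Smooth random field standing in for a digital elevation model [m]
dem = ndimage.gaussian_filter(rng.normal(size=(256, 256)), sigma=8) * 500.0

scales = [2, 4, 8, 16]                                   # in grid cells
features = np.stack([ndimage.gaussian_laplace(dem, sigma=s) for s in scales],
                    axis=-1)                             # one convexity map per scale
# 'features' would then enter the SVM/MKL models as multidimensional covariates
```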