938 resultados para Non-parametric regression methods
Resumo:
In the scope of the European project Hydroptimet, INTERREG IIIB-MEDOCC programme, limited area model (LAM) intercomparison of intense events that produced many damages to people and territory is performed. As the comparison is limited to single case studies, the work is not meant to provide a measure of the different models' skill, but to identify the key model factors useful to give a good forecast on such a kind of meteorological phenomena. This work focuses on the Spanish flash-flood event, also known as "Montserrat-2000" event. The study is performed using forecast data from seven operational LAMs, placed at partners' disposal via the Hydroptimet ftp site, and observed data from Catalonia rain gauge network. To improve the event analysis, satellite rainfall estimates have been also considered. For statistical evaluation of quantitative precipitation forecasts (QPFs), several non-parametric skill scores based on contingency tables have been used. Furthermore, for each model run it has been possible to identify Catalonia regions affected by misses and false alarms using contingency table elements. Moreover, the standard "eyeball" analysis of forecast and observed precipitation fields has been supported by the use of a state-of-the-art diagnostic method, the contiguous rain area (CRA) analysis. This method allows to quantify the spatial shift forecast error and to identify the error sources that affected each model forecasts. High-resolution modelling and domain size seem to have a key role for providing a skillful forecast. Further work is needed to support this statement, including verification using a wider observational data set.
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
Purpose: Recent studies showed that pericardial fat was independently correlated with the development of coronary artery disease (CAD). The mechanism remains unclear. We aimed at assessing a possible relationship between pericardial fat volume and endothelium-dependent coronary vasomotion, a surrogate of future cardiovascular events.Methods: Fifty healthy volunteers without known CAD or cardiovascular risk factors (CRF) were enrolled. They all underwent a dynamic Rb- 82 cardiac PET/CT to quantify myocardial blood flow (MBF) at rest, during MBF response to cold pressure test (CPT-MBF) and adenosine stress. Pericardial fat volume (PFV) was measured using a 3D volumetric CT method and common biological CRF (glucose and insulin levels, HOMA-IR, cholesterol, triglyceride, hs-CRP). Relationships between MBF response to CPT, PFV and other CRF were assessed using non-parametric Spearman correlation and multivariate regression analysis of variables with significant correlation on univariate analysis (Stata 11.0).Results: All of the 50 participants had normal MBF response to adenosine (2.7±0.6 mL/min/g; 95%CI: 2.6−2.9) and myocardial flow reserve (2.8±0.8; 95%CI: 2.6−3.0) excluding underlying CAD. Simple regression analysis revealed a significant correlation between absolute CPTMBF and triglyceride level (rho = −0.32, p = 0.024) fasting blood insulin (rho = −0.43, p = 0.0024), HOMA-IR (rho = −0.39, p = 0.007) and PFV (rho = −0.52, p = 0.0001). MBF response to adenosine was only correlated with PFV (rho = −0.32, p = 0.026). On multivariate regression analysis PFV emerged as the only significant predictor of MBF response to CPT (p = 0.002).Conclusion: PFV is significantly correlated with endothelium-dependent coronary vasomotion. High PF burden might negatively influence MBF response to CPT, as well as to adenosine stress, even in persons with normal hyperemic myocardial perfusion imaging, suggesting a link between PF and future cardiovascular events. While outside-to-inside adipokines secretion through the arterial wall has been described, our results might suggest an effect upon NO-dependent and -independent vasodilatation. Further studies are needed to elucidate this mechanism.
Resumo:
Automatic environmental monitoring networks enforced by wireless communication technologies provide large and ever increasing volumes of data nowadays. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows producing the prediction maps in real time which makes them superior to physical models for the operational use in risk assessment and mitigation. Particularly, this situation encounters in spatial prediction of climatic variables (topo-climatic mapping). In complex topographies of the mountainous regions, the meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
Resumo:
Tämän tutkielman tavoitteena on selvittää Venäjän, Slovakian, Tsekin, Romanian, Bulgarian, Unkarin ja Puolan osakemarkkinoiden heikkojen ehtojen tehokkuutta. Tämä tutkielma on kvantitatiivinen tutkimus ja päiväkohtaiset indeksin sulkemisarvot kerättiin Datastreamin tietokannasta. Data kerättiin pörssien ensimmäisestä kaupankäyntipäivästä aina vuoden 2006 elokuun loppuun saakka. Analysoinnin tehostamiseksi dataa tutkittiin koko aineistolla, sekä kahdella aliperiodilla. Osakemarkkinoiden tehokkuutta on testattu neljällä tilastollisella metodilla, mukaan lukien autokorrelaatiotesti ja epäparametrinen runs-testi. Tavoitteena on myös selvittääesiintyykö kyseisillä markkinoilla viikonpäiväanomalia. Viikonpäiväanomalian esiintymistä tutkitaan käyttämällä pienimmän neliösumman menetelmää (OLS). Viikonpäiväanomalia on löydettävissä kaikilta edellä mainituilta osakemarkkinoilta paitsi Tsekin markkinoilta. Merkittävää, positiivista tai negatiivista autokorrelaatiota, on löydettävissä kaikilta osakemarkkinoilta, myös Ljung-Box testi osoittaa kaikkien markkinoiden tehottomuutta täydellä periodilla. Osakemarkkinoiden satunnaiskulku hylätään runs-testin perusteella kaikilta muilta paitsi Slovakian osakemarkkinoilla, ainakin tarkastellessa koko aineistoa tai ensimmäistä aliperiodia. Aineisto ei myöskään ole normaalijakautunut minkään indeksin tai aikajakson kohdalla. Nämä havainnot osoittavat, että kyseessä olevat markkinat eivät ole heikkojen ehtojen mukaan tehokkaita
Resumo:
Tämän tutkielman tavoitteena on tarkastella Kiinan osakemarkkinoiden tehokkuutta ja random walk -hypoteesin voimassaoloa. Tavoitteena on myös selvittää esiintyykö viikonpäiväanomalia Kiinan osakemarkkinoilla. Tutkimusaineistona käytetään Shanghain osakepörssin A-sarjan,B-sarjan ja yhdistelmä-sarjan ja Shenzhenin yhdistelmä-sarjan indeksien päivittäisiä logaritmisoituja tuottoja ajalta 21.2.1992-30.12.2005 sekä Shenzhenin osakepörssin A-sarjan ja B-sarjan indeksien päivittäisiä logaritmisoituja tuottoja ajalta 5.10.1992-30.12.2005. Tutkimusmenetelminä käytetään neljä tilastollista menetelmää, mukaan lukien autokorrelaatiotestiä, epäparametrista runs-testiä, varianssisuhdetestiä sekä Augmented Dickey-Fullerin yksikköjuuritestiä. Viikonpäiväanomalian esiintymistä tutkitaan käyttämällä pienimmän neliösumman menetelmää (OLS). Testejä tehdään sekä koko aineistolla että kolmella erillisellä ajanjaksolla. Tämän tutkielman empiiriset tulokset tukevat aikaisempia tutkimuksia Kiinan osakemarkkinoiden tehottomuudesta. Lukuun ottamatta yksikköjuuritestien saatuja tuloksia, autokorrelaatio-, runs- ja varianssisuhdetestien perusteella random walk-hypoteesi hylättiin molempien Kiinan osakemarkkinoiden kohdalla. Tutkimustulokset osoittavat, että molemmilla osakepörssillä B-sarjan indeksien käyttäytyminenon ollut huomattavasti enemmän random walk -hypoteesin vastainen kuin A-sarjan indeksit. Paitsi B-sarjan markkinat, molempien Kiinan osakemarkkinoiden tehokkuus näytti myös paranevan vuoden 2001 markkinabuumin jälkeen. Tutkimustulokset osoittavat myös viikonpäiväanomalian esiintyvän Shanghain osakepörssillä, muttei kuitenkaan Shenzhenin osakepörssillä koko tarkasteluajanjaksolla.
Resumo:
BACKGROUND: Hallux valgus is one of the most common forefoot problems in females. Studies have looked at gait alterations due to hallux valgus deformity, assessing temporal, kinematic or plantar pressure parameters individually. The present study, however, aims to assess all listed parameters at once and to isolate the most clinically relevant gait parameters for moderate to severe hallux valgus deformity with the intent of improving post-operative patient prognosis and rehabilitation. METHODS: The study included 26 feet with moderate to severe hallux valgus deformity and 30 feet with no sign of hallux valgus in female participants. Initially, weight bearing radiographs and foot and ankle clinical scores were assessed. Gait assessment was then performed utilizing pressure insoles (PEDAR®) and inertial sensors (Physilog®) and the two groups were compared using a non-parametric statistical hypothesis test (Wilcoxon rank sum, P<0.05). Furthermore, forward stepwise regression was used to reduce the number of gait parameters to the most clinically relevant and correlation of these parameters was assessed with the clinical score. FINDINGS: Overall, the results showed clear deterioration in several gait parameters in the hallux valgus group compared to controls and 9 gait parameters (effect size between 1.03 and 1.76) were successfully isolated to best describe the altered gait in hallux valgus deformity (r(2)=0.71) as well as showed good correlation with clinical scores. INTERPRETATION: Our results, and nine listed parameters, could serve as benchmark for characterization of hallux valgus and objective evaluation of treatment efficacy.
Resumo:
PURPOSE: Thoracic fat has been associated with an increased risk of coronary artery disease (CAD). As endothelium-dependent vasoreactivity is a surrogate of cardiovascular events and is impaired early in atherosclerosis, we aimed at assessing the possible relationship between thoracic fat volume (TFV) and endothelium-dependent coronary vasomotion. METHODS: Fifty healthy volunteers without known CAD or major cardiovascular risk factors (CRFs) prospectively underwent a (82)Rb cardiac PET/CT to quantify myocardial blood flow (MBF) at rest, and MBF response to cold pressor testing (CPT-MBF) and adenosine (i.e., stress-MBF). TFV was measured by a 2D volumetric CT method and common laboratory blood tests (glucose and insulin levels, HOMA-IR, cholesterol, triglyceride, hsCRP) were performed. Relationships between CPT-MBF, TFV and other CRFs were assessed using non-parametric Spearman rank correlation testing and multivariate linear regression analysis. RESULTS: All of the 50 participants (58 ± 10y) had normal stress-MBF (2.7 ± 0.6 mL/min/g; 95 % CI: 2.6-2.9) and myocardial flow reserve (2.8 ± 0.8; 95 % CI: 2.6-3.0) excluding underlying CAD. Univariate analysis revealed a significant inverse relation between absolute CPT-MBF and sex (ρ = -0.47, p = 0.0006), triglyceride (ρ = -0.32, p = 0.024) and insulin levels (ρ = -0.43, p = 0.0024), HOMA-IR (ρ = -0.39, p = 0.007), BMI (ρ = -0.51, p = 0.0002) and TFV (ρ = -0.52, p = 0.0001). MBF response to adenosine was also correlated with TFV (ρ = -0.32, p = 0.026). On multivariate analysis, TFV emerged as the only significant predictor of MBF response to CPT (p = 0.014). CONCLUSIONS: TFV is significantly correlated with endothelium-dependent and -independent coronary vasomotion. High TF burden might negatively influence MBF response to CPT and to adenosine stress, even in persons without CAD, suggesting a link between thoracic fat and future cardiovascular events.
Comparação de duas metodologias de amostragem atmosférica com ferramenta estatística não paramétrica
Resumo:
In atmospheric aerosol sampling, it is inevitable that the air that carries particles is in motion, as a result of both externally driven wind and the sucking action of the sampler itself. High or low air flow sampling speeds may lead to significant particle size bias. The objective of this work is the validation of measurements enabling the comparison of species concentration from both air flow sampling techniques. The presence of several outliers and increase of residuals with concentration becomes obvious, requiring non-parametric methods, recommended for the handling of data which may not be normally distributed. This way, conversion factors are obtained for each of the various species under study using Kendall regression.
Resumo:
The purpose of the thesis is to analyze whether the returns of general stock market indices of Estonia, Latvia and Lithuania follow the random walk hypothesis (RWH), and in addition, whether they are consistent with the weak-form efficiency criterion. Also the existence of the day-of-the-week anomaly is examined in the same regional markets. The data consists of daily closing quotes of the OMX Tallinn, Riga and Vilnius total return indices for the sample period from January 3, 2000 to August 28, 2009. Moreover, the full sample period is also divided into two sub-periods. The RWH is tested by applying three quantitative methods (i.e. the Augmented Dickey-Fuller unit root test, serial correlation test and non-parametric runs test). Ordinary Least Squares (OLS) regression with dummy variables is employed to detect the day-of-the-week anomalies. The random walk hypothesis (RWH) is rejected in the Estonian and Lithuanian stock markets. The Latvian stock market exhibits more efficient behaviour, although some evidence of inefficiency is also found, mostly during the first sub-period from 2000 to 2004. Day-of-the-week anomalies are detected on every stock market examined, though no longer during the later sub-period.
Resumo:
Ten common doubts of chemistry students and professionals about their statistical applications are discussed. The use of the N-1 denominator instead of N is described for the standard deviation. The statistical meaning of the denominators of the root mean square error of calibration (RMSEC) and root mean square error of validation (RMSEV) are given for researchers using multivariate calibration methods. The reason why scientists and engineers use the average instead of the median is explained. Several problematic aspects about regression and correlation are treated. The popular use of triplicate experiments in teaching and research laboratories is seen to have its origin in statistical confidence intervals. Nonparametric statistics and bootstrapping methods round out the discussion.
Resumo:
L'objectif principal de ce travail est d’étudier en profondeur certaines techniques biostatistiques avancées en recherche évaluative en chirurgie cardiaque adulte. Les études ont été conçues pour intégrer les concepts d'analyse de survie, analyse de régression avec “propensity score”, et analyse de coûts. Le premier manuscrit évalue la survie après la réparation chirurgicale de la dissection aigüe de l’aorte ascendante. Les analyses statistiques utilisées comprennent : analyses de survie avec régression paramétrique des phases de risque et d'autres méthodes paramétriques (exponentielle, Weibull), semi-paramétriques (Cox) ou non-paramétriques (Kaplan-Meier) ; survie comparée à une cohorte appariée pour l’âge, le sexe et la race utilisant des tables de statistiques de survie gouvernementales ; modèles de régression avec “bootstrapping” et “multinomial logit model”. L'étude a démontrée que la survie s'est améliorée sur 25 ans en lien avec des changements dans les techniques chirurgicales et d’imagerie diagnostique. Le second manuscrit est axé sur les résultats des pontages coronariens isolés chez des patients ayant des antécédents d'intervention coronarienne percutanée. Les analyses statistiques utilisées comprennent : modèles de régression avec “propensity score” ; algorithme complexe d'appariement (1:3) ; analyses statistiques appropriées pour les groupes appariés (différences standardisées, “generalized estimating equations”, modèle de Cox stratifié). L'étude a démontrée que l’intervention coronarienne percutanée subie 14 jours ou plus avant la chirurgie de pontages coronariens n'est pas associée à des résultats négatifs à court ou long terme. Le troisième manuscrit évalue les conséquences financières et les changements démographiques survenant pour un centre hospitalier universitaire suite à la mise en place d'un programme de chirurgie cardiaque satellite. Les analyses statistiques utilisées comprennent : modèles de régression multivariée “two-way” ANOVA (logistique, linéaire ou ordinale) ; “propensity score” ; analyses de coûts avec modèles paramétriques Log-Normal. Des modèles d’analyse de « survie » ont également été explorés, utilisant les «coûts» au lieu du « temps » comme variable dépendante, et ont menés à des conclusions similaires. L'étude a démontrée que, après la mise en place du programme satellite, moins de patients de faible complexité étaient référés de la région du programme satellite au centre hospitalier universitaire, avec une augmentation de la charge de travail infirmier et des coûts.
Resumo:
Study on variable stars is an important topic of modern astrophysics. After the invention of powerful telescopes and high resolving powered CCD’s, the variable star data is accumulating in the order of peta-bytes. The huge amount of data need lot of automated methods as well as human experts. This thesis is devoted to the data analysis on variable star’s astronomical time series data and hence belong to the inter-disciplinary topic, Astrostatistics. For an observer on earth, stars that have a change in apparent brightness over time are called variable stars. The variation in brightness may be regular (periodic), quasi periodic (semi-periodic) or irregular manner (aperiodic) and are caused by various reasons. In some cases, the variation is due to some internal thermo-nuclear processes, which are generally known as intrinsic vari- ables and in some other cases, it is due to some external processes, like eclipse or rotation, which are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospheri- cal stars. Pulsating variables can again classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which rarely occurs and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed through many ways such as photometry, spectrophotometry and spectroscopy. The sequence of photometric observa- xiv tions on variable stars produces time series data, which contains time, magni- tude and error. The plot between variable star’s apparent magnitude and time are known as light curve. If the time series data is folded on a period, the plot between apparent magnitude and phase is known as phased light curve. The unique shape of phased light curve is a characteristic of each type of variable star. One way to identify the type of variable star and to classify them is by visually looking at the phased light curve by an expert. For last several years, automated algorithms are used to classify a group of variable stars, with the help of computers. Research on variable stars can be divided into different stages like observa- tion, data reduction, data analysis, modeling and classification. The modeling on variable stars helps to determine the short-term and long-term behaviour and to construct theoretical models (for eg:- Wilson-Devinney model for eclips- ing binaries) and to derive stellar properties like mass, radius, luminosity, tem- perature, internal and external structure, chemical composition and evolution. The classification requires the determination of the basic parameters like pe- riod, amplitude and phase and also some other derived parameters. Out of these, period is the most important parameter since the wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data, to quantify the variation, understand the nature of time-varying phenomena, to gain physical understanding of the system and to predict future behavior of the system. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and possibility of big gaps. This is due to daily varying daylight and the weather conditions for ground based observations and observations from space may suffer from the impact of cosmic ray particles. Many large scale astronomical surveys such as MACHO, OGLE, EROS, xv ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Ke- pler,ESA, Gaia, LSST, CRTS provide variable star’s time series data, even though their primary intention is not variable star observation. Center for Astrostatistics, Pennsylvania State University is established to help the astro- nomical community with the aid of statistical tools for harvesting and analysing archival data. Most of these surveys releases the data to the public for further analysis. There exist many period search algorithms through astronomical time se- ries analysis, which can be classified into parametric (assume some underlying distribution for data) and non-parametric (do not assume any statistical model like Gaussian etc.,) methods. Many of the parametric methods are based on variations of discrete Fourier transforms like Generalised Lomb-Scargle peri- odogram (GLSP) by Zechmeister(2009), Significant Spectrum (SigSpec) by Reegen(2007) etc. Non-parametric methods include Phase Dispersion Minimi- sation (PDM) by Stellingwerf(1978) and Cubic spline method by Akerlof(1994) etc. Even though most of the methods can be brought under automation, any of the method stated above could not fully recover the true periods. The wrong detection of period can be due to several reasons such as power leakage to other frequencies which is due to finite total interval, finite sampling interval and finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. Also spurious periods appear due to long gaps and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence obtaining the exact period of variable star from it’s time series data is still a difficult problem, in case of huge databases, when subjected to automation. As Matthew Templeton, AAVSO, states “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. 2007, Deb et.al. 2010 states “The processing of xvi huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”. It will be beneficial for the variable star astronomical community, if basic parameters, such as period, amplitude and phase are obtained more accurately, when huge time series databases are subjected to automation. In the present thesis work, the theories of four popular period search methods are studied, the strength and weakness of these methods are evaluated by applying it on two survey databases and finally a modified form of cubic spline method is intro- duced to confirm the exact period of variable star. For the classification of new variable stars discovered and entering them in the “General Catalogue of Vari- able Stars” or other databases like “Variable Star Index“, the characteristics of the variability has to be quantified in term of variable star parameters.
Resumo:
This work is an assessment of frequency of extreme values (EVs) of daily rainfall in the city of Sao Paulo. Brazil, over the period 1933-2005, based on the peaks-over-threshold (POT) and Generalized Pareto Distribution (GPD) approach. Usually. a GPD model is fitted to a sample of POT Values Selected With a constant threshold. However. in this work we use time-dependent thresholds, composed of relatively large p quantities (for example p of 0.97) of daily rainfall amounts computed from all available data. Samples of POT values were extracted with several Values of p. Four different GPD models (GPD-1, GPD-2, GPD-3. and GDP-4) were fitted to each one of these samples by the maximum likelihood (ML) method. The shape parameter was assumed constant for the four models, but time-varying covariates were incorporated into scale parameter of GPD-2. GPD-3, and GPD-4, describing annual cycle in GPD-2. linear trend in GPD-3, and both annual cycle and linear trend in GPD-4. The GPD-1 with constant scale and shape parameters is the simplest model. For identification of the best model among the four models WC used rescaled Akaike Information Criterion (AIC) with second-order bias correction. This criterion isolates GPD-3 as the best model, i.e. the one with positive linear trend in the scale parameter. The slope of this trend is significant compared to the null hypothesis of no trend, for about 98% confidence level. The non-parametric Mann-Kendall test also showed presence of positive trend in the annual frequency of excess over high thresholds. with p-value being virtually zero. Therefore. there is strong evidence that high quantiles of daily rainfall in the city of Sao Paulo have been increasing in magnitude and frequency over time. For example. 0.99 quantiles of daily rainfall amount have increased by about 40 mm between 1933 and 2005. Copyright (C) 2008 Royal Meteorological Society