4 results for outliers

at Universidade Complutense de Madrid


Abstract:

We consider a robust version of the classical Wald test statistics for testing simple and composite null hypotheses for general parametric models. These test statistics are based on the minimum density power divergence estimators instead of the maximum likelihood estimators. An extensive study of their robustness properties is given through the influence functions as well as the chi-square inflation factors. It is theoretically established that the level and power of these robust tests are stable against outliers, whereas the classical Wald test breaks down. Some numerical examples confirm the validity of the theoretical results.
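
As an illustration of the idea summarised above, the following is a minimal sketch for the simplest possible case: testing the mean of a normal model with known standard deviation. It is not the authors' code. The function names, the toy data and the choice alpha = 0.5 are assumptions made for this example. The estimator solves the density power divergence estimating equation by iterative reweighting, and the Wald-type statistic divides by the asymptotic variance sigma^2 (1 + alpha)^3 / (1 + 2 alpha)^(3/2) that follows for this particular model.

```python
import math
import statistics


def mdpde_normal_mean(xs, sigma=1.0, alpha=0.5, tol=1e-12, max_iter=1000):
    """Minimum density power divergence estimate of a normal mean (sigma known).

    The estimating equation is sum_i w_i * (x_i - mu) = 0 with weights
    w_i = exp(-alpha * (x_i - mu)^2 / (2 * sigma^2)), solved by fixed-point
    iteration. With alpha = 0 all weights are 1 and the sample mean is returned.
    """
    mu = statistics.median(xs)  # robust starting point (an assumption of this sketch)
    new_mu = mu
    for _ in range(max_iter):
        weights = [math.exp(-alpha * (x - mu) ** 2 / (2.0 * sigma ** 2)) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return new_mu


def wald_type_statistic(xs, mu0, sigma=1.0, alpha=0.5):
    """Wald-type statistic for H0: mu = mu0, built on the estimator above."""
    n = len(xs)
    mu_hat = mdpde_normal_mean(xs, sigma, alpha)
    # Asymptotic variance of the estimator under the normal model with known sigma.
    avar = sigma ** 2 * (1.0 + alpha) ** 3 / (1.0 + 2.0 * alpha) ** 1.5
    return n * (mu_hat - mu0) ** 2 / avar


def classical_wald_statistic(xs, mu0, sigma=1.0):
    """Classical Wald statistic based on the sample mean (the MLE)."""
    n = len(xs)
    return n * (statistics.mean(xs) - mu0) ** 2 / sigma ** 2


# Hypothetical sample: nine observations near 0 and one gross outlier.
sample = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.8, 1.1, 50.0]
robust_w = wald_type_statistic(sample, mu0=0.0)
classical_w = classical_wald_statistic(sample, mu0=0.0)
```

Both statistics would be compared with a chi-square critical value with one degree of freedom (about 3.84 at the 5% level). In this toy sample the single outlier drives the classical statistic far above that value, while the reweighted estimate stays close to the bulk of the data.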

Abstract:

Data analysis today faces problems that arise from combining data from diverse information sources. The value of information can be greatly enriched by making it easier to integrate new data sources, and industry is currently well aware of this. However, not only the volume but also the great diversity of the data poses a problem prior to analysis. Good data integration ensures reliable results, so it is worth dwelling on improving the processes of data specification, collection, cleaning and integration. This work is devoted to the data cleaning and integration phase: it analyses existing procedures and proposes a solution that is applied to medical data, thereby focusing on prediction projects (for prevention purposes) in the health sciences. In addition to implementing the cleaning processes, outlier detection algorithms are developed, and removing the detected outliers improves the quality of the data set. The work also includes the implementation of a prediction process intended to support decision-making. Specifically, this work carries out a predictive analysis of data on drug-dependent patients at the Clínica Nuestra Señora de la Paz, in order to support the decisions of the physician responsible for admitting patients to the clinic. In most cases, the data provided require adequate pre-processing for the results of traditional statistical analyses to be reliable. To that end, this work implements several ways of detecting outliers: an in-house algorithm (Detección de Outliers con Cadenas No Monótonas, "Outlier Detection with Non-Monotonic Chains"), which exploits the advantages of the Knuth-Morris-Pratt pattern-matching algorithm, and the R libraries outliers and Rcmdr.
Applying data cleaning and integration procedures, together with the removal of atypical data, yields a clean and reliable database on which prediction procedures are then implemented using the Naive Bayes classification algorithm in R.

Abstract:

We measured the distribution in absolute magnitude - circular velocity space for a well-defined sample of 199 rotating galaxies of the Calar Alto Legacy Integral Field Area Survey (CALIFA) using their stellar kinematics. Our aim in this analysis is to avoid subjective selection criteria and to take volume and large-scale structure factors into account. Using stellar velocity fields instead of gas emission line kinematics allows including rapidly rotating early-type galaxies. Our initial sample contains 277 galaxies with available stellar velocity fields and growth curve r-band photometry. After rejecting 51 velocity fields that could not be modelled because of the low number of bins, foreground contamination, or significant interaction, we performed Markov chain Monte Carlo modelling of the velocity fields, from which we obtained the rotation curve and kinematic parameters and their realistic uncertainties. We performed an extinction correction and calculated the circular velocity v_circ accounting for the pressure support of a given galaxy. The resulting galaxy distribution on the M_r - v_circ plane was then modelled as a mixture of two distinct populations, allowing robust and reproducible rejection of outliers, a significant fraction of which are slow rotators. The selection effects are understood well enough that we were able to correct for the incompleteness of the sample. The 199 galaxies were weighted by volume and large-scale structure factors, which enabled us to fit a volume-corrected Tully-Fisher relation (TFR). More importantly, we also provide the volume-corrected distribution of galaxies in the M_r - v_circ plane, which can be compared with cosmological simulations. The joint distribution of the luminosity and circular velocity space densities, representative over the range of -20 > M_r > -22 mag, can place more stringent constraints on the galaxy formation and evolution scenarios than linear TFR fit parameters or the luminosity function alone.
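
The mixture-based outlier rejection mentioned above can be illustrated with a deliberately simplified stand-in. The abstract does not describe the authors' actual model, which works in the two-dimensional M_r - v_circ plane. The sketch below instead assumes one-dimensional residuals from some fitted relation and a zero-mean mixture of a narrow Gaussian (main population) and a broad Gaussian (outlier population), fitted by expectation-maximisation. All names, starting values, the toy residuals and the 0.5 posterior threshold are assumptions of this example.

```python
import math


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def outlier_posteriors(residuals, n_iter=200):
    """EM fit of a narrow + broad zero-mean Gaussian mixture.

    Returns, for each residual, the posterior probability of belonging to the
    broad (outlier) component.
    """
    n = len(residuals)
    # Crude starting values (assumptions): a MAD-like inlier scale,
    # an outlier scale five times broader, and a 10% outlier fraction.
    mid_abs = sorted(abs(r) for r in residuals)[n // 2]
    s_in = max(1.4826 * mid_abs, 1e-6)
    s_out = 5.0 * s_in
    p_out = 0.1
    post = [0.0] * n
    for _ in range(n_iter):
        # E step: posterior probability that each point comes from the broad component.
        post = []
        for r in residuals:
            a = (1.0 - p_out) * normal_pdf(r, s_in)
            b = p_out * normal_pdf(r, s_out)
            # If both densities underflow, the point is far from everything: call it an outlier.
            post.append(b / (a + b) if a + b > 0.0 else 1.0)
        w_out = sum(post)
        w_in = n - w_out
        if w_out < 1e-12 or w_in < 1e-12:
            break  # one component is empty; keep the current parameters
        # M step: update the mixing fraction and both scales.
        p_out = w_out / n
        s_in = math.sqrt(sum((1.0 - q) * r * r for q, r in zip(post, residuals)) / w_in)
        s_out = math.sqrt(sum(q * r * r for q, r in zip(post, residuals)) / w_out)
    return post


# Hypothetical residuals: ten well-behaved points and two discrepant ones.
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, -0.25, 0.12, -0.05, 3.0, -2.8]
posteriors = outlier_posteriors(residuals)
flagged = [i for i, q in enumerate(posteriors) if q > 0.5]
```

Because membership is decided by a fitted probability rather than a hand-picked cut, the rejection is reproducible for a given data set. That is the property the abstract emphasises, although the threshold itself remains a choice.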

Abstract:

What have we learnt from the 2006-2012 crisis, including events such as the subprime crisis, the bankruptcy of Lehman Brothers or the European sovereign debt crisis, among others? It is usually assumed that, for firms with a quoted CDS, this CDS is the key factor in establishing the credit risk premium for a new financial asset. Thus, the CDS is a key element for any investor pursuing relative value opportunities across a firm’s capital structure. In the first chapter we study the most relevant aspects of the microstructure of the CDS market in terms of pricing, to have a clear idea of how this market works. We consider such an analysis a necessary starting point that provides a solid base for the empirical studies carried out in the remaining chapters. In its document “Basel III: A global regulatory framework for more resilient banks and banking systems”, the Basel Committee sets a capital charge for credit valuation adjustment (CVA) risk in the trading book and the methodology for computing that capital requirement. This regulatory requirement has added extra pressure for in-depth knowledge of the CDS market and this motivates the analysis performed in this thesis. The problem arises in estimating the credit risk premium for those counterparties without a directly quoted CDS in the market. How can we estimate the credit spread for an issuer without a CDS? In addition, given the period of high volatility in the credit market in the last few years and, in particular, after the default of Lehman Brothers on 15 September 2008, we observe large outliers in the distribution of credit spreads across the different combinations of rating, industry and region. After an exhaustive analysis of the results from the different models studied, we have reached the following conclusions. Hierarchical regression models clearly fit the data much better than non-hierarchical ones.
Furthermore, we generally prefer the median model (50%-quantile regression) to the mean model (standard OLS regression) because of its robustness when assigning a price to a new credit asset without a spread, minimizing the “inversion problem”. Finally, an additional fundamental reason to prefer the median model is the typically right-skewed distribution of CDS spreads...
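
The contrast between the mean and median models can be sketched in a stripped-down setting. When the only regressors are indicator variables for each bucket, OLS reproduces the bucket mean and 50%-quantile regression reproduces the bucket median. The example below uses that special case and ignores the hierarchical structure studied in the thesis. The quotes, the bucket labels and the function name are hypothetical. The reading of the "inversion problem" as a better-rated bucket receiving a wider proxy spread than a worse-rated one is also an assumption, since the abstract does not define the term.

```python
import statistics
from collections import defaultdict


def proxy_spreads(quotes):
    """Return {bucket: (mean_spread, median_spread)} for quoted CDS spreads.

    Each quote is (rating, industry, region, spread_in_basis_points).
    """
    buckets = defaultdict(list)
    for rating, industry, region, spread_bp in quotes:
        buckets[(rating, industry, region)].append(spread_bp)
    return {
        key: (statistics.mean(values), statistics.median(values))
        for key, values in buckets.items()
    }


# Hypothetical quotes in basis points; one A-rated name trades at distressed levels.
quotes = [
    ("A", "Financials", "Europe", 80.0),
    ("A", "Financials", "Europe", 85.0),
    ("A", "Financials", "Europe", 90.0),
    ("A", "Financials", "Europe", 95.0),
    ("A", "Financials", "Europe", 900.0),  # outlier
    ("BBB", "Financials", "Europe", 110.0),
    ("BBB", "Financials", "Europe", 125.0),
    ("BBB", "Financials", "Europe", 130.0),
    ("BBB", "Financials", "Europe", 140.0),
    ("BBB", "Financials", "Europe", 150.0),
]

spreads = proxy_spreads(quotes)
a_mean, a_median = spreads[("A", "Financials", "Europe")]
bbb_mean, bbb_median = spreads[("BBB", "Financials", "Europe")]
```

With these toy numbers the mean-based proxy for the A bucket exceeds the one for the BBB bucket, while the median-based proxies keep the expected ordering. This matches the robustness argument made above for right-skewed spread distributions.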