89 results for statistical distance
Abstract:
For single-user MIMO communication with uncoded and coded QAM signals, we propose bit and power loading schemes that rely only on channel distribution information at the transmitter. To that end, we develop the relationship between the average bit error probability at the output of a ZF linear receiver and the bit rates and powers allocated at the transmitter. This relationship, and the fact that a ZF receiver decouples the MIMO parallel channels, allow us to leverage bit loading algorithms already existing in the literature. We solve dual bit rate maximization and power minimization problems and present performance results that illustrate the gains of the proposed scheme with respect to a non-optimized transmission.
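The decoupling property this abstract relies on can be illustrated with a minimal zero-forcing sketch (a hypothetical 4x4 channel and QPSK symbols, noiseless for clarity; not the paper's system model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4x4 MIMO channel and QPSK symbol vector (illustration only).
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
s = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
y = H @ s  # noiseless received vector

# Zero-forcing receiver: applying the pseudo-inverse of H decouples the
# spatial streams, so bit rate and power can be loaded per stream
# independently, as the abstract exploits.
s_hat = np.linalg.pinv(H) @ y
print(np.allclose(s_hat, s))  # True in the noiseless case
```

With noise, each decoupled stream sees its own post-equalization SNR, which is what per-stream bit/power loading algorithms operate on.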
Abstract:
In this article we present a hybrid approach to the automatic summarization of Spanish medical texts. There are many systems for automatic summarization based on statistics or linguistics, but only a few combine both techniques. Our premise is that a good summary requires the linguistic analysis of the text, but should also exploit the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems, coupled with the Yate term extractor, and the Disicosum system (linguistics). We compared these systems and then integrated them into a hybrid approach. Finally, we applied this hybrid system to a corpus of medical articles and evaluated its performance, obtaining good results.
Abstract:
This paper presents a web service architecture for Statistical Machine Translation aimed at non-technical users. A workflow editor allows a user to combine different web services using a graphical user interface. In the current state of this project, the web services have been implemented for a range of sentential and sub-sentential aligners. The common interface and common data format allow the user to build workflows that swap in different aligners.
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as information or likelihood, can be identified in the algebraic structure of A2(P), along with their corresponding notions in compositional data analysis, such as the Aitchison distance or the centered log-ratio transform. In this way, quite elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information, such as Bayesian updating or the combination of likelihood and robust M-estimation functions, are simple additions/perturbations in A2(Pprior), and weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite-dimensional linear subspaces of A2(P), and they correspond to finite-dimensional subspaces formed by their posteriors in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be the Kullback-Leibler directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimating functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
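The Aitchison geometry this abstract generalizes can be made concrete for the finite simplex (hypothetical 3-part compositions; `clr` and the distance follow the standard compositional data analysis definitions, not necessarily the paper's notation):

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a composition with positive parts."""
    lx = np.log(x)
    return lx - lx.mean()

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between clr coordinates."""
    return np.linalg.norm(clr(x) - clr(y))

# Two hypothetical 3-part compositions (each sums to 1).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.1, 0.4, 0.5])
d = aitchison_distance(x, y)

# Perturbation (the simplex's "addition") is a componentwise product
# followed by closure; the Aitchison distance is invariant under
# perturbing both compositions by the same composition p.
p = np.array([0.6, 0.3, 0.1])
xp = x * p / np.sum(x * p)
yp = y * p / np.sum(y * p)
print(np.isclose(aitchison_distance(xp, yp), d))  # True
```

The invariance holds because clr turns perturbation into vector addition: clr(x ⊕ p) = clr(x) + clr(p), so the difference of clr coordinates is unchanged.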
Abstract:
Correspondence analysis, when used to visualize relationships in a table of counts (for example, abundance data in ecology), has frequently been criticized as being too sensitive to objects (for example, species) that occur with very low frequency or in very few samples. In this statistical report we show that this criticism is generally unfounded. We demonstrate this in several data sets by calculating the actual contributions of rare objects to the results of correspondence analysis and canonical correspondence analysis, both to the determination of the principal axes and to the chi-square distance. It is a fact that rare objects are often positioned as outliers in correspondence analysis maps, which gives the impression that they are highly influential, but their low weight offsets their distant positions and reduces their effect on the results. An alternative scaling of the correspondence analysis solution, the contribution biplot, is proposed as a way of mapping the results in order to avoid the problem of outlying and low-contributing rare objects.
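The contribution calculation described above can be sketched via the standard SVD formulation of correspondence analysis (the toy abundance table and the helper name `ca_row_contributions` are illustrative, not from the report):

```python
import numpy as np

def ca_row_contributions(N):
    """Row contributions to the principal axes of correspondence analysis.

    The contribution of row i to axis k is r_i * f_ik**2 / lambda_k, which
    reduces to u_ik**2, the squared entry of the left singular vectors of
    the standardized residual matrix; each column therefore sums to 1.
    """
    P = N / N.sum()                 # correspondence matrix
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return U**2

# Hypothetical abundance table; the last row is a rare species.
N = np.array([[30., 20., 10.],
              [25., 15., 20.],
              [10., 30., 25.],
              [ 1.,  0.,  1.]])
contrib = ca_row_contributions(N)
print(contrib[3])  # the rare row's contribution to each axis
```

This is exactly the quantity the report inspects: a rare row can sit far from the origin in the map while its low mass r_i keeps its contribution u_ik² small.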
Abstract:
The well-known lack of power of unit root tests has often been attributed to the short length of macroeconomic series and also to DGPs that depart from the I(1)-I(0) alternatives. This paper shows that by using long spans of annual real GNP and GNP per capita (133 years) high power can be achieved, leading to the rejection of both the unit root and the trend-stationary hypothesis. This suggests that possibly neither model provides a good characterization of these data. Next, more flexible representations are considered, namely, processes containing structural breaks (SB) and fractional orders of integration (FI). Economic justification for the presence of these features in GNP is provided. It is shown that the latter models (FI and SB) are in general preferred to the ARIMA (I(1) or I(0)) ones. As a novelty in this literature, new techniques are applied to discriminate between FI and SB models. It turns out that the FI specification is preferred, implying that GNP and GNP per capita are non-stationary, highly persistent but mean-reverting series. Finally, it is shown that the results are robust when breaks in the deterministic component are allowed for in the FI model. Some macroeconomic implications of these findings are also discussed.
Abstract:
A new parametric minimum distance time-domain estimator for ARFIMA processes is introduced in this paper. The proposed estimator minimizes the sum of squared autocorrelations of the residuals obtained after filtering a series through the ARFIMA parameters. The estimator is easy to compute and is consistent and asymptotically normally distributed for fractionally integrated (FI) processes with an integration order d strictly greater than -0.75. Therefore, it can be applied to both stationary and non-stationary processes. Deterministic components are also allowed in the DGP. Furthermore, as a by-product, the estimation procedure provides an immediate check on the adequacy of the specified model, because the criterion function, when evaluated at the estimated values, coincides with the Box-Pierce goodness-of-fit statistic. Empirical applications and Monte Carlo simulations supporting the analytical results and showing the good performance of the estimator in finite samples are also provided.
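A minimal sketch of such a minimum distance criterion for a pure FI(d) model, assuming the standard truncated fractional-difference filter and a Box-Pierce-type objective (the simulated DGP, truncation, and grid search are illustrative choices, not the paper's exact procedure):

```python
import numpy as np

def fracdiff(x, d):
    """Apply the fractional difference filter (1 - L)**d to a series x
    that is treated as zero before its first observation."""
    T = len(x)
    w = np.ones(T)
    for j in range(1, T):
        w[j] = w[j - 1] * (j - 1 - d) / j  # binomial expansion weights
    out = np.empty(T)
    for t in range(T):
        out[t] = np.dot(w[: t + 1], x[t::-1])
    return out

def criterion(x, d, m=10):
    """T times the sum of the first m squared residual autocorrelations
    after filtering with (1 - L)**d (a Box-Pierce-type statistic)."""
    e = fracdiff(x, d)
    e = e - e.mean()
    g0 = np.dot(e, e)
    rho = np.array([np.dot(e[k:], e[:-k]) / g0 for k in range(1, m + 1)])
    return len(x) * np.sum(rho**2)

# Simulate a fractionally integrated FI(d0) series from white noise
# (an illustrative DGP, not the paper's empirical data).
rng = np.random.default_rng(1)
d0 = 0.3
u = rng.standard_normal(500)
x = fracdiff(u, -d0)  # (1 - L)**(-d0) applied to the noise

# Grid search for the minimum distance estimate of d.
grid = np.arange(-0.2, 0.8, 0.01)
d_hat = grid[np.argmin([criterion(x, d) for d in grid])]
print(round(d_hat, 2))  # should land near d0 = 0.3
```

The by-product mentioned in the abstract is visible here: `criterion(x, d_hat)` is itself the Box-Pierce statistic of the fitted residuals, so it doubles as a specification check.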
Abstract:
Although correspondence analysis is now widely available in statistical software packages and applied in a variety of contexts, notably the social and environmental sciences, there are still some misconceptions about this method, as well as unresolved issues that remain controversial to this day. In this paper we hope to settle these matters, namely (i) the way CA measures variance in a two-way table and how to compare variances between tables of different sizes, (ii) the influence, or rather lack of influence, of outliers in the usual CA maps, (iii) the scaling issue and the biplot interpretation of maps, (iv) whether or not to rotate a solution, and (v) the statistical significance of results.
Abstract:
The set covering problem is an NP-hard combinatorial optimization problem that arises in applications ranging from crew scheduling in airlines to driver scheduling in public mass transport. In this paper we analyze search space characteristics of a widely used set of benchmark instances through an analysis of the fitness-distance correlation. This analysis shows that there exist several classes of set covering instances with largely different behavior. For instances with high fitness-distance correlation, we propose new ways of generating core problems and analyze the performance of algorithms exploiting these core problems.
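The fitness-distance correlation used in this analysis is simply the Pearson correlation between solution quality and distance to the best known solution; a sketch with hypothetical set covering data:

```python
import numpy as np

def fitness_distance_correlation(fitness, distance):
    """FDC: Pearson correlation between solution costs and their distances
    to the best known solution. For minimization, values near 1 indicate
    that better solutions tend to lie closer to the optimum, which is what
    makes restricting search to a 'core problem' around it promising."""
    f = np.asarray(fitness, dtype=float)
    d = np.asarray(distance, dtype=float)
    return np.corrcoef(f, d)[0, 1]

# Hypothetical local optima of a set covering instance: solution cost and
# Hamming distance (number of differing columns) to the best known solution.
costs = [512, 520, 515, 540, 533, 560]
distances = [4, 10, 6, 25, 18, 40]
print(round(fitness_distance_correlation(costs, distances), 3))
```

In the paper's setting, a high FDC over a sample of local optima is the signal that core-problem approaches should pay off on that instance class.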
Abstract:
This paper exploits an unusual transportation setting to estimate the value of a statistical life (VSL). We estimate the trade-offs individuals are willing to make between mortality risk and cost as they travel to and from the international airport in Sierra Leone (which is separated from the capital Freetown by a body of water). Travelers choose from among multiple transport options, namely ferry, helicopter, hovercraft, and water taxi. The setting and original dataset allow us to address some typical omitted variable concerns in order to generate some of the first revealed preference VSL estimates from Africa. The data also allow us to compare VSL estimates for travelers from 56 countries, including 20 African and 36 non-African countries, all facing the same choice situation. The average VSL estimate for African travelers in the sample is US$577,000, compared to US$924,000 for non-Africans. Individual characteristics, particularly job earnings, can largely account for the difference between Africans and non-Africans; Africans in the sample typically earn somewhat less. There is little evidence that individual VSL estimates are driven by a lack of information, predicted life expectancy, or cultural norms around risk-taking or fatalism. The data imply an income elasticity of the VSL of 1.77. These revealed preference VSL estimates from a developing country fill an important gap in the existing literature, and can be used for a variety of public policy purposes, including in current debates within Sierra Leone regarding the desirability of constructing new transportation infrastructure.
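A constant income elasticity implies a simple scaling rule for transferring VSL estimates across income levels; a back-of-the-envelope sketch (the computed income ratio is an implication of the constant-elasticity model applied to the abstract's figures, not a number from the paper):

```python
# With a constant income elasticity eta, VSL scales as income**eta:
#   VSL_2 / VSL_1 = (income_2 / income_1) ** eta
eta = 1.77  # income elasticity of the VSL reported in the abstract

vsl_ratio = 924_000 / 577_000          # non-African vs African VSL estimates
income_ratio = vsl_ratio ** (1 / eta)  # income ratio implied by the model
print(round(income_ratio, 2))          # roughly 1.3
```

Because the elasticity exceeds 1, a modest income gap (about 30% here) is enough to generate the much larger observed gap in VSL estimates, consistent with the abstract's claim that earnings largely account for the difference.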
Abstract:
The work presented evaluates the statistical characteristics of regional bias and expected error in reconstructions of real positron emission tomography (PET) data from human brain fluorodeoxyglucose (FDG) studies carried out by the maximum likelihood estimator (MLE) method with a robust stopping rule, and compares them with the results of filtered backprojection (FBP) reconstructions and with the method of sieves. The task of evaluating radioisotope uptake in regions of interest (ROIs) is investigated. An assessment of bias and variance in uptake measurements is carried out with simulated data. Then, using three transition matrices with different degrees of accuracy and a components-of-variance model for statistical analysis, it is shown that the characteristics obtained from real human FDG brain data are consistent with the results of the simulation studies.
Abstract:
Within the scope of the European project Hydroptimet (INTERREG IIIB-MEDOCC programme), a limited area model (LAM) intercomparison is performed for intense events that caused severe damage to people and territory. As the comparison is limited to single case studies, the work is not meant to provide a measure of the different models' skill, but to identify the key model factors for a good forecast of this kind of meteorological phenomenon. This work focuses on the Spanish flash-flood event also known as the "Montserrat-2000" event. The study uses forecast data from seven operational LAMs, placed at the partners' disposal via the Hydroptimet ftp site, and observed data from the Catalonia rain gauge network. To improve the event analysis, satellite rainfall estimates have also been considered. For the statistical evaluation of quantitative precipitation forecasts (QPFs), several non-parametric skill scores based on contingency tables have been used. Furthermore, for each model run it has been possible to identify the Catalonia regions affected by misses and false alarms using the contingency table elements. Moreover, the standard "eyeball" analysis of forecast and observed precipitation fields has been supported by a state-of-the-art diagnostic method, the contiguous rain area (CRA) analysis. This method makes it possible to quantify the spatial shift of the forecast error and to identify the error sources that affected each model forecast. High-resolution modelling and domain size seem to play a key role in providing a skillful forecast. Further work, including verification against a wider observational data set, is needed to support this statement.
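The contingency-table scores used for QPF verification are standard; a sketch with hypothetical counts for one model run and one rainfall threshold (the paper's exact score set may differ):

```python
def skill_scores(hits, misses, false_alarms, correct_negatives):
    """Common non-parametric QPF verification scores from a 2x2
    contingency table (forecast yes/no vs observed yes/no)."""
    n = hits + misses + false_alarms + correct_negatives
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    hits_random = (hits + misses) * (hits + false_alarms) / n
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return pod, far, csi, ets

# Hypothetical counts over the rain gauge network for one threshold.
pod, far, csi, ets = skill_scores(hits=42, misses=18, false_alarms=12,
                                  correct_negatives=128)
print(round(pod, 2), round(far, 2), round(csi, 2))  # 0.7 0.22 0.58
```

The misses and false alarms entering these scores are the same contingency-table elements the abstract uses to map which Catalonia regions each model missed or over-forecast.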
Abstract:
Ground clutter caused by anomalous propagation (anaprop) can seriously affect radar rain rate estimates, particularly in fully automatic radar processing systems, and, if not filtered, can produce frequent false alarms. A statistical study of anomalous propagation detected by two operational C-band radars in the northern Italian region of Emilia-Romagna is discussed, paying particular attention to its diurnal and seasonal variability. The analysis shows a high incidence of anaprop in summer, mainly in the morning and evening, due to the humid and hot summer climate of the Po Valley, particularly in the coastal zone. A comparison between different techniques and datasets for retrieving the vertical profile of the refractive index gradient in the boundary layer is then presented, and in particular their capability to detect anomalous propagation conditions is compared. Furthermore, beam path trajectories are simulated using a multilayer ray-tracing model, and the influence of the propagation conditions on the beam trajectory and shape is examined. High-resolution radiosounding data are identified as the best available dataset for accurately reproducing the local propagation conditions, while lower resolution standard TEMP data suffer from interpolation degradation, and Numerical Weather Prediction model data (Lokal Model) can retrieve a tendency toward superrefraction but cannot detect ducting conditions. From the ray tracing of the centre and of the lower and upper limits of the radar antenna's 3-dB half-power main beam lobe, it is concluded that ducting layers produce a change in the measured volume and in the power distribution that can lead to an additional error in the reflectivity estimate and, subsequently, in the estimated rainfall rate.
Abstract:
The index of the maximum and minimum level is a very useful technique, especially for decision making, that uses the Hamming distance and the adequacy coefficient in the same problem. In this work, a generalization is proposed through the use of generalized and quasi-arithmetic means. These aggregation operators are called the generalized ordered weighted averaging index of the maximum and minimum level (GOWAIMAM) and the quasi-arithmetic index of the maximum and minimum level (Quasi-OWAIMAM). These new operators generalize a wide range of particular cases, such as the generalized index of the maximum and minimum level (GIMAM), the OWAIMAM, and others. An application to decision making in product selection is also developed.
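One possible reading of the GOWAIMAM operator described above, sketched in code (the exact formulation, the ordering convention, and the product-selection data are assumptions for illustration, not the authors' definition):

```python
def gowaimam(x, ideal, use_distance, weights, lam=1.0):
    """Sketch of a generalized OWA index of the maximum and minimum level.

    For each attribute it takes either the Hamming distance |x_i - mu_i|
    (when an exact match with the ideal mu_i is required) or the shortfall
    of the adequacy coefficient min(1, 1 - mu_i + x_i) (when exceeding the
    ideal is acceptable), reorders the values decreasingly (the OWA step),
    and aggregates them with a generalized mean of parameter lam.
    Lower values mean the alternative is closer to the ideal."""
    terms = []
    for xi, mui, dist in zip(x, ideal, use_distance):
        if dist:
            terms.append(abs(xi - mui))               # Hamming distance term
        else:
            terms.append(1 - min(1.0, 1 - mui + xi))  # adequacy shortfall
    b = sorted(terms, reverse=True)                   # OWA reordering
    return sum(w * bj**lam for w, bj in zip(weights, b)) ** (1 / lam)

# Hypothetical product selection: compare two products against an ideal.
ideal = [0.9, 0.8, 0.7]
weights = [0.5, 0.3, 0.2]
a = gowaimam([0.8, 0.9, 0.6], ideal, [True, False, True], weights, lam=2.0)
b = gowaimam([0.5, 0.6, 0.9], ideal, [True, False, True], weights, lam=2.0)
print(a < b)  # product A is closer to the ideal under these assumptions
```

Setting lam=1 recovers an ordinary weighted OWA aggregation, which is the sense in which the generalized and quasi-arithmetic means subsume the particular cases the abstract lists.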