8 results for Robust scatter matrices

in Helda - Digital Repository of University of Helsinki


Relevance:

30.00%

Publisher:

Abstract:

Reorganizing a dataset so that its hidden structure can be observed is useful in any data analysis task. For example, detecting a regularity in a dataset helps us to interpret the data, compress the data, and explain the processes behind the data. We study datasets that come in the form of binary matrices (tables with 0s and 1s). Our goal is to develop automatic methods that bring out certain patterns by permuting the rows and columns. We concentrate on the following patterns in binary matrices: consecutive-ones (C1P), simultaneous consecutive-ones (SC1P), nestedness, k-nestedness, and bandedness. These patterns reflect specific types of interplay and variation between the rows and columns, such as continuity and hierarchies. Furthermore, their combinatorial properties are interlinked, which helps us to develop the theory of binary matrices and efficient algorithms. Indeed, we can detect all these patterns in a binary matrix efficiently, that is, in polynomial time in the size of the matrix. Since real-world datasets often contain noise and errors, we rarely witness perfect patterns. Therefore, we also need to assess how far an input matrix is from a pattern: we count the number of flips (from 0s to 1s or vice versa) needed to bring out the perfect pattern in the matrix. Unfortunately, for most patterns it is an NP-complete problem to find the minimum distance to a matrix that has the perfect pattern, which means that the existence of a polynomial-time algorithm is unlikely. To find patterns in datasets with noise, we need methods that are noise-tolerant and work in practical time with large datasets. The theory of binary matrices gives rise to robust heuristics that perform well on synthetic data and discover easily interpretable structures in real-world datasets: dialectal variation in spoken Finnish, a division of European locations by the hierarchies found in mammal occurrences, and co-occurring groups in network data. In addition to determining the distance from a dataset to a pattern, we need to determine whether the pattern is significant or merely the result of random chance. To this end, we use significance testing: we deem a dataset significant if it appears exceptional when compared to datasets generated from a certain null hypothesis. After detecting a significant pattern in a dataset, it is up to domain experts to interpret the results in terms of the application.
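To make the flip-distance idea concrete, the sketch below checks, for a fixed column order, whether every row of a 0/1 matrix has its 1s in a single consecutive run, and counts the 0-to-1 flips needed to close the gaps otherwise. The function name and the restriction to a fixed column order are illustrative assumptions; the thesis algorithms additionally search over row and column permutations.

    # A minimal sketch, not the thesis algorithms: for a fixed column order,
    # test the consecutive-ones property row by row and count the 0->1 flips
    # needed to make every row's 1s form one consecutive run.

    def consecutive_ones_flips(matrix):
        """Return (has_c1p, flips) for a 0/1 matrix under the given column order."""
        total_flips = 0
        for row in matrix:
            ones = [j for j, v in enumerate(row) if v == 1]
            if not ones:
                continue                      # an all-zero row is trivially fine
            span = ones[-1] - ones[0] + 1     # columns from first to last 1
            total_flips += span - len(ones)   # zeros inside the span must be flipped
        return total_flips == 0, total_flips

    if __name__ == "__main__":
        m = [[1, 1, 0, 0],
             [0, 1, 0, 1],                    # one gap -> one flip
             [0, 0, 1, 1]]
        print(consecutive_ones_flips(m))      # (False, 1)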

Relevance:

20.00%

Publisher:

Abstract:

In this study, a quality assessment method based on sampling of primary laser inventory units (microsegments) was analysed. The accuracy of a laser inventory carried out in Kuhmo was analysed as a case study. Field sample plots were measured on the sampled microsegments in the Kuhmo inventory area. Two main questions were considered: did the ALS-based inventory meet the accuracy requirements set for the provider, and how should a reliable, cost-efficient and independent quality assessment be undertaken? The agreement between the control measurements and the ALS-based inventory was analysed in four ways: 1) the root mean squared errors (RMSEs) and bias were calculated; 2) scatter plots with 95% confidence intervals were plotted and the placement of the identity lines was checked; 3) Bland-Altman plots were drawn, in which the mean difference in each attribute between the control method and the ALS method was calculated and plotted against the average value of the attribute; 4) tolerance limits were defined and combined with the Bland-Altman plots. The RMSE values were compared to those of the reference study on which the accuracy requirements set for the service provider were based. The accuracy requirements in Kuhmo were met; however, comparing the RMSE values proved difficult. Field control measurements are costly and time-consuming, but they are considered robust. However, control measurements may include errors that are difficult to take into account. In Bland-Altman plots neither of the compared methods is assumed to be exact, so they offer a fair way to interpret the assessment results. Tolerance limits specified in the order, combined with Bland-Altman plots, were suggested for adoption in practice. In addition, bias should be calculated for the total area. Some other approaches to quality control were briefly examined. No method was found to fulfil all the demands of statistical reliability, cost-efficiency, time efficiency, simplicity and speed of implementation. Some benefits and shortcomings of the studied methods were discussed.
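The agreement measures described above are straightforward to compute from paired per-plot values. The sketch below, assuming two arrays of control and ALS estimates for one attribute (the names are illustrative, not from the study), computes the RMSE, the bias and the quantities behind a Bland-Altman plot, with 95% limits of agreement taken as the mean difference plus or minus 1.96 standard deviations.

    # A minimal sketch, assuming paired per-plot values of one attribute from the
    # field control measurement and the ALS-based inventory; names are illustrative.
    import numpy as np

    def rmse_and_bias(control, als):
        """RMSE and bias of the ALS estimates relative to the control values."""
        diff = np.asarray(als, float) - np.asarray(control, float)
        return np.sqrt(np.mean(diff ** 2)), diff.mean()

    def bland_altman(control, als):
        """Mean difference, 95% limits of agreement, and the plot coordinates."""
        control = np.asarray(control, float)
        als = np.asarray(als, float)
        diff = als - control
        mean_diff = diff.mean()
        sd = diff.std(ddof=1)
        limits = (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)
        averages = (als + control) / 2.0      # x-axis of the Bland-Altman plot
        return mean_diff, limits, averages, diff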

Relevance:

20.00%

Publisher:

Abstract:

Accurate and stable time series of geodetic parameters can be used to help in understanding the dynamic Earth and its response to global change. The Global Positioning System, GPS, has proven to be invaluable in modern geodynamic studies. In Fennoscandia the first GPS networks were set up in 1993. These networks form the basis of the national reference frames in the area, but they also provide long and important time series for crustal deformation studies. These time series can be used, for example, to better constrain the ice history of the last ice age and the Earth's structure, via existing glacial isostatic adjustment models. To improve the accuracy and stability of the GPS time series, the possible nuisance parameters and error sources need to be minimized. We have analysed GPS time series to study two phenomena. First, we study the refraction of the GPS signal in the neutral atmosphere, and, second, we study the surface loading of the crust by environmental factors, namely the non-tidal Baltic Sea, the atmospheric load and varying continental water reservoirs. We studied the atmospheric effects on the GPS time series by comparing the standard method to slant delays derived from a regional numerical weather model. We have presented a method for correcting the atmospheric delays at the observational level. The results show that both standard atmosphere modelling and the atmospheric delays derived from a numerical weather model by ray-tracing provide a stable solution. The advantage of the latter is that the number of unknowns used in the computation decreases, and thus the computation may become faster and more robust. The computation can also be done with any processing software that allows the atmospheric correction to be turned off. The crustal deformation due to loading was computed by convolving Green's functions with surface load data, that is to say, global hydrology models, global numerical weather models and a local model for the Baltic Sea. The result was that the loading signals can be seen in the GPS coordinate time series. Reducing the computed deformation from the vertical time series of GPS coordinates reduces the scatter of the time series; however, the long-term trends are not influenced. We show that global hydrology models and the local sea surface can explain up to 30% of the variation in the GPS time series. On the other hand, the admittance of atmospheric loading in the GPS time series is low, and the different hydrological surface load models could not be validated in the present study. In order to be used for GPS corrections in the future, both the atmospheric loading and hydrological models need further analysis and improvement.
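As a rough illustration of the "up to 30% of the variation" figure, the sketch below removes a modelled loading deformation from a vertical GPS coordinate time series and reports the percentage of scatter it explains. It assumes the modelled deformation, for example from convolving Green's functions with surface-load data, has already been interpolated to the GPS epochs; the array names are illustrative.

    # A minimal sketch, assuming the modelled loading deformation has already been
    # interpolated to the GPS observation epochs; array names are illustrative.
    import numpy as np

    def variance_explained(gps_up, modelled_load):
        """Percentage of vertical GPS scatter explained by the loading model."""
        gps_up = np.asarray(gps_up, float)
        modelled_load = np.asarray(modelled_load, float)
        residual = gps_up - modelled_load     # loading-corrected series
        return 100.0 * (1.0 - residual.var() / gps_up.var())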

Relevance:

20.00%

Publisher:

Abstract:

Time-dependent backgrounds in string theory provide a natural testing ground for physics concerning dynamical phenomena that cannot be reliably addressed in usual quantum field theories and cosmology. A good, tractable example to study is the rolling tachyon background, which describes the decay of an unstable brane in bosonic and supersymmetric Type II string theories. In this thesis I use boundary conformal field theory, along with random matrix theory and Coulomb gas thermodynamics techniques, to study open and closed string scattering amplitudes off the decaying brane. The calculation of the simplest example, the tree-level amplitude of n open strings, would give the emission rate of the open strings; however, even this result has remained unknown. I organize the open string scattering computations in a more coherent manner and argue how further progress can be made.

Relevance:

20.00%

Publisher:

Abstract:

In this article we introduce and evaluate testing procedures for specifying the number k of nearest neighbours in the weights matrix of spatial econometric models. The spatial J-test is used for the specification search. Two testing procedures are suggested: an increasing neighbours testing procedure and a decreasing neighbours testing procedure. Simulations show that the increasing neighbours testing procedure can be used in large samples to determine k. The decreasing neighbours testing procedure is found to have low power and is not recommended for use in practice. An empirical example involving house price data is provided to show how to use the testing procedures with real data.
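For reference, a k-nearest-neighbour spatial weights matrix of the kind whose k is being chosen here can be built from point coordinates as sketched below. This is an illustrative, row-standardised construction, not the article's own code; in practice a spatial econometrics library would typically be used.

    # A minimal sketch of a row-standardised k-nearest-neighbour spatial weights
    # matrix built from point coordinates (illustrative, not the article's code).
    import numpy as np

    def knn_weights(coords, k):
        """Each observation gets weight 1/k on its k nearest neighbours."""
        coords = np.asarray(coords, float)
        n = len(coords)
        dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)        # an observation is not its own neighbour
        W = np.zeros((n, n))
        for i in range(n):
            nearest = np.argsort(dist[i])[:k]
            W[i, nearest] = 1.0 / k           # each row sums to one
        return W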