993 results for Statistical Error
Abstract:
When the data are counts or frequencies of particular events and can be expressed as a contingency table, they can be analysed using the chi-square distribution. When applied to a 2 x 2 table, the test is approximate and care needs to be taken when the expected frequencies are small, either by applying Yates' correction or by using Fisher's exact test. Larger contingency tables can also be analysed using this method. Note that it is a serious statistical error to use any of these tests on measurement data!
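As a concrete illustration of the tests mentioned above, here is a short sketch with hypothetical counts, using SciPy's chi-square test (with Yates' correction) and Fisher's exact test on a 2 x 2 table:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2 x 2 table of counts (rows: treatment, columns: outcome).
table = np.array([[12, 5],
                  [3, 9]])

# chi2_contingency applies Yates' continuity correction by default for 2 x 2 tables.
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=True)

# With small expected frequencies, Fisher's exact test is preferable.
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-square (Yates): p = {p_chi2:.3f}, expected counts:\n{expected}")
print(f"Fisher's exact:     p = {p_fisher:.3f}")
```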
Abstract:
We provide a new multivariate calibration function based on South Atlantic modern assemblages of planktonic foraminifera and atlas water-column parameters from the Antarctic Circumpolar Current to the Subtropical Gyre and tropical warm waters (i.e., 60°S to 0°S). To this end, we used a dataset containing the abundance patterns of 35 taxonomic groups of planktonic foraminifera in 141 surface sediment samples. Five factors, which account for 93% of the total variance of the original data and represent the main regional oceanographic fronts, were retained for the analysis. The new calibration function F141-35-5 enables the reconstruction of Late Quaternary summer and winter sea-surface temperatures with a statistical error of ~0.5°C. Our function was verified by applying it to a sediment core from the western South Atlantic. The downcore reconstruction shows negative anomalies in sea-surface temperatures during the early-mid Holocene and temperatures within the range of modern values during the late Holocene. This pattern is consistent with available reconstructions.
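To make the calibration idea concrete, the following toy sketch regresses temperature on a few retained components and reports the fit error; the data are synthetic placeholders and PCA merely stands in for the Q-mode factor analysis typically used in such transfer functions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
abundances = rng.random((141, 35))        # % abundance of 35 taxa per sample (placeholder)
sst_summer = rng.uniform(0, 28, 141)      # modern summer SST at each site (placeholder)

factors = PCA(n_components=5).fit_transform(abundances)   # 5 retained factors
model = LinearRegression().fit(factors, sst_summer)

residuals = sst_summer - model.predict(factors)
rmse = np.sqrt(np.mean(residuals**2))     # the "statistical error" of the calibration
print(f"calibration RMSE: {rmse:.2f} degC")
```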
Abstract:
In this work, we introduce a new class of numerical schemes for rarefied gas dynamics problems described by collisional kinetic equations. The idea consists of reformulating the problem using a micro-macro decomposition and then solving the microscopic part with asymptotic-preserving Monte Carlo methods. We consider two types of decomposition, the first leading to the Euler system of gas dynamics and the second to the Navier-Stokes equations for the macroscopic part. In addition, the particle method which solves the microscopic part is designed so that the global scheme becomes computationally less expensive as the solution approaches the equilibrium state, as opposed to standard methods for kinetic equations, whose computational cost increases with the number of interactions. At the same time, the statistical error due to the particle part of the solution decreases as the system approaches the equilibrium state. As a consequence, the method degenerates to the solution of the macroscopic hydrodynamic equations alone (Euler or Navier-Stokes) in the limit of an infinite number of collisions. In the last part, we show the behavior of this new approach in comparison with standard Monte Carlo techniques for solving the kinetic equation by testing it on different problems which typically arise in rarefied gas dynamics simulations.
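The noise-reduction argument can be illustrated with a simple experiment (not the paper's scheme): only the small non-equilibrium remainder of a distribution f = M + g is sampled with particles, so the statistical error of a moment estimate shrinks with the size of g. All distributions and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_particles = 10_000

def estimate_energy(eps):
    """Estimate <v^2> for f = (1-eps)*M + eps*g with M ~ N(0,1), g ~ N(2,1)."""
    # Deterministic (macro) contribution: exact second moment of the Maxwellian M.
    macro = (1.0 - eps) * 1.0
    # Particle (micro) contribution: noisy Monte Carlo estimate of the remainder g.
    v = rng.normal(2.0, 1.0, n_particles)
    micro = eps * np.mean(v**2)
    return macro + micro

for eps in (0.5, 0.1, 0.01):                     # approach to equilibrium
    samples = [estimate_energy(eps) for _ in range(200)]
    print(f"eps={eps:5.2f}  std of estimate = {np.std(samples):.2e}")
```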
Abstract:
Uncertainty in the determination of the stratigraphic profile of natural soils is one of the main problems in geotechnics, in particular for landslide characterization and modeling. This study deals with a new approach to geotechnical modeling which relies on the stochastic generation of different soil layer distributions following a Boolean logic; the method has thus been called BoSG (Boolean Stochastic Generation). In this way, it is possible to randomize the presence of a specific material interdigitated in a uniform matrix. When building a geotechnical model, it is common to discard some stratigraphic data in order to simplify the model itself, assuming that the significance of the results of the modeling procedure would not be affected. With the proposed technique it is possible to quantify the error associated with this simplification. Moreover, it can be used to determine the most significant zones, where further investigations and surveys would be most effective for building the geotechnical model of the slope. The commercial software FLAC was used for the 2D and 3D geotechnical models. The distribution of the materials was randomized through a specifically coded MATLAB program that automatically generates text files, each of them representing a specific soil configuration. In addition, a routine was designed to automate the FLAC computations with the different data files in order to maximize the sample number. The methodology is applied to a simplified slope in 2D, a simplified slope in 3D and an actual landslide, namely the Mortisa mudslide (Cortina d'Ampezzo, BL, Italy). However, it could be extended to numerous other cases, especially for hydrogeological analysis and landslide stability assessment, in different geological and geomorphological contexts.
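A rough sketch of the Boolean stochastic generation step follows; the paper drives FLAC with a MATLAB program, whereas the grid, lens geometry and file names below are hypothetical Python stand-ins:

```python
import numpy as np

rng = np.random.default_rng(42)
nx, nz = 60, 30                      # grid size (columns, rows)

def generate_realization(n_lenses=5, lens_w=10, lens_h=3):
    grid = np.zeros((nz, nx), dtype=int)          # 0 = uniform matrix material
    for _ in range(n_lenses):
        i = rng.integers(0, nz - lens_h)          # random lens position
        j = rng.integers(0, nx - lens_w)
        grid[i:i + lens_h, j:j + lens_w] = 1      # 1 = interdigitated material
    return grid

# Write each stochastic realization to its own text file (one per model run).
for k in range(3):
    np.savetxt(f"soil_config_{k:03d}.txt", generate_realization(), fmt="%d")
```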
Abstract:
We investigate the performance of error-correcting codes, where the code word comprises products of K bits selected from the original message and decoding is carried out using a connectivity tensor with C connections per index. Shannon's bound for the channel capacity is recovered for large K and zero temperature when the code rate K/C is finite. Near-optimal error-correcting capability is obtained for finite K and C. We examine the finite-temperature case to assess the use of simulated annealing for decoding and extend the analysis to accommodate other types of noisy channels.
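The encoding described above can be sketched in a few lines, with each transmitted bit formed as the product of K message bits in the +/-1 spin convention (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 20, 3, 60                      # message length, bits per product, codeword length

message = rng.choice([-1, 1], size=N)    # original message as Ising spins
index_sets = [rng.choice(N, size=K, replace=False) for _ in range(M)]

# Each codeword component is the product of K randomly selected message spins.
codeword = np.array([np.prod(message[idx]) for idx in index_sets])
print(codeword[:10])
```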
Abstract:
A variation of low-density parity check (LDPC) error-correcting codes defined over Galois fields (GF(q)) is investigated using statistical physics. A code of this type is characterised by a sparse random parity check matrix composed of C non-zero elements per column. We examine the dependence of the code performance on the value of q, for finite and infinite C values, both in terms of the thermodynamical transition point and of the practical decoding phase characterised by the existence of a unique (ferromagnetic) solution. We find different q-dependence in the cases of C = 2 and C ≥ 3; the analytical solutions are in agreement with simulation results, providing a quantitative measure of the improvement in performance obtained using non-binary alphabets.
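The defining object of such a code, a sparse random parity-check matrix over GF(q) with exactly C non-zero entries per column, can be generated as in the toy sketch below (q, dimensions and row weights are illustrative and not constrained as in a real construction):

```python
import numpy as np

rng = np.random.default_rng(0)
q, C, n_rows, n_cols = 4, 3, 30, 60

H = np.zeros((n_rows, n_cols), dtype=int)
for col in range(n_cols):
    rows = rng.choice(n_rows, size=C, replace=False)   # C positions per column
    H[rows, col] = rng.integers(1, q, size=C)          # non-zero GF(q) labels 1..q-1

print((H != 0).sum(axis=0))   # column weights: all equal to C
```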
Abstract:
We study the performance of Low Density Parity Check (LDPC) error-correcting codes using the methods of statistical physics. LDPC codes are based on the generation of codewords as Boolean sums of the original message bits, defined by two randomly constructed sparse matrices. These codes can be mapped onto Ising spin models and studied using common methods of statistical physics. We examine various regular constructions and obtain insight into their theoretical and practical limitations. We also briefly report on results obtained for irregular code constructions, for codes with a non-binary alphabet, and on how a finite system size affects the error probability.
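The Boolean-to-Ising mapping underlying this line of work is easy to verify numerically: a bit x in {0, 1} maps to a spin s = (-1)^x, so mod-2 sums of bits become products of spins (the bit vector below is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=8)
spins = (-1) ** bits                 # bit 0 -> spin +1, bit 1 -> spin -1

parity_bit = bits.sum() % 2          # Boolean (mod 2) sum of the bits
parity_spin = np.prod(spins)         # product of the corresponding spins

assert parity_spin == (-1) ** parity_bit
print(parity_bit, parity_spin)
```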
Abstract:
In this thesis we use statistical physics techniques to study the typical performance of four families of error-correcting codes based on very sparse linear transformations: Sourlas codes, Gallager codes, MacKay-Neal codes and Kanter-Saad codes. We map the decoding problem onto an Ising spin system with many-spin interactions. We then employ the replica method to calculate averages over the quenched disorder represented by the code constructions, the arbitrary messages and the random noise vectors. We find, as the noise level increases, a phase transition between successful decoding and failure phases. This phase transition coincides with upper bounds derived in the information theory literature in most of the cases. We connect the practical decoding algorithm known as probability propagation with the task of finding local minima of the related Bethe free energy. We show that the practical decoding thresholds correspond to noise levels where suboptimal minima of the free energy emerge. Simulations of practical decoding scenarios using probability propagation agree with theoretical predictions of the replica symmetric theory. The typical performance predicted by the thermodynamic phase transitions is shown to be attainable in computation times that grow exponentially with the system size. We use the insights obtained to design a method to calculate the performance and optimise the parameters of the high-performance codes proposed by Kanter and Saad.
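One probability-propagation update can be written down compactly (this is only the standard tanh rule for a single parity check, not the thesis' full decoder; the LLR values are hypothetical):

```python
import numpy as np

def check_to_bit_llr(other_llrs):
    """Log-likelihood ratio a parity check passes to a bit, given the other bits' LLRs."""
    t = np.prod(np.tanh(np.asarray(other_llrs) / 2.0))
    return 2.0 * np.arctanh(t)

# Channel LLRs of the three other bits attached to the same check.
print(check_to_bit_llr([1.2, -0.8, 2.5]))
```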
Abstract:
Reliability has emerged as a critical design constraint, especially in memories. Designers go to great lengths to guarantee fault-free operation of the underlying silicon by adopting redundancy-based techniques, which essentially try to detect and correct every single error. However, such techniques come at the cost of large area, power and performance overheads, leading many researchers to doubt their efficiency, especially for error-resilient systems where 100% accuracy is not always required. In this paper, we present an alternative method focusing on the confinement of the output error induced by any reliability issues. Focusing on memory faults, rather than correcting every single error, the proposed method exploits the statistical characteristics of the target application and replaces any erroneous data with the best available estimate of that data. To realize the proposed method, a RISC processor is augmented with custom instructions and special-purpose functional units. We apply the method to the enhanced processor by studying the statistical characteristics of the various algorithms involved in a popular multimedia application. Our experimental results show that, in contrast to state-of-the-art fault tolerance approaches, we are able to reduce runtime and area overhead by 71.3% and 83.3%, respectively.
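The error-confinement idea can be sketched as follows, with synthetic data; the plausibility thresholds and the neighbour-median estimate are illustrative stand-ins for the application-specific statistics exploited in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
row = rng.integers(100, 140, size=16)        # plausible pixel values
row[7] = 255                                 # bit-flipped, statistically implausible sample

def confine_errors(values, low=90, high=160):
    out = values.copy()
    for i, v in enumerate(values):
        if not (low <= v <= high):           # value outside the expected statistics
            neighbours = values[max(0, i - 2):i].tolist() + values[i + 1:i + 3].tolist()
            out[i] = int(np.median(neighbours))   # replace with best available estimate
    return out

print(confine_errors(row))
```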
Abstract:
A growing number of corporate failure prediction models has emerged since the 1960s. The economic and social consequences of business failure can be dramatic, so it is no surprise that the issue has been of growing interest in academic research as well as in the business context. The main purpose of this study is to compare the predictive ability of five models: three based on statistical techniques (Discriminant Analysis, Logit and Probit) and two based on Artificial Intelligence (Neural Networks and Rough Sets). The five models were applied to a dataset of 420 non-bankrupt firms and 125 bankrupt firms belonging to the textile and clothing industry, over the period 2003–09. Results show that all the models performed well, with an overall correct classification level higher than 90% and a type II error always below 2%. The type I error increases as we move away from the year prior to failure. Our models contribute to the discussion of the causes of corporate financial distress. Moreover, they can be used to assist the decisions of creditors, investors and auditors. Additionally, this research can be of great value to devisers of national economic policies that aim to reduce industrial unemployment.
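One of the statistical models compared above (a logit classifier) and the two reported error rates can be illustrated with synthetic placeholders for the 420 + 125 firm sample, where type I means a bankrupt firm classified as healthy and type II a healthy firm classified as bankrupt:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_healthy, n_bankrupt = 420, 125
X = np.vstack([rng.normal(0.0, 1.0, (n_healthy, 4)),     # synthetic financial ratios
               rng.normal(-1.0, 1.0, (n_bankrupt, 4))])
y = np.array([0] * n_healthy + [1] * n_bankrupt)          # 1 = bankrupt

pred = LogisticRegression().fit(X, y).predict(X)

type_i = np.mean(pred[y == 1] == 0)     # bankrupt firms classified as healthy
type_ii = np.mean(pred[y == 0] == 1)    # healthy firms classified as bankrupt
print(f"type I error:  {type_i:.1%}")
print(f"type II error: {type_ii:.1%}")
```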
Abstract:
In recent decades, the development and use of Geographic Information Systems (GIS) and Satellite Positioning Systems (GPS) have been promoted to improve the productive efficiency of extensive cropping systems in agronomic, economic and environmental terms. These new technologies make it possible to measure the spatial variability of site properties, such as apparent electrical conductivity and other terrain attributes, as well as their effect on the spatial distribution of yields. Site-specific management can then be applied within fields to improve the efficiency in the use of agrochemical inputs, the protection of the environment and the sustainability of rural life. Currently, precision agriculture offers a wide range of technological resources to capture spatial variation across sites within a field. However, the optimal use of the large volume of data derived from precision agriculture machinery strongly depends on the capabilities to explore the information on the complex interactions underlying productive outcomes. The spatial covariation of site properties and crop yields has been studied through classical geostatistical models based on the theory of regionalized variables. New developments in contemporary statistical models, notably linear mixed models, constitute promising tools for the treatment of spatially correlated data. Moreover, given the multivariate nature of the multiple variables recorded at each site, multivariate analysis techniques could provide valuable information for the visualization and exploitation of georeferenced data. Understanding the agronomic basis of the complex interactions that occur at the scale of production fields is now possible with the use of these new technologies. The objectives of this project are: (1) to develop methodological strategies, based on the complementarity of multivariate and geostatistical analysis techniques, for the classification of within-field sites and the study of the interdependencies between site and yield variables; (2) to propose alternative mixed models, based on spatial correlation functions of the error terms, that allow exploring spatial correlation patterns of within-field yields and of soil properties in the delimited sites, and to build contour maps that promote a more sustainable agriculture.
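One building block of the proposed workflow, the empirical semivariogram that a spatial correlation structure in a mixed model is later fitted to, can be computed as in the sketch below (coordinates and yields are simulated placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 500, size=(300, 2))                     # site coordinates (m)
yield_t = 8 + 0.004 * coords[:, 0] + rng.normal(0, 0.5, 300)    # yield (t/ha) with a trend

def semivariogram(coords, z, bins):
    """Empirical semivariance per distance class."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    gammas = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (d > lo) & (d <= hi)
        gammas.append(sq[mask].mean() if mask.any() else np.nan)
    return gammas

print(semivariogram(coords, yield_t, bins=np.arange(0, 250, 50)))
```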
Abstract:
Limited information is available regarding the methodology required to characterize hashish seizures in order to assess the presence or absence of a chemical link between two seizures. This casework report presents the methodology applied to assess whether two different police seizures came from the same block before that block was split. The chemical signature was extracted using GC-MS analysis, and the implemented methodology consists of a study of intra- and inter-variability distributions based on measuring the similarity of chemical profiles across a number of hashish seizures and calculating the Pearson correlation coefficient. Different statistical scenarios (i.e., combinations of data pretreatment techniques and selections of target compounds) were tested to find the most discriminating one. Seven compounds showing high discrimination capability were selected, to which a specific statistical data pretreatment was applied. Based on the results, the statistical model built for comparing the hashish seizures leads to low error rates. Therefore, the implemented methodology is suitable for the chemical profiling of hashish seizures.
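The comparison metric can be illustrated briefly: the Pearson correlation between two pretreated profiles over a set of target compounds, to be situated against the intra- and inter-seizure distributions. Profiles, pretreatment and the seven-compound selection below are hypothetical:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
seizure_a = rng.random(7)                       # relative areas of 7 target compounds
seizure_b = seizure_a + rng.normal(0, 0.02, 7)  # chemically linked sample (same block)

# Simple normalisation as a stand-in for the paper's data pretreatment.
a = seizure_a / seizure_a.sum()
b = seizure_b / seizure_b.sum()

r, _ = pearsonr(a, b)
print(f"Pearson r = {r:.3f}  -> 'linked' if r exceeds the intra-variability cut-off")
```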
Abstract:
For single-user MIMO communication with uncoded and coded QAM signals, we propose bit and power loading schemes that rely only on channel distribution information at the transmitter. To that end, we develop the relationship between the average bit error probability at the output of a ZF linear receiver and the bit rates and powers allocated at the transmitter. This relationship, and the fact that a ZF receiver decouples the MIMO parallel channels, allow leveraging bit loading algorithms already existing in the literature. We solve the dual problems of bit rate maximization and power minimization and present performance results that illustrate the gains of the proposed scheme with respect to a non-optimized transmission.
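Two of the ingredients discussed above, an approximate bit error probability for square M-QAM on a ZF-decoupled subchannel and a greedy bit-loading loop, can be sketched as follows (this is not the paper's algorithm; SNRs, target BER and the constellation cap are illustrative):

```python
import numpy as np
from scipy.special import erfc

def qam_ber(snr_linear, bits):
    """Approximate Gray-coded square 2**bits-QAM bit error rate on an AWGN subchannel."""
    M = 2 ** bits
    q = 0.5 * erfc(np.sqrt(3.0 * snr_linear / (M - 1)) / np.sqrt(2.0))   # Q-function
    return (4.0 / bits) * (1.0 - 1.0 / np.sqrt(M)) * q

subchannel_snr = np.array([200.0, 80.0, 20.0, 5.0])   # post-ZF SNRs (linear)
bits = np.zeros(4, dtype=int)
target_ber, max_bits = 1e-3, 8

# Greedy loading: repeatedly add 2 bits to the subchannel whose resulting BER
# is lowest, as long as that BER stays below the target and under the cap.
while True:
    candidates = [qam_ber(s, b + 2) if b + 2 <= max_bits else np.inf
                  for s, b in zip(subchannel_snr, bits)]
    best = int(np.argmin(candidates))
    if candidates[best] > target_ber:
        break
    bits[best] += 2

print(bits)   # expected [6 4 2 0] for these illustrative SNRs
```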