969 results for Probability distribution functions
Abstract:
The purpose of this work is to describe the heavy-rainfall phenomenon in a Spanish region using statistical tools. We want to quantify the effect of climate change and assess how quickly it is evolving through changes in the probability distributions. Our conclusions are of particular interest for agricultural insurance, which could then estimate costs more realistically. The analysis focuses mainly on two topics. The first is the distribution of consecutive days without rain for each gauge station and season. For a network of stations in the Ebro River basin we estimate kernel density functions up to a threshold value $u$ and a Generalized Pareto Distribution (GPD) above it, and we establish relations between the distribution parameters and regional characteristics. We pay particular attention to the tail of the probability distribution. These tails follow a power law, which means that the number of events $n$ can be expressed as a power of another quantity $x$: $n(x) \propto x^{-\alpha}$, where $-\alpha$ can be estimated as the slope of the log-log plot of the number of events against their size. The most convenient way to analyse $n(x)$ is through the empirical probability distribution, $\Pr(X > x) \propto x^{-\alpha}$. The second topic is the distribution of rainfall above the 0.95 percentile of wet days, at both seasonal and yearly scales, with the same treatment of tails as in the first part.
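The two computations described above, extracting dry-spell lengths from a daily series and reading a tail exponent off the log-log plot of the empirical exceedance probability, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the 1.0 mm wet-day threshold is an assumed convention, the exponent comes from an ordinary least-squares line on the log-log plot, and the kernel density and GPD fits are not shown.

```python
import math


def dry_spell_lengths(daily_rain_mm, wet_threshold_mm=1.0):
    """Lengths of runs of consecutive dry days.

    A day counts as dry when its rainfall is below wet_threshold_mm
    (the 1.0 mm default is an assumption, not taken from the study).
    """
    spells, run = [], 0
    for rain in daily_rain_mm:
        if rain < wet_threshold_mm:
            run += 1
        else:
            if run > 0:
                spells.append(run)
            run = 0
    if run > 0:  # the series may end inside a dry spell
        spells.append(run)
    return spells


def tail_exponent(sample, x_min):
    """Estimate alpha in Pr(X > x) ~ x**(-alpha) using values x >= x_min.

    Fits a least-squares line to log(empirical exceedance) versus log(x)
    and returns minus its slope.
    """
    xs = sorted(sample)
    n = len(xs)
    log_x, log_p = [], []
    for i, x in enumerate(xs):
        if x >= x_min and x > 0:
            # fraction of observations >= x (assumes no ties); using >=
            # instead of > keeps the largest value's probability non-zero
            log_x.append(math.log(x))
            log_p.append(math.log((n - i) / n))
    if len(set(log_x)) < 2:
        raise ValueError("need at least two distinct values above x_min")
    mx = sum(log_x) / len(log_x)
    my = sum(log_p) / len(log_p)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_x, log_p))
    sxx = sum((a - mx) ** 2 for a in log_x)
    return -sxy / sxx
```

In an analysis like the one described, `x_min` would play the role of the threshold, for example the 0.95 percentile of wet-day rainfall, and `tail_exponent` would be applied to each station's dry-spell lengths or rainfall amounts.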
Abstract:
This Thesis proposes a methodology for the statistical analysis of pipe breaks in water distribution networks. It analyses the relationship between breaks and water pressure and proposes implementing pressure management to reduce the number of breaks that occur in such networks. Water distribution networks deteriorate, and one of the serious consequences is frequent pipe breaks. Breaks carry high social, economic and environmental costs, which is why water utilities try to reduce them as far as possible. Water distribution networks can be divided into zones or sectors that make them easier to control; these may be independent or isolated by valves, as in the networks of more developed countries, or they may be hydraulically interconnected. Pressure management is usually implemented through pressure-reducing valves (PRVs), which are installed at the head of these sectors and control the downstream pressure even when the inflow varies. The best-known pressure management methods are pressure reduction, which is the most common form of control, pressure sustaining, prevention and/or alleviation of sudden pressure increases, and level (altitude) control. From 2005 onwards, the effect of pressure management in reducing breaks began to be recognized. This Thesis suggests a pressure management approach that controls the ranges of the head-pressure indicators that most influence the probability of pipe breaks.
Water pressure is therefore characterized through indicators derived from the pressure recorded at the head of each sector, on the assumption that this pressure is representative of the operating pressure of all the pipes in the sector, because head losses are relatively small and topographic differences are accounted for in the design of the sectors. Pressure indicators, which can be defined as a statistic calculated from the head-pressure series over a time window, can provide the information water managers need to make decisions aimed at reducing pipe breaks in water distribution networks. The first part of the proposed methodology seeks the pressure indicators with the greatest influence on the probability of pipe breaks. To determine whether an indicator influences break probability, estimates of the cumulative distribution functions (CDFs) of the pressure indicators are compared in two situations: when they are conditioned on the occurrence of a break (a rare event), and when they are calculated under normal operating conditions (normal operation). Water utilities generally hold break records only for the most recent years, and because the pipes are buried, access to information is difficult. Probability functions are therefore proposed as a way to reduce the uncertainty associated with the recorded data. In this way, the CDFs of the indicator values over the pressure series (normal operation) and the CDFs of the indicator values at the times when breaks occurred (conditioned on breaks) are determined. If the distribution functions come from the same population, it cannot be inferred that the indicator clearly influences the probability of breaks.
However, if it is statistically shown that the functions do not come from the same population, it can be concluded that there is a relationship between the analysed indicator and the occurrence of breaks. Because the number of indicator values in the break-conditioned CDF is much smaller than the number in the unconditional CDF, random series with as many values as there are recorded breaks are generated from the indicator values. The CDFs of these random series are then compared with the break-conditioned CDF of the same indicator to determine whether the indicator influences break probability. Pressure indicators may depend on certain parameters. A sensitivity analysis combined with a robust statistical test identifies the parameter settings under which the indicator is most influential on break probability. In addition, the indicators can be calculated as a function of two computational parameters, called the anticipation time and the window width. The anticipation time is the time (in hours) between the end of the period over which the pressure indicator is computed and the break; the window width is the number of pressure values required to calculate the indicator, and it is a multiple of 24 hours because of the daily cyclic behaviour of pressure. A sensitivity analysis of these computational parameters explains when the pressure indicators are most influential on break probability. The second part of the methodology presents a Bayesian diagnostic model. Models of this type belong to the statistical break-prevention models: they start from recorded data to establish failure patterns and use Bayes' theorem to determine the probability of failure when the network is conditioned on given characteristics.
Bayes' theorem is thus used to compare the generic CDF of the indicator with the break-conditioned CDF and to determine when break probability increases for certain ranges of an indicator found to influence breaks. A probability ratio (PR) is defined which, when greater than one, identifies the indicator intervals for which break probability increases. The first part of the methodology is applied to the distribution network of the Community of Madrid (Spain) and to that of Panama City (Panama). After data filtering, the methodology can be applied to 15 sectors in the Community of Madrid and to two sectors, called corregimientos, in Panama City. The results show that, in both networks, the indicators most influential on break probability are the pressure range, i.e. the difference between maximum and minimum pressure, and pressure variability, which corresponds to the statistical property of the standard deviation. These indicators therefore describe the dispersion of the data and the persistence of pressure variation, and can be likened to fatigue in strength of materials. The second part of the methodology was applied to the influential indicators in the Community of Madrid, and it was found that break probability increases for extreme values of the pressure-range and pressure-variability indicators. Finally, a pressure management approach is recommended that limits the influential indicators to intervals outside those in which break probability increases. The proposed methodology can be applied to other distribution networks and can help water utilities reduce the number of system failures through pressure management.
This Thesis presents a methodology for the statistical analysis of pipe breaks in water distribution networks. The methodology studies the relationship between pipe breaks and water pressure, and proposes a pressure management procedure to reduce the number of breaks that occur in such networks. Frequent pipe breaks are one of the manifestations of the deterioration of water supply systems. System failures are one of the major challenges faced by water utilities because of their associated social, economic and environmental costs. For these reasons, water utilities aim to reduce break occurrence as far as possible. Water distribution networks can be divided into areas or sectors, which facilitates control of the network. These areas may be independent or isolated by valves, as usually happens in developed countries, or they may be hydraulically interconnected. Pressure management strategies are usually implemented through pressure-reducing valves (PRVs). These valves are installed at the head of each sector and control the downstream pressure even when the inflow varies significantly. The most popular methods of pressure management are pressure reduction (the most common form of control), pressure sustaining, prevention and/or alleviation of pressure surges or large pressure variations, and level/altitude control. From 2005 onwards, the effects of pressure management on burst frequencies have become more widely recognized in the technical literature. This Thesis suggests a pressure management approach that controls the ranges of the pressure indicators most influential on the probability of pipe breaks. Operating pressure in a sector is characterized by means of a pressure indicator at the head of the district metered area (DMA), as head losses are relatively small and topographical differences were accounted for at the design stage.
A pressure indicator, defined as a statistic calculated from the time series of pressure head over a specific time window, may provide the information water utilities need to make decisions that reduce pipe breaks in water distribution networks. The first part of the methodology presented in this Thesis allows the pressure indicators with the greatest impact on the probability of pipe breaks to be identified. To know whether a pressure indicator influences the probability of pipe breaks, the methodology compares estimates of the cumulative distribution function (CDF) of the indicator in two situations: conditioned on the occurrence of a pipe break (a rare event), and unconditioned (normal operation). Water utilities usually have a failure history limited to recent periods, and it is difficult to obtain precise information about an underground network. The use of distribution functions is therefore proposed to address the imprecision of the recorded data. CDFs derived from the time series of pressure indicators (normal operation) are compared with CDFs of indicator values at times coinciding with reported pipe breaks (conditioned on breaks). If all estimated CDFs are drawn from the same population, there is no reason to infer that the studied indicator clearly influences the probability of the rare event. However, when it is statistically shown that the estimated CDFs do not come from the same population, the analysed indicator may influence the occurrence of pipe breaks.
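The comparison of a break-conditioned CDF with a normal-operation CDF can be illustrated with empirical CDFs and the two-sample Kolmogorov-Smirnov statistic. The abstract does not name the test used, so the Kolmogorov-Smirnov statistic here is an assumption chosen as one common way to measure the distance between two empirical CDFs; p-values are not computed in this sketch.

```python
import bisect


def ecdf(sample):
    """Right-continuous empirical CDF of a sample, returned as a function."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|.

    Both ECDFs are step functions that only jump at observed values,
    so the supremum is reached at one of the pooled observations.
    """
    f_a, f_b = ecdf(a), ecdf(b)
    return max(abs(f_a(x) - f_b(x)) for x in set(a) | set(b))
```

Here `a` would hold indicator values at break times and `b` indicator values from normal operation; a large statistic suggests that the two samples do not come from the same population.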
Because the number of indicator values used to estimate the break-conditioned CDF is much lower than the number used to estimate the CDF of the unconditional pressure series, and because the results depend on the size of the compared samples, CDFs are estimated from random sets of the same size drawn from the unconditional indicator values. Comparing the estimated CDFs of these random sets with the estimated break-conditioned CDF then shows whether the indicator influences the probability of pipe breaks. Pressure indicators depend on various parameters. Sensitivity analysis and a robust statistical test make it possible to determine the parameter values for which the indicator is most influential on the probability of pipe breaks. Indicators are also calculated according to two model parameters, called the anticipation time and the window width. The anticipation time is the time (in hours) between the end of the period used to compute the pressure indicator and the break. The window width is the number of instantaneous pressure values required to calculate the pressure indicator and is a multiple of 24 hours, because water pressure follows a daily cycle. A sensitivity analysis of the model parameters explains when the pressure indicator is most influential on the probability of pipe breaks. The second part of the methodology presents a Bayesian diagnostic model. This kind of model belongs to the class of statistical predictive models, which are based on historical data, represent break behaviour and patterns in water mains, and use Bayes' theorem to condition the probability of failure on specific system characteristics. Bayes' theorem allows the break-conditioned CDF and the unconditional CDF of an indicator to be compared, and determines when the probability of pipe breaks increases for certain ranges of the pressure indicator.
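A minimal sketch of the indicator computation and of the equal-size random sets follows. It assumes one pressure value per hour, so that a window width in hours equals the number of values, and it implements only the two indicators named in the results (range and standard deviation); the function names are hypothetical and the population standard deviation is an assumed choice.

```python
import random
import statistics


def pressure_indicator(pressure, t_end, window_h, kind="range"):
    """Indicator over the hourly values pressure[t_end - window_h : t_end].

    Assumes one pressure value per hour; kind is "range" (max - min)
    or "std" (population standard deviation).
    """
    if window_h <= 0 or window_h % 24 != 0:
        raise ValueError("window width must be a positive multiple of 24 h")
    if t_end - window_h < 0 or t_end > len(pressure):
        raise ValueError("window falls outside the pressure record")
    window = pressure[t_end - window_h:t_end]
    if kind == "range":
        return max(window) - min(window)
    if kind == "std":
        return statistics.pstdev(window)
    raise ValueError(f"unknown indicator kind: {kind}")


def break_conditioned_values(pressure, break_hours, window_h, anticipation_h, kind="range"):
    """Indicator values whose window ends anticipation_h hours before each break."""
    return [pressure_indicator(pressure, t - anticipation_h, window_h, kind)
            for t in break_hours]


def random_sets(pressure, n_sets, set_size, window_h, kind="range", seed=0):
    """Random subsets of the unconditional indicator series, each of size set_size."""
    unconditional = [pressure_indicator(pressure, t, window_h, kind)
                     for t in range(window_h, len(pressure) + 1)]
    rng = random.Random(seed)
    return [rng.sample(unconditional, set_size) for _ in range(n_sets)]
```

Setting `set_size` to the number of recorded breaks gives random sets comparable in size with the break-conditioned sample, and each set can then be compared with that sample using a two-sample test.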
A probability ratio is defined to establish whether the probability of breaks increases for certain ranges of the pressure indicator. The first part of the methodology is applied to the water distribution networks of Madrid (Spain) and Panama City (Panama). After data filtering, the methodology can be applied to 15 sectors in Madrid and to two areas in Panama City. The results show that, in both systems, the indicators most influential on the probability of pipe breaks are the pressure range, i.e. the difference between maximum and minimum pressure, and pressure variability, measured by the standard deviation. These indicators therefore represent the dispersion of the data and the persistence of pressure variation, and may be related to fatigue in strength of materials. The second part of the methodology was applied to the indicators influential on the probability of pipe breaks in the Madrid network. The main conclusion is that the probability of pipe breaks increases for extreme values of the pressure-range and pressure-variability indicators. Finally, a pressure management approach is recommended that keeps the influential pressure indicators out of the ranges in which the probability of breaks increases. The methodology presented here is general, may be applied to other water distribution networks, and could help water utilities reduce the number of system failures through pressure management.
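One way to write such a ratio uses Bayes' theorem: $P(B \mid I \in k) = P(I \in k \mid B)\,P(B) / P(I \in k)$, so $P(I \in k \mid B) / P(I \in k) = P(B \mid I \in k) / P(B)$, and a value above 1 marks an indicator interval $k$ in which breaks $B$ become more likely. The sketch below estimates this ratio from binned samples; the exact definition used in the Thesis is not given in the abstract, and the bin edges are a hypothetical input.

```python
def probability_ratio(conditioned, unconditional, edges):
    """PR_k = P(I in bin k | break) / P(I in bin k) for bins [edges[k], edges[k+1]).

    By Bayes' theorem this equals P(break | I in bin k) / P(break), so
    PR_k > 1 flags indicator intervals where breaks become more likely.
    """
    ratios = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p_cond = sum(lo <= v < hi for v in conditioned) / len(conditioned)
        p_all = sum(lo <= v < hi for v in unconditional) / len(unconditional)
        # a bin never visited in normal operation has no defined ratio
        ratios.append(p_cond / p_all if p_all > 0 else float("nan"))
    return ratios
```

With break-time values concentrated at the extremes of the indicator, the outer bins would show ratios above 1, which is the pattern reported for the pressure range and pressure variability in Madrid.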
Abstract:
This Thesis tests and develops methodologies to improve the estimation of the design and extreme floods used to assess the hydrological safety of dams. It first addresses the estimation of frequency laws for maximum flows and their extrapolation to high return periods. This question is highly relevant because the adoption of increasingly demanding hydrological safety standards for dams implies very high design return periods, whose estimation carries great uncertainty. It is therefore important to incorporate into the calculation of design flows all the techniques available to reduce that uncertainty. It is also important to choose the statistical model (distribution function and fitting procedure) carefully, so as to guarantee both its ability to describe the behaviour of the sample and its ability to predict high-return-period quantiles robustly. To this end, nationwide studies were carried out to determine the regionalization scheme that gives the best results for the hydrological characteristics of Spanish basins with respect to annual maximum flows, taking into account the amount of available data. The methodology starts by identifying homogeneous regions, whose boundaries were set according to the physiographic and climatic characteristics of the basins and the variability of their statistics, and whose homogeneity was then tested. Next, the statistical model of annual maximum flows with the best performance in the different zones of peninsular Spain was selected, both for describing the sample data and for extrapolating to the highest return periods.
The selection process was based, among other things, on the generation of synthetic data series by Monte Carlo simulation and on the statistical analysis of the results obtained by fitting distribution functions to these series under different hypotheses. The Thesis then addresses the peak flow-volume relationship and the definition of design hydrographs based on it, an issue that can be very important for dams with large reservoir volumes. However, the hydrological calculation procedures usually applied do not take into account the statistical dependence between these two variables. This Thesis develops a simple and robust procedure to characterize that dependence, representing the joint distribution function of peak flow and volume through the marginal distribution function of peak flow and the conditional distribution function of volume given peak flow. The latter is a log-normal distribution fitted with a regional procedure. Its practical application is proposed through a probabilistic calculation procedure based on the stochastic generation of a large number of hydrographs. Applying this procedure to dam hydrological safety requires a correct interpretation of the return-period concept for bivariate hydrological variables, and an interpretation is proposed for this purpose: the return period is understood as the inverse of the probability of exceeding a given reservoir level. When this return period is related to the hydrological variables, the design hydrograph of the dam is no longer a single hydrograph but a family of hydrographs that produce the same maximum reservoir level, represented by a curve in the peak flow-volume plane.
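The stochastic generation of (peak flow, volume) pairs and the level-based return period can be sketched as follows. Several assumptions are made purely for illustration: the marginal of peak flow is taken as log-normal (the abstract does not say which marginal is used), $\ln V$ given $Q$ is normal with a mean linear in $\ln Q$, and `toy_max_level` is a hypothetical stand-in for the reservoir routing that a real study would perform for each generated hydrograph.

```python
import math
import random


def simulate_events(n, mu_lnq, sigma_lnq, a, b, s, seed=1):
    """Generate n (peak flow, volume) pairs.

    Peak flow: log-normal marginal (an assumption for this sketch).
    Volume given peak flow: log-normal with ln V ~ N(a + b ln Q, s^2).
    """
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        q = rng.lognormvariate(mu_lnq, sigma_lnq)
        v = rng.lognormvariate(a + b * math.log(q), s)
        events.append((q, v))
    return events


def toy_max_level(q, v, spillway_q=800.0, storage_v=50.0):
    """HYPOTHETICAL monotone proxy for the maximum reservoir level.

    A real application would route each hydrograph through the reservoir.
    """
    return q / spillway_q + v / storage_v


def return_period(events, level_fn, h):
    """T(h) = 1 / P(maximum level > h), estimated from the generated events."""
    p = sum(level_fn(q, v) > h for q, v in events) / len(events)
    return math.inf if p == 0 else 1.0 / p
```

All (q, v) pairs satisfying `toy_max_level(q, v) == h` lie on one curve in the peak flow-volume plane; in this framework that curve is the family of design hydrographs for return period `return_period(events, toy_max_level, h)`.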
This family of design hydrographs depends on the dam being designed: the flow-volume curves vary with, for example, the reservoir volume or the spillway length. The proposed procedure is illustrated by its application to two case studies. Finally, the Thesis addresses the calculation of seasonal floods, which is fundamental for establishing dam operation and may also be important for assessing the hydrological safety of existing dams. However, calculating these floods is complex and not entirely settled today, and the calculation procedures usually applied can present certain problems. Calculation based on the statistical method of partial duration series, or peaks over threshold, can be a valid alternative that solves these problems in cases where the floods in the different seasons are generated by the same type of event. A study was carried out to verify whether the hypothesis of statistical homogeneity of flood-flow data from different seasons of the year holds in Spain. The most appropriate seasonal periods for the study were also analysed, which is highly relevant to guarantee correct results, and a simple procedure was developed to determine the data-selection threshold so that the independence of the data is guaranteed, one of the main difficulties in the practical application of the partial-series technique. In addition, the practical application of seasonal frequency laws requires a correct interpretation of the return-period concept in the seasonal case. A criterion is proposed to determine seasonal return periods consistently with the annual return period and with a suitable distribution of probability among the seasons.
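One simple way to keep seasonal return periods consistent with an annual one, assuming the seasonal maxima are independent, is to require $1 - 1/T = \prod_s (1 - 1/T_s)$ and to share $\ln(1 - 1/T)$ among the seasons with weights $w_s$. The sketch below implements this illustrative allocation; it is not the criterion proposed in the Thesis, which the abstract does not detail, and the weights are a hypothetical input.

```python
import math


def seasonal_return_periods(t_annual, weights):
    """Split an annual return period into seasonal ones.

    Assumes independent seasonal maxima, so that
        1 - 1/T = prod_s (1 - 1/T_s),
    and gives season s the share weights[s] / sum(weights) of log(1 - 1/T).
    """
    total = sum(weights)
    f_annual = 1.0 - 1.0 / t_annual
    return [1.0 / (1.0 - f_annual ** (w / total)) for w in weights]


def annual_return_period(seasonal):
    """Annual return period implied by independent seasonal return periods."""
    return 1.0 / (1.0 - math.prod(1.0 - 1.0 / t for t in seasonal))
```

With equal weights for four seasons, a 100-year annual return period corresponds to roughly 400-year seasonal return periods; giving a larger weight to the main flood season lowers its seasonal return period and raises the others.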
Finally, a procedure for calculating seasonal flows is presented and illustrated with a case study.
Methodologies to improve the estimation of extreme floods for dam hydrological safety have been compared and developed. First, the work focused on fitting distribution functions to maximum peak flows, from which values for high return periods are extrapolated. This has become a major issue because the adoption of stricter standards for dam hydrological safety involves estimating floods for high design return periods, which entails great uncertainty. Accordingly, it is important to incorporate all available techniques into the estimation of design peak flows in order to reduce this uncertainty. Selection of the statistical model (distribution function and fitting method) is also important, since its ability both to describe the sample and to make robust predictions of high-return-period quantiles must be guaranteed. To put these methodologies into practice, nationwide studies were carried out to determine the regionalization scheme that gives the best results for annual maximum peak flows, given the hydrological characteristics of Spanish basins and the length of the available records. The methodology starts by delimiting regions according to the basins' physiographic and climatic characteristics and the variability of their statistical properties, and then tests their homogeneity. Next, the statistical model for annual maximum peak flows with the best behaviour in the different regions of peninsular Spain is selected, in terms of both describing the sample data and making robust predictions for high return periods. This selection was based, among other things, on the generation of synthetic data series by Monte Carlo simulation and the statistical analysis of the results of fitting distribution functions under different hypotheses.
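The Monte Carlo idea behind the model selection can be illustrated by drawing many synthetic records from a known parent distribution, fitting a model to each record, and examining the spread of the estimated high-return-period quantile. In this sketch the Gumbel parent and the method-of-moments fit are illustrative choices only; the Thesis compared several distribution functions and fitting procedures that are not listed in the abstract.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_quantile(loc, scale, t):
    """Flow with return period t (years) for a Gumbel distribution."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / t))


def fit_gumbel_moments(sample):
    """Method-of-moments Gumbel fit: returns (location, scale)."""
    scale = math.sqrt(6.0) * statistics.stdev(sample) / math.pi
    loc = statistics.fmean(sample) - EULER_GAMMA * scale
    return loc, scale


def monte_carlo_quantiles(loc, scale, n_years, t, n_sims=2000, seed=0):
    """Mean and standard deviation of the estimated t-year quantile
    over n_sims synthetic records of n_years annual maxima each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        sample = []
        for _ in range(n_years):
            u = rng.random()
            while u == 0.0:  # avoid log(0) in the inverse CDF
                u = rng.random()
            sample.append(loc - scale * math.log(-math.log(u)))
        estimates.append(gumbel_quantile(*fit_gumbel_moments(sample), t))
    return statistics.fmean(estimates), statistics.stdev(estimates)
```

Repeating this for several candidate parents, fitting methods and record lengths gives the bias and spread of each combination, which is the kind of evidence used to choose a model that extrapolates robustly.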
Second, the work focused on the relationship between peak flow and volume and on how to define design flood hydrographs based on it, which can be highly important for reservoirs with large volumes. However, commonly used hydrological procedures do not account for the statistical dependence between these variables. A simple and robust method to characterize this dependence has been developed, representing the joint distribution function of peak flow and volume through the marginal distribution function of peak flow and the conditional distribution function of volume given peak flow. The latter is determined by a regional fitting procedure for a log-normal distribution function. Practical application is proposed through a probabilistic estimation procedure based on the stochastic generation of a large number of hydrographs. Using this procedure for dam hydrological safety requires a proper interpretation of the return-period concept for bivariate hydrological data. A criterion is proposed in which the return period is understood as the inverse of the probability of exceeding a given reservoir level. When the return period is related to the hydrological variables, the single design flood hydrograph becomes a family of hydrographs that generate the same maximum reservoir level, represented by a curve in the peak flow-volume plane. This family of design flood hydrographs depends on the dam characteristics, for example the reservoir volume or the spillway length. Two case studies illustrate the application of the methodology. Finally, the work focused on the calculation of seasonal floods, which are essential for determining reservoir operation and can also be fundamental for analysing the hydrological safety of existing dams. However, seasonal flood calculation is complex and not yet fully resolved.
Commonly used calculation procedures may present certain problems. The statistical partial duration series, or peaks-over-threshold, method can be an alternative approach that solves these problems when floods in the different seasons are caused by the same type of event. A study was carried out to verify the hypothesis of statistical homogeneity of peak flows for different seasons in Spain. The appropriate seasonal periods were also analysed, which is highly relevant to guarantee correct results. In addition, a simple procedure was defined to determine the data-selection threshold in a way that ensures the independence of the data, one of the main difficulties in the practical application of partial series. Moreover, the practical application of seasonal frequency laws requires a correct interpretation of the concept of seasonal return period. A criterion is proposed to determine seasonal return periods consistently with the annual return period and with an adequate distribution of probability among the seasons. Finally, a methodology is proposed to calculate seasonal peak flows, and a case study illustrates its application.
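Independence in a peaks-over-threshold series is often enforced by declustering, keeping only the largest flow of each cluster of exceedances. The sketch below uses runs declustering with a minimum gap between clusters; both the threshold and `min_gap` are hypothetical inputs, and this is a generic technique rather than the selection procedure developed in the Thesis, which the abstract does not spell out.

```python
def decluster_peaks(flows, threshold, min_gap):
    """Independent peaks over threshold by runs declustering.

    Exceedances separated by fewer than min_gap consecutive
    non-exceedances belong to the same cluster; only each cluster's
    maximum is kept, as an (index, value) pair.
    """
    peaks = []
    cluster_max = None
    last_exceed = None
    for i, q in enumerate(flows):
        if q <= threshold:
            continue
        if last_exceed is not None and i - last_exceed - 1 >= min_gap:
            peaks.append(cluster_max)  # gap is long enough: close the cluster
            cluster_max = None
        if cluster_max is None or q > cluster_max[1]:
            cluster_max = (i, q)
        last_exceed = i
    if cluster_max is not None:
        peaks.append(cluster_max)
    return peaks
```

Raising the threshold or the minimum gap reduces the number of retained peaks; a threshold could be chosen as the lowest value for which the retained peaks pass an independence check, with each season's peaks then fitted separately.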
Abstract:
Blue whiting (Micromesistius poutassou, http://www.marinespecies.org/aphia.php?p=taxdetails&id=126439) is a small mesopelagic planktivorous gadoid found throughout the North-East Atlantic. This dataset contains the results of a model-based analysis of larvae captured by the Continuous Plankton Recorder (CPR) during the period 1951-2005. The observations are analysed using Generalised Additive Models (GAMs) of the spatial, seasonal and interannual variation in the occurrence of larvae. The best-fitting model is chosen using the Akaike Information Criterion (AIC). The probability of occurrence in the Continuous Plankton Recorder is then normalised and converted to a probability distribution function in space (UTM projection Zone 28) and season (day of year). The best-fitting model splits the distribution into two separate spawning grounds, north and south of a dividing line at 53°N. The probability distribution is therefore normalised within each of these two regions (i.e. the space-time integral over each region is 1). The modelled outputs are on a UTM Zone 28 grid; for convenience, however, the latitude ("lat") and longitude ("lon") of each grid point are also included as variables in the NetCDF file. The assignment of each grid point to either the Northern or the Southern component (defined here as north or south of 53°N) is included as a further variable ("component"). Finally, the day of year ("doy") is stored as the number of days elapsed, counting from and including January 1 (i.e. doy = 1 on January 1); the year is divided into 180 grid points.
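The per-component normalisation can be illustrated by dividing each grid value by the discrete space-time integral of its own component. This sketch assumes a uniform cell area and a uniform day-of-year spacing; both values are hypothetical inputs rather than properties read from the NetCDF file.

```python
from collections import defaultdict


def normalise_by_component(prob, component, cell_area_km2, dt_days):
    """Scale occurrence probabilities into densities per component.

    Returns densities d such that, within each component c,
    sum(d * cell_area_km2 * dt_days) == 1 (uniform grid assumed).
    """
    totals = defaultdict(float)
    for p, c in zip(prob, component):
        totals[c] += p * cell_area_km2 * dt_days
    return [p / totals[c] for p, c in zip(prob, component)]
```

Here `prob` would hold the model's probability of occurrence at every (space, day-of-year) grid point, flattened, and `component` the matching "component" label for each point.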
Abstract:
Minimization of a sum-of-squares or cross-entropy error function leads to network outputs which approximate the conditional averages of the target data, conditioned on the input vector. For classification problems, with a suitably chosen target coding scheme, these averages represent the posterior probabilities of class membership, and so can be regarded as optimal. For problems involving the prediction of continuous variables, however, the conditional averages provide only a very limited description of the properties of the target variables. This is particularly true for problems in which the mapping to be learned is multi-valued, as often arises in the solution of inverse problems, since the average of several correct target values is not necessarily itself a correct value. In order to obtain a complete description of the data, for the purposes of predicting the outputs corresponding to new input vectors, we must model the conditional probability distribution of the target data, again conditioned on the input vector. In this paper we introduce a new class of network models obtained by combining a conventional neural network with a mixture density model. The complete system is called a Mixture Density Network, and can in principle represent arbitrary conditional probability distributions in the same way that a conventional neural network can represent arbitrary functions. We demonstrate the effectiveness of Mixture Density Networks using both a toy problem and a problem involving robot inverse kinematics.
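The output stage of a mixture density network for a one-dimensional target can be sketched as follows: raw network outputs are mapped to mixing coefficients (softmax), means (used directly) and widths (exponential), and training minimises the negative log-likelihood of the target under the resulting Gaussian mixture. The parameterisation below is a common one, shown as an assumption; the network that produces the raw outputs is not shown.

```python
import math


def mdn_mixture(z_alpha, z_mu, z_sigma):
    """Map raw network outputs to mixture parameters for a 1-D target.

    Mixing coefficients use a softmax (positive, summing to 1), widths
    use an exponential (positive), and means are used directly.
    """
    m = max(z_alpha)
    e = [math.exp(z - m) for z in z_alpha]  # subtract max for stability
    total = sum(e)
    alphas = [v / total for v in e]
    sigmas = [math.exp(z) for z in z_sigma]
    return alphas, list(z_mu), sigmas


def mdn_nll(z_alpha, z_mu, z_sigma, t):
    """Negative log-likelihood of target t under the Gaussian mixture."""
    alphas, mus, sigmas = mdn_mixture(z_alpha, z_mu, z_sigma)
    density = sum(
        a * math.exp(-0.5 * ((t - mu) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)
        for a, mu, s in zip(alphas, mus, sigmas)
    )
    return -math.log(density)


def mixture_mean(z_alpha, z_mu):
    """Conditional mean: what a sum-of-squares network would approximate."""
    alphas, mus, _ = mdn_mixture(z_alpha, z_mu, [0.0] * len(z_mu))
    return sum(a * mu for a, mu in zip(alphas, mus))
```

For two equally weighted branches at -1 and 1, `mixture_mean([0.0, 0.0], [-1.0, 1.0])` is 0.0, a value that lies on neither branch; this is the multi-valued failure of the conditional average that the mixture model avoids.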
Abstract:
A conventional neural network approach to regression problems approximates the conditional mean of the output vector. For mappings which are multi-valued this approach breaks down, since the average of two solutions is not necessarily a valid solution. In this article mixture density networks, a principled method to model conditional probability density functions, are applied to retrieving Cartesian wind vector components from satellite scatterometer data. A hybrid mixture density network is implemented to incorporate prior knowledge of the predominantly bimodal function branches. An advantage of a fully probabilistic model is that more sophisticated and principled methods can be used to resolve ambiguities.
Abstract:
Mixture Density Networks are a principled method to model conditional probability density functions that are non-Gaussian. This is achieved by modelling the conditional distribution for each pattern with a Gaussian Mixture Model whose parameters are generated by a neural network. This thesis presents a novel method to introduce regularisation in this context for the special case where the means and variances of the spherical Gaussian kernels in the mixtures are fixed to predetermined values. Guidelines for initialising these parameters are given, and it is shown how to apply the evidence framework to mixture density networks to achieve regularisation. This also provides an objective stopping criterion that can replace the 'early stopping' methods used previously. If the neural network is an RBF network with fixed centres, this opens up new opportunities for improved initialisation of the network weights, which are exploited to start training relatively close to the optimum. The new method is demonstrated on two data sets. The first is a simple synthetic data set; the second is a real-world data set, namely satellite scatterometer data used to infer the wind speed and wind direction near the ocean surface. For both data sets the regularisation method performs well in comparison with earlier published results. Ideas on how the constraint on the kernels may be relaxed to allow fully adaptable kernels are presented.
Resumo:
The ERS-1 satellite was launched in July 1991 by the European Space Agency into a polar orbit at about 800 km, carrying a C-band scatterometer. A scatterometer measures the amount of radar backscatter generated by small ripples on the ocean surface induced by instantaneous local winds. Operational methods that extract wind vectors from satellite scatterometer data are based on locally inverting a forward model, so as to map scatterometer observations to wind vectors, by minimising a cost function in the scatterometer measurement space.
This report uses mixture density networks, a principled method for modelling conditional probability density functions, to model the joint probability distribution of the wind vectors given the satellite scatterometer measurements in a single cell (the 'inverse' problem). The complexity of the mapping and the structure of the conditional probability density function are investigated by varying, respectively, the number of units in the hidden layer of the multi-layer perceptron and the number of kernels in the Gaussian mixture model of the mixture density network. The optimal model for networks trained per trace has twenty hidden units and four kernels. Further investigation shows that models trained with incidence angle as an input give results comparable to those of models trained per trace. A hybrid mixture density network that incorporates geophysical knowledge of the problem confirms other results showing that the conditional probability distribution is predominantly bimodal.
The wind retrieval results improve on previous work at Aston, but do not match other neural network techniques that use spatial information in the inputs, which is to be expected given the ambiguity of the inverse problem. Current work uses the local inverse model for autonomous ambiguity removal in a principled Bayesian framework. Future directions in which these models may be improved are given.
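The local inversion step can be illustrated with a toy forward model. The functional form, its coefficients and the three beam azimuths below are hypothetical stand-ins, not the operational model used with ERS-1 data. The point is only that minimising the misfit in measurement space yields several local minima, including one roughly 180° from the true direction.

```python
import numpy as np

# Hypothetical toy forward model (NOT the operational model):
#   sigma0 = A * v**GAMMA * (1 + B*cos(chi) + C*cos(2*chi)),  chi = wind direction - beam azimuth
A, GAMMA, B, C = 0.01, 1.6, 0.1, 0.4
BEAMS_DEG = np.array([45.0, 90.0, 135.0])  # assumed azimuths of three antennas


def angular_shape(phi_deg):
    chi = np.radians(np.asarray(phi_deg, dtype=float)[..., None] - BEAMS_DEG)
    return 1.0 + B * np.cos(chi) + C * np.cos(2.0 * chi)


def forward(speed, phi_deg):
    """Map a wind vector (speed, direction) to three backscatter values."""
    return A * speed ** GAMMA * angular_shape(phi_deg)


def invert(sigma_obs, step_deg=1.0):
    """Local inversion: minimise squared misfit in measurement space, return local minima."""
    phis = np.arange(0.0, 360.0, step_deg)
    g = angular_shape(phis)                                           # (n_dir, 3)
    # For a fixed direction the model is linear in s = A*v**GAMMA, so the best s is closed-form.
    s = np.maximum(g @ sigma_obs / np.sum(g * g, axis=1), 0.0)
    cost = np.sum((sigma_obs - s[:, None] * g) ** 2, axis=1)
    is_min = (cost < np.roll(cost, 1)) & (cost <= np.roll(cost, -1))  # circular neighbours
    speeds = (s / A) ** (1.0 / GAMMA)
    return [(phis[i], speeds[i], cost[i]) for i in np.flatnonzero(is_min)]


for direction, speed, cost in sorted(invert(forward(10.0, 30.0)), key=lambda r: r[2]):
    print(f"direction {direction:5.1f} deg  speed {speed:5.2f} m/s  cost {cost:.2e}")
```

For each trial direction, the best speed comes from a one-parameter least-squares fit, so only direction needs to be searched. A real retrieval would use the actual forward model.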
Resumo:
We investigate the feasibility of simultaneously suppressing amplification noise and nonlinearity, the most fundamental limiting factors in modern optical communication. To accomplish this we developed a general design optimisation technique based on the concepts of noise and nonlinearity management. We demonstrate the efficiency of this approach by applying it to the design optimisation of transmission lines with periodic dispersion compensation using Raman and hybrid Raman-EDFA amplification. Moreover, using nonlinearity management considerations, we show that optimal performance in high bit-rate dispersion-managed fibre systems with hybrid amplification is achieved for a certain amplifier spacing. This spacing differs from the well-known noise-optimal configuration of fully distributed amplification. Accurate estimation of the bit error rate (BER) requires complete knowledge of the signal statistics, which is crucial for modern transmission links with strong inherent nonlinearity. We therefore implemented the advanced multicanonical Monte Carlo (MMC) method, recognised for its efficiency in estimating distribution tails. We have accurately computed marginal probability density functions for soliton parameters by numerically modelling the Fokker-Planck equation with the MMC simulation technique. Using the MMC method, we have also studied the BER penalty caused by deviations from the optimal decision level in systems employing in-line 2R optical regeneration. We have demonstrated that in such systems, the analytical linear approximation that better fits the central part of the regenerator's nonlinear transfer function gives a more accurate approximation of the BER and BER penalty.
We present a statistical analysis of the RZ-DPSK optical signal at a direct-detection receiver with Mach-Zehnder interferometer demodulation.
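The core of the multicanonical Monte Carlo method can be sketched on a toy problem with a known answer. The statistic below, the scaled sum of Gaussian noise samples, is exactly standard normal. It is an illustrative stand-in for a transmission-system quantity such as a received-signal parameter. The preconditioned Crank-Nicolson proposal, the simple histogram update P_{k+1}(b) ∝ P_k(b) H_k(b), and all parameter values are assumptions of this sketch, not details of the thesis.

```python
import math

import numpy as np


def mmc_histogram(statistic, dim, lo, hi, n_bins, n_iter=12, n_steps=20_000,
                  eps=0.3, seed=0):
    """Multicanonical Monte Carlo estimate of P(statistic(x) in bin b) for x ~ N(0, I_dim),
    conditioned on lo <= statistic(x) < hi."""
    rng = np.random.default_rng(seed)
    width = (hi - lo) / n_bins
    log_p = np.full(n_bins, -math.log(n_bins))   # flat initial guess for P(bin)
    c = math.sqrt(1.0 - eps * eps)

    def bin_of(y):
        return min(int((y - lo) / width), n_bins - 1)

    x = np.zeros(dim)                             # assumes statistic(0) lies in [lo, hi)
    b = bin_of(statistic(x))
    for _ in range(n_iter):
        hist = np.zeros(n_bins)
        noise = rng.standard_normal((n_steps, dim))
        u = rng.random(n_steps)
        for i in range(n_steps):
            # Preconditioned Crank-Nicolson move: it leaves N(0, I) invariant, so the
            # Metropolis ratio only involves the multicanonical weights 1 / P(bin).
            x_new = c * x + eps * noise[i]
            y_new = statistic(x_new)
            if lo <= y_new < hi:
                b_new = bin_of(y_new)
                if u[i] < math.exp(min(0.0, log_p[b] - log_p[b_new])):
                    x, b = x_new, b_new
            hist[b] += 1
        # P_{k+1}(b) proportional to P_k(b) * H_k(b); unvisited bins are counted as one visit.
        log_p = log_p + np.log(np.maximum(hist, 1.0))
        log_p -= np.logaddexp.reduce(log_p)       # renormalise
    edges = lo + width * np.arange(n_bins + 1)
    return edges, np.exp(log_p)


def scaled_sum(x):
    # Toy statistic with a known answer: exactly N(0, 1) when x ~ N(0, I).
    return x.sum() / math.sqrt(x.size)


edges, probs = mmc_histogram(scaled_sum, dim=4, lo=-5.0, hi=5.0, n_bins=40)
```

Each iteration biases the random walk by 1/P_k(bin), which flattens the histogram. As a result, bins beyond 4σ, which plain sampling with the same budget would rarely reach, are estimated from roughly as many samples as the central bins.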
Resumo:
The ERS-1 satellite was launched in July 1991 by the European Space Agency into a polar orbit at about 800 km, carrying a C-band scatterometer. A scatterometer measures the amount of backscattered microwave radiation reflected by small ripples on the ocean surface induced by sea-surface winds. It therefore provides instantaneous snapshots of wind flow over large areas of the ocean surface, known as wind fields. Inherent in the physics of the observation process is an ambiguity in wind direction: the scatterometer cannot distinguish whether the wind is blowing toward or away from the sensor. This ambiguity implies a one-to-many mapping between scatterometer data and wind direction. Current operational methods for wind field retrieval are based on retrieving wind vectors from satellite scatterometer data, followed by a disambiguation and filtering process that relies on numerical weather prediction models. The wind vectors are retrieved by locally inverting a forward model, so as to map scatterometer observations to wind vectors, by minimising a cost function in scatterometer measurement space. This thesis applies a pragmatic Bayesian solution to the problem. The likelihood is a combination of conditional probability distributions for the local wind vectors given the scatterometer data. The prior distribution is a vector Gaussian process that provides the geophysical consistency of the wind field. The wind vectors are retrieved directly from the scatterometer data by using mixture density networks, a principled method to model multi-modal conditional probability density functions. The complexity of the mapping and the structure of the conditional probability density function are investigated. A hybrid mixture density network is developed that incorporates the knowledge that the conditional probability distribution of the observation process is predominantly bimodal.
The optimal model, which generalises across a swathe of scatterometer readings, is better on key performance measures than the current operational model. Wind field retrieval is approached from three perspectives. The first is a non-autonomous method that confirms the validity of the model by retrieving the correct wind field 99% of the time from a test set of 575 wind fields. The second technique takes the maximum a posteriori probability wind field retrieved from the posterior distribution as the prediction. For the third technique, Markov Chain Monte Carlo (MCMC) techniques were employed to estimate the mass associated with significant modes of the posterior distribution, and make predictions based on the mode with the greatest mass associated with it. General methods for sampling from multi-modal distributions were benchmarked against a specific MCMC transition kernel designed for this problem. It was shown that the general methods were unsuitable for this application due to computational expense. On a test set of 100 wind fields the MAP estimate correctly retrieved 72 wind fields, whilst the sampling method correctly retrieved 73 wind fields.
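The following sketch shows how a Gaussian-process prior can choose between ambiguous solutions. The model is deliberately simplified in four ways:

- cells lie on a 1-D line;
- each cell has exactly two candidate wind vectors (a solution and its 180° twin), in place of the mixture-density-network likelihood;
- the u and v components are independent zero-mean squared-exponential processes, in place of a full vector Gaussian process;
- brute-force search replaces the thesis's MAP optimisation or MCMC sampling.

All names and hyperparameters are hypothetical.

```python
import itertools

import numpy as np


def se_kernel(pos, length=3.0, var=4.0, nugget=1e-2):
    """Squared-exponential covariance between 1-D cell positions (hypothetical prior)."""
    d = pos[:, None] - pos[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2) + nugget * np.eye(len(pos))


def map_disambiguate(candidates, pos, length=3.0, var=4.0):
    """Pick one of two ambiguous (u, v) solutions per cell to maximise the GP prior.

    candidates has shape (n_cells, 2, 2): two candidate (u, v) vectors per cell.
    Brute force over all 2**n_cells choices, so only suitable for small n.
    """
    K = se_kernel(pos, length, var)
    n = candidates.shape[0]
    best, best_score = None, -np.inf
    for choice in itertools.product((0, 1), repeat=n):
        field = candidates[np.arange(n), np.array(choice)]      # (n_cells, 2)
        # u and v treated as independent zero-mean GPs: log prior up to a constant.
        score = -0.5 * sum(field[:, j] @ np.linalg.solve(K, field[:, j]) for j in range(2))
        if score > best_score:
            best, best_score = field, score
    return best


rng = np.random.default_rng(1)
pos = np.arange(8.0)
true_field = np.stack([6.0 + np.sin(pos / 3.0), 2.0 + 0.5 * np.cos(pos / 3.0)], axis=1)
flip = rng.integers(0, 2, size=len(pos))
# Each cell offers the true vector and its 180-degree twin, in random order.
candidates = np.stack([np.where(flip[:, None] == 0, true_field, -true_field),
                       np.where(flip[:, None] == 0, -true_field, true_field)], axis=1)
estimate = map_disambiguate(candidates, pos)
```

Under a zero-mean prior, a field and its global 180° rotation have identical prior probability. This sketch can therefore recover the field only up to an overall sign; information that breaks that symmetry must come from outside the prior.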
Resumo:
We find the probability distribution of the fluctuating parameters of a soliton propagating through a medium with additive noise. Our method is a modification of the instanton formalism (method of optimal fluctuation) based on a saddle-point approximation in the path integral. We first consistently solve the fundamental problem of soliton propagation within the framework of the noisy nonlinear Schrödinger equation. We then consider modifications of the model due to in-line control (filtering, amplitude and phase modulation) and examine how the control elements change the error probability in optical soliton transmission. Although the noise is weak, we are interested here in the probabilities of error-causing large fluctuations, which lie beyond perturbation theory. We describe in detail a new phenomenon of soliton collapse that occurs under the combined action of noise, filtering and amplitude modulation. © 2004 Elsevier B.V. All rights reserved.
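For reference, a standard dimensionless form of the setup is given below. The normalisation of the equation, the noise correlator and the soliton parameterisation (amplitude η, frequency Ω, position y, phase φ) are our assumptions and may differ from the paper's. The last line states the optimal-fluctuation (instanton) estimate, valid to logarithmic accuracy for weak noise, D → 0.

```latex
i\,\partial_z \psi + \tfrac{1}{2}\,\partial_t^2 \psi + |\psi|^2 \psi = \xi(z,t),
\qquad \langle \xi(z,t)\,\xi^*(z',t') \rangle = D\,\delta(z-z')\,\delta(t-t'),
\qquad \langle \xi\,\xi \rangle = 0,

\psi_s(z,t) = \eta\,\operatorname{sech}\!\bigl(\eta\,(t - y - \Omega z)\bigr)\,
\exp\!\Bigl(i\,\Omega\,t + \tfrac{i}{2}\,(\eta^2 - \Omega^2)\,z + i\phi\Bigr),

\mathcal{P}[\xi] \propto \exp\!\Bigl(-\tfrac{1}{D}\int\!\!\int |\xi|^2\,dt\,dz\Bigr),
\qquad
\ln P(\text{outcome}) \simeq -\min_{\xi\,:\,\text{outcome}} \frac{1}{D}\int\!\!\int |\xi|^2\,dt\,dz .
```

The soliton line solves the noise-free equation (ξ = 0). The minimising noise realisation is the "optimal fluctuation" that the saddle-point approximation picks out of the path integral.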
Resumo:
It is shown that the method of generalized interval estimates (GIE) can be regarded as a development of the scenario approach in decision theory. The method was originally intended for eliciting and formally representing expert knowledge about quantitative input data, known only with uncertainty, for the models of intelligent expert decision support systems (EDSS). Procedures are proposed for applying the GIE method to problems with dependent parameters, such as forecasting the recoverable reserves of fields as a function of hydrocarbon price levels. Analytical relations are established for the probability distribution functions of the generalized uniform distributions used in scenario analysis and in analysing the output indicators of the models included in the EDSS model base.
Resumo:
The dynamics of biomolecules over various spatial and time scales are essential for biological functions such as molecular recognition, catalysis and signaling. However, reconstructing biomolecular dynamics from experimental observables requires determining a conformational probability distribution. Unfortunately, these distributions cannot be fully constrained by the limited information available from experiments. This makes the problem ill-posed in the sense of Hadamard: it has no unique solution, and multiple or even infinitely many solutions may exist. To overcome this, the problem must be regularized by making assumptions, which inevitably introduce biases into the result.
Here, I present two continuous probability density function approaches to solving an important inverse problem called the RDC trigonometric moment problem. By focusing on interdomain orientations, we reduced the problem to determining a distribution on the 3D rotational space from residual dipolar couplings (RDCs). We derived an analytical equation that relates the alignment tensors of adjacent domains, which serves as the foundation of both methods. In the first approach, ill-posedness is avoided by introducing a continuous distribution model that incorporates a smoothness assumption. To find the optimal solution for the distribution, we also designed an efficient branch-and-bound algorithm that exploits the mathematical structure of the analytical solutions. The algorithm is guaranteed to find the distribution that best satisfies the analytical relationship. The method performed well when tested under various levels of experimental noise and when applied to two protein systems. The second approach avoids the use of any model by employing the maximum entropy principle. This 'model-free' approach delivers the least biased result, which represents our state of knowledge. In this approach, the solution is an exponential function of Lagrange multipliers. To determine the multipliers, a convex objective function is constructed; because it is convex, the maximum entropy solution can be found easily by gradient descent methods. Both algorithms can be applied to biomolecular RDC data in general, including data from RNA and DNA molecules.
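The maximum entropy step can be sketched on a simpler space than the rotation group: a distribution on the circle constrained by trigonometric moments. The features, grid and step size are assumptions of this sketch. The solution has the form p(θ) ∝ exp(Σ_k λ_k f_k(θ)). The multipliers minimise the convex dual log Z(λ) - λ·m, whose gradient is the model moment minus the target moment.

```python
import numpy as np


def maxent_circle(features, targets, grid, lr=1.0, n_iter=2000):
    """Max-entropy distribution on a grid matching E[f_k] = targets[k].

    The solution has the form p(theta) proportional to exp(sum_k lam_k f_k(theta)).
    """
    F = np.stack([f(grid) for f in features], axis=1)     # (n_grid, K)
    lam = np.zeros(F.shape[1])
    for _ in range(n_iter):
        logits = F @ lam
        w = np.exp(logits - logits.max())
        p = w / w.sum()
        grad = p @ F - targets   # gradient of the convex dual log Z(lam) - lam . targets
        lam -= lr * grad
    return lam, p


grid = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
kappa, mu = 2.0, 1.0
ref = np.exp(kappa * np.cos(grid - mu))
ref /= ref.sum()                                           # discretised von Mises density
targets = np.array([ref @ np.cos(grid), ref @ np.sin(grid)])
lam, p = maxent_circle([np.cos, np.sin], targets, grid)
```

Matching the first circular moments of a von Mises density recovers its parameters as λ = (κ cos μ, κ sin μ). A step size of 1 is safe here: with features cos θ and sin θ, the Hessian of the dual is a covariance matrix whose eigenvalues are at most 1.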