965 resultados para cartographic generalisation
Resumo:
Neural network learning rules can be viewed as statistical estimators. They should be studied in Bayesian framework even if they are not Bayesian estimators. Generalisation should be measured by the divergence between the true distribution and the estimated distribution. Information divergences are invariant measurements of the divergence between two distributions. The posterior average information divergence is used to measure the generalisation ability of a network. The optimal estimators for multinomial distributions with Dirichlet priors are studied in detail. This confirms that the definition is compatible with intuition. The results also show that many commonly used methods can be put under this unified framework, by assume special priors and special divergences.
Resumo:
A family of measurements of generalisation is proposed for estimators of continuous distributions. In particular, they apply to neural network learning rules associated with continuous neural networks. The optimal estimators (learning rules) in this sense are Bayesian decision methods with information divergence as loss function. The Bayesian framework guarantees internal coherence of such measurements, while the information geometric loss function guarantees invariance. The theoretical solution for the optimal estimator is derived by a variational method. It is applied to the family of Gaussian distributions and the implications are discussed. This is one in a series of technical reports on this topic; it generalises the results of ¸iteZhu95:prob.discrete to continuous distributions and serve as a concrete example of a larger picture ¸iteZhu95:generalisation.
Resumo:
Neural networks can be regarded as statistical models, and can be analysed in a Bayesian framework. Generalisation is measured by the performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; Using information divergence guarantees invariance with respect to representation. The theory generalises the least mean squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1)~the ideal optimal estimate is always given by average over the posterior; (2)~the optimal estimate within a computational model is given by the projection of the ideal estimate to the model. This incidentally shows some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the dlt dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces Ld and Ldd. It therefore offers conceptual simplification to information geometry. The general conclusion on the issue of evaluating neural network learning rules and other statistical inference methods is that such evaluations are only meaningful under three assumptions: The prior P(p), describing the environment of all the problems; the divergence Dd, specifying the requirement of the task; and the model Q, specifying available computing resources.
Resumo:
Neural networks are statistical models and learning rules are estimators. In this paper a theory for measuring generalisation is developed by combining Bayesian decision theory with information geometry. The performance of an estimator is measured by the information divergence between the true distribution and the estimate, averaged over the Bayesian posterior. This unifies the majority of error measures currently in use. The optimal estimators also reveal some intricate interrelationships among information geometry, Banach spaces and sufficient statistics.
Resumo:
The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.
Resumo:
We explore the dependence of performance measures, such as the generalization error and generalization consistency, on the structure and the parameterization of the prior on `rules', instanced here by the noisy linear perceptron. Using a statistical mechanics framework, we show how one may assign values to the parameters of a model for a `rule' on the basis of data instancing the rule. Information about the data, such as input distribution, noise distribution and other `rule' characteristics may be embedded in the form of general gaussian priors for improving net performance. We examine explicitly two types of general gaussian priors which are useful in some simple cases. We calculate the optimal values for the parameters of these priors and show their effect in modifying the most probable, MAP, values for the rules.
Resumo:
The assessment of the reliability of systems which learn from data is a key issue to investigate thoroughly before the actual application of information processing techniques to real-world problems. Over the recent years Gaussian processes and Bayesian neural networks have come to the fore and in this thesis their generalisation capabilities are analysed from theoretical and empirical perspectives. Upper and lower bounds on the learning curve of Gaussian processes are investigated in order to estimate the amount of data required to guarantee a certain level of generalisation performance. In this thesis we analyse the effects on the bounds and the learning curve induced by the smoothness of stochastic processes described by four different covariance functions. We also explain the early, linearly-decreasing behaviour of the curves and we investigate the asymptotic behaviour of the upper bounds. The effect of the noise and the characteristic lengthscale of the stochastic process on the tightness of the bounds are also discussed. The analysis is supported by several numerical simulations. The generalisation error of a Gaussian process is affected by the dimension of the input vector and may be decreased by input-variable reduction techniques. In conventional approaches to Gaussian process regression, the positive definite matrix estimating the distance between input points is often taken diagonal. In this thesis we show that a general distance matrix is able to estimate the effective dimensionality of the regression problem as well as to discover the linear transformation from the manifest variables to the hidden-feature space, with a significant reduction of the input dimension. Numerical simulations confirm the significant superiority of the general distance matrix with respect to the diagonal one.In the thesis we also present an empirical investigation of the generalisation errors of neural networks trained by two Bayesian algorithms, the Markov Chain Monte Carlo method and the evidence framework; the neural networks have been trained on the task of labelling segmented outdoor images.
Resumo:
Coinduction is a method of growing importance in reasoning about functional languages, due to the increasing prominence of lazy data structures. Through the use of bisimulations and proofs that bisimilarity is a congruence in various domains it can be used to prove the congruence of two processes. A coinductive proof requires a relation to be chosen which can be proved to be a bisimulation. We use proof planning to develop a heuristic method which automatically constucts a candidate relation. If this relation doesn't allow the proof to go through a proof critic analyses the reasons why it failed and modifies the relation accordingly. Several proof tools have been developed to aid coinductive proofs but all require user interaction. Crucially they require the user to supply an appropriate relation which the system can then prove to be a bisimulation.
Resumo:
Intrusion Detection Systems (IDSs) provide an important layer of security for computer systems and networks, and are becoming more and more necessary as reliance on Internet services increases and systems with sensitive data are more commonly open to Internet access. An IDS’s responsibility is to detect suspicious or unacceptable system and network activity and to alert a systems administrator to this activity. The majority of IDSs use a set of signatures that define what suspicious traffic is, and Snort is one popular and actively developing open-source IDS that uses such a set of signatures known as Snort rules. Our aim is to identify a way in which Snort could be developed further by generalising rules to identify novel attacks. In particular, we attempted to relax and vary the conditions and parameters of current Snort rules, using a similar approach to classic rule learning operators such as generalisation and specialisation. We demonstrate the effectiveness of our approach through experiments with standard datasets and show that we are able to detect previously undetected variants of various attacks. We conclude by discussing the general effectiveness and appropriateness of generalisation in Snort based IDS rule processing. Keywords: anomaly detection, intrusion detection, Snort, Snort rules
Resumo:
Intrusion Detection Systems (IDSs) provide an important layer of security for computer systems and networks, and are becoming more and more necessary as reliance on Internet services increases and systems with sensitive data are more commonly open to Internet access. An IDS’s responsibility is to detect suspicious or unacceptable system and network activity and to alert a systems administrator to this activity. The majority of IDSs use a set of signatures that define what suspicious traffic is, and Snort is one popular and actively developing open-source IDS that uses such a set of signatures known as Snort rules. Our aim is to identify a way in which Snort could be developed further by generalising rules to identify novel attacks. In particular, we attempted to relax and vary the conditions and parameters of current Snort rules, using a similar approach to classic rule learning operators such as generalisation and specialisation. We demonstrate the effectiveness of our approach through experiments with standard datasets and show that we are able to detect previously undetected variants of various attacks. We conclude by discussing the general effectiveness and appropriateness of generalisation in Snort based IDS rule processing. Keywords: anomaly detection, intrusion detection, Snort, Snort rules
Resumo:
An exploratory case study which seeks to understand better the problem of low participation rates of women in Information Communication Technology (ICT) is currently being conducted in Queensland, Australia. Contextualised within the Digital Content Industry (DCI) multimedia and games production sectors, the emphasis is on women employed as interactive content creators rather than as users of the technologies. Initial findings provide rich descriptive insights into the perceptions and experiences of female DCI professionals. Influences on participation such as: existing gender ratios, gender and occupational stereotypes, access into the industry and future parental responsibilities have emerged from the data. Bandura’s (1999) Social Cognitive Theory (SCT) is used as a “scaffold” (Walsham, 1995:76) to guide data analysis and assist analytic generalisation of the case study findings. We propose that the lens of human agency and theories such as SCT assist in explaining how influences are manifested and affect women’s agency and ultimately participation in the DCI. The Sphere of Influence conceptual model (Geneve et al, 2008), which emerges from the data and underpinning theory, is proposed as a heuristic framework to further explore influences on women’s participation in the DCI industry context.
Resumo:
Maps have been published on the world wide web since its inception (Cartwright, 1999) and are still accessed and viewed by millions of users today (Peterson, 2003). While early webbased GIS products lacked a complete set of cartographic capabilities, the functionality within such systems has significantly increased over recent years. Functionalities once found only in desktop GIS products are now available in web-based GIS applications, for example, data entry, basic editing, and analysis. Applications based on web-GIS are becoming more widespread and the web-based GIS environment is replacing the traditional desktop GIS platforms in many organizations. Therefore, development of a new cartographic method for web-based GIS is vital. The broad aim of this project is to examine and discuss the challenges and opportunities of innovative cartography methods for web-based GIS platforms. The work introduces a recently developed cartographic methodology, which is based on a web-based GIS portal by the Survey of Israel (SOI). The work discusses the prospects and constraints of such methods in improving web-GIS interfaces and usability for the end user. The work also tables the preliminary findings of the initial implementation of the web-based GIS cartographic method within the portal of the Survey of Israel, as well as the applicability of those methods elsewhere.
Resumo:
Mapping the physical world, the arrangement of continents and oceans, cities and villages, mountains and deserts, while not without its own contentious aspects, can at least draw upon centuries of previous work in cartography and discovery. To map virtual spaces is another challenge altogether. Are cartographic conventions applicable to depictions of the blogosphere, or the internet in general? Is a more mathematical approach required to even start to make sense of the shape of the blogosphere, to understand the network created by and between blogs? With my research comparing information flows in the Australian and French political blogs, visualising the data obtained is important as it can demonstrate the spread of ideas and topics across blogs. However, how best to depict the flows, links, and the spaces between is still unclear. Is network theory and systems of hubs and nodes more relevant than mass communication theories to the research at hand, influencing the nature of any map produced? Is it even a good idea to try and apply boundaries like ‘Australian’ and ‘French’ to parts of a map that does not reflect international borders or the Mercator projection? While drawing upon some of my work-in-progress, this paper will also evaluate previous maps of the blogosphere and approaches to depicting networks of blogs. As such, the paper will provide a greater awareness of the tools available and the strengths and limitations of mapping methodologies, helping to shape the direction of my research in a field still very much under development.
Resumo:
Financial processes may possess long memory and their probability densities may display heavy tails. Many models have been developed to deal with this tail behaviour, which reflects the jumps in the sample paths. On the other hand, the presence of long memory, which contradicts the efficient market hypothesis, is still an issue for further debates. These difficulties present challenges with the problems of memory detection and modelling the co-presence of long memory and heavy tails. This PhD project aims to respond to these challenges. The first part aims to detect memory in a large number of financial time series on stock prices and exchange rates using their scaling properties. Since financial time series often exhibit stochastic trends, a common form of nonstationarity, strong trends in the data can lead to false detection of memory. We will take advantage of a technique known as multifractal detrended fluctuation analysis (MF-DFA) that can systematically eliminate trends of different orders. This method is based on the identification of scaling of the q-th-order moments and is a generalisation of the standard detrended fluctuation analysis (DFA) which uses only the second moment; that is, q = 2. We also consider the rescaled range R/S analysis and the periodogram method to detect memory in financial time series and compare their results with the MF-DFA. An interesting finding is that short memory is detected for stock prices of the American Stock Exchange (AMEX) and long memory is found present in the time series of two exchange rates, namely the French franc and the Deutsche mark. Electricity price series of the five states of Australia are also found to possess long memory. For these electricity price series, heavy tails are also pronounced in their probability densities. The second part of the thesis develops models to represent short-memory and longmemory financial processes as detected in Part I. These models take the form of continuous-time AR(∞) -type equations whose kernel is the Laplace transform of a finite Borel measure. By imposing appropriate conditions on this measure, short memory or long memory in the dynamics of the solution will result. A specific form of the models, which has a good MA(∞) -type representation, is presented for the short memory case. Parameter estimation of this type of models is performed via least squares, and the models are applied to the stock prices in the AMEX, which have been established in Part I to possess short memory. By selecting the kernel in the continuous-time AR(∞) -type equations to have the form of Riemann-Liouville fractional derivative, we obtain a fractional stochastic differential equation driven by Brownian motion. This type of equations is used to represent financial processes with long memory, whose dynamics is described by the fractional derivative in the equation. These models are estimated via quasi-likelihood, namely via a continuoustime version of the Gauss-Whittle method. The models are applied to the exchange rates and the electricity prices of Part I with the aim of confirming their possible long-range dependence established by MF-DFA. The third part of the thesis provides an application of the results established in Parts I and II to characterise and classify financial markets. We will pay attention to the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), the NASDAQ Stock Exchange (NASDAQ) and the Toronto Stock Exchange (TSX). The parameters from MF-DFA and those of the short-memory AR(∞) -type models will be employed in this classification. We propose the Fisher discriminant algorithm to find a classifier in the two and three-dimensional spaces of data sets and then provide cross-validation to verify discriminant accuracies. This classification is useful for understanding and predicting the behaviour of different processes within the same market. The fourth part of the thesis investigates the heavy-tailed behaviour of financial processes which may also possess long memory. We consider fractional stochastic differential equations driven by stable noise to model financial processes such as electricity prices. The long memory of electricity prices is represented by a fractional derivative, while the stable noise input models their non-Gaussianity via the tails of their probability density. A method using the empirical densities and MF-DFA will be provided to estimate all the parameters of the model and simulate sample paths of the equation. The method is then applied to analyse daily spot prices for five states of Australia. Comparison with the results obtained from the R/S analysis, periodogram method and MF-DFA are provided. The results from fractional SDEs agree with those from MF-DFA, which are based on multifractal scaling, while those from the periodograms, which are based on the second order, seem to underestimate the long memory dynamics of the process. This highlights the need and usefulness of fractal methods in modelling non-Gaussian financial processes with long memory.