660 results for Bobek, Michal
Abstract:
The n-tuple pattern recognition method has been tested using a selection of 11 large data sets from the European Community StatLog project, so that the results could be compared with those reported for the 23 other algorithms the project tested. The results indicate that this ultra-fast memory-based method is a viable competitor with the others, which include optimisation-based neural network algorithms, even though the theory of memory-based neural computing is less highly developed in terms of statistical theory.
Abstract:
The n-tuple recognition method was tested on 11 large real-world data sets and its performance compared to 23 other classification algorithms. On 7 of these, the results show no systematic performance gap between the n-tuple method and the others. Evidence was found to support a possible explanation for why the n-tuple method yields poor results for certain datasets. Preliminary empirical results of a study of the confidence interval (the difference between the two highest scores) are also reported. These suggest a counter-intuitive correlation between the confidence interval distribution and the overall classification performance of the system.
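As a rough illustration of the kind of classifier discussed above, the following is a minimal sketch of a binary n-tuple classifier, assuming inputs are already encoded as fixed-length bit lists. The class name, parameters and random tuple selection are illustrative assumptions, not the authors' implementation. The second value returned by `classify` is the difference between the two highest class scores, which the abstract calls the confidence interval.

```python
import random
from collections import defaultdict


class NTupleClassifier:
    """Hypothetical binary n-tuple (RAM-based) classifier over bit vectors."""

    def __init__(self, input_bits, n, num_tuples, seed=0):
        rng = random.Random(seed)
        # Each tuple samples n distinct bit positions of the input at random.
        self.tuples = [rng.sample(range(input_bits), n) for _ in range(num_tuples)]
        # memory[label][k] holds the addresses tuple k has seen for that label.
        self.memory = defaultdict(lambda: [set() for _ in range(num_tuples)])

    def _addresses(self, bits):
        # The address of a tuple is the pattern of bits at its sampled positions.
        return [tuple(bits[i] for i in positions) for positions in self.tuples]

    def train(self, bits, label):
        # Training is pure memorisation: record each address once per class.
        for store, addr in zip(self.memory[label], self._addresses(bits)):
            store.add(addr)

    def scores(self, bits):
        # A class scores one point for every tuple whose address it has seen.
        addrs = self._addresses(bits)
        return {
            label: sum(addr in store for store, addr in zip(stores, addrs))
            for label, stores in self.memory.items()
        }

    def classify(self, bits):
        # Return the winning label and the margin over the runner-up score.
        ranked = sorted(self.scores(bits).items(), key=lambda kv: kv[1], reverse=True)
        best_label, best = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        return best_label, best - runner_up
```

Training needs a single pass with no cost-function minimisation, which is where the speed of the method comes from.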
Abstract:
We present results concerning the application of the Good-Turing (GT) estimation method to the frequentist n-tuple system. We show that the Good-Turing method can, to a certain extent, rectify the Zero Frequency Problem by providing, within a formal framework, improved estimates of small tallies. We also show that it leads to better tuple system performance than Maximum Likelihood estimation (MLE). However, preliminary experimental results suggest that replacing zero tallies with an arbitrary constant close to zero before MLE yields better performance than that of the GT system.
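The two estimators compared above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the basic, unsmoothed Good-Turing formula r* = (r + 1) N_{r+1} / N_r, falls back to the raw tally when N_{r+1} is zero, and renormalises after flooring zero tallies. The function names and the `floor` parameter are hypothetical, and the abstract does not specify these details.

```python
from collections import Counter


def good_turing_counts(tallies):
    """Map each observed tally r to its basic Good-Turing adjusted count r*."""
    freq_of_freq = Counter(tallies)  # N_r: number of addresses seen exactly r times
    adjusted = {}
    for r in sorted(freq_of_freq):
        n_r = freq_of_freq[r]
        n_next = freq_of_freq.get(r + 1, 0)
        # r* = (r + 1) * N_{r+1} / N_r; keep r unchanged when N_{r+1} is zero
        # (an assumption made here to avoid collapsing the largest tallies to 0).
        adjusted[r] = (r + 1) * n_next / n_r if n_next else float(r)
    return adjusted


def mle_with_floor(tallies, floor=1e-3):
    """Maximum Likelihood estimates after replacing zero tallies with a small constant."""
    floored = [t if t > 0 else floor for t in tallies]
    total = sum(floored)
    return [t / total for t in floored]
```

With tallies `[0, 0, 0, 0, 1, 1, 2, 3]`, for instance, the zero tallies receive an adjusted count of 0.5 rather than 0, which is how Good-Turing addresses the Zero Frequency Problem.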
Abstract:
The n-tuple recognition method is briefly reviewed, summarizing the main theoretical results. Large-scale experiments carried out on Stat-Log project datasets confirm this method as a viable competitor to more popular methods due to its speed, simplicity, and accuracy on the majority of a wide variety of classification problems. A further investigation into the failure of the method on certain datasets finds the problem to be largely due to a mismatch between the scales which describe generalization and data sparseness.
Abstract:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
Abstract:
Most object-based approaches to Geographical Information Systems (GIS) have concentrated on the representation of geometric properties of objects in terms of fixed geometry. In our road traffic marking application domain we have a requirement to represent the static locations of the road markings but also enforce the associated regulations, which are typically geometric in nature. For example, a give-way line of a pedestrian crossing in the UK must be within 1100-3000 mm of the edge of the crossing pattern. In previous studies of the application of spatial rules (often called 'business logic') in GIS, emphasis has been placed on the representation of topological constraints and data integrity checks. There is very little GIS literature that describes models for geometric rules, although there are some examples in the Computer Aided Design (CAD) literature. This paper introduces some of the ideas from so-called variational CAD models to the GIS application domain, and extends these using a Geography Markup Language (GML) based representation. In our application we have an additional requirement: the geometric rules are often changed and vary from country to country, so they should be represented in a flexible manner. In this paper we describe an elegant solution to the representation of geometric rules, such as requiring lines to be offset from other objects. The method uses a feature-property model embraced in GML 3.1 and extends the possible relationships in feature collections to permit the application of parameterized geometric constraints to sub-features. We show the parametric rule model we have developed and discuss the advantage of using simple parametric expressions in the rule base. We discuss the possibilities and limitations of our approach and relate our data model to GML 3.1. © 2006 Springer-Verlag Berlin Heidelberg.
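The idea of a parameterized geometric rule can be illustrated with a small sketch. It assumes a rule is simply a named offset range that can be swapped per country. The `OffsetRule` class and `UK_GIVE_WAY` constant are hypothetical names, and this is not the GML-based model the paper describes. Only the 1100-3000 mm range comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRule:
    """Hypothetical parametric rule: a feature must lie within a distance range
    (in millimetres) of a reference feature."""

    name: str
    min_mm: float
    max_mm: float

    def is_satisfied(self, offset_mm: float) -> bool:
        # The rule holds when the measured offset falls inside the allowed range.
        return self.min_mm <= offset_mm <= self.max_mm


# UK example from the abstract: give-way line 1100-3000 mm from the crossing edge.
UK_GIVE_WAY = OffsetRule("give-way line to crossing edge", 1100, 3000)
```

Keeping the limits as data rather than code means a rule change, or a different national regulation, only requires a new `OffsetRule` instance.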
Abstract:
We present an implementation of the domain-theoretic Picard method for solving initial value problems (IVPs) introduced by Edalat and Pattinson [1]. Compared to Edalat and Pattinson's implementation, our algorithm uses a more efficient arithmetic based on an arbitrary precision floating-point library. Despite the additional overestimations due to floating-point rounding, we obtain a similar bound on the convergence rate of the produced approximations. Moreover, our convergence analysis is detailed enough to allow a static optimisation in the growth of the precision used in successive Picard iterations. Such optimisation greatly improves the efficiency of the solving process. Although a similar optimisation could be performed dynamically without our analysis, a static one gives us a significant advantage: we are able to predict the time it will take the solver to obtain an approximation of a certain (arbitrarily high) quality.
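For orientation, the classical Picard iteration that the domain-theoretic method builds on can be sketched for the IVP y' = y, y(0) = y0. This sketch uses exact rational polynomial coefficients. It is not the domain-theoretic algorithm: it computes no interval enclosures and does not use the arbitrary-precision floating-point arithmetic of the paper. The function names are illustrative assumptions.

```python
from fractions import Fraction


def integrate(poly):
    """Integrate a polynomial from 0 to t (coefficients listed lowest degree first)."""
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(poly)]


def picard(y0, iterations):
    """Classical Picard iterates for y' = y, y(0) = y0, as exact coefficients."""
    y = [Fraction(y0)]
    for _ in range(iterations):
        # y_{k+1}(t) = y0 + integral from 0 to t of f(y_k(s)) ds, with f(y) = y.
        nxt = integrate(y)
        nxt[0] += Fraction(y0)
        y = nxt
    return y
```

For y0 = 1, each iteration adds the next term of the Taylor series of exp(t), so three iterations give 1 + t + t^2/2 + t^3/6.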
Abstract:
Linear typing schemes can be used to guarantee non-interference and so the soundness of in-place update with respect to a functional semantics. But linear schemes are restrictive in practice, and more restrictive than necessary to guarantee soundness of in-place update. This limitation has prompted research into static analysis and more sophisticated typing disciplines to determine when in-place update may be safely used, or to combine linear and non-linear schemes. Here we contribute to this direction by defining a new typing scheme that better approximates the semantic property of soundness of in-place update for a functional semantics. We begin from the observation that some data are used only in a read-only context, after which they may be safely re-used before being destroyed. Formalising the in-place update interpretation in a machine model semantics allows us to refine this observation, motivating three usage aspects apparent from the semantics that are used to annotate function argument types. The aspects are (1) used destructively, (2) used read-only but shared with the result, and (3) used read-only and not shared with the result. The main novelty is aspect (2), which allows a linear value to be safely read and even aliased with a result of a function without being consumed. This novelty makes our type system more expressive than previous systems for functional languages in the literature. The system remains simple and intuitive, but it enjoys a strong soundness property whose proof is non-trivial. Moreover, our analysis features principal types and feasible type reconstruction, as shown in M. Konečný (In TYPES 2002 workshop, Nijmegen, Proceedings, Springer-Verlag, 2003).
Abstract:
Two energy grass species, switch grass, a North American tuft grass, and reed canary grass, a European native, are likely to be important sources of biomass in Western Europe for the production of biorenewable energy. Matching chemical composition to conversion efficiency is a primary goal for improvement programmes and for determining the quality of biomass feed-stocks prior to use, and there is a need for methods which allow cost-effective characterisation of chemical composition at high rates of sample throughput. In this paper we demonstrate that nitrogen content and alkali index, parameters greatly influencing thermal conversion efficiency, can be accurately predicted in dried samples of these species grown under a range of agronomic conditions by partial least squares regression of Fourier transform infrared spectra (R^2 values for plots of predicted vs. measured values of 0.938 and 0.937, respectively). We also discuss the prediction of carbon and ash content in these samples and the application of infrared-based predictive methods for the breeding improvement of energy grasses.
Abstract:
Urine proteomics is emerging as a powerful tool for biomarker discovery. The purpose of this study is the development of a well-characterized "real life" sample that can be used as a reference standard in urine clinical proteomics studies.
Abstract:
The subject of this thesis is the n-tuple network (RAMnet). The major advantage of RAMnets is their speed and the simplicity with which they can be implemented in parallel hardware. On the other hand, this method is not a universal approximator and the training procedure does not involve the minimisation of a cost function. Hence RAMnets are potentially sub-optimal. It is important to understand the source of this sub-optimality and to develop the analytical tools that allow us to quantify the generalisation cost of using this model for any given data. We view RAMnets as classifiers and function approximators and try to determine how critical their lack of universality and optimality is. In order to understand better the inherent restrictions of the model, we review RAMnets showing their relationship to a number of well established general models such as: Associative Memories, Kanerva's Sparse Distributed Memory, Radial Basis Functions, General Regression Networks and Bayesian Classifiers. We then benchmark the binary RAMnet model against 23 other algorithms using real-world data from the StatLog Project. This large scale experimental study indicates that RAMnets are often capable of delivering results which are competitive with those obtained by more sophisticated, computationally expensive models. The Frequency Weighted version is also benchmarked and shown to perform worse than the binary RAMnet for large values of the tuple size n. We demonstrate that the main issue in Frequency Weighted RAMnets is adequate probability estimation and propose Good-Turing estimates in place of the more commonly used Maximum Likelihood estimates. Having established the viability of the method numerically, we focus on providing an analytical framework that allows us to quantify the generalisation cost of RAMnets for a given dataset. For the classification network we provide a semi-quantitative argument which is based on the notion of Tuple distance.
It gives a good indication of whether the network will fail for the given data. A rigorous Bayesian framework with Gaussian process prior assumptions is given for the regression n-tuple net. We show how to calculate the generalisation cost of this net and verify the results numerically for one-dimensional noisy interpolation problems. We conclude that the n-tuple method of classification based on memorisation of random features can be a powerful alternative to slower cost-driven models. The speed of the method is at the expense of its optimality. RAMnets will fail for certain datasets but the cases when they do so are relatively easy to determine with the analytical tools we provide.
Abstract:
We address the question of how to communicate among distributed processes values such as real numbers, continuous functions and geometrical solids with arbitrary precision, yet efficiently. We extend the established concept of lazy communication using streams of approximants by introducing explicit queries. We formalise this approach using protocols of a query-answer nature. Such protocols enable processes to provide valid approximations with certain accuracy and focusing on certain locality as demanded by the receiving processes through queries. A lattice-theoretic denotational semantics of channel and process behaviour is developed. The query space is modelled as a continuous lattice in which the top element denotes the query demanding all the information, whereas other elements denote queries demanding partial and/or local information. Answers are interpreted as elements of lattices constructed over suitable domains of approximations to the exact objects. An unanswered query is treated as an error and denoted using the top element. The major novel characteristic of our semantic model is that it reflects the dependency of answers on queries. This enables the definition and analysis of an appropriate concept of convergence rate, by assigning an effort indicator to each query and a measure of information content to each answer. Thus we capture not only what function a process computes, but also how a process transforms the convergence rates from its inputs to its outputs. In future work these indicators can be used to capture further computational complexity measures. A robust prototype implementation of our model is available.
Abstract:
We develop and study the concept of dataflow process networks as used for example by Kahn to suit exact computation over data types related to real numbers, such as continuous functions and geometrical solids. Furthermore, we consider communicating these exact objects among processes using protocols of a query-answer nature as introduced in our earlier work. This enables processes to provide valid approximations with certain accuracy and focusing on certain locality as demanded by the receiving processes through queries. We define domain-theoretical denotational semantics of our networks in two ways: (1) directly, i.e. by viewing the whole network as a composite process and applying the process semantics introduced in our earlier work; and (2) compositionally, i.e. by a fixed-point construction similar to that used by Kahn from the denotational semantics of individual processes in the network. The direct semantics closely corresponds to the operational semantics of the network (i.e. it is correct) but is very difficult to study for concrete networks. The compositional semantics enables compositional analysis of concrete networks, assuming it is correct. We prove that the compositional semantics is a safe approximation of the direct semantics. We also provide a method that can be used in many cases to establish that the two semantics fully coincide, i.e. safety is not achieved through inactivity or meaningless answers. The results are extended to cover recursively-defined infinite networks as well as nested finite networks. A robust prototype implementation of our model is available.
Abstract:
Background - Previous Cochrane reviews have considered the use of cholinesterase inhibitors in both Parkinson's disease with dementia (PDD) and dementia with Lewy bodies (DLB). The clinical features of DLB and PDD have much in common and are distinguished primarily on the basis of whether or not parkinsonism precedes dementia by more than a year. Patients with both conditions have particularly severe deficits in cortical levels of the neurotransmitter acetylcholine. Therefore, blocking its breakdown using cholinesterase inhibitors may lead to clinical improvement. Objectives - To assess the efficacy, safety and tolerability of cholinesterase inhibitors in dementia with Lewy bodies (DLB), Parkinson’s disease with dementia (PDD), and cognitive impairment in Parkinson’s disease falling short of dementia (CIND-PD) (considered as separate phenomena and also grouped together as Lewy body disease). Search methods - The trials were identified from a search of ALOIS, the Specialised Register of the Cochrane Dementia and Cognitive Improvement Group (on 30 August 2011) using the search terms Lewy, Parkinson, PDD, DLB, LBD. This register consists of records from major healthcare databases (MEDLINE, EMBASE, PsycINFO, CINAHL) and many ongoing trial databases and is updated regularly. Reference lists of relevant studies were searched for additional trials. Selection criteria - Randomised, double-blind, placebo-controlled trials assessing the efficacy of treatment with cholinesterase inhibitors in DLB, PDD and cognitive impairment in Parkinson’s disease (CIND-PD). Data collection and analysis - Data were extracted from published reports by one review author (MR). The data for each 'condition' (that is DLB, PDD or CIND-PD) were considered separately and, where possible, also pooled together. Statistical analysis was conducted using Review Manager version 5.0. Main results - Six trials met the inclusion criteria for this review, in which a total of 1236 participants were randomised. 
Four of the trials were of a parallel group design and two cross-over trials were included. Four of the trials included participants with a diagnosis of Parkinson's disease with dementia (Aarsland 2002a; Dubois 2007; Emre 2004; Ravina 2005), of which Dubois 2007 remains unpublished. Leroi 2004 included patients with cognitive impairment and Parkinson's disease (both with and without dementia). Patients with dementia with Lewy bodies (DLB) were included in only one of the trials (McKeith 2000). For global assessment, three trials comparing cholinesterase inhibitor treatment to placebo in PDD (Aarsland 2002a; Emre 2004; Ravina 2005) reported a difference in the Alzheimer's Disease Cooperative Study-Clinical Global Impression of Change (ADCS-CGIC) score of -0.38, favouring the cholinesterase inhibitors (95% CI -0.56 to -0.24, P < 0.0001). For cognitive function, a pooled estimate of the effect of cholinesterase inhibitors on cognitive function measures was consistent with the presence of a therapeutic benefit (standardised mean difference (SMD) -0.34, 95% CI -0.46 to -0.23, P < 0.00001). There was evidence of a positive effect of cholinesterase inhibitors on the Mini-Mental State Examination (MMSE) in patients with PDD (WMD 1.09, 95% CI 0.45 to 1.73, P = 0.0008) and in the single PDD and CIND-PD trial (WMD 1.05, 95% CI 0.42 to 1.68, P = 0.01) but not in the single DLB trial. For behavioural disturbance, analysis of the pooled continuous data relating to behavioural disturbance rating scales favoured treatment with cholinesterase inhibitors (SMD -0.20, 95% CI -0.36 to -0.04, P = 0.01). For activities of daily living, combined data for the ADCS and the Unified Parkinson's Disease Rating Scale (UPDRS) activities of daily living rating scales favoured treatment with cholinesterase inhibitors (SMD -0.20, 95% CI -0.38 to -0.02, P = 0.03). 
For safety and tolerability, those taking a cholinesterase inhibitor were more likely to experience an adverse event (318/452 versus 668/842; odds ratio (OR) 1.64, 95% CI 1.26 to 2.15, P = 0.0003) and to drop out (128/465 versus 45/279; OR 1.94, 95% CI 1.33 to 2.84, P = 0.0006). Adverse events were more common amongst those taking rivastigmine (357/421 versus 173/240; OR 2.28, 95% CI 1.53 to 3.38, P < 0.0001) but not those taking donepezil (311/421 versus 145/212; OR 1.24, 95% CI 0.86 to 1.80, P = 0.25). Parkinsonian symptoms, in particular tremor (64/739 versus 12/352; OR 2.71, 95% CI 1.44 to 5.09, P = 0.002), but not falls (P = 0.39), were reported more commonly in the treatment group, but this did not have a significant impact on the UPDRS (total and motor) scores (P = 0.71). Fewer deaths occurred in the treatment group than in the placebo group (4/465 versus 9/279; OR 0.28, 95% CI 0.09 to 0.84, P = 0.03). Authors' conclusions - The currently available evidence supports the use of cholinesterase inhibitors in patients with PDD, with a positive impact on global assessment, cognitive function, behavioural disturbance and activities of daily living rating scales. The effect in DLB remains unclear. There is no current disaggregated evidence to support their use in CIND-PD.