936 resultados para Lexical error
Resumo:
We present a theoretical method for a direct evaluation of the average and reliability error exponents in low-density parity-check error-correcting codes using methods of statistical physics. Results for the binary symmetric channel are presented for codes of both finite and infinite connectivity.
Resumo:
To describe the prevalence of refractive error (myopia and hyperopia) and visual impairment in a representative sample of white school children.
Resumo:
Derivational morphology proposes meaningful connections between words and is largely unrepresented in lexical databases. This thesis presents a project to enrich a lexical database with morphological links and to evaluate their contribution to disambiguation. A lexical database with sense distinctions was required. WordNet was chosen because of its free availability and widespread use. Its suitability was assessed through critical evaluation with respect to specifications and criticisms, using a transparent, extensible model. The identification of serious shortcomings suggested a portable enrichment methodology, applicable to alternative resources. Although 40% of the most frequent words are prepositions, they have been largely ignored by computational linguists, so addition of prepositions was also required. The preferred approach to morphological enrichment was to infer relations from phenomena discovered algorithmically. Both existing databases and existing algorithms can capture regular morphological relations, but cannot capture exceptions correctly; neither of them provide any semantic information. Some morphological analysis algorithms are subject to the fallacy that morphological analysis can be performed simply by segmentation. Morphological rules, grounded in observation and etymology, govern associations between and attachment of suffixes and contribute to defining the meaning of morphological relationships. Specifying character substitutions circumvents the segmentation fallacy. Morphological rules are prone to undergeneration, minimised through a variable lexical validity requirement, and overgeneration, minimised by rule reformulation and restricting monosyllabic output. Rules take into account the morphology of ancestor languages through co-occurrences of morphological patterns. Multiple rules applicable to an input suffix need their precedence established. The resistance of prefixations to segmentation has been addressed by identifying linking vowel exceptions and irregular prefixes. The automatic affix discovery algorithm applies heuristics to identify meaningful affixes and is combined with morphological rules into a hybrid model, fed only with empirical data, collected without supervision. Further algorithms apply the rules optimally to automatically pre-identified suffixes and break words into their component morphemes. To handle exceptions, stoplists were created in response to initial errors and fed back into the model through iterative development, leading to 100% precision, contestable only on lexicographic criteria. Stoplist length is minimised by special treatment of monosyllables and reformulation of rules. 96% of words and phrases are analysed. 218,802 directed derivational links have been encoded in the lexicon rather than the wordnet component of the model because the lexicon provides the optimal clustering of word senses. Both links and analyser are portable to an alternative lexicon. The evaluation uses the extended gloss overlaps disambiguation algorithm. The enriched model outperformed WordNet in terms of recall without loss of precision. Failure of all experiments to outperform disambiguation by frequency reflects on WordNet sense distinctions.
Resumo:
In a group of adult dyslexics word reading and, especially, word spelling are predicted more by what we have called lexical learning (tapped by a paired-associate task with pictures and written nonwords) than by phonological skills. Nonword reading and spelling, instead, are not associated with this task but they are predicted by phonological tasks. Consistently, surface and phonological dyslexics show opposite profiles on lexical learning and phonological tasks. The phonological dyslexics are more impaired on the phonological tasks, while the surface dyslexics are equally or more impaired on the lexical learning tasks. Finally, orthographic lexical learning explains more variation in spelling than in reading, and subtyping based on spelling returns more interpretable results than that based on reading. These results suggest that the quality of lexical representations is crucial to adult literacy skills. This is best measured by spelling and best predicted by a task of lexical learning. We hypothesize that lexical learning taps a uniquely human capacity to form new representations by recombining the units of a restricted set.
Resumo:
We investigated the ability to learn new words in a group of 22 adults with developmental dyslexia/dysgraphia and the relationship between their learning and spelling problems. We identified a deficit that affected the ability to learn both spoken and written new words (lexical learning deficit). There were no comparable problems in learning other kinds of representations (lexical/semantic and visual) and the deficit could not be explained in terms of more traditional phonological deficits associated with dyslexia (phonological awareness, phonological STM). Written new word learning accounted for further variance in the severity of the dysgraphia after phonological abilities had been partialled out. We suggest that lexical learning may be an independent ability needed to create lexical/formal representations from a series of independent units. Theoretical and clinical implications are discussed. © 2005 Psychology Press Ltd.
Resumo:
This paper is a progress report on a research path I first outlined in my contribution to “Words in Context: A Tribute to John Sinclair on his Retirement” (Heffer and Sauntson, 2000). Therefore, I first summarize that paper here, in order to provide the relevant background. The second half of the current paper consists of some further manual analyses, exploring various parameters and procedures that might assist in the design of an automated computational process for the identification of lexical sets. The automation itself is beyond the scope of the current paper.
Resumo:
Description, on English political texts, difference between lexical direction and direction in context.
Resumo:
Purpose To investigate the utility of uncorrected visual acuity measures in screening for refractive error in white school children aged 6-7-years and 12-13-years. Methods The Northern Ireland Childhood Errors of Refraction (NICER) study used a stratified random cluster design to recruit children from schools in Northern Ireland. Detailed eye examinations included assessment of logMAR visual acuity and cycloplegic autorefraction. Spherical equivalent refractive data from the right eye were used to classify significant refractive error as myopia of at least 1DS, hyperopia as greater than +3.50DS and astigmatism as greater than 1.50DC, whether it occurred in isolation or in association with myopia or hyperopia. Results Results are presented from 661 white 12-13-year-old and 392 white 6-7-year-old school-children. Using a cut-off of uncorrected visual acuity poorer than 0.20 logMAR to detect significant refractive error gave a sensitivity of 50% and specificity of 92% in 6-7-year-olds and 73% and 93% respectively in 12-13-year-olds. In 12-13-year-old children a cut-off of poorer than 0.20 logMAR had a sensitivity of 92% and a specificity of 91% in detecting myopia and a sensitivity of 41% and a specificity of 84% in detecting hyperopia. Conclusions Vision screening using logMAR acuity can reliably detect myopia, but not hyperopia or astigmatism in school-age children. Providers of vision screening programs should be cognisant that where detection of uncorrected hyperopic and/or astigmatic refractive error is an aspiration, current UK protocols will not effectively deliver.
Resumo:
Large monitoring networks are becoming increasingly common and can generate large datasets from thousands to millions of observations in size, often with high temporal resolution. Processing large datasets using traditional geostatistical methods is prohibitively slow and in real world applications different types of sensor can be found across a monitoring network. Heterogeneities in the error characteristics of different sensors, both in terms of distribution and magnitude, presents problems for generating coherent maps. An assumption in traditional geostatistics is that observations are made directly of the underlying process being studied and that the observations are contaminated with Gaussian errors. Under this assumption, sub–optimal predictions will be obtained if the error characteristics of the sensor are effectively non–Gaussian. One method, model based geostatistics, assumes that a Gaussian process prior is imposed over the (latent) process being studied and that the sensor model forms part of the likelihood term. One problem with this type of approach is that the corresponding posterior distribution will be non–Gaussian and computationally demanding as Monte Carlo methods have to be used. An extension of a sequential, approximate Bayesian inference method enables observations with arbitrary likelihoods to be treated, in a projected process kriging framework which is less computationally intensive. The approach is illustrated using a simulated dataset with a range of sensor models and error characteristics.
Resumo:
Regression problems are concerned with predicting the values of one or more continuous quantities, given the values of a number of input variables. For virtually every application of regression, however, it is also important to have an indication of the uncertainty in the predictions. Such uncertainties are expressed in terms of the error bars, which specify the standard deviation of the distribution of predictions about the mean. Accurate estimate of error bars is of practical importance especially when safety and reliability is an issue. The Bayesian view of regression leads naturally to two contributions to the error bars. The first arises from the intrinsic noise on the target data, while the second comes from the uncertainty in the values of the model parameters which manifests itself in the finite width of the posterior distribution over the space of these parameters. The Hessian matrix which involves the second derivatives of the error function with respect to the weights is needed for implementing the Bayesian formalism in general and estimating the error bars in particular. A study of different methods for evaluating this matrix is given with special emphasis on the outer product approximation method. The contribution of the uncertainty in model parameters to the error bars is a finite data size effect, which becomes negligible as the number of data points in the training set increases. A study of this contribution is given in relation to the distribution of data in input space. It is shown that the addition of data points to the training set can only reduce the local magnitude of the error bars or leave it unchanged. Using the asymptotic limit of an infinite data set, it is shown that the error bars have an approximate relation to the density of data in input space.
Resumo:
In this thesis we use statistical physics techniques to study the typical performance of four families of error-correcting codes based on very sparse linear transformations: Sourlas codes, Gallager codes, MacKay-Neal codes and Kanter-Saad codes. We map the decoding problem onto an Ising spin system with many-spins interactions. We then employ the replica method to calculate averages over the quenched disorder represented by the code constructions, the arbitrary messages and the random noise vectors. We find, as the noise level increases, a phase transition between successful decoding and failure phases. This phase transition coincides with upper bounds derived in the information theory literature in most of the cases. We connect the practical decoding algorithm known as probability propagation with the task of finding local minima of the related Bethe free-energy. We show that the practical decoding thresholds correspond to noise levels where suboptimal minima of the free-energy emerge. Simulations of practical decoding scenarios using probability propagation agree with theoretical predictions of the replica symmetric theory. The typical performance predicted by the thermodynamic phase transitions is shown to be attainable in computation times that grow exponentially with the system size. We use the insights obtained to design a method to calculate the performance and optimise parameters of the high performance codes proposed by Kanter and Saad.
Resumo:
The accuracy of altimetrically derived oceanographic and geophysical information is limited by the precision of the radial component of the satellite ephemeris. A non-dynamic technique is proposed as a method of reducing the global radial orbit error of altimetric satellites. This involves the recovery of each coefficient of an analytically derived radial error correction through a refinement of crossover difference residuals. The crossover data is supplemented by absolute height measurements to permit the retrieval of otherwise unobservable geographically correlated and linearly combined parameters. The feasibility of the radial reduction procedure is established upon application to the three day repeat orbit of SEASAT. The concept of arc aggregates is devised as a means of extending the method to incorporate longer durations, such as the 35 day repeat period of ERS-1. A continuous orbit is effectively created by including the radial misclosure between consecutive long arcs as an infallible observation. The arc aggregate procedure is validated using a combination of three successive SEASAT ephemerides. A complete simulation of the 501 revolution per 35 day repeat orbit of ERS-1 is derived and the recovery of the global radial orbit error over the full repeat period is successfully accomplished. The radial reduction is dependent upon the geographical locations of the supplementary direct height data. Investigations into the respective influences of various sites proposed for the tracking of ERS-1 by ground-based transponders are carried out. The potential effectiveness on the radial orbital accuracy of locating future tracking sites in regions of high latitudinal magnitude is demonstrated.
Resumo:
An intelligent agent, operating in an external world which cannot be fully described in its internal world model, must be able to monitor the success of a previously generated plan and to respond to any errors which may have occurred. The process of error analysis requires the ability to reason in an expert fashion about time and about processes occurring in the world. Reasoning about time is needed to deal with causality. Reasoning about processes is needed since the direct effects of a plan action can be completely specified when the plan is generated, but the indirect effects cannot. For example, the action `open tap' leads with certainty to `tap open', whereas whether there will be a fluid flow and how long it might last is more difficult to predict. The majority of existing planning systems cannot handle these kinds of reasoning, thus limiting their usefulness. This thesis argues that both kinds of reasoning require a complex internal representation of the world. The use of Qualitative Process Theory and an interval-based representation of time are proposed as a representation scheme for such a world model. The planning system which was constructed has been tested on a set of realistic planning scenarios. It is shown that even simple planning problems, such as making a cup of coffee, require extensive reasoning if they are to be carried out successfully. The final Chapter concludes that the planning system described does allow the correct solution of planning problems involving complex side effects, which planners up to now have been unable to solve.
Resumo:
The increasing cost of developing complex software systems has created a need for tools which aid software construction. One area in which significant progress has been made is with the so-called Compiler Writing Tools (CWTs); these aim at automated generation of various components of a compiler and hence at expediting the construction of complete programming language translators. A number of CWTs are already in quite general use, but investigation reveals significant drawbacks with current CWTs, such as lex and yacc. The effective use of a CWT typically requires a detailed technical understanding of its operation and involves tedious and error-prone input preparation. Moreover, CWTs such as lex and yacc address only a limited aspect of the compilation process; for example, actions necessary to perform lexical symbol valuation and abstract syntax tree construction must be explicitly coded by the user. This thesis presents a new CWT called CORGI (COmpiler-compiler from Reference Grammar Input) which deals with the entire `front-end' component of a compiler; this includes the provision of necessary data structures and routines to manipulate them, both generated from a single input specification. Compared with earlier CWTs, CORGI has a higher-level and hence more convenient user interface, operating on a specification derived directly from a `reference manual' grammar for the source language. Rather than developing a compiler-compiler from first principles, CORGI has been implemented by building a further shell around two existing compiler construction tools, namely lex and yacc. CORGI has been demonstrated to perform efficiently in realistic tests, both in terms of speed and the effectiveness of its user interface and error-recovery mechanisms.