5 results for Noisy corpora
in CORA - Cork Open Research Archive - University College Cork - Ireland
Abstract:
A model for representing music scores in a form suitable for general processing by a music analyst-programmer is proposed and implemented. Typical input to the model consists of one or more pieces of music encoded in a file-based score representation. File-based representations are unsuited to general processing, as they do not provide a suitable level of abstraction for the analyst-programmer. Instead, a representation is created that gives a programmer's view of the score. This frees the analyst-programmer from implementation details that would otherwise form a substantial barrier to progress. The score representation uses an object-oriented approach to create a natural and robust software environment for the musicologist. The system is used to explore ways in which it could benefit musicologists. Methodologies for analysing music corpora are presented in a series of analytic examples which illustrate some of the potential of this model. Proving hypotheses or performing analysis on corpora involves the construction of algorithms. Some unique aspects of using this score model for corpus-based musicology are:
- Algorithms impose a discipline which arises from the necessity for formalism.
- Automatic analysis enables musicologists to complete tasks that would otherwise be infeasible because of limitations of their energy, attentiveness, accuracy and time.
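To illustrate the kind of programmer's view of a score such a model provides, here is a minimal Python sketch of an object-oriented score representation with one corpus-level query. The class names, attributes and the interval-histogram query are illustrative assumptions, not the thesis's actual design.

```python
# Minimal sketch of an object-oriented score representation for corpus analysis.
# All names (Note, Part, Score, interval_histogram) are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Note:
    pitch: int        # MIDI pitch number
    duration: float   # in quarter-note units

@dataclass
class Part:
    name: str
    notes: List[Note] = field(default_factory=list)

@dataclass
class Score:
    title: str
    parts: List[Part] = field(default_factory=list)

def melodic_intervals(part: Part) -> List[int]:
    """Successive pitch intervals (in semitones) within one part."""
    return [b.pitch - a.pitch for a, b in zip(part.notes, part.notes[1:])]

def interval_histogram(corpus: List[Score]) -> Dict[int, int]:
    """Example corpus-level query: how often each melodic interval occurs."""
    counts: Dict[int, int] = {}
    for score in corpus:
        for part in score.parts:
            for iv in melodic_intervals(part):
                counts[iv] = counts.get(iv, 0) + 1
    return counts
```

A query like this shows the point of the abstraction: the analyst-programmer works with notes, parts and scores rather than with the encoding details of the underlying file format.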
Abstract:
Case-Based Reasoning (CBR) uses past experiences to solve new problems. The quality of the past experiences, which are stored as cases in a case base, is a major factor in the performance of a CBR system. The system's competence may be improved by adding problems to the case base after they have been solved and their solutions verified to be correct. However, from time to time, the case base may have to be refined to reduce redundancy and to remove any noisy cases that may have been introduced. Many case base maintenance algorithms have been developed to delete noisy and redundant cases. However, different algorithms work well in different situations, and it may be difficult for a knowledge engineer to know which one is best to use for a particular case base. In this thesis, we investigate ways to combine algorithms to produce better deletion decisions than the decisions made by individual algorithms, and ways to choose which algorithm is best for a given case base at a given time. We analyse five of the most commonly used maintenance algorithms in detail and show how the different algorithms perform better on different datasets. This motivates us to develop a new approach: maintenance by a committee of experts (MACE). MACE allows us to combine maintenance algorithms to produce a composite algorithm which exploits the merits of each of the algorithms that it contains. By combining different algorithms in different ways we can also define algorithms that have different trade-offs between accuracy and deletion. While MACE allows us to define an infinite number of new composite algorithms, we still face the problem of choosing which algorithm to use. To make this choice, we need to be able to identify properties of a case base that are predictive of which maintenance algorithm is best. We examine a number of measures of dataset complexity for this purpose. These provide a numerical way to describe a case base at a given time. We use the numerical description to develop a meta-case-based classification system. This system uses previous experience about which maintenance algorithm was best to use for other case bases to predict which algorithm to use for a new case base. Finally, we give the knowledge engineer more control over the deletion process by creating incremental versions of the maintenance algorithms. These incremental algorithms suggest one case at a time for deletion rather than a group of cases, allowing the knowledge engineer to decide, case by case, whether each should be deleted or kept. We also develop incremental versions of the complexity measures, allowing us to create an incremental version of our meta-case-based classification system. Since the case base changes after each deletion, the best algorithm to use may also change. The incremental system allows us to choose which algorithm is best to use at each point in the deletion process.
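The committee idea can be made concrete with a short sketch in the spirit of MACE: each maintenance algorithm votes on which cases to delete, and the composite algorithm applies a combination rule to the votes. The expert names, the vote threshold, and the data structures below are illustrative assumptions, not the thesis's actual implementation.

```python
# Sketch of a committee-of-experts deletion decision (MACE-style combination).
# The individual experts and the majority-vote rule are placeholders/assumptions.

from typing import Callable, Dict, List, Set

Case = dict                                           # problem features plus a solution
MaintenanceAlgo = Callable[[List[Case]], Set[int]]    # returns indices proposed for deletion

def committee_delete(case_base: List[Case],
                     experts: List[MaintenanceAlgo],
                     min_votes: int) -> Set[int]:
    """Delete a case only if at least `min_votes` experts propose deleting it."""
    votes: Dict[int, int] = {}
    for expert in experts:
        for idx in expert(case_base):
            votes[idx] = votes.get(idx, 0) + 1
    return {idx for idx, v in votes.items() if v >= min_votes}

# Placeholder experts standing in for real maintenance algorithms.
def redundancy_expert(case_base: List[Case]) -> Set[int]:
    return set()

def noise_expert(case_base: List[Case]) -> Set[int]:
    return set()

# Requiring both experts to agree gives a conservative composite algorithm;
# requiring only one vote gives an aggressive one - a different accuracy/deletion trade-off.
to_delete = committee_delete([], [redundancy_expert, noise_expert], min_votes=2)
```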
Abstract:
Error correcting codes are combinatorial objects designed to enable reliable transmission of digital data over noisy channels. They are used ubiquitously in communication, data storage, etc. Error correction allows reconstruction of the original data from the received word. Classical decoding algorithms are constrained to output just one codeword. However, in the late 1950s researchers proposed a relaxed error correction model for potentially large error rates, known as list decoding. The research presented in this thesis focuses on reducing the computational effort and enhancing the efficiency of decoding algorithms for several codes, from both an algorithmic and an architectural standpoint. The codes in consideration are linear block codes closely related to Reed-Solomon (RS) codes. A high-speed, low-complexity algorithm and architecture are presented for encoding and decoding RS codes based on evaluation. The implementation results show that the hardware resources and the total execution time are significantly reduced compared to the classical decoder. The evaluation-based encoding and decoding schemes are modified and extended for shortened RS codes, and a software implementation shows a substantial reduction in memory footprint at the expense of latency. Hermitian codes can be seen as concatenated RS codes and are much longer than RS codes over the same alphabet. A fast, novel and efficient VLSI architecture for Hermitian codes is proposed based on interpolation decoding. The proposed architecture is shown to have better performance than Kötter's decoder for high-rate codes. The thesis also explores a method of constructing optimal codes by computing the subfield subcodes of Generalized Toric (GT) codes, which are a natural extension of RS codes to several dimensions. The polynomial generators, or evaluation polynomials, for subfield subcodes of GT codes are identified, and based on these the dimension and a bound for the minimum distance are computed. The algebraic structure of the polynomials evaluating into the subfield is used to simplify the list decoding algorithm for BCH codes. Finally, an efficient and novel approach is proposed for exploiting powerful codes that have complex decoding but a simple encoding scheme (comparable to RS codes) for multihop wireless sensor network (WSN) applications.
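Evaluation-based RS encoding itself is compact enough to sketch: the message symbols are taken as the coefficients of a polynomial, and the codeword is the vector of that polynomial's values at distinct field elements. The sketch below is a simplification over a prime field with toy parameters; real RS implementations typically work over GF(2^m), and none of the constants here come from the thesis.

```python
# Minimal sketch of evaluation-based Reed-Solomon encoding over a prime field GF(p).
# Field size and code parameters are illustrative assumptions for a toy example.

P = 929   # prime field size (a common textbook choice)
K = 3     # message length: polynomial degree < K
N = 7     # codeword length: number of evaluation points (N <= P)

def rs_encode(message, n=N, p=P):
    """Interpret message[i] as the coefficient of x**i and evaluate at x = 0..n-1 (mod p)."""
    def poly_eval(coeffs, x):
        acc = 0
        for c in reversed(coeffs):   # Horner's rule, highest degree first
            acc = (acc * x + c) % p
        return acc
    return [poly_eval(message, x) for x in range(n)]

# Encodes the polynomial 1*x^2 + 2*x + 3; any K error-free symbols of the
# codeword determine the degree-(K-1) polynomial, which is what decoding exploits.
codeword = rs_encode([3, 2, 1])
print(codeword)
```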
Abstract:
Phase-locked loops (PLLs) are a crucial component in modern communications systems. Comprising a phase detector, a linear filter, and a controllable oscillator, they are widely used in radio receivers to retrieve the information content of remote signals. As such, they are capable of signal demodulation, phase and carrier recovery, frequency synthesis, and clock synchronization. Continuous-time PLLs are a mature area of study and have been covered in the literature since the early classical work by Viterbi [1] in the 1950s. With the rise of computing in recent decades, discrete-time digital PLLs (DPLLs) are a more recent discipline; most of the published literature dates from the 1990s onwards, and Gardner [2] is a pioneer in this area. Our aim in this work is to address the difficulties encountered by Gardner [3] in his investigation of the DPLL output phase jitter when additive noise on the input signal is combined with frequency quantization in the local oscillator. The model we use in our novel analysis of the system is also applicable to another of the cases examined by Gardner, namely the DPLL with a delay element integrated in the loop. This gives us the opportunity to look at that system in more detail, our analysis providing some unique insights into the variance 'dip' seen by Gardner in [3]. We first provide background on probability theory and stochastic processes, the branches of mathematics that underpin the study of noisy analogue and digital PLLs. We give an overview of classical analogue PLL theory as well as background on both the digital PLL and the circle map, referencing the model proposed by Teplinsky et al. [4, 5]. For our novel work, the case of combined frequency quantization and noisy input from [3] is investigated first numerically, and then analytically as a Markov chain via its Chapman-Kolmogorov equation. The resulting delay equation for the steady-state jitter distribution is treated using two separate asymptotic analyses to obtain approximate solutions. The variance obtained in each case is shown to match the numerical results well. Other properties of the output jitter, such as the mean, are also investigated. In this way, we arrive at a more complete understanding of the interaction between quantization and input noise in the first-order DPLL than is possible using simulation alone. We also carry out an asymptotic analysis of a particular case of the noisy first-order DPLL with delay, previously investigated by Gardner [3]. We show that a unique feature of the simulation results, namely the variance 'dip' seen for certain levels of input noise, is explained by this analysis. Finally, we look at the second-order DPLL with additive noise, using numerical simulations to examine the effects of low levels of noise on the limit cycles. We show how these effects are similar to those seen in the noise-free loop with non-zero initial conditions.
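The kind of numerical investigation described above can be illustrated with a generic first-order DPLL simulation in which the phase-detector output is corrupted by additive noise and the oscillator frequency correction is restricted to a quantized grid. The update rule and all parameter values below are simplifying assumptions for illustration, not Gardner's or the thesis's exact model.

```python
# Hedged sketch: first-order DPLL with additive input noise and a
# frequency-quantized local oscillator; measures the steady-state phase jitter.
# Model and parameters are generic assumptions, not the thesis's model.

import numpy as np

rng = np.random.default_rng(0)

K     = 0.1      # loop gain
delta = 2e-3     # frequency quantization step of the local oscillator
sigma = 0.05     # std. dev. of additive phase noise at the detector input
omega = 1e-3     # small input frequency offset the loop must track
steps = 100_000

phi = 0.0                        # phase error
samples = np.empty(steps)
for n in range(steps):
    detector = phi + sigma * rng.standard_normal()        # noisy phase detector output
    correction = delta * np.round(K * detector / delta)   # quantized frequency correction
    phi = phi + omega - correction
    samples[n] = phi

# Discard the transient and estimate the output phase-jitter variance,
# which is the quantity the analytical treatment approximates.
jitter_var = samples[steps // 2:].var()
print(f"simulated phase-jitter variance: {jitter_var:.3e}")
```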
Abstract:
The Leaving Certificate (LC) is the national, standardised state examination in Ireland required for entry to third-level education; it presents a massive, raw corpus of data with the potential to yield invaluable insight into the phenomena of learner interlanguage. Using samples of official LC Spanish examination data, this project has compiled a digitised corpus of learner Spanish comprising the written and oral production of 100 candidates. This corpus was then analysed using a specific investigative corpus technique, Computer-aided Error Analysis (CEA; Dagneaux et al., 1998). CEA is a powerful apparatus in that it greatly facilitates the quantification and analysis of a large learner corpus in digital format. The corpus was both compiled and analysed with the UAM Corpus Tool (O'Donnell, 2013). This tool allows for the recording of candidate-specific variables such as grade, examination level, task type and gender, thereby allowing critical analysis of the corpus as one unit, as separate written and oral sub-corpora, and of performance per task, level and gender. This is an interdisciplinary work combining aspects of Applied Linguistics, Learner Corpus Research and Foreign Language (FL) Learning. Beginning with a review of the context of FL learning in Ireland and Europe, I go on to discuss the disciplinary context and theoretical framework for this work and outline the methodology applied. I then perform detailed quantitative and qualitative analyses before combining all research findings and outlining the principal conclusions. This investigation makes no a priori assumptions about the data set, the LC Spanish examination, the context of FLs, or any aspect of learner competence. It undertakes to provide the linguistic research community and the domain of Spanish language learning and pedagogy in Ireland with an empirical, descriptive profile of real learner performance, characterising learner difficulty.
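The quantification step that CEA enables, once the corpus is annotated, amounts to tallying error tags against the candidate-specific variables. The sketch below shows that idea only; the tag names, variable values and data layout are illustrative assumptions and do not reflect the UAM Corpus Tool's format or the project's annotation scheme.

```python
# Hedged sketch of CEA-style quantification: counting annotated error tags
# grouped by a candidate-specific variable (level, task, gender, ...).
# Data structures and tag names are illustrative assumptions.

from collections import Counter
from typing import Dict, List

def errors_per_variable(annotations: List[Dict], variable: str) -> Dict[str, Counter]:
    """Group annotated error instances by one candidate-specific variable."""
    grouped: Dict[str, Counter] = {}
    for ann in annotations:                              # one annotated error instance
        key = ann[variable]                              # e.g. examination level or gender
        grouped.setdefault(key, Counter())[ann["error_tag"]] += 1
    return grouped

# Toy example: compare the written and oral sub-corpora on two error types.
corpus_annotations = [
    {"candidate": "001", "level": "Higher",   "gender": "F", "task": "written", "error_tag": "verb_agreement"},
    {"candidate": "002", "level": "Ordinary", "gender": "M", "task": "oral",    "error_tag": "gender_agreement"},
]
print(errors_per_variable(corpus_annotations, "task"))
```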