938 resultados para k-Error linear complexity


Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study considers the solution of a class of linear systems related with the fractional Poisson equation (FPE) (−∇2)α/2φ=g(x,y) with nonhomogeneous boundary conditions on a bounded domain. A numerical approximation to FPE is derived using a matrix representation of the Laplacian to generate a linear system of equations with its matrix A raised to the fractional power α/2. The solution of the linear system then requires the action of the matrix function f(A)=A−α/2 on a vector b. For large, sparse, and symmetric positive definite matrices, the Lanczos approximation generates f(A)b≈β0Vmf(Tm)e1. This method works well when both the analytic grade of A with respect to b and the residual for the linear system are sufficiently small. Memory constraints often require restarting the Lanczos decomposition; however this is not straightforward in the context of matrix function approximation. In this paper, we use the idea of thick-restart and adaptive preconditioning for solving linear systems to improve convergence of the Lanczos approximation. We give an error bound for the new method and illustrate its role in solving FPE. Numerical results are provided to gauge the performance of the proposed method relative to exact analytic solutions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The results of a numerical investigation into the errors for least squares estimates of function gradients are presented. The underlying algorithm is obtained by constructing a least squares problem using a truncated Taylor expansion. An error bound associated with this method contains in its numerator terms related to the Taylor series remainder, while its denominator contains the smallest singular value of the least squares matrix. Perhaps for this reason the error bounds are often found to be pessimistic by several orders of magnitude. The circumstance under which these poor estimates arise is elucidated and an empirical correction of the theoretical error bounds is conjectured and investigated numerically. This is followed by an indication of how the conjecture is supported by a rigorous argument.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis deals with the problem of the instantaneous frequency (IF) estimation of sinusoidal signals. This topic plays significant role in signal processing and communications. Depending on the type of the signal, two major approaches are considered. For IF estimation of single-tone or digitally-modulated sinusoidal signals (like frequency shift keying signals) the approach of digital phase-locked loops (DPLLs) is considered, and this is Part-I of this thesis. For FM signals the approach of time-frequency analysis is considered, and this is Part-II of the thesis. In part-I we have utilized sinusoidal DPLLs with non-uniform sampling scheme as this type is widely used in communication systems. The digital tanlock loop (DTL) has introduced significant advantages over other existing DPLLs. In the last 10 years many efforts have been made to improve DTL performance. However, this loop and all of its modifications utilizes Hilbert transformer (HT) to produce a signal-independent 90-degree phase-shifted version of the input signal. Hilbert transformer can be realized approximately using a finite impulse response (FIR) digital filter. This realization introduces further complexity in the loop in addition to approximations and frequency limitations on the input signal. We have tried to avoid practical difficulties associated with the conventional tanlock scheme while keeping its advantages. A time-delay is utilized in the tanlock scheme of DTL to produce a signal-dependent phase shift. This gave rise to the time-delay digital tanlock loop (TDTL). Fixed point theorems are used to analyze the behavior of the new loop. As such TDTL combines the two major approaches in DPLLs: the non-linear approach of sinusoidal DPLL based on fixed point analysis, and the linear tanlock approach based on the arctan phase detection. TDTL preserves the main advantages of the DTL despite its reduced structure. An application of TDTL in FSK demodulation is also considered. This idea of replacing HT by a time-delay may be of interest in other signal processing systems. Hence we have analyzed and compared the behaviors of the HT and the time-delay in the presence of additive Gaussian noise. Based on the above analysis, the behavior of the first and second-order TDTLs has been analyzed in additive Gaussian noise. Since DPLLs need time for locking, they are normally not efficient in tracking the continuously changing frequencies of non-stationary signals, i.e. signals with time-varying spectra. Nonstationary signals are of importance in synthetic and real life applications. An example is the frequency-modulated (FM) signals widely used in communication systems. Part-II of this thesis is dedicated for the IF estimation of non-stationary signals. For such signals the classical spectral techniques break down, due to the time-varying nature of their spectra, and more advanced techniques should be utilized. For the purpose of instantaneous frequency estimation of non-stationary signals there are two major approaches: parametric and non-parametric. We chose the non-parametric approach which is based on time-frequency analysis. This approach is computationally less expensive and more effective in dealing with multicomponent signals, which are the main aim of this part of the thesis. A time-frequency distribution (TFD) of a signal is a two-dimensional transformation of the signal to the time-frequency domain. Multicomponent signals can be identified by multiple energy peaks in the time-frequency domain. Many real life and synthetic signals are of multicomponent nature and there is little in the literature concerning IF estimation of such signals. This is why we have concentrated on multicomponent signals in Part-H. An adaptive algorithm for IF estimation using the quadratic time-frequency distributions has been analyzed. A class of time-frequency distributions that are more suitable for this purpose has been proposed. The kernels of this class are time-only or one-dimensional, rather than the time-lag (two-dimensional) kernels. Hence this class has been named as the T -class. If the parameters of these TFDs are properly chosen, they are more efficient than the existing fixed-kernel TFDs in terms of resolution (energy concentration around the IF) and artifacts reduction. The T-distributions has been used in the IF adaptive algorithm and proved to be efficient in tracking rapidly changing frequencies. They also enables direct amplitude estimation for the components of a multicomponent

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The present paper motivates the study of mind change complexity for learning minimal models of length-bounded logic programs. It establishes ordinal mind change complexity bounds for learnability of these classes both from positive facts and from positive and negative facts. Building on Angluin’s notion of finite thickness and Wright’s work on finite elasticity, Shinohara defined the property of bounded finite thickness to give a sufficient condition for learnability of indexed families of computable languages from positive data. This paper shows that an effective version of Shinohara’s notion of bounded finite thickness gives sufficient conditions for learnability with ordinal mind change bound, both in the context of learnability from positive data and for learnability from complete (both positive and negative) data. Let Omega be a notation for the first limit ordinal. Then, it is shown that if a language defining framework yields a uniformly decidable family of languages and has effective bounded finite thickness, then for each natural number m >0, the class of languages defined by formal systems of length <= m: • is identifiable in the limit from positive data with a mind change bound of Omega (power)m; • is identifiable in the limit from both positive and negative data with an ordinal mind change bound of Omega × m. The above sufficient conditions are employed to give an ordinal mind change bound for learnability of minimal models of various classes of length-bounded Prolog programs, including Shapiro’s linear programs, Arimura and Shinohara’s depth-bounded linearly covering programs, and Krishna Rao’s depth-bounded linearly moded programs. It is also noted that the bound for learning from positive data is tight for the example classes considered.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. There- fore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (doc- ument, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classifica- tion are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a sim- ilarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The following paper proposes a novel application of Skid-to-Turn maneuvers for fixed wing Unmanned Aerial Vehicles (UAVs) inspecting locally linear infrastructure. Fixed wing UAVs, following the design of manned aircraft, commonly employ Bank-to-Turn ma- neuvers to change heading and thus direction of travel. Whilst effective, banking an aircraft during the inspection of ground based features hinders data collection, with body fixed sen- sors angled away from the direction of turn and a panning motion induced through roll rate that can reduce data quality. By adopting Skid-to-Turn maneuvers, the aircraft can change heading whilst maintaining wings level flight, thus allowing body fixed sensors to main- tain a downward facing orientation. An Image-Based Visual Servo controller is developed to directly control the position of features as captured by onboard inspection sensors. This improves on the indirect approach taken by other tracking controllers where a course over ground directly above the feature is assumed to capture it centered in the field of view. Performance of the proposed controller is compared against that of a Bank-to-Turn tracking controller driven by GPS derived cross track error in a simulation environment developed to replicate the field of view of a body fixed camera.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An algorithm based on the concept of combining Kalman filter and Least Error Square (LES) techniques is proposed in this paper. The algorithm is intended to estimate signal attributes like amplitude, frequency and phase angle in the online mode. This technique can be used in protection relays, digital AVRs, DGs, DSTATCOMs, FACTS and other power electronics applications. The Kalman filter is modified to operate on a fictitious input signal and provides precise estimation results insensitive to noise and other disturbances. At the same time, the LES system has been arranged to operate in critical transient cases to compensate the delay and inaccuracy identified because of the response of the standard Kalman filter. Practical considerations such as the effect of noise, higher order harmonics, and computational issues of the algorithm are considered and tested in the paper. Several computer simulations and a laboratory test are presented to highlight the usefulness of the proposed method. Simulation results show that the proposed technique can simultaneously estimate the signal attributes, even if it is highly distorted due to the presence of non-linear loads and noise.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Streaming SIMD extension (SSE) is a special feature embedded in the Intel Pentium III and IV classes of microprocessors. It enables the execution of SIMD type operations to exploit data parallelism. This article presents improving computation performance of a railway network simulator by means of SSE. Voltage and current at various points of the supply system to an electrified railway line are crucial for design, daily operation and planning. With computer simulation, their time-variations can be attained by solving a matrix equation, whose size mainly depends upon the number of trains present in the system. A large coefficient matrix, as a result of congested railway line, inevitably leads to heavier computational demand and hence jeopardizes the simulation speed. With the special architectural features of the latest processors on PC platforms, significant speed-up in computations can be achieved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We have recently demonstrated the geographic isolation of rice tungro bacilliform virus (RTBV) populations in the tungro-endemic provinces of Isabela and North Cotabato, Philippines. In this study, we examined the genetic structure of the virus populations at the tungro-outbreak sites of Lanao del Norte, a province adjacent to North Cotabato. We also analyzed the virus populations at the tungro-endemic sites of Subang, Indonesia, and Dien Khanh, Vietnam. Total DNA extracts from 274 isolates were digested with EcoRV restriction enzyme and hybridized with a full-length probe of RTBV. In the total population, 22 EcoRV-restricted genome profiles (genotypes) were identified. Although overlapping genotypes could be observed, the outbreak sites of Lanao del Norte had a genotype combination distinct from that of Subang or Dien Khanh but a genotype combination similar to that identified earlier from North Cotabato, the adjacent endemic province. Sequence analysis of the intergenic region and part of the ORF1 RTBV genome from randomly selected genotypes confirms the geographic clustering of RTBV genotypes and, combined with restriction analysis, the results suggest a fragmented spatial distribution of RTBV local populations in the three countries. Because RTBV depends on rice tungro spherical virus (RTSV) for transmission, the population dynamics of both tungro viruses were then examined at the endemic and outbreak sites within the Philippines. The RTBV genotypes and the coat protein RTSV genotypes were used as indicators for virus diversity. A shift in population structure of both viruses was observed at the outbreak sites with a reduced RTBV but increased RTSV gene diversity

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper is a report of students' responses to instruction which was based on the use of concrete representations to solve linear equations. The sample consisted of 21 Grade 8 students from a middle-class suburban state secondary school with a reputation for high academic standards and innovative mathematics teaching. The students were interviewed before and after instruction. Interviews and classroom interactions were observed and videotaped. A qualitative analysis of the responses revealed that students did not use the materials in solving problems. The increased processing load caused by concrete representations is hypothesised as a reason.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A3 √((log n)/m) (ignoring log A and log m factors), where m is the number of training patterns. This may explain the generalization performance of neural networks, particularly when the number of training examples is considerably smaller than the number of weights. It also supports heuristics (such as weight decay and early stopping) that attempt to keep the weights small during training. The proof techniques appear to be useful for the analysis of other pattern classifiers: when the input domain is a totally bounded metric space, we use the same approach to give upper bounds on misclassification probability for classifiers with decision boundaries that are far from the training examples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.