968 resultados para text vector space model


Relevância:

40.00% 40.00%

Publicador:

Resumo:

An unabridged and unaltered republication of the second edition published by Charles Scribner's Sons in 1909.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we compare a well-known semantic spacemodel, Latent Semantic Analysis (LSA) with another model, Hyperspace Analogue to Language (HAL) which is widely used in different area, especially in automatic query refinement. We conduct this comparative analysis to prove our hypothesis that with respect to ability of extracting the lexical information from a corpus of text, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearsonrsquos correlation analysis to the outputs of these two black boxes, we conclude that LSA highly co-relates with HAL and thus there is a justification that LSA and HAL can potentially play a similar role in the area of facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The effect of transmitter and receiver array configurations on the stray-light and diffraction-caused crosstalk in free-space optical interconnects was investigated. The optical system simulation software (Code V) is used to simulate both the stray-light and diffraction-caused crosstalk. Experimentally measured, spectrally-resolved, near-field images of VCSEL higher order modes were used as extended sources in our simulation model. In addition, we have included the electrical and optical noise in our analysis to give more accurate overall performance of the FSOI system. Our results show that by changing the square lattice geometry to a hexagonal configuration, we obtain an overall signal-to-noise ratio improvement of 3 dB. Furthermore, system density is increased by up to 4 channels/mm2.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We investigate the effect of transmitter and receiver array configurations on the stray-light and diffraction-caused crosstalk in free-space optical interconnects. The optical system simulation software (Code V) is used to simulate both the stray-light and diffraction-caused crosstalk. Experimentally measured, spectrally-resolved, near-field images of VCSEL higher order modes were used as extended sources in our simulation model. Our results show that by changing the square lattice geometry to a hexagonal configuration, we obtain the reduction in the stray-light crosstalk of up to 9 dB and an overall signal-to-noise ratio improvement of 3 dB.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This study examines the forecasting accuracy of alternative vector autoregressive models each in a seven-variable system that comprises in turn of daily, weekly and monthly foreign exchange (FX) spot rates. The vector autoregressions (VARs) are in non-stationary, stationary and error-correction forms and are estimated using OLS. The imposition of Bayesian priors in the OLS estimations also allowed us to obtain another set of results. We find that there is some tendency for the Bayesian estimation method to generate superior forecast measures relatively to the OLS method. This result holds whether or not the data sets contain outliers. Also, the best forecasts under the non-stationary specification outperformed those of the stationary and error-correction specifications, particularly at long forecast horizons, while the best forecasts under the stationary and error-correction specifications are generally similar. The findings for the OLS forecasts are consistent with recent simulation results. The predictive ability of the VARs is very weak.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Molecular transport in phase space is crucial for chemical reactions because it defines how pre-reactive molecular configurations are found during the time evolution of the system. Using Molecular Dynamics (MD) simulated atomistic trajectories we test the assumption of the normal diffusion in the phase space for bulk water at ambient conditions by checking the equivalence of the transport to the random walk model. Contrary to common expectations we have found that some statistical features of the transport in the phase space differ from those of the normal diffusion models. This implies a non-random character of the path search process by the reacting complexes in water solutions. Our further numerical experiments show that a significant long period of non-stationarity in the transition probabilities of the segments of molecular trajectories can account for the observed non-uniform filling of the phase space. Surprisingly, the characteristic periods in the model non-stationarity constitute hundreds of nanoseconds, that is much longer time scales compared to typical lifetime of known liquid water molecular structures (several picoseconds).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This work examines prosody modelling for the Standard Yorùbá (SY) language in the context of computer text-to-speech synthesis applications. The thesis of this research is that it is possible to develop a practical prosody model by using appropriate computational tools and techniques which combines acoustic data with an encoding of the phonological and phonetic knowledge provided by experts. Our prosody model is conceptualised around a modular holistic framework. The framework is implemented using the Relational Tree (R-Tree) techniques (Ehrich and Foith, 1976). R-Tree is a sophisticated data structure that provides a multi-dimensional description of a waveform. A Skeletal Tree (S-Tree) is first generated using algorithms based on the tone phonological rules of SY. Subsequent steps update the S-Tree by computing the numerical values of the prosody dimensions. To implement the intonation dimension, fuzzy control rules where developed based on data from native speakers of Yorùbá. The Classification And Regression Tree (CART) and the Fuzzy Decision Tree (FDT) techniques were tested in modelling the duration dimension. The FDT was selected based on its better performance. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration and intonation, using different techniques and their subsequent integration. Our approach provides us with a flexible and extendible model that can also be used to implement, study and explain the theory behind aspects of the phenomena observed in speech prosody.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Web APIs have gained increasing popularity in recent Web service technology development owing to its simplicity of technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and the relevant documentations on the Web is still a challenging task even with the best resources available on the Web. In this paper we cast the problem of detecting the Web API documentations as a text classification problem of classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information such as labelled features automatically learned from data that can effectively help improving classification performance. Extensive experiments on our Web APIs documentation dataset shows that the feaLDA model outperforms three strong supervised baselines including naive Bayes, support vector machines, and the maximum entropy model, by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. RT is a sophisticated data structure that represents the peaks and valleys as well as the spatial structure of a waveform symbolically in the form of trees. An initial approximation to the RT, called Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST is then computed using FL. Quantitative analysis of the result gives RMSE of 0.56 and 0.71 for peak and valley respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 - -10, was obtained for intelligibility and naturalness respectively.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The paper considers vector discrete optimization problem with linear fractional functions of criteria on a feasible set that has combinatorial properties of combinations. Structural properties of a feasible solution domain and of Pareto–optimal (efficient), weakly efficient, strictly efficient solution sets are examined. A relation between vector optimization problems on a combinatorial set of combinations and on a continuous feasible set is determined. One possible approach is proposed in order to solve a multicriteria combinatorial problem with linear- fractional functions of criteria on a set of combinations.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Abstract A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Data fluctuation in multiple measurements of Laser Induced Breakdown Spectroscopy (LIBS) greatly affects the accuracy of quantitative analysis. A new LIBS quantitative analysis method based on the Robust Least Squares Support Vector Machine (RLS-SVM) regression model is proposed. The usual way to enhance the analysis accuracy is to improve the quality and consistency of the emission signal, such as by averaging the spectral signals or spectrum standardization over a number of laser shots. The proposed method focuses more on how to enhance the robustness of the quantitative analysis regression model. The proposed RLS-SVM regression model originates from the Weighted Least Squares Support Vector Machine (WLS-SVM) but has an improved segmented weighting function and residual error calculation according to the statistical distribution of measured spectral data. Through the improved segmented weighting function, the information on the spectral data in the normal distribution will be retained in the regression model while the information on the outliers will be restrained or removed. Copper elemental concentration analysis experiments of 16 certified standard brass samples were carried out. The average value of relative standard deviation obtained from the RLS-SVM model was 3.06% and the root mean square error was 1.537%. The experimental results showed that the proposed method achieved better prediction accuracy and better modeling robustness compared with the quantitative analysis methods based on Partial Least Squares (PLS) regression, standard Support Vector Machine (SVM) and WLS-SVM. It was also demonstrated that the improved weighting function had better comprehensive performance in model robustness and convergence speed, compared with the four known weighting functions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

As one of the most popular deep learning models, convolution neural network (CNN) has achieved huge success in image information extraction. Traditionally CNN is trained by supervised learning method with labeled data and used as a classifier by adding a classification layer in the end. Its capability of extracting image features is largely limited due to the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm pre-Train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of articficial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and then the CNN model is fine-Trained by scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated with a multilingual image dataset, which labels Chinese, English and numerals texts separately. More than 10% detection precision gain is observed over two CNN models.