123 results for Compressed text search
at Indian Institute of Science - Bangalore - India
Abstract:
Purpose - Many library automation packages are available as open-source software, comprising two modules: a staff-client module and an online public access catalogue (OPAC). Although the OPACs of these packages provide advanced features for searching and retrieving bibliographic records, none of them facilitates full-text searching. Most of the available open-source digital library software facilitates indexing and searching of full-text documents in different formats. This paper makes an effort to enable full-text search in the widely used open-source library automation package Koha, by integrating it with two open-source digital library software packages, Greenstone Digital Library Software (GSDL) and Fedora Generic Search Service (FGSS), independently. Design/methodology/approach - The implementation makes use of the Search and Retrieval by URL (SRU) feature available in Koha, GSDL and FGSS. The full-text documents are indexed both in Koha and in GSDL and FGSS. Findings - Full-text searching capability in Koha is achieved by integrating either GSDL or FGSS into Koha and by passing an SRU request to GSDL or FGSS from Koha. The full-text documents are indexed both in the library automation package (Koha) and in the digital library software (GSDL, FGSS). Originality/value - This is the first implementation enabling the full-text search feature in library automation software by integrating it with digital library software.
Abstract:
Query-focused summarization is the task of producing a compressed text from an original set of documents based on a query. The documents can be viewed as a graph with sentences as nodes, with edges added based on sentence similarity. Graph-based ranking algorithms that use a 'biased random surfer model', such as topic-sensitive LexRank, have been successfully applied to query-focused summarization. In these algorithms, the random walk is biased towards sentences that contain query-relevant words. Specifically, it is assumed that the random surfer knows the query relevance score of the sentence to which he jumps. However, the neighbourhood information of that sentence is completely ignored. In this paper, we propose a look-ahead version of topic-sensitive LexRank. We assume that the random surfer not only knows the query relevance of the sentence to which he jumps but can also look N steps ahead from that sentence to find the query relevance scores of the sentences that follow. Using this look-ahead information, we identify sentences that are indirectly related to the query by counting the number of hops needed to reach a sentence that contains query-relevant words. We then bias the random walk towards these indirectly query-relevant sentences as well as towards the sentences that contain query-relevant words. Experimental results show a 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on the DUC 2007 data set. Further, our system outperforms the best systems in DUC 2006, and its results are comparable to state-of-the-art systems.
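The biased random walk and the hop-based look-ahead described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the look-ahead formula, so the `decay` factor, the rule that an indirectly relevant sentence receives the smallest direct relevance score scaled by `decay ** hops`, and all function names are assumptions made for the example.

```python
from collections import deque


def hops_to_relevant(sim, relevance):
    """Breadth-first hop count from each sentence to the nearest query-relevant one."""
    n = len(sim)
    hops = [None] * n
    queue = deque()
    for i, r in enumerate(relevance):
        if r > 0:  # sentences containing query words are 0 hops away
            hops[i] = 0
            queue.append(i)
    while queue:
        i = queue.popleft()
        for j in range(n):
            if hops[j] is None and sim[i][j] > 0:
                hops[j] = hops[i] + 1
                queue.append(j)
    return hops


def lookahead_relevance(sim, relevance, n_steps, decay=0.5):
    """Assumed scheme: give sentences within n_steps hops a decayed share of relevance."""
    positive = [r for r in relevance if r > 0]
    if not positive:
        return list(relevance)
    floor = min(positive)
    boosted = []
    for r, h in zip(relevance, hops_to_relevant(sim, relevance)):
        if h == 0:
            boosted.append(r)
        elif h is not None and h <= n_steps:
            boosted.append(floor * decay ** h)
        else:
            boosted.append(0.0)
    return boosted


def biased_lexrank(sim, relevance, bias_weight=0.15, iters=200):
    """Power iteration for a random walk that jumps to sentences in proportion to relevance."""
    n = len(sim)
    total = sum(relevance)
    bias = [r / total for r in relevance] if total > 0 else [1.0 / n] * n
    # Row-normalise the similarity matrix into transition probabilities.
    trans = []
    for row in sim:
        s = sum(row)
        trans.append([v / s for v in row] if s > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            bias_weight * bias[j]
            + (1 - bias_weight) * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Four sentences in a chain 0-1-2-3; only sentence 0 contains query words.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
direct = [1.0, 0.0, 0.0, 0.0]
boosted = lookahead_relevance(chain, direct, n_steps=2)  # [1.0, 0.5, 0.25, 0.0]
scores = biased_lexrank(chain, boosted)
```

With `n_steps=0` the sketch reduces to an ordinary relevance-biased walk, which makes it easy to compare the two rankings on the same graph.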
Abstract:
The paper describes a modular, unit-selection-based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. The synthesis database consists of 1027 phonetically rich prerecorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and we propose to exploit this misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with the ones they are confused with.
Abstract:
This paper describes an approach based on Zernike moments and Delaunay triangulation for localization of hand-written text in machine printed text documents. The Zernike moments of the image are first evaluated and we classify the text as hand-written using the nearest neighbor classifier. These features are independent of size, slant, orientation, translation and other variations in handwritten text. We then use Delaunay triangulation to reclassify the misclassified text regions. When imposing Delaunay triangulation on the centroid points of the connected components, we extract features based on the triangles and reclassify the text. We remove the noise components in the document as part of the preprocessing step so this method works well on noisy documents. The success rate of the method is found to be 86%. Also for specific hand-written elements such as signatures or similar text the accuracy is found to be even higher at 93%.
Abstract:
Compressive Sampling Matching Pursuit (CoSaMP) is one of the popular greedy methods in the emerging field of Compressed Sensing (CS). In addition to its appealing empirical performance, CoSaMP also has splendid theoretical guarantees for convergence. In this paper, we propose a modification of CoSaMP that adaptively chooses the dimension of the search space in each iteration, using a threshold-based approach. Using Monte Carlo simulations, we show that this modification improves the reconstruction capability of the CoSaMP algorithm in clean as well as noisy measurement cases. From empirical observations, we also propose an optimum value for the threshold to use in applications.
Abstract:
Query suggestion is an important feature of a search engine, given the explosive and diverse growth of web content. Different kinds of suggestions, such as queries, images, movies, music and books, are used every day, and various types of data sources are used to produce them. If we model the data as various kinds of graphs, then we can build a general method for any kind of suggestion. In this paper, we propose a general method for query suggestion that combines two graphs: (1) a query click graph, which captures the relationship between queries frequently clicked on common URLs, and (2) a query text similarity graph, which finds the similarity between two queries using Jaccard similarity. The proposed method provides literally as well as semantically relevant queries for users' needs. Simulation results show that the proposed algorithm outperforms the heat diffusion method by providing a larger number of relevant queries. It can be used for recommendation tasks such as query, image, and product suggestion.
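To make the two signals concrete, here is a small sketch of combining click-based and text-based similarity for query suggestion. It is only an illustration under stated assumptions: the abstract does not say how the two graphs are merged, so the linear mix weighted by `alpha`, the use of URL-set overlap as the click signal, and the names `click_overlap` and `suggest` are all hypothetical. Only the Jaccard similarity on query text comes from the abstract.

```python
def jaccard(query_a, query_b):
    """Jaccard similarity between the word sets of two queries."""
    a = set(query_a.lower().split())
    b = set(query_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def click_overlap(clicks, query_a, query_b):
    """Assumed click signal: overlap of the URL sets clicked for each query."""
    a = clicks.get(query_a, set())
    b = clicks.get(query_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(query, clicks, alpha=0.5, top_k=3):
    """Rank other logged queries by a weighted mix of click and text similarity."""
    scored = []
    for candidate in clicks:
        if candidate == query:
            continue
        score = (alpha * click_overlap(clicks, query, candidate)
                 + (1 - alpha) * jaccard(query, candidate))
        if score > 0:
            scored.append((score, candidate))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate for _, candidate in scored[:top_k]]


# Hypothetical click log: query -> set of clicked URLs.
clicks = {
    "cheap flights": {"u1", "u2"},
    "flight deals": {"u1", "u2"},   # no shared words, but the same clicked URLs
    "cheap hotels": {"u3"},         # shares a word, but no clicked URLs
    "weather today": {"u9"},        # unrelated on both signals
}
print(suggest("cheap flights", clicks))
```

In this toy log, "flight deals" shares no words with "cheap flights" but is still suggested because of the shared clicks, which is the kind of semantically related suggestion that text similarity alone would miss.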
Abstract:
This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is known to contain speaker-specific and linguistic information implicitly. This can create a problem for the speaker-independent audio search task. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as the scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply universal frequency warping to both of the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% higher than that of the state-of-the-art MFCC feature sets on the TIMIT database.
Abstract:
In this paper, we present a machine learning approach to measure the visual quality of JPEG-coded images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity (HVS) factors such as edge amplitude, edge length, background activity and background luminance. Image quality assessment involves estimating the functional relationship between HVS features and subjective test scores. The quality of the compressed images is obtained without referring to their original images ('No Reference' metric). Here, the problem of quality estimation is transformed into a classification problem and solved using the extreme learning machine (ELM) algorithm. In ELM, the input weights and the bias values are randomly chosen and the output weights are analytically calculated. The generalization performance of the ELM algorithm for classification problems with imbalance in the number of samples per quality class depends critically on the input weights and the bias values. Hence, we propose two schemes, namely the k-fold selection scheme (KS-ELM) and the real-coded genetic algorithm (RCGA-ELM), to select the input weights and the bias values such that the generalization performance of the classifier is maximized. Results indicate that the proposed schemes significantly improve the performance of the ELM classifier under imbalanced conditions for image quality assessment. The experimental results show that the visual quality estimated by the proposed RCGA-ELM emulates the mean opinion score very well. The experimental results are compared with the existing JPEG no-reference image quality metric and the full-reference structural similarity image quality metric.
Abstract:
In a search for inorganic oxide materials showing second-order nonlinear optical (NLO) susceptibility, we investigated several borates, silicates, and a phosphate containing trans-connected MO6 octahedral chains or MO5 square pyramids, where M = d0: Ti(IV), Nb(V), or Ta(V). Our investigations identified two new NLO structures: batisite, Na2Ba(TiO)2Si4O12, containing trans-connected TiO6 octahedral chains, and fresnoite, Ba2TiOSi2O7, containing square-pyramidal TiO5. Investigation of two other materials containing square-pyramidal TiO5, viz., Cs2TiOP2O7 and Na4Ti2Si8O22·4H2O, revealed that isolated TiO5 square pyramids alone do not cause a second harmonic generation (SHG) response; rather, the orientation of TiO5 units to produce -Ti-O-Ti-O- chains with alternating long and short Ti-O distances in the fresnoite structure is most likely the origin of the strong SHG response in fresnoite.
Abstract:
To correlate the Raman frequencies of the amide I and III bands to beta-turn structures, three peptides shown to contain beta-turn structure by x-ray diffraction and NMR were examined. The compounds examined were tertiary (formula: see text). The amide I band of these compounds is seen at 1,668, 1,665, and 1,677 cm-1, and the amide III band appears at 1,267, 1,265, and 1,286 cm-1, respectively. Thus, it is concluded that the amide I band for type III beta-turn structure appears in the range between 1,665 and 1,677 cm-1 and the amide III band between 1,265 and 1,286 cm-1.
Abstract:
A novel method is proposed to treat the problem of the random resistance of a strictly one-dimensional conductor with static disorder. For the probability distribution of the transfer matrix of the conductor, the distribution of maximum information entropy is suggested, constrained by the following physical requirements: 1) flux conservation, 2) time-reversal invariance and 3) scaling, with the length of the conductor, of the two lowest cumulants of ζ, where = sh2ζ. The preliminary results discussed in the text are in qualitative agreement with those obtained by sophisticated microscopic theories.
Abstract:
The wedge shape is a fairly common cross-section found in many non-axisymmetric components used in machines, aircraft, ships and automobiles. If such components are forged between two mutually inclined dies the metal displaced by the dies flows into the converging as well as into the diverging channels created by the inclined dies. The extent of each type of flow (convergent/divergent) depends on the die—material interface friction and the included die angle. Given the initial cross-section, the length as well as the exact geometry of the forged cross-section are therefore uniquely determined by these parameters. In this paper a simple stress analysis is used to predict changes in the geometry of a wedge undergoing compression between inclined platens. The flow in directions normal to the cross-section is assumed to be negligible. Experiments carried out using wedge-shaped lead billets show that, knowing the interface friction and as long as the deformation is not too large, the dimensional changes in the wedge can be predicted with reasonable accuracy. The predicted flow behaviour of metal for a wide range of die angles and interface friction is presented: these characteristics can be used by the die designer to choose the die lubricant (only) if the die angle is specified and to choose both of these parameters if there is no restriction on the exact die angle. The present work shows that the length of a wedge undergoing compression is highly sensitive to die—material interface friction. Thus in a situation where the top and bottom dies are inclined to each other, a wedge made of the material to be forged could be put between the dies and then compressed, whereupon the length of the compressed wedge — given the degree of compression — affords an estimate of the die—material interface friction.
Abstract:
The inhibitory action of the anticancer antibiotic, Adriamycin, on succinate-dependent oxidative phosphorylation in heart mitochondria was markedly potentiated by the presence of hexokinase in the reaction medium. This 'hexokinase effect' was not observed in the oxidation of NAD+-linked substrates, or when liver or kidney mitochondria were used in place of heart mitochondria. These results offer a biochemical explanation for the extreme cardiac toxicity of the drug.
Abstract:
This paper gives a brief survey of research and development work done on hand pumps in India as well as elsewhere and sets out the approach adopted by the ASTRA Working Group. Ten ways in which a hand pump breaks down in practice have been identified. The physical reasons behind each type of breakdown are analysed, and remedial measures have been developed from this analysis. Laboratory test rigs fabricated to evaluate these measures are described and some experimental results are presented. The course of further work has been charted.
Abstract:
The addition of guanosine 5′-monophosphate (5′-GMP) to an aqueous solution of Mn2+ ions results in a decrease in ESR signal intensity and an increase in the line-width of Mn2+ ions. This can be interpreted in terms of stepwise formation of outer-sphere and inner-sphere complexes. When Mg2+ is added to a mixture of Mn2+ and 5′-GMP, the ESR signal intensity increases, presumably due to the replacement of Mn2+ by Mg2+ in the complex. From the variation of ESR signal intensity as a function of the concentration of Mg2+, the product K1K2 for the magnesium complex is calculated as 125 M−1. This difference in stability constants may indicate that both the phosphate group and the guanine base are involved in the formation of the Mn2+-5′-GMP complex.