897 results for (hyper)text
Abstract:
Purpose - There are many library automation packages available as open-source software, comprising two modules: a staff-client module and an online public access catalogue (OPAC). Although the OPACs of these library automation packages provide advanced search and retrieval of bibliographic records, none of them facilitates full-text searching. Most of the available open-source digital library software facilitates indexing and searching of full-text documents in different formats. This paper makes an effort to enable full-text search in the widely used open-source library automation package Koha by integrating it, independently, with two open-source digital library software packages: Greenstone Digital Library Software (GSDL) and Fedora Generic Search Service (FGSS). Design/methodology/approach - The implementation makes use of the Search and Retrieval by URL (SRU) feature available in Koha, GSDL and FGSS. The full-text documents are indexed in Koha as well as in GSDL and FGSS. Findings - Full-text searching capability in Koha is achieved by integrating either GSDL or FGSS into Koha and passing an SRU request from Koha to GSDL or FGSS. The full-text documents are indexed both in the library automation package (Koha) and in the digital library software (GSDL, FGSS). Originality/value - This is the first implementation to enable a full-text search feature in a library automation package by integrating it with digital library software.
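The abstract does not spell out the request format, but an SRU searchRetrieve call is a plain HTTP request; the Python sketch below shows roughly how Koha could hand a full-text query to a GSDL or FGSS endpoint. The endpoint URL, the record schema and the function name are placeholder assumptions, not details taken from the paper; only the SRU parameters themselves are standard.

```python
# Minimal sketch of an SRU searchRetrieve request such as Koha could pass
# to a GSDL or FGSS endpoint.  The base URL and record schema are assumptions.
from urllib.parse import urlencode
from urllib.request import urlopen

def sru_fulltext_search(base_url, term, max_records=10):
    """Send an SRU searchRetrieve request and return the raw XML response."""
    params = {
        "operation": "searchRetrieve",    # standard SRU operation
        "version": "1.1",
        "query": term,                    # bare CQL term, searched against the server's default index
        "maximumRecords": str(max_records),
        "recordSchema": "dc",             # ask for Dublin Core records (assumption)
    }
    with urlopen(base_url + "?" + urlencode(params)) as response:
        return response.read().decode("utf-8")

# Hypothetical usage against a GSDL or FGSS SRU endpoint:
# xml = sru_fulltext_search("http://localhost:8080/sru", "hypertext")
```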
Abstract:
Transliteration on mobile phones is an area in constant demand, given the difficulties and constraints faced in its implementation. In this paper we present an automatic transliteration system for Kannada, which has non-uniform character geometry and inter-character spacing, unlike non-oriental language text such as English, making the problem even more challenging. In the working model, part of the processing takes place on the mobile device and the remainder on a server. Good results are achieved.
Abstract:
This paper presents the design of a full-fledged OCR system for printed Kannada text. Machine recognition of Kannada characters is difficult due to the similarity in the shapes of different characters, script complexity and non-uniqueness in the representation of diacritics. The document image is subjected to line segmentation, word segmentation and zone detection. From the zonal information, base characters, vowel modifiers and consonant conjuncts are separated. A knowledge-based approach is employed for recognizing the base characters. Various features are employed for recognizing the characters, including the coefficients of the Discrete Cosine Transform, Discrete Wavelet Transform and Karhunen-Loève Transform. These features are fed to different classifiers. Structural features are used in the subsequent levels to discriminate between confused characters; their use increases the recognition rate from 93% to 98%. Apart from the classical pattern classification technique of the nearest neighbour, Artificial Neural Network (ANN) based classifiers such as Back Propagation and Radial Basis Function (RBF) networks have also been studied. The ANN classifiers are trained in supervised mode using the transform features. The highest recognition rate of 99% is obtained with the RBF network, using second-level approximation coefficients of Haar wavelets as features on pre-segmented base characters.
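As a rough illustration of the transform-feature stage, the sketch below computes 2-D DCT coefficients for a pre-segmented, size-normalised character image and classifies it with a nearest-neighbour rule. The image size, the number of retained coefficients and the 1-NN choice are assumptions made for illustration; the wavelet/KLT features, the ANN classifiers and the structural disambiguation stage of the paper are not reproduced here.

```python
# Sketch of the transform-feature + nearest-neighbour stage, assuming
# pre-segmented, size-normalised base-character images (e.g. 32x32 arrays).
# Only the 2-D DCT branch is shown.
import numpy as np
from scipy.fftpack import dct

def dct_features(img, k=8):
    """Top-left k x k block of the 2-D DCT as a feature vector."""
    coeffs = dct(dct(img.astype(float), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()

def nearest_neighbour(train_feats, train_labels, query_feat):
    """Classical 1-NN classification in the transform-feature space."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

# Usage (with hypothetical data):
# feats = np.stack([dct_features(im) for im in training_images])
# label = nearest_neighbour(feats, training_labels, dct_features(test_image))
```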
Abstract:
The critical micelle concentration (CMC) of several surfactants that contain an NLO chromophore, either at the hydrocarbon tail, at the hydrophilic headgroup, or even as a counterion, was determined by hyper-Rayleigh scattering (HRS). In all cases, the HRS signal exhibited a similar variation with surfactant concentration, wherein the CMC is inferred from a rather unprecedented drop in the signal intensity. This drop is attributed to the formation of small pre-micellar aggregates, whose concentrations become negligible above the CMC. In addition, a probe molecule, which upon protonation yields a species with significantly enhanced HRS intensity, was developed, and its utility for determining the CMC of simple fatty acids was demonstrated.
Abstract:
Plant organs are initiated as primordial outgrowths and require controlled cell division and differentiation to achieve their final size and shape. Superimposed on this is another developmental program that orchestrates the switch from the vegetative to the reproductive to the senescence stage in the life cycle. These transitions require the sequential function of heterochronic regulators. Little is known regarding the coordination between organ and organismal growth in plants. The TCP gene family encodes transcription factors that control diverse developmental traits, and a subgroup of class II TCP genes regulates leaf morphogenesis. Absence of these genes results in large, crinkly leaves due to excess division, mainly at the margins. It has been suggested that these class II TCPs modulate the spatio-temporal control of differentiation in a growing leaf, rather than regulating cell proliferation per se. However, the link between class II TCP action and cell growth has not been established. As loss-of-function mutants of individual TCP genes in Arabidopsis are not very informative due to gene redundancy, we generated a transgenic line that expressed a hyper-activated form of TCP4 in its endogenous expression domain. This resulted in premature onset of maturation and decreased cell proliferation, leading to much smaller leaves, with cup-shaped lamina in extreme cases. Further, the transgenic line initiated leaves faster than wild-type and underwent precocious reproductive maturation due to a shortened adult vegetative phase. Early senescence and severe fertility defects were also observed. Thus, hyper-activation of TCP4 revealed its role in determining the timing of crucial developmental events at both the organ and the organism level.
Abstract:
This paper describes a modular, unit-selection-based TTS framework that can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, a TTS system has been developed for Tamil. The synthesis database consists of 1027 phonetically rich pre-recorded sentences. The framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications such as mobile phones and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resulting synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and we propose exploiting this misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confusable counterparts.
Abstract:
To realistically simulate the motion of flexible objects such as ropes, strings, snakes, or human hair, one strategy is to discretise the object into a large number of small rigid links connected by rotary or spherical joints. The discretised system is highly redundant, and the rotations at the joints (or the motion of the other links) for a desired Cartesian motion of the end of a link cannot be solved uniquely. In this paper, we propose a novel strategy to resolve the redundancy in such hyper-redundant systems. We make use of the classical tractrix curve and its attractive features. For a desired Cartesian motion of the 'head' of a link, the 'tail' of the link is moved according to a tractrix, and recursively all links of the discretised object are moved along different tractrix curves. We show that the use of a tractrix curve leads to a more 'natural' motion of the entire object, since the motion is distributed uniformly along the entire object with the displacements tending to diminish from the 'head' to the 'tail'. We also show that the computation of the motion of the links can be done in real time, since it involves the evaluation of simple algebraic, trigonometric and hyperbolic functions. The strategy is illustrated by simulations of a snake, the tying of knots with a rope, and a solution of the inverse kinematics of a planar hyper-redundant manipulator.
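The paper's solution moves each 'tail' along a closed-form tractrix; the sketch below shows only a discrete "follow-the-leader" approximation of that recursive head-to-tail update, in which each joint is pulled toward its displaced predecessor while the link length is preserved. It illustrates the recursive structure and the diminishing displacements, not the authors' exact hyperbolic-function formulation.

```python
# Discrete "follow-the-leader" approximation of the recursive head-to-tail
# update: the head of the chain is moved to the desired Cartesian position,
# and each subsequent joint is pulled toward its new predecessor while the
# link length is kept constant.  The paper's exact solution moves each tail
# along a tractrix (closed-form hyperbolic expressions); this sketch only
# illustrates the recursive structure.
import numpy as np

def move_chain(joints, target, link_len):
    """joints: (n+1, 2) array of joint positions; target: new head position."""
    joints = joints.copy()
    joints[0] = np.asarray(target, dtype=float)        # move the 'head'
    for i in range(1, len(joints)):
        d = joints[i] - joints[i - 1]                   # pull joint i toward its predecessor
        joints[i] = joints[i - 1] + link_len * d / np.linalg.norm(d)
        # displacements naturally diminish from 'head' to 'tail'
    return joints

# Usage: a 10-link planar chain initially stretched along the x-axis
# chain = np.column_stack([np.arange(11, dtype=float), np.zeros(11)])
# chain = move_chain(chain, target=[0.5, 0.8], link_len=1.0)
```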
Abstract:
In this article, we report the structure of a 1:1 charge transfer complex between pyridine (PYR) and chloranil (CHL) in CHCl3 solution, determined from measurements of the hyperpolarizability (beta(HRS)) and of the linear and circular depolarization ratios, D and D', by the hyper-Rayleigh scattering technique, together with state-of-the-art quantum chemical calculations. Using linearly (electric field vector along X) and circularly polarized incident light, respectively, we measured two macroscopic depolarization ratios, D = I(X,X)(2 omega)/I(X,Z)(2 omega) and D' = I(X,C)(2 omega)/I(Z,C)(2 omega), in the laboratory-fixed XYZ frame by detecting the second harmonic (SH) scattered light in a polarization-resolved fashion. The stabilization energy and the optical gap calculated with the MP2/cc-pVDZ method using Gaussian09 were not sufficiently different to distinguish between the cofacial and T-shaped structures. Only when the experimentally obtained beta(HRS) and the depolarization ratios, D and D', were matched with the values computed from single and double configuration interaction (SDCI) calculations performed using the ZINDO-SCRF technique could we conclude that the room-temperature equilibrium structure of the complex is cofacial. This is in sharp contrast to an earlier theoretical prediction of a T-shaped structure for the complex.
Abstract:
The present approach uses stopwords, and the gaps that occur between successive stopwords (formed by content words), as features for sentiment classification.
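A minimal sketch of how such stopword-and-gap features might be extracted is given below; the stopword list and the (stopword sequence, gap length) encoding are illustrative assumptions, not the exact representation used in the paper.

```python
# Sketch of extracting stopwords and the gaps between them as features.
# The stopword list and the final feature encoding are illustrative only.
STOPWORDS = {"the", "a", "an", "is", "was", "of", "and", "to", "in", "it"}

def stopword_gap_features(text):
    """Return the stopword sequence and the lengths of the content-word
    gaps that occur between successive stopwords."""
    tokens = text.lower().split()
    stops, gaps, gap = [], [], 0
    for tok in tokens:
        if tok in STOPWORDS:
            stops.append(tok)
            gaps.append(gap)      # content words seen since the last stopword
            gap = 0
        else:
            gap += 1
    return stops, gaps

# stopword_gap_features("the movie was surprisingly good and the acting superb")
# -> (['the', 'was', 'and', 'the'], [0, 1, 2, 0])
```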
Abstract:
This paper describes a semi-automatic tool for the annotation of multi-script text from natural scene images. To our knowledge, this is the first tool that deals with multi-script text of arbitrary orientation. The procedure involves manual seed selection followed by a region-growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word by word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labelled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground-truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image, and is thus useful to researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolkit (MAST) is available for free download.
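The region-growing step can be sketched as follows for a greyscale image; the 4-connectivity, the intensity-difference criterion and the function name are assumptions for illustration, and the actual MATLAB toolkit also handles colour images and polygonal character masks, which are not shown here.

```python
# Simplified sketch of seed-based region growing: starting from a manually
# selected seed pixel, neighbours are added while their intensity stays
# within a user-adjustable threshold of the seed intensity.
from collections import deque
import numpy as np

def region_grow(img, seed, threshold):
    """Return a boolean mask of pixels connected to `seed` (row, col) whose
    intensity differs from the seed intensity by at most `threshold`."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(img[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connectivity
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not mask[rr, cc] \
                    and abs(float(img[rr, cc]) - seed_val) <= threshold:
                mask[rr, cc] = True
                queue.append((rr, cc))
    return mask
```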
Abstract:
This paper describes a new method for color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from geometric, boundary, stroke and gradient information. Experiments on camera-captured images containing variable fonts, sizes, colors, irregular layouts, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields a precision of 0.80 and a recall of 0.86 on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
Abstract:
In the document classification community, support vector machines and the naïve Bayes classifier are known for their simple yet excellent performance. Normally, the feature subsets used by these two approaches complement each other; however, little has been done to combine them. The essence of this paper is a linear classifier very similar to these two. We propose a novel way of combining the two approaches, which synthesizes the best of them into a hybrid model. We evaluate the proposed approach on the 20 Newsgroups (20NG) dataset and compare it with its counterparts. Our results strongly corroborate the effectiveness of the approach.
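One plausible way to build such a hybrid linear classifier, sketched below, is to interpolate the (normalised) weight vectors of a multinomial naive Bayes model and a linear SVM; the interpolation rule, the scikit-learn models and the binary setting are assumptions for illustration and not necessarily the combination proposed in the paper.

```python
# Illustrative hybrid of two linear text classifiers: multinomial naive Bayes
# and a linear SVM, combined by interpolating their normalised weight vectors.
# This is only one plausible hybrid, not necessarily the paper's rule.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def hybrid_decision(X_train, y_train, X_test, alpha=0.5):
    """Binary classification via an interpolated linear decision score."""
    nb = MultinomialNB().fit(X_train, y_train)
    svm = LinearSVC().fit(X_train, y_train)
    # Naive Bayes viewed as a linear classifier: log-odds of the two classes.
    w_nb = nb.feature_log_prob_[1] - nb.feature_log_prob_[0]
    b_nb = nb.class_log_prior_[1] - nb.class_log_prior_[0]
    w_svm, b_svm = svm.coef_[0], svm.intercept_[0]
    # Normalise each model so the interpolation is scale-free.
    n_nb, n_svm = np.linalg.norm(w_nb), np.linalg.norm(w_svm)
    w = alpha * (w_svm / n_svm) + (1 - alpha) * (w_nb / n_nb)
    b = alpha * (b_svm / n_svm) + (1 - alpha) * (b_nb / n_nb)
    return (X_test @ w + b > 0).astype(int)
```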
Abstract:
Transductive SVM (TSVM) is a well-known semi-supervised large-margin learning method for binary text classification. In this paper we extend this method to multi-class and hierarchical classification problems. We point out that determining the labels of the unlabeled examples with fixed classifier weights is a linear programming problem, and we devise an efficient technique for solving it. The method is applicable to general loss functions. We demonstrate the value of the new method using the large-margin loss on a number of multi-class and hierarchical classification datasets. For the maxent loss, we show empirically that our method is better than the expectation regularization/constraint and posterior regularization methods, and competitive with the version of the entropy regularization method which uses label constraints.
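To make the linear programming view concrete, the sketch below assigns labels to unlabeled examples with fixed weights by maximising the total classifier score subject to per-example and class-proportion constraints; the linear objective and the class-count constraint are assumptions chosen for illustration, and the paper's exact formulation and loss may differ.

```python
# Sketch of the label-assignment step with fixed classifier weights: given a
# score s[i, k] for unlabeled example i under class k, choose fractional
# assignments y[i, k] maximising the total score, with each example receiving
# one unit of label mass and class totals fixed to prescribed counts.
import numpy as np
from scipy.optimize import linprog

def assign_labels(scores, class_counts):
    """scores: (n, K) array; class_counts: length-K target counts summing to n."""
    n, K = scores.shape
    c = -scores.ravel()                       # linprog minimises, so negate
    A_eq, b_eq = [], []
    for i in range(n):                        # each example gets one label
        row = np.zeros(n * K)
        row[i * K:(i + 1) * K] = 1.0
        A_eq.append(row); b_eq.append(1.0)
    for k in range(K):                        # class-proportion constraints
        row = np.zeros(n * K)
        row[k::K] = 1.0
        A_eq.append(row); b_eq.append(float(class_counts[k]))
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0.0, 1.0)] * (n * K), method="highs")
    return res.x.reshape(n, K).argmax(axis=1)
```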