30 results for multilingual corpus
Abstract:
Background: During female reproductive cycles, a rapid fall in circulating progesterone (P4) levels is one of the earliest events that occur during induced luteolysis in mammals. In rodents, it is well recognized that during luteolysis, P4 is catabolized to its inactive metabolite, 20alpha-hydroxyprogesterone (20alpha-OHP), by the action of the 20alpha-hydroxysteroid dehydrogenase (20alpha-HSD) enzyme, a process that involves the transcription factor Nur77. Studies were carried out to examine the expression of 20alpha-HSD and its activity in the corpus luteum (CL) of the buffalo cow. Methods: The expression of 20alpha-HSD across different bovine tissues along with CL was examined by qPCR analysis. Circulating P4 levels were monitored before and during PGF2alpha treatment. Expression of 20alpha-HSD and Nur77 mRNA was determined in CL at different time points post PGF2alpha treatment in buffalo cows. The chromatographic separation of P4 and its metabolite, 20alpha-OHP, in rat and buffalo cow serum samples was performed on a reverse-phase HPLC system. To further support the findings, 20alpha-HSD enzyme activity was quantitated in the cytosolic fraction of CL of both rat and buffalo cow. Results: Circulating P4 concentration declined rapidly in response to PGF2alpha treatment. HPLC analysis of serum samples did not reveal changes in circulating 20alpha-OHP levels in buffalo cows, but serum from pseudopregnant rats receiving PGF2alpha treatment showed an increased 20alpha-OHP level at 24 h post treatment with an accompanying decrease in P4 concentration. qPCR analysis of 20alpha-HSD in CL from control and PGF2alpha-treated buffalo cows showed higher expression at 3 and 18 h post treatment, but its specific activity was not altered at different time points post PGF2alpha treatment. Nur77 expression increased several-fold 3 h post PGF2alpha treatment, similar to the increased expression observed in the PGF2alpha-treated pseudopregnant rats, which perhaps suggests initiation of activation of apoptotic pathways in response to PGF2alpha treatment. Conclusions: The results taken together suggest that synthesis of P4 appears to be primarily affected by PGF2alpha treatment in buffalo cows, in contrast to increased metabolism of P4 in rodents.
Abstract:
It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M_1 and M_2 as given by $C_{d \times w} = M_{1\,(d \times t)} \times M_{2\,(t \times w)}$, where d is the number of documents present in the corpus, w is the size of the vocabulary, and t is the number of topics. The quality of the split depends on t, the number of topics chosen. The measure is computed in terms of the symmetric KL divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this is shown by a "dip" at the right value of t.
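The selection rule described in this abstract can be illustrated with a minimal sketch. The abstract does not say which "salient distributions" are compared, so the function below simply takes two non-negative vectors as input; the names `symmetric_kl` and `pick_num_topics` and the `eps` smoothing term are assumptions made for illustration, not the authors' implementation.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) of two non-negative vectors.

    Both vectors are normalized to sum to 1 first. `eps` is an assumed
    smoothing constant that keeps the logarithm finite for zero entries.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p_i - q_i) * log(p_i / q_i)).
    return sum((a - b) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def pick_num_topics(divergence_by_t):
    """Return the candidate t with the lowest divergence (the 'dip')."""
    return min(divergence_by_t, key=divergence_by_t.get)
```

In use, one would fit a topic model for each candidate t, derive the two distributions from its factors, and pass the resulting `{t: divergence}` mapping to `pick_num_topics`.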
Abstract:
Sodium dodecyl sulphate-polyacrylamide gel electrophoresis of Percoll-purified Leydig cell proteins from 20- and 120-day-old rats revealed a significant decrease in a low molecular weight peptide in the adult rats. Administration of human chorionic gonadotropin to immature rats resulted in a decrease in the low molecular weight peptide along with an increase in testosterone production. Modulation of the peptide by human chorionic gonadotropin could be confirmed by Western blotting. The presence of a similar peptide could be detected by Western blotting in testes of immature mouse, hamster and guinea pig, but not in adrenal, placenta or corpus luteum. Administration of testosterone propionate, which is known to inhibit pituitary luteinizing hormone levels in adult rats, resulted in an increase in the low molecular weight peptide, as checked by Western blotting. It is suggested that this peptide may have a role in regulating the acquisition of responsiveness to luteinizing hormone by immature rat Leydig cells.
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracies on online and offline handwritten data. However, very few works deal with the application of these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but depend heavily on choosing the right lexicon. For comparison with lexicon and language model based methods, we have also explored re-evaluation techniques, which involve the use of expert classifiers to improve symbol and word recognition accuracies.
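A symbol-level bigram language model of the kind mentioned in this abstract can be sketched as follows. This is an illustrative toy under stated assumptions: symbols are taken to be the individual characters of each word (a real Tamil recognizer would use its own symbol inventory), add-one smoothing is assumed, and the names `train_bigram` and `log_prob` are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # assumed word-boundary markers


def train_bigram(words):
    """Count symbol bigrams over a list of words (symbols = characters here)."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = {EOS}
    for word in words:
        symbols = [BOS] + list(word) + [EOS]
        vocab.update(symbols[1:])
        for prev, cur in zip(symbols, symbols[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def log_prob(word, model):
    """Add-one smoothed log-probability of a candidate symbol sequence."""
    context_counts, bigram_counts, vocab_size = model
    symbols = [BOS] + list(word) + [EOS]
    return sum(
        math.log((bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size))
        for prev, cur in zip(symbols, symbols[1:])
    )
```

A recognizer could add such a score to its own classifier scores when ranking competing hypotheses, so that symbol sequences seen often in the text corpus are preferred.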
Abstract:
In this paper, we present a novel approach that makes use of topic models based on Latent Dirichlet Allocation (LDA) for generating single-document summaries. Our approach is distinguished from other LDA-based approaches in that we identify the summary topics which best describe a given document and only extract sentences from those paragraphs within the document which are highly correlated given the summary topics. This ensures that our summaries always highlight the crux of the document without paying any attention to the grammar and the structure of the documents. Finally, we evaluate our summaries on the DUC 2002 single-document summarization data corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.
Abstract:
When a document corpus is very large, we often need to reduce the number of features. But it is not possible to apply conventional Non-negative Matrix Factorization (NMF) to a billion-by-million matrix, as the matrix may not fit in memory. Here we present a novel Online NMF algorithm. Using Online NMF, we reduce the original high-dimensional space to a low-dimensional space. Then we cluster all the documents in the reduced dimension using the k-means algorithm. We experimentally show that by processing small subsets of documents we are able to achieve good performance. The proposed method outperforms existing algorithms.
Abstract:
There are many popular models available for classification of documents, like the Naïve Bayes Classifier, k-Nearest Neighbors and the Support Vector Machine. In all these cases, the representation is based on the "Bag of Words" model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by the proximity of words and their occurrence in the document. We propose a new "Bag of Phrases" model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well-known topic model, Latent Dirichlet Allocation (LDA), and to integrate them into the vector space model for classification. Experiments show better classifier performance with the new Bag of Phrases model than with related representation models.
Abstract:
Latent variable methods, such as PLCA (Probabilistic Latent Component Analysis), have been successfully used for analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood (ML) approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. Unlike earlier ML segmentation techniques, the PLCS framework does not impose a parametric assumption. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to most state-of-the-art speech segmentation algorithms.
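The idea of maximum-likelihood boundary estimation by dynamic programming, with one distribution per segment, can be illustrated with a simplified sketch. It is not the PLCS algorithm itself: each segment is assumed to be modelled by the normalized sum of its frames (the ML multinomial estimate), the number of segments is assumed known, and the function names are hypothetical.

```python
import math


def segment_loglik(frames, start, end):
    """Log-likelihood of frames[start:end] under one shared spectral distribution.

    The distribution is the ML estimate: the normalized sum of the frames.
    """
    n_bins = len(frames[0])
    counts = [0.0] * n_bins
    for t in range(start, end):
        for f in range(n_bins):
            counts[f] += frames[t][f]
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def ml_segment(frames, n_segments):
    """Return the interior boundaries that maximize the total log-likelihood."""
    n_frames = len(frames)
    neg_inf = float("-inf")
    # best[k][end]: best score for splitting frames[:end] into k segments.
    best = [[neg_inf] * (n_frames + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n_frames + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for end in range(k, n_frames + 1):
            for start in range(k - 1, end):
                if best[k - 1][start] == neg_inf:
                    continue
                score = best[k - 1][start] + segment_loglik(frames, start, end)
                if score > best[k][end]:
                    best[k][end] = score
                    back[k][end] = start
    # Walk the back-pointers from the last frame to recover segment starts.
    starts, end = [], n_frames
    for k in range(n_segments, 0, -1):
        end = back[k][end]
        starts.append(end)
    starts.reverse()
    return starts[1:]  # drop the leading 0; keep interior boundaries only
```

The hard one-model-per-segment constraint is what makes the total score decompose into a sum over segments, which is the property the dynamic program relies on.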
Abstract:
Scatter/Gather systems are becoming increasingly useful in browsing document corpora. The usability of present-day systems is restricted to monolingual corpora, and their methods for clustering and labeling do not easily extend to the multilingual setting, especially in the absence of dictionaries/machine translation. In this paper, we study the cluster labeling problem for multilingual corpora in the absence of machine translation, but using comparable corpora. Using a variational approach, we show that multilingual topic models can effectively handle the cluster labeling problem, which in turn allows us to design a novel Scatter/Gather system, ShoBha. Experimental results on three datasets, namely the Canadian Hansards corpus, the entire overlapping Wikipedia of English, Hindi and Bengali articles, and a trilingual news corpus containing 41,000 articles, confirm the utility of the proposed system.
Abstract:
Automatic and accurate detection of the closure-burst transition events of stops and affricates serves many applications in speech processing. A temporal measure named the plosion index is proposed to detect such events, which are characterized by an abrupt increase in energy. Using the maxima of the pitch-synchronous normalized cross correlation as an additional temporal feature, a rule-based algorithm is designed that aims at selecting only those events associated with the closure-burst transitions of stops and affricates. The performance of the algorithm, characterized by receiver operating characteristic curves and temporal accuracy, is evaluated using the labeled closure-burst transitions of stops and affricates of the entire TIMIT test and training databases. The robustness of the algorithm is studied with respect to global white and babble noise as well as local noise using the TIMIT test set and on telephone quality speech using the NTIMIT test set. For these experiments, the proposed algorithm, which does not require explicit statistical training and is based on two one-dimensional temporal measures, gives a performance comparable to or better than the state-of-the-art methods. In addition, to test the scalability, the algorithm is applied on the Buckeye conversational speech corpus and databases of two Indian languages. (C) 2014 Acoustical Society of America.
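One simple way to quantify an abrupt increase in energy at a given sample is to compare its magnitude with the average magnitude of a window that precedes it. The sketch below does exactly that; it is an assumption-laden illustration and should not be read as the paper's definition of the plosion index. The function name and the `gap`, `window` and `floor` parameters are hypothetical.

```python
def abrupt_rise_ratio(signal, index, gap, window, floor=1e-9):
    """Ratio of |signal[index]| to the mean magnitude of an earlier window.

    The window covers `window` samples and ends `gap` samples before `index`.
    `floor` (assumed) avoids division by zero on digital silence.
    """
    start = index - gap - window
    if start < 0:
        raise ValueError("not enough samples before index")
    past = signal[start:index - gap]
    average = sum(abs(x) for x in past) / window
    return abs(signal[index]) / max(average, floor)
```

A large ratio flags a candidate onset; a rule-based detector would then need further checks to reject abrupt events that are not closure-burst transitions.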
Abstract:
This paper describes a spatio-temporal registration approach for speech articulation data obtained from electromagnetic articulography (EMA) and real-time Magnetic Resonance Imaging (rtMRI). This is motivated by the potential for combining the complementary advantages of both types of data. The registration method is validated on EMA and rtMRI datasets obtained at different times, but using the same stimuli. The aligned corpus offers the advantages of high temporal resolution (from EMA) and a complete mid-sagittal view (from rtMRI). The co-registration also yields optimum placement of EMA sensors as articulatory landmarks on the magnetic resonance images, thus providing richer spatio-temporal information about articulatory dynamics. (C) 2014 Acoustical Society of America
Abstract:
In several species, including the buffalo cow, prostaglandin (PG) F2alpha is the key molecule responsible for regression of the corpus luteum (CL). Experiments were carried out to characterize gene expression changes in the CL tissue at various time points after administration of a luteolytic dose of PGF2alpha in buffalo cows. Circulating progesterone levels decreased within 1 h of PGF2alpha treatment, and evidence of apoptosis was demonstrable at 18 h post treatment. Microarray analysis indicated expression changes in several immediate early genes and transcription factors within 3 h of treatment. Also, changes in expression of genes associated with cell-to-cell signaling, cytokine signaling, steroidogenesis, PG synthesis and apoptosis were observed. Analysis of various components of LH/CGR signaling in CL tissues indicated decreased LH/CGR protein expression, pCREB levels and PKA activity post PGF2alpha treatment. The novel finding of this study is the down-regulation of CYP19A1 gene expression, accompanied by a decrease in expression of E2 receptors and in circulating and intraluteal E2 post PGF2alpha treatment. Mining of microarray data revealed several differentially expressed E2-responsive genes. Since CYP19A1 gene expression is low in the bovine CL, microarray data of PGF2alpha-treated macaques, a species with high luteal CYP19A1 expression, were also mined; this showed good correlation of differentially expressed E2-responsive genes between the two species. Taken together, the results of this study suggest that PGF2alpha interferes with luteotrophic signaling, impairs intraluteal E2 levels and regulates various signaling pathways before the effects on structural luteolysis are manifest.
Abstract:
USC-TIMIT is an extensive database of multimodal speech production data, developed to complement existing resources available to the speech research community and with the intention of being continuously refined and augmented. The database currently includes real-time magnetic resonance imaging data from five male and five female speakers of American English. Electromagnetic articulography data have so far also been collected from four of these speakers. The two modalities were recorded in two independent sessions while the subjects produced the same 460-sentence corpus used previously in the MOCHA-TIMIT database. In both cases the audio signal was recorded and synchronized with the articulatory data. The database and companion software are freely available to the research community. (C) 2014 Acoustical Society of America.
Abstract:
The Electromagnetic Articulography (EMA) technique is used to record the kinematics of different articulators while one speaks. EMA data often contain missing segments due to sensor failure. In this work, we propose a maximum a-posteriori (MAP) estimation with a continuity constraint to recover the missing samples in the articulatory trajectories recorded using EMA. In this approach, we combine the benefits of statistical MAP estimation with the temporal continuity of the articulatory trajectories. Experiments on an articulatory corpus using different missing segment durations show that the proposed continuity constraint results in a 30% reduction in average root mean squared error in estimation over statistical estimation of missing segments without any continuity constraint.
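How a continuity term changes a statistical estimate can be seen in a toy one-dimensional model. The sketch below assumes an independent Gaussian prior on each missing sample plus a quadratic penalty on differences between neighbouring samples, with the known samples on either side of the gap held fixed. This formulation, the function name `fill_gap` and its parameters are illustrative assumptions, not the model used in the paper.

```python
def fill_gap(left, right, prior_mean, prior_var=1.0, smooth=1.0):
    """Estimate missing samples between known neighbours `left` and `right`.

    Minimizes sum((x_i - prior_mean_i)**2) / prior_var
            + smooth * sum((x_i - x_{i-1})**2),
    where the second sum also covers the two known boundary samples.
    The optimum solves a tridiagonal linear system (Thomas algorithm).
    """
    n = len(prior_mean)
    if n == 0:
        return []
    diag = 1.0 / prior_var + 2.0 * smooth
    rhs = [m / prior_var for m in prior_mean]
    rhs[0] += smooth * left    # known sample before the gap
    rhs[-1] += smooth * right  # known sample after the gap
    # Forward sweep; off-diagonal entries are all -smooth.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -smooth / diag
    d_prime[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + smooth * c_prime[i - 1]
        c_prime[i] = -smooth / denom
        d_prime[i] = (rhs[i] + smooth * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

With `smooth=0` the estimate falls back to the prior mean alone, which corresponds to statistical estimation without a continuity constraint; larger values pull the filled segment toward a smooth path joining the known neighbours.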
Abstract:
In subject-independent acoustic-to-articulatory inversion, the articulatory kinematics of a test subject are estimated assuming that the training corpus does not include data from the test subject. The training corpus in subject-independent inversion (SII) is formed with acoustic and articulatory kinematics data, and the acoustic mismatch between training and test subjects is then estimated by an acoustic normalization using acoustic data drawn from a large pool of speakers called the generic acoustic space (GAS). In this work, we focus on improving SII performance through better acoustic normalization and adaptation. We propose an unsupervised way and several supervised ways of clustering GAS for acoustic normalization. We perform an adaptation of the acoustic models of GAS using the acoustic data of the training and test subjects in SII. It is found that SII performance significantly improves (approximately 25% relative on average) over the subject-dependent inversion when the acoustic clusters in GAS correspond to phonetic units (or states of 3-state phonetic HMMs) and when the acoustic model built on GAS is adapted to training and test subjects while optimizing the inversion criterion. (C) 2014 Elsevier B.V. All rights reserved.