571 results for Corpora Pedunculata
Abstract:
Several studies have demonstrated an association between polycystic ovary syndrome (PCOS) and the dinucleotide repeat microsatellite marker D19S884, which is located in intron 55 of the fibrillin-3 (FBN3) gene. Fibrillins, including FBN1 and FBN2, interact with latent transforming growth factor (TGF)-β-binding proteins (LTBPs) and thereby control the bioactivity of TGFβs. TGFβs stimulate fibroblast replication and collagen production. The PCOS ovarian phenotype includes increased stromal collagen and expansion of the ovarian cortex, features plausibly influenced by abnormal fibrillin expression. To examine a possible role of fibrillins in PCOS, particularly FBN3, we undertook tagging and functional single nucleotide polymorphism (SNP) analysis (32 SNPs, including 10 that generate non-synonymous amino acid changes) using DNA from 173 PCOS patients and 194 controls. No SNP showed a significant association with PCOS, and the alleles of most SNPs showed almost identical population frequencies in PCOS and control subjects. No significant differences were observed for microsatellite D19S884. In human PCO stroma/cortex (n = 4) and non-PCO ovarian stroma (n = 9), follicles (n = 3) and corpora lutea (n = 3), and in human ovarian cancer cell lines (KGN, SKOV-3, OVCAR-3, OVCAR-5), FBN1 mRNA levels were approximately 100-fold greater than FBN2 and 200–1000-fold greater than FBN3. Expression of LTBP-1 mRNA was 3-fold greater than that of LTBP-2. We conclude that FBN3 appears to have little involvement in PCOS, but we cannot rule out that other markers in the chromosome 19p13.2 region are associated with PCOS, or that FBN3 is expressed in other organs and thereby influences the PCOS phenotype.
Abstract:
Information has no value unless it is accessible, and it must be interlinked so that a knowledge network can be built. Such a knowledge base is a key resource that allows Internet users to link information across documents. Information retrieval, a key technology for knowledge management, guarantees access to large corpora of unstructured text. Collaborative knowledge management systems such as Wikipedia are more popular than ever; however, their link-creation functions are not optimized for discovering possible links within the collection, and the quality of automatically generated links has never been quantified. This research begins with an evaluation forum intended to support collaborative experiments in focused link discovery as well as the investigation of link discovery applications. The research focus was the evaluation strategy: the proposed evaluation framework, covering rules, formats, pooling, validation and assessment, has proved efficient for conducting evaluations and reusable for further extension. A collection-split approach is used to restructure the Wikipedia collection into a split collection comprising single-passage files. This split collection is shown to be feasible for improving relevant-passage discovery and serves as a corpus for focused link discovery. Following these experiments, a mobile client-side prototype built on the iPhone was developed to address the mobile search problem using focused link discovery technology. According to an interview survey, the proposed mobile interactive UI does improve the experience of mobile information seeking. Based on this evaluation framework, a novel cross-language link discovery proposal using multiple text collections was developed. A dynamic evaluation approach is proposed to enhance both the collaborative effort and the interaction between submission and evaluation. A realistic evaluation scheme has been implemented at NTCIR for cross-language link discovery tasks.
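An illustrative sketch (not the thesis's actual tooling) of a collection-split step of the kind described above: each document in a plain-text dump is broken into single-passage files so that passage-level (focused) link discovery can treat every passage as a retrieval unit. The directory layout and the blank-line passage delimiter are assumptions of this example only.

# Split each document into single-passage files (one passage per file).
from pathlib import Path

def split_collection(src_dir: str, dst_dir: str) -> None:
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for doc in Path(src_dir).glob("*.txt"):
        # Treat blank-line-separated blocks as passages.
        passages = [p.strip() for p in doc.read_text(encoding="utf-8").split("\n\n") if p.strip()]
        for i, passage in enumerate(passages):
            (dst / f"{doc.stem}_p{i:04d}.txt").write_text(passage, encoding="utf-8")

if __name__ == "__main__":
    split_collection("wikipedia_articles", "wikipedia_split")  # hypothetical paths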
Abstract:
This paper investigates advanced channel compensation techniques for improving i-vector speaker verification performance in the presence of high intersession variability, using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques is investigated: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA), and (d) source-normalized WLDA (SN-WLDA). We show that, by extracting discriminatory information between pairs of speakers as well as capturing source variation information in the development i-vector space, the SN-WLDA-based cosine similarity scoring (CSS) i-vector system provides over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to the SN-LDA-based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, providing over 8% improvement in DCF over the best single approach (SN-WLDA) for the NIST 2008 interview/telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA, with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
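A minimal sketch of the cosine similarity scoring (CSS) step that such channel-compensated i-vector systems feed into: both i-vectors are projected by a linear channel-compensation matrix (an LDA/WLDA-style projection estimated on development data), length-normalized, and scored by their inner product. The matrix W, the i-vectors and the decision threshold below are placeholders, not the paper's SN-WLDA implementation.

import numpy as np

def css_score(enrol_ivec: np.ndarray, test_ivec: np.ndarray, W: np.ndarray) -> float:
    """Project both i-vectors with W, length-normalize, and return their cosine score."""
    e = W.T @ enrol_ivec
    t = W.T @ test_ivec
    e /= np.linalg.norm(e)
    t /= np.linalg.norm(t)
    return float(e @ t)

rng = np.random.default_rng(0)
W = rng.standard_normal((400, 150))          # projection learned elsewhere (placeholder)
enrol, test = rng.standard_normal(400), rng.standard_normal(400)
print("accept" if css_score(enrol, test, W) > 0.3 else "reject")  # threshold is illustrative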
Abstract:
A significant amount of speech data is required to develop a robust speaker verification system, but it is difficult to find enough development speech to match all expected conditions. In this paper we introduce a new approach to Gaussian probabilistic linear discriminant analysis (GPLDA) to estimate reliable model parameters as a linearly weighted model taking more input from the large volume of available telephone data and smaller proportional input from limited microphone data. In comparison to a traditional pooled training approach, where the GPLDA model is trained over both telephone and microphone speech, this linear-weighted GPLDA approach is shown to provide better EER and DCF performance in microphone and mixed conditions in both the NIST 2008 and NIST 2010 evaluation corpora. Based upon these results, we believe that linear-weighted GPLDA will provide a better approach than pooled GPLDA, allowing for the further improvement of GPLDA speaker verification in conditions with limited development data.
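A heavily simplified sketch of the idea behind a linearly weighted model: second-order statistics are estimated separately on the abundant telephone i-vectors and the scarce microphone i-vectors, then combined with a weight alpha rather than pooling all data before estimation. Real GPLDA is trained by EM over latent speaker factors, so this is only an analogy for the weighting step; alpha, the dimensions and the data are placeholders.

import numpy as np

def weighted_covariance(tel: np.ndarray, mic: np.ndarray, alpha: float) -> np.ndarray:
    """Return alpha * Cov(tel) + (1 - alpha) * Cov(mic)."""
    return alpha * np.cov(tel, rowvar=False) + (1.0 - alpha) * np.cov(mic, rowvar=False)

rng = np.random.default_rng(1)
tel_ivecs = rng.standard_normal((5000, 400))   # large telephone development set (toy)
mic_ivecs = rng.standard_normal((500, 400))    # limited microphone development set (toy)
S = weighted_covariance(tel_ivecs, mic_ivecs, alpha=0.6)
print(S.shape)  # (400, 400): combined estimate used in place of the pooled one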
Abstract:
In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.
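A hedged sketch of the cross-session speaker-linking step that rediarization relies on: per-session speaker representations obtained from a first diarization pass (e.g. i-vectors) are clustered so that recurring identities across recordings share one label, which can then seed the next diarization iteration. The embeddings, the cosine-distance threshold and the clustering choices are illustrative, not the paper's exact system.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def link_speakers(embeddings: np.ndarray, threshold: float = 0.4) -> np.ndarray:
    """Agglomeratively cluster session-level speaker embeddings by cosine distance."""
    Z = linkage(embeddings, method="average", metric="cosine")
    return fcluster(Z, t=threshold, criterion="distance")  # cluster id per embedding

rng = np.random.default_rng(2)
session_speaker_embeddings = rng.standard_normal((50, 200))  # 50 speaker models from many sessions
labels = link_speakers(session_speaker_embeddings)
print(labels[:10])  # speakers sharing a label are treated as one identity in the next pass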
Abstract:
Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that learns from concepts taken from structured ontologies and extracted from free text, rather than directly from terms in free text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that it may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
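A sketch of the general approach, not the authors' exact model: a skip-gram model is trained over sequences of ontology concept identifiers (e.g. UMLS CUIs extracted from abstracts or patient records) instead of raw terms, and semantic similarity is the cosine similarity between the resulting concept vectors. The toy corpus and the CUIs below are hypothetical placeholders; gensim >= 4 is assumed.

from gensim.models import Word2Vec

concept_corpus = [
    ["C0011849", "C0020538", "C0027051"],   # one document as a sequence of concept IDs
    ["C0011849", "C0011860", "C0017601"],
    ["C0020538", "C0027051", "C0038454"],
]

model = Word2Vec(sentences=concept_corpus, vector_size=50, window=5, min_count=1, sg=1, seed=3)
print(model.wv.similarity("C0011849", "C0011860"))  # cosine similarity between two concepts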
Abstract:
The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.
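A minimal sketch of the core operation such a protocol standardizes: scaling background noise so that it mixes with clean speech at a chosen signal-to-noise ratio. The scaling here uses average power over the whole utterance; the actual protocol additionally specifies sessions, noise scenarios and reverberant responses, which are not modelled.

import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    noise = noise[: len(speech)]                      # assume noise is at least as long
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise

rng = np.random.default_rng(4)
speech = rng.standard_normal(16000)   # 1 s of placeholder speech at 16 kHz
noise = rng.standard_normal(16000)    # 1 s of placeholder noise
noisy = mix_at_snr(speech, noise, snr_db=5.0)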
Abstract:
The objective of the current study was to investigate the mechanism by which the corpus luteum (CL) of the monkey undergoes desensitization to luteinizing hormone following exposure to increasing concentrations of human chorionic gonadotrophin (hCG), as occurs in pregnancy. Female bonnet monkeys were injected (im) with increasing doses of hCG or deglycosylated hCG (dghCG) beginning on day 6 or 12 of the luteal phase, for 10, 4 or 2 days. The day of the oestrogen surge was considered day 0 of the luteal phase. Luteal cells obtained from the CL of these animals were incubated with hCG (2 and 200 pg/ml) or dbcAMP (2.5, 25 and 100 µM) for 3 h at 37 °C, and the progesterone secreted was estimated. Corpora lutea of normally cycling monkeys on day 10/16/22 of the luteal phase were used as controls. In addition, the in vivo response to hCG and dghCG was assessed by determining serum steroid profiles following their administration. hCG (15-90 IU), but not dghCG (15-90 IU), treatment in vivo significantly (P < 0.05) elevated serum progesterone and oestradiol levels. Serum progesterone, however, could not be maintained at an elevated level by continuous treatment with hCG (from day 6-15), the progesterone level declining beyond day 13 of the luteal phase. Administering low doses of hCG (15-90 IU/day) from day 6-9, or high doses (600 IU/day) on days 8 and 9 of the luteal phase, resulted in a significant increase (about 10-fold over the corresponding control, P < 0.005) in the ability of luteal cells to synthesize progesterone in vitro (incubated controls). The luteal cells of the treated animals responded to dbcAMP (P < 0.05) but not to hCG added in vitro. The in vitro response of luteal cells to added hCG was inhibited by 0, 50 and 100% if the animals were injected with low (15-90 IU), medium (100 IU) or high (600 IU on days 8 and 9 of the luteal phase) doses of dghCG between days 6-9 of the luteal phase, respectively; such treatment had no effect on the responsiveness of the cells to dbcAMP. The luteal cell responsiveness to dbcAMP in vitro was also blocked if hCG was administered for 10 days beginning on day 6 of the luteal phase. Although short-term hCG treatment during the late luteal phase (days 12-15) had no effect on luteal function, 10-day treatment beginning on day 12 of the luteal phase resulted in a regain of in vitro responsiveness to both hCG (P < 0.05) and dbcAMP (P < 0.05), suggesting that luteal rescue can occur even at this late stage. In conclusion, desensitization of the CL to hCG appears to be governed by the dose and the period for which it is exposed to hCG/dghCG. That desensitization is due to receptor occupancy is brought out by the facts that (i) it can be achieved by giving a larger dose of hCG over a 2-day period instead of a lower dose of the hormone for a longer (4 to 10 day) period, and (ii) the effect can largely be reproduced by using dghCG instead of hCG to block the receptor sites. It appears that, to achieve desensitization to dbcAMP as well, it is necessary to expose the luteal cells to a relatively high dose of hCG for more than 4 days.
Abstract:
This study investigates the use of unsupervised features derived from word embedding approaches and novel sequence representation approaches for improving clinical information extraction systems. Our results corroborate previous findings that the use of word embeddings significantly improves the effectiveness of concept extraction models; however, we further determine the influence that the corpora used to generate such features have. We also demonstrate the promise of sequence-based unsupervised features for further improving concept extraction.
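A hedged sketch of one common way to turn word embeddings into unsupervised features for a clinical concept extraction (sequence labelling) model: cluster the embedding space and emit each token's cluster id as a categorical feature, alongside ordinary lexical features, for a CRF or similar tagger. The vectors, cluster count and tokens below are toy placeholders, not the study's corpora or feature set.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
vocab = ["patient", "denies", "chest", "pain", "metformin", "500mg"]
embeddings = {w: rng.standard_normal(100) for w in vocab}   # stand-in for pre-trained vectors

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(np.stack([embeddings[w] for w in vocab]))
cluster_of = {w: int(c) for w, c in zip(vocab, kmeans.labels_)}

def token_features(token: str) -> dict:
    """Feature dict for one token: surface features plus its embedding-cluster id."""
    return {
        "lower": token.lower(),
        "is_digit": token.isdigit(),
        "emb_cluster": cluster_of.get(token.lower(), -1),   # -1 for out-of-vocabulary tokens
    }

print(token_features("chest"))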
Abstract:
This study reports a diachronic corpus investigation of common-number pronouns used to convey unknown or otherwise unspecified reference. The study charts agreement patterns in these pronouns in various diachronic and synchronic corpora. The objective is to provide baseline data on variant frequencies and distributions in the history of English, as there are no previous systematic corpus-based observations on this topic. The study seeks to answer the questions of how pronoun use is linked with the overall typological development of English and how the pronouns' diachronic evolution is embedded in the linguistic and social structures in which they are used. The theoretical framework draws on corpus linguistics and historical sociolinguistics, grammaticalisation, diachronic typology, and multivariate modelling of sociolinguistic variation. The method employs quantitative corpus analyses of two main electronic corpora, one from Modern English and the other from Present-day English. The Modern English material is the Corpus of Early English Correspondence, and the time frame covered is 1500-1800. The written component of the British National Corpus is used in the Present-day English investigations. In addition, the study draws supplementary data from other electronic corpora. The material is used to compare the frequencies and distributions of common-number pronouns between these two time periods. The study limits the common-number uses to two subsystems: one anaphoric, with grammatically singular antecedents, and one cataphoric, in which the pronoun is followed by a relative clause. Various statistical tools are used to process the data, ranging from cross-tabulations to multivariate VARBRUL analyses in which the effects of sociolinguistic and systemic parameters are assessed to model their impact on the dependent variable. This study shows how one pronoun type has extended its uses in both subsystems, an increase linked with grammaticalisation and with changes in other pronouns in English through the centuries. The variationist sociolinguistic analysis charts how grammaticalisation in the subsystems is embedded in the linguistic and social structures in which the pronouns are used. The study proposes a scale of two statistical generalisations of the various sociolinguistic factors that contribute to grammaticalisation and its embedding at various stages of the process.
Abstract:
Da hatte das Pferd die Nüstern voll. Gebrauch und Funktion von Phraseologie im Kinderbuch. Untersuchungen zu Erich Kästner und anderen Autoren. [The use and function of phraseology in children's books: studies on Erich Kästner and other authors.] It is often assumed that idioms are difficult for children to understand, because their meaning cannot be derived in full from the meanings of the individual words that make up the expression. Nevertheless, idioms are used extensively, and in many different functions, in children's literature. This study examines the full range of phraseology use (idioms and proverbs) in German-language children's literature, from the classics of Erich Kästner (1899-1976) to the present day. Using three corpora (905 idiom examples from six of Kästner's children's books, 333 idioms from two of Kästner's novels for adults, and 580 examples from six children's books by other authors), the study seeks to answer questions such as: How many and what kinds of idioms are used in the texts? How are the idioms placed in the texts, and what kinds of relations to the context are built? What differences in idiom use can be observed, first, between the same author's (Kästner's) children's books and his books for adults, and second, between children's books written by different authors? The study shows that idiom use in children's literature varies primarily by author, which is visible in the occurrence of distinct 'phraseological profiles'. The use of paraphrases (a synonymous non-idiomatic expression placed alongside the idiom) is quite common in all of the children's books studied. In Kästner's children's books, paraphrase is clearly more frequent than in his novels for adults. It thus appears that children's literature, consciously or unconsciously, takes children's limited phraseological competence into account.
Abstract:
The work is based on the assumption that words with similar syntactic usage have similar meaning, as proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual cases, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and the discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications. This work supports Harris' hypothesis by implementing three new methods modeled on it. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
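An illustrative sketch of unsupervised word sense discovery via context clustering, in the spirit of Harris' distributional hypothesis: the contexts in which an ambiguous word occurs are represented as bag-of-words vectors and grouped so that each cluster approximates one sense. The toy sentences, vectorizer settings and number of clusters are assumptions of this example, not the thesis's methods.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

contexts_of_bank = [
    "the bank raised its interest rates on loans",
    "she deposited the cheque at the bank on monday",
    "they had a picnic on the grassy bank of the river",
    "the river bank eroded after the spring floods",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(contexts_of_bank)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # contexts sharing a label are treated as the same induced sense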
Abstract:
In this study I look at what people want to express when they talk about time in Russian and Finnish, and why they use the means they use. The material consists of expressions of time: 1087 from Russian and 1141 from Finnish. They have been collected from dictionaries, usage guides, corpora, and the Internet. An expression here means an idiomatic set of words in a preset form, a collocation or construction. The expressions are studied as lexical entities, without context, and are analysed and categorized according to various features. The theoretical background for the study includes two completely different approaches. Functional Syntax is used in order to find out what general meanings the speaker wishes to convey when talking about time and how these meanings are expressed in specific languages. Conceptual metaphor theory is used for explaining why the expressions are as they are, i.e. what kind of conceptual metaphors (transfers from one conceptual domain to another) they include. The study has resulted in a grammatically glossed list of time expressions in Russian and Finnish, a list of 56 general meanings involved in these time expressions, and an account of the means (constructions) that these languages have for expressing the general meanings defined. It also includes an analysis of the conceptual metaphors behind the expressions. The general meanings involved turned out to revolve around expressing duration, point in time, period of time, frequency, sequence, passing of time, suitable time and the right time, life as time, limitedness of time, and some other notions having less obvious semantic relations to the others. Conceptual metaphor analysis of the material has shown that time is conceptualized in Russian and Finnish according to the metaphors Time Is Space (Time Is Container, Time Has Direction, Time Is Cycle, and the Time Line Metaphor), Time Is Resource (and its submapping Time Is Substance), and Time Is Actor; some characteristics are added to these conceptualizations with the help of the secondary metaphors Time Is Nature and Time Is Life. The limits between different conceptual metaphors, and the connections these metaphors have with one another, are examined with the help of the theory of conceptual integration (blending theory) and its schemas. The results of the study show that although Russian and Finnish are typologically different, they are very similar both in the needs of expression their speakers have concerning time and in the conceptualizations behind expressing time. This study introduces both theoretical and methodological novelties in the nature of the material used, in developing an empirical methodology for conceptual metaphor studies, in the exactness of defining the limits of different conceptual metaphors, and in seeking unity among the different facets of time. Keywords: time, metaphor, time expression, idiom, conceptual metaphor theory, functional syntax, blending theory
Abstract:
This study reports a corpus-based investigation of medieval English herbals, texts conveying information on medicinal plants. Herbals belong to the medieval medical register. The study charts intertextual parallels within the medieval genre, and between herbals and other contemporary medical texts. It seeks to answer the questions of where and how herbal texts are linked to each other and to other medical writing. The theoretical framework of the study draws on intertextuality and genre studies, manuscript studies, corpus linguistics, and multi-dimensional text analysis. The method combines qualitative and quantitative analyses of textual material from three historical special-language corpora of Middle and Early Modern English, one of which was compiled for the purposes of this study. The text material contains over 800,000 words of medical texts, spanning c. 1330 to 1550. Text material is retrieved from the corpora using plant name lists as search criteria. The raw data are filtered through a qualitative analysis, which produces the input for the quantitative analysis, multi-dimensional scaling (MDS). In MDS, the textual space that parallel text passages form is observed, and the observations are explained by a qualitative analysis. This study concentrates on evidence of material and structural intertextuality. The analysis shows patterns of affinity between the texts of the herbal genre, and between herbals and other texts in the medical register. Herbals are most closely linked with recipe collections and regimens of health: these account for over 95 per cent of the intertextual links between herbals and other medical writing. Links to surgical texts or to specialised medical texts are very few. This can be explained by the history of the herbal genre: as herbals carry information on medical ingredients, namely herbs, they are relevant to genres related to pharmacological therapy. Conversely, herbals draw material from recipe collections in order to illustrate the medicinal properties of the herbs they describe. The study points out the close relationship between medical recipes and recipe-like passages in herbals (recipe paraphrases). The examples of recipe paraphrases show that they may have been perceived as indirect instruction. Keywords: medieval herbals, early English medicine, corpus linguistics, intertextuality, manuscript studies
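A hedged sketch of the kind of multi-dimensional scaling (MDS) step described above: pairwise dissimilarities between parallel text passages are embedded in a low-dimensional space so that groups of closely related passages can be inspected. The passages, the simple lexical-overlap distance and the 2-D target space are placeholders, not the study's actual data or distance measure.

import numpy as np
from sklearn.manifold import MDS

passages = [
    "take betony and stamp it and lay it to the wound",
    "take the juice of betony and drink it for the headache",
    "rosemary is hot and dry and comforts the brain",
    "take rosemary and seethe it in wine for the palsy",
]

def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard overlap of the word sets of two passages."""
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

n = len(passages)
D = np.array([[jaccard_distance(passages[i], passages[j]) for j in range(n)] for i in range(n)])
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
print(coords)  # one 2-D point per passage; nearby points indicate closely related passages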