119 resultados para Corpus inscriptionum graecarum.
Resumo:
The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
Resumo:
Extracellular matrix regulates many cellular processes likely to be important for development and regression of corpora lutea. Therefore, we identified the types and components of the extracellular matrix of the human corpus luteum at different stages of the menstrual cycle. Two different types of extracellular matrix were identified by electron microscopy; subendothelial basal laminas and an interstitial matrix located as aggregates at irregular intervals between the non-vascular cells. No basal laminas were associated with luteal cells. At all stages, collagen type IV α1 and laminins α5, β2 and γ1 were localized by immunohistochemistry to subendothelial basal laminas, and collagen type IV α1 and laminins α2, α5, β1 and β2 localized in the interstitial matrix. Laminin α4 and β1 chains occurred in the subendothelial basal lamina from mid-luteal stage to regression; at earlier stages, a punctate pattern of staining was observed. Therefore, human luteal subendothelial basal laminas potentially contain laminin 11 during early luteal development and, additionally, laminins 8, 9 and 10 at the mid-luteal phase. Laminin α1 and α3 chains were not detected in corpora lutea. Versican localized to the connective tissue extremities of the corpus luteum. Thus, during the formation of the human corpus luteum, remodelling of extracellular matrix does not result in basal laminas as present in the adrenal cortex or ovarian follicle. Instead, novel aggregates of interstitial matrix of collagen and laminin are deposited within the luteal parenchyma, and it remains to be seen whether this matrix is important for maintaining the luteal cell phenotype.
Resumo:
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.
Resumo:
Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
Resumo:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been heavily demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including, predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
Resumo:
The effectiveness of higher-order spectral (HOS) phase features in speaker recognition is investigated by comparison with Mel Cepstral features on the same speech data. HOS phase features retain phase information from the Fourier spectrum unlikeMel–frequency Cepstral coefficients (MFCC). Gaussian mixture models are constructed from Mel– Cepstral features and HOS features, respectively, for the same data from various speakers in the Switchboard telephone Speech Corpus. Feature clusters, model parameters and classification performance are analyzed. HOS phase features on their own provide a correct identification rate of about 97% on the chosen subset of the corpus. This is the same level of accuracy as provided by MFCCs. Cluster plots and model parameters are compared to show that HOS phase features can provide complementary information to better discriminate between speakers.
Resumo:
In this paper we analyse a 600,000 word corpus comprised of policy statements produced within supranational, national, state and local legislatures about the nature and causes of(un)employment. We identify significant rhetorical and discursive features deployed by third sector (un)employment policy authors that function to extend their legislative grasp to encompass the most intimate aspects of human association.
Resumo:
In this article I outline and demonstrate a synthesis of the methods developed by Lemke (1998) and Martin (2000) for analyzing evaluations in English. I demonstrate the synthesis using examples from a 1.3-million-word technology policy corpus drawn from institutions at the local, state, national, and supranational levels. Lemke's (1998) critical model is organized around the broad 'evaluative dimensions' that are deployed to evaluate propositions and proposals in English. Martin's (2000) model is organized with a more overtly systemic-functional orientation around the concept of 'encoded feeling'. In applying both these models at different times, whilst recognizing their individual usefulness and complementarity, I found specific limitations that led me to work towards a synthesis of the two approaches. I also argue for the need to consider genre, media, and institutional aspects more explicitly when claiming intertextual and heteroglossic relations as the basis for inferred evaluations. A basic assertion made in this article is that the perceived Desirability of a process, person, circumstance, or thing is identical to its 'value'. But the Desirability of anything is a socially and thus historically conditioned attribution that requires significant amounts of institutional inculcation of other 'types' of value-appropriateness, importance, beauty, power, and so on. I therefore propose a method informed by critical discourse analysis (CDA) that sees evaluation as happening on at least four interdependent levels of abstraction.
Resumo:
Although internet chat is a significant aspect of many internet users’ lives, the manner in which participants in quasi-synchronous chat situations orient to issues of social and moral order remains to be studied in depth. The research presented here is therefore at the forefront of a continually developing area of study. This work contributes new insights into how members construct and make accountable the social and moral orders of an adult-oriented Internet Relay Chat (IRC) channel by addressing three questions: (1) What conversational resources do participants use in addressing matters of social and moral order? (2) How are these conversational resources deployed within IRC interaction? and (3) What interactional work is locally accomplished through use of these resources? A survey of the literature reveals considerable research in the field of computer-mediated communication, exploring both asynchronous and quasi-synchronous discussion forums. The research discussed represents a range of communication interests including group and collaborative interaction, the linguistic construction of social identity, and the linguistic features of online interaction. It is suggested that the present research differs from previous studies in three ways: (1) it focuses on the interaction itself, rather than the ways in which the medium affects the interaction; (2) it offers turn-by-turn analysis of interaction in situ; and (3) it discusses membership categories only insofar as they are shown to be relevant by participants through their talk. Through consideration of the literature, the present study is firmly situated within the broader computer-mediated communication field. Ethnomethodology, conversation analysis and membership categorization analysis were adopted as appropriate methodological approaches to explore the research focus on interaction in situ, and in particular to investigate the ways in which participants negotiate and co-construct social and moral orders in the course of their interaction. IRC logs collected from one chat room were analysed using a two-pass method, based on a modification of the approaches proposed by Pomerantz and Fehr (1997) and ten Have (1999). From this detailed examination of the data corpus three interaction topics are identified by means of which participants clearly orient to issues of social and moral order: challenges to rule violations, ‘trolling’ for cybersex, and experiences regarding the 9/11 attacks. Instances of these interactional topics are subjected to fine-grained analysis, to demonstrate the ways in which participants draw upon various interactional resources in their negotiation and construction of channel social and moral orders. While these analytical topics stand alone in individual focus, together they illustrate different instances in which participants’ talk serves to negotiate social and moral orders or collaboratively construct new orders. Building on the work of Vallis (2001), Chapter 5 illustrates three ways that rule violation is initiated as a channel discussion topic: (1) through a visible violation in open channel, (2) through an official warning or sanction by a channel operator regarding the violation, and (3) through a complaint or announcement of a rule violation by a non-channel operator participant. Once the topic has been initiated, it is shown to become available as a topic for others, including the perceived violator. The fine-grained analysis of challenges to rule violations ultimately demonstrates that channel participants orient to the rules as a resource in developing categorizations of both the rule violation and violator. These categorizations are contextual in that they are locally based and understood within specific contexts and practices. Thus, it is shown that compliance with rules and an orientation to rule violations as inappropriate within the social and moral orders of the channel serves two purposes: (1) to orient the speaker as a group member, and (2) to reinforce the social and moral orders of the group. Chapter 6 explores a particular type of rule violation, solicitations for ‘cybersex’ known in IRC parlance as ‘trolling’. In responding to trolling violations participants are demonstrated to use affiliative and aggressive humour, in particular irony, sarcasm and insults. These conversational resources perform solidarity building within the group, positioning non-Troll respondents as compliant group members. This solidarity work is shown to have three outcomes: (1) consensus building, (2) collaborative construction of group membership, and (3) the continued construction and negotiation of existing social and moral orders. Chapter 7, the final data analysis chapter, offers insight into how participants, in discussing the events of 9/11 on the actual day, collaboratively constructed new social and moral orders, while orienting to issues of appropriate and reasonable emotional responses. This analysis demonstrates how participants go about ‘doing being ordinary’ (Sacks, 1992b) in formulating their ‘first thoughts’ (Jefferson, 2004). Through sharing their initial impressions of the event, participants perform support work within the interaction, in essence working to normalize both the event and their initial misinterpretation of it. Normalising as a support work mechanism is also shown in relation to participants constructing the ‘quiet’ following the event as unusual. Normalising is accomplished by reference to the indexical ‘it’ and location formulations, which participants use both to negotiate who can claim to experience the ‘unnatural quiet’ and to identify the extent of the quiet. Through their talk participants upgrade the quiet from something legitimately experienced by one person in a particular place to something that could be experienced ‘anywhere’, moving the phenomenon from local to global provenance. With its methodological design and detailed analysis and findings, this research contributes to existing knowledge in four ways. First, it shows how rules are used by participants as a resource in negotiating and constructing social and moral orders. Second, it demonstrates that irony, sarcasm and insults are three devices of humour which can be used to perform solidarity work and reinforce existing social and moral orders. Third, it demonstrates how new social and moral orders are collaboratively constructed in relation to extraordinary events, which serve to frame the event and evoke reasonable responses for participants. And last, the detailed analysis and findings further support the use of conversation analysis and membership categorization as valuable methods for approaching quasi-synchronous computer-mediated communication.
Resumo:
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).