29 resultados para Corpus Linguistic

em Queensland University of Technology - ePrints Archive


Relevância:

30.00% 30.00%

Publicador:

Resumo:

A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that by explicitly modeling associations between terms significant improvements in retrieval effectiveness can be achieved over those that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this article, we take a close look at the literacy demands of one task from the ‘Marvellous Micro-organisms Stage 3 Life and Living’ Primary Connections unit (Australian Academy of Science, 2005). One lesson from the unit, ‘Exploring Bread’, (pp 4-8) asks students to ‘use bread labels to locate ingredient information and synthesise understanding of bread ingredients’. We draw upon a framework offered by the New London Group (2000), that of linguistic, visual and spatial design, to consider in more detail three bread wrappers and from there the complex literacies that students need to interrelate to undertake the required task. Our findings are that although bread wrappers are an example of an everyday science text, their linguistic, visual and spatial designs and their interrelationship are not trivial. We conclude by reinforcing the need for teachers of science to also consider how the complex design elements of everyday science texts and their interrelated literacies are made visible through instructional practice.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although internet chat is a significant aspect of many internet users’ lives, the manner in which participants in quasi-synchronous chat situations orient to issues of social and moral order remains to be studied in depth. The research presented here is therefore at the forefront of a continually developing area of study. This work contributes new insights into how members construct and make accountable the social and moral orders of an adult-oriented Internet Relay Chat (IRC) channel by addressing three questions: (1) What conversational resources do participants use in addressing matters of social and moral order? (2) How are these conversational resources deployed within IRC interaction? and (3) What interactional work is locally accomplished through use of these resources? A survey of the literature reveals considerable research in the field of computer-mediated communication, exploring both asynchronous and quasi-synchronous discussion forums. The research discussed represents a range of communication interests including group and collaborative interaction, the linguistic construction of social identity, and the linguistic features of online interaction. It is suggested that the present research differs from previous studies in three ways: (1) it focuses on the interaction itself, rather than the ways in which the medium affects the interaction; (2) it offers turn-by-turn analysis of interaction in situ; and (3) it discusses membership categories only insofar as they are shown to be relevant by participants through their talk. Through consideration of the literature, the present study is firmly situated within the broader computer-mediated communication field. Ethnomethodology, conversation analysis and membership categorization analysis were adopted as appropriate methodological approaches to explore the research focus on interaction in situ, and in particular to investigate the ways in which participants negotiate and co-construct social and moral orders in the course of their interaction. IRC logs collected from one chat room were analysed using a two-pass method, based on a modification of the approaches proposed by Pomerantz and Fehr (1997) and ten Have (1999). From this detailed examination of the data corpus three interaction topics are identified by means of which participants clearly orient to issues of social and moral order: challenges to rule violations, ‘trolling’ for cybersex, and experiences regarding the 9/11 attacks. Instances of these interactional topics are subjected to fine-grained analysis, to demonstrate the ways in which participants draw upon various interactional resources in their negotiation and construction of channel social and moral orders. While these analytical topics stand alone in individual focus, together they illustrate different instances in which participants’ talk serves to negotiate social and moral orders or collaboratively construct new orders. Building on the work of Vallis (2001), Chapter 5 illustrates three ways that rule violation is initiated as a channel discussion topic: (1) through a visible violation in open channel, (2) through an official warning or sanction by a channel operator regarding the violation, and (3) through a complaint or announcement of a rule violation by a non-channel operator participant. Once the topic has been initiated, it is shown to become available as a topic for others, including the perceived violator. The fine-grained analysis of challenges to rule violations ultimately demonstrates that channel participants orient to the rules as a resource in developing categorizations of both the rule violation and violator. These categorizations are contextual in that they are locally based and understood within specific contexts and practices. Thus, it is shown that compliance with rules and an orientation to rule violations as inappropriate within the social and moral orders of the channel serves two purposes: (1) to orient the speaker as a group member, and (2) to reinforce the social and moral orders of the group. Chapter 6 explores a particular type of rule violation, solicitations for ‘cybersex’ known in IRC parlance as ‘trolling’. In responding to trolling violations participants are demonstrated to use affiliative and aggressive humour, in particular irony, sarcasm and insults. These conversational resources perform solidarity building within the group, positioning non-Troll respondents as compliant group members. This solidarity work is shown to have three outcomes: (1) consensus building, (2) collaborative construction of group membership, and (3) the continued construction and negotiation of existing social and moral orders. Chapter 7, the final data analysis chapter, offers insight into how participants, in discussing the events of 9/11 on the actual day, collaboratively constructed new social and moral orders, while orienting to issues of appropriate and reasonable emotional responses. This analysis demonstrates how participants go about ‘doing being ordinary’ (Sacks, 1992b) in formulating their ‘first thoughts’ (Jefferson, 2004). Through sharing their initial impressions of the event, participants perform support work within the interaction, in essence working to normalize both the event and their initial misinterpretation of it. Normalising as a support work mechanism is also shown in relation to participants constructing the ‘quiet’ following the event as unusual. Normalising is accomplished by reference to the indexical ‘it’ and location formulations, which participants use both to negotiate who can claim to experience the ‘unnatural quiet’ and to identify the extent of the quiet. Through their talk participants upgrade the quiet from something legitimately experienced by one person in a particular place to something that could be experienced ‘anywhere’, moving the phenomenon from local to global provenance. With its methodological design and detailed analysis and findings, this research contributes to existing knowledge in four ways. First, it shows how rules are used by participants as a resource in negotiating and constructing social and moral orders. Second, it demonstrates that irony, sarcasm and insults are three devices of humour which can be used to perform solidarity work and reinforce existing social and moral orders. Third, it demonstrates how new social and moral orders are collaboratively constructed in relation to extraordinary events, which serve to frame the event and evoke reasonable responses for participants. And last, the detailed analysis and findings further support the use of conversation analysis and membership categorization as valuable methods for approaching quasi-synchronous computer-mediated communication.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper I present an analysis of the language used by the National Endowment for Democracy (NED) on its website (NED, 2008). The specific focus of the analysis is on the NED's high usage of the word “should” revealed in computer assisted corpus analysis using Leximancer. Typically we use the word “should” as a term to propose specific courses of action for ourselves and others. It is a marker of obligation and “oughtness”. In other words, its systematic institutional use can be read as a statement of ethics, of how the NED thinks the world ought to behave. As an ostensibly democracy-promoting institution, and one with a clear agenda of implementing American foreign policy, the ethics of NED are worth understanding. Analysis reveals a pattern of grammatical metaphor in which “should” is often deployed counter intuitively, and sometimes ambiguously, as a truth-making tool rather than one for proposing action. The effect is to present NED's imperatives for action as matters of fact rather than ethical or obligatory claims.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Extracellular matrix regulates many cellular processes likely to be important for development and regression of corpora lutea. Therefore, we identified the types and components of the extracellular matrix of the human corpus luteum at different stages of the menstrual cycle. Two different types of extracellular matrix were identified by electron microscopy; subendothelial basal laminas and an interstitial matrix located as aggregates at irregular intervals between the non-vascular cells. No basal laminas were associated with luteal cells. At all stages, collagen type IV α1 and laminins α5, β2 and γ1 were localized by immunohistochemistry to subendothelial basal laminas, and collagen type IV α1 and laminins α2, α5, β1 and β2 localized in the interstitial matrix. Laminin α4 and β1 chains occurred in the subendothelial basal lamina from mid-luteal stage to regression; at earlier stages, a punctate pattern of staining was observed. Therefore, human luteal subendothelial basal laminas potentially contain laminin 11 during early luteal development and, additionally, laminins 8, 9 and 10 at the mid-luteal phase. Laminin α1 and α3 chains were not detected in corpora lutea. Versican localized to the connective tissue extremities of the corpus luteum. Thus, during the formation of the human corpus luteum, remodelling of extracellular matrix does not result in basal laminas as present in the adrenal cortex or ovarian follicle. Instead, novel aggregates of interstitial matrix of collagen and laminin are deposited within the luteal parenchyma, and it remains to be seen whether this matrix is important for maintaining the luteal cell phenotype.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Since manually constructing domain-specific sentiment lexicons is extremely time consuming and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain specific sentiment lexicons. More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpusbase statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2:18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Increasing global competitiveness worldwide has forced manufacturing organizations to produce high-quality products more quickly and at a competitive cost. In order to reach these goals, they need good quality components from suppliers at optimum price and lead time. This actually forced all the companies to adapt different improvement practices such as lean manufacturing, Just in Time (JIT) and effective supply chain management. Applying new improvement techniques and tools cause higher establishment costs and more Information Delay (ID). On the contrary, these new techniques may reduce the risk of stock outs and affect supply chain flexibility to give a better overall performance. But industry people are unable to measure the overall affects of those improvement techniques with a standard evaluation model .So an effective overall supply chain performance evaluation model is essential for suppliers as well as manufacturers to assess their companies under different supply chain strategies. However, literature on lean supply chain performance evaluation is comparatively limited. Moreover, most of the models assumed random values for performance variables. The purpose of this paper is to propose an effective supply chain performance evaluation model using triangular linguistic fuzzy numbers and to recommend optimum ranges for performance variables for lean implementation. The model initially considers all the supply chain performance criteria (input, output and flexibility), converts the values to triangular linguistic fuzzy numbers and evaluates overall supply chain performance under different situations. Results show that with the proposed performance measurement model, improvement area for each variable can be accurately identified.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the increase in international mobility, healthcare systems should no longer be ignoring language barriers. In addition to the benefit of reducing long‐term costs, immigrant‐friendly organizations should be concerned with mitigating the way language barriers increase individuals’ social vulnerabilities and inequities in health care and health status. This paper reports the findings of a qualitative, exploratory study of the health literacy of 28 Francophone families living in a linguistic‐minority situation in Canada. Analysis of interviews revealed that participants’ social vulnerability, mainly due to their limited social and informational networks, influenced the construction of family health literacy. Disparities in access to healthcare services could be decreased by having health professionals’ work in alliance with Francophone community groups and by hiring bilingual health professionals. Linguistic isolation and lack of knowledge about local cultural organizations among Francophone immigrants were two important findings of this study