99 results for word boundaries
at Queensland University of Technology - ePrints Archive
Abstract:
Thai is one of the written languages that does not mark word boundaries. To discover the meaning of a document, the text must first be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method to segment Thai text by combining a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. A hidden Markov model is then used to merge possible syllables into words. The identified words are verified against a lexical dictionary, and a decision tree is employed to discover words not found in the dictionary. Documents used in the litigation process of Thai court proceedings were used in the experiments. The segmented words obtained by the proposed method outperform the results obtained by other existing methods.
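The syllable-merging step can be illustrated with a simplified dynamic-programming sketch. The syllables and probabilities below are hypothetical toy values, and the unigram scoring is a stand-in for the paper's full hidden Markov model:

```python
import math

# Toy lexicon mapping syllable groups to hypothetical word probabilities;
# the paper's actual HMM states and Thai grammar rules are not reproduced here.
WORD_PROB = {
    ("sa", "wat"): 0.20,
    ("dee",): 0.30,
    ("sa",): 0.05,
    ("wat",): 0.05,
}

def merge_syllables(syllables):
    """Viterbi-style DP: choose the grouping of syllables into words
    that maximizes the product of word probabilities."""
    n = len(syllables)
    best = [(-math.inf, [])] * (n + 1)  # best[i] = (log-score, words) up to i
    best[0] = (0.0, [])
    for i in range(n):
        score_i, words_i = best[i]
        if score_i == -math.inf:
            continue
        for j in range(i + 1, n + 1):
            word = tuple(syllables[i:j])
            if word in WORD_PROB:
                cand = score_i + math.log(WORD_PROB[word])
                if cand > best[j][0]:
                    best[j] = (cand, words_i + [word])
    return best[n][1]

print(merge_syllables(["sa", "wat", "dee"]))  # [('sa', 'wat'), ('dee',)]
```

Merging "sa" + "wat" wins here because the merged word's probability (0.20) exceeds the product of the two single-syllable readings (0.05 × 0.05).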
Abstract:
This paper demonstrates how Indigenous Studies is controlled in some Australian universities in ways that continue the marginalisation, denigration and exploitation of Indigenous peoples. Moreover, it shows how the engagement of white notions of “inclusion” can result in the maintenance of racism, systemic marginalisation, white race privilege and racialised subjectivity. A case study is utilised which draws from the experience of two Indigenous scholars who were invited to be part of a panel to review one Australian university’s plan and courses in Indigenous studies. The case study offers the opportunity to destabilise the relationships between oppression and privilege and the epistemology that maintains them. The paper argues for the need to examine exactly what is being offered when universities provide opportunities for “inclusion”.
Abstract:
The set of papers in this issue of "Addictive Behaviors" was presented at the 2004 'Addictions' conference, which, for the first time, was held in the Southern Hemisphere, on the Sunshine Coast of Queensland, Australia. The theme of the conference, Crossing Boundaries: Implications of Advances in Basic Sciences for the Management of Addiction, speaks for itself. The papers derive from a wide range of empirical paradigms and cover issues with relevance to the development of addiction, to the maintenance of problematic use, and to assessment, treatment, and relapse. Research from Europe and the United States is represented, as well as work from Australia. An international perspective is strongly emphasized from the initial paper by Obot, Poznyak, and Monteiro (see record 2004-19599-015), which describes the WHO Report on the Neuroscience of Psychoactive Substance Use and Dependence and summarises some of the report's implications for policy and practice. Hall, Carter, and Morley (see record 2004-19599-014) close the issue with a paper on the wide-ranging ethical implications of advances in neuroscience research, including issues arising from the identification of high risk for addiction, the potential for coercive pharmacotherapy, use of medications to enhance function, and risks to privacy.
Abstract:
In this paper, we propose an unsupervised segmentation approach, named "n-gram mutual information", or NGMI, which is used to segment Chinese documents into n-character words or phrases, using language statistics drawn from the Chinese Wikipedia corpus. The approach alleviates the tremendous effort required in preparing and maintaining manually segmented Chinese text for training purposes, and in manually maintaining ever-expanding lexicons. Previously, mutual information was used to achieve automated segmentation into 2-character words; NGMI extends this technique to handle longer n-character words. Experiments with heterogeneous documents from the Chinese Wikipedia collection show good results.
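The underlying 2-character mutual-information statistic that NGMI generalises can be sketched in a few lines. The toy corpus, threshold value, and Latin-letter stand-ins for Chinese characters below are illustrative assumptions, not the paper's actual setup:

```python
import math
from collections import Counter

def bigram_mi(corpus):
    """Pointwise mutual information for each adjacent character pair:
    MI(x, y) = log2( p(xy) / (p(x) * p(y)) ), estimated from the corpus."""
    chars, pairs = Counter(), Counter()
    total_c = total_p = 0
    for doc in corpus:
        chars.update(doc)
        total_c += len(doc)
        for a, b in zip(doc, doc[1:]):
            pairs[(a, b)] += 1
            total_p += 1
    mi = {}
    for (a, b), n in pairs.items():
        p_xy = n / total_p
        p_x, p_y = chars[a] / total_c, chars[b] / total_c
        mi[(a, b)] = math.log2(p_xy / (p_x * p_y))
    return mi

def segment(text, mi, threshold=0.0):
    """Insert a word boundary wherever adjacent-pair MI drops below threshold."""
    out, word = [], text[0]
    for a, b in zip(text, text[1:]):
        if mi.get((a, b), -math.inf) < threshold:
            out.append(word)
            word = b
        else:
            word += b
    out.append(word)
    return out
```

Character pairs that co-occur more often than chance keep a high MI and stay joined; rarely associated pairs fall below the threshold and become word boundaries.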
Abstract:
Review of 'Gatz', Elevator Repair Company / Brisbane Powerhouse, published in The Australian, 12 May 2009.
Abstract:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web, a great many of them written in languages other than English. Consequently, the demand for searching in non-English languages is growing rapidly, and it is desirable that a search engine can search collections of documents in other languages. This research investigates techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between words. This makes Chinese information retrieval more difficult: a retrieved document that contains the query term as a sequence of Chinese characters may not actually be relevant, because that character sequence may not form a valid Chinese word in the document; conversely, a document that is relevant may not be retrieved because it does not contain the query sequence itself but only other relevant words. In this research, we propose two approaches to these problems. In the first approach, we propose a hybrid Chinese information retrieval model that incorporates word-based techniques into the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed that rank retrieved documents on relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if segmentation is incorporated only with the traditional character-based approach.
In the second approach, we propose a novel query expansion method that applies text mining techniques to find the words most relevant to the query. Most existing query expansion methods select highly frequent indexing terms from the retrieved documents to expand the query; in our approach, we instead use text mining to find patterns in the retrieved documents that highly correlate with the query term, and then use the relevant words in those patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. The experiments proceed in two stages. The first stage investigates whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, the text-mining-based query expansion approach is implemented and a further experiment compares its performance with the standard Rocchio approach. The NTCIR-5 Chinese collections are used in the experiments. The experimental results show that incorporating the text-mining-based query expansion into the hybrid model achieves significant improvement in both precision and recall.
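The hybrid model's combination of character-based and word-based ranking can be sketched as a simple linear interpolation. The mixing weight and score values below are hypothetical; the thesis's actual ranking formulas are not reproduced here:

```python
def hybrid_score(char_score, word_score, alpha=0.5):
    """Interpolate character-based and word-based relevance scores.
    alpha is a hypothetical mixing weight, not a value from the thesis."""
    return alpha * char_score + (1 - alpha) * word_score

def rank(docs, alpha=0.5):
    """docs: {doc_id: (char_score, word_score)} -> ids sorted by combined score."""
    return sorted(docs, key=lambda d: hybrid_score(*docs[d], alpha), reverse=True)

# A document strong on valid segmented words can outrank one that merely
# matches the query's raw character sequence.
order = rank({"d1": (0.9, 0.1), "d2": (0.4, 0.8)})
print(order)  # ['d2', 'd1']
```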
Abstract:
This volume examines the social, cultural, and political implications of the shift from traditional forms of print-based libraries to the delivery of online information in educational contexts. Despite the central role of libraries in literacy and learning, research on them has, in the main, remained isolated within the disciplinary boundaries of information and library science. By contrast, this book problematizes and thereby mainstreams the field. It brings together scholars from a wide range of academic fields to explore the dislodging of library discourse from its longstanding apolitical, modernist paradigm. Collectively, the authors interrogate the presuppositions of current library practice and examine how library as place and library as space blend together in ways that may be both complementary and contradictory. Seeking a suitable term to designate this rapidly evolving and much contested development, the editors devised the word “libr@ary”, and use the term arobase to signify the conditions of formation of new libraries within contexts of space, knowledge, and capital.
Abstract:
This paper reveals a journey of theatrical exploration. It is a journey of enquiry and investigation backed by a vigorous, direct and dense professional history of creative work.
Abstract:
My research investigates why nouns are learned disproportionately more frequently than other kinds of words during early language acquisition (Gentner, 1982; Gleitman et al., 2004). This question must be considered in the context of cognitive development in general. Infants have two major streams of environmental information to make meaningful: perceptual and linguistic. Perceptual information flows in from the senses and is processed into symbolic representations by the primitive language of thought (Fodor, 1975). These symbolic representations are then linked to linguistic input to enable language comprehension and, ultimately, production. Yet how exactly does perceptual information become conceptualized? Although this question is difficult, there has been progress. One way that children might have an easier job is if they have structures that simplify the data. If particular sorts of perceptual information could be separated from the mass of input, it would be easier for children to refer to those specific things when learning words (Spelke, 1990; Pylyshyn, 2003). It would be easier still if linguistic input were segmented in predictable ways (Gentner, 1982; Gleitman et al., 2004). Unfortunately, the frequency of patterns in lexical or grammatical input cannot explain the cross-cultural and cross-linguistic tendency to favor nouns over verbs and predicates. There are three examples of this failure: 1) a wide variety of nouns are uttered less frequently than a smaller number of verbs and yet are learnt far more easily (Gentner, 1982); 2) word order and morphological transparency offer no insight when the sentence structures and word inflections of different languages are contrasted (Slobin, 1973); and 3) particular language-teaching behaviors (e.g. pointing at objects and repeating names for them) have little impact on children's tendency to prefer concrete nouns in their first fifty words (Newport et al., 1977).
Although the linguistic solution appears problematic, there is increasing evidence that the early visual system does indeed segment perceptual information in specific ways before the conscious mind begins to intervene (Pylyshyn, 2003). I argue that nouns are easier to learn because their referents directly connect with innate features of the perceptual faculty. This hypothesis stems from work done on visual indexes by Zenon Pylyshyn (2001, 2003). Pylyshyn argues that the early visual system (the architecture of the "vision module") segments perceptual data into pre-conceptual proto-objects called FINSTs. FINSTs typically correspond to physical things such as Spelke objects (Spelke, 1990). Hence, before conceptualization, visual objects are picked out by the perceptual system demonstratively, like a pointing finger indicating ‘this’ or ‘that’. I suggest that this primitive system of demonstration elaborates on Gareth Evans's (1982) theory of nonconceptual content. Nouns are learnt first because their referents attract demonstrative visual indexes. This theory also explains why infants less often name stationary objects such as ‘plate’ or ‘table’, but do name things that attract the focal attention of the early visual system, i.e., small objects that move, such as ‘dog’ or ‘ball’. This view leaves open the questions of how blind children learn words for visible objects and why children learn category nouns (e.g. 'dog') rather than proper nouns (e.g. 'Fido') or higher taxonomic distinctions (e.g. 'animal').