364 results for Dataset


Relevance: 10.00%

Publisher:

Abstract:

This paper is concerned with the ways Asia Literacy can be developed in response to the new Australian Curriculum. In particular, it addresses the learning possibilities of the Asian-Australian Literature and Publishing Project (AACLAP) available through AustLit: the Australian Literature Resource. The paper argues that the AACLAP dataset provides a broad range of resources through which to address the Australian Curriculum’s cross-curriculum priority of Asia and Australia’s engagement with Asia. It contends that AACLAP has the potential to make a valuable contribution to teachers’ efforts to incorporate this cross-curriculum priority into their classroom practice whilst also developing the general capabilities of intercultural understanding and the use of information and communication technology (ICT). This discussion is of particular significance to teachers of English and History, given that these disciplines are implemented in the first phase of the Australian Curriculum in schools. The paper concludes that by drawing on the broad range of texts available in the AACLAP collection, as well as the Critical Anthology and the Research and Learning Trails, teachers and students will be much better positioned to develop a deeper understanding of the diversity of the Asian region and the complexities of Asian-Australian relationships.

Relevance: 10.00%

Publisher:

Abstract:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data and to better capture differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework, to capitalize on the advantages of both techniques, has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities used in the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, with a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
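The symmetric cross likelihood ratio at the heart of this clustering criterion can be sketched in a few lines. The sketch below substitutes single 1-D Gaussians for the paper's eigenvoice/GMM speaker models purely to stay self-contained; `gauss_loglik`, the `(mean, var)` model tuples and the merge-by-threshold reading are illustrative assumptions, not the authors' implementation.

```python
import math

def gauss_loglik(xs, mean, var):
    """Total log-likelihood of samples xs under a 1-D Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in xs)

def clr(x_i, x_j, model_i, model_j, ubm):
    """Symmetric cross likelihood ratio between two clusters.

    Each model is a (mean, var) pair; ubm is the universal background
    model. Each cluster's data is scored under the *other* cluster's
    model relative to the UBM, normalised by cluster size.
    """
    n_i, n_j = len(x_i), len(x_j)
    term_i = (gauss_loglik(x_i, *model_j) - gauss_loglik(x_i, *ubm)) / n_i
    term_j = (gauss_loglik(x_j, *model_i) - gauss_loglik(x_j, *ubm)) / n_j
    return term_i + term_j
```

In a merge-based clustering loop, the pair of clusters with the highest CLR is merged while the CLR stays above a stopping threshold.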

Relevance: 10.00%

Publisher:

Abstract:

In this paper, we report some initial findings from our investigations into the Australian Government’s Longitudinal Study of Australian Children dataset. It is revealed that the majority of Australian children are exceeding the government’s Screen Time recommendations and that most of their screen time is spent as TV viewing, as opposed to video game play or computer use. In light of this finding, we review the body of research surrounding children’s engagement in Screen Time activities and the associated positive and negative effects. Based on existing evidence, we define two categories of Screen Time: Active Screen Time and Passive Screen Time. It is proposed that this distinction provides a more accurate classification of Screen Time and a more informative lens through which to consider the associated benefits and detrimental effects for young children.

Relevance: 10.00%

Publisher:

Abstract:

Objective: To compare access and utilisation of EDs in Queensland public hospitals between people who speak only English at home and those who speak another language at home. Methods: A retrospective analysis of a Queensland statewide hospital ED dataset (ED Information System) from 1 January 2008 to 31 December 2010 was conducted. Access to ED care was measured by the proportion of the state’s population attending EDs. Logistic regression analyses were performed to determine the relationships between ambulance use and language, and between hospital admission and language, both after adjusting for age, sex and triage category. Results: The ED utilisation rate was highest among English-only speakers (290 per 1000 population), followed by Arabic speakers (105), and lowest among German speakers (30). Compared with English speakers, there were lower rates of ambulance use among Chinese (odds ratio 0.50, 95% confidence interval 0.47–0.54), Vietnamese (0.87, 0.79–0.95), Arabic (0.87, 0.78–0.97), Spanish (0.56, 0.50–0.62), Italian (0.88, 0.80–0.96), Hindi (0.61, 0.53–0.70) and German (0.87, 0.79–0.90) speakers. Compared with English speakers, German speakers had higher admission rates (odds ratio 1.17, 95% confidence interval 1.02–1.34), whereas there were lower admission rates among Chinese (0.90, 0.86–0.99), Arabic (0.76, 0.67–0.85) and Spanish (0.83, 0.75–0.93) speakers. Conclusion: This study showed a significant association between lower utilisation of emergency care and speaking a language other than English at home. Further research using in-depth methods is needed to investigate whether there are language barriers in accessing emergency care in Queensland.
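For readers unfamiliar with the reported statistics, an odds ratio with its 95% confidence interval can be computed from a simple 2x2 exposure table. The study's estimates come from logistic regression adjusted for age, sex and triage category; the unadjusted sketch below, with made-up counts, only illustrates what an odds ratio below 1 (as reported for ambulance use in several language groups) means.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2x2 table:
        a = exposed cases,     b = exposed non-cases,
        c = unexposed cases,   d = unexposed non-cases.
    The standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

With illustrative counts a=10, b=90, c=30, d=70 this gives an odds ratio of roughly 0.26 with a confidence interval entirely below 1, i.e. a significantly lower odds of the outcome in the exposed group.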

Relevance: 10.00%

Publisher:

Abstract:

Background Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions, since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive but efficient manner. Methods In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with less than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. By repeating this process, all biclusters in the table are systematically identified until the table becomes empty. Conclusions This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it exhaustively tests combinations of genes and conditions, the additive biclusters can be found more readily.
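The candidate-table steps described above (collect dissimilar-condition sets, group identical rows, sort ascending, emit gene sets above the size threshold) can be sketched for a single gene seed. The per-condition tolerance test, the additive offset estimated from the first condition, and all default parameters are illustrative assumptions; the paper's algorithm enumerates full gene/condition seed combinations.

```python
from collections import defaultdict

def candidate_biclusters(expr, seed_gene, max_dissim=1, min_genes=2, tol=0.5):
    """Sketch of the seed-based candidate bicluster table.

    expr maps gene -> list of expression values (one per condition).
    A condition is "dissimilar" when the gene's offset-corrected value
    differs from the seed's by more than tol (additive model).
    """
    seed = expr[seed_gene]
    table = defaultdict(set)          # dissimilar-condition set -> gene set
    for gene, profile in expr.items():
        # additive biclusters: profiles may differ by a constant offset
        offset = profile[0] - seed[0]
        dissim = frozenset(c for c, (g, s) in enumerate(zip(profile, seed))
                           if abs((g - s) - offset) > tol)
        if len(dissim) <= max_dissim:
            table[dissim].add(gene)   # rows with equal sets are grouped
    # sort rows by number of dissimilar conditions, ascending
    rows = sorted(table.items(), key=lambda kv: len(kv[0]))
    biclusters = []
    for dissim, genes in rows:
        if len(genes) >= min_genes:   # emit rows above the size threshold
            conds = [c for c in range(len(seed)) if c not in dissim]
            biclusters.append((sorted(genes), conds))
    return biclusters
```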

Relevance: 10.00%

Publisher:

Abstract:

Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
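The block-overlap fusion idea (interim block-level decisions integrated into a pixel-level mask) can be illustrated with a deliberately simplified sketch. A mean-absolute-difference vote stands in for the paper's texture descriptor and adaptive classifier cascade, which are not reproduced here; block size, step and threshold are illustrative.

```python
import numpy as np

def foreground_mask(frame, background, block=4, step=2, thresh=10.0):
    """Toy block-based foreground segmentation with overlap fusion.

    Each overlapping block casts a binary foreground vote; a pixel is
    labelled foreground when the majority of the blocks covering it agree.
    """
    h, w = frame.shape
    votes = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - block + 1, step):
        for x in range(0, w - block + 1, step):
            patch = frame[y:y+block, x:x+block]
            bg = background[y:y+block, x:x+block]
            # stand-in for descriptor + classifier cascade:
            decision = float(np.abs(patch - bg).mean() > thresh)
            votes[y:y+block, x:x+block] += decision
            counts[y:y+block, x:x+block] += 1
    # probabilistic fusion: majority vote over overlapping blocks
    return votes / np.maximum(counts, 1) > 0.5
```

Because every pixel aggregates several block decisions, the resulting mask is naturally smoother than an independent per-pixel rule.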

Relevance: 10.00%

Publisher:

Abstract:

Retrieving information from Twitter is always challenging due to its large volume, inconsistent writing and noise. Most existing information retrieval (IR) and text mining methods focus on term-based approaches, but suffer from problems of term variation such as polysemy and synonymy. This problem worsens when such methods are applied to Twitter due to its length limit. Over the years, people have held the hypothesis that pattern-based methods should perform better than term-based methods because they provide more context, but limited studies have been conducted to support this hypothesis, especially on Twitter. This paper presents an innovative framework to address the issue of performing IR in microblogs. The proposed framework discovers patterns in tweets as higher-level features and uses them to assign weights to low-level features (i.e. terms) based on their distributions in the higher-level features. We present experimental results based on the TREC11 microblog dataset and show that our proposed approach significantly outperforms the term-based methods Okapi BM25 and TF-IDF as well as pattern-based methods, using precision, recall and F measures.
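Okapi BM25, the term-based baseline named above, is easy to state concretely. This is the standard textbook formulation, not the paper's code; the tokenisation and parameter values (k1=1.2, b=0.75) are conventional defaults, not values from the paper.

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenised document for a query.

    corpus is the list of all tokenised documents; document frequency
    and average document length are computed from it.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        f = tf[term]
        # term-frequency saturation, normalised by document length
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score
```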

Relevance: 10.00%

Publisher:

Abstract:

Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. The data mining methods of clustering can be used in grouping the users in social networks. However, existing general-purpose clustering algorithms perform poorly on social network data due to the special nature of users' data in social networks. One main reason is the constraints that need to be considered in grouping users in social networks. Another reason is the need to capture a large amount of information about users, which imposes computational complexity on an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm based on a global similarity measure that takes into consideration the users' constraints and their importance in social networks. Each constraint's importance is calculated based on the occurrence of this constraint in the dataset. Performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm is able to increase the accuracy of matching users in social networks by 10% in comparison to other algorithms.
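One plausible reading of occurrence-based constraint importance is an IDF-style weight: matching on a rare attribute value counts for more than matching on a common one. The sketch below illustrates that idea for a pairwise profile similarity; it is an interpretation for illustration, not the paper's actual global similarity measure.

```python
import math

def weighted_similarity(u, v, profiles):
    """Similarity of two user profiles with occurrence-based weights.

    Each attribute match is weighted by how rare u's value is across
    all profiles (rarer value -> larger weight, IDF-style); the result
    is normalised to [0, 1].
    """
    n = len(profiles)
    score, total = 0.0, 0.0
    for attr in u:
        count = sum(1 for p in profiles if p.get(attr) == u[attr])
        w = math.log(n / count) + 1.0   # rarer value -> larger weight
        total += w
        if u[attr] == v.get(attr):
            score += w
    return score / total if total else 0.0
```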

Relevance: 10.00%

Publisher:

Abstract:

Many state-of-the-art vision-based Simultaneous Localisation And Mapping (SLAM) and place recognition systems compute the salience of visual features in their environment. As computing salience can be problematic in radically changing environments, new low-resolution, feature-less systems such as SeqSLAM have been introduced, all of which consider the whole image. In this paper, we implement a supervised classifier system (UCS) to learn the salience of image regions for place recognition by feature-less systems. On the challenging real-world Eynsham dataset, SeqSLAM benefits only slightly from the results of training, as it already appears to filter less useful regions of a panoramic image. However, when recognition is limited to specific image regions, performance improves by more than an order of magnitude through use of the learnt image region saliency. We then investigate whether the region salience generated from the Eynsham dataset generalizes to another car-based dataset using a perspective camera. The results suggest the general applicability of an image region salience mask for optimizing route-based navigation applications.
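Applying a region salience mask to a low-resolution, feature-less matcher reduces, at its core, to a weighted image difference. The sketch below assumes the mask is given (in the paper it is learnt by the UCS) and performs single-frame nearest-reference matching rather than SeqSLAM's sequence matching.

```python
import numpy as np

def best_match(query, references, salience):
    """Place recognition by salience-weighted image difference.

    Returns the index of the reference frame closest to the query under
    a sum-of-absolute-differences weighted by the per-pixel salience
    mask, which down-weights uninformative image regions.
    """
    dists = [float((salience * np.abs(query - ref)).sum())
             for ref in references]
    return int(np.argmin(dists))
```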

Relevance: 10.00%

Publisher:

Abstract:

Topic recommendation can help users deal with the information overload issue in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic, and to profile users' time-drifting topic interests. Content-based, nearest-neighbourhood-based and matrix factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com.
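Time-drifting topic interests can be modelled in many ways; one simple option consistent with the description above is exponential decay of past interactions, so recent micro-blog activity dominates a user's profile. The half-life parameter and the flat per-interaction weight are illustrative assumptions, not the paper's model.

```python
import math

def topic_interest(interactions, now, half_life_days=30.0):
    """User topic profile with exponential time decay.

    interactions is a list of (topic, unix_timestamp) pairs; each
    interaction contributes a weight that halves every half_life_days.
    """
    decay = math.log(2) / (half_life_days * 86400)
    scores = {}
    for topic, timestamp in interactions:
        scores[topic] = scores.get(topic, 0.0) + math.exp(-decay * (now - timestamp))
    return scores
```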

Relevance: 10.00%

Publisher:

Abstract:

Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties for users seeking information or knowledge from sources in different languages, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To assess the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD as follows: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated that achieves high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments on better, automatic generation of cross-lingual links carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. This framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of its kind.
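The n-gram mutual information idea for segmentation (point 1 above) can be sketched at the bigram level: place a word boundary wherever adjacent characters co-occur no more often than chance. The thesis method uses larger n-grams and a tuned threshold; the toy below uses Latin characters only so the example stays self-contained.

```python
import math
from collections import Counter

def segment(text, corpus, threshold=0.0):
    """Boundary detection by pointwise mutual information (PMI).

    Adjacent characters stay in the same word when their PMI, estimated
    from character and bigram counts over the corpus, exceeds the
    threshold; otherwise a boundary is inserted.
    """
    chars = Counter(corpus)
    bigrams = Counter(corpus[i:i+2] for i in range(len(corpus) - 1))
    total_c, total_b = sum(chars.values()), sum(bigrams.values())
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        p_ab = bigrams[a + b] / total_b
        p_a, p_b = chars[a] / total_c, chars[b] / total_c
        pmi = math.log(p_ab / (p_a * p_b)) if p_ab else float("-inf")
        if pmi > threshold:
            current += b           # strong association: same word
        else:
            words.append(current)  # weak association: word boundary
            current = b
    words.append(current)
    return words
```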

Relevance: 10.00%

Publisher:

Abstract:

Background Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information, including Gene Ontology annotations and kernel fusion, have their respective limitations. The aim of this work is twofold: one is to propose a novel individual feature extraction method; the other is to develop an ensemble method to improve prediction performance using the comprehensive information represented in a high-dimensional feature vector obtained by 11 feature extraction methods. Methodology/Principal Findings A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; it is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis. Conclusions It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method.
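Leave-one-out cross validation, the evaluation protocol used above, is simple to state: every sample is predicted by a model trained on all the other samples. The sketch below is generic; the `train`/`predict` callables are placeholders, not the paper's two-stage SVM.

```python
def loocv_accuracy(X, y, train, predict):
    """Leave-one-out cross validation accuracy.

    For each sample i, a model is trained on every other sample and
    then asked to predict sample i; the fraction of correct predictions
    is returned.
    """
    correct = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i+1:]
        y_tr = y[:i] + y[i+1:]
        model = train(X_tr, y_tr)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)
```

LOOCV uses every sample for testing exactly once, which is why it is favoured on small datasets despite requiring one model fit per sample.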
It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.

Relevance: 10.00%

Publisher:

Abstract:

BACKGROUND: A long length of stay (LOS) in the emergency department (ED) associated with overcrowding has been found to adversely affect the quality of ED care. The objective of this study is to determine whether patients who speak a language other than English at home have a longer LOS in EDs compared to those who speak only English at home. METHODS: A secondary data analysis of a Queensland state-wide hospital EDs dataset (Emergency Department Information System) was conducted for the period 1 January 2008 to 31 December 2010. RESULTS: The interpreter requirement was highest among Vietnamese speakers (23.1%), followed by Chinese (19.8%) and Arabic speakers (18.7%). There were significant differences in the distributions of departure statuses among the language groups (Chi-squared=3236.88, P<0.001). Compared with English speakers, the beta coefficient for LOS in the EDs, measured in minutes, was: Vietnamese, 26.3 (95%CI: 22.1–30.5); Arabic, 10.3 (95%CI: 7.3–13.2); Spanish, 9.4 (95%CI: 7.1–11.7); Chinese, 8.6 (95%CI: 2.6–14.6); Hindi, 4.0 (95%CI: 2.2–5.7); Italian, 3.5 (95%CI: 1.6–5.4); and German, 2.7 (95%CI: 1.0–4.4). The final regression model explained 17% of the variability in LOS. CONCLUSION: There is a close relationship between the language spoken at home and LOS at EDs, indicating that language could be an important predictor of prolonged LOS in EDs and that improving language services might reduce LOS and ease overcrowding in EDs in Queensland's public hospitals.
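With a single binary language dummy and no other covariates, the OLS beta coefficient reported above is just the difference in mean LOS between a language group and English-only speakers. The sketch below illustrates that reading; the study's final model includes further covariates, so its betas are not raw mean differences.

```python
def los_beta(minutes_group, minutes_english):
    """Unadjusted beta for a single language dummy in an OLS model:
    the difference in mean length of stay (minutes) between a language
    group and the English-only reference group."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(minutes_group) - mean(minutes_english)
```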

Relevance: 10.00%

Publisher:

Abstract:

The geology/reservoir program of the Queensland Geothermal Energy Centre of Excellence (QGECE) has the mission to improve existing knowledge and develop new innovative scientific approaches for the identification of geothermal resources in Australia, with a particular focus on Queensland. Specifically, the QGECE geology/reservoir program is currently (1) producing a comprehensive geochemical dataset for high heat-producing rocks, (2) conducting detailed mineralogical and geochronological studies of granites and hydrothermal alteration minerals, and (3) investigating the Cooper Basin, which represents a superb natural laboratory for understanding the radiogenic heat enrichment process and the possible involvement of mantle heat flow. Seven research projects have been established, which are being conducted largely as PhD studies. Preliminary studies have already yielded high-quality and valuable results addressing the causes and timing of heat-producing element enrichment.

Relevance: 10.00%

Publisher:

Abstract:

The billionaires of the world attract significant attention from the media and the public. Surprisingly, only a limited number of studies have explored empirically the determinants of extraordinary wealth. Using a large dataset, we investigate whether globalization and corruption affect extreme wealth accumulation. We find evidence that an increase in globalization increases super-affluence. In addition, we also find that an increase in corruption leads to an increase in the creation of super fortunes. This supports the argument that in kleptocracies large sums are transferred into the hands of a small group of individuals.