999 results for Document’s Format
Abstract:
The XML Document Mining track was launched to explore two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions, during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) an unsupervised clustering task and (2) a semi-supervised classification task in which documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels, using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset with known categories using a supervised learning algorithm and a training set. This report gives the details of the clustering and classification tasks.
Abstract:
The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then using it for clustering. Empirical analysis shows that the proposed method scales to large datasets; in addition, the factorized matrices it produces help to improve the quality of clusters through an enriched document representation of both structure and content information.
Abstract:
Relevance Feedback (RF) has been proven very effective for improving retrieval accuracy. Adaptive information filtering (AIF) technology has benefited from the improvements achieved in all the tasks involved over the last decades. A difficult problem in AIF has been how to update the system with new feedback efficiently and effectively. In current feedback methods, the updating processes focus on updating system parameters. In this paper, we develop a new approach, Adaptive Relevance Features Discovery (ARFD). It automatically updates the system's knowledge based on a sliding window over positive and negative feedback to solve a nonmonotonic problem efficiently. Some of the new training documents are selected using the knowledge that the system has currently obtained. Then, specific features are extracted from the selected training documents. Different methods are used to merge and revise the weights of features in a vector space. The new model is designed for Relevance Features Discovery (RFD), a pattern-mining-based approach that uses negative relevance feedback to improve the quality of features extracted from positive feedback. Learning algorithms are also proposed to implement this approach on Reuters Corpus Volume 1 and TREC topics. Experiments show that the proposed approach works efficiently and achieves encouraging performance.
Abstract:
In 2007 I introduced short-format educational podcast resources that reinforced conceptual teaching and learning in an interdisciplinary tertiary science study area (biochemistry). This study aims to determine student attitudes to the perceived usefulness and benefit of short-format educational podcasts, and presents the findings (qualitative and quantitative) from surveys obtained from three offerings of the science teaching unit (2007, 2008 and 2009). Podcasts were recorded (MP3 audio files) separately from the instructive lecture sessions, and subsequent to the weekly lecture, short-format podcasts summarising the key learning objectives were integrated within the resources presented through the students' learning management system (Blackboard). The vast majority (>88%) of students utilised the podcast resources, indicating a high level of acceptance and uptake for this portable educational technology. The respondents reported that podcasts focused their attention on core learning concepts and supported their understanding and learning of the lecture material. Furthermore, the data showed that respondents agreed strongly that podcasts assisted with study and revision for examinations and, somewhat surprisingly, there was a perception that podcasts positively impacted on examination performance. Overall, student users perceived that podcasting is an effective and valuable educational tool that offers convenience and flexibility for their learning and understanding of a tertiary science study area, such as biochemistry.
Abstract:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information in these documents. Data mining techniques have been used to derive this interesting information. Mining XML documents is affected by the model used to represent them, owing to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models have been used for mining, and some of the issues and challenges in these models. In addition, this chapter provides some insights into future models of XML documents for effectively capturing the two important features of XML documents for mining, namely structure and content.
Abstract:
With the growing number of XML documents on the Web it becomes essential to effectively organise these XML documents in order to retrieve useful information from them. A possible solution is to apply clustering on the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most of the existing research on clustering techniques focuses on only one feature of the XML documents, either their structure or their content, due to scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content alone is not suitable for real-life datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods to utilise frequent pattern mining techniques to reduce the dimension; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information. The explicit model uses a higher order model, namely a 3-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models and utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform the related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures for constraining the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability evaluation experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis work contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it also contributes by addressing the research gaps in frequent pattern mining to generate efficient and concise frequent subtrees with various node relationships that could be used in clustering.
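As an illustration only of what a third-order document × structure × term representation could look like, the following minimal sketch stores sparse counts keyed by (document, path, term) and then collapses the structure mode to recover a plain VSM. Using root-to-node tag paths as the structure mode is an assumption made here for brevity (the abstract names frequent subtrees instead), and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tensor_entries(docs):
    """Return sparse counts keyed by (doc_id, tag_path, term).

    Assumption: the structure mode is the root-to-node tag path,
    a simple stand-in for the frequent subtrees named in the abstract.
    """
    entries = Counter()

    def walk(node, path, doc_id):
        path = path + "/" + node.tag
        # Only text directly inside this element is counted for this path.
        for term in (node.text or "").lower().split():
            entries[(doc_id, path, term)] += 1
        for child in node:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return entries


def flatten_to_vsm(entries):
    """Collapse the structure mode, giving plain (doc_id, term) counts."""
    flat = Counter()
    for (doc_id, _path, term), count in entries.items():
        flat[(doc_id, term)] += count
    return flat


# Hypothetical sample documents: same words, different structure.
docs = {
    "d1": "<article><title>xml clustering</title><body>tensor model</body></article>",
    "d2": "<article><title>tensor model</title><body>xml clustering</body></article>",
}
entries = tensor_entries(docs)
flat = flatten_to_vsm(entries)
```

In this toy case the two documents have identical term counts once the structure mode is collapsed, while the three-way entries still record that "xml" occurs in the title of one document and in the body of the other.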
Abstract:
In Bowenbrae Pty Ltd v Flying Fighters Maintenance and Restoration [2010] QDC 347 Reid DCJ made orders requiring the plaintiffs to make application under the Freedom of Information Act 1982 (Cth) (“the FOI Act”) for documents sought by the defendant.
Abstract:
While the studio environment has been promoted as an ideal educational setting for project-based disciplines associated with art and design, few comprehensive qualitative studies have been undertaken, and even fewer give emphasis to teachers and students and how they feel about changing their environment. This situation is problematic given the changes and challenges facing higher education, including those associated with new technologies such as online learning. In response, this paper describes a comparative study employing grounded theory to identify and describe teachers’ and students’ perceptions of the physical design studio (PDS) as well as the virtual design studio (VDS) of architectural students in an Australian university. The findings give significance to aspects of design education activities and their role in the development of integrated hybrid learning environments.
Abstract:
Existing macro-level research on the new venture creation process recognises the entrepreneur as a central agent in the process yet generally avoids, at each stage of the process, an examination of the micro-level psychological behaviour of the individual entrepreneur. By integrating two theoretical approaches to entrepreneurship research, the psychology of the entrepreneur and the entrepreneurship process, this paper examines, using content analysis, the language used by new venture founders in documents directly linked to their capital raising activity. The study examined the language of 108 offer documents (information memoranda), which were divided between 54 new ventures that were successful in raising capital and 54 new ventures that either did not proceed further or were not successful in raising capital through the Australian Small Scale Offerings Board. Specifically, we were interested in examining the level of optimism evident in these narratives, given that entrepreneurs have previously been described in the literature as being excessively optimistic.
Abstract:
Many existing information retrieval models do not explicitly take into account information about word associations. Our approach makes use of first and second order relationships found in natural language, known as syntagmatic and paradigmatic associations, respectively. This is achieved by using a formal model of word meaning within the query expansion process. On ad hoc retrieval, our approach achieves statistically significant improvements in MAP (0.158) and P@20 (0.396) over our baseline model. The ERR@20 and nDCG@20 of our system were 0.249 and 0.192 respectively. Our results and discussion suggest that information about both syntagmatic and paradigmatic associations can assist with improving retrieval effectiveness on ad hoc retrieval.
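The abstract reports MAP and P@20 without defining them. A minimal sketch of the standard definitions follows, assuming binary relevance judgements; the function names and document identifiers are hypothetical, this is not the authors' evaluation code, and ERR@20 and nDCG@20 are not covered.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Sum of precision at each rank holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs,
    one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings and judgements for two queries.
run_1 = (["a", "b", "c", "d"], {"a", "c"})
run_2 = (["x", "y"], {"y"})
p_at_2 = precision_at_k(run_1[0], run_1[1], 2)
map_score = mean_average_precision([run_1, run_2])
```

P@20 is `precision_at_k` with `k=20`, and MAP is the mean of the per-query average precision over all topics in the test collection.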
Abstract:
RESEARCH BACKGROUND: Enacted Cartography documents 10 years of creative research practice by Ian Weir Research Architect and was developed as a standalone exhibition to support Dr Weir’s selection by the Australian Institute of Architects to represent innovative architectural practice via the Institute’s review entitled Formations: New Practices in Australian Architecture, which took the form of an exhibition and book presented in Venice, Italy for the 13th International Architecture Exhibition (Venice Architecture Biennale). All works exhibited in Enacted Cartography are original works by Dr Weir and are generated either from or for the remote biodiverse landscapes of the Fitzgerald Bioregion on the south coast of Western Australia. RESEARCH CONTRIBUTION: As a creative work in its own right, the Enacted Cartography exhibition makes the following contributions to knowledge: 1. Expands understandings of architectural practice by presenting a geographically-specific but multimodal form of architectural practice, wherein practitioners cross over discipline boundaries into art practice, landscape representation, website design, undergraduate university teaching and community advocacy. 2. Contributes to understandings of how such a diverse multimodal form of practice might be represented through both digital media and traditional print media in an exhibition format. 3. Expands understandings of how architectural practitioners might work within a particular place to develop a geographically-specific sense of identity, a ‘landscape of resistance’. RESEARCH SIGNIFICANCE: Enacted Cartography was presented to an international audience during the 13th International Architecture Exhibition (Venice Architecture Biennale). The significance of Dr Weir’s research is evidenced by his selection by the Australian Institute of Architects to represent innovation in architectural practice for the Biennale. Enacted Cartography addresses problems of national and international importance including: 1. The sustainable development of biodiverse remote landscapes; 2. The reconciliation of bushfire safety and biodiversity conservation; 3. The necessity for rethinking architectural design methodologies to meet the complexity of landscape management and design; 4. The challenge to orthodox forms of landscape representation (aerial photography, for example), which are demonstrably inadequate registrations of biophysical and cultural landscapes.
Abstract:
In Hare v Mount Isa City Council [2009] QDC 39 McGill DCJ examined the scope of s 27(1) of the Personal Injuries Proceedings Act 2002 (Qld) and its interpretation by the Court of Appeal in Haug v Jupiters Ltd [2008] 1 Qd R 276. The judge expressed a number of concerns about the Act and the Regulation made under it, that are worthy of consideration by the Legislature.
Abstract:
In John Kallinicos Accountants Pty Ltd v Dundrenan Pty Ltd [2009] QDC 141 Irwin DCJ considered the nature of a party’s obligation under r 222 of the Uniform Civil Procedure Rules 1999 (Qld) (UCPR) to produce documents referred to in the parties’ pleadings, particulars or affidavits. The decision examined whether the approach in Belela Pty Ltd v Menzies Excavation Pty Ltd [2005] 2 Qd R 230 in relation to disclosure of documents under UCPR r 214 also applied to production of documents under r 222.
Abstract:
In Deppro Pty Ltd v Hannah [2008] QSC 193 one of the matters considered by the court related to the requirement in r 243 of the Uniform Civil Procedure Rules 1999 (Qld) that a notice of non-party disclosure must “state the allegation in issue in the pleadings about which the document sought is directly relevant.” The approach adopted by the issuing party in this case, of asserting that documents sought by a notice of non-party disclosure are relevant to allegations in numbered paragraphs in pleadings and serving copies of the pleadings with the notice, is not uncommon in practice. This decision makes it clear that this practice is fraught with danger. In circumstances where it is not apparent that the non-party has been fully apprised of the relevant issues, the decision suggests that an applicant for non-party disclosure who has not complied with the requirements of r 243 might be required to issue a fresh, fully compliant notice, and to suffer associated costs consequences.