999 resultados para Documents.


Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

XML document clustering is essential for many document handling applications such as information storage, retrieval, integration and transformation. An XML clustering algorithm should process both the structural and the content information of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. This paper introduces a novel approach that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The proposed method reduces the high dimensionality of input data by using only the structure-constrained content. The empirical analysis reveals that the proposed method can effectively cluster even very large XML datasets and outperform other existing methods.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many data mining techniques have been proposed for mining useful patterns in databases. However, how to effectively utilize discovered patterns is still an open research issue, especially in the domain of text mining. Most existing methods adopt term-based approaches. However, they all suffer from the problems of polysemy and synonymy. This paper presents an innovative technique, pattern taxonomy mining, to improve the effectiveness of using discovered patterns for finding useful information. Substantial experiments on RCV1 demonstrate that the proposed solution achieves encouraging performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report explains the objectives, datasets and evaluation criteria of both the clustering and classification tasks set in the INEX 2009 XML Mining track. The report also describes the approaches and results obtained by the different participants.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This article discusses some recent judicial decisions to assist legal practitioners to overcome some of the problems encountered when serving Bankruptcy Notices and Creditor’s Petitions. Some of the issues covered in the discussion are: What the valid last-known address of the debtor can be, whether a Bankruptcy Notice can be validly served by email on a debtor who is located outside Australia, whether service of a Bankruptcy Notice is valid when the debtor is outside Australia when service on the debtor occurs in Australia, whether the creditor’s failure to obtain leave for service of a Bankruptcy Notice can be excused, what can be done regarding personal service of a Creditor’s Petition when a debtor is outside Australia and whether the Court can set aside a sequestration order. The article goes on to place the issues in the context of broader bankruptcy policies noting that effective service of bankruptcy documents is challenging in a world where mobility of debtors is global and new modes of communication ever changing.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Textual cultural heritage artefacts present two serious problems for the encoder: how to record different or revised versions of the same work, and how to encode conflicting perspectives of the text using markup. Both are forms of textual variation, and can be accurately recorded using a multi-version document, based on a minimally redundant directed graph that cleanly separates variation from content.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A hierarchical structure is used to represent the content of the semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. Hence in this paper, we introduce a novel method of representing the XML documents in Tensor Space Model (TSM) and then utilize it for clustering. Empirical analysis shows that the proposed method is scalable for a real-life dataset as well as the factorized matrices produced from the proposed method helps to improve the quality of clusters due to the enriched document representation with both the structure and the content information.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The XML Document Mining track was launched for exploring two main ideas: (1) identifying key problems and new challenges of the emerging field of mining semi-structured documents, and (2) studying and assessing the potential of Machine Learning (ML) techniques for dealing with generic ML tasks in the structured domain, i.e., classification and clustering of semi-structured documents. This track has run for six editions during INEX 2005, 2006, 2007, 2008, 2009 and 2010. The first five editions have been summarized in previous editions and we focus here on the 2010 edition. INEX 2010 included two tasks in the XML Mining track: (1) unsupervised clustering task and (2) semi-supervised classification task where documents are organized in a graph. The clustering task requires the participants to group the documents into clusters without any knowledge of category labels using an unsupervised learning algorithm. On the other hand, the classification task requires the participants to label the documents in the dataset into known categories using a supervised learning algorithm and a training set. This report gives the details of clustering and classification tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Relevance Feedback (RF) has been proven very effective for improving retrieval accuracy. Adaptive information filtering (AIF) technology has benefited from the improvements achieved in all the tasks involved over the last decades. A difficult problem in AIF has been how to update the system with new feedback efficiently and effectively. In current feedback methods, the updating processes focus on updating system parameters. In this paper, we developed a new approach, the Adaptive Relevance Features Discovery (ARFD). It automatically updates the system's knowledge based on a sliding window over positive and negative feedback to solve a nonmonotonic problem efficiently. Some of the new training documents will be selected using the knowledge that the system currently obtained. Then, specific features will be extracted from selected training documents. Different methods have been used to merge and revise the weights of features in a vector space. The new model is designed for Relevance Features Discovery (RFD), a pattern mining based approach, which uses negative relevance feedback to improve the quality of extracted features from positive feedback. Learning algorithms are also proposed to implement this approach on Reuters Corpus Volume 1 and TREC topics. Experiments show that the proposed approach can work efficiently and achieves the encouragement performance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Most Australian states have introduced legislation to provide for enduring documents for financial, personal and health care decision making in the event of incapacity. Since the introduction of Enduring Powers of Attorney (EPAs) and Advance Health Directives (AHDs) in Queensland in 1998, concerns have continued to be raised by service providers, professionals and individuals about the uptake, understanding and appropriate use of these documents. In response to these concerns, the Department of Justice and Attorney-General (DJAG) convened a Practical Guardianship Initiatives Working Party. This group identified the limited evidence base available to address these concerns. In 2009, a multidisciplinary research team from the University of Queensland and the Queensland University of Technology was awarded $90,000 from the Legal Practitioners Interest on Trust Account Fund to undertake a review of the current EPA and AHD forms. The goal of the research was to gather data on the content and useability of the forms from the perspectives of a range of stakeholders, particularly those completing the EPA and AHD, witnesses of these documents, attorneys appointed under an EPA, and health professionals involved in the completion of an AHD or dealing with it in a clinical context. The researchers also sought to gather information from the perspective of Aboriginal and Torres Strait Islander (ATSI) individuals as well people from culturally and linguistically diverse (CALD) groups. Although the focus of the research was on the forms and the extent to which the current design, content and format represents a barrier to uptake, in the course of the research, some broader issues were identified which have an impact on the effectiveness of the EPA and AHD in achieving the goals of planning for financial and personal and health care in advance of losing capacity. The data gathered enabled the researchers to achieve the primary goal of the research: to make recommendations to improve the content and useability of the forms which hopefully will lead to an increased uptake and appropriate use of the forms. However, the researchers thought it was important not to ignore broader policy issues that were identified in the course of the research. These broader issues have been highlighted in this Report, and the researchers have responded to them in a variety of ways. For some issues, the researchers have suggested alterations that could be made to the forms to address the particular concerns. For other issues, the researchers have suggested that Government may need to take specific action such as educating the broader community with some attention to strategies that engage particular groups within communities. Other concerns raised can only be dealt with by legislative reform and, in some of these cases, the researchers have identified issues that Government may wish to consider further. We do note, however, that it is beyond the scope of this Report to recommend changes to the law. This three stage mixed methods project aimed to provide systematic evidence from a broad range of stakeholders in regard to: (i) which groups use and do not use these documents and why, (ii) the contribution of the length/complexity/format/language of the forms as barriers to their completion and/or effective use, and (iii) the issues raised by the current documents for witnesses and attorneys. Understanding and use of EPAs and AHDs were generally explored in separate but parallel processes. A purposive sampling strategy included users of the documents as principals and attorneys, and professionals, witnesses and service providers who assist others to execute or use the forms. The first component of this study built on existing knowledge using a Critical Reference Group and material provided by the DJAG Practical Guardianship Initiatives Working Party. This assisted in the development of the data collection tools for subsequent stages. The second component comprised semi-structured interviews and focus groups with a targeted sample of current users of the forms, potential users, witnesses and other professionals to provide in-depth information on critical issues. Outreach to Aboriginal and Torres Strait Islander Elders and individuals and workers with CALD groups ensured a broad sample of potential users of the two documents. Fifty individual interviews and three focus groups were completed. Most interviews and focus groups focused on perceptions of, and experiences with, either the EPA or the AHD form. In the interviews with Indigenous people and the CALD focus groups, however, respondents provided their perceptions and experiences of both documents. In general, these respondents had not used the forms and were responding to the documents made available in the interview or focus group. In total, seventy-seven individuals were involved in interviews or focus groups. The final component comprised on-line surveys for EPA principals, EPA attorneys, AHD principals, witnesses of EPAs and AHDs and medical practitioners with experience of AHDs as nominated and/or treating doctors. The surveys were developed from the initial component and the qualitative analysis of the interview and focus group data. A total of 116 surveys were returned from major cities and regional Queensland. The survey data was analysed descriptively for patterns and trends. It is important to note that the aim of the survey was to gain insight into issues and concerns relating to the documents and not to make generalisations to the broader population.