972 results for Document authoring


Relevance:

70.00%

Publisher:

Abstract:

Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.

Relevance:

30.00%

Publisher:

Abstract:

Collaborative working with the aid of computers is increasing rapidly due to the widespread use of computer networks, geographic mobility of people, and small powerful personal computers. For the past ten years research has been conducted into this use of computing technology from a wide variety of perspectives and for a wide range of uses. This thesis adds to that previous work by examining the area of collaborative writing amongst groups of people. The research brings together a number of disciplines, namely sociology for examining group dynamics, psychology for understanding individual writing and learning processes, and computer science for database, networking, and programming theory. The project initially looks at groups and how they form, communicate, and work together, progressing on to look at writing and the cognitive processes it entails for both composition and retrieval. The thesis then details a set of issues which need to be addressed in a collaborative writing system. A model for collaborative writing is then developed, detailing an iterative process of co-ordination, writing and annotation, consolidation, and negotiation, based on a structured but extensible document model. Implementation issues for a collaborative application are then described, along with various methods of overcoming them. Finally, the design and implementation of a collaborative writing system, named Collaborwriter, are described in detail, concluding with some preliminary results from initial user trials and testing.

Relevance:

20.00%

Publisher:

Abstract:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk-based implementations where space requirements exceed the capacity of main memory.

Relevance:

20.00%

Publisher:

Abstract:

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.

Relevance:

20.00%

Publisher:

Abstract:

Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. Therefore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n^2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (document, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classification are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a similarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
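
The distinction between intrinsic and extrinsic measures in the abstract above can be illustrated with a minimal sketch. It is not taken from the thesis: the function names `distortion_rmse` and `purity`, the toy points, and the choice of purity as the extrinsic measure are all assumptions made for illustration.

```python
import math
from collections import Counter

def distortion_rmse(points, labels):
    """Intrinsic measure: RMSE between each point and its cluster centroid.

    Depends only on the vector space, so values are not comparable
    across different document representations.
    """
    clusters = {}
    for point, cluster in zip(points, labels):
        clusters.setdefault(cluster, []).append(point)
    # Centroid = per-dimension mean of the cluster's points.
    centroids = {
        c: [sum(col) / len(members) for col in zip(*members)]
        for c, members in clusters.items()
    }
    squared = sum(
        sum((x - m) ** 2 for x, m in zip(point, centroids[cluster]))
        for point, cluster in zip(points, labels)
    )
    return math.sqrt(squared / len(points))

def purity(labels, truth):
    """Extrinsic measure: share of documents that fall in the majority
    ground-truth category of their cluster."""
    by_cluster = {}
    for cluster, category in zip(labels, truth):
        by_cluster.setdefault(cluster, Counter())[category] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

# Hypothetical toy data: four 2-D "documents" in two human-assigned topics.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
truth = ["a", "a", "b", "b"]
good = [0, 0, 1, 1]   # clusters match the topics
bad = [0, 1, 0, 1]    # clusters cut across the topics

print(distortion_rmse(points, good), purity(good, truth))
print(distortion_rmse(points, bad), purity(bad, truth))
```

Here both measures agree on which clustering is better, but only the purity score could be compared with a score obtained from a different representation of the same documents.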

Relevance:

20.00%

Publisher:

Abstract:

It is well known that a statutory requirement of formality is associated with contracts concerning land. In this regard, s 59 of the Property Law Act 1974 (Qld) provides: No action may be brought upon any contract for the sale or other disposition of land or any interest in land unless the contract upon which such action is brought, or some memorandum or note of the contract, is in writing, and signed by the party to be charged, or by some person by the party lawfully authorised. In addition to the possibility of a formal contract, the statutory wording clearly contemplates reliance on an informal note or memorandum. To constitute a sufficient note or memorandum for the purposes of the statute, the signed note or memorandum must contain details of the parties to the contract, an adequate description of the property, the price and any other essential terms. It is also accepted that the doctrine of joinder may be invoked in circumstances where the document signed by the party to be charged contains an express or implied reference to any other document. In this way, a sufficient note or memorandum may be constituted by the joinder of a number of documents.

Relevance:

20.00%

Publisher:

Abstract:

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and position the file signatures model in the class of Vector Space retrieval models.
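
To make the idea of a file signature concrete, the following is a minimal sketch of one generic way to build a fixed-width binary signature: sum a deterministic pseudo-random +1/-1 vector per term and keep the sign of each dimension. This is an illustration in the spirit of hashing-based dimensionality reduction, not the construction described in the paper; the names `term_vector`, `signature`, `hamming` and the 64-bit width are assumptions.

```python
import hashlib
from collections import Counter

BITS = 64  # hypothetical signature width (must be <= 256 for SHA-256)

def term_vector(term, bits=BITS):
    """Deterministic pseudo-random +1/-1 vector derived from a hash of the term."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]

def signature(text, bits=BITS):
    """Binary signature: sign of the term-frequency-weighted sum of term vectors."""
    totals = [0] * bits
    for term, count in Counter(text.lower().split()).items():
        for i, sign in enumerate(term_vector(term, bits)):
            totals[i] += count * sign
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i  # set bit i when the summed weight is positive
    return sig

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

# Hypothetical usage: rank documents by Hamming distance to the query signature.
docs = ["document clustering with k-means", "contracts for the sale of land"]
query = signature("document clustering")
ranked = sorted(docs, key=lambda d: hamming(signature(d), query))
print(ranked)
```

Because every document maps to the same number of bits, comparison reduces to XOR and a bit count, which is what places this kind of model alongside vector space retrieval rather than term-lookup indexing.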

Relevance:

20.00%

Publisher:

Abstract:

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters to categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
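
The subtraction from a random baseline described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the baseline is built here by shuffling the cluster labels (which keeps the cluster sizes), purity stands in for "any measure of cluster quality", and the names `divergence_from_random`, `trials` and `seed` are hypothetical.

```python
import random
from collections import Counter

def purity(labels, truth):
    """Share of documents in the majority ground-truth category of their cluster."""
    by_cluster = {}
    for cluster, category in zip(labels, truth):
        by_cluster.setdefault(cluster, Counter())[category] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)

def divergence_from_random(labels, truth, measure=purity, trials=20, seed=0):
    """Score of the clustering minus the mean score of random baselines
    that have the same cluster sizes (assumed baseline construction)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)  # same cluster sizes, random membership
        baseline += measure(shuffled, truth)
    baseline /= trials
    return measure(labels, truth) - baseline

truth = ["a", "a", "a", "b", "b", "b"]
useful = [0, 0, 0, 1, 1, 1]        # recovers the two categories
useless = [0, 1, 2, 3, 4, 5]       # one cluster per document

# Purity alone rewards the useless clustering with a perfect score...
print(purity(useful, truth), purity(useless, truth))
# ...while the divergence from the random baseline separates the two.
print(divergence_from_random(useful, truth))
print(divergence_from_random(useless, truth))
```

The one-cluster-per-document case shows the effect: a random clustering with the same sizes scores just as well, so the divergence is zero even though the raw measure is at its maximum.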