977 results for Document numérique
Abstract:
In turbomachinery, noise from the rotating volume is considered a major source of discomfort. Knowing and identifying the rotor noise sources is essential for designing a quieter and more energy-efficient machine. This document examines the ability of both proper orthogonal decomposition (POD) and singular value decomposition (SVD) to identify the regions on the surface of a source (fan blade), either stationary or in subsonic motion, that contribute most to the radiated acoustic power. Computational fluid dynamics (CFD) with the open-source code OpenFOAM is used as a first step to evaluate the pressure field on the surface of the blade in subsonic motion. The fluctuations of this pressure field make it possible to estimate both the loading noise and the sound power radiated by the blade, based on the acoustic analogy of Ffowcs Williams and Hawkings (FW&H). In a second step, the estimated loading noise is also used as input to both the POD and SVD approaches. The sound power reconstructed by these two approaches, using only the most significant acoustic modes, is similar to that predicted by the FW&H analogy. In addition, the most strongly radiating modes estimated by the SVD method are projected onto the blade surface, thereby revealing their locations. This identification is intended to serve as a guide for the engineer in designing a quieter impeller.
Abstract:
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk-based implementations where space requirements exceed the capacity of main memory.
Abstract:
This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.
Abstract:
Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result, Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. Therefore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n^2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool.
Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (document, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classification are both efficient and of high quality, the largest gains can be made from improvements to representation.
Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results.
Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a similarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.
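The contrast between intrinsic and extrinsic evaluation can be illustrated with a minimal sketch. It assumes dense vectors, takes distortion to be the root mean squared Euclidean distance between each document and its cluster centroid, and uses purity as a stand-in extrinsic measure. The function names `rmse_distortion` and `purity` are hypothetical and not taken from the abstract.

```python
import math
from collections import Counter


def rmse_distortion(vectors, labels):
    """Intrinsic measure (assumed form): RMSE between each vector and the
    centroid of the cluster it was assigned to."""
    clusters = {}
    for vec, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vec)
    # Centroid = per-dimension mean of the vectors in a cluster.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    total = sum(
        sum((x - m) ** 2 for x, m in zip(vec, centroids[label]))
        for vec, label in zip(vectors, labels)
    )
    return math.sqrt(total / len(vectors))


def purity(labels, truth):
    """Extrinsic measure: fraction of documents that fall in the majority
    ground-truth category of their cluster."""
    per_cluster = {}
    for label, category in zip(labels, truth):
        per_cluster.setdefault(label, Counter())[category] += 1
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / len(labels)


if __name__ == "__main__":
    docs = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    categories = ["a", "a", "b", "b"]
    for assignment in ([0, 0, 1, 1], [0, 1, 0, 1]):
        print(assignment, rmse_distortion(docs, assignment), purity(assignment, categories))
```

Distortion depends on the vector space the documents are embedded in, so its values are only comparable within one representation. Purity depends only on the cluster assignment and the human-made categories, so it can be compared across representations.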
Abstract:
It is well known that a statutory requirement of formality is associated with contracts concerning land. In this regard, s 59 of the Property Law Act 1974 (Qld) provides: "No action may be brought upon any contract for the sale or other disposition of land or any interest in land unless the contract upon which such action is brought, or some memorandum or note of the contract, is in writing, and signed by the party to be charged, or by some person by the party lawfully authorised." In addition to the possibility of a formal contract, the statutory wording clearly contemplates reliance on an informal note or memorandum. To constitute a sufficient note or memorandum for the purposes of the statute, the signed note or memorandum must contain details of the parties to the contract, an adequate description of the property, the price and any other essential terms. It is also accepted that the doctrine of joinder may be invoked in circumstances where the document signed by the party to be charged contains an express or implied reference to any other document. In this way, a sufficient note or memorandum may be constituted by the joinder of a number of documents.
Abstract:
Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and they position the file signatures model in the class of Vector Space retrieval models.
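The abstract does not spell out how TopSig builds its signatures. As a rough illustration of the general idea of a dimensionality-reducing bit signature compared by Hamming distance, the sketch below uses a generic sign-of-random-projection scheme. This is an assumption for illustration only, not the TopSig construction, and the helper names (`term_vector`, `signature`, `hamming`) are hypothetical.

```python
import hashlib
import random


def term_vector(term, width):
    """Pseudo-random +1/-1 vector for a term, seeded by a hash of the term
    so that the same term always maps to the same vector."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.choice((-1, 1)) for _ in range(width)]


def signature(text, width=64):
    """Sum the term vectors of a document and keep only the sign of each
    component, packed into an integer of `width` bits."""
    totals = [0] * width
    for term in text.lower().split():
        for i, sign in enumerate(term_vector(term, width)):
            totals[i] += sign
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    query = signature("document clustering evaluation")
    for doc in ("evaluation of document clustering", "contracts for the sale of land"):
        print(doc, hamming(query, signature(doc)))
```

In a scheme of this kind, ranking reduces to sorting documents by Hamming distance to the query signature, which places it among vector space approaches rather than inverted-file lookups.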
Abstract:
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing work, so that ineffective clusterings that provide no useful result do not receive high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
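A minimal sketch of the idea follows. It assumes the random baseline is obtained by shuffling the cluster labels, which keeps the cluster sizes but assigns documents at random, and it uses purity as the example quality measure. Both choices, and the name `divergence_from_random`, are assumptions for illustration and are not confirmed by the abstract.

```python
import random
from collections import Counter


def purity(labels, truth):
    """Fraction of documents in the majority category of their cluster."""
    per_cluster = {}
    for label, category in zip(labels, truth):
        per_cluster.setdefault(label, Counter())[category] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(labels)


def divergence_from_random(labels, truth, measure, trials=20, seed=0):
    """Score of the clustering minus the mean score of random clusterings
    with the same cluster sizes (assumed baseline: shuffled labels)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    baseline = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        baseline += measure(shuffled, truth)
    return measure(labels, truth) - baseline / trials


if __name__ == "__main__":
    truth = ["a"] * 10 + ["b"] * 10
    useful = [0] * 10 + [1] * 10       # recovers the categories
    singletons = list(range(20))       # one cluster per document
    for name, labels in (("useful", useful), ("singletons", singletons)):
        print(name, purity(labels, truth), divergence_from_random(labels, truth, purity))
```

Placing every document in its own cluster yields a raw purity of 1.0, yet a random clustering with the same cluster sizes scores just as well, so the divergence is zero. A clustering that actually recovers the categories keeps a positive score after the baseline is subtracted.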
Abstract:
This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances is compared to that of purely random signatures. It explains why TopSig is only competitive with state-of-the-art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.
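For reference, the random side of such a comparison can be sketched as follows, assuming signatures are fixed-width bit strings compared by Hamming distance (the abstract does not state the distance used). For uniformly random signatures of width n, each bit differs with probability 1/2, so pairwise distances follow a binomial distribution with mean n/2 and standard deviation sqrt(n)/2. The function names are hypothetical.

```python
import math
import random
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def random_pairwise_distances(count, width, seed=0):
    """Hamming distances between all pairs of `count` uniformly random
    signatures of `width` bits."""
    rng = random.Random(seed)
    signatures = [rng.getrandbits(width) for _ in range(count)]
    return [hamming(a, b) for a, b in combinations(signatures, 2)]


def mean_and_std(values):
    """Sample mean and population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    width = 1024
    observed_mean, observed_std = mean_and_std(random_pairwise_distances(200, width))
    print("observed:", observed_mean, observed_std)
    print("expected:", width / 2, math.sqrt(width) / 2)
```

Distances from a real collection that sit close to this random distribution carry little information, which is consistent with the observation that only the local neighbourhood of a signature is interpretable.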