994 results for Hierarchical document


Relevance:

30.00%

Publisher:

Abstract:

Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which considers only two levels of knowledge granularity (document and term) and ignores the paragraph granularity that bridges them. This two-level granularity may lead to unsatisfactory clustering results with “false correlation”. To address this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), consisting of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, HRMM captures the structural knowledge hidden in documents more efficiently and effectively, and thus generates higher-quality web document clusters. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with ontology all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-Score.
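
The zero-valued similarity problem, and the kind of remedy a tolerance-rough-set strategy aims at, can be illustrated with a small sketch. Two paragraphs that share no term have a cosine similarity of 0 in a plain VSM; in the spirit of the tolerance rough set model (TRSM), each paragraph is enriched with terms that frequently co-occur with its own terms. This is a simplified illustration under our own assumptions, not HRMM's actual strategy: the co-occurrence threshold `theta`, the fixed weight `extra_weight` for added terms, the binary weights for original terms, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse term -> weight dicts (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tolerance_classes(units, theta):
    """Tolerance class of each term: the term itself plus every term that
    co-occurs with it in at least `theta` units (units = paragraphs here)."""
    co = Counter()
    for unit in units:
        for a, b in combinations(sorted(set(unit)), 2):
            co[(a, b)] += 1
    classes = {t: {t} for unit in units for t in unit}
    for (a, b), n in co.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(unit, classes):
    """Terms whose tolerance class overlaps the unit (TRSM-style upper approximation)."""
    terms = set(unit)
    return {t for t, cls in classes.items() if cls & terms}


def enriched_vector(unit, classes, extra_weight=0.5):
    """Weight 1.0 for the unit's own terms; hypothetical lower weight for added terms."""
    vec = {t: 1.0 for t in set(unit)}
    for t in upper_approximation(unit, classes):
        vec.setdefault(t, extra_weight)
    return vec
```

For example, with the units `[["granular", "computing"], ["rough", "set"], ["computing", "rough"], ["computing", "rough", "model"]]` and `theta=2`, the first two units share no term, so their plain cosine similarity is 0. Because `computing` and `rough` co-occur in two units, each enriched vector gains the other's term, and the enriched cosine similarity becomes 4/9.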

Relevance:

30.00%

Publisher:

Abstract:

The hierarchical forest planning process currently in place on public lands risks failing at two levels. At the upper level, the current process does not provide sufficient evidence that the current harvest level is sustainable. At the lower level, the current process does not support realizing the full value-creation potential of the forest resource, sometimes needlessly constraining short-term harvest planning. These failures are attributable to certain assumptions implicit in the wood supply optimization model, which could explain why this problem is not well documented in the literature. We use agency theory to model the hierarchical forest planning process on public lands. We develop a two-step iterative simulation framework to estimate the long-term effect of the interaction between the State and the fibre consumer, which allows us to establish certain conditions that can lead to stockouts. We then propose an improved formulation of the wood supply optimization model. The classical formulation of this model (i.e., maximization of sustained fibre yield) does not consider that the industrial fibre consumer seeks to maximize its profit; instead, it assumes that the entire fibre supply is consumed in each period, regardless of its value-creation potential. We extend the classical formulation so that it anticipates the behaviour of the fibre consumer, which increases the probability that the fibre supply is fully consumed and thereby restores the validity of the total-consumption assumption implicit in the optimization model.
We model the principal-agent relationship between the government and industry with a bilevel formulation of the optimization model, in which the upper level represents the allowable cut determination process (the government's responsibility) and the lower level represents the fibre consumption process (the industry's responsibility). We show that the bilevel formulation can mitigate the risk of stockouts, thereby improving the credibility of the hierarchical forest planning process. Together, the bilevel wood supply optimization model and the methodology we developed to solve it to optimality represent an alternative to the methods currently in use. Our bilevel model and the iterative simulation framework represent a step forward in value-creation-oriented forest planning technology. Explicitly integrating industrial objectives and constraints into the forest planning process, starting at the allowable cut determination stage, should foster greater collaboration between government and industry, making it possible to exploit the full value-creation potential of the forest resource.
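
The contrast between the classical and the bilevel formulation can be written as a generic template. The notation is ours and is only an illustrative assumption, not the authors' model: x is the State's harvest offer (allowable cut per period), X the forest-dynamics and sustainability constraints, F the upper-level objective (for instance, sustained fibre yield), Y(x) the harvests available to industry given the offer (for instance, 0 ≤ y_t ≤ x_t), and f the industry's profit.

```latex
% Classical formulation: total consumption of the offer is assumed (y = x).
\max_{x \in X} \; F(x, y) \quad \text{s.t.} \quad y = x

% Bilevel (principal-agent) formulation: the State (upper level) anticipates
% the profit-maximising response of the industry (lower level).
\max_{x \in X} \; F(x, y) \quad \text{s.t.} \quad
y \in \arg\max_{y' \in Y(x)} \; f(x, y')
```

The classical model fixes y = x, which is the total-consumption assumption; the bilevel model replaces it with the industry's optimal response, so an offer that industry would not fully consume is evaluated as such by the upper level.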

Relevance:

20.00%

Publisher:

Abstract:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare its performance and quality to those of CLUTO using document collections. K-tree has a low time complexity, which makes it suitable for large document collections. Its tree structure also allows efficient disk-based implementations when space requirements exceed main memory.
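
The idea of an approximation of k-means that forms a hierarchy of clusters can be made concrete with a minimal sketch of K-tree-style insertion, as we understand the general scheme: a height-balanced tree in which each node holds at most `order` entries, new vectors descend to the nearest centroid, and an overflowing node is split with 2-means, with splits propagating up to the root. This is not the authors' implementation and omits their sparse-representation extensions; dense vectors, squared Euclidean distance, the farthest-point seeding in `two_means`, the default `order=4`, and all names are our assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(vectors, weights):
    total = sum(weights)
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total
            for d in range(len(vectors[0]))]


def two_means(points, weights, iters=10):
    """Split weighted points into two index groups with k-means, k = 2."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))  # farthest point seeds the second centroid
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, p in enumerate(points):
            groups[0 if dist2(p, c0) <= dist2(p, c1) else 1].append(i)
        if not groups[0] or not groups[1]:
            break
        c0 = weighted_mean([points[i] for i in groups[0]], [weights[i] for i in groups[0]])
        c1 = weighted_mean([points[i] for i in groups[1]], [weights[i] for i in groups[1]])
    if not groups[0] or not groups[1]:  # degenerate input (e.g. identical points): halve
        half = len(points) // 2
        groups = (list(range(half)), list(range(half, len(points))))
    return groups


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # leaf: document vectors; internal: centroids of children
        self.sizes = []     # number of document vectors summarised by each key
        self.children = []  # internal nodes only

    def summary(self):
        """Centroid and size of everything stored below this node."""
        return weighted_mean(self.keys, self.sizes), sum(self.sizes)


class KTree:
    def __init__(self, order=4):
        self.order = order  # maximum number of entries per node
        self.root = Node(leaf=True)

    def insert(self, vector):
        parts = self._insert(self.root, list(vector))
        if parts is not None:  # root overflowed: grow the tree by one level
            self.root = Node(leaf=False)
            for part in parts:
                self._add_child(self.root, part)

    def _add_child(self, node, child):
        centroid, size = child.summary()
        node.keys.append(centroid)
        node.sizes.append(size)
        node.children.append(child)

    def _insert(self, node, vector):
        if node.leaf:
            node.keys.append(vector)
            node.sizes.append(1)
        else:
            # descend towards the nearest centroid, then refresh it on the way back up
            i = min(range(len(node.keys)), key=lambda j: dist2(vector, node.keys[j]))
            parts = self._insert(node.children[i], vector)
            if parts is None:
                node.keys[i], node.sizes[i] = node.children[i].summary()
            else:
                left, right = parts
                node.children[i] = left
                node.keys[i], node.sizes[i] = left.summary()
                self._add_child(node, right)
        return self._split(node) if len(node.keys) > self.order else None

    def _split(self, node):
        parts = []
        for group in two_means(node.keys, node.sizes):
            part = Node(node.leaf)
            part.keys = [node.keys[i] for i in group]
            part.sizes = [node.sizes[i] for i in group]
            if not node.leaf:
                part.children = [node.children[i] for i in group]
            parts.append(part)
        return parts

    def leaf_clusters(self):
        """Return (depth, vectors) for every leaf; leaves are the finest clusters."""
        out, stack = [], [(self.root, 0)]
        while stack:
            node, depth = stack.pop()
            if node.leaf:
                out.append((depth, node.keys))
            else:
                stack.extend((child, depth + 1) for child in node.children)
        return out
```

Document vectors are inserted one at a time with `tree.insert(v)`; `tree.leaf_clusters()` then returns the finest clusters, while the centroids stored in internal nodes summarise coarser clusters. Because splits never add a level except at the root, all leaves stay at the same depth.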

Relevance:

20.00%

Publisher:

Abstract:

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an information retrieval context by adapting it for document clustering. Many large-scale problems exist in document clustering, and K-tree scales well to large inputs due to its low complexity. It offers promising results in terms of both efficiency and quality. Document classification was performed using Support Vector Machines.

Relevance:

20.00%

Publisher:

Abstract:

Business Process Management (BPM) has increased in popularity and maturity in recent years. Large enterprises use process management approaches to model, manage and refine repositories of process models that detail the whole enterprise. These repositories can hold thousands of process models, which may contain large hierarchies of tasks and control structures that become cumbersome to maintain. Tools are therefore needed to traverse this process model space efficiently; otherwise, the repositories remain hard to use and lose much of their effectiveness. In this paper we analyse a range of BPM tools for their effectiveness in handling large process models. We establish that the present set of commercial tools is lacking in key areas regarding visualisation of, and interaction with, large process models. We then present six tool functionalities for advanced business process visualisation and interaction, and propose a design for a tool that exploits the latest advances in 2D and 3D computer graphics to enable fast and efficient search, traversal and modification of process models.

Relevance:

20.00%

Publisher:

Abstract:

This approach to sustainable design explores the possibility of creating an architectural design process that can iteratively produce optimised and sustainable design solutions. Driven by an evolutionary process based on genetic algorithms, the system allows the designer to “design the building design generator” rather than to “design the building”. The design concept is abstracted into a digital design schema, which allows the human creative vision to be transferred into the rational language of a computer. The schema is then elaborated through genetic algorithms to evolve innovative, performative and sustainable design solutions. The prioritisation of the project’s constraints, and the design solutions subsequently synthesised during design generation, are expected to resolve most of the major conflicts in the evaluation and optimisation phases. Mosques are used as the example building typology to ground the research. The spatial organisations of various mosque typologies are represented graphically by adjacency constraints between spaces. Each configuration is represented by a planar graph, which is then translated into a non-orthogonal dual graph and fed into the genetic algorithm system, with fixed constraints and expected performance criteria set to govern evolution. The resulting Hierarchical Evolutionary Algorithmic Design System links the evaluation process with environmental assessment tools to rank the candidate designs. The proposed system generates the concept, the seed, and the schema, and uses environmental performance as one of the main criteria driving optimisation.
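
How adjacency constraints can drive a genetic algorithm is easiest to see in a toy example. The sketch evolves the placement of six spaces on a 2 x 3 grid so that required adjacencies are satisfied, using tournament selection, a simplified order-based crossover, swap mutation and elitism. It is a generic permutation GA under our own assumptions, not the Hierarchical Evolutionary Algorithmic Design System: it omits the planar and dual graph representations and any environmental assessment, and the space names, adjacency list, grid size and GA parameters are all hypothetical.

```python
import random

# Hypothetical mosque spaces and adjacency requirements (illustrative only).
SPACES = ["entrance", "courtyard", "ablution", "prayer_hall", "minaret", "imam_room"]
ADJACENT = [("entrance", "courtyard"), ("courtyard", "ablution"),
            ("courtyard", "prayer_hall"), ("prayer_hall", "imam_room"),
            ("entrance", "minaret")]
ROWS, COLS = 2, 3  # one space per cell of a 2 x 3 grid, row-major order


def cells_adjacent(i, j):
    """True if grid cells i and j share an edge."""
    (ri, ci), (rj, cj) = divmod(i, COLS), divmod(j, COLS)
    return abs(ri - rj) + abs(ci - cj) == 1


def fitness(layout):
    """Number of adjacency constraints satisfied; layout[k] is the space in cell k."""
    pos = {space: k for k, space in enumerate(layout)}
    return sum(cells_adjacent(pos[a], pos[b]) for a, b in ADJACENT)


def order_crossover(p1, p2, rng):
    """Simplified order crossover: keep a slice of p1, fill the gaps in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    fill = [s for s in p2 if s not in child]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child


def mutate(layout, rng, rate=0.2):
    """With probability `rate`, swap the spaces in two random cells."""
    if rng.random() < rate:
        i, j = rng.sample(range(len(layout)), 2)
        layout[i], layout[j] = layout[j], layout[i]
    return layout


def evolve(pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    population = [list(SPACES)] + [rng.sample(SPACES, len(SPACES)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: the two best layouts survive unchanged
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(population, 3), key=fitness)  # tournament of size 3
            p2 = max(rng.sample(population, 3), key=fitness)
            next_gen.append(mutate(order_crossover(p1, p2, rng), rng))
        population = next_gen
    return max(population, key=fitness)
```

`evolve()` returns the best layout found, listed cell by cell in row-major order, and `fitness(layout)` counts the satisfied adjacencies. Because the two best layouts survive every generation, the best fitness never decreases; the layout `["entrance", "courtyard", "ablution", "minaret", "prayer_hall", "imam_room"]` satisfies all five constraints.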