Digital Library

981 results for Tree structure

Tree structure for efficient data mining using rough sets

Relevance:

100.00%

Publisher:

Abstract:

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps in reducing the space and search time requirements of the overall decision making process. Further, it is important that the abstraction is generated from the data with a small number of disk scans. We propose a novel data structure, pattern count tree (PC-tree), that can be built by scanning the database only once. PC-tree is a minimal size complete representation of the data and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from the conventional rough set theory. We conducted experiments using the proposed classification scheme on a large-scale hand-written digit data set. We use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.
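The single-scan construction described in this abstract can be illustrated with a count-annotated prefix tree. The following is only a minimal sketch under stated assumptions: each pattern is taken to be an ordered sequence of feature identifiers, and the names `PCNode`, `build_pc_tree` and `count_nodes` are hypothetical, not taken from the paper. Segmented patterns and the rough-set extension are not covered.

```python
class PCNode:
    """One node of a simplified pattern count tree (hypothetical layout)."""

    def __init__(self, feature=None):
        self.feature = feature   # feature identifier stored at this node
        self.count = 0           # number of patterns passing through this node
        self.children = {}       # feature identifier -> child PCNode


def build_pc_tree(patterns):
    """Build the tree in a single pass over the patterns."""
    root = PCNode()
    for pattern in patterns:          # each pattern is read exactly once
        node = root
        for feature in pattern:
            child = node.children.get(feature)
            if child is None:
                child = PCNode(feature)
                node.children[feature] = child
            child.count += 1          # shared prefixes only bump a counter
            node = child
    return root


def count_nodes(node):
    """Total number of nodes, including the root."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


patterns = [(1, 2, 3), (1, 2, 4), (1, 2, 3)]
tree = build_pc_tree(patterns)
print(count_nodes(tree))  # nodes used to store three patterns
```

Patterns that share a prefix share nodes, which is where the compactness in this sketch comes from; the counts record how many patterns follow each branch.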

Stable feature selection for clinical prediction: Exploiting ICD tree structure using Tree-Lasso

Relevance:

100.00%

Publisher:

Abstract:

Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one feature out of many correlated features at random. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods.
Our result has implications in identifying stable risk factors for many healthcare problems and therefore can potentially assist clinical decision making for accurate medical prognosis.

K-tree : large scale document clustering

Relevance:

70.00%

Publisher:

Abstract:

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.

The structure of Norway spruce (Picea abies [L.] Karst.) stems in relation to wood properties of sawn timber

Relevance:

70.00%

Publisher:

Abstract:

An important challenge in the forest industry is to get the appropriate raw material from the forests to the wood processing industry. Growth and stem reconstruction simulators are therefore increasingly integrated in industrial conversion simulators, for linking the properties of wooden products to the three-dimensional structure of stems and their growing conditions. Static simulators predict the wood properties from stem dimensions at the end of a growth simulation period, whereas in dynamic approaches, the structural components, e.g. branches, are incremented along with the growth processes. The dynamic approach can be applied to stem reconstruction by predicting the three-dimensional stem structure from external tree variables (i.e. age, height) as a result of growth to the current state. In this study, a dynamic growth simulator, PipeQual, and a stem reconstruction simulator, RetroSTEM, are adapted to Norway spruce (Picea abies [L.] Karst.) to predict the three-dimensional structure of stems (tapers, branchiness, wood basic density) over time such that both simulators can be integrated in a sawing simulator. The parameterisation of the PipeQual and RetroSTEM simulators for Norway spruce relied on the theoretically based description of tree structure developing in the growth process and following certain conservative structural regularities while allowing for plasticity in the crown development. The crown expressed both regularity and plasticity in its development, as the vertical foliage density peaked regularly at about 5 m from the stem apex, varying below that with tree age and dominance position (Study I). Conservative stem structure was characterized in terms of (1) the pipe ratios between foliage mass and branch and stem cross-sectional areas at crown base, (2) the allometric relationship between foliage mass and crown length, (3) mean branch length relative to crown length and (4) form coefficients in branches and stem (Study II).
The pipe ratio between branch and stem cross-sectional area at crown base, and mean branch length relative to the crown length may differ in trees before and after canopy closure, but the variation should be further analysed in stands of different ages and densities with varying site fertilities and climates. The predictions of the PipeQual and RetroSTEM simulators were evaluated by comparing the simulated values to measured ones (Study III, IV). Both simulators predicted stem taper and branch diameter at the individual tree level with a small bias. RetroSTEM predictions of wood density were accurate. For focusing on even more accurate predictions of stem diameters and branchiness along the stem, both simulators should be further improved by revising the following aspects in the simulators: the relationship between foliage and stem sapwood area in the upper stem, the error source in branch sizes, the crown base development and the height growth models in RetroSTEM. In Study V, the RetroSTEM simulator was integrated in the InnoSIM sawing simulator, and according to the pilot simulations, this turned out to be an efficient tool for readily producing stand scale information about stem sizes and structure when approximating the available assortments of wood products.

Fast algorithm for data exchange in reconfigurable tree structures

Relevance:

70.00%

Publisher:

Abstract:

This paper presents a fast algorithm for data exchange in a network of processors organized as a reconfigurable tree structure. For a given data exchange table, the algorithm generates a sequence of tree configurations in which the data exchanges are to be executed. A significant feature of the algorithm is that each exchange is executed in a tree configuration in which the source and destination nodes are adjacent to each other. It has been proved that for every pair of nodes in the reconfigurable tree structure, there always exist exactly two configurations in which these two nodes are adjacent to each other. The algorithm utilizes this fact and determines the solution so as to optimize both the number of configurations required and the time to perform the data exchanges. Analysis of the algorithm shows that it has linear time complexity and provides a large reduction in run-time compared to a previously proposed algorithm. This is confirmed by the experimental results obtained by executing a large number of randomly generated data exchange tables. Another significant feature of the algorithm is that the bit-size of the routing information code is always two bits, irrespective of the number of nodes in the tree. This not only increases the speed of the algorithm but also results in simpler hardware inside each node.

Decision tree classification of land cover from remotely sensed data

Relevance:

70.00%

Publisher:

Abstract:

Decision tree classification algorithms have significant potential for land cover mapping problems and have not been tested in detail by the remote sensing community relative to more conventional pattern recognition techniques such as maximum likelihood classification. In this paper, we present several types of decision tree classification algorithms and evaluate them on three different remote sensing data sets. The decision tree classification algorithms tested include a univariate decision tree, a multivariate decision tree, and a hybrid decision tree capable of including several different types of classification algorithms within a single decision tree structure. Classification accuracies produced by each of these decision tree algorithms are compared with both maximum likelihood and linear discriminant function classifiers. Results from this analysis show that the decision tree algorithms consistently outperform the maximum likelihood and linear discriminant function classifiers in regard to classification accuracy. In particular, the hybrid tree consistently produced the highest classification accuracies for the data sets tested. More generally, the results from this work show that decision trees have several advantages for remote sensing applications by virtue of their relatively simple, explicit, and intuitive classification structure. Further, decision tree algorithms are strictly nonparametric and, therefore, make no assumptions regarding the distribution of input data, and are flexible and robust with respect to nonlinear and noisy relations among input features and class labels.

Enhancing an evolving tree-based text document visualization model with fuzzy c-Means clustering

Relevance:

70.00%

Publisher:

Abstract:

An improved evolving model, i.e., Evolving Tree (ETree) with Fuzzy c-Means (FCM), is proposed for undertaking text document visualization problems in this study. ETree forms a hierarchical tree structure in which nodes (i.e., trunks) are allowed to grow and split into child nodes (i.e., leaves), and each node represents a cluster of documents. However, ETree adopts a relatively simple approach to split its nodes. Thus, FCM is adopted as an alternative to perform node splitting in ETree. An experimental study using articles from a flagship conference of Universiti Malaysia Sarawak (UNIMAS), i.e., Engineering Conference (ENCON), is conducted. The experimental results are analyzed and discussed, and the outcome shows that the proposed ETree-FCM model is effective for undertaking text document clustering and visualization problems.

A new application of an evolving tree to failure mode and effect analysis methodology

Relevance:

70.00%

Publisher:

Abstract:

Failure Mode and Effect Analysis (FMEA) is a popular safety and reliability analysis methodology for examining potential failure modes of products, processes, designs, or services in a wide range of industries. Despite its popularity, FMEA has a number of limitations, and two highlighted issues are the bulky FMEA form and its intricacy of use. To overcome these shortcomings, we introduce the idea of visualisation pertaining to the failure modes or control actions in FMEA. A visualisation model with an incremental learning feature, i.e., the evolving tree (ETree), is adopted to allow the failure modes or control actions in FMEA to be clustered and visualised. The failure modes or control actions are grouped and visualised with consideration of their Severity, Occurrence, and Detection scores. Our proposed approach allows the failure modes or control actions to be mapped into a tree structure for visualisation. The devised approach is evaluated with a benchmark problem. The experiments show that the control actions of FMEA can be visualised through the tree structure, which provides a quick and easily understandable view of the FMEA spreadsheet to facilitate decision making tasks.

Clustering and visualization of failure modes using an evolving tree

Relevance:

70.00%

Publisher:

Abstract:

Despite the popularity of Failure Mode and Effect Analysis (FMEA) in a wide range of industries, two well-known shortcomings are the complexity of the FMEA worksheet and its intricacy of use. To the best of our knowledge, the use of computation techniques for solving the aforementioned shortcomings is limited. As such, the idea of clustering and visualization pertaining to the failure modes in FMEA is proposed in this paper. A neural network visualization model with an incremental learning feature, i.e., the evolving tree (ETree), is adopted to allow the failure modes in FMEA to be clustered and visualized as a tree structure. In addition, the ideas of risk interval and risk ordering for different groups of failure modes are proposed to allow the failure modes to be ordered, analyzed, and evaluated in groups. The main advantages of the proposed method lie in its ability to transform failure modes in a complex FMEA worksheet to a tree structure for better visualization, while maintaining the risk evaluation and ordering features. It can be applied to the conventional FMEA methodology without requiring additional information or data. A real world case study in the edible bird nest industry in Sarawak (Borneo Island) is used to evaluate the usefulness of the proposed method. The experiments show that the failure modes in FMEA can be effectively visualized through the tree structure. A discussion with FMEA users engaged in the case study indicates that such visualization is helpful in comprehending and analyzing the respective failure modes, as compared with those in an FMEA table. The resulting tree structure, together with risk interval and risk ordering, provides a quick and easily understandable framework to elucidate important information from complex FMEA forms; therefore facilitating the decision-making tasks by FMEA users. 
The significance of this study is twofold, viz., the use of a computational visualization approach to tackling two well-known shortcomings of FMEA; and the use of ETree as an effective neural network learning paradigm to facilitate FMEA implementations. These findings aim to spearhead the potential adoption of FMEA as a useful and usable risk evaluation and management tool by the wider community.

Classical simulation of quantum many-body systems with a tree tensor network

Relevance:

70.00%

Publisher:

Abstract:

We show how to efficiently simulate a quantum many-body system with tree structure when its entanglement (Schmidt number) is small for any bipartite split along an edge of the tree. As an application, we show that any one-way quantum computation on a tree graph can be efficiently simulated with a classical computer.

Hash-Tree Anti-Tampering Schemes

Relevance:

70.00%

Publisher:

Abstract:

Procedures that provide detection, location and correction of tampering in documents are known as anti-tampering schemes. In this paper we describe how to construct an anti-tampering scheme using a pre-computed tree of hashes. The main problems of constructing such a scheme are its computational feasibility and its candidate reduction process. We show how to solve both problems by the use of secondary hashing over a tree structure. Finally, we give brief comments on our ongoing work in this area.
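The abstract gives no construction details, so the following is only a generic sketch of how a pre-computed tree of hashes can detect and locate tampering. It assumes a binary tree of SHA-256 hashes over fixed document blocks and an unchanged block count; the function names are hypothetical, and the secondary hashing, candidate reduction and correction steps of the paper are not reproduced.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_hash_tree(blocks):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Hash each pair of neighbours; an unpaired last node is hashed alone.
        level = [_h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def locate_tampered(trusted, blocks):
    """Return indices of blocks whose hashes differ from the trusted tree.

    Assumes `blocks` has the same length as the original document.
    """
    current = build_hash_tree(blocks)
    # Start at the root and follow only the branches that disagree.
    suspects = [0] if trusted[-1] != current[-1] else []
    for depth in range(len(trusted) - 2, -1, -1):
        next_suspects = []
        for parent in suspects:
            for child in (2 * parent, 2 * parent + 1):
                if (child < len(trusted[depth])
                        and trusted[depth][child] != current[depth][child]):
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


original = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
trusted_tree = build_hash_tree(original)      # pre-computed once
received = list(original)
received[3] = b"DELTA"                        # simulate tampering
print(locate_tampered(trusted_tree, received))
```

Descending from the root means that only the subtrees containing a mismatch are inspected, so an untouched document is accepted after a single root comparison.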

Fast exact nearest neighbour matching in high dimensions using d-D Sort

Relevance:

60.00%

Publisher:

Abstract:

Data structures such as k-D trees and hierarchical k-means trees perform very well in approximate k nearest neighbour matching, but are only marginally more effective than linear search when performing exact matching in high-dimensional image descriptor data. This paper presents several improvements to linear search that allow it to outperform existing methods and recommends two approaches to exact matching. The first method reduces the number of operations by evaluating the distance measure in order of significance of the query dimensions and terminating when the partial distance exceeds the search threshold. This method does not require preprocessing and significantly outperforms existing methods. The second method improves query speed further by presorting the data using a data structure called d-D sort. The order information is used as a priority queue to reduce the time taken to find the exact match and to restrict the range of data searched. Construction of the d-D sort structure is very simple to implement, does not require any parameter tuning, and requires significantly less time than the best-performing tree structure, and data can be added to the structure relatively efficiently.
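The first method in this abstract, partial-distance evaluation with early termination, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: "significance" of a query dimension is taken here to mean its absolute value, the distance is squared Euclidean, and the search threshold is the best distance found so far. The d-D sort structure of the second method is not shown.

```python
def nearest_neighbour(query, data):
    """Exact nearest neighbour by linear search with early termination.

    Returns (index, squared_distance) of the closest point in `data`.
    """
    # Assumption: visit query dimensions from largest to smallest magnitude.
    order = sorted(range(len(query)), key=lambda i: -abs(query[i]))
    best_index, best_dist = -1, float("inf")
    for idx, point in enumerate(data):
        partial = 0.0
        for i in order:
            diff = query[i] - point[i]
            partial += diff * diff
            if partial >= best_dist:
                break                 # this point cannot beat the current best
        else:
            # Loop finished without a break: full distance is a new best.
            best_index, best_dist = idx, partial
    return best_index, best_dist


data = [(0, 0, 0), (1, 1, 1), (5, 5, 5)]
print(nearest_neighbour((1, 1, 2), data))
```

Because a candidate is abandoned as soon as its partial sum reaches the current best distance, the result is still exact; only the amount of arithmetic per rejected candidate is reduced.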

Product feature taxonomy learning based on user reviews

Relevance:

60.00%

Publisher:

Abstract:

In recent years, Web 2.0 has provided considerable facilities for people to create, share and exchange information and ideas. As a result, user-generated content, such as reviews, has exploded. Such data provide a rich source to exploit in order to identify the information associated with specific reviewed items. Opinion mining has been widely used to identify the significant features of items (e.g., cameras) based upon user reviews. Feature extraction is the most critical step in identifying useful information from texts. Most existing approaches only find individual features of a product without revealing the structural relationships that usually exist between the features. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called a feature taxonomy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature taxonomy profiles the product at multiple levels and provides more detailed information about the product. Our experimental results, based on some widely used review datasets, show that our proposed approach is able to capture the product features and relations effectively.

Structured feature extraction using association rules

Relevance:

60.00%

Publisher:

Abstract:

Opinion mining is now widely used to identify the strengths and weaknesses of products (e.g., cameras) or services (e.g., services in medical clinics or hospitals) based upon people's feedback such as user reviews. Feature extraction is a crucial step in opinion mining and is used to collect useful information from user reviews. Most existing approaches only find individual features of a product without the structural relationships that usually exist between the features. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called a feature hierarchy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature hierarchy profiles the product at multiple levels and provides more detailed information about the product. Our experimental results, based on some widely used review datasets, show that the proposed feature extraction approach can identify more correct features than the baseline model. Even though the datasets used in the experiment are about cameras, our work can be applied to generate features about a service such as the services in hospitals or clinics.

Spatial prediction on a river network

Relevance:

60.00%

Publisher:

Abstract:

This article develops methods for spatially predicting daily change of dissolved oxygen (Dochange) at both sampled locations (134 freshwater sites in 2002 and 2003) and other locations of interest throughout a river network in South East Queensland, Australia. In order to deal with the relative sparseness of the monitoring locations in comparison to the number of locations where one might want to make predictions, we make a classification of the river and stream locations. We then implement optimal spatial prediction (ordinary and constrained kriging) from geostatistics. Because of their directed-tree structure, rivers and streams offer special challenges. A complete approach to spatial prediction on a river network is given, with special attention paid to environmental exceedances. The methodology is used to produce a map of Dochange predictions for 2003. Dochange is one of the variables measured as part of the Ecosystem Health Monitoring Program conducted within the Moreton Bay Waterways and Catchments Partnership.
