897 results for information bottleneck method


Relevance:

30.00%

Publisher:

Abstract:

Reducing the size of ensemble classifiers is important for various security applications. Most known pruning algorithms fall into three categories: ranking-based, clustering-based, and optimization-based methods. This paper introduces and investigates a new pruning technique, called the Three-Level Pruning Technique (TLPT) because it combines all three approaches in three levels of the process. We investigate a TLPT that combines the state-of-the-art ranking of Ensemble Pruning via Individual Contribution ordering (EPIC), the clustering of K-Means Pruning (KMP), and the optimization method of Directed Hill Climbing Ensemble Pruning (DHCEP) on a phishing dataset. The new experiments presented in this paper show that TLPT is competitive with EPIC, KMP, and DHCEP, and can achieve better outcomes. These experimental results demonstrate the effectiveness of TLPT in this information security application.
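As a rough sketch of how the three levels could chain together, the illustrative Python below (all names and parameters hypothetical) ranks members by validation accuracy as a stand-in for EPIC's individual-contribution score, clusters prediction vectors in the spirit of KMP, and finishes with DHCEP-style greedy hill climbing; binary 0/1 labels are assumed, as in phishing detection.

```python
import numpy as np
from sklearn.cluster import KMeans

def three_level_prune(members, X_val, y_val, keep_ranked=20, n_clusters=5, target_size=7):
    # Predictions of every fitted member on a validation set (binary 0/1 labels assumed).
    preds = np.array([m.predict(X_val) for m in members])

    # Level 1 (ranking): keep the individually strongest members; plain
    # accuracy stands in for EPIC's individual-contribution score.
    scores = (preds == y_val).mean(axis=1)
    top = np.argsort(scores)[::-1][:keep_ranked]

    # Level 2 (clustering): group similar prediction vectors (KMP-style)
    # and keep the best member of each cluster as the candidate pool.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(preds[top])
    pool = []
    for c in range(n_clusters):
        idx = top[labels == c]
        if len(idx):
            pool.append(idx[np.argmax(scores[idx])])

    # Level 3 (hill climbing): greedily add the member that most improves
    # majority-vote accuracy, in the spirit of DHCEP.
    def vote_acc(subset):
        return (np.round(preds[subset].mean(axis=0)) == y_val).mean()
    chosen = []
    while pool and len(chosen) < target_size:
        best = max(pool, key=lambda i: vote_acc(chosen + [i]))
        chosen.append(best)
        pool.remove(best)
    return [members[i] for i in chosen]
```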

Relevance:

30.00%

Publisher:

Abstract:

How to identify influential nodes remains an open issue in complex networks. Many methods (e.g., degree centrality, betweenness centrality, or K-shell) are based on the topology of a network and work well in scale-free networks. To design a universal method suitable for networks with different topologies, this paper proposes a Multiple Attribute Fusion (MAF) method that combines topological attributes and diffused attributes of a node. Two fusion strategies are proposed: one based on attribute union (FU) and the other based on attribute ranking (FR). Simulation results in the Susceptible-Infected (SI) model show that the proposed method achieves higher information propagation efficiency in different types of networks.
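The abstract does not give the exact fusion formulas, but the ranking-based strategy (FR) can be pictured as summing each node's rank under several centrality measures. A minimal sketch using networkx, with the diffused attribute omitted for lack of detail:

```python
import networkx as nx

def fused_rank(G):
    # Topological attributes named in the abstract: degree, betweenness, k-shell.
    measures = [nx.degree_centrality(G),
                nx.betweenness_centrality(G),
                nx.core_number(G)]            # k-shell index
    fused = {n: 0 for n in G}
    for m in measures:
        # rank 0 = best under this measure; fuse by summing ranks
        for rank, node in enumerate(sorted(G, key=lambda n: -m[n])):
            fused[node] += rank
    return sorted(G, key=lambda n: fused[n])  # smallest fused rank first

G = nx.karate_club_graph()
print(fused_rank(G)[:5])   # five most influential nodes under this FR reading
```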

Relevance:

30.00%

Publisher:

Abstract:

We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available as tree structures. We derive an efficient inference method for the wd-dCRF using MCMC techniques. We evaluate on a real-world medical dataset consisting of about 1000 patients with polyvascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
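One plausible word-to-word distance over a diagnosis-code hierarchy is the path length between two codes in the semantic tree; the paper's actual distance functions are not given in the abstract. A self-contained sketch:

```python
def tree_distance(parent, a, b):
    # Path length between two codes in a semantic tree, given a child->parent map.
    def ancestors(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain
    up_a, up_b = ancestors(a), ancestors(b)
    depth_b = {node: i for i, node in enumerate(up_b)}
    for i, node in enumerate(up_a):            # first common ancestor
        if node in depth_b:
            return i + depth_b[node]
    return float("inf")                        # codes in disconnected trees

# toy ICD-like hierarchy: root -> circulatory -> {I70, I73}
parent = {"I70": "circulatory", "I73": "circulatory", "circulatory": "root"}
print(tree_distance(parent, "I70", "I73"))     # -> 2
```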

Relevance:

30.00%

Publisher:

Abstract:

This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers, including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance, and ReliefF. Four benchmark microarray datasets are used for the experiments: diffuse large B-cell lymphoma, leukemia, prostate, and colon cancer. As the number of samples in microarray datasets is limited, leave-one-out cross-validation is applied rather than traditional cross-validation. Experimental results demonstrate that the proposed MAHP significantly dominates the competing methods in terms of both accuracy and stability. With its low computational cost, MAHP is useful for cancer diagnosis from DNA gene expression profiles in real clinical practice.
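Four of the five ranking statistics are straightforward to compute with SciPy and scikit-learn; the sketch below does so for a binary-labeled expression matrix (the entropy test and the AHP aggregation itself, which is the paper's contribution, are omitted):

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

def gene_scores(X, y):
    # X is (samples, genes); y is a binary class vector.
    a, b = X[y == 0], X[y == 1]
    t = np.abs(stats.ttest_ind(a, b, axis=0).statistic)          # two-sample t-test
    w = np.array([np.abs(stats.ranksums(a[:, j], b[:, j]).statistic)
                  for j in range(X.shape[1])])                    # Wilcoxon rank-sum test
    auc = np.array([max(s := roc_auc_score(y, X[:, j]), 1 - s)
                    for j in range(X.shape[1])])                  # ROC-curve criterion
    snr = np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0))   # signal-to-noise ratio
    return t, w, auc, snr
```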

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we address the problems of fully automatic localization and segmentation of 3D vertebral bodies from CT/MR images. We propose a learning-based, unified random forest regression and classification framework to tackle these two problems. More specifically, in the first stage, the localization of 3D vertebral bodies is solved with random forest regression, where we aggregate the votes from a set of randomly sampled image patches to get a probability map of the center of a target vertebral body in a given image. The resultant probability map is then further regularized by a hidden Markov model (HMM) to eliminate potential ambiguity caused by neighboring vertebral bodies. The output from the first stage allows us to define a region of interest (ROI) for the segmentation step, where we use random forest classification to estimate the likelihood of a voxel in the ROI being foreground or background. The estimated likelihood is combined with a prior probability, learned from a set of training data, to get the posterior probability of the voxel. The segmentation of the target vertebral body is then obtained by binary thresholding of the estimated probability. We evaluated the present approach on two openly available datasets: 1) 3D T2-weighted spine MR images from 23 patients and 2) 3D spine CT images from 10 patients. Taking manual segmentation as the ground truth (each MR image contains at least 7 vertebral bodies from T11 to L5 and each CT image contains 5 vertebral bodies from L1 to L5), we evaluated the present approach with leave-one-out experiments. For the T2-weighted MR images, we achieved a mean localization error of 1.6 mm, a mean Dice metric of 88.7%, and a mean surface distance of 1.5 mm; for the CT images, a mean localization error of 1.9 mm, a mean Dice metric of 91.0%, and a mean surface distance of 0.9 mm.
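The likelihood-prior combination in the second stage can be read as a simple Bayes-style fusion per voxel; the sketch below shows one plausible version of that step and the final thresholding, with both inputs assumed to be voxel-wise probability volumes over the ROI:

```python
import numpy as np

def segment_roi(forest_prob, prior_prob, threshold=0.5):
    # forest_prob: random-forest foreground likelihood per voxel.
    # prior_prob: learned prior probability per voxel (same shape).
    fg = forest_prob * prior_prob
    bg = (1.0 - forest_prob) * (1.0 - prior_prob)
    posterior = fg / np.maximum(fg + bg, 1e-12)   # renormalize, avoid divide-by-zero
    return posterior > threshold                   # binary vertebral-body mask
```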

Relevance:

30.00%

Publisher:

Abstract:

Privacy preservation in data mining and data release has attracted increasing research interest. Differential privacy is one influential privacy notion that offers a rigorous and provable privacy guarantee for data mining and data release. Existing studies on differential privacy assume that records in a data set are sampled independently. However, in real-world applications, records are rarely independent. The relationships among records are referred to as correlated information, and such a data set is defined as a correlated data set. A differential privacy technique applied to a correlated data set discloses more information than expected, which is a serious privacy violation. Although recent research has addressed this new privacy violation, a solid solution for correlated data sets is still needed, and how to decrease the large amount of noise incurred by differential privacy on correlated data sets is yet to be explored. To fill the gap, this paper proposes an effective correlated differential privacy solution by defining a correlated sensitivity and designing a correlated data releasing mechanism. By taking the correlation levels between records into account, the proposed correlated sensitivity can significantly decrease the noise compared with traditional global sensitivity. The correlated data releasing mechanism, the correlated iteration mechanism, is designed based on an iterative method to answer a large number of queries. Compared with the traditional method, the proposed correlated differential privacy solution enhances the privacy guarantee for a correlated data set at a lower accuracy cost. Experimental results show that the proposed solution outperforms traditional differential privacy in terms of mean squared error on large groups of queries, suggesting that correlated differential privacy can retain utility while preserving privacy.
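As a rough formalization of the idea (the paper's exact definition may differ), correlated sensitivity weights each record's contribution to a query by its correlation with the other records, and the Laplace noise is calibrated to that quantity instead of a naively scaled global sensitivity:

```python
import numpy as np

def correlated_sensitivity(delta, contributions):
    # delta[i, j]: correlation degree between records i and j (0 = independent,
    # 1 = fully correlated); contributions[j]: |q(D) - q(D without record j)|.
    # Removing record i also "leaks" through records correlated with it, so its
    # effective sensitivity is the correlation-weighted sum of contributions;
    # the worst case over all records bounds the noise scale.
    return float(np.max(delta @ contributions))

def correlated_laplace(query_value, delta, contributions, epsilon,
                       rng=np.random.default_rng()):
    # Standard Laplace mechanism, calibrated to the correlated sensitivity.
    scale = correlated_sensitivity(delta, contributions) / epsilon
    return query_value + rng.laplace(scale=scale)
```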

Relevance:

30.00%

Publisher:

Abstract:

Autonomous wireless sensor networks (WSNs) have sensors that are usually deployed randomly to monitor one or more phenomena. They are attractive for information discovery in large-scale, data-rich environments and can add value to mission-critical applications such as battlefield surveillance and emergency response systems. However, to fully exploit these networks for such applications, energy-efficient, load-balanced, and scalable solutions for information discovery are essential. Multi-dimensional autonomous WSNs are deployed in complex environments to sense and collect data relating to multiple attributes (multi-dimensional data). Such networks present unique challenges to data dissemination, data storage, and in-network information discovery. In this paper, we propose a novel method for information discovery in multi-dimensional autonomous WSNs with randomly deployed sensors that can significantly increase network lifetime and minimize query processing latency, resulting in quality-of-service (QoS) improvements that are of immense benefit to mission-critical applications. We present simulation results showing that the proposed approach offers significant improvements in query resolution latency compared with current approaches.

Relevance:

30.00%

Publisher:

Abstract:

A methodology for selecting the individual numerical scale and prioritization method has recently been presented and justified in the analytic hierarchy process (AHP). In this study, we further propose a novel AHP group decision making (AHP-GDM) model in a local context (a single criterion), based on the individual selection of the numerical scale and prioritization method. The resolution framework of the AHP-GDM with individual numerical scales and prioritization methods is first proposed. Then, based on linguistic Euclidean distance (LED) and linguistic minimum violations (LMV), a novel consensus measure is defined so that the consensus degree among decision makers who use different numerical scales and prioritization methods can be analyzed. Next, a consensus reaching model is proposed to help decision makers improve the consensus degree; in this model, LED-based and LMV-based consensus rules are proposed and used. Finally, a new individual consistency index and its properties are proposed for the use of the individual numerical scale and prioritization method in the AHP-GDM. Simulation experiments and numerical examples demonstrate the validity of the proposed model.
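A plain numeric analogue of the consensus degree, ignoring the linguistic layer of LED/LMV, measures how far each expert's priority vector sits from the group average; a hypothetical sketch:

```python
import numpy as np

def consensus_degree(priorities):
    # priorities: (n_experts, n_alternatives); each row is one decision maker's
    # priority vector (nonnegative, summing to 1), derived by that expert's own
    # numerical scale and prioritization method. Consensus shrinks as the
    # individual vectors drift away from the group average.
    group = priorities.mean(axis=0)
    dists = np.linalg.norm(priorities - group, axis=1)
    # sqrt(2) bounds the Euclidean distance between two priority vectors,
    # so the result lies in [0, 1] (1 = perfect agreement).
    return 1.0 - dists.mean() / np.sqrt(2)
```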

Relevance:

30.00%

Publisher:

Abstract:

The global diffusion of epidemics, computer viruses, and rumors causes great damage to our society. It is critical to identify the diffusion sources and quarantine them in a timely manner. However, most methods proposed so far are unsuitable for diffusion with multiple sources because of the high computational cost and the complex spatiotemporal diffusion processes. In this paper, based on knowledge of the infected nodes and their connections, we propose a novel method to identify multiple diffusion sources that addresses three main questions: 1) how many sources are there? 2) where did the diffusion emerge? and 3) when did the diffusion break out? We first derive an optimization formulation for the multi-source identification problem, based on altering the original network with respect to two key elements: 1) the propagation probability and 2) the number of hops between nodes. Experiments demonstrate that the altered network can accurately reflect complex diffusion processes with multiple sources. Second, we derive a fast method to optimize the formulation; the method is proved to be convergent with computational complexity O(mn log α), where α = α(m,n) is the slowly growing inverse Ackermann function, n is the number of infected nodes, and m is the number of edges connecting them. Finally, we introduce an efficient algorithm to estimate the spreading time and the number of diffusion sources. To evaluate the proposed method, we compare it with many competing methods on various real-world network topologies. Our method shows significant advantages in the estimation of multiple sources and the prediction of spreading time.
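The altered network can be pictured as weighting every edge by -log(p), so that a shortest path's weight encodes both hop count and traversal probability. Below is a greedy source-selection sketch under that reading; a uniform p and a connected infected subgraph are simplifying assumptions, and this is not the paper's actual optimization:

```python
import math
import networkx as nx

def pick_sources(G, infected, k, p=0.5):
    # Altered network: each edge weighted by -log(p), so a shortest path's
    # weight is hops * -log(p), i.e. the negative log-probability that the
    # diffusion traversed it -- combining the two elements named above.
    H = G.subgraph(infected).copy()
    nx.set_edge_attributes(H, -math.log(p), "w")
    d = dict(nx.all_pairs_dijkstra_path_length(H, weight="w"))

    # Greedy k-center over the effective distance: each added source should
    # best "explain" (be effectively close to) the infected nodes.
    sources = []
    for _ in range(k):
        def cost(s):
            chosen = sources + [s]
            return sum(min(d[c][v] for c in chosen) for v in infected)
        candidates = [s for s in infected if s not in sources]
        sources.append(min(candidates, key=cost))
    return sources
```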

Relevance:

30.00%

Publisher:

Abstract:

Drinking water distribution networks risk exposure to malicious or accidental contamination. Several levels of response are conceivable; one of them consists of installing a sensor network to monitor the system in real time. Once contamination has been detected, it is also important to take appropriate counter-measures. In the SMaRT-OnlineWDN project, this relies on modeling to predict both hydraulics and water quality. Online use of the model makes it possible to identify the contaminant source and simulate the contaminated area. The objective of this paper is to present SMaRT-OnlineWDN experience and research results for hydraulic state estimation with a sampling frequency of a few minutes. A least squares problem with bound constraints is formulated to adjust demand class coefficients to best fit the observed values at a given time. The criterion is a Huber function, to limit the influence of outliers, and a Tikhonov regularization is introduced to incorporate prior information on the parameter vector. The Levenberg-Marquardt algorithm is then applied, using derivative information to limit the number of iterations. Confidence intervals for the state prediction are also given. The results are presented and discussed on real networks in France and Germany.
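This estimation step maps almost directly onto scipy.optimize.least_squares, which supports bound constraints, a Huber loss, and derivative-based trust-region iterations. In the sketch below, model is a stand-in for the hydraulic simulator (not shown), and the Tikhonov term is appended as extra residuals; SciPy's 'lm' method does not accept bounds, so the trust-region reflective solver is used instead:

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_demands(model, observed, x_prior, lb, ub, lam=1e-2):
    # model(x): simulated sensor values for demand-class coefficients x.
    def residuals(x):
        r = model(x) - observed                 # data misfit at measurement points
        reg = np.sqrt(lam) * (x - x_prior)      # Tikhonov term pulling x toward the prior
        return np.concatenate([r, reg])
    return least_squares(residuals, x_prior, bounds=(lb, ub),
                         loss="huber", f_scale=1.0, method="trf")
```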

Relevance:

30.00%

Publisher:

Abstract:

This article highlights the potential benefits of the Kohonen method for classifying rivers with similar characteristics when determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Many methodologies currently exist for the classification of rivers, but none of them offers the characteristics of the Kohonen method: (i) it provides the number of groups that actually underlie the data, (ii) it supports variable-importance analysis, (iii) it can display the classification process in two dimensions, and (iv) the clustering structure remains stable regardless of the model parameters. To evaluate these benefits, 174 flow stations distributed along the great Magdalena-Cauca river basin (Colombia) were analyzed, with 73 variables obtained for the classification process in each case. Six trials were run using different combinations of variables, and the results were validated against a reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. In the validation process, two of the tested models reproduced more than 80% of the reference classification from the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.
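With the MiniSom library, a Kohonen self-organizing map over the station-by-variable matrix takes only a few lines; the 4x4 grid and the random stand-in data below are assumptions for illustration:

```python
import numpy as np
from minisom import MiniSom   # pip install minisom

# stand-in for the 174 stations x 73 hydrologic variables matrix
data = np.random.rand(174, 73)
data = (data - data.mean(0)) / data.std(0)        # standardize each variable

som = MiniSom(4, 4, input_len=73, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, num_iteration=5000)

# each station is assigned to its best-matching unit; units (or groups of
# neighboring units) play the role of river classes in the ELOHA workflow
classes = [som.winner(row) for row in data]
```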

Relevance:

30.00%

Publisher:

Abstract:

This thesis provides three original contributions to the field of Decision Sciences. The first contribution explores the field of heuristics and biases. New variations of the Cognitive Reflection Test (CRT--a test to measure "the ability or disposition to resist reporting the response that first comes to mind") are provided. The original CRT (S. Frederick [2005] Journal of Economic Perspectives, v. 19:4, pp. 24-42) has items in which the response is immediate--and erroneous. It is shown that merely varying the numerical parameters of the problems produces large deviations in response. Not only are the final results affected by the proposed variations, but so is processing fluency. It seems that the magnitudes of the numbers serve as a cue to activate System 2 reasoning. The second contribution explores Managerial Algorithmics Theory (M. Moldoveanu [2009] Strategic Management Journal, v. 30, pp. 737-763), an ambitious research program stating that managers display cognitive choices with a "preference towards solving problems of low computational complexity". An empirical test of this hypothesis is conducted, with results showing that this premise is not supported. A number of problems are designed to test the predictions of managerial algorithmics against those of cognitive psychology. The results demonstrate (once again) that framing effects profoundly affect choice, and (an original insight) that managers are unable to distinguish between computational complexity problem classes. The third contribution explores a new approach to a computationally complex problem in marketing: the shelf space allocation problem (M-H Yang [2001] European Journal of Operational Research, v. 131, pp. 107-118). A new representation for a genetic algorithm is developed, and computational experiments demonstrate its feasibility as a practical solution method. These studies lie at the interface of psychology and economics (bounded rationality and the heuristics and biases programme), of psychology, strategy, and computational complexity, and of heuristics for computationally hard problems in management science.
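The abstract does not describe the thesis's genetic-algorithm representation; as a generic illustration of the problem class, the toy GA below encodes an allocation as a product-to-shelf integer vector with a capacity penalty in the fitness (all parameters hypothetical):

```python
import random

def shelf_ga(profit, n_shelves, capacity, pop=60, gens=200):
    # profit[i][s]: profit of placing product i on shelf s;
    # each shelf holds at most `capacity` products.
    n = len(profit)
    def fitness(ind):
        value = sum(profit[i][s] for i, s in enumerate(ind))
        over = sum(max(0, ind.count(s) - capacity) for s in range(n_shelves))
        return value - 1e3 * over                 # penalize overfull shelves
    population = [[random.randrange(n_shelves) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]          # elitist selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]             # one-point crossover
            if random.random() < 0.1:             # point mutation
                child[random.randrange(n)] = random.randrange(n_shelves)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```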

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

30.00%

Publisher:

Abstract:

The stretch zone width (SZW) of 15-5PH steel CTOD specimens fractured at temperatures from -150 °C to +23 °C was measured based on focused images and 3D maps obtained by extended depth-of-field reconstruction from light microscopy (LM) image stacks. This LM-based method, despite its lower lateral resolution, appears to be as effective for quantitative analysis of SZW as scanning electron microscopy (SEM) or confocal scanning laser microscopy (CSLM), permitting clear identification of stretch zone boundaries. Despite the lower sharpness of the focused images, a robust linear correlation was established between fracture toughness (KC) and SZW measured at the center region of the tested 15-5PH specimens. The method is an alternative for evaluating the boundaries of stretched zones at a lower cost of implementation and training, since topographic data from elevation maps can be associated with the reconstructed image, which preserves the original contrast and brightness information. The extended depth-of-field method is thus presented as a valuable tool for failure analysis and a cheaper alternative for investigating rough or fractured surfaces compared to scanning electron or confocal light microscopes.
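Extended depth-of-field reconstruction from an image stack can be sketched generically: per pixel, pick the focal slice with the strongest local sharpness response, yielding both the focused image and the elevation (topography) map. This is an illustration of the general technique, not the paper's specific software:

```python
import numpy as np
from scipy import ndimage

def extended_depth_of_field(stack, z_step_um=1.0):
    # stack: (n_slices, h, w), one grayscale LM image per focal plane.
    # Local sharpness: smoothed absolute Laplacian response per slice.
    sharp = np.array([ndimage.gaussian_filter(np.abs(ndimage.laplace(s)), 2)
                      for s in stack.astype(float)])
    best = sharp.argmax(axis=0)                       # (h, w) in-focus slice index
    focused = np.take_along_axis(stack, best[None], axis=0)[0]
    elevation = best * z_step_um                      # 3D topography map
    return focused, elevation
```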