22 results for average of mutual information (AMI)
in Indian Institute of Science - Bangalore - India
Abstract:
A two-stage iterative algorithm for selecting a subset of a training set of samples for use in a condensed nearest neighbor (CNN) decision rule is introduced. The proposed method uses the concept of mutual nearest neighborhood for selecting samples close to the decision line. The efficacy of the algorithm is brought out by means of an example.
Abstract:
Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.
Abstract:
A method for determining the mutual nearest neighbours (MNN) and mutual neighbourhood value (mnv) of a sample point, using the conventional nearest neighbours, is suggested. A nonparametric, hierarchical, agglomerative clustering algorithm is developed using the above concepts. The algorithm is simple, deterministic, noniterative, requires low storage and is able to discern spherical and nonspherical clusters. The method is applicable to a wide class of data of arbitrary shape, large size and high dimensionality. The algorithm can discern mutually homogeneous clusters. Strong or weak patterns can be discerned by properly choosing the neighbourhood width.
Abstract:
A nonparametric, hierarchical, disaggregative clustering algorithm is developed using a novel similarity measure, called the mutual neighborhood value (MNV), which takes into account the conventional nearest neighbor ranks of two samples with respect to each other. The algorithm is simple, noniterative, requires low storage, and needs no specification of the expected number of clusters. The algorithm appears very versatile as it is capable of discerning spherical and nonspherical clusters, linearly nonseparable clusters, clusters with unequal populations, and clusters with low-density bridges. Changing the neighborhood size enables discernment of strong or weak patterns.
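The rank-based similarity described above can be illustrated with a minimal sketch. It assumes that the MNV of two samples is the sum of their mutual nearest-neighbor ranks (if x_j is the m-th nearest neighbor of x_i and x_i is the n-th nearest neighbor of x_j, then MNV = m + n); the abstract does not spell out the formula, so this definition, the function names, and the toy points are illustrative assumptions only.

```python
import math

def nn_rank(points, i, j):
    """Rank of points[j] among the nearest neighbors of points[i] (1 = closest)."""
    others = [k for k in range(len(points)) if k != i]
    # Sort by Euclidean distance; ties are broken by index for determinism.
    others.sort(key=lambda k: (math.dist(points[i], points[k]), k))
    return others.index(j) + 1

def mutual_neighborhood_value(points, i, j):
    """Assumed definition: sum of the two mutual nearest-neighbor ranks."""
    return nn_rank(points, i, j) + nn_rank(points, j, i)

# Hypothetical toy data: two tight pairs separated by a gap.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0)]
mnv_close = mutual_neighborhood_value(points, 0, 1)   # samples in the same pair
mnv_far = mutual_neighborhood_value(points, 0, 3)     # samples in different pairs
```

Under this assumed definition, a small value means each sample ranks high in the other's neighbor list, so pairs inside a cluster score lower than pairs spanning two clusters.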
Abstract:
The production of rainfed crops in semi-arid tropics exhibits large variation in response to the variation in seasonal rainfall. There are several farm-level decisions such as the choice of cropping pattern, whether to invest in fertilizers, pesticides etc., the choice of the period for planting, plant population density etc. for which the appropriate choice (associated with maximum production or minimum risk) depends upon the nature of the rainfall variability or the prediction for a specific year. In this paper, we have addressed the problem of identifying the appropriate strategies for cultivation of rainfed groundnut in the Anantapur region in a semi-arid part of the Indian peninsula. The approach developed involves participatory research with active collaboration with farmers, so that the problems with perceived need are addressed with the modern tools and data sets available. Given the large spatial variation of climate and soil, the appropriate strategies are necessarily location specific. With the approach adopted, it is possible to tap the detailed location specific knowledge of the complex rainfed ecosystem and gain an insight into the variety of options of land use and management practices available to each category of stakeholders. We believe such a participatory approach is essential for identifying strategies that have a favourable cost-benefit ratio over the region considered and hence are associated with a high chance of acceptance by the stakeholders. (C) 2002 Elsevier Science Ltd. All rights reserved.
Abstract:
We conducted surveys of fire and fuels managers at local, regional, and national levels to gain insights into decision processes and information flows in wildfire management. Survey results in the form of fire managers’ decision calendars show how climate information needs vary seasonally, over space, and through the organizational network, and help determine optimal points for introducing climate information and forecasts into decision processes. We identified opportunities to use climate information in fire management, including seasonal to interannual climate forecasts at all organizational levels, to improve the targeting of fuels treatments and prescribed burns, the positioning and movement of initial attack resources, and staffing and budgeting decisions. Longer-term (5–10 years) outlooks also could be useful at the national level in setting budget and research priorities. We discuss these opportunities and examine the kinds of organizational changes that could facilitate effective use of existing climate information and climate forecast capabilities.
Abstract:
A novel approach that can more effectively use the structural information provided by the traditional imaging modalities in multimodal diffuse optical tomographic imaging is introduced. This approach is based on a prior image-constrained ℓ1 minimization scheme and has been motivated by the recent progress in sparse image reconstruction techniques. It is shown that the proposed framework is more effective in terms of localizing the tumor region and recovering the optical property values, in both numerical and gelatin phantom cases, compared to the traditional methods that use structural information. (C) 2012 Optical Society of America
Abstract:
We study the optimal control problem of maximizing the spread of an information epidemic on a social network. Information propagation is modeled as a susceptible-infected (SI) process, and the campaign budget is fixed. Direct recruitment and word-of-mouth incentives are the two strategies to accelerate information spreading (controls). We allow for multiple controls depending on the degree of the nodes/individuals. The solution optimally allocates the scarce resource over the campaign duration and the degree class groups. We study the impact of the degree distribution of the network on the controls and present results for Erdos-Renyi and scale-free networks. Results show that more resource is allocated to high-degree nodes in the case of scale-free networks, but to medium-degree nodes in the case of Erdos-Renyi networks. We study the effects of various model parameters on the optimal strategy and quantify the improvement offered by the optimal strategy over the static and bang-bang control strategies. The effect of the time-varying spreading rate on the controls is explored, as the interest level of the population in the subject of the campaign may change over time. We show the existence of a solution to the formulated optimal control problem, which has nonlinear isoperimetric constraints, using novel techniques that are general and can be used in other similar optimal control problems. This work may be of interest to political, social awareness, or crowdfunding campaigners and product marketing managers, and with some modifications may be used for mitigating biological epidemics.
Abstract:
The aim of the present study was to draw inferences regarding the properties of single cells responsible for co-operative behaviour in the slug of the soil amoeba Dictyostelium discoideum. The slug is an integrated multicellular mass formed by the aggregation of starved cells. The amoebae comprising the slug differentiate according to their spatial locations relative to one another, implying that, as in the case of other regulative embryos, they must be in mutual communication. We have previously shown that one manifestation of this communication is the time taken for the anteriormost fragment of the slug, the tip, to regenerate from slugs which have been rendered tipless by amputation. We present results of tip-regeneration experiments performed on genetically mosaic slugs. By comparing the mosaics with their component pure genotypes, we were able to discriminate between a set of otherwise equally plausible modes of intercellular signalling. Neither a 'pacemaker' model, in which the overall rate of tip regeneration is determined by the cell with the highest frequency of autonomous oscillation, nor an 'independent-particle' model, in which the rate of regeneration is the arithmetical average of independent cell-dependent rates, is in quantitative accord with our findings. Our results are best explained by a form of signalling which operates by means of cell-to-cell relay. Therefore intercellular communication seems to be essential for tip regeneration.
Abstract:
A survey of amphibian mortality on roads was carried out in the Sharavathi river basin in the central Western Ghats. Road kills in three different land use areas (agricultural fields, water bodies and forests) were recorded for four days along three 100 m stretches in each type of area. One hundred and forty-four individuals belonging to two orders, eight families, 11 genera and 13 species were recorded in the survey. Kills/km observed were: in forest 55, agricultural fields 38 and water bodies 27, for an overall average of 40 kills/km. Kill species compositions varied significantly between land use areas, but overall kill rates did not.
Abstract:
The significance of treating rainfall as a chaotic system instead of a stochastic system for a better understanding of the underlying dynamics has been taken up by various studies recently. However, an important limitation of all these approaches is the dependence on a single method for identifying the chaotic nature and the parameters involved. Many of these approaches aim at only analyzing the chaotic nature and not its prediction. In the present study, an attempt is made to identify chaos using various techniques, and prediction is also done by generating ensembles in order to quantify the uncertainty involved. Daily rainfall data of three regions with contrasting characteristics (mainly in the spatial area covered), Malaprabha, Mahanadi and All-India, for the period 1955-2000 are used for the study. Auto-correlation and mutual information methods are used to determine the delay time for the phase space reconstruction. Optimum embedding dimension is determined using correlation dimension, false nearest neighbour algorithm and also nonlinear prediction methods. The low embedding dimensions obtained from these methods indicate the existence of low dimensional chaos in the three rainfall series. The correlation dimension method is applied to the phase-randomized series and to the first derivative of the data series to check whether the saturation of the dimension is due to the inherent linear correlation structure or due to low dimensional dynamics. Positive Lyapunov exponents obtained prove the exponential divergence of the trajectories and hence the unpredictability. Surrogate data test is also done to further confirm the nonlinear structure of the rainfall series. A range of plausible parameters is used for generating an ensemble of predictions of rainfall for each year separately for the period 1996-2000 using the data till the preceding year.
For analyzing the sensitivity to initial conditions, predictions are done from two different months in a year, viz., from the beginning of January and June. The reasonably good predictions obtained indicate the efficiency of the nonlinear prediction method for predicting the rainfall series. Also, the rank probability skill score and the rank histograms show that the ensembles generated are reliable with a good spread and skill. A comparison of results of the three regions indicates that although they are chaotic in nature, the spatial averaging over a large area can increase the dimension and improve the predictability, thus destroying the chaotic nature. (C) 2010 Elsevier Ltd. All rights reserved.
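The use of mutual information to choose a delay time for phase space reconstruction can be sketched as follows. This is a minimal illustration, not the study's procedure: it assumes a simple equal-width histogram estimator of the average mutual information between x(t) and x(t + lag), and it assumes the common convention of taking the first local minimum of that curve as the delay. The bin count, function names, and the synthetic sine series are all hypothetical choices.

```python
import math
from collections import Counter

def _bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin in [lo, hi]; a constant series maps to bin 0."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)

def average_mutual_information(series, lag, bins=16):
    """Histogram estimate (in nats) of the mutual information between x(t) and x(t + lag)."""
    lo, hi = min(series), max(series)
    idx = [_bin_index(v, lo, hi, bins) for v in series]
    pairs = list(zip(idx[:len(idx) - lag], idx[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over occupied cells.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def delay_from_ami(series, max_lag, bins=16):
    """Assumed criterion: first local minimum of the AMI curve over lags 1..max_lag."""
    ami = [average_mutual_information(series, lag, bins)
           for lag in range(1, max_lag + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1  # ami[k] corresponds to lag k + 1
    return None  # no interior minimum found in the scanned range

# Hypothetical test signal: a sine wave with a 40-sample period.
sine = [math.sin(2 * math.pi * t / 40) for t in range(400)]
delay = delay_from_ami(sine, max_lag=30)
```

Histogram estimates are sensitive to the bin count and series length, so on real rainfall data the chosen delay would normally be cross-checked, for example against the auto-correlation method that the abstract also mentions.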
Abstract:
The interaction of the protein atoms with the surrounding water oxygen atoms has been computed for 392 protein chains from 369 protein structures belonging to 90% non-homologous high resolution (<= 1.5 angstrom) protein structures with a crystallographic R-factor <= 20%. The percentage composition of the polar atoms is found to be 36.3%. An average of 82.55% of water oxygen atoms are found to be in the primary hydration shell and 15.12% in the secondary hydration shell. The average percentages of interactions of water oxygen atoms with the polar atoms of the main chain and side chain are 54% and 46%, respectively. The acidic residues, aspartate and glutamate, interact with the water oxygen atoms more than the other residues do.
Abstract:
The problem of estimating the time-dependent statistical characteristics of a random dynamical system is studied under two different settings. In the first, the system dynamics is governed by a differential equation parameterized by a random parameter, while in the second, this is governed by a differential equation with an underlying parameter sequence characterized by a continuous time Markov chain. We propose, for the first time in the literature, stochastic approximation algorithms for estimating various time-dependent process characteristics of the system. In particular, we provide efficient estimators for quantities such as the mean, variance and distribution of the process at any given time as well as the joint distribution and the autocorrelation coefficient at different times. A novel aspect of our approach is that we assume that information on the parameter model (i.e., its distribution in the first case and transition probabilities of the Markov chain in the second) is not available in either case. This is unlike most other work in the literature that assumes availability of such information. Also, most of the prior work in the literature is geared towards analyzing the steady-state system behavior of the random dynamical system while our focus is on analyzing the time-dependent statistical characteristics which are in general difficult to obtain. We prove the almost sure convergence of our stochastic approximation scheme in each case to the true value of the quantity being estimated. We provide a general class of strongly consistent estimators for the aforementioned statistical quantities with regular sample average estimators being a specific instance of these. We also present an application of the proposed scheme on a widely used model in population biology. Numerical experiments in this framework show that the time-dependent process characteristics as obtained using our algorithm in each case exhibit excellent agreement with exact results. 
(C) 2010 Elsevier Inc. All rights reserved.
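The remark that regular sample averages are a specific instance of the stochastic approximation estimators can be illustrated with a generic Robbins-Monro style recursion. This is only a sketch under stated assumptions, not the authors' scheme: it estimates a single mean from independent samples, whereas the abstract concerns time-dependent characteristics of a random dynamical system. The function name, step-size choices, and Gaussian test data are hypothetical.

```python
import random
import statistics

def sa_mean(samples, step=lambda n: 1.0 / n):
    """Recursive estimate: theta <- theta + a_n * (x_n - theta).

    With the step size a_n = 1/n this recursion reproduces the ordinary
    sample average; other decaying step sizes give other estimators.
    """
    estimate = 0.0
    for n, x in enumerate(samples, start=1):
        estimate += step(n) * (x - estimate)
    return estimate

# Hypothetical data: 1000 draws from a Gaussian with mean 2 and unit variance.
rng = random.Random(0)
samples = [rng.gauss(2.0, 1.0) for _ in range(1000)]

sample_average_form = sa_mean(samples)                        # a_n = 1/n
slower_decay_form = sa_mean(samples, step=lambda n: n ** -0.75)
```

The estimator only ever needs the current sample and the current estimate, so no model of the sampling distribution is required, which matches the setting the abstract describes in general terms.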
Abstract:
We have made careful counts of the exact number of spore, stalk and basal disc cells in small fruiting bodies of Dictyostelium discoideum (undifferentiated amoebae are found only rarely and on average their fraction is 4.96 × 10^-4). (i) Within aggregates of a given size, the relative apportioning of amoebae to the main cell types occurs with a remarkable degree of precision. In most cases the coefficient of variation (c.v.) in the mean fraction of cells that form spores is within 4.86%. The contribution of stalk and basal disc cells is highly variable when considered separately (c.v.'s up to 25% and 100%, respectively), but markedly less so when considered together. Calculations based on theoretical models indicate that purely cell-autonomous specification of cell fate cannot account for the observed accuracy of proportioning. Cell-autonomous determination to a prestalk or prespore condition followed by cell type interconversion, and stabilised by feedbacks, suffices to explain the measured accuracy. (ii) The fraction of amoebae that differentiates into spores increases monotonically with the total number of cells. This fraction rises from an average of 73.6% for total cell numbers below 30 and reaches 86.0% for cell numbers between 170 and 200 (it remains steady thereafter at around 86%). Correspondingly, the fraction of amoebae differentiating into stalk or basal disc decreases with total size. These trends are in accordance with evolutionary expectations and imply that a mechanism for sensing the overall size of the aggregate also plays an essential role in the determination of cell-type proportions.