Digital Library

31 results for mining right

in Indian Institute of Science - Bangalore - India

Polymorphism and conformational flexibility of DNA: right and left handed duplexes

Relevance:

20.00%

Publisher:

Abstract:

Left-handed duplexes are shown to be in agreement with the X-ray intensity data of the A-, B- and D-forms of DNA. The structures are stereochemically satisfactory because they were obtained by following a stereochemical guideline derived from theory and from single-crystal structure data of nucleic acid components. The same stereochemical guideline also led to right-handed duplexes for the B- and D-forms of DNA, which have stereochemically preferred conformations and hence are superior to those given by Arnott and coworkers.

Sequence-dependent molecular conformation of polynucleotides: right and left-handed helices

Relevance:

20.00%

Publisher:

Abstract:

Based upon a stereochemical guideline, two topologically distinct types of helical duplexes have been deduced for a polynucleotide duplex with an alternating purine-pyrimidine sequence (PAPP): (a) a right-handed uniform (RU) helix and (b) a left-handed zig-zag (LZ) helix. Both structures have a trinucleoside diphosphate as the basic unit, wherein the purine-pyrimidine fragment has a different conformation from the pyrimidine-purine fragment. Thus, RU and LZ helices represent two different classes of sequence-dependent molecular conformations for PAPP. The conformational features of an RU helix of PAPP in the B-form and of three LZ helices for the B-, D- and Z-forms are discussed.

Inferring neuronal network connectivity from spike data: A temporal data mining approach

Relevance:

20.00%

Publisher:

Abstract:

Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of frequent episodes discovered from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
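To make the idea of counting a serial episode under temporal constraints concrete, here is a minimal sketch, assuming the data is a time-sorted list of (event type, time) pairs and that each consecutive step of the episode must follow the previous one within an interval (low, high]. The function name `count_serial_episode` and the interval convention are illustrative assumptions, not the authors' published algorithm; it counts non-overlapped occurrences by keeping, for each step, the times at which partial occurrences reached that step.

```python
def count_serial_episode(
    events: list[tuple[str, float]],
    episode: tuple[str, ...],
    intervals: list[tuple[float, float]],
) -> int:
    """Count non-overlapped occurrences of a serial episode such as A -> B -> C.

    events    : (event_type, time) pairs sorted by time (hypothetical spike format).
    episode   : ordered event types of the serial episode.
    intervals : one (low, high) pair per consecutive step; the gap between
                step i and step i+1 must satisfy low < gap <= high.
    """
    n = len(episode)
    if n == 0 or len(intervals) != n - 1:
        raise ValueError("need a non-empty episode and one interval per step")
    # partial[i] holds the times at which some partial occurrence reached step i.
    partial: list[list[float]] = [[] for _ in range(n)]
    count = 0
    for etype, t in events:
        # Walk steps from last to first so one event never advances the same
        # partial occurrence by two steps (matters when event types repeat).
        for i in range(n - 1, -1, -1):
            if episode[i] != etype:
                continue
            if i > 0:
                low, high = intervals[i - 1]
                # Step i is reachable only if some step i-1 time fits the interval.
                if not any(low < t - tp <= high for tp in partial[i - 1]):
                    continue
            if i == n - 1:
                # Complete occurrence: count it and discard all partial state
                # so that counted occurrences never overlap in time.
                count += 1
                partial = [[] for _ in range(n)]
                break
            partial[i].append(t)
    return count


spikes = [("A", 1), ("B", 3), ("C", 4), ("A", 10), ("B", 20),
          ("C", 21), ("A", 30), ("B", 31), ("C", 33)]
print(count_serial_episode(spikes, ("A", "B", "C"), [(0, 5), (0, 5)]))  # 2
```

Here the middle triple is rejected because B follows A by 10 time units, outside the assumed (0, 5] window. Tightening the windows (for example to (0, 1]) would model a stricter synaptic delay and lower the count.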

Microbial aspects of acid generation and bioremediation with relevance to Indian mining

Relevance:

20.00%

Publisher:

Abstract:

The role of the Acidithiobacillus group of bacteria in acid generation and heavy metal dissolution was studied with relevance to some Indian mines. Microorganisms implicated in acid generation, such as Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans, were isolated from abandoned mines, waste rocks and tailings dumps. Arsenite-oxidizing bacteria of the Thiomonas and Bacillus groups were isolated, and their ability to oxidize As(III) to As(V) was established. Sulfate-reducing bacteria isolated from the mines were used to remove dissolved copper, zinc, iron and arsenic from solutions.

A self-adaptive migration model genetic algorithm for data mining applications

Relevance:

20.00%

Publisher:

Abstract:

Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic algorithms (GAs) are efficient and robust search and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), in which the population size, the number of crossover points and the mutation rate of each population are set adaptively. Further, the migration of individuals between populations is decided dynamically. The paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search of the other regions of the search space. The performance of the algorithm is then demonstrated on standard testbed functions and on a set of real classification data mining problems. A Michigan-style classifier was used to build the classifier, and the system was tested on machine learning databases including the Pima Indian Diabetes and Wisconsin Breast Cancer databases, among others. Our algorithm performs better than the other methods compared.

Hybrid learning scheme for data mining applications

Relevance:

20.00%

Publisher:

Abstract:

Classification of large datasets is a challenging task in data mining. In the current work, we propose a novel method that compresses the data and classifies the test data directly in its compressed form. The work forms a hybrid learning approach integrating data abstraction, frequent item generation, compression, classification and the use of rough sets.

Data mining approaches to software fault diagnosis

Relevance:

20.00%

Publisher:

Abstract:

Automatic identification of software faults has enormous practical significance. It requires characterizing program execution behavior and applying appropriate data mining techniques to the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatically identifying fault causes. Spectrum kernels and SVMs are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated on the intrusion dataset containing system call traces. The results show that the kernel techniques are as accurate as the best available results but faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
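As an illustration of the spectrum kernel mentioned above, the following sketch treats each system-call trace as a sequence of call names and computes the standard k-spectrum kernel, the inner product of the two traces' counts of contiguous length-k subsequences. The trace contents and helper names (`spectrum_features`, `gram_matrix`) are hypothetical; the paper's exact choice of k and preprocessing is not given here.

```python
from collections import Counter
from collections.abc import Sequence


def spectrum_features(calls: Sequence[str], k: int) -> Counter:
    """Map a system-call trace to counts of its contiguous length-k subsequences."""
    if k < 1:
        raise ValueError("k must be positive")
    return Counter(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))


def spectrum_kernel(x: Sequence[str], y: Sequence[str], k: int) -> int:
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Only k-mers that occur in both traces contribute to the inner product.
    return sum(count * fy[kmer] for kmer, count in fx.items() if kmer in fy)


def gram_matrix(traces: list[Sequence[str]], k: int) -> list[list[int]]:
    """Pairwise kernel values for a list of traces."""
    return [[spectrum_kernel(a, b, k) for b in traces] for a in traces]


trace_a = ["open", "read", "read", "close"]
trace_b = ["open", "read", "close"]
print(spectrum_kernel(trace_a, trace_b, 2))  # 2
print(spectrum_kernel(trace_a, trace_b, 1))  # 4
```

A Gram matrix like this can be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`; whether the authors used that library is not stated.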

A hitchhiker's guide to a crowded syconium: how do fig nematodes find the right ride?

Relevance:

20.00%

Publisher:

Abstract:

1. Organisms with low mobility, living within ephemeral environments, need to find vehicles that can disperse them reliably to new environments. The requirement for specificity in this passenger-vehicle relationship is enhanced within a tritrophic interaction when the environment of passenger and vehicle is provided by a third organism. Such relationships pose many interesting questions about specificity within a tritrophic framework.
2. Central to understanding how these tritrophic systems have evolved is knowing how they function now. Determining the proximal cues and sensory modalities used by passengers to find vehicles and to discriminate between reliable and unreliable vehicles is, therefore, essential to this investigation.
3. The ancient, co-evolved and highly species-specific nursery pollination mutualism between figs and fig wasps is host to species-specific plant-parasitic nematodes, which use fig wasps to travel between figs. Since individual globular fig inflorescences, i.e. syconia, serve as incubators for hundreds of developing pollinating and parasitic wasps, a dispersal-stage nematode within such a chemically complex and physically crowded environment faces the dilemma of choosing the right vehicle for dispersal into a new fig. Such a system therefore affords excellent opportunities to investigate mechanisms that contribute to the evolution of specificity between passenger and vehicle.
4. In this study of fig-wasp-nematode tritrophic interactions in Ficus racemosa, within which seven wasp species can breed, we demonstrate using two-choice as well as cafeteria assays that plant-parasitic nematodes (Schistonchus racemosa) do not hitch rides randomly on available eclosing wasps within the fig syconium, but are specifically attracted, at close range (3 mm), only to the vehicle that can reliably transfer them to another fig within a few hours. This vehicle is the female pollinating wasp. Male wasps and female parasitic wasps are inappropriate vehicles, since the former are wingless and die within the fig, while the latter never enter another fig. Nematodes distinguished between female pollinating wasps and female parasitic wasps using volatiles and cuticular hydrocarbons. Nematodes could not distinguish between the cuticular hydrocarbons of male and female pollinators, but used other cues, such as volatiles, at close range to find female pollinating wasps, with which they have probably had a long history of chemical adaptation.
5. This study opens up new questions and hypotheses about the evolution and maintenance of specificity in fig-wasp-nematode tritrophic interactions.

Conformational analysis of the right-hand twisted antiparallel β-Structure

Relevance:

20.00%

Publisher:

Abstract:

The conformational analysis of a pair of two-linked peptide units in the antiparallel arrangement is reported here, with a view to studying the effect of the association of one chain with the other. The pair of two-linked peptide units was fixed in space through the hydrogen bonds between them, in accordance with certain hydrogen-bond criteria. Model building was undertaken to ascertain whether the proximity of the side chains could be used to eliminate any one of the right-hand twisted, left-hand twisted or regular β-structures. Stereochemically, all three were found to be possible. Classical energy calculations, however, indicated a preference for the right-hand twisted β-structure. The relevance of these results is discussed in the context of the preferential right-hand twist of the β-pleated sheets present in globular proteins. The agreement between the minimum-energy conformations obtained for the pair of two-linked peptide units and the globular protein data is also indicated.

On Finding the Natural Number of Topics with Latent Dirichlet Allocation: Some Observations

Relevance:

20.00%

Publisher:

Abstract:

It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features presented to classifiers such as SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus $C$ is split into two matrix factors $M_1$ and $M_2$ as given by $C_{d \times w} = M_{1\,(d \times t)} \times M_{2\,(t \times w)}$, where $d$ is the number of documents in the corpus and $w$ is the size of the vocabulary. The quality of the split depends on $t$, the number of topics chosen. The measure is computed in terms of the symmetric KL divergence of salient distributions derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this shows up as a dip at the right value of $t$.
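The measure rests on the symmetric KL divergence, $D(p\,\|\,q) + D(q\,\|\,p)$. The sketch below shows only that computation and the "pick the dip" rule, assuming the two salient distributions have already been extracted from $M_1$ and $M_2$ as equal-length non-negative vectors; how the paper derives those vectors is not described in this abstract, so the inputs below are placeholders, and the smoothing constant `eps` is an added assumption to keep logarithms finite.

```python
import math
from collections.abc import Sequence


def to_distribution(weights: Sequence[float], eps: float = 1e-12) -> list[float]:
    """Normalise non-negative weights to sum to 1, smoothing zeros so logs stay finite."""
    smoothed = [w + eps for w in weights]
    total = sum(smoothed)
    return [w / total for w in smoothed]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two strictly positive distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(a: Sequence[float], b: Sequence[float]) -> float:
    """KL(p || q) + KL(q || p) after normalising both weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    p, q = to_distribution(a), to_distribution(b)
    return kl_divergence(p, q) + kl_divergence(q, p)


def pick_num_topics(divergence_by_t: dict[int, float]) -> int:
    """Choose the candidate t at which the divergence dips lowest."""
    return min(divergence_by_t, key=divergence_by_t.__getitem__)


print(symmetric_kl([1, 3], [3, 1]))                    # about 1.0986, i.e. ln 3
print(pick_num_topics({10: 0.8, 20: 0.3, 30: 0.6}))    # 20
```

In practice one would fit LDA for each candidate $t$, compute the divergence between the two derived distributions, and pass the resulting scores to `pick_num_topics`; the scores in the usage line are made up for illustration.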

Mining Land Cover Information Using Multilayer Perceptron and Decision Tree from MODIS Data

Relevance:

20.00%

Publisher:

Abstract:

Land cover (LC) changes play a major role in global as well as regional-scale patterns of the climate and biogeochemistry of the Earth system. LC information provides critical insights into Earth surface phenomena and is particularly useful when obtained synoptically from remote sensing data. However, for developing countries and those with a large geographical extent, regular LC mapping with data from commercial sensors is prohibitive because of their high cost and limited spatial coverage (low temporal resolution and narrow swath). In this context, freely available MODIS data, with good spectral and temporal resolution, serve the purpose. LC mapping from these data has evolved continuously with advances in classification algorithms. This paper presents a comparative study of two robust data mining techniques, the multilayer perceptron (MLP) and the decision tree (DT), applied to different MODIS data products for Kolar district, Karnataka, India. Comparing the classified MODIS images at three spatial scales (district, taluk and pixel level) shows that MLP-based classification of the minimum noise fraction components of the 36 MODIS bands gives the most accurate LC map (86% accuracy), while DT applied to the principal components of the 36 MODIS bands gives a less accurate classification (69%).

An incremental data mining algorithm for compact realization of prototypes

Relevance:

20.00%

Publisher:

Tree structure for efficient data mining using rough sets

Relevance:

20.00%

Publisher:

Abstract:

In data mining, an important goal is to generate an abstraction of the data. Such an abstraction helps reduce the space and search-time requirements of the overall decision-making process. Further, it is important that the abstraction be generated from the data with a small number of disk scans. We propose a novel data structure, the pattern count tree (PC-tree), that can be built by scanning the database only once. The PC-tree is a minimal-size, complete representation of the data, and it can be used to represent dynamic databases with the help of knowledge that is either static or changing. We show that further compactness can be achieved by constructing the PC-tree on segmented patterns. We exploit the flexibility offered by rough sets to realize a rough PC-tree and use it for efficient and effective rough classification. To be consistent with the sizes of the branches of the PC-tree, we use upper and lower approximations of feature sets in a manner different from conventional rough set theory. We conducted experiments with the proposed classification scheme on a large-scale handwritten digit data set, and we use the experimental results to establish the efficacy of the proposed approach. (C) 2002 Elsevier Science B.V. All rights reserved.
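The single-scan, count-carrying tree described above can be sketched as a prefix tree (trie) in which each pattern is inserted in a fixed item order and every node records how many patterns pass through it. This is a simplified illustration under that assumption; the class and method names are hypothetical, and the paper's segmentation of patterns and its rough upper and lower approximations are not reproduced.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0
    children: dict = field(default_factory=dict)


class PatternCountTree:
    """Prefix tree storing every pattern of a database with shared-prefix counts."""

    def __init__(self) -> None:
        self.root = Node()  # root.count tracks the total number of patterns

    def insert(self, pattern: Iterable) -> None:
        node = self.root
        node.count += 1
        # A fixed (sorted) item order lets patterns with common items share branches.
        for item in sorted(set(pattern)):
            node = node.children.setdefault(item, Node())
            node.count += 1

    @classmethod
    def build(cls, database: Iterable[Iterable]) -> PatternCountTree:
        tree = cls()
        for pattern in database:  # a single pass over the database
            tree.insert(pattern)
        return tree

    def prefix_count(self, items: Iterable) -> int:
        """Number of stored patterns whose sorted form begins with the sorted items."""
        node = self.root
        for item in sorted(set(items)):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count

    def num_nodes(self) -> int:
        """Number of non-root nodes, a rough measure of the tree's size."""
        total, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            total += len(node.children)
            stack.extend(node.children.values())
        return total


db = [{1, 2, 3}, {1, 2, 4}, {1, 5}, {2, 3}]
tree = PatternCountTree.build(db)
print(tree.num_nodes())           # 7 nodes for 10 stored items
print(tree.prefix_count([1, 2]))  # 2
```

Because shared prefixes are stored once with a count, the toy database of four patterns (ten items in total) occupies seven nodes, while every original pattern can still be recovered from the tree.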

Right-Definite Multiparameter Sturm–Liouville Problems With Eigenparameter-Dependent Boundary Conditions

Relevance:

20.00%

Publisher:

Abstract:

We study a system of ordinary differential equations linked by parameters and subject to boundary conditions that depend on the parameters. We assume certain definiteness conditions on the coefficient functions and on the boundary conditions that yield, in the corresponding abstract setting, a right-definite case. We give results on the location of the eigenvalues and the oscillation of the eigenfunctions.
