62 results for Data Streams Distribution
at Indian Institute of Science - Bangalore - India
Abstract:
With the emergence of large-volume, high-speed streaming data, recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high-speed data streams, the rate of change of information across different sliding windows is negligible. So the user will not miss any change in information if we slide the window by multiple transactions at a time. Therefore, we propose a novel approach for mining CFIs cumulatively by allowing a sliding width (≥1) over high-speed data streams. However, it is nontrivial to mine CFIs cumulatively over a stream, because such growth may lead to the generation of an exponential number of candidates for closure checking. In this study, we develop an efficient algorithm, stream-close, for mining CFIs over a stream by exploring some interesting properties. Our performance study reveals that stream-close achieves good scalability and has promising results.
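To make the idea of sliding a window by several transactions at a time concrete, here is a minimal sketch. It is a brute-force illustration, not the stream-close algorithm, whose details the abstract does not give. The function names, the window size, and the step width are all assumptions chosen for the example.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force CFIs: frequent itemsets with no proper superset of equal support.

    Exponential in the number of distinct items, so only suitable for tiny examples.
    """
    items = sorted({item for t in transactions for item in t})
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                support[frozenset(candidate)] = count
    # Closure check: drop any itemset that has a superset with the same support.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in support.items())
    }


def sliding_windows(stream, window_size, step):
    """Yield windows over the stream, advancing by `step` transactions each time."""
    for start in range(0, len(stream) - window_size + 1, step):
        yield stream[start:start + window_size]


stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"c"}, {"a", "c"}, {"a", "c"}]
for window in sliding_windows(stream, window_size=4, step=2):
    print(closed_frequent_itemsets(window, min_support=2))
```

With a step of 2, each window shares half of its transactions with the previous one, which is the situation where recomputing from scratch (as this sketch does) wastes the most work and a cumulative method would pay off.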
Abstract:
Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns from vast amounts of symbolic time series data. In this paper we show that the frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each of which is characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these temporal constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
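As an illustration of counting a serial episode under a temporal constraint, the sketch below greedily counts non-overlapped occurrences in which each event follows the previous one within a fixed delay range. The inter-event gap constraint, the greedy counting rule, and all names are assumptions made for this example; the abstract does not specify the paper's algorithms.

```python
def count_serial_episode(events, episode, min_gap, max_gap):
    """Count non-overlapped occurrences of a serial episode.

    events  : list of (event_type, time) pairs, sorted by time
    episode : sequence of event types that must occur in this order
    An occurrence is valid if every consecutive pair of its events is
    separated by a delay in [min_gap, max_gap].
    """
    last = len(episode) - 1
    # ends[j] holds the times at which a valid prefix episode[0..j] has ended.
    ends = [[] for _ in episode]
    count = 0
    for etype, t in events:
        # Walk positions from last to first so one event is not used twice.
        for j in range(last, -1, -1):
            if episode[j] != etype:
                continue
            extends = j == 0 or any(
                min_gap <= t - t_prev <= max_gap for t_prev in ends[j - 1]
            )
            if not extends:
                continue
            if j == last:
                count += 1
                ends = [[] for _ in episode]  # restart: occurrences must not overlap
                break
            ends[j].append(t)
    return count


spikes = [("A", 0), ("B", 3), ("C", 5),
          ("A", 10), ("B", 20), ("C", 22),
          ("A", 30), ("B", 32), ("C", 35)]
print(count_serial_episode(spikes, ["A", "B", "C"], min_gap=1, max_gap=5))
```

In the example, the middle A, B, C triple is rejected because B arrives 10 time units after A, outside the allowed delay. The sketch never prunes stale prefix times, so a practical version would discard entries older than `max_gap`.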
Abstract:
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
Abstract:
In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general-purpose processor-based solutions, by fully pipelining the data-path. (C) 2014 Elsevier B.V. All rights reserved.
Abstract:
Since streaming data keeps coming continuously as an ordered sequence, massive amounts of data are created. A big challenge in handling data streams is the limitation of time and space. Prototype selection on streaming data requires the prototypes to be updated in an incremental manner as new data comes in. We propose an incremental algorithm for prototype selection. This algorithm can also be used to handle very large datasets. Results have been presented on a number of large datasets and our method is compared to an existing algorithm for streaming data. Our algorithm saves time and the prototypes selected give good classification accuracy.
Abstract:
In this paper, we consider the inference for the component and system lifetime distribution of a k-unit parallel system with independent components based on system data. The components are assumed to have identical Weibull distributions. We obtain the maximum likelihood estimates of the unknown parameters based on system data. The Fisher information matrix has been derived. We propose a β-expectation tolerance interval and a β-content γ-level tolerance interval for the life distribution of the system. Performance of the estimators and tolerance intervals is investigated via a simulation study. A simulated dataset is analyzed for illustration.
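The link between component and system lifetimes can be sketched as follows. A parallel system fails only when its last component fails, so with k independent, identically distributed components the system lifetime CDF is the component CDF raised to the power k. The shape/scale parametrization and the function names below are assumptions for illustration; the abstract does not state which Weibull parametrization the paper uses.

```python
import math
import random


def weibull_cdf(t, shape, scale):
    """Component lifetime CDF: F(t) = 1 - exp(-(t / scale) ** shape) for t >= 0."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def system_cdf(t, k, shape, scale):
    """CDF of a k-unit parallel system: all k components must have failed by t."""
    return weibull_cdf(t, shape, scale) ** k


def sample_system_lifetime(k, shape, scale, rng):
    """Draw one system lifetime as the maximum of k component lifetimes."""
    # random.weibullvariate takes the scale first, then the shape.
    return max(rng.weibullvariate(scale, shape) for _ in range(k))


rng = random.Random(0)
print(system_cdf(math.log(2), k=3, shape=1.0, scale=1.0))
print(sample_system_lifetime(k=3, shape=1.5, scale=2.0, rng=rng))
```

With shape 1 and scale 1 the component CDF at t = ln 2 is 0.5, so a 3-unit system has failed by then with probability 0.5 cubed. Only the system lifetime is observed in the setting described above, which is why estimation has to work through this power relationship.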
Abstract:
Using remotely sensed Tropical Rainfall Measuring Mission (TRMM) 3B42 rainfall and topographic data from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Digital Elevation Model (DEM), the impact of orographic aspects such as topography, spatial variability of elevation and altitude of apexes is examined to investigate the copious summer monsoon rainfall over the Western Ghats (WG) of India. TRMM 3B42 v7 rainfall data are validated against India Meteorological Department (IMD) gridded rainfall data at 0.5 degrees resolution over the WG. The analysis of the spatial pattern of monsoon rainfall together with the orography of the WG establishes that the degree of orographic precipitation depends mainly on the topography of the mountain barrier, followed by the steepness of the windward side slope and the altitude of the mountain. Longer and broader (i.e. cascaded) topography, elevated summits and gradually increasing slopes enhance precipitation. Comparing the topography of various states of the WG, it has been observed that the windward side of Karnataka receives intense rainfall in the WG during the summer monsoon. It has been observed that the rainfall is enhanced before the peak of the mountain and is confined to heights up to about 800 m over the WG. In addition, the spatial distribution of heavy and very heavy rainfall events in the last 14 years has also been explored. Heavy and very heavy rain events on this hilly terrain are categorized with a threshold of precipitation (R) in the range 120 < R < 150 mm day⁻¹ and exceeding 150 mm day⁻¹, respectively, using the probability distribution of TRMM 3B42 v7 rainfall. The areas which are prone to heavy precipitation are identified. The study would help policy makers to manage the hazard scenario and to improve weather predictions over the mountainous terrain of the WG.
Abstract:
Aerosol black carbon (BC) mass concentrations ([BC]), measured continuously during a multi-platform field experiment, the Integrated Campaign for Aerosols, gases and Radiation Budget (ICARB, March-May 2006), from a network of eight observatories spread over geographically distinct environments of India (five mainland stations, one highland station, and two island stations, one each in the Arabian Sea and the Bay of Bengal), are examined for their spatio-temporal characteristics. During the period of study, [BC] showed large variations across the country, with values ranging from 27 μg m⁻³ over industrial/urban locations to as low as 0.065 μg m⁻³ over the Arabian Sea. For all mainland stations, [BC] remained high compared to the highland and island stations. Among the island stations, Port Blair (PBR) had a higher concentration of BC than Minicoy (MCY), implying a more absorbing nature of Bay of Bengal aerosols than of those over the Arabian Sea. The highland station Nainital (NTL), in the central Himalayas, showed low values of [BC], comparable to or even lower than those at the island station PBR, indicating the prevalence of a cleaner environment there. An examination of the changes in the mean temporal features, as the season advances from winter (December-February) to pre-monsoon (March-May), revealed that: (a) diurnal variations were pronounced over all the mainland stations, with an afternoon low and a nighttime high; (b) at the islands, the diurnal variations, though resembling those over the mainland, were less pronounced; and (c) in contrast, the highland station showed an opposite pattern, with an afternoon high and a late-night or early-morning low.
The diurnal variations at all stations are mainly caused by the dynamics of the local Atmospheric Boundary Layer (ABL). At all mainland and island stations (except HYD and DEL), [BC] showed a decreasing trend from January to May. This is attributed to the increased convective mixing and the resulting enhanced vertical dispersal of species in the ABL. In addition, large short-period modulations were observed at DEL and HYD, which appeared to be episodic. An examination of these in the light of MODIS-derived fire count data over India, along with back-trajectory analysis, revealed that advection of BC from extensive forest fires and biomass-burning regions upwind was largely responsible for this episodic enhancement in BC at HYD and DEL.
Abstract:
We have carried out an analysis of crystal structure data on prolyl and hydroxyprolyl moieties in small molecules. The flexibility of the pyrrolidine ring due to the pyramidal character of nitrogen has been defined in terms of two projection angles δ1 and δ2. The distribution of these parameters in the crystal structures is found to be consistent with results of the energy calculations carried out on prolyl moieties in our laboratory.
Abstract:
A recent theoretical model developed by Imparato et al. of the experimentally measured heat and work effects produced by the thermal fluctuations of single micron-sized polystyrene beads in stationary and moving optical traps has proved to be quite successful in rationalizing the observed experimental data. The model, based on the overdamped Brownian dynamics of a particle in a harmonic potential that moves at a constant speed under a time-dependent force, is used to obtain an approximate expression for the distribution of the heat dissipated by the particle at long times. In this paper, we generalize the above model to consider particle dynamics in the presence of colored noise, without passing to the overdamped limit, as a way of modeling experimental situations in which the fluctuations of the medium exhibit long-lived temporal correlations, of the kind characteristic of polymeric solutions, for instance, or of similar viscoelastic fluids. Although we have not been able to find an expression for the heat distribution itself, we do obtain exact expressions for its mean and variance, both for the static and for the moving trap cases. These moments are valid for arbitrary times and they also hold in the inertial regime, but they reduce exactly to the results of Imparato et al. in appropriate limits. DOI: 10.1103/PhysRevE.80.011118
Abstract:
The information on altitude distribution of aerosols in the atmosphere is essential in assessing the impact of aerosol warming on thermal structure and stability of the atmosphere. In addition, aerosol altitude distribution is needed to address complex problems such as the radiative interaction of aerosols in the presence of clouds. With this objective, an extensive, multi-institutional and multi-platform field experiment (ICARB-Integrated Campaign for Aerosols, gases and Radiation Budget) was carried out under the Geosphere Biosphere Programme of the Indian Space Research Organization (ISRO-GBP) over continental India and adjoining oceans during March to May 2006. Here, we present airborne LIDAR measurements carried out over the east coast of India during the ICARB field campaign. An increase in aerosol extinction (scattering + absorption) was observed from the surface upwards with a maximum around 2 to 4 km. Aerosol extinction at higher atmospheric layers (>2 km) was two to three times larger compared to that of the surface. A large fraction (75-85%) of aerosol column optical depth was contributed by aerosols located above 1 km. The aerosol layer heights (defined in this paper as the height at which the gradient in extinction coefficient changes sign) showed a gradual decrease with an increase in the offshore distance. A large fraction (60-75%) of aerosol was found located above clouds, indicating enhanced aerosol absorption above clouds. Our study implies that a detailed statistical evaluation of the temporal frequency and spatial extent of elevated aerosol layers is necessary to assess their significance to the climate. This is feasible using data from space-borne lidars such as CALIPSO, which fly in formation with other satellites like MODIS AQUA and MISR, as part of the A-Train constellation.
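The layer-height definition quoted above (the height at which the gradient in extinction coefficient changes sign) can be sketched with finite differences. This is only an illustration on made-up numbers: the function name, the sample profile, and the choice to ignore flat segments are assumptions, not the processing used in the study.

```python
def layer_heights(altitudes_km, extinction):
    """Return altitudes where the vertical gradient of extinction changes sign.

    altitudes_km : strictly increasing altitudes
    extinction   : extinction coefficient at each altitude (same length)
    Gradients are forward differences between adjacent levels; a sign change
    between two neighbouring gradients is assigned to the level they share.
    """
    gradients = [
        (extinction[i + 1] - extinction[i]) / (altitudes_km[i + 1] - altitudes_km[i])
        for i in range(len(extinction) - 1)
    ]
    heights = []
    for i in range(1, len(gradients)):
        # Strict inequality: segments with zero gradient are not treated as a change.
        if gradients[i - 1] * gradients[i] < 0:
            heights.append(altitudes_km[i])
    return heights


# Hypothetical profile: extinction rises from the surface and peaks at 3 km.
altitudes = [0, 1, 2, 3, 4, 5]
profile = [0.10, 0.15, 0.25, 0.30, 0.20, 0.10]
print(layer_heights(altitudes, profile))
```

Real lidar profiles are noisy, so some smoothing before differencing would normally be needed to avoid spurious sign changes.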
Abstract:
The photopolymerization of methyl, ethyl, butyl, and hexyl methacrylates in solution was studied. The effect of initial initiator and monomer concentrations on the time evolution of polymer concentration, M̄n, and PDI was examined. The reversible chain addition and β-scission, and primary radical termination steps were included in the mechanism along with the classical steps. The rate equations were derived using continuous distribution kinetics and solved numerically to fit the experimental data. The regressed rate coefficients compared well with the literature data. The model predicted the instantaneous increase in M̄n and PDI to steady state values. The rate coefficients exhibited a linear increase with the size of the alkyl chain of the alkyl methacrylates.
Abstract:
Volumetric method based adsorption measurements of nitrogen on two specimens of activated carbon (Fluka and Sarabhai) reported by us are refitted to two popular isotherms, namely, Dubinin−Astakhov (D−A) and Toth, in light of improved fitting methods derived recently. Those isotherms have been used to derive other data of relevance in the design of engineering equipment, such as the concentration dependence of heat of adsorption and Henry’s law coefficients. The present fits provide a better representation of experimental measurements than before because the temperature dependence of adsorbed phase volume and structural heterogeneity of micropore distribution have been accounted for in the D−A equation. A new correlation to the Toth equation is a further contribution. The heat of adsorption in the limiting uptake condition is correlated with the Henry’s law coefficients at the near zero uptake condition.
Abstract:
The problem of scheduling divisible loads in distributed computing systems in the presence of processor release times is considered. The objective is to find the optimal sequence of load distribution and the optimal load fractions assigned to each processor in the system such that the processing time of the entire processing load is a minimum. This is a difficult combinatorial optimization problem, and hence a genetic algorithm approach is presented for its solution.
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVM are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated for the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
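For readers unfamiliar with spectrum kernels, the sketch below shows the standard k-spectrum kernel applied to system call sequences: each trace is represented by the counts of its length-k contiguous subsequences, and the kernel value is the inner product of those count vectors. The function names, the toy traces, and the choice k = 2 are assumptions for illustration; the abstract does not give the paper's settings.

```python
from collections import Counter


def kmer_counts(calls, k):
    """Count every contiguous length-k subsequence of a system call trace."""
    return Counter(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y."""
    counts_x = kmer_counts(x, k)
    counts_y = kmer_counts(y, k)
    # Counter returns 0 for k-mers absent from y, so only shared k-mers contribute.
    return sum(count * counts_y[kmer] for kmer, count in counts_x.items())


trace_a = ["open", "read", "read", "close"]
trace_b = ["open", "read", "close"]
print(spectrum_kernel(trace_a, trace_b, k=2))
print(spectrum_kernel(trace_a, trace_a, k=2))
```

A kernel matrix built this way over all traces can be passed to any SVM implementation that accepts precomputed kernels, which is one common way to pair the two techniques.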