1000 results for Ocean mining
Abstract:
Localization of underwater acoustic sources is a problem of great interest in ocean acoustics. Several algorithms exist for source localization based on array signal processing, and it is of interest to know the theoretical performance limits of these estimators. In this paper we develop expressions for the Cramér-Rao bound (CRB) on the variance of direction-of-arrival (DOA) and range-depth estimators of underwater acoustic sources in a shallow, range-independent ocean for the case of generalized Gaussian noise. We then use simulations to study the performance of some popular source localization techniques for DOA/range-depth estimation of underwater acoustic sources in a shallow ocean, by comparing the variance of the estimators with the corresponding CRBs.
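As a much simpler illustration of the same idea (not the paper's DOA/range-depth bound), the sketch below computes the CRB for a single scalar location parameter observed in N i.i.d. samples of generalized Gaussian noise with scale alpha and shape beta. It uses the standard per-sample Fisher information beta^2 Gamma(2 - 1/beta) / (alpha^2 Gamma(1/beta)), which is finite for beta > 1/2; the function names are hypothetical.

```python
import math


def gg_fisher_information(alpha: float, beta: float) -> float:
    """Per-sample Fisher information for the location parameter of the density
    f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta)."""
    if beta <= 0.5:
        raise ValueError("the Fisher information is finite only for beta > 1/2")
    return beta ** 2 * math.gamma(2.0 - 1.0 / beta) / (alpha ** 2 * math.gamma(1.0 / beta))


def crb_location(n_samples: int, alpha: float, beta: float) -> float:
    """Lower bound on the variance of any unbiased estimator of the location
    from n_samples i.i.d. observations: 1 / (N * I)."""
    return 1.0 / (n_samples * gg_fisher_information(alpha, beta))


# beta = 2 with alpha = sigma * sqrt(2) is Gaussian noise, so the CRB is sigma^2 / N.
print(crb_location(100, math.sqrt(2.0), 2.0))  # approximately 0.01
# beta = 1 is Laplacian noise with scale alpha, so the CRB is alpha^2 / N.
print(crb_location(100, 1.0, 1.0))  # 0.01
```

The two checks recover the textbook Gaussian and Laplacian results, which is a quick way to validate the closed form before trusting it for other shapes.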
Abstract:
Mining association rules from a large collection of databases involves two main tasks: generating large itemsets and finding associations between the discovered large itemsets. Existing formalisms for association rules are based on a single transaction database, which is not sufficient to describe association rules in a multiple-database environment. In this paper, we give a general characterization of association rules and a framework for knowledge-based mining of multiple databases for association rules.
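The sketch below illustrates why a single-database view is not enough: it evaluates the support and confidence of a rule X -> Y separately in each database and reports the databases in which the rule holds locally. This per-database voting is an assumption made for illustration, not the characterization proposed in the paper; all names and data are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Database = Sequence[FrozenSet[str]]


def support(itemset: FrozenSet[str], db: Database) -> float:
    """Fraction of transactions in one database that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(x: FrozenSet[str], y: FrozenSet[str], db: Database) -> float:
    """confidence(X -> Y) = support(X u Y) / support(X), evaluated in one database."""
    sx = support(x, db)
    return support(x | y, db) / sx if sx > 0 else 0.0


def databases_supporting(x: FrozenSet[str], y: FrozenSet[str],
                         databases: Sequence[Database],
                         min_sup: float, min_conf: float) -> List[int]:
    """Indices of the databases in which X -> Y meets both thresholds locally."""
    return [i for i, db in enumerate(databases)
            if support(x | y, db) >= min_sup and confidence(x, y, db) >= min_conf]


# Hypothetical example: three branch databases of the same retailer.
branches = [
    [frozenset({"bread", "milk"}), frozenset({"bread", "milk", "eggs"}),
     frozenset({"bread"}), frozenset({"milk"})],
    [frozenset({"bread", "milk"}), frozenset({"bread", "milk"}), frozenset({"eggs"})],
    [frozenset({"bread"}), frozenset({"bread", "eggs"}), frozenset({"milk"})],
]
x, y = frozenset({"bread"}), frozenset({"milk"})
holding = databases_supporting(x, y, branches, min_sup=0.4, min_conf=0.6)
print(holding, len(holding) / len(branches))  # [0, 1] 0.6666666666666666
```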
Abstract:
Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships, which in turn lead to a better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.
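One common frequency measure for pattern discovery in event sequences is the number of non-overlapped occurrences of a serial episode such as A -> B -> C. A minimal sketch, assuming events are (type, time) pairs, tracks the episode with a single automaton that restarts after each complete occurrence; the function name is hypothetical.

```python
from typing import Sequence, Tuple

Event = Tuple[str, float]  # (event type, time stamp)


def count_non_overlapped(sequence: Sequence[Event], episode: Sequence[str]) -> int:
    """Count non-overlapped occurrences of a serial episode.

    The automaton waits for episode[state]; when the last event of the episode
    is seen, one occurrence is recorded and the automaton restarts, so no two
    counted occurrences share or interleave events.
    """
    if not episode:
        raise ValueError("episode must contain at least one event type")
    count, state = 0, 0
    for event_type, _time in sorted(sequence, key=lambda e: e[1]):
        if event_type == episode[state]:
            state += 1
            if state == len(episode):
                count += 1
                state = 0
    return count


stream = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("C", 6), ("A", 7), ("B", 8)]
print(count_non_overlapped(stream, ["A", "B", "C"]))  # 1
print(count_non_overlapped(stream, ["A", "B"]))       # 3
```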
Abstract:
A method, system, and computer program product for fault data correlation in a diagnostic system are provided. The method includes receiving the fault data including a plurality of faults collected over a period of time, and identifying a plurality of episodes within the fault data, where each episode includes a sequence of the faults. The method further includes calculating a frequency of the episodes within the fault data, calculating a correlation confidence of the faults relative to the episodes as a function of the frequency of the episodes, and outputting a report of the faults with the correlation confidence.
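The steps in this abstract (identify episodes, count their frequency, derive a correlation confidence) can be sketched under strong assumptions: here an episode is a hypothetical ordered pair of distinct faults a then b occurring within a fixed time window, and the confidence of a -> b is the episode frequency divided by the number of occurrences of a. This is an illustrative definition, not the claimed method.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Fault = Tuple[str, float]  # (fault code, time stamp)


def pair_episode_counts(faults: Sequence[Fault], window: float) -> Counter:
    """For each ordered pair (a, b), count occurrences of fault a that are followed
    by fault b within `window` time units (each a is counted at most once per b)."""
    ordered = sorted(faults, key=lambda f: f[1])
    counts: Counter = Counter()
    for i, (a, ta) in enumerate(ordered):
        followers = set()
        for b, tb in ordered[i + 1:]:
            if tb - ta > window:
                break  # later faults are even further away in time
            if b != a:
                followers.add(b)
        for b in followers:
            counts[(a, b)] += 1
    return counts


def correlation_confidence(faults: Sequence[Fault],
                           window: float) -> Dict[Tuple[str, str], float]:
    """Hypothetical confidence(a -> b) = episode frequency / occurrences of a."""
    occurrences = Counter(code for code, _ in faults)
    return {(a, b): n / occurrences[a]
            for (a, b), n in pair_episode_counts(faults, window).items()}


log = [("E1", 0), ("E2", 1), ("E1", 10), ("E2", 12), ("E1", 20), ("E3", 21)]
report = correlation_confidence(log, window=3)
for (a, b), conf in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {conf:.2f}")  # E1 -> E2: 0.67, then E1 -> E3: 0.33
```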
Abstract:
A system for temporal data mining includes a computer readable medium having an application configured to receive at an input module a temporal data series having events with start times and end times, a set of allowed dwelling times and a threshold frequency. The system is further configured to identify, using a candidate identification and tracking module, one or more occurrences in the temporal data series of a candidate episode and increment a count for each identified occurrence. The system is also configured to produce at an output module an output for those episodes whose count of occurrences results in a frequency exceeding the threshold frequency.
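The counting and thresholding described above can be sketched as follows, assuming each episode node pairs an event type with an allowed dwelling-time interval (end minus start) and that "frequency" means occurrence count divided by the number of events in the series. Both the data layout and the normalization are assumptions; the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Event = Tuple[str, float, float]        # (event type, start time, end time)
Node = Tuple[str, Tuple[float, float]]  # (event type, allowed dwelling interval)


def matches(event: Event, node: Node) -> bool:
    """True if the event has the node's type and its duration lies in the interval."""
    etype, start, end = event
    ntype, (lo, hi) = node
    return etype == ntype and lo <= end - start <= hi


def count_occurrences(series: Sequence[Event], episode: Sequence[Node]) -> int:
    """Non-overlapped occurrences of a serial episode whose events must also
    satisfy the dwelling-time interval attached to each node."""
    count, state = 0, 0
    for event in sorted(series, key=lambda e: e[1]):
        if matches(event, episode[state]):
            state += 1
            if state == len(episode):
                count += 1
                state = 0
    return count


def frequent_episodes(series: Sequence[Event], candidates: Dict[str, Sequence[Node]],
                      threshold: float) -> Dict[str, float]:
    """Keep candidates whose count / len(series) exceeds the threshold frequency."""
    result: Dict[str, float] = {}
    for name, episode in candidates.items():
        freq = count_occurrences(series, episode) / len(series)
        if freq > threshold:
            result[name] = freq
    return result


series = [("A", 0, 2), ("B", 3, 4), ("A", 5, 11), ("B", 12, 13), ("A", 14, 15), ("B", 16, 17)]
candidates = {
    "A short -> B": [("A", (0, 3)), ("B", (0, 2))],
    "A long -> B": [("A", (5, 10)), ("B", (0, 2))],
}
print(frequent_episodes(series, candidates, threshold=0.2))  # only "A short -> B" survives
```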
Abstract:
During summer, the northern Indian Ocean exhibits significant atmospheric intraseasonal variability associated with active and break phases of the monsoon in the 30-90 day band. In this paper, we investigate mechanisms of the Sea Surface Temperature (SST) signature of this atmospheric variability, using a combination of observational datasets and Ocean General Circulation Model sensitivity experiments. In addition to the previously reported intraseasonal SST signature in the Bay of Bengal, observations show clear SST signals in the Arabian Sea related to the active/break cycle of the monsoon. As the atmospheric intraseasonal oscillation moves northward, SST variations appear first at the southern tip of India (day 0), then in the Somali upwelling region (day 10), the northern Bay of Bengal (day 19) and finally the Oman upwelling region (day 23). The Bay of Bengal and Oman signals are most clearly associated with the monsoon active/break index, whereas the relationship with signals near the Somali upwelling region and the southern tip of India is weaker. In agreement with previous studies, we find that heat flux variations drive most of the intraseasonal SST variability in the Bay of Bengal, both in our model (regression coefficient of 0.9, against ~0.25 for wind stress) and in observations (regression coefficient of 0.8); ~60% of the heat flux variation is due to shortwave radiation and ~40% to latent heat flux. On the other hand, both observations and model results indicate a prominent role of dynamical oceanic processes in the Arabian Sea. Wind-stress variations force about 70-100% of the intraseasonal SST variations in the Arabian Sea, through modulation of oceanic processes (entrainment, mixing, Ekman pumping, lateral advection). Our ~100 km resolution model suggests that internal oceanic variability (i.e. eddies) contributes substantially to small-scale intraseasonal variability in the Somali upwelling region, but does not contribute to large-scale intraseasonal SST variability because of its small spatial scale and random phase relation to the active/break monsoon cycle. The effect of oceanic eddies, however, remains to be explored at a higher spatial resolution.
Abstract:
This study examines heavy rainfall events during cyclonic storms over the Indian Ocean. The rainfall estimates are based on microwave observations from the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI). A regional scattering index (SI), developed for the Indian region from brightness temperatures at 19, 21 and 85 GHz, and the polarization-corrected temperature (PCT) at 85 GHz are used in this study. The PCT and SI values are collocated with the Precipitation Radar (PR) onboard TRMM to establish a relationship between rainfall rate, PCT and SI. Retrieval techniques based on both linear and nonlinear regression have been developed using SI, PCT and the combination of SI and PCT, and the results have been compared with the PR observations. A nonlinear algorithm using the combination of SI and PCT was found to be more accurate than the linear algorithm or a nonlinear algorithm using either SI or PCT alone. With linear regression, the statistical comparison with PR gives correlation coefficients (CC) of 0.68, 0.66 and 0.70 and root mean square errors (RMSE) of 1.78, 1.96 and 1.68 mm/h for SI, PCT and the combination of SI and PCT, respectively. With nonlinear regression, the CC values are 0.73, 0.71 and 0.79 and the RMSE values are 1.64, 1.95 and 1.54 mm/h for SI, PCT and the combination of SI and PCT, respectively. For high rain events (above 10 mm/h), linear regression gives CC values of 0.58, 0.59 and 0.60 and RMSE values of 5.07, 5.47 and 5.03 mm/h for SI, PCT and the combination of SI and PCT, respectively, whereas nonlinear regression gives CC values of 0.66, 0.64 and 0.71 and RMSE values of 4.68, 5.78 and 4.02 mm/h for SI, PCT and the combined SI and PCT, respectively.
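The two skill scores quoted above, CC and RMSE, can be computed as in this sketch of the linear-regression case with a single predictor. The sample values are made up for illustration and are not TMI or PR data; the nonlinear and combined SI/PCT retrievals in the study are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def correlation(p: Sequence[float], q: Sequence[float]) -> float:
    """Pearson correlation coefficient (CC)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((pi - mp) * (qi - mq) for pi, qi in zip(p, q))
    vp = sum((pi - mp) ** 2 for pi in p)
    vq = sum((qi - mq) ** 2 for qi in q)
    return cov / math.sqrt(vp * vq)


def rmse(p: Sequence[float], q: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. mm/h)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p))


# Made-up collocated samples: an index-like predictor and a reference rain rate (mm/h).
si = [1.0, 2.0, 3.0, 4.0, 5.0]
pr_rain = [0.8, 2.1, 2.9, 4.3, 4.9]
a, b = fit_linear(si, pr_rain)
predicted = [a + b * s for s in si]
print(f"CC = {correlation(predicted, pr_rain):.2f}, RMSE = {rmse(predicted, pr_rain):.2f} mm/h")
```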
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
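A minimal sketch of windowed n-gram perplexity is shown below. It counts n-grams with a plain dictionary rather than the suffix arrays used by the toolkit, and applies add-one smoothing over the DNA alphabet; the smoothing choice and all function names are assumptions. Windows must be at least n symbols long.

```python
import math
from collections import Counter
from typing import List, Tuple

ALPHABET = "ACGT"


def ngram_counts(seq: str, n: int) -> Counter:
    """Frequencies of all overlapping n-grams in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def build_model(seq: str, n: int) -> Tuple[Counter, Counter]:
    """n-gram counts plus the counts of their (n-1)-symbol contexts."""
    grams = ngram_counts(seq, n)
    contexts: Counter = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c
    return grams, contexts


def window_perplexity(window: str, n: int, grams: Counter, contexts: Counter) -> float:
    """Perplexity of window under an add-one-smoothed order-n model."""
    log_prob, m = 0.0, 0
    for i in range(n - 1, len(window)):
        gram = window[i - n + 1:i + 1]
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + len(ALPHABET))
        log_prob += math.log(p)
        m += 1
    return math.exp(-log_prob / m)


def sliding_perplexity(seq: str, n: int, width: int, step: int,
                       grams: Counter, contexts: Counter) -> List[Tuple[int, float]]:
    """(start, perplexity) for each window of width symbols, advanced by step."""
    return [(s, window_perplexity(seq[s:s + width], n, grams, contexts))
            for s in range(0, len(seq) - width + 1, step)]


genome = "ACGT" * 50  # toy stand-in for a real genome sequence
grams, contexts = build_model(genome, 3)
print(sliding_perplexity(genome, 3, 20, 20, grams, contexts)[:2])  # low values, near 1
print(window_perplexity("AAAAAAAA", 3, grams, contexts))  # close to 4: every trigram unseen
```

Low perplexity flags windows that the model finds predictable (here, the repeat it was trained on); unseen content scores near the alphabet size.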
Abstract:
We investigate the impact of the Indian Ocean Dipole (IOD) and El Nino and the Southern Oscillation (ENSO) on sea level variations in the North Indian Ocean during 1957-2008. Using tide-gauge and altimeter data, we show that IOD and ENSO leave characteristic signatures in the sea level anomalies (SLAs) in the Bay of Bengal. During a positive IOD event, negative SLAs are observed during April-December, with the SLAs decreasing continuously to a peak during September-November. During El Nino, negative SLAs are observed twice (April-December and November-July), with a relaxation between the two peaks. SLA signatures during negative IOD and La Nina events are much weaker. We use a linear, continuously stratified model of the Indian Ocean to simulate the sea level patterns of IOD and ENSO events. We then separate the solutions into parts that correspond to specific processes: coastal alongshore winds, remote forcing from the equator via reflected Rossby waves, and direct forcing by interior winds within the bay. During pure IOD events, the SLAs are forced both from the equator and by direct wind forcing. During ENSO events, they are primarily equatorially forced, with only a minor contribution from direct wind forcing. Using a lead/lag covariance analysis between the Nino-3.4 SST index and Indian Ocean wind stress, we derive a composite wind field for a typical El Nino event; the resulting solution has two negative SLA peaks. The IOD and ENSO signatures are not evident off the west coast of India.
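A lead/lag covariance analysis like the one mentioned above pairs an index at time t with a field at time t + lag for a range of lags. The sketch below does this for a single field time series with synthetic values; building the composite wind field from these covariances is not reproduced, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def lagged_covariance(index: Sequence[float], field: Sequence[float],
                      max_lag: int) -> Dict[int, float]:
    """Covariance between index[t] and field[t + lag] for lag in [-max_lag, max_lag].

    A positive lag means the field lags (follows) the index. Both series are
    assumed to have the same length and regular time steps."""
    n = len(index)
    result: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = index[:n - lag], field[lag:]
        else:
            a, b = index[-lag:], field[:n + lag]
        m = len(a)
        mean_a, mean_b = sum(a) / m, sum(b) / m
        result[lag] = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / m
    return result


# Synthetic example: the "wind" series repeats the index two time steps later.
nino = [0.0, 1.0, 0.0, -1.0] * 6
wind = [0.0, 0.0] + nino[:-2]
cov = lagged_covariance(nino, wind, max_lag=3)
best_lag = max(cov, key=cov.get)
print(best_lag)  # 2
```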
Abstract:
Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.
Abstract:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over the cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y}_R, where R is a convex region in the cube and X -> Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using the notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
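To make the definition concrete, the sketch below evaluates the support and confidence of a rule {X -> Y}_R by restricting the data to transactions whose dimension values fall inside R. It assumes R is an axis-aligned box (one simple kind of convex region) and does a naive scan; it is not TOARM, CellUnion or IceCube. Names and data are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Transaction = Tuple[Dict[str, int], FrozenSet[str]]  # (dimension values, purchased items)
Box = Dict[str, Tuple[int, int]]                      # dimension -> inclusive (low, high)


def in_region(dims: Dict[str, int], box: Box) -> bool:
    """True if every constrained dimension lies inside its interval."""
    return all(lo <= dims[d] <= hi for d, (lo, hi) in box.items())


def targeted_rule_stats(data: Sequence[Transaction], x: FrozenSet[str],
                        y: FrozenSet[str], box: Box) -> Tuple[float, float]:
    """Support and confidence of {X -> Y}_R computed over transactions inside R."""
    region = [items for dims, items in data if in_region(dims, box)]
    if not region:
        return 0.0, 0.0
    n_x = sum(1 for items in region if x <= items)
    n_xy = sum(1 for items in region if (x | y) <= items)
    return n_xy / len(region), (n_xy / n_x if n_x else 0.0)


data = [
    ({"month": 1, "store": 1}, frozenset({"umbrella", "raincoat"})),
    ({"month": 2, "store": 1}, frozenset({"umbrella", "raincoat"})),
    ({"month": 2, "store": 2}, frozenset({"umbrella"})),
    ({"month": 7, "store": 1}, frozenset({"umbrella"})),
    ({"month": 8, "store": 2}, frozenset({"sunscreen"})),
]
x, y = frozenset({"umbrella"}), frozenset({"raincoat"})
print(targeted_rule_stats(data, x, y, {"month": (1, 3)}))  # stronger inside the region
print(targeted_rule_stats(data, x, y, {}))                 # weaker over the whole cube
```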
Abstract:
The rapid growth of the field of data mining has led to the development of various methods for outlier detection. Although outlier detection has been well explored for numerical data, its treatment for categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm explores a clustering of the given data; this is followed by a ranking phase that determines the set of most likely outliers. The proposed algorithm is expected to perform better because it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public-domain categorical data sets.
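A frequency-based ranking of the kind mentioned above can be sketched with an attribute-value-frequency score: each record is scored by the average frequency of its attribute values, and records built from rare values rank as the most likely outliers. This covers only a frequency-style scheme under that assumption, not the paper's clustering phase or its exact ranking; names and data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def avf_scores(records: Sequence[Sequence[str]]) -> List[float]:
    """Mean frequency (within the data set) of each record's attribute values."""
    n_attrs = len(records[0])
    freq = [Counter(r[j] for r in records) for j in range(n_attrs)]
    return [sum(freq[j][r[j]] for j in range(n_attrs)) / n_attrs for r in records]


def top_outliers(records: Sequence[Sequence[str]], k: int) -> List[int]:
    """Indices of the k records with the lowest scores (most likely outliers)."""
    scores = avf_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i])[:k]


records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
    ("green", "large", "square"),
]
print(top_outliers(records, 2))  # [4, 3]
```

Because the score needs only one pass to build the value counts and one pass to score, its cost does not depend on how many outliers are requested, which matches the property the abstract highlights.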
Abstract:
This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining the installed capacities of distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity with the demand for power, and by minimizing the cost of transporting biomass from dispersed sources to the power generation system and the cost of distributing electricity from the power generation system to the demand centers or villages. The methodology was validated by using it to develop an optimal plan for implementing distributed biomass-based power systems to meet the rural electricity needs of Tumkur district in India, which consists of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locates biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm over the entire search space for different values of k, subject to demand-supply matching constraints. The value of k is chosen to minimize the total cost of system installation, biomass transportation, and transmission and distribution. A smaller region, consisting of 293 villages, was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map of the region.
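The k-selection loop described above can be sketched as follows. This version uses a simple alternating k-medoids with farthest-first initialisation (not necessarily the variant used in the paper), Euclidean distances between village coordinates, and a hypothetical cost model of a fixed cost per plant plus a linear cost per unit distance standing in for transport and distribution. Demand-supply constraints are omitted. It assumes k does not exceed the number of distinct points.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def k_medoids(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], float]:
    """Alternating k-medoids: assign points to the nearest medoid, then move each
    medoid to the member with the smallest summed distance; repeat until stable."""
    medoids = [0]  # deterministic farthest-first initialisation
    while len(medoids) < k:
        medoids.append(max(range(len(points)),
                           key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    for _ in range(max_iter):
        clusters: Dict[int, List[int]] = {m: [] for m in medoids}
        for i, p in enumerate(points):
            clusters[min(medoids, key=lambda m: dist(p, points[m]))].append(i)
        new = [min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
               if members else m
               for m, members in clusters.items()]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    cost = sum(min(dist(p, points[m]) for m in medoids) for p in points)
    return sorted(medoids), cost


def choose_k(points: Sequence[Point], k_values: Sequence[int], plant_cost: float,
             cost_per_km: float) -> Optional[Tuple[int, List[int], float]]:
    """Pick k minimising the hypothetical total
    k * plant_cost + cost_per_km * (sum of village-to-medoid distances)."""
    best: Optional[Tuple[int, List[int], float]] = None
    for k in k_values:
        medoids, total_distance = k_medoids(points, k)
        total = k * plant_cost + cost_per_km * total_distance
        if best is None or total < best[2]:
            best = (k, medoids, total)
    return best


villages = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
print(choose_k(villages, range(1, 5), plant_cost=5.0, cost_per_km=1.0))  # (3, [0, 3, 6], 21.0)
```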
Abstract:
To meet the ever-growing demand for predictions of oceanographic parameters in the Indian Ocean for a variety of applications, the Indian National Centre for Ocean Information Services (INCOIS) has recently set up an operational ocean forecast system, the Indian Ocean Forecast System (INDOFOS). This fully automated system, based on a state-of-the-art ocean general circulation model, issues six-hourly forecasts of the sea-surface temperature, surface currents and the depths of the mixed layer and the thermocline up to five days of lead time. A brief account of INDOFOS and a statistical validation of the forecasts of these parameters using in situ and remote sensing data are presented in this article. The accuracy of the sea-surface temperature forecasts is high in the Bay of Bengal and the Arabian Sea, whereas it is moderate in the equatorial Indian Ocean. On the other hand, the forecasts of the depths of the thermocline and the isothermal layer and of the surface currents are more accurate near the equatorial region and relatively less accurate in the Bay of Bengal.
Abstract:
This article presents an analysis of the retrospective predictions by seven coupled ocean-atmosphere models from major forecasting centres in Europe and the USA, aimed at assessing their ability to predict the interannual variation of the Indian summer monsoon rainfall (ISMR), particularly the extremes (i.e. droughts and excess rainfall seasons). On the whole, the skill in predicting extremes is reasonable, since most of the models are able to predict the sign of the ISMR anomaly for a majority of the extremes. There is a remarkable coherence between the models in the successes and failures of the predictions, with all the models generating loud false alarms for the normal monsoon season of 1997 and the excess monsoon season of 1983. It is well known that the El Nino and Southern Oscillation (ENSO) and the Equatorial Indian Ocean Oscillation (EQUINOO) play an important role in the interannual variation of ISMR and particularly the extremes. The prediction of the phases of these modes and of their link with the monsoon has also been assessed. The models are found to simulate the ENSO-monsoon link realistically, whereas the EQUINOO-ISMR link is simulated realistically by only one model, the ECMWF model. Furthermore, in most models this link is opposite to the observed one, with the predicted ISMR being negatively (instead of positively) correlated with the rainfall over the western equatorial Indian Ocean and positively (instead of negatively) correlated with the rainfall over the eastern equatorial Indian Ocean. Analysis of the seasons for which the predictions of almost all the models have large errors suggests the facets of ENSO and EQUINOO, and of their links with the monsoon, that need to be improved to improve monsoon predictions by these models.