818 results for MULTI-RELATIONAL DATA MINING


Relevance:

100.00%

Publisher:

Abstract:

Fusion of multi-sensor imaging data enables a synergetic interpretation of the complementary information obtained by sensors of different spectral ranges. Multi-sensor data of diverse spectral, spatial and temporal resolutions require advanced numerical techniques for analysis and interpretation. This paper reviews ten advanced pixel-based image fusion techniques: component substitution (COS), local mean and variance matching, modified IHS (Intensity-Hue-Saturation), fast Fourier transform-enhanced IHS, Laplacian pyramid, local regression, smoothing filter (SF), Sparkle, SVHC and synthetic variable ratio. The techniques were tested on IKONOS data (a panchromatic band at 1 m spatial resolution and four multispectral bands at 4 m spatial resolution). Evaluation of the fused results through various accuracy measures revealed that the SF and COS methods produce images closest to what the corresponding multispectral sensor would observe at the highest resolution level (1 m).
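
As a concrete illustration of one technique from this list, the sketch below implements a simple local mean and variance matching fusion in Python with NumPy/SciPy; the window size, the uniform averaging filter and the input layout (the multispectral band already resampled to the panchromatic grid) are assumptions for illustration, not details taken from the paper.

import numpy as np
from scipy.ndimage import uniform_filter

def local_mean_variance_match(pan, ms_band, window=31):
    """Fuse a panchromatic band with one upsampled multispectral band
    by matching local (windowed) mean and standard deviation."""
    pan = pan.astype(np.float64)
    ms_band = ms_band.astype(np.float64)   # assumed resampled to the pan grid

    pan_mean = uniform_filter(pan, size=window)
    ms_mean = uniform_filter(ms_band, size=window)

    pan_var = uniform_filter(pan ** 2, size=window) - pan_mean ** 2
    ms_var = uniform_filter(ms_band ** 2, size=window) - ms_mean ** 2

    pan_std = np.sqrt(np.clip(pan_var, 0, None))
    ms_std = np.sqrt(np.clip(ms_var, 0, None))

    # Rescale the panchromatic detail to the local statistics of the MS band.
    return (pan - pan_mean) * (ms_std / (pan_std + 1e-6)) + ms_mean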

Relevance:

100.00%

Publisher:

Abstract:

Over the past decade, many powerful data mining techniques have been developed to analyze temporal and sequential data. The time is now ripe for addressing problems of larger scope under the purview of temporal data mining. The fourth SIGKDD workshop on temporal data mining focused on the question: what can we infer about the structure of a complex dynamical system from observed temporal data? The goals of the workshop were to critically evaluate the needs of this area by bringing together leading researchers from industry and academia, and to identify promising technologies and methodologies for addressing them. We provide a brief summary of the workshop proceedings and of the ideas arising from the discussions.

Relevance:

100.00%

Publisher:

Abstract:

Facet-based sentiment analysis involves discovering the latent facets, sentiments and their associations. Traditional facet-based sentiment analysis algorithms typically perform the various tasks in sequence and fail to take advantage of the mutual reinforcement between them. Additionally, inferring sentiment levels typically requires domain knowledge or human intervention. In this paper, we propose a series of probabilistic models that jointly discover latent facets and sentiment topics, and also order the sentiment topics with respect to a multi-point scale, in a language- and domain-independent manner. This is achieved by simultaneously capturing both short-range syntactic structure and long-range semantic dependencies between the sentiment and facet words. The models further incorporate coherence in reviews, where reviewers dwell on one facet or sentiment level before moving on, for more accurate facet and sentiment discovery. For reviews that are supplemented with ratings, our models automatically order the latent sentiment topics without requiring seed words or domain knowledge. To the best of our knowledge, our work is the first attempt to combine the notions of syntactic and semantic dependencies in the domain of review mining; the concept of facet and sentiment coherence has not been explored earlier either. Extensive experimental results on real-world review data show that the proposed models outperform various state-of-the-art baselines for facet-based sentiment analysis.
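
The exact models are not specified in this abstract; as a rough, hypothetical sketch of the generative idea (joint facet and sentiment topics with sentence-level coherence), the toy Python generator below emits each word from either a facet topic or a sentiment topic and tends to keep the same facet-sentiment pair from one sentence to the next. Inference over such a model is the paper's actual contribution and is not shown here.

import numpy as np

rng = np.random.default_rng(0)
K_FACETS, K_SENT, V = 3, 5, 50           # facets, sentiment levels, vocabulary size

# Per-facet and per-sentiment word distributions (Dirichlet-sampled).
phi_facet = rng.dirichlet(np.ones(V), size=K_FACETS)
phi_sent = rng.dirichlet(np.ones(V), size=K_SENT)

def generate_review(n_sentences=4, words_per_sentence=8, stickiness=0.7):
    """Generate one toy review: each sentence carries a facet and a sentiment level;
    coherence means the next sentence tends to keep the same pair."""
    facet, sent = rng.integers(K_FACETS), rng.integers(K_SENT)
    review = []
    for _ in range(n_sentences):
        if rng.random() > stickiness:     # occasionally switch facet/sentiment
            facet, sent = rng.integers(K_FACETS), rng.integers(K_SENT)
        for _ in range(words_per_sentence):
            # Each word is emitted by either the facet topic or the sentiment topic.
            dist = phi_facet[facet] if rng.random() < 0.5 else phi_sent[sent]
            review.append((rng.choice(V, p=dist), facet, sent))
    return review

print(generate_review()[:5])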

Relevance:

100.00%

Publisher:

Abstract:

Aerosol absorption is poorly quantified because of the lack of adequate measurements. It has been shown that the Ozone Monitoring Instrument (OMI) aboard EOS-Aura and the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard EOS-Aqua, which fly in formation as part of the A-train, provide an excellent opportunity to improve the accuracy of aerosol retrievals. Here, we follow a multi-satellite approach to estimate the regional distribution of aerosol absorption over continental India for the first time. Annually and regionally averaged aerosol single-scattering albedo over the Indian landmass is estimated as 0.94 +/- 0.03. Our study demonstrates the potential of multi-satellite data analysis to improve the accuracy of retrieval of aerosol absorption over land.
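
For reference, the single-scattering albedo reported above is the standard ratio of scattering to total extinction (a textbook definition, not a formula from the paper):

\omega_0 = \frac{\sigma_{\mathrm{sca}}}{\sigma_{\mathrm{sca}} + \sigma_{\mathrm{abs}}}

so values close to 1 indicate weakly absorbing aerosol, and the reported 0.94 implies appreciable absorption over the Indian landmass.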

Relevance:

100.00%

Publisher:

Abstract:

The classification of time series data is an interesting problem in the field of data mining. Although several algorithms have been proposed for time series classification, we have developed an innovative algorithm that is computationally fast and, in several cases, more accurate than the 1NN classifier. In our method, we calculate the fuzzy membership of each test pattern in each class. We have experimented with six benchmark datasets and compared our method with the 1NN classifier.
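
The abstract does not give the membership formula; the sketch below shows one common way to realize the idea, assigning each test series a fuzzy membership in each class from inverse distances to that class's training series (a hypothetical fuzzy-kNN-style variant, shown next to the 1NN baseline used for comparison).

import numpy as np

def fuzzy_memberships(x, train_X, train_y, m=2.0):
    """Fuzzy membership of test series x in each class, computed from
    inverse Euclidean distances to the training series of that class."""
    d = np.linalg.norm(train_X - x, axis=1) + 1e-12
    w = d ** (-2.0 / (m - 1.0))              # fuzzy-kNN style weighting
    classes = np.unique(train_y)
    score = np.array([w[train_y == c].sum() for c in classes])
    return classes, score / score.sum()      # memberships sum to 1

def classify(x, train_X, train_y):
    classes, mu = fuzzy_memberships(x, train_X, train_y)
    return classes[np.argmax(mu)]

def one_nn(x, train_X, train_y):             # the 1NN baseline
    return train_y[np.argmin(np.linalg.norm(train_X - x, axis=1))]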

Relevance:

100.00%

Publisher:

Abstract:

This thesis is an investigation into the nature of data analysis and computer software systems which support this activity.

The first chapter develops the notion of data analysis as an experimental science which has two major components: data-gathering and theory-building. The basic role of language in determining the meaningfulness of theory is stressed, and the informativeness of a language and data base pair is studied. The static and dynamic aspects of data analysis are then considered from this conceptual vantage point. The second chapter surveys the available types of computer systems which may be useful for data analysis. Particular attention is paid to the questions raised in the first chapter about the language restrictions imposed by the computer system and its dynamic properties.

The third chapter discusses the REL data analysis system, which was designed to satisfy the needs of the data analyzer in an operational relational data system. The major limitation on the use of such systems is the amount of access to data stored on a relatively slow secondary memory. This problem of the paging of data is investigated and two classes of data structure representations are found, each of which has desirable paging characteristics for certain types of queries. One representation is used by most of the generalized data base management systems in existence today, but the other is clearly preferred in the data analysis environment, as conceptualized in Chapter I.

This data representation has strong implications for a fundamental process of data analysis -- the quantification of variables. Since quantification is one of the few means of summarizing and abstracting, data analysis systems are under strong pressure to facilitate the process. Two implementations of quantification are studied: one analogous to the form of the lower predicate calculus and another more closely attuned to the data representation. A comparison of these indicates that the use of the "label class" method results in orders-of-magnitude improvement over the lower predicate calculus technique.
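
The thesis's "label class" representation cannot be reconstructed from this abstract; as a rough, hypothetical illustration of the contrast it draws, the Python sketch below quantifies a variable (counting records per label) first in a predicate-calculus style, re-scanning the data once per candidate value, and then with a single grouped pass that builds one class of records per label.

from collections import defaultdict

records = [{"dept": "A", "salary": 10}, {"dept": "B", "salary": 12},
           {"dept": "A", "salary": 11}]

# Predicate-calculus style: one full scan per candidate label value.
labels = {r["dept"] for r in records}
counts_predicate = {v: sum(1 for r in records if r["dept"] == v) for v in labels}

# "Label class" style: a single pass groups records by label, then summarizes.
classes = defaultdict(list)
for r in records:
    classes[r["dept"]].append(r)
counts_label_class = {v: len(rs) for v, rs in classes.items()}

assert counts_predicate == counts_label_class   # same answer, far fewer scans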

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we construct an iris recognition algorithm based on point covering of high-dimensional space and multi-weighted neurons, and propose a new method for iris recognition based on point-covering theory of high-dimensional space. In this method, irises are trained as "cognition" classes one class at a time, and adding a new class does not affect the recognition knowledge already learned for existing classes. Experimental results show a rejection rate of 98.9%, a correct recognition rate of 95.71% and an error rate of 3.5%. The rejection rate for test samples from classes not included in the training set is very high, which demonstrates that the proposed iris recognition method is effective.
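
The multi-weighted neuron construction is not reproducible from this abstract alone; the sketch below is a simplified, hypothetical point-covering classifier in the same spirit: each class is covered by hyperspheres around its own training samples, so adding a new class never alters the cover of existing classes, and a test sample that falls inside no sphere is rejected.

import numpy as np

class PointCoveringClassifier:
    """Toy per-class covering with hyperspheres; classes are added incrementally."""
    def __init__(self, radius_scale=1.5):
        self.radius_scale = radius_scale
        self.covers = {}                      # class label -> (centers, radii)

    def add_class(self, label, X):
        X = np.asarray(X, dtype=float)
        # Sphere radius: scaled distance to the nearest same-class sample.
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        radii = self.radius_scale * d.min(axis=1)
        self.covers[label] = (X, radii)       # existing classes are untouched

    def predict(self, x, reject=-1):
        x = np.asarray(x, dtype=float)
        best, best_margin = reject, 0.0
        for label, (centers, radii) in self.covers.items():
            margin = (radii - np.linalg.norm(centers - x, axis=1)).max()
            if margin > best_margin:          # x lies inside a sphere of this class
                best, best_margin = label, margin
        return best                           # rejected if no class covers x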

Relevance:

100.00%

Publisher:

Abstract:

National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences

Relevance:

100.00%

Publisher:

Abstract:

For the problem of geological hazard evaluation (GHE), using remote sensing and GIS systems as the experimental environment and supported by some programming development, this thesis combines knowledge of geo-hazard mechanisms, statistical learning, remote sensing (RS), hyperspectral recognition, spatial analysis, digital photogrammetry and mineralogy. Geo-hazard samples from Hong Kong and the Three Parallel Rivers region are used as experimental data to study two core questions of GHE: geo-hazard information acquisition and the evaluation model. For landslide information acquisition by RS, three topics are presented: image enhancement for visual interpretation, automatic recognition of landslides, and quantitative mineral mapping. For the evaluation model, a powerful data mining method, the support vector machine (SVM), is introduced to the GHE field, and a series of comparative experiments is carried out to verify its feasibility and efficiency. Furthermore, the thesis proposes a method to forecast the distribution of landslides under a known future rainfall, based on historical rainfall and the corresponding landslide susceptibility map. The details are as follows:

(a) Remote sensing image enhancement for geo-hazard visual interpretation. The quality of visual interpretation is determined by the RS data and the image enhancement method, the most common and effective of which is image fusion between a high-spatial-resolution image and a multi-spectral image; however, few studies address fusion methods for geo-hazard recognition. Through comparative experiments with six mainstream fusion methods and combinations of different remote sensing data sources, the thesis presents the merits of each method and qualitatively analyses the effects of spatial resolution, spectral resolution and time phase on the fused image.

(b) Automatic recognition of shallow landslides from RS imagery. A landslide inventory is the basis of landslide forecasting and landslide studies: if landslide events are collected continuously, the geo-hazard inventory is kept up to date and the prediction model is improved accordingly, so forecast accuracy can be raised step by step. RS is a feasible way to obtain landslide information, given how geo-hazards are distributed. An automatic hierarchical approach is proposed to identify shallow landslides in vegetated regions by combining multi-spectral RS imagery with DEM derivatives, and experiments are carried out to assess its efficiency.

(c) Acquisition of hazard-causing factors. Accurate environmental factors are the key to analysing and predicting regional geological hazard risk. For predicting large debris flows, the main challenge is still to determine the source material and its volume in the debris-flow source region. Exploiting various RS techniques, the thesis presents methods to obtain two important hazard-causing factors, DEM and alteration minerals, and through spatial analysis finds a relationship between hydrothermal clay alteration minerals and geo-hazards in the arid-hot valleys of the Three Parallel Rivers region.

(d) Applying the support vector machine (SVM) to landslide susceptibility mapping. The latest and powerful statistical learning theory, SVM, is introduced to regional geological hazard evaluation. SVM is an efficient statistical learning method that can handle both two-class and one-class samples, avoiding the need to generate 'pseudo' samples. Fifty-five years of historical samples from a natural terrain in Hong Kong are used to assess the method, and the susceptibility maps obtained by one-class and two-class SVM are compared with that obtained by logistic regression. The two-class SVM shows better predictive performance than logistic regression and the one-class SVM. However, the one-class SVM, which requires only failed cases, has an advantage over the other two methods, because only "failed" case information is usually available in landslide susceptibility mapping.

(e) Predicting the distribution of rainfall-induced landslides by time-series analysis. Rainfall is the dominant trigger of landslides; more than 90% of landslide losses and casualties are caused by rainfall, so predicting landslide locations under a given rainfall is an important evaluation task. Taking full account of the contributions of stable factors (the landslide susceptibility map) and dynamic factors (rainfall), a time-series linear regression between rainfall and the landslide risk map is presented, and experiments on real samples show that the method performs well in a natural region of Hong Kong.

The following four practical or original findings are obtained: 1) RS methods to enhance geo-hazard images, automatically recognize shallow landslides, and obtain DEM and mineral information are studied, and detailed operating steps are given through examples; the conclusions are highly practical. 2) An exploratory study of the relationship between geo-hazards and alteration minerals in the arid-hot valley of the Jinshajiang River is presented: based on the standard USGS mineral spectral library, the distribution of hydrothermal alteration minerals is mapped with the SAM method, and statistical analysis of debris flows against hazard-causing factors reveals and validates a strong correlation between debris flows and clay minerals. 3) SVM theory (especially one-class SVM) is applied to landslide susceptibility mapping, and a systematic evaluation of its performance demonstrates the advantages of SVM in this field. 4) A time-series prediction method for the distribution of rainfall-induced landslides is established: in a natural study area, the distribution of landslides induced by a storm is successfully predicted for a real maximum 24-hour rainfall, based on regression between four historical storms and the corresponding landslides.
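
A minimal sketch of the comparison described in (d), using scikit-learn rather than the thesis's own implementation: a two-class SVM trained on failed and stable cells versus a one-class SVM trained on failed cells only, each yielding a continuous susceptibility score per terrain cell. The synthetic factor matrix, the RBF kernel and the 70/30 split are assumptions for illustration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, OneClassSVM

# X: terrain-cell factor matrix (slope, lithology, rainfall, ...); y: 1 = landslide ("failed"), 0 = stable.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Two-class SVM: needs both failed and stable cells.
two_class = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
susceptibility_2c = two_class.predict_proba(X_te)[:, 1]

# One-class SVM: trained on failed cells only, as in typical landslide inventories.
one_class = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_tr[y_tr == 1])
susceptibility_1c = one_class.decision_function(X_te)   # higher = more landslide-like

print(susceptibility_2c[:5], susceptibility_1c[:5])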

Relevance:

100.00%

Publisher:

Abstract:

Ferré, S. and King, R. D. (2004). A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae, IOS Press. To appear.

Relevance:

100.00%

Publisher:

Abstract:

R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.

Relevance:

100.00%

Publisher:

Abstract:

BACKGROUND: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules that protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply a transitive closure to the predictions.

RESULTS: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with a computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing.

CONCLUSION: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., nearest-neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capability (i.e., increased positive predictive value), and that this increase is consistent uniformly across GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, which exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
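
The classifier proposed here is consistent by construction; for contrast, the sketch below shows the kind of post-processing that 'flat' predictors need, capping each GO term's score by its ancestors' scores so that scores never increase from root to leaf (a generic illustration of the 'true-path' constraint, not the authors' factorization).

from functools import lru_cache

def true_path_cap(scores, parents):
    """Cap each term's score at the minimum of its ancestors' scores.
    scores: {term: raw score in [0, 1]}; parents: {term: [parent terms]} (a DAG)."""
    capped = {}

    def resolve(term):
        if term in capped:
            return capped[term]
        bound = min((resolve(p) for p in parents.get(term, [])), default=1.0)
        capped[term] = min(scores.get(term, 0.0), bound)
        return capped[term]

    for t in scores:
        resolve(t)
    return capped

# Toy GO fragment: the child 'GO:B' cannot score above its parent 'GO:A'.
parents = {"GO:B": ["GO:A"], "GO:C": ["GO:B"]}
print(true_path_cap({"GO:A": 0.4, "GO:B": 0.9, "GO:C": 0.2}, parents))
# {'GO:A': 0.4, 'GO:B': 0.4, 'GO:C': 0.2}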

Relevance:

100.00%

Publisher:

Abstract:

Mapping novel terrain from sparse, complex data often requires the resolution of conflicting information from sensors working at different times, locations, and scales, and from experts with different goals and situations. Information fusion methods help resolve inconsistencies in order to distinguish correct from incorrect answers, as when evidence variously suggests that an object's class is car, truck, or airplane. The methods developed here consider a complementary problem, supposing that information from sensors and experts is reliable though inconsistent, as when evidence suggests that an object's class is car, vehicle, or man-made. Underlying relationships among objects are assumed to be unknown to the automated system or the human user. The ARTMAP information fusion system uses distributed code representations that exploit the neural network's capacity for one-to-many learning in order to produce self-organizing expert systems that discover hierarchical knowledge structures. The system infers multi-level relationships among groups of output classes, without any supervised labeling of these relationships. The procedure is illustrated with two image examples.
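
As a rough, hypothetical illustration of the kind of relationship the system discovers (not of its neural mechanism): when the same objects receive labels at several levels, subset relations among output classes can be read off from label co-occurrence, for example that every 'car' is also a 'vehicle' and 'man-made'.

from collections import defaultdict

# Each training object carries labels from different sensors/experts at several levels.
labelled_objects = [
    {"car", "vehicle", "man-made"},
    {"truck", "vehicle", "man-made"},
    {"car", "vehicle", "man-made"},
    {"bridge", "man-made"},
    {"oak", "tree", "natural"},
    {"pine", "tree", "natural"},
]

# label -> set of objects that carry it
extent = defaultdict(set)
for i, labels in enumerate(labelled_objects):
    for lab in labels:
        extent[lab].add(i)

# Infer "A is a kind of B" whenever every object labelled A is also labelled B.
hierarchy = [(a, b) for a in extent for b in extent
             if a != b and extent[a] <= extent[b]]
print(sorted(hierarchy))
# e.g. ('car', 'man-made'), ('car', 'vehicle'), ('oak', 'natural'), ...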