818 results for MULTI-RELATIONAL DATA MINING


Relevance:

100.00% 100.00%

Publisher:

Abstract:

A data mining model that predicts whether a flight will depart late because of a weather delay. It can be used to arrange a later connection if you have a connecting flight.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases.
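The representation result this abstract refers to can be written out as a short sketch. The symbols below ($F$, $U_i$, $U_{ij}$, $\Theta$, $\kappa$, $P$) are notation assumed here for illustration; the abstract itself does not fix any notation, and the exact likelihood used by the authors is not stated in it.

```latex
% Aldous-Hoover representation of a jointly exchangeable array (X_{ij}):
% F is a random measurable function, independent of the uniform variables.
\begin{align*}
(X_{ij}) &\overset{d}{=} \bigl(F(U_i, U_j, U_{ij})\bigr),
  & U_i,\ U_{ij} &\overset{\text{iid}}{\sim} \mathrm{Uniform}[0,1],
  \quad U_{ij} = U_{ji}. \\
% Assumed form of the model: a Gaussian process prior on the parameter
% function, with covariance kernel \kappa and observation model P.
\Theta &\sim \mathcal{GP}(0, \kappa),
  & X_{ij} \mid U_i, U_j, \Theta &\sim P\bigl(\,\cdot \mid \Theta(U_i, U_j)\bigr).
\end{align*}
```

Read this way, the latent $U_i$ play the role of per-entity features, and the function $\Theta$ summarises the structure shared across all pairwise relations.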

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Expressed sequence tags (ESTs) are a source for microsatellite development. In the present study, EST-derived microsatellites (EST-SSRs) were generated and characterized in the common carp (Cyprinus carpio) by data mining from updated public EST databases and by subsequent testing for polymorphism. About 5.5% (555) of 10,088 ESTs contain repeat motifs of various types and lengths, with CA being the most abundant dinucleotide motif. Out of the 60 EST-SSRs for which PCR primers were designed, 25 loci showed polymorphism in a common carp population, with the number of alleles per locus ranging from 3 to 17 (mean 7). The observed (HO) and expected (HE) heterozygosities of these EST-SSRs were 0.13-1.00 and 0.12-0.91, respectively. Six EST-SSR loci significantly deviated from the Hardy-Weinberg equilibrium (HWE) expectation, and the remaining 19 loci were in HWE. Of the 60 primer sets, the rates of polymorphic EST-SSRs were 42% in common carp, 17% in crucian carp (Carassius auratus), and 5% in silver carp (Hypophthalmichthys molitrix). These new EST-SSR markers would provide sufficient polymorphism for population genetic studies and genome mapping of the common carp and its closely related fishes. (c) 2007 Published by Elsevier B.V.
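For readers unfamiliar with the two heterozygosity figures quoted above, the following is a minimal sketch of how they can be computed for one locus. It assumes diploid genotypes given as allele pairs and uses the simple estimator HE = 1 - sum(p_i^2); the abstract does not say which estimator or software the authors used, and the function name and sample genotypes are hypothetical.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (observed, expected) heterozygosity for a single locus.

    genotypes: list of (allele_a, allele_b) pairs, one per diploid individual.
    Expected heterozygosity uses the simple (uncorrected) estimator
    1 - sum(p_i ** 2), which is an assumption of this sketch.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies over all 2n allele copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical genotypes (allele sizes in base pairs) for four individuals.
sample = [(150, 152), (150, 150), (152, 154), (150, 154)]
h_o, h_e = heterozygosities(sample)
```

A locus whose observed value departs strongly from the expected one is a candidate for the kind of Hardy-Weinberg deviation the abstract reports for six loci, although a formal test is needed to establish significance.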

Relevance:

100.00% 100.00%

Publisher:

Abstract:

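This paper presents a data acquisition and information fusion technique for the navigation sensors of a manned submersible. The navigation control computer acquires data from each sensor through a data communication system based on industrial Ethernet and uses a Kalman filter to fuse the sensor data, improving data accuracy and control-system performance. The fused result is sent to the monitoring computer, where it is used to display the attitude of the manned submersible.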
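The Kalman-filter fusion step named in this abstract can be illustrated with a minimal scalar sketch. It assumes several sensors measure the same static quantity with known noise variances; the actual state model, sensors, and filter design of the submersible system are not given in the abstract, and the function name and numbers below are hypothetical.

```python
def kalman_fuse(measurements, variances, x0=0.0, p0=1e6, process_var=0.0):
    """Sequentially fuse scalar measurements of one quantity.

    measurements: readings from different sensors (same physical quantity).
    variances:    noise variance assumed for each reading.
    x0, p0:       initial estimate and its variance (large p0 = vague prior).
    process_var:  variance added before each update (0 for a static state).
    Returns the fused estimate and its variance.
    """
    x, p = x0, p0
    for z, r in zip(measurements, variances):
        p += process_var      # predict step for a static / random-walk state
        k = p / (p + r)       # Kalman gain
        x += k * (z - x)      # correct the estimate with the innovation
        p *= (1.0 - k)        # posterior variance shrinks after each update
    return x, p


# Two hypothetical depth readings with equal noise variance.
estimate, variance = kalman_fuse([10.0, 12.0], [1.0, 1.0])
```

With equal variances the fused estimate lands close to the average of the two readings, and its variance is lower than that of either sensor alone, which is the accuracy gain the abstract describes.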

Relevance:

100.00% 100.00%

Publisher:

Abstract:

King, R. D., Wise, P. H. and Clare, A. (2004). Confirmation of Data Mining Based Predictions of Protein Function. Bioinformatics 20(7), 1110-1118.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fusion ARTMAP is a self-organizing neural network architecture for multi-channel, or multi-sensor, data fusion. Single-channel Fusion ARTMAP is functionally equivalent to Fuzzy ART during unsupervised learning and to Fuzzy ARTMAP during supervised learning. The network has a symmetric organization such that each channel can be dynamically configured to serve as either a data input or a teaching input to the system. An ART module forms a compressed recognition code within each channel. These codes, in turn, become inputs to a single ART system that organizes the global recognition code. When a predictive error occurs, a process called parallel match tracking simultaneously raises vigilances in multiple ART modules until reset is triggered in one of them. Parallel match tracking thereby resets only that portion of the recognition code with the poorest match, or minimum predictive confidence. This internally controlled selective reset process is a type of credit assignment that creates a parsimoniously connected learned network. Fusion ARTMAP's multi-channel coding is illustrated by simulations of the Quadruped Mammal database.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Time-series and sequences are important patterns in data mining. Based on an ontology of time-elements, this paper presents a formal characterization of time-series and state-sequences, where a state denotes a collection of data whose validation is dependent on time. While a time-series is formalized as a vector of time-elements temporally ordered one after another, a state-sequence is denoted as a list of states correspondingly ordered by a time-series. In general, a time-series and a state-sequence can be incomplete in various ways. This leads to the distinction between complete and incomplete time-series, and between complete and incomplete state-sequences, which allows the expression of both absolute and relative temporal knowledge in data mining.
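The two structures described in this abstract can be sketched as simple data types. This is only an illustrative reading: it assumes a time-element is either a point or an interval on a numeric time axis, and that "temporally ordered" means each element ends no later than the next begins. The class and function names are hypothetical, and the paper's formal notions of completeness are not modelled here.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class TimeElement:
    """A time-element: a point (start == end) or an interval (start < end)."""
    start: float
    end: float

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not precede start")


def is_time_series(elements: List[TimeElement]) -> bool:
    """True if the elements are temporally ordered one after another."""
    return all(a.end <= b.start for a, b in zip(elements, elements[1:]))


def state_sequence(states: List[Any],
                   elements: List[TimeElement]) -> List[Tuple[TimeElement, Any]]:
    """Pair each state with the time-element over which it holds."""
    if len(states) != len(elements):
        raise ValueError("need exactly one time-element per state")
    if not is_time_series(elements):
        raise ValueError("time-elements are not temporally ordered")
    return list(zip(elements, states))


ordered = [TimeElement(0, 1), TimeElement(1, 1), TimeElement(2, 5)]
overlapping = [TimeElement(0, 3), TimeElement(2, 4)]
sequence = state_sequence(["idle", "start", "running"], ordered)
```

In this sketch the gap between time 1 and time 2 in `ordered` is allowed, which leaves room for the incomplete series the abstract mentions.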

Relevance:

100.00% 100.00%

Publisher:

Abstract:

As part of a wider project on European climate change over the past 4500 years, a 4.5-m peat core was taken from a lawn microform on Mannikjarve bog, Estonia. Several methods were used to yield proxy-climate data: (i) a quadrat and leaf-count method for plant macrofossil data, (ii) testate amoebae analysis, and (iii) colorimetric determination of peat humification. These data are provided with an exceptionally high resolution and precise chronology. Changes in bog surface wetness were inferred using Detrended Correspondence Analysis (DCA) and zonation of macrofossil data, particularly concerning the occurrence of Sphagnum balticum, and a transfer function for water-table depth for testate amoebae data. Based on the results, periods of high bog surface wetness appear to have occurred at c. 3100, 3010-2990, 2300, 1750-1610, 1510, 1410, 1110, 540 and 310 cal. yr BP, during four longer periods between c. 3170 and 2850 cal. yr BP, 2450 and 2000 cal. yr BP, 1770 and 1530 cal. yr BP and in the period from 880 cal. yr BP until the present. In the period between 1770 and 1530 cal. yr BP, the extension or initiation of a hollow microtope occurred, which corresponds with other research results from Mannikjarve bog. This and other changes towards increasing bog surface wetness may be responses to colder temperatures and the predominance of a more continental climate in the region, which favoured the development of bog microdepressions and a complex bog microtopography. Located in the border zone of oceanic and continental climatic sectors, in an area almost without land uplift, this study site may provide valuable information about changes in palaeohydrological and palaeoclimatological conditions in the northern parts of the eastern Baltic Sea region.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the last decade, data mining has emerged as one of the most dynamic and lively areas in information technology. Although many algorithms and techniques for data mining have been proposed, they focus either on domain-independent techniques or on very specific domain problems. A general requirement in bridging the gap between academia and business is to cater to general domain-related issues surrounding real-life applications, such as constraints, organizational factors, domain expert knowledge, domain adaptation, and operational knowledge. Unfortunately, these either have not been addressed, or have not been sufficiently addressed, in current data mining research and development. Domain-Driven Data Mining (D3M) aims to develop general principles, methodologies, and techniques for modeling and merging comprehensive domain-related factors and synthesized ubiquitous intelligence surrounding problem domains with the data mining process, and discovering knowledge to support business decision-making. This paper aims to report original, cutting-edge, and state-of-the-art progress in D3M. It covers theoretical and applied contributions aiming to: 1) propose next-generation data mining frameworks and processes for actionable knowledge discovery, 2) investigate effective (automated, human- and machine-centered, and/or human-machine-cooperated) principles and approaches for acquiring, representing, modeling, and engaging ubiquitous intelligence in real-world data mining, and 3) develop workable and operational systems balancing technical significance and application concerns, and converting and delivering actionable knowledge into operational application rules to seamlessly engage application processes and systems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background. The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10^3 to ca. 10^4 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 × 10^5 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions. Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.