980 results for Geolocation databases


Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher:

Abstract:

Modeling probabilistic data is one of the important issues in databases because data is often uncertain in real-world applications. It is therefore necessary to identify potentially useful patterns in probabilistic databases. Because probabilistic data in 1NF relations is redundant, previous mining techniques do not work well on probabilistic databases. For this reason, this paper proposes a new model for mining probabilistic databases. A partition is developed for preprocessing the probabilistic data in a probabilistic database. We evaluated the proposed technique, and the experimental results demonstrate that our approach is effective and efficient.

Relevance:

20.00%

Publisher:

Abstract:

Current data mining techniques may not be helpful for companies and organizations, such as nuclear power plants and earthquake bureaus, that have only small databases. These companies and organizations also expect to apply data mining techniques to extract useful patterns from their databases in order to support their decisions. However, the data in these databases, such as the accident database of a nuclear power plant or the earthquake database of an earthquake bureau, may not be large enough to form any patterns. To meet these needs, we present a new mining model in this paper, which is based on collecting knowledge from external sources such as the Web, journals, and newspapers.

Relevance:

20.00%

Publisher:

Abstract:

The rapid growth of biological databases not only provides biologists with abundant data but also presents a major challenge for data analysis. Many data analysis approaches, such as data mining, information retrieval and machine learning, have been used to extract frequent patterns from diverse biological databases. However, discrepancies arising from differences in the structure of the databases and in their terminologies result in a significant lack of interoperability. Although ontology-based approaches have been used to integrate biological databases, the analysis of inconsistency between biological databases has been largely disregarded. This paper presents a method for measuring the degree of inconsistency between biological databases. It not only provides a guideline for correct and efficient database integration, but also exposes high-quality data for data mining and knowledge discovery.

Relevance:

20.00%

Publisher:

Abstract:

The rapid growth of life science databases demands the fusion of knowledge from heterogeneous databases to answer complex biological questions. The discrepancies in nomenclature, various schemas and incompatible formats of biological databases, however, result in a significant lack of interoperability among databases. Therefore, data preparation is a key prerequisite for biological database mining. Integrating diverse biological molecular databases is an essential action to cope with the heterogeneity of biological databases and guarantee efficient data mining. However, the inconsistency in biological databases is a key issue for data integration. This paper proposes a framework to detect the inconsistency in biological databases using ontologies. A numeric estimate is provided to measure the inconsistency and identify those biological databases that are appropriate for further mining applications. This aids in enhancing the quality of databases and guaranteeing accurate and efficient mining of biological databases.

Relevance:

20.00%

Publisher:

Abstract:

Multi-database mining is an urgent task. This thesis solves four key problems in multi-database mining: application-independent database classification, the local instance analysis model, useful pattern discovery, and pattern synthesis.

Relevance:

20.00%

Publisher:

Abstract:

This thesis aims to analyse the needs of museums in terms of computer databases, examine the ways in which these databases can assist with cataloguing and museum operations in general, and survey current database programs available. The Jewish Museum of Australia is used as a pilot study to practically apply the issues discussed.

Relevance:

20.00%

Publisher:

Abstract:

Data perturbation is a popular method for achieving privacy-preserving data mining. However, distorted databases impose enormous overheads on mining algorithms compared with the original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weaknesses of existing work. First, the concept of an independent granule is introduced, and granule inference is used to distinguish between non-independent itemsets and independent itemsets. We further prove that the support counts of non-independent itemsets can be derived directly from their subitemsets, so that the error-prone reconstruction process can be avoided; this can improve the efficiency of the algorithm and yield more accurate results. Second, through the granular-bitmap representation, the support counts can be calculated efficiently. The empirical results on representative synthetic and real-world databases indicate that the proposed GrC-FIM algorithm outperforms the popular EMASK algorithm in both efficiency and support count reconstruction accuracy.
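The abstract does not describe the granular-bitmap representation in detail. As a rough illustration of the general idea of counting supports with bitmaps, the sketch below stores one integer per item whose bit i is set when transaction i contains that item, and obtains the support of an itemset by AND-ing the bitmaps and counting the set bits. This is not the GrC-FIM algorithm: it ignores data distortion and reconstruction entirely, and the function names `build_bitmaps` and `support_count` are hypothetical.

```python
from functools import reduce


def build_bitmaps(transactions):
    """Map each item to an integer whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for index, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << index)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count the transactions that contain every item of the itemset."""
    all_rows = (1 << n_transactions) - 1  # start with every transaction selected
    combined = reduce(
        lambda acc, item: acc & bitmaps.get(item, 0),  # unknown items match no transaction
        itemset,
        all_rows,
    )
    return bin(combined).count("1")


# Toy, undistorted data used only for illustration.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitmaps = build_bitmaps(transactions)
print(support_count({"a", "b"}, bitmaps, len(transactions)))
```

Because the intersection of transaction sets becomes a single bitwise AND, the cost of counting an itemset depends on the number of items in it rather than on a scan of every transaction.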

Relevance:

20.00%

Publisher:

Abstract:

Databases of mutations causing Mendelian disease play a crucial role in research, diagnostics and genetic health care, and can play a role in life-and-death decisions. These databases are thus heavily used, but only gene- or locus-specific databases have previously been reviewed for completeness, accuracy, currency and utility. We have performed a review of the various general mutation databases that derive their data from the published literature and from locus-specific databases. Only two, the Human Gene Mutation Database (HGMD) and Online Mendelian Inheritance in Man (OMIM), had useful numbers of mutations. Comparison of a number of characteristics of these databases indicated substantial inconsistencies between the two, including absent genes and missing mutations. This situation strengthens the case for gene-specific curation of mutations and the need for an overall plan for the collection, curation, storage and release of mutation data.

Relevance:

20.00%

Publisher: