13 resultados para MULTI-RELATIONAL DATA MINING
Peat multi-proxy data from Mannikjarve bog as indicators of late Holocene climate changes in Estonia
Resumo:
As part of a wider project on European climate change over the past 4500 years, a 4.5-m peat core was taken from a lawn microform on Mannikjarve bog, Estonia. Several methods were used to yield proxy-climate data: (i) a quadrat and leaf-count method for plant macrofossil data, (ii) testate amoebae analysis, and (iii) colorimetric determination of peat humification. These data are provided with an exceptionally high resolution and precise chronology. Changes in bog surface wetness were inferred using Detrended Correspondence Analysis (DCA) and zonation of macrofossil data, particularly concerning the occurrence of Sphagnum balticum, and a transfer function for water-table depth for testate amoebae data. Based on the results, periods of high bog surface wetness appear to have occurred at c. 3100, 3010-2990, 2300, 1750-1610, 1510, 14 10, 1110, 540 and 3 10 cal. yr BP, during four longer periods between c. 3170 and 2850 cal. yr BP, 2450 and 2000 cal. yr BP, 1770 and 1530 cal. yr BP and in the period from 880 cal. yr BP until the present. In the period between 1770 and 1530 cal. yr BP. the extension or initiation of a hollow microtope occurred, which corresponds with other research results from Mannikjarve bog. This and other changes towards increasing bog surface wetness may be the responses to colder temperatures and the predominance of a more continental climate in the region, which favoured the development of bog microdepressions and a complex bog microtopography. Located in the border zone of oceanic and continental climatic sectors, in an area almost without land uplift, this study site may provide valuable information about changes in palaeohydrological and palaeoclimatological conditions in the northern parts of the eastern Baltic Sea region.
Resumo:
In the last decade, data mining has emerged as one of the most dynamic and lively areas in information technology. Although many algorithms and techniques for data mining have been proposed, they either focus on domain independent techniques or on very specific domain problems. A general requirement in bridging the gap between academia and business is to cater to general domain-related issues surrounding real-life applications, such as constraints, organizational factors, domain expert knowledge, domain adaption, and operational knowledge. Unfortunately, these either have not been addressed, or have not been sufficiently addressed, in current data mining research and development.Domain-Driven Data Mining (D3M) aims to develop general principles, methodologies, and techniques for modeling and merging comprehensive domain-related factors and synthesized ubiquitous intelligence surrounding problem domains with the data mining process, and discovering knowledge to support business decision-making. This paper aims to report original, cutting-edge, and state-of-the-art progress in D3M. It covers theoretical and applied contributions aiming to: 1) propose next-generation data mining frameworks and processes for actionable knowledge discovery, 2) investigate effective (automated, human and machine-centered and/or human-machined-co-operated) principles and approaches for acquiring, representing, modelling, and engaging ubiquitous intelligence in real-world data mining, and 3) develop workable and operational systems balancing technical significance and applications concerns, and converting and delivering actionable knowledge into operational applications rules to seamlessly engage application processes and systems.
Resumo:
Background. The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes of more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 103 to ca. 104 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 105 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions. Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.
Resumo:
Unmanned surface vehicles (USVs) are able to accomplish difficult and challenging tasks both in civilian and defence sectors without endangering human lives. Their ability to work round the clock makes them well-suited for matters that demand immediate attention. These issues include but not limited to mines countermeasures, measuring the extent of an oil spill and locating the source of a chemical discharge. A number of USV programmes have emerged in the last decade for a variety of aforementioned purposes. Springer USV is one such research project highlighted in this paper. The intention herein is to report results emanating from data acquired from experiments on the Springer vessel whilst testing its advanced navigation, guidance and control (NGC) subsystems. The algorithms developed for these systems are based on soft-computing methodologies. A novel form of data fusion navigation algorithm has been developed and integrated with a modified optimal controller. Experimental results are presented and analysed for various scenarios including single and multiple waypoints tracking and fixed and time-varying reference bearings. It is demonstrated that the proposed NGC system provides promising results despite the presence of modelling uncertainty and external disturbances.
Resumo:
We conducted data-mining analyses of genome wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r(2)>0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P=1.10 × 10(-3) and rs2274736, P=1.21 × 10(-3)). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR=0.92, 95% CI: 0.86-0.97, P=5.45 × 10(-3) and rs2401751, OR=0.92, 95% CI: 0.86-0.97, P=5.29 × 10(-3)). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR=1.08, 95% CI: 1.02-1.14, P=6.43 × 10(-3)). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given the results that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.
Resumo:
Association rule mining is an indispensable tool for discovering
insights from large databases and data warehouses.
The data in a warehouse being multi-dimensional, it is often
useful to mine rules over subsets of data defined by selections
over the dimensions. Such interactive rule mining
over multi-dimensional query windows is difficult since rule
mining is computationally expensive. Current methods using
pre-computation of frequent itemsets require counting
of some itemsets by revisiting the transaction database at
query time, which is very expensive. We develop a method
(RMW) that identifies the minimal set of itemsets to compute
and store for each cell, so that rule mining over any
query window may be performed without going back to the
transaction database. We give formal proofs that the set of
itemsets chosen by RMW is sufficient to answer any query
and also prove that it is the optimal set to be computed
for 1 dimensional queries. We demonstrate through an extensive
empirical evaluation that RMW achieves extremely
fast query response time compared to existing methods, with
only moderate overhead in pre-computation and storage
Resumo:
Achieving a clearer picture of categorial distinctions in the brain is essential for our understanding of the conceptual lexicon, but much more fine-grained investigations are required in order for this evidence to contribute to lexical research. Here we present a collection of advanced data-mining techniques that allows the category of individual concepts to be decoded from single trials of EEG data. Neural activity was recorded while participants silently named images of mammals and tools, and category could be detected in single trials with an accuracy well above chance, both when considering data from single participants, and when group-training across participants. By aggregating across all trials, single concepts could be correctly assigned to their category with an accuracy of 98%. The pattern of classifications made by the algorithm confirmed that the neural patterns identified are due to conceptual category, and not any of a series of processing-related confounds. The time intervals, frequency bands and scalp locations that proved most informative for prediction permit physiological interpretation: the widespread activation shortly after appearance of the stimulus (from 100. ms) is consistent both with accounts of multi-pass processing, and distributed representations of categories. These methods provide an alternative to fMRI for fine-grained, large-scale investigations of the conceptual lexicon. © 2010 Elsevier Inc.
Resumo:
The last decade has witnessed an unprecedented growth in availability of data having spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that demonstrate significantly different behavior from their neighbors could be of interest for various application scenarios such as – weather modeling, analyzing spread of disease outbreaks, monitoring traffic congestions, and so on. In this paper, we propose an automated approach of exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is recovered. Our approach differs significantly from traditional methods of spatial outlier detection, and employs two phases – i) discovering homogeneous regions, and ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
Resumo:
The problem of detecting spatially-coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications across areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan Statistics are methods from the statistics community that have considered the problem of identifying regions where data objects exhibit a behavior that is atypical of the general dataset. The spatial scan statistic and methods that build upon it mostly adopt the framework of defining a character for regions (e.g., circular or elliptical) of objects and repeatedly sampling regions of such character followed by applying a statistical test for anomaly detection. In the past decade, there have been efforts from the statistics community to enhance efficiency of scan statstics as well as to enable discovery of arbitrarily shaped anomalous regions. On the other hand, the data mining community has started to look at determining anomalous regions that have behavior divergent from their neighborhood.In this chapter,we survey the space of techniques for detecting anomalous regions on spatial data from across the data mining and statistics communities while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them appropriately to provide a structured birds eye view of the work on anomalous region detection;we hope that this would encourage better cross-pollination of ideas across communities to help advance the frontier in anomaly detection.
Resumo:
This paper presents a numerical study of a linear compressor cascade to investigate the effective end wall profiling rules for highly-loaded axial compressors. The first step in the research applies a correlation analysis for the different flow field parameters by a data mining over 600 profiling samples to quantify how variations of loss, secondary flow and passage vortex interact with each other under the influence of a profiled end wall. The result identifies the dominant role of corner separation for control of total pressure loss, providing a principle that only in the flow field with serious corner separation does the does the profiled end wall change total pressure loss, secondary flow and passage vortex in the same direction. Then in the second step, a multi-objective optimization of a profiled end wall is performed to reduce loss at design point and near stall point. The development of effective end wall profiling rules is based on the manner of secondary flow control rather than the geometry features of the end wall. Using the optimum end wall cases from the Pareto front, a quantitative tool for analyzing secondary flow control is employed. The driving force induced by a profiled end wall on different regions of end wall flow are subjected to a detailed analysis and identified for their positive/negative influences in relieving corner separation, from which the effective profiling rules are further confirmed. It is found that the profiling rules on a cascade show distinct differences at design point and near stall point, thus loss control of different operating points is generally independent.