849 resultados para Incremental mining
Resumo:
Clare, A., Williams, H. E. and Lester, N. M. (2004) Scalable Multi-Relational Association Mining. In proceedings of the 4th International Conference on Data Mining ICDM '04.
Resumo:
King, R. D. and Wise, P. H. and Clare, A. (2004) Confirmation of Data Mining Based Predictions of Protein Function. Bioinformatics 20(7), 1110-1118
Resumo:
Clare, A. and King R.D. (2003) Data mining the yeast genome in a lazy functional language. In Practical Aspects of Declarative Languages (PADL'03) (won Best/Most Practical Paper award).
Resumo:
Ferr?, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear
Resumo:
Grattan, J.P., Al-Saad, Z., Gilbertson, D.D., Karaki, L.O., Pyatt, F.B 2005 Analyses of patterns of copper and lead mineralisation in human skeletons excavated from an ancient mining and smelting centre in the Jordanian desert Mineralogical Magazine. 69(5) 653-666.
Resumo:
Pyatt, B. Gilmore, G. Grattan, J. Hunt, C. McLaren, S. An imperial legacy? An exploration of the environmental impact of ancient metal mining and smelting in southern Jordan. Journal of Archaeological Science. 2000. 27 pp 771-778
Resumo:
Grattan, J., Huxley, S., Karaki, L. A., Toland, H., Gilbertson, D., Pyatt, B., Saad, Z. A. (2002). 'Death . . . more desirable than life'? The human skeletal record and toxicological implications of ancient copper mining and smelting in Wadi Faynan, southwestern Jordan. Toxicology and Industrial Health, 18 (6), 297-307.
Resumo:
Mighall, T. Abrahams, P. Grattan, J. Hayes, D. Timberlake, S. Forsyth, S. Geochemical evidence for atmospheric pollution derived from prehistoric copper mining at Copa Hill, Cwmystwyth, mid-Wales, UK. The Science of the Total Environment. 2002. 292 pp 69-80
Resumo:
Grattan, J.P., Gilbertson, D.D., Hunt, C.O. (2007). The local and global dimensions of metaliferrous air pollution derived from a reconstruction of an 8 thousand year record of copper smelting and mining at a desert-mountain frontier in southern Jordan. Journal of Archaeological Science 34, 83-110
Resumo:
The increasing practicality of large-scale flow capture makes it possible to conceive of traffic analysis methods that detect and identify a large and diverse set of anomalies. However the challenge of effectively analyzing this massive data source for anomaly diagnosis is as yet unmet. We argue that the distributions of packet features (IP addresses and ports) observed in flow traces reveals both the presence and the structure of a wide range of anomalies. Using entropy as a summarization tool, we show that the analysis of feature distributions leads to significant advances on two fronts: (1) it enables highly sensitive detection of a wide range of anomalies, augmenting detections by volume-based methods, and (2) it enables automatic classification of anomalies via unsupervised learning. We show that using feature distributions, anomalies naturally fall into distinct and meaningful clusters. These clusters can be used to automatically classify anomalies and to uncover new anomaly types. We validate our claims on data from two backbone networks (Abilene and Geant) and conclude that feature distributions show promise as a key element of a fairly general network anomaly diagnosis framework.
Resumo:
The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time-interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Two efficient methods to find frequent arrangements of temporal intervals are described; the first one is tree-based and uses depth first search to mine the set of frequent arrangements, whereas the second one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints consisting of regular expressions and gap constraints that add user-controlled focus into the mining process. Moreover, based on the extracted patterns a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.
Resumo:
A new neural network architecture is introduced for incremental supervised learning of recognition categories and multidimensional maps in response to arbitrary sequences of analog or binary input vectors. The architecture, called Fuzzy ARTMAP, achieves a synthesis of fuzzy logic and Adaptive Resonance Theory (ART) neural networks by exploiting a close formal similarity between the computations of fuzzy subsethood and ART category choice, resonance, and learning. Fuzzy ARTMAP also realizes a new Minimax Learning Rule that conjointly minimizes predictive error and maximizes code compression, or generalization. This is achieved by a match tracking process that increases the ART vigilance parameter by the minimum amount needed to correct a predictive error. As a result, the system automatically learns a minimal number of recognition categories, or "hidden units", to met accuracy criteria. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (Λ) and the MAX operator (v) of fuzzy logic play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category "boxes". Smaller vigilance values lead to larger category boxes. Improved prediction is achieved by training the system several times using different orderings of the input set. This voting strategy can also be used to assign probability estimates to competing predictions given small, noisy, or incomplete training sets. Four classes of simulations illustrate Fuzzy ARTMAP performance as compared to benchmark back propagation and genetic algorithm systems. These simulations include (i) finding points inside vs. outside a circle; (ii) learning to tell two spirals apart; (iii) incremental approximation of a piecewise continuous function; and (iv) a letter recognition database. The Fuzzy ARTMAP system is also compared to Salzberg's NGE system and to Simpson's FMMC system.
Resumo:
Accepted Version
Resumo:
BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors; with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.
Resumo:
Mountaintop mining (MTM) is the primary procedure for surface coal exploration within the central Appalachian region of the eastern United States, and it is known to contaminate streams in local watersheds. In this study, we measured the chemical and isotopic compositions of water samples from MTM-impacted tributaries and streams in the Mud River watershed in West Virginia. We systematically document the isotopic compositions of three major constituents: sulfur isotopes in sulfate (δ(34)SSO4), carbon isotopes in dissolved inorganic carbon (δ(13)CDIC), and strontium isotopes ((87)Sr/(86)Sr). The data show that δ(34)SSO4, δ(13)CDIC, Sr/Ca, and (87)Sr/(86)Sr measured in saline- and selenium-rich MTM impacted tributaries are distinguishable from those of the surface water upstream of mining impacts. These tracers can therefore be used to delineate and quantify the impact of MTM in watersheds. High Sr/Ca and low (87)Sr/(86)Sr characterize tributaries that originated from active MTM areas, while tributaries from reclaimed MTM areas had low Sr/Ca and high (87)Sr/(86)Sr. Leaching experiments of rocks from the watershed show that pyrite oxidation and carbonate dissolution control the solute chemistry with distinct (87)Sr/(86)Sr ratios characterizing different rock sources. We propose that MTM operations that access the deeper Kanawha Formation generate residual mined rocks in valley fills from which effluents with distinctive (87)Sr/(86)Sr and Sr/Ca imprints affect the quality of the Appalachian watersheds.