25 results for mining data streams
Abstract:
In this paper we propose a graph stream clustering algorithm with a unified similarity measure on both structural and attribute properties of vertices, with each attribute being treated as a vertex. Unlike other approaches, ours does not require the number of clusters as an input parameter; instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation into how parameters affect the algorithm's performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm, which shows that our streaming algorithm can achieve comparable or better average cluster purity.
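The abstract reports average cluster purity but does not define it. As a rough illustration, the sketch below uses one common definition: the share of each cluster's vertices that carry the cluster's majority ground-truth label, averaged over clusters. The function name, the input layout, and the toy data are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def average_cluster_purity(clusters, labels):
    """Average, over non-empty clusters, of the majority-label fraction.

    clusters: dict mapping a cluster id to a list of vertex ids (assumed layout).
    labels:   dict mapping a vertex id to its ground-truth label (assumed layout).
    """
    purities = []
    for members in clusters.values():
        if not members:
            continue  # an empty cluster has no defined purity
        label_counts = Counter(labels[v] for v in members)
        majority_size = label_counts.most_common(1)[0][1]
        purities.append(majority_size / len(members))
    return sum(purities) / len(purities)


# Toy data: cluster c1 is 3/4 pure, cluster c2 is fully pure.
clusters = {"c1": ["a", "b", "c", "d"], "c2": ["e", "f"]}
labels = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "y", "f": "y"}
purity = average_cluster_purity(clusters, labels)
print(purity)
```

Other definitions weight each cluster by its size, so a comparison against published figures would need the paper's exact formula.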
Abstract:
The cerebral cortex contains circuitry for continuously computing properties of the environment and one's body, as well as relations among those properties. The success of complex perceptuomotor performances requires integrated, simultaneous use of such relational information. Ball catching is a good example as it involves reaching and grasping of visually pursued objects that move relative to the catcher. Although integrated neural control of catching has received sparse attention in the neuroscience literature, behavioral observations have led to the identification of control principles that may be embodied in the involved neural circuits. Here, we report a catching experiment that refines those principles via a novel manipulation. Visual field motion was used to perturb velocity information about balls traveling on various trajectories relative to a seated catcher, with various initial hand positions. The experiment produced evidence for a continuous, prospective catching strategy, in which hand movements are planned based on gaze-centered ball velocity and ball position information. Such a strategy was implemented in a new neural model, which suggests how position, velocity, and temporal information streams combine to shape catching movements. The model accurately reproduces the main and interaction effects found in the behavioral experiment and provides an interpretation of recently observed target motion-related activity in the motor cortex during interceptive reaching by monkeys. It functionally interprets a broad range of neurobiological and behavioral data, and thus contributes to a unified theory of the neural control of reaching to stationary and moving targets.
Abstract:
Promoter hypermethylation is central in deregulating gene expression in cancer. Identification of novel methylation targets in specific cancers provides a basis for their use as biomarkers of disease occurrence and progression. We developed an in silico strategy to globally identify potential targets of promoter hypermethylation in prostate cancer by screening for 5' CpG islands in 631 genes that were reported as downregulated in prostate cancer. A virtual archive of 338 potential targets of methylation was produced. One candidate, IGFBP3, was selected for investigation, along with glutathione-S-transferase pi (GSTP1), a well-known methylation target in prostate cancer. Methylation of IGFBP3 was detected by quantitative methylation-specific PCR in 49/79 primary prostate adenocarcinoma and 7/14 adjacent preinvasive high-grade prostatic intraepithelial neoplasia, but in only 5/37 benign prostatic hyperplasia (P < 0.0001) and in 0/39 histologically normal adjacent prostate tissue, which implies that methylation of IGFBP3 may be involved in the early stages of prostate cancer development. Hypermethylation of IGFBP3 was only detected in samples that also demonstrated methylation of GSTP1 and was also correlated with Gleason score ≥ 7 (P = 0.01), indicating that it has potential as a prognostic marker. In addition, pharmacological demethylation induced strong expression of IGFBP3 in LNCaP prostate cancer cells. Our concept of a methylation candidate gene bank was successful in identifying a novel target of frequent hypermethylation in early-stage prostate cancer. Evaluation of further relevant genes could contribute towards a methylation signature of this disease.
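To make the reported detection counts easier to compare, the short sketch below converts them to proportions. The counts come from the abstract; the dictionary keys and variable names are labels chosen for this example. It does not attempt to reproduce the reported P values, since the abstract does not say which test or comparison produced them.

```python
# Detection counts quoted in the abstract: (methylated samples, total samples).
counts = {
    "primary adenocarcinoma": (49, 79),
    "high-grade PIN": (7, 14),
    "benign hyperplasia": (5, 37),
    "normal adjacent tissue": (0, 39),
}

# Fraction of samples in each tissue group with detectable IGFBP3 methylation.
rates = {tissue: hits / total for tissue, (hits, total) in counts.items()}

for tissue, rate in rates.items():
    print(f"{tissue}: {rate:.1%}")
```

This gives roughly 62%, 50%, 14%, and 0% for the four tissue groups, in the order listed.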
Abstract:
The last decade has witnessed an unprecedented growth in the availability of data with spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that demonstrate significantly different behavior from their neighbors could be of interest for application scenarios such as weather modeling, analyzing the spread of disease outbreaks, and monitoring traffic congestion. In this paper, we propose an automated approach for exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is obtained. Our approach differs significantly from traditional methods of spatial outlier detection and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
Abstract:
The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications across areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that have considered the problem of identifying regions where data objects exhibit behavior that is atypical of the general dataset. The spatial scan statistic and methods that build upon it mostly adopt the framework of defining a character for regions of objects (e.g., circular or elliptical) and repeatedly sampling regions of such character, followed by applying a statistical test for anomaly detection. In the past decade, there have been efforts from the statistics community to enhance the efficiency of scan statistics as well as to enable discovery of arbitrarily shaped anomalous regions. On the other hand, the data mining community has started to look at determining anomalous regions whose behavior diverges from that of their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions in spatial data from across the data mining and statistics communities while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them appropriately to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities to help advance the frontier in anomaly detection.
Abstract:
Association rule mining is an indispensable tool for discovering insights from large databases and data warehouses. Because the data in a warehouse is multi-dimensional, it is often useful to mine rules over subsets of data defined by selections over the dimensions. Such interactive rule mining over multi-dimensional query windows is difficult since rule mining is computationally expensive. Current methods using pre-computation of frequent itemsets require counting of some itemsets by revisiting the transaction database at query time, which is very expensive. We develop a method (RMW) that identifies the minimal set of itemsets to compute and store for each cell, so that rule mining over any query window may be performed without going back to the transaction database. We give formal proofs that the set of itemsets chosen by RMW is sufficient to answer any query and also prove that it is the optimal set to be computed for one-dimensional queries. We demonstrate through an extensive empirical evaluation that RMW achieves extremely fast query response time compared to existing methods, with only moderate overhead in pre-computation and storage.
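The idea of answering a query window from per-cell itemset counts, without rescanning transactions, can be sketched as follows. This is not RMW: the sketch simply assumes every needed itemset count is already stored in every cell, whereas choosing which itemsets to store per cell is the problem the abstract says RMW addresses. The cell keys, the counts, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical pre-computed store: cell -> (transaction count, itemset support counts).
A, B, AB = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
cells = {
    ("2012", "north"): (100, Counter({A: 60, B: 50, AB: 40})),
    ("2012", "south"): (50, Counter({A: 20, B: 30, AB: 10})),
    ("2013", "north"): (80, Counter({A: 30, B: 40, AB: 25})),
}


def window_counts(cells, window):
    """Sum transaction and itemset counts over the cells in a query window."""
    total = 0
    counts = Counter()
    for cell in window:
        n, cell_counts = cells[cell]
        total += n
        counts.update(cell_counts)  # support counts are additive across disjoint cells
    return total, counts


def confidence(counts, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from aggregated counts."""
    return counts[antecedent | consequent] / counts[antecedent]


# Query window: all regions in 2012.
total, counts = window_counts(cells, [("2012", "north"), ("2012", "south")])
support_ab = counts[AB] / total
conf_a_to_b = confidence(counts, A, B)
print(total, support_ab, conf_a_to_b)
```

The sketch relies on support counts being additive over disjoint cells. An itemset that is frequent in the window need not be frequent in each cell, so a store that keeps only locally frequent itemsets may lack counts the window needs.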
Abstract:
Seafloor massive sulfides (SMS) contain commercially viable quantities of high-grade ores, making them attractive prospect sites for marine mining. SMS deposits may also contain hydrothermal vent ecosystems populated by high conservation value vent-endemic species. Responsible environmental management of these resources is best achieved by the adoption of a precautionary approach. Part of this precautionary approach involves the Environmental Impact Assessment (EIA) of exploration and exploitative activities at SMS deposits. The VentBase 2012 workshop provided a forum for stakeholders and scientists to discuss issues surrounding SMS exploration and exploitation. This forum recognised the requirement for a primer which would relate concepts underpinning EIA at SMS deposits. The purpose of this primer is to inform policy makers about EIA at SMS deposits in order to aid management decisions. The primer offers a basic introduction to SMS deposits and their associated ecology, and the basic requirements for EIA at SMS deposits, including initial data and information scoping, environmental survey, and ecological risk assessment. © 2013 Elsevier Ltd.
Abstract:
Mining seafloor massive sulfides for metals is an emergent industry faced with environmental management challenges. These revolve largely around limits to our current understanding of biological variability in marine systems, a challenge common to all marine environmental management. VentBase was established as a forum where academic, commercial, governmental, and non-governmental stakeholders can develop a consensus regarding the management of exploitative activities in the deep sea. Participants advocate a precautionary approach with the incorporation of lessons learned from coastal studies. This workshop report from VentBase encourages the standardization of sampling methodologies for deep-sea environmental impact assessment. VentBase stresses the need for the collation of spatial data and the importance of datasets amenable to robust statistical analyses. VentBase supports the identification of set-asides to prevent the local extirpation of vent-endemic communities and for the post-extraction recolonization of mine sites. © 2013.
Abstract:
Biogas from anaerobic digestion of sewage sludge is a renewable resource with high energy content, which is composed mainly of CH4 (40-75 vol.%) and CO2 (15-60 vol.%). Other components such as water (H2O, 5-10 vol.%) and trace amounts of hydrogen sulfide and siloxanes can also be present. A CH4-rich stream can be produced by removing the CO2 and other impurities so that the upgraded bio-methane can be injected into the natural gas grid or used as a vehicle fuel. The main objective of this paper is to develop a new modeling methodology to assess the technical and economic performance of biogas upgrading processes using ionic liquids which physically absorb CO2. Three different ionic liquids, namely 1-ethyl-3-methylimidazolium bis[(trifluoromethyl)sulfonyl]imide, 1-hexyl-3-methylimidazolium bis[(trifluoromethyl)sulfonyl]imide and trihexyl(tetradecyl)phosphonium bis[(trifluoromethyl)sulfonyl]imide, are considered for CO2 capture in a pressure-swing regenerative absorption process. The simulation software packages Aspen Plus and Aspen Process Economic Analyzer are used to account for mass and energy balances as well as equipment cost. In all cases, the biogas upgrading plant consists of a multistage compressor for biogas compression, a packed absorption column for CO2 absorption, a flash evaporator for solvent regeneration, a centrifugal pump for solvent recirculation, a pre-absorber solvent cooler and a gas turbine for electricity recovery. The evaluated processes are compared in terms of energy efficiency, capital investment and bio-methane production costs. The overall plant efficiency ranges from 71 to 86%, whereas the bio-methane production cost ranges from £6.26 to £7.76 per GJ (LHV). A sensitivity analysis is also performed to determine how several technical and economic parameters affect the bio-methane production costs.
The results of this study show that the simulation methodology developed can predict plant efficiencies and production costs of large-scale CO2 capture processes using ionic liquids without having to rely on experimental gas solubility data.