998 results for stream mining


Relevance: 20.00%

Abstract:

Software-programmable 'soft' processors have shown tremendous potential for efficient realisation of high-performance signal processing operations on Field Programmable Gate Arrays (FPGAs), whilst lowering the design burden by avoiding the need to design fine-grained custom circuit architectures. However, the complex data access patterns, high memory bandwidth and computational requirements of sliding window applications, such as Motion Estimation (ME) and Matrix Multiplication (MM), lead to low-performance, inefficient soft processor realisations. This paper resolves this issue, showing how, by adding support for block data addressing and accelerators for high-performance loop execution, performance and resource efficiency over four times better than current best-in-class metrics can be achieved. In addition, it demonstrates the first recorded real-time soft-processor ME realisation for H.263 systems.
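As a concrete illustration of the sliding-window access pattern the abstract refers to, the sketch below implements a plain exhaustive block-matching motion estimation search in Python. The block size, search range and search strategy are illustrative assumptions, not the paper's processor design; the point is that every candidate window overlaps its neighbours, which is exactly the redundant data access that block addressing and loop accelerators target.

```python
# Minimal sketch of the sliding-window access pattern behind block-matching
# motion estimation (SAD search). Block size, search range and the exhaustive
# strategy are assumptions for illustration only.
import numpy as np

def full_search_me(ref, cur, bx, by, block=16, search=7):
    """Find the motion vector for the block at (bx, by) in `cur`
    by exhaustive SAD search over `ref` within +/-`search` pixels."""
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```

Adjacent candidate windows share almost all of their pixels, so a processor that can only issue word-by-word loads re-fetches most of the data on every step.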

Relevance: 20.00%

Abstract:

The increasing design complexity associated with modern Field Programmable Gate Arrays (FPGAs) has prompted the emergence of 'soft'-programmable processors, which attempt to replace at least part of the custom circuit design problem with a problem of programming parallel processors. Despite substantial advances in this technology, its performance and resource efficiency for computationally complex operations remain in doubt. In this paper we present the first recorded implementation of a softcore Fast Fourier Transform (FFT) on Xilinx Virtex FPGA technology. By employing a streaming processing architecture, we show how it is possible to achieve architectures offering 1.1 GSamples/s throughput and up to 19 times speed-up over the Xilinx Radix-2 FFT dedicated circuit at comparable cost.
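For reference, a minimal radix-2 decimation-in-time FFT is sketched below; it shows the butterfly structure that a streaming architecture of the kind described can pipeline. This is the textbook formulation, not the paper's softcore implementation.

```python
# Textbook radix-2 decimation-in-time FFT, shown only to make the butterfly
# structure explicit. A streaming realisation pipelines these butterflies;
# this sketch is not the paper's architecture.
import cmath

def fft_radix2(x):
    n = len(x)
    if n == 1:
        return x[:]
    assert n % 2 == 0, "length must be a power of two"
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw           # butterfly: upper output
        out[k + n // 2] = even[k] - tw  # butterfly: lower output
    return out

print(fft_radix2([1, 0, 0, 0]))  # impulse -> flat spectrum
```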

Relevance: 20.00%

Abstract:

Association rule mining is an indispensable tool for discovering insights from large databases and data warehouses. Because the data in a warehouse are multi-dimensional, it is often useful to mine rules over subsets of data defined by selections over the dimensions. Such interactive rule mining over multi-dimensional query windows is difficult, since rule mining is computationally expensive. Current methods using pre-computation of frequent itemsets require counting some itemsets by revisiting the transaction database at query time, which is very expensive. We develop a method (RMW) that identifies the minimal set of itemsets to compute and store for each cell, so that rule mining over any query window may be performed without going back to the transaction database. We give formal proofs that the set of itemsets chosen by RMW is sufficient to answer any query, and also prove that it is the optimal set to compute for one-dimensional queries. We demonstrate through an extensive empirical evaluation that RMW achieves extremely fast query response times compared to existing methods, with only moderate overhead in pre-computation and storage.
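To make the query-time half of this idea concrete, the sketch below derives association rules directly from pre-computed itemset supports, with no access to the transaction database. It assumes the supports for a query window have already been aggregated and that every needed subset is stored (RMW's premise); it does not reproduce RMW's per-cell itemset-selection step, and the names and numbers are illustrative.

```python
# Sketch: deriving association rules from pre-computed itemset supports,
# i.e. the query-time step that needs no transaction-database access.
# Assumes every required subset itemset is present in `supports`.
from itertools import combinations

def rules_from_supports(supports, min_conf=0.6):
    """supports: dict frozenset(itemset) -> support count, assumed
    already aggregated over the query window."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = sup / supports[lhs]  # conf(A -> B) = sup(A u B) / sup(A)
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules

# Hypothetical supports aggregated for one query window
supports = {frozenset({'a'}): 10, frozenset({'b'}): 8, frozenset({'a', 'b'}): 6}
print(rules_from_supports(supports))  # a->b (conf 0.6), b->a (conf 0.75)
```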

Relevance: 20.00%

Abstract:

We address the problem of mining interesting phrases from subsets of a text corpus, where the subset is specified using a set of features, such as keywords, that form a query. Previous algorithms for the problem have proposed solutions that sift through either a phrase-dictionary-based index or a document-based index, so the solution is linear in either the phrase dictionary size or the size of the document subset. We propose the use of an independence assumption between query keywords given the top correlated phrases, whereby the pre-processing can be reduced to discovering phrases from among the top phrases per feature in the query. We then outline an indexing mechanism in which per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join may be adapted to do the scoring in real time to identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically show that very high accuracies (over 90%) are achieved against the results of exact algorithms. Due to the simplified list aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.
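A much-simplified sketch of the aggregation step is given below: each query keyword contributes a pre-computed list of (phrase, score) pairs, and scores are summed across lists to rank the top-k phrases. Real NRA and sort-merge variants add sorted access and early termination, which are omitted here; the data are illustrative.

```python
# Simplified list-aggregation sketch: sum per-keyword phrase scores and take
# the top k. NRA-style sorted access and early termination are omitted.
import heapq
from collections import defaultdict

def top_k_phrases(keyword_lists, k=10):
    """keyword_lists: dict keyword -> list of (phrase, score) pairs,
    as produced by per-keyword pre-processing."""
    totals = defaultdict(float)
    for scored_phrases in keyword_lists.values():
        for phrase, score in scored_phrases:
            totals[phrase] += score  # independence assumption: scores add
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

lists = {
    "stream": [("data stream mining", 0.9), ("stream processing", 0.7)],
    "mining": [("data stream mining", 0.8), ("text mining", 0.6)],
}
print(top_k_phrases(lists, k=2))  # [('data stream mining', 1.7), ...]
```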

Relevance: 20.00%

Abstract:

Seafloor massive sulfides (SMS) contain commercially viable quantities of high-grade ores, making them attractive prospect sites for marine mining. SMS deposits may also contain hydrothermal vent ecosystems populated by vent-endemic species of high conservation value. Responsible environmental management of these resources is best achieved by the adoption of a precautionary approach. Part of this precautionary approach involves the Environmental Impact Assessment (EIA) of exploration and exploitative activities at SMS deposits. The VentBase 2012 workshop provided a forum for stakeholders and scientists to discuss issues surrounding SMS exploration and exploitation. This forum recognised the need for a primer relating the concepts underpinning EIA at SMS deposits. The purpose of this primer is to inform policy makers about EIA at SMS deposits in order to aid management decisions. The primer offers a basic introduction to SMS deposits and their associated ecology, and to the basic requirements for EIA at SMS deposits, including initial data and information scoping, environmental survey, and ecological risk assessment.

Relevance: 20.00%

Abstract:

Mining seafloor massive sulfides for metals is an emergent industry faced with environmental management challenges. These revolve largely around the limits of our current understanding of biological variability in marine systems, a challenge common to all marine environmental management. VentBase was established as a forum where academic, commercial, governmental, and non-governmental stakeholders can develop a consensus regarding the management of exploitative activities in the deep sea. Participants advocate a precautionary approach incorporating lessons learned from coastal studies. This workshop report from VentBase encourages the standardization of sampling methodologies for deep-sea environmental impact assessment; stresses the need for the collation of spatial data and the importance of datasets amenable to robust statistical analyses; and supports the identification of set-asides to prevent the local extirpation of vent-endemic communities and to enable post-extraction recolonization of mine sites.

Relevance: 20.00%

Abstract:

Seafloor massive sulfide (SMS) mining will likely occur at hydrothermal systems in the near future. Alongside their mineral wealth, SMS deposits also have considerable biological value. Active SMS deposits host endemic hydrothermal vent communities, whilst inactive deposits support communities of deep-water corals and other suspension feeders. Mining activities are expected to remove all large organisms and suitable habitat in the immediate area, putting vent-endemic organisms particularly at risk of habitat loss and localised extinction. As part of environmental management strategies designed to mitigate the effects of mining, areas of seabed need to be protected to preserve biodiversity that is lost at the mine site and to preserve communities that support connectivity among populations of vent animals in the surrounding region. These "set-aside" areas need to be biologically similar to the mine site and be suitably connected, mostly by larval transport, to neighbouring sites to ensure exchange of genetic material among remaining populations. Establishing suitable set-asides can be a formidable task for environmental managers; however, the application of genetic approaches can aid set-aside identification, suitability assessment and monitoring. Many genetic tools are available, including analysis of mitochondrial DNA (mtDNA) sequences (e.g. COI or other suitable mtDNA genes) and appropriate nuclear DNA markers (e.g. microsatellites, single nucleotide polymorphisms), environmental DNA (eDNA) techniques and microbial metagenomics. When used in concert with traditional biological survey techniques, these tools can help to identify species, assess the genetic connectivity among populations and assess the diversity of communities. How these techniques can be applied to set-aside decision making is discussed, and recommendations are made for the genetic characteristics of set-aside sites. A checklist for environmental regulators forms a guide to aid decision making on the suitability of set-aside design and assessment using genetic tools. This non-technical primer document represents the views of participants in the VentBase 2014 workshop.

Relevance: 20.00%

Abstract:

The rapid evolution and proliferation of a world-wide computerized network, the Internet, has resulted in an overwhelming and constantly growing amount of publicly available data and information, a trend also evident in biomedicine. However, the lack of structure in textual data inhibits its direct processing by computational solutions. Information extraction is the text mining task that aims to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from the scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize the extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance. This approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large number of processing modules optimized for the biomedical domain. To offer on-demand, heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget. We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributes to more accurate updating of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
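As an illustration of the "linguistic characteristics" such machine-learning-based recognizers rely on, the sketch below extracts generic token-level features of the kind commonly used in biomedical named entity recognition. The feature set is a hypothetical example, not Gimli's actual configuration.

```python
# Illustrative token-level feature extraction for ML-based biomedical NER.
# The features are generic examples (orthographic and context cues), not the
# optimized feature set of any specific tool named in the abstract.
import re

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "is_upper": tok.isupper(),           # e.g. gene symbols like BRCA1
        "has_digit": bool(re.search(r"\d", tok)),
        "has_hyphen": "-" in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

sentence = "BRCA1 regulates p53 dependent gene expression".split()
print(token_features(sentence, 0))  # features for the token "BRCA1"
```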

Relevance: 20.00%

Abstract:

Thesis (Master's)--University of Washington, 2012

Relevance: 20.00%

Abstract:

Various studies using optical remote sensing in the marine environment have shown the possibility of spectrally discriminating benthic macro- and micro-algae. For inland water bodies, only very recently have studies explored the similar use of optical remote sensing to identify the taxonomic composition of algae and rooted plant communities. The importance of these communities for the functioning of river ecosystems warrants further research. In the study presented here, field spectroscopy is used to assess the possibilities of optically detecting macrophytes in UK chalk streams. Spectral signatures of four common macrophytes were measured using a hand-held GER1500 spectroradiometer. Despite the strong absorption of near-infrared (NIR) light in water, the results show that NIR information can clearly contribute to the detection of submerged vegetation in shallow UK chalk stream environments. Observed spectra compare well with simulated submerged vegetation spectra based on water absorption coefficients only. The field investigations, which were performed in the river Wylye, also indicate the confounding effects of specular reflection from riparian vegetation. The results of this study can inform remote sensing studies of the riverine environment using multi-spectral/low-altitude sensors. Such larger-scale studies will be highly beneficial for monitoring variation in chalk stream bioindicators, such as Ranunculus.
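A minimal sketch of the kind of simulation the abstract mentions follows: a bottom (vegetation) reflectance spectrum is attenuated by two-way absorption through the water column using a Beer-Lambert term, ignoring scattering. The coefficient values below are rough placeholders, not measurements from the study.

```python
# Simplified submerged-vegetation spectrum: bottom reflectance attenuated by
# two-way water absorption (Beer-Lambert), scattering ignored. All numbers
# are illustrative placeholders, not values from the study.
import numpy as np

def submerged_reflectance(r_bottom, absorption, depth_m):
    """Observed spectrum ~ r_bottom * exp(-2 * a(lambda) * z):
    light crosses the water column twice (down and back up)."""
    return r_bottom * np.exp(-2.0 * absorption * depth_m)

wavelengths = np.array([550, 650, 750, 850])  # nm, green to NIR
r_veg = np.array([0.08, 0.05, 0.40, 0.45])    # vegetation-like reflectance
a_water = np.array([0.06, 0.35, 2.5, 4.0])    # approx. water absorption, 1/m
print(submerged_reflectance(r_veg, a_water, depth_m=0.3))
```

Even with NIR absorption of a few per metre, a 0.3 m water column leaves a measurable NIR signal, consistent with the abstract's point that NIR remains useful in shallow chalk streams.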

Relevance: 20.00%

Abstract:

In the age of e-business, many companies are faced with massive data sets that must be analysed to gain a competitive edge. These data sets are in many instances incomplete and quite often not of very high quality. Although statistical analysis can be used to pre-process these data sets, this technique has its own limitations. In this paper we present a system - and its underlying model - that can be used to test the integrity of existing data and pre-process it into cleaner data sets to be mined. LH5 is a rule-based system, capable of self-learning, and is illustrated using a medical data set.
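The sketch below illustrates the general rule-based integrity-testing idea: declarative rules are evaluated over records and violations are flagged for cleaning. The rules are made-up examples over a toy medical record layout; LH5's actual rule language and self-learning component are not reproduced here.

```python
# Toy rule-based integrity checker: each rule is a named predicate over a
# record; violated rules are reported. Rules and fields are hypothetical.
RULES = [
    ("age_in_range", lambda r: r.get("age") is not None and 0 <= r["age"] <= 120),
    ("bp_present", lambda r: r.get("systolic_bp") is not None),
    ("bp_consistent", lambda r: r.get("systolic_bp") is None
        or r.get("diastolic_bp") is None
        or r["systolic_bp"] > r["diastolic_bp"]),
]

def check_record(record):
    """Return the names of all rules the record violates."""
    return [name for name, ok in RULES if not ok(record)]

patients = [
    {"age": 47, "systolic_bp": 120, "diastolic_bp": 80},
    {"age": 999, "systolic_bp": 60, "diastolic_bp": 90},
]
for p in patients:
    print(check_record(p))  # [] then ['age_in_range', 'bp_consistent']
```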

Relevance: 20.00%

Abstract:

This paper describes an MPEG (Moving Picture Experts Group) Audio Layer II - LFE (lower frequency extension) bit-stream processor targeting DAB (Digital Audio Broadcasting) receivers, which handles the decoding of frames in a computationally efficient manner to provide a synthesis sub-band filter with the reconstructed sub-band samples. Focus is given to the frequency sample reconstruction part, which handles the re-quantization and re-scaling of the samples once the necessary information has been extracted from the frame. A comparison with a direct implementation of the frequency sample reconstruction block demonstrates the increased computational efficiency.
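For orientation, the sketch below shows the textbook form of per-sample re-quantization and re-scaling in an MPEG-1 audio decoder (the Layer I style formula from ISO/IEC 11172-3; Layer II draws the equivalent constants from tables per quantization class). It is a simplified reference point, not the paper's optimized bit-stream processor.

```python
# Simplified per-sample re-quantization and re-scaling, Layer I style:
#   s'' = c * (s''' + d), with c = 2^nb / (2^nb - 1) and d = 2^(1-nb),
# then s' = scalefactor * s''. Layer II replaces c and d with table-driven
# per-class constants; this sketch folds everything into one function.
def requantize(code, nb, scalefactor):
    """code: raw nb-bit sample from the bit stream; returns the rescaled
    sub-band sample as a float."""
    # Map the unsigned nb-bit code to a fraction s''' in [-1, 1).
    fraction = code / (1 << (nb - 1)) - 1.0
    c = (1 << nb) / ((1 << nb) - 1)  # gain correction
    d = 2.0 ** (1 - nb)              # offset correction
    return scalefactor * c * (fraction + d)

print(requantize(code=0b101, nb=3, scalefactor=0.5))
```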