802 results for Data stream mining
Abstract:
This paper explores the concept of Value Stream Analysis and Mapping (VSA/M) as applied to Product Development (PD) efforts. Value Stream Analysis and Mapping is a method of business process improvement. The application of VSA/M began in the manufacturing community. PD efforts provide a different setting for the use of VSA/M. Site visits were made to nine major U.S. aerospace organizations. Interviews, discussions, and participatory events were used to gather data on (1) the sophistication of the tools used in PD process improvement efforts, (2) the lean context of the use of the tools, and (3) success of the efforts. It was found that all three factors were strongly correlated, suggesting success depends on both good tools and lean context. Finally, a general VSA/M method for PD activities is proposed. The method uses modified process mapping tools to analyze and improve process.
Abstract:
This seminar is a research discussion around a very interesting problem, which may be a good basis for a WAISfest theme. A little over a year ago Professor Alan Dix came to tell us of his plans for a magnificent adventure: to walk all of the way round Wales - 1000 miles - 'Alan Walks Wales'. The walk was a personal journey, but also a technological and community one, exploring the needs of the walker and the people along the way. Whilst walking he recorded his thoughts in an audio diary, took lots of photos, wrote a blog and collected data from the tech instruments he was wearing. As a result Alan has extensive quantitative data (bio-sensing and location) and qualitative data (text, images and some audio). There are challenges in analysing individual kinds of data, including merging similar data streams, entity identification, time-series and textual data mining, dealing with provenance, and ontologies for paths and journeys. There are also challenges for author and third-party annotation, linking the data-sets and visualising the merged narrative or facets of it.
Abstract:
Speaker(s): Jon Hare Organiser: Time: 25/06/2014 11:00-11:50 Location: B32/3077 Abstract: The aggregation of items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information on the social web. This task is challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised. In this talk I'll describe some of our recent work on trend and event detection in multimedia data streams. We focus on scalable streaming algorithms that can be applied to multimedia data streams from the web and the social web. The talk will cover two particular aspects of our work: mining Twitter for trending images by detecting near duplicates; and detecting social events in multimedia data with streaming clustering algorithms. I'll describe our techniques in detail, and explore open questions and areas of potential future work, in both these tasks.
Abstract:
We present a new, process-based model of soil and stream water dissolved organic carbon (DOC): the Integrated Catchments Model for Carbon (INCA-C). INCA-C is the first model of DOC cycling to explicitly include effects of different land cover types, hydrological flow paths, in-soil carbon biogeochemistry, and surface water processes on in-stream DOC concentrations. It can be calibrated using only routinely available monitoring data. INCA-C simulates daily DOC concentrations over a period of years to decades. Sources, sinks, and transformation of solid and dissolved organic carbon in peat and forest soils, wetlands, and streams as well as organic carbon mineralization in stream waters are modeled. INCA-C is designed to be applied to natural and seminatural forested and peat-dominated catchments in boreal and temperate regions. Simulations at two forested catchments showed that seasonal and interannual patterns of DOC concentration could be modeled using climate-related parameters alone. A sensitivity analysis showed that model predictions were dependent on the mass of organic carbon in the soil and that in-soil process rates were dependent on soil moisture status. Sensitive rate coefficients in the model included those for organic carbon sorption and desorption and DOC mineralization in the soil. The model was also sensitive to the amount of litter fall. Our results show the importance of climate variability in controlling surface water DOC concentrations and suggest the need for further research on the mechanisms controlling production and consumption of DOC in soils.
Abstract:
A regional overview of the water quality and ecology of the River Lee catchment is presented. Specifically, data describing the chemical, microbiological and macrobiological water quality and fisheries communities have been analysed, based on a division into river, sewage treatment works, fish-farm, lake and industrial samples. Nutrient enrichment and the highest concentrations of metals and micro-organics were found in the urbanised, lower reaches of the Lee and in the Lee Navigation. Average annual concentrations of metals were generally within environmental quality standards although, on many occasions, concentrations of cadmium, copper, lead, mercury and zinc were in excess of the standards. Various organic substances (used as herbicides, fungicides, insecticides, chlorination by-products and industrial solvents) were widely detected in the Lee system. Concentrations of ten micro-organic substances were observed in excess of their environmental quality standards, though not in terms of annual averages. Sewage treatment works were the principal point source input of nutrients, metals and micro-organic determinands to the catchment. Diffuse nitrogen sources contributed approximately 60% and 27% of the in-stream load in the upper and lower Lee respectively, whereas approximately 60% and 20% of the in-stream phosphorus load was derived from diffuse sources in the upper and lower Lee. For metals, the most significant source was the urban runoff from North London. In reaches less affected by effluent discharges, diffuse runoff from urban and agricultural areas dominated trends. High microbiological content, observed in the River Lee particularly in urbanised reaches, was far in excess of the EC Bathing Water Directive standards.
Water quality issues and degraded habitat in the lower reaches of the Lee have led to impoverished aquatic fauna but, within the mid-catchment reaches and upper agricultural tributaries, less nutrient enrichment and channel alteration have permitted more diverse aquatic fauna.
Abstract:
The beds of active ice streams in Greenland and Antarctica are largely inaccessible, hindering a full understanding of the processes that initiate, sustain and inhibit fast ice flow in ice sheets. Detailed mapping of the glacial geomorphology of palaeo-ice stream tracks is, therefore, a valuable tool for exploring the basal processes that control their behaviour. In this paper we present a map that shows detailed glacial geomorphology from a part of the Dubawnt Lake Palaeo-Ice Stream bed on the north-western Canadian Shield (Northwest Territories), which operated at the end of the last glacial cycle. The map (centred on 63°55'42"N, 102°29'11"W, approximate scale 1:90,000) was compiled from digital Landsat Enhanced Thematic Mapper Plus satellite imagery and digital and hard-copy stereo-aerial photographs. The ice stream bed is dominated by parallel mega-scale glacial lineations (MSGL), whose lengths exceed several kilometres, but the map also reveals that they have, in places, been superimposed with transverse ridges known as ribbed moraines. The ribbed moraines lie on top of the MSGL and appear to have segmented the individual lineaments. This indicates that formation of the ribbed moraines post-dates the formation of the MSGL. The presence of ribbed moraine in the onset zone of another palaeo-ice stream has been linked to oscillations between cold and warm-based ice and/or a patchwork of cold-based areas which led to acceleration and deceleration of ice velocity. Our hypothesis is that the ribbed moraines on the Dubawnt Lake Ice Stream bed are a manifestation of the process that led to ice stream shut-down and may be associated with the process of basal freeze-on. The precise formation of ribbed moraines, however, remains open to debate and field observation of their structure will provide valuable data for formal testing of models of their formation.
Abstract:
Ascertaining the location of palaeo-ice streams is crucial in order to produce accurate reconstructions of palaeo-ice sheets and examine interactions with the ocean-climate system. This paper reports evidence for a major ice stream in Amundsen Gulf, Canadian Arctic Archipelago. Mapping from satellite imagery (Landsat ETM+) and digital elevation models, including bathymetric data, is used to reconstruct flow-patterns on southwestern Victoria Island and the adjacent mainland (Nunavut and Northwest Territories). Several flow-sets indicative of ice streaming are found feeding into the marine trough and cross-cutting relationships between these flow-sets (and utilising previously published radiocarbon dates) reveal several phases of ice stream activity centred in Amundsen Gulf and Dolphin and Union Strait. A large erosional footprint on the continental shelf indicates that the ice stream (ca. 1000 km long and ca. 150 km wide) filled Amundsen Gulf, probably at the Last Glacial Maximum. Subsequent to this, the ice stream reorganised as the margin retreated back along the marine trough, eventually splitting into two separate low-gradient lobes in Prince Albert Sound and Dolphin and Union Strait. The location of this major ice stream holds important implications for ice sheet-ocean interactions and specifically, the development of Arctic Ocean ice shelves and the delivery of icebergs into the western Arctic Ocean during the late Pleistocene. Copyright (C) 2006 John Wiley & Sons, Ltd.
Abstract:
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
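As a rough illustration of receiver-initiated load balancing on an irregular search tree, the following single-process Python sketch simulates idle workers asking a randomly chosen peer for part of its pending work. It is not the algorithm from the paper: the tree generator `children`, the depth limit, the random donor choice and the rule of donating half of the donor's queue are all assumptions made for this toy example.

```python
import random
from collections import deque

MAX_DEPTH = 8  # assumed depth limit for the toy search tree


def children(node):
    """Hypothetical irregular tree: the branching factor (0-3) depends on the path."""
    if len(node) >= MAX_DEPTH:
        return []
    k = (3 * sum(node) + len(node) + 1) % 4
    return [node + (i,) for i in range(k)]


def count_sequential(root=()):
    """Reference answer: size of the tree visited by a single worker."""
    stack, seen = [root], 0
    while stack:
        node = stack.pop()
        seen += 1
        stack.extend(children(node))
    return seen


def run_receiver_initiated(num_workers=4, seed=0):
    """Round-robin simulation; returns the number of nodes each worker processed."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(())  # all work starts on worker 0
    processed = [0] * num_workers
    while any(queues):
        for w, queue in enumerate(queues):
            if not queue:
                # Receiver-initiated step: the idle worker asks a random peer.
                donor = rng.randrange(num_workers)
                pending = queues[donor]
                if donor != w and len(pending) > 1:
                    # The donor hands over half of its oldest (shallowest) nodes.
                    for _ in range(len(pending) // 2):
                        queue.append(pending.popleft())
                continue
            node = queue.pop()  # depth-first expansion of local work
            processed[w] += 1
            queue.extend(children(node))
    return processed


if __name__ == "__main__":
    print(run_receiver_initiated(), count_sequential())
```

Because the toy tree is deterministic, the per-worker counts must sum to the sequential count, which gives a simple check that no node is lost or duplicated when work is handed over.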
Abstract:
Recently, two approaches have been introduced that distribute the molecular fragment mining problem. The first approach applies a master/worker topology; the second approach, a completely distributed peer-to-peer system, solves the scalability problem due to the bottleneck at the master node. However, in many real-world scenarios the participating computing nodes cannot communicate directly due to administrative policies such as security restrictions. Thus, potential computing power is not accessible to accelerate the mining run. To solve this shortcoming, this work introduces a hierarchical topology of computing resources, which distributes the management over several levels and adapts to the natural structure of those multi-domain architectures. The most important aspect is the load balancing scheme, which has been designed and optimized for the hierarchical structure. The approach allows dynamic aggregation of heterogeneous computing resources and is applied to wide area network scenarios.
Abstract:
In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets with enormous size, high dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
Abstract:
Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Abstract:
This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations is consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.
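For orientation, the textbook frequentist calculation for the two-stream normal case with known common variance can be sketched as below. This is the standard formula n = 2σ²(z₁₋α/₂ + z_power)²/δ² per arm, not the Bayesian procedure of the paper, whose details the abstract does not give. The function name `n_per_arm`, the two-sided significance level and the default values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.9):
    """Frequentist sample size per arm for comparing two normal means.

    delta: difference in means to be detected (same units as sigma)
    sigma: known common standard deviation
    alpha: two-sided significance level (assumed convention)
    power: required probability of detecting a true difference of delta
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_power) / delta) ** 2)


if __name__ == "__main__":
    # Detect a difference of half a standard deviation.
    print(n_per_arm(delta=0.5, sigma=1.0, power=0.9))
    print(n_per_arm(delta=0.5, sigma=1.0, power=0.8))
```

Halving `delta` roughly quadruples the required sample size, since the per-arm size scales with (σ/δ)².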
Abstract:
The high variability of the intensity of suprathermal electron flux in the solar wind is usually ascribed to the high variability of sources on the Sun. Here we demonstrate that a substantial amount of the variability arises from peaks in stream interaction regions, where fast wind runs into slow wind and creates a pressure ridge at the interface. Superposed epoch analysis centered on stream interfaces in 26 interaction regions previously identified in Wind data reveals a twofold increase in 250 eV flux (integrated over pitch angle). Whether the peaks result from the compression there or are solar signatures of the coronal hole boundary, to which interfaces may map, is an open question. Suggestive of the latter, some cases show a displacement between the electron and magnetic field peaks at the interface. Since solar information is transmitted to 1 AU much more quickly by suprathermal electrons compared to convected plasma signatures, the displacement may imply a shift in the coronal hole boundary through transport of open magnetic flux via interchange reconnection. If so, however, the fact that displacements occur in both directions and that the electron and field peaks in the superposed epoch analysis are nearly coincident indicates that any systematic transport expected from differential solar rotation is overwhelmed by a random pattern, possibly owing to transport across a ragged coronal hole boundary.
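Superposed epoch analysis, as used in the abstract above, aligns a time series on a set of key times (here, stream interfaces) and averages across events. A minimal Python sketch of the generic technique follows; the function name `superposed_epoch`, the fixed window half-width and the policy of skipping events with incomplete windows are assumptions for illustration, not details taken from the study.

```python
def superposed_epoch(series, event_indices, half_window):
    """Average fixed-width windows of `series` centred on each event index.

    Returns a list of length 2 * half_window + 1; element `half_window`
    corresponds to the event (epoch zero) itself.
    """
    width = 2 * half_window + 1
    sums = [0.0] * width
    used = 0
    for idx in event_indices:
        start, stop = idx - half_window, idx + half_window + 1
        if start < 0 or stop > len(series):
            continue  # assumed policy: skip events whose window is incomplete
        for k, value in enumerate(series[start:stop]):
            sums[k] += value
        used += 1
    if used == 0:
        raise ValueError("no event has a complete window")
    return [total / used for total in sums]


if __name__ == "__main__":
    # Toy flux record with enhancements at indices 2 and 6.
    flux = [1, 1, 2, 1, 1, 1, 4, 1, 1]
    print(superposed_epoch(flux, [2, 6], half_window=1))
```

In the toy record the averaged profile peaks at epoch zero, which is the kind of signature the analysis is designed to bring out from noisy individual events.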