970 results for Data Streams Distribution


Relevance:

90.00%

Publisher:

Abstract:

Understanding the functioning of a neural system in terms of its underlying circuitry is an important problem in neuroscience. Recent developments in electrophysiology and imaging allow one to simultaneously record the activities of hundreds of neurons. Inferring the underlying neuronal connectivity patterns from such multi-neuronal spike train data streams is a challenging statistical and computational problem. This task involves finding significant temporal patterns in vast amounts of symbolic time series data. In this paper we show that frequent episode mining methods from the field of temporal data mining can be very useful in this context. In the frequent episode discovery framework, the data is viewed as a sequence of events, each characterized by an event type and its time of occurrence, and episodes are certain types of temporal patterns in such data. Here we show that, using the set of discovered frequent episodes from multi-neuronal data, one can infer different types of connectivity patterns in the neural system that generated it. For this purpose, we introduce the notion of mining for frequent episodes under certain temporal constraints; the structure of these constraints is motivated by the application. We present algorithms for discovering serial and parallel episodes under these temporal constraints. Through extensive simulation studies we demonstrate that these methods are useful for unearthing patterns of neuronal network connectivity.
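As a rough illustration of the episode framework described above, the sketch below counts non-overlapped occurrences of a serial episode (e.g. neuron A firing, then B, then C) where consecutive events must fall within a given inter-event time window. The greedy tracker, the window bounds and all names are illustrative assumptions, not the authors' counting algorithm.

```python
from typing import List, Tuple

# An event stream is a time-ordered list of (event_type, timestamp) pairs,
# e.g. neuron labels and spike times.
Event = Tuple[str, float]

def count_serial_episode(events: List[Event],
                         episode: List[str],
                         t_low: float,
                         t_high: float) -> int:
    """Count non-overlapped occurrences of a serial episode (A -> B -> C ...)
    where each consecutive pair of matched events is separated by a delay in
    [t_low, t_high]. Greedy left-to-right tracking; illustrative only."""
    count = 0
    pos = 0            # index of the next episode element we are waiting for
    last_time = None   # time of the previously matched element
    for ev_type, t in events:
        if ev_type == episode[pos]:
            if pos == 0 or (t_low <= t - last_time <= t_high):
                last_time = t
                pos += 1
                if pos == len(episode):   # full occurrence found
                    count += 1
                    pos = 0
                    last_time = None
    return count

# Example: how often does A fire, then B, then C, each within 1-5 time units
# of the previous spike? (Units and values are arbitrary.)
spikes = [("A", 0.0), ("B", 2.0), ("C", 5.0), ("A", 10.0), ("B", 18.0)]
print(count_serial_episode(spikes, ["A", "B", "C"], 1.0, 5.0))  # -> 1
```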

Relevance:

90.00%

Publisher:

Abstract:

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships, which in turn lead to a better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining have been proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining. We mainly concentrate on algorithms for pattern discovery in sequential data streams. We also describe some recent results regarding statistical analysis of pattern discovery methods.

Relevance:

90.00%

Publisher:

Abstract:

In this paper we present a framework for realizing arbitrary instruction set extensions (IEs) that are identified post-silicon. The proposed framework has two components, viz. an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or that must process data streams. The reconfigurable hardware, called HyperCell, comprises a reconfigurable execution fabric; the fabric is a collection of interconnected compute units. A typical use case of HyperCell is one where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. By fully pipelining the data-path, we show significant improvement in performance for streaming applications over general-purpose processor based solutions.
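The following toy sketch illustrates the general idea of overlapping memory transactions with computation in a fully pipelined streaming data-path; it is only a software analogy, and the block size, helper names and thread-based prefetching are illustrative assumptions rather than anything specific to HyperCell.

```python
from concurrent.futures import ThreadPoolExecutor

def load_block(i):
    """Stand-in for a memory transaction fetching input block i into local storage."""
    return list(range(i * 4, i * 4 + 4))

def compute_block(block):
    """Stand-in for the instruction set extension executing on the compute fabric."""
    return [x * x for x in block]

def pipelined_stream(n_blocks):
    """Software-pipelined loop: block i+1 is fetched while block i is computed,
    so data transfers overlap with computation."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(load_block, 0)
        for i in range(n_blocks):
            block = pending.result()                  # wait for the current block
            if i + 1 < n_blocks:                      # start fetching the next one
                pending = prefetcher.submit(load_block, i + 1)
            results.extend(compute_block(block))      # overlaps with the prefetch
    return results

print(pipelined_stream(3))
```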

Relevance:

90.00%

Publisher:

Abstract:

Since streaming data keeps arriving continuously as an ordered sequence, massive amounts of data are created. A big challenge in handling data streams is the limitation of time and space. Prototype selection on streaming data requires the prototypes to be updated incrementally as new data comes in. We propose an incremental algorithm for prototype selection, which can also be used to handle very large datasets. Results are presented on a number of large datasets, and our method is compared with an existing algorithm for streaming data. Our algorithm saves time, and the prototypes selected give good classification accuracy.
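The abstract does not spell out the selection rule, so the sketch below uses a generic one-pass, leader-style condensation as a stand-in: a point becomes a prototype only if no existing prototype of its class lies within a distance threshold. The threshold, the data and the function name are illustrative assumptions, not the proposed algorithm.

```python
import math

def incremental_prototype_selection(stream, threshold):
    """One-pass, leader-style prototype selection (an illustrative stand-in):
    a new point becomes a prototype only if no existing prototype of its
    class lies within `threshold` of it."""
    prototypes = []                       # list of (feature_vector, label)
    for x, label in stream:
        covered = any(
            lab == label and math.dist(x, p) <= threshold
            for p, lab in prototypes
        )
        if not covered:
            prototypes.append((x, label))
    return prototypes

stream = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"), ((5.0, 5.0), "b"), ((0.2, 0.0), "a")]
print(incremental_prototype_selection(stream, threshold=0.5))
# -> [((0.0, 0.0), 'a'), ((5.0, 5.0), 'b')]
```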

Relevance:

90.00%

Publisher:

Abstract:

Paralarval and juvenile cephalopods collected in plankton samples on 21 western North Atlantic cruises were identified and enumerated. The 3731 specimens were assigned to 44 generic and specific taxa. This paper describes their spatial and temporal distributions and their developmental morphology. The smallest paralarvae recognized for a number of species are identified and illustrated. The two most abundant and most frequently collected taxa were identifiable to species based on known systematic characters of young, as well as on distribution of the adults. These were the neritic squids Loligo pealeii and Illex illecebrosus collected north of Cape Hatteras, both valuable fishery resources. Other abundant taxa included two morphotypes of ommastrephids, at least five species of enoploteuthids, two species of onychoteuthids, and unidentified octopods. Most taxa were distributed widely both in time and in space, although some seasonal and mesoscale-spatial patterns were indicated. The taxa that appeared to have distinct seasonal distribution included most of the neritic species and, surprisingly, the young of the bathypelagic cranchiids. In eight seasonal cruises over the continental shelf of the middle U.S. Atlantic states, neritic taxa demonstrated approximately the same seasonal patterns during two consecutive years. Interannual differences in the oceanic taxa collected on the shelf were extreme. The highest abundance and diversity of planktonic cephalopods in the oceanic samples were consistently found in the vicinity of the Gulf Stream. Only eight of the oceanic taxa appeared to have limited areal distributions, compared with twelve taxa that were found throughout the western North Atlantic regions sampled in this study. Many taxa, however, were not collected frequently enough to describe seasonal or spatial patterns. Comparisons with published accounts of other cephalopod surveys indicate both strengths and weaknesses in various sampling techniques for capturing the young of oceanic cephalopods. Enoploteuthids were abundant both in our study and in other studies using midwater trawls in several areas of the North Atlantic. Thus, this family probably is adequately sampled over its developmental range. In contrast, octopoteuthids and chtenopterygiids are rare in collections made by small to medium-sized midwater trawls but are comparatively common in plankton samples. For families that are relatively common in plankton samples, paralarval abundance, derived similarly to the familiar ichthyoplankton surveys of fisheries science, may be the most reliable method of gathering data on distribution and abundance. (PDF file contains 58 pages.)

Relevance:

90.00%

Publisher:

Abstract:

Six years of bottom-trawl survey data, including over 6000 trawls covering over 200 km² of bottom area throughout Alaska’s subarctic marine waters, were analyzed for patterns in species richness, diversity, density, and distribution of skates. The Bering Sea continental shelf and slope, Aleutian Islands, and Gulf of Alaska regions were stratified by geographic subregion and depth. Species richness and relative density of skates increased with depth to the shelf break in all regions. The Bering Sea shelf was dominated by the Alaska skate (Bathyraja parmifera), but species richness and diversity were low. On the Bering Sea slope, richness and diversity were higher in the shallow stratum, and relative density appeared higher in subregions dominated by canyons. In the Aleutian Islands and Gulf of Alaska, species richness and relative density were generally highest in the deepest depth strata. The data and distribution maps presented here are based on species-level data collected throughout the marine waters of Alaska, and this article represents the most comprehensive summary of the skate fauna of the region published to date.
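As a hedged illustration of how richness and diversity might be summarised per stratum, the sketch below computes species richness and the Shannon index (H' = -sum p_i ln p_i) from per-haul counts; the species names and numbers here are invented for illustration and are not the survey's data or analysis code.

```python
import math
from collections import Counter, defaultdict

def richness_and_shannon(records):
    """Species richness and Shannon diversity per stratum, from
    (stratum, species, count) records."""
    by_stratum = defaultdict(Counter)
    for stratum, species, count in records:
        by_stratum[stratum][species] += count
    summary = {}
    for stratum, counts in by_stratum.items():
        total = sum(counts.values())
        shannon = -sum((n / total) * math.log(n / total) for n in counts.values())
        summary[stratum] = {"richness": len(counts), "shannon": round(shannon, 3)}
    return summary

# Hypothetical haul records (stratum, species, number caught).
hauls = [("shelf", "Bathyraja parmifera", 40), ("shelf", "Raja binoculata", 2),
         ("slope", "Bathyraja parmifera", 5), ("slope", "Bathyraja aleutica", 7),
         ("slope", "Bathyraja trachura", 6)]
print(richness_and_shannon(hauls))
```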

Relevance:

90.00%

Publisher:

Abstract:

Surveys of macroinvertebrates were carried out in the Xiangxi River system during July of 2001. Among the 121 taxa collected, Ephemeroptera, Trichoptera, and Diptera dominated (41.7, 26.0, and 24.5% of the total relative abundance, respectively). Two-way indicator species analysis and detrended correspondence analysis divided the 49 sites into four groups based on species composition and relative abundance. Canonical correspondence analysis indicated that elevation, SiO2, pH, conductivity, hardness, and NO2-N were significant environmental factors affecting the distribution of macroinvertebrates.

Relevance:

90.00%

Publisher:

Abstract:

It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2⁷⁰ bytes), and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of the digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising such systems have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. This multiplicity of analysis techniques introduces another layer of heterogeneity, that is, heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question asked is whether a generic solution for the monitoring and analysis of data can be identified that accommodates temporal constraints, bridges the gap between expert knowledge and raw data, and enables data to be effectively interpreted and exploited in a transparent manner. The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating hidden bias. To illustrate and realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions. In order to demonstrate these concepts, a complex real-world example involving the near real-time capturing and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, given the dearth of neurophysiologists, because there is a real-world need to provide a solution for this domain.
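A minimal sketch of the workflow-with-provenance idea described above, assuming hypothetical step names, toy data and a SHA-256 fingerprint of each step's input and output; it illustrates the concept only and is not the platform built in the dissertation.

```python
import hashlib
import json
import time

def run_step(name, func, data, provenance):
    """Apply one analysis step and append a provenance record (step name,
    input/output fingerprints, timestamp) so derived results stay auditable."""
    digest = lambda obj: hashlib.sha256(
        json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]
    result = func(data)
    provenance.append({
        "step": name,
        "input": digest(data),
        "output": digest(result),
        "time": time.time(),
    })
    return result

provenance = []
raw = [72, 75, 71, 140, 73]                        # e.g. a raw physiological signal
clean = run_step("remove_artifacts", lambda xs: [x for x in xs if x < 120], raw, provenance)
mean = run_step("mean", lambda xs: sum(xs) / len(xs), clean, provenance)
print(mean)
print(json.dumps(provenance, indent=2))
```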

Relevance:

90.00%

Publisher:

Abstract:

An individual-based model (IBM) for the simulation of year-to-year survival during the early life-history stages of the north-east Atlantic stock of mackerel (Scomber scombrus) was developed within the EU-funded Shelf-Edge Advection, Mortality and Recruitment (SEAMAR) programme. The IBM included transport, growth and survival and was used to track the passive movement of mackerel eggs, larvae and post-larvae and determine their distribution and abundance after approximately 2 months of drift. One of the main outputs from the IBM, namely the distributions and numbers of surviving post-larvae, is compared with field data on recruit (age-0/age-1 juvenile) distribution and abundance for the years 1998, 1999 and 2000. The juvenile distributions show more inter-annual and spatial variability than the modelled distributions of survivors; this may be due to the restriction of using the same initial egg distribution for all three years of simulation. The IBM simulations indicate two main recruitment areas for the north-east Atlantic stock of mackerel, these being Porcupine Bank and the south-eastern Bay of Biscay. These areas correspond to areas of high juvenile catches, although the juveniles generally have a more widespread distribution than the model simulations. The best agreement between modelled data and field data for distribution (juveniles and model survivors) is for the year 1998. The juvenile catches in different representative nursery areas are totalled to give a field abundance index (FAI). This index is compared with a model survivor index (MSI), which is calculated from the total of survivors for the whole spawning season. The MSI compares favourably with the FAI for 1998 and 1999 but not for 2000; in that year, juvenile catches dropped sharply compared with the previous years, but there was no equivalent drop in modelled survivors.
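The two indices compared above reduce to simple sums; the sketch below shows that arithmetic with made-up numbers. The area names, catch values and survivor counts are illustrative only, not the SEAMAR results.

```python
def field_abundance_index(juvenile_catches_by_area):
    """FAI: total juvenile catch summed over the representative nursery areas."""
    return sum(juvenile_catches_by_area.values())

def model_survivor_index(survivors_per_release):
    """MSI: total number of modelled survivors over the whole spawning season."""
    return sum(survivors_per_release)

# Hypothetical numbers for illustration only.
fai = field_abundance_index({"Porcupine Bank": 1200, "SE Bay of Biscay": 800})
msi = model_survivor_index([150, 220, 310, 180])
print(fai, msi)
```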

Relevance:

90.00%

Publisher:

Abstract:

This paper proposes a PSO-based approach to increase the probability of delivering power to any load point by identifying new investments in distribution energy systems. The statistical failure and repair data of distribution components are the main basis of the proposed methodology, which uses fuzzy-probabilistic modeling of the components' outage parameters. The fuzzy membership functions of the outage parameters of each component are based on statistical records. A Modified Discrete PSO optimization model is developed in order to identify the adequate investments in distribution energy system components which allow increasing the probability of delivering power to any customer in the distribution system at the minimum possible cost for the system operator. To illustrate the application of the proposed methodology, the paper includes a case study that considers a 180-bus distribution network.
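For flavour, here is a plain binary PSO over candidate reinforcements, penalising investment plans whose (crudely modelled) delivery probability misses a target. The candidates, probabilities and parameters are invented, and this generic formulation is only a stand-in, not the paper's Modified Discrete PSO or its fuzzy-probabilistic outage model.

```python
import math
import random

# Candidate reinforcements: (cost, reliability gain). Illustrative values only.
CANDIDATES = [(100, 0.010), (60, 0.006), (250, 0.020), (40, 0.002), (120, 0.012)]
BASE_PROB, TARGET = 0.95, 0.97

def fitness(bits):
    """Investment cost, heavily penalised if the crude delivery probability
    stays below the target."""
    cost = sum(c for b, (c, _) in zip(bits, CANDIDATES) if b)
    prob = min(1.0, BASE_PROB + sum(g for b, (_, g) in zip(bits, CANDIDATES) if b))
    return cost + (10000 if prob < TARGET else 0)

def binary_pso(n_particles=20, iters=100):
    """Generic binary PSO: velocities pass through a sigmoid to give bit-flip
    probabilities."""
    dim = len(CANDIDATES)
    swarm = [[random.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [s[:] for s in swarm]
    gbest = min(swarm, key=fitness)[:]
    for _ in range(iters):
        for i, x in enumerate(swarm):
            for d in range(dim):
                vel[i][d] += random.random() * (pbest[i][d] - x[d]) \
                           + random.random() * (gbest[d] - x[d])
                x[d] = 1 if random.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            if fitness(x) < fitness(pbest[i]):
                pbest[i] = x[:]
            if fitness(x) < fitness(gbest):
                gbest = x[:]
    return gbest, fitness(gbest)

print(binary_pso())
```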

Relevance:

90.00%

Publisher:

Abstract:

This seminar is a research discussion around a very interesting problem, which may be a good basis for a WAISfest theme. A little over a year ago Professor Alan Dix came to tell us of his plans for a magnificent adventure: to walk all of the way round Wales, 1000 miles, in 'Alan Walks Wales'. The walk was a personal journey, but also a technological and community one, exploring the needs of the walker and the people along the way. Whilst walking he recorded his thoughts in an audio diary, took lots of photos, wrote a blog and collected data from the tech instruments he was wearing. As a result Alan has extensive quantitative data (bio-sensing and location) and qualitative data (text, images and some audio). There are challenges in analysing the individual kinds of data, including merging similar data streams, entity identification, time-series and textual data mining, dealing with provenance, and ontologies for paths and journeys. There are also challenges for author and third-party annotation, and for linking the data-sets and visualising the merged narrative or facets of it.
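One of the challenges mentioned, merging similar data streams, often comes down to aligning records by time. The sketch below joins hypothetical heart-rate samples to the nearest preceding GPS fix with pandas; the timestamps and values are invented and are not the actual data collected on the walk.

```python
import pandas as pd

# Two illustrative streams from the walk: heart-rate samples and GPS fixes.
heart = pd.DataFrame({"time": pd.to_datetime(["2013-04-17 09:00:05",
                                              "2013-04-17 09:00:35"]),
                      "bpm": [92, 101]})
gps = pd.DataFrame({"time": pd.to_datetime(["2013-04-17 09:00:00",
                                            "2013-04-17 09:00:30"]),
                    "lat": [52.41, 52.42], "lon": [-4.08, -4.07]})

# Align each heart-rate sample with the most recent GPS fix (tolerance 1 minute).
merged = pd.merge_asof(heart.sort_values("time"), gps.sort_values("time"),
                       on="time", tolerance=pd.Timedelta("1min"))
print(merged)
```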

Relevance:

90.00%

Publisher:

Abstract:

Speaker(s): Jon Hare. Time: 25/06/2014 11:00-11:50. Location: B32/3077. Abstract: The aggregation of items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information on the social web. This task is challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised. In this talk I'll describe some of our recent work on trend and event detection in multimedia data streams. We focus on scalable streaming algorithms that can be applied to multimedia data streams from the web and the social web. The talk will cover two particular aspects of our work: mining Twitter for trending images by detecting near-duplicates; and detecting social events in multimedia data with streaming clustering algorithms. I'll describe our techniques in detail, and explore open questions and areas of potential future work in both these tasks.
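Near-duplicate image detection is commonly done with a perceptual hash; the sketch below uses a difference hash (dHash) and a Hamming-distance threshold as a generic stand-in. The file names and threshold are illustrative assumptions, and this should not be read as the specific method described in the talk.

```python
from PIL import Image

def dhash(path, size=8):
    """Difference hash: shrink to (size+1) x size greyscale and record whether
    each pixel is brighter than its right-hand neighbour."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())
    bits = [px[r * (size + 1) + c] > px[r * (size + 1) + c + 1]
            for r in range(size) for c in range(size)]
    return sum(b << i for i, b in enumerate(bits))

def hamming(a, b):
    return bin(a ^ b).count("1")

# Two tweeted images could be treated as near-duplicates if their hashes are close,
# e.g. (hypothetical files):
# h1, h2 = dhash("tweet_1.jpg"), dhash("tweet_2.jpg")
# print(hamming(h1, h2) <= 10)
```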

Relevance:

90.00%

Publisher:

Abstract:

The paper describes a method whereby the distribution of fatigue damage along riser tensioner ropes is calculated, taking account of heave motion, set tension, system geometry, tidal range and rope specification. From these data the distribution of damage along the rope is obtained for a given time period using Miner's summation method. This information can then be used to help the operator decide on the length of rope to 'slip and cut', whereby a length from the end of the rope is removed and the rope is moved through the system from a storage drum, such that sections of rope that have already suffered significant fatigue damage are not moved to positions where there is another peak in the distribution. There are two main advantages to be gained by using the fatigue damage model. The first is that it shows the amount of fatigue damage accumulating at different points along the rope, enabling the most highly damaged section to be removed well before failure. The second is that it makes for greater efficiency, as damage can be spread more evenly along the rope over time, avoiding the need to scrap long sections of undamaged rope.
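Miner's summation accumulates damage as D = sum_i n_i / N_i over the stress-range bins experienced at each point on the rope. The sketch below evaluates that sum at a few hypothetical positions; the S-N lives, positions and cycle counts are invented for illustration and are not data from the paper.

```python
def miners_damage(cycles_at_position, sn_life):
    """Miner's summation: D = sum_i n_i / N_i, where n_i is the number of stress
    cycles in load range i seen at a point on the rope and N_i is the
    cycles-to-failure for that range (from the rope's S-N data)."""
    return sum(n / sn_life[rng] for rng, n in cycles_at_position.items())

# Illustrative S-N data and per-position cycle counts.
sn_life = {"low": 2.0e6, "medium": 4.0e5, "high": 5.0e4}
positions = {
    10.0: {"low": 8.0e4, "medium": 1.2e4, "high": 5.0e2},   # metres along rope
    25.0: {"low": 9.0e4, "medium": 3.0e4, "high": 4.0e3},   # a damage peak
    40.0: {"low": 6.0e4, "medium": 5.0e3, "high": 1.0e2},
}
damage_profile = {pos: round(miners_damage(c, sn_life), 4) for pos, c in positions.items()}
print(damage_profile)  # the peak suggests where to 'slip and cut'
```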

Relevance:

90.00%

Publisher:

Abstract:

As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and the quantification of misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low level details of differing file formats or the physical location of the data. Scientific and operational benefits to this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations.
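The misfit quantification mentioned above can be as simple as interpolating the model series onto the observation times and reporting bias and RMSE. The sketch below does exactly that with invented sea-surface temperature values; it is a plausible stand-in, not the portal's actual calculation.

```python
import numpy as np

def misfit(obs_times, obs_values, model_times, model_values):
    """Interpolate the model time series onto the observation times and return
    (bias, RMSE) of model minus observations."""
    model_on_obs = np.interp(obs_times, model_times, model_values)
    diff = model_on_obs - np.asarray(obs_values)
    return diff.mean(), np.sqrt((diff ** 2).mean())

# Illustrative hourly sea-surface temperatures (degrees C).
bias, rmse = misfit(obs_times=[0, 1, 2, 3], obs_values=[14.2, 14.4, 14.3, 14.6],
                    model_times=[0, 2, 4], model_values=[14.0, 14.5, 14.9])
print(f"bias={bias:.2f} C, rmse={rmse:.2f} C")
```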

Relevance:

90.00%

Publisher:

Abstract:

Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams now available for subscription on our smart mobile phones, using these data for decision making through data stream mining techniques has become achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing significant applications in this area. We propose using mobile software agents in this application for several reasons. Most importantly, the autonomic, intelligent behaviour of agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in detail in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.
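To make the collaborative-mining idea concrete, here is a toy sketch in which each device's agent learns a simple local model from its own stream and nearby peers combine their votes. The nearest-centroid learner, class names and data are illustrative assumptions, not the prototyped PDM architecture or its agent platform.

```python
from collections import Counter

class MobileMiningAgent:
    """Toy stand-in for a PDM agent: it learns per-class feature means from its
    local stream and casts a vote when a peer asks for a classification."""
    def __init__(self, name):
        self.name = name
        self.sums, self.counts = {}, {}

    def observe(self, x, label):                     # incremental local learning
        s = self.sums.get(label, [0.0] * len(x))
        self.sums[label] = [a + b for a, b in zip(s, x)]
        self.counts[label] = self.counts.get(label, 0) + 1

    def vote(self, x):                               # nearest-centroid prediction
        def dist(label):
            c = [v / self.counts[label] for v in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.counts, key=dist)

def collaborative_classify(agents, x):
    """Majority vote over the agents reachable e.g. via Bluetooth/WiFi."""
    return Counter(a.vote(x) for a in agents).most_common(1)[0][0]

phone_a, phone_b = MobileMiningAgent("A"), MobileMiningAgent("B")
phone_a.observe((1.0, 1.0), "up"); phone_a.observe((0.0, 0.0), "down")
phone_b.observe((0.9, 1.1), "up"); phone_b.observe((0.1, -0.1), "down")
print(collaborative_classify([phone_a, phone_b], (0.8, 0.9)))   # -> 'up'
```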