140 results for bigdata, data stream processing, dsp, apache storm, cyber security
Abstract:
The simulation and development work undertaken to produce a signal equaliser used to improve the data rates from oil well logging instruments is presented. The instruments are lowered into the drill borehole suspended by a cable which has poor electrical characteristics. The equaliser described in the paper corrects for the distortions introduced by the cable (dispersion and attenuation), with the result that the instrument can send data at 100 kbit/s down its own 12 km suspension cable. Simulation techniques and tools were invaluable in generating a model for the distortions and proved especially useful when site testing was not available.
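A minimal sketch of the frequency-domain equalisation idea, assuming a hypothetical cable transfer function (the `cable_model` form and parameters below are illustrative, not the paper's fitted model):

```python
import numpy as np

def equalise(signal, fs, cable_freq_response):
    """Frequency-domain equalisation: divide the received spectrum by the
    modelled cable response to undo attenuation and dispersion. A small
    regularisation floor avoids amplifying noise at frequencies where the
    cable passes almost nothing."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    h = cable_freq_response(freqs)            # complex response H(f)
    eps = 1e-3 * np.max(np.abs(h))            # regularisation floor
    equalised = spectrum * np.conj(h) / (np.abs(h) ** 2 + eps ** 2)
    return np.fft.irfft(equalised, n=len(signal))

# Hypothetical lossy-cable model: amplitude roll-off plus a
# frequency-dependent (dispersive) phase delay.
def cable_model(freqs, alpha=1e-4, beta=5e-6):
    return np.exp(-alpha * np.sqrt(freqs)) * np.exp(-1j * beta * freqs ** 1.5)

# Example use: equalise(received, fs=1e6, cable_freq_response=cable_model)
```

Dividing by the modelled response corrects both the attenuation (magnitude) and the dispersion (phase), which is the essence of what such an equaliser does.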
Abstract:
Models of normal word production are well specified about the effects of frequency of linguistic stimuli on lexical access, but are less clear regarding the same effects on later stages of word production, particularly word articulation. In aphasia, this lack of specificity about downstream frequency effects is even more noticeable because there is a relatively limited amount of data on the time course of frequency effects for this population. This study begins to fill this gap by comparing the effects of variation in word frequency (lexical, whole word) and bigram frequency (sub-lexical, within word) on word production abilities in ten normal speakers and eight individuals with mild-to-moderate aphasia. In an immediate repetition paradigm, participants repeated single monosyllabic words in which word frequency (high or low) was crossed with bigram frequency (high or low). Indices for mapping the time course of these effects included reaction time (RT) for linguistic processing and motor preparation, and word duration (WD) for speech motor performance (word articulation time). The results indicated that individuals with aphasia had significantly longer RT and WD compared to normal speakers. RT showed a significant main effect only for word frequency (i.e., high-frequency words had shorter RT). WD showed significant main effects of word and bigram frequency; however, contrary to our expectations, high-frequency items had longer WD. Further investigation of WD revealed that, independent of the influence of word and bigram frequency, vowel type (tense or lax) had the expected effect on WD. Moreover, individuals with aphasia differed from control speakers in their ability to implement tense vowel duration, even though they could produce an appropriate distinction between tense and lax vowels. The results highlight the importance of using temporal measures to identify subtle deficits in linguistic and speech motor processing in aphasia, the crucial role of the phonetic characteristics of the stimulus set in studying speech production, and the need for language production models to account more explicitly for word articulation.
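For readers who want to reproduce this kind of 2 x 2 factorial analysis of temporal measures, here is a minimal sketch in Python (the file name, column names and analysis choice are hypothetical; the abstract does not specify the study's statistical pipeline):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical long-format data: one row per trial, with reaction time (rt)
# and word duration (wd) in ms, plus the two crossed frequency factors.
df = pd.read_csv("repetition_trials.csv")  # columns: word_freq, bigram_freq, rt, wd

# Two-way ANOVA on each temporal index; the abstract reports a
# word-frequency main effect on RT and main effects of both factors on WD.
for measure in ("rt", "wd"):
    model = ols(f"{measure} ~ C(word_freq) * C(bigram_freq)", data=df).fit()
    print(measure, sm.stats.anova_lm(model, typ=2), sep="\n")
```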
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts are interested in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
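As a flavour of the data-parallel (map/reduce) style such approaches use, here is a toy sketch that counts co-occurring item pairs, a first step of many frequent-pattern algorithms, across worker processes; it is an illustration only, not a Grid or Cloud deployment:

```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def count_pairs(transactions):
    """Count co-occurring item pairs in one partition of the data (map)."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return counts

def parallel_pair_counts(transactions, n_workers=4):
    """Split the transactions across workers, count locally, then merge
    the partial counts (reduce)."""
    chunks = [transactions[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(count_pairs, chunks)
    total = Counter()
    for p in partials:
        total.update(p)
    return total

if __name__ == "__main__":
    data = [["bread", "milk"], ["bread", "butter", "milk"], ["milk", "eggs"]]
    print(parallel_pair_counts(data, n_workers=2).most_common(3))
```

The same split/count/merge shape carries over to distributed settings; only the transport (message passing, distributed file systems) changes.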
Abstract:
This article analyses the results of an empirical study on the 200 most popular UK-based websites in various sectors of e-commerce services. The study provides empirical evidence of unlawful processing of personal data. It comprises a survey on the methods used to seek and obtain consent to process personal data for direct marketing and advertisement, and a test of the frequency of unsolicited commercial emails (UCE) received by customers as a consequence of their registration and submission of personal information to a website. Part One of the article presents a conceptual and normative account of data protection, with a discussion of the ethical values on which EU data protection law is grounded and an outline of the elements that must be in place to seek and obtain valid consent to process personal data. Part Two discusses the outcomes of the empirical study, which unveils a significant gap between EU legal theory and practice in data protection. Although a wide majority of the websites in the sample (69%) have a system in place to ask separate consent for engaging in marketing activities, only 16.2% of them obtain consent that is valid under the standards set by EU law. The test with UCE shows that only one out of three websites (30.5%) respects the will of the data subject not to receive commercial communications. It also shows that, when submitting personal data in online transactions, there is a high probability (50%) of encountering a website that will ignore the refusal of consent and will send UCE. The article concludes that there is a severe lack of compliance of UK online service providers with essential requirements of data protection law. In this respect, it suggests that there is an inadequate standard of implementation, information and supervision by the UK authorities, especially in light of the clarifications provided at EU level.
Abstract:
Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables defining the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
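The paper's pipeline is implemented as an R function; the Python sketch below only illustrates the transform/normalise/summarise flow and how a replicate-based inter-array SD estimate of the kind reported can be computed (all numbers here are synthetic):

```python
import numpy as np

def normalise(signals):
    """Toy version of the transform/normalise steps: log2-transform the
    raw intensities, then subtract each array's median so arrays are
    directly comparable."""
    log_sig = np.log2(np.asarray(signals, dtype=float) + 1.0)
    return log_sig - np.median(log_sig, axis=0, keepdims=True)

# Hypothetical replicate arrays: genes x arrays intensity matrix.
rng = np.random.default_rng(0)
arrays = rng.lognormal(mean=6, sigma=1, size=(1000, 4))
norm = normalise(arrays)

# Inter-array variability of replicate measurements, per gene, summarised
# as one SD (the paper reports around 0.5 log2 units on its own data).
inter_sd = np.std(norm, axis=1, ddof=1).mean()
print(f"mean inter-array SD: {inter_sd:.2f} log2 units")
```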
Abstract:
The Environmental Data Abstraction Library (EDAL) provides a modular data management library for bringing new and diverse data types together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross-calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.
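EDAL itself is a Java library; the toy Python sketch below only conveys the abstraction idea, with invented names rather than the library's real API: each source type is adapted to one read interface, so a viewer can overlay satellite, model and in situ data in the same framework.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Sample:
    lon: float
    lat: float
    time: str
    value: float

class Dataset(ABC):
    @abstractmethod
    def sample(self, lon: float, lat: float, time: str) -> Sample:
        """Return the variable value nearest to the requested point."""

class GriddedModelDataset(Dataset):
    def __init__(self, grid):                  # {(lon, lat, time): value}
        self.grid = grid
    def sample(self, lon, lat, time):
        return Sample(lon, lat, time, self.grid[(round(lon), round(lat), time)])

class InSituDataset(Dataset):
    def __init__(self, observations):          # list[Sample]
        self.obs = observations
    def sample(self, lon, lat, time):
        return min(self.obs, key=lambda s: (s.lon - lon) ** 2 + (s.lat - lat) ** 2)

# A comparison/visualisation layer only ever calls Dataset.sample(), which
# is what makes cross-calibration of heterogeneous sources straightforward.
```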
Abstract:
The General Election for the 56th United Kingdom Parliament was held on 7 May 2015. Tweets related to UK politics, not only those with the specific hashtag ”#GE2015”, have been collected in the period between March 1 and May 31, 2015. The resulting dataset contains over 28 million tweets for a total of 118 GB in uncompressed format or 15 GB in compressed format. This study describes the method that was used to collect the tweets and presents some analysis, including a political sentiment index, and outlines interesting research directions on Big Social Data based on Twitter microblogging.
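As a flavour of what a simple political sentiment index over such a collection can look like, here is a minimal lexicon-based sketch (the word lists, file format and scoring rule are illustrative assumptions, not the study's actual method):

```python
import json

POSITIVE = {"win", "hope", "support", "great", "fair"}
NEGATIVE = {"lose", "fear", "cuts", "crisis", "unfair"}

def tweet_score(text):
    """Positive minus negative lexicon hits in one tweet."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_index(path):
    """Mean per-tweet score over the collection; positive means
    on-balance positive language."""
    total = n = 0
    with open(path) as f:
        for line in f:                    # one JSON tweet per line
            tweet = json.loads(line)
            total += tweet_score(tweet["text"])
            n += 1
    return total / n if n else 0.0
```

Streaming line by line matters at this scale: 118 GB of tweets will not fit in memory, so the index is accumulated in a single pass.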
Abstract:
A detailed view of Southern Hemisphere storm tracks is obtained based on the application of filtered variance and modern feature-tracking techniques to a wide range of 45-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) data. It has been checked that the conclusions drawn in this study are valid even if data from only the satellite era are used. The emphasis of the paper is on the winter season, but results for the four seasons are also discussed. Both upper- and lower-tropospheric fields are used. The tracking analysis focuses on systems that last longer than 2 days and are mobile (move more than 1000 km). Many of the results support previous ideas about the storm tracks, but some new insights are also obtained. In the summer there is a rather circular, strong, deep high-latitude storm track. In winter the high-latitude storm track is more asymmetric with a spiral from the Atlantic and Indian Oceans in toward Antarctica and a subtropical jet–related lower-latitude storm track over the Pacific, again tending to spiral poleward. At all times of the year, maximum storm activity in the higher-latitude storm track is in the Atlantic and Indian Ocean regions. In the winter upper troposphere, the relative importance of, and interplay between, the subtropical and subpolar storm tracks is discussed. The genesis, lysis, and growth rate of lower-tropospheric winter cyclones together lead to a vivid picture of their behavior that is summarized as a set of overlapping plates, each composed of cyclone life cycles. Systems in each plate appear to feed the genesis in the next plate through downstream development in the upper-troposphere spiral storm track. In the lee of the Andes in South America, there is cyclogenesis associated with the subtropical jet and also, poleward of this, cyclogenesis largely associated with system decay on the upslope and regeneration on the downslope. The genesis and lysis of cyclones and anticyclones have a definite spatial relationship with each other and with the Andes. At 500 hPa, their relative longitudinal positions are consistent with vortex-stretching ideas for simple flow over a large-scale mountain. Cyclonic systems near Antarctica have generally spiraled in from lower latitudes. However, cyclogenesis associated with mobile cyclones occurs around the Antarctic coast with an interesting genesis maximum over the sea ice near 150°E. The South Pacific storm track emerges clearly from the tracking as a coherent deep feature spiraling from Australia to southern South America. A feature of the summer season is the genesis of eastward-moving cyclonic systems near the tropic of Capricorn off Brazil, in the central Pacific and, to a lesser extent, off Madagascar, followed by movement along the southwest flanks of the subtropical anticyclones and contribution to the “convergence zone” cloud bands seen in these regions.
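The mobility filter quoted above (systems lasting longer than 2 days and moving more than 1000 km) can be expressed compactly; the sketch below assumes 6-hourly track points and reads "move" as accumulated along-track distance, which is one plausible interpretation:

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(p, q):
    """Haversine great-circle distance between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def keep_track(track, step_hours=6, min_days=2, min_km=1000):
    """True if a time-ordered list of (lon, lat) points passes the
    lifetime and mobility thresholds."""
    lifetime_days = (len(track) - 1) * step_hours / 24
    displacement = sum(great_circle_km(a, b) for a, b in zip(track, track[1:]))
    return lifetime_days > min_days and displacement > min_km
```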
Abstract:
Recent interest in the validation of general circulation models (GCMs) has been devoted to objective methods. A small number of authors have used the direct synoptic identification of phenomena together with a statistical analysis to perform an objective comparison between various datasets. This paper describes a general method for performing the synoptic identification of phenomena that can be used for an objective analysis of atmospheric, or oceanographic, datasets obtained from numerical models and remote sensing. Methods usually associated with image processing have been used to segment the scene and to identify suitable feature points to represent the phenomena of interest. This is performed for each time level. A technique from dynamic scene analysis is then used to link the feature points to form trajectories. The method is fully automatic and should be applicable to a wide range of geophysical fields. An example is shown of results obtained by applying this method to data from a run of the Universities Global Atmospheric Modelling Project GCM.
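A greatly simplified sketch of the trajectory-linking step is shown below: greedy nearest-neighbour association of feature points between consecutive time levels. Real dynamic-scene-analysis techniques add motion prediction and track termination; this conveys only the core idea.

```python
import math

def link_trajectories(frames, max_dist=500.0):
    """Link feature points across time levels into trajectories.
    `frames` is a list (one entry per time level) of (x, y) feature points."""
    tracks = [[p] for p in frames[0]]
    for points in frames[1:]:
        unclaimed = list(points)
        for track in tracks:
            if not unclaimed:
                break
            last = track[-1]
            nearest = min(unclaimed, key=lambda p: math.dist(p, last))
            if math.dist(nearest, last) <= max_dist:
                track.append(nearest)          # extend this trajectory
                unclaimed.remove(nearest)
        tracks.extend([p] for p in unclaimed)  # unmatched points start new tracks
    return tracks
```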
Abstract:
Data from four recent reanalysis projects [ECMWF, NCEP-NCAR, NCEP-Department of Energy (DOE), NASA] have been diagnosed at the scale of synoptic weather systems using an objective feature tracking method. The tracking statistics indicate that, overall, the reanalyses correspond very well in the Northern Hemisphere (NH) lower troposphere, although differences for the spatial distribution of mean intensities show that the ECMWF reanalysis is systematically stronger in the main storm track regions but weaker around major orographic features. A direct comparison of the track ensembles indicates a number of systems with a broad range of intensities that compare well among the reanalyses. In addition, a number of small-scale weak systems are found that have no correspondence among the reanalyses or that only correspond upon relaxing the matching criteria, indicating possible differences in location and/or temporal coherence. These are distributed throughout the storm tracks, particularly in the regions known for small-scale activity, such as secondary development regions and the Mediterranean. For the Southern Hemisphere (SH), agreement is found to be generally less consistent in the lower troposphere with significant differences in both track density and mean intensity. The systems that correspond between the various reanalyses are considerably reduced and those that do not match span a broad range of storm intensities. Relaxing the matching criteria indicates that there is a larger degree of uncertainty in both the location of systems and their intensities compared with the NH. At upper-tropospheric levels, significant differences in the level of activity occur between the ECMWF reanalysis and the other reanalyses in both the NH and SH winters. This occurs due to a lack of coherence in the apparent propagation of the systems in ERA15 and appears most acute above 500 hPa. This is probably due to the use of optimal interpolation data assimilation in ERA15. Also shown are results based on using the same techniques to diagnose the tropical easterly wave activity. Results indicate that the wave activity is sensitive not only to the resolution and assimilation methods used but also to the model formulation.
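The notion of "correspondence" between track ensembles can be made concrete with a simple matching rule; the thresholds below are illustrative assumptions, not the criteria used in the paper:

```python
import math

def tracks_match(track_a, track_b, max_mean_sep=400.0, min_overlap=0.5):
    """Crude track-matching rule: two tracks correspond if they overlap in
    time for at least a fraction of the shorter track and their mean point
    separation is small. Each track is a dict {time: (x, y)} in km-scale
    coordinates."""
    common = sorted(set(track_a) & set(track_b))
    if len(common) < min_overlap * min(len(track_a), len(track_b)):
        return False
    mean_sep = sum(math.dist(track_a[t], track_b[t]) for t in common) / len(common)
    return mean_sep <= max_mean_sep

# "Relaxing the matching criteria" then amounts to raising max_mean_sep or
# lowering min_overlap and seeing which systems gain a counterpart.
```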
Abstract:
The aim of this paper is to explore the use of both an Eulerian and a system-centered method of storm track diagnosis applied to a wide range of meteorological fields at multiple levels, to provide a range of perspectives on the Northern Hemisphere winter transient motions and to give new insight into storm track organization and behavior. The data used are primarily from the European Centre for Medium-Range Weather Forecasts reanalysis project, extended with operational analyses to the period 1979-2000. This is supplemented by data from the National Centers for Environmental Prediction and Goddard Earth Observing System 1 reanalyses. The range of fields explored includes the usual mean sea level pressure and the lower- and upper-tropospheric height, meridional wind, vorticity, and temperature, as well as the potential vorticity (PV) on a 330-K isentropic surface (PV330) and potential temperature on a PV = 2 PVU surface (theta(PV2)). As well as reporting the primary analysis based on feature tracking, the standard Eulerian 2-6-day bandpass filtered variance analysis is also reported and contrasted with the tracking diagnostics. To enable the feature points to be identified as extrema for all the chosen fields, a planetary wave background structure is removed at each data time. The bandpass filtered variance derived from the different fields yields a rich picture of the nature and comparative magnitudes of the North Pacific and Atlantic storm tracks, and of the Siberian and Mediterranean candidates for storm tracks. The feature tracking allows the cyclonic and anticyclonic activities to be considered separately. The analysis indicates that anticyclonic features are generally much weaker with less coherence than the cyclonic systems. Cyclones and features associated with them are shown to have much greater coherence and give tracking diagnostics that create a vivid storm track picture that includes the aspects highlighted by the variances as well as highlighting aspects that are not readily available from Eulerian studies. In particular, the upper-tropospheric features as shown by negative theta(PV2), for example, occur in a band spiraling around the hemisphere from the subtropical North Atlantic eastward to the high latitudes of the same ocean basin. Lower-troposphere storm tracks occupy more limited longitudinal sectors, with many of the individual storms possibly triggered from the upper-tropospheric disturbances in the spiral band of activity.
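For reference, the 2-6-day bandpass filtering underlying the Eulerian variance diagnostic can be sketched in a few lines (the Butterworth design and 6-hourly sampling are assumptions for illustration; the abstract does not state the filter used):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_2_6_days(series, samples_per_day=4, order=4):
    """Zero-phase Butterworth bandpass retaining periods of 2-6 days,
    applied to a 6-hourly time series at one grid point."""
    nyquist = samples_per_day / 2.0             # cycles per day
    low = (1.0 / 6.0) / nyquist                 # 6-day period cutoff
    high = (1.0 / 2.0) / nyquist                # 2-day period cutoff
    b, a = butter(order, [low, high], btype="band")
    return filtfilt(b, a, series)

# The Eulerian storm-track measure is then the variance of the filtered
# field at each grid point, e.g. np.var(bandpass_2_6_days(z500_series)).
```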
Abstract:
Flood modelling of urban areas is still at an early stage, partly because until recently topographic data of sufficiently high resolution and accuracy have been lacking in urban areas. However, Digital Surface Models (DSMs) generated from airborne scanning laser altimetry (LiDAR) with sub-metre spatial resolution have now become available, and these are able to represent the complexities of urban topography. The paper describes the development of a LiDAR post-processor for urban flood modelling based on the fusion of LiDAR and digital map data. The map data are used in conjunction with LiDAR data to identify different object types in urban areas, though pattern recognition techniques are also employed. Post-processing produces a Digital Terrain Model (DTM) for use as model bathymetry, and also a friction parameter map for use in estimating spatially distributed friction coefficients. In vegetated areas, friction is estimated from LiDAR-derived vegetation height, and (unlike most vegetation removal software) the method copes with short vegetation less than ~1 m high, which may occupy a substantial fraction of even an urban floodplain. The DTM and friction parameter map may also be used to help generate an unstructured mesh of a vegetated urban floodplain for use by a 2D finite element model. The mesh is decomposed to reflect floodplain features having different frictional properties to their surroundings, including urban features such as buildings and roads as well as taller vegetation features such as trees and hedges. This allows a more accurate estimation of local friction. The method produces a substantial node density due to the small dimensions of many urban features.
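A minimal sketch of the vegetation-height-to-friction step (the Manning's n values and the linear ramp are assumed for illustration, not the paper's calibrated parameterisation):

```python
import numpy as np

def friction_map(dsm, dtm, n_ground=0.03, n_max=0.10, ref_height=1.0):
    """Derive a Manning's n grid from LiDAR surfaces: vegetation height is
    the DSM-DTM difference, and friction grows linearly with height up to
    a cap. Keeping heights below ~1 m in play reflects the post-processor's
    handling of short vegetation."""
    veg_height = np.clip(dsm - dtm, 0.0, None)   # negative residuals zeroed
    return n_ground + (n_max - n_ground) * np.minimum(veg_height / ref_height, 1.0)
```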