876 results for bigdata, data stream processing, dsp, apache storm, cyber security
Abstract:
This paper proposes a new approach to knowledge processing in the computer network security Intrusion Detection Systems (NIDS) application domain. The approach uses topic map technology to represent features of the threat pattern space, together with knowledge of the situated efficacy of alternative candidate algorithms for pattern recognition within the NIDS domain. An integrative knowledge representation framework for virtualisation, data intelligence and learning loop architecting in the NIDS domain is thus described, together with specific aspects of its deployment.
Abstract:
This paper presents the simulation and development work undertaken to produce a signal equaliser that improves the data rates from oil well logging instruments. The instruments are lowered into the drill bore hole suspended by a cable which has poor electrical characteristics. The equaliser described in the paper corrects for the distortions introduced by the cable (dispersion and attenuation), with the result that the instrument can send data at 100 kbit/s over its own suspension cable of 12 km in length. Simulation techniques and tools were invaluable in generating a model for the distortions and proved useful when site testing was not available.
Abstract:
Models of normal word production are well specified about the effects of frequency of linguistic stimuli on lexical access, but are less clear regarding the same effects on later stages of word production, particularly word articulation. In aphasia, this lack of specificity of downstream frequency effects is even more noticeable because there is a relatively limited amount of data on the time course of frequency effects for this population. This study begins to fill this gap by comparing the effects of variation of word frequency (lexical, whole word) and bigram frequency (sub-lexical, within word) on word production abilities in ten normal speakers and eight individuals with mild-to-moderate aphasia. In an immediate repetition paradigm, participants repeated single monosyllabic words in which word frequency (high or low) was crossed with bigram frequency (high or low). Indices for mapping the time course of these effects included reaction time (RT) for linguistic processing and motor preparation, and word duration (WD) for speech motor performance (word articulation time). The results indicated that individuals with aphasia had significantly longer RT and WD than normal speakers. RT showed a significant main effect only for word frequency (i.e., high-frequency words had shorter RT). WD showed significant main effects of word and bigram frequency; however, contrary to our expectations, high-frequency items had longer WD. Further investigation of WD revealed that, independent of the influence of word and bigram frequency, vowel type (tense or lax) had the expected effect on WD. Moreover, individuals with aphasia differed from control speakers in their ability to implement tense vowel duration, even though they could produce an appropriate distinction between tense and lax vowels.
The results highlight the importance of using temporal measures to identify subtle deficits in linguistic and speech motor processing in aphasia, the crucial role of the phonetic characteristics of the stimulus set in studying speech production, and the need for language production models to account more explicitly for word articulation.
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Abstract:
This article analyses the results of an empirical study on the 200 most popular UK-based websites in various sectors of e-commerce services. The study provides empirical evidence on unlawful processing of personal data. It comprises a survey on the methods used to seek and obtain consent to process personal data for direct marketing and advertisement, and a test on the frequency of unsolicited commercial emails (UCE) received by customers as a consequence of their registration and submission of personal information to a website. Part One of the article presents a conceptual and normative account of data protection, with a discussion of the ethical values on which EU data protection law is grounded and an outline of the elements that must be in place to seek and obtain valid consent to process personal data. Part Two discusses the outcomes of the empirical study, which unveils a significant gap between EU legal theory and practice in data protection. Although a large majority of the websites in the sample (69%) have a system in place to ask for separate consent to marketing activities, only 16.2% of them obtain consent that is valid under the standards set by EU law. The test with UCE shows that only one out of three websites (30.5%) respects the will of the data subject not to receive commercial communications. It also shows that, when submitting personal data in online transactions, there is a high probability (50%) of encountering a website that will ignore the refusal of consent and will send UCE. The article concludes that there is a severe lack of compliance by UK online service providers with essential requirements of data protection law. In this respect, it suggests that the standard of implementation, information and supervision by the UK authorities is inadequate, especially in light of the clarifications provided at EU level.
Abstract:
Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform.
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
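As a rough illustration of the kind of variability estimate this abstract mentions, the sketch below log2-transforms raw signals and averages the per-probe standard deviation across replicate arrays. It is not the authors' R function: the function names, the clamping floor and the intensity values are all hypothetical, chosen only to show the arithmetic.

```python
import math
import statistics


def log2_transform(signals, floor=1.0):
    """Return log2 of each raw signal; values below `floor` are clamped
    so that zero or negative intensities do not break the logarithm.
    (The floor value is an assumption, not taken from the abstract.)"""
    return [math.log2(max(s, floor)) for s in signals]


def interarray_sd(replicate_arrays):
    """Mean, over probes, of the sample SD of the log2 signal across
    replicate arrays. Each inner list is one array; positions are probes."""
    logged = [log2_transform(a) for a in replicate_arrays]
    per_probe_sd = [statistics.stdev(values) for values in zip(*logged)]
    return statistics.mean(per_probe_sd)


# Made-up raw intensities: 3 replicate arrays x 4 probes.
replicates = [
    [256.0, 1024.0, 64.0, 4096.0],
    [512.0, 1024.0, 32.0, 4096.0],
    [256.0, 2048.0, 64.0, 8192.0],
]
sd = interarray_sd(replicates)
print(f"inter-array SD: {sd:.3f} log2 units")
```

Working in log2 units means a fixed SD corresponds to a fixed fold-change spread regardless of absolute intensity, which is one common reason for transforming before estimating replicate error.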
Abstract:
The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.
Abstract:
The General Election for the 56th United Kingdom Parliament was held on 7 May 2015. Tweets related to UK politics, not only those with the specific hashtag "#GE2015", were collected between 1 March and 31 May 2015. The resulting dataset contains over 28 million tweets, for a total of 118 GB in uncompressed format or 15 GB in compressed format. This study describes the method used to collect the tweets, presents some analysis, including a political sentiment index, and outlines interesting research directions on Big Social Data based on Twitter microblogging.
Abstract:
A student from the Data Processing program at the New York Trade School is shown working. Black and white photograph with some edge damage due to writing in black along the top.
Abstract:
Felice Gigante, a graduate of the New York Trade School Electronics program, works on a machine in his job as Data Processing Customer Engineer for the International Business Machines Corp. Original caption reads, "Felice Gigante - Electronices, International Business Machines Corp." Black and white photograph with caption glued to reverse.