921 results for Processing wikipedia data


Relevance:

40.00%

Publisher:

Abstract:

This paper proposes a new approach to knowledge processing in the application domain of computer network security intrusion detection systems (NIDS). The approach centres on a topic-map-enabled representation of the features of the threat pattern space and of knowledge about the situated efficacy of alternative candidate pattern recognition algorithms within the NIDS domain. An integrative knowledge representation framework for virtualisation, data intelligence and learning loop architecting in the NIDS domain is described, together with specific aspects of its deployment.

Relevance:

40.00%

Publisher:

Abstract:

This paper presents the simulation and development work undertaken to produce a signal equaliser that improves the data rates from oil well logging instruments. The instruments are lowered into the drill borehole suspended by a cable that has poor electrical characteristics. The equaliser described in the paper corrects for the distortions introduced by the cable (dispersion and attenuation), with the result that the instrument can send data at 100 kbit/s along its own suspension cable, which is 12 km long. Simulation techniques and tools were invaluable in generating a model of the distortions, and the model proved useful when site testing was not available.

Relevance:

40.00%

Publisher:

Abstract:

Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; and finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
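
As an illustration only, and not taken from the chapter itself, the data-parallel idea it surveys can be sketched as: split the data into partitions, compute local pattern counts per partition, then merge them. The function names (`split`, `count_items`, `frequent_items`) and the `min_support` threshold are hypothetical, and a thread pool stands in for the process pool, Grid or Cloud workers that a real deployment would presumably use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, rem = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def count_items(chunk):
    """Local step: count, per item, the transactions in this chunk containing it."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # each item counted once per transaction
    return counts


def frequent_items(transactions, min_support, n_parts=4):
    """Global step: merge the local counts and keep items meeting min_support."""
    chunks = split(transactions, n_parts)
    # Assumption: threads are a stand-in; swap in processes or remote workers to scale.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, chunks))
    total = sum(partials, Counter())
    return {item: n for item, n in total.items() if n >= min_support}
```

Because the local counts are merged by simple addition, the result does not depend on how the transactions are partitioned, which is what makes this kind of counting step straightforward to distribute.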

Relevance:

40.00%

Publisher:

Abstract:

This article analyses the results of an empirical study on the 200 most popular UK-based websites in various sectors of e-commerce services. The study provides empirical evidence on unlawful processing of personal data. It comprises a survey on the methods used to seek and obtain consent to process personal data for direct marketing and advertisement, and a test on the frequency of unsolicited commercial emails (UCE) received by customers as a consequence of their registration and submission of personal information to a website. Part One of the article presents a conceptual and normative account of data protection, with a discussion of the ethical values on which EU data protection law is grounded and an outline of the elements that must be in place to seek and obtain valid consent to process personal data. Part Two discusses the outcomes of the empirical study, which reveals a significant gap between EU legal theory and practice in data protection. Although a wide majority of the websites in the sample (69%) have in place a system to ask for separate consent to marketing activities, only 16.2% of them obtain consent that is valid under the standards set by EU law. The test with UCE shows that only one out of three websites (30.5%) respects the will of the data subject not to receive commercial communications. It also shows that, when submitting personal data in online transactions, there is a high probability (50%) of encountering a website that will ignore the refusal of consent and send UCE. The article concludes that there is a severe lack of compliance by UK online service providers with essential requirements of data protection law. In this respect, it suggests that the standard of implementation, information and supervision by the UK authorities is inadequate, especially in light of the clarifications provided at EU level.

Relevance:

40.00%

Publisher:

Abstract:

Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables that define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and interarray variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
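
To make the interarray SD figure concrete, here is a minimal sketch of how a replicate-to-replicate SD in log2 units could be estimated. It is not the authors' R function: the names `log2_signals` and `interarray_sd` are hypothetical, the input is assumed to be already background-corrected positive signals with genes in the same order on every array, and the median across genes is just one reasonable way to summarise the per-gene values.

```python
import math
import statistics


def log2_signals(raw):
    """Transform positive raw signal intensities to log2 units."""
    return [math.log2(x) for x in raw]


def interarray_sd(replicates):
    """Median across genes of the per-gene sample SD of log2 signal.

    replicates: list of arrays, each a list of raw signals, one per gene,
    with genes in the same order on every array (an assumption of this sketch).
    """
    logged = [log2_signals(array) for array in replicates]
    # zip(*logged) regroups the values so each tuple holds one gene across arrays
    per_gene_sd = [statistics.stdev(values) for values in zip(*logged)]
    return statistics.median(per_gene_sd)
```

With at least two replicate arrays, a returned value near 0.5 would correspond to the level of interarray variability reported in the abstract.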

Relevance:

40.00%

Publisher:

Abstract:

The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.

Relevance:

40.00%

Publisher:

Abstract:

A student from the Data Processing program at the New York Trade School is shown working. Black and white photograph with some edge damage due to writing in black along the top.

Relevance:

40.00%

Publisher:

Abstract:

Felice Gigante, a graduate of the New York Trade School Electronics program, works on a machine in his job as Data Processing Customer Engineer for the International Business Machines Corp. Original caption reads, "Felice Gigante - Electronices, International Business Machines Corp." Black and white photograph with caption glued to reverse.

Relevance:

40.00%

Publisher:

Abstract:

GPS technology is now embedded in portable, low-cost electronic devices that track the movements of mobile objects. This development has greatly affected the transportation field by creating a novel and rich source of traffic data on the road network. Although GPS devices hold significant promise for overcoming problems in data collection such as underreporting, respondent fatigue, inaccuracies and other human errors, the technology is still new enough that it raises many issues for potential users. These issues tend to revolve around the following areas: reliability, data processing and the related applications. This thesis aims to study GPS tracking from the methodological, technical and practical aspects. It first evaluates the reliability of GPS-based traffic data using data from an experiment containing three different traffic modes (car, bike and bus) travelling along the road network. It then outlines the general procedure for processing GPS tracking data and discusses related issues that are uncovered by using real-world GPS tracking data from 316 cars. Thirdly, it investigates the influence of road network density on finding optimal locations for enhancing travel efficiency and decreasing travel cost. The results show that the geographical positioning is reliable. Velocity is slightly underestimated, whereas altitude measurements are unreliable. Post-processing techniques with auxiliary information are found to be necessary and important for resolving the inaccuracy of GPS data. The density of the road network influences the finding of optimal locations. The influence stabilizes at a certain level and does not deteriorate when the node density is higher.
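
One basic step in processing GPS tracking data is deriving speed from consecutive position fixes. The sketch below is illustrative and is not the thesis's procedure: the record layout `(t_seconds, lat, lon)`, the function names and the spherical-Earth radius are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(fixes):
    """Speed in m/s for each pair of consecutive fixes.

    fixes: list of (t_seconds, lat, lon) tuples sorted by time (assumed layout).
    Pairs with a non-positive time difference are skipped.
    """
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamps
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt)
    return speeds
```

Note that the distance between two fixes is measured along the shortest path, which is never longer than the route actually driven, so speeds computed this way can fall below the true speed on winding roads or at low sampling rates. Whether this contributes to the underestimation reported in the thesis is not stated in the abstract.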

Relevance:

40.00%

Publisher:

Abstract:

Until mid 2006, SCIAMACHY data processors for the operational retrieval of nitrogen dioxide (NO2) column data were based on the historical version 2 of the GOME Data Processor (GDP). On top of known problems inherent to GDP 2, ground-based validations of SCIAMACHY NO2 data revealed issues specific to SCIAMACHY, like a large cloud-dependent offset occurring at Northern latitudes. In 2006, the GDOAS prototype algorithm of the improved GDP version 4 was transferred to the off-line SCIAMACHY Ground Processor (SGP) version 3.0. In parallel, the calibration of SCIAMACHY radiometric data was upgraded. Before operational switch-on of SGP 3.0 and public release of upgraded SCIAMACHY NO2 data, we have investigated the accuracy of the algorithm transfer: (a) by checking the consistency of SGP 3.0 with prototype algorithms; and (b) by comparing SGP 3.0 NO2 data with ground-based observations reported by the WMO/GAW NDACC network of UV-visible DOAS/SAOZ spectrometers. This delta-validation study concludes that SGP 3.0 is a significant improvement with respect to the previous processor IPF 5.04. For three particular SCIAMACHY states, the study reveals unexplained features in the slant columns and air mass factors, although the quantitative impact on SGP 3.0 vertical columns is not significant.

Relevance:

40.00%

Publisher:

Abstract:

The CMS Collaboration conducted a month-long data taking exercise, the Cosmic Run At Four Tesla, during October-November 2008, with the goal of commissioning the experiment for extended operation. With all installed detector systems participating, CMS recorded 270 million cosmic ray events with the solenoid at a magnetic field strength of 3.8 T. This paper describes the data flow from the detector through the various online and offline computing systems, as well as the workflows used for recording the data, for aligning and calibrating the detector, and for analysis of the data. © 2010 IOP Publishing Ltd and SISSA.