28 resultados para Data pre-processing


Relevância:

80.00% 80.00%

Publicador:

Resumo:

The long observational record is critical to our understanding of the Earth’s climate, but most observing systems were not developed with a climate objective in mind. As a result, tremendous efforts have gone into assessing and reprocessing the data records to improve their usefulness in climate studies. The purpose of this paper is to both review recent progress in reprocessing and reanalyzing observations, and summarize the challenges that must be overcome in order to improve our understanding of climate and variability. Reprocessing improves data quality through more scrutiny and improved retrieval techniques for individual observing systems, while reanalysis merges many disparate observations with models through data assimilation, yet both aim to provide a climatology of Earth processes. Many challenges remain, such as tracking the improvement of processing algorithms and limited spatial coverage. Reanalyses have fostered significant research, yet reliable global trends in many physical fields are not yet attainable, despite significant advances in data assimilation and numerical modeling. Oceanic reanalyses have made significant advances in recent years, but will only be discussed here in terms of progress toward integrated Earth system analyses. Climate data sets are generally adequate for process studies and large-scale climate variability. Communication of the strengths, limitations and uncertainties of reprocessed observations and reanalysis data, not only among the community of developers, but also with the extended research community, including the new generations of researchers and the decision makers is crucial for further advancement of the observational data records. It must be emphasized that careful investigation of the data and processing methods are required to use the observations appropriately.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The analysis step of the (ensemble) Kalman filter is optimal when (1) the distribution of the background is Gaussian, (2) state variables and observations are related via a linear operator, and (3) the observational error is of additive nature and has Gaussian distribution. When these conditions are largely violated, a pre-processing step known as Gaussian anamorphosis (GA) can be applied. The objective of this procedure is to obtain state variables and observations that better fulfil the Gaussianity conditions in some sense. In this work we analyse GA from a joint perspective, paying attention to the effects of transformations in the joint state variable/observation space. First, we study transformations for state variables and observations that are independent from each other. Then, we introduce a targeted joint transformation with the objective to obtain joint Gaussianity in the transformed space. We focus primarily in the univariate case, and briefly comment on the multivariate one. A key point of this paper is that, when (1)-(3) are violated, using the analysis step of the EnKF will not recover the exact posterior density in spite of any transformations one may perform. These transformations, however, provide approximations of different quality to the Bayesian solution of the problem. Using an example in which the Bayesian posterior can be analytically computed, we assess the quality of the analysis distributions generated after applying the EnKF analysis step in conjunction with different GA options. The value of the targeted joint transformation is particularly clear for the case when the prior is Gaussian, the marginal density for the observations is close to Gaussian, and the likelihood is a Gaussian mixture.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper describes a proposed new approach to the Computer Network Security Intrusion Detection Systems (NIDS) application domain knowledge processing focused on a topic map technology-enabled representation of features of the threat pattern space as well as the knowledge of situated efficacy of alternative candidate algorithms for pattern recognition within the NIDS domain. Thus an integrative knowledge representation framework for virtualisation, data intelligence and learning loop architecting in the NIDS domain is described together with specific aspects of its deployment.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The simulation and development work that has been undertaken to produce a signal equaliser used to improve the data rates from oil well logging instruments is presented. The instruments are lowered into the drill bore hole suspended by a cable which has poor electrical characteristics. The equaliser described in the paper corrects for the distortions that occur from the cable (dispersion and attenuation) with the result that the instrument can send data at 100 K.bits/second down its own suspension cable of 12 Km in length. The use of simulation techniques and tools were invaluable in generating a model for the distortions and proved to be a useful tool when site testing was not available.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This article analyses the results of an empirical study on the 200 most popular UK-based websites in various sectors of e-commerce services. The study provides empirical evidence on unlawful processing of personal data. It comprises a survey on the methods used to seek and obtain consent to process personal data for direct marketing and advertisement, and a test on the frequency of unsolicited commercial emails (UCE) received by customers as a consequence of their registration and submission of personal information to a website. Part One of the article presents a conceptual and normative account of data protection, with a discussion of the ethical values on which EU data protection law is grounded and an outline of the elements that must be in place to seek and obtain valid consent to process personal data. Part Two discusses the outcomes of the empirical study, which unveils a significant departure between EU legal theory and practice in data protection. Although a wide majority of the websites in the sample (69%) has in place a system to ask separate consent for engaging in marketing activities, it is only 16.2% of them that obtain a consent which is valid under the standards set by EU law. The test with UCE shows that only one out of three websites (30.5%) respects the will of the data subject not to receive commercial communications. It also shows that, when submitting personal data in online transactions, there is a high probability (50%) of incurring in a website that will ignore the refusal of consent and will send UCE. The article concludes that there is severe lack of compliance of UK online service providers with essential requirements of data protection law. In this respect, it suggests that there is inappropriate standard of implementation, information and supervision by the UK authorities, especially in light of the clarifications provided at EU level.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrate that intrarray variability is small (only around 2 of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6 of mean). The common practise of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflect the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflect some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming normalising and visualising transcriptomic array data from Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The Environmental Data Abstraction Library provides a modular data management library for bringing new and diverse datatypes together for visualisation within numerous software packages, including the ncWMS viewing service, which already has very wide international uptake. The structure of EDAL is presented along with examples of its use to compare satellite, model and in situ data types within the same visualisation framework. We emphasize the value of this capability for cross calibration of datasets and evaluation of model products against observations, including preparation for data assimilation.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The present study examines the processing of subject-verb (SV) number agreement with coordinate subjects in pre-verbal and post-verbal positions in Greek. Greek is a language with morphological number marked on nominal and verbal elements. Coordinate SV agreement, however, is special in Greek as it is sensitive to the coordinate subject's position: when pre-verbal, the verb is marked for plural while when post-verbal the verb can be in the singular. We conducted two experiments, an acceptability judgment task with adult monolinguals as a pre-study (Experiment 1) and a self-paced reading task as the main study (Experiment 2) in order to obtain acceptance as well as processing data. Forty adult monolingual speakers of Greek participated in Experiment 1 and a hundred and forty one in Experiment 2. Seventy one children participated in Experiment 2: 30 Albanian-Greek sequential bilingual children and 41 Greek monolingual children aged 10–12 years. The adult data in Experiment 1 establish the difference in acceptability between singular VPs in SV and VS constructions reaffirming our hypothesis. Meanwhile, the adult data in Experiment 2 show that plural verbs accelerate processing regardless of subject position. The child online data show that sequential bilingual children have longer reading times (RTs) compared to the age-matched monolingual control group. However, both child groups follow a similar processing pattern in both pre-verbal and post-verbal constructions showing longer RTs immediately after a singular verb when the subject was pre-verbal indicating a grammaticality effect. In the post-verbal coordinate subject sentences, both child groups showed longer RTs on the first subject following the plural verb due to the temporary number mismatch between the verb and the first subject. This effect was resolved in monolingual children but was still present at the end of the sentence for bilingual children indicating difficulties to reanalyze and integrate information. Taken together, these findings demonstrate that (a) 10–12 year-old sequential bilingual children are sensitive to number agreement in SV coordinate constructions parsing sentences in the same way as monolingual children even though their vocabulary abilities are lower than that of age-matched monolingual peers and (b) bilinguals are slower in processing overall.