947 results for Data pre-processing
Abstract:
In data fusion systems, one often encounters measurements of past target locations and then wishes to deduce where the targets are currently located. Recent research on the processing of such out-of-sequence data has culminated in the development of a number of algorithms for solving the associated tracking problem. This paper reviews these different approaches in a common Bayesian framework and proposes an architecture that orthogonalises the data association and out-of-sequence problems such that any combination of solutions to these two problems can be used together. The emphasis is not on advocating one approach over another on the basis of computational expense, but rather on understanding the relationships between the algorithms so that any approximations made are explicit.
Abstract:
1. Nutrient concentrations (particularly N and P) determine the extent to which water bodies are or may become eutrophic. Direct determination of nutrient content on a wide scale is labour intensive, but the main sources of N and P are well known. This paper describes and tests an export coefficient model for prediction of total N and total P from: (i) land use, stock headage and human population; (ii) the export rates of N and P from these sources; and (iii) the river discharge. Such a model might be used to forecast the effects of future changes in land use and to hindcast past water quality to establish comparative or baseline states for the monitoring of change.
2. The model has been calibrated against observed data for 1988 and validated against sets of observed data for a sequence of earlier years in ten British catchments varying from uplands through rolling, fertile lowlands to the flat topography of East Anglia.
3. The model predicted total N and total P concentrations with high precision (95% of the variance in observed data explained). It has been used in two forms: the first on a specific catchment basis; the second for a larger natural region which contains the catchment, with the assumption that all catchments within that region will be similar. Both models gave similar results with little loss of precision in the latter case. This implies that it will be possible to describe the overall pattern of nutrient export in the UK with only a fraction of the effort needed to carry out the calculations for each individual water body.
4. Comparison between land use, stock headage, population numbers and nutrient export for the ten catchments in the pre-war year of 1931, and for 1970 and 1988, shows that there has been a substantial loss of rough grazing to fertilized temporary and permanent grasslands, an increase in the hectarage devoted to arable crops, consistent increases in the stocking of cattle and sheep, and a marked movement of humans to these rural catchments.
5. All of these trends have increased the flows of nutrients, with more than a doubling of both total N and total P loads during the period. On average in these rural catchments, stock wastes have been the greatest contributors to both N and P exports, with cultivation the next most important source of N and people of P. Ratios of N to P were high in 1931 and remain little changed, so that, in these catchments, phosphorus continues to be the nutrient most likely to control algal crops in standing waters supplied by the rivers studied.
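The skeleton of an export coefficient model of this kind is simple to state. The following is a minimal sketch in our own notation, not the paper's calibrated model (which has source- and catchment-specific coefficients): the annual load is a sum of source-specific exports, and the mean concentration follows by dividing by discharge.

```latex
L = \sum_{i=1}^{n} E_i \, A_i \qquad\qquad \bar{C} = \frac{L}{Q}
```

Here $L$ is the total annual load of N or P, $E_i$ the export coefficient of source $i$ (per hectare for land uses, per head for stock, per person for the human population), $A_i$ the extent of that source in the catchment, and $Q$ the annual river discharge that converts the load into a mean concentration $\bar{C}$.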
Abstract:
Very high-resolution Synthetic Aperture Radar (SAR) sensors represent an alternative to aerial photography for delineating floods in built-up environments where flood risk is highest. However, even with currently available SAR image resolutions of 3 m and higher, signal returns from man-made structures hamper the accurate mapping of flooded areas. Enhanced image processing algorithms and a better exploitation of image archives are required to facilitate the use of microwave remote sensing data for monitoring flood dynamics in urban areas. In this study a hybrid methodology combining radiometric thresholding, region growing and change detection is introduced as an approach enabling automated, objective and reliable flood extent extraction from very high-resolution urban SAR images. The method is based on the calibration of a statistical distribution of “open water” backscatter values inferred from SAR images of floods. SAR images acquired during dry conditions enable the identification of areas i) that are not “visible” to the sensor (i.e. regions affected by ‘layover’ and ‘shadow’) and ii) that systematically behave as specular reflectors (e.g. smooth tarmac, permanent water bodies). Change detection with respect to a pre- or post-flood reference image thereby reduces over-detection of inundated areas. A case study of the July 2007 Severn River flood (UK), observed by the very high-resolution SAR sensor on board TerraSAR-X as well as by airborne photography, highlights the advantages and limitations of the proposed method. We conclude that even though the fully automated SAR-based flood mapping technique overcomes some limitations of previous methods, further technological and methodological improvements are necessary for SAR-based flood detection in urban areas to match the flood mapping capability of high quality aerial photography.
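A compact way to see how the three ingredients fit together is the Python sketch below. It is our illustration of the generic pipeline the abstract describes, not the authors' implementation; the array names, the dB thresholds and the use of scipy.ndimage for connected components are all assumptions.

```python
# Sketch of the three-step flood-extent pipeline: radiometric
# thresholding, region growing, and change detection against a
# dry-condition reference image. Thresholds are illustrative only.
import numpy as np
from scipy import ndimage

def flood_extent(flood_db, reference_db, seed_db=-18.0, grow_db=-14.0):
    """Binary flood mask from SAR backscatter (dB) at flood and dry times.

    seed_db: conservative "open water" threshold (seeds region growing).
    grow_db: relaxed threshold to which seed regions may grow.
    """
    seeds = flood_db < seed_db          # confident open-water pixels
    candidates = flood_db < grow_db     # pixels regions may grow into

    # Region growing: keep connected candidate regions containing a seed.
    labels, _ = ndimage.label(candidates)
    seeded = np.unique(labels[seeds])
    grown = np.isin(labels, seeded[seeded > 0])

    # Change detection: discard pixels that look water-like in the dry
    # reference too (permanent water, smooth tarmac, layover/shadow),
    # so only genuinely changed areas count as flooding.
    always_dark = reference_db < grow_db
    return grown & ~always_dark
```

In the paper the seed threshold comes from a calibrated statistical distribution of open-water backscatter rather than a fixed constant as used here.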
Abstract:
OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.
Abstract:
In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is called the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
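For readers unfamiliar with the separate and conquer strategy, a minimal sequential sketch of a Prism-style algorithm is shown below. This is our own simplification, not the parallel framework itself; the data layout (dicts of attribute values plus a "class" key) and function name are assumptions. Rules are specialised term by term until they cover only the target class ("conquer"), and the instances a finished rule covers are then removed ("separate").

```python
# Minimal Prism-style 'separate and conquer' rule induction for one class.
def induce_rules_for_class(instances, target):
    """Induce rules (lists of (attribute, value) terms) covering `target`."""
    rules = []
    remaining = list(instances)
    while any(x["class"] == target for x in remaining):
        covered, rule = remaining, []
        # Conquer: specialise until the rule covers only the target class.
        while any(x["class"] != target for x in covered):
            candidates = [(a, v) for x in covered for a, v in x.items()
                          if a != "class" and (a, v) not in rule]
            if not candidates:        # no term left to specialise on
                break
            # Pick the term maximising p(target | term) over covered instances.
            best = max(candidates, key=lambda t:
                       sum(x["class"] == target for x in covered
                           if x.get(t[0]) == t[1])
                       / sum(x.get(t[0]) == t[1] for x in covered))
            rule.append(best)
            covered = [x for x in covered if x.get(best[0]) == best[1]]
        rules.append(rule)
        # Separate: remove the instances the finished rule covers.
        remaining = [x for x in remaining
                     if not all(x.get(a) == v for a, v in rule)]
    return rules
```

The full Prism algorithm runs this procedure once per class; it is the repeated scans over the training data in the inner loop that make scaling to large datasets hard and that motivate parallelisation.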
Abstract:
Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist to scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation of classification rule induction algorithms most of the work has been concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
Abstract:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Abstract:
We investigated the on-line processing of unaccusative and unergative sentences in a group of eight Greek-speaking individuals diagnosed with Broca's aphasia and a group of language-unimpaired subjects used as the baseline. The processing of unaccusativity refers to the reactivation of the postverbal trace by retrieving the mnemonic representation of the verb’s syntactically defined antecedent provided in the early part of the sentence. Our results demonstrate that the Broca group showed selective reactivation of the antecedent for the unaccusatives. We consider several interpretations for our data, including explanations focusing on the transitivization properties of nonactive and active voice-alternating unaccusatives, the costly procedure claimed to underlie the parsing of active nonvoice-alternating unaccusatives, and the animacy of the antecedent modulating the syntactic choices of the patients.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
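To make the bottleneck concrete, here is a minimal mpi4py sketch (our illustration, with assumed variable names, not the paper's code) of the straightforward parallel k-means formulation described above. The Allreduce calls are the per-iteration global reduction that the proposed non-uniform formulation removes.

```python
# Straightforward data-parallel k-means: data split uniformly across
# nodes, one global reduction per iteration to combine partial sums.
import numpy as np
from mpi4py import MPI

def parallel_kmeans(local_data, centroids, iters=20):
    comm = MPI.COMM_WORLD
    k, d = centroids.shape
    for _ in range(iters):
        # Local step: assign each locally held point to its nearest centroid.
        dists = np.linalg.norm(local_data[:, None, :] - centroids, axis=2)
        nearest = dists.argmin(axis=1)
        sums = np.zeros((k, d))
        counts = np.zeros(k)
        np.add.at(sums, nearest, local_data)
        np.add.at(counts, nearest, 1)
        # Global step: the reduction that limits scalability -- every node
        # exchanges its partial sums and counts at every single iteration.
        tot_sums = np.zeros_like(sums)
        tot_counts = np.zeros_like(counts)
        comm.Allreduce(sums, tot_sums, op=MPI.SUM)
        comm.Allreduce(counts, tot_counts, op=MPI.SUM)
        nonempty = tot_counts > 0
        centroids = centroids.copy()
        centroids[nonempty] = tot_sums[nonempty] / tot_counts[nonempty][:, None]
    return centroids
```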
Abstract:
This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.
Abstract:
The nature and scale of pre-Columbian land use and the consequences of the 1492 “Columbian Encounter” (CE) on Amazonia are among the more debated topics in New World archaeology and paleoecology. However, pre-Columbian human impact in Amazonian savannas remains poorly understood. Most paleoecological studies have been conducted in neotropical forest contexts. Of studies done in Amazonian savannas, none has the temporal resolution needed to detect changes induced by either climate or humans before and after A.D. 1492, and only a few closely integrate paleoecological and archaeological data. We report a high-resolution 2,150-y paleoecological record from a French Guianan coastal savanna that forces reconsideration of how pre-Columbian savanna peoples practiced raised-field agriculture and how the CE impacted these societies and environments. Our combined pollen, phytolith, and charcoal analyses reveal unexpectedly low levels of biomass burning associated with pre-A.D. 1492 savanna raised-field agriculture and a sharp increase in fires following the arrival of Europeans. We show that pre-Columbian raised-field farmers limited burning to improve agricultural production, contrasting with extensive use of fire in pre-Columbian tropical forest and Central American savanna environments, as well as in present-day savannas. The charcoal record indicates that extensive fires in the seasonally flooded savannas of French Guiana are a post-Columbian phenomenon, postdating the collapse of indigenous populations. The discovery that pre-Columbian farmers practiced fire-free savanna management calls into question the widely held assumption that pre-Columbian Amazonian farmers pervasively used fire to manage and alter ecosystems and offers fresh perspectives on an emerging alternative approach to savanna land use and conservation that can help reduce carbon emissions.
Abstract:
It is now established that native language affects one's perception of the world. However, it is unknown whether this effect is merely driven by conscious, language-based evaluation of the environment or whether it reflects fundamental differences in perceptual processing between individuals speaking different languages. Using brain potentials, we demonstrate that the existence in Greek of 2 color terms—ghalazio and ble—distinguishing light and dark blue leads to greater and faster perceptual discrimination of these colors in native speakers of Greek than in native speakers of English. The visual mismatch negativity, an index of automatic and preattentive change detection, was similar for blue and green deviant stimuli during a color oddball detection task in English participants, but it was significantly larger for blue than green deviant stimuli in native speakers of Greek. These findings establish an implicit effect of language-specific terminology on human color perception.
Abstract:
An important constraint on how hemodynamic neuroimaging signals such as fMRI can be interpreted in terms of the underlying evoked activity is an understanding of the neurovascular coupling mechanisms that actually generate hemodynamic responses. The predominant view at present is that the hemodynamic response is most correlated with synaptic input and subsequent neural processing rather than spiking output. It is still not clear whether input or processing is more important in the generation of hemodynamic responses. In order to investigate this, we measured the hemodynamic and neural responses to electrical whisker pad stimuli in rat whisker barrel somatosensory cortex both before and after local cortical injections of the GABAA agonist muscimol. Muscimol would not be expected to affect the thalamocortical input into the cortex but would inhibit subsequent intra-cortical processing. Before muscimol infusion, whisker stimuli elicited neural and accompanying hemodynamic responses similar to those reported previously. Following infusion of muscimol, although the temporal profile of neural responses to each pulse of the stimulus train was similar, the average response was reduced in magnitude by ∼79% compared to that elicited pre-infusion. The whisker-evoked hemodynamic responses were reduced by a commensurate magnitude, suggesting that, although the neurovascular coupling relationships were similar for synaptic input and for cortical processing, the magnitude of the overall response is dominated by cortical processing rather than by the thalamocortical input alone.
Abstract:
In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from them. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used to complement each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining relies on the processing power of silicon hardware, visualization techniques also aim to harness the powerful image-processing capabilities of the human brain. This article reviews research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications, including a perspective on the field from the chemical process industry.
Abstract:
The validity of the linguistic relativity principle continues to stimulate vigorous debate and research. The debate has recently shifted from the behavioural investigation arena to a more biologically grounded field, in which tangible physiological evidence for language effects on perception can be obtained. Using brain potentials in a colour oddball detection task with Greek and English speakers, a recent study suggests that language effects may exist at early stages of perceptual integration [Thierry, G., Athanasopoulos, P., Wiggett, A., Dering, B., & Kuipers, J. (2009). Unconscious effects of language-specific terminology on pre-attentive colour perception. Proceedings of the National Academy of Sciences, 106, 4567–4570]. In this paper, we test whether in Greek speakers exposure to a new cultural environment (UK) with contrasting colour terminology from their native language affects early perceptual processing as indexed by an electrophysiological correlate of visual detection of colour luminance. We also report semantic mapping of native colour terms and colour similarity judgements. Results reveal convergence of linguistic descriptions, cognitive processing, and early perception of colour in bilinguals. This result demonstrates for the first time substantial plasticity in early, pre-attentive colour perception and has important implications for the mechanisms that are involved in perceptual changes during the processes of language learning and acculturation.