77 results for complex data
in CentAUR: Central Archive University of Reading - UK
Abstract:
GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.
Abstract:
In the earth sciences, data are commonly cast on complex grids in order to model irregular domains such as coastlines, or to evenly distribute grid points over the globe. It is common for a scientist to wish to re-cast such data onto a grid that is more amenable to manipulation, visualization, or comparison with other data sources. The complexity of the grids presents a significant technical difficulty to the regridding process. In particular, the regridding of complex grids may suffer from severe performance issues, in the worst case scaling with the product of the sizes of the source and destination grids. We present a mechanism for the fast regridding of such datasets, based upon the construction of a spatial index that allows fast searching of the source grid. We discover that the most efficient spatial index under test (in terms of memory usage and query time) is a simple look-up table. A kd-tree implementation was found to be faster to build and to give similar query performance at the expense of a larger memory footprint. Using our approach, we demonstrate that regridding of complex data may proceed at speeds sufficient to permit regridding on-the-fly in an interactive visualization application, or in a Web Map Service implementation. For large datasets with complex grids the new mechanism is shown to significantly outperform algorithms used in many scientific visualization packages.
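The core idea described here (build a spatial index over the source grid once, then query it for each destination point) can be illustrated with a short, hedged sketch. The snippet below uses SciPy's cKDTree for nearest-neighbour regridding; treating longitude/latitude as planar coordinates, and the function signature itself, are simplifying assumptions for illustration rather than the paper's actual implementation (which also evaluates a look-up-table index).

```python
import numpy as np
from scipy.spatial import cKDTree

def regrid_nearest(src_lon, src_lat, src_vals, dst_lon, dst_lat):
    """Nearest-neighbour regridding via a spatial index over the source grid."""
    # Build the index once over the (possibly curvilinear) source cell centres.
    # Treating lon/lat as planar coordinates is a simplification.
    tree = cKDTree(np.column_stack([src_lon.ravel(), src_lat.ravel()]))
    # Query every destination point for its nearest source cell.
    _, idx = tree.query(np.column_stack([dst_lon.ravel(), dst_lat.ravel()]))
    return src_vals.ravel()[idx].reshape(dst_lon.shape)
```

Because the index is built once and each query is logarithmic in the source grid size, the cost no longer scales with the product of the two grid sizes, which is the performance problem the abstract highlights.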
Abstract:
In recent years, the area of data mining has seen considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest, as well as active research, aimed at developing new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically inspired intelligent methodologies whose classification, prediction, and pattern recognition capabilities have been applied successfully in many areas, including science, engineering, medicine, business, banking, and telecommunications. This paper highlights, from a data mining perspective, the implementation of NNs using supervised and unsupervised learning for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.
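As a generic illustration of the two learning modes discussed in the abstract, the hedged sketch below trains a small supervised neural-network classifier and runs an unsupervised clustering step on the same toy data; the dataset, features and parameters are placeholders rather than anything from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # 200 samples, 8 placeholder features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy class labels

# Supervised learning: a small feed-forward neural-network classifier.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))

# Unsupervised learning: cluster analysis of the same feature matrix.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```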
Abstract:
The CHARMe project enables the annotation of climate data with key pieces of supporting information that we term “commentary”. Commentary reflects the experience that has built up in the user community, and can help new or less-expert users (such as consultants, SMEs, experts in other fields) to understand and interpret complex data. In the context of global climate services, the CHARMe system will record, retain and disseminate this commentary on climate datasets, and provide a means for feeding back this experience to the data providers. Based on novel linked data techniques and standards, the project has developed a core system, data model and suite of open-source tools to enable this information to be shared, discovered and exploited by the community.
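As a hedged illustration of the kind of linked-data "commentary" record the abstract describes, the sketch below expresses an annotation as JSON-LD loosely modelled on the W3C Web Annotation vocabulary; the URIs, dataset identifier and wording are assumptions for illustration, not the CHARMe data model itself.

```python
import json

# All URIs and values below are hypothetical, for illustration only.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": ("Known cold bias over sea ice before 1995; "
                  "use with care for polar trend studies."),
    },
    "target": "http://example.org/datasets/sst-reanalysis-v2",  # hypothetical dataset URI
    "creator": "http://example.org/users/expert-user",          # hypothetical user URI
}

print(json.dumps(annotation, indent=2))
```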
Abstract:
Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools and their adoption is growing. Statistical methods, machine learning and data mining algorithms have been adopted successfully to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically also requires pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a major obstacle to the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms that avoid the manual invocation of preprocessing and mining tools, which leads to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in for the Konstanz Information Miner (KNIME) workbench that automates the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automates the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflows for neuroimage data by leveraging the rich functionality available in the KNIME workbench.
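As a hedged sketch of the kind of import step K-Surfer automates, the snippet below reads a single FreeSurfer statistics file into a table; the file path is hypothetical, and the parsing assumes the usual "# ColHeaders" comment line that FreeSurfer statistics files carry.

```python
import pandas as pd

def read_freesurfer_stats(path):
    """Read one FreeSurfer .stats file into a DataFrame (simplified)."""
    with open(path) as fh:
        lines = fh.readlines()
    # Column names are declared on a comment line starting with "# ColHeaders".
    header = next(line for line in lines if line.startswith("# ColHeaders"))
    columns = header.replace("# ColHeaders", "").split()
    # Data rows are the remaining non-comment, whitespace-separated lines.
    rows = [line.split() for line in lines if line.strip() and not line.startswith("#")]
    return pd.DataFrame(rows, columns=columns)

# stats = read_freesurfer_stats("subject01/stats/aseg.stats")  # hypothetical path
```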
Abstract:
In real-world applications, sequential data mining and data exploration algorithms are often unsuitable for datasets of enormous size, high dimensionality and complex structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is a need to develop high-performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute's HIV-screening dataset. We present experimental results on a small-scale computing environment.
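The data-parallel pattern behind such a distributed formulation can be sketched, in heavily simplified form, as partitioning the molecule set across workers, counting fragment support locally in active and inactive compounds, and merging the counts to keep discriminative fragments. The sketch below uses Python multiprocessing as a stand-in for a grid environment and treats fragments as pre-extracted identifiers, omitting the actual subgraph isomorphism and candidate-growth steps; the thresholds and names are illustrative assumptions.

```python
from collections import Counter
from multiprocessing import Pool

def local_support(molecules):
    """Count fragment occurrences in one partition.

    molecules: list of (is_active, set_of_fragment_ids) tuples.
    """
    active, inactive = Counter(), Counter()
    for is_active, fragments in molecules:
        (active if is_active else inactive).update(fragments)
    return active, inactive

def discriminative_fragments(partitions, min_active=0.3, max_inactive=0.05):
    """Merge per-partition counts and keep fragments frequent in actives only."""
    with Pool() as pool:
        results = pool.map(local_support, partitions)
    active, inactive = Counter(), Counter()
    for a, i in results:
        active += a
        inactive += i
    n_act = sum(1 for p in partitions for is_act, _ in p if is_act)
    n_inact = sum(1 for p in partitions for is_act, _ in p if not is_act)
    return [f for f, c in active.items()
            if c / max(n_act, 1) >= min_active
            and inactive[f] / max(n_inact, 1) <= max_inactive]
```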
Abstract:
It is generally acknowledged that population-level assessments provide a better measure of response to toxicants than assessments of individual-level effects. Population-level assessments generally require the use of models to integrate potentially complex data about the effects of toxicants on life-history traits, and to provide a relevant measure of ecological impact. Building on excellent earlier reviews, we here briefly outline the modelling options in population-level risk assessment. Modelling is used to calculate population endpoints from available data, which are often about individual life histories, the ways that individuals interact with each other, the environment and other species, and the ways individuals are affected by pesticides. As population endpoints, we recommend the use of population abundance, population growth rate, and the chance of population persistence. We recommend two types of model: simple life-history models distinguishing two life-history stages, juveniles and adults; and spatially explicit individual-based landscape models. Life-history models are very quick to set up and run, and they provide a great deal of insight. At the other extreme, individual-based landscape models provide the greatest verisimilitude, albeit at the cost of greatly increased complexity. We conclude with a discussion of the implications of the severe problems of parameterising models.
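The first recommended model type, a two-stage life-history model, can be written down directly: project juveniles and adults forward with a 2×2 matrix and take the dominant eigenvalue as the population growth rate. The sketch below uses illustrative parameter values, not figures from any assessment.

```python
import numpy as np

# Illustrative placeholder parameters (per time step).
fecundity = 2.0          # offspring produced per adult
juvenile_survival = 0.3  # probability a juvenile survives and matures
adult_survival = 0.8     # probability an adult survives

# Projection matrix acting on the stage vector [juveniles, adults].
A = np.array([[0.0,               fecundity],
              [juvenile_survival, adult_survival]])

# Population growth rate = dominant eigenvalue of the projection matrix.
growth_rate = np.max(np.abs(np.linalg.eigvals(A)))
print("population growth rate (lambda):", round(growth_rate, 3))
```

A value above 1 indicates a growing population, below 1 a declining one, so the effect of a toxicant can be expressed as the change it induces in this single endpoint.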
Abstract:
The UK700 trial failed to demonstrate an overall benefit of intensive case management (ICM) in patients with severe psychotic illness. This does not discount a benefit for particular subgroups, and evidence of a benefit of ICM for patients of borderline intelligence has been presented. The aim of this study is to investigate whether this effect is part of a general benefit for patients with severe psychosis complicated by additional needs. In the UK700 trial patients with severe psychosis were randomly allocated to ICM or standard case management. For each patient group with complex needs the effect of ICM is compared with that in the rest of the study cohort. Outcome measures are days spent in psychiatric hospital and the admission and discharge rates. ICM may be of benefit to patients with severe psychosis complicated by borderline intelligence or depression, but may cause patients using illicit drugs to spend more time in hospital. There was no convincing evidence of an effect of ICM in a further seven patient groups. ICM is not of general benefit to patients with severe psychosis complicated by additional needs. The benefit of ICM for patients with borderline intelligence is an isolated effect which should be interpreted cautiously until further data are available.
Abstract:
Synoptic climatology relates the atmospheric circulation to the surface environment. The aim of this study is to examine the variability of the surface meteorological patterns that develop under different synoptic-scale categories over a suburban area with complex topography. Multivariate data analysis techniques were applied to a data set of surface meteorological elements. Three principal components were found, related to the thermodynamic status of the surface environment and the two components of the wind speed. The variability of the surface flows was related to atmospheric circulation categories by applying Correspondence Analysis. Similar surface thermodynamic fields develop under cyclonic categories, which contrast with the anti-cyclonic category. A strong, steady wind flow characterized by high shear values develops under the cyclonic Closed Low and the anticyclonic H–L categories, in contrast to the variable weak flow under the anticyclonic Open Anticyclone category.
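The multivariate step described above can be illustrated with a brief, hedged sketch: standardise a matrix of surface meteorological elements and extract the three leading principal components. The data and variable list below are placeholders, not the study's observations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Rows: observations; columns: placeholder surface elements such as
# temperature, humidity, pressure and the two wind-speed components.
X = rng.normal(size=(1000, 5))

pca = PCA(n_components=3)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
print("explained variance ratio:", pca.explained_variance_ratio_)
```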
Abstract:
This paper details a strategy for modifying the source code of a complex model so that the model may be used in a data assimilation context, and gives the standards for implementing a data assimilation code to use such a model. The strategy relies on keeping the model separate from any data assimilation code, and coupling the two through the use of Message Passing Interface (MPI) functionality. This strategy limits the changes necessary to the model and as such is rapid to program, at the expense of ultimate performance. The implementation technique is applied in different models with state dimension up to 2.7 × 10^8. The overheads added by using this implementation strategy in a coupled ocean-atmosphere climate model are shown to be an order of magnitude smaller than the addition of correlated stochastic random errors necessary for some nonlinear data assimilation techniques.
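A minimal sketch of the coupling idea, assuming mpi4py and a two-process layout, is given below: the model and the data assimilation code run as separate MPI processes and exchange the state vector as messages, so only a send and a receive need to be added to the model at analysis time. The ranks, tags and state layout are illustrative assumptions, not the interface standard the paper defines.

```python
# Run with e.g.: mpiexec -n 2 python couple_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
N = 1000  # toy state dimension (the paper reaches ~2.7e8)

if rank == 0:
    # "Model" process: integrate forward, hand the forecast state to the
    # assimilation process, then continue from the analysed state.
    state = np.random.default_rng(0).normal(size=N)
    comm.Send(state, dest=1, tag=11)
    comm.Recv(state, source=1, tag=22)
elif rank == 1:
    # "Data assimilation" process: receive the forecast, apply an update
    # (a trivial placeholder increment here), and send back the analysis.
    state = np.empty(N)
    comm.Recv(state, source=0, tag=11)
    state += 0.01  # placeholder for the real assimilation increment
    comm.Send(state, dest=0, tag=22)
```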
Abstract:
As we enter an era of 'big data', asset information is becoming a deliverable of complex projects. Prior research suggests digital technologies enable rapid, flexible forms of project organizing. This research analyses practices of managing change in Airbus, CERN and Crossrail, through desk-based review, interviews, visits and a cross-case workshop. These organizations deliver complex projects, rely on digital technologies to manage large data-sets, and use configuration management, a systems engineering approach with mid-20th century origins, to establish and maintain integrity. In them, configuration management has become more, rather than less, important. Asset information is structured, with change managed through digital systems using relatively hierarchical, asynchronous and sequential processes. The paper contributes by uncovering limits to flexibility in complex projects where integrity is important. Challenges of managing change are discussed, considering the evolving nature of configuration management, the potential use of analytics on complex projects, and implications for research and practice.
Abstract:
Data assimilation is a sophisticated mathematical technique for combining observational data with model predictions to produce state and parameter estimates that most accurately approximate the current and future states of the true system. The technique is commonly used in atmospheric and oceanic modelling, combining empirical observations with model predictions to produce more accurate and well-calibrated forecasts. Here, we consider a novel application within a coastal environment and describe how the method can also be used to deliver improved estimates of uncertain morphodynamic model parameters. This is achieved using a technique known as state augmentation. Earlier applications of state augmentation have typically employed the 4D-Var, Kalman filter or ensemble Kalman filter assimilation schemes. Our new method is based on a computationally inexpensive 3D-Var scheme, where the specification of the error covariance matrices is crucial for success. A simple 1D model of bed-form propagation is used to demonstrate the method. The scheme is capable of recovering near-perfect parameter values and, therefore, improves the capability of our model to predict future bathymetry. Such positive results suggest the potential for application to more complex morphodynamic models.
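A minimal sketch of state augmentation under a 3D-Var cost function is given below: the uncertain parameter is appended to the state vector, and the augmented background error covariance carries state-parameter cross-covariances, so minimising the usual cost function updates the parameter as well (consistent with the remark above that the specification of the error covariance matrices is crucial). Dimensions, covariances and the observation operator are illustrative placeholders, not the morphodynamic model's setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n_state, n_obs = 10, 5

# Augmented background: model state plus one uncertain parameter.
zb = np.concatenate([rng.normal(size=n_state), [0.5]])

# Background error covariance with state-parameter cross-covariances,
# which is what lets observations of the state update the parameter.
B = np.eye(n_state + 1)
B[-1, :n_state] = B[:n_state, -1] = 0.25

R = 0.1 * np.eye(n_obs)                         # observation error covariance
H = np.zeros((n_obs, n_state + 1))
H[:, :n_obs] = np.eye(n_obs)                    # observe the first few state elements
y = H @ zb + rng.normal(scale=0.3, size=n_obs)  # synthetic observations

def cost(z):
    """3D-Var cost: background term plus observation term."""
    dz, dy = z - zb, y - H @ z
    return dz @ np.linalg.solve(B, dz) + dy @ np.linalg.solve(R, dy)

analysis = minimize(cost, zb).x
print("estimated parameter:", analysis[-1])
```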
Abstract:
The paper describes a field study focused on the dispersion of a traffic-related pollutant within an area close to a busy intersection between two street canyons in Central London. Simultaneous measurements of airflow, traffic flow and carbon monoxide concentrations ([CO]) are used to explore the causes of spatial variability in [CO] over a full range of background wind directions. Depending on the roof-top wind direction, evidence of both flow channelling and recirculation regimes was identified from data collected within the main canyon and the intersection. However, at the intersection, the merging of channelled flows from the canyons increased the flow complexity and turbulence intensity. These features, coupled with the close proximity of nearby queuing traffic in several directions, led to the highest overall time-averaged measured [CO] occurring at the intersection. Within the main street canyon, the data supported the presence of a helical flow regime for oblique roof-top flows, leading to increased [CO] on the canyon leeward side. Predominant wind directions led to some locations having significantly higher diurnal average [CO] due to being mostly on the canyon leeward side during the study period. For all locations, small changes in the background wind direction could cause large changes in the in-street mean wind angle and local turbulence intensity, implying that dispersion mechanisms would be highly sensitive to small changes in above-roof flows. During peak traffic flow periods, concentrations within parallel side streets were approximately four times lower than within the main canyon and intersection, which has implications for controlling personal exposure. Overall, the results illustrate that pollutant concentrations can be highly spatially variable over even short distances within complex urban geometries, and that synoptic wind patterns, traffic queue location and building topologies all play a role in determining where pollutant hot spots occur.