912 resultados para complex data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a framework specially designed to deal with structurally complex data, where all individuals have the same structure, as is the case in many medical domains. A structurally complex individual may be composed of any type of singlevalued or multivalued attributes, including time series, for example. These attributes are structured according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and are very useful for supporting such important tasks as diagnosis, detecting fraud, analyzing patient evolution, identifying control groups, etc.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The health system is one sector dealing with a deluge of complex data. Many healthcare organisations struggle to utilise these volumes of health data effectively and efficiently. Also, there are many healthcare organisations, which still have stand-alone systems, not integrated for management of information and decision-making. This shows, there is a need for an effective system to capture, collate and distribute this health data. Therefore, implementing the data warehouse concept in healthcare is potentially one of the solutions to integrate health data. Data warehousing has been used to support business intelligence and decision-making in many other sectors such as the engineering, defence and retail sectors. The research problem that is going to be addressed is, "how can data warehousing assist the decision-making process in healthcare". To address this problem the researcher has narrowed an investigation focusing on a cardiac surgery unit. This research used the cardiac surgery unit at the Prince Charles Hospital (TPCH) as the case study. The cardiac surgery unit at TPCH uses a stand-alone database of patient clinical data, which supports clinical audit, service management and research functions. However, much of the time, the interaction between the cardiac surgery unit information system with other units is minimal. There is a limited and basic two-way interaction with other clinical and administrative databases at TPCH which support decision-making processes. The aims of this research are to investigate what decision-making issues are faced by the healthcare professionals with the current information systems and how decision-making might be improved within this healthcare setting by implementing an aligned data warehouse model or models. As a part of the research the researcher will propose and develop a suitable data warehouse prototype based on the cardiac surgery unit needs and integrating the Intensive Care Unit database, Clinical Costing unit database (Transition II) and Quality and Safety unit database [electronic discharge summary (e-DS)]. The goal is to improve the current decision-making processes. The main objectives of this research are to improve access to integrated clinical and financial data, providing potentially better information for decision-making for both improved from the questionnaire and by referring to the literature, the results indicate a centralised data warehouse model for the cardiac surgery unit at this stage. A centralised data warehouse model addresses current needs and can also be upgraded to an enterprise wide warehouse model or federated data warehouse model as discussed in the many consulted publications. The data warehouse prototype was able to be developed using SAS enterprise data integration studio 4.2 and the data was analysed using SAS enterprise edition 4.3. In the final stage, the data warehouse prototype was evaluated by collecting feedback from the end users. This was achieved by using output created from the data warehouse prototype as examples of the data desired and possible in a data warehouse environment. According to the feedback collected from the end users, implementation of a data warehouse was seen to be a useful tool to inform management options, provide a more complete representation of factors related to a decision scenario and potentially reduce information product development time. However, there are many constraints exist in this research. For example the technical issues such as data incompatibilities, integration of the cardiac surgery database and e-DS database servers and also, Queensland Health information restrictions (Queensland Health information related policies, patient data confidentiality and ethics requirements), limited availability of support from IT technical staff and time restrictions. These factors have influenced the process for the warehouse model development, necessitating an incremental approach. This highlights the presence of many practical barriers to data warehousing and integration at the clinical service level. Limitations included the use of a small convenience sample of survey respondents, and a single site case report study design. As mentioned previously, the proposed data warehouse is a prototype and was developed using only four database repositories. Despite this constraint, the research demonstrates that by implementing a data warehouse at the service level, decision-making is supported and data quality issues related to access and availability can be reduced, providing many benefits. Output reports produced from the data warehouse prototype demonstrated usefulness for the improvement of decision-making in the management of clinical services, and quality and safety monitoring for better clinical care. However, in the future, the centralised model selected can be upgraded to an enterprise wide architecture by integrating with additional hospital units’ databases.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The health system is one sector dealing with very large amount of complex data. Many healthcare organisations struggle to utilise these volumes of health data effectively and efficiently. Therefore, there is a need for very effective system to capture, collate and distribute this health data. There are number of technologies have been identified to integrate data from different sources. Data warehousing is one technology can be used to manage clinical data in the healthcare. This paper addresses how data warehousing assist to improve cardiac surgery decision making. This research used the cardiac surgery unit at the Prince Charles Hospital (TPCH) as the case study. In order to deal with other units efficiently, it is important to integrate disparate data to a single point interrogation. We propose implementing a data warehouse for the cardiac surgery unit at TPCH. The data warehouse prototype developed using SAS enterprise data integration studio 4.2 and data was analysed using SAS enterprise edition 4.3. This improves access to integrated clinical and financial data with, improved framing of data to the clinical context, giving potentially better informed decision making for both improved management and patient care.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Large software systems are developed by composing multiple programs. If the programs manip-ulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of data formats is associated with the headers. In this paper, we address compatibility of programs operating over headers of network packets, files, images, etc. As format specifications are rarely available, we infer the format associated with headers by a program as a set of guarded layouts. In terms of these formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and logical incompatibilities such as the consumer rejecting valid outputs gen-erated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with consumers (resp. producers) that were compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) sender-receiver modules of Linux network drivers of 3 vendors and (b) different versions of a TIFF image library.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

GODIVA2 is a dynamic website that provides visual access to several terabytes of physically distributed, four-dimensional environmental data. It allows users to explore large datasets interactively without the need to install new software or download and understand complex data. Through the use of open international standards, GODIVA2 maintains a high level of interoperability with third-party systems, allowing diverse datasets to be mutually compared. Scientists can use the system to search for features in large datasets and to diagnose the output from numerical simulations and data processing algorithms. Data providers around Europe have adopted GODIVA2 as an INSPIRE-compliant dynamic quick-view system for providing visual access to their data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In the earth sciences, data are commonly cast on complex grids in order to model irregular domains such as coastlines, or to evenly distribute grid points over the globe. It is common for a scientist to wish to re-cast such data onto a grid that is more amenable to manipulation, visualization, or comparison with other data sources. The complexity of the grids presents a significant technical difficulty to the regridding process. In particular, the regridding of complex grids may suffer from severe performance issues, in the worst case scaling with the product of the sizes of the source and destination grids. We present a mechanism for the fast regridding of such datasets, based upon the construction of a spatial index that allows fast searching of the source grid. We discover that the most efficient spatial index under test (in terms of memory usage and query time) is a simple look-up table. A kd-tree implementation was found to be faster to build and to give similar query performance at the expense of a larger memory footprint. Using our approach, we demonstrate that regridding of complex data may proceed at speeds sufficient to permit regridding on-the-fly in an interactive visualization application, or in a Web Map Service implementation. For large datasets with complex grids the new mechanism is shown to significantly outperform algorithms used in many scientific visualization packages.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In the recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The CHARMe project enables the annotation of climate data with key pieces of supporting information that we term “commentary”. Commentary reflects the experience that has built up in the user community, and can help new or less-expert users (such as consultants, SMEs, experts in other fields) to understand and interpret complex data. In the context of global climate services, the CHARMe system will record, retain and disseminate this commentary on climate datasets, and provide a means for feeding back this experience to the data providers. Based on novel linked data techniques and standards, the project has developed a core system, data model and suite of open-source tools to enable this information to be shared, discovered and exploited by the community.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Human brain imaging techniques, such as Magnetic Resonance Imaging (MRI) or Diffusion Tensor Imaging (DTI), have been established as scientific and diagnostic tools and their adoption is growing in popularity. Statistical methods, machine learning and data mining algorithms have successfully been adopted to extract predictive and descriptive models from neuroimage data. However, the knowledge discovery process typically requires also the adoption of pre-processing, post-processing and visualisation techniques in complex data workflows. Currently, a main problem for the integrated preprocessing and mining of MRI data is the lack of comprehensive platforms able to avoid the manual invocation of preprocessing and mining tools, that yields to an error-prone and inefficient process. In this work we present K-Surfer, a novel plug-in of the Konstanz Information Miner (KNIME) workbench, that automatizes the preprocessing of brain images and leverages the mining capabilities of KNIME in an integrated way. K-Surfer supports the importing, filtering, merging and pre-processing of neuroimage data from FreeSurfer, a tool for human brain MRI feature extraction and interpretation. K-Surfer automatizes the steps for importing FreeSurfer data, reducing time costs, eliminating human errors and enabling the design of complex analytics workflow for neuroimage data by leveraging the rich functionalities available in the KNIME workbench.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The purpose of this paper was to evaluate attributes derived from fully polarimetric PALSAR data to discriminate and map macrophyte species in the Amazon floodplain wetlands. Fieldwork was carried out almost simultaneously to the radar acquisition, and macrophyte biomass and morphological variables were measured in the field. Attributes were calculated from the covariance matrix [C] derived from the single-look complex data. Image attributes and macrophyte variables were compared and analyzed to investigate the sensitivity of the attributes for discriminating among species. Based on these analyses, a rule-based classification was applied to map macrophyte species. Other classification approaches were tested and compared to the rule-based method: a classification based on the Freeman-Durden and Cloude-Pottier decomposition models, a hybrid classification (Wishart classifier with the input classes based on the H/a plane), and a statistical-based classification (supervised classification using Wishart distance measures). The findings show that attributes derived from fully polarimetric L-band data have good potential for discriminating herbaceous plant species based on morphology and that estimation of plant biomass and productivity could be improved by using these polarimetric attributes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.