912 results for complex data
Abstract:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: 1. We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. 2. We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. 3. Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
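As a rough illustration of the non-linear building block mentioned above (a sketch, not the authors' code), the snippet below shows how a GTM-style model maps a 2-D latent grid through radial basis functions into data space and computes the EM responsibilities that drive the projection; the grid sizes, variable names and parameter values are illustrative assumptions.

```python
# Minimal GTM-style sketch: 2-D latent grid -> RBF features -> data space,
# plus the EM responsibilities of latent nodes for each data point.
import numpy as np

def gtm_responsibilities(X, W, Phi, beta):
    """Posterior responsibility of each latent grid node for each data point."""
    Y = Phi @ W                                            # (K, D) node images in data space
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # squared distances, shape (N, K)
    log_p = -0.5 * beta * d2
    log_p -= log_p.max(axis=1, keepdims=True)              # numerical stability
    R = np.exp(log_p)
    return R / R.sum(axis=1, keepdims=True)                # rows sum to 1

# Latent grid (K = 100 nodes) and RBF design matrix (M = 16 basis functions)
sigma = 0.3
grid = np.stack(np.meshgrid(np.linspace(-1, 1, 10), np.linspace(-1, 1, 10)), -1).reshape(-1, 2)
centres = np.stack(np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4)), -1).reshape(-1, 2)
Phi = np.exp(-((grid[:, None, :] - centres[None, :, :]) ** 2).sum(-1) / (2 * sigma**2))

# Toy data and a random initial mapping; a full GTM re-estimates W and beta by EM.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
W = rng.normal(scale=0.1, size=(16, 3))
R = gtm_responsibilities(X, W, Phi, beta=1.0)
print(R.shape)  # (500, 100): each point's posterior over latent grid nodes
```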
Abstract:
This research was conducted at the Space Research and Technology Centre of the European Space Agency at Noordwijk in the Netherlands. ESA is an international organisation that brings together a range of scientists, engineers and managers from 14 European member states. The motivation for the work was to enable decision-makers, in a culturally and technologically diverse organisation, to share information for the purpose of making decisions that are well informed about the risk-related aspects of the situations they seek to address. The research examined the use of decision support system (DSS) technology to facilitate decision-making of this type. This involved identifying the technology available and its application to risk management. Decision-making is a complex activity that does not lend itself to exact measurement or precise understanding at a detailed level. In view of this, a prototype DSS was developed through which to understand the practical issues to be accommodated and to evaluate alternative approaches to supporting decision-making of this type. The problem of measuring the effect upon the quality of decisions has been approached through expert evaluation of the software developed. The practical orientation of this work was informed by a review of the relevant literature in decision-making, risk management, decision support and information technology. Communication and information technology unite the major themes of this work. This allows correlation of the interests of the research with European public policy. The principles of communication were also considered in the topic of information visualisation - this emerging technology exploits flexible modes of human computer interaction (HCI) to improve the cognition of complex data. Risk management is itself an area characterised by complexity and risk visualisation is advocated for application in this field of endeavour. The thesis provides recommendations for future work in the fields of decision-making, DSS technology and risk management.
Abstract:
The Semantic Web relies on carefully structured, well-defined data to allow machines to communicate and understand one another. In many domains (e.g. geospatial) the data being described contains some uncertainty, often due to incomplete knowledge; meaningful processing of this data requires these uncertainties to be carefully analysed and integrated into the process chain. Currently, within the Semantic Web there is no standard mechanism for interoperable description and exchange of uncertain information, which renders the automated processing of such information implausible, particularly where error must be considered and captured as it propagates through a processing sequence. In particular we adopt a Bayesian perspective and focus on the case where the inputs/outputs are naturally treated as random variables. This paper discusses a solution to the problem in the form of the Uncertainty Markup Language (UncertML). UncertML is a conceptual model, realised as an XML schema, that allows uncertainty to be quantified in a variety of ways, i.e. realisations, statistics and probability distributions. UncertML is based upon a soft-typed XML schema design that provides a generic framework from which any statistic or distribution may be created. Making extensive use of Geography Markup Language (GML) dictionaries, UncertML provides a collection of definitions for common uncertainty types. Containing both written descriptions and mathematical functions, encoded as MathML, the definitions within these dictionaries provide a robust mechanism for defining any statistic or distribution and can be easily extended. Uniform Resource Identifiers (URIs) are used to introduce semantics to the soft-typed elements by linking to these dictionary definitions. The INTAMAP (INTeroperability and Automated MAPping) project provides a use case for UncertML. This paper demonstrates how observation errors can be quantified using UncertML and wrapped within an Observations & Measurements (O&M) Observation. The interpolation service uses the information within these observations to influence the prediction outcome. The output uncertainties may be encoded in a variety of UncertML types, e.g. a series of marginal Gaussian distributions, a set of statistics, such as the first three marginal moments, or a set of realisations from a Monte Carlo treatment. Quantifying and propagating uncertainty in this way allows such interpolation results to be consumed by other services. This could form part of a risk management chain or a decision support system, and ultimately paves the way for complex data processing chains in the Semantic Web.
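A hedged sketch of the three ways the abstract says uncertainty can be quantified (realisations, summary statistics, parametric distributions), written here as Python classes rather than the UncertML XML schema itself; class names, fields and URIs are illustrative assumptions, whereas in UncertML the equivalent structures are soft-typed XML elements linked to dictionary definitions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Realisations:
    """Uncertainty represented by samples, e.g. from a Monte Carlo treatment."""
    samples: List[float]

@dataclass
class Statistics:
    """Uncertainty summarised by named statistics (e.g. first three marginal moments)."""
    values: Dict[str, float]   # e.g. {"mean": 2.1, "variance": 0.3, "skewness": -0.1}
    definition_uri: str = "http://example.org/dictionary/statistics"  # hypothetical URI

@dataclass
class GaussianDistribution:
    """Uncertainty as a marginal Gaussian, one common parametric encoding."""
    mean: float
    variance: float
    definition_uri: str = "http://example.org/dictionary/gaussian"    # hypothetical URI

# An observation error carried with its uncertainty, loosely mirroring how an
# O&M Observation would wrap an UncertML payload.
observation_uncertainty = GaussianDistribution(mean=0.0, variance=0.25)
print(observation_uncertainty)
```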
Abstract:
The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways in which knowledge can be established and managed can help deliver efficiencies in acquisition, structuring and filtering to provide only the essential aspects of the knowledge really needed. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme. Hence, current computer science research is investigating generating ontologies automatically from documents using text mining and natural language techniques. As an example of this, results from the application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner, rather than later, in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services to ensure that sustainability considerations are included.
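A toy sketch of the kind of ontology-learning step that tools such as Text2Onto automate: extracting frequent candidate terms from stakeholder documents and proposing taxonomic (is-a) links with a simple head-noun heuristic. This is an illustrative assumption about the general approach, not Text2Onto's actual algorithm, and the example document and stopword list are invented.

```python
import re
from collections import Counter

STOP = {"in", "of", "and", "the"}

def candidate_terms(text, max_len=3):
    """Very crude term extraction: lowercase word n-grams up to max_len words."""
    words = [w for w in re.findall(r"[a-z][a-z-]+", text.lower()) if w not in STOP]
    grams = []
    for n in range(1, max_len + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams)

def head_noun_links(terms):
    """Propose is-a links: 'water cycle management' is-a 'management'."""
    return [(t, t.split()[-1]) for t in terms if len(t.split()) > 1]

doc = ("Sustainable water cycle management in new developments requires "
       "knowledge of water demand, water reuse and surface water drainage.")
print(head_noun_links(candidate_terms(doc))[:5])
```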
Abstract:
Indicators are widely used by organizations as a way of evaluating, measuring and classifying organizational performance. As part of performance evaluation systems, indicators are often shared or compared across internal sectors or with other organizations. However, indicators can be vague and imprecise, and can also lack semantics, making comparisons with other indicators difficult. Thus, this paper presents a knowledge model based on an ontology that may be used to represent indicators semantically and generically, dealing with their imprecision and vagueness and thus facilitating better comparison. Semantic technologies are shown to be suitable for this solution, since they are able to represent the complex data involved in indicator comparison.
Abstract:
The complexity of modern geochemical data sets is increasing in several aspects (number of available samples, number of elements measured, number of matrices analysed, geological-environmental variability covered, etc.), hence it is becoming increasingly necessary to apply statistical methods to elucidate their structure. This paper presents an exploratory analysis of one such complex data set, the Tellus geochemical soil survey of Northern Ireland (NI). This exploratory analysis is based on one of the most fundamental exploratory tools, principal component analysis (PCA) and its graphical representation as a biplot, albeit in several variations: the set of elements included (only major oxides vs. all observed elements), the prior transformation applied to the data (none, a standardization or a log-ratio transformation) and the way the covariance matrix between components is estimated (classical estimation vs. robust estimation). Results show that a log-ratio PCA (robust or classical) of all available elements is the most powerful exploratory setting, providing the following insights: the first two processes controlling the whole geochemical variation in NI soils are peat coverage and a contrast between “mafic” and “felsic” background lithologies; peat-covered areas are detected as outliers by a robust analysis, and can then be filtered out if required for further modelling; and peat coverage intensity can be quantified with the %Br in the subcomposition (Br, Rb, Ni).
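A compact sketch of the exploratory setting the abstract identifies as most powerful: a log-ratio (clr) transformation followed by PCA and the biplot coordinates it yields, here applied to synthetic compositional data rather than the Tellus survey. The robust variant would swap in a robust covariance estimate (e.g. MCD); that step is noted but not implemented.

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform: log of each part over its row geometric mean."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.dirichlet(alpha=[4, 2, 2, 1, 1], size=300)   # synthetic compositions (rows sum to 1)

Z = clr(X)
Z = Z - Z.mean(axis=0)                               # column-centre before PCA
U, s, Vt = np.linalg.svd(Z, full_matrices=False)     # classical PCA via SVD

scores = U[:, :2] * s[:2]     # sample coordinates for the biplot
loadings = Vt[:2].T           # element (part) coordinates for the biplot
explained = s**2 / (s**2).sum()
print("variance explained by PC1, PC2:", explained[:2].round(3))
```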
Abstract:
Background: Digital forensics is a rapidly expanding field, due to the continuing advances in computer technology and increases in the data storage capabilities of devices. However, the tools supporting digital forensics investigations have not kept pace with this evolution, often leaving the investigator to analyse large volumes of textual data and rely heavily on their own intuition and experience. Aim: This research proposes that, given the ability of information visualisation to provide an end user with an intuitive way to rapidly analyse large volumes of complex data, such approaches could be applied to digital forensics datasets. Such methods are investigated, supported by a review of the literature regarding the use of such techniques in other fields. The hypothesis of this body of research is that by utilising exploratory information visualisation techniques in the form of a tool to support digital forensic investigations, gains in investigative effectiveness can be realised. Method: To test the hypothesis, this research examines three different case studies which look at different forms of information visualisation and their implementation with a digital forensic dataset. Two of these case studies take the form of prototype tools developed by the researcher, and one case study utilises a tool created by a third-party research group. A pilot study by the researcher was conducted on these cases, with the strengths and weaknesses of each being carried into the next case study. The culmination of these case studies is a prototype tool which takes the form of a timeline visualisation of the user behaviour on a device. This tool was subjected to an experiment involving a class of university digital forensics students who were given a number of questions about a synthetic digital forensic dataset. Approximately half were given the prototype tool, named Insight, to use, and the others were given a common open-source tool. The assessed metrics included how long the participants took to complete all tasks, how accurate their answers to the tasks were, and how easy the participants found the tasks to complete. They were also asked for their feedback at multiple points throughout the task. Results: The results showed that there was a statistically significant increase in accuracy for one of the six tasks for the participants using the Insight prototype tool. Participants also found completing two of the six tasks significantly easier when using the prototype tool. There was no statistically significant difference between the completion times of the two participant groups. There were no statistically significant differences in the accuracy of participant answers for five of the six tasks. Conclusions: The results from this body of research show that there is evidence to suggest the potential for gains in investigative effectiveness when information visualisation techniques are applied to a digital forensic dataset. Specifically, in some scenarios, the investigator can draw conclusions which are more accurate than those drawn when using primarily textual tools. There is also evidence to suggest that the investigators reached these conclusions significantly more easily when using a tool with a visual format. None of the scenarios led to the investigators being at a significant disadvantage in terms of accuracy or usability when using the prototype visual tool over the textual tool. It is noted that this research did not show that the use of information visualisation techniques leads to any statistically significant difference in the time taken to complete a digital forensics investigation.
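An illustrative sketch of the kind of between-group comparison such an evaluation relies on: comparing a per-participant task-accuracy metric for the Insight group against the open-source-tool group with a non-parametric test. The data are invented and the abstract does not state which test the thesis used; this only shows how a "statistically significant difference in accuracy" can be assessed.

```python
from scipy.stats import mannwhitneyu

insight_accuracy = [0.9, 1.0, 0.8, 1.0, 0.7, 0.9, 1.0, 0.8]   # hypothetical per-participant scores
textual_accuracy = [0.6, 0.8, 0.7, 0.5, 0.9, 0.6, 0.7, 0.8]

stat, p_value = mannwhitneyu(insight_accuracy, textual_accuracy, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Statistically significant difference in accuracy between the two groups.")
else:
    print("No statistically significant difference in accuracy.")
```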
Abstract:
fuzzySim is an R package for calculating fuzzy similarity in species occurrence patterns. It includes functions for data preparation, such as converting species lists (long format) to presence-absence tables (wide format), obtaining unique abbreviations of species names, or transposing (parts of) complex data frames; and sample data sets for providing practical examples. It can convert binary presence-absence to fuzzy occurrence data, using e.g. trend surface analysis, inverse distance interpolation or prevalence-independent environmental favourability modelling, for multiple species simultaneously. It then calculates fuzzy similarity among (fuzzy) species distributions and/or among (fuzzy) regional species compositions. Currently available similarity indices are Jaccard, Sørensen, Simpson, and Baroni-Urbani & Buser.
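fuzzySim itself is an R package; the sketch below just illustrates, in Python, the fuzzy generalisation behind one of the similarity indices it offers (Jaccard): the crisp intersection and union are replaced by element-wise minima and maxima over fuzzy occurrence values in [0, 1]. The example vectors are invented and this is not the package's own code.

```python
import numpy as np

def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard similarity between two fuzzy occurrence vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    intersection = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return intersection / union if union > 0 else 0.0

# Fuzzy occurrence (e.g. environmental favourability) of two species across 6 sites
species_a = [0.9, 0.7, 0.1, 0.0, 0.4, 0.8]
species_b = [0.8, 0.5, 0.3, 0.1, 0.2, 0.9]
print(round(fuzzy_jaccard(species_a, species_b), 3))
```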
Abstract:
Measurement of marine algal toxins has traditionally focussed on shellfish monitoring while, over the last decade, passive sampling has been introduced as a complementary tool for exploratory studies. Since 2011, liquid chromatography-tandem mass spectrometry (LC-MS/MS) has been adopted as the EU reference method (No. 15/2011) for detection and quantitation of lipophilic toxins. Traditional LC-MS approaches have been based on low-resolution mass spectrometry (LRMS); however, advances in instrument platforms have led to a heightened interest in the use of high-resolution mass spectrometry (HRMS) for toxin detection. This work describes the use of HRMS in combination with passive sampling as a progressive approach to marine algal toxin surveys. Experiments focused on comparison of LRMS and HRMS for determination of a broad range of toxins in shellfish and passive samplers. Matrix effects are an important issue to address in LC-MS; therefore, this phenomenon was evaluated for mussels (Mytilus galloprovincialis) and passive samplers using LRMS (triple quadrupole) and HRMS (quadrupole time-of-flight and Orbitrap) instruments. Matrix-matched calibration solutions containing okadaic acid and dinophysistoxins, pectenotoxin, azaspiracids, yessotoxins, domoic acid, pinnatoxins, gymnodimine A and 13-desmethyl spirolide C were prepared. Similar matrix effects were observed on all instrument types. Most notably, there was ion enhancement for pectenotoxins and okadaic acid/dinophysistoxins on one hand, and ion suppression for yessotoxins on the other. Interestingly, the ion selected for quantitation of PTX2 also influenced the magnitude of matrix effects, with the sodium adduct typically exhibiting less susceptibility to matrix effects than the ammonium adduct. As expected, mussel, as a biological matrix, quantitatively produced significantly more matrix effects than passive sampler extracts, irrespective of toxin. Sample dilution was demonstrated as an effective measure to reduce matrix effects for all compounds, and was found to be particularly useful for the non-targeted approach. Limits of detection and method accuracy were comparable between the systems tested, demonstrating the applicability of HRMS as an effective tool for screening and quantitative analysis. HRMS offers the advantage of untargeted analysis, meaning that datasets can be retrospectively analysed. HRMS (full scan) chromatograms of passive samplers yielded significantly less complex data sets than mussels, and were thus more easily screened for unknowns. Consequently, we recommend the use of HRMS in combination with passive sampling for studies investigating emerging or hitherto uncharacterised toxins.
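A hedged sketch of one common way matrix effects like those described above are quantified from matrix-matched versus solvent calibration curves: the ratio of the calibration slopes, with values above 100 % suggesting ion enhancement and values below 100 % suggesting ion suppression. The numbers are invented and the study may use a different convention.

```python
import numpy as np

def matrix_effect_percent(conc, signal_matrix, signal_solvent):
    """100 * (slope of matrix-matched curve) / (slope of solvent curve)."""
    slope_matrix = np.polyfit(conc, signal_matrix, 1)[0]
    slope_solvent = np.polyfit(conc, signal_solvent, 1)[0]
    return 100.0 * slope_matrix / slope_solvent

conc = [1, 2, 5, 10, 20]                              # ng/mL, hypothetical
pectenotoxin_mussel = [130, 250, 640, 1290, 2560]     # enhancement in mussel extract (invented)
pectenotoxin_solvent = [100, 200, 500, 1000, 2000]
print(round(matrix_effect_percent(conc, pectenotoxin_mussel, pectenotoxin_solvent), 1))  # > 100 %
```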
Abstract:
Using water quality management programs is a necessary and inevitable way for preservation and sustainable use of water resources. One of the important issues in determining the quality of water in rivers is designing effective quality control networks, so that the quality variables measured at these stations are, as far as possible, indicative of overall changes in water quality. One of the methods to achieve this goal is increasing the number of quality monitoring stations and sampling instances. Since this will dramatically increase the annual cost of monitoring, deciding which stations and parameters are the most important ones, along with increasing the instances of sampling, in a way that shows maximum change in the system under study, can affect the future decision-making processes for optimizing the efficacy of the extant monitoring network, removing or adding stations or parameters and decreasing or increasing sampling instances. To this end, the efficiency of multivariate statistical procedures was studied in this thesis. Multivariate statistical procedures, given their features, can be used as practical and useful methods for recognizing and analyzing river pollution and consequently for understanding, reasoning, controlling, and correct decision-making in water quality management. This research was carried out using multivariate statistical techniques for analyzing the quality of water and monitoring the variables affecting its quality in the Gharasou river, in Ardabil province in the northwest of Iran. During a year, 28 physical and chemical parameters were sampled at 11 stations. The results of these measurements were analyzed by multivariate procedures such as Cluster Analysis (CA), Principal Component Analysis (PCA), Factor Analysis (FA), and Discriminant Analysis (DA). Based on the findings from cluster analysis, principal component analysis, and factor analysis, the stations were divided into three groups: highly polluted (HP), moderately polluted (MP), and less polluted (LP) stations. Thus, this study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding spatial variations in water quality for effective river water quality management. This study also shows the effectiveness of these techniques for getting better information about the water quality and for the design of monitoring networks for effective management of water resources. Therefore, based on the results, a water quality monitoring program for the Gharasou river was developed and presented.
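A brief sketch of the kind of multivariate workflow described above: standardise the measured parameters, group monitoring stations by hierarchical cluster analysis, and inspect the dominant variation with PCA. The data are random stand-ins for the 11 stations x 28 parameters of the Gharasou survey, and the choice of Ward linkage and a three-group cut is an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
X = rng.normal(size=(11, 28))                 # stations x water-quality parameters (synthetic)

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise each parameter

# Cluster analysis (CA): Ward linkage, cut into three pollution groups (HP / MP / LP)
groups = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")
print("station groups:", groups)

# Principal component analysis (PCA) via SVD of the standardised matrix
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / (s**2).sum()
print("variance explained by first two components:", explained[:2].round(2))
```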
Abstract:
Reconfigurable HW can be used to build a hardware multitasking system where tasks can be assigned to the reconfigurable HW at run-time according to the requirements of the running applications. Normally the execution in this kind of system is controlled by an embedded processor. In these systems tasks are frequently represented as subtask graphs, where a subtask is the basic scheduling unit that can be assigned to a reconfigurable HW unit. In order to control the execution of these tasks, the processor must manage at run-time complex data structures, like graphs or linked lists, which may generate significant execution-time penalties. In addition, HW/SW communications are frequently a system bottleneck. Hence, it is very interesting to find a way to reduce the run-time SW computations and the HW/SW communications. To this end we have developed a HW execution manager that controls the execution of subtask graphs over a set of reconfigurable units. This manager receives as input a subtask graph coupled to a subtask schedule, and guarantees its proper execution. In addition, it includes support to reduce the execution-time overhead due to reconfigurations. With this HW support the execution of task graphs can be managed efficiently, generating only very small run-time penalties.
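A software sketch (not the authors' HW design) of what a subtask-graph execution manager has to do: track dependencies between subtasks and dispatch a subtask to a free reconfigurable unit only once all of its predecessors have finished. The graph, unit count and the simplification that each dispatched batch completes before the next are illustrative assumptions.

```python
from collections import deque

def execute_subtask_graph(deps, n_units):
    """deps maps each subtask to the list of subtasks it depends on."""
    remaining = {t: set(d) for t, d in deps.items()}
    ready = deque(t for t, d in remaining.items() if not d)
    finished, order = set(), []
    while ready:
        # Dispatch up to n_units ready subtasks "in parallel" (one batch per step)
        batch = [ready.popleft() for _ in range(min(n_units, len(ready)))]
        order.append(batch)
        finished.update(batch)
        # Release subtasks whose predecessors have all completed
        for t, d in remaining.items():
            if t not in finished and t not in ready and d <= finished:
                ready.append(t)
    return order

# Diamond-shaped subtask graph: B and C depend on A, D depends on B and C
graph = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(execute_subtask_graph(graph, n_units=2))   # e.g. [['A'], ['B', 'C'], ['D']]
```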
Abstract:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centres on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
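A minimal sketch of one of the model families discussed above: a truncated Dirichlet Process Gaussian mixture fitted to synthetic "patient" data, where the per-observation cluster probabilities give one simple expression of the membership uncertainty the abstract refers to. The data, truncation level and priors are illustrative assumptions, not those used in the thesis.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, size=(100, 4)),
               rng.normal(4, 1, size=(100, 4))])          # two synthetic subgroups

dpgmm = BayesianGaussianMixture(
    n_components=10,                                      # truncation level, not the "true" K
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
)
dpgmm.fit(X)

labels = dpgmm.predict(X)
membership_prob = dpgmm.predict_proba(X)                  # uncertainty in cluster membership
print("effective clusters:", int(np.sum(dpgmm.weights_ > 0.01)))
print("first observation's membership probabilities:", membership_prob[0].round(2))
```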
Abstract:
Bactrocera papayae Drew & Hancock, Bactrocera philippinensis Drew & Hancock, Bactrocera carambolae Drew & Hancock, and Bactrocera invadens Drew, Tsuruta & White are four horticultural pest tephritid fruit fly species that are highly similar, morphologically and genetically, to the destructive pest, the Oriental fruit fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae). This similarity has rendered the discovery of reliable diagnostic characters problematic, which, in view of the economic importance of these taxa and the international trade implications, has resulted in ongoing difficulties for many areas of plant protection and food security. Consequently, a major international collaborative and integrated multidisciplinary research effort was initiated in 2009 to build upon existing literature with the specific aim of resolving biological species limits among B. papayae, B. philippinensis, B. carambolae, B. invadens and B. dorsalis to overcome constraints to pest management and international trade. Bactrocera philippinensis has recently been synonymized with B. papayae as a result of this initiative and this review corroborates that finding; however, the other names remain in use. While consistent characters have been found to reliably distinguish B. carambolae from B. dorsalis, B. invadens and B. papayae, no such characters have been found to differentiate the latter three putative species. We conclude that B. carambolae is a valid species and that the remaining taxa, B. dorsalis, B. invadens and B. papayae, represent the same species. Thus, we consider B. dorsalis (Hendel) as the senior synonym of B. papayae Drew and Hancock syn.n. and B. invadens Drew, Tsuruta & White syn.n. A redescription of B. dorsalis is provided. Given the agricultural importance of B. dorsalis, this taxonomic decision will have significant global plant biosecurity implications, affecting pest management, quarantine, international trade, postharvest treatment and basic research. Throughout the paper, we emphasize the value of independent and multidisciplinary tools in delimiting species, particularly in complicated cases involving morphologically cryptic taxa.