864 results for pacs: data handling techniques
Abstract:
Multi-species fisheries are complex to manage, and the ability to develop an appropriate governance structure is often seriously impeded because making trade-offs between sustainability objectives at the species level, economic objectives at the fleet level, and social objectives at the community scale is complex. Many of these fisheries also tend to have a mix of information, with stock assessments available for some species and almost no information on other species. The fleets themselves comprise fishers ranging from small family enterprises to large vertically integrated businesses. The Queensland trawl fishery in Australia is used as a case study for this kind of fishery. It has the added complexity that a large part of the fishery lies within a World Heritage Area, the Great Barrier Reef Marine Park, which is managed by an agency of the Australian Commonwealth Government, whereas the fishery itself is managed by the Queensland State Government. A stakeholder elicitation process was used to develop social, governance, economic and ecological objectives, and then to weight their relative importance. An expert group was used to develop different governance strawmen (or management strategies), and these were assessed by a group of industry stakeholders and experts against the different objectives using multi-criteria decision analysis techniques. One strawman clearly provided the best overall set of outcomes given the multiple objectives, but was not optimal in terms of every objective, demonstrating that even the "best" strawman may be less than perfect.
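As a rough illustration of the multi-criteria assessment step (a toy weighted-sum sketch; the objective weights, strawman names and scores below are invented, not taken from the study), each strawman's performance against the weighted objectives can be aggregated like this:

```python
# Toy weighted-sum multi-criteria scoring of management "strawmen".
# All objective weights and performance scores are hypothetical.
weights = {"sustainability": 0.35, "economic": 0.30, "social": 0.20, "governance": 0.15}

strawmen = {
    "status quo":       {"sustainability": 5, "economic": 6, "social": 7, "governance": 4},
    "effort reduction": {"sustainability": 8, "economic": 5, "social": 5, "governance": 6},
    "quota management": {"sustainability": 7, "economic": 7, "social": 4, "governance": 7},
}

for name, scores in strawmen.items():
    total = sum(weights[obj] * score for obj, score in scores.items())
    print(f"{name:18s} weighted score = {total:.2f}")
```

A strawman can lead on the weighted total while still being outscored on individual objectives, which is exactly the "best but not optimal in every respect" outcome described above.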
Abstract:
This article provides a review of techniques for the analysis of survival data arising from respiratory health studies. Popular techniques such as the Kaplan–Meier survival plot and the Cox proportional hazards model are presented and illustrated using data from a lung cancer study. Advanced issues are also discussed, including parametric proportional hazards models, accelerated failure time models, time-varying explanatory variables, simultaneous analysis of multiple types of outcome events and the restricted mean survival time, a novel measure of the effect of treatment.
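A minimal sketch of the two core techniques named above, assuming the Python lifelines library and an invented lung-cancer-style data frame (column names and values are illustrative, not from the study):

```python
# Kaplan-Meier and Cox proportional hazards fits with lifelines.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

df = pd.DataFrame({
    "time":      [5, 12, 20, 7, 30, 18, 25, 9],    # follow-up in months
    "event":     [1, 1, 0, 1, 0, 1, 0, 1],         # 1 = death observed, 0 = censored
    "age":       [63, 58, 71, 66, 54, 60, 69, 72],
    "treatment": [0, 1, 1, 0, 1, 0, 1, 0],
})

# Kaplan-Meier estimate of the survival function
kmf = KaplanMeierFitter()
kmf.fit(durations=df["time"], event_observed=df["event"])
print(kmf.survival_function_)

# Cox proportional hazards model with age and treatment as covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
```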
Abstract:
Determination of testosterone and related compounds in body fluids is of utmost importance in doping control and the diagnosis of many diseases. Capillary electromigration techniques are a relatively new approach for steroid research. Owing to their electrical neutrality, however, separation of steroids by capillary electromigration techniques requires the use of charged electrolyte additives that interact with the steroids either specifically or non-specifically. The analysis of testosterone and related steroids by non-specific micellar electrokinetic chromatography (MEKC) was investigated in this study. The partial filling (PF) technique was employed, being suitable for detection by both ultraviolet spectrophotometry (UV) and electrospray ionization mass spectrometry (ESI-MS). Efficient, quantitative PF-MEKC UV methods for steroid standards were developed through the use of optimized pseudostationary phases comprising surfactants and cyclodextrins. PF-MEKC UV proved to be a more sensitive, efficient and repeatable method for the steroids than PF-MEKC ESI-MS. It was discovered that in PF-MEKC analyses of electrically neutral steroids, ESI-MS interfacing sets significant limitations not only on the chemistry affecting the ionization and detection processes, but also on the separation. The new PF-MEKC UV method was successfully employed in the determination of testosterone in male urine samples after microscale immunoaffinity solid-phase extraction (IA-SPE). The IA-SPE method, relying on specific interactions between testosterone and a recombinant anti-testosterone Fab fragment, is the first such method described for testosterone. Finally, new data for interactions between steroids and human and bovine serum albumins were obtained through the use of affinity capillary electrophoresis. A new algorithm for the calculation of association constants between proteins and neutral ligands is introduced.
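For background, the standard single-site binding model used in affinity capillary electrophoresis relates the observed analyte mobility to the association constant; this is the textbook relation, stated here for orientation rather than as the new algorithm introduced in the thesis:

```latex
\mu_{\mathrm{obs}} \;=\; \frac{\mu_{\mathrm{f}} + \mu_{\mathrm{c}}\,K_a[P]}{1 + K_a[P]}
```

where \mu_f and \mu_c are the electrophoretic mobilities of the free and complexed ligand, [P] is the protein concentration in the background electrolyte, and K_a is the association constant; for an electrically neutral ligand such as a steroid, \mu_f is effectively zero.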
Abstract:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studies can be grouped into four main themes: (i) developing advanced geospatial databases: Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) analysing species diversity and distribution using GIS techniques: Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) studying spatiotemporal forest cover change: Paper (III) presents a study of exotic and indigenous tree cover change detection in the Taita Hills, Kenya, using airborne imagery and GIS analysis techniques; (iv) exploring predictive modelling techniques using geospatial data: in Paper (IV) human population occurrence and abundance in the Taita Hills highlands were predicted using the generalized additive modelling (GAM) technique; Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi, Namibia; and Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi, Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for the Melitaea cinxia butterfly were successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were conducted successfully at the habitat patch level or at coarser analysis scales. Moreover, this study showed that, at a large scale, spatially correlated weather conditions appear to be one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. In Paper (II) the description history, diversity and distribution of Scopulini moths were analysed at a world-wide scale, and for the first time GIS techniques were used to analyse the geographical distribution of Scopulini moths. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, with sub-Saharan Africa being the hot spot of species diversity. However, the taxonomical effort has been uneven among biogeographical regions. Paper (III) showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied using historical black-and-white aerial photography, care has to be taken in co-registration and image interpretation. In Paper (IV) human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, a land cover layer is not necessarily needed as a predictor, because first- and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper (V) showed that the generalized linear model (GLM) is a suitable technique for fire occurrence prediction and burned area estimation. GLM-based burned area estimates were found to be superior to the existing MODIS burned area product (MCD45A1). However, the spatial autocorrelation of fires has to be taken into account when the GLM technique is used for fire occurrence prediction. Paper (VI) showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, noticeable variation between the different predictive modelling techniques existed for fire occurrence prediction and burned area estimation.
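A minimal sketch of the kind of logistic (binomial) GLM used for fire occurrence prediction, assuming statsmodels and invented gridded predictors (the predictor names and data are hypothetical, and the sketch ignores the spatial autocorrelation that the thesis notes must be taken into account):

```python
# Logistic GLM for fire occurrence on a hypothetical grid, plus a simple
# burned-area estimate from the summed predicted probabilities.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
grid = pd.DataFrame({
    "ndvi":       rng.uniform(0.1, 0.8, n),   # hypothetical vegetation index
    "dist_roads": rng.uniform(0.0, 20.0, n),  # km to nearest road
    "rainfall":   rng.uniform(200, 900, n),   # mm, previous season
})
# Hypothetical binary response: cell burned (1) or not (0)
logit = 2.0 - 3.5 * grid["ndvi"] - 0.002 * grid["rainfall"]
grid["burned"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(grid[["ndvi", "dist_roads", "rainfall"]])
model = sm.GLM(grid["burned"], X, family=sm.families.Binomial()).fit()
print(model.summary())

# Summing predicted probabilities gives a crude GLM-based burned-area estimate.
print("expected burned cells:", model.predict(X).sum())
```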
Abstract:
Advancements in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data hold promise for answering various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify regions of an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics at the level of the DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with understanding the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main concern is modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which describes the structure of the data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills both requirements. Problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
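As a small illustration of what integrating out the parameters of a partition model analytically can look like (a sketch under simplifying assumptions, not the thesis's actual models): for aligned DNA sequences with independent per-column multinomial likelihoods and symmetric Dirichlet priors, the marginal likelihood of a given partition has a closed form, so alternative partitions can be scored directly.

```python
# Dirichlet-multinomial marginal likelihood of a partition of aligned sequences.
from math import lgamma
from collections import Counter

ALPHABET = "ACGT"
ALPHA = 1.0  # symmetric Dirichlet pseudo-count per nucleotide

def log_marginal_cluster(seqs):
    """Log marginal likelihood of one cluster of equal-length sequences,
    with the per-column nucleotide frequencies integrated out analytically."""
    total = 0.0
    for col in zip(*seqs):
        counts = Counter(col)
        n = sum(counts.values())
        total += lgamma(len(ALPHABET) * ALPHA) - lgamma(len(ALPHABET) * ALPHA + n)
        for a in ALPHABET:
            total += lgamma(ALPHA + counts.get(a, 0)) - lgamma(ALPHA)
    return total

def log_marginal_partition(seqs, labels):
    """Clusters are conditionally independent, so their scores add up."""
    clusters = {}
    for s, lab in zip(seqs, labels):
        clusters.setdefault(lab, []).append(s)
    return sum(log_marginal_cluster(c) for c in clusters.values())

seqs = ["ACGT", "ACGA", "TCGT", "TGGA"]
print(log_marginal_partition(seqs, [0, 0, 1, 1]))  # two clusters
print(log_marginal_partition(seqs, [0, 0, 0, 0]))  # everything in one cluster
```

Because the score of a candidate partition is available in closed form, a stochastic search over partitions only needs to evaluate such sums, which is the computational advantage mentioned above.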
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
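A minimal sketch of optimal piecewise-constant k-segmentation by dynamic programming (the standard O(kn²) formulation, assumed here rather than taken from the thesis); each segment is described by its mean and the total squared error is minimized:

```python
# Optimal k-segmentation of a numeric sequence by dynamic programming.
import numpy as np

def k_segmentation(x, k):
    """Return (total squared error, segment boundaries) for the best k-segment split."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))      # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))  # prefix sums of squares

    def seg_cost(i, j):  # squared error of describing x[i:j] by its mean
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    INF = float("inf")
    E = np.full((k + 1, n + 1), INF)   # E[h, j]: best error for x[:j] using h segments
    B = np.zeros((k + 1, n + 1), dtype=int)
    E[0, 0] = 0.0
    for h in range(1, k + 1):
        for j in range(h, n + 1):
            for i in range(h - 1, j):
                c = E[h - 1, i] + seg_cost(i, j)
                if c < E[h, j]:
                    E[h, j], B[h, j] = c, i
    bounds, j = [], n                  # trace back the segment boundaries
    for h in range(k, 0, -1):
        bounds.append(j)
        j = B[h, j]
    return E[k, n], sorted(bounds)

x = [1, 1, 2, 8, 9, 9, 3, 3, 2]
print(k_segmentation(x, 3))   # expected boundaries [3, 6, 9], total error ~2.0
```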
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important because the direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify. Haplotypes are key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has performance comparable to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the sequences being related or merely to chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability of picking two sequences from the null model that have an equally good or better best local alignment score. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
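For orientation, the naive baseline that such a framework improves on can be sketched as follows: compute the best local alignment score with Smith-Waterman and estimate a p-value by sampling from a shuffled-sequence null model. The scoring scheme and data below are toy choices, and this is explicitly not the thesis's method, which avoids sampling altogether:

```python
# Smith-Waterman local alignment score and a crude empirical p-value.
import random

def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

def empirical_pvalue(a, b, trials=200, seed=1):
    """Probability that a shuffled version of b scores at least as well against a."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    hits = 0
    for _ in range(trials):
        shuffled = list(b)
        rng.shuffle(shuffled)
        if sw_score(a, "".join(shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

print(empirical_pvalue("ACCTGAGA", "ACCTGTTT"))
```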
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion in MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
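As a small worked example of the exponential sum mentioned above (the textbook single-Bernoulli case, not one of the dissertation's model families), the NML stochastic complexity of a binary sequence can be computed directly, because the sum over all possible data sets of length n collapses to a sum over the number of ones:

```python
# NML stochastic complexity (in bits) of a binary sequence under a Bernoulli model.
from math import comb, log2

def nml_code_length(data):
    n, k = len(data), sum(data)

    def max_loglik(j, n):                # log2 P(data | theta_hat = j/n)
        if j in (0, n):
            return 0.0
        p = j / n
        return j * log2(p) + (n - j) * log2(1 - p)

    # Parametric complexity: log2 of the sum of maximized likelihoods
    # over every possible binary data set of length n.
    C = log2(sum(comb(n, j) * 2 ** max_loglik(j, n) for j in range(n + 1)))
    return -max_loglik(k, n) + C

print(nml_code_length([1, 0, 1, 1, 0, 1, 1, 1]))
```

For richer discrete model families the corresponding sum has exponentially many terms, which is the computational problem the dissertation's techniques address.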
Abstract:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and find cures for problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules using statistical significance measures. Another important objective is to search only for non-redundant rules, which express the real causes of the dependence without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that traditional pruning techniques do not work. As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
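As a small illustration of the kind of significance measure used (scoring a single candidate rule with Fisher's exact test on invented toy data; the thesis's contribution is the search and pruning over all candidate rules, which is not shown here):

```python
# Fisher's exact test for a single candidate dependency rule X -> A.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(7)
n = 1000
X = rng.binomial(1, 0.3, n)                        # antecedent attributes all present?
A = rng.binomial(1, np.where(X == 1, 0.7, 0.2))    # consequent, made to depend on X

table = [[np.sum((X == 1) & (A == 1)), np.sum((X == 1) & (A == 0))],
         [np.sum((X == 0) & (A == 1)), np.sum((X == 0) & (A == 0))]]

# One-sided test for a positive dependency X -> A.
odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(table, odds_ratio, p_value)
```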
Using Big Data to manage safety-related risk in the upstream oil and gas industry: A research agenda
Abstract:
Despite considerable effort and a broad range of new approaches to safety management over the years, the upstream oil & gas industry has been frustrated by the sector’s stubbornly high rate of injuries and fatalities. This short communication points out, however, that the industry may be in a position to make considerable progress by applying “Big Data” analytical tools to the large volumes of safety-related data that have been collected by these organizations. Toward making this case, we examine existing safety-related information management practices in the upstream oil & gas industry, and specifically note that data in this sector tend to be highly customized, difficult to analyze using conventional quantitative tools, and frequently ignored. We then contend that the application of new Big Data analytical techniques could reveal patterns and trends that have so far been hidden or unknown, and argue that these tools could help the upstream oil & gas sector to improve its injury and fatality statistics. Finally, we offer a research agenda toward accelerating the rate at which Big Data and new analytical capabilities could play a material role in helping the industry to improve its health and safety performance.
Abstract:
Objectives:
1. Estimate population parameters required for a management model. These include survival, density, age structure, growth, and age and size at maturity and at recruitment to the adult eel fishery. Estimate their variability among individuals in a range of habitats.
2. Develop a management population dynamics model and use it to investigate management options.
3. Establish baseline data and sustainability indicators for long-term monitoring.
4. Assess the applicability of the above techniques to other eel fisheries in Australia, in collaboration with NSW. Distribute developed tools via the Australia and New Zealand Eel Reference Group.
Abstract:
A commercial issue currently facing native plant food producers and food processors, and identified by the industry itself, is that of delivering quality products consistently and at reasonable cost to end users, based on a sound food technology and nutrition platform. A literature survey carried out in July 2001 by the DPI&F’s Centre for Food Technology, Brisbane, in collaboration with the University of Queensland, to collect the latest information at that time on the functional food market as it pertained to native food plants, indicated that little or no work had been published on this topic. This project addresses two key RIRDC sub-program strategies: to identify and evaluate processes or products with prospects of commercial viability, and to assist in the development of integrated production, harvesting, processing and marketing systems. This project proposal also reflects a key RIRDC R&D issue for 2002-2003: that of linking with prospective members of the value chain. The purpose of this project was to obtain chemical data on the post-harvest stability of functional nutritional components (bioactives) in commercially available, hand-harvested bush tomato and Kakadu plum. The project concentrated on evaluating bioactive stability as a measure of ingredient quality.
Abstract:
In this paper, the pattern classification problem in tool wear monitoring is solved using nature-inspired techniques such as Genetic Programming (GP) and Ant-Miner (AM). The main advantage of GP and AM is their ability to learn the underlying data relationships and express them in the form of mathematical equations or simple rules. The knowledge extracted from the training data set using GP and AM takes the form of a Genetic Programming Classifier Expression (GPCE) and rules, respectively. The GPCE and the AM-extracted rules are then applied to the data in the testing/validation set to obtain the classification accuracy. A major attraction of GP-evolved GPCEs and AM-based classification is the possibility of obtaining expert-system-like rules that can subsequently be applied directly by the user in his or her application. The performance of data classification using GP and AM is as good as the classification accuracy obtained in the earlier study.
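A toy sketch of how a GPCE is used at prediction time (the expression, feature names and constants below are hypothetical, not from the paper): the evolved expression is simply an arithmetic formula over the input features, and its sign assigns the class label.

```python
# Applying a hypothetical evolved classifier expression to tool-wear features.
import numpy as np

def gpce(features):
    """Hypothetical GPCE over features such as mean cutting force (f0),
    vibration amplitude (f1) and cutting time (f2)."""
    f0, f1, f2 = features
    return 0.8 * f0 + 1.5 * f1 * f2 - 2.0   # constants would be found by evolution

def classify(features):
    return "worn" if gpce(features) > 0 else "fresh"   # sign gives the class

samples = np.array([[1.2, 0.4, 2.0],
                    [0.5, 0.1, 0.8]])
print([classify(s) for s in samples])
```

The attraction noted above is that such an expression (or an Ant-Miner rule list) is directly readable and can be applied by the end user without the training machinery.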
Abstract:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
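A minimal sketch of the clustering step, assuming a generic iterative k-medoids on a precomputed distance matrix (the paper's distances come from the pairwise global alignment scores of the remodelled trajectories; the matrix below is a toy stand-in):

```python
# Iterative k-medoids clustering over a precomputed distance matrix.
import numpy as np

def k_medoids(D, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)      # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid minimizes total distance to the cluster's members.
                sub = D[np.ix_(members, members)]
                new_medoids[c] = members[np.argmin(sub.sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy symmetric distance matrix for five "trajectories".
D = np.array([[0, 1, 9, 8, 9],
              [1, 0, 8, 9, 9],
              [9, 8, 0, 1, 2],
              [8, 9, 1, 0, 2],
              [9, 9, 2, 2, 0]], dtype=float)
print(k_medoids(D, k=2))
```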
Abstract:
Lipid analysis is commonly performed by gas chromatography (GC) under laboratory conditions. Spectroscopic techniques, however, are non-destructive and can be applied noninvasively in vivo. Excess fat (triglycerides) in visceral adipose tissue and liver is known to predispose to metabolic abnormalities, collectively known as the metabolic syndrome. Insulin resistance is the likely cause, with diets high in saturated fat known to impair insulin sensitivity. Tissue triglyceride composition has been used as a marker of dietary intake, but it can also be influenced by tissue-specific handling of fatty acids. Recent studies have shown that adipocyte insulin sensitivity correlates positively with adipocyte saturated fat content, contradicting the common view of dietary effects. A better understanding of the factors affecting tissue triglyceride composition is needed to provide further insights into tissue function in lipid metabolism. In this thesis two spectroscopic techniques were developed for in vitro and in vivo analysis of tissue triglyceride composition. The in vitro studies (Study I) used infrared spectroscopy (FTIR), a fast and cost-effective analytical technique well suited for multivariate analysis. Infrared spectra are characterized by peak overlap, leading to poorly resolved absorbances and limited analytical performance. The in vivo studies (Studies II, III and IV) used proton magnetic resonance spectroscopy (1H-MRS), an established non-invasive clinical method for measuring metabolites in vivo. 1H-MRS has been limited in its ability to analyze triglyceride composition because of poorly resolved resonances. Using an attenuated total reflection accessory, we were able to obtain pure triglyceride infrared spectra from adipose tissue biopsies. Using multivariate curve resolution (MCR), we were able to resolve the overlapping double bond absorbances of monounsaturated and polyunsaturated fat. MCR also resolved the isolated trans double bond and conjugated linoleic acids from an overlapping background absorbance. Using oil phantoms to study the effects of different fatty acid compositions on the echo time behaviour of triglycerides, it was concluded that the use of long echo times improved peak separation, with T2 weighting having a negligible impact. It was also discovered that the echo time behaviour of the methyl resonance of omega-3 fats differs from that of other fats owing to characteristic J-coupling. This novel insight could be used to detect omega-3 fats in human adipose tissue in vivo at very long echo times (TE = 470 and 540 ms). A comparison of 1H-MRS of adipose tissue in vivo and GC of adipose tissue biopsies in humans showed that long-TE spectra resulted in improved peak fitting and better correlations with GC data. The study also showed that the calculation of fatty acid fractions from 1H-MRS data is unreliable and should not be used. Omega-3 fatty acid content derived from long-TE in vivo spectra (TE = 540 ms) correlated with the total omega-3 fatty acid concentration measured by GC. The long-TE protocol used for the adipose tissue studies was subsequently extended to the analysis of liver fat composition. Respiratory triggering and a long TE resulted in spectra with the olefinic and tissue water resonances resolved. Conversion of the derived unsaturation to double bond content per fatty acid showed that the results were in accordance with previously published gas chromatography data on liver fat composition. In patients with the metabolic syndrome, liver fat was found to be more saturated than subcutaneous or visceral adipose tissue. The higher saturation observed in liver fat may be a result of a higher rate of de novo lipogenesis in the liver than in adipose tissue. This thesis has introduced the first non-invasive method for determining adipose tissue omega-3 fatty acid content in humans in vivo. The methods introduced here have also shown that liver fat is more saturated than adipose tissue fat.
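For context, one generic way to convert 1H-MRS signal integrals into a double bond content per fatty acid (stated here as background, not as the exact conversion used in the thesis) relies on each fatty acid chain contributing three terminal methyl protons and each double bond contributing two olefinic protons:

```latex
\text{double bonds per fatty acid} \;\approx\; \frac{I_{\text{olefinic}}/2}{I_{\mathrm{CH_3}}/3}
```

In practice the olefinic region overlaps with the glycerol CH resonance, and the methyl resonance of omega-3 fats behaves differently from the other methyls at long echo times, which is why the resolved long-TE spectra and careful peak fitting described above matter for such conversions.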