990 resultados para Data hiding
Resumo:
The average dimensions of the peptide unit have been obtained from the data reported in recent crystal structure analyses of di- and tripeptides. The bond lengths and bond angles agree with those in common use, except for the bond angle C---N---H, which is about 4° less than the accepted value, and the angle C2α---N---H which is about 4° more. The angle τ (Cα) has a mean value of 114° for glycyl residues and 110° for non-glycyl residues. Attention is directed to these mean values as observed in crystal structures, as they are relevant for model building of peptide chain structures.
Resumo:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
Resumo:
The behaviour of laterally loaded piles is considerably influenced by the uncertainties in soil properties. Hence probabilistic models for assessment of allowable lateral load are necessary. Cone penetration test (CPT) data are often used to determine soil strength parameters, whereby the allowable lateral load of the pile is computed. In the present study, the maximum lateral displacement and moment of the pile are obtained based on the coefficient of subgrade reaction approach, considering the nonlinear soil behaviour in undrained clay. The coefficient of subgrade reaction is related to the undrained shear strength of soil, which can be obtained from CPT data. The soil medium is modelled as a one-dimensional random field along the depth, and it is described by the standard deviation and scale of fluctuation of the undrained shear strength of soil. Inherent soil variability, measurement uncertainty and transformation uncertainty are taken into consideration. The statistics of maximum lateral deflection and moment are obtained using the first-order, second-moment technique. Hasofer-Lind reliability indices for component and system failure criteria, based on the allowable lateral displacement and moment capacity of the pile section, are evaluated. The geotechnical database from the Konaseema site in India is used as a case example. It is shown that the reliability-based design approach for pile foundations, considering the spatial variability of soil, permits a rational choice of allowable lateral loads.
Resumo:
Deriving an estimate of optimal fishing effort or even an approximate estimate is very valuable for managing fisheries with multiple target species. The most challenging task associated with this is allocating effort to individual species when only the total effort is recorded. Spatial information on the distribution of each species within a fishery can be used to justify the allocations, but often such information is not available. To determine the long-term overall effort required to achieve maximum sustainable yield (MSY) and maximum economic yield (MEY), we consider three methods for allocating effort: (i) optimal allocation, which optimally allocates effort among target species; (ii) fixed proportions, which chooses proportions based on past catch data; and (iii) economic allocation, which splits effort based on the expected catch value of each species. Determining the overall fishing effort required to achieve these management objectives is a maximizing problem subject to constraints due to economic and social considerations. We illustrated the approaches using a case study of the Moreton Bay Prawn Trawl Fishery in Queensland (Australia). The results were consistent across the three methods. Importantly, our analysis demonstrated the optimal total effort was very sensitive to daily fishing costs—the effort ranged from 9500–11 500 to 6000–7000, 4000 and 2500 boat-days, using daily cost estimates of $0, $500, $750, and $950, respectively. The zero daily cost corresponds to the MSY, while a daily cost of $750 most closely represents the actual present fishing cost. Given the recent debate on which costs should be factored into the analyses for deriving MEY, our findings highlight the importance of including an appropriate cost function for practical management advice. The approaches developed here could be applied to other multispecies fisheries where only aggregated fishing effort data are recorded, as the literature on this type of modelling is sparse.
Resumo:
Contamination of urban streams is a rising topic worldwide, but the assessment and investigation of stormwater induced contamination is limited by the high amount of water quality data needed to obtain reliable results. In this study, stream bed sediments were studied to determine their contamination degree and their applicability in monitoring aquatic metal contamination in urban areas. The interpretation of sedimentary metal concentrations is, however, not straightforward, since the concentrations commonly show spatial and temporal variations as a response to natural processes. The variations of and controls on metal concentrations were examined at different scales to increase the understanding of the usefulness of sediment metal concentrations in detecting anthropogenic metal contamination patterns. The acid extractable concentrations of Zn, Cu, Pb and Cd were determined from the surface sediments and water of small streams in the Helsinki Metropolitan region, southern Finland. The data consists of two datasets: sediment samples from 53 sites located in the catchment of the Stream Gräsanoja and sediment and water samples from 67 independent catchments scattered around the metropolitan region. Moreover, the sediment samples were analyzed for their physical and chemical composition (e.g. total organic carbon, clay-%, Al, Li, Fe, Mn) and the speciation of metals (in the dataset of the Stream Gräsanoja). The metal concentrations revealed that the stream sediments were moderately contaminated and caused no immediate threat to the biota. However, at some sites the sediments appeared to be polluted with Cu or Zn. The metal concentrations increased with increasing intensity of urbanization, but site specific factors, such as point sources, were responsible for the occurrence of the highest metal concentrations. The sediment analyses revealed, thus a need for more detailed studies on the processes and factors that cause the hot spot metal concentrations. The sediment composition and metal speciation analyses indicated that organic matter is a very strong indirect control on metal concentrations, and it should be accounted for when studying anthropogenic metal contamination patterns. The fine-scale spatial and temporal variations of metal concentrations were low enough to allow meaningful interpretation of substantial metal concentration differences between sites. Furthermore, the metal concentrations in the stream bed sediments were correlated with the urbanization of the catchment better than the total metal concentrations in the water phase. These results suggest that stream sediments show true potential for wider use in detecting the spatial differences in metal contamination of urban streams. Consequently, using the sediment approach regional estimates of the stormwater related metal contamination could be obtained fairly cost-effectively, and the stability and reliability of results would be higher compared to analyses of single water samples. Nevertheless, water samples are essential in analysing the dissolved concentrations of metals, momentary discharges from point sources in particular.
Resumo:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studied themes can be classified into four main themes: (i) to develop advanced geospatial databases. Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) to analyse species diversity and distribution using GIS techniques. Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) to study spatiotemporal forest cover change. Paper (III) presents a study of exotic and indigenous tree cover change detection in Taita Hills Kenya using airborne imagery and GIS analysis techniques; (iv) to explore predictive modelling techniques using geospatial data. In Paper (IV) human population occurrence and abundance in the Taita Hills highlands was predicted using the generalized additive modelling (GAM) technique. Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi Namibia. Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for Melitaea cinxia butterfly was successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at habitat patch level or at more coarse analysis scales. Moreover, this study showed it appears evident that at a large-scale spatially correlated weather conditions are one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. In Paper (II) spatiotemporal characteristics of Socupulini moths description, diversity and distribution were analysed at a world-wide scale and for the first time GIS techniques were used for Scopulini moth geographical distribution analysis. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, sub-Saharan Africa being the hot spot of species diversity. However, the taxonomical effort has been uneven among biogeographical regions. Paper III showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied care has to be taken in co-registration and image interpretation when historical black and white aerial photography is used. In Paper (IV) human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, land cover layer is not necessary needed as a predictor because first and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper V showed that generalized linear model (GLM) is a suitable technique for fire occurrence prediction and for burned area estimation. GLM based burned area estimations were found to be more superior than the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper VI showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, some noticeable variation between different predictive modelling techniques for fire occurrence prediction and burned area estimation existed.
Resumo:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
Resumo:
The problem of recovering information from measurement data has already been studied for a long time. In the beginning, the methods were mostly empirical, but already towards the end of the sixties Backus and Gilbert started the development of mathematical methods for the interpretation of geophysical data. The problem of recovering information about a physical phenomenon from measurement data is an inverse problem. Throughout this work, the statistical inversion method is used to obtain a solution. Assuming that the measurement vector is a realization of fractional Brownian motion, the goal is to retrieve the amplitude and the Hurst parameter. We prove that under some conditions, the solution of the discretized problem coincides with the solution of the corresponding continuous problem as the number of observations tends to infinity. The measurement data is usually noisy, and we assume the data to be the sum of two vectors: the trend and the noise. Both vectors are supposed to be realizations of fractional Brownian motions, and the goal is to retrieve their parameters using the statistical inversion method. We prove a partial uniqueness of the solution. Moreover, with the support of numerical simulations, we show that in certain cases the solution is reliable and the reconstruction of the trend vector is quite accurate.
Resumo:
Advancements in the analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data give promises of answering various biologically relevant questions in more detail than what has been possible before. For example, one may wish to identify areas in an amino acid sequence, which are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet with the challenge of deriving meaning from the increasing amounts of data. Our main concern is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets provide also a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
Resumo:
This thesis addresses modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well known time series models such as GARCH, ACD and CARR models. They are able to capture many well established features of financial time series including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships of the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both conditional and unconditional distribution. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model as well as an expression for the autocorrelation function are obtained by writing the MCARR model as a first order autoregressive process with random coefficients. The chapter also introduces inverse gamma (IG) distribution to CARR models. The advantages of CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in- and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEM models based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables thereby improving the performance of the model considerably. The economic significance of the results obtained is established when the information content of the volatility forecasts derived is examined.
Resumo:
Data-assimilaatio on tekniikka, jossa havaintoja yhdistetään dynaamisiin numeerisiin malleihin tarkoituksena tuottaa optimaalista esitystä esimerkiksi ilmankehän muuttuvasta tilasta. Data-assimilaatiota käytetään muun muassa operaativisessa sään ennustamisessa. Tässä työssä esitellään eri data-assimilaatiomenetelmiä, jotka jakautuvat pääpiirteittäin Kalmanin suotimiin ja variaatioanaalisiin menetelmiin. Lisäksi esitellään erilaisia data-assimilaatiossa tarvittavia apuvälineitä kuten optimointimenetelmiä. Eri data-assimilaatiomenetelmien toimintaa havainnollistetaan esimerkkien avulla. Tässä työssä data-assimilaatiota sovelletaan muun muassa Lorenz95-malliin. Käytännön data-assimilaatio-ongelmana on GOMOS-instrumentista saatavan otsonin assimiloiminen käyttäen hyväksi ROSE-kemiakuljetusmallia.
Resumo:
Disease maps are effective tools for explaining and predicting patterns of disease outcomes across geographical space, identifying areas of potentially elevated risk, and formulating and validating aetiological hypotheses for a disease. Bayesian models have become a standard approach to disease mapping in recent decades. This article aims to provide a basic understanding of the key concepts involved in Bayesian disease mapping methods for areal data. It is anticipated that this will help in interpretation of published maps, and provide a useful starting point for anyone interested in running disease mapping methods for areal data. The article provides detailed motivation and descriptions on disease mapping methods by explaining the concepts, defining the technical terms, and illustrating the utility of disease mapping for epidemiological research by demonstrating various ways of visualising model outputs using a case study. The target audience includes spatial scientists in health and other fields, policy or decision makers, health geographers, spatial analysts, public health professionals, and epidemiologists.
Resumo:
An integrated approach of using strandings and bycatch data may provide an indicator of long-term trends for data-limited cetaceans. Strandings programs can give a faithful representation of the species composition of cetacean assemblages, while standardised bycatch rates can provide a measure of relative abundance. Comparing the two datasets may also facilitate managing impacts by understanding which species, sex or sizes are the most vulnerable to interactions with fisheries gear. Here we apply this approach to two long-term datasets in East Australia, bycatch in the Queensland Shark Control Program QSCP, 1992–2012) and strandings in the Queensland Marine Wildlife Strandings and Mortality Program StrandNet, 1996–2012). Short-beaked common dolphins, Delphinus delphis, were markedly more frequent in bycatch than in the strandings dataset, suggesting that they are more prone to being incidentally caught than other cetacean species in the region. The reverse was true for humpback whales, Megaptera novaeangliae, bottlenose dolphins, Tursiops spp.; and species predominantly found in offshore waters. QSCP bycatch was strongly skewed towards females for short-beaked common dolphins, and towards smaller sizes for Australian humpback dolphins, Sousa sahulensis. Overall, both datasets demonstrated similar seasonality and a similar long-term increase from 1996 until 2008. Analysis on a species-by-species basis was then used to explore potential explanations for long-term trends, which ranged from a recovering stock (humpback whales) to a shift in habitat use (short-beaked common dolphins).