963 results for count data models
Abstract:
As a thorough aggregation of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series of papers intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures that are commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in the literature, the primary aim of this paper is to offer an essentially non-technical introduction to how interested readers may use these analytical approaches - with the help of Bayesian networks - for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocks while constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).
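As an informal illustration of the kind of elementary network fragment discussed above, the following Python sketch evaluates a two-node hypothesis-evidence fragment by direct application of Bayes' rule; the prior and likelihood values are arbitrary placeholders, not figures from the paper.

```python
# Minimal sketch of a two-node "hypothesis -> evidence" network fragment,
# evaluated by hand with Bayes' rule. The prior and likelihoods below are
# illustrative placeholders only.

def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) for a binary hypothesis H once evidence E is observed."""
    joint_h = prior_h * p_e_given_h                  # P(H) * P(E | H)
    joint_not_h = (1.0 - prior_h) * p_e_given_not_h  # P(~H) * P(E | ~H)
    return joint_h / (joint_h + joint_not_h)

# Example: prior belief 0.5, evidence much more probable under H than under ~H.
print(posterior(prior_h=0.5, p_e_given_h=0.8, p_e_given_not_h=0.1))  # ~0.889
```

Larger inference models are then built by chaining such fragments, each contributing one conditional probability table.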
Abstract:
Cross-hole radar tomography is a useful tool for mapping shallow subsurface electrical properties, namely dielectric permittivity and electrical conductivity. Common practice is to invert cross-hole radar data with ray-based tomographic algorithms using first-arrival traveltimes and first-cycle amplitudes. However, the resolution of conventional ray-based inversion schemes for cross-hole ground-penetrating radar (GPR) is limited because only a fraction of the information contained in the radar data is used. The resolution can be improved significantly by using a full-waveform inversion that considers the entire waveform, or significant parts thereof. A recently developed 2D time-domain vectorial full-waveform crosshole radar inversion code was modified in the present study to allow optimized acquisition setups that reduce the acquisition time and computational costs significantly. This is achieved by minimizing the number of transmitter points and maximizing the number of receiver positions. The improved algorithm was employed to invert cross-hole GPR data acquired within a gravel aquifer (4-10 m depth) in the Thur valley, Switzerland. The simulated traces of the final model obtained by the full-waveform inversion fit the observed traces very well in the lower part of the section and reasonably well in the upper part. Compared to the ray-based inversion, the full-waveform inversion yields images with significantly higher resolution. Borehole logs were acquired 2.5 m away from the cross-hole plane on either side. There is a good correspondence between the conductivity tomograms and the natural gamma logs at the boundary of the gravel layer and the underlying lacustrine clay deposits. Using existing petrophysical models, the inversion results and neutron-neutron logs were converted to porosity. Without any additional calibration, the porosity values obtained from the converted neutron-neutron logs and the permittivity results are very close, and similar vertical variations can be observed. In both cases, the full-waveform inversion provides additional information about the subsurface. Due to the presence of the water table and the associated refracted/reflected waves, the upper traces are not well fitted, and the upper 2 m of the permittivity and conductivity tomograms are not reliably reconstructed because the unsaturated zone is not incorporated into the inversion domain.
Abstract:
It has been repeatedly debated which strategies people rely on in inference. These debates have been difficult to resolve, partially because hypotheses about the decision processes assumed by these strategies have typically been formulated only qualitatively, making it hard to test precise quantitative predictions about response times and other behavioral data. One way to increase the precision of strategies is to implement them in cognitive architectures such as ACT-R. Often, however, a given strategy can be implemented in several ways, with each implementation yielding different behavioral predictions. We present a study with an experimental paradigm that can help to identify the correct implementations of classic compensatory and non-compensatory strategies such as the take-the-best and tallying heuristics and the weighted-linear model.
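To make the contrast between the named strategies concrete, here is a minimal Python sketch of take-the-best, tallying, and a weighted-linear rule for a binary paired-comparison task; the cue values, validity order, and weights are invented for illustration, and the sketch deliberately ignores the process-level details (memory retrieval, response times) that an ACT-R implementation would add.

```python
# Toy implementations of three inference strategies for choosing between two
# options described by binary cues. All numbers are made-up examples.

def take_the_best(cues_a, cues_b, validity_order):
    # Inspect cues in descending validity; decide on the first discriminating cue.
    for i in validity_order:
        if cues_a[i] != cues_b[i]:
            return "A" if cues_a[i] > cues_b[i] else "B"
    return "guess"

def tallying(cues_a, cues_b):
    # Count positive cues for each option, ignoring cue validities.
    diff = sum(cues_a) - sum(cues_b)
    return "A" if diff > 0 else "B" if diff < 0 else "guess"

def weighted_linear(cues_a, cues_b, weights):
    # Weight each cue difference (e.g. by validity) and compare weighted sums.
    score = sum(w * (a - b) for w, a, b in zip(weights, cues_a, cues_b))
    return "A" if score > 0 else "B" if score < 0 else "guess"

a, b = [1, 0, 1, 0], [0, 1, 1, 1]
print(take_the_best(a, b, validity_order=[0, 1, 2, 3]))      # cue 0 discriminates -> "A"
print(tallying(a, b))                                         # 2 vs 3 positive cues -> "B"
print(weighted_linear(a, b, weights=[0.9, 0.7, 0.6, 0.55]))   # 0.9 - 0.7 - 0.55 < 0 -> "B"
```

The same cue pattern can thus yield different choices under the different strategies, which is exactly what makes strategy identification from choice data alone difficult.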
Abstract:
Depth-averaged velocities and unit discharges within a 30 km reach of one of the world's largest rivers, the Rio Parana, Argentina, were simulated using three hydrodynamic models with different process representations: a reduced-complexity (RC) model that neglects most of the physics governing fluid flow, a two-dimensional model based on the shallow water equations, and a three-dimensional model based on the Reynolds-averaged Navier-Stokes equations. Flow characteristics simulated using all three models were compared with data obtained by acoustic Doppler current profiler surveys at four cross sections within the study reach. This analysis demonstrates that, surprisingly, the performance of the RC model is generally equal to, and in some instances better than, that of the physics-based models in terms of the statistical agreement between simulated and measured flow properties. In addition, in contrast to previous applications of RC models, the present study demonstrates that the RC model can successfully predict measured flow velocities. The strong performance of the RC model reflects, in part, the simplicity of the depth-averaged mean flow patterns within the study reach and the dominant role of channel-scale topographic features in controlling the flow dynamics. Moreover, the very low water surface slopes that typify large sand-bed rivers enable flow depths to be estimated reliably in the RC model using a simple fixed-lid planar water surface approximation. This approach overcomes a major problem encountered in the application of RC models in environments characterised by shallow flows and steep bed gradients. The RC model is four orders of magnitude faster than the physics-based models when performing steady-state hydrodynamic calculations. However, the iterative nature of the RC model calculations implies a reduction in computational efficiency relative to some other RC models. A further implication of this is that, if used to simulate channel morphodynamics, the present RC model may offer only a marginal advantage in terms of computational efficiency over approaches based on the shallow water equations. These observations illustrate the trade-off between model realism and efficiency that is a key consideration in RC modelling. Moreover, this outcome highlights a need to rethink the use of RC morphodynamic models in fluvial geomorphology and to move away from existing grid-based approaches, such as the popular cellular automata (CA) models, that remain essentially reductionist in nature. In the case of the world's largest sand-bed rivers, this might be achieved by implementing the RC model outlined here as one element within a hierarchical modelling framework that would enable computationally efficient simulation of the morphodynamics of large rivers over millennial time scales. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Continuous field mapping has to address two conflicting remote sensing requirements when collecting training data. On one hand, continuous field mapping trains fractional land cover and thus favours mixed training pixels. On the other hand, the spectral signature has to be preferably distinct and thus favours pure training pixels. The aim of this study was to evaluate the sensitivity of training data distribution along fractional and spectral gradients on the resulting mapping performance. We derived four continuous fields (tree, shrub/herb, bare, water) from aerial photographs as response variables and processed corresponding spectral signatures from multitemporal Landsat 5 TM data as explanatory variables. Subsequent controlled experiments along fractional cover gradients were then based on generalised linear models. The resulting fractional and spectral distributions differed between the single continuous fields, but could be satisfactorily trained and mapped. Pixels with fractional cover, or without the respective cover, were much more critical than pure full-cover pixels. The error distribution of the continuous field models was non-uniform with respect to the horizontal and vertical spatial distribution of the target fields. We conclude that a sampling design for continuous field training data should be based on extent and densities in the fractional and spectral space, rather than in the real spatial space. Consequently, adequate training plots are most probably not systematically distributed in the real spatial space, but cover the gradient and covariate structure of the fractional and spectral space well. (C) 2009 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
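As a rough sketch of the kind of generalised linear model referred to above, the following Python snippet fits a logit-link (quasi-binomial) GLM to fractional cover, assuming the statsmodels package; the band values and synthetic data are placeholders for the multitemporal Landsat 5 TM predictors and photo-interpreted cover fractions used in the study.

```python
# Hedged sketch: fractional tree cover (0-1) regressed on spectral predictors
# with a logit-link GLM. Data are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
bands = rng.uniform(0.0, 0.4, size=(n, 3))                         # fake spectral predictors
tree_fraction = 1.0 / (1.0 + np.exp(-(4.0 * bands[:, 0] - 1.0)))   # fake response in [0, 1]

X = sm.add_constant(bands)
glm = sm.GLM(tree_fraction, X, family=sm.families.Binomial())      # fractional (quasi-binomial) fit
fit = glm.fit()
predicted_cover = fit.predict(X)                                    # predicted fractions in [0, 1]
print(fit.params)
```

The training-data question raised in the abstract then amounts to deciding which combinations of (fractional cover, spectral signature) such a model should see during fitting.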
Abstract:
Joint inversion of crosshole ground-penetrating radar and seismic data can improve model resolution and the fidelity of the resultant individual models. Model coupling obtained by minimizing or penalizing some measure of structural dissimilarity between models appears to be the most versatile approach, because only weak assumptions about petrophysical relationships are required. Nevertheless, experimental results and petrophysical arguments suggest that when porosity variations are weak in saturated unconsolidated environments, radar wave speed is approximately linearly related to seismic wave speed. Under such circumstances, model coupling can also be achieved by incorporating cross-covariances in the model regularization. In two case studies, structural similarity is imposed by penalizing models for which the model cross-gradients are nonzero. The first case study demonstrates improvements in model resolution by comparing the resulting models with borehole information, whereas the second case study uses point-spread functions. Although the radar-seismic wave-speed crossplots are very similar for the two case studies, the models plot in different portions of the graph, suggesting differences in porosity. Both examples display a close, quasilinear relationship between radar and seismic wave speeds in unconsolidated environments that is described rather well by the corresponding lower Hashin-Shtrikman (HS) bounds. Combining crossplots of the joint-inversion models with HS bounds can constrain porosity and pore structure better than individual inversion results can.
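The structural-dissimilarity measure mentioned above is typically the cross-gradient function, i.e. the cross product of the two models' spatial gradients, which vanishes wherever the models share structure. The following Python sketch computes it on a 2D grid; the grids, spacings, and property values are hypothetical.

```python
# Cross-gradient penalty on a 2D grid: zero wherever the two models' gradients
# are parallel (structurally similar), nonzero otherwise.
import numpy as np

def cross_gradient(m1, m2, dz=1.0, dx=1.0):
    """Out-of-plane component of grad(m1) x grad(m2) for 2D models m(z, x)."""
    dm1_dz, dm1_dx = np.gradient(m1, dz, dx)
    dm2_dz, dm2_dx = np.gradient(m2, dz, dx)
    return dm1_dx * dm2_dz - dm1_dz * dm2_dx

# Two layered toy models with the same interface give a (near-)zero penalty.
z = np.linspace(0, 10, 50)[:, None] * np.ones((1, 40))
radar_slowness = 0.01 + 0.002 * (z > 5)     # structurally identical...
seismic_slowness = 0.5 + 0.1 * (z > 5)      # ...apart from scaling
t = cross_gradient(radar_slowness, seismic_slowness)
print(np.abs(t).sum())                      # ~0: no structural-dissimilarity penalty
```

In a joint inversion, this quantity (or its norm) is added to the objective function so that models are penalized wherever their structures diverge.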
Abstract:
OBJECTIVE: To evaluate deaths from AIDS-defining malignancies (ADM) and non-AIDS-defining malignancies (nADM) in the D:A:D Study and to investigate the relationship between these deaths and immunodeficiency. DESIGN: Observational cohort study. METHODS: Patients (23 437) were followed prospectively for 104 921 person-years. We used Poisson regression models to identify factors independently associated with deaths from ADM and nADM. Analyses of factors associated with mortality due to nADM were repeated after excluding nADM known to be associated with a specific risk factor. RESULTS: Three hundred five patients died due to a malignancy, 298 prior to the cutoff for this analysis (ADM: n = 110; nADM: n = 188). The mortality rate due to ADM decreased from 20.1/1000 person-years of follow-up [95% confidence interval (CI) 14.4, 25.9] when the most recent CD4 cell count was <50 cells/microl to 0.1 (0.03, 0.3)/1000 person-years of follow-up when the CD4 cell count was more than 500 cells/microl; the mortality rate from nADM decreased from 6.0 (95% CI 3.3, 10.1) to 0.6 (0.4, 0.8) per 1000 person-years of follow-up between these two CD4 cell count strata. In multivariable regression analyses, a two-fold higher latest CD4 cell count was associated with a halving of the risk of ADM mortality. Other predictors of an increased risk of ADM mortality were homosexual risk group, older age, a previous (non-malignancy) AIDS diagnosis and earlier calendar years. Predictors of an increased risk of nADM mortality included lower CD4 cell count, older age, current/ex-smoking status, longer cumulative exposure to combination antiretroviral therapy, active hepatitis B infection and earlier calendar year. CONCLUSION: The severity of immunosuppression is predictive of death from both ADM and nADM in HIV-infected populations.
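A minimal sketch of the kind of Poisson rate regression described in the methods, assuming the statsmodels package: death counts are modelled with person-years of follow-up as the exposure, so that exp(coefficient) for a log2-transformed CD4 covariate is the mortality rate ratio per doubling of the CD4 cell count. All numbers and variable names below are illustrative, not study data.

```python
# Poisson regression of death counts on CD4 cell count, with person-years as
# exposure (log offset). Toy data for illustration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

strata = pd.DataFrame({
    "deaths":       [12, 7, 3, 1],                       # toy death counts per CD4 stratum
    "person_years": [600.0, 2500.0, 9000.0, 40000.0],
    "log2_cd4":     [5.6, 7.2, 8.4, 9.2],                # log2 of the stratum midpoint CD4 count
})

fit = smf.glm("deaths ~ log2_cd4", data=strata,
              family=sm.families.Poisson(),
              exposure=strata["person_years"]).fit()

# exp(coefficient) = rate ratio per doubling of the latest CD4 cell count.
print(np.exp(fit.params["log2_cd4"]))
```

A rate ratio of about 0.5 per doubling would correspond to the "halving of the risk of ADM mortality" reported in the abstract.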
Abstract:
Surface-based ground-penetrating radar (GPR) and electrical resistance tomography (ERT) are common tools for aquifer characterization, because both methods provide data that are sensitive to hydrogeologically relevant quantities. To retrieve bulk subsurface properties at high resolution, we suggest incorporating structural information derived from GPR reflection data when inverting surface ERT data. This reduces resolution limitations, which might otherwise hinder quantitative interpretations. Surface-based GPR reflection and ERT data have been recorded on an exposed gravel bar within a restored section of a previously channelized river in northeastern Switzerland to characterize an underlying gravel aquifer. The GPR reflection data acquired over an area of 240×40 m map the aquifer's thickness and two internal sub-horizontal regions with different depositional patterns. The interface between these two regions and the boundary of the aquifer with the underlying clay are incorporated into an unstructured ERT mesh. Subsequent inversions are performed without applying smoothness constraints across these boundaries. Inversion models obtained by using these structural constraints contain subtle resistivity variations within the aquifer that are hardly visible in standard inversion models as a result of strong vertical smearing in the latter. In the upper aquifer region, with high GPR coherency and horizontal layering, the resistivity is moderately high (>300 Ωm). We suggest that this region consists of sediments that were rearranged during more than a century of channelized flow. In the lower, low-coherency region, the GPR image reveals fluvial features (e.g., foresets) and generally more heterogeneous deposits. In this region, the resistivity is lower (~200 Ωm), which we attribute to increased amounts of fines in some of the well-sorted fluvial deposits. We also find elongated conductive anomalies that correspond to the location of river embankments that were removed in 2002.
Abstract:
BACKGROUND: A major goal of antiretroviral therapy (ART) for HIV-1-infected persons is the recovery of CD4 T lymphocytes, resulting in thorough protection against opportunistic complications. Interruptions of ART are still frequent. The long-term effect on CD4 T-cell recovery and clinical events remains unknown. METHODS: Immunological and clinical endpoints were evaluated in 2491 participants of the Swiss HIV Cohort Study initiating ART, over a mean follow-up of 7.1 years. Data were analysed for persons with treatment interruptions (n = 1271; group A), continuous ART but intermittent HIV-1 RNA of at least 1000 copies/ml (n = 469; group B), and continuous ART with HIV-1 RNA constantly less than 1000 copies/ml (n = 751; group C). Risk factors for low CD4 T-cell counts and clinical events were analysed using Cox proportional hazards models. RESULTS: In groups A-C, CD4 T lymphocytes increased to a median of 427, 525 and 645 cells/μl at 8 years. In group A, 63.0 and 37.2% of participants reached above 350 and 500 CD4 T cells/μl, whereas in group B 76.3 and 55.8% and in group C 87.3 and 68.0% reached these thresholds (P < 0.001). CD4 T-cell recovery depended directly on the cumulative duration of treatment interruptions. In addition, participants of group A had more Centers for Disease Control and Prevention B/C events, resulting in an increased risk of death. Major risk factors for not reaching CD4 T cells above 500 cells/μl included lower baseline CD4 T-cell count, higher age and hepatitis C virus co-infection. CONCLUSION: In persons receiving continuous ART, larger CD4 T-cell recovery and a reduced risk of opportunistic complications and death were observed. CD4 T-cell recovery was smaller in persons with treatment interruptions of more than 6 months.
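For readers unfamiliar with the Cox proportional hazards models mentioned in the methods, the following Python sketch, assuming the lifelines package, fits such a model to simulated data; the column names and covariates are placeholders loosely inspired by those reported (baseline CD4 count, age, cumulative interruption time), not the cohort data themselves.

```python
# Cox proportional hazards fit on simulated survival data (placeholder columns).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "baseline_cd4":        rng.normal(350, 150, n).clip(10, 1200),
    "age":                 rng.normal(40, 10, n).round(),
    "interruption_months": rng.exponential(4, n),
    "years_followed":      rng.exponential(7, n).clip(0.1, 12),   # time to event or censoring
    "event":               rng.integers(0, 2, n),                 # 1 = clinical event observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_followed", event_col="event")
print(cph.summary[["coef", "exp(coef)", "p"]])    # hazard ratios for each covariate
```

In the study, exp(coef) values above 1 for covariates such as cumulative interruption time would indicate an increased hazard of failing to reach the CD4 thresholds or of experiencing a clinical event.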
Abstract:
Automatic environmental monitoring networks, reinforced by wireless communication technologies, nowadays provide large and ever-increasing volumes of data. The use of this information in natural hazard research is an important issue. Spatial maps of hazard-related parameters produced from point observations and available auxiliary information are particularly useful for risk assessment and decision making. The purpose of this article is to present and explore appropriate tools for processing large amounts of available data and producing predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric, robust modelling of non-linear dependencies from empirical data. The computational efficiency of these data-driven methods allows prediction maps to be produced in real time, which makes them superior to physical models for operational use in risk assessment and mitigation. This situation is encountered in particular in the spatial prediction of climatic variables (topo-climatic mapping). In the complex topographies of mountainous regions, meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including situations of Föhn and temperature inversion) given measurements taken from the Swiss meteorological monitoring network. The range of methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
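As an illustrative sketch of the data-driven mapping step, assuming scikit-learn: station temperatures are regressed on coordinates and DEM-derived terrain features with a support vector machine and then predicted at unsampled grid nodes. The features and synthetic data are placeholders, not the Swiss monitoring-network data.

```python
# Topo-climatic mapping sketch: support vector regression of station
# temperatures on DEM-derived features, then prediction on a grid node.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n_stations = 150
X = np.column_stack([
    rng.uniform(0, 100, n_stations),          # easting (km)
    rng.uniform(0, 100, n_stations),          # northing (km)
    rng.uniform(300, 3500, n_stations),       # elevation from DEM (m)
    rng.uniform(0, 35, n_stations),           # slope from DEM (deg)
])
temperature = 15.0 - 0.0065 * X[:, 2] + rng.normal(0, 0.5, n_stations)  # toy lapse-rate signal

model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X, temperature)
grid_node = np.array([[50.0, 50.0, 1800.0, 10.0]])
print(model.predict(grid_node))               # predicted temperature at one grid node
```

Repeating the prediction over every node of a DEM-derived grid yields the kind of real-time temperature map described above.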
Abstract:
Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.
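The Python sketch below only illustrates the general idea of unsupervised, sample-wise pathway scoring, here with a crude mean-rank statistic; it is explicitly not the GSVA algorithm, which uses kernel density estimation of expression profiles and a Kolmogorov-Smirnov-like random-walk statistic and is available as the R/Bioconductor package cited above.

```python
# Simplified sample-wise enrichment scoring (NOT GSVA): rank genes within each
# sample and summarise the gene set's ranks per sample.
import numpy as np

def simple_sample_scores(expr, gene_sets):
    """expr: genes x samples matrix; gene_sets: dict of name -> row indices."""
    n_genes, _ = expr.shape
    ranks = expr.argsort(axis=0).argsort(axis=0) + 1     # per-sample gene ranks (1..n_genes)
    expected = (n_genes + 1) / 2.0                       # mean rank under no enrichment
    return {name: ranks[idx, :].mean(axis=0) - expected
            for name, idx in gene_sets.items()}          # one score per sample per set

expr = np.random.default_rng(3).normal(size=(1000, 6))   # toy matrix: 1000 genes x 6 samples
expr[:50, 3:] += 2.0                                      # pathway genes "up" in samples 4-6
print(simple_sample_scores(expr, {"toy_pathway": np.arange(50)})["toy_pathway"])
```

The output is a vector of per-sample scores (near zero for samples 1-3, positive for samples 4-6), i.e. the same kind of pathway-by-sample matrix that GSVA produces and that downstream differential-pathway or survival analyses consume.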
Integrating species distribution models (SDMs) and phylogeography for two species of Alpine Primula.
Abstract:
The major intention of the present study was to investigate whether an approach combining niche-based palaeodistribution modeling and phylogeography would support or modify hypotheses about the Quaternary distributional history derived from phylogeographic methods alone. Our study system comprised two closely related species of Alpine Primula. We used species distribution models based on the extant distribution of the species and last glacial maximum (LGM) climate models to predict the distribution of the two species during the LGM. Phylogeographic data were generated using amplified fragment length polymorphisms (AFLPs). In Primula hirsuta, models of past distribution and phylogeographic data are partly congruent and support the hypothesis of widespread nunatak survival in the Central Alps. Species distribution models (SDMs) allowed us to differentiate between alpine regions that harbor potential nunatak areas and regions that were colonized from other areas. SDMs revealed that diversity is a good indicator for nunataks, while rarity is a good indicator for peripheral relict populations that were not a source for the recolonization of the inner Alps. In P. daonensis, palaeodistribution models and phylogeographic data are incongruent. Besides the uncertainty inherent to this type of modeling approach (e.g., the relatively coarse 1-km grain size), the disagreement between models and data may partly be caused by shifts of ecological niche in both species. Nevertheless, we demonstrate that combining palaeodistribution modeling with phylogeographic approaches provides a more differentiated picture of the distributional history of species and partly supports (P. hirsuta) and partly modifies (P. daonensis and P. hirsuta) hypotheses of Quaternary distributional history. Some of the refugial areas indicated by the palaeodistribution models could not have been identified with phylogeographic data.
Abstract:
Aim: Species distribution models (SDMs) based on current species ranges underestimate the potential distribution when projected in time and/or space. A multi-temporal model calibration approach has been suggested as an alternative, and we evaluate this using 13,000 years of data. Location: Europe. Methods: We used fossil-based records of presence for Picea abies, Abies alba and Fagus sylvatica and six climatic variables for the period 13,000 to 1000 yr BP. To measure the contribution of each 1000-year time step to the total niche of each species (the niche measured by pooling all the data), we employed a principal components analysis (PCA) calibrated with data over the entire range of possible climates. Then we projected both the total niche and the partial niches from single time frames into the PCA space, and tested whether the partial niches were more similar to the total niche than random. Using an ensemble forecasting approach, we calibrated SDMs for each time frame and for the pooled database. We projected each model to current climate and evaluated the results against current pollen data. We also projected all models into the future. Results: Niche similarity between the partial and the total SDMs was almost always statistically significant and increased through time. SDMs calibrated from single time frames gave different results when projected to current climate, providing evidence of a change in the species' realized niches through time. Moreover, they predicted limited climate suitability when compared with the total SDMs. The same results were obtained when projected to future climates. Main conclusions: The realized climatic niche of species differed for current and future climates when SDMs were calibrated considering different past climates. Building the niche as an ensemble through time represents a way forward to a better understanding of a species' range and its ecology in a changing climate.
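A hedged sketch of the PCA-based niche comparison outlined in the methods, assuming scikit-learn: the PCA is calibrated on the full background climate space, and occurrences pooled over all time frames (the total niche) and from a single time frame (a partial niche) are projected into that space. The arrays are synthetic, and the centroid distance shown is a crude stand-in for the niche-similarity test actually used.

```python
# PCA calibrated on background climate; total and partial niches projected into it.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
background_climate = rng.normal(size=(5000, 6))             # 6 climatic variables, whole study area
pooled_occurrences = rng.normal(loc=0.5, size=(800, 6))     # all time frames ("total niche")
one_time_frame = pooled_occurrences[:60]                    # a single 1000-yr slice ("partial niche")

pca = PCA(n_components=2).fit(background_climate)           # calibrate on the full climate range
total_scores = pca.transform(pooled_occurrences)
partial_scores = pca.transform(one_time_frame)

# Crude similarity proxy: distance between niche centroids in PCA space.
print(np.linalg.norm(total_scores.mean(axis=0) - partial_scores.mean(axis=0)))
```

Repeating the projection for every 1000-year slice and comparing each partial niche with the total niche against a randomization baseline gives the similarity trend reported in the results.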
Abstract:
Detailed large-scale information on mammal distribution has often been lacking, hindering conservation efforts. We used the information from the 2009 IUCN Red List of Threatened Species as a baseline for developing habitat suitability models for 5027 out of 5330 known terrestrial mammal species, based on their habitat relationships. We focused on the following environmental variables: land cover, elevation and hydrological features. Models were developed at 300 m resolution and limited to within species' known geographical ranges. A subset of the models was validated using points of known species occurrence. We conducted a global, fine-scale analysis of patterns of species richness. The richness of mammal species estimated by the overlap of their suitable habitat is on average one-third less than that estimated by the overlap of their geographical ranges. The highest absolute difference is found in tropical and subtropical regions in South America, Africa and Southeast Asia that are not covered by dense forest. The proportion of suitable habitat within mammal geographical ranges correlates with the IUCN Red List category to which they have been assigned, decreasing monotonically from Least Concern to Endangered. These results demonstrate the importance of fine-resolution distribution data for the development of global conservation strategies for mammals.
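A minimal sketch of the deductive habitat-suitability step described above: within a species' known geographical range, grid cells are retained only where land cover and elevation match the species' documented habitat preferences. The class codes, elevation limits, and toy rasters below are hypothetical.

```python
# Habitat-suitability masking sketch: intersect range, land cover and elevation.
import numpy as np

rng = np.random.default_rng(5)
land_cover = rng.integers(1, 6, size=(100, 100))     # fake land-cover classes 1-5
elevation = rng.uniform(0, 4000, size=(100, 100))    # fake elevation raster (m)
range_mask = np.zeros((100, 100), dtype=bool)
range_mask[20:80, 10:90] = True                      # species' known geographical range

suitable_classes = [2, 3]                            # e.g. forest classes for this species
elev_min, elev_max = 0.0, 2500.0

suitable = (
    range_mask
    & np.isin(land_cover, suitable_classes)
    & (elevation >= elev_min)
    & (elevation <= elev_max)
)
print(suitable.sum() / range_mask.sum())             # fraction of the range that is suitable habitat
```

Summing such suitability layers across species, rather than their full geographical ranges, is what yields the roughly one-third lower richness estimates reported above.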
Abstract:
Geophysical techniques can help to bridge the inherent gap, with regard to spatial resolution and range of coverage, that plagues classical hydrological methods. This has led to the emergence of the new and rapidly growing field of hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters, and their inherent trade-off between resolution and range, the fundamental usefulness of multi-method hydrogeophysical surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database in order to obtain a unified model of the probed subsurface region that is internally consistent with all available data. To address this problem, we have developed a strategy for hydrogeophysical data integration based on Monte-Carlo-type conditional stochastic simulation that we consider to be particularly suitable for local-scale studies characterized by high-resolution and high-quality datasets. Monte-Carlo-based optimization techniques are flexible and versatile, can account for a wide variety of data and constraints of differing resolution and hardness, and thus have the potential to provide, in a geostatistical sense, highly detailed and realistic models of the pertinent target parameter distributions. Compared to more conventional approaches of this kind, our approach provides significant advancements in the way that the larger-scale deterministic information resolved by the hydrogeophysical data can be accounted for, which represents an inherently problematic, and as yet unresolved, aspect of Monte-Carlo-type conditional simulation techniques. We present the results of applying our algorithm to the integration of porosity log and tomographic crosshole georadar data to generate stochastic realizations of the local-scale porosity structure. Our procedure is first tested on pertinent synthetic data and then applied to corresponding field data collected at the Boise Hydrogeophysical Research Site near Boise, Idaho, USA.