67 resultados para inference algorithms
Resumo:
This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.
Resumo:
To make a comprehensive evaluation of organ-specific out-of-field doses using Monte Carlo (MC) simulations for different breast cancer irradiation techniques and to compare results with a commercial treatment planning system (TPS). Three breast radiotherapy techniques using 6MV tangential photon beams were compared: (a) 2DRT (open rectangular fields), (b) 3DCRT (conformal wedged fields), and (c) hybrid IMRT (open conformal+modulated fields). Over 35 organs were contoured in a whole-body CT scan and organ-specific dose distributions were determined with MC and the TPS. Large differences in out-of-field doses were observed between MC and TPS calculations, even for organs close to the target volume such as the heart, the lungs and the contralateral breast (up to 70% difference). MC simulations showed that a large fraction of the out-of-field dose comes from the out-of-field head scatter fluence (>40%) which is not adequately modeled by the TPS. Based on MC simulations, the 3DCRT technique using external wedges yielded significantly higher doses (up to a factor 4-5 in the pelvis) than the 2DRT and the hybrid IMRT techniques which yielded similar out-of-field doses. In sharp contrast to popular belief, the IMRT technique investigated here does not increase the out-of-field dose compared to conventional techniques and may offer the most optimal plan. The 3DCRT technique with external wedges yields the largest out-of-field doses. For accurate out-of-field dose assessment, a commercial TPS should not be used, even for organs near the target volume (contralateral breast, lungs, heart).
Resumo:
Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can cause wrong inference about population parameters. When the missing data process relates to the trait of interest, a valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.
Resumo:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoiding such complication. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic using Bayes' theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data are not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
Resumo:
In the first part of this research, three stages were stated for a program to increase the information extracted from ink evidence and maximise its usefulness to the criminal and civil justice system. These stages are (a) develop a standard methodology for analysing ink samples by high-performance thin layer chromatography (HPTLC) in reproducible way, when ink samples are analysed at different time, locations and by different examiners; (b) compare automatically and objectively ink samples; and (c) define and evaluate theoretical framework for the use of ink evidence in forensic context. This report focuses on the second of the three stages. Using the calibration and acquisition process described in the previous report, mathematical algorithms are proposed to automatically and objectively compare ink samples. The performances of these algorithms are systematically studied for various chemical and forensic conditions using standard performance tests commonly used in biometrics studies. The results show that different algorithms are best suited for different tasks. Finally, this report demonstrates how modern analytical and computer technology can be used in the field of ink examination and how tools developed and successfully applied in other fields of forensic science can help maximising its impact within the field of questioned documents.
Resumo:
Aim Recently developed parametric methods in historical biogeography allow researchers to integrate temporal and palaeogeographical information into the reconstruction of biogeographical scenarios, thus overcoming a known bias of parsimony-based approaches. Here, we compare a parametric method, dispersal-extinction-cladogenesis (DEC), against a parsimony-based method, dispersal-vicariance analysis (DIVA), which does not incorporate branch lengths but accounts for phylogenetic uncertainty through a Bayesian empirical approach (Bayes-DIVA). We analyse the benefits and limitations of each method using the cosmopolitan plant family Sapindaceae as a case study.Location World-wide.Methods Phylogenetic relationships were estimated by Bayesian inference on a large dataset representing generic diversity within Sapindaceae. Lineage divergence times were estimated by penalized likelihood over a sample of trees from the posterior distribution of the phylogeny to account for dating uncertainty in biogeographical reconstructions. We compared biogeographical scenarios between Bayes-DIVA and two different DEC models: one with no geological constraints and another that employed a stratified palaeogeographical model in which dispersal rates were scaled according to area connectivity across four time slices, reflecting the changing continental configuration over the last 110 million years.Results Despite differences in the underlying biogeographical model, Bayes-DIVA and DEC inferred similar biogeographical scenarios. The main differences were: (1) in the timing of dispersal events - which in Bayes-DIVA sometimes conflicts with palaeogeographical information, and (2) in the lower frequency of terminal dispersal events inferred by DEC. Uncertainty in divergence time estimations influenced both the inference of ancestral ranges and the decisiveness with which an area can be assigned to a node.Main conclusions By considering lineage divergence times, the DEC method gives more accurate reconstructions that are in agreement with palaeogeographical evidence. In contrast, Bayes-DIVA showed the highest decisiveness in unequivocally reconstructing ancestral ranges, probably reflecting its ability to integrate phylogenetic uncertainty. Care should be taken in defining the palaeogeographical model in DEC because of the possibility of overestimating the frequency of extinction events, or of inferring ancestral ranges that are outside the extant species ranges, owing to dispersal constraints enforced by the model. The wide-spanning spatial and temporal model proposed here could prove useful for testing large-scale biogeographical patterns in plants.
Resumo:
BACKGROUND: Tests for recent infections (TRIs) are important for HIV surveillance. We have shown that a patient's antibody pattern in a confirmatory line immunoassay (Inno-Lia) also yields information on time since infection. We have published algorithms which, with a certain sensitivity and specificity, distinguish between incident (< = 12 months) and older infection. In order to use these algorithms like other TRIs, i.e., based on their windows, we now determined their window periods. METHODS: We classified Inno-Lia results of 527 treatment-naïve patients with HIV-1 infection < = 12 months according to incidence by 25 algorithms. The time after which all infections were ruled older, i.e. the algorithm's window, was determined by linear regression of the proportion ruled incident in dependence of time since infection. Window-based incident infection rates (IIR) were determined utilizing the relationship 'Prevalence = Incidence x Duration' in four annual cohorts of HIV-1 notifications. Results were compared to performance-based IIR also derived from Inno-Lia results, but utilizing the relationship 'incident = true incident + false incident' and also to the IIR derived from the BED incidence assay. RESULTS: Window periods varied between 45.8 and 130.1 days and correlated well with the algorithms' diagnostic sensitivity (R(2) = 0.962; P<0.0001). Among the 25 algorithms, the mean window-based IIR among the 748 notifications of 2005/06 was 0.457 compared to 0.453 obtained for performance-based IIR with a model not correcting for selection bias. Evaluation of BED results using a window of 153 days yielded an IIR of 0.669. Window-based IIR and performance-based IIR increased by 22.4% and respectively 30.6% in 2008, while 2009 and 2010 showed a return to baseline for both methods. CONCLUSIONS: IIR estimations by window- and performance-based evaluations of Inno-Lia algorithm results were similar and can be used together to assess IIR changes between annual HIV notification cohorts.