951 results for Algorithmic Probability
Abstract:
Threats against computer networks evolve rapidly and demand increasingly complex countermeasures. We argue that teams, or groups with a common purpose, for intrusion detection and prevention improve the measures against rapidly propagating attacks, analogous to the concept of teams solving complex tasks known from the field of work sociology. Collaboration in this sense is not an easy task, especially in heterarchical environments. We propose CIMD (Collaborative Intrusion and Malware Detection) as a security overlay framework to enable cooperative intrusion detection approaches. Objectives and associated interests are used to create detection groups for the exchange of security-related data. In this work, we contribute a tree-oriented data model for representing devices in the scope of security. We introduce an algorithm for the formation of detection groups, show realization strategies for the system, and conduct a vulnerability analysis. We evaluate the benefit of CIMD by simulation and probabilistic analysis.
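A minimal sketch of how objective-driven group formation might look in code, under simplified assumptions: devices carry a set of detection objectives, and a group is formed per objective shared by at least two devices. The class and function names (Device, form_groups) are illustrative, not CIMD's actual interfaces.

```python
# Objective-based detection-group formation (simplified illustration).
from collections import defaultdict

class Device:
    def __init__(self, name, properties, objectives):
        self.name = name                    # device identifier
        self.properties = properties        # e.g. {"os": "linux", "role": "webserver"}
        self.objectives = set(objectives)   # detection interests, e.g. {"worm", "portscan"}

def form_groups(devices):
    """Group devices that share at least one detection objective."""
    groups = defaultdict(list)
    for dev in devices:
        for obj in dev.objectives:
            groups[obj].append(dev.name)
    # keep only objectives shared by two or more devices
    return {obj: members for obj, members in groups.items() if len(members) > 1}

devices = [
    Device("ids-a", {"os": "linux"}, ["worm", "portscan"]),
    Device("ids-b", {"os": "bsd"}, ["worm"]),
    Device("fw-c", {"os": "linux"}, ["portscan"]),
]
print(form_groups(devices))
# {'worm': ['ids-a', 'ids-b'], 'portscan': ['ids-a', 'fw-c']}
```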
Abstract:
Reasoning with uncertain knowledge and belief has long been recognized as an important research issue in Artificial Intelligence (AI). Several methodologies have been proposed in the past, including knowledge-based systems, fuzzy sets, and probability theory. The probabilistic approach became popular mainly due to a knowledge representation framework called Bayesian networks. Bayesian networks have earned a reputation as powerful tools for modeling complex problems involving uncertain knowledge. Uncertain knowledge exists in domains such as medicine, law, geographical information systems, and design, as it is difficult to elicit all knowledge and experience from experts. In the design domain, experts believe that design style is an intangible concept and that knowledge of it is difficult to present formally. The aim of this research is to find ways to represent design style knowledge in Bayesian networks. We show that these networks can be used for diagnosis (inference) and classification of design style. Furniture design style is selected as the example domain; however, the method can be applied to any other domain.
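To illustrate the kind of diagnosis and classification the abstract describes, here is a toy two-feature Bayesian network that classifies a furniture style from observed features by enumeration. The variables and probability values are invented for illustration and are not taken from the paper's model.

```python
# Toy Bayesian classification of furniture style (Style -> Feature network).
# Prior over style and conditional probabilities of two observable features.
p_style = {"baroque": 0.5, "modern": 0.5}
p_curved_legs = {"baroque": 0.9, "modern": 0.2}   # P(curved legs | style)
p_ornate = {"baroque": 0.8, "modern": 0.1}        # P(ornamentation | style)

def posterior(curved, ornate):
    """P(style | evidence) via Bayes' rule, assuming the features are
    conditionally independent given the style."""
    joint = {}
    for style, prior in p_style.items():
        like = p_curved_legs[style] if curved else 1 - p_curved_legs[style]
        like *= p_ornate[style] if ornate else 1 - p_ornate[style]
        joint[style] = prior * like
    z = sum(joint.values())
    return {style: v / z for style, v in joint.items()}

print(posterior(curved=True, ornate=True))
# baroque dominates: {'baroque': ~0.97, 'modern': ~0.03}
```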
Abstract:
Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages, and it is often considered a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject. This poses serious difficulties for users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery across language domains. This study focuses specifically on Chinese / English link discovery (C/ELD), a special case of the cross-lingual link discovery task that involves natural language processing (NLP), cross-lingual information retrieval (CLIR) and link discovery. To evaluate the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. With this framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated, achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in experiments, carried out as part of the study, on better automatic generation of cross-lingual links. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. This framework is important in CLLD evaluation for benchmarking the performance of various CLLD systems and for identifying good CLLD realisation approaches. The evaluation methods and framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, the first information retrieval track of its kind.
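A minimal sketch of the link-mining idea in contribution 3): estimate, from an existing link structure, how likely a phrase is to serve as an anchor, and recommend phrases above a threshold. The definition used here (anchor occurrences divided by total occurrences of the phrase) is a common formulation and may differ in detail from the thesis.

```python
# Anchor-probability mining from precomputed corpus counts (illustrative data).
anchor_counts = {"machine translation": 120, "probability": 40}    # seen as link anchor
phrase_counts = {"machine translation": 150, "probability": 4000}  # seen in text overall

def anchor_probability(phrase):
    """Estimate how likely a phrase is to be linked when it appears."""
    seen = phrase_counts.get(phrase, 0)
    return anchor_counts.get(phrase, 0) / seen if seen else 0.0

def suggest_anchors(text_phrases, threshold=0.05):
    """Recommend phrases whose anchor probability exceeds a threshold."""
    return [p for p in text_phrases if anchor_probability(p) >= threshold]

print(suggest_anchors(["machine translation", "probability"]))
# ['machine translation']  -- anchor probability 0.8 vs. 0.01
```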
Abstract:
Key distribution is one of the most challenging security issues in wireless sensor networks, where sensor nodes are randomly scattered over a hostile territory. In such a deployment scenario, there is no prior knowledge of the post-deployment configuration. For security solutions requiring pairwise keys, it is impossible to decide before deployment how to distribute key pairs to sensor nodes. Existing approaches to this problem assign more than one key, namely a key-chain, to each node. Key-chains are randomly drawn from a key-pool. Either two neighbouring nodes have a key in common in their key-chains, or there is a path between them, called a key-path, along which each pair of neighbouring nodes has a key in common. The problem in such a solution is to choose the key-chain size and key-pool size so that every pair of nodes can establish a session key, directly or through a path, with high probability; the length of the key-path is the key factor in the efficiency of the design. This paper presents novel deterministic and hybrid approaches to key distribution based on combinatorial design. In particular, several block design techniques are considered for generating the key-chains and the key-pools. Comparison to probabilistic schemes shows that our combinatorial approach produces better connectivity with smaller key-chain sizes.
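One classical combinatorial construction of the kind the paper considers is a symmetric design built from the lines of a projective plane PG(2, q): with q prime, it yields q^2 + q + 1 key-chains of size q + 1 such that any two chains share exactly one key, so every node pair gets a direct session key. The sketch below illustrates that construction; the paper's own block designs may differ.

```python
# Key predistribution from the lines of a projective plane PG(2, q), q prime.
q = 3  # supports q*q + q + 1 = 13 nodes with key-chain size q + 1 = 4

# Normalized homogeneous coordinates: one representative per projective point.
points = ([(1, y, z) for y in range(q) for z in range(q)]
          + [(0, 1, z) for z in range(q)]
          + [(0, 0, 1)])

def line(a, b, c):
    """Key-chain = indices of all points on the line ax + by + cz = 0 (mod q)."""
    return {i for i, (x, y, z) in enumerate(points)
            if (a * x + b * y + c * z) % q == 0}

key_chains = [line(*p) for p in points]  # one chain per node, keys = point indices
assert all(len(chain) == q + 1 for chain in key_chains)
# any two chains share exactly one key -> direct session key, no key-path needed
assert all(len(key_chains[i] & key_chains[j]) == 1
           for i in range(len(points)) for j in range(i + 1, len(points)))
print(sorted(key_chains[0]))
```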
Abstract:
Previous studies have enabled exact prediction of probabilities of identity-by-descent (IBD) in random-mating populations for a few loci (up to four or so), with extension to more loci using approximate regression methods. Here we present a precise predictor of multiple-locus IBD using simple formulas based on exact results for two loci. In particular, the probability of non-IBD at each of the ordered loci A, B, and C is well approximated by $X_{ABC} = X_{AB} X_{BC} / X_B$, which generalizes to $X_{12\ldots k} = X_{12} X_{23} \cdots X_{k-1,k} / X^{k-2}$, where $X$ is the probability of non-IBD at a single locus. Predictions from this chain rule are very precise with population bottlenecks and migration, but rather poorer in the presence of mutation. From these coefficients, the probabilities of multilocus IBD and non-IBD can also be computed for genomic regions as functions of population size, time, and map distances. An approximate but simple recurrence formula is also developed; it is generally less accurate than the chain rule but more robust with mutation. Used together with the chain rule, it leads to explicit equations for non-IBD in a region. The results can be applied to the detection of quantitative trait loci (QTL) by computing the probability of IBD at candidate loci in terms of identity-by-state at neighbouring markers.
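A small worked check of the chain rule above, with illustrative (not empirical) non-IBD probabilities:

```python
# Chain-rule approximation X_{12...k} = X_12 X_23 ... X_{k-1,k} / X^(k-2).
def chain_rule_non_ibd(pairwise, single):
    """pairwise: [X_12, X_23, ..., X_{k-1,k}] for k ordered loci;
    single: X, the one-locus non-IBD probability (assumed equal at every locus)."""
    k = len(pairwise) + 1
    prod = 1.0
    for x in pairwise:
        prod *= x
    return prod / single ** (k - 2)

# three ordered loci A, B, C: X_ABC ~= X_AB * X_BC / X_B
print(chain_rule_non_ibd([0.90, 0.88], single=0.95))  # ~0.8337
```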
Abstract:
A novel multiple-regression method (RM) is developed to predict identity-by-descent probabilities at a locus L (IBD_L) among individuals without pedigree, given information on surrounding markers and population history. These IBD_L probabilities are a function of the increase in linkage disequilibrium (LD) generated by drift in a homogeneous population over generations. Three parameters are sufficient to describe population history: effective population size (Ne), number of generations since foundation (T), and marker allele frequencies among founders (p). The IBD_L probabilities are used in a simulation study to map a quantitative trait locus (QTL) via variance component estimation. RM is compared to a coalescent method (CM) in terms of power and robustness of QTL detection. Differences between RM and CM are small but significant. For example, RM is more powerful than CM in dioecious populations, but not in monoecious populations. Moreover, RM is more robust than CM when marker phases are unknown, when there is complete LD among founders, or when Ne is mis-specified, and less robust when p is mis-specified. CM utilises all marker haplotype information, whereas RM utilises the information contained in each individual marker and in all possible marker pairs, but not in higher-order interactions. RM consists of a family of models encompassing four different population structures and two ways of using marker information, in contrast with the single model in CM that must cater for all possible evolutionary scenarios.
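To illustrate how drift-generated LD depends on the population parameters listed above, here is a sketch using Sved's (1971) classical approximation E[r^2] ≈ 1/(1 + 4·Ne·c); this is a standard stand-in for expository purposes, not the regression model developed in the paper.

```python
# Expected LD between linked markers under drift (Sved's approximation).
import math

def expected_r2(ne, c):
    """Equilibrium expectation of squared LD between loci at recombination
    fraction c in a population of effective size ne."""
    return 1.0 / (1.0 + 4.0 * ne * c)

def haldane_c(distance_morgans):
    """Recombination fraction from map distance (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))

# tight linkage and small Ne produce high LD
print(expected_r2(ne=100, c=haldane_c(0.01)))  # ~0.20
```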
Abstract:
A new deterministic method for predicting simultaneous inbreeding coefficients at three and four loci is presented. The method involves calculating the conditional probability of IBD (identity by descent) at one locus given IBD at the other loci, and multiplying this probability by the prior probability of the latter loci being simultaneously IBD. The conditional probability is obtained by applying a novel regression model, and the prior probability from the theory of digenic measures of Weir and Cockerham. The model was validated for a finite monoecious population mating at random, with a constant effective population size, with or without selfing, and also for an infinite population with a constant intermediate proportion of selfing. We assumed discrete generations. Deterministic predictions were very accurate when compared with simulation results, and robust to alternative forms of implementation. The simultaneous inbreeding coefficients were more sensitive to changes in effective population size than to changes in marker spacing. Extensions to predict simultaneous inbreeding coefficients at more than four loci are now possible.
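In compact notation, the decomposition described above reads as follows for the three-locus case (the symbols transcribe the abstract's verbal description and are illustrative, not the paper's own notation):

```latex
% Simultaneous three-locus inbreeding coefficient: the conditional
% probability of IBD at locus A given joint IBD at loci B and C, times the
% prior (digenic) probability that B and C are simultaneously IBD.
F_{ABC} \;=\; \Pr\!\left(\mathrm{IBD}_A \mid \mathrm{IBD}_B,\, \mathrm{IBD}_C\right) \cdot F_{BC}
```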
Abstract:
The power of testing for a population-wide association between a biallelic quantitative trait locus and a linked biallelic marker locus is predicted both empirically and deterministically for several tests. The tests were based on the analysis of variance (ANOVA) and on a number of transmission disequilibrium tests (TDTs). Deterministic power predictions made use of family information and were functions of population parameters including linkage disequilibrium, allele frequencies, and recombination rate. Deterministic power predictions were very close to the empirical power from simulations in all scenarios considered in this study. The different TDTs had very similar power, intermediate between that of one-way and nested ANOVAs. One-way ANOVA was the only test that was not robust against spurious disequilibrium. Our general framework for predicting power deterministically can be used to predict power in other association tests. Deterministic power calculations are a powerful tool for researchers to plan and evaluate experiments, obviating the need for elaborate simulation studies.
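A minimal sketch of the deterministic power calculation for the one-way ANOVA case, assuming the noncentrality parameter has already been derived from the population parameters named above (linkage disequilibrium, allele frequencies, recombination rate); that derivation is the paper's contribution and is not reproduced here.

```python
# Deterministic ANOVA power from a given noncentrality parameter.
from scipy import stats

def anova_power(ncp, df_between, df_within, alpha=0.05):
    """Power = P(F > F_crit) under the noncentral F distribution."""
    f_crit = stats.f.ppf(1.0 - alpha, df_between, df_within)
    return 1.0 - stats.ncf.cdf(f_crit, df_between, df_within, ncp)

# e.g. 3 marker genotype classes, 200 individuals, noncentrality 12
print(anova_power(ncp=12.0, df_between=2, df_within=197))
```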
Abstract:
Background: Efficient, effective child product safety (PS) responses require data on hazards, injury severity and injury probability. PS responses in Australia largely rely on reports from manufacturers/retailers, other jurisdictions/regulators, or consumers. The extent to which such reactive responses reflect actual child injury priorities is unknown. Aims/Objectives/Purpose: This research compared PS issues for children identified from PS regulatory data with those identified from health data sources in Queensland, Australia. Methods: PS regulatory documents describing issues affecting children in Queensland in 2008–2009 were compiled and analysed to identify frequent products and hazards. Three health data sources (emergency department, injury surveillance and hospital data) were analysed to identify frequent products and hazards. Results/Outcomes: Projectile toys/squeeze toys were the priority products for PS regulators, as these toys can release small parts that present choking hazards. However, across all health datasets, falls were the most common mechanism of injury, and several of the products identified were not subject to a PS system response. While some incidents may not require a response, a manual review of injury description text identified child poisonings and burns as common mechanisms of injury in the health data with substantial documentation of product involvement, yet only 10% of PS system responses focused on these two mechanisms combined. Significance/Contribution to the field: Regulatory data focused on products that fail compliance checks and have the 'potential' to cause harm, whereas health data identified actual harm, resulting in different prioritisation of products/mechanisms. Work is needed to better integrate health data into PS responses in Australia.
Abstract:
In March 2008, the Australian Government announced its intention to introduce a national Emissions Trading Scheme (ETS), now expected to start in 2015. This impending development provides an ideal setting to investigate the impact an ETS in Australia will have on the market valuation of Australian Securities Exchange (ASX) firms. This is the first empirical study of the pricing effects of the ETS in Australia. Primarily, we hypothesize that firm value will be negatively related to a firm's carbon intensity profile; that is, there will be a greater impact on firm value for high carbon emitters in the period (2007) prior to the introduction of the ETS, whether for reasons relating to unbooked liabilities associated with future compliance and/or abatement costs, or for reasons relating to reduced future earnings. Using a sample of 58 Australian listed firms (constrained by the current availability of emissions data), which comprises larger, more profitable and less risky listed Australian firms, we first undertake an event study focusing on five distinct information events argued to affect the probability of the proposed ETS being enacted. Here, we find direct evidence that the capital market is indeed pricing the proposed ETS. Second, using a modified version of the Ohlson (1995) valuation model, we undertake a valuation analysis designed not only to complement the event study results, but more importantly to provide insights into the capital market's assessment of the magnitude of the economic impact of the proposed ETS as reflected in market capitalization. Here, our results show that the market assigns the most carbon-intensive sample firms a market value decrement, relative to the other sample firms, of between 7% and 10% of market capitalization. Further, based on the carbon emission profiles of the sample firms, we infer a 'future carbon permit price' of between AUD$17 and AUD$26 per tonne of carbon dioxide emitted. This estimate is more precise than those of industry reports, which set a carbon price of between AUD$15 and AUD$74 per tonne.
Abstract:
Context: Anti-Müllerian hormone (AMH) concentration reflects ovarian aging and is argued to be a useful predictor of age at menopause (AMP). It is hypothesized that AMH falling below a critical threshold corresponds to follicle depletion, which results in menopause. With this threshold, theoretical predictions of AMP can be made, and comparisons of such predictions with observed AMP from population studies support the role of AMH as a forecaster of menopause. Objective: The objective of the study was to investigate whether previously reported relationships between AMH and AMP hold in a much larger data set. Setting: AMH was measured in 27,563 women attending fertility clinics. Study Design: From these data, a model of age-related AMH change was constructed using robust regression analysis. Data on AMP from subfertile women were obtained from the population-based Prospect-European Prospective Investigation into Cancer and Nutrition (Prospect-EPIC) cohort (n = 2249). A probability distribution of the age at which AMH falls below a critical threshold was constructed and fitted to the Prospect-EPIC menopausal age data by maximum likelihood to estimate that threshold. Main Outcome: The main outcome was conformity between observed and predicted AMP. Results: To obtain a distribution of AMH-predicted AMP that fit the Prospect-EPIC data, we found the critical AMH threshold should vary among women in such a way that women with low age-specific AMH have lower thresholds, whereas women with high age-specific AMH have higher thresholds (mean 0.075 ng/mL; interquartile range 0.038–0.15 ng/mL). Such a varying AMH threshold for menopause is a novel and biologically plausible finding. AMH became undetectable (<0.2 ng/mL) approximately 5 years before the occurrence of menopause, in line with a previous report. Conclusions: The conformity of the observed and predicted distributions of AMP supports the hypothesis that declining population averages of AMH are associated with menopause, making AMH an excellent candidate biomarker for AMP prediction. Further research will help establish the accuracy of AMH levels for predicting AMP in individuals.
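A schematic sketch of the prediction step: given an age-related AMH decline curve, the predicted AMP is the age at which AMH crosses the threshold. The decline function and its parameters below are placeholders chosen for illustration, not the paper's fitted model; only the reported mean threshold of 0.075 ng/mL is taken from the abstract.

```python
# Toy AMH-threshold prediction of age at menopause (AMP).
import math

def amh_at_age(age, amh_at_25=3.0, decay=0.15):
    """Toy exponential decline of AMH (ng/mL) with age from a baseline at 25."""
    return amh_at_25 * math.exp(-decay * (age - 25))

def predicted_amp(threshold, amh_at_25=3.0, decay=0.15):
    """Age at which the toy decline curve crosses the critical threshold."""
    return 25 + math.log(amh_at_25 / threshold) / decay

print(predicted_amp(threshold=0.075))  # ~49.6 years with these toy parameters
```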
Abstract:
Modelling how a word is activated in human memory is an important requirement for determining the probability of recall of a word in an extra-list cueing experiment. Previous research assumed a quantum-like model in which the semantic network was modelled as entangled qubits; however, the level of activation was clearly overestimated. This paper explores three variations of this model, each of which is distinguished by a scaling factor designed to compensate for the overestimation.
Abstract:
Background: There is a well-developed literature investigating the relationship between various driving behaviours and road crash involvement. However, this research has predominantly been conducted in developed economies dominated by Western cultural environments. To date, no published research has empirically investigated this relationship in the context of emerging economies such as Oman. Objective: The present study aims to investigate driving behaviour as indexed by the Driving Behaviour Questionnaire (DBQ) among a group of Omani university students and staff. Methods: A convenience, non-probability, self-selection sampling approach was utilized with Omani university students and staff. Results: A total of 1003 Omani students (n = 632) and staff (n = 371) participated in the survey. Factor analysis of the DBQ revealed four main factors: errors, speeding violations, lapses and aggressive violations. In the multivariate backward logistic regression analysis, the following factors were identified as significant predictors of having caused at least one crash: driving experience, history of offences, and two DBQ components, i.e., errors and aggressive violations. Conclusion: This study indicates that errors, aggressive violations of traffic regulations, and a history of traffic offences are major risk factors for road traffic crashes in the sample. While previous international research has demonstrated that speeding is a primary cause of crashes, in the current context the results indicate that an array of factors is associated with crashes. Further research using more rigorous methodology is warranted to inform the development of road safety countermeasures in Oman that improve the overall traffic safety culture.
Abstract:
OBJECTIVE: There has been a dramatic increase in vitamin D testing in Australia in recent years, prompting calls for targeted testing. We sought to develop a model to identify people most at risk of vitamin D deficiency. DESIGN AND PARTICIPANTS: This is a cross-sectional study of 644 participants aged 60-84 years, 95% of whom were Caucasian, who took part in a pilot randomized controlled trial of vitamin D supplementation. MEASUREMENTS: Baseline 25(OH)D was measured using the Diasorin Liaison platform. Vitamin D insufficiency and deficiency were defined using 50 and 25 nmol/l as cut-points, respectively. A questionnaire was used to obtain information on demographic characteristics and lifestyle factors. We used multivariate logistic regression to predict low vitamin D and calculated the net benefit of using the model compared with 'test-all' and 'test-none' strategies. RESULTS: The mean serum 25(OH)D was 42 (SD 14) nmol/l. Seventy-five per cent of participants were vitamin D insufficient and 10% were deficient. Serum 25(OH)D was positively correlated with time outdoors, physical activity, vitamin D intake and ambient UVR, and inversely correlated with age, BMI and poor self-reported health status. These predictors explained approximately 21% of the variance in serum 25(OH)D. The area under the ROC curve for predicting vitamin D deficiency was 0.82. Net benefit for the prediction model was higher than that of the 'test-all' strategy at all probability thresholds, and higher than that of the 'test-none' strategy for probabilities up to 60%. CONCLUSION: Our model could predict vitamin D deficiency with reasonable accuracy, but it needs to be validated in other populations before being implemented.
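The net-benefit comparison can be made concrete with the standard decision-curve formula of Vickers and Elkin (2006), NB = TP/n - (FP/n)·pt/(1 - pt) at probability threshold pt; the counts below are illustrative, with only the sample size and the 10% deficiency prevalence taken from the abstract.

```python
# Decision-curve net benefit and the 'test-all' / 'test-none' baselines.
def net_benefit(tp, fp, n, pt):
    """Net benefit at probability threshold pt from true/false positive counts."""
    return tp / n - (fp / n) * pt / (1.0 - pt)

n, prevalence = 644, 0.10           # 10% vitamin D deficient, as reported
deficient = int(n * prevalence)

def test_all(pt):
    """'Test-all' treats everyone as high risk: TP = all deficient."""
    return net_benefit(deficient, n - deficient, n, pt)

# 'test-none' has net benefit 0 at every threshold; a useful model must
# beat both test_all(pt) and 0 across the clinically relevant range of pt.
for pt in (0.05, 0.10, 0.20):
    print(pt, round(test_all(pt), 4))
```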
Abstract:
X-ray microtomography (micro-CT) with micron resolution enables new ways of characterizing microstructures and opens pathways for forward calculation of multiscale rock properties. A quantitative characterization of the microstructure is the first step in this challenge. We developed a new approach to extract scale-dependent characteristics of porosity, percolation, and anisotropic permeability from 3-D microstructural models of rocks. The Hoshen-Kopelman algorithm from percolation theory is employed for a standard percolation analysis. The anisotropy of permeability is calculated by means of the star volume distribution approach. The local porosity distribution and local percolation probability are obtained using local porosity theory. Additionally, the local anisotropy distribution is defined and analyzed through two empirical probability density functions, the isotropy index and the elongation index. For such high-resolution data sets, the typical CT image sizes are on the order of gigabytes to tens of gigabytes, so an extremely large number of calculations is required. To address this large-memory problem, OpenMP parallelization was used to optimally harness the shared-memory infrastructure of cache-coherent Non-Uniform Memory Access machines such as the iVEC SGI Altix 3700Bx2 supercomputer. We regard adequate visualization of the results as an important element of this first, pioneering study.
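A minimal 2-D sketch of the Hoshen-Kopelman cluster labeling referred to above, implemented with union-find; the paper's analysis runs on 3-D gigabyte-scale CT volumes with OpenMP parallelization, which this toy version does not attempt.

```python
# Hoshen-Kopelman labeling of connected pore clusters in a 2-D occupancy grid.
def hoshen_kopelman(grid):
    """Label connected clusters of occupied cells (value 1), 4-connectivity."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] == i means i is a root; index 0 is unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)  # union of the two clusters
                labels[r][c] = min(ru, rl)
            elif up or left:
                labels[r][c] = find(up or left)
            else:
                next_label += 1
                parent.append(next_label)          # open a new cluster
                labels[r][c] = next_label
    # final pass: replace provisional labels with their root labels
    return [[find(v) if v else 0 for v in row] for row in labels]

pores = [[1, 0, 1],
         [1, 1, 1],
         [0, 0, 1]]
for row in hoshen_kopelman(pores):
    print(row)  # all pore cells merge into one cluster labelled 1
```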