986 results for 519 Probabilities
Abstract:
For many networks in nature, science and technology, it is possible to order the nodes so that most links are short-range, connecting near-neighbours, and relatively few long-range links, or shortcuts, are present. Given a network as a set of observed links (interactions), the task of finding an ordering of the nodes that reveals such a range-dependent structure is closely related to some sparse matrix reordering problems arising in scientific computation. The spectral, or Fiedler vector, approach for sparse matrix reordering has successfully been applied to biological data sets, revealing useful structures and subpatterns. In this work we argue that a periodic analogue of the standard reordering task is also highly relevant. Here, rather than encouraging nonzeros only to lie close to the diagonal of a suitably ordered adjacency matrix, we also allow them to inhabit the off-diagonal corners. Indeed, for the classic small-world model of Watts & Strogatz (1998, Collective dynamics of small-world networks. Nature, 393, 440–442) this type of periodic structure is inherent. We therefore devise and test a new spectral algorithm for periodic reordering. By generalizing the range-dependent random graph class of Grindrod (2002, Range-dependent random graphs and their application to modeling large small-world proteome datasets. Phys. Rev. E, 66, 066702-1–066702-7) to the periodic case, we can also construct a computable likelihood ratio that suggests whether a given network is inherently linear or periodic. Tests on synthetic data show that the new algorithm can detect periodic structure, even in the presence of noise. Further experiments on real biological data sets then show that some networks are better regarded as periodic than linear. Hence, we find both qualitative (reordered network plots) and quantitative (likelihood ratios) evidence of periodicity in biological networks.
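As a rough illustration of the standard (linear) Fiedler-vector reordering that this abstract builds on, the sketch below sorts nodes by the eigenvector of the graph Laplacian belonging to its second-smallest eigenvalue. It is not the periodic algorithm proposed in the paper. The function name `fiedler_order`, the shifted power iteration used to approximate the eigenvector, and the small shuffled path graph are all assumptions made for this example.

```python
from math import sqrt

def fiedler_order(n, edges, iters=2000):
    """Order nodes 0..n-1 by an approximate Fiedler vector (illustrative only)."""
    # Build adjacency lists and degrees for an undirected graph.
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    deg = [len(a) for a in adj]
    # Every Laplacian eigenvalue is at most 2 * max degree, so
    # shift*I - L is positive semidefinite and power iteration applies.
    shift = 2 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        # Remove the constant component (eigenvalue 0 of L).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # w = (shift*I - L) v with L = D - A.
        w = [(shift - deg[i]) * v[i] + sum(v[j] for j in adj[i])
             for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sorted(range(n), key=lambda i: v[i])

# Hypothetical test case: a path graph whose node labels were shuffled.
hidden = [3, 0, 5, 1, 7, 2, 6, 4]
edges = list(zip(hidden, hidden[1:]))
order = fiedler_order(8, edges)
print(order)
```

For a path graph the recovered ordering should match the hidden one up to reversal, since the sign of an eigenvector is arbitrary. For larger networks a sparse eigensolver would be the more usual choice than plain power iteration.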
Abstract:
Applications such as neuroscience, telecommunication, online social networking, transport and retail trading give rise to connectivity patterns that change over time. In this work, we address the resulting need for network models and computational algorithms that deal with dynamic links. We introduce a new class of evolving range-dependent random graphs that gives a tractable framework for modelling and simulation. We develop a spectral algorithm for calibrating a set of edge ranges from a sequence of network snapshots and give a proof of principle illustration on some neuroscience data. We also show how the model can be used computationally and analytically to investigate the scenario where an evolutionary process, such as an epidemic, takes place on an evolving network. This allows us to study the cumulative effect of two distinct types of dynamics.
Abstract:
BACKGROUND: The presence of insects in stored grains is a significant problem for grain farmers, bulk grain handlers and distributors worldwide. Inspection of bulk grain commodities is essential to detect pests and therefore to reduce the risk of their presence in exported goods. It has been well documented that insect pests cluster in response to factors such as microclimatic conditions within bulk grain. Statistical sampling methodologies for grains, however, have typically considered pests and pathogens to be homogeneously distributed throughout grain commodities. In this paper we demonstrate a sampling methodology that accounts for the heterogeneous distribution of insects in bulk grains. RESULTS: We show that failure to account for the heterogeneous distribution of pests may lead to overestimates of the capacity for a sampling program to detect insects in bulk grains. Our results indicate the importance of the proportion of grain that is infested in addition to the density of pests within the infested grain. We also demonstrate that the probability of detecting pests in bulk grains increases as the number of sub-samples increases, even when the total volume or mass of grain sampled remains constant. CONCLUSION: This study demonstrates the importance of considering an appropriate biological model when developing sampling methodologies for insect pests. Accounting for a heterogeneous distribution of pests leads to a considerable improvement in the detection of pests over traditional sampling models.
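The qualitative claims in this abstract can be reproduced with a toy model, sketched below. This is not the model used in the paper. It assumes that each sub-sample independently lands in infested grain with probability `p_infested`, and that insect counts within infested grain are Poisson with rate `density` per unit mass. The function names and all parameter values are hypothetical.

```python
from math import exp

def detect_clustered(p_infested, density, total_mass, n_sub):
    """Detection probability when pests occur only in an infested fraction."""
    m = total_mass / n_sub  # mass of each sub-sample
    # A sub-sample detects pests if it lands in infested grain AND
    # contains at least one insect (Poisson count with mean density * m).
    p_hit = p_infested * (1.0 - exp(-density * m))
    return 1.0 - (1.0 - p_hit) ** n_sub

def detect_homogeneous(p_infested, density, total_mass):
    """Detection probability if the same pests were spread evenly."""
    return 1.0 - exp(-p_infested * density * total_mass)

# Assumed values: 10% of grain infested, 2 insects/kg there, 3 kg sampled.
probs = [detect_clustered(0.1, 2.0, 3.0, n) for n in (1, 5, 10, 50)]
uniform = detect_homogeneous(0.1, 2.0, 3.0)
print(probs, uniform)
```

Under these assumptions the detection probability rises with the number of sub-samples at fixed total mass, and the homogeneous model gives a higher figure than any of the clustered cases, which is the overestimate the abstract warns about.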
Abstract:
It is important to examine the nature of the relationships between roadway, environmental, and traffic factors and motor vehicle crashes, with the aim of improving the collective understanding of causal mechanisms involved in crashes and better predicting their occurrence. Statistical models of motor vehicle crashes are one path of inquiry often used to gain these initial insights. Recent efforts have focused on the estimation of negative binomial and Poisson regression models (and related variants) due to their relatively good fit to crash data. Of course, analysts constantly seek methods that offer greater consistency with the data generating mechanism (motor vehicle crashes in this case), provide better statistical fit, and provide insight into data structure that was previously unavailable. One such opportunity exists with some types of crash data, in particular crash-level data that are collected across roadway segments, intersections, etc. It is argued in this paper that some crash data possess hierarchical structure that has not routinely been exploited. This paper describes the application of binomial multilevel models of crash types using 548 motor vehicle crashes collected from 91 two-lane rural intersections in the state of Georgia. Crash prediction models are estimated for angle, rear-end, and sideswipe (both same direction and opposite direction) crashes. The contributions of the paper are the realization of hierarchical data structure and the application of a theoretically appealing and suitable analysis approach for multilevel data, yielding insights into intersection-related crashes by crash type.
Abstract:
One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
Abstract:
At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the translation using cross-lingual document name triangulation performs very well. The evaluation shows encouraging results for our system.
Abstract:
Background: A random QTL effects model uses a function of probabilities that two alleles in the same or in different animals at a particular genomic position are identical by descent (IBD). Estimates of such IBD probabilities, and therefore the modeling and estimation of QTL variances, depend on marker polymorphism, strength of linkage and linkage disequilibrium of markers and QTL, and the relatedness of animals in the pedigree. The effect of relatedness of animals in a pedigree on IBD probabilities and their characteristics was examined in a simulation study. Results: The study was based on nine multi-generational family structures, similar to a pedigree structure of a real dairy population, distinguished by an increased level of inbreeding from zero to 28% across the studied population. The highest inbreeding level in the pedigree, connected with the highest relatedness, was accompanied by the highest IBD probabilities of two alleles at the same locus, and by lower relative variation coefficients. Profiles of correlation coefficients of IBD probabilities along the marked chromosomal segment with those at the true QTL position were steepest when the inbreeding coefficient in the pedigree was highest. Precision of estimated QTL location increased with increasing inbreeding and pedigree relatedness. A method to assess the optimum level of inbreeding for QTL detection is proposed, depending on population parameters. Conclusions: An increased overall relationship in a QTL mapping design has positive effects on precision of QTL position estimates. However, the relationship between inbreeding level and the capacity for QTL detection, which depends on the recombination rate between the QTL and the adjacent informative marker, is not linear. © 2010 Freyer et al., licensee BioMed Central Ltd.
Abstract:
The terrorist attacks in the United States on September 11, 2001 appeared to be a harbinger of increased terrorism and violence in the 21st century, bringing terrorism and political violence to the forefront of public discussion. Questions about these events abound, and "Estimating the Historical and Future Probabilities of Large Scale Terrorist Event" [Clauset and Woodard (2013)] asks specifically, "how rare are large scale terrorist events?" and, in general, encourages discussion on the role of quantitative methods in terrorism research and policy and decision-making. Answering the primary question raises two challenges. The first is identifying terrorist events. The second is finding a simple yet robust model for rare events that has good explanatory and predictive capabilities. The challenge of identifying terrorist events is acknowledged and addressed by reviewing and using data from two well-known and reputable sources: the Memorial Institute for the Prevention of Terrorism-RAND database (MIPT-RAND) [Memorial Institute for the Prevention of Terrorism] and the Global Terrorism Database (GTD) [National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2012), LaFree and Dugan (2007)]. Clauset and Woodard (2013) provide a detailed discussion of the limitations of the data and the models used, in the context of the larger issues surrounding terrorism and policy.
Abstract:
In this thesis we investigate the use of quantum probability theory for ranking documents. Quantum probability theory is used to estimate the probability of relevance of a document given a user's query. We posit that quantum probability theory can lead to a better estimation of the probability of a document being relevant to a user's query than the common approach, i.e. the Probability Ranking Principle (PRP), which is based upon Kolmogorovian probability theory. Following our hypothesis, we formulate an analogy between the document retrieval scenario and a physical scenario, that of the double slit experiment. Through the analogy, we propose a novel ranking approach, the quantum probability ranking principle (qPRP). Key to our proposal is the presence of quantum interference. Mathematically, this is the statistical deviation between empirical observations and expected values predicted by the Kolmogorovian rule of additivity of probabilities of disjoint events in configurations such as that of the double slit experiment. We propose an interpretation of quantum interference in the document ranking scenario, and examine how quantum interference can be effectively estimated for document retrieval. To validate our proposal and to gain more insights about approaches for document ranking, we (1) analyse PRP, qPRP and other ranking approaches, exposing the assumptions underlying their ranking criteria and formulating the conditions for the optimality of the two ranking principles, (2) empirically compare three ranking principles (i.e. PRP, interactive PRP, and qPRP) and two state-of-the-art ranking strategies in two retrieval scenarios, those of ad-hoc retrieval and diversity retrieval, (3) analytically contrast the ranking criteria of the examined approaches, exposing similarities and differences, (4) study the ranking behaviours of approaches alternative to PRP in terms of the kinematics they impose on relevant documents, i.e. by considering the extent and direction of the movements of relevant documents across the ranking recorded when comparing PRP against its alternatives. Our findings show that the effectiveness of the examined ranking approaches strongly depends upon the evaluation context. In the traditional evaluation context of ad-hoc retrieval, PRP is empirically shown to be better than or comparable to alternative ranking approaches. However, when we turn to examine evaluation contexts that account for interdependent document relevance (i.e. when the relevance of a document is assessed also with respect to other retrieved documents, as is the case in the diversity retrieval scenario), the use of quantum probability theory and thus of qPRP is shown to improve retrieval and ranking effectiveness over the traditional PRP and alternative ranking strategies, such as Maximal Marginal Relevance, Portfolio theory, and Interactive PRP. This work represents a significant step forward regarding the use of quantum theory in information retrieval. It demonstrates in fact that the application of quantum theory to problems within information retrieval can lead to improvements both in modelling power and retrieval effectiveness, allowing the construction of models that capture the complexity of information retrieval situations. Furthermore, the thesis opens up a number of lines for future research. These include: (1) investigating estimations and approximations of quantum interference in qPRP; (2) exploiting complex numbers for the representation of documents and queries; and (3) applying the concepts underlying qPRP to tasks other than document ranking.
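The interference term mentioned in this abstract can be written out in the textbook double-slit form, shown below as a reminder of how it departs from Kolmogorovian additivity. The symbols $p_A$, $p_B$, $\phi_A$, $\phi_B$ and $\theta_{AB}$ are notation assumed here, not taken from the thesis, and the abstract does not say how the phase $\theta_{AB}$ is estimated for documents.

```latex
% Kolmogorovian additivity for disjoint events A and B:
%   p_{AB} = p_A + p_B
% Quantum case: probabilities are squared moduli of complex amplitudes
% \phi_A, \phi_B with p_A = |\phi_A|^2 and p_B = |\phi_B|^2, so
\begin{align}
  p_{AB} &= |\phi_A + \phi_B|^2 = p_A + p_B + I_{AB}, \\
  I_{AB} &= 2\sqrt{p_A\, p_B}\,\cos\theta_{AB},
\end{align}
% where \theta_{AB} is the phase difference between the two amplitudes.
% I_{AB} = 0 recovers the Kolmogorovian rule.
```

Since $\cos\theta_{AB}$ ranges over $[-1, 1]$, the interference term can either raise or lower the combined probability relative to the additive prediction.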